Reading Pages

SQL Server Architecture

SQL Server Architecture

Reading Pages

The read requests generated by an instance of Microsoft® SQL Server™ 2000 are controlled by the relational engine and further optimized by the storage engine. The access method used to read pages from a table, such as a table scan, an index scan, or a keyed read, determines the general pattern of reads that will be performed. The relational engine determines the most effective access method. This request is then given to the storage engine, which optimizes the reads required to implement the access method. The thread executing the batch schedules the reads.

Table scans are extremely efficient in SQL Server 2000. The IAM pages in a SQL Server 2000 database list the extents used by a table or index. The storage engine can read the IAM to build a sorted list of the disk addresses that must be read. This allows SQL Server 2000 to optimize its I/Os as large sequential reads that are done in sequence based on their location on the disk. SQL Server 2000 issues multiple serial read-ahead reads at once for each file involved in the scan. This takes advantage of striped disk sets. SQL Server 2000 Enterprise Edition dynamically adjusts the maximum number of read ahead pages based on the amount of memory present; it is fixed in all other editions of SQL Server 2000.

One part of the SQL Server 2000 Enterprise Edition advanced scan feature allows multiple tasks to share full table scans. If the execution plan of a SQL statement calls for a scan of the data pages in a table, and the relational database engine detects that the table is already being scanned for another execution plan, the database engine joins the second scan to the first, at the current location of the second scan. The database engine reads each page once and passes the rows from each page to both execution plans. This continues until the end of the table is reached. At that point, the first execution plan has the complete results of a scan, but the second execution plan must still retrieve the data pages that occur before the point at which it joined the in-progress scan. The scan for second execution plan then wraps back to the first data page of the table and scans forward to the point at which it joined the first scan. Any number of scans can be combined in this way, the database engine will keep looping through the data pages until it has completed all the scans.

For example, assume that you have a table with 500,000 pages. UserA executes a SQL statement that requires a scan of the table. When that scan has processed 100,000 pages, UserB executes another SQL statement that scans the same table. The database engine will schedule one set of read requests for pages after 100,001, and passes the rows from each page back to both scans. When the scan reaches the 200,000th page, UserC executes another SQL statement that scans the same table. Starting with page 200,001, the database engine passes the rows from each page it reads back to all three scans. After reading the 500,000th row, the scan for UserA is complete, and the scans for UserB and UserC wrap back and start reading pages starting with page 1. When the database engine gets to page 100,000, the scan for UserB is complete. The scan for Userc then keeps going alone until it reads page 200,000, at which point all the scans have been completed.

Reading Index Pages

SQL Server 2000 reads index pages serially in key order. For example, this illustration shows a simplified representation of a set of leaf pages containing a set of keys and the intermediate index node mapping the leaf pages.

SQL Server 2000 uses the information in the intermediate index page above the leaf level to schedule serial read-ahead I/Os for the pages containing the keys. If a request is made for all the keys from 'ABC' to 'DEF', the instance of SQL Server 2000 first reads the index page above the leaf page. It does not, however, simply read each individual data page in sequence from page 504 to page 556, the last one with keys in the desired range. Instead, the storage engine scans the intermediate index page and builds a list of the leaf pages that must be read. The storage engine then schedules all the I/Os in key order. The storage engine also recognizes that pages 504/505 and 527/528 are contiguous, and performs a single scatter-gather read to retrieve the adjacent pages in one operation. When there are many pages to be retrieved in a serial operation, SQL Server schedules a block of reads at a time. When a subset of these reads is completed, SQL Server schedules an equal number of new reads until all the needed reads have been scheduled.

SQL Server 2000 uses pre-fetching to speed the processing of non-clustered indexes. The leaf rows of a non-clustered index contain pointers to the data rows containing each specific key value. As the database engine reads through the leaf pages of the non-clustered index, it also starts scheduling asynchronous reads for the data rows whose pointers have already been retrieved. This allows the database engine to start retrieving rows before it has completed the scan of the non-clustered index. This process is followed regardless of whether or not the table has a clustered index. SQL Server 2000 Enterprise Edition uses more pre-fetching than other editions of SQL Server, and the level of pre-fetching is not configurable in any edition.