Freeing and Writing Buffer Pages

SQL Server Architecture

SQL Server Architecture

Freeing and Writing Buffer Pages

In Microsoft® SQL Server™ 2000, one system is responsible for:

  • Writing modified buffer pages to disk.

  • Marking as free those pages that have not been referenced for some time.

SQL Server 2000 has a singly linked list containing the addresses of free buffer pages. Any thread needing a buffer page uses the first page in the free buffer list.

The buffer cache is an in-memory structure. Each buffer page has a header that contains a reference counter and an indicator of whether the page is dirty, which means the page contains modifications that have not yet been written to disk. The reference counter is incremented by 1 each time a SQL statement references the buffer page. The buffer cache is periodically scanned from the start to the end. Because the buffer cache is all in memory, these scans are very quick and require no I/O. During the scan, the reference counter in each buffer page header is divided by 4 and the remainder discarded. When the reference counter goes to 0, the dirty page indicator is checked. If the page is dirty, a write is scheduled to write the modifications to disk. Instances of SQL Server use a write-ahead log, so the write of the dirty data page is blocked while the log page recording the modification is first written to disk. After the modified page has been flushed to disk, or if the page was not dirty to start with, the page is freed. The association between the buffer page and the data page it contains is removed and the buffer is placed on the free list.

Using this process, frequently referenced pages remain in memory while buffers holding pages not referenced eventually return to the free buffer list. The instance of SQL Server determines internally the size of the free buffer list, based on the size of the buffer cache. The size cannot be configured.

When an instance of SQL Server is running on Microsoft Windows NT® or Windows® 2000, the work of scanning the buffer, writing dirty pages, and populating the free buffer list is mostly done by the individual worker threads. The worker threads perform their scans in the interval of time after they have scheduled an asynchronous read and the read completes. A thread gets the address of the next section of the buffer pool that needs to be scanned from a central data structure, then scans that section of the buffer pool while the read I/O processes asynchronously. If a write must be performed, it is also scheduled asynchronously and does not interfere with the thread's ability to process the completion of its own read.

Each instance also has a separate lazywriter thread that scans through the buffer cache. The lazywriter process sleeps for an interval of time. When it is restarted, it checks the size of the free buffer list. If the free buffer list is below a certain point (dependent on the size of the cache) the lazywriter process scans the buffer cache to reclaim unused pages and write dirty pages that have a reference count of 0. On the Windows NT and Windows 2000 operating systems, most of the work populating the free buffer list and writing dirty pages is done by the individual threads and the lazywriter thread typically finds little to do. Windows 95 and Windows 98 do not support asynchronous writes, so the lazywriter thread does the work of populating the free buffer list and writing dirty pages.

The checkpoint process also scans the buffer cache periodically and writes any dirty log or data pages to disk. The difference is that the checkpoint process does not place the buffer page back on the free list. The work of the checkpoint process is intended to minimize the number of dirty pages in memory to reduce the length of a recovery if the server fails, not to populate the free buffer list. Checkpoints typically find few dirty pages to write to disk because most dirty pages are written to disk by the worker threads or lazywriter thread in the period between two checkpoints.

Writes of log records are usually scheduled asynchronously by a logwriter thread. The exceptions are when:

  • A commit forces all pending log records for a transaction to disk.

  • A checkpoint forces all pending log records for all transactions to disk.