Distribution Statistics

SQL Server Architecture

SQL Server Architecture

Distribution Statistics

All indexes have distribution statistics that describe the selectivity and distribution of the key values in the index. Selectivity is a property that relates to how many rows are typically identified by a key value. A unique key has high selectivity; a key value found in 1,000 rows has poor selectivity. The selectivity and distribution statistics are used by Microsoft® SQL Server™ 2000 to optimize its navigation through tables and indexed views when processing Transact-SQL statements. The distribution statistics are used to estimate how efficient an index would be in retrieving data associated with a key value or range specified in the query. The statistics for each index are not limited to a single page but are stored as a long string of bits across multiple pages in the same way image data is stored. The column sysindexes.statblob points to this distribution data. You can use the DBCC SHOW_STATISTICS statement to get a report on the distribution statistics for an index.

Distribution statistics may also be maintained for unindexed columns. These can be defined manually using the CREATE STATISTICS statement or created automatically by the query optimizer. Statistics on unindexed columns count against the limit of 249 nonclustered indexes allowed on a table.

To be useful to query optimizer, distribution statistics must be kept reasonably current. The distribution statistics should be refreshed anytime significant numbers of changes to keys occur in the index. Distribution statistics can be updated manually using the UPDATE STATISTICS statement. SQL Server 2000 can also detect when distribution statistics are out of date and update the statistics automatically. This update is performed by the task that detected that the statistics needed to be updated. The update is performed using a complex sampling method that minimizes the effect of the update on transaction throughput.

See Also

Statistical Information