Single SQL Statement Processing
Processing a single SQL statement is the most basic way that Microsoft® SQL Server™ 2000 executes SQL statements. The steps used to process a single SELECT statement that references only local base tables (no views or remote tables) illustrates the basic process.
Optimizing SELECT Statements
A SELECT statement is nonprocedural; it does not state the exact steps the database server should use to retrieve the requested data. This means the database server must analyze the statement to determine the most efficient way to extract the requested data. This is called optimizing the SELECT statement, and the component that does this is called the query optimizer.
A SELECT statement defines only:
- The format of the result set. This is specified mostly in the select list, although other clauses such as ORDER BY and GROUP BY also affect the final form of the result set.
- The tables containing the source data. This is specified in the FROM clause.
- How the tables are logically related for the purposes of the SELECT statement. This is defined in the join specifications.
- What conditions the rows in the source tables must satisfy to qualify for the SELECT statement. These are specified in the WHERE and HAVING clauses.
A query execution plan is a definition of:
- The sequence in which the source tables are accessed.
Typically, there are many sequences in which the database server can access the base tables to build the result set. For example, if the SELECT statement references three tables, the database server could first access TableA, use the data from TableA to extract matching rows from TableB, and then use the data from TableB to extract data from TableC. The other sequences in which the database server could access the tables are: TableC, TableB, TableA; or TableB, TableA, TableC; or TableB, TableC, TableA; or TableC, TableA, TableB.
- The methods used to extract data from each table.
Usually, there are different methods for accessing the data in each table. If only a few rows with specific key values are needed, the database server can use an index. If all the rows in the table are needed, the database server can ignore the indexes and do a table scan. If all the rows in a table are needed, but there is an index whose key columns are in an ORDER BY, performing an index scan instead of a table scan may save a separate sort of the result set. If a table is very small, table scans may be the most efficient method for almost all access to the table.
The process of choosing one execution plan out of several possible plans is called optimization. The query optimizer is one of the most important components of a SQL database system. While some overhead is used by the query optimizer to analyze the query and choose a plan, this overhead is saved several-fold when the query optimizer picks an efficient execution plan. For example, two construction companies can be given identical blueprints for a house. If one company spends a few days at the start to plan how they will build the house, and the other company starts building without planning, the company that takes the time to plan their project will most likely finish first.
The SQL Server query optimizer is a cost-based optimizer. Each possible execution plan has an associated cost in terms of the amount of computing resources used. The query optimizer must analyze the possible plans and choose the one with the lowest estimated cost. Some complex SELECT statements have thousands of possible execution plans. In these cases, the query optimizer does not analyze all possible combinations. Instead, it uses complex algorithms to find an execution plan that has a cost reasonably close to the theoretical minimum.
The SQL Server query optimizer does not choose only the execution plan with the lowest resource cost; it chooses the plan that returns results to the user with a reasonable cost in resources and returns the results the fastest. For example, processing a query in parallel typically uses more resources than processing it serially, but completes the query faster. The SQL Server optimizer will use a parallel execution plan to return results if the load on the server will not be adversely affected.
Query optimizer relies on distribution statistics when estimating the resource costs of different methods of extracting information from a table or index. Distribution statistics are kept for columns and indexes. They indicate the selectivity of the values in a particular index or column. For example, in a table representing cars, many cars have the same manufacturer, but each car has a unique vehicle identification number (VIN). An index on the VIN is more selective than an index on the manufacturer. If the index statistics are not current, the query optimizer may not make the best choice for the current state of the table. For more information about keeping index statistics current, see Statistical Information.
The query optimizer is important because it enables the database server adjust dynamically to changing conditions in the database without requiring input from a programmer or database administrator. This enables programmers to focus on describing the final result of the query. They can trust that the query optimizer will always build an efficient execution plan for the state of the database each time the statement is run.
Processing a SELECT Statement
The basic steps that SQL Server uses to process a single SELECT statement are:
- The parser scans the SELECT statement and breaks it into logical units such as keywords, expressions, operators, and identifiers.
- A query tree, sometimes called a sequence tree, is built describing the logical steps needed to transform the source data into the format needed by the result set.
- The query optimizer analyzes all the ways the source tables can be accessed and selects the series of steps that returns the results fastest while consuming fewer resources. The query tree is updated to record this exact series of steps, and the final, optimized version of the query tree is called the execution plan.
- The relational engine begins executing the execution plan. As steps that need data from the base tables are processed, the relational engine uses OLE DB to request that the storage engine pass up data from the rowsets requested from the relational engine.
- The relational engine processes the data returned from the storage engine into the format defined for the result set, and returns the result set to the client.
Processing Other Statements
The basic steps described for processing a SELECT statement apply to other SQL statements such as INSERT, UPDATE, and DELETE. UPDATE and DELETE statements both have to target the set of rows to be modified or deleted; the process of identifying these rows is the same process used to identify the source rows that contribute to the result set of a SELECT statement. The UPDATE and INSERT statements may both contain embedded SELECT statements that provide the data values to be updated or inserted.
Even Data Definition Language (DDL) statements such as CREATE PROCEDURE or ALTER TABLE are ultimately resolved to a series of relational operations on the system catalog tables and sometimes (such as ALTER TABLE ADD COLUMN) against the data tables.