Aggregations
Aggregations are precalculated summaries of data that improve query response time by having the answers ready before the questions are asked. For example, when a data warehouse fact table contains hundreds of thousands of rows, a query requesting the weekly sales totals for a particular product line can take a long time to answer if the fact table has to be scanned to compute the answer. However, the response can be almost immediate if the summarization data to answer this query has been precalculated. Precalculation of summary data is the foundation for the rapid response times of OLAP technology.
Cubes are the way OLAP technology organizes summary data into multidimensional structures. Dimensions and their hierarchical levels reflect the queries that can be asked of the cube. Aggregations are stored in the multidimensional structure in cells at coordinates specified by the dimensions. For example, the question "What were the sales of product X in 1998 for the Northwest region?" involves three dimensions (product, time, and geography) and one measure (sales). The value in the sales cell within the cube at the coordinates (product X, 1998, Northwest) is the answer, a single numerical value.
Other questions may return multiple values, such as "What were the sales of hardware products by quarter by region for 1998?" Such queries return sets of cells from the coordinates that satisfy the specified conditions. The number of cells returned by the query depends on the number of items in the hardware level of the product dimension, the four quarters in 1998, and the number of regions in the geography dimension. If all summary data has been precalculated into aggregations, the response time of queries like this will depend only on the time needed to extract the specified cells. No calculation or reading of data from the fact table is necessary.
Precalculation of all possible aggregations in a cube results in the fastest possible response time for all queries. However, the storage and processing time required for the aggregations can be substantial. Storage requirements depend not only on the number of dimensions and measures, but also on the number of levels in the dimensions and the number of members of each level.
There is a tradeoff between storage requirements and the percentage of possible aggregations that are precalculated. If no aggregations are precalculated (0%), little storage space is required beyond that necessary to store the base data. In this case, however, query response time will vary and may be quite slow because all answers will have to be calculated from the base data for each query. Returning the single number that answers the first query ("What were the sales of product X in 1998 for the Northwest region") might require reading thousands of rows of data, extracting the sale value from each, and calculating the sum.
Microsoft® SQL Server™ 2000 Analysis Services incorporates a sophisticated algorithm to select aggregations for precalculation so that other aggregations can be quickly computed from precalculated values. For example, if the aggregations are precalculated for the month level of a time dimension, the calculation for a quarter requires only the summarization of three numbers, which can be quickly computed on the fly. This technique saves storage with little effect on query response time.
The Storage Design Wizard provides options for you to specify storage and percentage constraints to the algorithm to achieve a satisfactory tradeoff between query response time and storage requirements. For more information about using the Storage Design Wizard, see Designing Storage Options and Aggregations. The Usage-Based Optimization Wizard enables you to adjust the aggregation design for a cube by analyzing the queries that have been submitted by client applications. You can tune a cube's aggregation design to provide rapid response to frequent queries and less rapid response to infrequent queries without substantially affecting the storage needed for the cube. For more information about using the Usage-Based Optimization Wizard, see Optimizing Performance Based on Usage.
Aggregations are designed in the preceding wizards but are created when the cube or partition for which the aggregations are designed is processed. After aggregation creation, if the structure of a cube changes, or if data is added to or changed in a cube's source tables, it is usually necessary to design the cube's aggregations again and process the cube again. For more information, see Processing Cubes.
In the object hierarchy shown in Object Architecture Overview, aggregations are immediately subordinate to the partition. Aggregations are always subordinate to a specific partition, but if a cube contains only one partition, its aggregations can be considered subordinate to the cube. For this reason, if the preceding wizards are run on a cube, selection of a partition is requested only if the cube contains multiple partitions.
Aggregations for a partition are stored in one of three locations, depending on the storage mode and type of the partition.
Storage mode | Type | Location of aggregations |
---|---|---|
Multidimensional OLAP (MOLAP) or hybrid OLAP (HOLAP) | Local | Subfolder of the Data folder of the Analysis server on which the partition is defined |
Multidimensional OLAP (MOLAP) or hybrid OLAP (HOLAP) | Remote | Subfolder of the Data folder of the remote Analysis server specified when partition was created |
Relational OLAP (ROLAP) | Local or remote | Dedicated tables or indexed views in the database specified in the data source of the partition |
Note You can use the ROLAP storage mode for the data of a partition without creating aggregations in the relational database. For more information, see Set Aggregation Options (Storage Design Wizard) or Set Aggregation Options (Usage-Based Optimization Wizard).
For more information about the Data folder, see Analysis Server.
After aggregations are created, roles are usually the next objects to be created. For more information, see Roles.
If you are programming with Decision Support Objects (DSO), the class type associated with the aggregation is clsAggregation. For more information, see clsAggregation.