Optimizing the Data Warehouse Database for Analysis Services Performance
The design and performance of the data warehouse database significantly affect the performance of Microsoft® SQL Server™ 2000 Analysis Services.
Analysis Services creates multidimensional presentations of data warehouse data by reading and organizing the data into multidimensional objects such as dimensions and cubes. The Analysis server uses the database relational engine to access the data warehouse database when creating and processing dimensions and cubes. Therefore, the data warehouse schema design and relational database performance have a significant effect on the ease of designing cubes and on the performance of processing cubes.
Cube Storage Modes
A cube's storage mode significantly influences the degree to which the data warehouse schema and relational database performance affect cube query performance.
- With multidimensional OLAP (MOLAP) storage, cube query performance does not depend on relational database performance because all of the cube's data is contained in the multidimensional structure used by the Analysis server.
- With hybrid OLAP (HOLAP) storage, only cube queries that require retrieval of facts from the fact table will require the Analysis server to query the relational database. Such queries are affected by database performance, but queries that do not access the fact table are unaffected.
- With relational OLAP (ROLAP) storage, all cube queries are affected by the database performance because the Analysis server must retrieve fact and aggregated data from the relational database. If dimensions also use ROLAP storage, cube query performance will depend to a large extent on the performance of the relational database. Caching in the Analysis server affects cube performance in this situation, but only to the extent that queries can be at least partially resolved from cached information.
Dimensional Modeling
The design of the data warehouse database schema should incorporate the principles of dimensional modeling so the dimension tables and fact tables represent the business data and the way clients will view and query the data. Most dimensional modeling results in a star or snowflake database schema. Such schemas facilitate cube design and reduce the number of multiple-table joins when querying the database to process dimensions and cubes. If a snowflake schema is needed, minimize the number of dimension tables beyond the first level from the fact table, and minimize the depth of the snowflake dimensions. Fact tables usually hold the vast majority of data in the data warehouse, sometimes containing hundreds of millions of rows. Fact tables should be carefully designed to eliminate duplicated data and to minimize the length of the rows.
Cube Processing
Relational database optimization techniques that improve the speed of reading the data will improve dimension and cube processing performance in Analysis Services. One of the most important optimization techniques is to design and use effective indexes on the fact and dimension tables to facilitate performance of the joins and queries Analysis Services issues when processing dimensions and cubes.
SQL Server 2000 offers a number of options and suggestions for optimizing logical and physical relational database design and query performance. Many of these techniques apply to data warehouse databases, as well as to transaction processing databases. For more information, see Optimizing Database Performance Overview.
Common Data Warehouse Techniques
Many common data warehouse design and optimization techniques apply regardless of the tools that are used to present data to clients. However, some optimization techniques that are appropriate when SQL queries are used as the presentation tool may not be necessary or appropriate if the tool is Analysis Services. In general, the granularity of the fact table should not be reduced by summarization; the number of fact table rows affects the time to process cubes but has little effect on OLAP client query response performance. Analysis Services uses sophisticated algorithms to create aggregation tables of summary data, so these need not be created in the database schema. Special bridge tables for parent-child dimensions such as organizational hierarchies are not necessary; Analysis Services can use self-referential dimension tables in their native form.
Record Size and Data Types
The size of a record affects Analysis Services performance in all areas, including cube size, processing time, server memory usage, server to client data transfer time, and client memory usage. Fact tables typically contain the vast majority of data in the data warehouse. Fact table records should be kept as short as possible and include only fields for measures and indexed key columns. Measure fields should use the smallest data type consistent with the measure data, but be sure the data type is large enough to contain summarized values to prevent overflow when aggregations are calculated. In a fact table containing millions of rows, a saving of even two bytes per record can amount to a significant reduction in table size and the time required to process the table when creating cubes.
Data Warehouse Updates
Because Analysis Services reads data warehouse data and stores it in multidimensional structures, the frequency of changes to the data warehouse data affects how often OLAP cubes and dimensions have to be reprocessed. A data warehouse update strategy should be designed to take the need for reprocessing OLAP objects into account.