OLAP and Data Warehouses
OLAP provides a multidimensional presentation of data warehouse data, creating cubes that organize and summarize data for efficient analytical querying. The design of the data warehouse structure can affect how easily these cubes can be designed and constructed.
Microsoft® SQL Server™ 2000 Analysis Services relies on the data provided by the data warehouse to be accurate, stable, and to have referential integrity. When creating a data warehouse for use with Analysis Services, these design factors should be considered:
- Use a star schema if possible.
If a snowflake schema is needed, minimize the number of dimension tables beyond the first level from the fact table.
- Design dimension tables for the users.
Dimension tables should include meaningful information about the facts that users will want to explore, such as the color or size of a product.
- Apply commonsense normalization to dimension table design.
Unrelated data should not be combined into a single dimension table, and data should not be repeated in multiple dimension tables. For example, create a separate customer dimension instead of repeating customer information in more than one dimension table.
- Do not over-summarize in the fact table.
Retain the finest level of granularity users need to access, and keep all fact table records at the same level of detail. Analysis Services is designed to create and manage summary data from highly granular data warehouses without penalizing users in query response time.
- Use a common fact table structure for similar data.
Data intended to be used in the same cube can be stored in multiple fact tables, but those tables must have the same structure.
- Do not create auxiliary tables of summarized data.
Analysis Services precalculates summaries into structures that are designed for query efficiency. Other auxiliary summarization tables are not used.
- Create indexes on key fields.
For each dimension table, create an index on its key column. For each fact table, create a single index on the combination of columns that contain the foreign keys of the dimension tables associated with the fact table. Analysis Services uses these indexes when it loads multidimensional data structures and calculates summary data. These indexes significantly improve cube processing performance.
- Ensure referential integrity.
It is important that all facts be represented in all dimension tables. Facts in a fact table that do not have a corresponding key in a dimension table can cause errors or fact table rows to be ignored if the fact and dimension tables are used in the same cube.
- Design a data update strategy.
When data is added to or changed in the data warehouse, cubes that have been built from previous data must be updated before the new data is available to users. Incorporating additional data into cubes requires less time than rebuilding cubes when existing data changes. For more information, see Maintaining OLAP Data.