Dimension Tables

Creating and Using Data Warehouses

Dimension tables contain attributes that describe fact records in the fact table. Some of these attributes provide descriptive information; others are used to specify how fact table data should be summarized to provide useful information to the analyst. Dimension tables contain hierarchies of attributes that aid in summarization. For example, a dimension containing product information would often contain a hierarchy that separates products into categories such as food, drink, and nonconsumable items, with each of these categories further subdivided a number of times until the individual product SKU is reached at the lowest level.

Dimensional modeling produces dimension tables in which each table contains attributes that are independent of those in other dimensions. For example, a customer dimension table contains data about customers, a product dimension table contains information about products, and a store dimension table contains information about stores. Queries use attributes in dimensions to specify a view into the fact information. For example, a query might use the product, store, and time dimensions to ask the question "What was the cost of nonconsumable goods sold in the northeast region in 1999?" Subsequent queries might drill down along one or more dimensions to examine more detailed data, such as "What was the cost of kitchen products in New York City in the third quarter of 1999?" In these examples, the dimension tables are used to specify how a measure (cost) in the fact table is to be summarized.
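The two example questions can be sketched as queries that join a fact table to its dimension tables. The following is a minimal illustration using Python's built-in sqlite3 module; the table names, column names, and sample rows are hypothetical and are not taken from any particular database.

```python
import sqlite3

# Hypothetical star schema: one fact table and three dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT, department TEXT);
    CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, region TEXT, city TEXT);
    CREATE TABLE time_dim    (time_key    INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    CREATE TABLE sales_fact  (product_key INTEGER, store_key INTEGER, time_key INTEGER, cost REAL);

    INSERT INTO product_dim VALUES
        (1, 'Nonconsumable', 'Kitchen'),
        (2, 'Nonconsumable', 'Hardware'),
        (3, 'Food', 'Dairy');
    INSERT INTO store_dim VALUES
        (1, 'Northeast', 'New York City'),
        (2, 'Northeast', 'Boston'),
        (3, 'West', 'Seattle');
    INSERT INTO time_dim VALUES (1, 1999, 2), (2, 1999, 3), (3, 2000, 1);
    INSERT INTO sales_fact VALUES
        (1, 1, 2, 100.0),  -- kitchen, New York City, 1999 Q3
        (1, 2, 2, 40.0),   -- kitchen, Boston, 1999 Q3
        (2, 1, 1, 25.0),   -- hardware, New York City, 1999 Q2
        (3, 1, 2, 10.0),   -- dairy, New York City, 1999 Q3
        (1, 3, 2, 70.0),   -- kitchen, Seattle, 1999 Q3
        (1, 1, 3, 55.0);   -- kitchen, New York City, 2000 Q1
""")

# The measure (cost) lives in the fact table; the dimensions supply the filters.
BASE_QUERY = """
    SELECT SUM(f.cost)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    JOIN store_dim   AS s ON s.store_key   = f.store_key
    JOIN time_dim    AS t ON t.time_key    = f.time_key
    WHERE {filters}
"""

def total_cost(filters, params):
    """Sum the cost measure for the facts selected by dimension attributes."""
    return conn.execute(BASE_QUERY.format(filters=filters), params).fetchone()[0]

# "What was the cost of nonconsumable goods sold in the northeast region in 1999?"
broad = total_cost(
    "p.category = ? AND s.region = ? AND t.year = ?",
    ("Nonconsumable", "Northeast", 1999),
)

# Drill down: kitchen products, New York City, third quarter of 1999.
detailed = total_cost(
    "p.department = ? AND s.city = ? AND t.year = ? AND t.quarter = ?",
    ("Kitchen", "New York City", 1999, 3),
)

print(broad, detailed)  # 165.0 100.0
```

The drill-down query differs from the broad one only in the dimension attributes it filters on; the fact table and the measure are unchanged.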

Columns in a dimension table can be used to categorize the information into hierarchical levels. For example, a dimension table for stores in the FoodMart 2000 sample database includes the following columns that specify the hierarchy levels.

Column         Description
-------------  -----------
store_country  Specifies the country in which the store is located. This is the country level of the hierarchy.
store_state    Specifies the state in which the store is located. This is the state level of the hierarchy.
store_city     Specifies the city in which the store is located. This is the city level of the hierarchy.
store_id       Specifies the individual store. This is the lowest level of the hierarchy. This field contains the primary key of the store dimension table and is used to join the dimension table to the fact table.
store_name     Specifies the name of the store. The values in this column are used to identify the store to users in a readable form.
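As a rough illustration of how these level columns drive summarization, the sketch below groups a fact table at successively deeper levels of the store hierarchy. The column names follow the table above; the sales_fact table, the store_sales measure, and all sample rows are assumptions made for this example and do not reproduce FoodMart 2000 data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (
        store_id      INTEGER PRIMARY KEY,
        store_name    TEXT,
        store_city    TEXT,
        store_state   TEXT,
        store_country TEXT
    );
    CREATE TABLE sales_fact (
        store_id    INTEGER REFERENCES store(store_id),
        store_sales REAL
    );
    INSERT INTO store VALUES
        (1, 'Store 1', 'Seattle',   'WA', 'USA'),
        (2, 'Store 2', 'Seattle',   'WA', 'USA'),
        (3, 'Store 3', 'Portland',  'OR', 'USA'),
        (4, 'Store 4', 'Vancouver', 'BC', 'Canada');
    INSERT INTO sales_fact VALUES (1, 10.0), (1, 5.0), (2, 20.0), (3, 7.0), (4, 3.0);
""")

# Hierarchy levels from highest (country) to lowest (individual store).
LEVELS = ["store_country", "store_state", "store_city", "store_id"]

def rollup(depth):
    """Summarize store_sales at the first `depth` levels of the hierarchy."""
    cols = ", ".join(f"s.{name}" for name in LEVELS[:depth])
    sql = f"""
        SELECT {cols}, SUM(f.store_sales)
        FROM sales_fact AS f
        JOIN store AS s ON s.store_id = f.store_id
        GROUP BY {cols}
        ORDER BY {cols}
    """
    return conn.execute(sql).fetchall()

print(rollup(1))  # [('Canada', 3.0), ('USA', 42.0)]
print(rollup(2))  # [('Canada', 'BC', 3.0), ('USA', 'OR', 7.0), ('USA', 'WA', 35.0)]
```

Calling rollup(4) groups down to store_id, the lowest level, which is also the key that joins the dimension table to the fact table.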

Other columns not shown provide additional attribute information. For information about how dimension tables are used in OLAP cubes built using Microsoft® SQL Server™ 2000 Analysis Services, see Dimensions.

Varieties of Dimension Tables

The preceding example illustrates a dimension table that contains a balanced hierarchy that is separated into regular levels. Other types of dimension tables contain less balanced information, such as part-breakdown structures or organization charts in which the hierarchy is represented by parent-child relationships instead of an array of levels.
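A parent-child dimension can be sketched as a table in which each row carries the key of its parent. The example below is one possible way to summarize facts over such a hierarchy, using a recursive query in SQLite through Python's sqlite3 module. The employee_dim and sales_fact tables, their columns, and the sample organization chart are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each row points to its parent; the top of the chart has no parent.
    CREATE TABLE employee_dim (
        employee_key  INTEGER PRIMARY KEY,
        employee_name TEXT,
        parent_key    INTEGER REFERENCES employee_dim(employee_key)
    );
    CREATE TABLE sales_fact (employee_key INTEGER, amount REAL);

    -- An unbalanced chart: the branch under Manager A is deeper than under Manager B.
    INSERT INTO employee_dim VALUES
        (1, 'Director',  NULL),
        (2, 'Manager A', 1),
        (3, 'Manager B', 1),
        (4, 'Rep A1',    2),
        (5, 'Rep A2',    2),
        (6, 'Rep B1',    3),
        (7, 'Trainee',   4);
    INSERT INTO sales_fact VALUES (4, 100.0), (5, 50.0), (6, 30.0), (2, 20.0), (7, 5.0);
""")

SUBTREE_TOTAL = """
    WITH RECURSIVE subtree(employee_key) AS (
        SELECT employee_key FROM employee_dim WHERE employee_key = ?
        UNION ALL
        SELECT child.employee_key
        FROM employee_dim AS child
        JOIN subtree AS parent ON child.parent_key = parent.employee_key
    )
    SELECT COALESCE(SUM(f.amount), 0.0)
    FROM sales_fact AS f
    JOIN subtree AS m ON m.employee_key = f.employee_key
"""

def subtree_total(root_key):
    """Total sales for an employee and everyone below that employee."""
    return conn.execute(SUBTREE_TOTAL, (root_key,)).fetchone()[0]

print(subtree_total(1))  # 205.0 (whole organization)
print(subtree_total(2))  # 175.0 (Manager A, two reps, and the trainee)
print(subtree_total(3))  # 30.0  (Manager B and one rep)
```

Because the hierarchy is stored as relationships rather than as fixed level columns, the same query works regardless of how many levels lie beneath a given member.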

Surrogate Keys

It is important that primary keys of dimension tables remain stable. It is strongly recommended that surrogate keys be created and used for primary keys for all dimension tables. Surrogate keys are keys that are maintained within the data warehouse instead of keys taken from source data systems. There are several reasons for the use of surrogate keys:

  • Data tables in various source systems may use different keys for the same entity.

    Legacy systems that provide historical data might have used a different numbering system than a current online transaction processing system. A surrogate key uniquely identifies each entity in the dimension table regardless of its source key. A separate field can be used to contain the key used in the source system.

    Systems developed independently in company divisions may not use the same keys, or they may use keys that conflict with data in the systems of other divisions. This situation may not cause problems when each division independently reports summary data, but it cannot be permitted in the data warehouse where data is consolidated.

  • Keys may change or be reused in the source data systems.

    This situation is usually less common than the others, but some systems have been known to reuse keys that belonged to obsolete data. Because the old key may still be in use in historical data in the data warehouse, the same key cannot be used to identify a different entity.

  • Changes in organizational structures may move keys in the hierarchy.

    This can be a common situation. For example, if a salesperson is transferred from one region to another, the company may prefer to track two things: sales data for the salesperson with the person's original region for data prior to the transfer date, and sales data for the salesperson in the person's new region after the transfer date. To represent this organization of data, the salesperson's record must exist in two places in the sales force dimension table, which is not possible if the salesperson's company employee identification number is used as the primary key for the dimension table. A surrogate key allows the same salesperson to participate in different locations in the dimension hierarchy.

    In this case, the salesperson will be represented twice in the dimension table with two different surrogate keys. These surrogate keys are used to join the salesperson's records to the sets of facts appropriate to the various locations in the hierarchy occupied by the salesperson.

    The employee's identification number should be carried in a separate column in the table so information about the employee can be reviewed or summarized regardless of the number of times the employee's record appears in the dimension table.

    Dimensions that exhibit this type of change are called slowly changing dimensions.

    Another example of a situation that causes this type of change is the creation of a new version of a product, such as a reduced-fat version of a food item. The item will receive a new SKU or Universal Product Code (UPC), but may retain most of the attributes of the original item, which is still manufactured and sold. The appropriate use of surrogate keys can allow the two versions of the item to be summarized together or separately.

The implementation and management of surrogate keys are the responsibility of the data warehouse. OLTP systems are rarely affected by these situations, and the purpose of these keys is to accurately track history in the data warehouse. Surrogate keys are maintained in the data preparation area during the data transformation process.
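The salesperson transfer described above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the salesperson_dim and sales_fact tables, the employee ID "E100", and the use of an auto-incrementing integer as the surrogate key are all assumptions made for the example. In practice the keys would be assigned during data transformation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salesperson_dim (
        salesperson_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        employee_id     TEXT NOT NULL,                      -- key from the source system
        name            TEXT,
        region          TEXT
    );
    CREATE TABLE sales_fact (
        salesperson_key INTEGER REFERENCES salesperson_dim(salesperson_key),
        sale_date       TEXT,
        amount          REAL
    );
""")

def add_version(employee_id, name, region):
    """Insert a new dimension row for the employee and return its surrogate key."""
    cur = conn.execute(
        "INSERT INTO salesperson_dim (employee_id, name, region) VALUES (?, ?, ?)",
        (employee_id, name, region),
    )
    return cur.lastrowid

# Before the transfer: the salesperson belongs to the East region.
east_key = add_version("E100", "Pat", "East")
conn.execute("INSERT INTO sales_fact VALUES (?, '1999-03-01', 500.0)", (east_key,))

# After the transfer: a second row with a new surrogate key, same employee_id.
west_key = add_version("E100", "Pat", "West")
conn.execute("INSERT INTO sales_fact VALUES (?, '1999-09-01', 300.0)", (west_key,))

JOIN = """
    FROM sales_fact AS f
    JOIN salesperson_dim AS d ON d.salesperson_key = f.salesperson_key
"""
# Facts stay with the region that was current when the sale occurred.
by_region = dict(conn.execute(f"SELECT d.region, SUM(f.amount) {JOIN} GROUP BY d.region"))
# The source key still allows a summary across all versions of the employee.
by_employee = dict(conn.execute(f"SELECT d.employee_id, SUM(f.amount) {JOIN} GROUP BY d.employee_id"))

print(by_region)    # {'East': 500.0, 'West': 300.0}
print(by_employee)  # {'E100': 800.0}
```

Using the employee ID as the primary key would not allow the second row to exist, which is why the source key is kept in its own column.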

Referential Integrity

Referential integrity must be maintained between all dimension tables and the fact table. Each fact record contains foreign keys that relate to primary keys in the dimension tables. Every fact record must have a related record in every dimension table used with that fact table. Missing records in a dimension table can cause facts to be ignored when the dimension table is joined to the fact table to respond to queries or to populate OLAP cubes. Queries can return inconsistent results if records are missing in one or more dimension tables: queries that join a defective dimension table to the fact table will exclude the affected facts, whereas queries that do not join the defective dimension table will include them.
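The inconsistency described above can be reproduced with a small sketch. The store_dim and sales_fact tables and their rows are hypothetical, and the example deliberately declares no foreign key constraint so that a fact with a missing dimension record can be loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_dim  (store_key INTEGER PRIMARY KEY, store_city TEXT);
    CREATE TABLE sales_fact (store_key INTEGER, amount REAL);

    INSERT INTO store_dim VALUES (1, 'Seattle'), (2, 'Portland');
    -- store_key 3 has no matching row in store_dim.
    INSERT INTO sales_fact VALUES (1, 10.0), (2, 20.0), (3, 40.0);
""")

# A query that does not touch the dimension table sees every fact.
total_without_join = conn.execute(
    "SELECT SUM(amount) FROM sales_fact"
).fetchone()[0]

# A query that joins the defective dimension table silently drops the orphan.
total_with_join = conn.execute("""
    SELECT SUM(f.amount)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
""").fetchone()[0]

# One way to detect the problem: find fact keys with no dimension row.
orphan_keys = [row[0] for row in conn.execute("""
    SELECT DISTINCT f.store_key
    FROM sales_fact AS f
    LEFT JOIN store_dim AS s ON s.store_key = f.store_key
    WHERE s.store_key IS NULL
""")]

print(total_without_join, total_with_join, orphan_keys)  # 70.0 30.0 [3]
```

A check such as the orphan_keys query could be run during data preparation, before the facts are loaded, so that the two totals never diverge.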

Shared Dimensions

A data warehouse must provide consistent information for similar queries. One method to maintain consistency is to create dimension tables that are shared and used by all components and data marts in the data warehouse. Candidates for shared dimensions include customers, time, products, and geographical dimensions such as the store dimension in the example earlier in this topic. For example, requiring that all OLAP cubes and data marts use the same shared time dimension enforces consistency of results summarized by time.
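As a small illustration of a shared time dimension, the sketch below summarizes two different fact tables by quarter through the same time_dim table, so both results use identical quarter definitions and can be combined directly. All table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One time dimension shared by both fact tables.
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, the_date TEXT, quarter TEXT);
    CREATE TABLE sales_fact   (time_key INTEGER REFERENCES time_dim(time_key), amount REAL);
    CREATE TABLE returns_fact (time_key INTEGER REFERENCES time_dim(time_key), amount REAL);

    INSERT INTO time_dim VALUES
        (1, '1999-03-31', '1999-Q1'),
        (2, '1999-04-01', '1999-Q2'),
        (3, '1999-06-30', '1999-Q2');
    INSERT INTO sales_fact   VALUES (1, 100.0), (2, 200.0), (3, 50.0);
    INSERT INTO returns_fact VALUES (1, 10.0), (3, 5.0);
""")

def by_quarter(fact_table):
    """Summarize a fact table by quarter using the shared time dimension.

    fact_table is interpolated into the SQL text, so pass only trusted names.
    """
    sql = f"""
        SELECT t.quarter, SUM(f.amount)
        FROM {fact_table} AS f
        JOIN time_dim AS t ON t.time_key = f.time_key
        GROUP BY t.quarter
    """
    return dict(conn.execute(sql))

sales = by_quarter("sales_fact")
returns = by_quarter("returns_fact")

# Because both summaries share the same quarter labels, they line up exactly.
net = {quarter: sales[quarter] - returns.get(quarter, 0.0) for quarter in sales}

print(net)  # {'1999-Q1': 90.0, '1999-Q2': 245.0}
```

If each fact table used its own time dimension with a different assignment of dates to quarters, the two summaries could not be combined this way.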