Fact Tables

Creating and Using Data Warehouses

Creating and Using Data Warehouses

Fact Tables

Each data warehouse or data mart includes one or more fact tables. Central to a star or snowflake schema, a fact table captures the data that measures the organization's business operations. A fact table might contain business sales events such as cash register transactions or the contributions and expenditures of a nonprofit organization. Fact tables usually contain large numbers of rows, sometimes in the hundreds of millions of records when they contain one or more years of history for a large organization.

A key characteristic of a fact table is that it contains numerical data (facts) that can be summarized to provide information about the history of the operation of the organization. Each fact table also includes a multipart index that contains as foreign keys the primary keys of related dimension tables, which contain the attributes of the fact records. Fact tables should not contain descriptive information or any data other than the numerical measurement fields and the index fields that relate the facts to corresponding entries in the dimension tables.

In the FoodMart 2000 sample database provided with Microsoft® SQL Server™ 2000 Analysis Services, one fact table, sales_fact_1998, contains the following columns.

Column Description
product_id Foreign key for dimension table product.
time_id Foreign key for dimension table time_by_day.
customer_id Foreign key for dimension table customer.
promotion_id Foreign key for dimension table promotion.
store_id Foreign key for dimension table store.
store_sales Currency column containing the value of the sale.
store_cost Currency column containing the cost to the store of the sale.
unit_sales Numeric column containing the quantity sold.

In this fact table, each entry represents the sale of a specific product on a specific day to a specific customer in accordance with a specific promotion at a specific store. The business measurements captured are the value of the sale, the cost to the store, and the quantity sold.

The most useful measures to include in a fact table are numbers that are additive. Additive measures allow summary information to be obtained by adding various quantities of the measure, such as the sales of a specific item at a group of stores for a particular time period. Nonadditive measures such as inventory quantity-on-hand values can also be used in fact tables, but different summarization techniques must then be used.

Aggregation in Fact Tables

Aggregation is the process of calculating summary data from detail records. It is often tempting to reduce the size of fact tables by aggregating data into summary records when the fact table is created. However, when data is summarized in the fact table, detailed information is no longer directly available to the analyst. If detailed information is needed, the detail rows that were summarized will have to be identified and located, possibly in the source system that provided the data. Fact table data should be maintained at the finest granularity possible. Aggregating data in the fact table should only be done after considering the consequences.

Mixing aggregated and detailed data in the fact table can cause issues and complications when using the data warehouse. For example, a sales order often contains several line items and may contain a discount, tax, or shipping cost that is applied to the order total instead of individual line items, yet the quantities and item identification are recorded at the line item level. Summarization queries become more complex in this situation, and tools such as Analysis Services often require the creation of special filters to deal with the mixture of granularity.

There are two approaches that can be used in this situation. One approach is to allocate the order level values to line items based on value, quantity, or shipping weight. Another approach is to create two fact tables, one containing data at the line item level, the other containing the order level information. The order identification key should be carried in the detail fact table so the two tables can be related. The order table can then be used as a dimension table to the detail table, with the order-level values considered as attributes of the order level in the dimension hierarchy.