Designing a Data Warehouse

Creating and Using Data Warehouses

Creating and Using Data Warehouses

Designing a Data Warehouse

Designing a data warehouse is very different from designing an online transaction processing (OLTP) system. In contrast to an OLTP system in which the purpose is to capture high rates of data changes and additions, the purpose of a data warehouse is to organize large amounts of stable data for ease of analysis and retrieval. Because of these differing purposes, there are many considerations in data warehouse design that differ from OLTP database design.

Data warehouse data must be organized to meet the purpose of the data warehouse, which is rapid access to information for analysis and reporting. Dimensional modeling is used in the design of data warehouse databases to organize the data for efficiency of queries that are intended to analyze and summarize large volumes of data. The data warehouse schema is almost always very different and much simpler than the schema of an OLTP system designed using entity-relation modeling.

Verification tables used in OLTP systems to validate data entry transactions are not necessary in the data warehouse database. This is because the data warehouse data has been cleansed and verified before it is posted to the data warehouse database, and historical data is not expected to change frequently once it is in the data warehouse.

Transaction locking considerations, and transactions themselves, play very small roles in data warehouse databases. OLTP systems specialize in large volumes of data update transactions. In contrast, data warehouses specialize in rapid retrieval of information from stable data, and data updates consist primarily of periodic additions of new data.

Backup and restore strategies also differ in a data warehouse from those necessary for an OLTP system. Much of the data in a data warehouse is unchanging history and does not need repetitive backup. Backup of new data can be accomplished at the time of update, and in some situations it is feasible to do these backups from the data preparation database to minimize performance impact on the data warehouse database. Restore policies for a data warehouse might also differ from those for an OLTP, depending on how critical it is for an organization to have uninterrupted access to data warehouse data.

There are some considerations to take into account when designing the data warehouse if you are planning to use Microsoft® SQL Server™ 2000 Analysis Services for OLAP and data mining. For more information, see Analysis Services Overview and OLAP and Data Warehouses.

Data Mart Design

There are two approaches to creating a data warehouse system for an organization. A central data warehouse can be developed and implemented first with data marts created later, or data marts can be implemented such that they make up the data warehouse when their information is joined. In either approach, design must be centralized so that all of the organization's data warehouse information is consistent and usable. Data marts that adhere to central design specifications produce reports that are consistent even though the data resides in different places. For example, a sales data mart must use the same product table arranged in the same way as the inventory data mart or summary information will be inconsistent between the two.

See Also

Using Dimensional Modeling

Fact Tables

Aggregation Tables

Dimension Tables

Indexes