You will need to create tables and other database objects to support the data extraction, cleansing, and transformation operations required to prepare the data for loading into the data warehouse. You can create a separate database for the data preparation area, or you can create these items in the data warehouse database.
The data preparation area should include tables to contain the incoming data, tables to aid in implementing surrogate keys, and tables to hold transformed data. Other tables may be required for reconciling data from diverse data sources; such tables may contain cross-reference information to identify common entities such as customer records from systems that use different keys. A variety of temporary tables may also be needed for intermediate transformations.
The specific design of the data preparation area will depend on the diversity of data sources, the degree of transformation necessary to organize the data for data warehouse loading, and the consistency of the incoming data.
Data that is ready to load into the data warehouse should be in tables that have schemas identical to the target tables in the data warehouse. If not, the data should be ready to load into the data warehouse tables through a transformation that can be accomplished in a single step as it is loaded.
The data preparation area should also contain the processes that are used to extract the data from the data sources, the processes that transform and cleanse the data, and the processes that load the data to the data warehouse. These processes may be in the form of SQL queries, stored procedures, Data Transformation Services (DTS) packages, or documents of manual instructions. As in the development of any database system, the objective is to automate as much of the process as possible and to manage and maintain the automated tools developed. Storing and maintaining the transformation processes in the data preparation area permits the use of standard database backup and restore mechanisms to preserve them.
Regardless of whether a separate database is used, creating the data preparation area involves creating tables, views, indexes, DTS packages, and other elements common to relational databases.