Transforming OLTP Data to OLAP Data Warehouses
The transformation of OLTP data so that it gives acceptable performance in an OLAP system requires these processes:
Merge Data
You must be able to merge all the data related to specific items (products, customers, employees) from multiple OLTP systems into a single OLAP system. The merge process must resolve differences in encoding between the different OLTP systems. For example, one system may assign an ID to each employee, and the other systems have no employee IDs. The merge process must be able to match common employee data from both systems, perhaps by comparing employee names and addresses. The merge process must also be able to convert data stored using different data types in each OLTP system to a single data type used in the OLAP system. You must also select which columns in the OLTP system are not relevant to an OLAP system, and exclude these columns from the merge process.
The systems providing input data for an OLAP system are not strictly limited to traditional, centrally located OLTP systems. Valuable information may be stored in various legacy locations, even in some cases including relatively small sources such as Microsoft® Excel spreadsheets stored on a file share.
Scrub Data
Merging the OLTP data into a data warehouse gives you an opportunity to scrub data. You may find that various OLTP systems spell items differently, or the merge process may uncover previously unknown spelling errors. You may find other inconsistencies, such as having different addresses for the same store, employee, or customer. These inconsistencies have to be addressed before the data can be loaded into the data warehouse for use by the OLAP system.
Aggregate Data
OLTP data records all transaction details. OLAP queries typically need summary data, or data aggregated in some fashion. For example, a query to retrieve the monthly sales totals for each product over the last year runs much faster if the database only has summary rows showing the daily or hourly sales for each product, than if the query must scan every transaction detail record for the last year.
The degree to which you aggregate the data in a data warehouse depends on a number of design factors, such as the speed requirements of your OLAP queries and the level of granularity required for your analysis. For example, if you aggregate sales details into daily summaries instead of hourly summaries, your OLAP queries would run faster, but you could only do this if you had no need to analyze sales on an hourly basis.
Organize Data in Cubes
Relational OLTP data is organized in a way that makes some analysis processing difficult and time-consuming. When OLTP data is moved into a data warehouse, it must be transformed into an organization that better supports decision support analysis. The process of building a data warehouse involves reorganizing OLTP data stored in relational tables into OLAP data stored in multidimensional cubes.
Transformation Stages and Data Warehousing Components
The process of making data available through OLAP applications typically goes through three phases:
- Extract the data from OLTP or legacy data sources into a staging area.
- Transform the data into a form usable in an OLAP system. This involves actions such as data scrubbing and aggregation.
- Load the data into a data warehouse or data mart.
The process of extracting the data from the OLTP and legacy data sources and transforming it into the warehouse servers is called the ETL process, and is typically run on a periodic basis, such as once a week or once a month.
Once the data is loaded into a data warehouse, an important part of an OLAP system is to provide facilities for decision makers to access and analyze the data in the data warehouses and data marts.
The illustration shows the general categories of components that OLAP systems use to provide these services.
Data Sources
The OLTP databases and other legacy sources of data that contain the data that must be transformed into the OLAP data in data warehouses and data marts.
Intermediate Data Stores
The combined data storage areas and processes that stage, cleanse, and transform the OLTP data into useful OLAP data.
Warehouse Servers
Warehouse servers are the computers running the relational databases that contain the data for data warehouses and data marts, and the servers that manage the OLAP data.
Business Intelligence
The sets of tools and applications that query the OLAP data and provide reports and information to the enterprise decision makers.
Meta Data
Models the organization of data and applications in the different OLAP components. Meta data describes objects such as tables in OLTP databases, cubes in data warehouses and data marts, and also records which applications reference the various pieces of data.