Tasks That Transform Data

Data Transformation Services

In Data Transformation Services (DTS), tasks that transform data are based on an architectural component called the DTS data pump.

The following DTS tasks implement the DTS data pump.

  • Transform Data task. Allows point-to-point copying and transforming of data between a broad range of OLE DB-compliant data sources.

  • Data Driven Query task. Allows you to perform flexible, Transact-SQL-based operations on data, including stored procedures and INSERT, UPDATE, and DELETE statements.

  • ParallelDataPump task object. Allows copying and transforming of data containing OLE DB hierarchical rowsets. This task is accessible only programmatically, through the DTS object model.

The DTS data pump, an OLE DB service provider, is a COM object that provides a set of data movement interfaces.

The data pump architecture supports:

  • High-speed batch copying of transformed or non-transformed data.

  • Use of scripting code to define transformations.

  • A variety of supplied transformations for converting string data.

  • Custom transformations, written in Microsoft® Visual C++® and compiled as COM objects, that you can access programmatically or through the DTS Designer user interface.

The DTS data pump also allows users to add programs or Microsoft ActiveX® script functions that can access specific phases of a data pump operation. For example, you can add a function that instructs the data pump to write header information to a file before the source data is copied and transformed. For more information, see Multiphase Data Pump Functionality.
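In DTS itself, phase functions are written as ActiveX script or compiled code and attached through the multiphase data pump options. As a language-neutral sketch only, the following Python models a pump that exposes phase hooks, including a pre-source hook that writes header information before any rows are copied; the class and phase names here are hypothetical, not the DTS API.

```python
# Conceptual sketch (not the DTS API): a pump with optional phase hooks,
# loosely analogous to the multiphase data pump described above.
class DataPump:
    def __init__(self, pre_source_main=None, transform=None, pump_complete=None):
        # Phase names only loosely mirror DTS phases; all are hypothetical.
        self.pre_source_main = pre_source_main
        self.transform = transform or (lambda row: row)
        self.pump_complete = pump_complete

    def run(self, source_rows, destination):
        if self.pre_source_main:            # e.g., write header info first
            self.pre_source_main(destination)
        for row in source_rows:             # copy and transform each row
            destination.append(self.transform(row))
        if self.pump_complete:              # e.g., write summary info last
            self.pump_complete(destination)

out = []
pump = DataPump(
    pre_source_main=lambda dest: dest.append("-- header --"),
    transform=str.upper,
)
pump.run(["alpha", "beta"], out)
# out == ["-- header --", "ALPHA", "BETA"]
```

The hook that runs before the source rows are read corresponds to the example in the text: header information is written to the destination before the source data is copied and transformed.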

Configuring a Task that Transforms Data

To configure a task that transforms data, follow these steps:

  1. Establish a connection. You need to connect, at minimum, to the source and destination data sources.

    Before configuring these tasks in DTS Designer, you must create two live connections to data sources. If you define additional connections in an ActiveX Script transformation, those connections can be made at package run time.

    For more information, see DTS Connections and Configuring a Simple Lookup Query.

  2. Perform any required data type conversions and transformations on the data copied from the source connection.

    If you intend to transform the data, you need to map column-level transformations. You can use one of the following DTS tools:

    • The DTS Import/Export Wizard. By default, the wizard creates one or more transformation tasks (depending on the number of source tables selected for copying). To transform column or row data in the wizard, you must write a transformation script using ActiveX scripting code. To edit the transformation script, you must open the package in DTS Designer, open the properties dialog box for the task, and edit the script.

      For more information, see Using ActiveX Scripts in DTS.

    • DTS Designer. You can graphically map source columns to destination columns, select a transformation type, and apply it to a mapping. Alternatively, you can map transformations using selection boxes.

      Note  In most situations, you will map columns only with the Transform Data task. It is not recommended that you change the default column mappings for a Data Driven Query task unless you are an advanced user. For more information, see Building a Data Driven Query.
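The column-level mappings described in step 2 pair each destination column with a transformation of the source data. In DTS these mappings are defined in DTS Designer or in an ActiveX transformation script; purely as an illustrative analogy, the Python below represents one row's mappings as functions keyed by destination column name (the column names and data are invented for the example).

```python
# Hypothetical illustration of column-level transformation mappings,
# similar in spirit to mapping source columns to destination columns
# in DTS Designer. The real mappings live in the task, not in Python.
source_row = {"first": "jane", "last": "doe", "hired": "2000-01-15"}

# destination column -> transformation applied to the source row
mappings = {
    "FullName": lambda r: f"{r['first'].title()} {r['last'].title()}",
    "HireYear": lambda r: int(r["hired"][:4]),
}

destination_row = {col: fn(source_row) for col, fn in mappings.items()}
# destination_row == {"FullName": "Jane Doe", "HireYear": 2000}
```

Each mapping may simply copy a column or, as here, combine and convert source columns before the row reaches the destination.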

Detecting Row-Level Errors

Tasks that transform data use exception files to record information about failed rows. Exception files can contain:

  • Package information, such as package name, description, and version.

  • Step execution information, including the name of the package step associated with the data pump operation and step execution times.

  • Error information, including the source of the error (for example, the data pump or a connection) and a description of the error (for example, an insert error that occurred on the EmployeeData column in row 2007).

Tasks that transform data can detect row-level errors before a row is submitted to the database. For example, suppose an input row contains missing or incorrectly formatted data. When these tasks encounter such a row, they fail the row and do not pass it to the destination. Each such failure counts as one error toward the maximum error count.
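As a rough sketch of this behavior only (not the DTS implementation), the Python below validates each row before it reaches the destination, records failed rows, and stops once a maximum error count is reached; the validation rule and the limit are invented for the example.

```python
# Sketch of row-level error detection before rows reach the destination.
# MAX_ERRORS and validate() are hypothetical stand-ins for the task's
# maximum error count and its row-level checks.
MAX_ERRORS = 2

def validate(row):
    # A row "fails" if a required field is missing or malformed.
    return row.get("id") is not None and str(row["id"]).isdigit()

def pump_rows(rows):
    destination, failures = [], []
    for row in rows:
        if not validate(row):
            failures.append(row)      # recorded, never sent to destination
            if len(failures) >= MAX_ERRORS:
                raise RuntimeError("maximum error count reached")
            continue
        destination.append(row)
    return destination, failures

dest, failed = pump_rows([{"id": "1"}, {"id": None}, {"id": "3"}])
# dest == [{"id": "1"}, {"id": "3"}]; one failed row is recorded
```

Errors such as duplicate keys, which only surface at the destination, cannot be caught by a check like this; that is why the exception logs may be incomplete for destination-side failures, as noted below.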

Some errors, such as duplicate keys or referential integrity violations, cannot be detected at row level by these tasks. Such rows fail only after being passed to the destination. The failure is noted in the exception file, but the actual rows that failed are not logged. Thus, complete error information is not always available in the exception logs.

If you configure an exception log for a task that transforms data, step execution information is appended to the exception file you specify each time the package is run. If you specify an exception file that does not currently exist, the file will be created at package execution time. If the step associated with the task does not run, no exception file data is generated.

By using the Microsoft SQL Server™ 2000 exception file options, you also can create additional log files that capture the source and destination rows that fail while a task that transforms data executes. You can use these files to examine failed rows and troubleshoot problems with the data. The source row and destination row log files have the same name as the exception file, but with the extensions ".Source" and ".Destination", respectively, appended to the name. These files are created only if source row errors or destination row errors occur during execution of the transformation task.
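The naming and append behavior described above can be sketched as follows; this is an illustration in Python, not DTS code, and the file names, step name, and log format are invented for the example.

```python
# Sketch of the exception-file conventions described above: the source-row
# and destination-row logs share the exception file's name with ".Source"
# and ".Destination" appended, and step execution info is appended to the
# exception file on each run (the file is created if it does not exist).
import datetime
import os
import tempfile

def exception_log_paths(exception_file):
    return exception_file + ".Source", exception_file + ".Destination"

def log_step_execution(exception_file, step_name):
    with open(exception_file, "a") as f:    # append mode creates the file
        f.write(f"{step_name} ran at {datetime.datetime.now():%Y-%m-%d %H:%M}\n")

log_file = os.path.join(tempfile.mkdtemp(), "LoadEmployees.log")
src_log, dst_log = exception_log_paths(log_file)
log_step_execution(log_file, "Step 'DTSStep_1'")
log_step_execution(log_file, "Step 'DTSStep_1'")   # a second run appends
# log_file now holds two entries; src_log ends ".Source", dst_log ".Destination"
```

Because the log is opened in append mode, each package run adds to the existing file rather than overwriting it, matching the behavior described for the exception file.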

To configure the data pump exception files