Transform Data Task

Data Transformation Services

You use the Transform Data task to copy data between a source and destination and to optionally apply column-level transformations to the data. The Transform Data task is the most basic implementation of the data pump engine in Data Transformation Services (DTS).

The Transform Data task is optimized for insert-based copying and transforming of column-level data between commercial databases, spreadsheets, and text files. You can use the task to copy and transform data between any supported OLE DB connections. Because the task handles such a wide variety of data sources and transformation scenarios, you will frequently use one or more instances of it when creating packages that consolidate data from disparate sources.

Note  If you need to bulk insert text files into Microsoft® SQL Server™ and are concerned with performance, use the Bulk Insert task. However, you cannot transform data with the Bulk Insert task. For more information, see Bulk Insert Task.

Batching Data

If the destination connection for a Transform Data task is the Microsoft OLE DB Provider for SQL Server, you can use the fast load option, which is available through the OLE DB IRowsetFastLoad interface. The fast load option supports high-performance bulk-copy processing. When the fast load option is enabled, the data pump can accept batches of transformed data.

When you batch data, the data pump writes the transformed rows to a destination buffer but does not submit them to SQL Server until either the specified batch size or the end of the source data is reached.
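The buffering behavior described above can be sketched as follows. This is an illustrative Python simulation, not DTS code; the names `pump`, `transform`, and `submit_batch` are hypothetical, and DTS exposes no such API:

```python
# Sketch of the data pump's fast load batching: transformed rows are
# buffered and submitted only when the batch size is reached or the
# source is exhausted. All names here are illustrative, not DTS APIs.

def pump(source_rows, batch_size, transform, submit_batch):
    """Buffer transformed rows; submit a batch to the destination when
    batch_size rows accumulate or the source data ends."""
    buffer = []
    batches_submitted = 0
    for row in source_rows:
        buffer.append(transform(row))
        # A batch size of 0 means the entire source goes in one batch.
        if batch_size > 0 and len(buffer) == batch_size:
            submit_batch(buffer)
            batches_submitted += 1
            buffer = []
    if buffer:  # submit the final, possibly partial, batch
        submit_batch(buffer)
        batches_submitted += 1
    return batches_submitted
```

With 10 source rows and a batch size of 3, this submits four batches (3, 3, 3, and 1 rows); with a batch size of 0, it submits a single batch of all 10 rows, mirroring the default described below.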

You can customize batch processing in a Transform Data task by writing your own functions. For more information, see Multiphase Data Pump Functionality.

You cannot use fast load with a Data Driven Query task.

Important  When using the fast load option in a Transform Data task with a lookup connection, make sure the lookup connection is not the same as the source or destination connection.

Applying Transformations to Batches

Transformations are applied prior to the bulk copying process. Therefore, you can use the same column-level transformations with the fast load option that you can use without the option (for example, a Microsoft ActiveX® Script transformation).

Handling Errors and Batched Data

If a row contains errors and the Transform Data task does not detect them, the row causes the entire batch to fail on submission. Valid rows in such a batch are neither inserted nor noted as an error in the data pump exception files. The failure of an entire batch counts as only one failure toward the maximum error count.

You can control both the size of a batch and the way the data will be committed if a failure occurs. Before setting the batch size, consider the following:

  • Descriptions of errors that occur in a batch are recorded in a log file, but the error rows themselves may no longer be available by the time the batch rolls back. As a result, the error rows may not be logged.

  • By default, the batch size is set to 0, which means that all the rows copied from the source are placed in a single batch before being submitted to SQL Server.

    For example, if your source contains a million rows of data, and the batch size is set to the default, the data will not commit until the one-millionth row is processed. In cases such as these, you may want to commit the data in batches of one thousand, or ten thousand, rather than in a single batch of one million.

  • A batch size of one means that each row of data processed is treated as a batch.

  • Input rows containing errors detected by the Transform Data task are neither added to the current batch nor counted as rows in the batch. For example, a batch size of 10 might be filled from source rows 1 through 12, where rows 3 and 4 contain incorrectly formatted dates and are not copied to the destination. In this case, you cannot assume that the third batch of data contains source rows 21 through 30.
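The batch-numbering example above can be simulated in Python. This is an illustrative sketch, not DTS code; `build_batches` and `is_valid` are hypothetical names standing in for the task's validation and batching behavior:

```python
# Sketch: rows that fail validation are discarded before batching,
# so batch boundaries shift relative to source row numbers.

def build_batches(source_rows, batch_size, is_valid):
    """Group valid rows into batches; invalid rows are neither
    batched nor counted toward the batch size."""
    batches, current = [], []
    for row in source_rows:
        if not is_valid(row):
            continue  # error rows are discarded before batching
        current.append(row)
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches

# Source rows 1..30 with a batch size of 10; rows 3 and 4 have
# incorrectly formatted dates and are discarded.
batches = build_batches(range(1, 31), 10, lambda r: r not in (3, 4))
```

Here the first batch is filled from source rows 1 through 12 (skipping rows 3 and 4), the second from rows 13 through 22, and the third holds only the remaining 8 rows, which is why batch boundaries cannot be inferred from source row numbers.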

Before configuring the error options for batching data, consider the following:

  • When you enter a maximum error count value for a Transform Data task, you specify a combined limit on row-level errors detected by the Transform Data task and batch failures. When the maximum error count value is exceeded, task execution is terminated.

  • Because rows containing errors detected by the Transform Data task are discarded before batch submission, these errors neither trigger nor count as batch failures. Errors caught at the destination fail the batch and add one to the error count, regardless of how many rows are in the batch.

  • You can set an option that will trigger the submission of the current batch when the maximum error count value is exceeded. This option is useful when the only types of errors you anticipate are row-level errors that will be detected by the Transform Data task.
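The error accounting described above can be sketched as a Python simulation. This is illustrative only; `run_task`, `is_valid`, and `batch_fails` are hypothetical stand-ins for the task's validation and destination behavior, not DTS APIs:

```python
# Sketch: row-level errors detected by the task and batch failures
# both count toward the same maximum error count. A failed batch
# adds one error regardless of how many rows it contains.

def run_task(rows, batch_size, max_errors, is_valid, batch_fails):
    """Return (error_count, terminated). Invalid rows each add one
    error and are excluded from batches; each failed batch adds one."""
    errors, batch = 0, []
    for row in rows:
        if not is_valid(row):
            errors += 1  # row-level error detected by the task
        else:
            batch.append(row)
        if len(batch) == batch_size:
            if batch_fails(batch):
                errors += 1  # whole batch counts as a single failure
            batch = []
        if errors > max_errors:
            return errors, True  # task execution is terminated
    if batch and batch_fails(batch):
        errors += 1
    return errors, errors > max_errors
```

For example, with 10 rows, a maximum error count of 2, and three rows failing validation, the task terminates as soon as the third row-level error is detected; with all rows valid but every batch failing at the destination, each batch still adds only one error.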

Batches and Package Transactions

Batches succeed and fail independently of the package transaction. For more information, see Configuring Properties for DTS Transactions.

If the Transform Data task takes part in the package transaction, any successfully added batches are submitted only after the package transaction is committed.

Configuring a Transform Data Task

In DTS Designer, configure a Transform Data task by following these steps:

  1. Configure connections. You must first establish a source and destination connection.

    To configure the connections for a Transform Data task