Multiphase Data Pump Functionality

Data Transformation Services

Data Transformation Services

Multiphase Data Pump Functionality

Data Transformation Services (DTS) provides advanced users the capability to add programs that customize the data pump at various phases of its operation. By customizing the operation of the data pump, you can add a wide range of functionality to a package. For example:

  • Row-level restartability, or the ability to restart the data pump without having to reload large numbers of rows that were already processed. You can add functions to save processed row data, batches, or partial batches, writing that data back to the source or a status table for later use.

  • Individual handling of types of insert or transformation errors. For example, you could add special error handlers to customize problems handling NULL data or constraint violations.

  • Customizing data pump initialization or termination steps. For example, on data pump initialization you could write out a schema header to a file prior to writing XML (Extensible Markup Language) data to the file.
Data Pump Process

The multiphase data pump option allows you to access the data pump at several points during its operation and add functionality. When copying a row of data from source to a destination, the data pump follows this basic process:

  • Fetches a row of source data.

  • Optionally applies transformations to the row.

  • Attempts to insert the row of data to the destination buffer.

  • Processes exceptions.

  • Stores the results in a batch, if a batch is specified.

  • Repeats the previous steps until the batch is filled, then commits the data stored in the buffer and starts the next batch, or rolls back the batch.

After the data pump processes the last row of data, the task is finished and the data pump operation terminates.

Data Pump Phases

The following figure shows the data pump phases and how they map to the data flow.

In the figure, the data flow (detailed in the expanded, gray area of the figure):

  • Originates in the Row Transform phase, where a row of data is copied from the source and any transformations are applied.

  • If the transformation is successful, the data for the row moves to the next phase. If the transformation is unsuccessful, a Transform Failure occurs, and the next row of data is fetched.

  • Moves to the Post Row phase, where an attempt is made to copy the row to the destination buffer. There are two possible outcomes for this operation: Insert Success or Insert Failure.

  • Ends in the Batch Complete phase, where the row data is stored in a batch and eventually inserted or not committed, depending on whether a batch was configured for the task, and the size of the batch.

Three additional data pump phases not directly tied to the processing of row data are shown in the figure: Pre Source, Post Source, and Pump Complete. Each of these phases covers events prior to or after the row-by-row copying of data, transformation, and commit (or rollback) process. For example, the Pre Source phase occurs before the first row of data is fetched. The Post Source phase occurs after the last row of data is processed, and the Pump Complete phase occurs at the end of the transformation task.

Accessing Data Pump Phases

To display the multiphase data pump options in DTS Designer, you must select an option in SQL Server Enterprise Manager. After you have selected the multiphase data pump option, the feature will remain accessible to any packages opened in DTS Designer, for any future sessions. By default, this option is not selected.

To activate the multiphase data pump feature