Enhancing Performance of DTS Packages

Data Transformation Services

Data Transformation Services

Enhancing Performance of DTS Packages

There are a number of factors that can affect the performance of Data Transformation Services (DTS) packages.

Using ActiveX Scripts

Depending on the type of transformation and your choice of scripting language, data pump operations that use Microsoft® ActiveX® transformation scripts can be up to two to four times slower than copy operations.

Using ordinal values to refer to columns in ActiveX transformations can be much faster then referring to columns by name. For example, use:

DTSSource(1) 

instead of:

DTSSource("CustomerID").

This speed improvement is not significant when the number of columns is less then 20, but can become significant when the transformation contains many columns. For fewer columns, ignore this optimization to make the script easier to read.

In addition, scripts written in Microsoft Visual Basic® Scripting Edition (VBScript) run approximately 10 percent faster than scripts written in Microsoft JScript®, which in turn run approximately 10 percent faster than scripts written in PerlScript.

Enhancing Data Pump Performance

The data pump is the transformation component of the Transform Data task and the Data Driven Query task. When you use these tasks to transform data, you may be able to enhance performance when:

  • There are large amounts of data.

  • The transformations are numerous.

  • The scripting code is complex.

To improve performance in these situations, use many-to-many mappings whenever possible. That way, you avoid mapping a separate transformation function for each column. The script engine is not invoked for each transformation. As a result, performance is faster. For more information about the column mappings used with transformations, see Mapping Column Transformations.

The DTS Import/Export Wizard creates packages with many-to-many column mappings. However, by default, the DTS Designer assigns one-to-one column mappings to transformations in order to improve readability. Therefore, when using DTS Designer, consider remapping as many transformations as possible to a many-to-many configuration. As with the use of ordinals in scripts, this optimization becomes more noticeable as the number of transformations increases. In general, if you have more than 20 transformations, you can see a noticeable performance decrease.

Data Driven Query Task vs. Transform Data Task

When Microsoft SQL Server™ is a data destination of the Transform Data task, transformations, by default, use the IRowsetFastLoad interface. When SQL Server is not the destination, the Transform Data task uses the IRowsetChange interface (typically, sending INSERT statements).

Transformations in the Data Driven Query task use the ICommand interface on the destination using prepared insert operations. This might be faster than the IRowsetChange interface that transformations in the Transform Data task use, depending on how your destination OLE DB provider implements the interfaces. However, transformations using the ICommand interface or the IRowsetChange interface will not show better performance than transformations using the IRowsetFastLoad interface.

Using Bulk Insert and bcp

The Bulk Insert task creates and executes the Transact-SQL BULK INSERT statement. BULK INSERT, supported by the Microsoft OLE DB Provider for SQL Server, is significantly faster than bcp or the data pump for performing text file import operations. Therefore, if transformations are not used, use the Bulk Insert task and achieve faster throughput.

The BULK INSERT statement is limited to file import operations.

When importing data, bcp and DTS copy operations are approximately the same speed. However, when exporting data, bcp operations can be approximately three to six times faster. Native bcp, which only applies to SQL Server data, is faster than the DTS data pump. BULK INSERT and native bcp are comparable in speed.

Using Connections

Use the ExecuteOnMainThread property only when necessary (for example, with drivers that are not thread-safe). Always avoid using ExecuteOnMainThread in ActiveX scripts to achieve concurrency unless the package or package step includes:

  • Precedence constraints, which may affect the order of operation of a DTS package.

  • Scripts that call COM objects written in Visual Basic.

  • Custom tasks that are not free threaded.

For safety, only one task can use a connection at a time. To achieve parallel execution, you must set up different connections for each task. For example, a source (A) might connect to two destinations (C) and (D), but the operation occurs serially. The same source (A) could also be configured as (B), and parallelism could be achieved by (A) connecting to (C) and (B) connecting to (D).

However, if two tasks join the package transaction and then access the same instance of SQL Server in parallel, the package will fail. For more information, see Incorporating Transactions in a DTS Package.

By default, the maximum number of concurrent steps is four. Use the MaxConcurrentSteps property to modify this setting.

Using Other SQL Server Solutions

A Transact-SQL query is the fastest method to move data without transformations or validations between tables. For multiple sources, consider running a distributed query such as a SELECT INTO statement.

In an environment using multiple packages, each DTS package must be run as a separate process using dtsrun in either a batch operation or using SQL Server Agent. This makes DTS a client process. If you must run a large number of packages (for example, 1,000 or more) as a server process, consider using SQL Server 2000 replication, which provides snapshot, transactional, and merge capabilities. Also, consider using SQL Server 2000 replication if your primary task is copying just the SQL Server data that has changed.

Improving Query Performance on Large DTS Packages Stored in a Repository

When you query a large DTS package that is stored in a SQL Server 2000 Meta Data Services repository, you can achieve better results if you increase the query time-out value. This ensures that your query has time to complete the roundtrip from the repository database, even if the DTS package is very large. By default, the query time-out value is 10 seconds. Adjust the query time-out value by creating the following registry key: HKEY_LOCALMACHINE\SOFTWARE\Microsoft\Repository\Engine\ODBCQueryTimeout. For this key, set a value that is larger than the default (for example, 60 seconds or greater). The unit measurement is in seconds.