Optimizing Bulk Copy Performance

Optimizing SQL Database Performance

previous page next page

Optimizing Database Performance

Optimizing Bulk Copy Performance

To bulk copy data as fast as possible, several options are available to specify how data should be bulk copied into Microsoft® SQL Server™ 2000 using the bcp utility or BULK INSERT statement, including:

Using logged and nonlogged bulk copies.
Using the bcp utility for parallel data loading.
Controlling the locking behavior.
Using batches.
Ordering data files.

For more information, see Bulk Copy Performance Considerations.

Note If possible, use the BULK INSERT statement rather than the bcp utility to bulk copy data into SQL Server. The BULK INSERT statement is faster than the bcp utility.

Two factors determine which of these options can or should be used to increase the performance of bulk-copy operations:

Amount of existing data in the table compared to the amount of data to be copied into the table.
Number and type of indexes on the table.

Additionally, these factors depend on whether data is bulk copied into a table from a single client or in parallel from multiple clients.

Loading Data into an Empty Table from a Single Client

When you are loading data into an empty table from a single client, it is recommended that you specify:

The TABLOCK hint. This causes a table-level lock to be taken for the duration of the bulk-copy operation.
A large, batch size, using the ROWS_PER_BATCH hint. A single batch representing the size of the entire file is recommended.
A nonlogged bulk-copy operation. Transaction log backups should not be created after performing a nonlogged operation. For more information, see Logged and Nonlogged Bulk Copy Operations.

Additionally, if your table has a clustered index and the data in the data file is ordered to match the clustered index key columns, bulk copy the data into the table with the clustered index already in place and specify the ORDER hint. This is significantly faster than creating the clustered index after the data is copied into the table.

If nonclustered indexes are also present on the table, drop these before copying data into the table. It is generally faster to bulk copy data into a table without nonclustered indexes, and then to re-create the nonclustered indexes, rather than bulk copy data into a table with the nonclustered indexes in place.

Loading Data into a Nonempty Table from a Single Client

When you are copying data into a table that has existing data, the recommendation to perform the bulk copy operation with the indexes in place depends on the amount of data to be copied into the table compared to the amount of existing data already in the table. As the percentage of data to be copied into the table increases (based on the amount of existing data in the table), the faster it is to drop all indexes on the table, perform the bulk copy operation, and then re-create the indexes after the data is loaded.

As a general guide, the following table shows suggested figures for the amount of data to be added to a table for various types of indexes. If you exceed these percentages, you may find it faster to drop and re-create the indexes.

Indexes	Amount of data added
Clustered index only	30%
Clustered and one nonclustered index	25%
Clustered and two nonclustered indexes	25%
Single nonclustered index only	100%
Two nonclustered indexes	60%

Loading Data in Parallel from Multiple Clients

If SQL Server is running on a computer with more than one processor and the data to be bulk copied into the table can be partitioned into separate data files, then it is recommended that data be loaded into the same table from multiple clients in parallel, thereby improving the performance of the bulk-copy operation. For example, when bulk copy loading from eight clients into one table, each client must have one input data file containing a portion of the partitioned data. To achieve maximum performance, the batch size specified for each client should be the same as the size of the client data file.

When copying data into a table from multiple clients, consider that:

All indexes on the table must be dropped first, and then re-created on the table. Consider re-creating the secondary indexes in parallel by creating each secondary index from a separate client at the same time.
Using ordered data and the ORDER hint will not affect performance because the clustered index is not present during the load.
The data must be partitioned into multiple input files, one file per client.

As with bulk-copy operations from a single client, specify:

The TABLOCK hint. This causes a table-level lock to be taken for the duration of the bulk-copy operation.
A large, batch size, using the ROWS_PER_BATCH hint. A single batch representing the size of the entire client file is recommended for each client.
Set the database option select into/bulkcopy to true to enable nonlogged operations.

Copying Data Between Computers Running SQL Server

If data is being copied from one computer running an instance of SQL Server to another, perform all bulk-copy operations using either native or Unicode native format. For more information, see Using Native, Character, and Unicode Formats.

If the source table has a clustered index or if you intend to bulk copy the data into a table with a clustered index:

Bulk copy the data out of the source table specifying a SELECT statement and an appropriate ORDER BY clause to create an ordered data file.
Use the ORDER hint when bulk copying the data into SQL Server. For more information, see Ordered Data Files.

Optimizing Bulk Copy Performance

Optimizing SQL Database Performance

Optimizing Bulk Copy Performance

Loading Data into an Empty Table from a Single Client

Loading Data into a Nonempty Table from a Single Client

Loading Data in Parallel from Multiple Clients

Copying Data Between Computers Running SQL Server

See Also

Get in touch