Copying Data Between Different Collations

Administering SQL Server

Administering SQL Server

Copying Data Between Different Collations

When bulk copying data using native or character format, bcp, by default, converts character data to:

  • OEM code page characters when exporting data from an instance of Microsoft® SQL Server™.

  • ANSI/Microsoft Windows® code page characters when importing data into an instance of SQL Server.

This can cause the loss of extended or DBCS characters during the conversion between OEM and ANSI code pages. To prevent the loss of extended or DBCS characters, bcp can create data files using:

  • Unicode native data format (-N).

  • Unicode character data format (-w).

  • A specific code page (-C).

Unicode native format and Unicode character format convert character data to Unicode during the bulk copy, resulting in no loss of extended characters.

Using the -C (code page) switch, the bcp utility can create or read data files using the code page specified by the user. For example, to bulk copy the authors2 table in the pubs database to the Authors.txt data file using code page 850, execute from the command prompt:

bcp pubs..authors2 out authors.txt -c -C850 -Sservername -Usa -Ppassword

Alternatively, using the CODEPAGE clause, the BULK INSERT statement can read data files using the code page specified by the user. For example, to bulk copy the Authors.txt data file into the authors2 table in the pubs database using code page 850, execute from a query tool such as SQL Query Analyzer:

BULK INSERT pubs..authors2 FROM 'c:\authors.txt'
WITH (
   CODEPAGE = 850
)

The following are valid values for the code page.

Code page value Description
ACP Columns of char, varchar, or text data type are converted from the ANSI/Windows code page (ISO 1252) to the SQL Server code page when importing data to an instance of SQL Server, and vice versa when exporting data from an instance of SQL Server.
OEM (default) Columns of char, varchar, or text data type are converted from the system OEM code page to the SQL Server code page when importing data to an instance of SQL Server, and vice versa when exporting data from an instance of SQL Server.
RAW This is the fastest option because no conversion from one code page to another occurs.
<value> Specific code page number (for example, 850).

Column-level Collations

In SQL Server 2000, you can specify column-level collations for bulk copy operations. These collations define how character and Unicode data is stored in the specified columns of the data file.

Users and applications specify only the collation in which the data is stored in the data file. The bulk copy components perform internally any required translations between the data file collation and the collations of the source or destination columns in the database.

On a bulk copy out operation, the column and default collation specifications define the code pages used to build all SQLCHAR data in the resulting bulk copy data file. On a bulk copy in operation, the column and default collation specifications define the code pages used to read SQLCHAR data from the source data file.

If the SORTED hint is specified on a bulk copy in operation, the collations defined for any character and Unicode columns referenced in the SORTED hint define the expected sequence of the data.

On a bulk copy in operation, you must ensure that the collation specifications you make match the collations present in the bulk copy data file.

Format files in SQL Server 2000 support an eighth column in which you can provide a collation specification that defines how the data for that column is stored in the data file:

  • "RAW" specifies the data is stored in the collation specified in the –C switch, BCPFILECP hint, or CODEPAGE option. If none of these is specified, the collation of the data file is that of the OEM code page of the bulk copy client computer.

  • "name" specifies the name of the collation used to store the data in the data file.

  • "" has the same meaning as RAW.

This is an example of a format file with column collations specified:

8.0
5
1  SQLCHAR  0   4  "/t" pub_id   1   "SQL_LATIN1_General_Cp1_CI_AS_KI_WI"
2  SQLCHAR  0  40  "/t" pub_name 2   "SQL_LATIN1_General_Cp850_BIN"
3  SQLCHAR  0  20  "/t" city     3   "RAW"
4  SQLCHAR  0   2  "/t" state    4   "RAW"
5  SQLCHAR  0  30  "/t" country  5   ""

Column collation specifications are ignored for columns that do not have SQLCHAR or SQLNCHAR specified as their host data type. Collations for SQLNCHAR columns are ignored on bulk copy out operations; they apply only to bulk copy in operations where the SQLNCHAR column is referenced in a SORTED hint. Collations apply to SQLCHAR columns on both in and out operations.

On a bulk copy out operation, the collation specification controls only the code page used to store character data in the bulk copy data file. It applies to:

  • All columns in a character mode data file.

  • Any column in a native mode file where SQLCHAR is specified as the host file data type.

  • SQLCHAR characters whose values are greater than 127 or less than 32. Collations are applied to characters whose values are between 32 and 127, but all code pages map the same characters to the values from 32 to 127, so applying different collations has no noticeable effect.

The rules for determining which collation is used on a bulk copy out are:

  • If a column collation is specified in either a format file or by using bcp_setcolfmt, the character data is stored using the ANSI code page associated with the collation. This overrides all other methods of specifying a collation.

  • If a column collation was not specified, but either the bcp –C switch or the bcp_control BCPFILECP hint was specified, all SQLCHAR data from columns having no column collation specification is stored using the code page specified in BCPFILECP or –C. Column collations are not specified for any columns when producing a character mode data file with no format file. This rules also applies when "" or "RAW" is specified for a column collation.

  • If no collations are specified at all (no column collation specifications, no –C switch, and no BCPFILECP hint), SQLCHAR data is stored using the OEM code page of the bulk copy client computer.

On a bulk copy in operation, the collation specification controls:

  • How bulk copy attempts to interpret the code page of SQLCHAR columns in the data file.

  • How bulk copy applies the ORDER hint.

For a bulk copy in operation, code page interpretation applies only to columns stored as SQLCHAR in a data file. All columns in a character mode data file are stored as SQLCHAR in a data file. It also applies to any column for which SQLCHAR is specified in a format file or using bcp_setcolfmt:

  • If a column collation is specified in a format file or using bcp_setcolfmt, the SQLCHAR data in a data file is interpreted using the ANSI code page associated with the specified column collation.

  • If a column collation is not specified, but a default code page is specified using the BULK INSERT CODEPAGE option, the bcp –C switch, or the bcp_control BCPFILECP hint, the SQLCHAR data is interpreted using the code page specified in either CODEPAGE, –C, or BCPFILECP.

  • If the user did not specify any collations (no column collation, no BULK INSERT CODEPAGE option, no bcp –C switch, no BCPFILECP hint), then data in SQLCHAR columns is interpreted using the OEM code page of the client computer.

A bulk copy in operation also uses collations to properly interpret the ORDER bulk copy hint. This applies to both SQLCHAR and SQLNCHAR columns. The data in the columns referenced by a SORTED hint must be in the sequence defined by the collation mapped to those columns.

See Also

bcp Utility

BULK INSERT.

SetCodePage Method

Unicode Character Format

Unicode Native Format