Mixed Collation Environments

SQL Server Setup Help

SQL Server Setup Help

Mixed Collation Environments

Compatibility issues can have an impact on organizations that use multiple collations to store their data. Most organizations use the same collation for all of their Microsoft® SQL Server™ 2000 databases, thereby eliminating all collation compatibility issues. Other organizations, however, must store data viewed by users who speak various languages and want to do so with a minimum of collation compatibility issues.

All character and Unicode objects (such as columns, variables, and constants) have a collation. Whenever you work with objects that have different collations and code pages, you must code your queries to comply with the rules of collation coercion. When you code a complex expression that uses operators to combine multiple simple expressions that have different collations, all of the collations must be implicitly convertible, or explicitly converted using the COLLATE clause. For more information about collation coercion, see Collation Precedence.

If you do not specify a collation in a character or Unicode expression, the default collation may vary depending on the current database setting for the connection. For example, if you do not specify a COLLATE clause on a character or Unicode constant, the constant is assigned the default collation of the current database. This means that the result of a Transact-SQL statement may have different collations when executed in the context of different databases.

If you are setting up replication, all of the databases involved in a replication network, including Publishers, Subscribers, and Distributors, must have the same code page.

The bulk copy functions, BULK INSERT, and the bcp command prompt utility support column collations. For more information, see Copying Data Between Different Collations.

Minimizing Collation Issues

If you must store character data that reflects multiple languages, you can minimize collation compatibility issues by always using the Unicode nchar, nvarchar, and ntext data types instead of the char, varchar, text data types. Using the Unicode data types eliminates code page conversion issues.

Another recommendation that minimizes collation compatibility issues is to standardize your site as either sensitive or insensitive to case and accented characters. If you always choose collations with the same case and accent sensitivity, end users experience consistent behavior across all systems. Most SQL Server 2000 sites choose to be case-insensitive and accent-sensitive. Case sensitivity also applies to the names of SQL Server objects; therefore, if you specify case-sensitive collations, all users must specify the correct case when querying the database. For example, if you have a case-sensitive server and create a table named Employees, all queries must refer to the table as Employees. References that do not use the correct case, such as EMPLOYEES or employees, are invalid.

Collations and tempdb

The tempdb database is built each time SQL Server is started, and has the same default collation as the model database, which is typically the same as the default collation of the instance. If you create a user database and specify a different default collation than model, the user database has a different default collation than tempdb. All temporary stored procedures or temporary tables are created and stored in tempdb, which means that all implicit columns in temporary tables and all coercible-default constants, variables, and parameters in temporary stored procedures have different collations than comparable objects created in permanent tables and stored procedures.

This can lead to problems with the text data type, which does not support code page conversions. For example, an instance of SQL Server 2000 defaults to the Latin1_General_CS_AS collation, and you execute these statements:

CREATE DATABASE TestDB COLLATE Estonian_CS_AS
USE TestDB
CREATE TABLE TestPermTab (PrimaryKey int PRIMARY KEY, TextCol text )

In this system, the tempdb database uses the Latin1_General_CS_AS collation with code page 1252, and TestDB and TestPermTab.TextCol use the Estonian_CS_AS collation with code page 1257. If you then execute:

USE TestDB
GO
-- Create a temporary table with the same column declarations
-- as TestPermTab
CREATE TABLE #TestTempTab (PrimaryKey int PRIMARY KEY, TextCol text )
-- This statement gets an code page conversion not allowed error
-- because the temporary table is created in tempdb, which has a 
-- different default collation than TestDB.
INSERT INTO #TestTempTab
         SELECT * FROM TestPermTab
GO

To eliminate the error you can use one of these alternatives:

  • Use the Unicode data type ntext instead of text for the two TextCol columns.

  • Specify that the temporary table column use the default collation of the user database, not tempdb. This allows the temporary table to work with similarly formatted tables in multiple databases, if that is a requirement of your system.
    CREATE TABLE #TestTempTab
       (PrimaryKey int PRIMARY KEY,
        TextCol text COLLATE database_default
       )
    
  • Specify the correct collation for the #TestTempTab column:
    CREATE TABLE #TestTempTab
       (PrimaryKey int PRIMARY KEY,
        TextCol text COLLATE Estonian_CS_AS
       )
    
Collations in BACKUP and RESTORE

If you restore a database, RESTORE uses the collation of the source database that was recorded in the backup file. The restored database has the same collation as the original database that was backed up. Individual objects within the database that have different collations also retain their original collation. The database can be restored even if the instance on which you run restore has a different default collation than the instance on which BACKUP was run.

If there is already a database with the same name on the target server, the only way to restore from the backup is to specify REPLACE on the RESTORE statement. If you specify REPLACE, the existing database is completely replaced with the contents of the database on the backup file, and the restored version of the database will have the same collation recorded in the backup file.

If you are restoring log backups, the destination database must have the same collation as the source database.

Collations and text column

If you create a table with a text column that has a different code page than the code page of the database's default collation, there are only two ways you can specify data values to be inserted into the column, or update existing values. You can:

  • Specify a Unicode constant.

  • Select a value from another column with the same code page.

Assume the following database and table:

-- Create a database with a default of code page 1252.
CREATE DATABASE TestDB COLLATE Latin1_General_CS_AS
-- Create a table with a different code page, 1253.
CREATE TABLE TestTab
   (PrimaryKey int PRIMARY KEY,
    TextCol text COLLATE Greek_CS_AS
   )

-- This INSERT statement successfully inserts a Unicode string.
INSERT INTO TestTab VALUES (1, N'abc')
-- This INSERT statement successfully inserts data by selecting
-- from a similarly formatted table in another database that uses
-- uses the Greek 1253 code page as its default.
INSERT INTO TestTab
     SELECT * FROM GreekDatabase.dbo.TestTab