SQL Server Collation Fundamentals

SQL Server Architecture

SQL Server Architecture

SQL Server Collation Fundamentals

Microsoft® SQL Server™ 2000 supports several collations. A collation encodes the rules governing the proper use of characters for either a language, such as Macedonian or Polish, or an alphabet, such as Latin1_General (the Latin alphabet used by western European languages).

Each SQL Server collation specifies three properties:

  • The sort order to use for Unicode data types (nchar, nvarchar, and ntext). A sort order defines the sequence in which characters are sorted, and the way characters are evaluated in comparison operations.

  • The sort order to use for non-Unicode character data types (char, varchar, and text).

  • The code page used to store non-Unicode character data.

    Note  You cannot specify the equivalent of a code page for the Unicode data types (nchar, nvarchar, and ntext). The double-byte bit patterns used for Unicode characters are defined by the Unicode standard and cannot be changed.

SQL Server 2000 collations can be specified at any level. When you install an instance of SQL Server 2000, you specify the default collation for that instance. Each time you create a database, you can specify the default collation used for the database. If you do not specify a collation, the default collation for the database is the default collation for the instance. Whenever you define a character column, variable, or parameter, you can specify the collation of the object. If you do not specify a collation, the object is created with the default collation of the database.

If all of the users of your instance of SQL Server speak the same language, you should pick the collation that supports that language. For example, if all of the users speak French, choose the French collation.

If the users of your instance of SQL Server speak multiple languages, you should pick a collation that best supports the requirements of the various languages. For example, if the users generally speak western European languages, choose the Latin1_General collation. When you support users who speak multiple languages, it is most important to use the Unicode data types, nchar, nvarchar, and ntext, for all character data. Unicode was designed to eliminate the code page conversion difficulties of the non-Unicode char, varchar, and text data types. Collation still makes a difference when you implement all columns using Unicode data types because it defines the sort order for comparisons and sorts of Unicode characters. Even when you store your character data using Unicode data types you should pick a collation that supports most of the users in case a column or variable is implemented using the non-Unicode data types.

A SQL Server collation defines how the database engine stores and operates on character and Unicode data. After data has been moved into an application, however, character sorts and comparisons done in the application are controlled by the Windows locale selected on the computer. The collation used for character data by applications is one of the items controlled by the Windows locale (a locale also defines other items, such as number, time, date, and currency formats). For Microsoft Windows NT® 4.0, Microsoft Windows® 98, and Microsoft Windows 95, the Windows locale is specified using the Regional Settings application in Control Panel. For Microsoft Windows 2000, the locale is specified using the Regional Options application in Control Panel. For more information about Windows locales, see Developing International Software for Windows 95 and Windows NT 4.0 in the MSDN® page at Microsoft Web site.

Multiple collations can use the same code page for non-Unicode data. For example, the 1251 code page defines a set of Cyrillic characters. This code page is used by several collations, such as Cyrillic_General, Ukrainian, and Macedonian. Although all of these collations use the same set of bits to represent non-Unicode character data, the sorting and comparison rules they apply are slightly different to handle the dictionary definitions of the correct sequence of characters in the language or alphabet associated with the collation.

Because SQL Server 2000 collations control both the Unicode and non-Unicode sort orders, you do not encounter problems caused by specifying different sorting rules for Unicode and non-Unicode data. In earlier versions of SQL Server, the code page number, the character sort order, and the Unicode collation are specified separately. Earlier versions of SQL Server also support varying numbers of sort orders for each code pages, and for some code pages support sort orders not available in Windows locales. In SQL Server 7.0, it is also possible to specify a Unicode sort order that is different from the sort order chosen for non-Unicode data. This can cause ordering and comparison operations to return different results when working with Unicode data as opposed to non-Unicode data.

See Also

COLLATE

Collation Options for International Support

Collations

Unicode Data

Using Unicode Data