Selecting Collations

SQL Server Architecture

SQL Server Architecture

Selecting Collations

In Microsoft® SQL Server™ 2000, you specify a single collation name that controls all three collation attributes: the Unicode sort order, the non-Unicode code page, and the non-Unicode sort order. None of the SQL Server 2000 collations allow different comparison and sorting rules for Unicode and non-Unicode character data. There are two groups of SQL Server 2000 collations: Windows collations and SQL collations.

Windows Collations

Windows collations are collations defined for SQL Server to support Microsoft Windows® locales. By specifying a Windows collation for SQL Server, the instance of SQL Server uses the same code pages and sorting and comparison rules as an application running on a computer for which you have specified the associated Windows locale. For example, the French Windows collation for SQL Server matches the collation attributes of the French locale for Windows.

There are more Windows locales than there are SQL Server Windows collations. The names of Windows locales are based on a language and territory, for example French (Canada). Several languages, however, share common alphabets and rules for sorting and comparing characters. For example, 33 Windows locales, including all of the Portuguese, and English Windows locales, use the Latin1 code page (1252) and follow a common set of rules for sorting and comparing characters. The SQL Server Windows collation based on the Latin1_General code page and sorting rules supports all 33 of these Windows locales. Also, Windows locales specify attributes not covered by SQL Server Windows collations, such as currency, date, and time formats. Because countries such as Great Britain and the United States have different currency, date, and time formats, they require different Windows collations. They do not require different SQL Server collations because they have the same alphabet and rules for sorting and comparing characters.

SQL Collations

SQL collations are a compatibility option to match the attributes of common combinations of code page number and sort orders that have been specified in earlier versions of SQL Server. For example, for mapping a SQL Server 2000 SQL collation to what is specified in earlier versions of SQL Server, the SQL Server 2000 SQL collation SQL_Latin1_General_CP1_CI_AS matches the SQL Server version 7.0 default specification of:

  • The ISO code page 1252.

  • The dictionary order, case-insensitive character sort order.

  • The General Unicode collation.

The SQL collations available in SQL Server 2000 do not match all combinations that can be specified in earlier versions of SQL Server. For example, no SQL Server 2000 SQL collation supports a case-sensitive sort order for non-Unicode data and case-insensitive sort order for Unicode data. The earlier SQL collations that cannot be exactly specified in SQL Server 2000 are called obsolescent SQL collations.

In SQL Server 2000, you should primarily use Windows collations. You should use SQL collations only to maintain compatibility with existing instances of earlier versions of SQL Server, or to maintain compatibility in applications developed using SQL collations in earlier versions of SQL Server.

Collation Comparison and Ordering Rules

Most of the comparison and ordering rules defined in a collation are governed by the dictionary definition of the correct sequence of characters for the alphabet or language. The attributes you can control are whether comparisons and sorts of character and Unicode data should be:

  • Based on the dictionary conventions that define the correct sequence of characters in the language or alphabet associated with the collation, or based on the sequence of the binary bit patterns representing the different characters.

  • Case-sensitive or case-insensitive. For example, defining whether 'a' is equal or not equal to 'A'. If you choose case-insensitive, comparisons always ignore case, so the uppercase version of a character evaluates to being equal to the lowercase version of the character. When you choose case-insensitivity, the relative sequence in which uppercase and lowercase are sorted is undefined unless you also specify uppercase preference. Uppercase preference affects only sort operations and specifies that uppercase versions of a character come earlier in the sort sequence than lowercase versions of the same character. Uppercase preference has no affect on comparisons, so 'A' still evaluates to being equal to 'a' when uppercase preference is on. Uppercase preference can be specified only in SQL collations, not in Windows collations.

  • Sensitive or insensitive to accented characters, also known as extended characters. Accented characters are those characters that have a diacritical mark, such as the German umlaut (ë) or the Spanish tilde (~). For example, accent sensitivity defines whether 'a' is equal or not equal to 'ä'.

When you choose a collation, you can specify if you want binary behavior, or dictionary sorting that is sensitive or insensitive to case and accents:

  • In binary collations, comparisons and sorting are based strictly on the bit pattern of the characters. This is the fastest option. Because uppercase characters are stored with different bit patterns than their corresponding lowercase characters, and accented characters have different bit patterns than characters without accents, binary sort orders are always case-sensitive and accent sensitive. Binary collations also ignore dictionary sequences that have been defined for specific languages. They simply order the characters based on the relative value of the bit patterns that represent each character. While the bit patterns defined for Latin characters, such as 'A' or 'z', are such that binary sorting yields the correct results, the bit patterns for some extended characters in some code pages may be different than the ordering sequence defined in dictionaries for the language associated with a collation. This can lead to occasional ordering and comparison results that are different than what a speaker of the language might expect.

  • If you do not specify a binary collation, SQL Server uses the dictionary ordering of the collation you have chosen. Dictionary order means characters are not sorted or compared based only on their bit patterns. The collation follows the conventions of the associated language regarding the proper sequence for characters. For example, case-insensitive sort orders must use dictionary rules to determine which lowercase and uppercase bit patterns are equal.

    Although the bit patterns in a code page generally yield the correct comparison and ordering results for any language that uses the code page, the conventions for some of the languages may require different results than are generated for the bit patterns of a small number of characters. For example, the Czech, Hungarian, and Polish collations use the same code page, 1250, which was designed for the Slavic languages. Each of these languages, however, use slightly different conventions for the sequence in which accented characters should be sorted.

    If you do not specify binary sorting, all SQL Server operations follow the dictionary conventions for sorting and comparing characters. When the dictionary order is used, you can specify whether you want the collation to be sensitive or insensitive to both case and accented characters.

Case-sensitivity applies to SQL identifiers and passwords as well as to data. If you specify a binary or case-sensitive default sort order for an instance of SQL Server or database, all references to objects must use the same case with which they were created. For example, consider this table:

CREATE TABLE MyTable (PrimaryKey int PRIMARY KEY, CharColumn nchar(10))

If the CREATE TABLE statement is executed on an instance of SQL Server or database that has a case-sensitive or binary sort order, all references to the table must use the same case that was specified in the CREATE TABLE statement:

-- Object not found error because case is not correct:
SELECT * FROM MYTABLE
-- Invalid column name error because case is not correct
-- for the WHERE clause reference to the PrimaryKey column.
SELECT *
FROM MyTable
WHERE PRIMARYKEY = 123
-- Correct statement:
SELECT CharColumn
FROM MyTable
WHERE PrimaryKey = 123

See Also

Collation Options for International Support

Specifying Collations

Unicode Data

Using Unicode Data