10.3. Specifying Character Sets and Collations

MySQL 5.0

10.3. Specifying Character Sets and Collations

There are default settings for character sets and collations at four levels: server, database, table, and column. The following description may appear complex, but it has been found in practice that multiple-level defaulting leads to natural and obvious results.

is used in clauses that specify a character set. may be used as a synonym for .

10.3.1. Server Character Set and Collation

MySQL Server has a server character set and a server collation. These can be set at server startup and changed at runtime.

Initially, the server character set and collation depend on the options that you use when you start mysqld. You can use for the character set. Along with it, you can add for the collation. If you don't specify a character set, that is the same as saying . If you specify only a character set (for example, ) but not a collation, that is the same as saying because is the default collation for . Therefore, the following three commands all have the same effect:

shell> 
shell> 
shell> 
           

One way to change the settings is by recompiling. If you want to change the default server character set and collation when building from sources, use: and as arguments for configure. For example:

shell> 

Or:

shell> 
           

Both mysqld and configure verify that the character set/collation combination is valid. If not, each program displays an error message and terminates.

The current server character set and collation can be determined from the values of the and system variables. These variables can be changed at runtime.

10.3.2. Database Character Set and Collation

Every database has a database character set and a database collation. The and statements have optional clauses for specifying the database character set and collation:

CREATE DATABASE 
    [[DEFAULT] CHARACTER SET ]
    [[DEFAULT] COLLATE ]

ALTER DATABASE 
    [[DEFAULT] CHARACTER SET ]
    [[DEFAULT] COLLATE ]

The keyword can be used instead of .

All database options are stored in a text file named that can be found in the database directory.

The and clauses make it possible to create databases with different character sets and collations on the same MySQL server.

Example:

CREATE DATABASE  CHARACTER SET latin1 COLLATE latin1_swedish_ci;

MySQL chooses the database character set and database collation in the following manner:

  • If both and were specified, then character set and collation .

  • If was specified without , then character set and its default collation.

  • If was specified without , then the character set associated with and collation .

  • Otherwise, the server character set and server collation.

The database character set and collation are used as default values if the table character set and collation are not specified in statements. They have no other purpose.

The character set and collation for the default database can be determined from the values of the and system variables. The server sets these variables whenever the default database changes. If there is no default database, the variables have the same value as the corresponding server-level system variables, and .

10.3.3. Table Character Set and Collation

Every table has a table character set and a table collation. The and statements have optional clauses for specifying the table character set and collation:

CREATE TABLE  ()
    [[DEFAULT] CHARACTER SET ] [COLLATE ]]

ALTER TABLE 
    [[DEFAULT] CHARACTER SET ] [COLLATE ]

Example:

CREATE TABLE t1 ( ... ) CHARACTER SET latin1 COLLATE latin1_danish_ci;

MySQL chooses the table character set and collation in the following manner:

  • If both and were specified, then character set and collation .

  • If was specified without , then character set and its default collation.

  • If was specified without , then the character set associated with and collation .

  • Otherwise, the database character set and collation.

The table character set and collation are used as default values if the column character set and collation are not specified in individual column definitions. The table character set and collation are MySQL extensions; there are no such things in standard SQL.

10.3.4. Column Character Set and Collation

Every “character” column (that is, a column of type , , or ) has a column character set and a column collation. Column definition syntax has optional clauses for specifying the column character set and collation:

 {CHAR | VARCHAR | TEXT} ()
    [CHARACTER SET ] [COLLATE ]

Example:

CREATE TABLE Table1
(
    column1 VARCHAR(5) CHARACTER SET latin1 COLLATE latin1_german1_ci
);

MySQL chooses the column character set and collation in the following manner:

  • If both and were specified, then character set and collation are used.

  • If was specified without , then character set and its default collation are used.

  • If was specified without , then the character set associated with and collation .

  • Otherwise, the table character set and collation are used.

The and clauses are standard SQL.

10.3.5. Character String Literal Character Set and Collation

Every character string literal has a character set and a collation.

A character string literal may have an optional character set introducer and clause:

[_]'' [COLLATE ]

Examples:

SELECT '';
SELECT _latin1'';
SELECT _latin1'' COLLATE latin1_danish_ci;

For the simple statement ', the string has the character set and collation defined by the and system variables.

The expression is formally called an introducer. It tells the parser, “the string that is about to follow uses character set .” Because this has confused people in the past, we emphasize that an introducer does not cause any conversion; it is strictly a signal that does not change the string's value. An introducer is also legal before standard hex literal and numeric hex literal notation (' and )>.

Examples:

SELECT _latin1 x'AABBCC';
SELECT _latin1 0xAABBCC;

MySQL determines a literal's character set and collation in the following manner:

  • If both and were specified, then character set and collation are used.

  • If is specified but is not specified, then character set and its default collation are used.

  • Otherwise, the character set and collation given by the and system variables are used.

Examples:

  • A string with character set and collation:

    SELECT _latin1'Müller' COLLATE latin1_german1_ci;
    
  • A string with character set and its default collation (that is, ):

    SELECT _latin1'Müller';
    
  • A string with the connection default character set and collation:

    SELECT 'Müller';
    

Character set introducers and the clause are implemented according to standard SQL specifications.

10.3.6. National Character Set

Standard SQL defines or as a way to indicate that a column should use some predefined character set. MySQL 5.0 uses as this predefined character set. For example, these data type declarations are equivalent:

CHAR(10) CHARACTER SET utf8
NATIONAL CHARACTER(10)
NCHAR(10)

As are these:

VARCHAR(10) CHARACTER SET utf8
NATIONAL VARCHAR(10)
NCHAR VARCHAR(10)
NATIONAL CHARACTER VARYING(10)
NATIONAL CHAR VARYING(10)

You can use ' to create a string in the national character set. These two statements are equivalent:

SELECT N'some text';
SELECT _utf8'some text';

For information on upgrading character sets to MySQL 5.0 from versions prior to 4.1, see the MySQL 3.23, 4.0, 4.1 Reference Manual.

10.3.7. Examples of Character Set and Collation Assignment

The following examples show how MySQL determines default character set and collation values.

Example 1: Table and Column Definition

CREATE TABLE t1
(
    c1 CHAR(10) CHARACTER SET latin1 COLLATE latin1_german1_ci
) DEFAULT CHARACTER SET latin2 COLLATE latin2_bin;

Here we have a column with a character set and a collation. The definition is explicit, so that's straightforward. Notice that there is no problem with storing a column in a table.

Example 2: Table and Column Definition

CREATE TABLE t1
(
    c1 CHAR(10) CHARACTER SET latin1
) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;

This time we have a column with a character set and a default collation. Although it might seem natural, the default collation is not taken from the table level. Instead, because the default collation for is always , column has a collation of (not ).

Example 3: Table and Column Definition

CREATE TABLE t1
(
    c1 CHAR(10)
) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;

We have a column with a default character set and a default collation. In this circumstance, MySQL checks the table level to determine the column character set and collation. Consequently, the character set for column is and its collation is .

Example 4: Database, Table, and Column Definition

CREATE DATABASE d1
    DEFAULT CHARACTER SET latin2 COLLATE latin2_czech_ci;
USE d1;
CREATE TABLE t1
(
    c1 CHAR(10)
);

We create a column without specifying its character set and collation. We're also not specifying a character set and a collation at the table level. In this circumstance, MySQL checks the database level to determine the table settings, which thereafter become the column settings.) Consequently, the character set for column is and its collation is .

10.3.8. Compatibility with Other DBMSs

For MaxDB compatibility these two statements are the same:

CREATE TABLE t1 (f1 CHAR() UNICODE);
CREATE TABLE t1 (f1 CHAR() CHARACTER SET ucs2);