There are default settings for character sets and collations at four levels: server, database, table, and column. The following description may appear complex, but it has been found in practice that multiple-level defaulting leads to natural and obvious results.
CHARACTER SET
is used in clauses that specify a
character set. CHARSET
may be used as a synonym
for CHARACTER SET
.
MySQL Server has a server character set and a server collation. These can be set at server startup and changed at runtime.
Initially, the server character set and collation depend on the
options that you use when you start mysqld.
You can use --character-set-server
for the
character set. Along with it, you can add
--collation-server
for the collation. If you
don't specify a character set, that is the same as saying
--character-set-server=latin1
. If you specify
only a character set (for example, latin1
)
but not a collation, that is the same as saying
--character-set-server=latin1
--collation-server=latin1_swedish_ci
because
latin1_swedish_ci
is the default collation
for latin1
. Therefore, the following three
commands all have the same effect:
shell>mysqld
shell>mysqld --character-set-server=latin1
shell>mysqld --character-set-server=latin1 \
--collation-server=latin1_swedish_ci
One way to change the settings is by recompiling. If you want to
change the default server character set and collation when
building from sources, use: --with-charset
and
--with-collation
as arguments for
configure. For example:
shell> ./configure --with-charset=latin1
Or:
shell>./configure --with-charset=latin1 \
--with-collation=latin1_german1_ci
Both mysqld and configure verify that the character set/collation combination is valid. If not, each program displays an error message and terminates.
The current server character set and collation can be determined
from the values of the character_set_server
and collation_server
system variables. These
variables can be changed at runtime.
Every database has a database character set and a database
collation. The CREATE DATABASE
and
ALTER DATABASE
statements have optional
clauses for specifying the database character set and collation:
CREATE DATABASEdb_name
[[DEFAULT] CHARACTER SETcharset_name
] [[DEFAULT] COLLATEcollation_name
] ALTER DATABASEdb_name
[[DEFAULT] CHARACTER SETcharset_name
] [[DEFAULT] COLLATEcollation_name
]
The keyword SCHEMA
can be used instead of
DATABASE
.
All database options are stored in a text file named
db.opt
that can be found in the database
directory.
The CHARACTER SET
and
COLLATE
clauses make it possible to create
databases with different character sets and collations on the
same MySQL server.
Example:
CREATE DATABASE db_name
CHARACTER SET latin1 COLLATE latin1_swedish_ci;
MySQL chooses the database character set and database collation in the following manner:
-
If both
CHARACTER SET
X
andCOLLATE
Y
were specified, then character setX
and collationY
. -
If
CHARACTER SET
X
was specified withoutCOLLATE
, then character setX
and its default collation. -
If
COLLATE
Y
was specified withoutCHARACTER SET
, then the character set associated withY
and collationY
. -
Otherwise, the server character set and server collation.
The database character set and collation are used as default
values if the table character set and collation are not
specified in CREATE TABLE
statements. They
have no other purpose.
The character set and collation for the default database can be
determined from the values of the
character_set_database
and
collation_database
system variables. The
server sets these variables whenever the default database
changes. If there is no default database, the variables have the
same value as the corresponding server-level system variables,
character_set_server
and
collation_server
.
Every table has a table character set and a table collation. The
CREATE TABLE
and ALTER
TABLE
statements have optional clauses for specifying
the table character set and collation:
CREATE TABLEtbl_name
(column_list
) [[DEFAULT] CHARACTER SETcharset_name
] [COLLATEcollation_name
]] ALTER TABLEtbl_name
[[DEFAULT] CHARACTER SETcharset_name
] [COLLATEcollation_name
]
Example:
CREATE TABLE t1 ( ... ) CHARACTER SET latin1 COLLATE latin1_danish_ci;
MySQL chooses the table character set and collation in the following manner:
-
If both
CHARACTER SET
X
andCOLLATE
Y
were specified, then character setX
and collationY
. -
If
CHARACTER SET
X
was specified withoutCOLLATE
, then character setX
and its default collation. -
If
COLLATE
Y
was specified withoutCHARACTER SET
, then the character set associated withY
and collationY
. -
Otherwise, the database character set and collation.
The table character set and collation are used as default values if the column character set and collation are not specified in individual column definitions. The table character set and collation are MySQL extensions; there are no such things in standard SQL.
Every “character” column (that is, a column of type
CHAR
, VARCHAR
, or
TEXT
) has a column character set and a column
collation. Column definition syntax has optional clauses for
specifying the column character set and collation:
col_name
{CHAR | VARCHAR | TEXT} (col_length
) [CHARACTER SETcharset_name
] [COLLATEcollation_name
]
Example:
CREATE TABLE Table1 ( column1 VARCHAR(5) CHARACTER SET latin1 COLLATE latin1_german1_ci );
MySQL chooses the column character set and collation in the following manner:
-
If both
CHARACTER SET
X
andCOLLATE
Y
were specified, then character setX
and collationY
are used. -
If
CHARACTER SET
X
was specified withoutCOLLATE
, then character setX
and its default collation are used. -
If
COLLATE
Y
was specified withoutCHARACTER SET
, then the character set associated withY
and collationY
. -
Otherwise, the table character set and collation are used.
The CHARACTER SET
and
COLLATE
clauses are standard SQL.
Every character string literal has a character set and a collation.
A character string literal may have an optional character set
introducer and COLLATE
clause:
[_charset_name
]'string
' [COLLATEcollation_name
]
Examples:
SELECT 'string
'; SELECT _latin1'string
'; SELECT _latin1'string
' COLLATE latin1_danish_ci;
For the simple statement SELECT
'
string
', the string has
the character set and collation defined by the
character_set_connection
and
collation_connection
system variables.
The _
charset_name
expression is formally called an
introducer. It tells the parser, “the
string that is about to follow uses character set
X
.” Because this has confused
people in the past, we emphasize that an introducer does not
cause any conversion; it is strictly a signal that does not
change the string's value. An introducer is also legal before
standard hex literal and numeric hex literal notation
(x'
literal
' and
0x
nnnn
)>.
Examples:
SELECT _latin1 x'AABBCC'; SELECT _latin1 0xAABBCC;
MySQL determines a literal's character set and collation in the following manner:
-
If both
_X
andCOLLATE
Y
were specified, then character setX
and collationY
are used. -
If
_X
is specified butCOLLATE
is not specified, then character setX
and its default collation are used. -
Otherwise, the character set and collation given by the
character_set_connection
andcollation_connection
system variables are used.
Examples:
-
A string with
latin1
character set andlatin1_german1_ci
collation:SELECT _latin1'Müller' COLLATE latin1_german1_ci;
-
A string with
latin1
character set and its default collation (that is,latin1_swedish_ci
):SELECT _latin1'Müller';
-
A string with the connection default character set and collation:
SELECT 'Müller';
Character set introducers and the COLLATE
clause are implemented according to standard SQL specifications.
Standard SQL defines NCHAR
or
NATIONAL CHAR
as a way to indicate that a
CHAR
column should use some predefined
character set. MySQL 5.0 uses
utf8
as this predefined character set. For
example, these data type declarations are equivalent:
CHAR(10) CHARACTER SET utf8 NATIONAL CHARACTER(10) NCHAR(10)
As are these:
VARCHAR(10) CHARACTER SET utf8 NATIONAL VARCHAR(10) NCHAR VARCHAR(10) NATIONAL CHARACTER VARYING(10) NATIONAL CHAR VARYING(10)
You can use
N'
literal
' to
create a string in the national character set. These two
statements are equivalent:
SELECT N'some text'; SELECT _utf8'some text';
For information on upgrading character sets to MySQL 5.0 from versions prior to 4.1, see the MySQL 3.23, 4.0, 4.1 Reference Manual.
The following examples show how MySQL determines default character set and collation values.
Example 1: Table and Column Definition
CREATE TABLE t1 ( c1 CHAR(10) CHARACTER SET latin1 COLLATE latin1_german1_ci ) DEFAULT CHARACTER SET latin2 COLLATE latin2_bin;
Here we have a column with a latin1
character
set and a latin1_german1_ci
collation. The
definition is explicit, so that's straightforward. Notice that
there is no problem with storing a latin1
column in a latin2
table.
Example 2: Table and Column Definition
CREATE TABLE t1 ( c1 CHAR(10) CHARACTER SET latin1 ) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;
This time we have a column with a latin1
character set and a default collation. Although it might seem
natural, the default collation is not taken from the table
level. Instead, because the default collation for
latin1
is always
latin1_swedish_ci
, column
c1
has a collation of
latin1_swedish_ci
(not
latin1_danish_ci
).
Example 3: Table and Column Definition
CREATE TABLE t1 ( c1 CHAR(10) ) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;
We have a column with a default character set and a default
collation. In this circumstance, MySQL checks the table level to
determine the column character set and collation. Consequently,
the character set for column c1
is
latin1
and its collation is
latin1_danish_ci
.
Example 4: Database, Table, and Column Definition
CREATE DATABASE d1 DEFAULT CHARACTER SET latin2 COLLATE latin2_czech_ci; USE d1; CREATE TABLE t1 ( c1 CHAR(10) );
We create a column without specifying its character set and
collation. We're also not specifying a character set and a
collation at the table level. In this circumstance, MySQL checks
the database level to determine the table settings, which
thereafter become the column settings.) Consequently, the
character set for column c1
is
latin2
and its collation is
latin2_czech_ci
.