This section describes how to configure the server to use different character sets. It also discusses how to set the server's time zone and enable per-connection time zone support.
By default, MySQL uses the latin1
(cp1252
West European) character set and the
latin1_swedish_ci
collation that sorts
according to Swedish/Finnish rules. These defaults are suitable
for the United States and most of Western Europe.
All MySQL binary distributions are compiled with
--with-extra-charsets=complex
. This adds code
to all standard programs that enables them to handle
latin1
and all multi-byte character sets
within the binary. Other character sets are loaded from a
character-set definition file when needed.
The character set determines what characters are allowed in
identifiers. The collation determines how strings are sorted by
the ORDER BY
and GROUP BY
clauses of the SELECT
statement.
You can change the default server character set and collation
with the --character-set-server
and
--collation-server
options when you start the
server. The collation must be a legal collation for the default
character set. (Use the SHOW COLLATION
statement to determine which collations are available for each
character set.) See Section 5.2.1, “mysqld Command Options”.
The character sets available depend on the
--with-charset=
charset_name
and
--with-extra-charsets=
list-of-charsets
| complex | all | none options to
configure, and the character set
configuration files listed in
SHAREDIR
/charsets/Index.
See Section 2.9.2, “Typical configure Options”.
If you change the character set when running MySQL, that may
also change the sort order. Consequently, you must run
myisamchk -r -q
--set-collation=collation_name
on all MyISAM
tables, or your indexes may not
be ordered correctly.
When a client connects to a MySQL server, the server indicates to the client what the server's default character set is. The client switches to this character set for this connection.
You should use mysql_real_escape_string()
when escaping strings for an SQL query.
mysql_real_escape_string()
is identical to
the old mysql_escape_string()
function,
except that it takes the MYSQL
connection
handle as the first parameter so that the appropriate character
set can be taken into account when escaping characters.
If the client is compiled with paths that differ from where the
server is installed and the user who configured MySQL didn't
include all character sets in the MySQL binary, you must tell
the client where it can find the additional character sets it
needs if the server runs with a different character set from the
client. You can do this by specifying a
--character-sets-dir
option to indicate the
path to the directory in which the dynamic MySQL character sets
are stored. For example, you can put the following in an option
file:
[client] character-sets-dir=/usr/local/mysql/share/mysql/charsets
You can force the client to use specific character set as follows:
[client]
default-character-set=charset_name
This is normally unnecessary, however.
In MySQL 5.0, character set and collation are
specified separately. This means that if you want German sort
order, you should select the latin1
character set and either the
latin1_german1_ci
or
latin1_german2_ci
collation. For example,
to start the server with the
latin1_german1_ci
collation, use the
--character-set-server=latin1
and
--collation-server=latin1_german1_ci
options.
For information on the differences between these two collations, see Section 10.9.2, “West European Character Sets”.
By default, mysqld produces error messages in English, but they can also be displayed in any of these other languages: Czech, Danish, Dutch, Estonian, French, German, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Norwegian-ny, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, or Swedish.
To start mysqld with a particular language
for error messages, use the --language
or
-L
option. The option value can be a language
name or the full path to the error message file. For example:
shell> mysqld --language=swedish
Or:
shell> mysqld --language=/usr/local/share/swedish
The language name should be specified in lowercase.
By default, the language files are located in the
share/
LANGUAGE
directory under the MySQL base directory.
You can also change the content of the error messages produced by the server. Details can be found in the MySQL Internals manual, available at http://dev.mysql.com/doc/. If you upgrade to a newer version of MySQL after changing the error messages, remember to repeat your changes after the upgrade.
This section discusses the procedure for adding a new character set to MySQL. You must have a MySQL source distribution to use these instructions. To choose the proper procedure, determine whether the character set is simple or complex:
-
If the character set does not need to use special string collating routines for sorting and does not need multi-byte character support, it is simple.
-
If it needs either of those features, it is complex.
For example, latin1
and
danish
are simple character sets, whereas
big5
and czech
are complex
character sets.
In the following instructions, the name of the character set is
represented by MYSET
.
For a simple character set, do the following:
-
Add
MYSET
to the end of thesql/share/charsets/Index
file. Assign a unique number to it. -
Create the file
sql/share/charsets/
MYSET
.conf. (You can use a copy ofsql/share/charsets/latin1.conf
as the basis for this file.)The syntax for the file is very simple:
-
Comments start with a ‘
#
’ character and continue to the end of the line. -
Words are separated by arbitrary amounts of whitespace.
-
When defining the character set, every word must be a number in hexadecimal format.
-
The
ctype
array takes up the first 257 words. Theto_lower[]
,to_upper[]
andsort_order[]
arrays take up 256 words each after that.
-
-
Add the character set name to the
CHARSETS_AVAILABLE
andCOMPILED_CHARSETS
lists inconfigure.in
. -
Reconfigure, recompile, and test.
For a complex character set, do the following:
-
Create the file
strings/ctype-
MYSET
.c in the MySQL source distribution. -
Add
MYSET
to the end of thesql/share/charsets/Index
file. Assign a unique number to it. -
Look at one of the existing
ctype-*.c
files (such asstrings/ctype-big5.c
) to see what needs to be defined. Note that the arrays in your file must have names likectype_
MYSET
,to_lower_
MYSET
, and so on. These correspond to the arrays for a simple character set. See Section 5.11.4, “The Character Definition Arrays”. -
Near the top of the file, place a special comment like this:
/* * This comment is parsed by configure to create ctype.c, * so don't change it unless you know what you are doing. * * .configure. number_
MYSET
=MYNUMBER
* .configure. strxfrm_multiply_MYSET
=N
* .configure. mbmaxlen_MYSET
=N
*/The configure program uses this comment to include the character set into the MySQL library automatically.
The
strxfrm_multiply
andmbmaxlen
lines are explained in the following sections. You need include them only if you need the string collating functions or the multi-byte character set functions, respectively. -
You should then create some of the following functions:
-
my_strncoll_
MYSET
() -
my_strcoll_
MYSET
() -
my_strxfrm_
MYSET
() -
my_like_range_
MYSET
()
-
-
Add the character set name to the
CHARSETS_AVAILABLE
andCOMPILED_CHARSETS
lists inconfigure.in
. -
Reconfigure, recompile, and test.
The sql/share/charsets/README
file includes
additional instructions.
If you want to have the character set included in the MySQL
distribution, mail a patch to the MySQL
internals
mailing list. See
Section 1.7.1, “MySQL Mailing Lists”.
to_lower[]
and to_upper[]
are simple arrays that hold the lowercase and uppercase
characters corresponding to each member of the character set.
For example:
to_lower['A'] should contain 'a' to_upper['a'] should contain 'A'
sort_order[]
is a map indicating how
characters should be ordered for comparison and sorting
purposes. Quite often (but not for all character sets) this is
the same as to_upper[]
, which means that
sorting is case-insensitive. MySQL sorts characters based on the
values of sort_order[]
elements. For more
complicated sorting rules, see the discussion of string
collating in Section 5.11.5, “String Collating Support”.
ctype[]
is an array of bit values, with one
element for one character. (Note that
to_lower[]
, to_upper[]
,
and sort_order[]
are indexed by character
value, but ctype[]
is indexed by character
value + 1. This is an old legacy convention for handling
EOF
.)
You can find the following bitmask definitions in
m_ctype.h
:
#define _U 01 /* Uppercase */ #define _L 02 /* Lowercase */ #define _N 04 /* Numeral (digit) */ #define _S 010 /* Spacing character */ #define _P 020 /* Punctuation */ #define _C 040 /* Control character */ #define _B 0100 /* Blank */ #define _X 0200 /* heXadecimal digit */
The ctype[]
entry for each character should
be the union of the applicable bitmask values that describe the
character. For example, 'A'
is an uppercase
character (_U
) as well as a hexadecimal digit
(_X
), so ctype['A'+1]
should contain the value:
_U + _X = 01 + 0200 = 0201
If the sorting rules for your language are too complex to be
handled with the simple sort_order[]
table,
you need to use the string collating functions.
The best documentation for this is the existing character sets.
Look at the big5
, czech
,
gbk
, sjis
, and
tis160
character sets for examples.
You must specify the
strxfrm_multiply_
MYSET
=N
value in the special comment at the top of the file.
N
should be set to the maximum ratio
the strings may grow during
my_strxfrm_
MYSET
(it must be a positive integer).
If you want to add support for a new character set that includes multi-byte characters, you need to use the multi-byte character functions.
The best documentation for this is the existing character sets.
Look at the euc_kr
,
gb2312
, gbk
,
sjis
, and ujis
character
sets for examples. These are implemented in the
ctype-
charset_name
.c
files in the strings
directory.
You must specify the
mbmaxlen_
MYSET
=N
value in the special comment at the top of the source file.
N
should be set to the size in bytes
of the largest character in the set.
If you try to use a character set that is not compiled into your binary, you might run into the following problems:
-
Your program uses an incorrect path to determine where the character sets are stored. (Default
/usr/local/mysql/share/mysql/charsets
). This can be fixed by using the--character-sets-dir
option when you run the program in question. -
The character set is a multi-byte character set that cannot be loaded dynamically. In this case, you must recompile the program with support for the character set.
-
The character set is a dynamic character set, but you do not have a configure file for it. In this case, you should install the configure file for the character set from a new MySQL distribution.
-
If your
Index
file does not contain the name for the character set, your program displays the following error message:ERROR 1105: File '/usr/local/share/mysql/charsets/?.conf' not found (Errcode: 2)
In this case, you should either get a new
Index
file or manually add the name of any missing character sets to the current file.
For MyISAM
tables, you can check the
character set name and number for a table with
myisamchk -dvv
tbl_name
.
The MySQL server maintains several time zone settings:
-
The system time zone. When the server starts, it attempts to determine the time zone of the host machine and uses it to set the
system_time_zone
system variable. The value does not change thereafter. -
The server's current time zone. The global
time_zone
system variable indicates the time zone the server currently is operating in. The initial value fortime_zone
is'SYSTEM'
, which indicates that the server time zone is the same as the system time zone. The initial value can be specified explicitly with the--default-time-zone=
timezone
option. If you have theSUPER
privilege, you can set the global value at runtime with this statement:mysql>
SET GLOBAL time_zone =
timezone
; -
Per-connection time zones. Each client that connects has its own time zone setting, given by the session
time_zone
variable. Initially, the session variable takes its value from the globaltime_zone
variable, but the client can change its own time zone with this statement:mysql>
SET time_zone =
timezone
;
The current values of the global and client-specific time zones can be retrieved like this:
mysql> SELECT @@global.time_zone, @@session.time_zone;
timezone
values can be given as
strings indicating an offset from UTC, such as
'+10:00'
or '-6:00'
. If
the time zone information tables in the mysql
database have been created and populated, you can also use named
time zones, such as 'Europe/Helsinki'
,
'US/Eastern'
, or 'MET'
.
The value 'SYSTEM'
can be used to indicate
that the time zone should be the same as the system time zone.
Time zone names are not case sensitive.
The MySQL installation procedure creates the time zone tables in
the mysql
database, but does not load them.
You must do so manually. (If you are upgrading to MySQL 4.1.3 or
later from an earlier version, you should create the tables by
upgrading your mysql
database. Use the
instructions in Section 5.6.2, “mysql_upgrade — Check Tables for MySQL Upgrade”.)
If your system has its own zoneinfo
database (the set of files describing time zones), you should
use the mysql_tzinfo_to_sql program for
filling the time zone tables. Examples of such systems are
Linux, FreeBSD, Sun Solaris, and Mac OS X. One likely location
for these files is the /usr/share/zoneinfo
directory. If your system does not have a zoneinfo database, you
can use the downloadable package described later in this
section.
The mysql_tzinfo_to_sql program is used to load the time zone tables. On the command line, pass the zoneinfo directory pathname to mysql_tzinfo_to_sql and send the output into the mysql program. For example:
shell> mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql
mysql_tzinfo_to_sql reads your system's time zone files and generates SQL statements from them. mysql processes those statements to load the time zone tables.
mysql_tzinfo_to_sql also can be used to load a single time zone file, and to generate leap second information:
-
To load a single time zone file
tz_file
that corresponds to a time zone nametz_name
, invoke mysql_tzinfo_to_sql like this:shell>
mysql_tzinfo_to_sql
tz_file
tz_name
| mysql -u root mysql -
If your time zone needs to account for leap seconds, initialize the leap second information like this, where
tz_file
is the name of your time zone file:shell>
mysql_tzinfo_to_sql --leap
tz_file
| mysql -u root mysql
If your system doesn't have a zoneinfo database (for example,
Windows or HP-UX), you can use the package of pre-built time
zone tables that is available for download at
http://dev.mysql.com/downloads/timezones.html. This package
contains .frm
, .MYD
,
and .MYI
files for the
MyISAM
time zone tables. These tables should
be part of the mysql
database, so you should
place the files in the mysql
subdirectory
of your MySQL server's data directory. The server should be
stopped while you do this.
Warning: Please don't use the downloadable package if your system has a zoneinfo database. Use the mysql_tzinfo_to_sql utility instead. Otherwise, you may cause a difference in datetime handling between MySQL and other applications on your system.
For information about time zone settings in replication setup, please see Section 6.7, “Replication Features and Known Problems”.