5.11. MySQL Localization and International Usage

MySQL 5.0

5.11. MySQL Localization and International Usage

This section describes how to configure the server to use different character sets. It also discusses how to set the server's time zone and enable per-connection time zone support.

5.11.1. The Character Set Used for Data and Sorting

By default, MySQL uses the (cp1252 West European) character set and the collation that sorts according to Swedish/Finnish rules. These defaults are suitable for the United States and most of Western Europe.

All MySQL binary distributions are compiled with . This adds code to all standard programs that enables them to handle and all multi-byte character sets within the binary. Other character sets are loaded from a character-set definition file when needed.

The character set determines what characters are allowed in identifiers. The collation determines how strings are sorted by the and clauses of the statement.

You can change the default server character set and collation with the and options when you start the server. The collation must be a legal collation for the default character set. (Use the statement to determine which collations are available for each character set.) See Section 5.2.1, “mysqld Command Options”.

The character sets available depend on the and | complex | all | none options to configure, and the character set configuration files listed in /charsets/Index. See Section 2.9.2, “Typical configure Options”.

If you change the character set when running MySQL, that may also change the sort order. Consequently, you must run myisamchk -r -q --set-collation= on all tables, or your indexes may not be ordered correctly.

When a client connects to a MySQL server, the server indicates to the client what the server's default character set is. The client switches to this character set for this connection.

You should use when escaping strings for an SQL query. is identical to the old function, except that it takes the connection handle as the first parameter so that the appropriate character set can be taken into account when escaping characters.

If the client is compiled with paths that differ from where the server is installed and the user who configured MySQL didn't include all character sets in the MySQL binary, you must tell the client where it can find the additional character sets it needs if the server runs with a different character set from the client. You can do this by specifying a option to indicate the path to the directory in which the dynamic MySQL character sets are stored. For example, you can put the following in an option file:

[client]
character-sets-dir=/usr/local/mysql/share/mysql/charsets

You can force the client to use specific character set as follows:

[client]
default-character-set=

This is normally unnecessary, however.

5.11.1.1. Using the German Character Set

In MySQL 5.0, character set and collation are specified separately. This means that if you want German sort order, you should select the character set and either the or collation. For example, to start the server with the collation, use the and options.

For information on the differences between these two collations, see Section 10.9.2, “West European Character Sets”.

5.11.2. Setting the Error Message Language

By default, mysqld produces error messages in English, but they can also be displayed in any of these other languages: Czech, Danish, Dutch, Estonian, French, German, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Norwegian-ny, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, or Swedish.

To start mysqld with a particular language for error messages, use the or option. The option value can be a language name or the full path to the error message file. For example:

shell> 

Or:

shell> 

The language name should be specified in lowercase.

By default, the language files are located in the directory under the MySQL base directory.

You can also change the content of the error messages produced by the server. Details can be found in the MySQL Internals manual, available at http://dev.mysql.com/doc/. If you upgrade to a newer version of MySQL after changing the error messages, remember to repeat your changes after the upgrade.

5.11.3. Adding a New Character Set

This section discusses the procedure for adding a new character set to MySQL. You must have a MySQL source distribution to use these instructions. To choose the proper procedure, determine whether the character set is simple or complex:

  • If the character set does not need to use special string collating routines for sorting and does not need multi-byte character support, it is simple.

  • If it needs either of those features, it is complex.

For example, and are simple character sets, whereas and are complex character sets.

In the following instructions, the name of the character set is represented by .

For a simple character set, do the following:

  1. Add to the end of the file. Assign a unique number to it.

  2. Create the file .conf. (You can use a copy of as the basis for this file.)

    The syntax for the file is very simple:

    • Comments start with a ‘’ character and continue to the end of the line.

    • Words are separated by arbitrary amounts of whitespace.

    • When defining the character set, every word must be a number in hexadecimal format.

    • The array takes up the first 257 words. The , and arrays take up 256 words each after that.

    See Section 5.11.4, “The Character Definition Arrays”.

  3. Add the character set name to the and lists in .

  4. Reconfigure, recompile, and test.

For a complex character set, do the following:

  1. Create the file .c in the MySQL source distribution.

  2. Add to the end of the file. Assign a unique number to it.

  3. Look at one of the existing files (such as ) to see what needs to be defined. Note that the arrays in your file must have names like , , and so on. These correspond to the arrays for a simple character set. See Section 5.11.4, “The Character Definition Arrays”.

  4. Near the top of the file, place a special comment like this:

    /*
     * This comment is parsed by configure to create ctype.c,
     * so don't change it unless you know what you are doing.
     *
     * .configure. number_=
     * .configure. strxfrm_multiply_=
     * .configure. mbmaxlen_=
     */
    

    The configure program uses this comment to include the character set into the MySQL library automatically.

    The and lines are explained in the following sections. You need include them only if you need the string collating functions or the multi-byte character set functions, respectively.

  5. You should then create some of the following functions:

    • ()

    • ()

    • ()

    • ()

    See Section 5.11.5, “String Collating Support”.

  6. Add the character set name to the and lists in .

  7. Reconfigure, recompile, and test.

The file includes additional instructions.

If you want to have the character set included in the MySQL distribution, mail a patch to the MySQL mailing list. See Section 1.7.1, “MySQL Mailing Lists”.

5.11.4. The Character Definition Arrays

and are simple arrays that hold the lowercase and uppercase characters corresponding to each member of the character set. For example:

to_lower['A'] should contain 'a'
to_upper['a'] should contain 'A'

is a map indicating how characters should be ordered for comparison and sorting purposes. Quite often (but not for all character sets) this is the same as , which means that sorting is case-insensitive. MySQL sorts characters based on the values of elements. For more complicated sorting rules, see the discussion of string collating in Section 5.11.5, “String Collating Support”.

is an array of bit values, with one element for one character. (Note that , , and are indexed by character value, but is indexed by character value + 1. This is an old legacy convention for handling .)

You can find the following bitmask definitions in :

#define _U      01      /* Uppercase */
#define _L      02      /* Lowercase */
#define _N      04      /* Numeral (digit) */
#define _S      010     /* Spacing character */
#define _P      020     /* Punctuation */
#define _C      040     /* Control character */
#define _B      0100    /* Blank */
#define _X      0200    /* heXadecimal digit */

The entry for each character should be the union of the applicable bitmask values that describe the character. For example, is an uppercase character () as well as a hexadecimal digit (), so should contain the value:

_U + _X = 01 + 0200 = 0201

5.11.5. String Collating Support

If the sorting rules for your language are too complex to be handled with the simple table, you need to use the string collating functions.

The best documentation for this is the existing character sets. Look at the , , , , and character sets for examples.

You must specify the = value in the special comment at the top of the file. should be set to the maximum ratio the strings may grow during (it must be a positive integer).

5.11.6. Multi-Byte Character Support

If you want to add support for a new character set that includes multi-byte characters, you need to use the multi-byte character functions.

The best documentation for this is the existing character sets. Look at the , , , , and character sets for examples. These are implemented in the .c files in the directory.

You must specify the = value in the special comment at the top of the source file. should be set to the size in bytes of the largest character in the set.

5.11.7. Problems With Character Sets

If you try to use a character set that is not compiled into your binary, you might run into the following problems:

  • Your program uses an incorrect path to determine where the character sets are stored. (Default ). This can be fixed by using the option when you run the program in question.

  • The character set is a multi-byte character set that cannot be loaded dynamically. In this case, you must recompile the program with support for the character set.

  • The character set is a dynamic character set, but you do not have a configure file for it. In this case, you should install the configure file for the character set from a new MySQL distribution.

  • If your file does not contain the name for the character set, your program displays the following error message:

    ERROR 1105: File '/usr/local/share/mysql/charsets/?.conf'
    not found (Errcode: 2)
    

    In this case, you should either get a new file or manually add the name of any missing character sets to the current file.

For tables, you can check the character set name and number for a table with myisamchk -dvv .

5.11.8. MySQL Server Time Zone Support

The MySQL server maintains several time zone settings:

  • The system time zone. When the server starts, it attempts to determine the time zone of the host machine and uses it to set the system variable. The value does not change thereafter.

  • The server's current time zone. The global system variable indicates the time zone the server currently is operating in. The initial value for is , which indicates that the server time zone is the same as the system time zone. The initial value can be specified explicitly with the option. If you have the privilege, you can set the global value at runtime with this statement:

    mysql> ;
    
  • Per-connection time zones. Each client that connects has its own time zone setting, given by the session variable. Initially, the session variable takes its value from the global variable, but the client can change its own time zone with this statement:

    mysql> ;
    

The current values of the global and client-specific time zones can be retrieved like this:

mysql> 

values can be given as strings indicating an offset from UTC, such as or . If the time zone information tables in the database have been created and populated, you can also use named time zones, such as , , or . The value can be used to indicate that the time zone should be the same as the system time zone. Time zone names are not case sensitive.

The MySQL installation procedure creates the time zone tables in the database, but does not load them. You must do so manually. (If you are upgrading to MySQL 4.1.3 or later from an earlier version, you should create the tables by upgrading your database. Use the instructions in Section 5.6.2, “mysql_upgrade — Check Tables for MySQL Upgrade”.)

If your system has its own zoneinfo database (the set of files describing time zones), you should use the mysql_tzinfo_to_sql program for filling the time zone tables. Examples of such systems are Linux, FreeBSD, Sun Solaris, and Mac OS X. One likely location for these files is the directory. If your system does not have a zoneinfo database, you can use the downloadable package described later in this section.

The mysql_tzinfo_to_sql program is used to load the time zone tables. On the command line, pass the zoneinfo directory pathname to mysql_tzinfo_to_sql and send the output into the mysql program. For example:

shell> 

mysql_tzinfo_to_sql reads your system's time zone files and generates SQL statements from them. mysql processes those statements to load the time zone tables.

mysql_tzinfo_to_sql also can be used to load a single time zone file, and to generate leap second information:

  • To load a single time zone file that corresponds to a time zone name , invoke mysql_tzinfo_to_sql like this:

    shell>   | mysql -u root mysql
    
  • If your time zone needs to account for leap seconds, initialize the leap second information like this, where is the name of your time zone file:

    shell>  | mysql -u root mysql
    

If your system doesn't have a zoneinfo database (for example, Windows or HP-UX), you can use the package of pre-built time zone tables that is available for download at http://dev.mysql.com/downloads/timezones.html. This package contains , , and files for the time zone tables. These tables should be part of the database, so you should place the files in the subdirectory of your MySQL server's data directory. The server should be stopped while you do this.

Warning: Please don't use the downloadable package if your system has a zoneinfo database. Use the mysql_tzinfo_to_sql utility instead. Otherwise, you may cause a difference in datetime handling between MySQL and other applications on your system.

For information about time zone settings in replication setup, please see Section 6.7, “Replication Features and Known Problems”.