10.4. Connection Character Sets and Collations

MySQL 5.0

10.4. Connection Character Sets and Collations

Several character set and collation system variables relate to a client's interaction with the server. Some of these have been mentioned in earlier sections:

  • The server character set and collation can be determined from the values of the and system variables.

  • The character set and collation of the default database can be determined from the values of the and system variables.

Additional character set and collation system variables are involved in handling traffic for the connection between a client and the server. Every client has connection-related character set and collation system variables.

Consider what a “connection” is: It's what you make when you connect to the server. The client sends SQL statements, such as queries, over the connection to the server. The server sends responses, such as result sets, over the connection back to the client. This leads to several questions about character set and collation handling for client connections, each of which can be answered in terms of system variables:

  • What character set is the statement in when it leaves the client?

    The server takes the system variable to be the character set in which statements are sent by the client.

  • What character set should the server translate a statement to after receiving it?

    For this, the server uses the and system variables. It converts statements sent by the client from to (except for string literals that have an introducer such as or ). is important for comparisons of literal strings. For comparisons of strings with column values, does not matter because columns have their own collation, which has a higher collation precedence.

  • What character set should the server translate to before shipping result sets or error messages back to the client?

    The system variable indicates the character set in which the server returns query results to the client. This includes result data such as column values, and result metadata such as column names.

You can fine-tune the settings for these variables, or you can depend on the defaults (in which case, you can skip the rest of this section).

There are two statements that affect the connection character sets:

SET NAMES ''
SET CHARACTER SET 

indicates what character set the client will use to send SQL statements to the server. Thus, tells the server “future incoming messages from this client are in character set .” It also specifies the character set that the server should use for sending results back to the client. (For example, it indicates what character set to use for column values if you use a statement.)

A ' statement is equivalent to these three statements:

SET character_set_client = ;
SET character_set_results = ;
SET character_set_connection = ;

Setting to also sets to the default collation for . To specify one of the character set's collations explicitly, use the optional clause:

SET NAMES '' COLLATE ''

is similar to but sets the connection character set and collation to be those of the default database. A statement is equivalent to these three statements:

SET character_set_client = ;
SET character_set_results = ;
SET collation_connection = @@collation_database;

Setting also sets to the character set associated with the collation.

When a client connects, it sends to the server the name of the character set that it wants to use. The server uses the name to set the , , and system variables. In effect, the server performs a operation using the character set name.

With the mysql client, it is not necessary to execute every time you start up if you want to use a character set different from the default. You can add the option setting to your mysql statement line, or in your option file. For example, the following option file setting changes the three character set variables set to each time you invoke mysql:

[mysql]
default-character-set=koi8r

Example: Suppose that is defined as . If you do not say or , then for , the server sends back all the values for using the character set that the client specified when it connected. On the other hand, if you say or before issuing the statement, the server converts the values to just before sending results back. Conversion may be lossy if there are characters that are not in both character sets.

If you do not want the server to perform any conversion of result sets, set to :

SET character_set_results = NULL;

Note: Currently, UCS-2 cannot be used as a client character set, which means that does not work.

To see the values of the character set and collation system variables that apply to your connection, use these statements:

SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';