11.4. String Types

MySQL 5.0

11.4. String Types

The string types are , , , , , , , and . This section describes how these types work and how to use them in your queries. For string type storage requirements, see Section 11.5, “Data Type Storage Requirements”.

11.4.1. The CHAR and VARCHAR Types

The and types are similar, but differ in the way they are stored and retrieved. As of MySQL 5.0.3, they also differ in maximum length and in whether trailing spaces are retained.

The and types are declared with a length that indicates the maximum number of characters you want to store. For example, can hold up to 30 characters.

The length of a column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When values are stored, they are right-padded with spaces to the specified length. When values are retrieved, trailing spaces are removed.

Values in columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. (The maximum effective length of a in MySQL 5.0.3 and later is determined by the maximum row size and the character set used. The maximum length overall is 65,532 bytes.)

In contrast to , values are stored using only as many characters as are needed, plus one byte to record the length (two bytes for columns that are declared with a length longer than 255).

values are not padded when they are stored. Handling of trailing spaces is version-dependent. As of MySQL 5.0.3, trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL. Before MySQL 5.0.3, trailing spaces are removed from values when they are stored into a column; this means that the spaces also are absent from retrieved values.

If you assign a value to a or column that exceeds the column's maximum length, the value is truncated to fit. If the truncated characters are not spaces, a warning is generated. For truncation of non-space characters, you can cause an error to occur (rather than a warning) and suppress insertion of the value by using strict SQL mode. See Section 5.2.5, “The Server SQL Mode”.

Before MySQL 5.0.3, if you need a data type for which trailing spaces are not removed, consider using a or type. Also, if you want to store binary values such as results from an encryption or compression function that might contain arbitrary byte values, use a column rather than a or column, to avoid potential problems with trailing space removal that would change data values.

The following table illustrates the differences between and by showing the result of storing various string values into and columns:

Value Storage Required Storage Required
4 bytes 1 byte
4 bytes 3 bytes
4 bytes 5 bytes
4 bytes 5 bytes

Note that the values shown as stored in the last row of the table apply only when not using strict mode; if MySQL is running in strict mode, values that exceed the column length are not stored, and an error results.

If a given value is stored into the and columns, the values retrieved from the columns are not always the same because trailing spaces are removed from columns upon retrieval. The following example illustrates this difference:

mysql> 
Query OK, 0 rows affected (0.01 sec)

mysql> 
Query OK, 1 row affected (0.00 sec)

mysql> 
+---------------------+---------------------+
| CONCAT('(', v, ')') | CONCAT('(', c, ')') |
+---------------------+---------------------+
| (ab  )              | (ab)                |
+---------------------+---------------------+
1 row in set (0.06 sec)

Values in and columns are sorted and compared according to the character set collation assigned to the column.

Note that all MySQL collations are of type . This means that all and values in MySQL are compared without regard to any trailing spaces. For example:

mysql> 
Query OK, 0 rows affected (0.09 sec)

mysql> 
Query OK, 1 row affected (0.00 sec)

mysql> 
+--------------------+----------------------+
| myname = 'Monty  ' | yourname = 'Monty  ' |
+--------------------+----------------------+
|                  1 |                    1 |
+--------------------+----------------------+
1 row in set (0.00 sec)

Note that this is true for all MySQL versions, and it makes no difference whether your version trims trailing spaces from values before storing them. Nor does the server SQL mode make any difference in this regard.

For those cases where trailing pad characters are stripped or comparisons ignore them, if a column has an index that requires unique values, inserting into the column values that differ only in number of trailing pad characters will result in a duplicate-key error. For example, if a table contains , an attempt to store causes a duplicate-key error.

11.4.2. The BINARY and VARBINARY Types

The and types are similar to and , except that they contain binary strings rather than non-binary strings. That is, they contain byte strings rather than character strings. This means that they have no character set, and sorting and comparison are based on the numeric values of the bytes in the values.

The allowable maximum length is the same for and as it is for and , except that the length for and is a length in bytes rather than in characters.

The and data types are distinct from the and data types. For the latter types, the attribute does not cause the column to be treated as a binary string column. Instead, it causes the binary collation for the column character set to be used, and the column itself contains non-binary character strings rather than binary byte strings. For example, is treated as , assuming that the default character set is . This differs from , which stores 5-bytes binary strings that have no character set or collation.

When values are stored, they are right-padded with the pad value to the specified length. The pad value and how it is handled is version specific:

  • As of MySQL 5.0.15, the pad value is (the zero byte). Values are right-padded with on insert, and no trailing bytes are removed on select. All bytes are significant in comparisons, including and operations. bytes and spaces are different in comparisons, with < space.

    Example: For a column, becomes when inserted. becomes when inserted. Both inserted values remain unchanged when selected.

  • Before MySQL 5.0.15, the pad value is space. Values are right-padded with space on insert, and trailing spaces are removed on select. Trailing spaces are ignored in comparisons, including and operations. bytes and spaces are different in comparisons, with < space.

    Example: For a column, becomes when inserted and when selected. becomes when inserted and when selected.

For , there is no padding on insert and no bytes are stripped on select. All bytes are significant in comparisons, including and operations. bytes and spaces are different in comparisons, with < space. (Exceptions: Before MySQL 5.0.3, trailing spaces are removed when values are stored. Before MySQL 5.0.15, trailing 0x00 bytes are removed for operations.)

Note: The storage engine continues to preserve trailing spaces in and column values through MySQL 5.0.18. Beginning with MySQL 5.0.19, uses trailing space characters in making comparisons as do other MySQL storage engines.

For those cases where trailing pad bytes are stripped or comparisons ignore them, if a column has an index that requires unique values, inserting into the column values that differ only in number of trailing pad bytes will result in a duplicate-key error. For example, if a table contains , an attempt to store causes a duplicate-key error.

You should consider the preceding padding and stripping characteristics carefully if you plan to use the data type for storing binary data and you require that the value retrieved be exactly the same as the value stored. The following example illustrates how -padding of values affects column value comparisons:

mysql> 
Query OK, 0 rows affected (0.01 sec)

mysql> 
Query OK, 1 row affected (0.01 sec)

mysql> 
+--------+---------+-------------+
| HEX(c) | c = 'a' | c = 'a\0\0' |
+--------+---------+-------------+
| 610000 |       0 |           1 |
+--------+---------+-------------+
1 row in set (0.09 sec)

If the value retrieved must be the same as the value specified for storage with no padding, it might be preferable to use or one of the data types instead.

11.4.3. The BLOB and TEXT Types

A is a binary large object that can hold a variable amount of data. The four types are , , , and . These differ only in the maximum length of the values they can hold. The four types are , , , and . These correspond to the four types and have the same maximum lengths and storage requirements. See Section 11.5, “Data Type Storage Requirements”. No lettercase conversion for or columns takes place during storage or retrieval.

columns are treated as binary strings (byte strings). columns are treated as non-binary strings (character strings). columns have no character set, and sorting and comparison are based on the numeric values of the bytes in column values. columns have a character set, and values are sorted and compared based on the collation of the character set.

If a column is indexed, index entry comparisons are space-padded at the end. This means that, if the index requires unique values, duplicate-key errors will occur for values that differ only in the number of trailing spaces. For example, if a table contains , an attempt to store causes a duplicate-key error. This is not true for columns.

When not running in strict mode, if you assign a value to a or column that exceeds the data type's maximum length, the value is truncated to fit. If the truncated characters are not spaces, a warning is generated. You can cause an error to occur and the value to be rejected rather than to be truncated with a warning by using strict SQL mode. See Section 5.2.5, “The Server SQL Mode”.

In most respects, you can regard a column as a column that can be as large as you like. Similarly, you can regard a column as a column. and differ from and in the following ways:

  • There is no trailing-space removal for and columns when values are stored or retrieved. Before MySQL 5.0.3, this differs from and , for which trailing spaces are removed when values are stored.

    Note that is on comparison space extended to fit the compared object, exactly like and .

  • For indexes on and columns, you must specify an index prefix length. For and , a prefix length is optional. See Section 7.4.3, “Column Indexes”.

  • and columns cannot have values.

and map to the data type. This is a compatibility feature. If you use the attribute with a data type, the column is assigned the binary collation of the column character set.

MySQL Connector/ODBC defines values as and values as .

Because and values can be extremely long, you might encounter some constraints in using them:

  • Only the first bytes of the column are used when sorting. The default value of is 1024. This value can be changed using the option when starting the mysqld server. See Section 5.2.2, “Server System Variables”.

    You can make more bytes significant in sorting or grouping by increasing the value of at runtime. Any client can change the value of its session variable:

    mysql> 
    mysql> 
        -> 
    

    Another way to use or on a or column containing long values when you want more than bytes to be significant is to convert the column value into a fixed-length object. The standard way to do this is with the function. For example, the following statement causes 2000 bytes of the column to be taken into account for sorting:

    mysql> 
        -> 
    
  • The maximum size of a or object is determined by its type, but the largest value you actually can transmit between the client and server is determined by the amount of available memory and the size of the communications buffers. You can change the message buffer size by changing the value of the variable, but you must do so for both the server and your client program. For example, both mysql and mysqldump allow you to change the client-side value. See Section 7.5.2, “Tuning Server Parameters”, Section 8.6, “mysql — The MySQL Command-Line Tool”, and Section 8.12, “mysqldump — A Database Backup Program”.

Each or value is represented internally by a separately allocated object. This is in contrast to all other data types, for which storage is allocated once per column when the table is opened.

In some cases, it may be desirable to store binary data such as media files in or columns. You may find MySQL's string handling functions useful for working with such data. See Section 12.3, “String Functions”. For security and other reasons, it is usually preferable to do so using application code rather than allowing application users the privilege. You can discuss specifics for various languages and platforms in the MySQL Forums (http://forums.mysql.com/).

11.4.4. The ENUM Type

An is a string object with a value chosen from a list of allowed values that are enumerated explicitly in the column specification at table creation time.

An enumeration value must be a quoted string literal; it may not be an expression, even one that evaluates to a string value. This means that you also may not employ a user variable as an enumeration value.

For example, you can create a table with an column like this:

CREATE TABLE sizes (
    name ENUM('small', 'medium', 'large')
);

However, this version of the previous statement does not work:

CREATE TABLE sizes (
    c1 ENUM('small', CONCAT('med','ium'), 'large')
);

You also may not employ a user variable as an enumeration value. This pair of statements do not work:

SET @mysize = 'medium';

CREATE TABLE sizes (
    name ENUM('small', @mysize, 'large')
);

If you wish to use a number as an enumeration value, you must enclose it in quotes.

The value may also be the empty string () or under certain circumstances:

  • If you insert an invalid value into an (that is, a string not present in the list of allowed values), the empty string is inserted instead as a special error value. This string can be distinguished from a “normal” empty string by the fact that this string has the numerical value 0. More about this later.

    If strict SQL mode is enabled, attempts to insert invalid values result in an error.

  • If an column is declared to allow , the value is a legal value for the column, and the default value is . If an column is declared , its default value is the first element of the list of allowed values.

Each enumeration value has an index:

  • Values from the list of allowable elements in the column specification are numbered beginning with 1.

  • The index value of the empty string error value is 0. This means that you can use the following statement to find rows into which invalid values were assigned:

    mysql>  WHERE =0;
    
  • The index of the value is .

  • The term “index” here refers only to position within the list of enumeration values. It has nothing to do with table indexes.

For example, a column specified as can have any of the values shown here. The index of each value is also shown:

Value Index
0
1
2
3

An enumeration can have a maximum of 65,535 elements.

Trailing spaces are automatically deleted from member values in the table definition when a table is created.

When retrieved, values stored into an column are displayed using the lettercase that was used in the column definition. Note that columns can be assigned a character set and collation. For binary or case-sensitive collations, lettercase is taken into account when assigning values to the column.

If you retrieve an value in a numeric context, the column value's index is returned. For example, you can retrieve numeric values from an column like this:

mysql> +0 FROM ;

If you store a number into an column, the number is treated as an index, and the value stored is the enumeration member with that index. (However, this does not work with , which treats all input as strings.) It is not advisable to define an column with enumeration values that look like numbers, because this can easily become confusing. For example, the following column has enumeration members with string values of , , and , but numeric index values of , , and :

numbers ENUM('0','1','2')

values are sorted according to the order in which the enumeration members were listed in the column specification. (In other words, values are sorted according to their index numbers.) For example, sorts before for , but sorts before for . The empty string sorts before non-empty strings, and values sort before all other enumeration values. To prevent unexpected results, specify the list in alphabetical order. You can also use or to make sure that the column is sorted lexically rather than by index number.

If you want to determine all possible values for an column, use LIKE and parse the definition in the column of the output.

11.4.5. The SET Type

A is a string object that can have zero or more values, each of which must be chosen from a list of allowed values specified when the table is created. column values that consist of multiple set members are specified with members separated by commas (‘’). A consequence of this is that member values should not themselves contain commas.

For example, a column specified as can have any of these values:

''
'one'
'two'
'one,two'

A can have a maximum of 64 different members.

Trailing spaces are automatically deleted from member values in the table definition when a table is created.

When retrieved, values stored in a column are displayed using the lettercase that was used in the column definition. Note that columns can be assigned a character set and collation. For binary or case-sensitive collations, lettercase is taken into account when assigning values to the column.

MySQL stores values numerically, with the low-order bit of the stored value corresponding to the first set member. If you retrieve a value in a numeric context, the value retrieved has bits set corresponding to the set members that make up the column value. For example, you can retrieve numeric values from a column like this:

mysql> +0 FROM ;

If a number is stored into a column, the bits that are set in the binary representation of the number determine the set members in the column value. For a column specified as , the members have the following decimal and binary values:

Member Decimal Value Binary Value

If you assign a value of to this column, that is in binary, so the first and fourth value members and are selected and the resulting value is .

For a value containing more than one element, it does not matter what order the elements are listed in when you insert the value. It also does not matter how many times a given element is listed in the value. When the value is retrieved later, each element in the value appears once, with elements listed according to the order in which they were specified at table creation time. For example, suppose that a column is specified as :

mysql> 

If you insert the values , , , , and :

mysql> 
-> ('a,d'), ('d,a'), ('a,d,a'), ('a,d,d'), ('d,a,d');
Query OK, 5 rows affected (0.01 sec)
Records: 5  Duplicates: 0  Warnings: 0

Then all of these values appear as when retrieved:

mysql> 
+------+
| col  |
+------+
| a,d  |
| a,d  |
| a,d  |
| a,d  |
| a,d  |
+------+
5 rows in set (0.04 sec)

If you set a column to an unsupported value, the value is ignored and a warning is issued:

mysql> 
Query OK, 1 row affected, 1 warning (0.03 sec)

mysql> 
+---------+------+------------------------------------------+
| Level   | Code | Message                                  |
+---------+------+------------------------------------------+
| Warning | 1265 | Data truncated for column 'col' at row 1 |
+---------+------+------------------------------------------+
1 row in set (0.04 sec)

mysql> 
+------+
| col  |
+------+
| a,d  |
| a,d  |
| a,d  |
| a,d  |
| a,d  |
| a,d  |
+------+
6 rows in set (0.01 sec)

If strict SQL mode is enabled, attempts to insert invalid values result in an error.

values are sorted numerically. values sort before non- values.

Normally, you search for values using the function or the operator:

mysql>  WHERE FIND_IN_SET('',)>0;
mysql>  WHERE  LIKE '%%';

The first statement finds rows where contains the set member. The second is similar, but not the same: It finds rows where contains anywhere, even as a substring of another set member.

The following statements also are legal:

mysql>  WHERE  & 1;
mysql>  WHERE  = ',';

The first of these statements looks for values containing the first set member. The second looks for an exact match. Be careful with comparisons of the second type. Comparing set values to ,' returns different results than comparing values to ,'. You should specify the values in the same order they are listed in the column definition.

If you want to determine all possible values for a column, use LIKE and parse the definition in the column of the output.