13.2. Data Manipulation Statements

MySQL 5.0

13.2. Data Manipulation Statements

13.2.1. DELETE Syntax

Single-table syntax:

DELETE [LOW_PRIORITY] [QUICK] [IGNORE] FROM 
    [WHERE ]
    [ORDER BY ...]
    [LIMIT ]

Multiple-table syntax:

DELETE [LOW_PRIORITY] [QUICK] [IGNORE]
    [.*] [, [.*]] ...
    FROM 
    [WHERE ]

Or:

DELETE [LOW_PRIORITY] [QUICK] [IGNORE]
    FROM [.*] [, [.*]] ...
    USING 
    [WHERE ]

For the single-table syntax, the statement deletes rows from and returns the number of rows deleted. The clause, if given, specifies the conditions that identify which rows to delete. With no clause, all rows are deleted. If the clause is specified, the rows are deleted in the order that is specified. The clause places a limit on the number of rows that can be deleted.

For the multiple-table syntax, deletes from each the rows that satisfy the conditions. In this case, and cannot be used.

is an expression that evaluates to true for each row to be deleted. It is specified as described in Section 13.2.7, “ Syntax”.

As stated, a statement with no clause deletes all rows. A faster way to do this, when you do not want to know the number of deleted rows, is to use . See Section 13.2.9, “ Syntax”.

If you delete the row containing the maximum value for an column, the value is reused later for a table, but not for a or table. If you delete all rows in the table with (without a clause) in mode, the sequence starts over for all storage engines except and . There are some exceptions to this behavior for tables, as discussed in Section 14.2.6.3, “How Columns Work in .

For and tables, you can specify an secondary column in a multiple-column key. In this case, reuse of values deleted from the top of the sequence occurs even for tables. See Section 3.6.9, “Using .

The statement supports the following modifiers:

  • If you specify , the server delays execution of the until no other clients are reading from the table.

  • For tables, if you use the keyword, the storage engine does not merge index leaves during delete, which may speed up some kinds of delete operations.

  • The keyword causes MySQL to ignore all errors during the process of deleting rows. (Errors encountered during the parsing stage are processed in the usual manner.) Errors that are ignored due to the use of are returned as warnings.

The speed of delete operations may also be affected by factors discussed in Section 7.2.18, “Speed of Statements”.

In tables, deleted rows are maintained in a linked list and subsequent operations reuse old row positions. To reclaim unused space and reduce file sizes, use the statement or the myisamchk utility to reorganize tables. is easier, but myisamchk is faster. See Section 13.5.2.5, “ Syntax”, and Section 8.3, “myisamchk — MyISAM Table-Maintenance Utility”.

The modifier affects whether index leaves are merged for delete operations. is most useful for applications where index values for deleted rows are replaced by similar index values from rows inserted later. In this case, the holes left by deleted values are reused.

is not useful when deleted values lead to undef-filled index blocks spanning a range of index values for which new inserts occur again. In this case, use of can lead to wasted space in the index that remains unreclaimed. Here is an example of such a scenario:

  1. Create a table that contains an indexed column.

  2. Insert many rows into the table. Each insert results in an index value that is added to the high end of the index.

  3. Delete a block of rows at the low end of the column range using .

In this scenario, the index blocks associated with the deleted index values become undef-filled but are not merged with other index blocks due to the use of . They remain undef-filled when new inserts occur, because new rows do not have index values in the deleted range. Furthermore, they remain undef-filled even if you later use without , unless some of the deleted index values happen to lie in index blocks within or adjacent to the undef-filled blocks. To reclaim unused index space under these circumstances, use .

If you are going to delete many rows from a table, it might be faster to use followed by . This rebuilds the index rather than performing many index block merge operations.

The MySQL-specific option to tells the server the maximum number of rows to be deleted before control is returned to the client. This can be used to ensure that a given statement does not take too much time. You can simply repeat the statement until the number of affected rows is less than the value.

If the statement includes an clause, the rows are deleted in the order specified by the clause. This is really useful only in conjunction with . For example, the following statement finds rows matching the clause, sorts them by , and deletes the first (oldest) one:

DELETE FROM somelog WHERE user = 'jcole'
ORDER BY timestamp_column LIMIT 1;

You can specify multiple tables in a statement to delete rows from one or more tables depending on the particular condition in the clause. However, you cannot use or in a multiple-table . The clause lists the tables involved in the join. Its syntax is described in Section 13.2.7.1, “ Syntax”.

For the first multiple-table syntax, only matching rows from the tables listed before the clause are deleted. For the second multiple-table syntax, only matching rows from the tables listed in the clause (before the clause) are deleted. The effect is that you can delete rows from many tables at the same time and have additional tables that are used only for searching:

DELETE t1, t2 FROM t1, t2, t3 WHERE t1.id=t2.id AND t2.id=t3.id;

Or:

DELETE FROM t1, t2 USING t1, t2, t3 WHERE t1.id=t2.id AND t2.id=t3.id;

These statements use all three tables when searching for rows to delete, but delete matching rows only from tables and .

The preceding examples show inner joins that use the comma operator, but multiple-table statements can use any type of join allowed in statements, such as .

The syntax allows after the table names for compatibility with Access.

If you use a multiple-table statement involving tables for which there are foreign key constraints, the MySQL optimizer might process tables in an order that differs from that of their parent/child relationship. In this case, the statement fails and rolls back. Instead, you should delete from a single table and rely on the capabilities that provides to cause the other tables to be modified accordingly.

Note: If you provide an alias for a table, you must use the alias when referring to the table:

DELETE t1 FROM test AS t1, test2 WHERE ...

Cross-database deletes are supported for multiple-table deletes, but in this case, you must refer to the tables without using aliases. For example:

DELETE test1.tmp1, test2.tmp2 FROM test1.tmp1, test2.tmp2 WHERE ...

Currently, you cannot delete from a table and select from the same table in a subquery.

13.2.2. DO Syntax

DO  [, ] ...

executes the expressions but does not return any results. In most respects, is shorthand for , ..., but has the advantage that it is slightly faster when you do not care about the result.

is useful primarily with functions that have side effects, such as .

13.2.3. HANDLER Syntax

HANDLER  OPEN [ AS  ]
HANDLER  READ  { = | >= | <= | < } (,,...)
    [ WHERE  ] [LIMIT ... ]
HANDLER  READ  { FIRST | NEXT | PREV | LAST }
    [ WHERE  ] [LIMIT ... ]
HANDLER  READ { FIRST | NEXT }
    [ WHERE  ] [LIMIT ... ]
HANDLER  CLOSE

The statement provides direct access to table storage engine interfaces. It is available for and tables.

The statement opens a table, making it accessible via subsequent statements. This table object is not shared by other threads and is not closed until the thread calls or the thread terminates. If you open the table using an alias, further references to the open table with other statements must use the alias rather than the table name.

The first syntax fetches a row where the index specified satisfies the given values and the condition is met. If you have a multiple-column index, specify the index column values as a comma-separated list. Either specify values for all the columns in the index, or specify values for a leftmost prefix of the index columns. Suppose that an index includes three columns named , , and , in that order. The statement can specify values for all three columns in the index, or for the columns in a leftmost prefix. For example:

HANDLER ... READ my_idx = (col_a_val,col_b_val,col_c_val) ...
HANDLER ... READ my_idx = (col_a_val,col_b_val) ...
HANDLER ... READ my_idx = (col_a_val) ...

To employ the interface to refer to a table's , use the quoted identifier :

HANDLER  READ `PRIMARY` ...

The second syntax fetches a row from the table in index order that matches the condition.

The third syntax fetches a row from the table in natural row order that matches the condition. It is faster than READ when a full table scan is desired. Natural row order is the order in which rows are stored in a table data file. This statement works for tables as well, but there is no such concept because there is no separate data file.

Without a clause, all forms of fetch a single row if one is available. To return a specific number of rows, include a clause. It has the same syntax as for the statement. See Section 13.2.7, “ Syntax”.

closes a table that was opened with .

is a somewhat low-level statement. For example, it does not provide consistency. That is, does not take a snapshot of the table, and does not lock the table. This means that after a statement is issued, table data can be modified (by the current thread or other threads) and these modifications might be only partially visible to or scans.

There are several reasons to use the interface instead of normal statements:

  • is faster than :

    • A designated storage engine handler object is allocated for the . The object is reused for subsequent statements for that table; it need not be reinitialized for each one.

    • There is less parsing involved.

    • There is no optimizer or query-checking overhead.

    • The table does not have to be locked between two handler requests.

    • The handler interface does not have to provide a consistent look of the data (for example, dirty reads are allowed), so the storage engine can use optimizations that does not normally allow.

  • For applications that use a low-level -like interface, makes it much easier to port them to MySQL.

  • enables you to traverse a database in a manner that is difficult (or even impossible) to accomplish with . The interface is a more natural way to look at data when working with applications that provide an interactive user interface to the database.

13.2.4. INSERT Syntax

INSERT [LOW_PRIORITY | DELAYED | HIGH_PRIORITY] [IGNORE]
    [INTO]  [(,...)]
    VALUES ({ | DEFAULT},...),(...),...
    [ ON DUPLICATE KEY UPDATE =, ... ]

Or:

INSERT [LOW_PRIORITY | DELAYED | HIGH_PRIORITY] [IGNORE]
    [INTO] 
    SET ={ | DEFAULT}, ...
    [ ON DUPLICATE KEY UPDATE =, ... ]

Or:

INSERT [LOW_PRIORITY | HIGH_PRIORITY] [IGNORE]
    [INTO]  [(,...)]
    SELECT ...
    [ ON DUPLICATE KEY UPDATE =, ... ]

inserts new rows into an existing table. The and forms of the statement insert rows based on explicitly specified values. The form inserts rows selected from another table or tables. is discussed further in Section 13.2.4.1, “ Syntax”.

You can use instead of to overwrite old rows. is the counterpart to in the treatment of new rows that contain unique key values that duplicate old rows: The new rows are used to replace the old rows rather than being discarded. See Section 13.2.6, “ Syntax”.

is the table into which rows should be inserted. The columns for which the statement provides values can be specified as follows:

  • You can provide a comma-separated list of column names following the table name. In this case, a value for each named column must be provided by the list or the statement.

  • If you do not specify a list of column names for or , values for every column in the table must be provided by the list or the statement. If you do not know the order of the columns in the table, use to find out.

  • The clause indicates the column names explicitly.

Column values can be given in several ways:

  • If you are not running in strict SQL mode, any column not explicitly given a value is set to its default (explicit or implicit) value. For example, if you specify a column list that does not name all the columns in the table, unnamed columns are set to their default values. Default value assignment is described in Section 11.1.4, “Data Type Default Values”. See also Section 1.9.6.2, “Constraints on Invalid Data”.

    If you want an statement to generate an error unless you explicitly specify values for all columns that do not have a default value, you should use strict mode. See Section 5.2.5, “The Server SQL Mode”.

  • Use the keyword to set a column explicitly to its default value. This makes it easier to write statements that assign values to all but a few columns, because it enables you to avoid writing an incomplete list that does not include a value for each column in the table. Otherwise, you would have to write out the list of column names corresponding to each value in the list.

    You can also use ) as a more general form that can be used in expressions to produce a given column's default value.

  • If both the column list and the list are empty, creates a row with each column set to its default value:

    INSERT INTO  () VALUES();
    

    In strict mode, an error occurs if any column doesn't have a default value. Otherwise, MySQL uses the implicit default value for any column that does not have an explicitly defined default.

  • You can specify an expression to provide a column value. This might involve type conversion if the type of the expression does not match the type of the column, and conversion of a given value can result in different inserted values depending on the data type. For example, inserting the string into an , , , or column results in the values , , , and being inserted, respectively. The reason the value stored in the and columns is is that the string-to-integer conversion looks only at as much of the initial part of the string as may be considered a valid integer or year. For the floating-point and fixed-point columns, the string-to-floating-point conversion considers the entire string a valid floating-point value.

    An expression can refer to any column that was set earlier in a value list. For example, you can do this because the value for refers to , which has previously been assigned:

    INSERT INTO  (col1,col2) VALUES(15,col1*2);
    

    But the following is not legal, because the value for refers to , which is assigned after :

    INSERT INTO  (col1,col2) VALUES(col2*2,15);
    

    One exception involves columns that contain values. Because the value is generated after other value assignments, any reference to an column in the assignment returns a .

statements that use syntax can insert multiple rows. To do this, include multiple lists of column values, each enclosed within parentheses and separated by commas. Example:

INSERT INTO  (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);

The values list for each row must be enclosed within parentheses. The following statement is illegal because the number of values in the list does not match the number of column names:

INSERT INTO  (a,b,c) VALUES(1,2,3,4,5,6,7,8,9);

The rows-affected value for an can be obtained using the C API function. See Section 22.2.3.1, “.

If you use an statement with multiple value lists or , the statement returns an information string in this format:

Records: 100 Duplicates: 0 Warnings: 0

indicates the number of rows processed by the statement. (This is not necessarily the number of rows actually inserted because can be non-zero.) indicates the number of rows that could not be inserted because they would duplicate some existing unique index value. indicates the number of attempts to insert column values that were problematic in some way. Warnings can occur under any of the following conditions:

  • Inserting into a column that has been declared . For multiple-row statements or statements, the column is set to the implicit default value for the column data type. This is for numeric types, the empty string () for string types, and the “zero” value for date and time types. statements are handled the same way as multiple-row inserts because the server does not examine the result set from the to see whether it returns a single row. (For a single-row , no warning occurs when is inserted into a column. Instead, the statement fails with an error.)

  • Setting a numeric column to a value that lies outside the column's range. The value is clipped to the closest endpoint of the range.

  • Assigning a value such as to a numeric column. The trailing non-numeric text is stripped off and the remaining numeric part is inserted. If the string value has no leading numeric part, the column is set to .

  • Inserting a string into a string column (, , , or ) that exceeds the column's maximum length. The value is truncated to the column's maximum length.

  • Inserting a value into a date or time column that is illegal for the data type. The column is set to the appropriate zero value for the type.

If you are using the C API, the information string can be obtained by invoking the function. See Section 22.2.3.34, “.

If inserts a row into a table that has an column, you can find the value used for that column by using the SQL function. From within the C API, use the function. However, you should note that the two functions do not always behave identically. The behavior of statements with respect to columns is discussed further in Section 12.9.3, “Information Functions”, and Section 22.2.3.36, “.

The statement supports the following modifiers:

  • If you use the keyword, the server puts the row or rows to be inserted into a buffer, and the client issuing the statement can then continue immediately. If the table is in use, the server holds the rows. When the table is free, the server begins inserting rows, checking periodically to see whether there are any new read requests for the table. If there are, the delayed row queue is suspended until the table becomes free again. See Section 13.2.4.2, “ Syntax”.

    is ignored with or .

  • If you use the keyword, execution of the is delayed until no other clients are reading from the table. This includes other clients that began reading while existing clients are reading, and while the statement is waiting. It is possible, therefore, for a client that issues an statement to wait for a very long time (or even forever) in a read-heavy environment. (This is in contrast to , which lets the client continue at once. Note that should normally not be used with tables because doing so disables concurrent inserts. See Section 7.3.3, “Concurrent Inserts”.

  • If you specify , it overrides the effect of the option if the server was started with that option. It also causes concurrent inserts not to be used.

  • If you use the keyword, errors that occur while executing the statement are treated as warnings instead. For example, without , a row that duplicates an existing index or value in the table causes a duplicate-key error and the statement is aborted. With , the row still is not inserted, but no error is issued. Data conversions that would trigger errors abort the statement if is not specified. With , invalid values are adjusted to the closest values and inserted; warnings are produced but the statement does not abort. You can determine with the C API function how many rows were actually inserted into the table.

  • If you specify , and a row is inserted that would cause a duplicate value in a index or , an of the old row is performed. See Section 13.2.4.3, “ Syntax”.

13.2.4.1.  Syntax

INSERT [LOW_PRIORITY | HIGH_PRIORITY] [IGNORE]
    [INTO]  [(,...)]
    SELECT ...
    [ ON DUPLICATE KEY UPDATE =, ... ]

With , you can quickly insert many rows into a table from one or many tables. For example:

INSERT INTO tbl_temp2 (fld_id)
  SELECT tbl_temp1.fld_order_id
  FROM tbl_temp1 WHERE tbl_temp1.fld_order_id > 100;

The following conditions hold for a statements:

  • Specify to ignore rows that would cause duplicate-key violations.

  • is ignored with .

  • The target table of the statement may appear in the clause of the part of the query. (This was not possible in some older versions of MySQL.) In this case, MySQL creates a temporary table to hold the rows from the and then inserts those rows into the target table.

  • columns work as usual.

  • To ensure that the binary log can be used to re-create the original tables, MySQL does not allow concurrent inserts for statements.

  • Currently, you cannot insert into a table and select from the same table in a subquery.

In the values part of , you can refer to columns in other tables, as long as you do not use in the part. One side effect is that you must qualify non-unique column names in the values part.

13.2.4.2.  Syntax

INSERT DELAYED ...

The option for the statement is a MySQL extension to standard SQL that is very useful if you have clients that cannot or need not wait for the to complete. This is a common situation when you use MySQL for logging and you also periodically run and statements that take a long time to complete.

When a client uses , it gets an okay from the server at once, and the row is queued to be inserted when the table is not in use by any other thread.

Another major benefit of using is that inserts from many clients are bundled together and written in one block. This is much faster than performing many separate inserts.

Note that is slower than a normal if the table is not otherwise in use. There is also the additional overhead for the server to handle a separate thread for each table for which there are delayed rows. This means that you should use only when you are really sure that you need it.

The queued rows are held only in memory until they are inserted into the table. This means that if you terminate mysqld forcibly (for example, with ) or if mysqld dies unexpectedly, any queued rows that have not been written to disk are lost.

There are some constraints on the use of :

  • works only with , , and tables. See Section 14.1, “The Storage Engine”, Section 14.4, “The () Storage Engine”, and Section 14.8, “The Storage Engine”.

    For tables, if there are no free blocks in the middle of the data file, concurrent and statements are supported. Under these circumstances, you very seldom need to use with .

  • should be used only for statements that specify value lists. The server ignores for or statements.

  • Because the statement returns immediately, before the rows are inserted, you cannot use to get the value that the statement might generate.

  • rows are not visible to statements until they actually have been inserted.

  • is ignored on slave replication servers because it could cause the slave to have different data than the master.

  • Pending statements are lost if a table is write locked and is used to modify the table structure.

The following describes in detail what happens when you use the option to or . In this description, the “thread” is the thread that received an statement and “handler” is the thread that handles all statements for a particular table.

  • When a thread executes a statement for a table, a handler thread is created to process all statements for the table, if no such handler already exists.

  • The thread checks whether the handler has previously acquired a lock; if not, it tells the handler thread to do so. The lock can be obtained even if other threads have a or lock on the table. However, the handler waits for all locks or statements to finish, to ensure that the table structure is up to date.

  • The thread executes the statement, but instead of writing the row to the table, it puts a copy of the final row into a queue that is managed by the handler thread. Any syntax errors are noticed by the thread and reported to the client program.

  • The client cannot obtain from the server the number of duplicate rows or the value for the resulting row, because the returns before the insert operation has been completed. (If you use the C API, the function does not return anything meaningful, for the same reason.)

  • The binary log is updated by the handler thread when the row is inserted into the table. In case of multiple-row inserts, the binary log is updated when the first row is inserted.

  • Each time that rows are written, the handler checks whether any statements are still pending. If so, it allows these to execute before continuing.

  • When the handler has no more rows in its queue, the table is unlocked. If no new statements are received within seconds, the handler terminates.

  • If more than rows are pending in a specific handler queue, the thread requesting waits until there is room in the queue. This is done to ensure that mysqld does not use all memory for the delayed memory queue.

  • The handler thread shows up in the MySQL process list with in the column. It is killed if you execute a statement or kill it with . However, before exiting, it first stores all queued rows into the table. During this time it does not accept any new statements from other threads. If you execute an statement after this, a new handler thread is created.

    Note that this means that statements have higher priority than normal statements if there is an handler running. Other update statements have to wait until the queue is empty, someone terminates the handler thread (with ), or someone executes a .

  • The following status variables provide information about statements:

    Status Variable Meaning
    Number of handler threads
    Number of rows written with
    Number of rows waiting to be written

    You can view these variables by issuing a statement or by executing a mysqladmin extended-status command.

13.2.4.3.  Syntax

If you specify , and a row is inserted that would cause a duplicate value in a index or , an of the old row is performed. For example, if column is declared as and contains the value , the following two statements have identical effect:

INSERT INTO table (a,b,c) VALUES (1,2,3)
  ON DUPLICATE KEY UPDATE c=c+1;

UPDATE table SET c=c+1 WHERE a=1;

The rows-affected value is 1 if the row is inserted as a new record and 2 if an existing record is updated.

If column is also unique, the is equivalent to this statement instead:

UPDATE table SET c=c+1 WHERE a=1 OR b=2 LIMIT 1;

If matches several rows, only one row is updated. In general, you should try to avoid using an clause on tables with multiple unique indexes.

You can use the ) function in the clause to refer to column values from the portion of the statement. In other words, ) in the clause refers to the value of that would be inserted, had no duplicate-key conflict occurred. This function is especially useful in multiple-row inserts. The function is meaningful only in statements and returns otherwise. Example:

INSERT INTO table (a,b,c) VALUES (1,2,3),(4,5,6)
  ON DUPLICATE KEY UPDATE c=VALUES(a)+VALUES(b);

That statement is identical to the following two statements:

INSERT INTO table (a,b,c) VALUES (1,2,3)
  ON DUPLICATE KEY UPDATE c=3;
INSERT INTO table (a,b,c) VALUES (4,5,6)
  ON DUPLICATE KEY UPDATE c=9;

The option is ignored when you use .

13.2.5. LOAD DATA INFILE Syntax

LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE ''
    [REPLACE | IGNORE]
    INTO TABLE 
    [FIELDS
        [TERMINATED BY '']
        [[OPTIONALLY] ENCLOSED BY '']
        [ESCAPED BY '']
    ]
    [LINES
        [STARTING BY '']
        [TERMINATED BY '']
    ]
    [IGNORE  LINES]
    [(,...)]
    [SET  = ,...)]

The statement reads rows from a text file into a table at a very high speed. The filename must be given as a literal string.

is the complement of . (See Section 13.2.7, “ Syntax”.) To write data from a table to a file, use . To read the file back into a table, use . The syntax of the and clauses is the same for both statements. Both clauses are optional, but must precede if both are specified.

For more information about the efficiency of versus and speeding up , see Section 7.2.16, “Speed of Statements”.

The character set indicated by the system variable is used to interpret the information in the file. and the setting of do not affect interpretation of input.

Note that it is currently not possible to load data files that use the character set.

As of MySQL 5.0.19, the system variable controls the interpretation of the filename.

You can also load data files by using the mysqlimport utility; it operates by sending a statement to the server. The option causes mysqlimport to read data files from the client host. You can specify the option to get better performance over slow networks if the client and server support the compressed protocol. See Section 8.14, “mysqlimport — A Data Import Program”.

If you use , execution of the statement is delayed until no other clients are reading from the table.

If you specify with a table that satisfies the condition for concurrent inserts (that is, it contains no free blocks in the middle), other threads can retrieve data from the table while is executing. Using this option affects the performance of a bit, even if no other thread is using the table at the same time.

The keyword, if specified, is interpreted with respect to the client end of the connection:

  • If is specified, the file is read by the client program on the client host and sent to the server. The file can be given as a full pathname to specify its exact location. If given as a relative pathname, the name is interpreted relative to the directory in which the client program was started.

  • If is not specified, the file must be located on the server host and is read directly by the server. The server uses the following rules to locate the file:

    • If the filename is an absolute pathname, the server uses it as given.

    • If the filename is a relative pathname with one or more leading components, the server searches for the file relative to the server's data directory.

    • If a filename with no leading components is given, the server looks for the file in the database directory of the default database.

Note that, in the non- case, these rules mean that a file named as is read from the server's data directory, whereas the file named as is read from the database directory of the default database. For example, if is the default database, the following statement reads the file from the database directory for , even though the statement explicitly loads the file into a table in the database:

LOAD DATA INFILE 'data.txt' INTO TABLE db2.my_table;

Windows pathnames are specified using forward slashes rather than backslashes. If you do use backslashes, you must double them.

For security reasons, when reading text files located on the server, the files must either reside in the database directory or be readable by all. Also, to use on server files, you must have the privilege. See Section 5.8.3, “Privileges Provided by MySQL”.

Using is a bit slower than letting the server access the files directly, because the contents of the file must be sent over the connection by the client to the server. On the other hand, you do not need the privilege to load local files.

works only if your server and your client both have been enabled to allow it. For example, if mysqld was started with , does not work. See Section 5.7.4, “Security Issues with .

On Unix, if you need to read from a pipe, you can use the following technique (here we load the listing of the directory into a table):

mkfifo /mysql/db/x/x
chmod 666 /mysql/db/x/x
find / -ls > /mysql/db/x/x
mysql -e "LOAD DATA INFILE 'x' INTO TABLE x" x

The and keywords control handling of input rows that duplicate existing rows on unique key values:

  • If you specify , input rows replace existing rows. In other words, rows that have the same value for a primary key or unique index as an existing row. See Section 13.2.6, “ Syntax”.

  • If you specify , input rows that duplicate an existing row on a unique key value are skipped. If you do not specify either option, the behavior depends on whether the keyword is specified. Without , an error occurs when a duplicate key value is found, and the rest of the text file is ignored. With , the default behavior is the same as if is specified; this is because the server has no way to stop transmission of the file in the middle of the operation.

If you want to ignore foreign key constraints during the load operation, you can issue a statement before executing .

If you use on an empty table, all non-unique indexes are created in a separate batch (as for ). Normally, this makes much faster when you have many indexes. In some extreme cases, you can create the indexes even faster by turning them off with before loading the file into the table and using to re-create the indexes after loading the file. See Section 7.2.16, “Speed of Statements”.

For both the and statements, the syntax of the and clauses is the same. Both clauses are optional, but must precede if both are specified.

If you specify a clause, each of its subclauses (, , and ) is also optional, except that you must specify at least one of them.

If you specify no clause, the defaults are the same as if you had written this:

FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'

If you specify no clause, the defaults are the same as if you had written this:

LINES TERMINATED BY '\n' STARTING BY ''

In other words, the defaults cause to act as follows when reading input:

  • Look for line boundaries at newlines.

  • Do not skip over any line prefix.

  • Break lines into fields at tabs.

  • Do not expect fields to be enclosed within any quoting characters.

  • Interpret occurrences of tab, newline, or ‘’ preceded by ‘’ as literal characters that are part of field values.

Conversely, the defaults cause to act as follows when writing output:

  • Write tabs between fields.

  • Do not enclose fields within any quoting characters.

  • Use ‘’ to escape instances of tab, newline, or ‘’ that occur within field values.

  • Write newlines at the ends of lines.

Backslash is the MySQL escape character within strings, so to write , you must specify two backslashes for the value to be interpreted as a single backslash.

Note: If you have generated the text file on a Windows system, you might have to use to read the file properly, because Windows programs typically use two characters as a line terminator. Some programs, such as WordPad, might use as a line terminator when writing files. To read such files, use .

If all the lines you want to read in have a common prefix that you want to ignore, you can use ' to skip over the prefix, and anything before it. If a line does not include the prefix, the entire line is skipped. Suppose that you issue the following statement:

LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test
  FIELDS TERMINATED BY ','  LINES STARTING BY 'xxx';

If the data file looks like this:

xxx"abc",1
something xxx"def",2
"ghi",3

The resulting rows will be and . The third row in the file is skipped because it does not contain the prefix.

The LINES option can be used to ignore lines at the start of the file. For example, you can use to skip over an initial header line containing column names:

LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test IGNORE 1 LINES;

When you use in tandem with to write data from a database into a file and then read the file back into the database later, the field- and line-handling options for both statements must match. Otherwise, will not interpret the contents of the file properly. Suppose that you use to write a file with fields delimited by commas:

SELECT * INTO OUTFILE 'data.txt'
  FIELDS TERMINATED BY ','
  FROM table2;

To read the comma-delimited file back in, the correct statement would be:

LOAD DATA INFILE 'data.txt' INTO TABLE table2
  FIELDS TERMINATED BY ',';

If instead you tried to read in the file with the statement shown following, it wouldn't work because it instructs to look for tabs between fields:

LOAD DATA INFILE 'data.txt' INTO TABLE table2
  FIELDS TERMINATED BY '\t';

The likely result is that each input line would be interpreted as a single field.

can be used to read files obtained from external sources. For example, many programs can export data in comma-separate values (CSV) format, such that lines have fields separated by commas and enclosed within double quotes. If lines in such a file are terminated by newlines, the statement shown here illustrates the field- and line-handling options you would use to load the file:

LOAD DATA INFILE 'data.txt' INTO TABLE 
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n';

Any of the field- or line-handling options can specify an empty string (). If not empty, the and values must be a single character. The , , and values can be more than one character. For example, to write lines that are terminated by carriage return/linefeed pairs, or to read a file containing such lines, specify a clause.

To read a file containing jokes that are separated by lines consisting of , you can do this

CREATE TABLE jokes
  (a INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  joke TEXT NOT NULL);
LOAD DATA INFILE '/tmp/jokes.txt' INTO TABLE jokes
  FIELDS TERMINATED BY ''
  LINES TERMINATED BY '\n%%\n' (joke);

controls quoting of fields. For output (), if you omit the word , all fields are enclosed by the character. An example of such output (using a comma as the field delimiter) is shown here:

"1","a string","100.20"
"2","a string containing a , comma","102.20"
"3","a string containing a \" quote","102.20"
"4","a string containing a \", quote and comma","102.20"

If you specify , the character is used only to enclose values from columns that have a string data type (such as , , , or ):

1,"a string",100.20
2,"a string containing a , comma",102.20
3,"a string containing a \" quote",102.20
4,"a string containing a \", quote and comma",102.20

Note that occurrences of the character within a field value are escaped by prefixing them with the character. Also note that if you specify an empty value, it is possible to inadvertently generate output that cannot be read properly by . For example, the preceding output just shown would appear as follows if the escape character is empty. Observe that the second field in the fourth line contains a comma following the quote, which (erroneously) appears to terminate the field:

1,"a string",100.20
2,"a string containing a , comma",102.20
3,"a string containing a " quote",102.20
4,"a string containing a ", quote and comma",102.20

For input, the character, if present, is stripped from the ends of field values. (This is true regardless of whether is specified; has no effect on input interpretation.) Occurrences of the character preceded by the character are interpreted as part of the current field value.

If the field begins with the character, instances of that character are recognized as terminating a field value only if followed by the field or line sequence. To avoid ambiguity, occurrences of the character within a field value can be doubled and are interpreted as a single instance of the character. For example, if is specified, quotes are handled as shown here:

"The ""BIG"" boss"  -> The "BIG" boss
The "BIG" boss      -> The "BIG" boss
The ""BIG"" boss    -> The ""BIG"" boss

controls how to write or read special characters. If the character is not empty, it is used to prefix the following characters on output:

  • The character

  • The character

  • The first character of the and values

  • ASCII (what is actually written following the escape character is ASCII ‘’, not a zero-valued byte)

If the character is empty, no characters are escaped and is output as , not . It is probably not a good idea to specify an empty escape character, particularly if field values in your data contain any of the characters in the list just given.

For input, if the character is not empty, occurrences of that character are stripped and the following character is taken literally as part of a field value. The exceptions are an escaped ‘’ or ‘’ (for example, or if the escape character is ‘’). These sequences are interpreted as ASCII NUL (a zero-valued byte) and . The rules for handling are described later in this section.

For more information about ‘’-escape syntax, see Section 9.1, “Literal Values”.

In certain cases, field- and line-handling options interact:

  • If is an empty string and is non-empty, lines are also terminated with .

  • If the and values are both empty (), a fixed-row (non-delimited) format is used. With fixed-row format, no delimiters are used between fields (but you can still have a line terminator). Instead, column values are read and written using a field width wide enough to hold all values in the field. For , , , , and , the field widths are 4, 6, 8, 11, and 20, respectively, no matter what the declared display width is.

    is still used to separate lines. If a line does not contain all fields, the rest of the columns are set to their default values. If you do not have a line terminator, you should set this to . In this case, the text file must contain all fields for each row.

    Fixed-row format also affects handling of values, as described later. Note that fixed-size format does not work if you are using a multi-byte character set.

    Note: Before MySQL 5.0.6, fixed-row format used the display width of the column. For example, was read or written using a field with a width of 4. However, if the column contained wider values, they were dumped to their full width, leading to the possibility of a “ragged” field holding values of different widths. Using a field wide enough to hold all values in the field prevents this problem. However, data files written before this change was made might not be reloaded correctly with for MySQL 5.0.6 and up. This change also affects data files read by mysqlimport and written by mysqldump --tab, which use and .

Handling of values varies according to the and options in use:

  • For the default and values, is written as a field value of for output, and a field value of is read as for input (assuming that the character is ‘’).

  • If is not empty, a field containing the literal word as its value is read as a value. This differs from the word enclosed within characters, which is read as the string .

  • If is empty, is written as the word .

  • With fixed-row format (which is used when and are both empty), is written as an empty string. Note that this causes both values and empty strings in the table to be indistinguishable when written to the file because both are written as empty strings. If you need to be able to tell the two apart when reading the file back in, you should not use fixed-row format.

An attempt to load into a column causes assignment of the implicit default value for the column's data type and a warning, or an error in strict SQL mode. Implicit default values are discussed in Section 11.1.4, “Data Type Default Values”.

Some cases are not supported by :

  • Fixed-size rows ( and both empty) and or columns.

  • If you specify one separator that is the same as or a prefix of another, cannot interpret the input properly. For example, the following clause would cause problems:

    FIELDS TERMINATED BY '"' ENCLOSED BY '"'
    
  • If is empty, a field value that contains an occurrence of or followed by the value causes to stop reading a field or line too early. This happens because cannot properly determine where the field or line value ends.

The following example loads all columns of the table:

LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata;

By default, when no column list is provided at the end of the statement, input lines are expected to contain a field for each table column. If you want to load only some of a table's columns, specify a column list:

LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata (col1,col2,...);

You must also specify a column list if the order of the fields in the input file differs from the order of the columns in the table. Otherwise, MySQL cannot tell how to match input fields with table columns.

Before MySQL 5.0.3, the column list must contain only names of columns in the table being loaded, and the clause is not supported. As of MySQL 5.0.3, the column list can contain either column names or user variables. With user variables, the clause enables you to perform transformations on their values before assigning the result to columns.

User variables in the clause can be used in several ways. The following example uses the first input column directly for the value of , and assigns the second input column to a user variable that is subjected to a division operation before being used for the value of :

LOAD DATA INFILE 'file.txt'
  INTO TABLE t1
  (column1, @var1)
  SET column2 = @var1/100;

The clause can be used to supply values not derived from the input file. The following statement sets to the current date and time:

LOAD DATA INFILE 'file.txt'
  INTO TABLE t1
  (column1, column2)
  SET column3 = CURRENT_TIMESTAMP;

You can also discard an input value by assigning it to a user variable and not assigning the variable to a table column:

LOAD DATA INFILE 'file.txt'
  INTO TABLE t1
  (column1, @dummy, column2, @dummy, column3);

Use of the column/variable list and clause is subject to the following restrictions:

  • Assignments in the clause should have only column names on the left hand side of assignment operators.

  • You can use subqueries in the right hand side of assignments. A subquery that returns a value to be assigned to a column may be a scalar subquery only. Also, you cannot use a subquery to select from the table that is being loaded.

  • Lines ignored by an clause are not processed for the column/variable list or clause.

  • User variables cannot be used when loading data with fixed-row format because user variables do not have a display width.

When processing an input line, splits it into fields and uses the values according to the column/variable list and the clause, if they are present. Then the resulting row is inserted into the table. If there are or triggers for the table, they are activated before or after inserting the row, respectively.

If an input line has too many fields, the extra fields are ignored and the number of warnings is incremented.

If an input line has too few fields, the table columns for which input fields are missing are set to their default values. Default value assignment is described in Section 11.1.4, “Data Type Default Values”.

An empty field value is interpreted differently than if the field value is missing:

  • For string types, the column is set to the empty string.

  • For numeric types, the column is set to .

  • For date and time types, the column is set to the appropriate “zero” value for the type. See Section 11.3, “Date and Time Types”.

These are the same values that result if you assign an empty string explicitly to a string, numeric, or date or time type explicitly in an or statement.

columns are set to the current date and time only if there is a value for the column (that is, ), or if the column's default value is the current timestamp and it is omitted from the field list when a field list is specified.

regards all input as strings, so you cannot use numeric values for or columns the way you can with statements. All and values must be specified as strings.

values cannot be loaded using binary notation (for example, ). To work around this, specify the values as regular integers and use the clause to convert them so that MySQL performs a numeric type conversion and loads them into the column properly:

shell> 
2
127
shell> 
mysql> 
    -> 
Query OK, 2 rows affected (0.00 sec)
Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

mysql> 
+----------+
| bin(b+0) |
+----------+
| 10       |
| 1111111  |
+----------+
2 rows in set (0.00 sec)

When the statement finishes, it returns an information string in the following format:

Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

If you are using the C API, you can get information about the statement by calling the function. See Section 22.2.3.34, “.

Warnings occur under the same circumstances as when values are inserted via the statement (see Section 13.2.4, “ Syntax”), except that also generates warnings when there are too few or too many fields in the input row. The warnings are not stored anywhere; the number of warnings can be used only as an indication of whether everything went well.

You can use to get a list of the first warnings as information about what went wrong. See Section 13.5.4.25, “ Syntax”.

13.2.6. REPLACE Syntax

REPLACE [LOW_PRIORITY | DELAYED]
    [INTO]  [(,...)]
    VALUES ({ | DEFAULT},...),(...),...

Or:

REPLACE [LOW_PRIORITY | DELAYED]
    [INTO] 
    SET ={ | DEFAULT}, ...

Or:

REPLACE [LOW_PRIORITY | DELAYED]
    [INTO]  [(,...)]
    SELECT ...

works exactly like , except that if an old row in the table has the same value as a new row for a or a index, the old row is deleted before the new row is inserted. See Section 13.2.4, “ Syntax”.

is a MySQL extension to the SQL standard. It either inserts, or deletes and inserts. For another MySQL extension to standard SQL — that either inserts or updates — see Section 13.2.4.3, “ Syntax”.

Note that unless the table has a or index, using a statement makes no sense. It becomes equivalent to , because there is no index to be used to determine whether a new row duplicates another.

Values for all columns are taken from the values specified in the statement. Any missing columns are set to their default values, just as happens for . You cannot refer to values from the current row and use them in the new row. If you use an assignment such as = + 1, the reference to the column name on the right hand side is treated as ), so the assignment is equivalent to = DEFAULT() + 1.

To use , you must have both the and privileges for the table.

The statement returns a count to indicate the number of rows affected. This is the sum of the rows deleted and inserted. If the count is 1 for a single-row , a row was inserted and no rows were deleted. If the count is greater than 1, one or more old rows were deleted before the new row was inserted. It is possible for a single row to replace more than one old row if the table contains multiple unique indexes and the new row duplicates values for different old rows in different unique indexes.

The affected-rows count makes it easy to determine whether only added a row or whether it also replaced any rows: Check whether the count is 1 (added) or greater (replaced).

If you are using the C API, the affected-rows count can be obtained using the function.

Currently, you cannot replace into a table and select from the same table in a subquery.

MySQL uses the following algorithm for (and ):

  1. Try to insert the new row into the table

  2. While the insertion fails because a duplicate-key error occurs for a primary key or unique index:

    1. Delete from the table the conflicting row that has the duplicate key value

    2. Try again to insert the new row into the table

13.2.7. SELECT Syntax

SELECT
    [ALL | DISTINCT | DISTINCTROW ]
      [HIGH_PRIORITY]
      [STRAIGHT_JOIN]
      [SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT]
      [SQL_CACHE | SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS]
    , ...
    [FROM 
    [WHERE ]
    [GROUP BY { |  | }
      [ASC | DESC], ... [WITH ROLLUP]]
    [HAVING ]
    [ORDER BY { |  | }
      [ASC | DESC], ...]
    [LIMIT {[,]  |  OFFSET }]
    [PROCEDURE ()]
    [INTO OUTFILE '' 
      | INTO DUMPFILE ''
      | INTO @ [, @]]
    [FOR UPDATE | LOCK IN SHARE MODE]]

is used to retrieve rows selected from one or more tables, and can include statements and subqueries. See Section 13.2.7.2, “ Syntax”, and Section 13.2.8, “Subquery Syntax”.

The most commonly used clauses of statements are these:

  • Each indicates a column that you want to retrieve. There must be at least one .

  • indicates the table or tables from which to retrieve rows. Its syntax is described in Section 13.2.7.1, “ Syntax”.

  • The clause, if given, indicates the condition or conditions that rows must satisfy to be selected. is an expression that evaluates to true for each row to be selected. The statement selects all rows if there is no clause.

    In the clause, you can use any of the functions and operators that MySQL supports, except for aggregate (summary) functions. See Chapter 12, Functions and Operators.

can also be used to retrieve rows computed without reference to any table.

For example:

mysql> 
        -> 2

You are allowed to specify as a dummy table name in situations where no tables are referenced:

mysql> 
        -> 2

is purely for compatibility with some other database servers that require a clause. MySQL does not require the clause if no tables are referenced.

In general, clauses used must be given in exactly the order shown in the syntax description. For example, a clause must come after any clause and before any clause. The exception is that the clause can appear either as shown in the syntax description or immediately preceding the clause.

  • A can be given an alias using . The alias is used as the expression's column name and can be used in , , or clauses. For example:

    SELECT CONCAT(last_name,', ',first_name) AS full_name
      FROM mytable ORDER BY full_name;
    

    The keyword is optional when aliasing a . The preceding example could have been written like this:

    SELECT CONCAT(last_name,', ',first_name) full_name
      FROM mytable ORDER BY full_name;
    

    However, because the is optional, a subtle problem can occur if you forget the comma between two expressions: MySQL interprets the second as an alias name. For example, in the following statement, is treated as an alias name:

    SELECT columna columnb FROM mytable;
    

    For this reason, it is good practice to be in the habit of using explicitly when specifying column aliases.

  • It is not allowable to use a column alias in a clause, because the column value might not yet be determined when the clause is executed. See Section A.5.4, “Problems with Column Aliases”.

  • The clause indicates the table or tables from which to retrieve rows. If you name more than one table, you are performing a join. For information on join syntax, see Section 13.2.7.1, “ Syntax”. For each table specified, you can optionally specify an alias.

     [[AS] ]
        [{USE|IGNORE|FORCE} INDEX ()]
    

    The use of , , to give the optimizer hints about how to choose indexes is described in Section 13.2.7.1, “ Syntax”.

    You can use as an alternative way to force MySQL to prefer key scans instead of table scans. See Section 5.2.2, “Server System Variables”.

  • You can refer to a table within the default database as , or as . to specify a database explicitly. You can refer to a column as , ., or ... You need not specify a or . prefix for a column reference unless the reference would be ambiguous. See Section 9.2.1, “Identifier Qualifiers”, for examples of ambiguity that require the more explicit column reference forms.

  • A table reference can be aliased using AS or :

    SELECT t1.name, t2.salary FROM employee AS t1, info AS t2
      WHERE t1.name = t2.name;
    
    SELECT t1.name, t2.salary FROM employee t1, info t2
      WHERE t1.name = t2.name;
    
  • Columns selected for output can be referred to in and clauses using column names, column aliases, or column positions. Column positions are integers and begin with 1:

    SELECT college, region, seed FROM tournament
      ORDER BY region, seed;
    
    SELECT college, region AS r, seed AS s FROM tournament
      ORDER BY r, s;
    
    SELECT college, region, seed FROM tournament
      ORDER BY 2, 3;
    

    To sort in reverse order, add the (descending) keyword to the name of the column in the clause that you are sorting by. The default is ascending order; this can be specified explicitly using the keyword.

    Use of column positions is deprecated because the syntax has been removed from the SQL standard.

  • If you use , output rows are sorted according to the columns as if you had an for the same columns. To avoid the overhead of sorting that produces, add :

    SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;
    
  • MySQL extends the clause so that you can also specify and after columns named in the clause:

    SELECT a, COUNT(b) FROM test_table GROUP BY a DESC;
    
  • MySQL extends the use of to allow selecting fields that are not mentioned in the clause. If you are not getting the results that you expect from your query, please read the description of found in Section 12.10, “Functions and Modifiers for Use with Clauses”.

  • allows a modifier. See Section 12.10.2, “ Modifiers”.

  • The clause is applied nearly last, just before items are sent to the client, with no optimization. ( is applied after .)

    A clause can refer to any column or alias named in a in the list or in outer subqueries, and to aggregate functions. However, the SQL standard requires that must reference only columns in the clause or columns used in aggregate functions. To accommodate both standard SQL and the MySQL-specific behavior of being able to refer columns in the list, MySQL 5.0.2 and up allows to refer to columns in the list, columns in the clause, columns in outer subqueries, and to aggregate functions.

    For example, the following statement works in MySQL 5.0.2 but produces an error for earlier versions:

    mysql> 
    

    If the clause refers to a column that is ambiguous, a warning occurs. In the following statement, is ambiguous because it is used as both an alias and a column name:

    SELECT COUNT(col1) AS col2 FROM t GROUP BY col2 HAVING col2 = 2;
    

    Preference is given to standard SQL behavior, so if a column name is used both in and as an aliased column in the output column list, preference is given to the column in the column.

  • Do not use for items that should be in the clause. For example, do not write the following:

    SELECT  FROM  HAVING  > 0;
    

    Write this instead:

    SELECT  FROM  WHERE  > 0;
    
  • The clause can refer to aggregate functions, which the clause cannot:

    SELECT user, MAX(salary) FROM users
      GROUP BY user HAVING MAX(salary) > 10;
    

    (This did not work in some older versions of MySQL.)

  • MySQL allows duplicate column names. That is, there can be more than one with the same name. This is an extension to standard SQL. Because MySQL also allows and to refer to values, this can result in an ambiguity:

    SELECT 12 AS a, a FROM t GROUP BY a;
    

    In that statement, both columns have the name . To ensure that the correct column is used for grouping, use different names for each .

  • When MySQL resolves an unqualified column or alias reference in an , , or clause, it first searches for the name in the values. If the name is not found, it looks in the columns of the tables named in the clause.

  • The clause can be used to constrain the number of rows returned by the statement. takes one or two numeric arguments, which must both be non-negative integer constants (except when using prepared statements).

    With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1):

    SELECT * FROM tbl LIMIT 5,10;  # Retrieve rows 6-15
    

    To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:

    SELECT * FROM tbl LIMIT 95,18446744073709551615;
    

    With one argument, the value specifies the number of rows to return from the beginning of the result set:

    SELECT * FROM tbl LIMIT 5;     # Retrieve first 5 rows
    

    In other words, is equivalent to .

    For prepared statements, you can use placeholders (supported as of MySQL version 5.0.7). The following statements will return one row from the table:

    SET @a=1;
    PREPARE STMT FROM 'SELECT * FROM tbl LIMIT ?';
    EXECUTE STMT USING @a;
    

    The following statements will return the second to sixth row from the table:

    SET @skip=1; SET @numrows=5;
    PREPARE STMT FROM 'SELECT * FROM tbl LIMIT ?, ?';
    EXECUTE STMT USING @skip, @numrows;
    

    For compatibility with PostgreSQL, MySQL also supports the OFFSET syntax.

  • The ' form of writes the selected rows to a file. The file is created on the server host, so you must have the privilege to use this syntax. cannot be an existing file, which among other things prevents files such as and database tables from being destroyed. As of MySQL 5.0.19, the system variable controls the interpretation of the filename.

    The statement is intended primarily to let you very quickly dump a table to a text file on the server machine. If you want to create the resulting file on some client host other than the server host, you cannot use . In that case, you should instead use a command such as to generate the file on the client host.

    is the complement of ; the syntax for the part of the statement consists of the same and clauses that are used with the statement. See Section 13.2.5, “ Syntax”.

    controls how to write special characters. If the character is not empty, it is used as a prefix that precedes following characters on output:

    • The character

    • The character

    • The first character of the and values

    • ASCII (the zero-valued byte; what is actually written following the escape character is ASCII ‘’, not a zero-valued byte)

    The , , , or characters must be escaped so that you can read the file back in reliably. ASCII is escaped to make it easier to view with some pagers.

    The resulting file does not have to conform to SQL syntax, so nothing else need be escaped.

    If the character is empty, no characters are escaped and is output as , not . It is probably not a good idea to specify an empty escape character, particularly if field values in your data contain any of the characters in the list just given.

    Here is an example that produces a file in the comma-separated values (CSV) format used by many programs:

    SELECT a,b,a+b INTO OUTFILE '/tmp/result.txt'
      FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
      LINES TERMINATED BY '\n'
      FROM test_table;
    
  • If you use instead of , MySQL writes only one row into the file, without any column or line termination and without performing any escape processing. This is useful if you want to store a value in a file.

  • The clause can name a list of one or more user-defined variables. The selected values are assigned to the variables. The number of variables must match the number of columns.

    Within a stored routine, the variables can be routine parameters or local variables. See Section 17.2.7.3, “ Statement”.

  • Note: Any file created by or is writable by all users on the server host. The reason for this is that the MySQL server cannot create a file that is owned by anyone other than the user under whose account it is running. (You should never run mysqld as for this and other reasons.) The file thus must be world-writable so that you can manipulate its contents.

  • The syntax description at the beginning this section shows the clause near the end of the statement. It is also possible to use or immediately preceding the clause.

  • A clause names a procedure that should process the data in the result set. For an example, see Section 24.3.1, “Procedure Analyse”.

  • If you use with a storage engine that uses page or row locks, rows examined by the query are write-locked until the end of the current transaction. Using sets a shared lock that allows other transactions to read the examined rows but not to update or delete them. See Section 14.2.10.5, “ and Locking Reads”.

Following the keyword, you can use a number of options that affect the operation of the statement.

The , , and options specify whether duplicate rows should be returned. If none of these options are given, the default is (all matching rows are returned). and are synonyms and specify removal of duplicate rows from the result set.

, , and options beginning with are MySQL extensions to standard SQL.

  • gives the higher priority than a statement that updates a table. You should use this only for queries that are very fast and must be done at once. A query that is issued while the table is locked for reading runs even if there is an update statement waiting for the table to be free.

    cannot be used with statements that are part of a .

  • forces the optimizer to join the tables in the order in which they are listed in the clause. You can use this to speed up a query if the optimizer joins the tables in non-optimal order. See Section 7.2.1, “Optimizing Queries with . also can be used in the list. See Section 13.2.7.1, “ Syntax”.

  • can be used with or to tell the optimizer that the result set has many rows. In this case, MySQL directly uses disk-based temporary tables if needed, and prefers sorting to using a temporary table with a key on the elements.

  • forces the result to be put into a temporary table. This helps MySQL free the table locks early and helps in cases where it takes a long time to send the result set to the client.

  • can be used with or to tell the optimizer that the result set is small. In this case, MySQL uses fast temporary tables to store the resulting table instead of using sorting. This should not normally be needed.

  • tells MySQL to calculate how many rows there would be in the result set, disregarding any clause. The number of rows can then be retrieved with . See Section 12.9.3, “Information Functions”.

  • tells MySQL to store the query result in the query cache if you are using a value of or . For a query that uses or subqueries, this option effects any in the query. See Section 5.14, “The MySQL Query Cache”.

  • tells MySQL not to store the query result in the query cache. See Section 5.14, “The MySQL Query Cache”. For a query that uses or subqueries, this option effects any in the query.

13.2.7.1.  Syntax

MySQL supports the following syntaxes for the part of statements and multiple-table and statements:


     [, ] ...

:
    
  | 

:
     [[AS] ]
        [{USE|IGNORE|FORCE} INDEX ()]
  | (  )
  | { OJ  LEFT OUTER JOIN 
        ON  }

:
     [INNER | CROSS] JOIN  []
  |  STRAIGHT_JOIN 
  |  STRAIGHT_JOIN  ON 
  |  LEFT [OUTER] JOIN  
  |  NATURAL [LEFT [OUTER]] JOIN 
  |  RIGHT [OUTER] JOIN  
  |  NATURAL [RIGHT [OUTER]] JOIN 

:
    ON 
  | USING ()

A table reference is also known as a join expression.

The syntax of is extended in comparison with the SQL Standard. The latter accepts only , not a list of them inside a pair of parentheses.

This is a conservative extension if we consider each comma in a list of items as equivalent to an inner join. For example:

SELECT * FROM t1 LEFT JOIN (t2, t3, t4)
                 ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)

is equivalent to:

SELECT * FROM t1 LEFT JOIN (t2 CROSS JOIN t3 CROSS JOIN t4)
                 ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)

In MySQL, is a syntactic equivalent to (they can replace each other). In standard SQL, they are not equivalent. is used with an clause, is used otherwise.

In versions of MySQL prior to 5.0.1, parentheses in were just omitted and all join operations were grouped to the left. In general, parentheses can be ignored in join expressions containing only inner join operations. As of 5.0.1, nested joins are allowed (see Section 7.2.10, “Nested Join Optimization”).

Further changes in join processing were made in 5.0.12 to make MySQL more compliant with standard SQL. These charges are described later in this section.

The following list describes general factors to take into account when writing joins.

  • A table reference can be aliased using AS or :

    SELECT t1.name, t2.salary
      FROM employee AS t1 INNER JOIN info AS t2 ON t1.name = t2.name;
    
    SELECT t1.name, t2.salary
      FROM employee t1 INNER JOIN info t2 ON t1.name = t2.name;
    
  • and (comma) are semantically equivalent in the absence of a join condition: both produce a Cartesian product between the specified tables (that is, each and every row in the first table is joined to each and every row in the second table).

    However, the precedence of the comma operator is less than than of , , , and so on. If you mix comma joins with the other join types when there is a join condition, an error of the form ' in 'on clause' may occur. Information about dealing with this problem is given later in this section.

  • The conditional is any conditional expression of the form that can be used in a clause. Generally, you should use the clause for conditions that specify how to join tables, and the clause to restrict which rows you want in the result set.

  • If there is no matching row for the right table in the or part in a , a row with all columns set to is used for the right table. You can use this fact to find rows in a table that have no counterpart in another table:

    SELECT table1.* FROM table1
      LEFT JOIN table2 ON table1.id=table2.id
      WHERE table2.id IS NULL;
    

    This example finds all rows in with an value that is not present in (that is, all rows in with no corresponding row in ). This assumes that is declared . See Section 7.2.9, “ and Optimization”.

  • The ) clause names a list of columns that must exist in both tables. If tables and both contain columns , , and , the following join compares corresponding columns from the two tables:

    a LEFT JOIN b USING (c1,c2,c3)
    
  • The of two tables is defined to be semantically equivalent to an or a with a clause that names all columns that exist in both tables.

  • works analogously to . To keep code portable across databases, it is recommended that you use instead of .

  • The syntax shown in the join syntax description exists only for compatibility with ODBC. The curly braces in the syntax should be written literally; they are not metasyntax as used elsewhere in syntax descriptions.

  • is identical to , except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order.

Some join examples:

SELECT * FROM table1, table2;

SELECT * FROM table1 INNER JOIN table2 ON table1.id=table2.id;

SELECT * FROM table1 LEFT JOIN table2 ON table1.id=table2.id;

SELECT * FROM table1 LEFT JOIN table2 USING (id);

SELECT * FROM table1 LEFT JOIN table2 ON table1.id=table2.id
  LEFT JOIN table3 ON table2.id=table3.id;

You can provide hints as to which index MySQL should use when retrieving information from a table. By specifying ), you can tell MySQL to use only one of the possible indexes to find rows in the table. The alternative syntax ) can be used to tell MySQL to not use some particular index. These hints are useful if shows that MySQL is using the wrong index from the list of possible indexes.

You can also use , which acts like ) but with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the given indexes to find rows in the table.

, , and affect only which indexes are used when MySQL decides how to find rows in the table and how to do the join. They do not affect whether an index is used when resolving an or .

, , and are synonyms for , , and .

Examples:

SELECT * FROM table1 USE INDEX (key1,key2)
  WHERE key1=1 AND key2=2 AND key3=3;

SELECT * FROM table1 IGNORE INDEX (key3)
  WHERE key1=1 AND key2=2 AND key3=3;

Join Processing Changes in MySQL 5.0.12

Beginning with MySQL 5.0.12, natural joins and joins with , including outer join variants, are processed according to the SQL:2003 standard. The goal was to align the syntax and semantics of MySQL with respect to and according to SQL:2003. However, these changes in join processing can result in different output columns for some joins. Also, some queries that appeared to work correctly in older versions must be rewritten to comply with the standard.

These changes have five main aspects:

  • The way that MySQL determines the result columns of or join operations (and thus the result of the entire clause).

  • Expansion of and .* into a list of selected columns.

  • Resolution of column names in or joins.

  • Transformation of or joins into .

  • Resolution of column names in the condition of a .

The following list provides more detail about several effects of the 5.0.12 change in join processing. The term “previously” means “prior to MySQL 5.0.12.

  • The columns of a join or a join may be different from previously. Specifically, redundant output columns no longer appear, and the order of columns for expansion may be different from before.

    Consider this set of statements:

    CREATE TABLE t1 (i INT, j INT);
    CREATE TABLE t2 (k INT, j INT);
    INSERT INTO t1 VALUES(1,1);
    INSERT INTO t2 VALUES(1,1);
    SELECT * FROM t1 NATURAL JOIN t2;
    SELECT * FROM t1 JOIN t2 USING (j);
    

    Previously, the statements produced this output:

    +------+------+------+------+
    | i    | j    | k    | j    |
    +------+------+------+------+
    |    1 |    1 |    1 |    1 |
    +------+------+------+------+
    +------+------+------+------+
    | i    | j    | k    | j    |
    +------+------+------+------+
    |    1 |    1 |    1 |    1 |
    +------+------+------+------+
    

    In the first statement, column appears in both tables and thus becomes a join column, so, according to standard SQL, it should appear only once in the output, not twice. Similarly, in the second SELECT statement, column is named in the clause and should appear only once in the output, not twice. But in both cases, the redundant column is not eliminated. Also, the order of the columns is not correct according to standard SQL.

    Now the statements produce this output:

    +------+------+------+
    | j    | i    | k    |
    +------+------+------+
    |    1 |    1 |    1 |
    +------+------+------+
    +------+------+------+
    | j    | i    | k    |
    +------+------+------+
    |    1 |    1 |    1 |
    +------+------+------+
    

    The redundant column is eliminated and the column order is correct according to standard SQL:

    • First, coalesced common columns of the two joined tables, in the order in which they occur in the first table

    • Second, columns unique to the first table, in order in which they occur in that table

    • Third, columns unique to the second table, in order in which they occur in that table

    The single result column that replaces two common columns is defined via the coalesce operation. That is, fro two and the resulting single join column is defined as , where:

    COALESCE(x, y) = (CASE WHEN V1 IS NOT NULL THEN V1 ELSE V2 END)
    

    If the join operation is any other join, the result columns of the join consists of the concatenation of all columns of the joined tables. This is the same as previously.

    A consequence of the definition of coalesced columns is that, for outer joins, the coalesced column contains the value of the non- column if one of the two columns is always . If neither or both columns are , both common columns have the same value, so it doesn't matter which one is chosen as the value of the coalesced column. A simple way to interpret this is to consider that a coalesced column of an outer join is represented by the common column of the inner table of a . Suppose that the tables and have the following contents:

    t1    t2
    ----  ----
    1 x   2 z
    2 y   3 w
    

    Then:

    mysql> 
    +------+------+------+
    | a    | b    | c    |
    +------+------+------+
    |    1 | x    | NULL |
    |    2 | y    | y    |
    +------+------+------+
    

    Here column contains the values of .

    mysql> 
    +------+------+------+
    | a    | c    | b    |
    +------+------+------+
    |    2 | y    | y    |
    |    3 | z    | NULL |
    +------+------+------+
    

    Here column contains the values of .

    Compare these results to the otherwise equivalent queries with :

    mysql> 
    +------+------+------+------+
    | a    | b    | a    | c    |
    +------+------+------+------+
    |    1 | x    | NULL | NULL |
    |    2 | y    |    2 | y    |
    +------+------+------+------+
    
    mysql> 
    +------+------+------+------+
    | a    | b    | a    | c    |
    +------+------+------+------+
    |    2 | y    |    2 | y    |
    | NULL | NULL |    3 | z    |
    +------+------+------+------+
    
  • Previously, a clause could be rewritten as an clause that compares corresponding columns. For example, the following two clauses were semantically identical:

    a LEFT JOIN b USING (c1,c2,c3)
    a LEFT JOIN b ON a.c1=b.c1 AND a.c2=b.c2 AND a.c3=b.c3
    

    Now the two clauses no longer are quite the same:

    • With respect to determining which rows satisfy the join condition, both joins remain semantically identical.

    • With respect to determining which columns to display for expansion, the two joins are not semantically identical. The join selects the coalesced value of corresponding columns, whereas the join selects all columns from all tables. For the preceding join, selects these values:

      COALESCE(a.c1,b.c1), COALESCE(a.c2,b.c2), COALESCE(a.c3,b.c3)
      

      For the join, selects these values:

      a.c1, a.c2, a.c3, b.c1, b.c2, b.c3
      

      With an inner join, is the same as either or because both columns will have the same value. With an outer join (such as ), one of the two columns can be . That column will be omitted from the result.

  • The evaluation of multi-way natural joins differs in a very important way that affects the result of or joins and that can require query rewriting. Suppose that you have three tables , , and that each have one row: , , and . Suppose also that you have this on the three tables:

    SELECT ... FROM t1 NATURAL JOIN t2 NATURAL JOIN t3;
    

    Previously, the left operand of the second join was considered to be , whereas it should be the nested join . As a result, the columns of are checked for common columns only in , and, if has common columns with , these columns are not used as equi-join columns. Thus, previously, the preceding query was transformed to the following equi-join:

    SELECT ... FROM t1, t2, t3
      WHERE t1.b = t2.b AND t2.c = t3.c;
    

    That join is missing one more equi-join predicate . As a result, it produces one row, not the empty result that it should. The correct equivalent query is this:

    SELECT ... FROM t1, t2, t3
      WHERE t1.b = t2.b AND t2.c = t3.c AND t1.a = t3.a;
    

    If you require the same query result in current versions of MySQL as in older versions, rewrite the natural join as the first equi-join.

  • Previously, the comma operator () and both had the same precedence, so the join expression was interpreted as . Now has higher precedence, so the expression is interpreted as . This change affects statements that use an clause, because that clause can refer only to columns in the operands of the join, and the change in precedence changes interpretation of what those operands are.

    Example:

    CREATE TABLE t1 (i1 INT, j1 INT);
    CREATE TABLE t2 (i2 INT, j2 INT);
    CREATE TABLE t3 (i3 INT, j3 INT);
    INSERT INTO t1 VALUES(1,1);
    INSERT INTO t2 VALUES(1,1);
    INSERT INTO t3 VALUES(1,1);
    SELECT * FROM t1, t2 JOIN t3 ON (t1.i1 = t3.i3);
    

    Previously, the was legal due to the implicit grouping of as . Now the takes precedence, so the operands for the clause are and . Because is not a column in either of the operands, the result is an error. To allow the join to be processed, group the first two tables explicitly with parentheses so that the operands for the clause are and :

    SELECT * FROM (t1, t2) JOIN t3 ON (t1.i1 = t3.i3);
    

    Alternatively, avoid the use of the comma operator and use instead:

    SELECT * FROM t1 JOIN t2 JOIN t3 ON (t1.i1 = t3.i3);
    

    This change also applies to statements that mix the comma operator with , , , and , all of which now have higher precedence than the comma operator.

  • Previously, the clause could refer to columns in tables named to its right. Now an clause can refer only to its operands.

    Example:

    CREATE TABLE t1 (i1 INT);
    CREATE TABLE t2 (i2 INT);
    CREATE TABLE t3 (i3 INT);
    SELECT * FROM t1 JOIN t2 ON (i1 = i3) JOIN t3;
    

    Previously, the statement was legal. Now the statement fails with an error because is a column in , which is not an operand of the clause. The statement should be rewritten as follows:

    SELECT * FROM t1 JOIN t2 JOIN t3 ON (i1 = i3);
    
  • Resolution of column names in or joins is different than previously. For column names that are outside the clause, MySQL now handles a superset of the queries compared to previously. That is, in cases when MySQL formerly issued an error that some column is ambiguous, the query now is handled correctly. This is due to the fact that MySQL now treats the common columns of or joins as a single column, so when a query refers to such columns, the query compiler does not consider them as ambiguous.

    Example:

    SELECT * FROM t1 NATURAL JOIN t2 WHERE b > 1;
    

    Previously, this query would produce an error . Now the query produces the correct result:

    +------+------+------+
    | b    | c    | y    |
    +------+------+------+
    |    4 |    2 |    3 |
    +------+------+------+
    

    One extension of MySQL compared to the SQL:2003 standard is that MySQL allows you to qualify the common (coalesced) columns of or joins (just as previously), while the standard disallows that.

13.2.7.2.  Syntax

SELECT ...
UNION [ALL | DISTINCT] SELECT ...
[UNION [ALL | DISTINCT] SELECT ...]

is used to combine the result from multiple statements into a single result set.

The column names from the first statement are used as the column names for the results returned. Selected columns listed in corresponding positions of each statement should have the same data type. (For example, the first column selected by the first statement should have the same type as the first column selected by the other statements.)

If the data types of corresponding columns do not match, the types and lengths of the columns in the result take into account the values retrieved by all of the statements. For example, consider the following:

mysql> 
+---------------+
| REPEAT('a',1) |
+---------------+
| a             |
| bbbbbbbbbb    |
+---------------+

(In some earlier versions of MySQL, only the type and length from the first would have been used and the second row would have been truncated to a length of 1.)

The statements are normal select statements, but with the following restrictions:

  • Only the last statement can use .

  • cannot be used with statements that are part of a . If you specify it for the first , it has no effect. If you specify it for any subsequent statements, a syntax error results.

The default behavior for is that duplicate rows are removed from the result. The optional keyword has no effect other than the default because it also specifies duplicate-row removal. With the optional keyword, duplicate-row removal does not occur and the result includes all matching rows from all the statements.

You can mix and in the same query. Mixed types are treated such that a union overrides any union to its left. A union can be produced explicitly by using or implicitly by using with no following or keyword.

To use an or clause to sort or limit the entire result, parenthesize the individual statements and place the or after the last one. The following example uses both clauses:

(SELECT a FROM t1 WHERE a=10 AND B=1)
UNION
(SELECT a FROM t2 WHERE a=11 AND B=2)
ORDER BY a LIMIT 10;

This kind of cannot use column references that include a table name (that is, names in . format). Instead, provide a column alias in the first statement and refer to the alias in the . (Alternatively, refer to the column in the using its column position. However, use of column positions is deprecated.)

Also, if a column to be sorted is aliased, the clause must refer to the alias, not the column name. The first of the following statements will work, but the second will fail with an error:

(SELECT a AS b FROM t) UNION (SELECT ...) ORDER BY b;
(SELECT a AS b FROM t) UNION (SELECT ...) ORDER BY a;

To apply or to an individual , place the clause inside the parentheses that enclose the :

(SELECT a FROM t1 WHERE a=10 AND B=1 ORDER BY a LIMIT 10)
UNION
(SELECT a FROM t2 WHERE a=11 AND B=2 ORDER BY a LIMIT 10);

Use of for individual statements implies nothing about the order in which the rows appear in the final result because by default produces an unordered set of rows. If appears with , it is used to determine the subset of the selected rows to retrieve for the , but does not necessarily affect the order of those rows in the final result. If appears without in a , it is optimized away because it will have no effect anyway.

To cause rows in a result to consist of the sets of rows retrieved by each one after the other, select an additional column in each to use as a sort column and add an following the last :

(SELECT 1 AS sort_col, col1a, col1b, ... FROM t1)
UNION
(SELECT 2, col2a, col2b, ... FROM t2) ORDER BY sort_col;

To additionally maintain sort order within individual results, add a secondary column to the clause:

(SELECT 1 AS sort_col, col1a, col1b, ... FROM t1)
UNION
(SELECT 2, col2a, col2b, ... FROM t2) ORDER BY sort_col, col1a;

13.2.8. Subquery Syntax

A subquery is a statement within another statement.

Starting with MySQL 4.1, all subquery forms and operations that the SQL standard requires are supported, as well as a few features that are MySQL-specific.

Here is an example of a subquery:

SELECT * FROM t1 WHERE column1 = (SELECT column1 FROM t2);

In this example, is the outer query (or outer statement), and is the subquery. We say that the subquery is nested within the outer query, and in fact it is possible to nest subqueries within other subqueries, to a considerable depth. A subquery must always appear within parentheses.

The main advantages of subqueries are:

  • They allow queries that are structured so that it is possible to isolate each part of a statement.

  • They provide alternative ways to perform operations that would otherwise require complex joins and unions.

  • They are, in many people's opinion, readable. Indeed, it was the innovation of subqueries that gave people the original idea of calling the early SQL “Structured Query Language.

Here is an example statement that shows the major points about subquery syntax as specified by the SQL standard and supported in MySQL:

DELETE FROM t1
WHERE s11 > ANY
(SELECT COUNT(*) /* no hint */ FROM t2
WHERE NOT EXISTS
(SELECT * FROM t3
WHERE ROW(5*t2.s1,77)=
(SELECT 50,11*s1 FROM t4 UNION SELECT 50,77 FROM
(SELECT * FROM t5) AS t5)));

A subquery can return a scalar (a single value), a single row, a single column, or a table (one or more rows of one or more columns). These are called scalar, column, row, and table subqueries. Subqueries that return a particular kind of result often can be used only in certain contexts, as described in the following sections.

There are few restrictions on the type of statements in which subqueries can be used. A subquery can contain any of the keywords or clauses that an ordinary can contain: , , , , joins, index hints, constructs, comments, functions, and so on.

One restriction is that a subquery's outer statement must be one of: , , , , , or . Another restriction is that currently you cannot modify a table and select from the same table in a subquery. This applies to statements such as , , , , and (because subqueries can be used in the clause) .

A more comprehensive discussion of restrictions on subquery use, including performance issues for certain forms of subquery syntax, is given in Section I.3, “Restrictions on Subqueries”.

13.2.8.1. The Subquery as Scalar Operand

In its simplest form, a subquery is a scalar subquery that returns a single value. A scalar subquery is a simple operand, and you can use it almost anywhere a single column value or literal is legal, and you can expect it to have those characteristics that all operands have: a data type, a length, an indication whether it can be , and so on. For example:

CREATE TABLE t1 (s1 INT, s2 CHAR(5) NOT NULL);
INSERT INTO t1 VALUES(100, 'abcde');
SELECT (SELECT s2 FROM t1);

The subquery in this returns a single value () that has a data type of , a length of 5, a character set and collation equal to the defaults in effect at time, and an indication that the value in the column can be . In fact, almost all subqueries can be . If the table used in the example were empty, the value of the subquery would be .

There are a few contexts in which a scalar subquery cannot be used. If a statement allows only a literal value, you cannot use a subquery. For example, requires literal integer arguments, and requires a literal string filename. You cannot use subqueries to supply these values.

When you see examples in the following sections that contain the rather spartan construct , imagine that your own code contains much more diverse and complex constructions.

Suppose that we make two tables:

CREATE TABLE t1 (s1 INT);
INSERT INTO t1 VALUES (1);
CREATE TABLE t2 (s1 INT);
INSERT INTO t2 VALUES (2);

Then perform a :

SELECT (SELECT s1 FROM t2) FROM t1;

The result is because there is a row in containing a column that has a value of .

A scalar subquery can be part of an expression, but remember the parentheses, even if the subquery is an operand that provides an argument for a function. For example:

SELECT UPPER((SELECT s1 FROM t1)) FROM t2;

13.2.8.2. Comparisons Using Subqueries

The most common use of a subquery is in the form:

  ()

Where is one of these operators:

=  >  <  >=  <=  <>

For example:

  ... 'a' = (SELECT column1 FROM t1)

At one time the only legal place for a subquery was on the right side of a comparison, and you might still find some old DBMSs that insist on this.

Here is an example of a common-form subquery comparison that you cannot do with a join. It finds all the values in table that are equal to a maximum value in table :

SELECT column1 FROM t1
WHERE column1 = (SELECT MAX(column2) FROM t2);

Here is another example, which again is impossible with a join because it involves aggregating for one of the tables. It finds all rows in table containing a value that occurs twice in a given column:

SELECT * FROM t1 AS t
WHERE 2 = (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id);

For a comparison performed with one of these operators, the subquery must return a scalar, with the exception that can be used with row subqueries. See Section 13.2.8.5, “Row Subqueries”.

13.2.8.3. Subqueries with , , and

Syntax:

  ANY ()
 IN ()
  SOME ()

The keyword, which must follow a comparison operator, means “return if the comparison is for of the values in the column that the subquery returns.” For example:

SELECT s1 FROM t1 WHERE s1 > ANY (SELECT s1 FROM t2);

Suppose that there is a row in table containing . The expression is if table contains because there is a value in that is less than . The expression is if table contains , or if table is empty. The expression is if table contains .

The word is an alias for . Thus, these two statements are the same:

SELECT s1 FROM t1 WHERE s1 = ANY (SELECT s1 FROM t2);
SELECT s1 FROM t1 WHERE s1 IN    (SELECT s1 FROM t2);

However, is not an alias for , but for . See Section 13.2.8.4, “Subqueries with .

The word is an alias for . Thus, these two statements are the same:

SELECT s1 FROM t1 WHERE s1 <> ANY  (SELECT s1 FROM t2);
SELECT s1 FROM t1 WHERE s1 <> SOME (SELECT s1 FROM t2);

Use of the word is rare, but this example shows why it might be useful. To most people's ears, the English phrase “a is not equal to any b” means “there is no b which is equal to a,” but that is not what is meant by the SQL syntax. The syntax means “there is some b to which a is not equal.” Using instead helps ensure that everyone understands the true meaning of the query.

13.2.8.4. Subqueries with

Syntax:

  ALL ()

The word , which must follow a comparison operator, means “return if the comparison is for of the values in the column that the subquery returns.” For example:

SELECT s1 FROM t1 WHERE s1 > ALL (SELECT s1 FROM t2);

Suppose that there is a row in table containing . The expression is if table contains because is greater than all three values in . The expression is if table contains because there is a single value in table that is greater than . The expression is unknown (that is, ) if table contains .

Finally, if table is empty, the result is . So, the following statement is when table is empty:

SELECT * FROM t1 WHERE 1 > ALL (SELECT s1 FROM t2);

But this statement is when table is empty:

SELECT * FROM t1 WHERE 1 > (SELECT s1 FROM t2);

In addition, the following statement is when table is empty:

SELECT * FROM t1 WHERE 1 > ALL (SELECT MAX(s1) FROM t2);

In general, tables containing values and empty tables are “edge cases.” When writing subquery code, always consider whether you have taken those two possibilities into account.

is an alias for . Thus, these two statements are the same:

SELECT s1 FROM t1 WHERE s1 <> ALL (SELECT s1 FROM t2);
SELECT s1 FROM t1 WHERE s1 NOT IN (SELECT s1 FROM t2);

13.2.8.5. Row Subqueries

The discussion to this point has been of scalar or column subqueries; that is, subqueries that return a single value or a column of values. A row subquery is a subquery variant that returns a single row and can thus return more than one column value. Here are two examples:

SELECT * FROM t1 WHERE (1,2) = (SELECT column1, column2 FROM t2);
SELECT * FROM t1 WHERE ROW(1,2) = (SELECT column1, column2 FROM t2);

The queries here are both if table has a row where and .

The expressions and are sometimes called row constructors. The two are equivalent. They are legal in other contexts as well. For example, the following two statements are semantically equivalent (although currently only the second one can be optimized):

  SELECT * FROM t1 WHERE (column1,column2) = (1,1);
  SELECT * FROM t1 WHERE column1 = 1 AND column2 = 1;

The normal use of row constructors is for comparisons with subqueries that return two or more columns. For example, the following query answers the request, “find all rows in table that also exist in table ”:

SELECT column1,column2,column3
FROM t1
WHERE (column1,column2,column3) IN
(SELECT column1,column2,column3 FROM t2);

13.2.8.6.  and

If a subquery returns any rows at all, is , and is . For example:

SELECT column1 FROM t1 WHERE EXISTS (SELECT * FROM t2);

Traditionally, an subquery starts with , but it could begin with or or anything at all. MySQL ignores the list in such a subquery, so it makes no difference.

For the preceding example, if contains any rows, even rows with nothing but values, the condition is . This is actually an unlikely example because a subquery almost always contains correlations. Here are some more realistic examples:

  • What kind of store is present in one or more cities?

    SELECT DISTINCT store_type FROM stores
      WHERE EXISTS (SELECT * FROM cities_stores
                    WHERE cities_stores.store_type = stores.store_type);
    
  • What kind of store is present in no cities?

    SELECT DISTINCT store_type FROM stores
      WHERE NOT EXISTS (SELECT * FROM cities_stores
                        WHERE cities_stores.store_type = stores.store_type);
    
  • What kind of store is present in all cities?

    SELECT DISTINCT store_type FROM stores s1
      WHERE NOT EXISTS (
        SELECT * FROM cities WHERE NOT EXISTS (
          SELECT * FROM cities_stores
           WHERE cities_stores.city = cities.city
           AND cities_stores.store_type = stores.store_type));
    

The last example is a double-nested query. That is, it has a clause within a clause. Formally, it answers the question “does a city exist with a store that is not in ”? But it is easier to say that a nested answers the question “is for all ?

13.2.8.7. Correlated Subqueries

A correlated subquery is a subquery that contains a reference to a table that also appears in the outer query. For example:

SELECT * FROM t1 WHERE column1 = ANY
(SELECT column1 FROM t2 WHERE t2.column2 = t1.column2);

Notice that the subquery contains a reference to a column of , even though the subquery's clause does not mention a table . So, MySQL looks outside the subquery, and finds in the outer query.

Suppose that table contains a row where and ; meanwhile, table contains a row where and . The simple expression would be , but in this example, the clause within the subquery is (because is not equal to ), so the subquery as a whole is .

Scoping rule: MySQL evaluates from inside to outside. For example:

SELECT column1 FROM t1 AS x
WHERE x.column1 = (SELECT column1 FROM t2 AS x
WHERE x.column1 = (SELECT column1 FROM t3
WHERE x.column2 = t3.column1));

In this statement, must be a column in table because renames . It is not a column in table because is an outer query that is farther out.

For subqueries in or clauses, MySQL also looks for column names in the outer select list.

For certain cases, a correlated subquery is optimized. For example:

 IN (SELECT  FROM  WHERE )

Otherwise, they are inefficient and likely to be slow. Rewriting the query as a join might improve performance.

Correlated subqueries cannot refer to the results of aggregate functions from the outer query.

13.2.8.8. Subqueries in the clause

Subqueries are legal in a statement's clause. The actual syntax is:

SELECT ... FROM () [AS]  ...

The clause is mandatory, because every table in a clause must have a name. Any columns in the select list must have unique names. You can find this syntax described elsewhere in this manual, where the term used is “derived tables.

For the sake of illustration, assume that you have this table:

CREATE TABLE t1 (s1 INT, s2 CHAR(5), s3 FLOAT);

Here is how to use a subquery in the clause, using the example table:

INSERT INTO t1 VALUES (1,'1',1.0);
INSERT INTO t1 VALUES (2,'2',2.0);
SELECT sb1,sb2,sb3
FROM (SELECT s1 AS sb1, s2 AS sb2, s3*2 AS sb3 FROM t1) AS sb
WHERE sb1 > 1;

Result: .

Here is another example: Suppose that you want to know the average of a set of sums for a grouped table. This does not work:

SELECT AVG(SUM(column1)) FROM t1 GROUP BY column1;

However, this query provides the desired information:

SELECT AVG(sum_column1)
FROM (SELECT SUM(column1) AS sum_column1
FROM t1 GROUP BY column1) AS t1;

Notice that the column name used within the subquery () is recognized in the outer query.

Subqueries in the clause can return a scalar, column, row, or table. Subqueries in the clause cannot be correlated subqueries.

Subqueries in the clause are executed even for the statement (that is, derived temporary tables are built). This occurs because upper level queries need information about all tables during optimization phase.

13.2.8.9. Subquery Errors

There are some errors that apply only to subqueries. This section describes them.

  • Unsupported subquery syntax:

    ERROR 1235 (ER_NOT_SUPPORTED_YET)
    SQLSTATE = 42000
    Message = "This version of MySQL does not yet support
    'LIMIT & IN/ALL/ANY/SOME subquery'"
    

    This means that statements of the following form do not work yet:

    SELECT * FROM t1 WHERE s1 IN (SELECT s2 FROM t2 ORDER BY s1 LIMIT 1)
    
  • Incorrect number of columns from subquery:

    ERROR 1241 (ER_OPERAND_COL)
    SQLSTATE = 21000
    Message = "Operand should contain 1 column(s)"
    

    This error occurs in cases like this:

    SELECT (SELECT column1, column2 FROM t2) FROM t1;
    

    You may use a subquery that returns multiple columns, if the purpose is comparison. See Section 13.2.8.5, “Row Subqueries”. However, in other contexts, the subquery must be a scalar operand.

  • Incorrect number of rows from subquery:

    ERROR 1242 (ER_SUBSELECT_NO_1_ROW)
    SQLSTATE = 21000
    Message = "Subquery returns more than 1 row"
    

    This error occurs for statements where the subquery returns more than one row. Consider the following example:

    SELECT * FROM t1 WHERE column1 = (SELECT column1 FROM t2);
    

    If returns just one row, the previous query will work. If the subquery returns more than one row, error 1242 will occur. In that case, the query should be rewritten as:

    SELECT * FROM t1 WHERE column1 = ANY (SELECT column1 FROM t2);
    
  • Incorrectly used table in subquery:

    Error 1093 (ER_UPDATE_TABLE_USED)
    SQLSTATE = HY000
    Message = "You can't specify target table 'x'
    for update in FROM clause"
    

    This error occurs in cases such as the following:

    UPDATE t1 SET column2 = (SELECT MAX(column1) FROM t1);
    

    You can use a subquery for assignment within an statement because subqueries are legal in and statements as well as in statements. However, you cannot use the same table (in this case, table ) for both the subquery's clause and the update target.

For transactional storage engines, the failure of a subquery causes the entire statement to fail. For non-transactional storage engines, data modifications made before the error was encountered are preserved.

13.2.8.10. Optimizing Subqueries

Development is ongoing, so no optimization tip is reliable for the long term. The following list provides some interesting tricks that you might want to play with:

  • Use subquery clauses that affect the number or order of the rows in the subquery. For example:

    SELECT * FROM t1 WHERE t1.column1 IN
    (SELECT column1 FROM t2 ORDER BY column1);
    SELECT * FROM t1 WHERE t1.column1 IN
    (SELECT DISTINCT column1 FROM t2);
    SELECT * FROM t1 WHERE EXISTS
    (SELECT * FROM t2 LIMIT 1);
    
  • Replace a join with a subquery. For example, try this:

    SELECT DISTINCT column1 FROM t1 WHERE t1.column1 IN (
    SELECT column1 FROM t2);
    

    Instead of this:

    SELECT DISTINCT t1.column1 FROM t1, t2
    WHERE t1.column1 = t2.column1;
    
  • Some subqueries can be transformed to joins for compatibility with older versions of MySQL that do not support subqueries. However, in some cases, converting a subquery to a join may improve performance. See Section 13.2.8.11, “Rewriting Subqueries as Joins for Earlier MySQL Versions”.

  • Move clauses from outside to inside the subquery. For example, use this query:

    SELECT * FROM t1
    WHERE s1 IN (SELECT s1 FROM t1 UNION ALL SELECT s1 FROM t2);
    

    Instead of this query:

    SELECT * FROM t1
    WHERE s1 IN (SELECT s1 FROM t1) OR s1 IN (SELECT s1 FROM t2);
    

    For another example, use this query:

    SELECT (SELECT column1 + 5 FROM t1) FROM t2;
    

    Instead of this query:

    SELECT (SELECT column1 FROM t1) + 5 FROM t2;
    
  • Use a row subquery instead of a correlated subquery. For example, use this query:

    SELECT * FROM t1
    WHERE (column1,column2) IN (SELECT column1,column2 FROM t2);
    

    Instead of this query:

    SELECT * FROM t1
    WHERE EXISTS (SELECT * FROM t2 WHERE t2.column1=t1.column1
    AND t2.column2=t1.column2);
    
  • Use rather than .

  • Use ) rather than .

  • Use rather than .

  • For uncorrelated subqueries that always return one row, is always slower than . For example, use this query:

    SELECT * FROM t1 WHERE t1.
    = (SELECT a FROM t2 WHERE b = );
    

    Instead of this query:

    SELECT * FROM t1 WHERE t1.
    IN (SELECT a FROM t2 WHERE b = );
    

These tricks might cause programs to go faster or slower. Using MySQL facilities like the function, you can get an idea about what helps in your own situation. See Section 12.9.3, “Information Functions”.

Some optimizations that MySQL itself makes are:

  • MySQL executes non-correlated subqueries only once. Use to make sure that a given subquery really is non-correlated.

  • MySQL rewrites , , , and subqueries in an attempt to take advantage of the possibility that the select-list columns in the subquery are indexed.

  • MySQL replaces subqueries of the following form with an index-lookup function, which describes as a special join type ( or ):

    ... IN (SELECT  FROM  ...)
    
  • MySQL enhances expressions of the following form with an expression involving or , unless values or empty sets are involved:

     {ALL|ANY|SOME} {> | < | >= | <=} ()
    

    For example, this clause:

    WHERE 5 > ALL (SELECT x FROM t)
    

    might be treated by the optimizer like this:

    WHERE 5 > (SELECT MAX(x) FROM t)
    

There is a chapter titled “How MySQL Transforms Subqueries” in the MySQL Internals Manual, available at http://dev.mysql.com/doc/.

13.2.8.11. Rewriting Subqueries as Joins for Earlier MySQL Versions

In previous versions of MySQL (prior to MySQL 4.1), only nested queries of the form and were supported. Although this is not the case in MySQL 5.0, it is still true that there are sometimes other ways to test membership in a set of values. It is also true that on some occasions, it is not only possible to rewrite a query without a subquery, but it can be more efficient to make use of some of these techniques rather than to use subqueries. One of these is the construct:

For example, this query:

SELECT * FROM t1 WHERE id IN (SELECT id FROM t2);

Can be rewritten as:

SELECT DISTINCT t1.* FROM t1, t2 WHERE t1.id=t2.id;

The queries:

SELECT * FROM t1 WHERE id NOT IN (SELECT id FROM t2);
SELECT * FROM t1 WHERE NOT EXISTS (SELECT id FROM t2 WHERE t1.id=t2.id);

Can be be rewritten using :

SELECT table1.* FROM table1 LEFT JOIN table2 ON table1.id=table2.id
WHERE table2.id IS NULL;

A can be faster than an equivalent subquery because the server might be able to optimize it better — a fact that is not specific to MySQL Server alone. Prior to SQL-92, outer joins did not exist, so subqueries were the only way to do certain things. Today, MySQL Server and many other modern database systems offer a wide range of outer join types.

MySQL Server supports multiple-table statements that can be used to efficiently delete rows based on information from one table or even from many tables at the same time. Multiple-table statements are also supported.

13.2.9. TRUNCATE Syntax

TRUNCATE [TABLE] 

empties a table completely. Logically, this is equivalent to a statement that deletes all rows, but there are practical differences under some circumstances.

For before version 5.0.3, is mapped to , so there is no difference. Starting with MySQL 5.0.3, fast is available. However, the operation is still mapped to if there are foreign key constraints that reference the table. (When fast truncate is used, it resets any counter. From MySQL 5.0.13 on, the counter is reset by , regardless of whether there is a foreign key constraint.)

For other storage engines, differs from in the following ways in MySQL 5.0:

  • Truncate operations drop and re-create the table, which is much faster than deleting rows one by one.

  • Truncate operations are not transaction-safe; an error occurs when attempting one in the course of an active transaction or active table lock.

  • The number of deleted rows is not returned.

  • As long as the table format file .frm is valid, the table can be re-created as an empty table with , even if the data or index files have become corrupted.

  • The table handler does not remember the last used value, but starts counting from the beginning. This is true even for and , which normally do not reuse sequence values.

Since truncation of a table does not make any use of , the statement does not invoke triggers.

is an Oracle SQL extension adopted in MySQL.

13.2.10. UPDATE Syntax

Single-table syntax:

UPDATE [LOW_PRIORITY] [IGNORE] 
    SET = [, = ...]
    [WHERE ]
    [ORDER BY ...]
    [LIMIT ]

Multiple-table syntax:

UPDATE [LOW_PRIORITY] [IGNORE] 
    SET = [, = ...]
    [WHERE ]

For the single-table syntax, the statement updates columns of existing rows in with new values. The clause indicates which columns to modify and the values they should be given. The clause, if given, specifies the conditions that identify which rows to update. With no clause, all rows are updated. If the clause is specified, the rows are updated in the order that is specified. The clause places a limit on the number of rows that can be updated.

For the multiple-table syntax, updates rows in each table named in that satisfy the conditions. In this case, and cannot be used.

is an expression that evaluates to true for each row to be updated. It is specified as described in Section 13.2.7, “ Syntax”.

The statement supports the following modifiers:

  • If you use the keyword, execution of the is delayed until no other clients are reading from the table.

  • If you use the keyword, the update statement does not abort even if errors occur during the update. Rows for which duplicate-key conflicts occur are not updated. Rows for which columns are updated to values that would cause data conversion errors are updated to the closet valid values instead.

If you access a column from in an expression, uses the current value of the column. For example, the following statement sets the column to one more than its current value:

UPDATE persondata SET age=age+1;

Single-table assignments are generally evaluated from left to right. For multiple-table updates, there is no guarantee that assignments are carried out in any particular order.

If you set a column to the value it currently has, MySQL notices this and does not update it.

If you update a column that has been declared by setting to , the column is set to the default value appropriate for the data type and the warning count is incremented. The default value is for numeric types, the empty string () for string types, and the “zero” value for date and time types.

returns the number of rows that were actually changed. The C API function returns the number of rows that were matched and updated and the number of warnings that occurred during the .

You can use to restrict the scope of the . A clause is a rows-matched restriction. The statement stops as soon as it has found rows that satisfy the clause, whether or not they actually were changed.

If an statement includes an clause, the rows are updated in the order specified by the clause. This can be useful in certain situations that might otherwise result in an error. Suppose that a table contains a column that has a unique index. The following statement could fail with a duplicate-key error, depending on the order in which rows are updated:

UPDATE t SET id = id + 1;

For example, if the table contains 1 and 2 in the column and 1 is updated to 2 before 2 is updated to 3, an error occurs. To avoid this problem, add an clause to cause the rows with larger values to be updated before those with smaller values:

UPDATE t SET id = id + 1 ORDER BY id DESC;

You can also perform operations covering multiple tables. However, you cannot use or with a multiple-table . The clause lists the tables involved in the join. Its syntax is described in Section 13.2.7.1, “ Syntax”. Here is an example:

UPDATE items,month SET items.price=month.price
WHERE items.id=month.id;

The preceding example shows an inner join that uses the comma operator, but multiple-table statements can use any type of join allowed in statements, such as .

You need the privilege only for columns referenced in a multiple-table that are actually updated. You need only the privilege for any columns that are read but not modified.

If you use a multiple-table statement involving tables for which there are foreign key constraints, the MySQL optimizer might process tables in an order that differs from that of their parent/child relationship. In this case, the statement fails and rolls back. Instead, update a single table and rely on the capabilities that provides to cause the other tables to be modified accordingly. See Section 14.2.6.4, “ Constraints”.

Currently, you cannot update a table and select from the same table in a subquery.