Single-table syntax:
DELETE [LOW_PRIORITY] [QUICK] [IGNORE] FROMtbl_name[WHEREwhere_condition] [ORDER BY ...] [LIMITrow_count]
Multiple-table syntax:
DELETE [LOW_PRIORITY] [QUICK] [IGNORE]
tbl_name[.*] [, tbl_name[.*]] ...
FROM table_references
[WHERE where_condition]
Or:
DELETE [LOW_PRIORITY] [QUICK] [IGNORE]
FROM tbl_name[.*] [, tbl_name[.*]] ...
USING table_references
[WHERE where_condition]
For the single-table syntax, the DELETE
statement deletes rows from tbl_name
and returns the number of rows deleted. The
WHERE clause, if given, specifies the
conditions that identify which rows to delete. With no
WHERE clause, all rows are deleted. If the
ORDER BY clause is specified, the rows are
deleted in the order that is specified. The
LIMIT clause places a limit on the number of
rows that can be deleted.
For the multiple-table syntax, DELETE deletes
from each tbl_name the rows that
satisfy the conditions. In this case, ORDER
BY and LIMIT cannot be used.
where_condition is an expression that
evaluates to true for each row to be deleted. It is specified as
described in Section 13.2.7, “SELECT Syntax”.
As stated, a DELETE statement with no
WHERE clause deletes all rows. A faster way
to do this, when you do not want to know the number of deleted
rows, is to use TRUNCATE TABLE. See
Section 13.2.9, “TRUNCATE Syntax”.
If you delete the row containing the maximum value for an
AUTO_INCREMENT column, the value is reused
later for a BDB table, but not for a
MyISAM or InnoDB table. If
you delete all rows in the table with DELETE FROM
tbl_name (without a
WHERE clause) in
AUTOCOMMIT mode, the sequence starts over for
all storage engines except InnoDB and
MyISAM. There are some exceptions to this
behavior for InnoDB tables, as discussed in
Section 14.2.6.3, “How AUTO_INCREMENT Columns Work in InnoDB”.
For MyISAM and BDB tables,
you can specify an AUTO_INCREMENT secondary
column in a multiple-column key. In this case, reuse of values
deleted from the top of the sequence occurs even for
MyISAM tables. See
Section 3.6.9, “Using AUTO_INCREMENT”.
The DELETE statement supports the following
modifiers:
-
If you specify
LOW_PRIORITY, the server delays execution of theDELETEuntil no other clients are reading from the table. -
For
MyISAMtables, if you use theQUICKkeyword, the storage engine does not merge index leaves during delete, which may speed up some kinds of delete operations. -
The
IGNOREkeyword causes MySQL to ignore all errors during the process of deleting rows. (Errors encountered during the parsing stage are processed in the usual manner.) Errors that are ignored due to the use ofOPTIONare returned as warnings.
The speed of delete operations may also be affected by factors
discussed in Section 7.2.18, “Speed of DELETE Statements”.
In MyISAM tables, deleted rows are maintained
in a linked list and subsequent INSERT
operations reuse old row positions. To reclaim unused space and
reduce file sizes, use the OPTIMIZE TABLE
statement or the myisamchk utility to
reorganize tables. OPTIMIZE TABLE is easier,
but myisamchk is faster. See
Section 13.5.2.5, “OPTIMIZE TABLE Syntax”, and
Section 8.3, “myisamchk — MyISAM Table-Maintenance Utility”.
The QUICK modifier affects whether index
leaves are merged for delete operations. DELETE
QUICK is most useful for applications where index
values for deleted rows are replaced by similar index values
from rows inserted later. In this case, the holes left by
deleted values are reused.
DELETE QUICK is not useful when deleted
values lead to undef-filled index blocks spanning a range of
index values for which new inserts occur again. In this case,
use of QUICK can lead to wasted space in the
index that remains unreclaimed. Here is an example of such a
scenario:
-
Create a table that contains an indexed
AUTO_INCREMENTcolumn. -
Insert many rows into the table. Each insert results in an index value that is added to the high end of the index.
-
Delete a block of rows at the low end of the column range using
DELETE QUICK.
In this scenario, the index blocks associated with the deleted
index values become undef-filled but are not merged with other
index blocks due to the use of QUICK. They
remain undef-filled when new inserts occur, because new rows do
not have index values in the deleted range. Furthermore, they
remain undef-filled even if you later use
DELETE without QUICK,
unless some of the deleted index values happen to lie in index
blocks within or adjacent to the undef-filled blocks. To reclaim
unused index space under these circumstances, use
OPTIMIZE TABLE.
If you are going to delete many rows from a table, it might be
faster to use DELETE QUICK followed by
OPTIMIZE TABLE. This rebuilds the index
rather than performing many index block merge operations.
The MySQL-specific LIMIT
row_count option to
DELETE tells the server the maximum number of
rows to be deleted before control is returned to the client.
This can be used to ensure that a given
DELETE statement does not take too much time.
You can simply repeat the DELETE statement
until the number of affected rows is less than the
LIMIT value.
If the DELETE statement includes an
ORDER BY clause, the rows are deleted in the
order specified by the clause. This is really useful only in
conjunction with LIMIT. For example, the
following statement finds rows matching the
WHERE clause, sorts them by
timestamp_column, and deletes the first
(oldest) one:
DELETE FROM somelog WHERE user = 'jcole' ORDER BY timestamp_column LIMIT 1;
You can specify multiple tables in a DELETE
statement to delete rows from one or more tables depending on
the particular condition in the WHERE clause.
However, you cannot use ORDER BY or
LIMIT in a multiple-table
DELETE. The
table_references clause lists the
tables involved in the join. Its syntax is described in
Section 13.2.7.1, “JOIN Syntax”.
For the first multiple-table syntax, only matching rows from the
tables listed before the FROM clause are
deleted. For the second multiple-table syntax, only matching
rows from the tables listed in the FROM
clause (before the USING clause) are deleted.
The effect is that you can delete rows from many tables at the
same time and have additional tables that are used only for
searching:
DELETE t1, t2 FROM t1, t2, t3 WHERE t1.id=t2.id AND t2.id=t3.id;
Or:
DELETE FROM t1, t2 USING t1, t2, t3 WHERE t1.id=t2.id AND t2.id=t3.id;
These statements use all three tables when searching for rows to
delete, but delete matching rows only from tables
t1 and t2.
The preceding examples show inner joins that use the comma
operator, but multiple-table DELETE
statements can use any type of join allowed in
SELECT statements, such as LEFT
JOIN.
The syntax allows .* after the table names
for compatibility with Access.
If you use a multiple-table DELETE statement
involving InnoDB tables for which there are
foreign key constraints, the MySQL optimizer might process
tables in an order that differs from that of their parent/child
relationship. In this case, the statement fails and rolls back.
Instead, you should delete from a single table and rely on the
ON DELETE capabilities that
InnoDB provides to cause the other tables to
be modified accordingly.
Note: If you provide an alias for a table, you must use the alias when referring to the table:
DELETE t1 FROM test AS t1, test2 WHERE ...
Cross-database deletes are supported for multiple-table deletes, but in this case, you must refer to the tables without using aliases. For example:
DELETE test1.tmp1, test2.tmp2 FROM test1.tmp1, test2.tmp2 WHERE ...
Currently, you cannot delete from a table and select from the same table in a subquery.
DOexpr[,expr] ...
DO executes the expressions but does not
return any results. In most respects, DO is
shorthand for SELECT expr,
..., but has the advantage that it is slightly faster
when you do not care about the result.
DO is useful primarily with functions that
have side effects, such as RELEASE_LOCK().
HANDLERtbl_nameOPEN [ ASalias] HANDLERtbl_nameREADindex_name{ = | >= | <= | < } (value1,value2,...) [ WHEREwhere_condition] [LIMIT ... ] HANDLERtbl_nameREADindex_name{ FIRST | NEXT | PREV | LAST } [ WHEREwhere_condition] [LIMIT ... ] HANDLERtbl_nameREAD { FIRST | NEXT } [ WHEREwhere_condition] [LIMIT ... ] HANDLERtbl_nameCLOSE
The HANDLER statement provides direct access
to table storage engine interfaces. It is available for
MyISAM and InnoDB tables.
The HANDLER ... OPEN statement opens a table,
making it accessible via subsequent HANDLER ...
READ statements. This table object is not shared by
other threads and is not closed until the thread calls
HANDLER ... CLOSE or the thread terminates.
If you open the table using an alias, further references to the
open table with other HANDLER statements must
use the alias rather than the table name.
The first HANDLER ... READ syntax fetches a
row where the index specified satisfies the given values and the
WHERE condition is met. If you have a
multiple-column index, specify the index column values as a
comma-separated list. Either specify values for all the columns
in the index, or specify values for a leftmost prefix of the
index columns. Suppose that an index my_idx
includes three columns named col_a,
col_b, and col_c, in that
order. The HANDLER statement can specify
values for all three columns in the index, or for the columns in
a leftmost prefix. For example:
HANDLER ... READ my_idx = (col_a_val,col_b_val,col_c_val) ... HANDLER ... READ my_idx = (col_a_val,col_b_val) ... HANDLER ... READ my_idx = (col_a_val) ...
To employ the HANDLER interface to refer to a
table's PRIMARY KEY, use the quoted
identifier `PRIMARY`:
HANDLER tbl_name READ `PRIMARY` ...
The second HANDLER ... READ syntax fetches a
row from the table in index order that matches the
WHERE condition.
The third HANDLER ... READ syntax fetches a
row from the table in natural row order that matches the
WHERE condition. It is faster than
HANDLER tbl_name READ
index_name when a full
table scan is desired. Natural row order is the order in which
rows are stored in a MyISAM table data file.
This statement works for InnoDB tables as
well, but there is no such concept because there is no separate
data file.
Without a LIMIT clause, all forms of
HANDLER ... READ fetch a single row if one is
available. To return a specific number of rows, include a
LIMIT clause. It has the same syntax as for
the SELECT statement. See
Section 13.2.7, “SELECT Syntax”.
HANDLER ... CLOSE closes a table that was
opened with HANDLER ... OPEN.
HANDLER is a somewhat low-level statement.
For example, it does not provide consistency. That is,
HANDLER ... OPEN does
not take a snapshot of the table, and does
not lock the table. This means that after a
HANDLER ... OPEN statement is issued, table
data can be modified (by the current thread or other threads)
and these modifications might be only partially visible to
HANDLER ... NEXT or HANDLER ...
PREV scans.
There are several reasons to use the HANDLER
interface instead of normal SELECT
statements:
-
HANDLERis faster thanSELECT:-
A designated storage engine handler object is allocated for the
HANDLER ... OPEN. The object is reused for subsequentHANDLERstatements for that table; it need not be reinitialized for each one. -
There is less parsing involved.
-
There is no optimizer or query-checking overhead.
-
The table does not have to be locked between two handler requests.
-
The handler interface does not have to provide a consistent look of the data (for example, dirty reads are allowed), so the storage engine can use optimizations that
SELECTdoes not normally allow.
-
-
For applications that use a low-level
ISAM-like interface,HANDLERmakes it much easier to port them to MySQL. -
HANDLERenables you to traverse a database in a manner that is difficult (or even impossible) to accomplish withSELECT. TheHANDLERinterface is a more natural way to look at data when working with applications that provide an interactive user interface to the database.
INSERT [LOW_PRIORITY | DELAYED | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name [(col_name,...)]
VALUES ({expr | DEFAULT},...),(...),...
[ ON DUPLICATE KEY UPDATE col_name=expr, ... ]
Or:
INSERT [LOW_PRIORITY | DELAYED | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name
SET col_name={expr | DEFAULT}, ...
[ ON DUPLICATE KEY UPDATE col_name=expr, ... ]
Or:
INSERT [LOW_PRIORITY | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name [(col_name,...)]
SELECT ...
[ ON DUPLICATE KEY UPDATE col_name=expr, ... ]
INSERT inserts new rows into an existing
table. The INSERT ... VALUES and
INSERT ... SET forms of the statement insert
rows based on explicitly specified values. The INSERT
... SELECT form inserts rows selected from another
table or tables. INSERT ... SELECT is
discussed further in Section 13.2.4.1, “INSERT ... SELECT Syntax”.
You can use REPLACE instead of
INSERT to overwrite old rows.
REPLACE is the counterpart to INSERT
IGNORE in the treatment of new rows that contain
unique key values that duplicate old rows: The new rows are used
to replace the old rows rather than being discarded. See
Section 13.2.6, “REPLACE Syntax”.
tbl_name is the table into which rows
should be inserted. The columns for which the statement provides
values can be specified as follows:
-
You can provide a comma-separated list of column names following the table name. In this case, a value for each named column must be provided by the
VALUESlist or theSELECTstatement. -
If you do not specify a list of column names for
INSERT ... VALUESorINSERT ... SELECT, values for every column in the table must be provided by theVALUESlist or theSELECTstatement. If you do not know the order of the columns in the table, useDESCRIBEtbl_nameto find out. -
The
SETclause indicates the column names explicitly.
Column values can be given in several ways:
-
If you are not running in strict SQL mode, any column not explicitly given a value is set to its default (explicit or implicit) value. For example, if you specify a column list that does not name all the columns in the table, unnamed columns are set to their default values. Default value assignment is described in Section 11.1.4, “Data Type Default Values”. See also Section 1.9.6.2, “Constraints on Invalid Data”.
If you want an
INSERTstatement to generate an error unless you explicitly specify values for all columns that do not have a default value, you should use strict mode. See Section 5.2.5, “The Server SQL Mode”. -
Use the keyword
DEFAULTto set a column explicitly to its default value. This makes it easier to writeINSERTstatements that assign values to all but a few columns, because it enables you to avoid writing an incompleteVALUESlist that does not include a value for each column in the table. Otherwise, you would have to write out the list of column names corresponding to each value in theVALUESlist.You can also use
DEFAULT(col_name) as a more general form that can be used in expressions to produce a given column's default value. -
If both the column list and the
VALUESlist are empty,INSERTcreates a row with each column set to its default value:INSERT INTO
tbl_name() VALUES();In strict mode, an error occurs if any column doesn't have a default value. Otherwise, MySQL uses the implicit default value for any column that does not have an explicitly defined default.
-
You can specify an expression
exprto provide a column value. This might involve type conversion if the type of the expression does not match the type of the column, and conversion of a given value can result in different inserted values depending on the data type. For example, inserting the string'1999.0e-2'into anINT,FLOAT,DECIMAL(10,6), orYEARcolumn results in the values1999,19.9921,19.992100, and1999being inserted, respectively. The reason the value stored in theINTandYEARcolumns is1999is that the string-to-integer conversion looks only at as much of the initial part of the string as may be considered a valid integer or year. For the floating-point and fixed-point columns, the string-to-floating-point conversion considers the entire string a valid floating-point value.An expression
exprcan refer to any column that was set earlier in a value list. For example, you can do this because the value forcol2refers tocol1, which has previously been assigned:INSERT INTO
tbl_name(col1,col2) VALUES(15,col1*2);But the following is not legal, because the value for
col1refers tocol2, which is assigned aftercol1:INSERT INTO
tbl_name(col1,col2) VALUES(col2*2,15);One exception involves columns that contain
AUTO_INCREMENTvalues. Because theAUTO_INCREMENTvalue is generated after other value assignments, any reference to anAUTO_INCREMENTcolumn in the assignment returns a0.
INSERT statements that use
VALUES syntax can insert multiple rows. To do
this, include multiple lists of column values, each enclosed
within parentheses and separated by commas. Example:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
The values list for each row must be enclosed within parentheses. The following statement is illegal because the number of values in the list does not match the number of column names:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3,4,5,6,7,8,9);
The rows-affected value for an INSERT can be
obtained using the mysql_affected_rows() C
API function. See Section 22.2.3.1, “mysql_affected_rows()”.
If you use an INSERT ... VALUES statement
with multiple value lists or INSERT ...
SELECT, the statement returns an information string in
this format:
Records: 100 Duplicates: 0 Warnings: 0
Records indicates the number of rows
processed by the statement. (This is not necessarily the number
of rows actually inserted because Duplicates
can be non-zero.) Duplicates indicates the
number of rows that could not be inserted because they would
duplicate some existing unique index value.
Warnings indicates the number of attempts to
insert column values that were problematic in some way. Warnings
can occur under any of the following conditions:
-
Inserting
NULLinto a column that has been declaredNOT NULL. For multiple-rowINSERTstatements orINSERT INTO ... SELECTstatements, the column is set to the implicit default value for the column data type. This is0for numeric types, the empty string ('') for string types, and the “zero” value for date and time types.INSERT INTO ... SELECTstatements are handled the same way as multiple-row inserts because the server does not examine the result set from theSELECTto see whether it returns a single row. (For a single-rowINSERT, no warning occurs whenNULLis inserted into aNOT NULLcolumn. Instead, the statement fails with an error.) -
Setting a numeric column to a value that lies outside the column's range. The value is clipped to the closest endpoint of the range.
-
Assigning a value such as
'10.34 a'to a numeric column. The trailing non-numeric text is stripped off and the remaining numeric part is inserted. If the string value has no leading numeric part, the column is set to0. -
Inserting a string into a string column (
CHAR,VARCHAR,TEXT, orBLOB) that exceeds the column's maximum length. The value is truncated to the column's maximum length. -
Inserting a value into a date or time column that is illegal for the data type. The column is set to the appropriate zero value for the type.
If you are using the C API, the information string can be
obtained by invoking the mysql_info()
function. See Section 22.2.3.34, “mysql_info()”.
If INSERT inserts a row into a table that has
an AUTO_INCREMENT column, you can find the
value used for that column by using the SQL
LAST_INSERT_ID() function. From within the C
API, use the mysql_insert_id() function.
However, you should note that the two functions do not always
behave identically. The behavior of INSERT
statements with respect to AUTO_INCREMENT
columns is discussed further in
Section 12.9.3, “Information Functions”, and
Section 22.2.3.36, “mysql_insert_id()”.
The INSERT statement supports the following
modifiers:
-
If you use the
DELAYEDkeyword, the server puts the row or rows to be inserted into a buffer, and the client issuing theINSERT DELAYEDstatement can then continue immediately. If the table is in use, the server holds the rows. When the table is free, the server begins inserting rows, checking periodically to see whether there are any new read requests for the table. If there are, the delayed row queue is suspended until the table becomes free again. See Section 13.2.4.2, “INSERT DELAYEDSyntax”.DELAYEDis ignored withINSERT ... SELECTorINSERT ... ON DUPLICATE KEY UPDATE. -
If you use the
LOW_PRIORITYkeyword, execution of theINSERTis delayed until no other clients are reading from the table. This includes other clients that began reading while existing clients are reading, and while theINSERT LOW_PRIORITYstatement is waiting. It is possible, therefore, for a client that issues anINSERT LOW_PRIORITYstatement to wait for a very long time (or even forever) in a read-heavy environment. (This is in contrast toINSERT DELAYED, which lets the client continue at once. Note thatLOW_PRIORITYshould normally not be used withMyISAMtables because doing so disables concurrent inserts. See Section 7.3.3, “Concurrent Inserts”. -
If you specify
HIGH_PRIORITY, it overrides the effect of the--low-priority-updatesoption if the server was started with that option. It also causes concurrent inserts not to be used. -
If you use the
IGNOREkeyword, errors that occur while executing theINSERTstatement are treated as warnings instead. For example, withoutIGNORE, a row that duplicates an existingUNIQUEindex orPRIMARY KEYvalue in the table causes a duplicate-key error and the statement is aborted. WithIGNORE, the row still is not inserted, but no error is issued. Data conversions that would trigger errors abort the statement ifIGNOREis not specified. WithIGNORE, invalid values are adjusted to the closest values and inserted; warnings are produced but the statement does not abort. You can determine with themysql_info()C API function how many rows were actually inserted into the table. -
If you specify
ON DUPLICATE KEY UPDATE, and a row is inserted that would cause a duplicate value in aUNIQUEindex orPRIMARY KEY, anUPDATEof the old row is performed. See Section 13.2.4.3, “INSERT ... ON DUPLICATE KEY UPDATESyntax”.
INSERT [LOW_PRIORITY | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name [(col_name,...)]
SELECT ...
[ ON DUPLICATE KEY UPDATE col_name=expr, ... ]
With INSERT ... SELECT, you can quickly
insert many rows into a table from one or many tables. For
example:
INSERT INTO tbl_temp2 (fld_id) SELECT tbl_temp1.fld_order_id FROM tbl_temp1 WHERE tbl_temp1.fld_order_id > 100;
The following conditions hold for a INSERT ...
SELECT statements:
-
Specify
IGNOREto ignore rows that would cause duplicate-key violations. -
DELAYEDis ignored withINSERT ... SELECT. -
The target table of the
INSERTstatement may appear in theFROMclause of theSELECTpart of the query. (This was not possible in some older versions of MySQL.) In this case, MySQL creates a temporary table to hold the rows from theSELECTand then inserts those rows into the target table. -
AUTO_INCREMENTcolumns work as usual. -
To ensure that the binary log can be used to re-create the original tables, MySQL does not allow concurrent inserts for
INSERT ... SELECTstatements. -
Currently, you cannot insert into a table and select from the same table in a subquery.
In the values part of ON DUPLICATE KEY
UPDATE, you can refer to columns in other tables, as
long as you do not use GROUP BY in the
SELECT part. One side effect is that you
must qualify non-unique column names in the values part.
INSERT DELAYED ...
The DELAYED option for the
INSERT statement is a MySQL extension to
standard SQL that is very useful if you have clients that
cannot or need not wait for the INSERT to
complete. This is a common situation when you use MySQL for
logging and you also periodically run
SELECT and UPDATE
statements that take a long time to complete.
When a client uses INSERT DELAYED, it gets
an okay from the server at once, and the row is queued to be
inserted when the table is not in use by any other thread.
Another major benefit of using INSERT
DELAYED is that inserts from many clients are
bundled together and written in one block. This is much faster
than performing many separate inserts.
Note that INSERT DELAYED is slower than a
normal INSERT if the table is not otherwise
in use. There is also the additional overhead for the server
to handle a separate thread for each table for which there are
delayed rows. This means that you should use INSERT
DELAYED only when you are really sure that you need
it.
The queued rows are held only in memory until they are
inserted into the table. This means that if you terminate
mysqld forcibly (for example, with
kill -9) or if mysqld
dies unexpectedly, any queued rows that have not
been written to disk are lost.
There are some constraints on the use of
DELAYED:
-
INSERT DELAYEDworks only withMyISAM,MEMORY, andARCHIVEtables. See Section 14.1, “TheMyISAMStorage Engine”, Section 14.4, “TheMEMORY(HEAP) Storage Engine”, and Section 14.8, “TheARCHIVEStorage Engine”.For
MyISAMtables, if there are no free blocks in the middle of the data file, concurrentSELECTandINSERTstatements are supported. Under these circumstances, you very seldom need to useINSERT DELAYEDwithMyISAM. -
INSERT DELAYEDshould be used only forINSERTstatements that specify value lists. The server ignoresDELAYEDforINSERT ... SELECTorINSERT ... ON DUPLICATE KEY UPDATEstatements. -
Because the
INSERT DELAYEDstatement returns immediately, before the rows are inserted, you cannot useLAST_INSERT_ID()to get theAUTO_INCREMENTvalue that the statement might generate. -
DELAYEDrows are not visible toSELECTstatements until they actually have been inserted. -
DELAYEDis ignored on slave replication servers because it could cause the slave to have different data than the master. -
Pending
INSERT DELAYEDstatements are lost if a table is write locked andALTER TABLEis used to modify the table structure.
The following describes in detail what happens when you use
the DELAYED option to
INSERT or REPLACE. In
this description, the “thread” is the thread that
received an INSERT DELAYED statement and
“handler” is the thread that handles all
INSERT DELAYED statements for a particular
table.
-
When a thread executes a
DELAYEDstatement for a table, a handler thread is created to process allDELAYEDstatements for the table, if no such handler already exists. -
The thread checks whether the handler has previously acquired a
DELAYEDlock; if not, it tells the handler thread to do so. TheDELAYEDlock can be obtained even if other threads have aREADorWRITElock on the table. However, the handler waits for allALTER TABLElocks orFLUSH TABLESstatements to finish, to ensure that the table structure is up to date. -
The thread executes the
INSERTstatement, but instead of writing the row to the table, it puts a copy of the final row into a queue that is managed by the handler thread. Any syntax errors are noticed by the thread and reported to the client program. -
The client cannot obtain from the server the number of duplicate rows or the
AUTO_INCREMENTvalue for the resulting row, because theINSERTreturns before the insert operation has been completed. (If you use the C API, themysql_info()function does not return anything meaningful, for the same reason.) -
The binary log is updated by the handler thread when the row is inserted into the table. In case of multiple-row inserts, the binary log is updated when the first row is inserted.
-
Each time that
delayed_insert_limitrows are written, the handler checks whether anySELECTstatements are still pending. If so, it allows these to execute before continuing. -
When the handler has no more rows in its queue, the table is unlocked. If no new
INSERT DELAYEDstatements are received withindelayed_insert_timeoutseconds, the handler terminates. -
If more than
delayed_queue_sizerows are pending in a specific handler queue, the thread requestingINSERT DELAYEDwaits until there is room in the queue. This is done to ensure that mysqld does not use all memory for the delayed memory queue. -
The handler thread shows up in the MySQL process list with
delayed_insertin theCommandcolumn. It is killed if you execute aFLUSH TABLESstatement or kill it withKILLthread_id. However, before exiting, it first stores all queued rows into the table. During this time it does not accept any newINSERTstatements from other threads. If you execute anINSERT DELAYEDstatement after this, a new handler thread is created.Note that this means that
INSERT DELAYEDstatements have higher priority than normalINSERTstatements if there is anINSERT DELAYEDhandler running. Other update statements have to wait until theINSERT DELAYEDqueue is empty, someone terminates the handler thread (withKILLthread_id), or someone executes aFLUSH TABLES. -
The following status variables provide information about
INSERT DELAYEDstatements:Status Variable Meaning Delayed_insert_threadsNumber of handler threads Delayed_writesNumber of rows written with INSERT DELAYEDNot_flushed_delayed_rowsNumber of rows waiting to be written You can view these variables by issuing a
SHOW STATUSstatement or by executing a mysqladmin extended-status command.
If you specify ON DUPLICATE KEY UPDATE, and
a row is inserted that would cause a duplicate value in a
UNIQUE index or PRIMARY
KEY, an UPDATE of the old row is
performed. For example, if column a is
declared as UNIQUE and contains the value
1, the following two statements have
identical effect:
INSERT INTO table (a,b,c) VALUES (1,2,3) ON DUPLICATE KEY UPDATE c=c+1; UPDATE table SET c=c+1 WHERE a=1;
The rows-affected value is 1 if the row is inserted as a new record and 2 if an existing record is updated.
If column b is also unique, the
INSERT is equivalent to this
UPDATE statement instead:
UPDATE table SET c=c+1 WHERE a=1 OR b=2 LIMIT 1;
If a=1 OR b=2 matches several rows, only
one row is updated. In general, you
should try to avoid using an ON DUPLICATE
KEY clause on tables with multiple unique indexes.
You can use the
VALUES(col_name)
function in the UPDATE clause to refer to
column values from the INSERT portion of
the INSERT ... UPDATE statement. In other
words,
VALUES(col_name)
in the UPDATE clause refers to the value of
col_name that would be inserted,
had no duplicate-key conflict occurred. This function is
especially useful in multiple-row inserts. The
VALUES() function is meaningful only in
INSERT ... UPDATE statements and returns
NULL otherwise. Example:
INSERT INTO table (a,b,c) VALUES (1,2,3),(4,5,6) ON DUPLICATE KEY UPDATE c=VALUES(a)+VALUES(b);
That statement is identical to the following two statements:
INSERT INTO table (a,b,c) VALUES (1,2,3) ON DUPLICATE KEY UPDATE c=3; INSERT INTO table (a,b,c) VALUES (4,5,6) ON DUPLICATE KEY UPDATE c=9;
The DELAYED option is ignored when you use
ON DUPLICATE KEY UPDATE.
LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name' [REPLACE | IGNORE] INTO TABLEtbl_name[FIELDS [TERMINATED BY 'string'] [[OPTIONALLY] ENCLOSED BY 'char'] [ESCAPED BY 'char'] ] [LINES [STARTING BY 'string'] [TERMINATED BY 'string'] ] [IGNOREnumberLINES] [(col_name_or_user_var,...)] [SETcol_name=expr,...)]
The LOAD DATA INFILE statement reads rows
from a text file into a table at a very high speed. The filename
must be given as a literal string.
LOAD DATA INFILE is the complement of
SELECT ... INTO OUTFILE. (See
Section 13.2.7, “SELECT Syntax”.) To write data from a table to a file,
use SELECT ... INTO OUTFILE. To read the file
back into a table, use LOAD DATA INFILE. The
syntax of the FIELDS and
LINES clauses is the same for both
statements. Both clauses are optional, but
FIELDS must precede LINES
if both are specified.
For more information about the efficiency of
INSERT versus LOAD DATA
INFILE and speeding up LOAD DATA
INFILE, see Section 7.2.16, “Speed of INSERT Statements”.
The character set indicated by the
character_set_database system variable is
used to interpret the information in the file. SET
NAMES and the setting of
character_set_client do not affect
interpretation of input.
Note that it is currently not possible to load data files that
use the ucs2 character set.
As of MySQL 5.0.19, the
character_set_filesystem system variable
controls the interpretation of the filename.
You can also load data files by using the
mysqlimport utility; it operates by sending a
LOAD DATA INFILE statement to the server. The
--local option causes
mysqlimport to read data files from the
client host. You can specify the --compress
option to get better performance over slow networks if the
client and server support the compressed protocol. See
Section 8.14, “mysqlimport — A Data Import Program”.
If you use LOW_PRIORITY, execution of the
LOAD DATA statement is delayed until no other
clients are reading from the table.
If you specify CONCURRENT with a
MyISAM table that satisfies the condition for
concurrent inserts (that is, it contains no free blocks in the
middle), other threads can retrieve data from the table while
LOAD DATA is executing. Using this option
affects the performance of LOAD DATA a bit,
even if no other thread is using the table at the same time.
The LOCAL keyword, if specified, is
interpreted with respect to the client end of the connection:
-
If
LOCALis specified, the file is read by the client program on the client host and sent to the server. The file can be given as a full pathname to specify its exact location. If given as a relative pathname, the name is interpreted relative to the directory in which the client program was started. -
If
LOCALis not specified, the file must be located on the server host and is read directly by the server. The server uses the following rules to locate the file:-
If the filename is an absolute pathname, the server uses it as given.
-
If the filename is a relative pathname with one or more leading components, the server searches for the file relative to the server's data directory.
-
If a filename with no leading components is given, the server looks for the file in the database directory of the default database.
-
Note that, in the non-LOCAL case, these rules
mean that a file named as ./myfile.txt is
read from the server's data directory, whereas the file named as
myfile.txt is read from the database
directory of the default database. For example, if
db1 is the default database, the following
LOAD DATA statement reads the file
data.txt from the database directory for
db1, even though the statement explicitly
loads the file into a table in the db2
database:
LOAD DATA INFILE 'data.txt' INTO TABLE db2.my_table;
Windows pathnames are specified using forward slashes rather than backslashes. If you do use backslashes, you must double them.
For security reasons, when reading text files located on the
server, the files must either reside in the database directory
or be readable by all. Also, to use LOAD DATA
INFILE on server files, you must have the
FILE privilege. See
Section 5.8.3, “Privileges Provided by MySQL”.
Using LOCAL is a bit slower than letting the
server access the files directly, because the contents of the
file must be sent over the connection by the client to the
server. On the other hand, you do not need the
FILE privilege to load local files.
LOCAL works only if your server and your
client both have been enabled to allow it. For example, if
mysqld was started with
--local-infile=0, LOCAL does
not work. See Section 5.7.4, “Security Issues with LOAD DATA LOCAL”.
On Unix, if you need LOAD DATA to read from a
pipe, you can use the following technique (here we load the
listing of the / directory into a table):
mkfifo /mysql/db/x/x chmod 666 /mysql/db/x/x find / -ls > /mysql/db/x/x mysql -e "LOAD DATA INFILE 'x' INTO TABLE x" x
The REPLACE and IGNORE
keywords control handling of input rows that duplicate existing
rows on unique key values:
-
If you specify
REPLACE, input rows replace existing rows. In other words, rows that have the same value for a primary key or unique index as an existing row. See Section 13.2.6, “REPLACESyntax”. -
If you specify
IGNORE, input rows that duplicate an existing row on a unique key value are skipped. If you do not specify either option, the behavior depends on whether theLOCALkeyword is specified. WithoutLOCAL, an error occurs when a duplicate key value is found, and the rest of the text file is ignored. WithLOCAL, the default behavior is the same as ifIGNOREis specified; this is because the server has no way to stop transmission of the file in the middle of the operation.
If you want to ignore foreign key constraints during the load
operation, you can issue a SET
FOREIGN_KEY_CHECKS=0 statement before executing
LOAD DATA.
If you use LOAD DATA INFILE on an empty
MyISAM table, all non-unique indexes are
created in a separate batch (as for REPAIR
TABLE). Normally, this makes LOAD DATA
INFILE much faster when you have many indexes. In some
extreme cases, you can create the indexes even faster by turning
them off with ALTER TABLE ... DISABLE KEYS
before loading the file into the table and using ALTER
TABLE ... ENABLE KEYS to re-create the indexes after
loading the file. See Section 7.2.16, “Speed of INSERT Statements”.
For both the LOAD DATA INFILE and
SELECT ... INTO OUTFILE statements, the
syntax of the FIELDS and
LINES clauses is the same. Both clauses are
optional, but FIELDS must precede
LINES if both are specified.
If you specify a FIELDS clause, each of its
subclauses (TERMINATED BY,
[OPTIONALLY] ENCLOSED BY, and
ESCAPED BY) is also optional, except that you
must specify at least one of them.
If you specify no FIELDS clause, the defaults
are the same as if you had written this:
FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'
If you specify no LINES clause, the defaults
are the same as if you had written this:
LINES TERMINATED BY '\n' STARTING BY ''
In other words, the defaults cause LOAD DATA
INFILE to act as follows when reading input:
-
Look for line boundaries at newlines.
-
Do not skip over any line prefix.
-
Break lines into fields at tabs.
-
Do not expect fields to be enclosed within any quoting characters.
-
Interpret occurrences of tab, newline, or ‘
\’ preceded by ‘\’ as literal characters that are part of field values.
Conversely, the defaults cause SELECT ... INTO
OUTFILE to act as follows when writing output:
-
Write tabs between fields.
-
Do not enclose fields within any quoting characters.
-
Use ‘
\’ to escape instances of tab, newline, or ‘\’ that occur within field values. -
Write newlines at the ends of lines.
Backslash is the MySQL escape character within strings, so to
write FIELDS ESCAPED BY '\\', you must
specify two backslashes for the value to be interpreted as a
single backslash.
Note: If you have generated the
text file on a Windows system, you might have to use
LINES TERMINATED BY '\r\n' to read the file
properly, because Windows programs typically use two characters
as a line terminator. Some programs, such as
WordPad, might use \r as a
line terminator when writing files. To read such files, use
LINES TERMINATED BY '\r'.
If all the lines you want to read in have a common prefix that
you want to ignore, you can use LINES STARTING BY
'prefix_string' to skip
over the prefix, and anything before it. If
a line does not include the prefix, the entire line is skipped.
Suppose that you issue the following statement:
LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test FIELDS TERMINATED BY ',' LINES STARTING BY 'xxx';
If the data file looks like this:
xxx"abc",1 something xxx"def",2 "ghi",3
The resulting rows will be ("abc",1) and
("def",2). The third row in the file is
skipped because it does not contain the prefix.
The IGNORE number
LINES option can be used to ignore lines at the start
of the file. For example, you can use IGNORE 1
LINES to skip over an initial header line containing
column names:
LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test IGNORE 1 LINES;
When you use SELECT ... INTO OUTFILE in
tandem with LOAD DATA INFILE to write data
from a database into a file and then read the file back into the
database later, the field- and line-handling options for both
statements must match. Otherwise, LOAD DATA
INFILE will not interpret the contents of the file
properly. Suppose that you use SELECT ... INTO
OUTFILE to write a file with fields delimited by
commas:
SELECT * INTO OUTFILE 'data.txt' FIELDS TERMINATED BY ',' FROM table2;
To read the comma-delimited file back in, the correct statement would be:
LOAD DATA INFILE 'data.txt' INTO TABLE table2 FIELDS TERMINATED BY ',';
If instead you tried to read in the file with the statement
shown following, it wouldn't work because it instructs
LOAD DATA INFILE to look for tabs between
fields:
LOAD DATA INFILE 'data.txt' INTO TABLE table2 FIELDS TERMINATED BY '\t';
The likely result is that each input line would be interpreted as a single field.
LOAD DATA INFILE can be used to read files
obtained from external sources. For example, many programs can
export data in comma-separate values (CSV) format, such that
lines have fields separated by commas and enclosed within double
quotes. If lines in such a file are terminated by newlines, the
statement shown here illustrates the field- and line-handling
options you would use to load the file:
LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';
Any of the field- or line-handling options can specify an empty
string (''). If not empty, the
FIELDS [OPTIONALLY] ENCLOSED BY and
FIELDS ESCAPED BY values must be a single
character. The FIELDS TERMINATED BY,
LINES STARTING BY, and LINES
TERMINATED BY values can be more than one character.
For example, to write lines that are terminated by carriage
return/linefeed pairs, or to read a file containing such lines,
specify a LINES TERMINATED BY '\r\n' clause.
To read a file containing jokes that are separated by lines
consisting of %%, you can do this
CREATE TABLE jokes (a INT NOT NULL AUTO_INCREMENT PRIMARY KEY, joke TEXT NOT NULL); LOAD DATA INFILE '/tmp/jokes.txt' INTO TABLE jokes FIELDS TERMINATED BY '' LINES TERMINATED BY '\n%%\n' (joke);
FIELDS [OPTIONALLY] ENCLOSED BY controls
quoting of fields. For output (SELECT ... INTO
OUTFILE), if you omit the word
OPTIONALLY, all fields are enclosed by the
ENCLOSED BY character. An example of such
output (using a comma as the field delimiter) is shown here:
"1","a string","100.20" "2","a string containing a , comma","102.20" "3","a string containing a \" quote","102.20" "4","a string containing a \", quote and comma","102.20"
If you specify OPTIONALLY, the
ENCLOSED BY character is used only to enclose
values from columns that have a string data type (such as
CHAR, BINARY,
TEXT, or ENUM):
1,"a string",100.20 2,"a string containing a , comma",102.20 3,"a string containing a \" quote",102.20 4,"a string containing a \", quote and comma",102.20
Note that occurrences of the ENCLOSED BY
character within a field value are escaped by prefixing them
with the ESCAPED BY character. Also note that
if you specify an empty ESCAPED BY value, it
is possible to inadvertently generate output that cannot be read
properly by LOAD DATA INFILE. For example,
the preceding output just shown would appear as follows if the
escape character is empty. Observe that the second field in the
fourth line contains a comma following the quote, which
(erroneously) appears to terminate the field:
1,"a string",100.20 2,"a string containing a , comma",102.20 3,"a string containing a " quote",102.20 4,"a string containing a ", quote and comma",102.20
For input, the ENCLOSED BY character, if
present, is stripped from the ends of field values. (This is
true regardless of whether OPTIONALLY is
specified; OPTIONALLY has no effect on input
interpretation.) Occurrences of the ENCLOSED
BY character preceded by the ESCAPED
BY character are interpreted as part of the current
field value.
If the field begins with the ENCLOSED BY
character, instances of that character are recognized as
terminating a field value only if followed by the field or line
TERMINATED BY sequence. To avoid ambiguity,
occurrences of the ENCLOSED BY character
within a field value can be doubled and are interpreted as a
single instance of the character. For example, if
ENCLOSED BY '"' is specified, quotes are
handled as shown here:
"The ""BIG"" boss" -> The "BIG" boss The "BIG" boss -> The "BIG" boss The ""BIG"" boss -> The ""BIG"" boss
FIELDS ESCAPED BY controls how to write or
read special characters. If the FIELDS ESCAPED
BY character is not empty, it is used to prefix the
following characters on output:
-
The
FIELDS ESCAPED BYcharacter -
The
FIELDS [OPTIONALLY] ENCLOSED BYcharacter -
The first character of the
FIELDS TERMINATED BYandLINES TERMINATED BYvalues -
ASCII
0(what is actually written following the escape character is ASCII ‘0’, not a zero-valued byte)
If the FIELDS ESCAPED BY character is empty,
no characters are escaped and NULL is output
as NULL, not \N. It is
probably not a good idea to specify an empty escape character,
particularly if field values in your data contain any of the
characters in the list just given.
For input, if the FIELDS ESCAPED BY character
is not empty, occurrences of that character are stripped and the
following character is taken literally as part of a field value.
The exceptions are an escaped ‘0’
or ‘N’ (for example,
\0 or \N if the escape
character is ‘\’). These
sequences are interpreted as ASCII NUL (a zero-valued byte) and
NULL. The rules for NULL
handling are described later in this section.
For more information about
‘\’-escape syntax, see
Section 9.1, “Literal Values”.
In certain cases, field- and line-handling options interact:
-
If
LINES TERMINATED BYis an empty string andFIELDS TERMINATED BYis non-empty, lines are also terminated withFIELDS TERMINATED BY. -
If the
FIELDS TERMINATED BYandFIELDS ENCLOSED BYvalues are both empty (''), a fixed-row (non-delimited) format is used. With fixed-row format, no delimiters are used between fields (but you can still have a line terminator). Instead, column values are read and written using a field width wide enough to hold all values in the field. ForTINYINT,SMALLINT,MEDIUMINT,INT, andBIGINT, the field widths are 4, 6, 8, 11, and 20, respectively, no matter what the declared display width is.LINES TERMINATED BYis still used to separate lines. If a line does not contain all fields, the rest of the columns are set to their default values. If you do not have a line terminator, you should set this to''. In this case, the text file must contain all fields for each row.Fixed-row format also affects handling of
NULLvalues, as described later. Note that fixed-size format does not work if you are using a multi-byte character set.Note: Before MySQL 5.0.6, fixed-row format used the display width of the column. For example,
INT(4)was read or written using a field with a width of 4. However, if the column contained wider values, they were dumped to their full width, leading to the possibility of a “ragged” field holding values of different widths. Using a field wide enough to hold all values in the field prevents this problem. However, data files written before this change was made might not be reloaded correctly withLOAD DATA INFILEfor MySQL 5.0.6 and up. This change also affects data files read by mysqlimport and written by mysqldump --tab, which useLOAD DATA INFILEandSELECT ... INTO OUTFILE.
Handling of NULL values varies according to
the FIELDS and LINES
options in use:
-
For the default
FIELDSandLINESvalues,NULLis written as a field value of\Nfor output, and a field value of\Nis read asNULLfor input (assuming that theESCAPED BYcharacter is ‘\’). -
If
FIELDS ENCLOSED BYis not empty, a field containing the literal wordNULLas its value is read as aNULLvalue. This differs from the wordNULLenclosed withinFIELDS ENCLOSED BYcharacters, which is read as the string'NULL'. -
If
FIELDS ESCAPED BYis empty,NULLis written as the wordNULL. -
With fixed-row format (which is used when
FIELDS TERMINATED BYandFIELDS ENCLOSED BYare both empty),NULLis written as an empty string. Note that this causes bothNULLvalues and empty strings in the table to be indistinguishable when written to the file because both are written as empty strings. If you need to be able to tell the two apart when reading the file back in, you should not use fixed-row format.
An attempt to load NULL into a NOT
NULL column causes assignment of the implicit default
value for the column's data type and a warning, or an error in
strict SQL mode. Implicit default values are discussed in
Section 11.1.4, “Data Type Default Values”.
Some cases are not supported by LOAD DATA
INFILE:
-
Fixed-size rows (
FIELDS TERMINATED BYandFIELDS ENCLOSED BYboth empty) andBLOBorTEXTcolumns. -
If you specify one separator that is the same as or a prefix of another,
LOAD DATA INFILEcannot interpret the input properly. For example, the followingFIELDSclause would cause problems:FIELDS TERMINATED BY '"' ENCLOSED BY '"'
-
If
FIELDS ESCAPED BYis empty, a field value that contains an occurrence ofFIELDS ENCLOSED BYorLINES TERMINATED BYfollowed by theFIELDS TERMINATED BYvalue causesLOAD DATA INFILEto stop reading a field or line too early. This happens becauseLOAD DATA INFILEcannot properly determine where the field or line value ends.
The following example loads all columns of the
persondata table:
LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata;
By default, when no column list is provided at the end of the
LOAD DATA INFILE statement, input lines are
expected to contain a field for each table column. If you want
to load only some of a table's columns, specify a column list:
LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata (col1,col2,...);
You must also specify a column list if the order of the fields in the input file differs from the order of the columns in the table. Otherwise, MySQL cannot tell how to match input fields with table columns.
Before MySQL 5.0.3, the column list must contain only names of
columns in the table being loaded, and the
SET clause is not supported. As of MySQL
5.0.3, the column list can contain either column names or user
variables. With user variables, the SET
clause enables you to perform transformations on their values
before assigning the result to columns.
User variables in the SET clause can be used
in several ways. The following example uses the first input
column directly for the value of t1.column1,
and assigns the second input column to a user variable that is
subjected to a division operation before being used for the
value of t1.column2:
LOAD DATA INFILE 'file.txt' INTO TABLE t1 (column1, @var1) SET column2 = @var1/100;
The SET clause can be used to supply values
not derived from the input file. The following statement sets
column3 to the current date and time:
LOAD DATA INFILE 'file.txt' INTO TABLE t1 (column1, column2) SET column3 = CURRENT_TIMESTAMP;
You can also discard an input value by assigning it to a user variable and not assigning the variable to a table column:
LOAD DATA INFILE 'file.txt' INTO TABLE t1 (column1, @dummy, column2, @dummy, column3);
Use of the column/variable list and SET
clause is subject to the following restrictions:
-
Assignments in the
SETclause should have only column names on the left hand side of assignment operators. -
You can use subqueries in the right hand side of
SETassignments. A subquery that returns a value to be assigned to a column may be a scalar subquery only. Also, you cannot use a subquery to select from the table that is being loaded. -
Lines ignored by an
IGNOREclause are not processed for the column/variable list orSETclause. -
User variables cannot be used when loading data with fixed-row format because user variables do not have a display width.
When processing an input line, LOAD DATA
splits it into fields and uses the values according to the
column/variable list and the SET clause, if
they are present. Then the resulting row is inserted into the
table. If there are BEFORE INSERT or
AFTER INSERT triggers for the table, they are
activated before or after inserting the row, respectively.
If an input line has too many fields, the extra fields are ignored and the number of warnings is incremented.
If an input line has too few fields, the table columns for which input fields are missing are set to their default values. Default value assignment is described in Section 11.1.4, “Data Type Default Values”.
An empty field value is interpreted differently than if the field value is missing:
-
For string types, the column is set to the empty string.
-
For numeric types, the column is set to
0. -
For date and time types, the column is set to the appropriate “zero” value for the type. See Section 11.3, “Date and Time Types”.
These are the same values that result if you assign an empty
string explicitly to a string, numeric, or date or time type
explicitly in an INSERT or
UPDATE statement.
TIMESTAMP columns are set to the current date
and time only if there is a NULL value for
the column (that is, \N), or if the
TIMESTAMP column's default value is the
current timestamp and it is omitted from the field list when a
field list is specified.
LOAD DATA INFILE regards all input as
strings, so you cannot use numeric values for
ENUM or SET columns the
way you can with INSERT statements. All
ENUM and SET values must
be specified as strings.
BIT values cannot be loaded using binary
notation (for example, b'011010'). To work
around this, specify the values as regular integers and use the
SET clause to convert them so that MySQL
performs a numeric type conversion and loads them into the
BIT column properly:
shell>cat /tmp/bit_test.txt2 127 shell>mysql testmysql>LOAD DATA INFILE '/tmp/bit_test.txt'->INTO TABLE bit_test (@var1) SET b= CAST(@var1 AS SIGNED);Query OK, 2 rows affected (0.00 sec) Records: 2 Deleted: 0 Skipped: 0 Warnings: 0 mysql>SELECT BIN(b+0) FROM bit_test;+----------+ | bin(b+0) | +----------+ | 10 | | 1111111 | +----------+ 2 rows in set (0.00 sec)
When the LOAD DATA INFILE statement finishes,
it returns an information string in the following format:
Records: 1 Deleted: 0 Skipped: 0 Warnings: 0
If you are using the C API, you can get information about the
statement by calling the mysql_info()
function. See Section 22.2.3.34, “mysql_info()”.
Warnings occur under the same circumstances as when values are
inserted via the INSERT statement (see
Section 13.2.4, “INSERT Syntax”), except that LOAD DATA
INFILE also generates warnings when there are too few
or too many fields in the input row. The warnings are not stored
anywhere; the number of warnings can be used only as an
indication of whether everything went well.
You can use SHOW WARNINGS to get a list of
the first max_error_count warnings as
information about what went wrong. See
Section 13.5.4.25, “SHOW WARNINGS Syntax”.
REPLACE [LOW_PRIORITY | DELAYED]
[INTO] tbl_name [(col_name,...)]
VALUES ({expr | DEFAULT},...),(...),...
Or:
REPLACE [LOW_PRIORITY | DELAYED]
[INTO] tbl_name
SET col_name={expr | DEFAULT}, ...
Or:
REPLACE [LOW_PRIORITY | DELAYED]
[INTO] tbl_name [(col_name,...)]
SELECT ...
REPLACE works exactly like
INSERT, except that if an old row in the
table has the same value as a new row for a PRIMARY
KEY or a UNIQUE index, the old row
is deleted before the new row is inserted. See
Section 13.2.4, “INSERT Syntax”.
REPLACE is a MySQL extension to the SQL
standard. It either inserts, or deletes and
inserts. For another MySQL extension to standard SQL —
that either inserts or updates — see
Section 13.2.4.3, “INSERT ... ON DUPLICATE KEY UPDATE Syntax”.
Note that unless the table has a PRIMARY KEY
or UNIQUE index, using a
REPLACE statement makes no sense. It becomes
equivalent to INSERT, because there is no
index to be used to determine whether a new row duplicates
another.
Values for all columns are taken from the values specified in
the REPLACE statement. Any missing columns
are set to their default values, just as happens for
INSERT. You cannot refer to values from the
current row and use them in the new row. If you use an
assignment such as SET
col_name =
col_name + 1, the reference
to the column name on the right hand side is treated as
DEFAULT(col_name),
so the assignment is equivalent to SET
col_name =
DEFAULT(col_name) + 1.
To use REPLACE, you must have both the
INSERT and DELETE
privileges for the table.
The REPLACE statement returns a count to
indicate the number of rows affected. This is the sum of the
rows deleted and inserted. If the count is 1 for a single-row
REPLACE, a row was inserted and no rows were
deleted. If the count is greater than 1, one or more old rows
were deleted before the new row was inserted. It is possible for
a single row to replace more than one old row if the table
contains multiple unique indexes and the new row duplicates
values for different old rows in different unique indexes.
The affected-rows count makes it easy to determine whether
REPLACE only added a row or whether it also
replaced any rows: Check whether the count is 1 (added) or
greater (replaced).
If you are using the C API, the affected-rows count can be
obtained using the mysql_affected_rows()
function.
Currently, you cannot replace into a table and select from the same table in a subquery.
MySQL uses the following algorithm for
REPLACE (and LOAD DATA ...
REPLACE):
-
Try to insert the new row into the table
-
While the insertion fails because a duplicate-key error occurs for a primary key or unique index:
-
Delete from the table the conflicting row that has the duplicate key value
-
Try again to insert the new row into the table
-
SELECT
[ALL | DISTINCT | DISTINCTROW ]
[HIGH_PRIORITY]
[STRAIGHT_JOIN]
[SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT]
[SQL_CACHE | SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS]
select_expr, ...
[FROM table_references
[WHERE where_condition]
[GROUP BY {col_name | expr | position}
[ASC | DESC], ... [WITH ROLLUP]]
[HAVING where_condition]
[ORDER BY {col_name | expr | position}
[ASC | DESC], ...]
[LIMIT {[offset,] row_count | row_count OFFSET offset}]
[PROCEDURE procedure_name(argument_list)]
[INTO OUTFILE 'file_name' export_options
| INTO DUMPFILE 'file_name'
| INTO @var_name [, @var_name]]
[FOR UPDATE | LOCK IN SHARE MODE]]
SELECT is used to retrieve rows selected from
one or more tables, and can include UNION
statements and subqueries. See Section 13.2.7.2, “UNION Syntax”, and
Section 13.2.8, “Subquery Syntax”.
The most commonly used clauses of SELECT
statements are these:
-
Each
select_exprindicates a column that you want to retrieve. There must be at least oneselect_expr. -
table_referencesindicates the table or tables from which to retrieve rows. Its syntax is described in Section 13.2.7.1, “JOINSyntax”. -
The
WHEREclause, if given, indicates the condition or conditions that rows must satisfy to be selected.where_conditionis an expression that evaluates to true for each row to be selected. The statement selects all rows if there is noWHEREclause.In the
WHEREclause, you can use any of the functions and operators that MySQL supports, except for aggregate (summary) functions. See Chapter 12, Functions and Operators.
SELECT can also be used to retrieve rows
computed without reference to any table.
For example:
mysql> SELECT 1 + 1;
-> 2
You are allowed to specify DUAL as a dummy
table name in situations where no tables are referenced:
mysql> SELECT 1 + 1 FROM DUAL;
-> 2
DUAL is purely for compatibility with some
other database servers that require a FROM
clause. MySQL does not require the clause if no tables are
referenced.
In general, clauses used must be given in exactly the order
shown in the syntax description. For example, a
HAVING clause must come after any
GROUP BY clause and before any ORDER
BY clause. The exception is that the
INTO clause can appear either as shown in the
syntax description or immediately preceding the
FROM clause.
-
A
select_exprcan be given an alias usingASalias_name. The alias is used as the expression's column name and can be used inGROUP BY,ORDER BY, orHAVINGclauses. For example:SELECT CONCAT(last_name,', ',first_name) AS full_name FROM mytable ORDER BY full_name;
The
ASkeyword is optional when aliasing aselect_expr. The preceding example could have been written like this:SELECT CONCAT(last_name,', ',first_name) full_name FROM mytable ORDER BY full_name;
However, because the
ASis optional, a subtle problem can occur if you forget the comma between twoselect_exprexpressions: MySQL interprets the second as an alias name. For example, in the following statement,columnbis treated as an alias name:SELECT columna columnb FROM mytable;
For this reason, it is good practice to be in the habit of using
ASexplicitly when specifying column aliases. -
It is not allowable to use a column alias in a
WHEREclause, because the column value might not yet be determined when theWHEREclause is executed. See Section A.5.4, “Problems with Column Aliases”. -
The
FROMtable_referencesclause indicates the table or tables from which to retrieve rows. If you name more than one table, you are performing a join. For information on join syntax, see Section 13.2.7.1, “JOINSyntax”. For each table specified, you can optionally specify an alias.tbl_name[[AS]alias] [{USE|IGNORE|FORCE} INDEX (key_list)]The use of
USE INDEX,IGNORE INDEX,FORCE INDEXto give the optimizer hints about how to choose indexes is described in Section 13.2.7.1, “JOINSyntax”.You can use
SET max_seeks_for_key=valueas an alternative way to force MySQL to prefer key scans instead of table scans. See Section 5.2.2, “Server System Variables”. -
You can refer to a table within the default database as
tbl_name, or asdb_name.tbl_nameto specify a database explicitly. You can refer to a column ascol_name,tbl_name.col_name, ordb_name.tbl_name.col_name. You need not specify atbl_nameordb_name.tbl_nameprefix for a column reference unless the reference would be ambiguous. See Section 9.2.1, “Identifier Qualifiers”, for examples of ambiguity that require the more explicit column reference forms. -
A table reference can be aliased using
tbl_nameASalias_nameortbl_name alias_name:SELECT t1.name, t2.salary FROM employee AS t1, info AS t2 WHERE t1.name = t2.name; SELECT t1.name, t2.salary FROM employee t1, info t2 WHERE t1.name = t2.name;
-
Columns selected for output can be referred to in
ORDER BYandGROUP BYclauses using column names, column aliases, or column positions. Column positions are integers and begin with 1:SELECT college, region, seed FROM tournament ORDER BY region, seed; SELECT college, region AS r, seed AS s FROM tournament ORDER BY r, s; SELECT college, region, seed FROM tournament ORDER BY 2, 3;
To sort in reverse order, add the
DESC(descending) keyword to the name of the column in theORDER BYclause that you are sorting by. The default is ascending order; this can be specified explicitly using theASCkeyword.Use of column positions is deprecated because the syntax has been removed from the SQL standard.
-
If you use
GROUP BY, output rows are sorted according to theGROUP BYcolumns as if you had anORDER BYfor the same columns. To avoid the overhead of sorting thatGROUP BYproduces, addORDER BY NULL:SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;
-
MySQL extends the
GROUP BYclause so that you can also specifyASCandDESCafter columns named in the clause:SELECT a, COUNT(b) FROM test_table GROUP BY a DESC;
-
MySQL extends the use of
GROUP BYto allow selecting fields that are not mentioned in theGROUP BYclause. If you are not getting the results that you expect from your query, please read the description ofGROUP BYfound in Section 12.10, “Functions and Modifiers for Use withGROUP BYClauses”. -
GROUP BYallows aWITH ROLLUPmodifier. See Section 12.10.2, “GROUP BYModifiers”. -
The
HAVINGclause is applied nearly last, just before items are sent to the client, with no optimization. (LIMITis applied afterHAVING.)A
HAVINGclause can refer to any column or alias named in aselect_exprin theSELECTlist or in outer subqueries, and to aggregate functions. However, the SQL standard requires thatHAVINGmust reference only columns in theGROUP BYclause or columns used in aggregate functions. To accommodate both standard SQL and the MySQL-specific behavior of being able to refer columns in theSELECTlist, MySQL 5.0.2 and up allowsHAVINGto refer to columns in theSELECTlist, columns in theGROUP BYclause, columns in outer subqueries, and to aggregate functions.For example, the following statement works in MySQL 5.0.2 but produces an error for earlier versions:
mysql>
SELECT COUNT(*) FROM t GROUP BY col1 HAVING col1 = 2;If the
HAVINGclause refers to a column that is ambiguous, a warning occurs. In the following statement,col2is ambiguous because it is used as both an alias and a column name:SELECT COUNT(col1) AS col2 FROM t GROUP BY col2 HAVING col2 = 2;
Preference is given to standard SQL behavior, so if a
HAVINGcolumn name is used both inGROUP BYand as an aliased column in the output column list, preference is given to the column in theGROUP BYcolumn. -
Do not use
HAVINGfor items that should be in theWHEREclause. For example, do not write the following:SELECT
col_nameFROMtbl_nameHAVINGcol_name> 0;Write this instead:
SELECT
col_nameFROMtbl_nameWHEREcol_name> 0; -
The
HAVINGclause can refer to aggregate functions, which theWHEREclause cannot:SELECT user, MAX(salary) FROM users GROUP BY user HAVING MAX(salary) > 10;
(This did not work in some older versions of MySQL.)
-
MySQL allows duplicate column names. That is, there can be more than one
select_exprwith the same name. This is an extension to standard SQL. Because MySQL also allowsGROUP BYandHAVINGto refer toselect_exprvalues, this can result in an ambiguity:SELECT 12 AS a, a FROM t GROUP BY a;
In that statement, both columns have the name
a. To ensure that the correct column is used for grouping, use different names for eachselect_expr. -
When MySQL resolves an unqualified column or alias reference in an
ORDER BY,GROUP BY, orHAVINGclause, it first searches for the name in theselect_exprvalues. If the name is not found, it looks in the columns of the tables named in theFROMclause. -
The
LIMITclause can be used to constrain the number of rows returned by theSELECTstatement.LIMITtakes one or two numeric arguments, which must both be non-negative integer constants (except when using prepared statements).With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1):
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:
SELECT * FROM tbl LIMIT 95,18446744073709551615;
With one argument, the value specifies the number of rows to return from the beginning of the result set:
SELECT * FROM tbl LIMIT 5; # Retrieve first 5 rows
In other words,
LIMITrow_countis equivalent toLIMIT 0,row_count.For prepared statements, you can use placeholders (supported as of MySQL version 5.0.7). The following statements will return one row from the
tbltable:SET @a=1; PREPARE STMT FROM 'SELECT * FROM tbl LIMIT ?'; EXECUTE STMT USING @a;
The following statements will return the second to sixth row from the
tbltable:SET @skip=1; SET @numrows=5; PREPARE STMT FROM 'SELECT * FROM tbl LIMIT ?, ?'; EXECUTE STMT USING @skip, @numrows;
For compatibility with PostgreSQL, MySQL also supports the
LIMITrow_countOFFSEToffsetsyntax. -
The
SELECT ... INTO OUTFILE 'file_name' form ofSELECTwrites the selected rows to a file. The file is created on the server host, so you must have theFILEprivilege to use this syntax.file_namecannot be an existing file, which among other things prevents files such as/etc/passwdand database tables from being destroyed. As of MySQL 5.0.19, thecharacter_set_filesystemsystem variable controls the interpretation of the filename.The
SELECT ... INTO OUTFILEstatement is intended primarily to let you very quickly dump a table to a text file on the server machine. If you want to create the resulting file on some client host other than the server host, you cannot useSELECT ... INTO OUTFILE. In that case, you should instead use a command such asmysql -e "SELECT ..." >file_nameto generate the file on the client host.SELECT ... INTO OUTFILEis the complement ofLOAD DATA INFILE; the syntax for theexport_optionspart of the statement consists of the sameFIELDSandLINESclauses that are used with theLOAD DATA INFILEstatement. See Section 13.2.5, “LOAD DATA INFILESyntax”.FIELDS ESCAPED BYcontrols how to write special characters. If theFIELDS ESCAPED BYcharacter is not empty, it is used as a prefix that precedes following characters on output:-
The
FIELDS ESCAPED BYcharacter -
The
FIELDS [OPTIONALLY] ENCLOSED BYcharacter -
The first character of the
FIELDS TERMINATED BYandLINES TERMINATED BYvalues -
ASCII
NUL(the zero-valued byte; what is actually written following the escape character is ASCII ‘0’, not a zero-valued byte)
The
FIELDS TERMINATED BY,ENCLOSED BY,ESCAPED BY, orLINES TERMINATED BYcharacters must be escaped so that you can read the file back in reliably. ASCIINULis escaped to make it easier to view with some pagers.The resulting file does not have to conform to SQL syntax, so nothing else need be escaped.
If the
FIELDS ESCAPED BYcharacter is empty, no characters are escaped andNULLis output asNULL, not\N. It is probably not a good idea to specify an empty escape character, particularly if field values in your data contain any of the characters in the list just given.Here is an example that produces a file in the comma-separated values (CSV) format used by many programs:
SELECT a,b,a+b INTO OUTFILE '/tmp/result.txt' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' FROM test_table;
-
-
If you use
INTO DUMPFILEinstead ofINTO OUTFILE, MySQL writes only one row into the file, without any column or line termination and without performing any escape processing. This is useful if you want to store aBLOBvalue in a file. -
The
INTOclause can name a list of one or more user-defined variables. The selected values are assigned to the variables. The number of variables must match the number of columns.Within a stored routine, the variables can be routine parameters or local variables. See Section 17.2.7.3, “
SELECT ... INTOStatement”. -
Note: Any file created by
INTO OUTFILEorINTO DUMPFILEis writable by all users on the server host. The reason for this is that the MySQL server cannot create a file that is owned by anyone other than the user under whose account it is running. (You should never run mysqld asrootfor this and other reasons.) The file thus must be world-writable so that you can manipulate its contents. -
The
SELECTsyntax description at the beginning this section shows theINTOclause near the end of the statement. It is also possible to useINTO OUTFILEorINTO DUMPFILEimmediately preceding theFROMclause. -
A
PROCEDUREclause names a procedure that should process the data in the result set. For an example, see Section 24.3.1, “Procedure Analyse”. -
If you use
FOR UPDATEwith a storage engine that uses page or row locks, rows examined by the query are write-locked until the end of the current transaction. UsingLOCK IN SHARE MODEsets a shared lock that allows other transactions to read the examined rows but not to update or delete them. See Section 14.2.10.5, “SELECT ... FOR UPDATEandSELECT ... LOCK IN SHARE MODELocking Reads”.
Following the SELECT keyword, you can use a
number of options that affect the operation of the statement.
The ALL, DISTINCT, and
DISTINCTROW options specify whether duplicate
rows should be returned. If none of these options are given, the
default is ALL (all matching rows are
returned). DISTINCT and
DISTINCTROW are synonyms and specify removal
of duplicate rows from the result set.
HIGH_PRIORITY,
STRAIGHT_JOIN, and options beginning with
SQL_ are MySQL extensions to standard SQL.
-
HIGH_PRIORITYgives theSELECThigher priority than a statement that updates a table. You should use this only for queries that are very fast and must be done at once. ASELECT HIGH_PRIORITYquery that is issued while the table is locked for reading runs even if there is an update statement waiting for the table to be free.HIGH_PRIORITYcannot be used withSELECTstatements that are part of aUNION. -
STRAIGHT_JOINforces the optimizer to join the tables in the order in which they are listed in theFROMclause. You can use this to speed up a query if the optimizer joins the tables in non-optimal order. See Section 7.2.1, “Optimizing Queries withEXPLAIN”.STRAIGHT_JOINalso can be used in thetable_referenceslist. See Section 13.2.7.1, “JOINSyntax”. -
SQL_BIG_RESULTcan be used withGROUP BYorDISTINCTto tell the optimizer that the result set has many rows. In this case, MySQL directly uses disk-based temporary tables if needed, and prefers sorting to using a temporary table with a key on theGROUP BYelements. -
SQL_BUFFER_RESULTforces the result to be put into a temporary table. This helps MySQL free the table locks early and helps in cases where it takes a long time to send the result set to the client. -
SQL_SMALL_RESULTcan be used withGROUP BYorDISTINCTto tell the optimizer that the result set is small. In this case, MySQL uses fast temporary tables to store the resulting table instead of using sorting. This should not normally be needed. -
SQL_CALC_FOUND_ROWStells MySQL to calculate how many rows there would be in the result set, disregarding anyLIMITclause. The number of rows can then be retrieved withSELECT FOUND_ROWS(). See Section 12.9.3, “Information Functions”. -
SQL_CACHEtells MySQL to store the query result in the query cache if you are using aquery_cache_typevalue of2orDEMAND. For a query that usesUNIONor subqueries, this option effects anySELECTin the query. See Section 5.14, “The MySQL Query Cache”. -
SQL_NO_CACHEtells MySQL not to store the query result in the query cache. See Section 5.14, “The MySQL Query Cache”. For a query that usesUNIONor subqueries, this option effects anySELECTin the query.
MySQL supports the following JOIN syntaxes
for the table_references part of
SELECT statements and multiple-table
DELETE and UPDATE
statements:
table_references:table_reference[,table_reference] ...table_reference:table_factor|join_tabletable_factor:tbl_name[[AS]alias] [{USE|IGNORE|FORCE} INDEX (key_list)] | (table_references) | { OJtable_referenceLEFT OUTER JOINtable_referenceONconditional_expr}join_table:table_reference[INNER | CROSS] JOINtable_factor[join_condition] |table_referenceSTRAIGHT_JOINtable_factor|table_referenceSTRAIGHT_JOINtable_factorONcondition|table_referenceLEFT [OUTER] JOINtable_referencejoin_condition|table_referenceNATURAL [LEFT [OUTER]] JOINtable_factor|table_referenceRIGHT [OUTER] JOINtable_referencejoin_condition|table_referenceNATURAL [RIGHT [OUTER]] JOINtable_factorjoin_condition: ONconditional_expr| USING (column_list)
A table reference is also known as a join expression.
The syntax of table_factor is
extended in comparison with the SQL Standard. The latter
accepts only table_reference, not a
list of them inside a pair of parentheses.
This is a conservative extension if we consider each comma in
a list of table_reference items as
equivalent to an inner join. For example:
SELECT * FROM t1 LEFT JOIN (t2, t3, t4)
ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)
is equivalent to:
SELECT * FROM t1 LEFT JOIN (t2 CROSS JOIN t3 CROSS JOIN t4)
ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)
In MySQL, CROSS JOIN is a syntactic
equivalent to INNER JOIN (they can replace
each other). In standard SQL, they are not equivalent.
INNER JOIN is used with an
ON clause, CROSS JOIN is
used otherwise.
In versions of MySQL prior to 5.0.1, parentheses in
table_references were just omitted
and all join operations were grouped to the left. In general,
parentheses can be ignored in join expressions containing only
inner join operations. As of 5.0.1, nested joins are allowed
(see Section 7.2.10, “Nested Join Optimization”).
Further changes in join processing were made in 5.0.12 to make MySQL more compliant with standard SQL. These charges are described later in this section.
The following list describes general factors to take into account when writing joins.
-
A table reference can be aliased using
tbl_nameASalias_nameortbl_name alias_name:SELECT t1.name, t2.salary FROM employee AS t1 INNER JOIN info AS t2 ON t1.name = t2.name; SELECT t1.name, t2.salary FROM employee t1 INNER JOIN info t2 ON t1.name = t2.name;
-
INNER JOINand,(comma) are semantically equivalent in the absence of a join condition: both produce a Cartesian product between the specified tables (that is, each and every row in the first table is joined to each and every row in the second table).However, the precedence of the comma operator is less than than of
INNER JOIN,CROSS JOIN,LEFT JOIN, and so on. If you mix comma joins with the other join types when there is a join condition, an error of the formUnknown column 'col_name' in 'on clause' may occur. Information about dealing with this problem is given later in this section. -
The
ONconditional is any conditional expression of the form that can be used in aWHEREclause. Generally, you should use theONclause for conditions that specify how to join tables, and theWHEREclause to restrict which rows you want in the result set. -
If there is no matching row for the right table in the
ONorUSINGpart in aLEFT JOIN, a row with all columns set toNULLis used for the right table. You can use this fact to find rows in a table that have no counterpart in another table:SELECT table1.* FROM table1 LEFT JOIN table2 ON table1.id=table2.id WHERE table2.id IS NULL;
This example finds all rows in
table1with anidvalue that is not present intable2(that is, all rows intable1with no corresponding row intable2). This assumes thattable2.idis declaredNOT NULL. See Section 7.2.9, “LEFT JOINandRIGHT JOINOptimization”. -
The
USING(column_list) clause names a list of columns that must exist in both tables. If tablesaandbboth contain columnsc1,c2, andc3, the following join compares corresponding columns from the two tables:a LEFT JOIN b USING (c1,c2,c3)
-
The
NATURAL [LEFT] JOINof two tables is defined to be semantically equivalent to anINNER JOINor aLEFT JOINwith aUSINGclause that names all columns that exist in both tables. -
RIGHT JOINworks analogously toLEFT JOIN. To keep code portable across databases, it is recommended that you useLEFT JOINinstead ofRIGHT JOIN. -
The
{ OJ ... LEFT OUTER JOIN ...}syntax shown in the join syntax description exists only for compatibility with ODBC. The curly braces in the syntax should be written literally; they are not metasyntax as used elsewhere in syntax descriptions. -
STRAIGHT_JOINis identical toJOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order.
Some join examples:
SELECT * FROM table1, table2; SELECT * FROM table1 INNER JOIN table2 ON table1.id=table2.id; SELECT * FROM table1 LEFT JOIN table2 ON table1.id=table2.id; SELECT * FROM table1 LEFT JOIN table2 USING (id); SELECT * FROM table1 LEFT JOIN table2 ON table1.id=table2.id LEFT JOIN table3 ON table2.id=table3.id;
You can provide hints as to which index MySQL should use when
retrieving information from a table. By specifying
USE INDEX
(key_list), you can tell
MySQL to use only one of the possible indexes to find rows in
the table. The alternative syntax IGNORE INDEX
(key_list) can be used to
tell MySQL to not use some particular index. These hints are
useful if EXPLAIN shows that MySQL is using
the wrong index from the list of possible indexes.
You can also use FORCE INDEX, which acts
like USE INDEX
(key_list) but with the
addition that a table scan is assumed to be
very expensive. In other words, a table
scan is used only if there is no way to use one of the given
indexes to find rows in the table.
USE INDEX, IGNORE INDEX,
and FORCE INDEX affect only which indexes
are used when MySQL decides how to find rows in the table and
how to do the join. They do not affect whether an index is
used when resolving an ORDER BY or
GROUP BY.
USE KEY, IGNORE KEY, and
FORCE KEY are synonyms for USE
INDEX, IGNORE INDEX, and
FORCE INDEX.
Examples:
SELECT * FROM table1 USE INDEX (key1,key2) WHERE key1=1 AND key2=2 AND key3=3; SELECT * FROM table1 IGNORE INDEX (key3) WHERE key1=1 AND key2=2 AND key3=3;
Join Processing Changes in MySQL 5.0.12
Beginning with MySQL 5.0.12, natural joins and joins with
USING, including outer join variants, are
processed according to the SQL:2003 standard. The goal was to
align the syntax and semantics of MySQL with respect to
NATURAL JOIN and JOIN ...
USING according to SQL:2003. However, these changes
in join processing can result in different output columns for
some joins. Also, some queries that appeared to work correctly
in older versions must be rewritten to comply with the
standard.
These changes have five main aspects:
-
The way that MySQL determines the result columns of
NATURALorUSINGjoin operations (and thus the result of the entireFROMclause). -
Expansion of
SELECT *andSELECTtbl_name.* into a list of selected columns. -
Resolution of column names in
NATURALorUSINGjoins. -
Transformation of
NATURALorUSINGjoins intoJOIN ... ON. -
Resolution of column names in the
ONcondition of aJOIN ... ON.
The following list provides more detail about several effects of the 5.0.12 change in join processing. The term “previously” means “prior to MySQL 5.0.12.”
-
The columns of a
NATURALjoin or aUSINGjoin may be different from previously. Specifically, redundant output columns no longer appear, and the order of columns forSELECT *expansion may be different from before.Consider this set of statements:
CREATE TABLE t1 (i INT, j INT); CREATE TABLE t2 (k INT, j INT); INSERT INTO t1 VALUES(1,1); INSERT INTO t2 VALUES(1,1); SELECT * FROM t1 NATURAL JOIN t2; SELECT * FROM t1 JOIN t2 USING (j);
Previously, the statements produced this output:
+------+------+------+------+ | i | j | k | j | +------+------+------+------+ | 1 | 1 | 1 | 1 | +------+------+------+------+ +------+------+------+------+ | i | j | k | j | +------+------+------+------+ | 1 | 1 | 1 | 1 | +------+------+------+------+
In the first
SELECTstatement, columnjappears in both tables and thus becomes a join column, so, according to standard SQL, it should appear only once in the output, not twice. Similarly, in the second SELECT statement, columnjis named in theUSINGclause and should appear only once in the output, not twice. But in both cases, the redundant column is not eliminated. Also, the order of the columns is not correct according to standard SQL.Now the statements produce this output:
+------+------+------+ | j | i | k | +------+------+------+ | 1 | 1 | 1 | +------+------+------+ +------+------+------+ | j | i | k | +------+------+------+ | 1 | 1 | 1 | +------+------+------+
The redundant column is eliminated and the column order is correct according to standard SQL:
-
First, coalesced common columns of the two joined tables, in the order in which they occur in the first table
-
Second, columns unique to the first table, in order in which they occur in that table
-
Third, columns unique to the second table, in order in which they occur in that table
The single result column that replaces two common columns is defined via the coalesce operation. That is, fro two
t1.aandt2.athe resulting single join columnais defined asa = COALESCE(t1.a, t2.a), where:COALESCE(x, y) = (CASE WHEN V1 IS NOT NULL THEN V1 ELSE V2 END)
If the join operation is any other join, the result columns of the join consists of the concatenation of all columns of the joined tables. This is the same as previously.
A consequence of the definition of coalesced columns is that, for outer joins, the coalesced column contains the value of the non-
NULLcolumn if one of the two columns is alwaysNULL. If neither or both columns areNULL, both common columns have the same value, so it doesn't matter which one is chosen as the value of the coalesced column. A simple way to interpret this is to consider that a coalesced column of an outer join is represented by the common column of the inner table of aJOIN. Suppose that the tablest1(a,b)andt2(a,c)have the following contents:t1 t2 ---- ---- 1 x 2 z 2 y 3 w
Then:
mysql>
SELECT * FROM t1 NATURAL LEFT JOIN t2;+------+------+------+ | a | b | c | +------+------+------+ | 1 | x | NULL | | 2 | y | y | +------+------+------+Here column
acontains the values oft1.a.mysql>
SELECT * FROM t1 NATURAL RIGHT JOIN t2;+------+------+------+ | a | c | b | +------+------+------+ | 2 | y | y | | 3 | z | NULL | +------+------+------+Here column
acontains the values oft2.a.Compare these results to the otherwise equivalent queries with
JOIN ... ON:mysql>
SELECT * FROM t1 LEFT JOIN t2 ON (t1.a = t2.a);+------+------+------+------+ | a | b | a | c | +------+------+------+------+ | 1 | x | NULL | NULL | | 2 | y | 2 | y | +------+------+------+------+mysql>
SELECT * FROM t1 RIGHT JOIN t2 ON (t1.a = t2.a);+------+------+------+------+ | a | b | a | c | +------+------+------+------+ | 2 | y | 2 | y | | NULL | NULL | 3 | z | +------+------+------+------+ -
-
Previously, a
USINGclause could be rewritten as anONclause that compares corresponding columns. For example, the following two clauses were semantically identical:a LEFT JOIN b USING (c1,c2,c3) a LEFT JOIN b ON a.c1=b.c1 AND a.c2=b.c2 AND a.c3=b.c3
Now the two clauses no longer are quite the same:
-
With respect to determining which rows satisfy the join condition, both joins remain semantically identical.
-
With respect to determining which columns to display for
SELECT *expansion, the two joins are not semantically identical. TheUSINGjoin selects the coalesced value of corresponding columns, whereas theONjoin selects all columns from all tables. For the precedingUSINGjoin,SELECT *selects these values:COALESCE(a.c1,b.c1), COALESCE(a.c2,b.c2), COALESCE(a.c3,b.c3)
For the
ONjoin,SELECT *selects these values:a.c1, a.c2, a.c3, b.c1, b.c2, b.c3
With an inner join,
COALESCE(a.c1,b.c1)is the same as eithera.c1orb.c1because both columns will have the same value. With an outer join (such asLEFT JOIN), one of the two columns can beNULL. That column will be omitted from the result.
-
-
The evaluation of multi-way natural joins differs in a very important way that affects the result of
NATURALorUSINGjoins and that can require query rewriting. Suppose that you have three tablest1(a,b),t2(c,b), andt3(a,c)that each have one row:t1(1,2),t2(10,2), andt3(7,10). Suppose also that you have thisNATURAL JOINon the three tables:SELECT ... FROM t1 NATURAL JOIN t2 NATURAL JOIN t3;
Previously, the left operand of the second join was considered to be
t2, whereas it should be the nested join(t1 NATURAL JOIN t2). As a result, the columns oft3are checked for common columns only int2, and, ift3has common columns witht1, these columns are not used as equi-join columns. Thus, previously, the preceding query was transformed to the following equi-join:SELECT ... FROM t1, t2, t3 WHERE t1.b = t2.b AND t2.c = t3.c;
That join is missing one more equi-join predicate
(t1.a = t3.a). As a result, it produces one row, not the empty result that it should. The correct equivalent query is this:SELECT ... FROM t1, t2, t3 WHERE t1.b = t2.b AND t2.c = t3.c AND t1.a = t3.a;
If you require the same query result in current versions of MySQL as in older versions, rewrite the natural join as the first equi-join.
-
Previously, the comma operator (
,) andJOINboth had the same precedence, so the join expressiont1, t2 JOIN t3was interpreted as((t1, t2) JOIN t3). NowJOINhas higher precedence, so the expression is interpreted as(t1, (t2 JOIN t3)). This change affects statements that use anONclause, because that clause can refer only to columns in the operands of the join, and the change in precedence changes interpretation of what those operands are.Example:
CREATE TABLE t1 (i1 INT, j1 INT); CREATE TABLE t2 (i2 INT, j2 INT); CREATE TABLE t3 (i3 INT, j3 INT); INSERT INTO t1 VALUES(1,1); INSERT INTO t2 VALUES(1,1); INSERT INTO t3 VALUES(1,1); SELECT * FROM t1, t2 JOIN t3 ON (t1.i1 = t3.i3);
Previously, the
SELECTwas legal due to the implicit grouping oft1,t2as(t1,t2). Now theJOINtakes precedence, so the operands for theONclause aret2andt3. Becauset1.i1is not a column in either of the operands, the result is anUnknown column 't1.i1' in 'on clause'error. To allow the join to be processed, group the first two tables explicitly with parentheses so that the operands for theONclause are(t1,t2)andt3:SELECT * FROM (t1, t2) JOIN t3 ON (t1.i1 = t3.i3);
Alternatively, avoid the use of the comma operator and use
JOINinstead:SELECT * FROM t1 JOIN t2 JOIN t3 ON (t1.i1 = t3.i3);
This change also applies to statements that mix the comma operator with
INNER JOIN,CROSS JOIN,LEFT JOIN, andRIGHT JOIN, all of which now have higher precedence than the comma operator. -
Previously, the
ONclause could refer to columns in tables named to its right. Now anONclause can refer only to its operands.Example:
CREATE TABLE t1 (i1 INT); CREATE TABLE t2 (i2 INT); CREATE TABLE t3 (i3 INT); SELECT * FROM t1 JOIN t2 ON (i1 = i3) JOIN t3;
Previously, the
SELECTstatement was legal. Now the statement fails with anUnknown column 'i3' in 'on clause'error becausei3is a column int3, which is not an operand of theONclause. The statement should be rewritten as follows:SELECT * FROM t1 JOIN t2 JOIN t3 ON (i1 = i3);
-
Resolution of column names in
NATURALorUSINGjoins is different than previously. For column names that are outside theFROMclause, MySQL now handles a superset of the queries compared to previously. That is, in cases when MySQL formerly issued an error that some column is ambiguous, the query now is handled correctly. This is due to the fact that MySQL now treats the common columns ofNATURALorUSINGjoins as a single column, so when a query refers to such columns, the query compiler does not consider them as ambiguous.Example:
SELECT * FROM t1 NATURAL JOIN t2 WHERE b > 1;
Previously, this query would produce an error
ERROR 1052 (23000): Column 'b' in where clause is ambiguous. Now the query produces the correct result:+------+------+------+ | b | c | y | +------+------+------+ | 4 | 2 | 3 | +------+------+------+
One extension of MySQL compared to the SQL:2003 standard is that MySQL allows you to qualify the common (coalesced) columns of
NATURALorUSINGjoins (just as previously), while the standard disallows that.
SELECT ... UNION [ALL | DISTINCT] SELECT ... [UNION [ALL | DISTINCT] SELECT ...]
UNION is used to combine the result from
multiple SELECT statements into a single
result set.
The column names from the first SELECT
statement are used as the column names for the results
returned. Selected columns listed in corresponding positions
of each SELECT statement should have the
same data type. (For example, the first column selected by the
first statement should have the same type as the first column
selected by the other statements.)
If the data types of corresponding SELECT
columns do not match, the types and lengths of the columns in
the UNION result take into account the
values retrieved by all of the SELECT
statements. For example, consider the following:
mysql> SELECT REPEAT('a',1) UNION SELECT REPEAT('b',10);
+---------------+
| REPEAT('a',1) |
+---------------+
| a |
| bbbbbbbbbb |
+---------------+
(In some earlier versions of MySQL, only the type and length
from the first SELECT would have been used
and the second row would have been truncated to a length of
1.)
The SELECT statements are normal select
statements, but with the following restrictions:
-
Only the last
SELECTstatement can useINTO OUTFILE. -
HIGH_PRIORITYcannot be used withSELECTstatements that are part of aUNION. If you specify it for the firstSELECT, it has no effect. If you specify it for any subsequentSELECTstatements, a syntax error results.
The default behavior for UNION is that
duplicate rows are removed from the result. The optional
DISTINCT keyword has no effect other than
the default because it also specifies duplicate-row removal.
With the optional ALL keyword,
duplicate-row removal does not occur and the result includes
all matching rows from all the SELECT
statements.
You can mix UNION ALL and UNION
DISTINCT in the same query. Mixed
UNION types are treated such that a
DISTINCT union overrides any
ALL union to its left. A
DISTINCT union can be produced explicitly
by using UNION DISTINCT or implicitly by
using UNION with no following
DISTINCT or ALL keyword.
To use an ORDER BY or
LIMIT clause to sort or limit the entire
UNION result, parenthesize the individual
SELECT statements and place the
ORDER BY or LIMIT after
the last one. The following example uses both clauses:
(SELECT a FROM t1 WHERE a=10 AND B=1) UNION (SELECT a FROM t2 WHERE a=11 AND B=2) ORDER BY a LIMIT 10;
This kind of ORDER BY cannot use column
references that include a table name (that is, names in
tbl_name.col_name
format). Instead, provide a column alias in the first
SELECT statement and refer to the alias in
the ORDER BY. (Alternatively, refer to the
column in the ORDER BY using its column
position. However, use of column positions is deprecated.)
Also, if a column to be sorted is aliased, the ORDER
BY clause must refer to the
alias, not the column name. The first of the following
statements will work, but the second will fail with an
Unknown column 'a' in 'order clause' error:
(SELECT a AS b FROM t) UNION (SELECT ...) ORDER BY b; (SELECT a AS b FROM t) UNION (SELECT ...) ORDER BY a;
To apply ORDER BY or
LIMIT to an individual
SELECT, place the clause inside the
parentheses that enclose the SELECT:
(SELECT a FROM t1 WHERE a=10 AND B=1 ORDER BY a LIMIT 10) UNION (SELECT a FROM t2 WHERE a=11 AND B=2 ORDER BY a LIMIT 10);
Use of ORDER BY for individual
SELECT statements implies nothing about the
order in which the rows appear in the final result because
UNION by default produces an unordered set
of rows. If ORDER BY appears with
LIMIT, it is used to determine the subset
of the selected rows to retrieve for the
SELECT, but does not necessarily affect the
order of those rows in the final UNION
result. If ORDER BY appears without
LIMIT in a SELECT, it is
optimized away because it will have no effect anyway.
To cause rows in a UNION result to consist
of the sets of rows retrieved by each
SELECT one after the other, select an
additional column in each SELECT to use as
a sort column and add an ORDER BY following
the last SELECT:
(SELECT 1 AS sort_col, col1a, col1b, ... FROM t1) UNION (SELECT 2, col2a, col2b, ... FROM t2) ORDER BY sort_col;
To additionally maintain sort order within individual
SELECT results, add a secondary column to
the ORDER BY clause:
(SELECT 1 AS sort_col, col1a, col1b, ... FROM t1) UNION (SELECT 2, col2a, col2b, ... FROM t2) ORDER BY sort_col, col1a;
A subquery is a SELECT statement within
another statement.
Starting with MySQL 4.1, all subquery forms and operations that the SQL standard requires are supported, as well as a few features that are MySQL-specific.
Here is an example of a subquery:
SELECT * FROM t1 WHERE column1 = (SELECT column1 FROM t2);
In this example, SELECT * FROM t1 ... is the
outer query (or outer
statement), and (SELECT column1 FROM
t2) is the subquery. We say that
the subquery is nested within the outer
query, and in fact it is possible to nest subqueries within
other subqueries, to a considerable depth. A subquery must
always appear within parentheses.
The main advantages of subqueries are:
-
They allow queries that are structured so that it is possible to isolate each part of a statement.
-
They provide alternative ways to perform operations that would otherwise require complex joins and unions.
-
They are, in many people's opinion, readable. Indeed, it was the innovation of subqueries that gave people the original idea of calling the early SQL “Structured Query Language.”
Here is an example statement that shows the major points about subquery syntax as specified by the SQL standard and supported in MySQL:
DELETE FROM t1 WHERE s11 > ANY (SELECT COUNT(*) /* no hint */ FROM t2 WHERE NOT EXISTS (SELECT * FROM t3 WHERE ROW(5*t2.s1,77)= (SELECT 50,11*s1 FROM t4 UNION SELECT 50,77 FROM (SELECT * FROM t5) AS t5)));
A subquery can return a scalar (a single value), a single row, a single column, or a table (one or more rows of one or more columns). These are called scalar, column, row, and table subqueries. Subqueries that return a particular kind of result often can be used only in certain contexts, as described in the following sections.
There are few restrictions on the type of statements in which
subqueries can be used. A subquery can contain any of the
keywords or clauses that an ordinary SELECT
can contain: DISTINCT, GROUP
BY, ORDER BY,
LIMIT, joins, index hints,
UNION constructs, comments, functions, and so
on.
One restriction is that a subquery's outer statement must be one
of: SELECT, INSERT,
UPDATE, DELETE,
SET, or DO. Another
restriction is that currently you cannot modify a table and
select from the same table in a subquery. This applies to
statements such as DELETE,
INSERT, REPLACE,
UPDATE, and (because subqueries can be used
in the SET clause) LOAD DATA
INFILE.
A more comprehensive discussion of restrictions on subquery use, including performance issues for certain forms of subquery syntax, is given in Section I.3, “Restrictions on Subqueries”.
In its simplest form, a subquery is a scalar subquery that
returns a single value. A scalar subquery is a simple operand,
and you can use it almost anywhere a single column value or
literal is legal, and you can expect it to have those
characteristics that all operands have: a data type, a length,
an indication whether it can be NULL, and
so on. For example:
CREATE TABLE t1 (s1 INT, s2 CHAR(5) NOT NULL); INSERT INTO t1 VALUES(100, 'abcde'); SELECT (SELECT s2 FROM t1);
The subquery in this SELECT returns a
single value ('abcde') that has a data type
of CHAR, a length of 5, a character set and
collation equal to the defaults in effect at CREATE
TABLE time, and an indication that the value in the
column can be NULL. In fact, almost all
subqueries can be NULL. If the table used
in the example were empty, the value of the subquery would be
NULL.
There are a few contexts in which a scalar subquery cannot be
used. If a statement allows only a literal value, you cannot
use a subquery. For example, LIMIT requires
literal integer arguments, and LOAD DATA
INFILE requires a literal string filename. You
cannot use subqueries to supply these values.
When you see examples in the following sections that contain
the rather spartan construct (SELECT column1 FROM
t1), imagine that your own code contains much more
diverse and complex constructions.
Suppose that we make two tables:
CREATE TABLE t1 (s1 INT); INSERT INTO t1 VALUES (1); CREATE TABLE t2 (s1 INT); INSERT INTO t2 VALUES (2);
Then perform a SELECT:
SELECT (SELECT s1 FROM t2) FROM t1;
The result is 2 because there is a row in
t2 containing a column
s1 that has a value of
2.
A scalar subquery can be part of an expression, but remember the parentheses, even if the subquery is an operand that provides an argument for a function. For example:
SELECT UPPER((SELECT s1 FROM t1)) FROM t2;
The most common use of a subquery is in the form:
non_subquery_operandcomparison_operator(subquery)
Where comparison_operator is one of
these operators:
= > < >= <= <>
For example:
... 'a' = (SELECT column1 FROM t1)
At one time the only legal place for a subquery was on the right side of a comparison, and you might still find some old DBMSs that insist on this.
Here is an example of a common-form subquery comparison that
you cannot do with a join. It finds all the values in table
t1 that are equal to a maximum value in
table t2:
SELECT column1 FROM t1 WHERE column1 = (SELECT MAX(column2) FROM t2);
Here is another example, which again is impossible with a join
because it involves aggregating for one of the tables. It
finds all rows in table t1 containing a
value that occurs twice in a given column:
SELECT * FROM t1 AS t WHERE 2 = (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id);
For a comparison performed with one of these operators, the
subquery must return a scalar, with the exception that
= can be used with row subqueries. See
Section 13.2.8.5, “Row Subqueries”.
Syntax:
operandcomparison_operatorANY (subquery)operandIN (subquery)operandcomparison_operatorSOME (subquery)
The ANY keyword, which must follow a
comparison operator, means “return
TRUE if the comparison is
TRUE for ANY of the
values in the column that the subquery returns.” For
example:
SELECT s1 FROM t1 WHERE s1 > ANY (SELECT s1 FROM t2);
Suppose that there is a row in table t1
containing (10). The expression is
TRUE if table t2
contains (21,14,7) because there is a value
7 in t2 that is less
than 10. The expression is
FALSE if table t2
contains (20,10), or if table
t2 is empty. The expression is
UNKNOWN if table t2
contains (NULL,NULL,NULL).
The word IN is an alias for =
ANY. Thus, these two statements are the same:
SELECT s1 FROM t1 WHERE s1 = ANY (SELECT s1 FROM t2); SELECT s1 FROM t1 WHERE s1 IN (SELECT s1 FROM t2);
However, NOT IN is not an alias for
<> ANY, but for <>
ALL. See Section 13.2.8.4, “Subqueries with ALL”.
The word SOME is an alias for
ANY. Thus, these two statements are the
same:
SELECT s1 FROM t1 WHERE s1 <> ANY (SELECT s1 FROM t2); SELECT s1 FROM t1 WHERE s1 <> SOME (SELECT s1 FROM t2);
Use of the word SOME is rare, but this
example shows why it might be useful. To most people's ears,
the English phrase “a is not equal to any b”
means “there is no b which is equal to a,” but
that is not what is meant by the SQL syntax. The syntax means
“there is some b to which a is not equal.” Using
<> SOME instead helps ensure that
everyone understands the true meaning of the query.
Syntax:
operandcomparison_operatorALL (subquery)
The word ALL, which must follow a
comparison operator, means “return
TRUE if the comparison is
TRUE for ALL of the
values in the column that the subquery returns.” For
example:
SELECT s1 FROM t1 WHERE s1 > ALL (SELECT s1 FROM t2);
Suppose that there is a row in table t1
containing (10). The expression is
TRUE if table t2
contains (-5,0,+5) because
10 is greater than all three values in
t2. The expression is
FALSE if table t2
contains (12,6,NULL,-100) because there is
a single value 12 in table
t2 that is greater than
10. The expression is
unknown (that is,
NULL) if table t2
contains (0,NULL,1).
Finally, if table t2 is empty, the result
is TRUE. So, the following statement is
TRUE when table t2 is
empty:
SELECT * FROM t1 WHERE 1 > ALL (SELECT s1 FROM t2);
But this statement is NULL when table
t2 is empty:
SELECT * FROM t1 WHERE 1 > (SELECT s1 FROM t2);
In addition, the following statement is
NULL when table t2 is
empty:
SELECT * FROM t1 WHERE 1 > ALL (SELECT MAX(s1) FROM t2);
In general, tables containing
NULL values and empty
tables are “edge cases.” When writing
subquery code, always consider whether you have taken those
two possibilities into account.
NOT IN is an alias for <>
ALL. Thus, these two statements are the same:
SELECT s1 FROM t1 WHERE s1 <> ALL (SELECT s1 FROM t2); SELECT s1 FROM t1 WHERE s1 NOT IN (SELECT s1 FROM t2);
The discussion to this point has been of scalar or column subqueries; that is, subqueries that return a single value or a column of values. A row subquery is a subquery variant that returns a single row and can thus return more than one column value. Here are two examples:
SELECT * FROM t1 WHERE (1,2) = (SELECT column1, column2 FROM t2); SELECT * FROM t1 WHERE ROW(1,2) = (SELECT column1, column2 FROM t2);
The queries here are both TRUE if table
t2 has a row where column1 =
1 and column2 = 2.
The expressions (1,2) and
ROW(1,2) are sometimes called
row constructors. The two are
equivalent. They are legal in other contexts as well. For
example, the following two statements are semantically
equivalent (although currently only the second one can be
optimized):
SELECT * FROM t1 WHERE (column1,column2) = (1,1); SELECT * FROM t1 WHERE column1 = 1 AND column2 = 1;
The normal use of row constructors is for comparisons with
subqueries that return two or more columns. For example, the
following query answers the request, “find all rows in
table t1 that also exist in table
t2”:
SELECT column1,column2,column3 FROM t1 WHERE (column1,column2,column3) IN (SELECT column1,column2,column3 FROM t2);
If a subquery returns any rows at all, EXISTS
subquery is
TRUE, and NOT EXISTS
subquery is
FALSE. For example:
SELECT column1 FROM t1 WHERE EXISTS (SELECT * FROM t2);
Traditionally, an EXISTS subquery starts
with SELECT *, but it could begin with
SELECT 5 or SELECT
column1 or anything at all. MySQL ignores the
SELECT list in such a subquery, so it makes
no difference.
For the preceding example, if t2 contains
any rows, even rows with nothing but NULL
values, the EXISTS condition is
TRUE. This is actually an unlikely example
because a [NOT] EXISTS subquery almost
always contains correlations. Here are some more realistic
examples:
-
What kind of store is present in one or more cities?
SELECT DISTINCT store_type FROM stores WHERE EXISTS (SELECT * FROM cities_stores WHERE cities_stores.store_type = stores.store_type); -
What kind of store is present in no cities?
SELECT DISTINCT store_type FROM stores WHERE NOT EXISTS (SELECT * FROM cities_stores WHERE cities_stores.store_type = stores.store_type); -
What kind of store is present in all cities?
SELECT DISTINCT store_type FROM stores s1 WHERE NOT EXISTS ( SELECT * FROM cities WHERE NOT EXISTS ( SELECT * FROM cities_stores WHERE cities_stores.city = cities.city AND cities_stores.store_type = stores.store_type));
The last example is a double-nested NOT
EXISTS query. That is, it has a NOT
EXISTS clause within a NOT EXISTS
clause. Formally, it answers the question “does a city
exist with a store that is not in
Stores”? But it is easier to say
that a nested NOT EXISTS answers the
question “is x
TRUE for all
y?”
A correlated subquery is a subquery that contains a reference to a table that also appears in the outer query. For example:
SELECT * FROM t1 WHERE column1 = ANY (SELECT column1 FROM t2 WHERE t2.column2 = t1.column2);
Notice that the subquery contains a reference to a column of
t1, even though the subquery's
FROM clause does not mention a table
t1. So, MySQL looks outside the subquery,
and finds t1 in the outer query.
Suppose that table t1 contains a row where
column1 = 5 and column2 =
6; meanwhile, table t2 contains a
row where column1 = 5 and column2
= 7. The simple expression ... WHERE
column1 = ANY (SELECT column1 FROM t2) would be
TRUE, but in this example, the
WHERE clause within the subquery is
FALSE (because (5,6) is
not equal to (5,7)), so the subquery as a
whole is FALSE.
Scoping rule: MySQL evaluates from inside to outside. For example:
SELECT column1 FROM t1 AS x WHERE x.column1 = (SELECT column1 FROM t2 AS x WHERE x.column1 = (SELECT column1 FROM t3 WHERE x.column2 = t3.column1));
In this statement, x.column2 must be a
column in table t2 because SELECT
column1 FROM t2 AS x ... renames
t2. It is not a column in table
t1 because SELECT column1 FROM t1
... is an outer query that is farther
out.
For subqueries in HAVING or ORDER
BY clauses, MySQL also looks for column names in the
outer select list.
For certain cases, a correlated subquery is optimized. For example:
valIN (SELECTkey_valFROMtbl_nameWHEREcorrelated_condition)
Otherwise, they are inefficient and likely to be slow. Rewriting the query as a join might improve performance.
Correlated subqueries cannot refer to the results of aggregate functions from the outer query.
Subqueries are legal in a SELECT
statement's FROM clause. The actual syntax
is:
SELECT ... FROM (subquery) [AS]name...
The [AS] name
clause is mandatory, because every table in a
FROM clause must have a name. Any columns
in the subquery select list must
have unique names. You can find this syntax described
elsewhere in this manual, where the term used is
“derived tables.”
For the sake of illustration, assume that you have this table:
CREATE TABLE t1 (s1 INT, s2 CHAR(5), s3 FLOAT);
Here is how to use a subquery in the FROM
clause, using the example table:
INSERT INTO t1 VALUES (1,'1',1.0); INSERT INTO t1 VALUES (2,'2',2.0); SELECT sb1,sb2,sb3 FROM (SELECT s1 AS sb1, s2 AS sb2, s3*2 AS sb3 FROM t1) AS sb WHERE sb1 > 1;
Result: 2, '2', 4.0.
Here is another example: Suppose that you want to know the average of a set of sums for a grouped table. This does not work:
SELECT AVG(SUM(column1)) FROM t1 GROUP BY column1;
However, this query provides the desired information:
SELECT AVG(sum_column1) FROM (SELECT SUM(column1) AS sum_column1 FROM t1 GROUP BY column1) AS t1;
Notice that the column name used within the subquery
(sum_column1) is recognized in the outer
query.
Subqueries in the FROM clause can return a
scalar, column, row, or table. Subqueries in the
FROM clause cannot be correlated
subqueries.
Subqueries in the FROM clause are executed
even for the EXPLAIN statement (that is,
derived temporary tables are built). This occurs because upper
level queries need information about all tables during
optimization phase.
There are some errors that apply only to subqueries. This section describes them.
-
Unsupported subquery syntax:
ERROR 1235 (ER_NOT_SUPPORTED_YET) SQLSTATE = 42000 Message = "This version of MySQL does not yet support 'LIMIT & IN/ALL/ANY/SOME subquery'"
This means that statements of the following form do not work yet:
SELECT * FROM t1 WHERE s1 IN (SELECT s2 FROM t2 ORDER BY s1 LIMIT 1)
-
Incorrect number of columns from subquery:
ERROR 1241 (ER_OPERAND_COL) SQLSTATE = 21000 Message = "Operand should contain 1 column(s)"
This error occurs in cases like this:
SELECT (SELECT column1, column2 FROM t2) FROM t1;
You may use a subquery that returns multiple columns, if the purpose is comparison. See Section 13.2.8.5, “Row Subqueries”. However, in other contexts, the subquery must be a scalar operand.
-
Incorrect number of rows from subquery:
ERROR 1242 (ER_SUBSELECT_NO_1_ROW) SQLSTATE = 21000 Message = "Subquery returns more than 1 row"
This error occurs for statements where the subquery returns more than one row. Consider the following example:
SELECT * FROM t1 WHERE column1 = (SELECT column1 FROM t2);
If
SELECT column1 FROM t2returns just one row, the previous query will work. If the subquery returns more than one row, error 1242 will occur. In that case, the query should be rewritten as:SELECT * FROM t1 WHERE column1 = ANY (SELECT column1 FROM t2);
-
Incorrectly used table in subquery:
Error 1093 (ER_UPDATE_TABLE_USED) SQLSTATE = HY000 Message = "You can't specify target table 'x' for update in FROM clause"
This error occurs in cases such as the following:
UPDATE t1 SET column2 = (SELECT MAX(column1) FROM t1);
You can use a subquery for assignment within an
UPDATEstatement because subqueries are legal inUPDATEandDELETEstatements as well as inSELECTstatements. However, you cannot use the same table (in this case, tablet1) for both the subquery'sFROMclause and the update target.
For transactional storage engines, the failure of a subquery causes the entire statement to fail. For non-transactional storage engines, data modifications made before the error was encountered are preserved.
Development is ongoing, so no optimization tip is reliable for the long term. The following list provides some interesting tricks that you might want to play with:
-
Use subquery clauses that affect the number or order of the rows in the subquery. For example:
SELECT * FROM t1 WHERE t1.column1 IN (SELECT column1 FROM t2 ORDER BY column1); SELECT * FROM t1 WHERE t1.column1 IN (SELECT DISTINCT column1 FROM t2); SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 LIMIT 1);
-
Replace a join with a subquery. For example, try this:
SELECT DISTINCT column1 FROM t1 WHERE t1.column1 IN ( SELECT column1 FROM t2);
Instead of this:
SELECT DISTINCT t1.column1 FROM t1, t2 WHERE t1.column1 = t2.column1;
-
Some subqueries can be transformed to joins for compatibility with older versions of MySQL that do not support subqueries. However, in some cases, converting a subquery to a join may improve performance. See Section 13.2.8.11, “Rewriting Subqueries as Joins for Earlier MySQL Versions”.
-
Move clauses from outside to inside the subquery. For example, use this query:
SELECT * FROM t1 WHERE s1 IN (SELECT s1 FROM t1 UNION ALL SELECT s1 FROM t2);
Instead of this query:
SELECT * FROM t1 WHERE s1 IN (SELECT s1 FROM t1) OR s1 IN (SELECT s1 FROM t2);
For another example, use this query:
SELECT (SELECT column1 + 5 FROM t1) FROM t2;
Instead of this query:
SELECT (SELECT column1 FROM t1) + 5 FROM t2;
-
Use a row subquery instead of a correlated subquery. For example, use this query:
SELECT * FROM t1 WHERE (column1,column2) IN (SELECT column1,column2 FROM t2);
Instead of this query:
SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t2.column1=t1.column1 AND t2.column2=t1.column2);
-
Use
NOT (a = ANY (...))rather thana <> ALL (...). -
Use
x = ANY (table containing (1,2)) rather thanx=1 OR x=2. -
Use
= ANYrather thanEXISTS. -
For uncorrelated subqueries that always return one row,
INis always slower than=. For example, use this query:SELECT * FROM t1 WHERE t1.
col_name= (SELECT a FROM t2 WHERE b =some_const);Instead of this query:
SELECT * FROM t1 WHERE t1.
col_nameIN (SELECT a FROM t2 WHERE b =some_const);
These tricks might cause programs to go faster or slower.
Using MySQL facilities like the BENCHMARK()
function, you can get an idea about what helps in your own
situation. See Section 12.9.3, “Information Functions”.
Some optimizations that MySQL itself makes are:
-
MySQL executes non-correlated subqueries only once. Use
EXPLAINto make sure that a given subquery really is non-correlated. -
MySQL rewrites
IN,ALL,ANY, andSOMEsubqueries in an attempt to take advantage of the possibility that the select-list columns in the subquery are indexed. -
MySQL replaces subqueries of the following form with an index-lookup function, which
EXPLAINdescribes as a special join type (unique_subqueryorindex_subquery):... IN (SELECT
indexed_columnFROMsingle_table...) -
MySQL enhances expressions of the following form with an expression involving
MIN()orMAX(), unlessNULLvalues or empty sets are involved:value{ALL|ANY|SOME} {> | < | >= | <=} (non-correlated subquery)For example, this
WHEREclause:WHERE 5 > ALL (SELECT x FROM t)
might be treated by the optimizer like this:
WHERE 5 > (SELECT MAX(x) FROM t)
There is a chapter titled “How MySQL Transforms Subqueries” in the MySQL Internals Manual, available at http://dev.mysql.com/doc/.
In previous versions of MySQL (prior to MySQL 4.1), only
nested queries of the form INSERT ... SELECT
... and REPLACE ... SELECT ...
were supported. Although this is not the case in MySQL
5.0, it is still true that there are sometimes
other ways to test membership in a set of values. It is also
true that on some occasions, it is not only possible to
rewrite a query without a subquery, but it can be more
efficient to make use of some of these techniques rather than
to use subqueries. One of these is the IN()
construct:
For example, this query:
SELECT * FROM t1 WHERE id IN (SELECT id FROM t2);
Can be rewritten as:
SELECT DISTINCT t1.* FROM t1, t2 WHERE t1.id=t2.id;
The queries:
SELECT * FROM t1 WHERE id NOT IN (SELECT id FROM t2); SELECT * FROM t1 WHERE NOT EXISTS (SELECT id FROM t2 WHERE t1.id=t2.id);
Can be be rewritten using IN():
SELECT table1.* FROM table1 LEFT JOIN table2 ON table1.id=table2.id WHERE table2.id IS NULL;
A LEFT [OUTER] JOIN can be faster than an
equivalent subquery because the server might be able to
optimize it better — a fact that is not specific to
MySQL Server alone. Prior to SQL-92, outer joins did not
exist, so subqueries were the only way to do certain things.
Today, MySQL Server and many other modern database systems
offer a wide range of outer join types.
MySQL Server supports multiple-table DELETE
statements that can be used to efficiently delete rows based
on information from one table or even from many tables at the
same time. Multiple-table UPDATE statements
are also supported.
TRUNCATE [TABLE] tbl_name
TRUNCATE TABLE empties a table completely.
Logically, this is equivalent to a DELETE
statement that deletes all rows, but there are practical
differences under some circumstances.
For InnoDB before version 5.0.3,
TRUNCATE TABLE is mapped to
DELETE, so there is no difference. Starting
with MySQL 5.0.3, fast TRUNCATE TABLE is
available. However, the operation is still mapped to
DELETE if there are foreign key constraints
that reference the table. (When fast truncate is used, it resets
any AUTO_INCREMENT counter. From MySQL 5.0.13
on, the AUTO_INCREMENT counter is reset by
TRUNCATE TABLE, regardless of whether there
is a foreign key constraint.)
For other storage engines, TRUNCATE TABLE
differs from DELETE in the following ways in
MySQL 5.0:
-
Truncate operations drop and re-create the table, which is much faster than deleting rows one by one.
-
Truncate operations are not transaction-safe; an error occurs when attempting one in the course of an active transaction or active table lock.
-
The number of deleted rows is not returned.
-
As long as the table format file
tbl_name.frm is valid, the table can be re-created as an empty table withTRUNCATE TABLE, even if the data or index files have become corrupted. -
The table handler does not remember the last used
AUTO_INCREMENTvalue, but starts counting from the beginning. This is true even forMyISAMandInnoDB, which normally do not reuse sequence values.
Since truncation of a table does not make any use of
DELETE, the TRUNCATE
statement does not invoke ON DELETE triggers.
TRUNCATE TABLE is an Oracle SQL extension
adopted in MySQL.
Single-table syntax:
UPDATE [LOW_PRIORITY] [IGNORE]tbl_nameSETcol_name1=expr1[,col_name2=expr2...] [WHEREwhere_condition] [ORDER BY ...] [LIMITrow_count]
Multiple-table syntax:
UPDATE [LOW_PRIORITY] [IGNORE]table_referencesSETcol_name1=expr1[,col_name2=expr2...] [WHEREwhere_condition]
For the single-table syntax, the UPDATE
statement updates columns of existing rows in
tbl_name with new values. The
SET clause indicates which columns to modify
and the values they should be given. The
WHERE clause, if given, specifies the
conditions that identify which rows to update. With no
WHERE clause, all rows are updated. If the
ORDER BY clause is specified, the rows are
updated in the order that is specified. The
LIMIT clause places a limit on the number of
rows that can be updated.
For the multiple-table syntax, UPDATE updates
rows in each table named in
table_references that satisfy the
conditions. In this case, ORDER BY and
LIMIT cannot be used.
where_condition is an expression that
evaluates to true for each row to be updated. It is specified as
described in Section 13.2.7, “SELECT Syntax”.
The UPDATE statement supports the following
modifiers:
-
If you use the
LOW_PRIORITYkeyword, execution of theUPDATEis delayed until no other clients are reading from the table. -
If you use the
IGNOREkeyword, the update statement does not abort even if errors occur during the update. Rows for which duplicate-key conflicts occur are not updated. Rows for which columns are updated to values that would cause data conversion errors are updated to the closet valid values instead.
If you access a column from tbl_name
in an expression, UPDATE uses the current
value of the column. For example, the following statement sets
the age column to one more than its current
value:
UPDATE persondata SET age=age+1;
Single-table UPDATE assignments are generally
evaluated from left to right. For multiple-table updates, there
is no guarantee that assignments are carried out in any
particular order.
If you set a column to the value it currently has, MySQL notices this and does not update it.
If you update a column that has been declared NOT
NULL by setting to NULL, the column
is set to the default value appropriate for the data type and
the warning count is incremented. The default value is
0 for numeric types, the empty string
('') for string types, and the
“zero” value for date and time types.
UPDATE returns the number of rows that were
actually changed. The mysql_info() C API
function returns the number of rows that were matched and
updated and the number of warnings that occurred during the
UPDATE.
You can use LIMIT
row_count to restrict the
scope of the UPDATE. A
LIMIT clause is a rows-matched restriction.
The statement stops as soon as it has found
row_count rows that satisfy the
WHERE clause, whether or not they actually
were changed.
If an UPDATE statement includes an
ORDER BY clause, the rows are updated in the
order specified by the clause. This can be useful in certain
situations that might otherwise result in an error. Suppose that
a table t contains a column
id that has a unique index. The following
statement could fail with a duplicate-key error, depending on
the order in which rows are updated:
UPDATE t SET id = id + 1;
For example, if the table contains 1 and 2 in the
id column and 1 is updated to 2 before 2 is
updated to 3, an error occurs. To avoid this problem, add an
ORDER BY clause to cause the rows with larger
id values to be updated before those with
smaller values:
UPDATE t SET id = id + 1 ORDER BY id DESC;
You can also perform UPDATE operations
covering multiple tables. However, you cannot use ORDER
BY or LIMIT with a multiple-table
UPDATE. The
table_references clause lists the
tables involved in the join. Its syntax is described in
Section 13.2.7.1, “JOIN Syntax”. Here is an example:
UPDATE items,month SET items.price=month.price WHERE items.id=month.id;
The preceding example shows an inner join that uses the comma
operator, but multiple-table UPDATE
statements can use any type of join allowed in
SELECT statements, such as LEFT
JOIN.
You need the UPDATE privilege only for
columns referenced in a multiple-table UPDATE
that are actually updated. You need only the
SELECT privilege for any columns that are
read but not modified.
If you use a multiple-table UPDATE statement
involving InnoDB tables for which there are
foreign key constraints, the MySQL optimizer might process
tables in an order that differs from that of their parent/child
relationship. In this case, the statement fails and rolls back.
Instead, update a single table and rely on the ON
UPDATE capabilities that InnoDB
provides to cause the other tables to be modified accordingly.
See Section 14.2.6.4, “FOREIGN KEY Constraints”.
Currently, you cannot update a table and select from the same table in a subquery.