-
Known bug to be fixed later: If you compare a
NULLvalue to a subquery usingALL,ANY, orSOME, and the subquery returns an empty result, the comparison might evaluate to the non-standard result ofNULLrather than toTRUEorFALSE. -
A subquery's outer statement can be any one of:
SELECT,INSERT,UPDATE,DELETE,SET, orDO. -
Subquery optimization for
INis not as effective as for the=operator or forIN(value_list) constructs.A typical case for poor
INsubquery performance is when the subquery returns a small number of rows but the outer query returns a large number of rows to be compared to the subquery result.The problem is that, for a statement that uses an
INsubquery, the optimizer rewrites it as a correlated subquery. Consider the following statement that uses an uncorrelated subquery:SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);
The optimizer rewrites the statement to a correlated subquery:
SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
If the inner and outer queries return
MandNrows, respectively, the execution time becomes on the order ofO(M×N), rather thanO(M+N) as it would be for an uncorrelated subquery.An implication is that an
INsubquery can be much slower than a query written using anIN(value_list) construct that lists the same values that the subquery would return. -
In general, you cannot modify a table and select from the same table in a subquery. For example, this limitation applies to statements of the following forms:
DELETE FROM t WHERE ... (SELECT ... FROM t ...); UPDATE t ... WHERE col = (SELECT ... FROM t ...); {INSERT|REPLACE} INTO t (SELECT ... FROM t ...);Exception: The preceding prohibition does not apply if you are using a subquery for the modified table in the
FROMclause. Example:UPDATE t ... WHERE col = (SELECT (SELECT ... FROM t...) AS _t ...);
Here the prohibition does not apply because the result from a subquery in the
FROMclause is stored as a temporary table, so the relevant rows inthave already been selected by the time the update tottakes place. -
Row comparison operations are only partially supported:
-
For
exprIN (subquery),exprcan be ann-tuple (specified via row constructor syntax) and the subquery can return rows ofn-tuples. -
For
exprop{ALL|ANY|SOME} (subquery),exprmust be a scalar value and the subquery must be a column subquery; it cannot return multiple-column rows.
In other words, for a subquery that returns rows of
n-tuples, this is supported:(
val_1, ...,val_n) IN (subquery)But this is not supported:
(
val_1, ...,val_n)op{ALL|ANY|SOME} (subquery)The reason for supporting row comparisons for
INbut not for the others is thatINis implemented by rewriting it as a sequence of=comparisons andANDoperations. This approach cannot be used forALL,ANY, orSOME. -
-
Row constructors are not well optimized. The following two expressions are equivalent, but only the second can be optimized:
(col1, col2, ...) = (val1, val2, ...) col1 = val1 AND col2 = val2 AND ...
-
Subqueries in the
FROMclause cannot be correlated subqueries. They are materialized (executed to produce a result set) before evaluating the outer query, so they cannot be evaluated per row of the outer query. -
The optimizer is more mature for joins than for subqueries, so in many cases a statement that uses a subquery can be executed more efficiently if you rewrite it as a join.
An exception occurs for the case where an
INsubquery can be rewritten as aSELECT DISTINCTjoin. Example:SELECT col FROM t1 WHERE id_col IN (SELECT id_col2 FROM t2 WHERE
condition);That statement can be rewritten as follows:
SELECT DISTINCT col FROM t1, t2 WHERE t1.id_col = t2.id_col AND
condition;But in this case, the join requires an extra
DISTINCToperation and is not more efficient than the subquery. -
Possible future optimization: MySQL does not rewrite the join order for subquery evaluation. In some cases, a subquery could be executed more efficiently if MySQL rewrote it as a join. This would give the optimizer a chance to choose between more execution plans. For example, it could decide whether to read one table or the other first.
Example:
SELECT a FROM outer_table AS ot WHERE a IN (SELECT a FROM inner_table AS it WHERE ot.b = it.b);
For that query, MySQL always scans
outer_tablefirst and then executes the subquery oninner_tablefor each row. Ifouter_tablehas a lot of rows andinner_tablehas few rows, the query probably will not be as fast as it could be.The preceding query could be rewritten like this:
SELECT a FROM outer_table AS ot, inner_table AS it WHERE ot.a = it.a AND ot.b = it.b;
In this case, we can scan the small table (
inner_table) and look up rows inouter_table, which will be fast if there is an index on(ot.a,ot.b). -
Possible future optimization: A correlated subquery is evaluated for each row of the outer query. A better approach is that if the outer row values do not change from the previous row, do not evaluate the subquery again. Instead, use its previous result.
-
Possible future optimization: A subquery in the
FROMclause is evaluated by materializing the result into a temporary table, and this table does not use indexes. This does not allow the use of indexes in comparison with other tables in the query, although that might be useful. -
Possible future optimization: If a subquery in the
FROMclause resembles a view to which the merge algorithm can be applied, rewrite the query and apply the merge algorithm so that indexes can be used. The following statement contains such a subquery:SELECT * FROM (SELECT * FROM t1 WHERE t1.t1_col) AS _t1, t2 WHERE t2.t2_col;
The statement can be rewritten as a join like this:
SELECT * FROM t1, t2 WHERE t1.t1_col AND t2.t2_col;
This type of rewriting would provide two benefits:
-
It avoids the use of a temporary table for which no indexes can be used. In the rewritten query, the optimizer can use indexes on
t1. -
It gives the optimizer more freedom to choose between different execution plans. For example, rewriting the query as a join allows the optimizer to use
t1ort2first.
-
-
Possible future optimization: For
IN,= ANY,<> ANY,= ALL, and<> ALLwith non-correlated subqueries, use an in-memory hash for a result result or a temporary table with an index for larger results. Example:SELECT a FROM big_table AS bt WHERE non_key_field IN (SELECT non_key_field FROM
tableWHEREcondition)In this case, we could create a temporary table:
CREATE TABLE t (key (non_key_field)) (SELECT non_key_field FROM
tableWHEREcondition)Then, for each row in
big_table, do a key lookup intbased onbt.non_key_field.