Table of Contents
- 7.1. Optimization Overview
- 7.2. Optimizing SELECT Statements and Other Queries
  - 7.2.1. EXPLAIN Syntax (Getting Information About a SELECT)
  - 7.2.2. Estimating Query Performance
  - 7.2.3. Speed of SELECT Queries
  - 7.2.4. How MySQL Optimizes WHERE Clauses
  - 7.2.5. Range Optimization
  - 7.2.6. Index Merge Optimization
  - 7.2.7. How MySQL Optimizes IS NULL
  - 7.2.8. How MySQL Optimizes DISTINCT
  - 7.2.9. How MySQL Optimizes LEFT JOIN and RIGHT JOIN
  - 7.2.10. How MySQL Optimizes ORDER BY
  - 7.2.11. How MySQL Optimizes GROUP BY
  - 7.2.12. How MySQL Optimizes LIMIT Clauses
  - 7.2.13. How to Avoid Full Table Scans
  - 7.2.14. Speed of INSERT Statements
  - 7.2.15. Speed of UPDATE Statements
  - 7.2.16. Speed of DELETE Statements
  - 7.2.17. Other Optimization Tips
- 7.3. Locking Issues
- 7.4. Optimizing Database Structure
  - 7.4.1. Design Choices
  - 7.4.2. Make Your Data as Small as Possible
  - 7.4.3. Column Indexes
  - 7.4.4. Multiple-Column Indexes
  - 7.4.5. How MySQL Uses Indexes
  - 7.4.6. The MyISAM Key Cache
  - 7.4.7. How MySQL Counts Open Tables
  - 7.4.8. How MySQL Opens and Closes Tables
  - 7.4.9. Drawbacks to Creating Many Tables in the Same Database
- 7.5. Optimizing the MySQL Server
  - 7.5.1. System Factors and Startup Parameter Tuning
  - 7.5.2. Tuning Server Parameters
  - 7.5.3. Controlling Query Optimizer Performance
  - 7.5.4. How Compiling and Linking Affects the Speed of MySQL
  - 7.5.5. How MySQL Uses Memory
  - 7.5.6. How MySQL Uses DNS
- 7.6. Disk Issues
Optimization is a complex task because ultimately it requires understanding of the entire system to be optimized. Although it may be possible to perform some local optimizations with little knowledge of your system or application, the more optimal you want your system to become, the more you have to know about it.
This chapter tries to explain and give some examples of different ways to optimize MySQL. Remember, however, that there are always additional ways to make the system even faster, although they may require increasing effort to achieve.
The most important factor in making a system fast is its basic design. You also need to know what kinds of things your system is doing, and what your bottlenecks are.
The most common system bottlenecks are:
- Disk seeks. It takes time for the disk to find a piece of data. With modern disks, the mean time for this is usually lower than 10ms, so we can in theory do about 100 seeks a second. This time improves slowly with new disks and is very hard to optimize for a single table. The way to optimize seek time is to distribute the data onto more than one disk.
- Disk reading and writing. When the disk is at the correct position, we need to read the data. With modern disks, one disk delivers at least 10-20MB/s throughput. This is easier to optimize than seeks because you can read in parallel from multiple disks.
- CPU cycles. When we have the data in main memory, we need to process it to get our result. Having tables that are large compared to the amount of available memory is the most common limiting factor. But with small tables, speed is usually not the problem.
- Memory bandwidth. When the CPU needs more data than can fit in the CPU cache, main memory bandwidth becomes a bottleneck. This is an uncommon bottleneck for most systems, but one to be aware of.
When using the MyISAM
storage engine, MySQL
uses extremely fast table locking that allows multiple readers
or a single writer. The biggest problem with this storage engine
occurs when you have a steady stream of mixed updates and slow
selects on a single table. If this is a problem for certain
tables, you can use another storage engine for them. See
Chapter 14, MySQL Storage Engines and Table Types.
MySQL can work with both transactional and non-transactional
tables. To make it easier to work smoothly with
non-transactional tables (which cannot roll back if something
goes wrong), MySQL has the following rules. Note that these
rules apply only when you are not running in strict mode or
when you use the IGNORE specifier for INSERT or UPDATE.
- All columns have default values. Note that when running in strict SQL mode (including TRADITIONAL SQL mode), you must specify an explicit default value for a NOT NULL column.
- If you insert an inappropriate or out-of-range value into a column, MySQL sets the column to the “best possible value” instead of reporting an error. For numerical values, this is 0, the smallest possible value, or the largest possible value. For strings, this is either the empty string or as much of the string as can be stored in the column. Note that this behavior does not apply when running in strict or TRADITIONAL SQL mode.
- All calculated expressions return a value that can be used instead of signaling an error condition. For example, 1/0 returns NULL. (This behavior can be changed by enabling the ERROR_FOR_DIVISION_BY_ZERO SQL mode.)
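The "errors become values" behavior is easy to see with any permissive SQL engine. The sketch below uses Python's sqlite3 module as a stand-in for a non-strict server; SQLite is not MySQL and its details differ (for instance, it stores the mistyped value rather than clamping it), but it illustrates why application-side validation matters when the server does not reject bad data:

```python
import sqlite3

# In-memory database standing in for a permissive (non-strict) server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Division by zero yields NULL instead of raising an error, much like
# MySQL without the ERROR_FOR_DIVISION_BY_ZERO SQL mode.
result = cur.execute("SELECT 1/0").fetchone()[0]
print(result)  # None

# A permissive server also accepts values the schema did not intend,
# which is why the application should validate before inserting.
cur.execute("CREATE TABLE t (n INTEGER)")
cur.execute("INSERT INTO t VALUES (?)", ("not a number",))
print(cur.execute("SELECT n FROM t").fetchone()[0])  # not a number
```

Nothing above raises an exception; the bad values simply flow through, and only the application can catch them.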
If you are using non-transactional tables, you should not use MySQL to check column content. In general, the safest (and often fastest) way is to let the application ensure that it passes only legal values to the database.
For more information, see Section 1.7.6, “How MySQL Deals with Constraints”, Section 13.2.4, “INSERT Syntax”, and Section 5.3.2, “The Server SQL Mode”.
Because all SQL servers implement different parts of standard SQL, it takes work to write portable SQL applications. It is very easy to achieve portability for very simple selects and inserts, but becomes more difficult the more capabilities you require. If you want an application that is fast with many database systems, it becomes even harder!
To make a complex application portable, you need to determine which SQL servers it must work with, then determine what features those servers support.
All database systems have some weak points. That is, they have different design compromises that lead to different behavior.
You can use the MySQL crash-me program to find functions, types, and limits that you can use with a selection of database servers. crash-me does not check for every possible feature, but it is still reasonably comprehensive, performing about 450 tests.
An example of the type of information crash-me can provide is that you should not use column names that are longer than 18 characters if you want to be able to use Informix or DB2.
The crash-me program and the MySQL benchmarks
are all very database independent. By taking a look at how they
are written, you can get a feeling for what you have to do to
make your own applications database independent. The programs
can be found in the sql-bench
directory of
MySQL source distributions. They are written in Perl and use the
DBI database interface. Use of DBI in itself solves part of the
portability problem because it provides database-independent
access methods.
For crash-me results, visit http://dev.mysql.com/tech-resources/crash-me.php. See http://dev.mysql.com/tech-resources/benchmarks/ for the results from the benchmarks.
If you strive for database independence, you need to get a good
feeling for each SQL server's bottlenecks. For example, MySQL is
very fast in retrieving and updating records for
MyISAM
tables, but has a problem in mixing
slow readers and writers on the same table. Oracle, on the other
hand, has a big problem when you try to access rows that you
have recently updated (until they are flushed to disk).
Transactional databases in general are not very good at
generating summary tables from log tables, because in this case
row locking is almost useless.
To make your application really database independent, you need to define an easily extendable interface through which you manipulate your data. As C++ is available on most systems, it makes sense to use a C++ class-based interface to the databases.
If you use some feature that is specific to a given database
system (such as the REPLACE
statement, which
is specific to MySQL), you should implement the same feature for
other SQL servers by coding an alternative method. Although the
alternative may be slower, it allows the other servers to
perform the same tasks.
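The usual alternative to REPLACE on servers that lack it is a delete-then-insert pair inside one transaction. The sketch below illustrates that pattern in Python, using sqlite3 as the "other SQL server" and a hypothetical `kv` table; the function name and schema are illustrative, not part of any MySQL API:

```python
import sqlite3

def portable_replace(conn, key, value):
    """Emulate MySQL's REPLACE for servers without it: delete any row
    with the same key, then insert the new row. The transaction makes
    the pair atomic, so readers never see the key missing."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM kv WHERE k = ?", (key,))
        conn.execute("INSERT INTO kv (k, v) VALUES (?, ?)", (key, value))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
portable_replace(conn, "a", "1")
portable_replace(conn, "a", "2")  # overwrites without a duplicate-key error
print(conn.execute("SELECT v FROM kv WHERE k = 'a'").fetchone()[0])  # 2
```

As the text notes, this is slower than a native REPLACE (two statements instead of one), but it behaves the same way on any server.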
With MySQL, you can use the /*! */
syntax to
add MySQL-specific keywords to a query. The code inside
/*! ... */ is treated as a comment (and ignored) by
most other SQL servers.
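You can see the "ignored as a comment" half of this behavior by feeding such a query to a non-MySQL engine. Below, Python's sqlite3 module plays that role: SQLite discards everything inside /* */, so the version-comment body never runs, whereas a MySQL server would execute the hidden part (here yielding 2 instead of 1):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# MySQL would parse the body of /*! ... */ and compute 1 + 1;
# SQLite, like most other servers, sees only "SELECT 1".
query = "SELECT 1 /*! + 1 */"
print(conn.execute(query).fetchone()[0])  # 1
```

This is what makes the syntax useful for portable SQL: one query string works everywhere, and only MySQL acts on the extra keywords.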
If high performance is more important than exactness, as in some Web applications, it is possible to create an application layer that caches all results to give you even higher performance. By letting old results expire after a while, you can keep the cache reasonably fresh. This provides a method to handle high load spikes, in which case you can dynamically increase the cache and set the expiration timeout higher until things get back to normal.
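A minimal sketch of such an application-layer cache is shown below. The class and parameter names are illustrative (this is not a MySQL API); the key ideas from the text are there: results are reused until a timeout expires, and the timeout can be raised during load spikes to serve staler results:

```python
import time

class QueryCache:
    """Application-layer result cache with expiration. Raising
    `timeout_seconds` under heavy load trades freshness for speed."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self._store = {}  # query string -> (result, time stored)

    def get(self, query, run_query):
        entry = self._store.get(query)
        now = time.monotonic()
        if entry is not None and now - entry[1] < self.timeout:
            return entry[0]            # still fresh: reuse cached result
        result = run_query(query)      # stale or missing: hit the database
        self._store[query] = (result, now)
        return result

calls = []
def run(query):                        # stand-in for a real database call
    calls.append(query)
    return len(calls)

cache = QueryCache(timeout_seconds=60)
print(cache.get("SELECT COUNT(*) FROM t", run))  # 1 (executed)
print(cache.get("SELECT COUNT(*) FROM t", run))  # 1 (served from cache)
```

The second lookup never reaches `run`, which is exactly the load reduction the text describes.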
In this case, the table creation information should include the initial size of the cache and how often the table should normally be refreshed.
An alternative to implementing an application cache is to use the MySQL query cache. By enabling the query cache, the server handles the details of determining whether a query result can be reused. This simplifies your application. See Section 5.12, “The MySQL Query Cache”.
This section describes an early application for MySQL.
During MySQL's initial development, its features were shaped to fit our largest customer, which handled data warehousing for a couple of the largest retailers in Sweden.
From all stores, we got weekly summaries of all bonus card transactions, and were expected to provide useful information for the store owners to help them find how their advertising campaigns were affecting their own customers.
The volume of data was huge (about seven million summary transactions per month), and we had data for 4-10 years that we needed to present to the users. We got weekly requests from our customers, who wanted instant access to new reports from this data.
We solved this problem by storing all information per month in compressed “transaction tables”. We had a set of simple macros that generated summary tables grouped by different criteria (product group, customer id, store, and so on) from the tables in which the transactions were stored. The reports were Web pages that were dynamically generated by a small Perl script. This script parsed a Web page, executed the SQL statements in it, and inserted the results. We would have used PHP or mod_perl instead, but they were not available at the time.
For graphical data, we wrote a simple tool in C that could process SQL query results and produce GIF images based on those results. This tool also was dynamically executed from the Perl script that parsed the Web pages.
In most cases, a new report could be created simply by copying an existing script and modifying the SQL query that it used. In some cases, we needed to add more columns to an existing summary table or generate a new one. This also was quite simple because we kept all transaction-storage tables on disk. (This amounted to about 50GB of transaction tables and 200GB of other customer data.)
We also let our customers access the summary tables directly with ODBC so that the advanced users could experiment with the data themselves.
This system worked well and we had no problems handling the data with quite modest Sun Ultra SPARCstation hardware (2x200MHz). Eventually the system was migrated to Linux.
This section should contain a technical description of the MySQL
benchmark suite (as well as crash-me), but
that description has not yet been written. However, you can get
a good idea for how the benchmarks work by looking at the code
and results in the sql-bench
directory in
any MySQL source distribution.
This benchmark suite is meant to tell any user what operations a given SQL implementation performs well or poorly.
Note that this benchmark is single-threaded, so it measures the minimum time for the operations performed. We plan to add multi-threaded tests to the benchmark suite in the future.
To use the benchmark suite, the following requirements must be satisfied:
- The benchmark suite is provided with MySQL source distributions. You can either download a released distribution from http://dev.mysql.com/downloads/, or use the current development source tree (see Section 2.8.3, “Installing from the Development Source Tree”).
- The benchmark scripts are written in Perl and use the Perl DBI module to access database servers, so DBI must be installed. You also need the server-specific DBD drivers for each of the servers you want to test. For example, to test MySQL, PostgreSQL, and DB2, you must have the DBD::mysql, DBD::Pg, and DBD::DB2 modules installed. See Section 2.13, “Perl Installation Notes”.
After you obtain a MySQL source distribution, you can find the
benchmark suite located in its sql-bench
directory. To run the benchmark tests, build MySQL, then change
location into the sql-bench
directory and
execute the run-all-tests
script:
shell> cd sql-bench
shell> perl run-all-tests --server=server_name
server_name
is one of the supported
servers. To get a list of all options and supported servers,
invoke this command:
shell> perl run-all-tests --help
The crash-me script also is located in the
sql-bench
directory.
crash-me tries to determine what features a
database supports and what its capabilities and limitations are
by actually running queries. For example, it determines:
- What column types are supported
- How many indexes are supported
- What functions are supported
- How big a query can be
- How big a VARCHAR column can be
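A limit like "how big a VARCHAR column can be" is typically found by probing: try a size, see whether the server accepts it, and binary-search toward the largest accepted value. The real crash-me script is written in Perl and runs actual queries; the sketch below shows the same search strategy in Python against a mocked `supports` predicate (the function name and the 255 limit are illustrative):

```python
def probe_limit(supports, low=1, high=1_000_000):
    """Binary-search the largest size for which supports(n) is True,
    assuming support is monotonic (if n works, everything below works).
    In crash-me, supports(n) would issue a real query, e.g. creating a
    table with a VARCHAR(n) column, and report whether it succeeded."""
    best = 0
    while low <= high:
        mid = (low + high) // 2
        if supports(mid):
            best = mid        # mid works; try something bigger
            low = mid + 1
        else:
            high = mid - 1    # mid fails; try something smaller
    return best

# Mock server that accepts VARCHAR lengths up to 255.
print(probe_limit(lambda n: n <= 255))  # 255
```

Binary search matters here because each probe is a real round trip to the server; about 20 queries suffice to pin down a limit anywhere below a million.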
You can find the results from crash-me for many different database servers at http://dev.mysql.com/tech-resources/crash-me.php. For more information about benchmark results, visit http://dev.mysql.com/tech-resources/benchmarks/.
You should definitely benchmark your application and database to find out where the bottlenecks are. By fixing a bottleneck (or by replacing it with a “dummy” module), you can then easily identify the next bottleneck. Even if the overall performance for your application currently is acceptable, you should at least make a plan for each bottleneck, and decide how to solve it if someday you really need the extra performance.
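The "dummy module" technique above can be as simple as timing a call, swapping in a replacement that returns canned data, and timing again; the difference is the component's cost. A minimal sketch (illustrative names, not part of the MySQL benchmark suite):

```python
import time

def time_call(fn, *args):
    """Time a single call to fn; compare against a dummy replacement
    to measure how much of the total a suspected bottleneck accounts for."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

def real_lookup(n):    # stand-in for a slow database operation
    return sum(i * i for i in range(n))

def dummy_lookup(n):   # "dummy module": canned result, no real work
    return 0

slow = time_call(real_lookup, 500_000)
fast = time_call(dummy_lookup, 500_000)
# The gap between `slow` and `fast` is the cost of the real component;
# if removing it barely changes total time, the bottleneck is elsewhere.
```

Once the dominant component is confirmed, fix it and repeat: the next-largest gap identifies the next bottleneck, exactly as the paragraph above suggests.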
For an example of a portable benchmark program, look at the MySQL benchmark suite. See Section 7.1.4, “The MySQL Benchmark Suite”. You can take any program from this suite and modify it for your needs. By doing this, you can try different solutions to your problem and test which really is fastest for you.
Another free benchmark suite is the Open Source Database Benchmark, available at http://osdb.sourceforge.net/.
It is very common for a problem to occur only when the system is very heavily loaded. We have had many customers who contact us when they have a (tested) system in production and have encountered load problems. In most cases, performance problems turn out to be due to issues of basic database design (for example, table scans are not good under high load) or problems with the operating system or libraries. Most of the time, these problems would be much easier to fix if the systems were not in production.
To avoid problems like this, you should put some effort into benchmarking your whole application under the worst possible load. You can use Super Smack for this. It is available at http://jeremy.zawodny.com/mysql/super-smack/. As the name suggests, it can bring a system to its knees if you ask it, so make sure to use it only on your development systems.