MySQL stores its data dictionary information for tables in
.frm
files in database directories. This is
true for all MySQL storage engines. But every
InnoDB
table also has its own entry in
InnoDB
internal data dictionaries inside the
tablespace. When MySQL drops a table or a database, it has to
delete both an .frm
file or files, and the
corresponding entries inside the InnoDB
data
dictionary. This is the reason why you cannot move
InnoDB
tables between databases simply by
moving the .frm
files.
Every InnoDB
table has a special index called
the clustered index where the data of the rows is stored. If you
define a PRIMARY KEY
on your table, the index
of the primary key is the clustered index.
If you do not define a PRIMARY KEY
for your
table, MySQL picks the first UNIQUE
index that
has only NOT NULL
columns as the primary key
and InnoDB
uses it as the clustered index. If
there is no such index in the table, InnoDB
internally generates a clustered index where the rows are ordered
by the row ID that InnoDB
assigns to the rows
in such a table. The row ID is a 6-byte field that increases
monotonically as new rows are inserted. Thus the rows ordered by
the row ID are physically in the insertion order.
Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads. If a table is large, the clustered index architecture often saves a disk I/O when compared to the traditional solution. (In many databases, the data is traditionally stored on a different page from the index record.)
In InnoDB
, the records in non-clustered indexes
(also called secondary indexes) contain the primary key value for
the row. InnoDB
uses this primary key value to
search for the row from the clustered index. Note that if the
primary key is long, the secondary indexes use more space.
InnoDB
compares CHAR
and
VARCHAR
strings of different lengths such that
the remaining length in the shorter string is treated as if padded
with spaces.
All indexes in InnoDB
are B-trees where the
index records are stored in the leaf pages of the tree. The
default size of an index page is 16KB. When new records are
inserted, InnoDB
tries to leave 1/16 of the
page free for future insertions and updates of the index
records.
If index records are inserted in a sequential order (ascending
or descending), the resulting index pages is about 15/16 full.
If records are inserted in a random order, the pages are from
1/2 to 15/16 full. If the fillfactor of an index page drops
below 1/2, InnoDB
tries to contract the index
tree to free the page.
It is a common situation in a database application that the primary key is a unique identifier and new rows are inserted in the ascending order of the primary key. Thus the insertions to the clustered index do not require random reads from a disk.
On the other hand, secondary indexes are usually non-unique, and
insertions into secondary indexes happen in a relatively random
order. This would cause a lot of random disk I/O operations
without a special mechanism used in InnoDB
.
If an index record should be inserted to a non-unique secondary
index, InnoDB
checks whether the secondary
index page is in the buffer pool. If that is the case,
InnoDB
does the insertion directly to the
index page. If the index page is not found in the buffer pool,
InnoDB
inserts the record to a special insert
buffer structure. The insert buffer is kept so small that it
fits entirely in the buffer pool, and insertions can be done
very fast.
Periodically, the insert buffer is merged into the secondary index trees in the database. Often it is possible to merge several insertions to the same page of the index tree, saving disk I/O operations. It has been measured that the insert buffer can speed up insertions into a table up to 15 times.
The insert buffer merging may continue to happen *after* the inserting transaction has been committed. In fact, it may continue to happen after a server shutdown & restart (see Sección 15.8.1, “Forzar una recuperación”).
The insert buffer merging may take many hours, when many secondary indexes must be updated, and many rows have been inserted. During this time, disk I/O will be increased, which can cause significant slowdown on disk-bound queries. Another significant background I/O operation is the purge thread (see Sección 15.12, “Implementación de multiversión”).
If a table fits almost entirely in main memory, the fastest way
to perform queries on it is to use hash indexes.
InnoDB
has an automatic mechanism that
monitors index searches made to the indexes defined for a table.
If InnoDB
notices that queries could benefit
from building a hash index, it does so automatically.
Note that the hash index is always built based on an existing
B-tree index on the table. InnoDB
can build a
hash index on a prefix of any length of the key defined for the
B-tree, depending on the pattern of searches that
InnoDB
observes for the B-tree index. A hash
index can be partial: It is not required that the whole B-tree
index is cached in the buffer pool. InnoDB
builds hash indexes on demand for those pages of the index that
are often accessed.
In a sense, InnoDB
tailors itself through the
adaptive hash index mechanism to ample main memory, coming
closer to the architecture of main memory databases.
Records in InnoDB
tables have the following
characteristics:
-
Each index record in
InnoDB
contains a header of six bytes. The header is used to link consecutive records together, and also in row-level locking. -
Records in the clustered index contain fields for all user-defined columns. In addition, there is a six-byte field for the transaction ID and a seven-byte field for the roll pointer.
-
If no primary key was defined for a table, each clustered index record also contains a six-byte row ID field.
-
Each secondary index record contains also all the fields defined for the clustered index key.
-
A record contains also a pointer to each field of the record. If the total length of the fields in a record is less than 128 bytes, the pointer is one byte; otherwise, two bytes. The array of these pointers is called the record directory. The area where these pointers point is called the data part of the record.
-
Internally,
InnoDB
stores fixed-length character columns such asCHAR(10)
in a fixed-length format.InnoDB
truncates trailing spaces fromVARCHAR
columns. Note that MySQL may internally convertCHAR
columns toVARCHAR
. See Sección 13.1.5.1, “Cambios tácitos en la especificación de columnas”. -
An SQL
NULL
value reserves 1 or 2 bytes in the record directory. Besides that, an SQLNULL
value reserves zero bytes in the data part of the record if stored in a variable length column. In a fixed-length column, it reserves the fixed length of the column in the data part of the record. The motivation behind reserving the fixed space forNULL
values is that then an update of the column fromNULL
to a non-NULL
value can be done in place and does not cause fragmentation of the index page.