InnoDB
uses simulated asynchronous disk I/O:
InnoDB
creates a number of threads to take
care of I/O operations, such as read-ahead.
There are two read-ahead heuristics in
InnoDB
:
-
In sequential read-ahead, if
InnoDB
notices that the access pattern to a segment in the tablespace is sequential, it posts in advance a batch of reads of database pages to the I/O system. -
In random read-ahead, if
InnoDB
notices that some area in a tablespace seems to be in the process of being fully read into the buffer pool, it posts the remaining reads to the I/O system.
InnoDB
uses a novel file flush technique
called doublewrite . It adds safety to
recovery following an operating system crash or a power outage,
and improves performance on most varieties of Unix by reducing
the need for fsync()
operations.
Doublewrite means that before writing pages to a data file,
InnoDB
first writes them to a contiguous
tablespace area called the doublewrite buffer. Only after the
write and the flush to the doublewrite buffer has completed does
InnoDB
write the pages to their proper
positions in the data file. If the operating system crashes in
the middle of a page write, InnoDB
can later
find a good copy of the page from the doublewrite buffer during
recovery.
In MySQL 5.0, you can also use raw disk partitions as tablespace data files. By using a raw disk, you can perform non-buffered I/O on Windows and on some Unix systems without filesystem overhead, which may improve performance.
When you create a new data file, you must put the keyword
newraw
immediately after the data file size
in innodb_data_file_path
. The partition must
be at least as large as the size that you specify. Note that 1MB
in InnoDB
is 1024 * 1024 bytes, whereas 1MB
usually means 1,000,000 bytes in disk specifications.
[mysqld] innodb_data_home_dir= innodb_data_file_path=/dev/hdd1:3Gnewraw;/dev/hdd2:2Gnewraw
The next time you start the server, InnoDB
notices the newraw
keyword and initializes
the new partition. However, do not create or change any
InnoDB
tables yet. Otherwise, when you next
restart the server, InnoDB
reinitializes the
partition and your changes are lost. (Starting from 3.23.44, as
a safety measure InnoDB
prevents users from
modifying data when any partition with newraw
is specified.)
After InnoDB
has initialized the new
partition, stop the server, change newraw
in
the data file specification to raw
:
[mysqld] innodb_data_home_dir= innodb_data_file_path=/dev/hdd1:5Graw;/dev/hdd2:2Graw
Then restart the server and InnoDB
allows
changes to be made.
On Windows, you can allocate a disk partition as a data file like this:
[mysqld] innodb_data_home_dir= innodb_data_file_path=//./D::10Gnewraw
The //./
corresponds to the Windows syntax
of \\.\
for accessing physical drives.
When you use raw disk partitions, be sure that they have permissions that allow read and write access by the account used for running the MySQL server.
The data files you define in the configuration file form the
tablespace of InnoDB
. The files are simply
concatenated to form the tablespace. There is no striping in
use. Currently you cannot define where in the tablespace your
tables are allocated. However, in a newly created tablespace,
InnoDB
allocates space starting from the
first data file.
The tablespace consists of database pages with a default size of
16KB. The pages are grouped into extents of 64 consecutive
pages. The “files” inside a tablespace are called
segments in InnoDB
.
The term “rollback segment” is somewhat confusing
because it actually contains many tablespace segments.
Two segments are allocated for each index in
InnoDB
. One is for non-leaf nodes of the
B-tree, the other is for the leaf nodes. The idea here is to
achieve better sequentiality for the leaf nodes, which contain
the data.
When a segment grows inside the tablespace,
InnoDB
allocates the first 32 pages to it
individually. After that InnoDB
starts to
allocate whole extents to the segment. InnoDB
can add to a large segment up to 4 extents at a time to ensure
good sequentiality of data.
Some pages in the tablespace contain bitmaps of other pages, and
therefore a few extents in an InnoDB
tablespace cannot be allocated to segments as a whole, but only
as individual pages.
When you ask for available free space in the tablespace by
issuing a SHOW TABLE STATUS
,
InnoDB
reports the extents that are
definitely free in the tablespace. InnoDB
always reserves some extents for clean-up and other internal
purposes; these reserved extents are not included in the free
space.
When you delete data from a table, InnoDB
contracts the corresponding B-tree indexes. It depends on the
pattern of deletes whether that frees individual pages or
extents to the tablespace, so that the freed space becomes
available for other users. Dropping a table or deleting all rows
from it is guaranteed to release the space to other users, but
remember that deleted rows are physically removed only in an
(automatic) purge operation after they are no longer needed for
transaction rollbacks or consistent reads.
If there are random insertions into or deletions from the indexes of a table, the indexes may become fragmented. Fragmentation means that the physical ordering of the index pages on the disk is not close to the index ordering of the records on the pages, or that there are many unused pages in the 64-page blocks that were allocated to the index.
A symptom of fragmentation is that a table takes more space than it 'should take'. How much exactly is that, is difficult to determine. All InnoDB data and indexes are stored in B-trees, and their fillfactor may vary 50 % to 100 %. Another symptom of fragmentation is that a table scan:
SELECT COUNT(*) FROM t WHERE a_non_indexed_column <> 12345;
takes more time than it should. (In the above query, we are “fooling” the SQL optimizer into scanning the clustered index, not a secondary index.) Most disks can read 10 - 50 MB/s. This can be used to estimate how fast a table scan should run.
It can speed up index scans if you periodically perform a
“null” ALTER TABLE
operation:
ALTER TABLE tbl_name
ENGINE=INNODB
That causes MySQL to rebuild the table. Another way to perform a defragmention operation is to use mysqldump to dump the table to a text file, drop the table, and reload it from the dump file.
If the insertions to an index are always ascending and records
are deleted only from the end, the InnoDB
file space management algorithm guarantees that fragmentation of
the index does not occur.