14.5. The BDB (BerkeleyDB) Storage Engine

MySQL 5.0

14.5. The BDB (BerkeleyDB) Storage Engine

Sleepycat Software has provided MySQL with the Berkeley DB transactional storage engine. This storage engine typically is called for short. tables may have a greater chance of surviving crashes and are also capable of and operations on transactions.

Support for the storage engine is included in MySQL source distributions is activated in MySQL-Max binary distributions. The MySQL source distribution comes with a distribution that is patched to make it work with MySQL. You cannot use a non-patched version of with MySQL.

We at MySQL AB work in close cooperation with Sleepycat to keep the quality of the MySQL/BDB interface high. (Even though Berkeley DB is in itself very tested and reliable, the MySQL interface is still considered gamma quality. We continue to improve and optimize it.)

When it comes to support for any problems involving tables, we are committed to helping our users locate the problem and create reproducible test cases. Any such test case is forwarded to Sleepycat, which in turn helps us find and fix the problem. As this is a two-stage operation, any problems with tables may take a little longer for us to fix than for other storage engines. However, we anticipate no significant difficulties with this procedure because the Berkeley DB code itself is used in many applications other than MySQL.

For general information about Berkeley DB, please visit the Sleepycat Web site, http://www.sleepycat.com/.

14.5.1. Operating Systems Supported by BDB

Currently, we know that the storage engine works with the following operating systems:

  • Linux 2.x Intel

  • Sun Solaris (SPARC and x86)

  • FreeBSD 4.x/5.x (x86, sparc64)

  • IBM AIX 4.3.x

  • SCO OpenServer

  • SCO UnixWare 7.1.x

  • Windows NT/2000/XP

The storage engine does not work with the following operating systems:

  • Linux 2.x Alpha

  • Linux 2.x AMD64

  • Linux 2.x IA-64

  • Linux 2.x s390

  • Mac OS X

Note: The preceding lists are not complete. We update them as we receive more information.

If you build MySQL from source with support for tables, but the following error occurs when you start mysqld, it means that the storage engine is not supported for your architecture:

bdb: architecture lacks fast mutexes: applications cannot be threaded
Can't init databases

In this case, you must rebuild MySQL without support or start the server with the option.

14.5.2. Installing BDB

If you have downloaded a binary version of MySQL that includes support for Berkeley DB, simply follow the usual binary distribution installation instructions. (MySQL-Max distributions include support.)

If you build MySQL from source, you can enable support by invoking configure with the option in addition to any other options that you normally use. Download a MySQL 5.0 distribution, change location into its top-level directory, and run this command:

shell> ]

For more information, see Section 5.3, “The mysqld-max Extended MySQL Server”, Section 2.8, “Installing MySQL on Other Unix-Like Systems”, and Section 2.9, “MySQL Installation Using a Source Distribution”.

14.5.3. BDB Startup Options

The following options to mysqld can be used to change the behavior of the storage engine. For more information, see Section 5.2.1, “mysqld Command Options”.

  • The base directory for tables. This should be the same directory that you use for .

  • The lock detection method. The option value should be , , , or .

  • The log file directory.

  • Do not start Berkeley DB in recover mode.

  • Don't synchronously flush the logs. This option is deprecated; use instead (see the description for ).

  • Start Berkeley DB in multi-process mode. (Do not use when initializing Berkeley DB.)

  • The temporary file directory.

  • Disable the storage engine.

  • Synchronously flush the logs. This option is enabled by default. Use to disable it.

If you use the option, MySQL does not initialize the Berkeley DB library and this saves a lot of memory. However, if you use this option, you cannot use tables. If you try to create a table, MySQL uses the default storage engine instead.

Normally, you should start mysqld without the option if you intend to use tables. However, this may cause problems when you try to start mysqld if the log files are corrupted. See Section 2.10.2.3, “Starting and Troubleshooting the MySQL Server”.

With the variable, you can specify the maximum number of locks that can be active on a table. The default is 10,000. You should increase this if errors such as the following occur when you perform long transactions or when mysqld has to examine many rows to execute a query:

bdb: Lock table is out of available locks
Got error 12 from ...

You may also want to change the and variables if you are using large multiple-statement transactions. See Section 5.12.3, “The Binary Log”.

See also Section 5.2.2, “Server System Variables”.

14.5.4. Characteristics of BDB Tables

Each table is stored on disk in two files. The files have names that begin with the table name and have an extension to indicate the file type. An file stores the table format, and a file contains the table data and indexes.

To specify explicitly that you want a table, indicate that with an table option:

CREATE TABLE t (i INT) ENGINE = BDB;

The older term is supported as a synonym for for backward compatibility, but is the preferred term and is deprecated.

is a synonym for in the table option.

The storage engine provides transactional tables. The way you use these tables depends on the autocommit mode:

  • If you are running with autocommit enabled (which is the default), changes to tables are committed immediately and cannot be rolled back.

  • If you are running with autocommit disabled, changes do not become permanent until you execute a statement. Instead of committing, you can execute to forget the changes.

    You can start a transaction with the or statement to suspend autocommit, or with to disable autocommit explicitly.

For more information about transactions, see Section 13.4.1, “, , and Syntax”.

The storage engine has the following characteristics:

  • tables can have up to 31 indexes per table, 16 columns per index, and a maximum key size of 1024 bytes.

  • MySQL requires a primary key in each table so that each row can be uniquely identified. If you don't create one explicitly by declaring a , MySQL creates and maintains a hidden primary key for you. The hidden key has a length of five bytes and is incremented for each insert attempt. This key does not appear in the output of or .

  • The primary key is faster than any other index, because it is stored together with the row data. The other indexes are stored as the key data plus the primary key, so it's important to keep the primary key as short as possible to save disk space and get better speed.

    This behavior is similar to that of , where shorter primary keys save space not only in the primary index but in secondary indexes as well.

  • If all columns that you access in a table are part of the same index or part of the primary key, MySQL can execute the query without having to access the actual row. In a table, this can be done only if the columns are part of the same index.

  • Sequential scanning is slower for tables than for tables because the data in tables is stored in B-trees and not in a separate data file.

  • Key values are not prefix- or suffix-compressed like key values in tables. In other words, key information takes a little more space in tables compared to tables.

  • There are often holes in the table to allow you to insert new rows in the middle of the index tree. This makes tables somewhat larger than tables.

  • is slow for tables, because no row count is maintained in the table.

  • The optimizer needs to know the approximate number of rows in the table. MySQL solves this by counting inserts and maintaining this in a separate segment in each table. If you don't issue a lot of or statements, this number should be accurate enough for the MySQL optimizer. However, MySQL stores the number only on close, so it may be incorrect if the server terminates unexpectedly. It should not be fatal even if this number is not 100% correct. You can update the row count by using or . See Section 13.5.2.1, “ Syntax”, and Section 13.5.2.5, “ Syntax”.

  • Internal locking in tables is done at the page level.

  • works on tables as with other tables. If you do not use , MySQL issues an internal multiple-write lock on the table (a lock that does not block other writers) to ensure that the table is properly locked if another thread issues a table lock.

  • To support transaction rollback, the storage engine maintains log files. For maximum performance, you can use the option to place the logs on a different disk than the one where your databases are located.

  • MySQL performs a checkpoint each time a new log file is started, and removes any log files that are not needed for current transactions. You can also use at any time to checkpoint the Berkeley DB tables.

    For disaster recovery, you should use table backups plus MySQL's binary log. See Section 5.10.1, “Database Backups”.

    Warning: If you delete old log files that are still in use, is not able to do recovery at all and you may lose data if something goes wrong.

  • Applications must always be prepared to handle cases where any change of a table may cause an automatic rollback and any read may fail with a deadlock error.

  • If you get a full disk with a table, you get an error (probably error 28) and the transaction should roll back. This contrasts with tables, for which mysqld waits for sufficient free disk space before continuing.

14.5.5. Restrictions on BDB Tables

The following list indicates restrictions that you must observe when using tables:

  • Each table stores in its file the path to the file as it was created. This is done to enable detection of locks in a multi-user environment that supports symlinks. As a consequence of this, it is not possible to move table files from one database directory to another.

  • When making backups of tables, you must either use mysqldump or else make a backup that includes the files for each table (the and files) as well as the log files. The storage engine stores unfinished transactions in its log files and requires them to be present when mysqld starts. The logs are the files in the data directory with names of the form (ten digits).

  • If a column that allows values has a unique index, only a single value is allowed. This differs from other storage engines, which allow multiple values in unique indexes.

14.5.6. Errors That May Occur When Using BDB Tables

  • If the following error occurs when you start mysqld after upgrading, it means that the current version of doesn't support the old log file format:

    bdb:  Ignoring log file: .../log.:
    unsupported log version #
    

    In this case, you must delete all logs from your data directory (the files that have names of the form ) and restart mysqld. We also recommend that you then use mysqldump --opt to dump your tables, drop the tables, and restore them from the dump file.

  • If autocommit mode is disabled and you drop a table that is referenced in another transaction, you may get error messages of the following form in your MySQL error log:

    001119 23:43:56  bdb:  Missing log fileid entry
    001119 23:43:56  bdb:  txn_abort: Log undo failed for LSN:
                           1 3644744: Invalid
    

    This is not fatal, but the fix is not trivial. Until the problem is fixed, we recommend that you not drop tables except while autocommit mode is enabled.