15.3. Simple Multi-Computer How-To

MySQL 5.0

15.3. Simple Multi-Computer How-To

This section is a “How-To” that describes the basics for how to plan, install, configure, and run a MySQL Cluster. Whereas the examples in Section 15.4, “MySQL Cluster Configuration” provide more in-depth information on a variety of clustering options and configuration, the result of following the guidelines and procedures outlined here should be a usable MySQL Cluster which meets the minimum requirements for availability and safeguarding of data.

This section covers hardware and software requirements; networking issues; installation of MySQL Cluster; configuration issues; starting, stopping, and restarting the cluster; loading of a sample database; and performing queries.

Basic Assumptions

This How-To makes the following assumptions:

  1. The cluster setup has four nodes, each on a separate host, and each with a fixed network address on a typical Ethernet as shown here:

    Node IP Address
    Management (MGM) node 192.168.0.10
    MySQL server (SQL) node 192.168.0.20
    Data (NDBD) node "A" 192.168.0.30
    Data (NDBD) node "B" 192.168.0.40

    This may be made clearer in the following diagram:

    MySQL Cluster Multi-Computer             Setup

    Note: In the interest of simplicity (and reliability), this How-To uses only numeric IP addresses. However, if DNS resolution is available on your network, it is possible to use hostnames in lieu of IP addresses in configuring Cluster. Alternatively, you can use the file or your operating system's equivalent for providing a means to do host lookup if such is available.

  2. Each host in our scenario is an Intel-based desktop PC running a common, generic Linux distribution installed to disk in a standard configuration, and running no unnecessary services. The core OS with standard TCP/IP networking capabilities should be sufficient. Also for the sake of simplicity, we also assume that the filesystems on all hosts are set up identically. In the event that they are not, you will need to adapt these instructions accordingly.

  3. Standard 100 Mbps or 1 gigabit Ethernet cards are installed on each machine, along with the proper drivers for the cards, and that all four hosts are connected via a standard-issue Ethernet networking appliance such as a switch. (All machines should use network cards with the same throughout. That is, all four machines in the cluster should have 100 Mbps cards or all four machines should have 1 Gbps cards.) MySQL Cluster will work in a 100 Mbps network; however, gigabit Ethernet will provide better performance.

    Note that MySQL Cluster is not intended for use in a network for which throughput is less than 100 Mbps. For this reason (among others), attempting to run a MySQL Cluster over a public network such as the Internet is not likely to be successful, and is not recommended.

  4. For our sample data, we will use the database which is available for download from the MySQL AB Web site. As this database takes up a relatively small amount of space, we assume that each machine has 256MB RAM, which should be sufficient for running the operating system, host NDB process, and (for the data nodes) for storing the database.

Although we refer to a Linux operating system in this How-To, the instructions and procedures that we provide here should be easily adaptable to either Solaris or Mac OS X. We also assume that you already know how to perform a minimal installation and configuration of the operating system with networking capability, or that you are able to obtain assistance in this elsewhere if needed.

We discuss MySQL Cluster hardware, software, and networking requirements in somewhat greater detail in the next section. (See Section 15.3.1, “Hardware, Software, and Networking”.)

15.3.1. Hardware, Software, and Networking

One of the strengths of MySQL Cluster is that it can be run on commodity hardware and has no unusual requirements in this regard, other than for large amounts of RAM, due to the fact that all live data storage is done in memory. (Note that this is subject to change, and that we intend to implement disk-based storage in a future MySQL Cluster release.) Naturally, multiple and faster CPUs will enhance performance. Memory requirements for Cluster processes are relatively small.

The software requirements for Cluster are also modest. Host operating systems do not require any unusual modules, services, applications, or configuration to support MySQL Cluster. For Mac OS X or Solaris, the standard installation is sufficient. For Linux, a standard, “out of the box” installation should be all that is necessary. The MySQL software requirements are simple: all that is needed is a production release of MySQL-max 5.0; you must use the version of MySQL to have Cluster support. (See Section 5.3, “The mysqld-max Extended MySQL Server”.) It is not necessary to compile MySQL yourself merely to be able to use Cluster. In this How-To, we assume that you are using the binary appropriate to your Linux, Solaris, or Mac OS X operating system, available via the MySQL software downloads page at http://dev.mysql.com/downloads/.

For inter-node communication, Cluster supports TCP/IP networking in any standard topology, and the minimum expected for each host is a standard 100 Mbps Ethernet card, plus a switch, hub, or router to provide network connectivity for the cluster as a whole. We strongly recommend that a MySQL Cluster be run on its own subnet which is not shared with non-Cluster machines for the following reasons:

  • Security: Communications between Cluster nodes are not encrypted or shielded in any way. The only means of protecting transmissions within a MySQL Cluster is to run your Cluster on a protected network. If you intend to use MySQL Cluster for Web applications, the cluster should definitely reside behind your firewall and not in your network's De-Militarized Zone (DMZ) or elsewhere.

  • Efficiency: Setting up a MySQL Cluster on a private or protected network allows the cluster to make exclusive use of bandwidth between cluster hosts. Using a separate switch for your MySQL Cluster not only helps protect against unauthorized access to Cluster data, it also ensures that Cluster nodes are shielded from interference caused by transmissions between other computers on the network. For enhanced reliability, you can use dual switches and dual cards to remove the network as a single point of failure; many device drivers support failover for such communication links.

It is also possible to use the high-speed Scalable Coherent Interface (SCI) with MySQL Cluster, but this is not a requirement. See Section 15.9, “Using High-Speed Interconnects with MySQL Cluster”, for more about this protocol and its use with MySQL Cluster.

15.3.2. Multi-Computer Installation

Each MySQL Cluster host computer running data or SQL nodes must have installed on it a MySQL-max binary. For management nodes, it is not necessary to install the MySQL server binary, but you do have to install the MGM server daemon and client binaries (ndb_mgmd and ndb_mgm, respectively). This section covers the steps necessary to install the correct binaries for each type of Cluster node.

MySQL AB provides precompiled binaries that support Cluster, and there is generally no need to compile these yourself. Therefore, the first step in the installation process for each cluster host is to download the file from the MySQL downloads area. We assume that you have placed it in each machine's directory. (If you do require a custom binary, see Section 2.9.3, “Installing from the Development Source Tree”.)

RPMs are also available for both 32-bit and 64-bit Linux platforms; as of MySQL 4.1.10a, the binaries installed by the RPMs support the storage engine. If you choose to use these rather than the binary files, be aware that you must install both the and packages on all machines that are to host cluster nodes. (See Section 2.4, “Installing MySQL on Linux”, for more information about installing MySQL using the RPMs.) After installing from RPM, you will still need to configure the cluster as discussed in Section 15.3.3, “Multi-Computer Configuration”.

Note: After completing the installation, do not yet start any of the binaries. We will show you how to do so following the configuration of all nodes.

Storage and SQL Node Installation

On each of the three machines designated to host storage or SQL nodes, perform the following steps as the system user:

  1. Check your and files (or use whatever tools are provided by your operating system for manging users and groups) to see whether there is already a group and user on the system. Some OS distributions create these as part of the operating system installation process. If they are not already present, create a new user group, and then add a user to this group:

    shell> 
    shell> 
    

    The syntax for useradd and groupadd may differ slightly on different versions of Unix, or they may have different names such as adduser and addgroup.

  2. Change location to the directory containing the downloaded file, unpack the archive, and create a symlink to the directory named . Note that the actual file and directory names will vary according to the MySQL version number.

    shell> 
    shell> 
    shell> 
    
  3. Change location to the directory and run the supplied script for creating the system databases:

    shell> 
    shell> 
    
  4. Set the necessary permissions for the MySQL server and data directories:

    shell> 
    shell> 
    shell> 
    

    Note that the data directory on each machine hosting a data node is . We will use this piece of information when we configure the management node. (See Section 15.3.3, “Multi-Computer Configuration”.)

  5. Copy the MySQL startup script to the appropriate directory, make it executable, and set it to start when the operating system is booted up:

    shell> 
    shell> 
    shell> 
    

    (The startup scripts directory may vary depending on your operating system and version — for example, in some Linux distributions, it is .)

    Here we use Red Hat's chkconfig for creating links to the startup scripts; use whatever means is appropriate for this purpose on your operating system and distribution, such as update-rc.d on Debian.

Remember that the preceding steps must be performed separately for each machine on which a storage or SQL node is to reside.

Management Node Installation

Installation for the management (MGM) node does not require installation of the mysqld binary. Only the binaries for the MGM server and client are required, which can be found in the downloaded archive. Again, we assume that you have placed this file in .

As system (that is, after using sudo, su root, or your system's equivalent for temporarily assuming the system administrator account's privileges), perform the following steps to install ndb_mgmd and ndb_mgm on the Cluster management node host:

  1. Change location to the directory, and extract the ndb_mgm and ndb_mgmd from the archive into a suitable directory such as :

    shell> 
    shell> 
    shell> 
    shell> 
    

    (You can safely delete the directory created by unpacking the downloaded archive, and the files it contains, from once ndb_mgm and ndb_mgmd have been copied to the executables directory.)

  2. Change location to the directory into which you copied the files, and then make both of them executable:

    shell> 
    shell> 
    

In Section 15.3.3, “Multi-Computer Configuration”, we will create and write configuration files for all of the nodes in our example Cluster.

15.3.3. Multi-Computer Configuration

For our four-node, four-host MySQL Cluster, we will need to write four configuration files, one per node/host.

  • Each data node or SQL node requires a file that provides two pieces of information: a connectstring telling the node where to find the MGM node, and a line telling the MySQL server on this host (the machine hosting the data node) to run in NDB mode.

    For more information on connectstrings, see Section 15.4.4.2, “The Cluster .

  • The management node needs a file telling it how many replicas to maintain, how much memory to allocate for data and indexes on each data node, where to find the data nodes, where to save data to disk on each data node, and where to find any SQL nodes.

Configuring the Storage and SQL Nodes

The file needed for the data nodes is fairly simple. The configuration file should be located in the directory and can be edited using any text editor. (Create the file if it does not exist.) For example:

shell> 

We show vi being used here to create the file, but any text editor should work just as well.

For each data node and SQL node in our example setup, should look like this:

# Options for mysqld process:
[MYSQLD]                        
ndbcluster                      # run NDB engine
ndb-connectstring=192.168.0.10  # location of MGM node

# Options for ndbd process:
[MYSQL_CLUSTER]                 
ndb-connectstring=192.168.0.10  # location of MGM node

After entering the preceding information, save this file and exit the text editor. Do this for the machines hosting data node “A”, data node “B”, and the SQL node.

Important: Once you have started a mysqld process with the and parameters in the in the file as shown previously, you cannot execute any or statements without having actually started the cluster. Otherwise, these statements will fail with an error. This is by design.

Configuring the Management Node

The first step in configuring the MGM node is to create the directory in which the configuration file can be found and then to create the file itself. For example (running as ):

shell> 
shell> 
shell> 

For our representative setup, the file should read as follows:

# Options affecting ndbd processes on all data nodes:
[NDBD DEFAULT]    
NoOfReplicas=2    # Number of replicas
DataMemory=80M    # How much memory to allocate for data storage
IndexMemory=18M   # How much memory to allocate for index storage
                  # For DataMemory and IndexMemory, we have used the
                  # default values. Since the "world" database takes up
                  # only about 500KB, this should be more than enough for
                  # this example Cluster setup.

# TCP/IP options:
[TCP DEFAULT]     
portnumber=2202   # This the default; however, you can use any
                  # port that is free for all the hosts in cluster
                  # Note: It is recommended beginning with MySQL 5.0 that
                  # you do not specify the portnumber at all and simply allow
                  # the default value to be used instead

# Management process options:
[NDB_MGMD]                      
hostname=192.168.0.10           # Hostname or IP address of MGM node
datadir=/var/lib/mysql-cluster  # Directory for MGM node logfiles

# Options for data node "A":
[NDBD]                          
                                # (one [NDBD] section per data node)
hostname=192.168.0.30           # Hostname or IP address
datadir=/usr/local/mysql/data   # Directory for this data node's datafiles

# Options for data node "B":
[NDBD]                          
hostname=192.168.0.40           # Hostname or IP address
datadir=/usr/local/mysql/data   # Directory for this data node's datafiles

# SQL node options:
[MYSQLD]                        
hostname=192.168.0.20           # Hostname or IP address
                                # (additional mysqld connections can be
                                # specified for this node for various
                                # purposes such as running ndb_restore)

(Note: The database can be downloaded from http://dev.mysql.com/doc/, where it can be found listed under “Examples.”)

After all the configuration files have been created and these minimal options have been specified, you are ready to proceed with starting the cluster and verifying that all processes are running. We discuss how this is done in Section 15.3.4, “Initial Startup”.

For more detailed information about the available MySQL Cluster configuration parameters and their uses, see Section 15.4.4, “Configuration File”, and Section 15.4, “MySQL Cluster Configuration”. For configuration of MySQL Cluster as relates to making backups, see Section 15.8.4, “Configuration for Cluster Backup”.

Note: The default port for Cluster management nodes is 1186; the default port for data nodes is 2202. Beginning with MySQL 5.0.3, this restriction is lifted, and the cluster automatically allocates ports for data nodes from those that are already free.

15.3.4. Initial Startup

Starting the cluster is not very difficult after it has been configured. Each cluster node process must be started separately, and on the host where it resides. Although it is possible to start the nodes in any order, it is recommended that the management node be started first, followed by the storage nodes, and then finally by any SQL nodes:

  1. On the management host, issue the following command from the system shell to start the MGM node process:

    shell> 
    

    Note that ndb_mgmd must be told where to find its configuration file, using the or option. (See Section 15.6.3, “ndb_mgmd, the Management Server Process”, for details.)

  2. On each of the data node hosts, run this command to start the ndbd process for the first time:

    shell> 
    

    Note that it is very important to use the parameter only when starting ndbd for the first time, or when restarting after a backup/restore operation or a configuration change. This is because the option causes the node to delete any files created by earlier ndbd instances that are needed for recovery, including the recovery log files.

  3. If you used RPM files to install MySQL on the cluster host where the SQL node is to reside, you can (and should) use the startup script installed in to start the MySQL server process on the SQL node. Note that you need to install the server RPM in addition to the Standard server RPM to run the server binary.

If all has gone well, and the cluster has been set up correctly, the cluster should now be operational. You can test this by invoking the ndb_mgm management node client. The output should look like that shown here, although you might see some slight differences in the output depending upon the exact version of MySQL that you are using:

shell> 
-- NDB Cluster -- Management Client --
ndb_mgm> 
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @192.168.0.30  (Version: 5.0.25, Nodegroup: 0, Master)
id=3    @192.168.0.40  (Version: 5.0.25, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @192.168.0.10  (Version: 5.0.25)

[mysqld(SQL)]   1 node(s)
id=4   (Version: 5.0.25)

Note: If you are using an older version of MySQL, you may see the SQL node referenced as . This reflects an older usage that is now deprecated.

You should now be ready to work with databases, tables, and data in MySQL Cluster. See Section 15.3.5, “Loading Sample Data and Performing Queries”, for a brief discussion.

15.3.5. Loading Sample Data and Performing Queries

Working with data in MySQL Cluster is not much different from doing so in MySQL without Cluster. There are two points to keep in mind:

  • For a table to be replicated in the cluster, it must use the storage engine. To specify this, use the or table option. You can add this option when creating the table:

    CREATE TABLE  ( ... ) ENGINE=NDBCLUSTER;
    

    Alternatively, for an existing table that uses a different storage engine, use to change the table to use :

    ALTER TABLE  ENGINE=NDBCLUSTER;
    
  • Each table must have a primary key. If no primary key is defined by the user when a table is created, the storage engine automatically generates a hidden one. (Note: This hidden key takes up space just as does any other table index. It is not uncommon to encounter problems due to insufficient memory for accommodating these automatically created indexes.)

If you are importing tables from an existing database using the output of mysqldump, you can open the SQL script in a text editor and add the option to any table creation statements, or replace any existing (or ) options. Suppose that you have the sample database on another MySQL server that does not support MySQL Cluster, and you want to export the table:

shell> 

The resulting file will contain this table creation statement (and the statements necessary to import the table data):

DROP TABLE IF EXISTS `City`;
CREATE TABLE `City` (
  `ID` int(11) NOT NULL auto_increment,
  `Name` char(35) NOT NULL default '',
  `CountryCode` char(3) NOT NULL default '',
  `District` char(20) NOT NULL default '',
  `Population` int(11) NOT NULL default '0',
  PRIMARY KEY  (`ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO `City` VALUES (1,'Kabul','AFG','Kabol',1780000);
INSERT INTO `City` VALUES (2,'Qandahar','AFG','Qandahar',237500);
INSERT INTO `City` VALUES (3,'Herat','AFG','Herat',186800);

You will need to make sure that MySQL uses the NDB storage engine for this table. There are two ways that this can be accomplished. One of these is to modify the table definition before importing it into the Cluster database. Using the table as an example, modify the option of the definition as follows:

DROP TABLE IF EXISTS `City`;
CREATE TABLE `City` (
  `ID` int(11) NOT NULL auto_increment,
  `Name` char(35) NOT NULL default '',
  `CountryCode` char(3) NOT NULL default '',
  `District` char(20) NOT NULL default '',
  `Population` int(11) NOT NULL default '0',
  PRIMARY KEY  (`ID`)
) ENGINE=NDBCLUSTER DEFAULT CHARSET=latin1;

INSERT INTO `City` VALUES (1,'Kabul','AFG','Kabol',1780000);
INSERT INTO `City` VALUES (2,'Qandahar','AFG','Qandahar',237500);
INSERT INTO `City` VALUES (3,'Herat','AFG','Herat',186800);

This must be done for the definition of each table that is to be part of the clustered database. The easiest way to accomplish this is to do a search-and-replace on the file that contains the definitions and replace all instances of or with . If you do not want to modify the file, you can use the unmodified file to create the tables, and then use to change their storage engine. The particulars are given later in this section.

Assuming that you have already created a database named on the SQL node of the cluster, you can then use the mysql command-line client to read , and create and populate the corresponding table in the usual manner:

shell> 

It is very important to keep in mind that the preceding command must be executed on the host where the SQL node is running (in this case, on the machine with the IP address ).

To create a copy of the entire database on the SQL node, use mysqldump on the non-cluster server to export the database to a file named ; for example, in the directory. Then modify the table definitions as just described and import the file into the SQL node of the cluster like this:

shell> 

If you save the file to a different location, adjust the preceding instructions accordingly.

It is important to note that in MySQL 5.0 does not support autodiscovery of databases. (See Section 15.10, “Known Limitations of MySQL Cluster”.) This means that, once the database and its tables have been created on one data node, you need to issue the statement (beginning with MySQL 5.0.2, you may use instead), followed by on each SQL node in the cluster. This causes the node to recognize the database and read its table definitions.

Running queries on the SQL node is no different from running them on any other instance of a MySQL server. To run queries from the command line, you first need to log in to the MySQL Monitor in the usual way (specify the password at the prompt):

shell> 
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1 to server version: 5.0.25

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql>

We simply use the MySQL server's account and assume that you have followed the standard security precautions for installing a MySQL server, including setting a strong password. For more information, see Section 2.10.3, “Securing the Initial MySQL Accounts”.

It is worth taking into account that Cluster nodes do not make use of the MySQL privilege system when accessing one another. Setting or changing MySQL user accounts (including the account) effects only applications that access the SQL node, not interaction between nodes.

If you did not modify the clauses in the table definitions prior to importing the SQL script, you should run the following statements at this point:

mysql> 
mysql> 
mysql> 
mysql> 

Selecting a database and running a SELECT query against a table in that database is also accomplished in the usual manner, as is exiting the MySQL Monitor:

mysql> 
mysql> 
+-----------+------------+
| Name      | Population |
+-----------+------------+
| Bombay    |   10500000 |
| Seoul     |    9981619 |
| São Paulo |    9968485 |
| Shanghai  |    9696300 |
| Jakarta   |    9604900 |
+-----------+------------+
5 rows in set (0.34 sec)

mysql> 
Bye

shell>

Applications that use MySQL can employ standard APIs to access NDB tables. It is important to remember that your application must access the SQL node, and not the MGM or data nodes. This brief example shows how we might execute the statement just shown by using PHP 5's extension running on a Web server elsewhere on the network:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <meta http-equiv="Content-Type"
        content="text/html; charset=iso-8859-1">
  <title>SIMPLE mysqli SELECT</title>
</head>
<body>
<?php
  # connect to SQL node:
  $link = new mysqli('192.168.0.20', 'root', '', 'world');
  # parameters for mysqli constructor are:
  #   host, user, password, database

  if( mysqli_connect_errno() )
    die("Connect failed: " . mysqli_connect_error());

  $query = "SELECT Name, Population
            FROM City
            ORDER BY Population DESC
            LIMIT 5";

  # if no errors...
  if( $result = $link->query($query) )
  {
?>
<table border="1" width="40%" cellpadding="4" cellspacing ="1">
  <tbody>
  <tr>
    <th width="10%">City</th>
    <th>Population</th>
  </tr>
<?
    # then display the results...
    while($row = $result->fetch_object())
      printf(<tr>\n  <td align=\"center\">%s</td><td>%d</td>\n</tr>\n",
              $row->Name, $row->Population);
?>
  </tbody
</table>
<?
  # ...and verify the number of rows that were retrieved
    printf("<p>Affected rows: %d</p>\n", $link->affected_rows);
  }
  else
    # otherwise, tell us what went wrong
    echo mysqli_error();

  # free the result set and the mysqli connection object
  $result->close();
  $link->close();
?>
</body>
</html>

We assume that the process running on the Web server can reach the IP address of the SQL node.

In a similar fashion, you can use the MySQL C API, Perl-DBI, Python-mysql, or MySQL AB's own Connectors to perform the tasks of data definition and manipulation just as you would normally with MySQL.

15.3.6. Safe Shutdown and Restart

To shut down the cluster, enter the following command in a shell on the machine hosting the MGM node:

shell> 

The option here is used to pass a command to the ndb_mgm client from the shell. See Section 4.3.1, “Using Options on the Command Line”. The command causes the ndb_mgm, ndb_mgmd, and any ndbd processes to terminate gracefully. Any SQL nodes can be terminated using mysqladmin shutdown and other means.

To restart the cluster, run these commands:

  • On the management host ( in our example setup):

    shell> 
    
  • On each of the data node hosts ( and ):

    shell> 
    

    Remember not to invoke this command with the option when restarting an NDBD node normally.

  • On the SQL host ():

    shell> 
    

For information on making Cluster backups, see Section 15.8.2, “Using The Management Client to Create a Backup”.

To restore the cluster from backup requires the use of the ndb_restore command. This is covered in Section 15.8.3, “How to Restore a Cluster Backup”.

More information on configuring MySQL Cluster can be found in Section 15.4, “MySQL Cluster Configuration”.