Load Balanced MySQL Replicated Cluster

This page details how to install a 2-node cluster to load-balance and make highly available:


 * Multiple Instances of Apache Tomcat
 * MySQL with Master-Master circular replication

It also shows how to set up STONITH on HP iLO remote supervisor access.

This is based upon SLES 11 64-bit with the high-availability extension http://www.novell.com/products/highavailability/.

Background
MySQL can scale out very well for reads using replication, but scaling writes is more problematic. Generally only one node can insert data into a database unless MySQL Cluster is used. The reason is that when data is inserted from separate locations, auto_increment columns can produce the same value on 2 different nodes for different inserts, causing conflicts when the data is merged.

MySQL circular replication can be used to scale out write nodes, but there are certain considerations to be taken into account. The data will only be as complete as the speed of the replication: if data is inserted faster than the MySQL slave thread can apply it, each node can be missing data from the other node. Whether this is acceptable depends on your application and data requirements. For example, if you use foreign keys in your database, inserts will fail if the data which a foreign key references has not yet been replicated. These issues need to be considered before you decide to employ Master-Master circular replication.

This howto will show how 5 Tomcat instances running on 5 separate IP addresses can be made highly available and load-balanced. This is to show how additional IP addresses can also be load-balanced using a virtual IP.

IP Addressing Schema
The services will be laid out like so:

NODE1

 * 192.168.1.10        NODE1-TOMCAT1 MYSQL1 NODE1
 * 192.168.1.11        NODE1-ILO
 * 192.168.1.12        NODE1-TOMCAT2
 * 192.168.1.13        NODE1-TOMCAT3
 * 192.168.1.14        NODE1-TOMCAT4
 * 192.168.1.15        NODE1-TOMCAT5

NODE2

 * 192.168.1.20        NODE2-TOMCAT1 MYSQL2 NODE2
 * 192.168.1.21        NODE2-ILO
 * 192.168.1.22        NODE2-TOMCAT2
 * 192.168.1.23        NODE2-TOMCAT3
 * 192.168.1.24        NODE2-TOMCAT4
 * 192.168.1.25        NODE2-TOMCAT5

Load-balanced IPs

 * 192.168.1.100      VIP-TOMCAT1 VIP-MYSQL
 * 192.168.1.102      VIP-TOMCAT2
 * 192.168.1.103      VIP-TOMCAT3
 * 192.168.1.104      VIP-TOMCAT4
 * 192.168.1.105      VIP-TOMCAT5

Operating System Install
Install SLES 11 64-bit using an ISO image on a virtual machine or physical media on a physical box. Accept all defaults but don't bother adding the add-on product yet. Turn off the firewall so you can SSH to it after the install.

Networking
Set up the network config files like so:

/etc/sysconfig/network/ifcfg-eth0
The main node IP address is configured either during the server install or afterwards using YaST. The extra IP addresses for each Tomcat service to run on are configured here too.
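The file contents were not preserved in this copy; a minimal sketch for NODE1, assuming a /24 network and the SLES-style IPADDR_n alias syntax (labels are illustrative), might look like:

```shell
# /etc/sysconfig/network/ifcfg-eth0 on NODE1 (sketch -- adjust to your network)
BOOTPROTO='static'
STARTMODE='auto'
IPADDR='192.168.1.10'
NETMASK='255.255.255.0'
# Extra addresses for the additional Tomcat instances
IPADDR_1='192.168.1.12'
NETMASK_1='255.255.255.0'
LABEL_1='tomcat2'
IPADDR_2='192.168.1.13'
NETMASK_2='255.255.255.0'
LABEL_2='tomcat3'
IPADDR_3='192.168.1.14'
NETMASK_3='255.255.255.0'
LABEL_3='tomcat4'
IPADDR_4='192.168.1.15'
NETMASK_4='255.255.255.0'
LABEL_4='tomcat5'
```

NODE2 is the same with its own addresses from the schema above.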

/etc/sysconfig/network/ifcfg-lo
Virtual IPs have to be configured in the loopback config file. Configuring them here ensures the interface doesn't respond to ARP requests for these addresses, while still allowing Linux to route them correctly locally.
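The original file contents are missing from this copy; a sketch, giving each VIP a host netmask so the node accepts traffic for it without advertising it (alias labels are illustrative), could look like:

```shell
# /etc/sysconfig/network/ifcfg-lo (sketch)
IPADDR='127.0.0.1'
NETMASK='255.0.0.0'
NETWORK='127.0.0.0'
STARTMODE='onboot'
# Load-balanced VIPs, host-scoped on loopback
IPADDR_1='192.168.1.100'
NETMASK_1='255.255.255.255'
LABEL_1='vip1'
IPADDR_2='192.168.1.102'
NETMASK_2='255.255.255.255'
LABEL_2='vip2'
IPADDR_3='192.168.1.103'
NETMASK_3='255.255.255.255'
LABEL_3='vip3'
IPADDR_4='192.168.1.104'
NETMASK_4='255.255.255.255'
LABEL_4='vip4'
IPADDR_5='192.168.1.105'
NETMASK_5='255.255.255.255'
LABEL_5='vip5'
```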


 * 1) Loopback (lo) configuration

/etc/hosts
It's a good idea to have a record of all hosts and IPs in the local /etc/hosts of each node. This way the cluster isn't relying on DNS for name resolution. Add the entries into /etc/hosts on each node.
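For example, built from the addressing schema above:

```shell
# /etc/hosts entries for both nodes and the VIPs
192.168.1.10    NODE1-TOMCAT1 MYSQL1 NODE1
192.168.1.11    NODE1-ILO
192.168.1.12    NODE1-TOMCAT2
192.168.1.13    NODE1-TOMCAT3
192.168.1.14    NODE1-TOMCAT4
192.168.1.15    NODE1-TOMCAT5
192.168.1.20    NODE2-TOMCAT1 MYSQL2 NODE2
192.168.1.21    NODE2-ILO
192.168.1.22    NODE2-TOMCAT2
192.168.1.23    NODE2-TOMCAT3
192.168.1.24    NODE2-TOMCAT4
192.168.1.25    NODE2-TOMCAT5
192.168.1.100   VIP-TOMCAT1 VIP-MYSQL
192.168.1.102   VIP-TOMCAT2
192.168.1.103   VIP-TOMCAT3
192.168.1.104   VIP-TOMCAT4
192.168.1.105   VIP-TOMCAT5
```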

ARP Parameters
To load-balance using ldirectord and LVS we need to restrict ARP using /etc/sysctl.conf. These settings are taken from http://kb.linuxvirtualserver.org/wiki/Using_arp_announce/arp_ignore_to_disable_ARP which describes the effect they have.

Add this info to /etc/sysctl.conf and re-read the config:
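The settings themselves were not preserved in this copy; per the linked LVS wiki page, the relevant sysctl lines for direct routing are:

```shell
# Restrict ARP so real servers don't answer for VIPs held on loopback
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.eth0.arp_announce = 2
```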

sysctl -p

Install Software

 * Copy the SLES-11 DVD ISO and the SLE-HAE DVD ISO to a folder called /ISO on the server
 * Run YAST
 * Go to Software Repositories
 * Delete any repositories that are set up in there
 * Add the SLES-11 ISO and accept the agreement
 * Add the SLE-HAE ISO and accept the agreement
 * Go to Software management
 * Change the filter to Pattern and select High-Availability.
 * Accept everything and let it install all dependencies.

After the package install has finished exit out of YAST and run:

zypper install gcc perl-mailtools perl-dbi heartbeat-ldirectord

Accept all dependencies again.

MySQL Install
Install MySQL server, client and shared from your chosen source e.g. from MySQL Enterprise RPM's:

rpm -Uvh MySQL-server-advanced-gpl-5.1.38-0.sles11.x86_64.rpm \
    MySQL-client-advanced-gpl-5.1.38-0.sles11.x86_64.rpm \
    MySQL-shared-advanced-gpl-5.1.38-0.sles11.x86_64.rpm

Run mysql_secure_installation and set up appropriately.

Ensure MySQL doesn't start at boot time as the cluster will control MySQL using an OCF resource agent:

chkconfig mysql off

Missing Perl Socket6
Ldirectord in SLES 11 won't work due to a missing Perl Socket6 library. You can get the source and compile it from http://search.cpan.org/~umemoto/Socket6/ or find an RPM; installing from source is probably the easiest. Download the latest version (currently 0.23) and extract it. cd into the extracted directory, then run:

perl Makefile.PL
make
make install

ldirectord.cf
Create a file named /etc/ha.d/ldirectord.cf and edit it. Add the following into it:
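The configuration block itself was not preserved in this copy; a sketch covering the shared VIP for tomcat1 and MySQL, assuming a hypothetical HTTP check page and the ldirectord check database created later in this howto, could look like:

```shell
# /etc/ha.d/ldirectord.cf (sketch -- check page and credentials are assumptions)
checktimeout=10
checkinterval=5
autoreload=yes
quiescent=yes

# Tomcat1 on the shared VIP, direct routing (gate) to both real servers
virtual=192.168.1.100:80
        real=192.168.1.10:80 gate
        real=192.168.1.20:80 gate
        service=http
        request="alive.html"
        receive="OK"
        scheduler=wlc
        protocol=tcp
        checktype=negotiate

# MySQL on the shared VIP, checked via the ldirectord database
virtual=192.168.1.100:3306
        real=192.168.1.10:3306 gate
        real=192.168.1.20:3306 gate
        service=mysql
        login="slave_user"
        passwd="password"
        database="ldirectord"
        request="SELECT * FROM connectioncheck"
        scheduler=wlc
        protocol=tcp
        checktype=negotiate
```

The remaining Tomcat VIPs (192.168.1.102-105) follow the same pattern with their own virtual= blocks.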

All of the above should be self-explanatory, or the comments should make clear what is going on. For more information check the man page for ldirectord.cf, and see the man page for ipvsadm for information on the different schedulers available.

Tomcat
In this howto we are using a Tomcat CATALINA_BASE of /opt/tomcat1, /opt/tomcat2, etc., and a CATALINA_HOME of /opt/tomcat.

Edit the server.xml config file of each Tomcat instance to have a connector for each port for the local IP and a connector for each port for the load-balanced virtual IP. For example tomcat1 listens on port 80 and 443 so will need a connector for both ports, on both the local and virtual IP address:
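The server.xml fragment was not preserved in this copy; for tomcat1 on NODE1 the connectors might look like this (attribute details are illustrative):

```xml
<!-- tomcat1 on NODE1: one connector per port on the local IP... -->
<Connector port="80" address="192.168.1.10" protocol="HTTP/1.1" redirectPort="443" />
<Connector port="443" address="192.168.1.10" protocol="HTTP/1.1"
           SSLEnabled="true" scheme="https" secure="true" />
<!-- ...and the same ports on the load-balanced virtual IP -->
<Connector port="80" address="192.168.1.100" protocol="HTTP/1.1" redirectPort="443" />
<Connector port="443" address="192.168.1.100" protocol="HTTP/1.1"
           SSLEnabled="true" scheme="https" secure="true" />
```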

The local IP will always be present on eth0 on NODE1, the virtual IP will either be present as an additional IP on eth0 or as an additional IP on lo depending on which node the ldirectord resource is running on.

LSB Init Scripts
OCF resource agents are the best choice when one is available or you can write your own. However, sometimes an LSB init script will suffice, as long as it responds correctly to the start, stop and status arguments. It also has to give the correct return codes after each operation depending on the state of the resource. Check if your init script is LSB compatible by following the appendix at the end of the http://www.clusterlabs.org/wiki/Media:Configuration_Explained.pdf document.

There is a Tomcat resource agent already provided with Pacemaker. However, if you need to use an LSB init script for any reason, the following template should work fine. Just replace @service@ with the name of the Tomcat instance.
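The template itself is missing from this copy; a sketch, assuming catalina.sh honours CATALINA_PID for pid-file handling, could look like:

```shell
#!/bin/sh
# LSB init script template for a Tomcat instance (sketch).
# Replace @service@ with the instance name, e.g. tomcat1.
### BEGIN INIT INFO
# Provides:          @service@
# Required-Start:    $network
# Required-Stop:     $network
# Default-Start:     3 5
# Default-Stop:      0 1 2 6
# Description:       Tomcat instance @service@
### END INIT INFO

CATALINA_HOME=/opt/tomcat
CATALINA_BASE=/opt/@service@
export CATALINA_HOME CATALINA_BASE
PIDFILE=/var/run/@service@.pid

case "$1" in
  start)
    CATALINA_PID=$PIDFILE $CATALINA_HOME/bin/startup.sh
    ;;
  stop)
    CATALINA_PID=$PIDFILE $CATALINA_HOME/bin/shutdown.sh
    ;;
  status)
    # LSB: exit 0 if running, 3 if stopped
    if [ -f "$PIDFILE" ] && kill -0 "$(cat $PIDFILE)" 2>/dev/null; then
      echo "@service@ is running"
      exit 0
    else
      echo "@service@ is not running"
      exit 3
    fi
    ;;
  *)
    echo "Usage: $0 {start|stop|status}"
    exit 2
    ;;
esac
```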

MySQL Permissions
The permissions shown below are for a test setup. For production use, a more fine-grained level of privilege control is needed.

NODE1
mysql -uroot -ppassword -e"grant all privileges on *.* to 'slave_user'@'localhost' identified by 'password'"
mysql -uroot -ppassword -e"grant all privileges on *.* to 'slave_user'@'%' identified by 'password'"
mysql -uroot -ppassword -e"flush privileges; reset master;"

NODE2
mysql -uroot -ppassword -e"grant all privileges on *.* to 'slave_user'@'localhost' identified by 'password'"
mysql -uroot -ppassword -e"grant all privileges on *.* to 'slave_user'@'%' identified by 'password'"
mysql -uroot -ppassword -e"flush privileges; reset master;"

Circular Replication Setup
Before performing this step ensure you can start MySQL manually using the init script and you have configured the /etc/my.cnf file correctly on each node using the following parameters in particular:
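The parameter list itself was not preserved in this copy; the settings that matter for Master-Master circular replication, shown here for NODE1, would typically be:

```ini
# /etc/my.cnf fragment on NODE1 (sketch -- use server-id=2 and offset=2 on NODE2)
[mysqld]
server-id                = 1
log-bin                  = mysql-bin
log-slave-updates
# Interleave auto_increment values so the two masters never collide:
# NODE1 generates 1,3,5,... and NODE2 generates 2,4,6,...
auto_increment_increment = 2
auto_increment_offset    = 1
```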

Once MySQL is happily running using the standard init script, run a reset master and show master status on each node to ensure that the master log file and position are the standard mysql-bin.000001 and 106. Also ensure that both MySQL installs are exactly the same with regards to privileges, databases and tables. We are going to be replicating changes to all databases, so nothing can be different before replication starts.

NODE1
mysql -uroot -ppassword
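The statements entered at the prompt were not preserved in this copy; on NODE1, assuming the slave_user account created above and a fresh reset master on both nodes, they would look something like:

```shell
# On NODE1: replicate from NODE2 (log file/position assume a fresh 'reset master')
mysql -uroot -ppassword -e"CHANGE MASTER TO MASTER_HOST='NODE2', \
  MASTER_USER='slave_user', MASTER_PASSWORD='password', \
  MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=106; START SLAVE;"
```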

NODE2
mysql -uroot -ppassword
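Likewise on NODE2, the missing statements would point back at NODE1 with something like:

```shell
# On NODE2: replicate from NODE1, completing the circle
mysql -uroot -ppassword -e"CHANGE MASTER TO MASTER_HOST='NODE1', \
  MASTER_USER='slave_user', MASTER_PASSWORD='password', \
  MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=106; START SLAVE;"
```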

Check the slave is running correctly on each node by running

show slave status\G

and checking the status is 'Waiting for master to send event' and the Slave_IO and the Slave_SQL threads are running.

MySQL Test Table
Add a database called ldirectord. This will be used by ldirectord to check the status of MySQL, and by the MySQL OCF resource agent for its status checks.

mysqladmin -uroot -ppassword create ldirectord

Then create a table called connectioncheck:

mysql -uroot -ppassword
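The table definition itself is missing from this copy; any small table with a row in it will satisfy the checks, for example (the column layout here is an assumption):

```shell
# Create a minimal check table and seed it with a row so SELECTs return data
mysql -uroot -ppassword -e"CREATE TABLE ldirectord.connectioncheck (i INT); \
  INSERT INTO ldirectord.connectioncheck VALUES (1);"
```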

OpenAIS
Edit /etc/ais/openais.conf to match your setup. The main lines you will need to change are:
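The lines themselves were not preserved in this copy; they are the network settings in the totem interface section, which for this addressing schema would look something like (multicast address and port are examples):

```shell
# totem interface section of /etc/ais/openais.conf (sketch)
interface {
        ringnumber:  0
        bindnetaddr: 192.168.1.0
        mcastaddr:   226.94.1.1
        mcastport:   5405
}
```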

Create a key for the AIS communication:

ais-keygen

Then scp the config to the other nodes in the cluster:

scp -r /etc/ais/ NODE2:/etc/

Live CIB
Here we provide the whole configuration for the cluster and explain only the aspects which are especially applicable to this situation. Given the wealth of documentation for Pacemaker and the crm shell on this web site, the rest doesn't require going over again.

Save the following into a file such as crm_config.txt
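The configuration itself was not preserved in this copy; a sketch of its shape, with illustrative resource names, only the tomcat1 VIP shown, and STONITH parameters that vary by plugin version, is:

```shell
# crm_config.txt (sketch -- resource names and STONITH parameters are illustrative)
primitive vip-tomcat1 ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.100" cidr_netmask="24" lvs_support="true" \
        op monitor interval="10s"
primitive ldirectord ocf:heartbeat:ldirectord \
        params configfile="/etc/ha.d/ldirectord.cf" \
        op monitor interval="20s"
primitive stonith-node1 stonith:external/riloe \
        params hostlist="NODE1" ilo_hostname="192.168.1.11" \
               ilo_user="Administrator" ilo_password="password"
primitive stonith-node2 stonith:external/riloe \
        params hostlist="NODE2" ilo_hostname="192.168.1.21" \
               ilo_user="Administrator" ilo_password="password"
# A STONITH device must never run on the node it fences
location l-stonith-node1 stonith-node1 -inf: NODE1
location l-stonith-node2 stonith-node2 -inf: NODE2
property no-quorum-policy="ignore" \
         start-failure-is-fatal="false" \
         stonith-enabled="true"
rsc_defaults migration-threshold="3"
```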

Some notes about the above configuration:


 * We put lvs_support="true" into each Virtual-IP entry so the IPaddr2 resource agent can remove the IP from the loopback device before adding it to eth0. The node which isn't running ldirectord will still have the IP on the loopback device until it takes over, at which point the IP is removed from lo and added to eth0.
 * The 2 STONITH devices are named corresponding to which node they will operate on, but lower down they are given -infinity location scores to ensure they can never run on the node they fence.
 * We set no-quorum-policy to ignore to ensure the cluster will continue to operate if 1 node is down. This is essential for 2-node clusters.
 * start-failure-is-fatal is set to false to allow migration-threshold to work on each resource.

Import the crm config into the cluster on any active node:

crm configure < crm_config.txt

Check Cluster
Finally check the cluster to ensure it works:

crm_mon -1