Linux Servers Cluster using HeartBeat
15 Jan 2012

Today I’ve been working on setting up a High Availability cluster made up of two virtual CentOS server instances using Heartbeat. The cluster uses one of the servers as primary and the other as backup. The basic idea behind Heartbeat is that the backup server continuously monitors the primary server by exchanging heartbeat messages with it (using broadcast, multicast or unicast packets). If the active server goes down, the backup server brings up a floating IP address and sends a gratuitous ARP message informing the other hosts on the local network segment that the MAC address corresponding to the floating IP has changed. The floating IP address is shared by the active and standby servers, but it is assigned to only one of them at a time.
Heartbeat setup is pretty straightforward. In the following lines I’ll describe how you can install it on a CentOS system.
- Download the latest epel-release rpm from
- Install Heartbeat packages:
- The Heartbeat config files are authkeys, ha.cf and haresources. We have to move those into the /etc/ha.d/ directory and edit them according to our setup.
- Edit authkeys with sha1 as authentication method:
- Change the permissions of /etc/ha.d/authkeys to “600”:
- Now let’s edit the ha.cf configuration file:
- Edit the haresources file - it must contain the hostname of the primary server and the floating virtual IP shared by the 2 servers:
- Edit the Linux hosts file so that each hostname maps to its corresponding IP address:
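Since the exact file contents depend on your environment, here is a minimal sketch of what the three config files and the hosts entries might look like. It assumes the node names and addresses used later in this post (server1 at 10.0.0.11, server2 at 10.0.0.12, floating IP 10.0.0.20, interface eth0). The `deadtime 5` value is inferred from the 5-second failover I expect below; the shared secret, the `keepalive` interval, the log file path and the `STAGE` staging directory are illustrative assumptions. The script writes into a staging directory rather than /etc/ha.d/ so it can be run without root and reviewed first.

```shell
#!/bin/sh
set -eu

# Hypothetical staging directory; nothing is written to /etc here.
STAGE="${STAGE:-$(mktemp -d)}"

# authkeys: "auth 1" selects key slot 1, which uses sha1.
# "ChangeThisSecret" is a placeholder, pick your own shared secret.
cat > "$STAGE/authkeys" <<'EOF'
auth 1
1 sha1 ChangeThisSecret
EOF
# Heartbeat expects authkeys to be readable by its owner only.
chmod 600 "$STAGE/authkeys"

# ha.cf: timing values are assumptions, tune them for your network.
# The node names must match the output of `uname -n` on each machine.
cat > "$STAGE/ha.cf" <<'EOF'
logfile /var/log/ha-log
keepalive 2
deadtime 5
udpport 694
bcast eth0
auto_failback on
node server1
node server2
EOF

# haresources: preferred (primary) node followed by the floating IP.
cat > "$STAGE/haresources" <<'EOF'
server1 10.0.0.20
EOF

# Lines to append to /etc/hosts on both nodes.
cat > "$STAGE/hosts.append" <<'EOF'
10.0.0.11 server1
10.0.0.12 server2
EOF

echo "Staged Heartbeat config in $STAGE"
```

Once the files look right, copy authkeys, ha.cf and haresources to /etc/ha.d/ as root and append the contents of hosts.append to /etc/hosts.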
The steps above must be repeated on server2. Please note that the configuration on both machines in the cluster must match. Once the config is in place on both machines, we are ready to start the Heartbeat service.
- Starting the Heartbeat service:
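As a rough sketch, assuming a SysV-init CentOS release where services are managed with `service` and `chkconfig`, starting Heartbeat on each node could look like the script below. The `DRY_RUN` switch and the `run` and `start_heartbeat` helpers are my own illustrative names, not part of Heartbeat; by default the script only prints the commands, so it is safe to try on any machine.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints the commands.
# Set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

start_heartbeat() {
    run service heartbeat start   # start the daemon now
    run chkconfig heartbeat on    # start it again at boot
    run ip addr show eth0         # on the active node, the floating IP should be listed
}

start_heartbeat
```

Run it on server1 first and then on server2. After Heartbeat has finished its startup delay, the floating IP should show up on the primary node only.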
Let’s test the configuration. Here comes the tricky part, where I ran into some issues with my setup:
So we’ve got server1 (10.0.0.11) and server2 (10.0.0.12) running as two virtual machine instances. I start pinging the floating IP address, 10.0.0.20, from my PC (10.0.0.5). Checking the ARP table on the PC, I see that 10.0.0.20 is associated with the MAC address of server1, the primary node. Next, I shut down interface eth0 on server1 and, given the configuration in the ha.cf file, I expect the pings to fail for only a couple of seconds more than 5 seconds.
Unfortunately, I don’t get this result with my setup: the ICMP replies only resume after approximately 40 seconds. From what I can see, the gratuitous ARP message that Heartbeat should send to announce that the MAC address corresponding to 10.0.0.20 has changed never reaches my PC. So the PC waits until the OS flushes the ARP entry for 10.0.0.20 and then broadcasts a new ARP request for it. One thing I don’t understand is why, before broadcasting an ARP request to discover who has 10.0.0.20, my PC sends 3 unicast ARP messages asking for 10.0.0.20 to the MAC address of the host that went down.
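One possible lead, assuming the client PC runs Linux: the kernel first tries to re-validate a stale ARP entry with unicast probes (the `ucast_solicit` tunable, 3 by default) before falling back to broadcast requests, which would match the three unicast messages. The sketch below prints the relevant neighbour-cache tunables and then lists a few commands that could be run by hand, as root, to check whether the gratuitous ARP arrives at all. The `show_tunable` helper is a hypothetical name, and the `arping -U` line assumes the iputils version of arping is installed on the backup node.

```shell
#!/bin/sh
# Values from this post; override them through the environment if needed.
VIP="${VIP:-10.0.0.20}"
IFACE="${IFACE:-eth0}"
NEIGH="/proc/sys/net/ipv4/neigh/default"

# Print one neighbour-cache tunable, or "n/a" when it cannot be read.
show_tunable() {
    if [ -r "$NEIGH/$1" ]; then
        echo "$1=$(cat "$NEIGH/$1")"
    else
        echo "$1=n/a"
    fi
}

show_tunable ucast_solicit   # unicast probes sent before giving up on an entry
show_tunable mcast_solicit   # broadcast requests sent when resolving an address
show_tunable gc_stale_time   # how often the kernel checks for stale entries

# Diagnostic commands to run manually as root (only printed here).
cat <<EOF
On the PC:      tcpdump -n -e -i $IFACE arp and host $VIP
On the PC:      ip neigh show $VIP
On the backup:  arping -U -I $IFACE -c 3 $VIP
EOF
```

If tcpdump on the PC shows no ARP packet for 10.0.0.20 at failover but a manual `arping -U` from the backup updates the entry right away, the problem is more likely in how the gratuitous ARP is sent or forwarded than in the PC's ARP cache.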
If you have any ideas on how I can improve this behavior, please let me know.