Configure Multiple Fencing Devices Using pcs

Often, people build clusters using just one fence device. As with everything in HA clustering, a single fence device becomes a single point of failure.

To address this, "fencing topology" support was added to pacemaker to allow multiple, and more complex, fencing configurations. In pcs, this is configured using "stonith levels".

= The Problem =

A very popular method of fencing is to use the IPMI BMC within the node, so this method of fencing will be used in this example.

This approach has two single points of failure and is, by itself, less than ideal.


 * 1) The IPMI BMC draws its power from the host's power supply. Should the host lose power, the IPMI interface will not be able to respond to fence requests and the fence action will fail.
 * 2) The IPMI interface uses a single network connection, so a broken or disconnected network cable, a failed switch port, a failed switch or a failure in the NIC itself would also leave the IPMI interface inaccessible.

Variations on this problem apply to almost all fence devices.

= The Solution =

The simple solution to this problem is to use a second fence method. A popular alternative is to use one or more switched PDUs, which allow the surviving node to cut the power outlets feeding the target node's power supply(ies). In this example, we will show how to use two separate PDUs, each powering one side of a node's redundant power supplies; a common configuration where two separate power rails are used for power fault tolerance.

To provide total redundancy, when two switches are available, the PDU(s) can be connected to the second switch. This will ensure that the backup fence method is available should the primary switch fail completely. This requires a more complex network configuration that is outside the scope of this mini-tutorial, however.

= Ordering =

We prefer to use IPMI fencing because, when it works and confirms that a node is off, we can be certain that the fence action was successful. The switched PDUs, on the other hand, confirm the fence action was successful when the requested power outlets are opened. If a user moved the node's power cable(s) after the fencing was configured, the fence action may return a "success" even though the node did not actually power off.

For this reason, we will want to ensure that the IPMI fence method is used when possible. We only want to fall back to the PDU-based fencing if IPMI fails.

= Implementation =

Implementing ordered, grouped fencing requires two steps;


 * 1) The fence methods need to be defined. The IPMI will have a single "reboot" action. Each outlet on each PDU will have two actions defined; an "off" action and an "on" action.
 * 2) "fence levels" will be configured to say "Use the IPMI interface first, but if it fails, call both PDUs and only consider the PDU fence method a success if both PDUs successfully fence".

== Starting Point ==

For a frame of reference, the cluster starts with this configuration;

 Cluster Name: an-cluster-03
 Corosync Nodes:
  pcmk-1 pcmk-2
 Pacemaker Nodes:
  pcmk-1 pcmk-2
 Resources:
 Stonith Devices:
 Fencing Levels:
 Location Constraints:
 Ordering Constraints:
 Colocation Constraints:
 Cluster Properties:
  cluster-infrastructure: corosync
  dc-version: 1.1.9-dde1c52
  no-quorum-policy: ignore
  stonith-enabled: false

== Configuring The Fence Devices ==
We will need to make a few assumptions about our example cluster;
 * It is a two-node cluster with the node names "pcmk-1" and "pcmk-2".
 * The two PDUs are accessible at the network addresses "pdu-1" and "pdu-2" and will be accessed using the "fence_apc_snmp" fence agent.
 * The fencing details for "pcmk-1" are;
  * The IPMI device address is "pcmk-1.ipmi", the login name is "admin" and the password is "secret".
  * Its power supplies are connected to "pdu-1" on port 1 and "pdu-2" on port 1.
 * The fencing details for "pcmk-2" are;
  * The IPMI device address is "pcmk-2.ipmi", the login name is "admin" and the password is "secret".
  * Its power supplies are connected to "pdu-1" on port 2 and "pdu-2" on port 2.

Please adapt the example below to the names, addresses, credentials and fence agents you are using in your cluster.

== Configuring The Fence Methods ==
We will configure the IPMI fence method to use the "reboot" action. The PDU fencing is more complicated though.

In order for fencing to work when two separate PDUs are used, we must ensure that there is a period of time during which both PDUs have their ports powered off at the same time. To do this, we need to set up four primitives; one for each device set to an "off" action and one for each device set to an "on" action. This will allow us to call "pdu1:x off -> pdu2:x off -> pdu1:x on -> pdu2:x on".
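Before defining the primitives, it may help to see the sequence they will implement as it would look if run by hand with the fence agent itself. This is an illustration of the ordering only; the "-a" (address), "-n" (plug) and "-o" (action) switches are standard fence-agent options, and the addresses and port numbers are the assumptions listed above.

```shell
# Fencing "pcmk-1" via the PDUs by hand (illustrates the ordering only)
fence_apc_snmp -a pdu-1 -n 1 -o off   # cut the first power rail
fence_apc_snmp -a pdu-2 -n 1 -o off   # cut the second rail; the node is now dark
sleep 5                               # give the power supplies time to drain
fence_apc_snmp -a pdu-1 -n 1 -o on    # restore the first rail
fence_apc_snmp -a pdu-2 -n 1 -o on    # restore the second rail
```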

Configure the IPMI fence methods for "pcmk-1";

 pcs stonith create fence_pcmk1_ipmi fence_ipmilan \
      pcmk_host_list="pcmk-1" ipaddr="pcmk-1.ipmi" \
      action="reboot" login="admin" passwd="secret" delay=15 \
      op monitor interval=60s

Now configure the four PDU fence methods for "pcmk-1". Note that we've added 'power_wait="5"' to the second PDU's "off" action. Later, we will stitch these actions together and this argument will tell pacemaker to wait 5 seconds after turning off the second PDU before restoring power. This gives plenty of time for the node's power supplies to completely drain, ensuring that the node loses power. You will also note that the "monitor" operation is only set on the "off" actions. There is no need to monitor the status of the "on" actions as it would be redundant.

 pcs stonith create fence_pcmk1_pdu1_off fence_apc_snmp \
      pcmk_host_list="pcmk-1" ipaddr="pdu-1" action="off" \
      port="1" op monitor interval="60s"

 pcs stonith create fence_pcmk1_pdu2_off fence_apc_snmp \
      pcmk_host_list="pcmk-1" ipaddr="pdu-2" action="off" \
      port="1" power_wait="5" \
      op monitor interval="60s"

 pcs stonith create fence_pcmk1_pdu1_on fence_apc_snmp \
      pcmk_host_list="pcmk-1" ipaddr="pdu-1" action="on" \
      port="1"

 pcs stonith create fence_pcmk1_pdu2_on fence_apc_snmp \
      pcmk_host_list="pcmk-1" ipaddr="pdu-2" action="on" \
      port="1"

Configure the IPMI fence method for "pcmk-2";

 pcs stonith create fence_pcmk2_ipmi fence_ipmilan \
      pcmk_host_list="pcmk-2" ipaddr="pcmk-2.ipmi" \
      action="reboot" login="admin" passwd="secret" delay=15 \
      op monitor interval=60s

Finally, configure the four PDU fence methods for "pcmk-2";

 pcs stonith create fence_pcmk2_pdu1_off fence_apc_snmp \
      pcmk_host_list="pcmk-2" ipaddr="pdu-1" action="off" \
      port="2" op monitor interval="60s"

 pcs stonith create fence_pcmk2_pdu2_off fence_apc_snmp \
      pcmk_host_list="pcmk-2" ipaddr="pdu-2" action="off" \
      port="2" power_wait="5" \
      op monitor interval="60s"

 pcs stonith create fence_pcmk2_pdu1_on fence_apc_snmp \
      pcmk_host_list="pcmk-2" ipaddr="pdu-1" action="on" \
      port="2"

 pcs stonith create fence_pcmk2_pdu2_on fence_apc_snmp \
      pcmk_host_list="pcmk-2" ipaddr="pdu-2" action="on" \
      port="2"

Now that fencing is configured, we can enable the "stonith-enabled" property.

pcs property set stonith-enabled=true
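If you want to double-check that the property was saved, you can list the cluster properties. The syntax below is from the pcs 0.9 series used in this example; newer pcs releases use "pcs property config" instead.

```shell
# "stonith-enabled: true" should now appear in the output
pcs property list
```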

== Configuring fencing_topology ==
The next step is to tell pacemaker what order we want the fencing methods to run in. This is done using the format:


 * nodeX: method1 method2a,method2b methodN

This says; "For nodeX, try 'method1' first. If that fails, try 'method2a' and then 'method2b', and make sure both succeed. If either fails, consider the attempt failed and move on to 'methodN'. If that fails too, loop back to 'method1' and try again..."
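The "within a level, every device must succeed" rule can be sketched as a toy shell function. This is an illustration of the semantics only, not how pacemaker is implemented; the function and mock device names are made up for the example.

```shell
#!/bin/bash
# Toy model of fence levels. Each argument is one level, written as a
# comma-separated list of "devices" (here, plain shell functions). A level
# succeeds only if every device in it succeeds; levels are tried in order.
fence_node() {
    local level dev ok devs
    for level in "$@"; do
        ok=1
        IFS=',' read -ra devs <<< "$level"
        for dev in "${devs[@]}"; do
            if ! "$dev"; then
                ok=0            # one failed device fails the whole level
                break
            fi
        done
        if [ "$ok" -eq 1 ]; then
            return 0            # every device in this level succeeded; done
        fi
    done
    return 1                    # all levels failed
}

# Mock devices: the IPMI interface is unreachable, both PDUs respond
ipmi_fail() { return 1; }
pdu1_ok()   { return 0; }
pdu2_ok()   { return 0; }

# Level 1 (IPMI) fails, so level 2 (both PDUs) is tried and succeeds
if fence_node ipmi_fail pdu1_ok,pdu2_ok; then
    echo "fence succeeded at the second level"
fi
```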

So in our case, the first step is to tell pacemaker that the IPMI-based fence methods are the first methods to use;

 pcs stonith level add 1 pcmk-1 fence_pcmk1_ipmi
 pcs stonith level add 1 pcmk-2 fence_pcmk2_ipmi

Next, we tell pacemaker to use the switched PDUs as the second method;

 pcs stonith level add 2 pcmk-1 fence_pcmk1_pdu1_off,fence_pcmk1_pdu2_off,fence_pcmk1_pdu1_on,fence_pcmk1_pdu2_on
 pcs stonith level add 2 pcmk-2 fence_pcmk2_pdu1_off,fence_pcmk2_pdu2_off,fence_pcmk2_pdu1_on,fence_pcmk2_pdu2_on
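Running the level subcommand with no arguments lists the configured levels for each node, which is a quick way to verify the configuration before testing it.

```shell
# Review the configured fence levels for all nodes
pcs stonith level
```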

You can test this by unplugging the IPMI interface for "pcmk-1" and then crashing it, triggering "pcmk-2" to call a fence against it. After the IPMI fence method times out, you should see PDU 1, port 1 turn off, then PDU 2, port 1 turn off, at which point the crashed node loses power. Then PDU 1, port 1 should turn back on and finally PDU 2, port 1 should turn back on. If you configured your server's BIOS to power on after power loss, or to return to its last state after power loss, the server should then start to power back on.
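If you would rather not crash a node to test, pcs can also request a fence directly. Run this from "pcmk-2"; with the IPMI interface for "pcmk-1" unplugged, you should see the PDU ports cycle as described above.

```shell
# Manually fence pcmk-1; pacemaker tries the levels in order
pcs stonith fence pcmk-1
```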