Configure Multiple Fencing Devices Using crm

Often, people build clusters using just one fence device. As with everything in HA clustering, a single fence device becomes a single point of failure.

To address this, "fencing_topology" was added to pacemaker to allow multiple, and more complex, fencing configurations.

= The Problem =

A very popular method of fencing is to use the IPMI BMC within the node, so this is the method of fencing that will be used in this example.

This approach has two single points of failure and is, by itself, less than ideal.


 * 1) The IPMI's BMC draws its power from the host's power supply. Should the host lose power, the IPMI will not be able to respond to fence requests and the fence action will fail.
 * 2) The IPMI's network connection uses a single network interface, so a broken or disconnected network cable, a failed switch port or switch, or a failure in the NIC itself would also leave the IPMI interface inaccessible.

Variations on this problem apply to almost all fence devices.

= The Solution =

The simple solution to this problem is to use a second fence method. A popular alternative fence method is to use one or more switched PDUs, which allow the other node to cut the power outlets feeding the target node's power supply(ies). In this example, we will show how to use two separate PDUs, each powering one side of a node's redundant power supplies; this is a common configuration where two separate power rails are used for power fault tolerance.

To provide total redundancy, when two switches are available, the PDU(s) can be connected to the second switch. This will ensure that the backup fence method is available should the primary switch fail completely. This requires a more complex network configuration that is outside the scope of this mini-tutorial, however.

= Ordering =

We prefer to use IPMI fencing because, when it works and confirms that a node is off, we can be certain that the fence action was successful. The switched PDUs, on the other hand, confirm the fence action was successful when the requested power outlets are opened. If a user moved the node's power cable(s) after fencing was configured, the fence action may return "success" even though the node did not actually power off.

For this reason, we will want to ensure that the IPMI fence method is used when possible. We only want to fall back to the PDU-based fencing if IPMI fails.

= Implementation =

Implementing ordered, grouped fencing requires three steps;


 * 1) The fence methods need to be defined. The IPMI will have a single "reboot" action defined. Each outlet on each PDU will have two actions defined; an "off" and an "on" action.
 * 2) Each fence method will have a location constraint set to ensure it cannot run on the target node (a node can never fence itself).
 * 3) fencing_topology will be configured to say "Use the IPMI interface first, but if it fails, call both PDUs and only consider the PDU fence method a success if both PDUs successfully fence".

== Configuring the Fence Devices ==
We will need to make a few assumptions about our example cluster;
 * It is a two-node cluster with the node names "pcmk-1" and "pcmk-2".
 * The two PDUs are accessible at the network addresses "pdu-1" and "pdu-2" and will be accessed using the "fence_apc_snmp" fence agent.
 * The fencing details for "pcmk-1" are;
 ** The IPMI device address is "pcmk-1.ipmi", the login name is "admin" and the password is "secret".
 ** Its power supplies are connected to "pdu-1" on port 1 and "pdu-2" on port 1.
 * The fencing details for "pcmk-2" are;
 ** The IPMI device address is "pcmk-2.ipmi", the login name is "admin" and the password is "secret".
 ** Its power supplies are connected to "pdu-1" on port 2 and "pdu-2" on port 2.

Please adapt the example below to the names, addresses, credentials and fence agents you are using in your cluster.

== Configuring the Fence Methods ==
We will configure the IPMI fence method to use the "reboot" action. The PDU fencing is more complicated, though.

In order for fencing to work when two separate PDUs are used, we must ensure that there is a period of time during which both PDUs have their ports powered off at the same time. To do this, we need to set up four primitives per node; one for each PDU with an "off" action and another for each PDU with an "on" action. This will allow us to call "pdu1:x off -> pdu2:x off -> pdu1:x on -> pdu2:x on".
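At the fence-agent level, this sequence is roughly equivalent to running the agents by hand. The following is an illustrative sketch for "pcmk-1" only; the short options (-a for address, -n for port/plug, -o for action) are common to most fence agents, but check your agent's man page before relying on them:

 fence_apc_snmp -a pdu-1 -n 1 -o off    # cut pcmk-1's feed from PDU 1
 fence_apc_snmp -a pdu-2 -n 1 -o off    # cut the second feed; pcmk-1 is now without power
 fence_apc_snmp -a pdu-1 -n 1 -o on     # restore the first feed
 fence_apc_snmp -a pdu-2 -n 1 -o on     # restore the second feed

Running these manually (with the node powered on and expendable) is also a good way to verify PDU addresses and port numbers before trusting them in the cluster configuration.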

Configure the IPMI fence method for "pcmk-1";

 crm configure primitive fence_pcmk1_ipmi stonith:fence_ipmilan params \
        ipaddr="pcmk-1.ipmi" action="reboot" login="admin" passwd="secret" \
        pcmk_host_list="pcmk-1"

Now configure the four PDU fence methods for "pcmk-1";

 crm configure primitive fence_pcmk1_psu1_off stonith:fence_apc_snmp params \
        ipaddr="pdu-1" action="off" port="1" pcmk_host_list="pcmk-1"
 crm configure primitive fence_pcmk1_psu2_off stonith:fence_apc_snmp params \
        ipaddr="pdu-2" action="off" port="1" pcmk_host_list="pcmk-1"
 crm configure primitive fence_pcmk1_psu1_on stonith:fence_apc_snmp params \
        ipaddr="pdu-1" action="on" port="1" pcmk_host_list="pcmk-1"
 crm configure primitive fence_pcmk1_psu2_on stonith:fence_apc_snmp params \
        ipaddr="pdu-2" action="on" port="1" pcmk_host_list="pcmk-1"

Configure the IPMI fence method for "pcmk-2";

 crm configure primitive fence_pcmk2_ipmi stonith:fence_ipmilan params \
        ipaddr="pcmk-2.ipmi" action="reboot" login="admin" passwd="secret" \
        pcmk_host_list="pcmk-2"

Finally, configure the four PDU fence methods for "pcmk-2";

 crm configure primitive fence_pcmk2_psu1_off stonith:fence_apc_snmp params \
        ipaddr="pdu-1" action="off" port="2" pcmk_host_list="pcmk-2"
 crm configure primitive fence_pcmk2_psu2_off stonith:fence_apc_snmp params \
        ipaddr="pdu-2" action="off" port="2" pcmk_host_list="pcmk-2"
 crm configure primitive fence_pcmk2_psu1_on stonith:fence_apc_snmp params \
        ipaddr="pdu-1" action="on" port="2" pcmk_host_list="pcmk-2"
 crm configure primitive fence_pcmk2_psu2_on stonith:fence_apc_snmp params \
        ipaddr="pdu-2" action="on" port="2" pcmk_host_list="pcmk-2"

== Configuring Location Constraints ==
A node can never fence itself, so now we want to tell pacemaker never to let the target of each fence method run the fence methods aimed at it.

First, configure the fence methods aimed at "pcmk-1" to never run on "pcmk-1" by assigning a negative-infinity location score;

 crm configure location loc_fence_pcmk1_ipmi fence_pcmk1_ipmi -inf: pcmk-1
 crm configure location loc_fence_pcmk1_psu1_off fence_pcmk1_psu1_off -inf: pcmk-1
 crm configure location loc_fence_pcmk1_psu2_off fence_pcmk1_psu2_off -inf: pcmk-1
 crm configure location loc_fence_pcmk1_psu1_on fence_pcmk1_psu1_on -inf: pcmk-1
 crm configure location loc_fence_pcmk1_psu2_on fence_pcmk1_psu2_on -inf: pcmk-1

Secondly, do the same for "pcmk-2";

 crm configure location loc_fence_pcmk2_ipmi fence_pcmk2_ipmi -inf: pcmk-2
 crm configure location loc_fence_pcmk2_psu1_off fence_pcmk2_psu1_off -inf: pcmk-2
 crm configure location loc_fence_pcmk2_psu2_off fence_pcmk2_psu2_off -inf: pcmk-2
 crm configure location loc_fence_pcmk2_psu1_on fence_pcmk2_psu1_on -inf: pcmk-2
 crm configure location loc_fence_pcmk2_psu2_on fence_pcmk2_psu2_on -inf: pcmk-2

== Configuring fencing_topology ==
The last step is to tell pacemaker what order we want the fencing methods to run. This is done using the format:


 * nodeX: method1 method2a,method2b methodN

This says; "For nodeX, try 'method1' first. If that fails, try 'method2a' and then 'method2b', and make sure both succeed. If either fails, consider the attempt failed and move on to 'methodN'. If that fails, loop back to 'method1' and try it again..."
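The level semantics can be sketched as a small model. This is purely illustrative and not pacemaker code; the function and method names here are hypothetical:

```python
def fence_node(levels, try_method):
    """Model of fencing_topology semantics.

    'levels' is an ordered list of fence levels; each level is a list of
    method names that must ALL succeed for the level to succeed.
    'try_method' attempts one method and returns True or False.
    """
    for level in levels:
        # Every method in the level must succeed for the level to count.
        if all(try_method(method) for method in level):
            return True
    # Every level failed; pacemaker would loop back and retry from the top.
    return False

# Example: IPMI fails, so the PDU level (all four methods) is tried next.
results = {"ipmi": False, "pdu1_off": True, "pdu2_off": True,
           "pdu1_on": True, "pdu2_on": True}
ok = fence_node(
    [["ipmi"], ["pdu1_off", "pdu2_off", "pdu1_on", "pdu2_on"]],
    lambda m: results[m],
)
print(ok)  # True: the second level fully succeeded
```

Note that a level succeeds only as a whole; a single failed PDU action fails the entire level, which matches the "both PDUs must succeed" requirement above.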

So in our case, this translates to the following;

 crm configure fencing_topology \
        pcmk-1: fence_pcmk1_ipmi fence_pcmk1_psu1_off,fence_pcmk1_psu2_off,fence_pcmk1_psu1_on,fence_pcmk1_psu2_on \
        pcmk-2: fence_pcmk2_ipmi fence_pcmk2_psu1_off,fence_pcmk2_psu2_off,fence_pcmk2_psu1_on,fence_pcmk2_psu2_on
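Once committed, it is worth confirming that the topology was stored as intended. Assuming the standard crm shell, the full configuration, including the fencing_topology section, can be reviewed with:

 crm configure show

Check that the fencing_topology entry lists the IPMI method first and the four PDU methods as the second group for each node.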

You can test this by unplugging the IPMI interface for "pcmk-1" and then crashing it, triggering "pcmk-2" to call a fence against it. After the IPMI interface times out, you should see PDU 1, port 1 turn off, then PDU 2, port 1 turn off; the crashed node will power down. Then PDU 1, port 1 should turn back on and, finally, PDU 2, port 1 should turn back on. If you configured your server's BIOS to power on after power loss, or to return to the last state after power loss, your server should start to power back on.
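If you would rather not crash a node, you can also request a fence manually from the surviving node, which walks through the same topology. Assuming pacemaker's stonith_admin tool is available (option names can vary between versions, so check its man page first):

 stonith_admin --reboot pcmk-1

Run this from "pcmk-2" and watch the PDU outlets and system logs to confirm the ordering described above.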