Configure Multiple Fencing Devices Using crm

From ClusterLabs


This page describes how to configure multiple fencing devices with the crm tool. It uses the same example as the Configure Multiple Fencing Devices page: IPMI fencing first, followed by two switched PDUs.

Starting Point

For a frame of reference, the cluster starts with this configuration:

node $id="1" an-c03n01.alteeve.ca
node $id="2" an-c03n02.alteeve.ca
property $id="cib-bootstrap-options" \
	dc-version="1.1.9-dde1c52" \
	cluster-infrastructure="corosync" \
	no-quorum-policy="ignore" \
	stonith-enabled="false"
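A listing in this form can be printed with the crm shell's "show" command. This is a minimal sketch, assuming crmsh is installed and the command is run as root on a cluster node:

```shell
# Print the current cluster configuration in crm syntax.
crm configure show
```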

Configuring the fence devices

We will need to make a few assumptions about our example cluster:

  • It is a two-node cluster with the node names "pcmk-1" and "pcmk-2".
  • The two PDUs are accessible at the network address "pdu-1" and "pdu-2" and will be accessed using the "fence_apc_snmp" fence agent.
  • Fencing details:
    • The fencing details for "pcmk-1" are:
      • IPMI device address is "pcmk-1.ipmi", the login name is "admin" and the password is "secret".
      • Its power supplies are connected to "pdu-1" on port 1 and "pdu-2" on port 1.
    • The fencing details for "pcmk-2" are:
      • IPMI device address is "pcmk-2.ipmi", the login name is "admin" and the password is "secret".
      • Its power supplies are connected to "pdu-1" on port 2 and "pdu-2" on port 2.

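Please adapt the example below to the names, addresses, credentials and fence agents you are using in your cluster. In particular, each "pcmk_host_list" value must match the node's name as the cluster knows it; the starting configuration shown above, for instance, uses "an-c03n01.alteeve.ca" and "an-c03n02.alteeve.ca" rather than "pcmk-1" and "pcmk-2".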

Configuring The Fence Methods

We will configure the IPMI fence method to use the "reboot" action. The PDU fencing is more complicated, though.

In order for fencing to work when two separate PDUs are used, we must ensure that there is a period of time during which both PDUs have their ports powered off at the same time. To do this, we need to set up four primitives per node: one for each PDU set to an "off" action and another for each PDU set to an "on" action. This will allow us to call "pdu1:x off -> pdu2:x off -> pdu1:x on -> pdu2:x on".

Note: Prior to version 1.1.10, 'action="..."' was ignored. If you are running an older version of pacemaker (including 1.1.10 rc5 and older), you will need to replace 'action="..."' with 'pcmk_reboot_action="..."'.

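Configure the IPMI fence method for "pcmk-1". The 'delay="15"' argument makes the agent wait 15 seconds before fencing "pcmk-1", which gives "pcmk-1" a head start if both nodes try to fence each other at the same time: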

crm configure primitive fence_pcmk1_ipmi stonith:fence_ipmilan params \
       ipaddr="pcmk-1.ipmi" action="reboot" login="admin" passwd="secret" delay="15" \
       pcmk_host_list="pcmk-1" op monitor interval="60s"

Now configure the four PDU fence methods for "pcmk-1". Note that we've added 'power_wait="5"' to the second PDU's "off" action. Later, we will stitch these actions together, and this argument will add a 5 second wait after the second PDU is turned off, before power is restored. This gives plenty of time for the node's power supplies to completely drain, ensuring that the node loses power. You will also note that the "monitor" operation is only set on the "off" actions. Monitoring the "on" primitives as well would be redundant, because they talk to the same PDUs.

crm configure primitive fence_pcmk1_psu1_off stonith:fence_apc_snmp params \
       ipaddr="pdu-1" action="off" port="1" pcmk_host_list="pcmk-1" \
       op monitor interval="60s"
crm configure primitive fence_pcmk1_psu2_off stonith:fence_apc_snmp params \
       ipaddr="pdu-2" action="off" port="1" pcmk_host_list="pcmk-1" \
       power_wait="5" op monitor interval="60s"
crm configure primitive fence_pcmk1_psu1_on stonith:fence_apc_snmp params \
       ipaddr="pdu-1" action="on" port="1" pcmk_host_list="pcmk-1"
crm configure primitive fence_pcmk1_psu2_on stonith:fence_apc_snmp params \
       ipaddr="pdu-2" action="on" port="1" pcmk_host_list="pcmk-1"
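For pacemaker versions older than 1.1.10, the note above says to swap 'action' for 'pcmk_reboot_action'. As a sketch of what that substitution would look like for the first "off" primitive (use this form instead of the one above, not in addition to it):

```shell
# Variant for pacemaker older than 1.1.10, where 'action' is ignored:
# the only change is action="off" -> pcmk_reboot_action="off".
crm configure primitive fence_pcmk1_psu1_off stonith:fence_apc_snmp params \
       ipaddr="pdu-1" pcmk_reboot_action="off" port="1" pcmk_host_list="pcmk-1" \
       op monitor interval="60s"
```

The same substitution would apply to the other primitives on this page.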

Configure the IPMI fence method for "pcmk-2":

crm configure primitive fence_pcmk2_ipmi stonith:fence_ipmilan params \
       ipaddr="pcmk-2.ipmi" action="reboot" login="admin" passwd="secret" \
       pcmk_host_list="pcmk-2" op monitor interval="60s"

Finally, configure the four PDU fence methods for "pcmk-2". These use port 2 on each PDU, as listed in the assumptions above:

crm configure primitive fence_pcmk2_psu1_off stonith:fence_apc_snmp params \
       ipaddr="pdu-1" action="off" port="2" pcmk_host_list="pcmk-2" \
       op monitor interval="60s"
crm configure primitive fence_pcmk2_psu2_off stonith:fence_apc_snmp params \
       ipaddr="pdu-2" action="off" port="2" pcmk_host_list="pcmk-2" \
       power_wait="5" op monitor interval="60s"
crm configure primitive fence_pcmk2_psu1_on stonith:fence_apc_snmp params \
       ipaddr="pdu-1" action="on" port="2" pcmk_host_list="pcmk-2"
crm configure primitive fence_pcmk2_psu2_on stonith:fence_apc_snmp params \
       ipaddr="pdu-2" action="on" port="2" pcmk_host_list="pcmk-2"
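Because the eight PDU primitives differ only in node, PDU, action and port, they can be generated rather than typed by hand. The following is a dry-run sketch only: "pdu_primitives" is a hypothetical helper (not part of crm) that prints the commands for review instead of running them, using the ports from the assumptions above (port 1 for "pcmk-1", port 2 for "pcmk-2").

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints, but does not execute, the four PDU
# primitives for one node.
# Usage: pdu_primitives NODE_NAME SHORT_NAME PORT
pdu_primitives() {
    node="$1"; short="$2"; port="$3"
    for action in off on; do
        for pdu in 1 2; do
            # Only the second PDU's "off" action waits for the supplies to drain.
            extra=""
            if [ "$action" = "off" ] && [ "$pdu" = "2" ]; then
                extra=' power_wait="5"'
            fi
            # Only the "off" primitives carry a monitor operation.
            op=""
            if [ "$action" = "off" ]; then
                op=' op monitor interval="60s"'
            fi
            printf 'crm configure primitive fence_%s_psu%s_%s stonith:fence_apc_snmp params ipaddr="pdu-%s" action="%s" port="%s" pcmk_host_list="%s"%s%s\n' \
                "$short" "$pdu" "$action" "$pdu" "$action" "$port" "$node" "$extra" "$op"
        done
    done
}

pdu_primitives pcmk-1 pcmk1 1
pdu_primitives pcmk-2 pcmk2 2
```

Once the printed commands look right, they could be pasted into a shell or piped to "sh" on a cluster node.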

Now that the fence devices are configured, we can enable fencing by setting the "stonith-enabled" property to "true".

crm configure property stonith-enabled=true
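At this point it may be worth checking the result before moving on. The commands below are a sketch, assuming the standard pacemaker command-line tools are installed alongside crm; option names may differ between versions.

```shell
# Review the configuration, including the new stonith primitives.
crm configure show

# Check the live cluster configuration for errors and warnings.
crm_verify -L -V

# List the fence devices currently registered with the fencing daemon.
stonith_admin --list-registered
```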

Configuring fencing_topology

The next step is to tell pacemaker the order in which we want the fencing methods to run. This is done using the format:

  • nodeX: method1 method2a,method2b methodN

This says: "For nodeX, try 'method1' first. If that fails, try 'method2a' and then 'method2b', and make sure both succeed. If either fails, consider the attempt failed and move on to 'methodN'. If that fails, loop back to 'method1' and try it again..."

So in our case, this translates to the following:

crm configure fencing_topology \
      pcmk-1: fence_pcmk1_ipmi fence_pcmk1_psu1_off,fence_pcmk1_psu2_off,fence_pcmk1_psu1_on,fence_pcmk1_psu2_on \
      pcmk-2: fence_pcmk2_ipmi fence_pcmk2_psu1_off,fence_pcmk2_psu2_off,fence_pcmk2_psu1_on,fence_pcmk2_psu2_on

You can test this by unplugging the IPMI interface for "pcmk-1" and then crashing the node, which triggers "pcmk-2" to call a fence against it. After the IPMI fence attempt times out, you should see PDU 1, port 1 turn off, then PDU 2, port 1 turn off. The crashed node will power down, then PDU 1, port 1 should turn back on and finally PDU 2, port 1 should turn back on. If you configured your server's BIOS to power on after power loss, or to return to its last state after power loss, your server should start to power back on.
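One common way to crash a Linux node for this test is the kernel's SysRq trigger. This is a sketch only: it assumes a Linux node and root access, and the log path is an assumption that varies by distribution. Note that the first command crashes the machine immediately, so run it only on the node you intend to have fenced.

```shell
# On pcmk-1, after unplugging its IPMI interface: crash the kernel at once.
echo c > /proc/sysrq-trigger

# On pcmk-2: follow the fence activity as it happens.
# (Assumed log location; adjust for your distribution.)
tail -f /var/log/messages
```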
