Configure Multiple Fencing Devices

From ClusterLabs


Pacemaker supports fencing to ensure problematic nodes cannot corrupt resources.

Sometimes, it is either insufficient or undesired to use only a single fencing device. To configure multiple fencing devices, Pacemaker supports fencing topologies, also called STONITH levels.

The Problem

A very popular method of fencing is to use the IPMI BMC built into the node, so that method is used as the primary fence device in this example.

This approach has two single points of failure and is, by itself, less than ideal.

  1. The IPMI BMC draws its power from the host's power supply. Should the host lose power, the BMC will not be able to respond to fence requests and the fence action will fail.
  2. The IPMI's network connection uses a single network interface, so a broken or disconnected network cable, a failed switch or switch port, or a failure in the NIC itself would also leave the IPMI interface inaccessible.

Variations on this problem apply to many fence devices.

The Solution

The simple solution to this problem is to use a second fence method. A popular alternative fence method is to use one or more switched PDUs. These allow the other node to cut the power outlets feeding the target node's power supplies. In this example, we will show how to use two separate PDUs, each powering one side of a node's redundant PSUs. This is a common configuration where two separate power rails are used for power fault tolerance.

To provide total redundancy, when two switches are available, the PDU(s) can be connected to the second switch. This will ensure that the backup fence method is available should the primary switch fail completely. This requires a more complex network configuration that is outside the scope of this mini-tutorial, however.

Ordering

We prefer to use IPMI fencing because when it does work, and when it confirms that a node is off, we can be certain that the fence action was successful. The switched PDUs, on the other hand, report success as soon as the requested power outlets are switched off, without confirming that the node itself lost power. If a user moved the node's power cable(s) to different outlets after the fencing was configured, the fence action may return "success" when the node did not actually power off.

For this reason, we will want to ensure that the IPMI fence method is used when possible. We only want to fall back to the PDU-based fencing if IPMI fails.

Implementation

Implementing ordered, grouped fencing requires two steps:

  1. The fence methods need to be defined. The IPMI will have a single "reboot" action defined. Each outlet on each PDU will have two actions defined: an "off" action and an "on" action.
  2. Fencing topology (levels) will be configured to say, "Use the IPMI interface first, but if it fails, call both PDUs and only consider the PDU fence method a success if both PDUs successfully fence".
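
The ordering rule in step 2 can be illustrated with a small Python sketch. This is only a model of the decision logic described above, not Pacemaker code or configuration: the function fence_node and the device stand-ins ipmi_unreachable, pdu_ok and pdu_failed are hypothetical names invented for this illustration, and each device is reduced to a function that returns True when it confirms the fence.

```python
from typing import Callable, Dict, List

# A fence device is modeled as a callable that takes the target node name
# and returns True only when the device confirms the fence succeeded.
FenceDevice = Callable[[str], bool]


def fence_node(node: str, levels: Dict[int, List[FenceDevice]]) -> bool:
    """Try each level in ascending order.

    A level counts as a success only if every device in it succeeds.
    If a level fails, fall through to the next one.
    """
    for index in sorted(levels):
        if all(device(node) for device in levels[index]):
            return True
    return False


# Hypothetical stand-ins for real fence agents (assumptions for this sketch).
def ipmi_unreachable(node: str) -> bool:
    return False  # e.g. the node lost power, so its BMC cannot answer


def pdu_ok(node: str) -> bool:
    return True  # the PDU switched the outlet off


def pdu_failed(node: str) -> bool:
    return False  # the PDU could not be reached


# Level 1: IPMI alone. Level 2: both PDUs, which must both succeed.
topology = {1: [ipmi_unreachable], 2: [pdu_ok, pdu_ok]}
fenced = fence_node("node1", topology)

# If only one of the two PDUs cuts power, the node may still be running
# on its other PSU, so the level (and the whole fence) is a failure.
partial = fence_node("node1", {1: [ipmi_unreachable], 2: [pdu_ok, pdu_failed]})
```

In the first call the IPMI level fails and the fence falls back to the PDU level, which succeeds because both PDUs report success. In the second call the fence fails, since one PDU alone is not enough to guarantee the node is off.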

How to accomplish this depends on the toolset you are using. See one of the following as appropriate:
