From ClusterLabs

System health is a feature that lets resource placement implicitly take into account a score indicating the health of each node.

This feature is implemented in two parts. The first part is a change in Pacemaker. The second part consists of health daemons that set health attributes.

Changes in Pacemaker

Pacemaker's policy engine will include a number of configuration entries. The first is node-health-strategy. The possible values for this key are:

  • none
  • migrate-on-red
  • only-green
  • progressive
  • custom

none is the default value. This setting will have no effect on weight calculations within Pacemaker.

The next three values (migrate-on-red, only-green, and progressive) will have the following effect on weight calculations within Pacemaker. Every resource which is defined within Pacemaker will now search for attributes in a node that start with #health. Examples would include #health, #health-ipmi, #health-smart, #health-foo-bar, et cetera. An attribute can have the following values:

  • red
  • yellow
  • green
  • integer value

The score of each node attribute starting with #health is added to whatever other weights are defined for resources in the system. The resulting weights determine on which node a resource will run.

Now the differences between migrate-on-red, only-green, and progressive are as follows:

  • migrate-on-red - red will have a value of -INF, yellow and green will have values of 0.
  • only-green - red and yellow will have values of -INF, green will have a value of 0.
  • progressive - red, yellow, and green will take their values from the corresponding policy engine settings:
    • node-health-red (Note: the default is -INF)
    • node-health-yellow (Note: the default is 0)
    • node-health-green (Note: the default is 0)
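
The mapping described above can be sketched in a few lines of Python. This is only an illustration of the scoring rules in this section, not Pacemaker's actual implementation: the function names are hypothetical, -INF is modelled with a floating-point negative infinity, and integer attribute values are assumed to be used as the score unchanged. The custom strategy is deliberately not modelled, since its rules are defined by the administrator.

```python
NEG_INF = float("-inf")  # stand-in for Pacemaker's -INF score

# Defaults of node-health-red/yellow/green, as listed above.
PROGRESSIVE_DEFAULTS = {"red": NEG_INF, "yellow": 0, "green": 0}


def health_score(value, strategy, progressive=PROGRESSIVE_DEFAULTS):
    """Map one #health* attribute value to a score under a strategy."""
    if strategy == "none":
        return 0  # health attributes have no effect on weights
    if strategy == "migrate-on-red":
        table = {"red": NEG_INF, "yellow": 0, "green": 0}
    elif strategy == "only-green":
        table = {"red": NEG_INF, "yellow": NEG_INF, "green": 0}
    elif strategy == "progressive":
        table = progressive
    else:
        # "custom" relies on administrator-defined rules, not modelled here.
        raise ValueError(f"unsupported strategy: {strategy}")
    try:
        return int(value)  # assumption: integer values are used as-is
    except ValueError:
        return table[value]


def node_health(attributes, strategy, progressive=PROGRESSIVE_DEFAULTS):
    """Sum the scores of all attributes whose name starts with #health."""
    return sum(
        health_score(value, strategy, progressive)
        for name, value in attributes.items()
        if name.startswith("#health")
    )


# Hypothetical attributes of one node; "other" is ignored by the prefix check.
attrs = {"#health-smart": "yellow", "#health-ipmi": "green", "other": "red"}
print(node_health(attrs, "migrate-on-red"))
print(node_health(attrs, "only-green"))
```

In this sketch a single yellow attribute leaves the node usable under migrate-on-red but excludes it under only-green, because adding -INF to any finite weight yields -INF.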

custom indicates to Pacemaker that the system administrator will define rules that include whichever health attributes they deem appropriate for their setup.

Health Daemons

A health daemon is a program that periodically queries, or listens for events about, the health status of a system. When it detects a change in health, it notifies Pacemaker via the attrd_updater command.
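
As a rough sketch of that reporting step, the Python below turns a reading into a health value and passes it to attrd_updater only when the value has changed. The temperature thresholds, the function names, and the #health-temp attribute name are all hypothetical choices made for illustration; the example assumes attrd_updater is on the PATH and accepts the --name and --update options.

```python
import subprocess


def classify(temperature_c, yellow_at=70, red_at=85):
    """Hypothetical rule: turn a temperature reading into a health value."""
    if temperature_c >= red_at:
        return "red"
    if temperature_c >= yellow_at:
        return "yellow"
    return "green"


def attrd_command(name, value):
    """Build the attrd_updater call that sets one node attribute."""
    return ["attrd_updater", "--name", name, "--update", value]


def report(name, value, last=None, run=subprocess.run):
    """Notify Pacemaker only when the health value differs from the last one."""
    if value != last:
        run(attrd_command(name, value), check=True)
    return value
```

A daemon built on this would call report in a loop, keeping the returned value as last for the next iteration, for example last = report("#health-temp", classify(reading), last). The run parameter exists so that the command can be replaced by a stub when testing outside a cluster.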

Mechanisms that report on the health of a system include:

  • iBMC (Integrated Baseboard Management Controller)
  • /var/log/mcelog
  • /var/log/messages
  • RSA2 (Remote Supervisor Adapter 2)
  • sysfs (Linux kernel filesystem)
  • SMART (Self-Monitoring, Analysis, and Reporting Technology)