Update Resource Agent for OCF 1.1

The OCF 1.1 Resource Agent API standard, released in 2021, offers new features for resource agents. This document describes how to update an existing OCF 1.0-compliant resource agent to be compatible with OCF 1.1.

Meta-Data

Version

The only required step for OCF 1.1 support is to update the <version> element in the top level of meta-data:

<version>1.1</version>

That's it! Everything else is optional.

Note that <version> is the version of the OCF standard that the agent supports; the version attribute of the <resource-agent> element is the version of the agent itself (and can be any value desired).

Description

In the OCF 1.0 standard, there was no place for a description of the agent itself. OCF 1.1 adopted the already-common practice of using <longdesc> and <shortdesc> elements in the top level of the meta-data for this purpose. If you don't already use them, add them if desired. Example:

<longdesc lang="en">
This is a long description of a theoretical resource agent that doesn't really exist. You could say whatever you want about its purpose here. The short description below is, well, a short description.
</longdesc>
<shortdesc lang="en">Super-duper resource agent that does everything</shortdesc>

Unique parameters

The unique attribute in parameters is now deprecated. You can keep it if you want to be compatible with older software that looks for it, but removing it is recommended.

Instead, add unique-group attributes for every set of parameters that should be unique for each instance of the resource agent. Here is an example (just the relevant portions) of an agent that requires an IP address and port combination that must be unique (the value "address" is arbitrary):

<parameters>
   <parameter name="ip" unique-group="address">
      ...
   </parameter>
   <parameter name="port" unique-group="address">
      ...
   </parameter>
   ...
</parameters>

Required parameters

Mark any required parameters (those the user must specify) with the new required="1" attribute.

Deprecated parameters

Mark any deprecated parameters with the new <deprecated> child element, which may optionally contain <replaced-with> child elements indicating parameters that should be used instead, and <desc> child elements explaining the deprecation for users (potentially with multiple translations). Example:

<parameter name="foo">
  <deprecated>
    <replaced-with name="mode"/>
    <desc lang="en">Don't use foo, it's bad.</desc>
    <desc lang="cs">Nepoužívej foo, sic to schytáš.</desc>
  </deprecated>
  <longdesc lang="en">
    Whether the example daemon should operate with foo factor
  </longdesc>
  <shortdesc lang="en">Foo factor</shortdesc>
  <content type="string" />
</parameter>

Enumerated parameter values

If you have any parameters that take specific values, you can now enumerate those values instead of allowing free-form text. Example:

<parameter name="mode">
  <longdesc lang="en">
    The mode the example daemon should operate in. Allowed values are "dry-run" and
    "live".
  </longdesc>
  <shortdesc lang="en">Run mode</shortdesc>
  <content type="select" default="live">
    <option value="dry-run" />
    <option value="live" />
  </content>
</parameter>
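
Enumerating values in meta-data helps tools and users, but the agent may still want to check the value it actually receives. The following is a minimal sketch of such a check, not a definitive implementation. It assumes the "mode" parameter from the example above; validate_mode is a hypothetical helper name, and the exit status values are set only if the usual shell includes have not already defined them.

```shell
#!/bin/sh
# Standard OCF exit statuses; defined here only if no include already did.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_CONFIGURED:=6}"

# validate_mode is a hypothetical helper for the example "mode" parameter.
validate_mode() {
    # Fall back to the default advertised in meta-data when unset.
    mode="${OCF_RESKEY_mode:-live}"
    case "$mode" in
        dry-run|live)
            return "$OCF_SUCCESS"
            ;;
        *)
            # A value outside the enumerated set is internally invalid.
            echo "Invalid mode '$mode': expected 'dry-run' or 'live'" >&2
            return "$OCF_ERR_CONFIGURED"
            ;;
    esac
}
```

An agent could call such a helper from its validate-all action.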

Reloadable parameters

OCF 1.1 supports the concept of reloadable parameters, which corresponds to how Pacemaker previously interpreted the now-deprecated unique attribute.

If a parameter value can be changed without requiring a full stop and start of the service itself, mark the parameter with the new reloadable="1" attribute. This is not related to reloading the service itself, just the agent parameter values.

An example might be a web server agent that can use one of several clients to check the server status. The parameter that specifies the client can be changed without restarting the web server itself, so it could be marked as reloadable. The user can change the value of that parameter with no downtime for the web server.

If you mark any parameters as reloadable, you also have to implement a reload-agent action as described below, and advertise the action in meta-data.
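
As a rough illustration of the web server example, the sketch below shows a shell function that prints meta-data with one reloadable parameter and an advertised reload-agent action. Everything specific in it is an assumption made for illustration: the agent name "example-web", the parameter name "status_client", its default, and all timeouts are hypothetical.

```shell
#!/bin/sh
# Hypothetical agent fragment: prints meta-data that marks one parameter as
# reloadable and advertises the reload-agent action. All names, defaults and
# timeouts are illustrative only.
meta_data() {
    cat <<'EOF'
<?xml version="1.0"?>
<resource-agent name="example-web" version="0.1">
  <version>1.1</version>
  <longdesc lang="en">Hypothetical web server agent.</longdesc>
  <shortdesc lang="en">Example web server agent</shortdesc>
  <parameters>
    <parameter name="status_client" reloadable="1">
      <longdesc lang="en">Client used to check the server status</longdesc>
      <shortdesc lang="en">Status client</shortdesc>
      <content type="string" default="curl"/>
    </parameter>
  </parameters>
  <actions>
    <action name="start" timeout="20s"/>
    <action name="stop" timeout="20s"/>
    <action name="monitor" timeout="20s" interval="10s"/>
    <action name="reload-agent" timeout="20s"/>
    <action name="meta-data" timeout="5s"/>
    <action name="validate-all" timeout="20s"/>
  </actions>
</resource-agent>
EOF
}
```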

Actions

notify

Pacemaker implemented an extension to OCF 1.0 for clone resources (which can run on multiple cluster nodes at the same time). These resource agents could optionally receive notifications before and after resource actions on any instance, via the notify action.

OCF 1.1 has adopted the notify action, but left its behavior unspecified. Continue using it or not as desired.

promote and demote

Another Pacemaker extension was promotable resources (clones whose instances can run in one of two modes). The start and demote actions bring an instance to the default mode, and the promote action brings the instance to the special mode. OCF 1.1 adopts these actions.

A major difference from the older Pacemaker implementation is that the role names are now Promoted and Unpromoted rather than Master and Slave. Newer versions of Pacemaker support both sets of names.

If your agent already implements promotable clones, update any mentions of the role names. The agent won't be able to support both the old and new names, because only one set can be advertised in monitor action meta-data. If you advertise the old names, advertise OCF 1.0 support; if you advertise the new names, advertise OCF 1.1 support.

reload and reload-agent

The reload action previously had conflicting uses; most resource agents used it to reload the service itself, while Pacemaker used it to reload agent parameters.

In OCF 1.1, the reload action is now reserved for reloading the service itself. For example, if the service can re-read its configuration file after receiving a signal, the reload action can send that signal. This is equivalent to how init scripts and systemd unit files use reload.

The new reload-agent action makes effective any changes to parameters marked reloadable. Often this will be a no-op: in the earlier example of a web server agent with a reloadable parameter for which client to use to contact the web server, nothing special needs to be done when that parameter changes (the agent simply uses the new value the next time it contacts the web server). A different example might be a database agent with a reloadable parameter for whether the database is in read-only or read/write mode; the agent might contact the database server with a client to change the mode, which would be much quicker than a full database restart and would cause no downtime.
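
To make the split between the two actions concrete, here is a minimal sketch of how an agent's action dispatch might separate them. The function names service_reload, agent_reload and handle_action are hypothetical, and the service reload is reduced to a message; a real agent might send a signal to its daemon at that point.

```shell
#!/bin/sh
# Standard OCF exit statuses; defined here only if no include already did.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_UNIMPLEMENTED:=3}"

# Hypothetical: ask the service itself to re-read its configuration.
service_reload() {
    echo "service: configuration re-read requested"
    return "$OCF_SUCCESS"
}

# Hypothetical: a no-op, because this agent reads its parameters afresh on
# every invocation, so changed reloadable values take effect automatically.
agent_reload() {
    return "$OCF_SUCCESS"
}

# Route the two actions to their separate implementations.
handle_action() {
    case "$1" in
        reload)       service_reload ;;
        reload-agent) agent_reload ;;
        *)            return "$OCF_ERR_UNIMPLEMENTED" ;;
    esac
}
```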

OCF_OUTPUT_FORMAT

In OCF 1.1, agents may optionally support displaying output in multiple formats. The desired format will be passed via the OCF_OUTPUT_FORMAT environment variable. The specific formats supported are left to the agent, as are the values used to identify them (it is recommended to use "text" for human-readable text and "xml" for XML, if supported).

Following existing practice, the meta-data action must default to using XML output, and all other actions must default to text. It is entirely up to you whether to support anything else.

This is mainly expected to be used for the validate-all action, to be able to return XML for better machine parsing. However, the XML schema has not been standardized, so this will be an area of experimentation in the near future.
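
The defaulting rule can be sketched as a small helper. This is only an illustration: output_format_for is a hypothetical function name, and the sketch simply echoes back whatever format was requested without checking that the agent supports it.

```shell
#!/bin/sh
# Pick the output format for an action: an explicit OCF_OUTPUT_FORMAT wins;
# otherwise meta-data defaults to "xml" and every other action to "text".
# output_format_for is a hypothetical helper name.
output_format_for() {
    action="$1"
    if [ -n "${OCF_OUTPUT_FORMAT:-}" ]; then
        echo "$OCF_OUTPUT_FORMAT"
    elif [ "$action" = "meta-data" ]; then
        echo "xml"
    else
        echo "text"
    fi
}
```

A real agent would probably also decide what to do when an unsupported format is requested, which this sketch leaves open.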

OCF_CHECK_LEVEL

OCF 1.0 and 1.1 both support the OCF_CHECK_LEVEL environment variable for the monitor action, which determines the depth (and service impact) of the check performed.

OCF 1.1 extends this to the validate-all action as well. If not specified or 0, only syntax and consistency checks should be done (for example, verifying that a parameter value is an integer if that's appropriate). If 10, the agent may additionally verify the suitability of the local host (for example, that a necessary directory exists).
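
A validate-all implementation honoring the two levels might look roughly like the sketch below. The parameters "port" and "datadir" and the function name validate_all are hypothetical, chosen to mirror the integer and directory examples above; the exit statuses used are the standard OCF_ERR_CONFIGURED (6) for internally invalid values and OCF_ERR_ARGS (2) for values invalid on the local host.

```shell
#!/bin/sh
# Standard OCF exit statuses; defined here only if no include already did.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_ARGS:=2}"
: "${OCF_ERR_CONFIGURED:=6}"

# Hypothetical parameters: "port" (an integer) and "datadir" (a directory).
validate_all() {
    port="${OCF_RESKEY_port:-}"
    datadir="${OCF_RESKEY_datadir:-}"

    # Level 0 (or unset): syntax and consistency checks only.
    case "$port" in
        ''|*[!0-9]*)
            echo "port must be a non-negative integer" >&2
            return "$OCF_ERR_CONFIGURED"
            ;;
    esac

    # Level 10: additionally check the suitability of the local host.
    if [ "${OCF_CHECK_LEVEL:-0}" -eq 10 ]; then
        if [ ! -d "$datadir" ]; then
            echo "datadir '$datadir' does not exist on this host" >&2
            return "$OCF_ERR_ARGS"
        fi
    fi

    return "$OCF_SUCCESS"
}
```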

Exit statuses

The meaning of a couple of exit statuses has been clarified:

  • OCF_ERR_ARGS (2): parameters are invalid in the context of the local host (such as a nonexistent configuration file)
  • OCF_ERR_CONFIGURED (6): parameters are internally invalid (such as a string given where only an integer is allowed)

In addition, new exit statuses that were Pacemaker extensions have been adopted:

  • OCF_RUNNING_PROMOTED (8): properly running in the promoted role
  • OCF_FAILED_PROMOTED (9): failed in the promoted role
  • OCF_RUNNING_DEGRADED (190): properly running but failure is more likely in the near term
  • OCF_PROMOTED_DEGRADED (191): properly running in the promoted role but degraded

The symbolic names for these new statuses might or might not be defined by shell include files, so be aware of what includes you are using. If you want to maintain compatibility with older includes, you can define each symbol you need if it's not already defined, like:

: ${OCF_RUNNING_PROMOTED:=8}
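
As an illustration of where such a fallback definition is used, here is a sketch of a monitor function for a promotable agent. The helper current_role and the EXAMPLE_ROLE variable are purely hypothetical stand-ins for however a real agent would query its service.

```shell
#!/bin/sh
# Define each status only if the shell includes did not already do so.
: "${OCF_SUCCESS:=0}"
: "${OCF_NOT_RUNNING:=7}"
: "${OCF_RUNNING_PROMOTED:=8}"

# Hypothetical helper: a real agent would query the service here. For the
# sketch, the role is read from the made-up EXAMPLE_ROLE variable.
current_role() {
    echo "${EXAMPLE_ROLE:-stopped}"
}

# Map the current role to the matching monitor exit status.
monitor() {
    case "$(current_role)" in
        promoted)   return "$OCF_RUNNING_PROMOTED" ;;
        unpromoted) return "$OCF_SUCCESS" ;;
        *)          return "$OCF_NOT_RUNNING" ;;
    esac
}
```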

Pacemaker-specific changes for promotable clones

Pacemaker implements a number of extensions to the OCF standard. Pacemaker 2.1.0 and later make significant changes to these extensions with regard to promotable clones, so if you have an existing agent that supports promotable clones, these will affect you:

  • Pacemaker now provides resource agents with new environment variables (in addition to the existing ones) for promotable clone notifications, with master replaced with promoted and slave replaced with unpromoted. For example, OCF_RESKEY_CRM_meta_notify_unpromoted_resource will be identical to OCF_RESKEY_CRM_meta_notify_slave_resource. Use the new names in your agent. If you want to stay compatible with older Pacemaker versions, put something like this all on one line near the top of your agent for each relevant variable the agent uses:
: ${OCF_RESKEY_CRM_meta_notify_unpromoted_resource:=$OCF_RESKEY_CRM_meta_notify_slave_resource}
  • The crm_master command has been deprecated and replaced with a new crm_attribute --promotion option that defaults to --lifetime=reboot (example: crm_master -l reboot -v 10 becomes crm_attribute --promotion -v 10). The old command will still work for now, but the new one should be used if available.

The ocf-shellfuncs include file from the resource-agents project might add some wrappers to simplify the above.
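
If an agent reads several of the notification variables, the per-variable fallback can be written as a loop instead of one line per variable. The sketch below is one possible approach, not something taken from ocf-shellfuncs: compat_notify_vars is a hypothetical function name, and the suffixes "resource" and "uname" are examples, so list only the variables your agent actually uses.

```shell
#!/bin/sh
# For each promotable-clone notification variable, default the new name to
# the value of the old one when the new one is unset or empty.
# compat_notify_vars is a hypothetical helper; the suffix list is an example.
compat_notify_vars() {
    for pair in promoted:master unpromoted:slave; do
        new_role="${pair%%:*}"
        old_role="${pair##*:}"
        for suffix in resource uname; do
            new="OCF_RESKEY_CRM_meta_notify_${new_role}_${suffix}"
            old="OCF_RESKEY_CRM_meta_notify_${old_role}_${suffix}"
            # Expands to:  : "${NEW:=${OLD:-}}"
            eval ": \"\${${new}:=\${${old}:-}}\""
        done
    done
}
```

Calling compat_notify_vars once near the top of the agent lets the rest of the code use only the new variable names.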