Once you’ve installed a cluster using Red Hat Enterprise Linux High Availability Add-on, you might be wondering what you can do with your shiny new cluster.
There are many use cases for a Red Hat Enterprise Linux High Availability (HA) cluster, for example, active/passive Apache, active/active load-balanced Apache, highly available NFS, and replicated PostgreSQL. However, before we get to examples of resources you can configure, we have to go through some basics.
In this series, we will walk through setting up fence agents and look at examples of the resource agents that Red Hat ships.
If you haven’t installed the cluster packages and set up your basic cluster, this can be quite confusing, so I encourage everyone to do those steps first. In this series, we’re going to jump right into what’s next. And “what’s next” is configuring your fence and resource agents.
So, please do take a few minutes to follow this step-by-step documentation to install and add your nodes into a basic cluster.
You should now have your cluster software installed on all nodes, your network configured, and you’ve run the setup command to add your nodes into the cluster. The next thing you’ll need to do is set up your fence device(s).
Fencing protects the cluster's resources and data from a node that is no longer behaving correctly, and fence devices are what the cluster uses to remove such a node from the cluster. The fence device is the strong-arm enforcer of the laws you’ve laid down, or of those configured by default.
By default, a node that stops communicating with the cluster will be rebooted. This is because the cluster layer cannot tell why a node stopped communicating, nor is it designed to troubleshoot what has happened to cause this condition. (And it's important that we leave a little bit of manual work for the human operators, otherwise we all welcome our Cluster Overlords. Wait...) Anyway, if it is not removed, whatever the node is doing might cause data loss or corruption.
Red Hat ships a number of fence agents that you can use to create your fence devices. To see a complete list, run ‘pcs stonith list’ on your new cluster.
Commonly used fence agents include fence_ipmilan, fence_imm, and fence_ilo. They connect to various remote management systems to reboot the node when it is fenced. Your suspicions are correct: their names indicate which system they connect to. The fence_imm agent connects to the IMM on an IBM server, fence_ilo connects to HP’s iLO, and fence_ipmilan speaks IPMI to a number of remote management systems.
To learn more about what each fence agent does, run ‘pcs stonith describe <fence_agent>’.
For our example, we are setting up fence devices that connect to the Dell Remote Access Card (DRAC) remote management console, so we are going to use fence_ipmilan:
$ sudo pcs stonith describe fence_ipmilan
fence_ipmilan - Fence agent for IPMI

fence_ipmilan is an I/O Fencing agent which can be used with machines controlled by IPMI.This agent calls support software ipmitool (http://ipmitool.sf.net/). WARNING! This fence agent might report success before the node is powered off. You should use -m/method onoff if your fence device works correctly with that option.

Resource options:
  ipport: TCP/UDP port to use for connection with device
  port: IP address or hostname of fencing device (together with --port-as-ip)
  inet6_only: Forces agent to use IPv6 addresses only
  ipaddr: IP Address or Hostname
  ---snip a lot more options---
Create that Fence Device
To create your fence devices, use the following command:
$ sudo pcs stonith create <name> <fence_agent> [<option>=<value> ...]
If you are using a hypervisor, such as KVM, as your cluster's fence device, you create a single fence device for the hypervisor and give it a map of the nodes in your cluster:
$ pcs stonith create virtfence_xvm fence_xvm key_file=/etc/corosync/fence_xvm.key pcmk_host_map="node1.examplerh.com:node1;node2.examplerh.com:node2;node3.examplerh.com:node3"
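The pcmk_host_map value above is a semicolon-separated list of cluster-node-name:vm-name pairs. With a longer node list, a small shell function can build that string for you. This is only a sketch that assumes, as in the example, that each virtual machine is named after the short form of its node's hostname; `build_host_map` is a hypothetical helper, not part of pcs.

```shell
#!/bin/sh
# build_host_map: hypothetical helper that prints a pcmk_host_map value
# ("node:vm;node:vm;...") for the fully qualified node names it is given.
# Assumption: each VM is named after the short hostname of its node,
# as in the fence_xvm example (node1.examplerh.com -> node1).
build_host_map() {
    map=""
    for host in "$@"; do
        vm=${host%%.*}                  # strip the domain part
        map="${map:+$map;}$host:$vm"    # add ';' only between pairs
    done
    printf '%s\n' "$map"
}

build_host_map node1.examplerh.com node2.examplerh.com node3.examplerh.com
```

This prints the same string used in the example, so it could be passed as pcmk_host_map="$(build_host_map node1.examplerh.com node2.examplerh.com node3.examplerh.com)".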
We are not using a hypervisor in our imaginary cluster; we are using four bare-metal machines. Therefore, we are going to set up four fence devices, one for each remote management console on each node:
$ pcs stonith create testdrac1 fence_ipmilan lanplus=1 ipaddr=clulab1.example.com login=druser passwd=drpassword pcmk_host_list=clexample1.node.com
$ pcs stonith create testdrac2 fence_ipmilan lanplus=1 ipaddr=clulab2.example.com login=druser passwd=drpassword pcmk_host_list=clexample2.node.com
$ pcs stonith create testdrac3 fence_ipmilan lanplus=1 ipaddr=clulab3.example.com login=druser passwd=drpassword pcmk_host_list=clexample3.node.com
$ pcs stonith create testdrac4 fence_ipmilan lanplus=1 ipaddr=clulab4.example.com login=druser passwd=drpassword pcmk_host_list=clexample4.node.com
STOP -- Let's take the last invocation of this ‘pcs’ create command and break down each of its options, so that it makes more sense and so this isn’t just a ‘copy and paste’ exercise:
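Because these four commands differ only in the node number, you could also generate them. The following is a dry-run sketch that only prints the commands so you can review them first. It assumes the naming pattern used in this example (testdracN, clulabN.example.com, clexampleN.node.com) and the same placeholder credentials; `print_fence_cmds` is a hypothetical helper.

```shell
#!/bin/sh
# Dry-run sketch: print one 'pcs stonith create' command per bare-metal node.
# Assumed naming pattern (from the example):
#   fence device   testdracN
#   DRAC hostname  clulabN.example.com
#   cluster node   clexampleN.node.com
DRAC_LOGIN=druser        # placeholder credentials from the example
DRAC_PASSWD=drpassword

print_fence_cmds() {
    # $1 = number of nodes
    i=1
    while [ "$i" -le "$1" ]; do
        printf 'pcs stonith create testdrac%s fence_ipmilan lanplus=1 ipaddr=clulab%s.example.com login=%s passwd=%s pcmk_host_list=clexample%s.node.com\n' \
            "$i" "$i" "$DRAC_LOGIN" "$DRAC_PASSWD" "$i"
        i=$((i + 1))
    done
}

print_fence_cmds 4
```

Once the printed commands look right, one way to run them is to pipe the output to `sudo sh`.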
‘stonith’: The actual name of the daemon that calls the fence agent is ‘stonithd’. That daemon has arguably the best name of any daemon, ever: STONITH - Shoot The Other Node In The Head. We are creating a stonith fence device, hence ‘pcs stonith create’.
Next, ‘testdrac4’ is the name we’re giving the fence device. We could call it ‘elephant’, as the name doesn’t matter -- however, it should mean something to you.
‘fence_ipmilan’ is the name of the fence agent, shipped by Red Hat, that this fence device will use.
‘lanplus=1’ tells the fence agent to use ipmitool's ‘lanplus’ interface (IPMI v2.0), which is more secure than the older ‘lan’ interface.
‘ipaddr=clulab4.example.com’ is the IP address or hostname of the node's remote management console (here, its DRAC), not of the node itself.
‘login=druser passwd=drpassword’ are the username and password for the remote management console. Instead of putting the password on the command line, you could use the passwd_script option to point to a script that prints the password.
And finally, ‘pcmk_host_list=clexample4.node.com’ is the name of the node that will be fenced by this device. Note: The name of the node in the cluster configuration does not technically have to be the same as the hostname of the server. Use the name of the node as seen in the ‘pcs status’ output.
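As for the passwd_script option mentioned above: the fence agent runs the script and uses what it prints on standard output as the password. The sketch below is a hypothetical example, not something Red Hat ships. The password file location /etc/cluster/drac.passwd is an assumption, and both that root-only file and the script would need to exist on every cluster node.

```shell
#!/bin/sh
# Hypothetical passwd_script for the DRAC fence devices.
# The password is kept on the first line of a root-only (mode 0600) file
# instead of on the pcs command line. PASSWD_FILE is an assumed location.
PASSWD_FILE=${PASSWD_FILE:-/etc/cluster/drac.passwd}

print_drac_password() {
    # $1 = path to the password file
    if [ ! -r "$1" ]; then
        echo "cannot read password file: $1" >&2
        return 1    # non-zero status so the failure is visible
    fi
    head -n 1 "$1"  # print only the first line
}

print_drac_password "$PASSWD_FILE"
```

Saved as, say, /usr/local/sbin/drac-passwd (another hypothetical path) and made executable, it would let you replace passwd=drpassword with passwd_script=/usr/local/sbin/drac-passwd in the create command.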
Checking and Tuning
To see the details, including some of the default options set for our new fence devices, run ‘pcs stonith show testdrac1’.
$ sudo pcs stonith show testdrac1
 Resource: testdrac1 (class=stonith type=fence_ipmilan)
  Attributes: lanplus=1 ipaddr=clulab1.example.com login=druser passwd=drpassword pcmk_host_list=clexample1.node.com
  Operations: monitor interval=60s (testdrac1-monitor-interval-60s)
As you might already suspect, the ‘monitor interval=60s’ entry next to ‘Operations’ means that the cluster runs the monitor operation for our fence device, testdrac1, every 60 seconds. To see what the monitor operation actually does, you can look at the fence agent itself (this is all open source code); it is located at /sbin/fence_ipmilan. You can tune any of these operation options with the `pcs stonith update` command.
If you would like to change the monitor interval to every five minutes, the command is:
$ sudo pcs stonith update testdrac1 op monitor interval=300s
To see our change in effect:
$ sudo pcs stonith show testdrac1
 Resource: testdrac1 (class=stonith type=fence_ipmilan)
  Attributes: lanplus=1 ipaddr=clulab1.example.com login=druser passwd=drpassword pcmk_host_list=clexample1.node.com
  Operations: monitor interval=300s (testdrac1-monitor-interval-300s)
And finally, to check the status of our fence devices, run `pcs status`, which gives you a lot of status information for your cluster. If you want to see just the fence device status, run `pcs stonith`:
$ sudo pcs stonith
 testdrac1 (stonith:fence_ipmilan): Started clexample4.node.com
 testdrac2 (stonith:fence_ipmilan): Started clexample1.node.com
 testdrac3 (stonith:fence_ipmilan): Started clexample2.node.com
 testdrac4 (stonith:fence_ipmilan): Started clexample3.node.com
STOP - Before we go on, let’s break down this output, using the last line. First, it shows the name of the fence device, ‘testdrac4’; then the kind of fence device we’re using, ‘(stonith:fence_ipmilan):’, that is, the stonith class and the fence agent; and then the node it was started on, in this case ‘clexample3.node.com’. Note: a fence device usually runs on a different node than the one it is in charge of fencing, so in this case the fence device for node 4 is running on node 3.
At this point, we have our fence devices ready to protect our resources. If there were an issue with an individual node, for example if communication was lost with node clexample2, the cluster could react by calling that node's fence device, testdrac2, and power-cycling the node (the default reboot action) so that it cannot interfere with the resources the cluster moves to the surviving nodes.
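To compare where each fence device was started with the node it is meant to fence, you could post-process the `pcs stonith` output. This is a sketch under two assumptions: you supply the device-to-node pairs yourself (taken from each device's pcmk_host_list), and the status lines keep the ‘name (stonith:agent): Started node’ shape shown earlier, which can vary slightly between pcs versions. `check_fence_placement` is a hypothetical helper, and a ‘same-node’ result is informational only.

```shell
#!/bin/sh
# check_fence_placement: read 'pcs stonith' status output on stdin and,
# for each started fence device, report whether it runs on the node it fences.
# $1: space-separated device=node pairs (each device's pcmk_host_list value).
check_fence_placement() {
    awk -v map="$1" '
        BEGIN {
            n = split(map, pairs, " ")
            for (k = 1; k <= n; k++) {
                split(pairs[k], kv, "=")
                target[kv[1]] = kv[2]
            }
        }
        {
            # Find the "(stonith:agent):" field; the device name precedes it.
            for (i = 2; i < NF; i++) {
                if ($i ~ /^\(stonith:/ && $(i + 1) == "Started") {
                    dev = $(i - 1)
                    node = $(i + 2)
                    if (!(dev in target))
                        status = "unmapped"
                    else if (target[dev] == node)
                        status = "same-node"
                    else
                        status = "ok"
                    print status, dev, node
                }
            }
        }'
}

# Example run against a saved copy of the status output shown earlier.
PAIRS="testdrac1=clexample1.node.com testdrac2=clexample2.node.com testdrac3=clexample3.node.com testdrac4=clexample4.node.com"
check_fence_placement "$PAIRS" <<'EOF'
testdrac1 (stonith:fence_ipmilan): Started clexample4.node.com
testdrac2 (stonith:fence_ipmilan): Started clexample1.node.com
testdrac3 (stonith:fence_ipmilan): Started clexample2.node.com
testdrac4 (stonith:fence_ipmilan): Started clexample3.node.com
EOF
```

On a live cluster you would pipe the real output in instead: `sudo pcs stonith | check_fence_placement "$PAIRS"`.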
Coming soon - Resource Agents
In the next post in this series, we will jump into Resource Agents and show what you can do with the agents that Red Hat ships right-out-of-the-box.
In the meantime, if you would like more information about the Fence Agents we ship, our Support Policies, and Troubleshooting, please see:
Product Documentation - Fencing: Configuring STONITH
Planning Fence Configuration in a Red Hat High Availability Cluster
Virtualization Support for RHEL High Availability and Resilient Storage Clusters
Support Policies for RHEL High Availability Clusters - General Requirements for Fencing/STONITH
How to test fence devices and fencing configuration in a RHEL 5, 6, or 7 High Availability cluster?
Jennifer Scalf is a Senior Technical Account Manager at Red Hat. She has over sixteen years of Linux systems experience working in every layer of the stack. Jennifer is a subject matter expert in clustering with the Red Hat Enterprise Linux High Availability Add-on. Find more posts by Jennifer Scalf at https://www.redhat.com/en/about/blog/authors/jennifer-scalf
A Red Hat Technical Account Manager (TAM) is a specialized product expert who works collaboratively with IT organizations to strategically plan for successful deployments and help realize optimal performance and growth. The TAM is part of Red Hat’s world-class Customer Experience and Engagement organization and provides proactive advice and guidance to help you identify and address potential problems before they occur. Should a problem arise, your TAM will own the issue and engage the best resources to resolve it as quickly as possible with minimal disruption to your business.