Issue #9 July 2005

Open source high-availability clustering

Introduction

As Linux is used more widely for mission-critical applications, support for high availability through application failover is becoming more important. Improving Linux high availability involves employing both hardware and software technologies, including:

  • Server redundancy, including application failover and server clustering
  • Storage redundancy, including RAID and I/O multipathing
  • Network redundancy
  • Power system redundancy

These features provide a way to achieve scalable performance and high availability at low cost. In this article, we focus on the open source Red Hat Cluster Manager application failover software package, describing its basic principles and operation. In addition, we outline how increasing levels of availability (at increasing cost) can be achieved with Linux using Cluster Manager and related redundancy techniques.

Architecture and operation

Red Hat Cluster Manager is an application failover software package that allows a group of connected Linux servers (known as a cluster) to run the same application. Cluster Manager can automatically detect when certain faults have occurred (such as a server or network failure) that prevent an application, server daemon, or shared file service from running. It can then restart that application or service on another server in the cluster. Cluster Manager can also be used to shut down an application on one server and then restart the same application on another server in the cluster, a process known as application migration. A group of servers in the cluster that can run the same application is known as a failover domain.

Cluster Manager can be used to improve the availability and simplify the management of database (such as Oracle and MySQL), file serving (NFS and CIFS protocols), and web serving (such as Apache) applications. It uses standard networking, shared storage, and server management technology to monitor the status of servers and networks to ensure that an application runs on only one active and available server at a time. Node fencing is used to ensure that a node that is not communicating with other nodes (and hence is no longer part of the cluster) can no longer run an application or access shared storage until its cluster membership has been restored. A daemon runs on each node in the cluster to monitor cluster status and synchronize configuration information between cluster nodes so that at any point in time, all nodes have the same view of cluster membership and system state.

Cluster Manager provides application availability by grouping applications and their required resources together into a cluster service. A cluster service is made up of cluster resources, components that can be failed over from one node to another, including an IP address, an application initialization script, and a shared storage partition (such as a local file system on a shared disk, shared cluster file system, or network file system like NFS).

After you add a cluster service, the cluster management software stores the information in a cluster configuration file, and the configuration data is propagated to all cluster nodes using the Cluster Configuration System (CCS), a set of daemons running on each cluster node that allows retrieval of changes to the XML-based configuration file. Red Hat Cluster Manager allows transparent client access to cluster services on any node in the cluster.
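
As a rough illustration of the CCS mechanism (assuming a Cluster Suite release that ships the ccs_tool utility), an updated configuration can also be pushed to the other cluster nodes from a shell prompt:

# Propagate an updated /etc/cluster/cluster.conf to all cluster nodes via CCS.
# The config_version attribute in the file must be incremented first; the
# Cluster Configuration Tool normally handles both steps automatically.
ccs_tool update /etc/cluster/cluster.conf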

To accomplish application failover and migration while preserving data integrity, Cluster Manager nodes maintain group membership via node heartbeats. Each node sends heartbeat signals to other nodes that say, in effect, "I am still functioning properly and my network connection to you is still intact." If a cluster node can no longer heartbeat other nodes, then it is fenced. The node is rebooted and no longer accesses shared storage, and the applications that had been running on it are migrated to another node in the cluster. The set of nodes allowed to run a particular cluster service can be restricted to a subset known as a failover domain.

Figure 1. Basic cluster shows the basic structure of a Red Hat Enterprise Linux cluster using Cluster Manager. A server cluster tier, configured with and administered via Cluster Manager, accesses shared storage via IP (shared NFS or iSCSI volume mounts) or Fibre Channel (shared GFS or ext3fs file system mounts). An application client tier can access a cluster service on any machine in the server cluster tier. If a node in this tier stops heartbeating other cluster members, it is fenced, and the cluster services executing on it are migrated to other nodes in its failover domain in the server cluster tier.

Figure 1. Basic cluster

Configuring a cluster

The Red Hat Cluster Suite manual set provides a detailed description of configuring and administering Cluster Manager. We only summarize the major steps in this section and the next. The Cluster Manager configuration file (/etc/cluster/cluster.conf) is an XML-format file created using the Cluster Configuration Tool. (Red Hat recommends that this file be created and modified only with the Cluster Configuration Tool, never through manual editing.) The configuration steps in the tool are as follows:

Cluster nodes
Add members to the cluster and optionally configure a power controller connection for any given cluster member. This step defines which nodes on the same subnet (all cluster members must be on the same subnet) are included in the cluster.
Fence devices
Establish one or more devices or methods to control each node in a cluster, which maintains cluster availability and integrity. Red Hat supports a variety of techniques and hardware for fencing, including APC power switches and built-in server management hardware like HP's Integrated Lights Out system.
Failover domains
Configure one or more subsets of cluster nodes used to run a service in the event of a node failure. A cluster service with a defined failover domain will only be run on cluster nodes in that domain, and no others.
Resources
Configure resources to be managed by the system. Choose from the available list of file systems, IP addresses, NFS mounts and exports, and user-created scripts. Configure them individually.
Services
Once cluster resources, nodes, and failover domains are defined, it is then possible to combine them into cluster services using the Cluster Configuration Tool.
Note:
Running the Cluster Configuration Tool for the first time causes the cluster configuration file /etc/cluster/cluster.conf to be created automatically.
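
To give a sense of what the Cluster Configuration Tool produces, here is a heavily simplified, hypothetical fragment of /etc/cluster/cluster.conf for a two-node cluster with an APC power switch for fencing and a single web server service. The element and attribute names shown are illustrative and vary between Cluster Suite releases; always create and modify the real file with the Cluster Configuration Tool.

<?xml version="1.0"?>
<cluster name="example" config_version="1">
  <clusternodes>
    <clusternode name="tng3-1">
      <fence>
        <method name="power">
          <device name="apc1" port="1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="tng3-2">
      <fence>
        <method name="power">
          <device name="apc1" port="2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <!-- APC power switch used to fence both nodes -->
    <fencedevice name="apc1" agent="fence_apc" ipaddr="10.0.0.50" login="apc" passwd="apc"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <!-- only these two nodes may run the web server service -->
      <failoverdomain name="webdomain" restricted="1">
        <failoverdomainnode name="tng3-1"/>
        <failoverdomainnode name="tng3-2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="10.0.0.100"/>
      <fs name="webfs" device="/dev/sda1" mountpoint="/var/www" fstype="ext3"/>
      <script name="httpd" file="/etc/rc.d/init.d/httpd"/>
    </resources>
    <!-- the cluster service ties the IP address, file system, and init script together -->
    <service name="webserver" domain="webdomain">
      <ip ref="10.0.0.100"/>
      <fs ref="webfs"/>
      <script ref="httpd"/>
    </service>
  </rm>
</cluster>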

Administering a cluster

As shown in Figure 2. Cluster Status Tool, you can use the Cluster Status Tool to enable, disable, restart, or relocate a service. To enable, disable, or restart a service, select it in the Services area and click Enable, Disable, or Restart. To move a service from one member to another, disable the service and drag it to the new member; the service starts automatically on the new member.

Figure 2. Cluster Status Tool
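
The same operations are also available from a shell prompt. As a sketch (assuming the clusvcadm utility shipped with the rgmanager-based release of Cluster Suite, and using the service and member names from the status example below), a cluster service can be enabled, disabled, restarted, or relocated with commands like these:

clusvcadm -e webserver              # enable (start) the webserver service
clusvcadm -d webserver              # disable (stop) the service
clusvcadm -R webserver              # restart the service on the member that owns it
clusvcadm -r webserver -m tng3-2    # relocate the service to member tng3-2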

Displaying cluster and service status

Monitoring cluster and application service status can be accomplished using the following tools:

  • The clustat command
  • Log file messages
  • The cluster monitoring GUI

Cluster and service status includes the following information:

  • Cluster member system status
  • Heartbeat channel status
  • Service status and which cluster system is running the service or owns the service

Cluster node member status falls into two classes:

Online
The member system is communicating with other member systems and accessing the quorum partitions.
Inactive
The member system is unable to communicate with the other member systems.

A cluster service can have several states, including the following:

Running
The service resources are configured and available on the cluster system that owns the service.
Pending
The service has failed on a member and is pending start on another member.
Disabled
The service has been disabled and does not have an assigned owner.
Stopped
The service is not running and is waiting for a member capable of starting it.
Failed
The service has failed to start, and the cluster cannot successfully stop the service.

It is possible to display a snapshot of the current cluster status from a shell prompt by invoking the clustat utility. For example, for a two-node cluster with nodes tng3-2 and tng3-1 (both online) with a failed web server service and a running email service, the clustat command would output the text:

Member Status: Quorate, Group Member

Member Name        State     ID
------ ----        -----     --
tng3-2             Online    0x0000000000000002
tng3-1             Online    0x0000000000000001

Service Name       Owner (Last)     State
------- ----       ----- ------     -----
webserver          (tng3-1)         failed
email              tng3-2           started

To monitor the cluster and display cluster status at specific time intervals from a shell prompt, the clustat command can be used with the -i time option, where time specifies the number of seconds between status snapshots.
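
For example, the following command redisplays cluster status every 10 seconds until it is interrupted:

clustat -i 10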

Keep in mind that this article can only highlight the basic steps in configuring and administering Red Hat Cluster Manager in a Red Hat Enterprise Linux cluster. Consult the Cluster Manager manual for detailed instructions.

Cluster Manager applications

Cluster Manager provides a high-availability infrastructure for nearly any application. Support for the Apache web server and for NFS and Samba file services is built into Cluster Manager and Red Hat Enterprise Linux. A key component of an application deployment with Cluster Manager is a script that manages starting and stopping the application on a cluster node. For NFS, Samba, and Apache, these scripts are part of the standard Linux init and shutdown processes executed when a machine boots or is shut down. For the Apache web server, the script is /etc/rc.d/init.d/httpd, while for NFS and Samba the scripts are /etc/rc.d/init.d/nfs and /etc/rc.d/init.d/smb, respectively. Scripts can be developed for other applications by consulting the appropriate documents on writing init scripts, including the Red Hat Enterprise Linux System Administration Guide.
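
As a rough sketch, an init-style script for a hypothetical application (called myapp here; the daemon name and paths are placeholders) needs to handle at least the start and stop arguments, and typically a status argument that the cluster software can use to monitor the application:

#!/bin/sh
# Minimal init-style script for a hypothetical "myapp" cluster service.
# Cluster Manager invokes it with start, stop, and (typically) status arguments.

case "$1" in
  start)
    /usr/local/myapp/bin/myappd     # start the daemon (assumes it backgrounds itself)
    ;;
  stop)
    killall myappd                  # stop the daemon
    ;;
  status)
    pidof myappd > /dev/null        # exit 0 if running, non-zero otherwise
    ;;
  *)
    echo "Usage: $0 {start|stop|status}"
    exit 1
    ;;
esac
exit $?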

As an example, here are the steps necessary to configure Apache for use with Cluster Manager:

  • Make sure Apache is installed and configured (IP addresses, file systems, etc.) on each cluster member hosting Apache as a cluster service
  • Do not add mount information for the file system holding the HTTP files to be served to the /etc/fstab file, because only the cluster software can mount and unmount file systems used in a cluster service
  • Execute the Linux command chkconfig --del httpd to remove Apache from the boot sequence: Apache startup and shutdown will be controlled by Cluster Manager instead
  • From the Cluster Configuration Tool, perform the following steps:
    • Add the init script for the Apache HTTP Server service
    • Add a device for the Apache HTTP Server content files and/or custom scripts
    • Add an IP address for the Apache HTTP Server service
    • Create the Apache HTTP Server service

The following commands, respectively, can then be used to start and stop the Apache HTTP Server cluster service on the cluster nodes.

service httpd start
service httpd stop

Summary

Red Hat Cluster Manager has evolved as a component of Red Hat Cluster Suite, which includes both Cluster Manager and the Linux Virtual Server (LVS) for IP load balancing. Originally included as part of Red Hat Enterprise Linux AS 2.1, Cluster Suite is a separate layered product in Red Hat Enterprise Linux 3 and 4. The Red Hat Enterprise Linux 4 release of Cluster Suite includes significant technical advances over previous versions, including support for a much larger number of cluster members and low-cost non-shared storage configurations.

                                          Red Hat           Red Hat           Red Hat
                                          Enterprise        Enterprise        Enterprise
                                          Linux AS 2.1      Linux 3           Linux 4
Separate product from Enterprise Linux    No                Yes               Yes
Shared cluster infrastructure with GFS    No                No                Yes
Cluster Logical Volume Manager support    No                No                Yes
Shared storage (SAN or multi-port
SCSI) required?                           Yes               Yes               No
Maximum number of nodes                   8                 8                 300

Table 1. Red Hat Cluster Suite

Further reading

A good resource for learning more about designing and configuring high-availability systems is the book Blueprints for High Availability by Evan Marcus and Hal Stern.

About the authors

From 1990 to May 2000, Matthew O'Keefe taught and performed research in storage systems and parallel simulation software as a professor of electrical and computer engineering at the University of Minnesota. He founded Sistina Software in May of 2000 to develop storage infrastructure software for Linux, including the Global File System (GFS) and the Linux Logical Volume Manager (LVM). Sistina was acquired by Red Hat in December 2003, where Matthew now directs storage software strategy.

John Ha is currently the Technical Lead for the Red Hat Product Documentation group. He also writes and maintains the Red Hat Cluster Suite documentation. He has been an avid Linux user since Red Hat Linux 6.2 and has finally convinced his girlfriend to switch from Minesweeper to Gnome Mines for security reasons.