Issue #9 July 2005
As Linux is used more widely for mission-critical applications, support for high availability through application failover is becoming more important. Improving Linux high availability involves employing both hardware and software technologies, including:
These features provide a way to achieve scalable performance and high availability at low cost. In this article, we focus on the open source Red Hat Cluster Manager application failover software package, describing its basic principles and operation. In addition, we outline how increasing levels of availability (at increasing cost) can be achieved with Linux using Cluster Manager and related redundancy techniques.
Red Hat Cluster Manager is an application failover software package that allows a group of connected Linux servers (known as a cluster) to run the same application. Cluster Manager can automatically detect when certain faults have occurred (such as a server or network failure) that prevent an application, server daemon, or shared file service from running. It can then restart that application or service on another server in the cluster. Cluster Manager can also be used to shut down an application on one server and then restart the same application on another server in the cluster, a process known as application migration. A group of servers in the cluster that can run the same application is known as a failover domain.
Cluster Manager can be used to improve the availability and simplify the management of database (such as Oracle and MySQL), file serving (NFS and CIFS protocols), and web serving (such as Apache) applications. It uses standard networking, shared storage, and server management technology to monitor the status of servers and networks to ensure that an application runs on only one active and available server at a time. Node fencing is used to ensure that a node that is not communicating with other nodes (and hence is no longer part of the cluster) can no longer run an application or access shared storage until its cluster membership has been restored. A daemon runs on each node in the cluster to monitor cluster status and synchronize configuration information between cluster nodes so that, at any point in time, all nodes have the same view of cluster membership and system state.
Cluster Manager provides application availability by grouping applications and their required resources together into a cluster service. A cluster service is made up of cluster resources: components that can be failed over from one node to another, including an IP address, an application initialization script, and a shared storage partition (such as a local file system on a shared disk, a shared cluster file system, or a network file system like NFS).
After you add a cluster service, the cluster management software stores the information in a cluster configuration file, and the configuration data is propagated to all cluster nodes using the Cluster Configuration System (CCS), a set of daemons running on each cluster node that allows retrieval of changes to the XML-based configuration file. Red Hat Cluster Manager allows transparent client access to cluster services on any node in the cluster.
To accomplish application failover and migration while preserving data integrity, Cluster Manager nodes maintain group membership via node heartbeats. Each node sends heartbeat signals to other nodes that say, in effect, "I am still functioning properly and my network connection to you is still intact." If a cluster node can no longer heartbeat other nodes, then it is fenced. The node is rebooted and no longer accesses shared storage, and the applications that had been running on it are migrated to another node in the cluster. The set of nodes allowed to run a particular cluster service can be restricted to a subset known as a failover domain.
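The heartbeat idea above can be illustrated with a small sketch. This is not Cluster Manager's implementation: the `Membership` class, its method names, and the 10-second timeout are hypothetical, chosen only to show how a missing heartbeat turns into a fencing decision.

```python
from dataclasses import dataclass, field

# Hypothetical value: seconds of silence before a node is treated as failed.
HEARTBEAT_TIMEOUT = 10.0


@dataclass
class Membership:
    """Tracks the time of the last heartbeat received from each node."""

    timeout: float = HEARTBEAT_TIMEOUT
    last_seen: dict[str, float] = field(default_factory=dict)

    def record_heartbeat(self, node: str, now: float) -> None:
        """Remember that `node` sent a heartbeat at time `now` (seconds)."""
        self.last_seen[node] = now

    def members(self, now: float) -> list[str]:
        """Nodes whose most recent heartbeat is within the timeout."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t <= self.timeout)

    def nodes_to_fence(self, now: float) -> list[str]:
        """Nodes that have been silent longer than the timeout."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)


if __name__ == "__main__":
    m = Membership(timeout=10.0)
    m.record_heartbeat("tng3-1", now=0.0)
    m.record_heartbeat("tng3-2", now=0.0)
    m.record_heartbeat("tng3-2", now=8.0)   # only tng3-2 keeps heartbeating
    print(m.nodes_to_fence(now=12.0))       # ['tng3-1']
    print(m.members(now=12.0))              # ['tng3-2']
```

Passing the clock in as `now` keeps the sketch deterministic; a real membership daemon would read the system clock and exchange heartbeats over the network.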
Figure 1, "Basic cluster," shows the basic structure of a Red Hat Enterprise Linux cluster using Cluster Manager. A server cluster tier, configured with and administered via Cluster Manager, accesses shared storage via IP (shared NFS or iSCSI volume mounts) or Fibre Channel (shared GFS or ext3fs file system mounts). An application client tier can access a cluster service on any machine in the server cluster tier. If a node in this tier stops heartbeating other cluster members, it is fenced, and the cluster services executing on it are migrated to other nodes in its failover domain in the server cluster tier.
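As a rough illustration of how services might move within a failover domain once a node is fenced, consider the sketch below. The function names and the "first online node in list order" policy are assumptions made for this example; they are not taken from Cluster Manager itself.

```python
def pick_failover_target(failover_domain, online_nodes, failed_node):
    """Return the first domain member that is online and is not the
    failed node, or None if no such node exists.

    Assumption: the domain is a list in preference order.
    """
    for node in failover_domain:
        if node != failed_node and node in online_nodes:
            return node
    return None


def relocate_services(services, failover_domains, online_nodes, failed_node):
    """Return a new service -> owner mapping after `failed_node` is fenced.

    services:         dict of service name -> current owner node
    failover_domains: dict of service name -> list of allowed nodes
    A value of None means no eligible node was found for that service.
    """
    result = {}
    for name, owner in services.items():
        if owner != failed_node:
            result[name] = owner          # unaffected service stays put
        else:
            result[name] = pick_failover_target(
                failover_domains[name], online_nodes, failed_node)
    return result


if __name__ == "__main__":
    services = {"webserver": "node1", "email": "node2"}
    domains = {"webserver": ["node1", "node3"], "email": ["node2", "node1"]}
    print(relocate_services(services, domains, {"node2", "node3"}, "node1"))
    # {'webserver': 'node3', 'email': 'node2'}
```

Only services owned by the fenced node are touched, and a service is never placed outside its own failover domain.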
The Red Hat Cluster Suite manual set provides a detailed description for configuring and administering Cluster Manager. We only summarize the major steps in this section and the next. The Cluster Manager configuration file (/etc/cluster/cluster.conf) is an XML-format file created using the Cluster Configuration Tool. (Red Hat recommends that this file be created and modified only with the Cluster Configuration Tool, never through manual editing.) The configuration steps in the tool are as follows:
/etc/cluster/cluster.conf to be created automatically.
As shown in Figure 2, "Cluster Status Tool," you can use the Cluster Status Tool to enable, disable, restart, or relocate a service. To enable a service, select the service in the Services area and click Enable. To disable a service, select the service in the Services area and click Disable. To restart a service, select the service in the Services area and click Restart. To move a service from one member to another, disable the service and drag it to another member. Dragging it to another member automatically starts the service on the new member.
Monitoring cluster and application service status can be accomplished using the following tools:
clustat command
Cluster and service status includes the following information:
Cluster node member status falls into two classes:
A cluster service can have several states, including the following:
It is possible to display a snapshot of the current cluster status from a shell prompt by invoking the clustat utility. For example, for a two-node cluster with nodes tng3-2 and tng3-1 (both online) with a failed web server service and a running email service, the clustat command would output the text:
Member Status: Quorate, Group Member
Member Name State ID
------ ---- ----- --
tng3-2 Online 0x0000000000000002
tng3-1 Online 0x0000000000000001
Service Name Owner (Last) State
-------- ----- ----- ------ -----
webserver (tng3-1 ) failed
email tng3-2 started
To monitor the cluster and display cluster status at specific time intervals from a shell prompt, the clustat command can be used with the -i time option, where time specifies the number of seconds between status snapshots.
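For scripted monitoring, the clustat text can be turned into a data structure. The sketch below assumes the output has exactly the layout of the sample shown above; real clustat output may differ between versions, and the `parse_clustat` and `failed_services` helpers are hypothetical names introduced for this example.

```python
# Sample text copied from the clustat example in this article.
SAMPLE = """\
Member Status: Quorate, Group Member
Member Name State ID
------ ---- ----- --
tng3-2 Online 0x0000000000000002
tng3-1 Online 0x0000000000000001
Service Name Owner (Last) State
-------- ----- ----- ------ -----
webserver (tng3-1 ) failed
email tng3-2 started
"""


def parse_clustat(text):
    """Parse clustat-style text into (members, services) dictionaries."""
    members, services = {}, {}
    section = None
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines, dashed rulers, and the summary line.
        if (not tokens or tokens[0].startswith("-")
                or line.startswith("Member Status")):
            continue
        if tokens[:2] == ["Member", "Name"]:
            section = "members"
        elif tokens[:2] == ["Service", "Name"]:
            section = "services"
        elif section == "members" and len(tokens) >= 2:
            members[tokens[0]] = tokens[1]
        elif section == "services" and len(tokens) >= 3:
            owner = " ".join(tokens[1:-1])
            services[tokens[0]] = {
                "owner": owner.strip("() "),
                # Parentheses mark the last owner of a service that is
                # not currently running.
                "owner_is_last": owner.startswith("("),
                "state": tokens[-1],
            }
    return members, services


def failed_services(services):
    """Names of services whose state is 'failed'."""
    return [name for name, info in services.items()
            if info["state"] == "failed"]


if __name__ == "__main__":
    members, services = parse_clustat(SAMPLE)
    print(members)                    # {'tng3-2': 'Online', 'tng3-1': 'Online'}
    print(failed_services(services))  # ['webserver']
```

On a live cluster, the same parser could be fed the text captured from the clustat command instead of the embedded sample.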
Keep in mind that this article highlights only the basic steps in configuring and administering Red Hat Cluster Manager in a Red Hat Enterprise Linux cluster. Consult the Cluster Manager manual for detailed instructions.
Cluster Manager provides an infrastructure for high availability for nearly any application. Support for the Apache web server and NFS and Samba file services comes built into Cluster Manager and Red Hat Enterprise Linux. A key component for an application deployment with Cluster Manager is a script to manage starting and stopping the application on a cluster node. For NFS, Samba, and Apache, these scripts are part of the standard Linux init and shutdown processes executed when a machine boots or is shut down. For an Apache web server, these script sequences are found in the file /etc/rc.d/init.d/httpd, while for NFS and Samba the appropriate script files are /etc/rc.d/init.d/nfsd and /etc/rc.d/init.d/smbd, respectively. Scripts can be developed for other applications by consulting the appropriate documents outlining the construction of init scripts, including the Red Hat Enterprise Linux System Administration Guide.
As an example, here are the steps necessary to configure Apache for use with Cluster Manager:
/etc/fstab file because only the cluster software can mount and unmount file systems used in a service
chkconfig --del httpd to remove Apache from the boot sequence: Apache startup and shutdown will be controlled by Cluster Manager instead
The following commands, respectively, can then be used to start and stop the Apache HTTP Server cluster service on the cluster nodes.
service httpd start
service httpd stop
Red Hat Cluster Manager has evolved as a component of Red Hat Cluster Suite, which includes both Cluster Manager and the Linux Virtual Server (LVS) for IP load balancing. Originally included as part of Red Hat Enterprise Linux AS 2.1, Cluster Suite is a separate layered product in Red Hat Enterprise Linux 3 and 4. The Red Hat Enterprise Linux 4 release of Cluster Suite includes significant technical advances over previous versions, including support for a much larger number of cluster members and low-cost non-shared storage configurations.
| Release | Separate product from Enterprise Linux | Shared cluster infrastructure with GFS | Cluster Logical Volume Manager support | Shared storage (SAN or multi-port SCSI) required? | Maximum number of nodes |
|---|---|---|---|---|---|
| Red Hat Enterprise Linux AS 2.1 | No | No | No | Yes | 8 |
| Red Hat Enterprise Linux 3 | Yes | No | No | Yes | 8 |
| Red Hat Enterprise Linux 4 | Yes | Yes | Yes | No | 300 |
A good resource for learning more about designing and configuring high-availability systems is the book Blueprints for High Availability by Evan Marcus and Hal Stern. Additional resources include: