Chapter 8. Cluster Administration

The following chapter describes the various administrative tasks involved in maintaining a cluster after it has been installed and configured.

Displaying Cluster and Service Status

Monitoring cluster and service status can help identify and resolve problems in the cluster environment. Cluster status can be displayed with the clustat command or with the cluster GUI.

Note that status is always reported from the point of view of the cluster system on which the tool is run. To obtain comprehensive cluster status, run the tool on all cluster systems.
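Because each cluster system reports status from its own point of view, comparing the views collected on every system can reveal where they disagree. The following Python sketch assumes that the member status seen by each system has already been gathered (for example, by running clustat on each one) into a dictionary; the function name and data layout are hypothetical and not part of the cluster software.

```python
from typing import Dict

# Hypothetical layout: observer system -> {member name -> status seen by that observer}
Views = Dict[str, Dict[str, str]]


def find_disagreements(views: Views) -> Dict[str, Dict[str, str]]:
    """Return members whose reported status differs between observers.

    Each entry maps a member to {observer: status}, so an administrator
    can see which cluster system reported which status.
    """
    members = sorted({m for seen in views.values() for m in seen})
    disagreements: Dict[str, Dict[str, str]] = {}
    for member in members:
        reports = {obs: seen[member] for obs, seen in views.items() if member in seen}
        if len(set(reports.values())) > 1:
            disagreements[member] = reports
    return disagreements


if __name__ == "__main__":
    # Illustrative data: clu2 sees clu1 as DOWN while clu1 reports itself UP.
    views = {
        "clu1": {"clu1": "UP", "clu2": "UP"},
        "clu2": {"clu1": "DOWN", "clu2": "UP"},
    }
    print(find_disagreements(views))
```

A member that every system reports the same way is omitted from the result, so an empty dictionary means all views agree.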

Cluster and service status includes the status of each member system, power switch, heartbeat channel, and service.

The following tables describe how to analyze the status information shown by the clustat command and the cluster GUI.

Table 8-1. Member Status

Member Status   Description
-------------   -----------
UP              The member system is communicating with the other member system and accessing the quorum partitions.
DOWN            The member system is unable to communicate with the other member system.

Table 8-2. Power Switch Status

Power Switch Status   Description
-------------------   -----------
OK                    The power switch is operating properly.
Wrn                   Could not obtain power switch status.
Err                   A failure or error has occurred.
Good                  The power switch is operating properly.
Unknown               The other cluster member is DOWN.
Timeout               The power switch is not responding to power daemon commands, possibly because of a disconnected serial cable.
Error                 A failure or error has occurred.
None                  The cluster configuration does not include power switches.
Initializing          The switch is in the process of being initialized and its definitive status has not been concluded.

Table 8-3. Heartbeat Channel Status

Heartbeat Channel Status   Description
------------------------   -----------
OK                         The heartbeat channel is operating properly.
Wrn                        Could not obtain channel status.
Err                        A failure or error has occurred.
ONLINE                     The heartbeat channel is operating properly.
OFFLINE                    The other cluster member appears to be UP, but it is not responding to heartbeat requests on this channel.
UNKNOWN                    Could not obtain the status of the other cluster member system over this channel, possibly because the system is DOWN or the cluster daemons are not running.
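The heartbeat channel states in Table 8-3 are most useful when read together with the member status from Table 8-1. The Python sketch below encodes the table descriptions as a lookup; the function name and its messages are illustrative assumptions, not output produced by any cluster tool.

```python
def diagnose_channel(member_status: str, channel_status: str) -> str:
    """Suggest a likely meaning for one heartbeat channel status (Table 8-3).

    member_status is the other member's status (UP or DOWN, Table 8-1);
    channel_status is the status shown for a single heartbeat channel.
    Matching is case-insensitive, so "Up" and "UP" are treated alike.
    """
    member = member_status.upper()
    channel = channel_status.upper()
    if channel in ("ONLINE", "OK"):
        return "channel operating properly"
    if channel == "OFFLINE" and member == "UP":
        # The other member is reachable in general, but not over this channel.
        return "other member is UP but not responding to heartbeat requests on this channel"
    if channel == "UNKNOWN":
        return "other member may be DOWN, or its cluster daemons may not be running"
    if channel == "WRN":
        return "could not obtain channel status"
    if channel == "ERR":
        return "a failure or error has occurred"
    return "status combination not covered by Table 8-3"


if __name__ == "__main__":
    print(diagnose_channel("Up", "OFFLINE"))
```

The final fallback keeps the sketch from guessing when it meets a combination the table does not describe, such as OFFLINE while the other member is DOWN.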

Table 8-4. Service Status

Service Status   Description
--------------   -----------
running          The service resources are configured and available on the cluster system that owns the service. The running state is a persistent state. From this state, the service can enter the stopping state (for example, if the preferred member rejoins the cluster).
disabled         The service has been disabled and does not have an assigned owner. The disabled state is a persistent state. From this state, the service can enter the starting state (if a user initiates a request to start the service).
starting         The service is in the process of being started. The starting state is a transient state. The service remains in the starting state until the service start succeeds or fails. From this state, the service can enter the running state (if the service start succeeds), the stopped state (if the service start fails), or the error state (if the status of the service resources cannot be determined).
stopping         The service is in the process of being stopped. The stopping state is a transient state. The service remains in the stopping state until the service stop succeeds or fails. From this state, the service can enter the stopped state (if the service stop succeeds) or the running state (if the service stop fails and the service can be started).
stopped          The service is not running on any cluster system, does not have an assigned owner, and does not have any resources configured on a cluster system. The stopped state is a persistent state. From this state, the service can enter the disabled state (if a user initiates a request to disable the service) or the starting state (if the preferred member joins the cluster).
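The transitions listed in Table 8-4 form a small state machine. The Python sketch below records only the moves the table describes, so a recorded history of service states can be checked against it. The error state appears only as a target because the table does not describe it further; the function names are illustrative.

```python
from typing import Dict, FrozenSet, List, Tuple

# Next states allowed from each state, as described in Table 8-4.
TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "running": frozenset({"stopping"}),
    "disabled": frozenset({"starting"}),
    "starting": frozenset({"running", "stopped", "error"}),
    "stopping": frozenset({"stopped", "running"}),
    "stopped": frozenset({"disabled", "starting"}),
}

# Persistent states last until an event occurs; transient states resolve on their own.
PERSISTENT: FrozenSet[str] = frozenset({"running", "disabled", "stopped"})
TRANSIENT: FrozenSet[str] = frozenset({"starting", "stopping"})


def can_transition(current: str, target: str) -> bool:
    """Return True if Table 8-4 describes a move from current to target."""
    return target in TRANSITIONS.get(current, frozenset())


def invalid_steps(history: List[str]) -> List[Tuple[str, str]]:
    """Return each (from, to) step in a state history that Table 8-4 does not describe."""
    return [(a, b) for a, b in zip(history, history[1:]) if not can_transition(a, b)]


if __name__ == "__main__":
    history = ["stopped", "starting", "running", "stopping", "stopped", "disabled"]
    # Every step appears in Table 8-4, so this prints [].
    print(invalid_steps(history))
    # A disabled service must pass through starting, so this prints False.
    print(can_transition("disabled", "running"))
```

Keeping the table as data rather than as nested conditionals makes it easy to compare the code against the documentation line by line.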

To display a snapshot of the current cluster status, invoke the clustat utility. For example:

clustat
Cluster Status Monitor (Fileserver Test Cluster)
07:46:05
Cluster alias: clu1alias.boston.redhat.com

===================== M e m b e r   S t a t u s =======================
  Member         Status     Node Id    Power Switch
  -------------- ---------- ---------- ------------
  clu1           Up         0          Good
  clu2           Up         1          Good

=================== H e a r t b e a t   S t a t u s ===================
  Name                           Type       Status
  ------------------------------ ---------- ------------
  clu1         <--> clu2         network    ONLINE

=================== S e r v i c e   S t a t u s =======================
                                       Last             Monitor  Restart
  Service       Status   Owner         Transition       Interval Count
  ------------- -------- ------------- ---------------- -------- -------
  nfs1          started  clu1          16:07:42 Feb 27  15       0
  nfs2          started  clu2          00:03:52 Feb 28  2        0
  nfs3          started  clu1          07:43:54 Feb 28  90       0
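When clustat output needs to be processed by a script, the member and service sections can be split into fields. The Python sketch below parses text in the layout of the sample output above. It assumes that layout (letter-spaced banners of "=" characters, a row of dashes under the column headers, and the column order shown), which may differ between releases; the function name is hypothetical.

```python
from typing import Dict, List, Optional

SAMPLE = """\
===================== M e m b e r   S t a t u s =======================
  Member         Status     Node Id    Power Switch
  -------------- ---------- ---------- ------------
  clu1           Up         0          Good
  clu2           Up         1          Good

=================== S e r v i c e   S t a t u s =======================
                                       Last             Monitor  Restart
  Service       Status   Owner         Transition       Interval Count
  ------------- -------- ------------- ---------------- -------- -------
  nfs1          started  clu1          16:07:42 Feb 27  15       0
"""


def parse_clustat(text: str) -> Dict[str, List[Dict[str, str]]]:
    """Split clustat-style output into member and service records.

    A line of '=' opens a section, a line of '-' ends its column headers,
    and each following non-blank line is treated as one record.
    """
    result: Dict[str, List[Dict[str, str]]] = {"members": [], "services": []}
    section: Optional[str] = None
    in_rows = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("="):
            # Banner text is letter-spaced, so drop spaces before matching.
            banner = stripped.replace(" ", "")
            if "MemberStatus" in banner:
                section = "members"
            elif "ServiceStatus" in banner:
                section = "services"
            else:
                section = None  # other sections (e.g. heartbeat) are ignored here
            in_rows = False
        elif stripped.startswith("-"):
            in_rows = True  # column headers are done; data rows follow
        elif stripped and in_rows and section == "members":
            # Assumes exactly four columns, as in the sample.
            name, status, node_id, power = stripped.split()
            result["members"].append(
                {"member": name, "status": status, "node_id": node_id, "power_switch": power})
        elif stripped and in_rows and section == "services":
            f = stripped.split()
            result["services"].append({
                "service": f[0], "status": f[1], "owner": f[2],
                "last_transition": " ".join(f[3:6]),  # time, month, day
                "monitor_interval": f[6], "restart_count": f[7]})
    return result


if __name__ == "__main__":
    for svc in parse_clustat(SAMPLE)["services"]:
        print(svc["service"], svc["owner"], svc["last_transition"])
```

Splitting on whitespace rather than fixed column offsets tolerates small alignment differences, at the cost of assuming that no field other than the transition time contains spaces.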

To monitor the cluster and display status at specific time intervals, invoke clustat with the -i time command-line option, where time specifies the number of seconds between status snapshots.
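The -i option prints a complete snapshot at every interval. When only changes matter, a script can compare successive snapshots instead. The Python sketch below is a hypothetical wrapper: the snapshot function is passed in (in practice it might run clustat once and parse its output), and so is the sleep function, so the comparison logic can be exercised without a running cluster.

```python
import time
from typing import Callable, Dict, Iterator, Tuple

# Hypothetical snapshot layout: member or service name -> status string.
Snapshot = Dict[str, str]


def watch_changes(take_snapshot: Callable[[], Snapshot],
                  interval: float,
                  rounds: int,
                  sleep: Callable[[float], None] = time.sleep,
                  ) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, old_status, new_status) whenever a status changes.

    Takes `rounds` snapshots, `interval` seconds apart, and compares each
    one with the previous snapshot. A name that appears or disappears is
    reported with the status "absent" on the missing side.
    """
    previous = take_snapshot()
    for _ in range(rounds - 1):
        sleep(interval)
        current = take_snapshot()
        for name in sorted(previous.keys() | current.keys()):
            old = previous.get(name, "absent")
            new = current.get(name, "absent")
            if old != new:
                yield (name, old, new)
        previous = current


if __name__ == "__main__":
    # Canned snapshots stand in for real clustat output.
    canned = iter([
        {"clu1": "Up", "clu2": "Up"},
        {"clu1": "Up", "clu2": "Down"},
        {"clu1": "Up", "clu2": "Down"},
    ])
    for change in watch_changes(lambda: next(canned), interval=0, rounds=3):
        print(change)
```

For routine monitoring, clustat -i remains the built-in approach; a change-only view like this is mainly useful when the output feeds a log or an alert.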