The information in the following sections can assist in the management of the cluster software configuration.
A cluster uses several intra-cluster communication mechanisms to ensure data integrity and correct cluster behavior when a failure occurs. The cluster uses these mechanisms to:
Control when a system can become a cluster member
Determine the state of the cluster systems
Control the behavior of the cluster when a failure occurs
The cluster communication mechanisms are as follows:
Quorum disk partitions
Periodically, each cluster system writes a timestamp and system status (UP or DOWN) to the primary and backup quorum partitions, which are raw partitions located on shared storage. Each cluster system reads the system status and timestamp written by the other cluster system and determines whether they are up to date. The cluster systems first attempt to read the information from the primary quorum partition. If this partition is corrupted, the cluster systems read the information from the backup quorum partition and simultaneously repair the primary partition. Data consistency is maintained through checksums, and any inconsistencies between the partitions are automatically corrected.
If a cluster system reboots but cannot write to both quorum partitions, the system will not be allowed to join the cluster. In addition, if an existing cluster system can no longer write to both partitions, it removes itself from the cluster by shutting down.
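The read path described above (try the primary partition, fall back to the backup, repair the primary) can be illustrated with a minimal sketch. It is not the cluster software's actual on-disk format or code: the record layout, the CRC-32 checksum, the function names, and the use of in-memory byte strings in place of raw partitions are all assumptions made for illustration.

```python
import struct
import zlib

# Hypothetical record layout: float64 timestamp + 1-byte status (1 = UP, 0 = DOWN).
RECORD = struct.Struct("<dB")
CHECKSUM = struct.Struct("<I")  # CRC-32 appended after the record (an assumption)


def encode_record(timestamp, up):
    """Pack a timestamp and UP/DOWN status, followed by a checksum."""
    payload = RECORD.pack(timestamp, 1 if up else 0)
    return payload + CHECKSUM.pack(zlib.crc32(payload))


def decode_record(blob):
    """Return (timestamp, up), or None if the length or checksum is wrong."""
    if len(blob) != RECORD.size + CHECKSUM.size:
        return None
    payload, stored = blob[:RECORD.size], blob[RECORD.size:]
    if CHECKSUM.unpack(stored)[0] != zlib.crc32(payload):
        return None
    timestamp, status = RECORD.unpack(payload)
    return timestamp, status == 1


def read_status(partitions):
    """Read from the primary copy; on corruption, use the backup and repair.

    `partitions` is a dict with 'primary' and 'backup' byte strings that
    stand in for the two raw quorum partitions.
    """
    record = decode_record(partitions["primary"])
    if record is not None:
        return record
    record = decode_record(partitions["backup"])
    if record is not None:
        # Repair the corrupted primary copy from the valid backup copy.
        partitions["primary"] = partitions["backup"]
    return record
```

In this sketch, a result of `None` from `read_status` means that neither copy passed its checksum. How the real software reacts in that case is not covered here.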
Remote power switch monitoring
Periodically, each cluster system monitors the health of the remote power switch connection, if any. The cluster system uses this information to help determine the status of the other cluster system. The complete failure of the power switch communication mechanism does not automatically result in a failover.
Ethernet and serial heartbeats
The cluster systems are connected together by using point-to-point Ethernet and serial lines. Periodically, each cluster system issues heartbeats (pings) across these lines. The cluster uses this information to help determine the status of the systems and to ensure correct cluster operation. The complete failure of the heartbeat communication mechanism does not automatically result in a failover.
If a cluster system determines that the quorum timestamp from the other cluster system is not up to date, it checks the heartbeat status. If heartbeats to that system are still operating, the cluster takes no action at that time. If a cluster system does not update its timestamp after some period of time and does not respond to heartbeat pings, it is considered down.
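The decision rule in the preceding paragraph can be sketched as a small function. This is an illustration only: the function name, the returned labels, and the `stale_after` threshold are hypothetical, since the text says only that a timestamp must go without an update for "some period of time."

```python
def peer_state(now, peer_timestamp, heartbeat_ok, stale_after):
    """Classify the other cluster system from its quorum timestamp and heartbeats.

    now, peer_timestamp: times in seconds (same clock base).
    heartbeat_ok: True if the peer still answers heartbeat pings.
    stale_after: hypothetical threshold, in seconds, after which a
                 timestamp is no longer considered up to date.
    """
    if now - peer_timestamp <= stale_after:
        return "UP"         # timestamp is current; nothing to check
    if heartbeat_ok:
        return "NO_ACTION"  # stale timestamp, but heartbeats still arrive
    return "DOWN"           # stale timestamp and no heartbeat response


# Example: the timestamp is 30 seconds old with a 10-second threshold.
print(peer_state(now=130.0, peer_timestamp=100.0, heartbeat_ok=True, stale_after=10.0))
print(peer_state(now=130.0, peer_timestamp=100.0, heartbeat_ok=False, stale_after=10.0))
```

The sketch reflects that a stale timestamp alone is not treated as a failure; the peer is considered down only when the heartbeat check also fails.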
Note that the cluster will remain operational as long as one cluster system can write to the quorum disk partitions, even if all other communication mechanisms fail.