[Linux-cluster] CS4/ question about load on Heart Beat network.

Patrick Caulfield pcaulfie at redhat.com
Mon Jan 23 08:34:10 UTC 2006


Alain Moulle wrote:
> Hi
> 
> I wonder what the strategy is in CS4 when
> the heartbeat network is overloaded for
> a while, so much so that none of the nodes
> gets a response to its heartbeat checks.
> 
> Do all nodes in the cluster decide to fence (reboot)
> their neighbours, and succeed in doing so once the
> load on the network lessens?
> Or what?
> Is there any safeguard here to
> stop CS4 from issuing fence reboot requests against
> all nodes in the cluster just because the
> network is overloaded?
> 

CMAN uses quorum to decide whether it can carry on operating after a cluster
split. If more than half of the nodes are still talking to each other then
they will have quorum and will fence the remaining nodes.

If none of the nodes can see any other node (e.g. an Ethernet switch failure),
then no node will have quorum on its own, so no fencing will be done.
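
As a rough illustration of the majority rule described above, here is a small
Python sketch. It is not CMAN's actual code: it assumes one vote per node, and
the names has_quorum and fencing_partitions are made up for this example.

```python
from typing import List


def has_quorum(partition_size: int, total_nodes: int) -> bool:
    """Return True if a partition holds more than half of the cluster's nodes.

    Assumption for this sketch: every node carries exactly one vote.
    """
    if total_nodes <= 0 or not 0 <= partition_size <= total_nodes:
        raise ValueError("partition_size must be between 0 and total_nodes")
    # Strict majority: exactly half is not enough.
    return 2 * partition_size > total_nodes


def fencing_partitions(partition_sizes: List[int], total_nodes: int) -> List[bool]:
    """For each partition after a split, say whether it would be allowed to fence."""
    return [has_quorum(size, total_nodes) for size in partition_sizes]


# Five-node cluster split 3/2: only the three-node side keeps quorum.
print(fencing_partitions([3, 2], 5))           # [True, False]

# Heartbeat network fully down: every node is alone, nobody has quorum,
# so nobody fences anybody.
print(fencing_partitions([1, 1, 1, 1, 1], 5))  # [False, False, False, False, False]
```

The second case is the overloaded-network scenario from the question: with
every node isolated, no partition reaches a majority, so no fencing happens.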

If you subsequently reconnect the nodes after that catastrophe, they will all
drop out of the cluster, because no node can be sure of the state of any other
node and carrying on regardless would endanger data. So you will need to
restart cluster services on all nodes.

-- 

patrick



