[Linux-cluster] IP-based tie-breaker on a 2-node cluster?

gordan at bobich.net
Thu Apr 17 15:55:43 UTC 2008


On Thu, 17 Apr 2008, Andrew Lacey wrote:

> I am doing some testing on a 2-node, active/standby RHEL 4 cluster with
> non-GFS shared storage. I am using HP iLO for fencing. I don't have a
> quorum disk set up. Both cluster nodes are connected to the same switch,
> and that network path is used for cluster communication as well as general
> network communication (including access to iLO). I've found that when the
> switch goes down and comes back up, the result is not desirable. As soon
> as the switch loses power, each node starts trying to fence the other.
> Since the iLO is not reachable, this is unsuccessful, but the nodes keep
> retrying the fence. When the switch comes back online, the "OK Corral"
> scenario takes place -- both nodes fence each other simultaneously and
> bring down the cluster.

I had a similar issue. The solution I went for was doctoring the fencing 
agent so that it waits for a delay based on the node's priority before 
fencing. That way the nodes don't try to fence simultaneously; they fence 
in a staggered fashion.

If you have a spare NIC and the nodes are next to each other, you could 
connect them with a crossover cable and use that link for their cluster 
communication, so they can still see that both of them are up even when 
the switch dies. That's what I do.

Gordan
