
Re: [Linux-cluster] IP-based tie-breaker on a 2-node cluster?

On Thu, 17 Apr 2008, Andrew Lacey wrote:

I am doing some testing on a 2-node, active/standby RHEL 4 cluster with
non-GFS shared storage. I am using HP iLO for fencing. I don't have a
quorum disk set up. Both cluster nodes are connected to the same switch,
and that network path is used for cluster communication as well as general
network communication (including access to iLO). I've found that when the
switch goes down and comes back up, the result is not desirable. As soon
as the switch loses power, each node starts trying to fence the other.
Since the iLO is not reachable, this is unsuccessful, but the nodes keep
retrying the fence. When the switch comes back online, the "OK Corral"
scenario takes place -- both nodes fence each other simultaneously and
bring down the cluster.

I had a similar issue, but the solution I went for was to doctor the fencing agent so that it waits for a delay based on the node's priority before it fences. That way the nodes don't try to fence simultaneously, but in a staggered fashion, so one node should finish fencing before the other gets its turn.
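
For illustration only, here is a rough sketch of the staggered-delay idea as a wrapper around the real agent. This is not my actual patch. The agent path, the node names, the priority table and the 10-second step are all made-up assumptions, and it assumes the agent takes its options on stdin and that a modern Python 3 is available, which is not what a stock RHEL 4 box ships.

```python
import subprocess
import sys
import time

REAL_AGENT = "/sbin/fence_ilo"   # assumed path to the unmodified agent
STEP_SECONDS = 10                # assumed gap between priority levels

# Hypothetical priority table: lower rank fences first.
NODE_PRIORITY = {"node1": 0, "node2": 1}


def fence_delay(local_node, priorities=NODE_PRIORITY, step=STEP_SECONDS):
    """Return how many seconds this node waits before fencing its peer."""
    # Nodes missing from the table get the longest delay.
    rank = priorities.get(local_node, len(priorities))
    return rank * step


def main(local_node):
    """Read the agent options, wait our turn, then run the real agent."""
    options = sys.stdin.read()            # options arrive on stdin
    time.sleep(fence_delay(local_node))   # stagger by priority
    result = subprocess.run([REAL_AGENT], input=options, text=True)
    return result.returncode
```

To use something like this in place of the agent, the script would end with a call such as sys.exit(main("node1")), with the local node's name filled in. The step needs to be longer than the time one fence operation takes, otherwise both nodes can still fire.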

If you have a spare NIC, and the nodes are next to each other, you could make them use a cross-over cable for their cluster communication, so they would notice that they are both still up even when the switch dies. That's what I do.

