Re: [Linux-cluster] Starter Cluster / GFS

Digimer wrote:
On 10-11-10 10:29 PM, Jankowski, Chris wrote:

Digimer wrote:
Both partitions will try to fence the other, but the slower will lose and get fenced before it can fence.
Well, this is certainly not my experience in dealing with modern rack-mounted or blade servers where you use iLO (on HP) or DRAC (on Dell).

What actually happens in two-node clusters is that both servers issue the fence request to the iLO or DRAC. Both requests get processed and *both* servers get powered off. Ouch!! Your 100% HA cluster becomes a 100% dead cluster.

That is somewhat frightening. My experience is limited to stock IPMI and
Node Assassin. I've not seen a situation where both die. I'd strongly
suggest that a bug be filed.

It's actually fairly predictable and quite common. If the nodes lose connectivity to each other but both are actually alive (e.g. the cluster service switch fails), you will get this sort of shoot-out. The cause is that most out-of-band power-off mechanisms have an inherent lag: it can be several seconds between when you issue a power-off command and when the machine actually powers off. During that race window, each machine may issue a remote power-off against the other before it is itself shut down.
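
To make the race window concrete, here is a toy model of it in Python. It is only a sketch: the function name, the node names "A" and "B", and all the timings are made up for illustration, and it does not talk to any real fence agent or management controller. It assumes each node issues its power-off request the moment it notices the peer is gone, and that the target loses power a fixed lag after the request is issued.

```python
def fence_shootout(detect_a, detect_b, power_off_lag):
    """Return the set of nodes still powered after a two-node fence race.

    detect_a, detect_b: time (seconds) at which node A / node B notices the
        peer is unreachable and issues a power-off request against it.
    power_off_lag: seconds between a request being issued and the target
        actually losing power (hypothetical fixed value).
    """
    # Order the nodes by who fires first; sorted() is stable, so on an
    # exact tie node A is treated as the faster one.
    first, second = sorted([("A", detect_a), ("B", detect_b)],
                           key=lambda node: node[1])

    survivors = {"A", "B"}

    # The faster node always gets its request out, so the slower node
    # will lose power once the lag has elapsed.
    second_dies_at = first[1] + power_off_lag
    survivors.discard(second[0])

    # If the slower node fires before its own power is cut, its request
    # is already in flight and the faster node goes down as well.
    if second[1] < second_dies_at:
        survivors.discard(first[0])

    return survivors


if __name__ == "__main__":
    # Both nodes notice within half a second, power-off takes 4 seconds:
    # both requests go out inside the window and nobody survives.
    print(fence_shootout(0.0, 0.5, 4.0))
    # Same detection times with a 0.1 second lag: only the faster node survives.
    print(fence_shootout(0.0, 0.5, 0.1))
```

With these made-up numbers, the outcome depends only on whether the gap between the two detection times is shorter than the power-off lag. Anything that makes the gap reliably larger than the lag, such as having one node wait a fixed delay before fencing, would let exactly one node win in this model.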


