
Re: [Linux-cluster] Fencing Race Question

On Fri, 2007-10-26 at 08:21 -0700, Scott Becker wrote:
> In the FAQ, Fencing Questions
> 13. What's the right way to configure fencing when I have redundant
> power supplies?
> I'm going to set up the second example.
> My concern is about a race condition with the two devices:
> <device name="pwr01" option="off" switch="1" port="1"/>
> <device name="pwr02" option="off" switch="1" port="1"/>

> If I had only one power switch, the first node to login successfully
> turns the other off (APC with one IP address). My concern is with two
> APCs, that whoever loses the race to login to APC #1 may win the race
> to login to APC #2 and then each will turn off only one of the other's
> power supplies.

If a node fails to log in to power switch #1, fencing has failed, and it
doesn't bother trying the second device.  It then pauses for several
seconds before retrying.

It's not an "either-or" situation.  Both have to complete, in sequence
and successfully, or fencing has failed and the entire operation must be
retried.
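
To make the "all in sequence, or retry the whole thing" behaviour concrete,
here is a minimal Python sketch.  It is not the fence daemon's code; the
names run_method, fence_with_retry and make_flaky are made up for
illustration, and max_attempts exists only so the example terminates.

```python
import time
from typing import Callable, Optional, Sequence

# Hypothetical type: a fence action returns True on success, False on failure.
FenceAction = Callable[[], bool]


def run_method(devices: Sequence[FenceAction]) -> bool:
    """Run every device action in order and stop at the first failure."""
    for action in devices:
        if not action():
            return False  # later devices are never attempted
    return True


def fence_with_retry(devices: Sequence[FenceAction],
                     max_attempts: int = 3,
                     pause_seconds: float = 0.0) -> Optional[int]:
    """Retry the whole sequence from the first device after any failure.

    Returns the attempt number that succeeded, or None if all attempts failed.
    """
    for attempt in range(1, max_attempts + 1):
        if run_method(devices):
            return attempt
        time.sleep(pause_seconds)  # stand-in for the pause before retrying
    return None


def make_flaky(name: str, failures: int, log: list) -> FenceAction:
    """Build a fake device action that fails `failures` times, then succeeds."""
    remaining = [failures]

    def action() -> bool:
        log.append(name)
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return action


if __name__ == "__main__":
    log: list = []
    devices = [make_flaky("pwr01", 1, log), make_flaky("pwr02", 0, log)]
    print(fence_with_retry(devices), log)
```

In the demo, the first attempt fails at pwr01, so pwr02 is not touched until
the second attempt, when pwr01 succeeds first.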

> Is the fencing execution script designed to 1) perform all necessary
> fencing device logins successfully then and only then 2) issue the
> poweroff commands? Thereby averting a potential race because the loser
> of the race for APC #1 will give up and hand over device #2.

In the absolute worst case (which I think is very, very unlikely):

node1 powers off node2-PS1    node2 fails to log in to SW1
.                             node2 waits a bit
.                             node2 powers off node1-PS1
node1 powers off node2-PS1

Also, note that the "fence race" case typically applies in a partition
of the cluster network where fencing is still accessible.  That
arrangement is not recommended for two-node clusters; it's recommended
that the cluster communicates and fences over the same network.


Basically, the theory goes like this:

If you pull the ethernet cable on a node and the fence device
controlling the other node is on the same path, that node will have some
difficulty accessing the fence device, and will usually lose the "fence
race".

-- Lon
