[Linux-cluster] fenced(8) not fencing through all fencing devices

David Teigland teigland at redhat.com
Fri Feb 2 15:15:13 UTC 2007


On Fri, Feb 02, 2007 at 11:31:21AM +0100, Miroslav Zubcic wrote:
> David Teigland wrote:
> 
> > I think you want something like this instead:
> > 
> > <fence>
> > 	<method name="1">
> > 		<device name="pwr01" option="off" port="1" ../>
> > 		<device name="pwr02" option="off" port="1" ../>
> > 		<device name="pwr01" option="on" port="1" ../>
> > 		<device name="pwr02" option="on" port="1" ../>
> > 	</method>
> > </fence>
> 
> Yes! It works. I just tried two devices in one method, and then
> triggered fencing with "ip link set dev bond0 down" on one node ...
> three times just in case. It works.
> 
> > There are two problems with your config:
> 
> > 1. You have both devices in separate methods.  A second method is only
> > tried if the first fails.  
> 
> I didn't know there was an option to define both devices in one method.
> The system-config-cluster tool, which I usually use to create the
> initial configuration/skeleton for a new cluster setup, doesn't offer
> such an option, and the cluster.conf(5) man page and the PDF
> documentation fail to describe all possible config parameters, so I
> assumed that declaring two devices in one method would be an error.
> Well, ok, now I see that it isn't.
> 
> > 2. You're using the default "reboot" option which isn't reliable with dual
> > power supplies.  The first port may come back on before the second is
> > turned off.  So, you need to turn both ports off (ensuring the power is
> > really off) before turning either back on.
> 
> I have configured the outlets on the APC switches like this:
> 
>      1- Outlet Name         : Outlet 1
>      2- Power On Delay(sec) : 4
>      3- Power Off Delay(sec): 1
>      4- Reboot Duration(sec): 6
>      5- Accept Changes      :
> 
> So I didn't use the off/on options, which are undocumented in
> cluster.conf(5); 6 seconds is enough for the two telnet actions from
> fence_apc(8), I think.
> 
> It would be really nice if the man(1) pages were up to date, eh?
> 
> > You may still have a minor problem, though, because in the two-node
> > cluster mode, a cluster partition will result in both nodes trying to
> > fence each other in parallel.
> 
> I have discussed this issue on this list earlier. In RHEL 3 there was a
> "tiebreaker IP address" option, which disappeared in the RHEL 4 cluster,
> so we have the well-known "split brain" cluster condition.
> 
> It would be really nice if fenced(8) checked the Ethernet link state
> before deciding to fence its partner in a two-node cluster. Somehow,
> two-node clusters are very popular in my country.

Have you looked into qdisk yet?  It's new and might help in this area.


> > With a single power supply this works fine
> > because one node will always be turned off before it can turn off the
> > other.  But, with dual power supplies you can get both nodes turning off
> > one power port on the other, although only one of the nodes should succeed
> > in turning off the second power port.  i.e. the winner of the fencing race
> > may end up with one of its power ports turned off.  Whether this is a big
> > problem, I don't know.
> 
> Yes, this is the fourth time (my fourth cluster installation), and I
> always have this problem, whether the machines have single or dual
> power supplies.
> 
> I have a workaround for this:
> 
> I create a bonding interface containing all the physical Ethernet ports.
> Then I configure a vlanX interface with bond0 as its base. On the Cisco
> Ethernet switch, I configure the main Ethernet segment as untagged and
> the vlanX segment as tagged. The main segment carries the data network,
> VIP addresses, etc., but the connections to the fence devices (APC, WTI,
> iLO, RSA II ...) go over the encapsulated VLAN interface. That way, as
> long as at least one physical Ethernet link is working, the node is not
> fenced. If the last Ethernet link in the bond fails, the node is fenced,
> but it has no chance to fence the other node, because its path to the
> fence devices runs over the same physical links as the main network.
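
The two config problems discussed earlier in this message (separate methods, and "reboot" with dual power supplies) can be illustrated with a toy simulation. This is only a sketch, not fenced(8) code: the names `fence_node`, `run_method` and `DualPsuNode` are hypothetical, the rule that a method succeeds only if all of its devices succeed is an assumption, and timing (such as the APC "Reboot Duration") is ignored, so a "reboot" here is an immediate off followed by on.

```python
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, str]            # (device name, option), e.g. ("pwr01", "off")
Method = Tuple[str, List[Action]]   # (method name, ordered device actions)
Agent = Callable[[str, str], bool]  # returns True if the device action succeeded


def run_method(actions: List[Action], agent: Agent) -> bool:
    """Run the device actions in order; stop at the first failure (assumed rule)."""
    for device, option in actions:
        if not agent(device, option):
            return False
    return True


def fence_node(methods: List[Method], agent: Agent) -> Optional[str]:
    """Try methods in order; a later method is only tried if the earlier one fails."""
    for name, actions in methods:
        if run_method(actions, agent):
            return name
    return None


class DualPsuNode:
    """Toy node with two power feeds; it loses power only when both are off."""

    def __init__(self) -> None:
        self.outlets: Dict[str, bool] = {"pwr01": True, "pwr02": True}
        self.was_fully_off = False

    def _set(self, device: str, powered: bool) -> None:
        self.outlets[device] = powered
        if not any(self.outlets.values()):
            self.was_fully_off = True

    def agent(self, device: str, option: str) -> bool:
        if option == "reboot":
            # Simplification: off then on again with no overlap with other devices.
            self._set(device, False)
            self._set(device, True)
        else:
            self._set(device, option == "on")
        return True


# Problem 1 and 2 combined: one device per method, default "reboot" option.
SEPARATE_METHODS: List[Method] = [
    ("1", [("pwr01", "reboot")]),
    ("2", [("pwr02", "reboot")]),
]

# Suggested layout: both devices in one method, off/off before on/on.
SINGLE_METHOD: List[Method] = [
    ("1", [("pwr01", "off"), ("pwr02", "off"), ("pwr01", "on"), ("pwr02", "on")]),
]

if __name__ == "__main__":
    for label, methods in (("separate", SEPARATE_METHODS), ("single", SINGLE_METHOD)):
        node = DualPsuNode()
        used = fence_node(methods, node.agent)
        print(label, "method used:", used, "node lost power:", node.was_fully_off)
```

In the first layout method "1" succeeds, so method "2" is never run and the simulated node never loses power. In the second layout both feeds are off at the same moment before either is turned back on.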