[Linux-cluster] fenced(8) not fencing through all fencing devices
David Teigland
teigland at redhat.com
Fri Feb 2 15:15:13 UTC 2007
On Fri, Feb 02, 2007 at 11:31:21AM +0100, Miroslav Zubcic wrote:
> David Teigland wrote:
>
> > I think you want something like this instead:
> >
> > <fence>
> > <method name="1">
> > <device name="pwr01" option="off" port="1" ../>
> > <device name="pwr02" option="off" port="1" ../>
> > <device name="pwr01" option="on" port="1" ../>
> > <device name="pwr02" option="on" port="1" ../>
> > </method>
> > </fence>
>
> Yes! It works. I put both devices in one method and then triggered
> fencing with "ip link set dev bond0 down" on one node, three times
> just in case. It worked every time.
>
> > There are two problems with your config:
>
> > 1. You have both devices in separate methods. A second method is only
> > tried if the first fails.
>
> I didn't know it was possible to put both devices in one method.
> The system-config-cluster tool, which I usually use to create the
> initial configuration skeleton for a new cluster, has no such option,
> and neither the cluster.conf(5) man page nor the PDF documentation
> describes all possible config parameters, so I assumed that declaring
> two devices in one method would be an error. Well, OK, now I see that
> it isn't.
>
> > 2. You're using the default "reboot" option which isn't reliable with dual
> > power supplies. The first port may come back on before the second is
> > turned off. So, you need to turn both ports off (ensuring the power is
> > really off) before turning either back on.
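To make the two points above concrete, here is a minimal Python sketch of the semantics as described in this thread: methods are tried in order, and every device in a method must succeed. It is a toy model, not fenced's actual code; `fence_node`, `power_ever_fully_off` and the tuple layout are hypothetical, and "reboot" is modeled as the worst case the quoted reply warns about (the outlet is back on before the next device is touched).

```python
from typing import Callable, List, Optional, Set, Tuple

# (device name, option, port), mirroring the <device .../> lines above.
Device = Tuple[str, str, str]
Method = List[Device]


def fence_node(methods: List[Method], run: Callable[[Device], bool]) -> Optional[int]:
    """Return the index of the first method whose devices all succeed, else None.

    Hypothetical model of the behaviour described in this thread, not fenced's code.
    """
    for index, method in enumerate(methods):
        # all() stops at the first device that fails, which aborts this method.
        if all(run(device) for device in method):
            return index
        # Only a failed method lets the next one run.
    return None


def power_ever_fully_off(actions: List[Device], feeds: Set[Tuple[str, str]]) -> bool:
    """Replay actions against the outlets feeding one node.

    Returns True if every feed was off at the same moment. 'reboot' is treated
    as off-then-on before the next action starts (the worst case from the thread).
    """
    powered = set(feeds)
    for name, option, port in actions:
        outlet = (name, port)
        if outlet not in feeds:
            continue  # outlets feeding other machines are irrelevant here
        if option in ("off", "reboot"):
            powered.discard(outlet)
            if not powered:
                return True
        if option in ("on", "reboot"):
            powered.add(outlet)
    return False
```

With the original two-method layout, `fence_node` returns 0 as soon as the pwr01 reboot succeeds, so pwr02 is never touched; it only falls through to method 1 when the first device reports failure. Replaying the off, off, on, on sequence through `power_ever_fully_off` returns True, while two back-to-back reboots return False under the worst-case assumption.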
>
> I have configured the outlets on the APC switches like this:
>
> 1- Outlet Name : Outlet 1
> 2- Power On Delay(sec) : 4
> 3- Power Off Delay(sec): 1
> 4- Reboot Duration(sec): 6
> 5- Accept Changes :
>
> So I didn't use the off/on options, which cluster.conf(5) doesn't
> document; I think 6 seconds is enough for fence_apc(8) to finish both
> telnet actions.
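Whether the 6-second margin holds can be checked with a little interval arithmetic. The Python sketch below assumes an outlet goes dark Power Off Delay seconds after a reboot command and stays dark for the Reboot Duration, and it ignores Power On Delay; how the APC firmware actually combines these settings is an assumption to check against its manual. `session_seconds`, the gap between the two fence_apc reboot commands, is a hypothetical figure you would have to measure.

```python
from typing import Tuple

Interval = Tuple[float, float]


def off_window(command_at: float, off_delay: float, reboot_duration: float) -> Interval:
    """Seconds during which one outlet is dark after a 'reboot' command (assumed model)."""
    start = command_at + off_delay
    return (start, start + reboot_duration)


def dark_overlap(session_seconds: float, off_delay: float = 1.0,
                 reboot_duration: float = 6.0) -> float:
    """How long both feeds are off when the second reboot follows the first by session_seconds.

    Defaults are the outlet settings quoted above; session_seconds is a
    hypothetical figure to be measured for fence_apc on a real network.
    """
    first = off_window(0.0, off_delay, reboot_duration)
    second = off_window(session_seconds, off_delay, reboot_duration)
    # The windows overlap only if the second outlet goes dark before the first comes back.
    return max(0.0, min(first[1], second[1]) - max(first[0], second[0]))
```

Under this model the overlap shrinks to zero once the second session takes as long as the Reboot Duration, so the margin only holds if both telnet sessions (login included) reliably finish well inside it; a very short overlap may also be too brief to make the node's power supplies actually drop out.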
>
> It would be really nice if the man(1) pages were up to date, eh?
>
> > You may still have a minor problem, though, because in the two-node
> > cluster mode, a cluster partition will result in both nodes trying to
> > fence each other in parallel.
>
> I have discussed this issue on this list before. RHEL 3 had a
> "tiebreaker IP address" option, which disappeared in the RHEL 4
> cluster, so we now have the well-known "split brain" condition.
>
> It would be really nice if fenced(8) checked the Ethernet link state
> before deciding to fence its partner in a two-node cluster. For some
> reason, two-node clusters are very popular in my country.
Have you looked into qdisk yet? It's new and might help in this area.
> > With a single power supply this works fine
> > because one node will always be turned off before it can turn off the
> > other. But, with dual power supplies you can get both nodes turning off
> > one power port on the other, although only one of the nodes should succeed
> > in turning off the second power port. i.e. the winner of the fencing race
> > may end up with one of its power ports turned off. Whether this is a big
> > problem, I don't know.
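The race in the quoted paragraph can be replayed step by step. This hypothetical Python sketch models only the "off" half of each node's plan, assumes a node keeps running (and fencing) while at least one of its supplies is powered, and represents parallel execution as an explicit interleaving list; `fence_race` is an illustrative name, not part of any cluster tool.

```python
from typing import Dict, List, Optional, Tuple


def fence_race(supplies: int, schedule: List[str]) -> Tuple[Optional[str], int]:
    """Replay two nodes, 'a' and 'b', fencing each other in parallel.

    schedule is a hypothetical interleaving: each entry names the node whose
    next 'off' command reaches a power switch. Returns (survivor, number of
    the survivor's own supplies left off), or (None, 0) if the schedule ends
    before either node loses all power.
    """
    powered: Dict[str, int] = {"a": supplies, "b": supplies}
    peer = {"a": "b", "b": "a"}
    for node in schedule:
        victim = peer[node]
        powered[victim] -= 1  # turn off the victim's next supply
        if powered[victim] == 0:
            # The race ends the moment one node loses its last supply,
            # so only live nodes ever issue commands in this loop.
            return node, supplies - powered[node]
    return None, 0
```

With a single supply the first command ends the race and the winner is untouched. With two supplies, any interleaving in which both nodes land their first command leaves the winner with one supply off, matching the quoted description; the winner's later "on" commands restore only the loser's ports.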
>
> Yes, this is my fourth cluster installation, and I have had this
> problem every time, whether the machines have single or dual power
> supplies.
>
> I have a workaround for this:
>
> I create a bonding interface containing all the physical Ethernet
> ports. Then I configure a vlanX interface on top of bond0. On the Cisco
> Ethernet switch, I configure the main Ethernet segment as untagged and
> the vlanX segment as tagged. The data network, VIP addresses, etc. run
> on the main segment, but the connections to the fence devices (APC,
> WTI, iLO, RSA II ...) go through the encapsulated VLAN interface. That
> way, as long as the last physical Ethernet port in the bond is working,
> the node is not fenced. If the last Ethernet port in the bond fails,
> the node is fenced, but it has no chance to fence the other node,
> because its layer 2 and network path to the fence devices runs over
> the same physical devices as the main link.
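The reasoning behind this workaround can be stated as a small Python model. It is a hypothetical sketch: it assumes the cluster heartbeat also runs over bond0, that the Ethernet switch itself stays healthy, and that a node with at least one working slave can reach both its peer and the fence devices; `nodes_that_fence` is an illustrative name.

```python
from typing import Dict, List


def nodes_that_fence(live_slaves: Dict[str, int], fence_vlan_on_bond: bool) -> List[str]:
    """Which nodes get to fence a peer after a link failure (hypothetical model).

    live_slaves maps node name -> number of working physical ports in bond0.
    The heartbeat is assumed to ride bond0; fence_vlan_on_bond says whether the
    path to the power switches does too (the workaround) or uses a separate network.
    """
    partitioned = any(up == 0 for up in live_slaves.values())
    if not partitioned:
        return []  # every node keeps a live slave: heartbeat survives, nobody fences
    if fence_vlan_on_bond:
        # A node with no live slave cannot reach the switches either.
        return sorted(node for node, up in live_slaves.items() if up > 0)
    # A separate fence network lets every node try, so they race.
    return sorted(live_slaves)
```

When node1 loses its last slave, the workaround leaves node2 as the only node able to fence, whereas a separate fence network lets both nodes try and reintroduces the race.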