[Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding
Maciej Bogucki
maciej.bogucki at artegence.com
Mon Jun 2 06:24:12 UTC 2008
doobs72 _ wrote:
>
> Hi,
>
> I'm having fencing problems in my 3-node cluster running on RHEL5.0,
> which involve bonding.
>
> I have 3 servers A, B & C in a cluster, with bonding configured on eth2
> & eth3 for my cluster traffic. The config is as below:
>
> DEVICE=eth2
> BOOTPROTO=none
> ONBOOT=yes
> TYPE=Ethernet
> MASTER=bond1
> SLAVE=yes
> USRCTL=no
>
> DEVICE=eth3
> BOOTPROTO=none
> ONBOOT=yes
> TYPE=Ethernet
> MASTER=bond1
> SLAVE=yes
> USRCTL=no
>
> DEVICE=bond1
> IPADDR=192.168.x.x
> NETMASK=255.255.255.0
> NETWORK=192.168.x.0
> BROADCAST=192.168.x.255
> ONBOOT=YES
> BOOTPROTO=none
>
> The /etc/modprobe.conf file is configured as below:
>
> alias eth0 bnx2
> alias eth1 bnx2
> alias eth2 e1000
> alias eth3 e1000
> alias eth4 e1000
> alias eth5 e1000
> alias scsi_hostadapter cciss
> alias bond0 bonding
> options bond0 miimon=100 mode=active-backup max_bonds=3
> alias bond1 bonding
> options bond1 miimon=100 mode=active-backup
> alias bond2 bonding
> options bond2 miimon=100 mode=active-backup
> alias scsi_hostadapter1 qla2xxx
> alias scsi_hostadapter2 usb-storage
>
> The cluster starts up OK; however, when I try to test the bonded
> interfaces my troubles begin.
>
> On node C, if I run "ifdown bond1", node C is fenced and everything
> works as expected.
>
> However, if on node C I take down the interfaces one at a time, i.e.
>
> "ifdown eth2" - the cluster stays up as expected, using eth3 for
> routing traffic
> "ifdown eth3"
>
> then node C is fenced by node A. However, in the /var/log/messages file
> on node C I see a message saying that node B will be fenced. The
> outcome is that both nodes C & B are fenced.
>
> My question is: why does node B get fenced as well?
>
Hello,
First of all, you have a problem with the bonding itself. Stop the
cluster and investigate why the cluster goes down when you run "ifdown
eth3"; I suspect the problem is with the e1000 driver.
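One way to see what the bond is actually doing during such a test is to read the kernel's bonding status file. A minimal sketch, assuming the standard /proc/net/bonding interface and the bond1 name from the config above:

```shell
#!/bin/sh
# Print the currently active slave and each slave's MII status from a
# bonding status file (path can be overridden, e.g. for another bond).
bond_status() {
    f=${1:-/proc/net/bonding/bond1}
    grep 'Currently Active Slave' "$f"
    grep -A 1 '^Slave Interface' "$f"
}
```

Running bond_status before and after each "ifdown" shows whether failover to the remaining slave actually happened at the driver level.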
As for the double fencing: I suppose that C is the master of the
cluster, and it fences B faster than A and B can elect a new master.
You can identify the master with:

i=`cman_tool services | grep -A 1 default | tail -1 | sed -e 's/\[\(.\).*/\1/'`; cman_tool nodes | awk '{print $1,$5}' | grep "^$i"
To resolve this issue you need more than one communication medium, e.g.
a second Ethernet network or a quorum disk, if you have one.
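For the quorum-disk option, the relevant pieces look roughly like this. This is a sketch only; the device name, label, and heuristic ping target are placeholders, not from this thread:

```xml
<!-- Sketch of a <quorumd> stanza for /etc/cluster/cluster.conf (RHEL5).
     Initialise the shared device first with:
         mkqdisk -c /dev/sdX -l myqdisk
     /dev/sdX, the label, and the ping target below are placeholders. -->
<quorumd interval="1" tko="10" votes="1" label="myqdisk">
    <heuristic program="ping -c1 -w1 192.168.x.1" score="1" interval="2"/>
</quorumd>
```

With a heuristic like this, a node that loses the cluster network but can still reach the gateway keeps its qdisk vote, which helps avoid the mutual-fencing race described above.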
Best Regards
Maciej Bogucki