[Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding
Maciej Bogucki
maciej.bogucki at artegence.com
Mon Jun 2 06:24:12 UTC 2008
doobs72 _ wrote:
>
> Hi,
>
> I'm having fencing problems in my 3-node cluster running on RHEL5.0,
> which involve bonding.
>
> I have 3 servers A, B & C in a cluster, with bonding configured on eth2
> & eth3 for my cluster traffic. The config is as below:
>
> DEVICE=eth2
> BOOTPROTO=none
> ONBOOT=yes
> TYPE=Ethernet
> MASTER=bond1
> SLAVE=yes
> USRCTL=no
>
> DEVICE=eth3
> BOOTPROTO=none
> ONBOOT=yes
> TYPE=Ethernet
> MASTER=bond1
> SLAVE=yes
> USRCTL=no
>
> DEVICE=bond1
> IPADDR=192.168.x.x
> NETMASK=255.255.255.0
> NETWORK=192.168.x.0
> BROADCAST=192.168.x.255
> ONBOOT=YES
> BOOTPROTO=none
>
> The /etc/modprobe.conf file is configured as below:
>
> alias eth0 bnx2
> alias eth1 bnx2
> alias eth2 e1000
> alias eth3 e1000
> alias eth4 e1000
> alias eth5 e1000
> alias scsi_hostadapter cciss
> alias bond0 bonding
> options bond0 miimon=100 mode=active-backup max_bonds=3
> alias bond1 bonding
> options bond1 miimon=100 mode=active-backup
> alias bond2 bonding
> options bond2 miimon=100 mode=active-backup
> alias scsi_hostadapter1 qla2xxx
> alias scsi_hostadapter2 usb-storage
>
> The cluster starts up OK; however, when I try to test the bonded
> interfaces my troubles begin.
>
> On node C, if I run "ifdown bond1", node C is fenced and everything
> works as expected.
>
> However, if on node C I take down the interfaces one at a time, i.e.
>
> "ifdown eth2" - the cluster stays up as expected, using eth3 for
> routing traffic
> "ifdown eth3"
>
> then node C is fenced by node A. However, in the /var/log/messages file
> on node C I see a message saying that node B will be fenced. The
> outcome is that both nodes C & B are fenced.
>
> My question is: why does node B get fenced as well?
>
Hello,
First of all, you have a problem with the bonding itself. Stop the
cluster and investigate why the cluster goes down when you run "ifdown
eth3"; I suspect the problem is with the e1000 driver.
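One way to see what the bond is actually doing during such a test is to read the kernel's bonding status file. A minimal sketch, assuming the standard /proc/net/bonding interface and the bond1 name from the config above:

```shell
#!/bin/sh
# Print the currently active slave and each slave's MII status from a
# bonding status file (path can be overridden, e.g. for another bond).
bond_status() {
    f=${1:-/proc/net/bonding/bond1}
    grep 'Currently Active Slave' "$f"
    grep -A 1 '^Slave Interface' "$f"
}
```

Running bond_status before and after each "ifdown" shows whether failover to the remaining slave actually happened at the driver level.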
As for the double fencing: I suppose that C is the master of the
cluster, and it fences B faster than A and B can elect a new master.
You can identify the master with:

i=`cman_tool services | grep -A 1 default | tail -1 | sed -e 's/\[\(.\).*/\1/'`; cman_tool nodes | awk '{print $1,$5}' | grep "^$i"
To resolve this issue you need more than one communication medium, e.g.
a second Ethernet network or a quorum disk, if you have one.
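For the quorum-disk option, the relevant pieces look roughly like this. This is a sketch only; the device name, label, and heuristic ping target are placeholders, not from this thread:

```xml
<!-- Sketch of a <quorumd> stanza for /etc/cluster/cluster.conf (RHEL5).
     Initialise the shared device first with:
         mkqdisk -c /dev/sdX -l myqdisk
     /dev/sdX, the label, and the ping target below are placeholders. -->
<quorumd interval="1" tko="10" votes="1" label="myqdisk">
    <heuristic program="ping -c1 -w1 192.168.x.1" score="1" interval="2"/>
</quorumd>
```

With a heuristic like this, a node that loses the cluster network but can still reach the gateway keeps its qdisk vote, which helps avoid the mutual-fencing race described above.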
Best Regards
Maciej Bogucki