[Linux-cluster] Fencing issue using IPMI (nodes fencing each other ending in a loop)

Jakub Suchy jakub.suchy at enlogit.cz
Sat Sep 27 15:25:01 UTC 2008


You may be having the same issue as I have:
[Linux-cluster] Node won't rejoin after reboot

Do you have a Red Hat support case open for it? If so, can you send me
the issue number privately (!)? I would reference it in our support case,
because the issues may be related.

Also:
- Is IGMP snooping enabled on the Cisco switches? Try disabling it, if
  possible.
- RHEL 5.2 uses IGMPv3 by default. Try moving to IGMPv2.
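
If moving the whole host to IGMPv2 is acceptable, the kernel can be told
to stop using IGMPv3 via sysctl. A minimal sketch (set globally here; it
can also be set per interface):

```
# /etc/sysctl.conf fragment: force IGMPv2 instead of IGMPv3
# force_igmp_version is a standard Linux sysctl; 2 = IGMPv2
net.ipv4.conf.all.force_igmp_version = 2
net.ipv4.conf.default.force_igmp_version = 2
```

Apply with "sysctl -p" (or reboot) on every cluster node.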

Jakub

Stevan Colaco wrote:
> Hello All,
> 
> To test, I moved network connectivity from the Cisco switches to a
> 5-port D-Link switch.
> The cluster is now working properly with fencing.
> 
> Best Regards,
> -Stevan Colaco
> 
> 
> 2008/9/24 Grisha G. <grigorygor at gmail.com>:
> > In a 2-node cluster you should use a quorum disk to solve the split-brain
> > problem.
> > After you create a quorum disk, change this line in your cluster.conf
> > from <cman expected_votes="1" two_node="1"/>
> > to   <cman expected_votes="3" two_node="0"/>
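
For reference, a minimal quorumd section along these lines would sit next
to the <cman> element in cluster.conf. This is a sketch only: the device
path, label, and heuristic ping target below are placeholders, not values
from this cluster.

```
<!-- First create the quorum disk on shared storage, e.g.
       mkqdisk -c /dev/sdX -l tibcouat_qdisk
     then reference its label here. -->
<quorumd interval="1" tko="10" votes="1" label="tibcouat_qdisk">
        <heuristic program="ping -c1 -w1 172.16.71.1" score="1" interval="2" tko="3"/>
</quorumd>
```

With votes="1" from the quorum disk plus one vote per node, the
expected_votes="3" value above adds up.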
> >
> > Grisha
> >
> >
> > On Tue, Sep 23, 2008 at 7:27 PM, Stevan Colaco <stevan.colaco at gmail.com>
> > wrote:
> >>
> >> Hello
> >>
> >> Issue: fencing using fence_ipmilan. Each node keeps fencing the other
> >> node, ending in a fence loop.
> >>
> >> We have implemented Red Hat Cluster on RHEL 5.2 64-bit.
> >> Server hardware: Sun X4150
> >> Storage: Sun 6140
> >> Fencing mechanism: fence_ipmilan
> >>
> >> We have downloaded the fence_ipmilan agent and configured a two-node
> >> cluster with IPMI fencing. But...
> >>
> >> When we ifdown the NIC interface, the node gets fenced, but the service
> >> does not relocate to the other node. At the same time, when the
> >> initially fenced node rejoins the cluster, it fences the other node,
> >> and this keeps repeating in a loop.
> >>
> >> We downloaded and followed the instructions from the IPMI document
> >> mentioned below:
> >> http://docs.sun.com/source/819-6588-13/ipmi_com.html#0_74891
> >>
> >> We tested with the following command-line method, which works fine:
> >> # fence_ipmilan -a <ip addr> -l root -p <passkey> -o <on|off|reboot>
> >>
> >> here is my cluster.conf
> >>
> >> <?xml version="1.0"?>
> >> <cluster alias="tibcouat" config_version="12" name="tibcouat">
> >>        <fence_daemon clean_start="0" post_fail_delay="0"
> >> post_join_delay="3"/>
> >>        <clusternodes>
> >>                <clusternode name="tibco-node1-uat.kmefic.com.kw"
> >> nodeid="1" votes="1">
> >>                        <fence>
> >>                                <method name="1">
> >>                                        <device name="tibco-node1"/>
> >>                                </method>
> >>                        </fence>
> >>                </clusternode>
> >>                <clusternode name="tibco-node2-uat.kmefic.com.kw"
> >> nodeid="2" votes="1">
> >>                        <fence>
> >>                                <method name="1">
> >>                                        <device name="tibco-node2"/>
> >>                                </method>
> >>                        </fence>
> >>                </clusternode>
> >>        </clusternodes>
> >>        <cman expected_votes="1" two_node="1"/>
> >>        <fencedevices>
> >>                <fencedevice agent="fence_ipmilan" ipaddr="172.16.71.41"
> >> login="root" name="tibco-node1" passwd="changeme"/>
> >>                <fencedevice agent="fence_ipmilan" ipaddr="172.16.71.42"
> >> login="root" name="tibco-node2" passwd="changeme"/>
> >>        </fencedevices>
> >>        <rm>
> >>                <failoverdomains>
> >>                        <failoverdomain name="prefer_node1" nofailback="0"
> >> ordered="1"
> >> restricted="1">
> >>                                <failoverdomainnode
> >> name="tibco-node1-uat.kmefic.com.kw" priority="1"/>
> >>                                <failoverdomainnode
> >> name="tibco-node2-uat.kmefic.com.kw" priority="2"/>
> >>                        </failoverdomain>
> >>                </failoverdomains>
> >>                <resources>
> >>                        <ip address="172.16.71.55" monitor_link="1"/>
> >>                        <clusterfs device="/dev/vg0/gfsdata"
> >> force_unmount="0" fsid="63282"
> >> fstype="gfs" mountpoint="/var/www/html" name="gfsdata"
> >> self_fence="0"/>
> >>                        <apache config_file="conf/httpd.conf"
> >> name="docroot"
> >> server_root="/etc/httpd" shutdown_wait="0"/>
> >>                </resources>
> >>                <service autostart="1" domain="prefer_node1" exclusive="0"
> >> name="webby" recovery="relocate">
> >>                        <ip ref="172.16.71.55"/>
> >>                </service>
> >>        </rm>
> >> </cluster>
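
One technique often used to break exactly this kind of two-node fence
race is to give one node a head start, so the two nodes can never succeed
in fencing each other at the same moment. A sketch, assuming your
fence_ipmilan build supports the delay parameter (older RHEL 5 builds may
not; check fence_ipmilan -h before relying on it):

```
<!-- A delay on node1's fence device: if both nodes race to fence each
     other, the call that would power off node1 sleeps 10 seconds, so
     node1 fences node2 first and survives. -->
<fencedevice agent="fence_ipmilan" ipaddr="172.16.71.41" login="root"
        name="tibco-node1" passwd="changeme" delay="10"/>
<fencedevice agent="fence_ipmilan" ipaddr="172.16.71.42" login="root"
        name="tibco-node2" passwd="changeme"/>
```

This does not fix the underlying loss of cluster communication, but it
stops the mutual-fencing loop while you chase the network/multicast issue.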
> >>
> >>
> >> Kindly investigate and provide us with a solution at the earliest.
> >>



