[Linux-cluster] fencing loop in a 2-node partitioned cluster
Gianluca Cecchi
gianluca.cecchi at gmail.com
Tue Feb 24 08:26:03 UTC 2009
Actually, my situation is quite different, and worse.
I have a two-node cluster with qdisk and HP iLO based fencing; the
components are RHEL 5U3 based.
If I panic a node, the other one correctly fences it, with the default
action of rebooting it. The converse is also true.
But if, for example, I take down the intracluster network (it is
actually bonded, but I'm trying to reproduce as many scenarios as I
can), each node fences the other and both remain powered off... so no
loop at all.
Here is the relevant information from my cluster.conf:
<?xml version="1.0"?>
<cluster alias="oracs" config_version="46" name="oracs">
        <cman expected_votes="3" two_node="0"/>
        <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="20"/>
        <clusternodes>
                <clusternode name="node01" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="ilonode01"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="node02" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="ilonode02"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <quorumd device="/dev/mapper/mpath3" interval="3" label="acsquorum" log_facility="local4" log_level="7" tko="5" votes="1">
                <heuristic interval="2" program="ping -c1 -w1 10.4.5.250" score="1" tko="3"/>
        </quorumd>
        <fencedevices>
                <fencedevice agent="fence_ilo" hostname="10.4.192.208" login="fenceuser" name="ilonode01" passwd="xxxxx"/>
                <fencedevice agent="fence_ilo" hostname="10.4.192.209" login="fenceuser" name="ilonode02" passwd="xxxxx"/>
        </fencedevices>
The qdisk heuristic IP is on the production LAN (10.4.5.x), while the
intracluster network is on another LAN (192.168.16.x).
Is there any parameter I can configure to prevent this situation or is
it by design?
I would expect one node (for example the quorum master node) to survive
and successfully fence the other...
Especially since, in this scenario:
- both nodes see the SAN and the quorum disk
- both nodes see production LAN
- both nodes see the status of the other one via iLO commands
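
To make the vote arithmetic of this configuration concrete, here is a
minimal Python sketch. It assumes that quorum is a simple majority of
expected_votes, and that a node which still reaches the quorum disk and
passes the heuristic counts the qdisk vote. The function names are made
up for illustration and are not part of cman or qdiskd.

```python
# Values copied from the cluster.conf excerpt above.
EXPECTED_VOTES = 3   # <cman expected_votes="3">
NODE_VOTES = 1       # votes="1" on each <clusternode>
QDISK_VOTES = 1      # votes="1" on <quorumd>


def quorum_threshold(expected_votes: int) -> int:
    """Assumed rule: quorum is a simple majority of the expected votes."""
    return expected_votes // 2 + 1


def partition_votes(nodes_in_partition: int, counts_qdisk: bool) -> int:
    """Votes visible to one side of a split (hypothetical helper)."""
    qdisk = QDISK_VOTES if counts_qdisk else 0
    return nodes_in_partition * NODE_VOTES + qdisk


def is_quorate(nodes_in_partition: int, counts_qdisk: bool) -> bool:
    """True if this side of the split reaches the assumed quorum threshold."""
    votes = partition_votes(nodes_in_partition, counts_qdisk)
    return votes >= quorum_threshold(EXPECTED_VOTES)


if __name__ == "__main__":
    # One isolated node, with and without the quorum disk vote.
    for counts_qdisk in (True, False):
        print(
            f"single node, qdisk vote counted={counts_qdisk}: "
            f"quorate={is_quorate(1, counts_qdisk)}"
        )
```

Under those assumptions, a lone node that still counts the qdisk vote
holds 2 of 3 votes and stays quorate, which would be consistent with
both nodes feeling entitled to fence when only the intracluster link is
down.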
I also remember that the old Kimberlite could be configured with more
than one intracluster LAN...?
my components are:
cman-2.0.98-2chrissie (a patched cman after 5U3 because of this:
https://bugzilla.redhat.com/show_bug.cgi?id=485026)
rgmanager-2.0.46-1.el5
openais-0.80.3-22.el5
Any suggestions are welcome.
Thanks,
Gianluca