[Linux-cluster] RH Cluster doesn't pass basic acceptance tests - bug in fenced?
Lon Hohberger
lhh at redhat.com
Thu Jan 11 14:30:08 UTC 2007
On Wed, 2007-01-10 at 15:16 -0500, Josef Whiter wrote:
> > Thanks for any advice ...
> >
>
> This isn't a bug, it's working as expected. What you need is qdisk; set it up
> with the proper heuristics and it will force the shutdown of the bad node before
> the bad node has a chance to fence off the working node.
What he said. With qdisk, you can have the node declare itself unfit
for cluster operation when bond0 or bond1 loses link; something like:

<quorumd min_score="2" votes="2" status_file="/tmp/qdisk_status_info">
    <heuristic program="ping -c1 -t1 <bond0 router>" score="1" interval="2"/>
    <heuristic program="ping -c1 -t1 <bond1 router>" score="1" interval="2"/>
</quorumd>

You could use more complex link monitoring (like the stuff
in /usr/share/cluster/ip.sh) if you wanted, but this gives you the basic
idea.
The idea here is that if bond0 *or* bond1 loses link, qdiskd declares
the node unfit (min_score = 2, and each route is 1 point, so loss of
either => fatal). A feature was added after the initial release of
qdiskd to reboot the node on loss of required score (previously, it
would cause the node to become inquorate and block activity).
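
To make the arithmetic concrete, here is a rough Python sketch of the scoring rule as described above. It is only a model of the idea, not qdiskd's actual code: the function name node_is_fit and the (score, passed) pairs are made up for illustration.

```python
# Hypothetical model of the scoring rule described above (not qdiskd source).
# Each heuristic contributes its score only while its program succeeds;
# the node counts as fit while the total is at least min_score.

def node_is_fit(heuristic_results, min_score):
    """heuristic_results: list of (score, passed) pairs, one per heuristic."""
    total = sum(score for score, passed in heuristic_results if passed)
    return total >= min_score


# Two heuristics worth 1 point each, min_score = 2, as in the config above.
both_links_up = [(1, True), (1, True)]
bond1_down = [(1, True), (1, False)]

print(node_is_fit(both_links_up, 2))  # True: 2 points, meets min_score
print(node_is_fit(bond1_down, 2))     # False: 1 point, below min_score
```

With min_score equal to the sum of all scores, any single failure is enough to make the node unfit. Lowering min_score to 1 in this model would instead tolerate the loss of one link.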
-- Lon