[Linux-cluster] RH Cluster doesn't pass basic acceptance tests - bug in fenced?

Lon Hohberger lhh at redhat.com
Thu Jan 11 14:30:08 UTC 2007


On Wed, 2007-01-10 at 15:16 -0500, Josef Whiter wrote:

> > Thanks for any advice ...
> > 
> 
> This isn't a bug; it's working as expected.  What you need is qdisk: set it up
> with the proper heuristics and it will force the shutdown of the bad node before
> the bad node has a chance to fence off the working node.

What he said.  With qdisk, you can have the node declare itself unfit
for cluster operation when bond0 or bond1 loses link; something like:

<quorumd min_score="2" votes="2" status_file="/tmp/qdisk_status_info">
   <heuristic program="ping -c1 -t1 <bond0 router>" score="1" interval="2"/>
   <heuristic program="ping -c1 -t1 <bond1 router>" score="1" interval="2"/>
</quorumd>

You could use more complex link monitoring (like the stuff
in /usr/share/cluster/ip.sh) if you wanted, but this gives you the basic
idea.
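
As a rough illustration of what a link-based heuristic might look like
(this is not what ip.sh does, just a sketch), the script below reads the
kernel's carrier flag for an interface from sysfs and returns 0 when the
link is up. It assumes that a heuristic counts as passing when its
program exits 0, which is how the ping examples above behave; the names
link_is_up and main, and the default of bond0, are made up for the
example.

```python
import sys
from pathlib import Path


def link_is_up(iface, sysfs_root="/sys/class/net"):
    """Return True if the kernel reports carrier on the given interface."""
    carrier = Path(sysfs_root) / iface / "carrier"
    try:
        return carrier.read_text().strip() == "1"
    except OSError:
        # The interface does not exist, or it is administratively down
        # (the read fails in that case); treat both as "no link".
        return False


def main(argv, sysfs_root="/sys/class/net"):
    """Return an exit status: 0 if the link is up, 1 otherwise."""
    iface = argv[1] if len(argv) > 1 else "bond0"  # default is an assumption
    return 0 if link_is_up(iface, sysfs_root) else 1


# To use this as a heuristic program, end the script with:
#     sys.exit(main(sys.argv))
```

A script like this could then stand in for the ping command in the
program attribute, one heuristic per bond.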

The idea here is that if bond0 *or* bond1 loses link, qdiskd declares
the node unfit (min_score = 2, and each route is 1 point, so loss of
either => fatal).  A feature was added after the initial release of
qdiskd to reboot the node on loss of required score (previously, it
would cause the node to become inquorate and block activity).
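
To make the arithmetic concrete, here is a small sketch of the scoring
rule as described above. It is not qdiskd's actual code: the Heuristic
class and the function names are invented for illustration. The only
things taken from the config are the two 1-point heuristics and
min_score = 2.

```python
from dataclasses import dataclass


@dataclass
class Heuristic:
    name: str     # label for illustration only
    score: int    # points contributed when the check passes
    passed: bool  # result of the most recent run (program exited 0)


def total_score(heuristics):
    """Sum the scores of the heuristics that are currently passing."""
    return sum(h.score for h in heuristics if h.passed)


def node_is_fit(heuristics, min_score):
    """A node is fit only while its total score reaches min_score."""
    return total_score(heuristics) >= min_score


# Both routers reachable: 1 + 1 = 2, which meets min_score = 2.
both_up = [Heuristic("bond0", 1, True), Heuristic("bond1", 1, True)]

# bond1 has lost link: 1 + 0 = 1, below min_score, so the node is unfit.
bond1_down = [Heuristic("bond0", 1, True), Heuristic("bond1", 1, False)]
```

With min_score = 2, node_is_fit(both_up, 2) holds and
node_is_fit(bond1_down, 2) does not, which is the "loss of either =>
fatal" behaviour.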

-- Lon