Re: [Linux-cluster] RH Cluster doesn't pass basic acceptance tests - bug in fenced?

On Wed, 2007-01-10 at 15:16 -0500, Josef Whiter wrote:

> > Thanks for any advice ...
> > 
> This isn't a bug, it's working as expected.  What you need is qdisk; set it up
> with the proper heuristics and it will force the shutdown of the bad node before
> the bad node has a chance to fence off the working node.

What he said.  With qdisk, you can have the node declare itself unfit
for cluster operation when bond0 or bond1 loses link; something like:

<quorumd min_score="2" votes="2" status_file="/tmp/qdisk_status_info">
   <heuristic program="ping -c1 -t1 <bond0 router>" score="1"/>
   <heuristic program="ping -c1 -t1 <bond1 router>" score="1"/>
</quorumd>

You could use more complex link monitoring (like the stuff
in /usr/share/cluster/ip.sh) if you wanted, but this gives you the basic idea.
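As one illustration of "more complex link monitoring", here is a minimal sketch of a heuristic program that checks local link state instead of pinging a router. It assumes the Linux kernel exposes carrier state at /sys/class/net/<iface>/carrier, and it relies on qdiskd counting a heuristic as passed when its program exits 0. The script name and helper functions are hypothetical; this is not what ip.sh does.

```python
import sys
from pathlib import Path


def link_is_up(iface: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Hypothetical helper: True if the interface reports carrier (link up).

    Assumes Linux sysfs; /sys/class/net/<iface>/carrier contains "1" when
    the link is up and "0" when it is down.
    """
    carrier = Path(sysfs_root) / iface / "carrier"
    try:
        return carrier.read_text().strip() == "1"
    except OSError:
        # Missing interface, or carrier unreadable (e.g. interface
        # administratively down): treat it as link down.
        return False


def main(argv: list[str]) -> int:
    # qdiskd treats exit status 0 as a passing heuristic, anything else as failure.
    if len(argv) != 2:
        print(f"usage: {argv[0]} <interface>", file=sys.stderr)
        return 2
    return 0 if link_is_up(argv[1]) else 1


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

It would be wired in with something like <heuristic program="/usr/local/sbin/check_link.py bond0" score="1"/> (path hypothetical). Note the trade-off: a carrier check only sees the local link, whereas the ping heuristic also catches a dead switch or router upstream.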

The idea here is that if bond0 *or* bond1 loses link, qdiskd declares
the node unfit (min_score = 2, and each route is 1 point, so loss of
either => fatal).  A feature was added after the initial release of
qdiskd to reboot the node on loss of required score (previously, it
would cause the node to become inquorate and block activity).
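The scoring rule above can be modelled in a few lines. This is a simplified sketch, not qdiskd's code: the Heuristic class and node_is_fit function are hypothetical, and it ignores qdiskd's timing (how often heuristics run and how many failures it tolerates). It only shows why min_score="2" with two 1-point heuristics makes the loss of either link fatal.

```python
from dataclasses import dataclass


@dataclass
class Heuristic:
    # Hypothetical model of one <heuristic> entry and its latest result.
    name: str
    score: int
    passed: bool


def node_is_fit(heuristics: list[Heuristic], min_score: int) -> bool:
    # Sum the scores of passing heuristics; the node is fit only if the
    # total reaches min_score.
    total = sum(h.score for h in heuristics if h.passed)
    return total >= min_score


MIN_SCORE = 2  # matches min_score="2" in the quorumd example

scenarios = {
    "both links up": (True, True),
    "bond0 down": (False, True),
    "bond1 down": (True, False),
}
for label, (bond0_ok, bond1_ok) in scenarios.items():
    hs = [Heuristic("ping bond0 router", 1, bond0_ok),
          Heuristic("ping bond1 router", 1, bond1_ok)]
    print(f"{label}: {'fit' if node_is_fit(hs, MIN_SCORE) else 'unfit'}")
```

Only the first scenario reaches a total of 2; either single failure leaves a total of 1, below min_score.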

-- Lon

