[Linux-cluster] Heuristics for quorum disk used as a tiebreaker in a two node cluster.

Jankowski, Chris Chris.Jankowski at hp.com
Thu Dec 9 03:57:33 UTC 2010


Lon,

Thank you for your suggestions. 

1.
I very much like your idea of having an additional fencing agent (called as the first one in the chain) whose delay depends on the presence of the service on the node.  I understand the code.  What I do not know is what the steps are for adding my own fencing agent.  The existing agents all live in /usr/sbin.

Is it as simple as placing the new fencing agent in /usr/sbin?  Is some kind of registration required, e.g. so that ccs_config_validate will recognise it?
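
For context, I imagine referencing the new agent from cluster.conf in the usual way, e.g. (the agent and device names below are only placeholders for whatever I end up calling my script):

<fencedevice name="service-delay" agent="fence_service_delay"/>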

2.
I would guess that the extra fencing agent can also solve the problem of both nodes being fenced when the inter-node link goes down.  This is distinct from the scenario where communication through the quorum disk ceases.  That would be a bonus.

3.
I am using the quorum disk as a natural way to ensure that a two-node cluster retains quorum with just one node remaining.  I am aware of the <cman two_node="1"/> option.

What are the advantages and disadvantages of using a quorum disk for two nodes, compared with no quorum disk and the two_node="1" attribute set?
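
For reference, my understanding of the two alternatives is roughly the following (the vote arithmetic is my assumption of the usual setup; the qdisk label and timings are only placeholders, and heuristics are omitted):

Option A - no quorum disk, CMAN two-node special case:

<cman two_node="1" expected_votes="1"/>

Option B - quorum disk as tiebreaker; one vote per node plus one vote for the quorum disk, so a single surviving node plus the disk holds 2 of the 3 expected votes:

<cman expected_votes="3"/>
<quorumd label="myqdisk" votes="1" interval="1" tko="10"/>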

Thanks and regards,

Chris Jankowski

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger
Sent: Thursday, 9 December 2010 07:33
To: linux clustering
Subject: Re: [Linux-cluster] Heuristics for quorum disk used as a tiebreaker in a two node cluster.

On Fri, 2010-12-03 at 10:10 +0000, Jankowski, Chris wrote:

> This is exactly what I would like to achieve.  I know which node
> should stay alive - the one running my service, and it is trivial for
> me to find this out directly, as I can query for its status locally on
> a node. I do not have to use the network. This can be used as a heuristic
> for the quorum disc.
>  
> What I am missing is how to make that into a workable whole.
> Specifically the following aspects are of concern:
>  
> 1.
> I do not want the other node to be ejected from the cluster just
> because it does not run the service.  But the test is binary, so it
> looks like it will be ejected.

When a two node cluster partitions, someone has to die.

> 2.
> Startup time, before the service started.  As no node has the service,
> both will be candidates for ejection.

One node will die and the other will start the service.

> 3.
> Service migration time.
> During service migration from one node to another, there is a
> transient period of time when the service is not active on either
> node.

If you partition during a 'relocation' operation, rgmanager will
evaluate the service and start it after fencing completes.


> 1.
> How do I put all of this together to achieve the overall objective of
> the node with the service surviving the partitioning event
> uninterrupted?

As it turns out, using qdiskd to do this is not the easiest thing in the
world.  This has to do with a variety of factors, but the biggest is
that qdiskd has to make choices -before- CMAN/corosync do, so it's hard
to ensure correct behavior in this particular case.

The simplest thing I know of to do this is to selectively delay fencing.
It's a bit of a hack (though less so than using qdiskd, as it turns
out).

NOTE: This agent _MUST_ be used in conjunction with a real fencing
agent.  Put the reference to this agent before the real fencing agent,
within the same fence method (a cluster.conf sketch of that ordering
follows below, after the script).

It might look like this:

#!/bin/bash
# (bash rather than plain sh, because PIPESTATUS is used below.)
# Delay fencing on the node that does NOT currently own the service, so
# that the service owner (if it is alive) wins the fencing race after a
# partition.

me=$(hostname)
service=empty1

# 'exit ${PIPESTATUS[0]}' preserves clustat's exit status across the
# pipeline, so $state tells us whether the service could be queried at all.
owner=$(clustat -lfs "$service" | grep '^  Owner' | cut -f2 -d: ; exit ${PIPESTATUS[0]})
state=$?
# Strip the whitespace that cut leaves around the owner name.
owner=$(echo $owner)

echo "Eval $service state $state owner $owner"

if [ $state -eq 0 ] && [ "$owner" != "$me" ]; then
        echo "Not the owner - delaying 30 seconds"
        sleep 30
fi

exit 0

What it does is give preference to the node running the service by
making the non-owner delay a bit before trying to perform the real
fencing operation.  If the owner is alive, it will fence first.  If the
service was not running before the partition, neither node gets preference.
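
Concretely, "before the real fencing agent within the same method" would look something like this in cluster.conf (the device and agent names here are just placeholders; the real device could be fence_ipmilan, an iLO agent, etc.):

<clusternode name="node1" nodeid="1" votes="1">
        <fence>
                <method name="1">
                        <device name="delay-check"/>
                        <device name="node1-ilo"/>
                </method>
        </fence>
</clusternode>

<fencedevices>
        <fencedevice name="delay-check" agent="fence_delay_check"/>
        <fencedevice name="node1-ilo" agent="fence_ipmilan" ipaddr="10.0.0.1" login="admin" passwd="secret"/>
</fencedevices>

The delay-check device is the script above, installed as an executable (e.g. /usr/sbin/fence_delay_check).  Because it always exits 0, fencing simply falls through to the real device once the delay, if any, has elapsed.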

If the primary reason for using qdiskd was to solve this problem, then
you can avoid using qdiskd altogether.

 
> 2.
> What is the relationship between  fencing and node suicide due to
> communication through quorum disk?

None.  Both occur.

> 3.
> How does the master election relate to this?

It doesn't, really.  To get a node to drop master, you have to turn
'reboot' off.  After 'reboot' is off, a node will abdicate 'master' mode
if its score drops.
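
In cluster.conf terms that is roughly the following (the label is a placeholder, and the other quorumd attributes and heuristics are omitted):

<quorumd label="myqdisk" votes="1" reboot="0"/>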

-- Lon


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



