
Re: [Linux-cluster] qdisk WITHOUT fencing

On 06/18/2010 11:28 AM, Jankowski, Chris wrote:

Can you please sort out the (lack of) word-wraps in your email client?

Do you have a better idea? How do you propose to ensure that there
is no resource clash when a node becomes intermittent or half-dead?
How do you prevent its interference from bringing down the service?
What do you propose? More importantly, how would you propose to handle
this when ensuring consistency is of paramount importance, e.g. when
using a cluster file system?

I believe that SCSI reservations are the key to protection.  One can
form a group of hosts that are allowed to access storage and exclude
those that have had their membership revoked. Note that this is a
protective mechanism - the stance here is: "This is ours and we protect
it."  A node that has been ejected cannot do damage anymore.  This is
a philosophically opposite approach to fencing, which is: "I'll go out
and shoot everybody whom I consider suspect, and I am not going to come
back until I've successfully shot everybody whom I consider suspect."
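The SCSI-3 persistent reservation mechanism described above can be exercised by hand with sg_persist from sg3_utils - this is roughly what RHCS's fence_scsi agent automates. A rough sketch, assuming a hypothetical shared LUN at /dev/sdb and placeholder reservation keys (these commands need a real SCSI device that supports persistent reservations, so adjust before running):

```shell
# Sketch only: device path and keys (0x1, 0x2) are illustrative placeholders.
DEV=/dev/sdb

# Each cluster member registers its own reservation key with the device.
sg_persist --out --register --param-sark=0x1 "$DEV"

# One member takes a "Write Exclusive - Registrants Only" reservation
# (PR type 5): only nodes with a registered key may write.
sg_persist --out --reserve --param-rk=0x1 --prout-type=5 "$DEV"

# To eject a misbehaving node, any surviving member preempts that node's
# key (here 0x2); the storage itself then rejects the ejected node's I/O.
sg_persist --out --preempt-abort --param-rk=0x1 --param-sark=0x2 --prout-type=5 "$DEV"

# Inspect who is registered and who holds the reservation.
sg_persist --in --read-keys "$DEV"
sg_persist --in --read-reservation "$DEV"
```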

It isn't opposite philosophically at all. Instead of fencing by powering off the offending machine, you are fencing by cutting the machine off from the SAN. Logically, the two are identical, but you then also potentially need to apply other fencing for, say, network resources. I've written a fencing agent before for a managed switch that fences a machine by disabling its switch port. That works as well as power fencing, but it isn't at all fundamentally different.
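For what it's worth, the switch-port style of fencing mentioned here can be done over SNMP, in the same spirit as the stock fence_ifmib agent. A hedged sketch, assuming a hypothetical switch hostname, community string "private", and a placeholder ifIndex for the victim's port (all of these are illustrative, not taken from any real setup):

```shell
# Sketch only: SWITCH, community string and PORT_IFINDEX are placeholders.
SWITCH=switch.example.com
PORT_IFINDEX=10

# Administratively disable the port the failed node hangs off
# (IF-MIB::ifAdminStatus: 1 = up, 2 = down).
snmpset -v2c -c private "$SWITCH" IF-MIB::ifAdminStatus."$PORT_IFINDEX" i 2

# Confirm the port is actually down before declaring the node fenced.
snmpget -v2c -c private "$SWITCH" IF-MIB::ifOperStatus."$PORT_IFINDEX"
```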

A properly implemented quorum disk is the key to managing cluster
membership. Based on access to the quorum disk one can then establish
who is a member. The ejected nodes are configured to commit suicide,
reboot and try to rejoin the cluster.
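In RHCS terms, this is what qdiskd does; a hedged excerpt of what the relevant cluster.conf stanza might look like (intervals, tko counts, the label and the heuristic target address are all placeholder values, not a recommendation):

```xml
<!-- Illustrative qdiskd excerpt for /etc/cluster/cluster.conf.
     A node that stops updating its quorum-disk block, or fails the
     heuristic, for tko * interval seconds loses its vote and is evicted. -->
<quorumd interval="1" tko="10" votes="1" label="example_qdisk">
    <heuristic program="ping -c1 192.168.1.254" score="1" interval="2" tko="3"/>
</quorumd>
```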

If a node crashes, it cannot be expected to remain functional enough to commit suicide.

Then, based on membership, one can set up SCSI reservations on shared
storage.  This will protect the integrity of the filesystems, including
a shared cluster filesystem.

See above - the distinction between powering a node off and cutting off all its network access is pretty immaterial. It doesn't get you away from the fundamental problem that you need a reliable way of preventing the failing node from rejoining the cluster.

Note that there is a natural affinity between the quorum disk on shared
storage and a shared cluster file system on the same storage. Whoever
has access to the quorum disk has access to shared storage and can
stay a member. Whoever does not should be ejected. Whether such a
node is dead, half-dead or actively looking for mischief is irrelevant,
because it no longer has access to storage once SCSI reservations have
been set to exclude it. It won't get anywhere without access to storage.

Sure - but I don't think anyone ever argued that power-based fencing is mandatory. Brocade switch-based fencing from the SAN was supported last time I checked the list of supported fencing devices for RHCS.

This is how DEC/Compaq/HP TruCluster V5.x works. It supports a shared
cluster filesystem.  In fact, this is the only filesystem it supports,
apart from UFS for CD-ROMs. And it supports a shared root.

Shared Root is supported on Linux, in a lot of ways. Open Shared Root is one example, and I've even written a set of extensions to make that work on GlusterFS. I think it's in the OSR contrib repository.

There is only one password file, one group file, and one set of binaries
and libraries, all shared in CFS. And it has rolling upgrades. It
works reliably, and there is not a trace of fencing in it.  So, it can
be done.  It is living proof, and it works.

I think we are not agreeing entirely on what "fencing" actually is. And you are still talking about solving a problem that isn't hard to solve with RHCS - single SAN, at one location. If the machines are in one place, fencing isn't a problem. What's difficult is fencing in a geographically dispersed setup. I thought this was the main point of this thread.

