[Linux-cluster] Some GDLM questions

Sun Jul 4 14:41:58 UTC 2004

Saturday, July 3, 2004, 12:00:47 PM, David Teigland wrote:

>> GDLM is not listed as a client of FENCE. This seems to imply
>> that a GDLM application has to interact directly with FENCE to 
>> deal with the unknown state problem in a 2 node cluster where each 
>> member has 1 vote and expected votes is 1 (section 3.2.6.2, page 28)
>> as otherwise the same lockspace could end up existing on multiple
>> machines in a single cluster. How would an application interact
>> with FENCE to prevent this or does this have to be handled by
>> configuring the cluster to reboot in this case?

> This is the quickest one to answer right off the bat.  We'll get to the others
> over the next few days I expect.

> Fencing is a service that runs on its own in a CMAN cluster; it's entirely
> independent from other services.  GFS simply checks to verify fencing is
> running before allowing a mount since it's especially dangerous for a mount to
> succeed without it.

> As soon as a node joins a fencing domain it will be fenced by another domain
> member if it fails.  i.e. as soon as a node runs:

>     cman_tool join    (joins the cluster)
>     fence_tool join   (starts fenced which joins the default fence domain)

> it will be fenced by other domain members if it fails.  So, you simply need to
> configure your nodes to run fence_tool join after joining the cluster if you
> want fencing to happen.  You can add any checks later on that you think are
> necessary to be sure that the node is in the fence domain.  (Looking at
> /proc/cluster/services is one way.)

> Running fence_tool leave will remove a node cleanly from the fence domain (it
> won't be fenced by other members.)

> One note of warning.  If the fence daemon (fenced) process is killed on node X,
> it appears to fenced processes on other nodes that X has left the domain
> cleanly (just as if it had run fence_tool leave).  X only leaves the domain
> "uncleanly" when the node itself fails (meaning the cluster manager decides X
> has failed.)  There is some further development planned to address this.

I understand the above, but it's still not clear to me how a
locking application would get fenced. On startup the application
could check that the cluster member has joined the fence domain.
This will ensure that it gets fenced if something goes wrong.
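
A minimal sketch of such a startup check, reading /proc/cluster/services
as suggested above. The line layout it parses (a 'Fence Domain:' row with
a quoted name followed by GID, LID and State columns) and the "run" state
are assumptions about that file's format, and the function names are
hypothetical; adjust both to what your nodes actually show.

```python
import re

# Assumed row layout (not confirmed by this thread):
#   Fence Domain:    "default"      1   2 run       -
_FENCE_ROW = re.compile(r'^Fence Domain:\s+"([^"]+)"\s+\d+\s+\d+\s+(\S+)')


def in_fence_domain(services_text, domain="default"):
    """Return True if the text lists the named fence domain in state 'run'."""
    for line in services_text.splitlines():
        m = _FENCE_ROW.match(line.strip())
        if m and m.group(1) == domain:
            return m.group(2) == "run"
    return False


def node_has_joined(path="/proc/cluster/services", domain="default"):
    """Read the services file; treat a missing or unreadable file as 'not joined'."""
    try:
        with open(path) as f:
            return in_fence_domain(f.read(), domain)
    except OSError:
        return False
```

The locking application would call node_has_joined() before creating or
joining a lockspace and refuse to start if it returns False. This only
confirms domain membership at startup; it does not address the
killed-fenced case described in the warning above.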

What's not clear is how the fence process will shut down (or
suspend) the locking application while fencing the node. Fencing
seems to be related to blocking access to I/O devices.