[Linux-cluster] Re: Csnap instantiation and failover using libdlm

Benjamin Marzinski bmarzins at redhat.com
Thu Oct 21 21:56:29 UTC 2004


> > Is there some method for the lock to be revoked,
> 
> Killing the agent that has it should do the job, which would be part of 
> STONITH.  There also has to be a way of giving up the lock gracefully 
> when a node exits the cluster voluntarily.  I neglected to mention 
> "graceful node exit and cleanup" as another bit of infrastructure glue 
> still needed.
> 
Um... I just realized that there's a problem here.
If the agent dies but the server doesn't, the lock will get revoked.
While this won't interfere with the clients currently connected to the
server, any new client (or a client that gets disconnected) will think that
there is no server and promote its own server to master... and data
corruption will follow.

As far as I can tell, the way to ensure that this doesn't happen is to have
the server process take out the lock. That way the lock won't be freed unless
the server process dies. Agreed?
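
A minimal sketch of that idea, under loud assumptions: it uses flock(2) on a
local file as a stand-in for the cluster lock, not libdlm, and the function
name and lock path are hypothetical. It only illustrates the property being
asked for: the lock belongs to whichever process opened it, so it goes away
when that process (here, the server) goes away, and not when some other
process such as the agent dies.

```python
import fcntl
import os


def try_take_master_lock(lock_path):
    """Try to take the 'master' lock without blocking.

    Returns the open file descriptor on success, or None if another
    holder already has the lock. flock() here is only a local stand-in
    for a cluster-wide lock; it is NOT the libdlm API.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Exclusive and non-blocking: fail at once if a master already exists.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # The caller (the server process) must keep this descriptor open for
    # as long as it wants to remain master.
    return fd
```

In this sketch the server calls try_take_master_lock() itself and keeps the
descriptor open for its whole lifetime. The kernel closes the descriptor, and
so drops the lock, when the server exits for any reason, while the death of a
separate agent process leaves the lock in place.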

If that's the case, should the server also be responsible for contacting the
agents in the appropriate service group and getting the client information?

-Ben



