[Linux-cluster] Re: Csnap instantiation and failover using libdlm

Benjamin Marzinski bmarzins at redhat.com
Fri Oct 22 17:31:21 UTC 2004


On Thu, Oct 21, 2004 at 08:27:46PM -0400, Daniel Phillips wrote:
> On Thursday 21 October 2004 17:56, Benjamin Marzinski wrote:
> > Um.. I just realized that there's a problem here.
> > If the agent dies but the server doesn't, the lock will get revoked.
> > While this won't interfere with the clients currently connected to
> > the server, any new client (or client that gets disconnected) will
> > think that there is no server, and promote its server to master....
> > and data corruption will follow.
> >
> > As far as I can tell, the way to ensure that this doesn't happen is
> > to have the server process take out the lock. That way the lock won't
> > be freed unless the server process dies. Agreed?
> 
> No, the way to ensure this is to have the server die if its control 
> socket goes away.
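
(For concreteness, I take "die if its control socket goes away" to mean
something like the poll() sketch below.  The descriptor name and the cleanup
path are placeholders, not the real csnap code, and note that it only runs
once the server is back in its event loop.)

#include <poll.h>
#include <unistd.h>

/* Sketch: watch the control connection to the agent and exit as soon as
 * the peer hangs up.  "agent_fd" is whatever descriptor the server holds
 * on its connection to the agent. */
static void watch_agent(int agent_fd)
{
	struct pollfd pfd = { .fd = agent_fd, .events = POLLIN };

	for (;;) {
		if (poll(&pfd, 1, -1) < 0)
			continue;	/* e.g. EINTR */
		if (pfd.revents & (POLLHUP | POLLERR)) {
			/* Agent died or closed the socket: stop serving
			 * immediately rather than risk a second master. */
			_exit(1);
		}
		/* ... handle normal control traffic on POLLIN ... */
	}
}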

O.k., let's say that you come up with a way to have a signal sent to the
server when the agent dies.  (I can't think of any easy way to do this, but
it would be the surest way of killing the server immediately after the agent
dies.)  Even in that case, there is still the possibility of corruption.  Say
you have nodes A and B, both running csnap servers, with A running the master
server.  To make the example more plausible, say A and B are accessing the
origin and snapstore over iSCSI.  A request comes into the server on A from
node B that causes it to write to the snapstore.  Due to some network issue,
this write takes a while.  At the same time the agent running on A dies and,
somehow, a signal automatically gets sent to the server on A.  However, since
the server is in the D state, the signal is not delivered yet.  The same
network issue causes node B to lose its connection to node A.  B sees that
there is no server, starts its own, and resends the request to it.  The two
servers finally write to the snapstore at the same time, and the disk is
corrupted.

O.k., it's a corner case, but there are probably more likely cases too.  There
is no way that I know of to notify the server that would guarantee no
corruption.  The risk isn't high, but it is there.
 
> However, you have pointed out why it's bad for the new server to rely 
> only on the lock to decide when it's safe to start processing requests, 
> or even to recover the journal: there may still be writes in flight 
> from the old server.  If a server dies but its node is still in the 
> cluster, the new server's agent has to regard that as a valid reason 
> for fencing the node.  This can only be handled properly at the 
> membership level, not at the lock level.

Yes, fencing would fix this, but weren't you pushing for the least drastic
solution to the problem?  Since the server processes write directly to the
disk, no writes can reach the disk from a server process that is actually
dead; that's the reason you can't kill processes in the 'D' state.  As long
as you have journaling to clean up the unfinished transactions, it should be
perfectly safe to fail over once the server is dead.

If the agent is dead, but the server isn't, then you have problems. Really,
since the new server can't start until all the clients break their connection
with the old server, the only issue you have to worry about is that the old
server might be stuck waiting for a write to complete, as in the example above.
Since that wouldn't cause the agent to die, fencing because the agent died is
almost always unnecessary.

Besides that, you add a bunch of complexity to the agent.  Say that SM tells
the agent that a node is no longer in the service group but is still in the
cluster.  The agent has to decide whether that node left the group cleanly or
not.  I suppose it could check whether it has any clients connected to the old
server, and if so, it knows the server did not leave cleanly.  But what if the
agent on the new server node doesn't have any clients connected to the old
server?  It doesn't know whether other nodes do.  It would have to communicate
with every agent in the service group to decide whether the old server's node
should be fenced.

If the server grabbed the lock instead, you would only fail over when the
server was dead. As I said before, once the server is dead, it is
completely safe to fail over.
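
To make that concrete, here is roughly what I have in mind, using the plain
libdlm convenience calls.  Treat it as a sketch: the resource name is made
up, error handling is minimal, and it assumes the default lockspace is
available to the server.

#include <stdio.h>
#include <string.h>
#include <libdlm.h>

#define MASTER_LOCK	"csnap-master"	/* hypothetical resource name */

static struct dlm_lksb lksb;

/* The csnap server itself takes an exclusive lock on a well-known
 * resource.  The DLM drops a process's locks when that process exits,
 * so the lock goes away exactly when the server dies -- not when the
 * agent does. */
static int become_master(void)
{
	memset(&lksb, 0, sizeof(lksb));

	/* Block until we hold EX; whoever holds this lock is the master.
	 * Pass LKF_NOQUEUE instead to fail immediately if another server
	 * already holds it. */
	if (dlm_lock_wait(LKM_EXMODE, &lksb, 0,
			  MASTER_LOCK, strlen(MASTER_LOCK),
			  0, NULL, NULL, NULL)) {
		perror("dlm_lock_wait");
		return -1;
	}
	if (lksb.sb_status) {
		fprintf(stderr, "lock not granted: %d\n", lksb.sb_status);
		return -1;
	}
	return 0;		/* we are now the master server */
}

The point is that the lock's lifetime is tied to the server process itself,
so the agent dying (or hanging) can't make the lock disappear out from under
a healthy master.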

> > If that's the case, should the server also be responsible for
> > contacting the agents in the appropriate service group and getting
> > the client information?
> 
> It's not the case, so we don't have to worry about it.
> 
> The only interesting argument I know of for moving infrastructure 
> details into the server is to get rid of one daemon,

And to eliminate a corner case that causes corruption, without having agents
fence nodes for what is usually no good reason.  And to keep from adding a
bunch of additional code that wouldn't be necessary if the infrastructure
were moved.

> but daemons are 
> cheap, particularly if they sleep nearly all the time like the agent 
> does.  It's better to keep the agent and daemon separate and 
> specialized for the time being.

I don't think we have to pull all the infrastructure into the server.  It
seems logical to me, but if you are against it, I don't really care.  I do
believe, though, that having the agent grab and hold the lock, instead of
having the server do it, is a bad idea.

-Ben

> Regards,
> 
> Daniel



