[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

[Linux-cluster] Csnap server cluster membership interface

Hi Ben,

The next bit of cluster infrastructure glue that we need is the 
interface between csnap server startup and cluster membership.  The 
problem we need to solve is: a new server must be sure that every 
snapshot client (as opposed to origin client) of the former server has 
either reconnected or left the cluster.  This is because snapshot 
clients take read locks on chunks that they read from the origin.  If a 
new server ignores those locks it could allow an origin client to 
overwrite a chunk that a snapshot client is reading.

I was originally thinking about replicating the list of snapshot clients 
across all the agents using a protocol between the server, clients and 
agents, so that a new server always has the current list available.  
But this is stupid, because cman already keeps the list of cluster 
nodes on every node, and there is a csnap agent on every node that 
knows about all the snapshot clients connected to itself.  So a direct 
approach is possible, as follows:

  - Once an agent succeeds in taking the snapshot store lock in
    exclusive mode, it sends a "new server" message to the csnap
    agent on every node (alternatively, to every member of the
    "csnap" service group, see below).

  - Each agent responds by sending the (possibly empty) list of snapshot
    client ids back to the new server's agent.

  - The new server's agent must keep track of membership events so it
    can reduce the number of replies it is expecting when a node
    leaves (new nodes don't matter: they could not possibly have been
    connected to the old server).

  - When all nodes have replied, the new server's agent forwards the
    combined list of client+node ids to its local csnap server and 
    activates it.

  - The new csnap server must receive connections from each of the
    snapshot clients before it will service any origin writes (it
    might as well not service anything until it is ready; that keeps
    things simple).

  - If any snapshot client goes away (closes its control connection
    with the agent) the local agent will know, and must connect on
    behalf of the departed client, and immediately disconnect.  It is
    perfectly reasonable for a client to disappear in this way: it
    corresponds to a user unmounting a snapshot from that node.
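The reply-tracking step above can be sketched roughly as follows. The names and data layout here are mine, not csnap's or cman's; a real agent would drive this from actual cman messages and membership callbacks, but the bookkeeping is just "one reply expected per current member, minus leavers":

```c
/* Sketch of the new server's agent collecting replies.  All names
 * (reply_tracker, tracker_*) are hypothetical, invented for this
 * illustration. */
#include <assert.h>

#define MAX_NODES 64

struct reply_tracker {
        int expected[MAX_NODES];        /* node ids still owing a reply */
        int nexpected;
};

/* Start waiting for a client-list reply from each current member. */
void tracker_init(struct reply_tracker *t, const int *members, int n)
{
        t->nexpected = 0;
        for (int i = 0; i < n; i++)
                t->expected[t->nexpected++] = members[i];
}

/* A reply arrived from this node, or a membership event says it left
 * the cluster: either way it no longer blocks activation.  Nodes that
 * join after the broadcast are simply never in the expected set. */
void tracker_drop(struct reply_tracker *t, int node)
{
        for (int i = 0; i < t->nexpected; i++) {
                if (t->expected[i] == node) {
                        /* swap-remove: order of pending nodes is irrelevant */
                        t->expected[i] = t->expected[--t->nexpected];
                        return;
                }
        }
}

/* Every member replied or left: safe to hand the combined client
 * list to the local csnap server and activate it. */
int tracker_done(const struct reply_tracker *t)
{
        return t->nexpected == 0;
}
```

Note that a node leaving and a node replying are handled identically, which is what makes the "wait for everybody" loop terminate even across failures.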

Now, this relies on the certainty that there is a csnap agent on every 
cluster node.  If there is not, then some nodes will never answer and 
the algorithm will never terminate.  The question is, do we require the 
cluster device to be configured the same way on every node?  For 
example, you could export a snapshot device via gnbd to nodes that do 
not require csnap devices.  If we allow such things (I think we should) 
then we probably want csnap agents to form a service group.  Instead of 
messaging all the nodes in the cluster, we message the members of the 
service group.

Note: we can broadcast the csnap server address along with the "new 
server" message instead of fiddling with the lvb, so the snapshot store 
lock goes back to being just that instead of trying to be a messaging 
system as well.
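As a rough illustration, the broadcast might carry something like the struct below; the message type, field names, and choice of a sockaddr are invented for the sketch, and the real wire format would be whatever the agents agree on:

```c
/* Hypothetical layout for the "new server" broadcast.  Carrying the
 * address in the message is what lets the snapshot store lock's lvb
 * go back to being unused. */
#include <stdint.h>
#include <netinet/in.h>

#define CSNAP_MSG_NEW_SERVER 1          /* illustrative message type */

struct csnap_new_server_msg {
        uint32_t type;                  /* CSNAP_MSG_NEW_SERVER */
        uint32_t node;                  /* cluster node id of the new server */
        struct sockaddr_in addr;        /* where clients should reconnect */
};
```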

This algorithm ought to run in a few hundredths of a second even for 
large clusters, which will be the server failover time.  The new server 
can be initializing itself in parallel, i.e., reading the superblock 
and recovering the journal.  So this should be pretty fast.

Would you like to take a run at implementing this?  As far as I can 
see, the cman userspace interface documentation consists of the source 
code, plus some working code in clvmd and magma, so there is some 
digging to do.


