
Re: [Linux-cluster] Re: Csnap server cluster membership interface



On Thu, Oct 21, 2004 at 01:40:30AM -0400, Daniel Phillips wrote:
> Good morning Ben,
> 
> Just a few final thoughts for ahem, tomorrow...
> 
> I can see that everybody really wants to go for the "full fat" solution 
> (as Patrick would say) right away, without trying anything simple 
> first, so I will go with the flow.  After all, I also like skydiving.

I'm not against passing through the simple case in the course of writing
the code, but I agree with Kevin that the design should be for the final
version.
 
> I feel that a service group is the correct place to hook into cman, 
> mainly because not all nodes in the cluster need have agents running on 
> them, for various reasons including a csnap virtual device being 
> exported to another node, therefore needing neither agent nor server.  
> There isn't a lot of recovery to do in our service group, but it's nice 
> to know the mechanism is there should we need it.  Finally, I feel that 
> the service group will be able to help with orderly shutdown as I 
> mentioned earlier.

I see nothing wrong with hooking into cman and using cluster sockets. Since
we are already bound to the DLM by your failover method, being dependent on
cman does no harm. In fact, not doing it seems like a problem.


> I'm getting attached to the idea of teaching cman to hand out port 
> numbers along with service groups.  This needs to be kicked around, but 
> it will eliminate considerable configuration annoyance, and perhaps 
> most of the "well known" cluster ports, which is cruft that cman really 
> shouldn't encode into its header files if we can avoid it.

I'm not sure that you will be able to get Dave to add this.

> Since we can get cluster ports automagically, we can afford to have a 
> separate service group for each metadevice, i.e., per snapshot store to 
> be used by the cluster.  The name of the service group will be formed 
> from a unique id stored in the snapshot metadata at mksnapstore time, 
> prepended with csnap/ or something like that.

The separate service groups would definitely work, but they only work easily
with the assigned-ports-per-service-group idea you mentioned.  I am leaning
toward the "one agent per node" idea, where the agent handles all the servers.

There are only 256 ports available per node with cluster sockets, and some
applications may be port greedy.  Requiring cman to find a port that is
available on all the cluster nodes for each origin+snapshot pair is something
that I wouldn't fault Dave for not wanting to include.  If you have one agent,
you need only one service group and one port. I don't know how big an issue
this is, not understanding all the details of cman. If Dave is fine with this
idea, then I don't have any objections. But without the automagically assigned
ports, I don't like it.

Here is a one agent per node idea. The agent has a defined port that it always
grabs, and all agents belong to the "csnap" service group. The agent starts up
before the servers and clients, and opens a unix socket for communication with
them.  When the servers are started, they open a connection to the agent, and
pass in some unique identifier, and wait to be instantiated as the master. When
a client connects, it requests a server, using the same identifier.
Finding/instantiating a server works the same as it currently does.
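To make the registration step concrete, here is a rough sketch of the
bookkeeping the agent would do (Python pseudocode; all the names are invented
for illustration, and the actual unix-socket plumbing is elided):

```python
class Agent:
    """One agent per node; servers and clients register over the agent's
    unix socket using the same unique identifier (e.g. the snapshot store
    id).  This is a sketch of the state, not csnap's actual code."""

    def __init__(self):
        self.standby_servers = {}   # unique id -> server connection, parked
        self.pending_clients = {}   # unique id -> clients awaiting a master

    def register_server(self, uniq_id, server_conn):
        # A server connects, passes its unique identifier, and waits to be
        # instantiated as the master.
        self.standby_servers[uniq_id] = server_conn

    def request_server(self, uniq_id, client_conn):
        # A client requests a server with the same identifier; it is queued
        # until a master is instantiated somewhere in the cluster.
        self.pending_clients.setdefault(uniq_id, []).append(client_conn)
```

The point of the sketch is just that one agent, one socket, and one map keyed
by the identifier are enough state to multiplex any number of servers.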

When the agent gets the lock to instantiate one of its servers as the new
master, it contacts all the other agents in the "csnap" group at their defined
port, and asks for all of their clients of that server, again using the unique
identifier. Once it builds up the list, it notifies the server that it is
now master, and passes it the list of clients. The server waits for connections
and then goes on its merry way.
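The failover step could be sketched like this (again Python pseudocode with
invented names; the cluster-socket round trip to each peer agent is elided,
and each agent is modeled as a plain map of the two tables described above):

```python
def instantiate_master(local_agent, peer_agents, uniq_id):
    """Run by the agent that won the lock for the server identified by
    uniq_id.  Each agent is a dict with 'standby_servers' and
    'pending_clients' maps keyed by the unique identifier."""
    # Gather that server's clients from every agent in the "csnap" group,
    # removing them from the pending lists as we go.
    clients = []
    for agent in [local_agent] + peer_agents:
        clients.extend(agent["pending_clients"].pop(uniq_id, []))
    # Hand the local standby server its promotion and the client list;
    # it then waits for those clients to connect.
    server = local_agent["standby_servers"][uniq_id]
    return server, clients
```

Nothing here needs per-metadevice ports: the peers are found through the one
"csnap" service group and addressed at the agents' single well-known port.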

Like I said before, I favor this method because it relies only on features
that we already have, and I don't know whether a large number of service
groups will be an issue. That being said, if Dave blesses this addition to
cman, I don't have any real issue with your idea.

-Ben

> Multiple service groups should not be scary from the recovery point of 
> view because csnap recovers quickly from membership changes as 
> described earlier specifically for servers, which is the only 
> interesting case.
> 
> Each agent will bind to the service group's cluster port, which enforces 
> a one agent per node per metadevice rule.  Though we don't have to do 
> it right away, a single agent could handle multiple metadevices (i.e., 
> snapshot stores) using the local socket name to know which clients to 
> connect to which servers.  The agent currently uses only one local 
> socket, so could support only a single metadevice, but we can use 
> multiple agents to support multiple metadevices, which doesn't violate 
> the rule above.
> 
> Hmm, alternatively we could use the name of the snapshot store device, 
> which is locally known by both the dm target and the server, to match 
> up servers to clients.  Then the agent would not have to bind to 
> multiple sockets and we would not have to create a way of feeding it 
> new socket names.  This might be better, though it would mean that we 
> have to make rules about aliasing, one obvious form of which is the 
> device mapper uuid if the snapshot store is a device mapper device.
> 
> The server will be modified to read the snapshot store superblock before 
> settling down to act as a standby.  It passes the metadevice unique id 
> to the agent so that the agent can join the service group and bind to 
> the correct cluster port.
> 
> An agent sends cluster messages by resolving node:service to 
> cluster_address:port.  I think the node id might even be the cluster 
> address, I'm not looking at the code right now.
> 
> The rest is as described in the original post, these are really just 
> more details to consider.
> 
> Regards,
> 
> Daniel

