[Linux-cluster] Interfacing csnap to cluster stack

Fri Oct 8 21:25:40 UTC 2004

On Friday 08 October 2004 15:16, Lon Hohberger wrote:
> On Thu, 2004-10-07 at 13:58 -0400, Daniel Phillips wrote:
> > Suppose that the winner of the race to get the exclusive lock is a
> > bad choice to run the server.  Perhaps it has a fast connection to
> > the net but is connected to the disk over the network instead of
> > directly like the other nodes.   How do you fix that, within this
> > model?
>
> Let me see if I am getting this use-case picture right...
> Are either of those close?

No, the arrangement I was describing is:

    SAN                       GigE
      | <---> iSCSI/GNBD <---> |
      |                        |
      | <---> Client <-------> |  Node 1
      |                        |
      | <---> Client <-------> |  Node 2
                               |
              Client <-------> |  Node 3
              Server <-------> |  Node 3

Node 3 won the race to get the EX lock because the lock is mediated over 
the GigE network.  But Node 3 is a bad choice because it is two hops 
away from the disk.  The DLM chose Node 3 because the DLM doesn't know 
anything about network topology, just who got there first to grab the 
lock.
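To make the failure mode concrete, here is a minimal Python sketch contrasting lock-race election with a topology-aware choice. All names in it are hypothetical and are not part of csnap or the DLM. It assumes each node's distance to the disk can be expressed as a hop count: 1 for direct SAN attachment, 2 for access through iSCSI/GNBD over GigE.

```python
# Hypothetical sketch: choose a csnap server by distance to the disk rather
# than by whichever node wins the race for the EX lock.
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    disk_hops: int  # assumed metric: 1 = direct SAN, 2 = via iSCSI/GNBD over GigE


def race_winner(arrival_order):
    """Lock-based election: the first node to grab the lock wins,
    regardless of where it sits relative to the disk."""
    return arrival_order[0]


def topology_aware_choice(candidates):
    """Prefer the node with the fewest hops to the disk. Ties are broken
    by name so every node computes the same answer."""
    return min(candidates, key=lambda n: (n.disk_hops, n.name))


# Node 3 reaches the lock first over GigE, but it is two hops from the disk.
nodes = [Node("node3", 2), Node("node1", 1), Node("node2", 1)]
print(race_winner(nodes).name)            # node3
print(topology_aware_choice(nodes).name)  # node1
```

The lock only records who arrived first. Any choice based on topology needs information the lock does not carry.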

> (1) Don't set up your csnap server in such a way that some of the nodes
> exhibit a bottleneck on disk I/O and some do not.

But what prevents it?  How do you "set up your csnap server"?  Why would 
you want to introduce new rules about cluster topology instead of 
fixing the code?

> (2) Have the administrator make an intelligent decision as to whether
> or not to relocate the csnap master server again as [s]he tries to
> fix the problem that caused the failover.  I.e., don't worry about it
> if the csnap master server is running slowly.

The administrator is normally asleep or busy with his girlfriend when 
anything goes wrong.

> Your clients still work, and the csnap server is available, albeit at
> a potentially degraded state.

Well...

> (3) Don't use the cluster-lock model.  It has its shortcomings.  Its
> strength is its simplicity, not its flexibility.

Yes, that's the one.  We need real resource management, even if it 
initially just consists of an administrator setting up config files.  
Something has to read those config files[1] and respond to server 
instance requests from csnap agents accordingly.

[1] At cluster bring-up time.  The resource manager has to be able to 
operate without reading files during failover.
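Below is a minimal Python sketch of that split. `ResourceManager`, `request_server`, and the `resource: node, node, ...` config syntax are illustrative assumptions, not an existing csnap interface. The config is parsed once at bring-up into per-resource preference lists. Later server instance requests, including those triggered by failover, are answered from memory alone.

```python
# Hypothetical sketch: read the config once at cluster bring-up, then answer
# csnap server-instance requests from in-memory state only (no file I/O).
import io


class ResourceManager:
    def __init__(self, config_file):
        # Bring-up: parse "resource: node, node, ..." lines (best node first).
        self.prefs = {}
        for line in config_file:
            line = line.split("#", 1)[0].strip()  # drop comments and blanks
            if not line:
                continue
            resource, nodes = line.split(":", 1)
            self.prefs[resource.strip()] = [
                n.strip() for n in nodes.split(",") if n.strip()
            ]
        self.live = set()
        self.servers = {}  # resource -> node currently running its server

    def node_up(self, node):
        self.live.add(node)

    def node_down(self, node):
        self.live.discard(node)
        # Forget any server instance that died with the node.
        for res, srv in list(self.servers.items()):
            if srv == node:
                del self.servers[res]

    def request_server(self, resource):
        """Answer a csnap agent asking where the server for `resource`
        should run, using the preference list loaded at bring-up."""
        if resource in self.servers:
            return self.servers[resource]
        for node in self.prefs.get(resource, []):
            if node in self.live:
                self.servers[resource] = node
                return node
        return None  # no eligible node is up


config = io.StringIO("""
# snapshot store: preferred server nodes, best first
snapstore0: node1, node2, node3
""")
rm = ResourceManager(config)
for n in ("node1", "node2", "node3"):
    rm.node_up(n)
print(rm.request_server("snapstore0"))  # node1
rm.node_down("node1")
print(rm.request_server("snapstore0"))  # node2
```

Here the administrator encodes topology once, for example by listing the directly attached nodes first. Failover then becomes a lookup instead of a race.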

Regards,

Daniel