[Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover

Thu Apr 19 07:04:29 UTC 2007

On Tuesday April 17, wcheng at redhat.com wrote:
> 
> In short, my vote is taking this (NLM) patch set and let people try it 
> out while we switch our gear to look into other NFS V3 failover issues 
> (nfsd in particular). Neil ?

I agree with Christoph that we should do it properly.
That doesn't mean that we need a complete solution.  But we do want to
make sure to avoid any design decisions that we might not want to be
stuck with.  Sometimes that's unavoidable, but let's try a little
harder for the moment.

One thing that has been bothering me is that sometimes the
"filesystem" (in the guise of an fsid) is used to talk to the kernel
about failover issues (when flushing locks or restarting the grace
period) and sometimes the local network address is used (when talking
with statd). 

I would rather use a single identifier.  In my previous email I was
leaning towards using the filesystem as the single identifier.  Today
I'm leaning the other way - to using the local network address.

It works like this:

  We have a module parameter for lockd, something like
  "virtual_server".
  If that is set to 0, none of the following changes are effective.
  If it is set to 1:

   The destination address for any lockd request becomes part of the
   key to find the nsm_handle.
   The my_name field in SM_MON requests and SM_UNMON requests is set
   to a textual representation of that destination address.
   The reply to SM_MON (currently completely ignored by all versions
   of Linux) has an extra value which indicates how many more seconds
   of grace period remain.  This could perhaps be stuffed into
   res_stat.
   The places where we currently check 'nlmsvc_grace_period' get
   moved to *after* the nlmsvc_retrieve_args call, and the
   grace_period value is extracted from host->nsm.

  This is the full extent of the kernel changes.
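
As a rough illustration of the kernel-side changes listed above, here
is a user-space sketch.  Everything in it is hypothetical (struct
nsm_sketch, nsm_lookup, nsm_in_grace and the fixed-size table are
invented for the example and are not the real lockd structures); it
only shows the destination address becoming part of the lookup key,
my_name being its textual form, and the grace check being made per
handle.  IPv4 only, for brevity.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of the proposal; none of these names
 * are the real lockd structures or functions. */
#define NSM_MAX 16

struct nsm_sketch {
    bool used;
    struct in_addr client;          /* peer being monitored */
    struct in_addr local;           /* destination address of the request */
    char my_name[INET_ADDRSTRLEN];  /* would be sent as my_name in SM_MON */
    long grace_left;                /* seconds of grace left, as reported by
                                     * statd; a real implementation would
                                     * more likely store an expiry time */
};

static struct nsm_sketch nsm_table[NSM_MAX];
static int virtual_server = 1;      /* stands in for the module parameter */

/* Find or create the handle for (client, local).  With virtual_server
 * set to 0 the local address is ignored, modelling current behaviour. */
static struct nsm_sketch *nsm_lookup(struct in_addr client,
                                     struct in_addr local)
{
    struct nsm_sketch *free_slot = NULL;

    if (!virtual_server)
        local.s_addr = htonl(INADDR_ANY);

    for (size_t i = 0; i < NSM_MAX; i++) {
        struct nsm_sketch *h = &nsm_table[i];

        if (!h->used) {
            if (!free_slot)
                free_slot = h;
            continue;
        }
        if (h->client.s_addr == client.s_addr &&
            h->local.s_addr == local.s_addr)
            return h;
    }
    if (!free_slot)
        return NULL;                /* table full */

    memset(free_slot, 0, sizeof(*free_slot));
    free_slot->used = true;
    free_slot->client = client;
    free_slot->local = local;
    /* Textual form of the destination address, as proposed for my_name. */
    inet_ntop(AF_INET, &local, free_slot->my_name,
              sizeof(free_slot->my_name));
    return free_slot;
}

/* Per-handle grace check, made once the request's arguments (and hence
 * its handle) have been resolved. */
static bool nsm_in_grace(const struct nsm_sketch *h)
{
    assert(h != NULL);
    return h->grace_left > 0;
}
```

With this shape, one client talking to two local addresses gets two
handles, so one address can be in its grace period after a failover
while the other is not.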

  To remove old locks, we arrange for the callbacks registered with
  statd for the relevant clients to be called.
  To set the grace period, we make sure statd knows about it and it
  will return the relevant information to lockd.
  To notify clients of the need to reclaim locks, we simply use the
  information stored by statd, which contains the local network
  address.

The only aspect of this that gives me any cause for concern is
overloading the return value for SM_MON.  Possibly it might be cleaner
to define an SM_MON2 with different args or whatever.
As this interface is entirely local to the one machine, and as it can
quite easily be kept back-compatible, I think the concept is fine.
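
To make the back-compatibility point concrete, here is one possible
way the remaining grace time could ride in res_stat.  The encoding is
purely an assumption for illustration (nothing above fixes it), and
the helper names are made up: bit 0 keeps the existing
STAT_SUCC/STAT_FAIL meaning and the upper bits carry the seconds
remaining.

```c
#include <assert.h>
#include <stdint.h>

/* Values of res_stat in the existing NSM protocol. */
enum { STAT_SUCC = 0, STAT_FAIL = 1 };

/* Hypothetical encoding (an assumption, not part of any protocol):
 * bit 0 is the existing success/failure flag, the upper bits carry the
 * seconds of grace period remaining. */
static uint32_t sm_mon_encode(int failed, uint32_t grace_secs)
{
    assert(grace_secs <= (UINT32_MAX >> 1));
    return (grace_secs << 1) | (failed ? STAT_FAIL : STAT_SUCC);
}

/* Decoders for the new lockd side. */
static int sm_mon_failed(uint32_t res_stat)
{
    return (int)(res_stat & 1u);
}

static uint32_t sm_mon_grace(uint32_t res_stat)
{
    return res_stat >> 1;
}
```

A reply from an unmodified statd (plain 0 or 1) decodes as "no grace
period remaining", and an old lockd ignores the reply anyway.  A strict
XDR decoder might reject an out-of-range enum value, though, which
would be one argument for a separate SM_MON2.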

Statd would need to pass the my_name field to the ha callout rather
than replacing it with "127.0.0.1", but other than that I don't think
any changes are needed to statd (though I haven't thought through that
fully yet).

Comments?

NeilBrown