[NFS] [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover

Neil Brown neilb at suse.de
Fri Apr 27 06:00:13 UTC 2007


On Thursday April 26, wcheng at redhat.com wrote:
> Neil Brown wrote:
> 
> >On Thursday April 26, wcheng at redhat.com wrote:
> >
> >>A convincing argument... unfortunately, this happens to be a case where 
> >>we need to protect the server from clients' misbehavior. For a local 
> >>filesystem (ext3), if any file reference count is non-zero (i.e. some 
> >>clients are still holding locks), the filesystem can't be 
> >>unmounted. We would have to fail the failover to avoid data corruption.
> >
> >I think this is a tangential problem.
> >"removing locks held by troublesome clients so that I can unmount my
> >filesystem" is quite different from "removing locks held by clients
> >using virtual-NAS-foo so they can be migrated".
> >
> The reason to unmount is because we want to migrate the virtual IP.

The reason to unmount is because we want to migrate the filesystem. In
your application that happens at the same time as migrating the
virtual IP, but they are still distinct operations.
 
>                                                                     IMO 
> they are the same issue but it is silly to keep fighting about this. In 
> any case, one interface is better than two, if you allow me to insist on 
> this.

How many interfaces we need depends somewhat on how many jobs there
are to do.
You want to destroy state that will be rebuilt on a different server,
and you want to force-unmount a filesystem.  Two different jobs.  Two
interfaces seem OK.
If they could both be done with one simple interface, that would be
ideal, but I'm not sure they can.

And no-one gets to insist on anything.
You are writing the code.  I am accepting/rejecting it.  We both need
to agree or we won't move forward.  (Well... I could just write code
myself, but I don't plan to do that).

> 
> So how about we do an RPC call to lockd to tell it to drop the locks owned 
> by the client/local-IP pair as you proposed, *but* add an "OR" with fsid 
> to make the process foolproof? Say something like this:
> 
> RPC_to_lockd_with(client_host, client_ip, client_fsid);
> if ((host == client_host && vip == client_ip) ||
>     (get_fsid(file) == client_fsid))
>         drop_the_locks();
> 
> This logic (RPC to lockd) will be triggered by a new command added to 
> the nfs-utils package.
> 
> If we can agree on this, the rest would be easy. Done ?

Sorry, but we cannot agree with this, and I think the rest is still
easy.

The more I think about it, the less I like the idea of using an fsid.
The fsid concept was created simply because we needed something that
would fit inside a filehandle.  I think that is the only place it
should be used.
Outside of filehandles, we have a perfectly good and well-understood
mechanism for identifying files and filesystems.  It is a "path name".
The functionality "drop all locks held by lockd on a particular
filesystem" is potentially useful outside of any fail-over
configuration, and should work on any filesystem, not just one that
was exported with 'fsid='.

So if you need that, then I think it really must be implemented by
something a lot like
   echo -n /path/name > /proc/fs/nfs/nlm_unlock_filesystem

This is something that we could possibly teach "fuser -k" about - so
it can effectively 'kill' the part of lockd that is accessing a given
filesystem.  It is useful for failover, but definitely useful beyond
failover.
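
For what it's worth, the kernel side of such a control file could be
quite small.  Here is a rough sketch, not a patch: it uses the old
create_proc_entry()/write_proc procfs interface, assumes /proc/fs/nfs
already exists, and nlmsvc_unlock_all_by_path() is a purely
hypothetical helper that would walk lockd's lock lists and drop
everything held on the filesystem containing the given path.

   #include <linux/init.h>
   #include <linux/proc_fs.h>
   #include <linux/stat.h>
   #include <asm/uaccess.h>

   /* hypothetical: release every NLM lock held on the filesystem
    * that contains 'pathname' */
   extern int nlmsvc_unlock_all_by_path(const char *pathname);

   static int nlm_unlock_fs_write(struct file *file, const char __user *buf,
                                  unsigned long count, void *data)
   {
           char path[256];
           int err;

           if (count == 0 || count >= sizeof(path))
                   return -EINVAL;
           if (copy_from_user(path, buf, count))
                   return -EFAULT;
           path[count] = '\0';
           if (path[count - 1] == '\n')    /* tolerate a plain "echo" */
                   path[count - 1] = '\0';

           err = nlmsvc_unlock_all_by_path(path);
           return err < 0 ? err : count;
   }

   /* would be called from lockd initialisation */
   static int __init nlm_unlock_fs_init(void)
   {
           struct proc_dir_entry *e;

           e = create_proc_entry("fs/nfs/nlm_unlock_filesystem",
                                 S_IWUSR, NULL);
           if (!e)
                   return -ENOMEM;
           e->write_proc = nlm_unlock_fs_write;
           return 0;
   }

All the interesting work is in the (hypothetical) helper; the
interface itself stays a one-line write from userspace.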


Everything else can be done in the RPC interface between lockd and
statd, leveraging the "my_name" field to identify state based on which
local network address was used.  All of this other functionality is
completely agnostic about the particular filesystem; it just looks at
the virtual IP that was used, and it is all you need unless you have a
misbehaving client.
You would do all the lockd/statd/RPC stuff, then try to unmount the
filesystem.  If that fails, run "fuser -k -m /whatever" and try the
unmount again.
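
To make the ordering concrete, the failover step would boil down to
something like this (a rough userspace sketch; drop_locks_for_vip()
is a hypothetical stand-in for the lockd/statd RPC that does the
per-IP state drop, and the arguments are placeholders):

   #include <errno.h>
   #include <stdio.h>
   #include <stdlib.h>
   #include <sys/mount.h>

   /* hypothetical: tell lockd/statd to drop all state that was
    * established through the given virtual IP */
   extern int drop_locks_for_vip(const char *vip);

   static int failover_unmount(const char *vip, const char *mountpoint)
   {
           char cmd[256];

           /* 1. Drop lockd/statd state tied to the virtual IP. */
           if (drop_locks_for_vip(vip) != 0)
                   return -1;

           /* 2. A plain unmount should now work for well-behaved clients. */
           if (umount2(mountpoint, 0) == 0)
                   return 0;
           if (errno != EBUSY)
                   return -1;

           /* 3. Something (perhaps a misbehaving client keeping lockd
            *    busy) still holds the filesystem: kill the remaining
            *    users and retry. */
           snprintf(cmd, sizeof(cmd), "fuser -k -m %s", mountpoint);
           if (system(cmd) == -1)
                   return -1;

           return umount2(mountpoint, 0);
   }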

Another alternative interface might be to hook into umount(MNT_FORCE),
but that would require even broader review, and probably isn't worth
it...

NeilBrown



