
Re: [NFS] [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover

Neil Brown wrote:

On Monday April 23, wcheng redhat com wrote:
Neil Brown wrote:


We started the discussion using the network interface (to drop the locks) but found it wouldn't work well on local filesystems such as ext3. There is really no control over which local (server-side) interface NFS clients will use (though it shouldn't be hard to implement one). When the fail-over server starts to remove the locks, it needs a way to find *all* of the locks associated with the will-be-moved partition. This is to allow umount to succeed. The server IP address alone can't guarantee that. That was the reason we switched to fsid. Also remember this is NFS v2/v3 - clients have no knowledge of server migration.

So it seems to me we do know exactly the list of local-addresses that
could possibly be associated with locks on a given filesystem.  They
are exactly the IP addresses that are publicly acknowledged to be
usable for that filesystem.
And if any client tries to access the filesystem using a different IP
address then they are doing the wrong thing and should be reformatted.

A convincing argument... unfortunately, this happens to be a case where we need to protect the server from client misbehavior. For a local filesystem (ext3), if any file reference count is not zero (i.e. some clients are still holding locks), the filesystem can't be unmounted. We would have to abort the failover to avoid data corruption.
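The difference between the two selection keys can be illustrated with a toy model. This is only a sketch: the `Lock` record, the addresses, the fsid values and the helper names are all hypothetical and do not correspond to lockd's real data structures. It assumes that one client reached the filesystem through an address that was not advertised for it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lock:
    client: str      # client host name (hypothetical)
    server_ip: str   # server-side address the request arrived on
    fsid: int        # identifier of the exported filesystem

def drop_by_server_ip(locks, ips):
    """Keep only the locks that did NOT arrive via one of `ips`."""
    return [l for l in locks if l.server_ip not in ips]

def drop_by_fsid(locks, fsid):
    """Keep only the locks that are NOT on filesystem `fsid`."""
    return [l for l in locks if l.fsid != fsid]

def can_umount(locks, fsid):
    """In this toy model, umount succeeds only if no lock remains on `fsid`."""
    return all(l.fsid != fsid for l in locks)

locks = [
    Lock("clientA", "10.0.0.1", 7),
    Lock("clientB", "10.0.0.2", 7),  # misbehaving: wrong address for fsid 7
    Lock("clientC", "10.0.0.2", 9),
]

left_after_ip = drop_by_server_ip(locks, {"10.0.0.1"})
left_after_fsid = drop_by_fsid(locks, 7)
```

In this model, dropping by the advertised address leaves clientB's lock on fsid 7 behind, so the umount would still fail, while dropping by fsid clears every lock on that filesystem regardless of the address used.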

Maybe the idea of using network addresses was the first suggestion,
and maybe it was rejected for the reasons you give, but it doesn't
currently seem like those reasons are valid.  Maybe those who proposed
those reasons (and maybe that was me) couldn't see the big picture at
the time...

This debate has been (so far) tolerable and helpful, so I'm not going to comment on this paragraph :) ... But I have to remind people that my first proposal was to add new flags to the export command (say "exportfs -ud" to unexport and drop locks, and "exportfs -g" to re-export and start the grace period). Then we moved to "echo network-addr into procfs", and later switched to the "fsid" approach. A very long journey ...

 The reply to SM_MON (currently completely ignored by all versions
 of Linux) has an extra value which indicates how many more seconds
 of grace period there is to go.  This can be stuffed into res_stat.
 Places where we currently check 'nlmsvc_grace_period', get moved to
 *after* the nlmsvc_retrieve_args call, and the grace_period value
 is extracted from host->nsm.

OK with me, though I don't see the advantage?

So we can have a different grace period for each different 'host'.
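A per-host grace period, as opposed to a single global flag, might look roughly like the sketch below. It is an illustration only: the `Host` class, the `grace_until` field and the status strings are assumptions made for this example and are not lockd's actual structures or return codes. The check runs after the host has been looked up, mirroring the idea of moving it to after the nlmsvc_retrieve_args call.

```python
class Host:
    """Hypothetical per-client record carrying its own grace deadline."""
    def __init__(self, name, grace_until):
        self.name = name
        self.grace_until = grace_until  # time (seconds) at which grace ends

def in_grace(host, now):
    """True while this particular host is still in its grace period."""
    return now < host.grace_until

def handle_lock(host, reclaim, now):
    """Decide a lock request using the host's own grace period.

    During grace only reclaim requests are accepted; new lock
    requests are refused until the deadline has passed.
    """
    if in_grace(host, now) and not reclaim:
        return "denied_grace_period"
    return "granted"

# A client of a just-migrated filesystem is in grace; an old client is not.
moved = Host("client-of-moved-fs", grace_until=90.0)
settled = Host("long-standing-client", grace_until=0.0)
```

With this arrangement, clients of a filesystem that has just failed over can be held to reclaim-only requests while unrelated clients of the same server continue to take new locks.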

IMHO, having a grace period for each client (host) is overkill.


Part of unmounting the filesystem from Server A requires getting
Server A to drop all the locks on the filesystem.  We know they can
only be held by clients that sent requests to a given set of IP
addresses.   Lockd created an 'nsm' for each client/local-IP pair and
registered each of those with statd.  The information registered with
statd includes the details of an RPC call that can be made to lockd to
tell it to drop all the locks owned by that client/local-IP pair.

The statd in 1.1.0 records all this information in the files created
in /var/lib/nfs/sm (and could pass it to the ha-callout if required).
So when it is time to unmount the filesystem, some program can look
through all the files in nfs/sm, read each of the lines, find those
which relate to any of the local IP addresses that we want to move, and
initiate the RPC callback described on that line.  This will tell
lockd to drop those locks.  When all the RPCs have been sent, lockd
will not hold any locks on that filesystem any more.
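The scan described above could be sketched as follows. The thread does not give the on-disk format that statd writes, so the record layout assumed here (local IP, client name, then RPC program, version and procedure as whitespace-separated fields) is hypothetical, as are the function names and the sample values. The sketch only selects the callbacks to issue; sending the RPC itself is left out.

```python
from pathlib import Path

def parse_record(line):
    """Parse one line in the ASSUMED format:
    '<local_ip> <client> <prog> <vers> <proc>'. Returns None if malformed."""
    fields = line.split()
    if len(fields) != 5:
        return None
    local_ip, client, prog, vers, proc = fields
    try:
        callback = (int(prog), int(vers), int(proc))
    except ValueError:
        return None
    return {"local_ip": local_ip, "client": client, "callback": callback}

def callbacks_for(lines, moving_ips):
    """Select the records whose local IP is one of the addresses being moved."""
    selected = []
    for line in lines:
        rec = parse_record(line)
        if rec is not None and rec["local_ip"] in moving_ips:
            selected.append(rec)
    return selected

def callbacks_in_dir(sm_dir, moving_ips):
    """Apply callbacks_for() to every regular file in the monitor directory."""
    selected = []
    for path in sorted(Path(sm_dir).iterdir()):
        if path.is_file():
            lines = path.read_text().splitlines()
            selected.extend(callbacks_for(lines, moving_ips))
    return selected

# Illustrative sample lines only; not taken from a real statd file.
sample = [
    "10.0.0.1 clientA 100021 4 16",
    "10.0.0.2 clientB 100021 4 16",
    "garbage line",
]
picked = callbacks_for(sample, {"10.0.0.1"})
```

A failover script would call something like `callbacks_in_dir("/var/lib/nfs/sm", moving_ips)` and then issue one RPC per selected record before attempting the unmount.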

Bright idea! But it doesn't solve the issue of misbehaving clients that come in through unwanted (server) interfaces, does it?

I feel it has taken me quite a while to gain a full understanding of
what you are trying to achieve.  Maybe it would be useful to have a
concise/precise description of what the goal is.
I think a lot of the issues have now become clear, but it seems there
remains the issue of what system-wide configurations are expected, and
what configuration we can rule 'out of scope' and decide we don't have
to deal with.

I'm trying to do the write-up now. But could the following temporarily serve the purpose? What is not clear from this thread of discussion?


-- Wendy
