[Cluster-devel] "->ls_in_recovery" not released

David Teigland teigland at redhat.com
Wed Nov 24 20:29:02 UTC 2010


On Wed, Nov 24, 2010 at 05:13:40PM +0100, Menyhart Zoltan wrote:
> Could you please indicate the exact URL?

The current Fedora packages,
or
https://www.redhat.com/archives/cluster-devel/2010-October/msg00008.html
or
http://git.fedorahosted.org/git/?p=cluster.git;a=shortlog;h=refs/heads/STABLE31

> The Linux rules say: one should not return to user mode while holding a lock.
> This is because one should not trust the user mode programs whether they
> eventually re-enter the kernel or not, in order to release the lock.
> 
> For the very same reason (one should not trust the user mode programs),
> I think, the DLM kernel module is not sufficiently robust.
> 
> If you have a closer look, the situation of the "dlm_recoverd" kernel thread
> is quite similar to waiting for a user mode program to trigger the release
> of a lock.
> 
> I can agree: it does not return to user mode.
> Yet it holds the lock and goes to sleep, in an uninterruptible way, waiting
> for a user action: it trusts 100 % a user mode program that can be killed,
> can be swapped out with no room to swap it back in, etc.
> 
> Instead, the DLM should always return in a few seconds, saying the caller
> cannot be granted a given "dlm_lock" for a given reason.
> 
> E.g. ocfs2 is able to handle a refused lock request. It is up to the
> caller to decide if s/he wants to wait more.
> 
> I think whatever the user land does, the DLM kernel module should give
> a response to a "dlm_lock()" request within a short (for a human operator)
> time.
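
A minimal user-space sketch of the bounded-wait behaviour proposed in the
quoted text above, assuming an ordinary in-process mutex stands in for a DLM
lock. This is not DLM code: `try_lock` and `lock_with_retries` are
hypothetical names, and Python's `threading.Lock.acquire(timeout=...)` is used
only because it reports a timeout to the caller instead of blocking forever.

```python
import threading


def try_lock(lock, timeout_s):
    """Try to take `lock`, waiting at most `timeout_s` seconds.

    Returns True if the lock was acquired, False if the wait timed out.
    The caller always gets an answer within a bounded time.
    """
    return lock.acquire(timeout=timeout_s)


def lock_with_retries(lock, timeout_s, max_attempts):
    """Caller-side policy: the caller decides how long it is willing to wait.

    Returns the attempt number (starting at 1) on which the lock was
    acquired, or None if every attempt was refused.
    """
    for attempt in range(1, max_attempts + 1):
        if try_lock(lock, timeout_s):
            return attempt
    return None
```

The point of the split is that the refusal is reported by the locking layer,
while the decision to keep waiting stays with the caller, which is the
division of responsibility the quoted text asks for.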

You have identified one of the obvious downsides to implementing
clustering partly in the kernel and partly in userland.  In my experience
this has not proven to be a problem.

Dave



