[Cluster-devel] [GFS2] Fix ordering bug in lock_dlm

Thu May 22 07:44:55 UTC 2008

Hi,

On Wed, 2008-05-21 at 13:10 -0500, David Teigland wrote:
> On Wed, May 21, 2008 at 06:09:24PM +0100, Steven Whitehouse wrote:
> > From 317b0076b8b1a27b51a8eb47a64d495fdb956ac5 Mon Sep 17 00:00:00 2001
> > From: Steven Whitehouse <swhiteho at redhat.com>
> > Date: Wed, 21 May 2008 17:21:42 +0100
> > Subject: [PATCH] [GFS2] Fix ordering bug in lock_dlm
> > 
> > This looks like a lot of change, but in fact it's not. Mostly it's
> > things moving from one file to another. The change is just that
> > instead of queuing lock completions and callbacks from the DLM
> > we now pass them directly to GFS2.
> > 
> > This gives us a net loss of two list heads per glock (a fair
> > saving in memory) plus a reduction in the latency of delivering
> > the messages to GFS2, plus we now have one thread fewer as well.
> > There was a bug where callbacks and completions could be delivered
> > in the wrong order due to this unnecessary queuing which is fixed
> > by this patch.
> 
> Several things,
> 
> 1. These are very significant changes.  There's nothing terribly wrong
>    with that, but it's important to get that straight.
> 
> 2. Moving large chunks of code along with making significant changes
>    makes it impossible to see what changed and what didn't.  In relation
>    to point 1, a small number of actual lines changed doesn't make it
>    insignificant; what matters is what those changes *do*.
> 
> 3. These changes require us to fork the lock modules for gfs1 and gfs2.
>    That's fine, it's been coming for quite a while anyway. (more below)
> 
> 4. I'll continue to maintain the original lock_dlm for gfs1.  You and
>    other gfs folks can own the new one and do whatever you like with it
>    without me getting in the way.
> 
> Now, how to fork the lock modules.  There shouldn't be much trouble
> adapting gfs_controld to cope with the two different lock_dlm's.  The one
> main problem I see is that the name of the module "lock_dlm" can't really
> be changed; it's a long-standing part of the API/ABI, user interface,
> documentation, ...  But I don't think it's feasible to have two different
> files named lock_dlm.ko on the system.
> 
> The only solution I've been able to come up with is for the upstream
> lock_dlm module to be merged into the gfs2 module, along with the
> lock_nolock module.  We'd still be able to use gfs2 in the same old way,
> referring to lock_dlm and lock_nolock, but it just wouldn't be a separate
> module.  This has been the plan for a long time anyway.  Initially, nothing
> functional would need to change between gfs2 and lock_dlm even though
> they're in the same module (it's the same thing we did with the
> lock_harness).  Breaking down the barrier between them could then begin,
> though, and be done incrementally.
> 
> So, when gfs2 looks for "lock_dlm" or "lock_nolock" it would look within
> the scope of its own kernel module.  For gfs1, it would continue to look
> for separate modules named lock_dlm and lock_nolock.
> 
Yes, I'd like to do it that way. For one thing, it would reduce the
number of lookups we have to do on glocks when we get replies from the
DLM. It would also be possible to eliminate a number of fields which are
duplicated between struct gfs2_glock and struct gdlm_lock, resulting in a
considerably reduced memory requirement.
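
As a rough illustration of the in-module lookup described above, here is
a minimal userspace sketch. It is not code from the patch: the names
lockproto_ops, builtin_protos and find_builtin_proto are made up for the
example, and the single mount hook merely stands in for the real, much
larger lock module interface. The point is only that "lock_dlm" and
"lock_nolock" stay valid as names, but resolve against a table built into
the gfs2 module rather than a separately loaded lock_dlm.ko.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a lock protocol's operations table. */
struct lockproto_ops {
	const char *name;   /* name given at mount time, e.g. "lock_dlm" */
	int (*mount)(void); /* placeholder hook, not the real interface */
};

/* Dummy hooks so the sketch can be exercised; return values are arbitrary. */
static int dlm_mount(void)    { return 1; }
static int nolock_mount(void) { return 0; }

/* Protocols compiled into the (hypothetical) combined module. */
static const struct lockproto_ops builtin_protos[] = {
	{ "lock_dlm",    dlm_mount },
	{ "lock_nolock", nolock_mount },
};

/*
 * Resolve a protocol name within the module's own scope.
 * Returns NULL if the name is not one of the built-in protocols.
 */
const struct lockproto_ops *find_builtin_proto(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(builtin_protos) / sizeof(builtin_protos[0]); i++) {
		if (strcmp(builtin_protos[i].name, name) == 0)
			return &builtin_protos[i];
	}
	return NULL;
}
```

With something along these lines, existing mount options and
documentation that name lock_dlm or lock_nolock would not need to change;
gfs1 would be unaffected and would keep looking for separate modules.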

Steve.