[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [Cluster-devel] [PATCH] dlm: send_bast_queue() skip list loop not only sending basts to convertqueue


On Tue, 2011-01-04 at 16:27 -0500, David Teigland wrote:
> On Tue, Jan 04, 2011 at 06:06:51PM -0200, cmaiolino redhat com wrote:
> > The resource groups got corrupted without this patch:
> I could see an extraneous bast leading to confusion in gfs2 about the lock
> state, but gfs2 should probably be asserting somewhere before it actually
> corrupts anything...
I don't believe that this is the right solution, although it might be a
useful clue as to what is really going wrong. Also, nothing is getting
corrupted at all, the issue is that deallocation fails to take place at
the expected time. Thats not the end of the world, since there is
already a mechanism in place to catch that at a later point in time.

Carlos, why do you think that the dlm should be sending basts to the
lock requesting process for its own lock request?

That test was added specifically to prevent performance issues due to
extraneous basts being sent in such cases. I'd suggest getting some
traces of the glocks in question and checking to see what is happening,
and whether it is happening in the correct order,


> > diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
> > index 64e5f3e..565c519 100644
> > --- a/fs/dlm/lock.c
> > +++ b/fs/dlm/lock.c
> > @@ -1847,7 +1847,7 @@ static void send_bast_queue(struct dlm_rsb *r, struct list_head *head,
> >  
> >  	list_for_each_entry(gr, head, lkb_statequeue) {
> >  		/* skip self when sending basts to convertqueue */
> > -		if (gr == lkb)
> > +		if (head == &r->res_grantqueue && gr == lkb)
> >  			continue;
> >  		if (gr->lkb_bastfn && modes_require_bast(gr, lkb)) {
> >  			queue_bast(r, gr, lkb->lkb_rqmode);
> I haven't been able to figure out the problem or the fix; some printk's
> around the case in question would be revealing.
> This is the specific case where a TRY_1CB (NOQUEUBAST) conversion fails.
> Here's how I step through the code for that case:
> _convert_lock(lkb)
>     error = do_convert(lkb)
>         when error equals -EAGAIN, lkb remains on grantqueue
>     do_convert_effects(lkb, -EAGAIN)
>         -EAGAIN and NOQUEUEBAST -> send_blocking_asts_all ->
>         send_bast_queue(grantqueue, lkb)
>            [lkb is expected to be here, skip sending bast to self]
>         send_bast_queue(convertqueue, lkb):
>            [lkb should not be on here, but your patch implies there
>             are cases where it can be?  I think that would be a bug.]
> Dave

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]