[Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock

Mark Syms Mark.Syms at citrix.com
Thu Oct 11 19:28:28 UTC 2018


Also, the prepare to wait call does so in non exclusive mode so switching the parameter on the wakeup would have no effect as non exclusive waiters are always woken anyway.

So, there is some other reason why we decide to yield but never get woken.

Mark.
________________________________
From: Mark Syms <Mark.Syms at citrix.com>
Sent: Thursday, 11 October 2018 09:14
To: Tim Smith <tim.smith at citrix.com>,cluster-devel at redhat.com
CC: Andreas Gruenbacher <agruenba at redhat.com>,Ross Lagerwall <ross.lagerwall at citrix.com>
Subject: RE: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock


And, sadly, the change to wait all also doesn't prevent the schedule_timeout from occurring. Something more subtle going on obviously.

        Mark.

-----Original Message-----
From: Tim Smith <tim.smith at citrix.com>
Sent: 10 October 2018 09:23
To: cluster-devel at redhat.com
Cc: Andreas Gruenbacher <agruenba at redhat.com>; Ross Lagerwall <ross.lagerwall at citrix.com>; Mark Syms <Mark.Syms at citrix.com>
Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock

On Tuesday, 9 October 2018 16:08:10 BST Tim Smith wrote:
> On Tuesday, 9 October 2018 14:00:34 BST Andreas Gruenbacher wrote:
> > On Tue, 9 Oct 2018 at 14:46, Tim Smith <tim.smith at citrix.com> wrote:
> > > On Tuesday, 9 October 2018 13:34:47 BST you wrote:
> > > > There must be another reason for the missed wake-up. I'll have
> > > > to study the code some more.
> > >
> > > I think it needs to go into gfs2_glock_dealloc(), in such a way as
> > > to avoid that problem. Currently working out a patch to do that.
> >
> > That doesn't sound right: find_insert_glock is waiting for the glock
> > to be removed from the rhashtable. In gfs2_glock_free, we remove the
> > glock from the rhashtable and then we do the wake-up. Delaying the
> > wakeup further isn't going to help (but it might hide the problem).
>
> The only way we can get the problem we're seeing is if we get an
> effective order of
>
> T0: wake_up_glock()
> T1: prepare_to_wait()
> T1: schedule()
>
> so clearly there's a way for that to happen. Any other order and
> either
> schedule() doesn't sleep or it gets woken.
>
> The only way I can see at the moment to ensure that wake_up_glock()
> *cannot* get called until after prepare_to_wait() is to delay it until
> the read_side critical sections are done, and the first place that's
> got that property is the start of gfs2_glock_dealloc(), unless we want
> to add synchronise_rcu() to gfs2_glock_free() and I'm guessing there's
> a reason it's using
> call_rcu() instead.
>
> I'll keep thinking about it.

OK, we have a result that this definitely *isn't* the right answer (still getting the wakeup happening). This is lends more weight to the idea that there are multiple waiters, so we'll try your patch.

--
Tim Smith <tim.smith at citrix.com>


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20181011/ef6af022/attachment.htm>


More information about the Cluster-devel mailing list