[dm-devel] workqueues and percpu (was: [PATCH] dm: remake of the verity target)

Thu Mar 8 23:30:48 UTC 2012

On Thu, 8 Mar 2012 15:15:21 -0800
Tejun Heo <tj at kernel.org> wrote:

> > I'm not sure what we can do about it really, apart from blocking unplug
> > until all the target CPU's workqueues have been cleared.  And/or refusing
> > to unplug a CPU until all pinned-to-that-cpu kernel threads have been
> > shut down or pinned elsewhere (which is the same thing, only more
> > general).
> > 
> > Tejun, is this new behaviour?  I do recall that a long time ago we
> > wrestled with unplug-vs-worker-threads and I ended up OK with the
> > result, but I forget what it was.  IIRC Rusty was involved.
> 
> Unfortunately, yes, this is a new behavior.  Before, we could have
> unbound delays during unplug from work items.  Now, we have CPU
> affinity assumption breakage.

Ow, didn't know that.

>  The behavior change was primarily to
> allow long running work items to use regular workqueues without
> worrying about inducing delay across cpu hotplug operations, which is
> important as it's also used on suspend / hibernation, especially on
> mobile platforms.

Well..  why did we want to support these long-running work items? 
They're abusive, aren't they?  Where are they?

> During the cmwq conversion, I ended up auditing a lot of (I think I
> went through most of them) workqueue users and IIRC there weren't too
> many which required stable affinity.
> 
> > That being said, I don't think it's worth compromising the DM code
> > because of this workqueue wart: lots of other code has the same wart,
> > and we should find a centralised fix for it.
> 
> Probably the best way to solve this is introducing pinned attribute to
> workqueues and have them drained automatically on cpu hotplug events.
> It'll require auditing workqueue users but I guess we'll just have to
> do it given that we need to actually distinguish the ones need to be
> pinned.

That will make future use of workqueues more complex and people will
screw it up.

>  Or maybe we can use explicit queue_work_on() to distinguish
> the ones which require pinning.
> 
> Another approach would be requiring all workqueues to be drained on
> cpu offlining and requiring any work item which may stall to use
> unbound wq.  IMHO, picking out the ones which may stall would be much
> less obvious than the ones which require cpu pinning.

I'd be surprised if it's *that* hard to find and fix the long-running
work items.  Hopefully most of them are already using
create_freezable_workqueue() or create_singlethread_workqueue().

I wonder if there's some debug code we can put in workqueue.c to detect
when a pinned work item takes "too long".