[dm-devel] [PATCH]dm-mpath: fix for race condition between multipath_dtr and pg_init_done.

Merla, ShivaKrishna ShivaKrishna.Merla at netapp.com
Thu Oct 17 22:03:47 UTC 2013


> From: Mike Snitzer [mailto:snitzer at redhat.com]
> Sent: Thursday, October 17, 2013 4:10 PM
> To: Hannes Reinecke
> Cc: Merla, ShivaKrishna; dm-devel at redhat.com; agk at redhat.com; Mikulas
> Patocka
> Subject: Re: [PATCH]dm-mpath: fix for race condition between
> multipath_dtr and pg_init_done.
> 
> On Thu, Oct 17 2013 at  5:47pm -0400,
> Hannes Reinecke <hare at suse.de> wrote:
> 
> > On 10/17/2013 08:53 PM, Mike Snitzer wrote:
> > >Thanks for reporting this.  Much appreciated.  More comments below.
> > >
> > >On Thu, Oct 17 2013 at  1:31pm -0400,
> > >Merla, ShivaKrishna <ShivaKrishna.Merla at netapp.com> wrote:
> > >
> > >>Whenever multipath_dtr is happening, we should prevent queueing any
> further path
> > >>activation work. There was a kernel panic where after pg_init_done()
> decrements
> > >>pg_init_in_progress to 0, wait_for_pg_init_completion call assumes
> there are no
> > >>more pending path management commands. But if pg_init_required is
> set by
> > >>pg_init_done call due to retriable mode_select errors , then
> process_queued_ios()
> > >>will again queue the path activation work. If free_multipath call has been
> > >>completed by the time activate_path work is called, kernel panic was
> seen on
> > >>accessing multipath members.
> > >
> > >Your locking looks suspect to me, see comment inlined below
> multipath_dtr
> > >
> > >But shouldn't we just train multipath_wait_for_pg_init_completion() to
> > >look at m->pg_init_required?  Have it wait for both pg_init_required and
> > >pg_init_in_progress to be zero?  We'd also have to audit that
> > >pg_init_required cannot be set while pg_init_in_progress.
> > >
> > Hmm.
> >
> > We _could_ try to resolve it by pushing I/O back onto the request queue
> > (cf my earlier post 'requeue I/O during pg_init').
> >
> > I was hoping to excite some comments with that, but seems to be my
> > fate nowadays to send out patches with no reply.
> 
> patchwork caught it:
> https://patchwork.kernel.org/patch/2969111/
> 
> I've just been distracted with other stuff the past week; but I'll be
> looking closer at this issue (and your earlier patch) shortly and we'll
> get a fix queued for 3.13.
> 
> > Anyway, maybe this will be giving it some more attention.
> > It definitely would avoid this problem, by virtue of not having to
> > queue I/O internally during pg_init, so we could easily tear down
> > the queue.
> 
> Sounds good.

Thanks for your comments.  I agree we should lock while setting dtr_in_progress, I think I overlooked it as its handled in process_queued_ios as well.
We looked into handling this in wait_for_pg_init_completion() but checking for pg_init_required here will not help as well ( until we prevent setting pg_init_required while pg_init_in_progress is set ).
Here due to SCSI_DH_RETRY on mode_select, pg_init_done will set the pg_init_required as activation needs to be retried under normal 
circumstances. But it completely differs when multipath target  is being destroyed.  Apparently I didn't see any pending_ios in our test
while this is happening. Just path activations are held up since controller was returning 5/91/36 CC's. With this condition either one of pg_init_required or pg_init_in_progress flags are set all the time.
Hannes patch will take care of preventing queueing of IO's when pg_init_in_progress is set, but currently running activation commands will not return until controller returns SUCCESS on mode_select.






More information about the dm-devel mailing list