
Re: [dm-devel] RFC for multipath queue_if_no_path timeout.



On Thu, Oct 17, 2013 at 12:03:10PM -0700, Frank Mayhar wrote:
> Dragging this back up into the light...
> 
> On Thu, 2013-09-26 at 19:49 -0400, Mike Snitzer wrote:
> > Frank, I had a look at your patch.  It leaves a lot to be desired, I was
> > starting to clean it up but ultimately found myself agreeing with
> > Alasdair's original point: that this policy should be implemented in the
> > userspace daemon.
> 
> I've found and fixed a couple of bugs but I would still like to know
> what issues you had with the patch.  As I said before, I would be more
> than happy to clean it up.
> 
> In the time since we had this discussion, by the way, we ran into a
> problem that a userspace daemon can't solve:  That of shutdown.  We ran
> into a number of failures in which systems were hung for hours.  It
> turned out that they were caused by a regular system shutdown.  Our
> backing store is network-based and networking was getting killed before
> applications (as is usually the case), leaving I/O outstanding on the
> device.  Since queue_if_no_path was set, the I/O wasn't dumped and our
> daemon was killed by shutdown very shortly thereafter so it couldn't
> recover (otherwise it would have cleaned things up).
> 

Was multipathd forcibly killed? What was the "queue_without_daemon"
parameter in the defaults section of multipath.conf set to?

If "queue_without_daemon" is set to "no", multipathd should disable
queueing when it is stopped. This was added specifically to avoid this
issue.
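
For reference, a minimal sketch of the setting I mean, assuming the
config lives in the usual /etc/multipath.conf and that any existing
defaults section is merged with this rather than replaced by it:

```
# /etc/multipath.conf (path assumed; adjust for your distribution)
defaults {
	# Stop queueing I/O on maps with no usable paths once
	# multipathd is no longer running.
	queue_without_daemon no
}
```

With that in place, a clean stop of multipathd during shutdown should
turn queueing off so outstanding I/O fails instead of hanging. It may
not help if the daemon is killed before it can do so.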

-Ben

> With those I/Os sitting queued in multipath, with no network and no
> daemon to turn off queue_if_no_path, the systems just sat.  When we
> finally diagnosed this, we realized that the timeout would work
> perfectly to solve the problem, automatically turning queue_if_no_path
> off shortly after the network went away without depending on the
> intervention of the no-longer-running daemon.
> 
> So how do you guys deal with this failure scenario?
> -- 
> Frank Mayhar
> 310-460-4042
> 
> --
> dm-devel mailing list
> dm-devel redhat com
> https://www.redhat.com/mailman/listinfo/dm-devel

