
Re: [dm-devel] LSF: Multipathing and path checking question

Just a note or two:

> My proposal is to handle this in several stages:
> - path fails
> -> Send out netlink event
> -> start dev_loss_tmo and fast_fail_io timer
> -> fast_fail_io timer triggers: Abort all outstanding I/O with
>   any future I/O, and send out netlink event.
> -> dev_loss_tmo timer triggers: Remove sdev and cleanup rport.
>   netlink event is sent implicitly by removing the sdev.
> Multipath would then interact with this sequence by:
> - Upon receiving 'path failed' event: mark path as 'ghost' or
>   ie no I/O is currently possible and will be queued (no path switch
> - Upon receiving 'fast_fail_io' event: switch paths and resubmit
>   queued I/Os
> - Upon receiving 'path removed' event: remove path from internal
>   update multipath maps etc.

This makes perfect sense to me. Are we going to allow the end-user to
configure those timers (not sure that's a good idea...)?
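
For what it's worth, the multipath side of the sequence quoted above could
be sketched roughly as follows. This is only an illustration in Python, not
multipathd code: the event strings, the `Path` class, `queue_io` and
`handle_event` are names made up here, and the 'failed' state is just my
label for the limbo period after fast_fail_io has fired.

```python
from enum import Enum


class PathState(Enum):
    ACTIVE = "active"
    GHOST = "ghost"      # path failed: I/O is queued, no path switch yet
    FAILED = "failed"    # fast_fail_io fired: queued I/O moved elsewhere
    REMOVED = "removed"  # dev_loss_tmo fired: sdev is gone


class Path:
    def __init__(self, name):
        self.name = name
        self.state = PathState.ACTIVE
        self.queued = []  # I/Os held while the path is a ghost


def queue_io(path, io):
    """Hold an I/O on a ghost path; any other state is a caller error."""
    if path.state is not PathState.GHOST:
        raise ValueError(f"{path.name} is {path.state.value}, not ghost")
    path.queued.append(io)


def handle_event(path, event):
    """Apply one event and return the I/Os to resubmit on another path."""
    if event == "path_failed" and path.state is PathState.ACTIVE:
        path.state = PathState.GHOST
        return []
    if event == "fast_fail_io" and path.state is PathState.GHOST:
        path.state = PathState.FAILED
    elif event == "path_removed":
        path.state = PathState.REMOVED
    else:
        raise ValueError(f"unexpected {event!r} in state {path.state.value}")
    # Both remaining events hand back whatever was still queued.
    resubmit, path.queued = path.queued, []
    return resubmit
```

The point of the sketch is that nothing is resubmitted on 'path failed', so
a short glitch costs no path switch; the switch only happens once
fast_fail_io has triggered.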

> The time between 'path failed' and 'fast_fail_io triggers' would then
> be able to capture any jitter / intermittent failures. Between
> 'fast_fail_io triggers' and 'path removed' the path would be held in
> a sort of 'limbo' in case it comes back again, eg for maintenance/SP
> etc. And we can even increase this one to rather long timespans (eg
> to give the admin enough time for a manual intervention.

> I still like this proposal as it makes multipath interaction far
> easier. And we can do away with path checkers completely here.

All true. Although I think the "long" timespans might be best measured in
minutes (say, default to 5 minutes) and should be configurable. It
probably isn't a good idea to leave that path dead for a very long time
as a rule, even if it's possible to do so. Maybe even some sort of
userland override would be worthwhile for scheduled maintenance?
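
To make the configurable default and the maintenance override a bit more
concrete, here is a rough Python sketch. Everything in it is an assumption
for illustration: the `PathTimers` class, the `maintenance_tmo` field and
the 5-second fast_fail_io value are invented here, and only the 300-second
default comes from the 5 minutes suggested above.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_DEV_LOSS_TMO = 300  # seconds: the 5-minute default suggested above


@dataclass
class PathTimers:
    fast_fail_io_tmo: int = 5                 # assumed value, illustration only
    dev_loss_tmo: int = DEFAULT_DEV_LOSS_TMO  # configurable limbo period
    maintenance_tmo: Optional[int] = None     # hypothetical userland override

    def __post_init__(self):
        timeouts = [self.fast_fail_io_tmo, self.dev_loss_tmo]
        if self.maintenance_tmo is not None:
            timeouts.append(self.maintenance_tmo)
        if any(t <= 0 for t in timeouts):
            raise ValueError("timeouts must be positive")
        # Assumption taken from the staged proposal: fast_fail_io has to
        # fire before the path is removed.
        if self.fast_fail_io_tmo >= self.effective_dev_loss_tmo():
            raise ValueError("fast_fail_io_tmo must be below dev_loss_tmo")

    def effective_dev_loss_tmo(self):
        """Seconds to keep a dead path around before removing it."""
        if self.maintenance_tmo is not None:
            return self.maintenance_tmo
        return self.dev_loss_tmo
```

With this shape the long timeout stays the exception: an admin would set
`maintenance_tmo` for the scheduled window and clear it afterwards, and the
5-minute default applies the rest of the time.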

Regards, Jerry
