
Re: [dm-devel] [RFC] pathchecker

Joe Thornber wrote:
On Mon, Mar 01, 2004 at 12:03:34PM -0800, Joel Becker wrote:

	The "wait for DM event" part.  Do we have an event yet?
2.6.4-udm1 doesn't seem to send any events to userspace on fail_path().
Are we thinking an upcall, or perhaps polling the status?

The event handling should be working; fail_path() uses a work queue to
schedule trigger_event() being called (we can't call it directly from
interrupt context).

dm has a very simple model for events:

- Userland issues the wait for event ioctl, which blocks until an event occurs
- A target (eg. mpath) triggers an event
- Userland returns from the wait for event ioctl.  At this point it
  should query the status of the device to work out what happened.

An event number is passed into 'wait for event' to indicate the last
known event.  This way we can avoid missing events while previous
events are being processed.  Only recent versions of dmsetup support
this event number handling.
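
The event-number handshake described above can be sketched as a toy simulation. This is not the dm ioctl interface: the `Device` class, its `trigger_event()` and `wait_event()` methods, and the status strings are all hypothetical stand-ins, chosen only to show why passing the last known event number avoids missing an event that fires while userland is busy.

```python
import threading


class Device:
    """Toy stand-in for a dm device: an event counter plus a status string."""

    def __init__(self):
        self._cond = threading.Condition()
        self.event_nr = 0
        self.status = "A A"  # hypothetical status: two active paths

    def trigger_event(self, new_status):
        """Target side: record the new status and bump the event counter."""
        with self._cond:
            self.status = new_status
            self.event_nr += 1
            self._cond.notify_all()

    def wait_event(self, last_seen, timeout=None):
        """Userland side: block until the counter has moved past last_seen
        (or the timeout expires), then return the counter and the status."""
        with self._cond:
            self._cond.wait_for(lambda: self.event_nr > last_seen, timeout)
            return self.event_nr, self.status


dev = Device()
last_seen = dev.event_nr

# The event fires before userland gets around to waiting.
dev.trigger_event("F A")

# The wait returns immediately because the counter is already past last_seen,
# so the event is not lost. Userland then inspects the status.
event_nr, status = dev.wait_event(last_seen)
```

A caller would feed the returned `event_nr` back in as `last_seen` on its next wait, which is the role the event number plays in the 'wait for event' call.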


I confirm the current event notification scheme is usable for the pathchecker. I have a prototype that I'll post this week.

Speaking of which, I'd like comments on the sanity of the following general rule: what about having the multipath configuration tool isolate failed paths in a fallback PG? They would stay marked Active since no IO goes through them, and would thus be exercised only if all the high-priority paths fail. If they are hot-activated by the controller (think of a controller handling LUN switchover), they will work as-is. If they have really failed, they will simply be marked as such.

Now, with the pathchecking logic:
Upon initial MP configuration, all paths are marked Active, including the failed ones grouped in a separate secondary PG, so there is no pathchecking yet.

Waiter threads wait for events.

Now an exercised path fails. A waiter thread wakes up, fetches the MP status string, discovers the failed path and pushes it onto the failedpaths list. The pathchecker thread now has this path to test.

On the 1000th try, the pathchecker finds that the path has come back up. The pathchecker forks and execs the multipath config tool, which resets the MP target to its initial state. The path is popped off the failedpaths list and everybody happily goes back to sleep.
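
As a rough illustration of the flow just described, here is a minimal single-threaded sketch. It is not the prototype: `on_event()`, `checker_pass()`, the path labels and the state map are hypothetical names, the status is given as an already-parsed dict rather than the real MP status string, and the "reconfigure" step is a plain callback standing in for the fork and exec of the multipath config tool.

```python
failed_paths = []


def on_event(path_states):
    """Waiter side: after an event, record every path reported as failed.
    path_states is a hypothetical, already-parsed map of path -> state."""
    for path, state in path_states.items():
        if state == "failed" and path not in failed_paths:
            failed_paths.append(path)


def checker_pass(is_up, reconfigure):
    """Checker side: test each failed path once. When a path answers again,
    rerun the configuration step and drop the path from the list."""
    for path in list(failed_paths):
        if is_up(path):
            reconfigure()
            failed_paths.remove(path)


# Demo: the path comes back on the third test instead of the 1000th.
tries = {"n": 0}


def is_up(path):
    tries["n"] += 1
    return tries["n"] >= 3


reconfigs = []

on_event({"8:16": "failed", "8:32": "active"})
while failed_paths:
    checker_pass(is_up, lambda: reconfigs.append("multipath"))
```

In the threaded design the waiter and the checker would share `failed_paths` under a lock, and the checker would sleep between passes rather than spin.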

So, sane / insane ?
