
[dm-devel] Re: [PATCH][RFC] supporting different failover models

Joe Thornber wrote:
On Wed, Feb 11, 2004 at 12:24:49AM -0800, Mike Christie wrote:

The problem is that some devices will require that a special command be sent before the other paths can be used. Another class of devices will simply perform the switch when IO is sent down the secondary paths.

Ah, this is the sort of stuff that I didn't know.

For the latter group of devices, the test IOs are going to cause switches, which will cause performance problems. By testing only the current group, or by specifying which groups should be tested and separating your paths into priority groups, we can avoid that problem. You also do not have to worry about revalidating secondary paths that failed initially, and I think you can then kill the nr_tested_paths code.

I think it makes more sense to test secondary paths less frequently,
rather than never at all. The sysadmin needs to know if a secondary
path has failed, so that he can react to it. Otherwise you could get
a silent, gradual degradation of a system over many months.
Does this make sense?
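Joe's suggestion to test secondary paths less often than the active ones, rather than never, can be sketched as a simple per-tick policy on the path-test timer. This is a minimal sketch under stated assumptions: `struct test_policy`, `should_test_group()` and `secondary_interval` are hypothetical names, not dm-mpath code. For devices that switch implicitly when IO arrives, each secondary test would still trigger a switch, so the interval trades detection latency against switch cost.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical policy (not dm-mpath code): the active priority group is
 * tested on every tick of the path-test timer, every other group only on
 * every secondary_interval-th tick.
 */
struct test_policy {
	unsigned int secondary_interval;	/* ticks between secondary tests */
};

/* Return true if group 'pg' should be tested on timer tick 'tick'. */
static bool should_test_group(const struct test_policy *p,
			      unsigned int pg, unsigned int active_pg,
			      unsigned long tick)
{
	assert(p->secondary_interval > 0);

	if (pg == active_pg)
		return true;	/* always watch the paths carrying IO */

	/* secondary groups: occasionally, so silent failures still surface */
	return tick % p->secondary_interval == 0;
}
```

With `secondary_interval = 10`, the active group is tested every tick and each secondary group every tenth tick.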

Yes, makes sense, but...

How frequently should we test the secondary
paths? Just how expensive is the switch on typical hardware?

It can be pretty high. I can get a more detailed accounting of the costs in the morning.

It really must be done when there is no IO activity. I have been trying to figure out how to test for device idleness. I snuck a peek at MD, and they are just using the disk stat functions. Originally, I was thinking the path tests could be done, or at least initiated, from userspace when the admin (or a userspace daemon) knows that they won't affect any user.
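A rough userspace analogue of an idleness check can use the same per-disk counters exported in /proc/diskstats: sample a device twice and treat it as idle if no sectors moved and nothing was in flight. This is a sketch under assumptions, not MD's actual logic; `struct disk_sample`, `parse_diskstats_line()` and `disk_was_idle()` are hypothetical names, and the field order follows the kernel's iostats documentation for whole-disk lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical subset of one /proc/diskstats line relevant to idleness. */
struct disk_sample {
	unsigned long long sectors_read;
	unsigned long long sectors_written;
	unsigned long long in_flight;
};

/*
 * Parse one whole-disk line of /proc/diskstats: major, minor, name,
 * reads, reads merged, sectors read, ms reading, writes, writes merged,
 * sectors written, ms writing, I/Os currently in progress, ...
 * Returns false if the line does not have at least those fields.
 */
static bool parse_diskstats_line(const char *line, struct disk_sample *s)
{
	unsigned int major, minor;
	char name[64];
	unsigned long long rd, rd_merged, rd_ms, wr, wr_merged, wr_ms;

	assert(line != NULL && s != NULL);
	int n = sscanf(line,
		       "%u %u %63s %llu %llu %llu %llu %llu %llu %llu %llu %llu",
		       &major, &minor, name, &rd, &rd_merged, &s->sectors_read,
		       &rd_ms, &wr, &wr_merged, &s->sectors_written, &wr_ms,
		       &s->in_flight);
	return n == 12;
}

/* Idle: nothing in flight now and no sectors moved since the last sample. */
static bool disk_was_idle(const struct disk_sample *prev,
			  const struct disk_sample *now)
{
	return now->in_flight == 0 &&
	       now->sectors_read == prev->sectors_read &&
	       now->sectors_written == prev->sectors_written;
}
```

A daemon could take one sample, sleep, take another, and only issue path tests when `disk_was_idle()` holds; a small slack on the sector comparison would tolerate stray background IO.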

All that is needed to support the devices that require a special command
are some callouts in the priority group framework to initialize the
group before it is used.

Yuck.  Shouldn't this be handled by the driver for the path itself?
If I've got an open block device I expect to be able to use it.

Some devices/paths will show up as /dev/sda but can only accept basic commands like INQUIRY; other IO, such as reads and writes, will just be failed. It really is vendor specific, though. The question is: what exactly is the "driver for the path"? I wanted to add the callouts to the "driver for the path", but I am not sure whether that is the path-selector or something else.

Yeah, I know it is messy. It would be nice to just do it in userspace, but what can you do if it is your root disk?
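The callouts discussed above, covering both device classes (explicit activation command vs. implicit switch on IO), might be shaped roughly as follows. Where such a hook should live (priority group framework, path-selector, or the lower-level driver) is exactly the open question in this thread, so this is only a sketch: `struct pg_ops`, `init_group`, `activate_group()` and `count_activations()` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device-class hooks; not an existing dm interface. */
struct pg_ops {
	/*
	 * Called each time a priority group is switched into use. Devices
	 * that switch implicitly when IO arrives leave this NULL; devices
	 * that need a vendor-specific command implement it.
	 * Returns 0 on success.
	 */
	int (*init_group)(void *group_ctx);
};

struct priority_group {
	const struct pg_ops *ops;
	void *ctx;		/* per-group data for the callout */
};

/* Switch to 'pg', running its init callout first if it has one. */
static int activate_group(struct priority_group *pg)
{
	assert(pg != NULL && pg->ops != NULL);

	if (pg->ops->init_group)
		return pg->ops->init_group(pg->ctx);	/* nonzero: try next group */
	return 0;	/* implicit-switch device: the first IO does the switch */
}

/* Example explicit-activation callout: here it only counts calls;
 * a real one would issue the vendor-specific command. */
static int count_activations(void *ctx)
{
	int *calls = ctx;
	(*calls)++;
	return 0;
}
```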

Is this direction OK, or what is the intended purpose of the test
IO+nr_test_paths code?

The nr_tested_paths counter is used to ensure that all the paths
really are dead before we start failing ios.  Otherwise we could have
this situation:

- path a fails
- mpath target notices and switches to path b
- path a recovers
- path b fails
- mpath target notices, but since it hasn't yet had time to re-test path
  a it assumes it's still broken, and errors the io

It's not perfect (and can never be), but it closes the window for this
race from the test period (many seconds) to approximately the time
taken to complete an io (<< 1 sec).
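One way to read Joe's description of nr_tested_paths: after a failure, an io is only errored once every path has been re-tested and found dead; until then it is requeued, with the re-tests ideally kicked off immediately rather than on the next timer tick. The sketch below models that reading and replays the race from the list above. The field names echo, but are not, the real dm-mpath structures, and `path_failed()`, `path_tested()` and `map_io()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum io_action { IO_DISPATCH, IO_REQUEUE, IO_ERROR };

/* Hypothetical state; names echo, but are not, the dm-mpath fields. */
struct mpath_state {
	unsigned int nr_paths;		/* paths in the table */
	unsigned int nr_valid_paths;	/* paths currently believed usable */
	unsigned int nr_tested_paths;	/* paths re-tested since last failure */
};

/* A path in use has just failed: nothing has been re-tested since. */
static void path_failed(struct mpath_state *m)
{
	assert(m->nr_valid_paths > 0);
	m->nr_valid_paths--;
	m->nr_tested_paths = 0;
}

/* A test of one failed path completed; 'revived' if it now works. */
static void path_tested(struct mpath_state *m, bool revived)
{
	if (m->nr_tested_paths < m->nr_paths)
		m->nr_tested_paths++;
	if (revived) {
		assert(m->nr_valid_paths < m->nr_paths);
		m->nr_valid_paths++;
	}
}

/* Decide what to do with an io when it is mapped. */
static enum io_action map_io(const struct mpath_state *m)
{
	if (m->nr_valid_paths > 0)
		return IO_DISPATCH;
	/* Error only once every path has been re-tested and found dead. */
	return m->nr_tested_paths >= m->nr_paths ? IO_ERROR : IO_REQUEUE;
}
```

In the race above, after path b fails the io is requeued rather than errored, because path a has not been re-tested since the failure; once the test finds a alive, ios are dispatched again.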

That's what it looked like. Something that will ease this is being able to distinguish between error types (hard, soft, etc.), so that DM does not fail a path on any old error. I got no feedback when I posted about it, so Pat has been looking into this, and I will too soon.
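Distinguishing error types could let the multipath target react per class instead of failing a path on every error. The sketch below is illustrative only: the enum names are hypothetical, and mapping "hard" and "soft" onto these particular classes (plus a separate class for errors reported by the device itself, where failing over would not help) is an assumption, not something the thread specifies.

```c
#include <assert.h>

/* Hypothetical error classes a lower layer might report to dm. */
enum path_error {
	PATH_ERR_NONE,
	PATH_ERR_SOFT,		/* transient (e.g. busy): retry on same path */
	PATH_ERR_PATH,		/* hard transport error: the path is bad */
	PATH_ERR_TARGET,	/* error from the device itself: all paths see it */
};

enum error_action { ACT_COMPLETE, ACT_RETRY, ACT_FAIL_PATH, ACT_FAIL_IO };

/* Map an error class to what the multipath target should do with the io. */
static enum error_action classify(enum path_error e)
{
	switch (e) {
	case PATH_ERR_NONE:   return ACT_COMPLETE;
	case PATH_ERR_SOFT:   return ACT_RETRY;
	case PATH_ERR_PATH:   return ACT_FAIL_PATH;	/* fail over */
	case PATH_ERR_TARGET: return ACT_FAIL_IO;	/* failover would not help */
	}
	assert(!"unknown error class");
	return ACT_FAIL_IO;
}
```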

I also replaced the dm-daemon usage with a workqueue. It will allow us to do requeueing for different devices in parallel, and it attempts to execute the work on the same processor on which it was queued.

I need to look at this some more, at first glance it looks to have
more memory overhead since an extra object needs to be allocated per
queued item (this does come from a mempool, I hope?).  Also I've just

Are you referring to the work structure? Everything I added is just one per "struct multipath" and is allocated in the ctr. It is reused, and queue_work() makes sure it is not added multiple times. The patch really is not usable yet: I did not add the timer shutdown/re-enable for suspend/resume. I was waiting to see what nr_test_paths is for; now I know for sure.
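The memory point can be illustrated with a userspace model of a single work item embedded in each multipath, allocated once and reused: queueing it while it is already pending does nothing, so no per-request allocation is needed. This models the queue_work() semantics Mike describes (the kernel clears the pending flag before running the handler); it is not kernel code, and `struct work_item`, `model_queue_work()`, `model_run_work()` and `count_runs()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of one reusable work item per multipath. */
struct work_item {
	bool pending;
	void (*func)(void *data);
	void *data;
};

/* Returns true if queued, false if already pending (request coalesced). */
static bool model_queue_work(struct work_item *w)
{
	if (w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Worker side: clear pending before running, so new requests re-queue. */
static void model_run_work(struct work_item *w)
{
	assert(w->pending);
	w->pending = false;
	w->func(w->data);
}

/* Example handler: a real one would requeue deferred ios; here it counts runs. */
static void count_runs(void *data)
{
	(*(int *)data)++;
}
```

Clearing the flag before the handler runs means a request arriving during the handler queues the item again instead of being lost.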
