[dm-devel] Re: [PATCH][RFC] supporting different failover models

Mike Christie michaelc at cs.wisc.edu
Wed Feb 11 06:34:00 UTC 2004


Joe Thornber wrote:
> On Wed, Feb 11, 2004 at 12:24:49AM -0800, Mike Christie wrote:
> 
>>The problem is that some devices will require a special command 
>>be sent before the other paths can be used. And another class of devices 
>>will simply perform the switch when IO is sent down the secondary paths.
> 
> 
> ah, this is the sort of stuff that I didn't know.
> 
> 
>>For the latter group of devices the test IOs are going to cause switches, 
>>which is going to cause performance problems. By only testing the 
>>current group, or specifying which groups should be tested, and just 
>>separating your paths through priority groups, we can avoid that 
>>problem: you do not have to worry about revalidating secondary paths 
>>that initially failed, plus I think you can then kill 
>>the nr_tested_paths code.
> 
> 
> I think it makes more sense to test secondary paths less frequently,
> rather than never at all.  The sysadmin needs to know if a secondary
> path has failed, so that he can react to it.  Otherwise you could get
> a silent, gradual degradation of a system over many months.
> Does this make sense ? 

Yes, makes sense, but...

> How frequently should we test the secondary
> paths ?  Just how expensive is the switch on typical hardware ?

It can be pretty high. I can get more detailed accounting of the costs 
in the morning.

It really must be done when there is no IO activity. I have been trying 
to figure out how to test for device idleness. I snuck a peek at MD and 
they are just using the disk stat functions. Originally, I was thinking 
the path tests could be done or at least initiated from userspace when 
the admin (or a userspace daemon) knows that it won't affect any user.

> 
>>All that is needed to support the devices that require a special command
>>are some callouts in the priority group framework to initialize the
>>group before it is used.
> 
> 
> Yuck.  Shouldn't this be handled by the driver for the path itself?
> If I've got an open block device I expect to be able to use it.
> 

Some devices/paths will show up as /dev/sda, but they can only accept 
basic commands like INQUIRY; other IO like reads and writes will just be 
failed. It really is vendor specific, though. The question is what 
exactly the "driver for the path" is. I wanted to add the callouts to 
the "driver for the path", but I am not sure if this is the 
path-selector or what?

Yeah, I know it is messy. It would be nice to just do it in userspace, 
but if it is your root disk what can you do?


>>Is this direction OK, or what is the intended purpose of the test
>>IO+nr_test_paths code?
> 
> 
> The nr_tested_paths counter is used to ensure that all the paths
> really are dead before we start failing ios.  Otherwise we could have
> this situation:
> 
> - path a fails
> - mpath target notices and switches to path b
> - path a recovers
> - path b fails
> - mpath target notices, but since it hasn't yet had time to re-test path
>   a it assumes it's still broken, and errors the io
> It's not perfect (and can never be), but it closes the window for this
> race from the test period (many seconds) to approximately the time
> taken to complete an io (<< 1 sec).

That's what it looked like. Something that will ease this is being able 
to distinguish the error types (hard, soft, etc.) so that DM is not 
failing paths on any old error. I got no feedback when I posted about 
this; Pat had been looking into it and I will soon.

> 
>>I also replaced the dm-daemon usage with a workqueue. It will allow us to 
>>do requeueing for different devices in parallel, and it attempts to 
>>execute the work on the same processor as it was queued.
> 
> 
> I need to look at this some more, at first glance it looks to have
> more memory overhead since an extra object needs to be allocated per
> queued item (this does come from a mempool I hope ?).  Also I've just

Are you referring to the work structure?  Everything I added is just one 
per "struct multipath", and is allocated in the ctr. It is reused, and 
queue_work() makes sure it is not added multiple times. The patch really 
is not usable yet. I did not add the timer shutdown/re-enable for 
suspend/restart. I was waiting to see what the nr_test_paths is for - 
now I know for sure.



