[dm-devel] 2.6.10-rc1-udm1: multipath work in progress

Mon Nov 1 23:03:01 UTC 2004

On Friday 29 October 2004 at 23:23 +0100, Alasdair G Kergon wrote:
> Multipath work in progress, containing bugs, known and unknown.
> Patches 38 onwards haven't been tidied.
> 
> ftp://sources.redhat.com/pub/dm/patches/2.6-unstable/2.6.10-rc1/2.6.10-rc1-udm1.tar.bz2
>  
> Some of the interfaces have changed a bit, so it's incompatible with 
> existing userspace multipath tools: the feeling was it isn't worth updating 
> them until we've made up our minds on the interface changes.
> 
yes, thanks

> So what else ought to be changed before freezing the interfaces?
> [One change I intend to make is  struct path  to  void  in dm-hw-handler.h
> and dm-path-selector.h]
> I'm not convinced of the need for a per-path initialisation function yet.
> 
Neither am I.
It still needs a real-world case to show that it can be useful.

> But a table load flag to suppress the device size checks does sound OK.
> 
Or just move the test to the end of the PG switch-on procedure for all
multipath tables? That would keep the API simpler.

> New table example:
>   0 96000 multipath 
>   1 queue_if_no_path 0 1 round-robin 2 1 7:1 1 7:2 1
> 
>  * <num multipath feature args> [<arg>]*
>  * <num hw_handler args> [hw_handler [<arg>]*]
>  * <num priority groups> [<selector> <num paths> <num selector args>
>  *                        [<path> [<arg>]* ]+ ]+
> 
> We can extend 'feature args' if necessary and retain backwards compatibility.
> Currently must have "0"  or "1 queue_if_no_path".
> 
What problem are we trying to solve here? Planned outages, like
controller restarts or firmware upgrades?

If so, I guess we can go for queue_if_no_path for all and just ask
userspace for the time and queued-I/O thresholds to reach before failing.

Maybe even the queued-I/O threshold wouldn't be needed: let's just fail
as soon as kernel memory is exhausted or the userspace-fed timer has
elapsed.

If needed, fail_if_no_path can be emulated by setting the timer to 0.
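
To make the intended semantics concrete, here is a minimal Python sketch
(illustration only, not kernel code). The function name no_path_action
and its parameters are hypothetical. It only encodes the rule above:
with no usable path left, queue until the timer has elapsed or memory is
exhausted, then fail.

```python
def no_path_action(grace_period_s, seconds_without_path, memory_exhausted=False):
    """Decide what to do with an I/O once every path in every PG has failed.

    Hypothetical helper: grace_period_s stands for the userspace-fed timer,
    seconds_without_path for the time elapsed since the last path failed.
    """
    if memory_exhausted or seconds_without_path >= grace_period_s:
        return "fail"   # timer elapsed (always the case when the timer is 0)
    return "queue"      # still inside the grace period: keep the I/O queued


print(no_path_action(10, 3))         # queue
print(no_path_action(10, 10))        # fail
print(no_path_action(0, 0))          # fail, i.e. fail_if_no_path behaviour
print(no_path_action(10, 3, True))   # fail, memory is exhausted
```

With this rule a single integer in the table covers both of the current
feature settings, at least under the assumptions stated above.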

New table example:
  0 96000 multipath
  10 0 1 round-robin 2 1 7:1 1 7:2 1

 * <grace period in seconds>
 * <num hw_handler args> [hw_handler [<arg>]*]
 * <num priority groups> [<selector> <num paths> <num selector args>
 *                        [<path> [<arg>]* ]+ ]+
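
As a sanity check of the layout above, here is a rough Python sketch of
how userspace might split such a params string. It is only an
illustration: the function name parse_multipath_params and the returned
dictionary keys are made up, and it assumes that <num hw_handler args>
counts every token that follows it and that <num selector args> is the
number of arguments attached to each path, as in the example.

```python
def parse_multipath_params(params):
    """Split '<grace> <num hw args> [hw args] <num PGs> [PG]+' into a dict."""
    tok = params.split()
    pos = 0

    def take(n=1):
        # Consume the next n tokens, refusing to run past the end.
        nonlocal pos
        if pos + n > len(tok):
            raise ValueError("params string is too short")
        out = tok[pos:pos + n]
        pos += n
        return out

    grace = int(take()[0])
    hw_handler = take(int(take()[0]))
    groups = []
    for _ in range(int(take()[0])):
        selector = take()[0]
        num_paths = int(take()[0])
        num_sel_args = int(take()[0])
        paths = []
        for _ in range(num_paths):
            dev = take()[0]
            paths.append({"dev": dev, "args": take(num_sel_args)})
        groups.append({"selector": selector, "paths": paths})
    if pos != len(tok):
        raise ValueError("trailing tokens in params string")
    return {"grace_period": grace, "hw_handler": hw_handler, "groups": groups}


table = parse_multipath_params("10 0 1 round-robin 2 1 7:1 1 7:2 1")
print(table["grace_period"])                                # 10
print([p["dev"] for p in table["groups"][0]["paths"]])      # ['7:1', '7:2']
```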

> Status example:
> 
>  * num_multipath_feature_args [multipath_feature_args]*
>  * num_handler_status_args [handler_status_args]*
>  * num_groups [A|D|E num_paths num_selector_args [path_dev A|F fail_count [selector_args]* ]+ ]+
> 
> A=the currently-used PG
> D=a disabled PG
> E=an enabled PG (but not the currently-used one)
> 
> Feature args tells you the current queue size.
> 
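
For what it's worth, a userspace tool could read that status layout
roughly as in the Python sketch below. This is an illustration under
assumptions: parse_status is a made-up name, the status line fed to it
is invented (not output from a real device), and the path flags are read
as A = active and F = failed.

```python
PG_STATES = {"A": "current", "D": "disabled", "E": "enabled"}
PATH_STATES = {"A": "active", "F": "failed"}   # assumed meaning of A|F


def parse_status(line):
    """Split a multipath status line laid out as quoted above."""
    tok = line.split()
    pos = 0

    def take(n=1):
        # Consume the next n tokens, refusing to run past the end.
        nonlocal pos
        if pos + n > len(tok):
            raise ValueError("status line is too short")
        out = tok[pos:pos + n]
        pos += n
        return out

    features = take(int(take()[0]))
    handler = take(int(take()[0]))
    groups = []
    for _ in range(int(take()[0])):
        state = PG_STATES[take()[0]]
        num_paths = int(take()[0])
        num_sel_args = int(take()[0])
        paths = []
        for _ in range(num_paths):
            dev, flag, fails = take(3)
            paths.append({"dev": dev, "state": PATH_STATES[flag],
                          "fail_count": int(fails),
                          "selector_args": take(num_sel_args)})
        groups.append({"state": state, "paths": paths})
    return {"features": features, "handler": handler, "groups": groups}


# Invented status line: one feature arg (queue size 0), no handler args,
# one current PG holding two paths, the second of which has failed once.
status = parse_status("1 0 0 1 A 2 0 7:1 A 0 7:2 F 1")
failed = [p["dev"] for g in status["groups"] for p in g["paths"]
          if p["state"] == "failed"]
print(failed)   # ['7:2']
```
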
> Messages:  [dmsetup message <devname> <sector> <message>]
>   disable_group / enable_group - toggle PG priority
> Disabled PGs are only used after all paths in enabled PGs have failed.
>   fail_path / reinstate_path - toggle path status
> 
I don't quite see the benefit of the PG disabling feature.

As far as I can see, all it brings is permitting kernel code to change
the maps, which seems like enabling policies in the kernel: from
userspace, we get the same effect by instantiating the PG at the tail
of the params string.
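
A small Python sketch of what I mean, assuming PGs are tried in the
order they appear in the table (the function pick_group and the group
names are hypothetical): a PG placed last is only reached once no
earlier PG has a usable path, which is what a disabled PG gives you.

```python
def pick_group(groups):
    """Return the name of the first PG, in table order, with a usable path.

    groups is a list of (name, [path_is_usable, ...]) tuples.
    """
    for name, paths_ok in groups:
        if any(paths_ok):
            return name
    return None   # every path in every PG has failed


# The fallback PG sits at the tail of the params string.
table = [("pg_preferred", [True, True]), ("pg_fallback", [True])]
print(pick_group(table))   # pg_preferred

# Once all preferred paths have failed, the tail PG takes over.
table = [("pg_preferred", [False, False]), ("pg_fallback", [True])]
print(pick_group(table))   # pg_fallback
```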

> e.g. dmsetup message mp3 0 fail_if_no_path
>      dmsetup message mp3 0 reinstate_path 7:2
> 
>   queue_if_no_path / fail_if_no_path - what to do if every path in every PG 
> has failed.  Userspace can check the queue size from the status line to
> judge how urgent it is to fix the problem: if queueing, userspace *must* 
> intervene to resolve the problem and clear the queue.
> 
You are implementing a kernel-side plugin framework to ensure that PG
switching happens reliably. What more can userspace do when there are no
paths left?

> I don't believe suspend/resume/table reload is handled properly yet, and I'm
> seeing some oddness when there are errors with split bios.
> 
> Do path selectors and hw_handlers get the info they need?
> 
> Should there be any additional infrastructure provided for hw_handler pg_init fn?
> (e.g. a callback mechanism)
>    pg_init is required to return immediately, and must call dm_pg_init_complete()
> when it's finished.  Does it need more error flags?  e.g. to fail all paths in 
> the PG, not just the one it was called with?
> 
> I'd like to see hw_handler implementations for at least two different types of 
> hardware before pushing this upstream.
> 
I won't touch kernel code for now, but I'm most interested in a plugin
that does a START_STOP followed by forcing a SCSI rescan on all the
paths of the PG. That is what is needed for StorageWorks controllers in
multibus mode.

regards,
-- 
christophe varoqui <christophe.varoqui at free.fr>