[dm-devel] 2.6.10-rc1-udm1: multipath work in progress

Tue Nov 2 22:22:04 UTC 2004

On Tuesday 2 November 2004 at 20:19 +0000, Alasdair G Kergon wrote:
> Miscellaneous points:
> 
> On Tue, Nov 02, 2004 at 08:46:04PM +0100, christophe varoqui wrote:
> > Let the kernel fail them ... as soon as the primary PG paths are
> > exhausted, it will switch to the secondary PG and an event will cause
> > multipathd to reconfigure the table. The secondary will become primary,
> > and failed paths will come back up, grouped in a low prio PG.
>  
> Which may require rapid intervention by userspace, or the queue_if_no_paths 
> pause to give userspace time to sort things out.
> 
Does the following example illustrate what you have in mind?

| pg1 | pg2 |	pg1 maps paths to ctr1, pg2 - ctr2
====================================================================
| A A | A A |	paths in pg2 are marked A but are unusable
| F F | A A |	ctr1 shuts down, ctr2 takes over, now pg2 paths
		are really up, maybe with a little help from
		pg_init_fn. Event is caught by multipathd
|-A -A| A A |	now you want multipathd to disable pg1 and reinstate
		its paths
|-A -A| F F |	so that when ctr2 shuts, kernel can switch over to pg1
		and pray for its paths to be up
| A A |-A -A|	then for multipathd to regularize.
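
The sequence above can be sketched as a toy model in Python. This is only an illustration of the table, not dm-multipath code: the `PG` class and the `fail_all`, `select_pg` and `regularize` helpers are hypothetical names, and the selection policy (enabled PGs first, a disabled PG only as a last resort) is an assumption read off the table.

```python
from dataclasses import dataclass

@dataclass
class PG:
    name: str
    paths: dict             # path name -> "A" (active) or "F" (failed)
    disabled: bool = False  # a disabled PG is drawn as "-A -A" in the table

    def usable(self):
        return any(state == "A" for state in self.paths.values())

def fail_all(pg):
    """Model a controller shutting down: every path in the PG fails."""
    for path in pg.paths:
        pg.paths[path] = "F"

def select_pg(pgs):
    """Kernel side (assumed policy): enabled PGs first, disabled ones last."""
    for want_disabled in (False, True):
        for pg in pgs:
            if pg.disabled == want_disabled and pg.usable():
                return pg
    return None  # no usable path in any PG

def regularize(pgs, current):
    """Userspace side: keep `current` enabled, disable the other PGs
    and reinstate their paths."""
    for pg in pgs:
        pg.disabled = pg is not current
        if pg.disabled:
            for path in pg.paths:
                pg.paths[path] = "A"

pgs = [PG("pg1", {"p1": "A", "p2": "A"}), PG("pg2", {"p3": "A", "p4": "A"})]
fail_all(pgs[0])              # ctr1 shuts down
current = select_pg(pgs)      # kernel switches to pg2
regularize(pgs, current)      # pg1 becomes "-A -A"
fail_all(pgs[1])              # ctr2 shuts down
current = select_pg(pgs)      # kernel falls back to the disabled pg1
regularize(pgs, current)      # pg1 enabled again, pg2 becomes "-A -A"
```

No table reload happens anywhere in this model; only the per-PG flag and the path states change.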

Whereas the current model is:
====================================================================
| A A | A A |	paths in pg2 are marked A but are unusable
| F F | A A |	ctr1 shuts down, ctr2 takes over, now pg2 paths
		are really up, maybe with a little help from
		pg_init_fn. Event is caught by multipathd
| A A | A A |	multipathd swaps pg1 and pg2, ctr1 paths are marked up
		by the table reload
| F F | A A |	so that when ctr2 (pg1) shuts, kernel can switch over
		to pg2 and pray for its paths to be up
| A A | A A |	then for multipathd to regularize.
=====================================================================
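
The current model can be sketched the same way, again as a toy Python model rather than real dm-multipath code. The helper names (`fail_all`, `select_pg`, `reload_with_primary`) are hypothetical, PGs are labelled by controller instead of by position, and the sketch assumes that a table reload resets every path to "A", as described above.

```python
def fail_all(table, name):
    """Model a controller shutting down: fail every path of the named PG."""
    for pg in table:
        if pg["name"] == name:
            pg["paths"] = {path: "F" for path in pg["paths"]}

def select_pg(table):
    """Kernel side: use the first PG, in table order, with an active path."""
    for pg in table:
        if "A" in pg["paths"].values():
            return pg["name"]
    return None  # no usable path in any PG

def reload_with_primary(table, primary):
    """Userspace side: build a fresh table with `primary` first.
    The reload forgets the old state, so every path comes back as "A"."""
    ordered = sorted(table, key=lambda pg: pg["name"] != primary)
    return [{"name": pg["name"], "paths": {path: "A" for path in pg["paths"]}}
            for pg in ordered]

table = [
    {"name": "ctr1", "paths": {"p1": "A", "p2": "A"}},
    {"name": "ctr2", "paths": {"p3": "A", "p4": "A"}},
]
fail_all(table, "ctr1")                     # ctr1 shuts down
active = select_pg(table)                   # kernel switches to ctr2's PG
table = reload_with_primary(table, active)  # multipathd swaps the PGs
fail_all(table, "ctr2")                     # ctr2 (now first) shuts down
active = select_pg(table)                   # kernel switches to ctr1's PG
table = reload_with_primary(table, active)  # multipathd regularizes
```

Here the failed paths only look healthy again because the reload discards their state, which is the property the two models differ on.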

> [Consider the primary pg_init_fn finds the paths would be OK but
> aren't current, so fails them all so the currently-preferred secondary can
> be used.  But the secondary paths turn out to have genuinely failed so you
> *do* want to use the primary after all, but you can't now.  How do you tell
> the primary to *forcibly* use the paths?  This method has effectively
> transferred the pg_init_fn to userspace.  
> 
Note that I saw pg_init_fn as a best-effort function that tries to
activate the paths in a PG, and that PG is going to be used as soon as
the function returns, whatever the return value.

If that is not the case, I should reconsider the whole thing, but I
don't understand why you would want to make it any smarter.

> Or it requires giving the
> pg_init_fn complete knowledge of the configuration so it checks both primary
> and secondary PGs before deciding what to do - but then that has an
> equivalent effect to what's already implemented in these patches using PG
> enable/disable. Or you have a 3rd and 4th PG duplicating the 1st & 2nd ones
> but with a new 'force' flag.]
> 
> [I see queue_if_no_paths very much as a last resort: it's there
> as an option for not-so-good hardware.  In any decent system there should 
> never be no paths without catastrophic hardware failure.]
> 
So what is wrong with letting it be the default, if it never comes into
play on sane hardware? It seems harmless.

> > We can failback already, with the current design.
> > As I see it, all the "disable PG" feature brings is save some table
> > reloads. Is it worth the added complexity ?
> Performing table reloads is the complex option IMHO.
> [Even ignoring the suspend/resume queueing issues that aren't
> resolved yet.]
> 
I would guess they need resolving anyway.

> Table reloads wipe all knowledge of the existing state from the kernel and
> start afresh,
> 
Hey, I actually use that property in the current design :)

>  so pg_init_fn's have to be run again etc.
> 
Don't they run too when a disabled PG is used as a last resort?

>   They also cannot
> avoid allocating memory, which might not be available immediately.
> You can't assume a table reload will succeed and must always have a
> fallback plan in case it fails.  
>  
That I can't argue against.
But in a low-memory situation I feel your scheme won't bring many more
guarantees: it relies on userspace too, after all.

I guess I've nearly reached the bottom of my chest of arguments. I'll
stay a bit more passive for a while and try to grasp the impact of all
this on the design of the tools.

regards,
-- 
christophe varoqui <christophe.varoqui at free.fr>