[dm-devel] 2.6.10-rc1-udm1: multipath work in progress

Lars Marowsky-Bree lmb at suse.de
Fri Nov 5 23:47:46 UTC 2004


On 2004-11-05T23:18:36, Alasdair G Kergon <agk at redhat.com> wrote:

> On Fri, Nov 05, 2004 at 11:53:49PM +0100, Lars Marowsky-Bree wrote:
> > True. But then, which PG would you try first from the list of bypassed
> > ones, if all are bypassed? (Either because user-space told us to, or
> > because the error handler did it; doesn't matter much.)
> The current algorithm is simple - the one with the highest priority.

Yep.

> i.e. if all PGs are bypassed, the code simply behaves exactly as if
> there never had been any bypass logic added.

Exactly my point.
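For illustration, the selection rule described above might look roughly like the minimal C sketch below: take the highest-priority PG that isn't bypassed, and if every usable PG is bypassed, ignore the flags and fall back to plain priority order. The struct and function names are hypothetical. They are not taken from dm-mpath.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified priority group; index 0 has the highest priority. */
struct pg {
	bool bypassed;       /* PG asked (or was told) to be skipped */
	unsigned nr_valid;   /* paths in this PG not yet failed */
};

/*
 * Return the index of the PG to use, or -1 if no PG has a usable path.
 * Pass 0 skips bypassed PGs. Pass 1 ignores the bypass flags, so if every
 * usable PG is bypassed, the result is the same as with no bypass logic:
 * simply the highest-priority PG that still has valid paths.
 */
static int choose_pg(const struct pg *pgs, size_t n)
{
	assert(pgs != NULL || n == 0);

	for (int pass = 0; pass < 2; pass++) {
		for (size_t i = 0; i < n; i++) {
			if (pass == 0 && pgs[i].bypassed)
				continue;
			if (pgs[i].nr_valid > 0)
				return (int)i;
		}
	}
	return -1;
}
```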

> But what if *that* one now also says 'switch_pg' ?
> Currently, we go back to the first one and if it says 'switch PG' again,
> we start failing its paths, so we have a finite process:
> A PG can only say 'bypass me' once - after that its paths will be
> failed forcibly.  And the logic is simple and transparent.
> 
> The new way still needs a flag against each PG to indicate that it
> said 'switch_pg'?
> 
> > "try reinitializing ourselves if we are the last PG with
> > healthy paths standing".
> 
> But *something* is needed to cope with all the pg_init_fns 
> always returning 'switch_pg', which would mean there never
> was just one 'last PG with healthy paths standing'.

Ah. Infinite ping-ponging. I see what you're saying.
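The current "finite process" can be sketched the same way. A PG that answers switch_pg is bypassed the first time and has its paths failed the second time, so even a handler that always answers switch_pg runs out of candidates after at most two attempts per PG. The result codes, the always_switch() handler and the other names here are hypothetical, chosen only to illustrate the rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified priority group (index 0 = highest priority). */
struct pg {
	bool bypassed;
	unsigned nr_valid;
};

/* Hypothetical result codes for a hardware handler's pg_init function. */
enum pg_init_result { PG_INIT_OK, PG_INIT_SWITCH_PG };

/* Prefer non-bypassed PGs; if none is usable, ignore the bypass flags. */
static int choose_pg(const struct pg *pgs, size_t n)
{
	for (int pass = 0; pass < 2; pass++)
		for (size_t i = 0; i < n; i++)
			if ((pass == 1 || !pgs[i].bypassed) && pgs[i].nr_valid > 0)
				return (int)i;
	return -1;
}

/* First 'switch_pg' from a PG bypasses it; a second one fails its paths. */
static void handle_switch_pg(struct pg *pgs, size_t i)
{
	if (!pgs[i].bypassed)
		pgs[i].bypassed = true;
	else
		pgs[i].nr_valid = 0;  /* fail all paths in the PG */
}

/* A (hypothetical) misbehaving handler that always asks to switch. */
static enum pg_init_result always_switch(size_t pg)
{
	(void)pg;
	return PG_INIT_SWITCH_PG;
}

/*
 * Keep choosing PGs until pg_init accepts one or none is usable.
 * Returns the chosen PG index (or -1) and stores the number of attempts.
 */
static int activate(struct pg *pgs, size_t n,
		    enum pg_init_result (*pg_init)(size_t pg),
		    unsigned *attempts)
{
	int i;

	assert(attempts != NULL);
	*attempts = 0;
	while ((i = choose_pg(pgs, n)) >= 0) {
		(*attempts)++;
		if (pg_init((size_t)i) == PG_INIT_OK)
			return i;
		handle_switch_pg(pgs, (size_t)i);
	}
	return -1;
}
```

With two PGs and always_switch(), activate() tries PG 0, PG 1, PG 0 and PG 1, then gives up with -1. The ping-pong is bounded rather than infinite, at the price of failing paths in PGs that only asked to be switched away from.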

I've had to think about this one a bit. But there's also a flaw in the
bypass logic here (at least as implemented now, unless you've got a
patch I haven't seen yet). Namely, we never reset the flag.

So if both PGs end up bypassed, we'll always run into the all-bypassed
round, and the first PG will never actually be able to say "dude!
Switch over to the other PG now!" without us actively failing all its
paths first. (Even though it might have handled IO just fine for a couple
of seconds or even minutes in between, and now really just wants to
switch.)

That's also sub-optimal, because we don't just prohibit immediate and
infinite ping-pong (which is likely a bug in the pg_init function
anyway), but also legitimate ping-pong caused merely by the admin's
"duh! no! I meant the other PG" behaviour.

So yes, we probably ought to remember that we switched PGs, but we
should probably also clear that flag once some IO has succeeded without
causing a PG switch-over.

This could in fact probably be implemented using the per-PG bypassed
flag, or with a general "I've already tried switching PGs" flag on the
table as a whole.
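A minimal sketch of that reset, assuming per-PG bypassed flags (a single table-wide "already tried switching" flag would work much the same way). The structure and function names are hypothetical. The idea is that a successful IO that didn't trigger a switch clears all the flags, so a later switch_pg from the same PG counts as a fresh request rather than as ping-pong.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PGS 8  /* arbitrary bound for this sketch */

/* Hypothetical, simplified table state. */
struct mpath {
	size_t nr_pgs;
	bool bypassed[MAX_PGS];
	bool valid[MAX_PGS];  /* PG still has usable paths */
};

/* 'switch_pg' from PG i: same rule as today, bypass once, then fail. */
static void on_switch_pg(struct mpath *m, size_t i)
{
	assert(i < m->nr_pgs);
	if (!m->bypassed[i])
		m->bypassed[i] = true;
	else
		m->valid[i] = false;
}

/*
 * Proposed addition: an IO completed successfully without causing a PG
 * switch, so forget all earlier bypass decisions.
 */
static void on_io_success(struct mpath *m)
{
	for (size_t i = 0; i < m->nr_pgs; i++)
		m->bypassed[i] = false;
}
```

For example, if PG 0 and PG 1 have both been bypassed, then some IO succeeds and PG 0 later asks to switch again, PG 0 is only bypassed again rather than having its paths failed.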

> I'm attracted to making the PGs sticky by default, and adding
> a 'switch_pg' message from userspace, replacing the ability
> to set/reset the bypass flags.  (A switch_pg message would
> reset all the bypass flags.  They'd be reported to
> userspace, but the only reason for userspace needing to 
> change them would be to facilitate testing i.e. no need
> for multipath tools to ever change them.)

Sounds good.

Well, there might be one reason to set them: on the initial table load,
to get us started off correctly.
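The sticky-PG proposal could be handled by a target message along these lines. This is a sketch under assumptions: the "switch_pg <n>" syntax, 1-based PG numbering and the struct layout are all hypothetical. The only point it shows is that an explicit switch request resets every bypass flag, so userspace never needs to set or clear them directly.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PGS 8  /* arbitrary bound for this sketch */

/* Hypothetical table state; PGs are numbered from 1 in messages. */
struct mpath {
	size_t nr_pgs;
	size_t current_pg;       /* sticky: changes only on failure or request */
	bool bypassed[MAX_PGS];  /* still reported to userspace in status */
};

/*
 * Handle a hypothetical "switch_pg <n>" message from userspace: make PG n
 * current and reset every bypass flag. Returns 0 or a negative errno.
 */
static int mpath_message(struct mpath *m, int argc, char **argv)
{
	char *end;
	unsigned long n;

	assert(m->nr_pgs <= MAX_PGS);
	if (argc != 2 || strcmp(argv[0], "switch_pg") != 0)
		return -EINVAL;

	n = strtoul(argv[1], &end, 10);
	if (*argv[1] == '\0' || *end != '\0' || n == 0 || n > m->nr_pgs)
		return -EINVAL;

	for (size_t i = 0; i < m->nr_pgs; i++)
		m->bypassed[i] = false;
	m->current_pg = (size_t)(n - 1);
	return 0;
}
```

An invalid PG number leaves the table untouched, so a mistyped request can't silently clear the bypass state.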


Sincerely,
    Lars Marowsky-Brée <lmb at suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business



