[dm-devel] IO Ordering on Path Failover

Hannes Reinecke hare at suse.de
Wed Mar 5 07:54:45 UTC 2014


On 02/28/2014 11:42 PM, Bob Bawn wrote:
> I am trying to understand how IO ordering safety is enforced on
> path failover. This is new territory for me so forgive me if this is
> obvious. Consider the sequence:
> 
> 1. client writes(lba=0,val=x) on path A.
> 2. multipath declares path A dead and retries write on path B
> 3. retried write on path B completes successfully and client gets ack'd
> 4. client writes(lba=0,val=y) on path B. It also completes
> successfully and is ack'd to client
> 5. write from (1) completes and corrupts data
> 
> It seems like multipath needs a guarantee at step 2 that the original
> write won't complete after path A has been declared down. I thought it
> would issue something like a LUN RESET on path B and that the response
> to that reset would indicate that it is safe to proceed. This page
> sort of supports that speculation:
> http://scst.sourceforge.net/mc_s.html
> 
No, that assumption is wrong.

Strict ordering is only guaranteed for commands submitted from the
HBA to the wire. Once a command is in flight there are _no_ guarantees
about ordering. E.g. in an FC fabric there might be several paths to the
same target, each of which might have a different latency, so I/O on
one path might actually be faster than on the other one.
And with CNAs it's virtually impossible to guarantee any I/O
ordering, as several hardware queues are involved.
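
To make the latency point concrete, here is a small toy model (not
taken from any driver or fabric code). The function name and the
numbers are made up for illustration; time is in arbitrary units.

```python
def arrival_order(commands):
    """Return command names in the order they reach the target.

    commands: iterable of (name, submit_time, path_latency) tuples.
    All values are hypothetical and in arbitrary time units.
    """
    # A command arrives at submit_time + path_latency; sort by that.
    return [name for name, submit, latency in
            sorted(commands, key=lambda c: c[1] + c[2])]


in_flight = [
    ("write-1", 0, 5),  # submitted first, on the slower path
    ("write-2", 1, 1),  # submitted second, on the faster path
]
print(arrival_order(in_flight))  # ['write-2', 'write-1']
```

The write submitted second arrives first, simply because its path is
faster.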

The same goes for the Linux block layer; the only _enforced_ ordering of
sorts comes from I/O being sent from the page cache, as each page
can submit only one I/O at a time.
But as soon as you're using O_DIRECT you don't have any ordering
guarantees either, and it's up to the application to enforce any
ordering requirements.

Which is also what all filesystems do; for any critical section they
wait for the I/O result before continuing.
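
A minimal user-space sketch of that "wait for the result before
continuing" pattern, assuming a Unix-like system. It uses a plain
temporary file rather than O_DIRECT (which would need aligned buffers),
and ordered_writes() and demo() are hypothetical helper names, so take
it as an illustration of the idea only.

```python
import os
import tempfile

BLOCK = 512  # illustrative block size in bytes


def ordered_writes(fd, offset, payloads):
    """Write each payload to the same offset, one after another.

    The next write is only issued once the previous one has returned
    and fsync() has reported completion, so two dependent writes are
    never in flight at the same time.
    """
    for data in payloads:
        written = os.pwrite(fd, data, offset)
        if written != len(data):
            raise OSError("short write")
        os.fsync(fd)  # wait for the I/O result before continuing


def demo():
    """Write 'x' then 'y' to offset 0 and return what is read back."""
    fd, path = tempfile.mkstemp()
    try:
        ordered_writes(fd, 0, [b"x" * BLOCK, b"y" * BLOCK])
        return os.pread(fd, BLOCK, 0)
    finally:
        os.close(fd)
        os.unlink(path)
```

Since the second write is only submitted after the first one has
completed, demo() reads back the 'y' block.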

So for failover any retries will be covered by the multipath layer,
and only the final I/O result will be returned to the upper layers,
rendering any multipath failover invisible.
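
As a rough illustration, here is a toy model of that behaviour. This is
not dm-multipath code; PathDown, submit_with_failover() and the two
path functions are hypothetical names, and a "path" is just a callable
that either returns a result or raises once the request on it has
failed.

```python
class PathDown(Exception):
    """Raised by a toy path once a request on it has failed."""


def submit_with_failover(paths, request):
    """Try each path in turn; the caller sees only the final result."""
    last_error = None
    for send in paths:
        try:
            return send(request)  # first success is all the caller sees
        except PathDown as err:
            last_error = err      # retry on the next path, silently
    raise OSError("all paths failed") from last_error


store = {}  # stands in for the target's storage, keyed by LBA


def path_a(request):
    raise PathDown("path A is down")


def path_b(request):
    lba, value = request
    store[lba] = value
    return "ok"


result = submit_with_failover([path_a, path_b], (0, "x"))
print(result, store)  # ok {0: 'x'}
```

In this model the failed attempt on path A never reaches the caller;
an error is only reported once every path has failed.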

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare at suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)



