[dm-devel] dm-multipath splitting IOs in 4k blocks
Bob
M8R-0t7cpu at mailinator.com
Thu Jan 28 16:24:03 UTC 2010
On Fri, 22 Jan 2010 at 14:14:56 -0500,
Mike Snitzer <snitzer redhat com> wrote:
> On Fri, Jan 22 2010 at 11:41am -0500,
> Bob <M8R-0t7cpu mailinator com> wrote:
>
> > Hello,
> >
> > I have a question about dm-multipath. As you can see below, it seems that
> > multipath splits any IO incoming to the device into 4k blocks, and then
> > reassembles them when doing the actual read from the SAN. This behavior does
> > not occur if the device is opened in direct IO mode, nor if the IO is sent
> > directly to a single path (e.g. /dev/sdef in this example).
> >
> > My question is: what causes this behavior, and is there any way to change it?
>
> direct-io will cause DM to accumulate pages into larger bios (via
> bio_add_page calls to dm_merge_bvec). This is why you see larger
> requests with iflag=direct.
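For what it's worth, the individual bios/requests can be watched directly with blktrace (a rough sketch, assuming the blktrace tools are installed):

  # Trace the mpath device live; blkparse prints each request with its
  # size in 512-byte sectors
  blktrace -d /dev/dm-5 -o - | blkparse -i -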
>
> Buffered IO writes (from the page-cache) will always be in one-page
> units. It is the IO scheduler that will merge these requests.
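That merging shows up in the path stats below: rrqm/s is ~4190 on each path while r/s is only ~290, i.e. the scheduler folds the 4k reads back into ~124-sector requests. To confirm the scheduler is doing the work, merging can be disabled per queue (a sketch; the exact semantics of this flag vary by kernel version):

  # Disable merge lookups on one path and watch its avgrq-sz drop to 8
  echo 1 > /sys/block/sdef/queue/nomerges
  echo 0 > /sys/block/sdef/queue/nomerges   # restore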
>
> Buffered IO reads _should_ have larger requests. So it is curious that
> you're seeing single-page read requests. I can't reproduce that on a
> recent kernel.org kernel. Will need time to test on RHEL 5.3.
I tested on a vanilla 2.6.31.12 kernel, and the 4k limitation is indeed gone (it took me some time because of a buggy nash).
I also needed to upgrade multipath-tools to get the "Bad DM version" fix.
Anyway, I'm a bit clueless as to where to start looking for which commit removed the bug... (can we call that a bug?)
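In case anyone else goes hunting, a rough way to narrow it down against a kernel.org git clone (the version range is just my guess, spanning RHEL 5's 2.6.18 base up to 2.6.31):

  # Candidate commits touching the DM core (dm_merge_bvec lives in dm.c)
  git log --oneline v2.6.18..v2.6.31 -- drivers/md/dm.c drivers/md/dm-mpath.c
  # Or bisect, re-running the dd+iostat check at each step
  git bisect start v2.6.31 v2.6.18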
>
> NOTE: all DM devices should behave like I explained above (you just
> happen to be focusing on dm-multipath). Testing against normal "linear"
> DM devices would also be valid.
Indeed, the results are the same.
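For anyone reproducing this, a linear target over a single path is a one-liner (device names are just the ones from this thread):

  # Map the whole of /dev/sdef through a linear DM target and re-run the test
  dmsetup create testlin --table "0 $(blockdev --getsz /dev/sdef) linear /dev/sdef 0"
  dd if=/dev/mapper/testlin of=/dev/null bs=16384
  dmsetup remove testlin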
>
> > Some quick dd tests would tend to show that the device is noticeably faster
> > when multipath doesn't split the IOs.
>
> The testing output you provided doesn't reflect that (nor would I expect
> it to for sequential IO if readahead is configured)...
Speaking of read-ahead, which one is actually used among:
- the path RA (/dev/sdX)
- the mpath RA (/dev/mapper/mpathX)
- the LVM RA (/dev/mapper/lvg-lvs)?
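Each can at least be inspected per layer (the names below are the ones from this thread, and --setra is only shown as an example):

  blockdev --getra /dev/sdX              # path RA, in 512-byte sectors
  blockdev --getra /dev/mapper/mpathX    # mpath RA
  blockdev --getra /dev/mapper/lvg-lvs   # LVM RA
  # blockdev --setra 1024 /dev/mapper/mpathX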
Thanks for your time.
Bob
>
> Mike
>
> > [root test-bis ~]# dd if=/dev/dm-5 of=/dev/null bs=16384
> >
> > Meanwhile...
> >
> > [root test-bis ~]# iostat -kx /dev/dm-5 /dev/sdef /dev/sdfh /dev/sdgi /dev/sdgw 5
> > ...
> > Device:  rrqm/s  wrqm/s       r/s   w/s     rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
> > sdef    4187.82    0.00    289.42  0.00  17932.14   0.00    123.92      0.45   1.56   1.01  29.34
> > sdfh    4196.41    0.00    293.81  0.00  17985.63   0.00    122.43      0.41   1.39   0.90  26.37
> > sdgi    4209.98    0.00    286.43  0.00  17964.07   0.00    125.44      0.69   2.38   1.43  40.98
> > sdgw    4188.62    0.00    289.22  0.00  17885.03   0.00    123.68      0.54   1.87   1.16  33.59
> > dm-5       0.00    0.00  17922.55  0.00  71690.22   0.00      8.00     47.14   2.63   0.05  98.28
> >
> > => avgrq-sz is 4 kB (8.00 512-byte sectors) on the mpath device
> > --------
> > [root test-bis ~]# dd if=/dev/dm-5 iflag=direct of=/dev/null bs=16384
> >
> > iostat now gives:
> > Device:  rrqm/s  wrqm/s       r/s   w/s     rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
> > sdef       0.00    0.00    640.00  0.00  10240.00   0.00     32.00      0.31   0.48   0.48  30.86
> > sdfh       0.00    0.00    644.40  0.00  10310.40   0.00     32.00      0.22   0.34   0.34  22.10
> > sdgi       0.00    0.00    663.80  0.00  10620.80   0.00     32.00      0.24   0.36   0.36  24.20
> > sdgw       0.00    0.00    640.00  0.00  10240.00   0.00     32.00      0.20   0.32   0.32  20.28
> > dm-5       0.00    0.00   2587.00  0.00  41392.00   0.00     32.00      0.97   0.38   0.38  97.20
> >
> > => avgrq-sz is now 16 kB (32.00 512-byte sectors) on the mpath device