[dm-devel] performance considerations with IO schedulers and DM multipathing

Kiyoshi Ueda k-ueda at ct.jp.nec.com
Thu Dec 6 01:42:19 UTC 2007


Hi,

On Fri, 30 Nov 2007 14:02:31 +0100, Stefan Bader wrote:
> 2007/11/30, Romanowski, John (OFT) <John.Romanowski at oft.state.ny.us>:
> >
> > Here's some links:
> >
> > Google cache of this article:
> > "The basis for my test was to determine the best possible performance
> > combination of elevator tuning AND /etc/multipath.conf rr_min_io setting."
> >
> > http://72.14.209.104/search?q=cache:q2p5HOwGxHwJ:www.techyblog.com/content/view/45/28/+Multipath+rr_min_io+Oracle+Elevator+Benchmarks&hl=en&ct=clnk&cd=1&gl=us
> 
> 
> The thing is that there are two settings that affect different drivers. The
> I/O scheduler setting will affect the disks that are part of the multipath
> volume (and only them), while the rr_min_io affects the multipath volume.
> The higher the value of rr_min_io, the more requests are sent down one path
> before switching to the next in the same path group. While this is good for
> sequential I/O (because the elevator/scheduler on the underlying device can
> merge more efficiently), this reduces the amount of I/O that is sent in
> parallel. With very high rr_min_io settings you will end up using mostly one
> path at a time, while the others are idle.
> With small values for rr_min_io, the chances of spreading the requests over
> all paths are higher, but so is the chance of separating a long sequence
> into smaller parts that are not sequential for the disk devices that make
> the paths. Here a scheduler setting that copes with that pattern can help.
> Another approach, which is not in the mainline kernel yet, is to introduce a
> queue in the multipath target, merge sequential requests there and send each
> I/O down another path (like rr_min_io=1 would do). Kiyoshi Ueda from NEC had
> a presentation about this at last year's OLS (
> https://ols2006.108.redhat.com/2007/Reprints/ueda-Reprint.pdf ). From their
> evaluation of the current kernel, smaller rr_min_io values improved
> performance but the best value was different for reads and writes.
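
For reference, rr_min_io is set per multipath map in /etc/multipath.conf,
while the elevator is a per-path-device setting (an example for the elevator
side is further below).  A minimal sketch of the multipath.conf side, with
placeholder values rather than a recommendation:

    # /etc/multipath.conf -- rr_min_io applies to the multipath volume
    defaults {
            path_grouping_policy    multibus
            rr_min_io               8
    }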

Although I can't say which combination is the best with the current
bio-based dm-multipath, I have written down my experiences and
understanding below.  I hope that helps.


I used 2.6.19.1 and a single dd on a block device and on an ext3
filesystem to evaluate sequential I/O performance.
I remember that cfq/as were the best for READ and that there was no
big difference for WRITE.
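
The kind of test was roughly the following (device, mount point and
sizes here are only placeholders, not the original commands):

    # sequential READ from the multipath block device
    dd if=/dev/mapper/mpath0 of=/dev/null bs=1M count=1024

    # sequential WRITE to a file on an ext3 filesystem on top of it
    dd if=/dev/zero of=/mnt/mpath0/testfile bs=1M count=1024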

Generally speaking, READ behaves synchronously and the underlying
devices don't become very busy.  So I/O schedulers which dispatch READ
requests quickly, like cfq, are better for READ than schedulers which
hold requests back for a long time hoping to merge them.
As for WRITE, there is no big difference between the I/O schedulers:
the underlying devices are almost always busy, so merging as much as
possible is good for WRITE, and all I/O schedulers do that.
I haven't tested random I/O with different I/O schedulers.
Which I/O scheduler is best probably depends on the workload.
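
As noted in the quoted mail, the elevator in question is the one on the
underlying path devices.  Assuming the paths are sda and sdb, it can be
inspected and changed like this:

    cat /sys/block/sda/queue/scheduler
    echo cfq > /sys/block/sda/queue/scheduler
    echo cfq > /sys/block/sdb/queue/scheduler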


As for rr_min_io, it's very complex, because the best value depends
on the page size (and, for READ, on the read-ahead code).
For example, on systems with a 4k page size:
  - For READ
    * On a block device: 8 or 16 is good in a 2-path environment because:
      A block device doesn't have a readpages() operation, so each page
      is submitted as 1 bio.
      READ is driven by read-ahead, and the read-ahead window size is
      128k (not configurable on a bio-based dm device).
      So dispatching half of the window to the same path minimizes the
      number of requests on each path in a 2-path environment (see the
      arithmetic sketch after this list).
    * On a filesystem: 1 is good because:
      Almost all filesystems, including ext3, have a readpages()
      operation, so the filesystem builds a read-ahead-window-size
      (128k) bio.
      Since up to 2 windows (so 2 bios) can be submitted in 2.6.19.1,
      dispatching them to different paths is good.
    (NOTE: Since the read-ahead code has changed a lot around 2.6.22,
           the values above may differ now.)

  - For WRITE
    * 64 or 128 is good because:
      Each page is submitted as 1 bio for WRITE, regardless of block
      device or filesystem.
      And the default q->max_sectors is 512k.
      So dispatching 128 bios to the same path allows maximum merging
      on a 4k page size system.
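
  Arithmetic sketch for the values above (assuming 4k pages, 2 paths,
  a 128k read-ahead window and 512k q->max_sectors; the shell lines
  just do the division):

      echo $(( 128 / 4 / 2 ))   # block-device READ: 16 bios per path
      echo $(( 512 / 4 ))       # WRITE: 128 bios for one full merge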

So it is very difficult to find the best rr_min_io for a real workload.

If request-based dm-multipath is used, the best rr_min_io should
always be 1 :-)

Thanks,
Kiyoshi Ueda



