[dm-devel] Re: IO scheduler based IO Controller V2

Thu May 7 15:36:53 UTC 2009

On Thu, May 07, 2009 at 10:45:01AM -0400, Vivek Goyal wrote:
> On Thu, May 07, 2009 at 10:11:26AM -0400, Vivek Goyal wrote:
> 
> [..]
> > [root@chilli io-throttle-tests]# ./andrea-test-script.sh
> > RT: 223+1 records in
> > RT: 223+1 records out
> > RT: 234179072 bytes (234 MB) copied, 0.988448 s, 237 MB/s
> > BE: 223+1 records in
> > BE: 223+1 records out
> > BE: 234179072 bytes (234 MB) copied, 1.93885 s, 121 MB/s
> > 
> > So I am still seeing the issue with different kinds of disks as well. At this
> > point I am really not sure why I am seeing such results.
> 
> Hold on. I think I found the culprit here. I was wondering what the
> difference between the two setups was, and realized that with vanilla kernels
> I had done "make defconfig" and with io-throttle kernels I had used an
> old config of mine and done "make oldconfig". So basically the config files
> were different.
> 
> I have now used the same config file and the issue seems to have gone away. I
> will look into why an old config file can cause this kind of issue.
> 

Hmm, my old config had "AS" as the default scheduler; that's why I was seeing
the strange issue of the RT task finishing after the BE task. My apologies for
that. I somehow assumed that CFQ was the default scheduler in my config.

So I have re-run the tests to see if we are still seeing the issue of
losing priority and class within a cgroup. And we still do.

2.6.30-rc4 with io-throttle patches
===================================
Test1
=====
- Two readers, one BE prio 0 and the other BE prio 7, in a cgroup limited to
  8MB/s BW.

234179072 bytes (234 MB) copied, 55.8448 s, 4.2 MB/s
prio 0 task finished
234179072 bytes (234 MB) copied, 55.8878 s, 4.2 MB/s

Test2
=====
- Two readers, one RT prio 0 and the other BE prio 7, in a cgroup limited to
  8MB/s BW.

234179072 bytes (234 MB) copied, 55.8876 s, 4.2 MB/s
234179072 bytes (234 MB) copied, 55.8984 s, 4.2 MB/s
RT task finished

Test3
=====
- Reader Starvation
- I created a cgroup with a BW limit of 64MB/s. First I ran the reader
  alone, and then I ran the reader along with 4 writers, 4 times.

Reader alone
234179072 bytes (234 MB) copied, 3.71796 s, 63.0 MB/s

Reader with 4 writers
---------------------
First run
234179072 bytes (234 MB) copied, 30.394 s, 7.7 MB/s 

Second run
234179072 bytes (234 MB) copied, 26.9607 s, 8.7 MB/s

Third run
234179072 bytes (234 MB) copied, 37.3515 s, 6.3 MB/s

Fourth run
234179072 bytes (234 MB) copied, 36.817 s, 6.4 MB/s

Note that out of the 64MB/s limit of this cgroup, the reader does not get even
1/5 of the BW. On a normal system, readers are favored and a reader gets
its job done much faster even in the presence of multiple writers.

Vanilla 2.6.30-rc4
==================

Test3
=====
Reader alone
234179072 bytes (234 MB) copied, 2.52195 s, 92.9 MB/s

Reader with 4 writers
---------------------
First run
234179072 bytes (234 MB) copied, 4.39929 s, 53.2 MB/s

Second run
234179072 bytes (234 MB) copied, 4.55929 s, 51.4 MB/s

Third run
234179072 bytes (234 MB) copied, 4.79855 s, 48.8 MB/s

Fourth run
234179072 bytes (234 MB) copied, 4.5069 s, 52.0 MB/s

Notice that without any writers we get a BW of about 92MB/s, and
more than 50% of that BW still goes to the reader in the presence of
writers. Compare this with the io-throttle cgroup of 64MB/s, where the reader
struggles to get 10-15% of the BW.
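
As a rough check of the percentages above, here is a minimal sketch in Python
that recomputes the reader's share from the dd figures in this mail. It is
only arithmetic on the numbers already posted. The choice of baseline is an
assumption that follows the comparison made above: the 64MB/s cgroup limit for
the io-throttle runs, and the reader-alone throughput for the vanilla runs.
The names used here (reader_share and the constants) are made up for
illustration.

```python
# Throughput figures (MB/s) copied from the dd output in this mail.
IO_THROTTLE_LIMIT = 64.0                   # cgroup BW limit under io-throttle
IO_THROTTLE_RUNS = [7.7, 8.7, 6.3, 6.4]    # reader with 4 writers, 4 runs

VANILLA_READER_ALONE = 92.9                # reader alone, vanilla 2.6.30-rc4
VANILLA_RUNS = [53.2, 51.4, 48.8, 52.0]    # reader with 4 writers, 4 runs


def reader_share(runs, baseline):
    """Return each run's throughput as a percentage of the baseline."""
    return [round(100.0 * r / baseline, 1) for r in runs]


# Assumption: io-throttle runs are measured against the cgroup limit,
# vanilla runs against the reader-alone throughput.
throttle_share = reader_share(IO_THROTTLE_RUNS, IO_THROTTLE_LIMIT)
vanilla_share = reader_share(VANILLA_RUNS, VANILLA_READER_ALONE)

print("io-throttle reader share (%):", throttle_share)
print("vanilla reader share (%):    ", vanilla_share)
```

With these inputs the io-throttle shares all come out below 1/5 of the limit,
while the vanilla shares all stay above half of the reader-alone throughput.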

So any 2nd-level control will break the notions and assumptions of the
underlying IO scheduler. We should probably do the control at the IO scheduler
level to make sure we don't run into such issues while getting
hierarchical fair share for groups.

Thanks
Vivek

> So now we are left with the issue of losing the notion of priority and
> class within a cgroup. In fact, on bigger systems we will probably run into
> issues of kiothrottled scalability, as a single thread is trying to cater to
> all the disks.
> 
> If we do max bw control at the IO scheduler level, then I think we should be able
> to control max bw while maintaining the notion of priority and class within a
> cgroup. Also, there are multiple pdflush threads, and Jens seems to be pushing
> flusher threads per bdi, which will help us achieve greater scalability, and
> we won't have to replicate that infrastructure for kiothrottled as well.
> 
> Thanks
> Vivek