
[dm-devel] Re: IO scheduler based IO Controller V2

On Fri, May 08, 2009 at 10:05:01PM +0200, Andrea Righi wrote:

> > Conclusion
> > ==========
> > It just reaffirms that with max BW control, we are not doing a fair job
> > of throttling hence no more hold the IO scheduler properties with-in
> > cgroup.
> > 
> > With proportional BW controller implemented at IO scheduler level, one
> > can do very tight integration with IO controller and hence retain 
> > IO scheduler behavior with-in cgroup.
>
> It is worth to bug you I would say :). Results are interesting,
> definitely. I'll check if it's possible to merge part of the io-throttle
> max BW control in this controller and who knows if finally we'll be able
> to converge to a common proposal...

Great. A few thoughts, though.

- What are your requirements? Do you strictly need max BW control, or
  will proportional BW control satisfy your needs? Or do you need both?

- With the current algorithm, BFQ (modified WF2Q+), we should be able
  to do proportional BW division while maintaining the properties of
  the IO scheduler within a cgroup in a hierarchical manner.

  I think it can be simply enhanced to do max BW control as well. That
  is, whenever a queue is selected for dispatch (from a fairness point
  of view), also check the IO rate of that group, and if the IO rate has
  been exceeded, expire the queue immediately and pretend that the queue
  consumed its time slice, which is equivalent to throttling.

  But in this simple scheme, I think throttling is still unfair within
  a class. What I mean is the following.

  If an RT task and a BE task are in the same cgroup and the cgroup
  exceeds its max BW, the RT task is next to be dispatched from a
  fairness point of view, and it will end up being throttled. This is
  still fine, because until the RT task is finished, the BE task will
  never get to run in that cgroup. So at some point the cgroup rate will
  come down and the RT task will get its IO done, meeting both the
  fairness and the max BW constraints.

  But this simple scheme does not work within the same class. Say there
  are prio 0 and prio 7 BE class readers. We will end up throttling
  whoever is scheduled to go next, and there is no mechanism to ensure
  that the prio 0 and prio 7 tasks are throttled proportionately.

  So we will have to come up with something better. I think Dhaval was
  implementing an upper limit for the cpu controller. Maybe PeterZ and
  Dhaval can give us some pointers on how they managed to implement both
  proportional and max BW control with a single tree while maintaining
  the notion of prio within a cgroup.

PeterZ/Dhaval  ^^^^^^^^
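
The simple scheme described above (select a queue on fairness grounds,
then expire it with its slice charged if the group is over its rate)
could be sketched roughly as follows. This is a toy model in Python, not
kernel code: the `Queue` and `Group` classes, the fixed accounting
window, and all field names are assumptions made for illustration, and
the weighted virtual time only stands in for what BFQ actually does.

```python
from dataclasses import dataclass, field

@dataclass
class Queue:
    name: str
    weight: int           # higher weight = larger share (hypothetical prio mapping)
    vtime: float = 0.0    # virtual time; the smallest vtime is served next
    dispatched: int = 0   # slices that actually did IO
    throttled: int = 0    # slices expired without doing IO

@dataclass
class Group:
    max_slices_per_window: int   # stand-in for the group's max BW (assumption)
    queues: list = field(default_factory=list)
    used_in_window: int = 0

def select_and_dispatch(group):
    """Pick the next queue by fairness, then apply the max BW check."""
    q = min(group.queues, key=lambda c: c.vtime)
    # The slice is charged either way, so a throttled queue looks as if
    # it had consumed its time slice.
    q.vtime += 1.0 / q.weight
    if group.used_in_window >= group.max_slices_per_window:
        q.throttled += 1         # expired immediately, no IO done
        return None
    q.dispatched += 1
    group.used_in_window += 1
    return q

def simulate(windows=3, ticks_per_window=10):
    group = Group(max_slices_per_window=4,
                  queues=[Queue("prio0", weight=8), Queue("prio7", weight=1)])
    for _ in range(windows):
        group.used_in_window = 0     # start of a new accounting window
        for _ in range(ticks_per_window):
            select_and_dispatch(group)
    return group
```

Comparing `dispatched` and `throttled` per queue after `simulate()`
shows which queue absorbed the throttling. Nothing in the scheme itself
ties that split to the prio 0 / prio 7 ratio, which is the concern
raised above.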

- We should be able to get rid of the reader-writer issue even with the
  simple throttling mechanism above for schedulers like deadline and AS,
  because at the elevator we see a single queue (for both reads and
  writes) and we will throttle this queue. Dispatch within the queue is
  taken care of by the IO scheduler. So as long as IO has been queued in
  the queue, the scheduler will take care of giving readers the advantage
  even if throttling is taking place on the queue.
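
A minimal sketch of that last point, assuming a toy single queue that
holds both reads and writes: the throttle only limits how many requests
leave the queue, while the order inside the queue stays the scheduler's
decision. `SingleQueue`, `dispatch`, and the `budget` parameter are
hypothetical names, and the strict read-first policy is a simplification
(the real deadline scheduler, for example, also tracks request expiry
and limits write starvation).

```python
from collections import deque

class SingleQueue:
    """One elevator-level queue holding both reads and writes."""

    def __init__(self):
        self.reads = deque()
        self.writes = deque()

    def add(self, kind, req):
        (self.reads if kind == "read" else self.writes).append(req)

    def pick(self):
        # Scheduler policy inside the queue: reads go before writes.
        if self.reads:
            return self.reads.popleft()
        if self.writes:
            return self.writes.popleft()
        return None

def dispatch(queue, budget):
    """Dispatch at most `budget` requests; the throttle sits outside the queue."""
    out = []
    while budget > 0:
        req = queue.pick()
        if req is None:
            break
        out.append(req)
        budget -= 1
    return out
```

With W1, R1, W2, R2 queued in that order and a budget of 2, the two
reads are dispatched first even though the queue as a whole is being
throttled.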

Why am I thinking out loud? So that we know what we are trying to achieve
at the end of the day. So at this point, what are the advantages and
disadvantages of doing max BW control along with proportional BW control?

- With a combined code base, the total code should be smaller than if
  both are implemented separately.

- There can be a few advantages in terms of maintaining the notion of
  the IO scheduler within a cgroup (like RT tasks always going first in
  the presence of BE and IDLE tasks, etc.). But the simple throttling
  scheme will not take care of fair throttling within a class. We need a
  better algorithm to achieve that goal.

- We will probably get rid of the reader-writer issue for single-queue
  schedulers like deadline and AS. (Need to run tests and see.)

- Implementation at the IO scheduler/elevator layer does not cover
  higher level logical devices. So one can do max BW control only at leaf
  nodes where an IO scheduler is running, and not at intermediate logical
  nodes.

I personally think that proportional BW control will meet more people's
needs than max BW control.

So far nobody has come up with a solution where a single proposal covers
all the cases without breaking things. So personally, I want to make
things work at least at the IO scheduler level and cover as much ground
as possible without breaking things (hardware RAID, all the direct
attached devices, etc.) and then worry about higher level software
devices.


