[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

[dm-devel] Re: Too many I/O controller patches



Paul Menage wrote:
On Mon, Aug 4, 2008 at 1:44 PM, Andrea Righi <righi andrea gmail com> wrote:
A safer approach IMHO is to force the tasks to wait synchronously on
each operation that directly or indirectly generates i/o.

In particular the solution used by the io-throttle controller to limit
the dirty-ratio in memory is to impose a sleep via
schedule_timeout_killable() in balance_dirty_pages() when a generic
process exceeds the limits defined for the belonging cgroup.

Limiting read operations is a lot more easy, because they're always
synchronized with i/o requests.

I think that you're conflating two issues:

- controlling how much dirty memory a cgroup can have at any given
time (since dirty memory is much harder/slower to reclaim than clean
memory)

- controlling how much effect a cgroup can have on a given I/O device.

By controlling the rate at which a task can generate dirty pages,
you're not really limiting either of these. I think you'd have to set
your I/O limits artificially low to prevent a case of a process
writing a large data file and then doing fsync() on it, which would
then hit the disk with the entire file at once, and blow away any QoS
guarantees for other groups.

Anyway, dirty pages ratio is directly proportional to the IO that will
be performed on the real device, isn't it? this wouldn't prevent IO
bursts as you correctly say, but IMHO it is a simple and quite effective
way to measure the IO write activity of each cgroup on each affected
device.

To prevent the IO peaks I usually reduce the vm_dirty_ratio, but, ok,
this is a workaround, not the solution to the problem either.

IMHO, based on the dirty-page rate measurement, we should apply both
limiting methods: throttle dirty-pages ratio to prevent too many dirty
pages in the system (harde to reclaim and generating
unpredictable/unpleasant/unresponsiveness behaviour), and throttle the
dispatching of IO requests at the device-mapper/IO-scheduler layer to
smooth IO peaks/bursts, generated by fsync() and similar scenarios.

Another different approach could be to implement the measurement in the
elevator, looking at the elapsed between the IO request is issued to the
drive and the request is served. So, look at the start time T1,
completion time T2, take the difference (T2 - T1) and say: cgroup C1
consumed an amount of IO of (T2 - T1), and also use a token-bucket
policy to fill/reduce the "credits" of each IO cgroup in terms of IO
time slots. This would be a more precise measurement, instead of trying
to predict how expensive the IO operation will be, only looking at the
dirty-page ratio. Then throttle both dirty-page ratio *and* the
dispatching of the IO requests submitted by the cgroup that exceeds the
limits.


As Dave suggested, I think it would make more sense to have your
page-dirtying throttle points hook into the memory controller instead,
and allow the memory controller to track/limit dirty pages for a
cgroup, and potentially do throttling as part of that.

Paul

Yes, implementing page-drity throttling in memory controller seems
absolutely reasonable. I can try to move in this direction, merge the
page-dirty throttling in memory controller and also post the RFC.

Thanks,
-Andrea


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]