[dm-devel] [Lsf-pc] [LSF/MM TOPIC] a few storage topics
Wu Fengguang
fengguang.wu at gmail.com
Fri Feb 3 12:37:53 UTC 2012
On Thu, Jan 26, 2012 at 11:40:47AM -0500, Loke, Chetan wrote:
> > From: Andrea Arcangeli [mailto:aarcange at redhat.com]
> > Sent: January 25, 2012 5:46 PM
>
> ....
>
> > Way more important is to have feedback on the readahead hits and be
> > sure when readahead is raised to the maximum the hit rate is near 100%
> > and fallback to lower readaheads if we don't get that hit rate. But
> > that's not a VM problem and it's a readahead issue only.
> >
>
> A quick google showed up - http://kerneltrap.org/node/6642
>
> Interesting thread to follow. I haven't looked further as to what was
> merged and what wasn't.
>
> A quote from the patch - " It works by peeking into the file cache and
> check if there are any history pages present or accessed."
> Now I don't understand anything about this but I would think digging the
> file-cache isn't needed(?). So, yes, a simple RA hit-rate feedback could
> be fine.
>
> And 'maybe' for adaptive RA just increase the RA-blocks by '1'(or some
> N) over period of time. No more smartness. A simple 10 line function is
> easy to debug/maintain. That is, a scaled-down version of
> ramp-up/ramp-down. Don't go crazy by ramping-up/down after every RA(like
> SCSI LLDD madness). Wait for some event to happen.
>
> I can see where Andrew Morton's concerns could be(just my
> interpretation). We may not want to end up like a protocol state machine
> code: tcp slow-start, then increase , then congestion, then let's
> back-off. hmmm, slow-start is a problem for my business logic, so let's
> speed-up slow-start ;).
Loke,

Thrashing-safe readahead can be as simple as:

        readahead_size = min(nr_history_pages, MAX_READAHEAD_PAGES)

No need for more slow-start or back-off magic.

This is because nr_history_pages is a lower estimate of the thrashing
threshold:

   chunk A           chunk B                      chunk C                 head

   l01 l11           l12   l21                    l22
| |-->|-->|       |------>|-->|                |------>|
| +-------+       +-----------+                +-------------+               |
| |   #   |       |       #   |                |       #     |               |
| +-------+       +-----------+                +-------------+               |
| |<==============|<===========================|<============================|
        L0                     L1                            L2

Let f(l) = L be a map from

        l: the number of pages read by the stream

to

        L: the number of pages pushed into inactive_list in the meantime

then

        f(l01) <= L0
        f(l11 + l12) = L1
        f(l21 + l22) = L2
        ...
        f(l01 + l11 + ...) <= Sum(L0 + L1 + ...)
                           <= Length(inactive_list) = f(thrashing-threshold)

So the count of contiguous history pages left in inactive_list is always a
lower estimate of the true thrashing threshold. Given a stable workload,
the readahead size will keep ramping up and then stabilize in the range

        (thrashing_threshold/2, thrashing_threshold)

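To illustrate the ramp-up and stabilization, here is a toy userspace sketch in Python. It is not kernel code. It assumes a single sequential stream and a simplified eviction model in which a page is dropped once the stream has read `threshold` further pages since that page was inserted. The names `count_history_pages`, `simulate`, `threshold`, `max_ra` and `min_ra` are made up for this sketch, and the `min_ra` floor is an added assumption so that the very first read has a nonzero size.

```python
def count_history_pages(cache, pos, limit):
    """Count contiguous cached pages immediately before `pos`, up to `limit`."""
    n = 0
    while n < limit and (pos - 1 - n) in cache:
        n += 1
    return n


def simulate(threshold, max_ra, min_ra=4, rounds=8):
    """Return the readahead size chosen in each round of a sequential read.

    threshold: assumed thrashing threshold, in pages read by the stream
    max_ra:    stands in for MAX_READAHEAD_PAGES
    min_ra:    assumed floor so the first readahead is not zero
    """
    cache = {}   # page index -> stream position at which it was inserted
    pos = 0      # next page the stream will read
    sizes = []
    for _ in range(rounds):
        # Toy eviction: drop pages inserted `threshold` or more reads ago.
        cache = {pg: t for pg, t in cache.items() if pos - t < threshold}
        nr_history_pages = count_history_pages(cache, pos, max_ra)
        # The rule from the post, plus the assumed floor.
        size = max(min_ra, min(nr_history_pages, max_ra))
        for pg in range(pos, pos + size):
            cache[pg] = pos
        sizes.append(size)
        pos += size
    return sizes


if __name__ == "__main__":
    print(simulate(threshold=100, max_ra=1000))
    print(simulate(threshold=100, max_ra=32, rounds=7))
```

Under these assumptions, with a threshold of 100 pages and no effective cap, the sizes go 4, 4, 8, 16, 32, 64, 96, 96: the size doubles while all history survives, then settles below the threshold once older chunks start being evicted. With a cap of 32 it stops growing at 32.
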
Thanks,
Fengguang