[dm-devel] [Lsf-pc] [LSF/MM TOPIC] a few storage topics

Fri Feb 3 12:37:53 UTC 2012

On Thu, Jan 26, 2012 at 11:40:47AM -0500, Loke, Chetan wrote:
> > From: Andrea Arcangeli [mailto:aarcange at redhat.com]
> > Sent: January 25, 2012 5:46 PM
> 
> ....
> 
> > Way more important is to have feedback on the readahead hits and be
> > sure when readahead is raised to the maximum the hit rate is near 100%
> > and fallback to lower readaheads if we don't get that hit rate. But
> > that's not a VM problem and it's a readahead issue only.
> > 
> 
> A quick google showed up - http://kerneltrap.org/node/6642 
> 
> Interesting thread to follow. I haven't looked further as to what was
> merged and what wasn't.
> 
> A quote from the patch - " It works by peeking into the file cache and
> check if there are any history pages present or accessed."
> Now I don't understand anything about this but I would think digging the
> file-cache isn't needed(?). So, yes, a simple RA hit-rate feedback could
> be fine.
> 
> And 'maybe' for adaptive RA just increase the RA-blocks by '1'(or some
> N) over period of time. No more smartness. A simple 10 line function is
> easy to debug/maintain. That is, a scaled-down version of
> ramp-up/ramp-down. Don't go crazy by ramping-up/down after every RA(like
> SCSI LLDD madness). Wait for some event to happen.
> 
> I can see where Andrew Morton's concerns could be(just my
> interpretation). We may not want to end up like a protocol state machine
> code: tcp slow-start, then increase , then congestion, then let's
> back-off. hmmm, slow-start is a problem for my business logic, so let's
> speed-up slow-start ;).

Loke,

Thrashing safe readahead can work as simple as:

        readahead_size = min(nr_history_pages, MAX_READAHEAD_PAGES)

No need for more slow-start or back-off magics.

This is because nr_history_pages is a lower estimation of the threshing
threshold:

   chunk A           chunk B                      chunk C                 head

   l01 l11           l12   l21                    l22
| |-->|-->|       |------>|-->|                |------>|
| +-------+       +-----------+                +-------------+               |
| |   #   |       |       #   |                |       #     |               |
| +-------+       +-----------+                +-------------+               |
| |<==============|<===========================|<============================|
        L0                     L1                            L2

 Let f(l) = L be a map from
     l: the number of pages read by the stream
 to
     L: the number of pages pushed into inactive_list in the mean time
 then
     f(l01) <= L0
     f(l11 + l12) = L1
     f(l21 + l22) = L2
     ...
     f(l01 + l11 + ...) <= Sum(L0 + L1 + ...)
                        <= Length(inactive_list) = f(thrashing-threshold)

So the count of continuous history pages left in inactive_list is always a
lower estimation of the true thrashing-threshold. Given a stable workload,
the readahead size will keep ramping up and then stabilize in range

        (thrashing_threshold/2, thrashing_threshold)

Thanks,
Fengguang