[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [dm-devel] [Lsf-pc] [LSF/MM TOPIC] a few storage topics

Andreas Dilger <adilger dilger ca> writes:

> On 2012-01-24, at 9:56, Christoph Hellwig <hch infradead org> wrote:
>> On Tue, Jan 24, 2012 at 10:15:04AM -0500, Chris Mason wrote:
>>> https://lkml.org/lkml/2011/12/13/326
>>> This patch is another example, although for a slight different reason.
>>> I really have no idea yet what the right answer is in a generic sense,
>>> but you don't need a 512K request to see higher latencies from merging.
>> That assumes the 512k requests is created by merging.  We have enough
>> workloads that create large I/O from the get go, and not splitting them
>> and eventually merging them again would be a big win.  E.g. I'm
>> currently looking at a distributed block device which uses internal 4MB
>> chunks, and increasing the maximum request size to that dramatically
>> increases the read performance.
> (sorry about last email, hit send by accident)
> I don't think we can have a "one size fits all" policy here. In most
> RAID devices the IO size needs to be at least 1MB, and with newer
> devices 4MB gives better performance.

Right, and there's more to it than just I/O size.  There's access
pattern, and more importantly, workload and related requirements
(latency vs throughput).

> One of the reasons that Lustre used to hack so much around the VFS and
> VM APIs is exactly to avoid the splitting of read/write requests into
> pages and then depend on the elevator to reconstruct a good-sized IO
> out of it.
> Things have gotten better with newer kernels, but there is still a
> ways to go w.r.t. allowing large IO requests to pass unhindered
> through to disk (or at least as far as enduring that the IO is aligned
> to the underlying disk geometry).

I've been wondering if it's gotten better, so decided to run a few quick

kernel version 3.2.0, storage: hp eva fc array, i/o scheduler cfq,
max_sectors_kb: 1024, test program: dd

- buffered writes and buffered O_SYNC writes, all 1MB block size show 4k
  I/Os passed down to the I/O scheduler
- buffered 1MB reads are a little better, typically in the 128k-256k
  range when they hit the I/O scheduler.

- buffered writes: 512K I/Os show up at the elevator
- buffered O_SYNC writes: data is again 512KB, journal writes are 4K
- buffered 1MB reads get down to the scheduler in 128KB chunks

- buffered writes: 1MB I/Os show up at the elevator
- buffered O_SYNC writes: 1MB I/Os
- buffered 1MB reads: 128KB chunks show up at the I/O scheduler

So, ext4 is doing better than ext3, but still not perfect.  xfs is
kicking ass for writes, but reads are still split up.


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]