Re: [dm-devel] [Lsf-pc] [LSF/MM TOPIC] a few storage topics
- From: Jeff Moyer <jmoyer redhat com>
- To: Andreas Dilger <adilger dilger ca>
- Cc: Andrea Arcangeli <aarcange redhat com>, Jan Kara <jack suse cz>, "linux-scsi vger kernel org" <linux-scsi vger kernel org>, Mike Snitzer <snitzer redhat com>, Christoph Hellwig <hch infradead org>, "dm-devel redhat com" <dm-devel redhat com>, Boaz Harrosh <bharrosh panasas com>, "linux-fsdevel vger kernel org" <linux-fsdevel vger kernel org>, "lsf-pc lists linux-foundation org" <lsf-pc lists linux-foundation org>, Chris Mason <chris mason oracle com>
- Subject: Re: [dm-devel] [Lsf-pc] [LSF/MM TOPIC] a few storage topics
- Date: Tue, 24 Jan 2012 13:05:50 -0500
Andreas Dilger <adilger dilger ca> writes:
> On 2012-01-24, at 9:56, Christoph Hellwig <hch infradead org> wrote:
>> On Tue, Jan 24, 2012 at 10:15:04AM -0500, Chris Mason wrote:
>>> https://lkml.org/lkml/2011/12/13/326
>>>
>>> This patch is another example, although for a slightly different reason.
>>> I really have no idea yet what the right answer is in a generic sense,
>>> but you don't need a 512K request to see higher latencies from merging.
>>
>> That assumes the 512k request is created by merging. We have enough
>> workloads that create large I/O from the get go, and not splitting them
>> and eventually merging them again would be a big win. E.g. I'm
>> currently looking at a distributed block device which uses internal 4MB
>> chunks, and increasing the maximum request size to that dramatically
>> increases the read performance.
>
> (sorry about last email, hit send by accident)
>
> I don't think we can have a "one size fits all" policy here. In most
> RAID devices the IO size needs to be at least 1MB, and with newer
> devices 4MB gives better performance.

Right, and there's more to it than just I/O size. There's access
pattern, and more importantly, workload and related requirements
(latency vs throughput).

> One of the reasons that Lustre used to hack so much around the VFS and
> VM APIs is exactly to avoid the splitting of read/write requests into
> pages and then depend on the elevator to reconstruct a good-sized IO
> out of it.
>
> Things have gotten better with newer kernels, but there is still a
> ways to go w.r.t. allowing large IO requests to pass unhindered
> through to disk (or at least as far as ensuring that the IO is aligned
> to the underlying disk geometry).

I've been wondering if it's gotten better, so I decided to run a few
quick tests.

Kernel version: 3.2.0; storage: HP EVA FC array; I/O scheduler: cfq;
max_sectors_kb: 1024; test program: dd.

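For anyone who wants to try something similar, here is a rough sketch of the setup in Python. It is not the dd invocation used for the numbers in this mail (that isn't reproduced here); the helper names are made up for illustration, and the sysfs paths assume a device that exposes /sys/block/<device>/queue/. It reads the queue settings and issues 1MB buffered writes (optionally with O_SYNC) and 1MB reads, roughly the way dd would.

```python
import os

BLOCK_SIZE = 1024 * 1024  # 1MB, matching the dd block size used in the tests


def active_scheduler(scheduler_text):
    """Return the bracketed entry of a sysfs 'scheduler' line such as
    'noop deadline [cfq]', or None if no entry is bracketed."""
    for word in scheduler_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def queue_setting(device, name):
    """Read /sys/block/<device>/queue/<name>, e.g. max_sectors_kb or scheduler."""
    with open(os.path.join("/sys/block", device, "queue", name)) as f:
        return f.read().strip()


def write_blocks(path, count, block_size=BLOCK_SIZE, sync=False):
    """Write `count` zero-filled blocks through the page cache.
    sync=True opens the file with O_SYNC (buffered O_SYNC writes)."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if sync:
        flags |= os.O_SYNC
    buf = bytes(block_size)
    fd = os.open(path, flags, 0o644)
    try:
        for _ in range(count):
            view = memoryview(buf)
            while len(view) > 0:  # os.write may write less than requested
                written = os.write(fd, view)
                view = view[written:]
    finally:
        os.close(fd)
    return count * block_size


def read_blocks(path, block_size=BLOCK_SIZE):
    """Read the file back using block_size read() calls; return bytes read."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, block_size)
            if not chunk:
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total
```

For example, `active_scheduler(queue_setting("sda", "scheduler"))` would report the active scheduler for a hypothetical device named sda, and `write_blocks("/mnt/test/file", 1024, sync=True)` would do 1GB of buffered O_SYNC writes to a hypothetical mount point.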
ext3:
- buffered writes and buffered O_SYNC writes, all with a 1MB block size:
  4KB I/Os are passed down to the I/O scheduler
- buffered 1MB reads are a little better, typically in the 128KB-256KB
  range when they hit the I/O scheduler

ext4:
- buffered writes: 512KB I/Os show up at the elevator
- buffered O_SYNC writes: data is again 512KB, journal writes are 4KB
- buffered 1MB reads get down to the scheduler in 128KB chunks

xfs:
- buffered writes: 1MB I/Os show up at the elevator
- buffered O_SYNC writes: 1MB I/Os
- buffered 1MB reads: 128KB chunks show up at the I/O scheduler

So, ext4 is doing better than ext3, but still not perfect. xfs is
kicking ass for writes, but reads are still split up.
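
To put those sizes in perspective, here is a back-of-the-envelope sketch of how many requests the scheduler sees for each 1MB dd block at the sizes listed above. The helper name and the labels are only for illustration; the request sizes are the ones from the results.

```python
KB = 1024
MB = 1024 * KB


def requests_per_io(io_size, request_size):
    """Number of requests needed to cover io_size bytes, rounding up."""
    return (io_size + request_size - 1) // request_size


# Request sizes reported above for 1MB dd blocks (labels are illustrative).
observed = {
    "ext3 writes": 4 * KB,
    "ext4 writes": 512 * KB,
    "xfs writes": 1 * MB,
    "ext4/xfs reads": 128 * KB,
}

for label, size in observed.items():
    print(f"{label}: {requests_per_io(1 * MB, size)} request(s) per 1MB I/O")
```

That works out to 256 requests per 1MB for ext3 writes, 2 for ext4 writes, 1 for xfs writes, and 8 for the 128KB reads.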
Cheers,
Jeff