
Re: [linux-lvm] Tracing IO requests?



On 3/2/2011 12:13 PM, Jonathan Tripathy wrote:
I once used a tool called dstat. dstat has modules which can tell you which processes are using disk IO. I haven’t used dstat in a while, so maybe someone else can chime in.

I know the IO is only being caused by a "cp -a" command, but the question is: why all the reads? It should be 99% writes. Another thing I noticed is that the average request size is pretty small:

14:06:20  DEV               tps  rd_sec/s   wr_sec/s  avgrq-sz  avgqu-sz    await  svctm   %util
[...snip!...]
14:06:21  sde            219.00  11304.00   30640.00    191.53      1.15     5.16   2.10   46.00
14:06:21  sdf            209.00  11016.00   29904.00    195.79      1.06     5.02   2.01   42.00
14:06:21  sdg            178.00  11512.00   28568.00    225.17      0.74     3.99   2.08   37.00
14:06:21  sdh            175.00  10736.00   26832.00    214.67      0.89     4.91   2.00   35.00
14:06:21  sdi            206.00  11512.00   29112.00    197.20      0.83     3.98   1.80   37.00
14:06:21  sdj            209.00  11264.00   30264.00    198.70      0.79     3.78   1.96   41.00
14:06:21  sds            214.00  10984.00   28552.00    184.75      0.78     3.60   1.78   38.00
14:06:21  sdt            194.00  13352.00   27808.00    212.16      0.83     4.23   1.91   37.00
14:06:21  sdu            183.00  12856.00   28872.00    228.02      0.60     3.22   2.13   39.00
14:06:21  sdv            189.00  11984.00   31696.00    231.11      0.57     2.96   1.69   32.00
14:06:21  md5            754.00      0.00  153848.00    204.04      0.00     0.00   0.00    0.00
14:06:21  DayTar-DayTar  753.00      0.00  153600.00    203.98     15.73    20.58   1.33  100.00
14:06:21  data           760.00      0.00  155800.00    205.00   4670.84  6070.91   1.32  100.00

Looks to be about 205 sectors/request, which is 104,960 bytes (205 * 512). This might be causing read-modify-write cycles if, for whatever reason, md is not taking advantage of the stripe cache. stripe_cache_active shows about 128 blocks (512kB) of RAM in use per hard drive. Given that the chunk size is 512kB and the writes being requested are linear, it should not be doing read-modify-write. And yet there are tons of reads being logged on the member disks, as shown above, while md5 itself reports no reads at all.
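As a rough sanity check on the figures above, the arithmetic can be sketched in Python. This is only a sketch: it assumes that sde through sdv are exactly the ten members of md5, that the array is RAID6 with 8 data chunks and 2 parity chunks per stripe, and that the 14:06:21 sample is representative. The variable names are made up for illustration.

```python
SECTOR_BYTES = 512  # sar reports rd_sec/s, wr_sec/s and avgrq-sz in 512-byte sectors

# (rd_sec/s, wr_sec/s) per disk, copied from the 14:06:21 sample.
# Assumption: these ten disks are the members of md5.
member_disks = {
    "sde": (11304.00, 30640.00),
    "sdf": (11016.00, 29904.00),
    "sdg": (11512.00, 28568.00),
    "sdh": (10736.00, 26832.00),
    "sdi": (11512.00, 29112.00),
    "sdj": (11264.00, 30264.00),
    "sds": (10984.00, 28552.00),
    "sdt": (13352.00, 27808.00),
    "sdu": (12856.00, 28872.00),
    "sdv": (11984.00, 31696.00),
}
md5_wr_sectors = 153848.00    # wr_sec/s reported for md5
avg_request_sectors = 205.00  # avgrq-sz reported for the "data" device

CHUNK_BYTES = 512 * 1024  # md chunk size
DATA_DISKS = 8            # data chunks per RAID6 stripe
TOTAL_DISKS = 10          # data + 2 parity

request_bytes = int(avg_request_sectors * SECTOR_BYTES)
full_stripe_bytes = CHUNK_BYTES * DATA_DISKS

total_rd = sum(rd for rd, _ in member_disks.values())
total_wr = sum(wr for _, wr in member_disks.values())

# If every write were a full-stripe write, the members would see
# TOTAL_DISKS / DATA_DISKS sectors written per payload sector and no reads.
ideal_wr = md5_wr_sectors * TOTAL_DISKS / DATA_DISKS

print(f"average request:     {request_bytes} bytes "
      f"({request_bytes / full_stripe_bytes:.1%} of a full stripe)")
print(f"member reads:        {total_rd:.0f} sectors/s")
print(f"member writes:       {total_wr:.0f} sectors/s")
print(f"full-stripe ideal:   {ideal_wr:.0f} sectors/s written, 0 read")
```

Member reads and member writes well above the full-stripe ideal would both be consistent with read-modify-write, though the numbers alone do not prove that is the cause.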

A couple more confusing things:

jo ~ # blockdev --getss /dev/mapper/data
512
jo ~ # blockdev --getpbsz /dev/mapper/data
512
jo ~ # blockdev --getioopt /dev/mapper/data
4194304
jo ~ # blockdev --getiomin /dev/mapper/data
524288
jo ~ # blockdev --getmaxsect /dev/mapper/data
255
jo ~ # blockdev --getbsz /dev/mapper/data
512
jo ~ #

If the optimal IO size is 4MB (as it SHOULD be: 512k chunk * 8 data drives = 4MB stripe), but the maxsect count is 255 (255 * 512 = 130,560 bytes, just under 128k), how can optimal IO ever be done? I re-mounted XFS with sunit=1024,swidth=8192, but that hasn't increased the average request size as expected. Perhaps it's respecting this maxsect limit?
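To put numbers on that mismatch, here is a small Python sketch using only the blockdev values shown above. It assumes the 255-sector limit applies to each individual request. It says nothing about whether the block layer merges or queues those requests into full stripes before md processes them.

```python
import math

SECTOR_BYTES = 512      # blockdev --getss
IO_MIN_BYTES = 524288   # blockdev --getiomin (one 512k chunk)
IO_OPT_BYTES = 4194304  # blockdev --getioopt (one full data stripe)
MAX_SECTORS = 255       # blockdev --getmaxsect

# Largest single request the device reports it will accept.
max_request_bytes = MAX_SECTORS * SECTOR_BYTES

# Requests of that size needed to cover one chunk and one full stripe.
requests_per_chunk = math.ceil(IO_MIN_BYTES / max_request_bytes)
requests_per_stripe = math.ceil(IO_OPT_BYTES / max_request_bytes)

# The XFS mount options sunit/swidth are given in 512-byte units.
sunit = IO_MIN_BYTES // 512
swidth = IO_OPT_BYTES // 512

print(f"max request:         {max_request_bytes} bytes")
print(f"requests per chunk:  {requests_per_chunk}")
print(f"requests per stripe: {requests_per_stripe}")
print(f"mount options:       sunit={sunit},swidth={swidth}")
```

The sunit and swidth values it derives match the ones used in the remount, so the mount options themselves look consistent with the array geometry.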

--Bart

PS: The RAID6 full stripe has +2 parity drives for a total of 10, but they're not included in the "data zone" definitions of stripe size, which are the only important ones for figuring out how large your writes should be.
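The point in the PS can be written down as a tiny helper. This is an illustrative sketch, not anything md provides: the function name and the parity table are hypothetical, and only RAID5 and RAID6 are covered.

```python
# Parity chunks per stripe for the parity RAID levels.
# md rotates parity across the members, but the count per stripe is fixed.
PARITY_CHUNKS = {"raid5": 1, "raid6": 2}


def data_stripe_bytes(level: str, total_disks: int, chunk_bytes: int) -> int:
    """Return the data bytes in one full stripe, excluding parity chunks."""
    data_disks = total_disks - PARITY_CHUNKS[level]
    if data_disks < 1:
        raise ValueError("not enough disks for this RAID level")
    return data_disks * chunk_bytes


# The array discussed here: RAID6, 10 disks, 512k chunks.
stripe = data_stripe_bytes("raid6", 10, 512 * 1024)
print(stripe)  # should equal the value reported by blockdev --getioopt
```

For this array the result is 4194304 bytes, the same figure blockdev --getioopt reports.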

