[linux-lvm] Tracing IO requests?

Bart Kus me at bartk.us
Wed Mar 2 22:19:33 UTC 2011


On 3/2/2011 12:13 PM, Jonathan Tripathy wrote:
> I once used a tool called dstat. dstat has modules which can tell you 
> which processes are using disk IO. I haven’t used dstat in a while so 
> maybe someone else can chime in

I know the IO is only being caused by a "cp -a" command, but the issue 
is why all the reads?  It should be 99% writes.  Another thing I noticed 
is the average request size is pretty small:

14:06:20          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
[...snip!...]
14:06:21          sde    219.00  11304.00  30640.00    191.53      1.15      5.16      2.10     46.00
14:06:21          sdf    209.00  11016.00  29904.00    195.79      1.06      5.02      2.01     42.00
14:06:21          sdg    178.00  11512.00  28568.00    225.17      0.74      3.99      2.08     37.00
14:06:21          sdh    175.00  10736.00  26832.00    214.67      0.89      4.91      2.00     35.00
14:06:21          sdi    206.00  11512.00  29112.00    197.20      0.83      3.98      1.80     37.00
14:06:21          sdj    209.00  11264.00  30264.00    198.70      0.79      3.78      1.96     41.00
14:06:21          sds    214.00  10984.00  28552.00    184.75      0.78      3.60      1.78     38.00
14:06:21          sdt    194.00  13352.00  27808.00    212.16      0.83      4.23      1.91     37.00
14:06:21          sdu    183.00  12856.00  28872.00    228.02      0.60      3.22      2.13     39.00
14:06:21          sdv    189.00  11984.00  31696.00    231.11      0.57      2.96      1.69     32.00
14:06:21          md5    754.00      0.00 153848.00    204.04      0.00      0.00      0.00      0.00
14:06:21    DayTar-DayTar  753.00      0.00 153600.00    203.98     15.73     20.58      1.33    100.00
14:06:21         data    760.00      0.00 155800.00    205.00   4670.84   6070.91      1.32    100.00

Looks to be about 205 sectors/request, which is 104,960 bytes.  This 
might be causing read-modify-write cycles if for whatever reason md is 
not taking advantage of the stripe cache.  stripe_cache_active shows 
about 128 blocks (512kB) of RAM in use, per hard drive.  Given the chunk 
size is 512kB, and the writes being requested are linear, it should not 
be doing read-modify-write.  And yet, there are tons of reads being 
logged, as shown above.
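
In case it's useful to anyone, this is roughly how I'm checking the stripe
cache and trying to catch the reads in the act (md5 and sde are taken from the
sar output above; the blktrace pipeline is just a sketch, I haven't dug
through its output yet):

# md stripe cache settings, counted in 4kB pages per member device
cat /sys/block/md5/md/stripe_cache_size
cat /sys/block/md5/md/stripe_cache_active

# watch requests hitting one member disk while the cp runs; read requests
# carry an 'R' in the RWBS column, which should show whether md itself is
# issuing the reads (read-modify-write) or they come from somewhere else
blktrace -d /dev/sde -o - | blkparse -i -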

A couple more confusing things:

jo ~ # blockdev --getss /dev/mapper/data
512
jo ~ # blockdev --getpbsz /dev/mapper/data
512
jo ~ # blockdev --getioopt /dev/mapper/data
4194304
jo ~ # blockdev --getiomin /dev/mapper/data
524288
jo ~ # blockdev --getmaxsect /dev/mapper/data
255
jo ~ # blockdev --getbsz /dev/mapper/data
512
jo ~ #

If the optimal IO size is 4MB (as it SHOULD be: 512k chunk * 8 data drives 
= 4MB stripe), but the maxsect count is 255 (255*512 = ~128k), how can optimal 
IO ever be done???  I re-mounted XFS with sunit=1024,swidth=8192, but 
that hasn't increased the average transaction size as expected.  Perhaps 
it's respecting this maxsect limit?
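
If anyone wants to poke at the same thing, these are the knobs I'm looking at
next; sde is just one member disk as an example, and I haven't confirmed yet
whether raising the limit actually helps once dm and md are layered on top:

# per-device request size limits, in kB
cat /sys/block/sde/queue/max_sectors_kb       # current soft limit
cat /sys/block/sde/queue/max_hw_sectors_kb    # hardware ceiling

# try letting requests grow to a full 512kB chunk (only takes effect if
# the hardware ceiling above allows it)
echo 512 > /sys/block/sde/queue/max_sectors_kb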

--Bart

PS: The RAID6 array has 2 parity drives on top of the 8 data drives, for a 
total of 10, but the parity drives aren't counted in the stripe-size figures 
above; only the data portion of the stripe matters for figuring out how large 
your writes should be.
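
For completeness, the arithmetic behind the sunit/swidth values above, and how
I'd double-check what XFS thinks it's using (the mount point name here is made
up for the example):

# mount options take sunit/swidth in 512-byte sectors:
#   sunit  = 512kB chunk              = 1024 sectors
#   swidth = 8 data drives * 1024     = 8192 sectors
xfs_info /mnt/data    # reports sunit/swidth converted to filesystem blocks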



