[rhelv6-list] RHEL6.2 XFS brutal performance with lots of files

Fri Apr 12 13:42:16 UTC 2013

Hi,

I figured I'd try to solicit the kind help of the mailing list again on
this, as I continue to have issues with XFS on RHEL6.  For example, I have
a 12 TB software RAID5 filesystem on an LSI 92118, and the drives are 3 TB
Seagate Barracuda ST3000DM001.  This filesystem currently holds around 140
million files, many of them smaller than 50 KB.  The system is running
a fully patched RHEL6.4.

Within this filesystem, I have one particular tree of files I need to
remove.  There are ~170 folders with around 4-10 sub-folders each and about
1,000 files in each of those sub-folders.  Most files are less than 40KB.
Attempting to list out one of those top level folders like so:

ls -R * | wc -l

takes 50 seconds and wc reports 3825 lines (~files). Watching iostat during
this operation, the tps value pokes along around 100 to 150 tps.  This
filesystem is doing other things at the time as well.  Just running iostat
without args currently reports:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.12    0.03    2.70    3.60    0.00   82.56

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
md127           134.36     10336.87     11381.45 19674692141 21662893316

If I go into one of these folders with 1,000 files or so in it and attempt
to list out the directory cold, it takes 10-15 seconds.   Attempting to
remove one of the top level folders takes a long time and the other
filesystem operations at the time feel very sluggish as well.

$ time rm -rf myfolder  (this is around 4,000 files total within 6
subfolders of myfolder)

real 2m36.925s
user 0m0.018s
sys 0m0.657s
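
To put the timings quoted above on a common scale, here is a rough
back-of-the-envelope calculation.  The helper name `rate` is made up for
this illustration, and the ls figure counts output lines (which include
directory headers and blank lines), so both numbers are approximate.

```shell
# rate COUNT SECONDS -> prints COUNT/SECONDS rounded to one decimal place
rate() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", n / s }'
}

rate 3825 50         # ls -R: output lines per second       -> 76.5
rate 4000 156.925    # rm -rf: files removed per second     -> 25.5
```

Both rates are far below what the sequential dd and hdparm numbers would
suggest, which is consistent with the workload being bound by small
metadata I/O rather than by streaming throughput.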

Running hdparm on one of the software raid5 drives reports decent numbers.

/dev/sdb:
 Timing cached reads:   12882 MB in  2.00 seconds = 6448.13 MB/sec
 Timing buffered disk reads:  396 MB in  3.06 seconds = 129.39 MB/sec

Running some crude dd tests shows reasonable numbers, I think.

# dd bs=1M count=1280 if=/dev/zero of=test conv=fdatasync
1342177280 bytes (1.3 GB) copied, 29.389 s, 45.7 MB/s

I have other similar filesystems on ext4 with similar hardware and
millions of small files as well.  I don't see such sluggishness with small
files and directories there.  I guess I picked XFS for this filesystem
initially because of its fast fsck times.

Here are some more details on the filesystem:

# xfs_info /dev/md127
meta-data=/dev/md127             isize=256    agcount=32, agsize=91570816 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=2930265088, imaxpct=5
         =                       sunit=128    swidth=512 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

# grep md127 /proc/mounts
/dev/md127 /mesonet xfs rw,noatime,attr2,delaylog,sunit=1024,swidth=4096,noquota 0 0

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid5 sdf[3] sde[0] sdd[5] sdc[1] sdb[2]
      11721060352 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
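
As a sanity check on the stripe geometry, the numbers above can be
compared with a little integer arithmetic.  This is only a sketch: the
variable names are invented here, and it relies on xfs_info reporting
sunit/swidth in filesystem blocks while the mount options are expressed in
512-byte units.

```shell
# Values copied from the xfs_info, /proc/mounts and /proc/mdstat output.
bsize=4096           # XFS block size in bytes
sunit_blks=128       # xfs_info sunit, in filesystem blocks
swidth_blks=512      # xfs_info swidth, in filesystem blocks
mount_sunit=1024     # mount option sunit, in 512-byte units
mount_swidth=4096    # mount option swidth, in 512-byte units
chunk_kib=512        # md chunk size in KiB
devices=5            # RAID5 members; one chunk per stripe holds parity

sunit_kib=$(( sunit_blks * bsize / 1024 ))
swidth_kib=$(( swidth_blks * bsize / 1024 ))
mount_sunit_kib=$(( mount_sunit * 512 / 1024 ))
mount_swidth_kib=$(( mount_swidth * 512 / 1024 ))
expected_swidth_kib=$(( chunk_kib * (devices - 1) ))

echo "xfs_info: sunit=${sunit_kib} KiB swidth=${swidth_kib} KiB"
echo "mount:    sunit=${mount_sunit_kib} KiB swidth=${mount_swidth_kib} KiB"
echo "md:       chunk=${chunk_kib} KiB full stripe=${expected_swidth_kib} KiB"
```

All three sources agree on a 512 KiB stripe unit and a 2048 KiB stripe
width, so a mismatch between the XFS alignment and the md layout does not
look like the cause here.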

Does anybody have ideas?  Or are these still known issues with XFS on RHEL,
as noted here:

http://www.redhat.com/summit/2011/presentations/summit/decoding_the_code/thursday/wheeler_t_0310_billion_files_2011.pdf

thanks
daryl

On Fri, Jun 22, 2012 at 7:45 PM, Daryl Herzmann <akrherz at iastate.edu> wrote:

> On Tue, Jun 5, 2012 at 3:10 PM, Jussi Silvennoinen
> <jussi_rhel6 at silvennoinen.net> wrote:
> >> I've been noticing lots of annoying problems with XFS performance with
> >> RHEL6.2 on 64bit.  I typically have 20-30 TB file systems with data
> >> structured in directories based on day of year, product type, for
> >> example,
> >>
> >>  /data/2012/06/05/product/blah.gif
> >>
> >> Doing operations like tar or rm over these directories brings the
> >> system to a grinding halt.  Load average goes vertical and eventually
> >> the power button needs to be pressed in many cases :( A hack workaround
> >> is to break apart the task into smaller chunks and let the system
> >> breathe in between operations...
> >>
> >> Anyway, I read Ric Wheeler's "Billion Files" with great interest
> >>
> >> http://www.redhat.com/summit/2011/presentations/summit/decoding_the_code/thursday/wheeler_t_0310_billion_files_2011.pdf
> >>
> >> It appears there are 'known issues' with XFS and RHEL6.1.  It does not
> >> appear these issues were addressed in RHEL 6.2?
> >>
> >> Does anybody know if these issues were addressed in the upcoming RHEL
> >> 6.3?  My impression is that upstream fixes for this only recently (last
> >> 6 months?) appeared in the mainline kernel.
> >>
> >> Perhaps I am missing some tuning that could be done to help with this?
> >
> > Enabling lazy-count does wonders for workloads that involve massive
> > amounts of metadata. Unfortunately it's a mkfs-time option only AFAIK.
>
> Thanks, but it was already enabled...
>
> daryl
>
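
The "break apart the task into smaller chunks" workaround mentioned in the
quoted message could be sketched roughly as follows.  The function name
`remove_in_batches` and the default 5 second pause are assumptions made for
this illustration, not something from the thread, and the approach only
spreads the load out over time; it does not address the underlying
metadata performance.

```shell
# remove_in_batches DIR [PAUSE_SECONDS]
# Removes each immediate subdirectory of DIR in turn, sleeping between
# removals so that other I/O on the filesystem gets a chance to run,
# then removes whatever is left of DIR itself.
remove_in_batches() {
    dir=$1
    pause=${2:-5}
    for sub in "$dir"/*; do
        # Only real directories are handled here; symlinks and plain
        # files are left for the final rm below.
        [ -d "$sub" ] && [ ! -L "$sub" ] || continue
        rm -rf -- "$sub"
        sleep "$pause"
    done
    rm -rf -- "$dir"
}
```

For example, `remove_in_batches myfolder 10` would delete the subfolders of
myfolder one at a time with a 10 second pause after each.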