[rhelv6-list] RHEL6.2 XFS brutal performance with lots of files

Daryl Herzmann akrherz at iastate.edu
Mon Apr 15 12:58:30 UTC 2013


Good morning,

Thanks for the response, and the fun never stops!  This system crashed on
Saturday morning with the following:

<4>------------[ cut here ]------------
<2>kernel BUG at include/linux/swapops.h:126!
<4>invalid opcode: 0000 [#1] SMP
<4>last sysfs file: /sys/kernel/mm/ksm/run
<4>CPU 7
<4>Modules linked in: iptable_filter ip_tables nfsd nfs lockd fscache
auth_rpcgss nfs_acl sunrpc bridge stp llc ip6t_REJECT nf_conntrack_ipv6
nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xfs
exportfs vhost_net macvtap macvlan tun kvm_intel kvm raid456
async_raid6_recov async_pq power_meter raid6_pq async_xor dcdbas xor
microcode serio_raw async_memcpy async_tx iTCO_wdt iTCO_vendor_support
i7core_edac edac_core sg bnx2 ext4 mbcache jbd2 sr_mod cdrom sd_mod
crc_t10dif pata_acpi ata_generic ata_piix wmi mpt2sas scsi_transport_sas
raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
speedstep_lib]
<4>
<4>Pid: 4581, comm: ssh Not tainted 2.6.32-358.2.1.el6.x86_64 #1 Dell Inc.
PowerEdge T410/0Y2G6P
<4>RIP: 0010:[<ffffffff8116c501>]  [<ffffffff8116c501>]
migration_entry_wait+0x181/0x190
<4>RSP: 0000:ffff8801c1703c88  EFLAGS: 00010246
<4>RAX: ffffea0000000000 RBX: ffffea0003bf6f58 RCX: ffff880236437580
<4>RDX: 00000000001121fd RSI: ffff8801c040e5d8 RDI: 000000002243fa3e
<4>RBP: ffff8801c1703ca8 R08: ffff8801c040e5d8 R09: 0000000000000029
<4>R10: ffff8801d6850200 R11: 00002ad7d96cbf5a R12: ffffea0007bdec18
<4>R13: 0000000236437580 R14: 0000000236437067 R15: 00002ad7d76b0000
<4>FS:  00002ad7dace2880(0000) GS:ffff880028260000(0000)
knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>CR2: 00002ad7d76b0000 CR3: 00000001bb686000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process ssh (pid: 4581, threadinfo ffff8801c1702000, task
ffff880261aa7500)
<4>Stack:
<4> ffff88024b5f22d8 0000000000000000 000000002243fa3e ffff8801c040e5d8
<4><d> ffff8801c1703d88 ffffffff811441b8 0000000000000000 ffff8801c1703d08
<4><d> ffff8801c1703eb8 ffff8801c1703dc8 ffff880328cb48c0 0000000000000040
<4>Call Trace:
<4> [<ffffffff811441b8>] handle_pte_fault+0xb48/0xb50
<4> [<ffffffff81437dbb>] ? sock_aio_write+0x19b/0x1c0
<4> [<ffffffff8112c6d4>] ? __pagevec_free+0x44/0x90
<4> [<ffffffff811443fa>] handle_mm_fault+0x23a/0x310
<4> [<ffffffff810474c9>] __do_page_fault+0x139/0x480
<4> [<ffffffff81194fb2>] ? vfs_ioctl+0x22/0xa0
<4> [<ffffffff811493a0>] ? unmap_region+0x110/0x130
<4> [<ffffffff81195154>] ? do_vfs_ioctl+0x84/0x580
<4> [<ffffffff8151339e>] do_page_fault+0x3e/0xa0
<4> [<ffffffff81510755>] page_fault+0x25/0x30
<4>Code: e8 f5 2f fc ff e9 59 ff ff ff 48 8d 53 08 85 c9 0f 84 44 ff ff ff
8d 71 01 48 63 c1 48 63 f6 f0 0f b1 32 39 c1 74 be 89 c1 eb e3 <0f> 0b eb
fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83
<1>RIP  [<ffffffff8116c501>] migration_entry_wait+0x181/0x190
<4> RSP <ffff8801c1703c88>
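
(As an aside, the trace above touches KSM, the last sysfs file, and page
migration, so it may be worth checking whether KSM and transparent
hugepages are active on this box; on RHEL 6 that should be something
like:

# cat /sys/kernel/mm/ksm/run
# cat /sys/kernel/mm/redhat_transparent_hugepage/enabled

where 1 and "always" respectively mean they are enabled.)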

It rebooted itself, and now I must have some filesystem corruption, as this
is being dumped frequently:

XFS (md127): page discard on page ffffea0003c95018, inode 0x849ec442,
offset 0.
XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 342 of file
fs/xfs/xfs_alloc.c.  Caller 0xffffffffa02986c2

Pid: 1304, comm: xfsalloc/7 Not tainted 2.6.32-358.2.1.el6.x86_64 #1
Call Trace:
 [<ffffffffa02c20cf>] ? xfs_error_report+0x3f/0x50 [xfs]
 [<ffffffffa02986c2>] ? xfs_alloc_ag_vextent_size+0x482/0x630 [xfs]
 [<ffffffffa0296a69>] ? xfs_alloc_lookup_eq+0x19/0x20 [xfs]
 [<ffffffffa0296d16>] ? xfs_alloc_fixup_trees+0x236/0x350 [xfs]
 [<ffffffffa02986c2>] ? xfs_alloc_ag_vextent_size+0x482/0x630 [xfs]
 [<ffffffffa029943d>] ? xfs_alloc_ag_vextent+0xad/0x100 [xfs]
 [<ffffffffa0299e8c>] ? xfs_alloc_vextent+0x2bc/0x610 [xfs]
 [<ffffffffa02a4587>] ? xfs_bmap_btalloc+0x267/0x700 [xfs]
 [<ffffffff8105e759>] ? find_busiest_queue+0x69/0x150
 [<ffffffffa02a4a2e>] ? xfs_bmap_alloc+0xe/0x10 [xfs]
 [<ffffffffa02a4b0a>] ? xfs_bmapi_allocate_worker+0x4a/0x80 [xfs]
 [<ffffffffa02a4ac0>] ? xfs_bmapi_allocate_worker+0x0/0x80 [xfs]
 [<ffffffff81090ae0>] ? worker_thread+0x170/0x2a0
 [<ffffffff81096ca0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81090970>] ? worker_thread+0x0/0x2a0
 [<ffffffff81096936>] ? kthread+0x96/0xa0
 [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
 [<ffffffff810968a0>] ? kthread+0x0/0xa0
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
XFS (md127): page discard on page ffffea0003890fa0, inode 0x849ec441,
offset 0.
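
For the XFS errors, one option (just a sketch, assuming /mesonet can be
taken offline for a bit) would be a no-modify check first:

# umount /mesonet
# xfs_repair -n /dev/md127

The -n flag only scans and reports what it finds without writing to the
device; an actual repair would then be a second run without -n.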

Anyway, to respond to your questions:


On Mon, Apr 15, 2013 at 3:50 AM, Jussi Silvennoinen <jussi_rhel6 at silvennoinen.net> wrote:

>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>           11.12    0.03    2.70    3.60    0.00   82.56
>>
>> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>> md127           134.36     10336.87     11381.45 19674692141 21662893316
>>
>
> Do use iostat -x to see more details; it will give a better indication of
> how busy the disks are.


# iostat -x
Linux 2.6.32-358.2.1.el6.x86_64 (iem21.local) 04/15/2013 _x86_64_ (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.33    0.00    3.31    2.24    0.00   84.11

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               3.48  1002.05   22.42   33.26  1162.56  8277.06   169.55     6.52  117.17   2.49  13.86
sdc            3805.96   173.47  292.94   28.83 33747.35  1611.10   109.89     3.47   10.74   0.82  26.46
sde            3814.91   174.53  285.98   29.92 33761.01  1628.96   112.03     5.70   17.97   0.97  30.63
sdb            3813.98   173.45  284.85   28.66 33745.12  1609.93   112.77     4.07   12.94   0.91  28.48
sdd            3805.78   174.18  294.19   29.35 33754.41  1621.14   109.34     3.81   11.73   0.84  27.32
sdf            3813.80   173.68  285.46   29.04 33751.91  1614.36   112.45     4.70   14.91   0.93  29.17
md127             0.00     0.00   21.75   45.85  4949.72  5919.63   160.78     0.00    0.00   0.00   0.00

But I suspect these figures are inflated, since the array just completed a RAID5 resync.
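
To rule that out, interval samples are probably more telling than the
since-boot averages that a bare "iostat -x" prints, e.g.:

# cat /proc/mdstat
# iostat -x 5 3

/proc/mdstat should confirm the resync really is finished, and the later
5-second iostat reports (the first one is still the since-boot average)
should reflect only the current load.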



>> I have other similar filesystems on ext4 with similar hardware and
>> millions of small files as well.  I don't see such sluggishness with small
>> files and directories there.  I guess I picked XFS for this filesystem
>> initially because of its fast fsck times.
>>
>
> Are those other systems also employing software RAID? In my experience,
> swraid is painfully slow with random writes, and your workload in this use
> case is exactly that.



Some of them are and some aren't.  I have an opportunity to move this
workload to a hardware RAID5, so I may just do that and cut my losses :)


>> # grep md127 /proc/mounts
>> /dev/md127 /mesonet xfs
>> rw,noatime,attr2,delaylog,sunit=1024,swidth=4096,noquota 0 0
>>
>
> inode64 is not used; I suspect it would have helped a lot. Enabling it
> afterwards will not help with data that is already on disk, but it will
> help with new files.


Thanks for the tip, I'll try that out.
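
For the record, my (possibly naive) plan is simply to add inode64 to the
mount options in /etc/fstab, something along these lines (options here
are illustrative, not our exact set):

/dev/md127  /mesonet  xfs  noatime,inode64  0 0

followed by a full umount/mount of /mesonet, since I am not sure a plain
remount picks up inode64 on this kernel.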

daryl