[Linux-cluster] gfs2_logd eating 99% io, random filesystem freezes

Kveri kveri at kveri.com
Sun Sep 2 15:02:01 UTC 2012


Hello,

we're running GFS2 on top of DRBD; the cluster was created in an incomplete state (only one node so far). When we run dd if=/dev/zero of=/gfs_partition/file, every filesystem on that machine freezes for 10-20 seconds every 1-2 minutes: even ls /etc hangs in D state for 10-20 seconds. Sometimes the hang lasts more than 2 minutes and a hung-task message gets logged in dmesg.
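For reference, here is the exact workload, plus a variant we have not tried yet that bypasses the page cache. The bs/count values are arbitrary (our original dd ran with defaults until interrupted); oflag=direct is just an idea for narrowing things down, not something we have tested:

```shell
# Buffered write -- this is the workload that triggers the freezes:
dd if=/dev/zero of=/gfs_partition/file bs=1M count=4096

# Same write with O_DIRECT, skipping the page cache; if the stalls
# disappear, dirty-page writeback (the flush-XXX:X thread) would seem
# to be the trigger rather than gfs2 itself:
dd if=/dev/zero of=/gfs_partition/file bs=1M count=4096 oflag=direct
```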

iotop shows the gfs2_logd and flush-XXX:X kernel threads consuming 99% of I/O.

GFS2 is mounted with the rw,noatime,nodiratime,hostdata=jid=0 options.

gettune options:
quota_warn_period = 10
quota_quantum = 60
max_readahead = 262144
complain_secs = 10
statfs_slow = 0
quota_simul_sync = 64
statfs_quantum = 30
quota_scale = 1.0000   (1, 1)
new_files_jdata = 0

The server runs kernel 3.2.0-25, 64-bit.

dmesg error (we did echo 1 > /proc/sys/kernel/hung_task_timeout_secs to catch the stalls quickly, but we also tested with the default of 120 seconds):
[  818.882147] INFO: task ls:3531 blocked for more than 1 seconds.
[  818.882479] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  818.882929] ls              D ffff8803639364e0     0  3531   3269 0x00000000
[  818.882932]  ffff88033c789c58 0000000000000082 ffff88033c789be8 ffff8801e9c33780
[  818.882936]  ffff88033c789fd8 ffff88033c789fd8 ffff88033c789fd8 0000000000013780
[  818.882940]  ffff8801e5a72e00 ffff8801e5b32e00 0000000000000286 ffff88033c789ce0
[  818.882943] Call Trace:
[  818.882950]  [<ffffffffa02d2300>] ? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
[  818.882953]  [<ffffffff816579cf>] schedule+0x3f/0x60
[  818.882959]  [<ffffffffa02d230e>] gfs2_glock_holder_wait+0xe/0x20 [gfs2]
[  818.882963]  [<ffffffff8165829f>] __wait_on_bit+0x5f/0x90
[  818.882965]  [<ffffffff816598de>] ? _raw_spin_lock+0xe/0x20
[  818.882972]  [<ffffffffa02d2300>] ? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
[  818.882975]  [<ffffffff8165834c>] out_of_line_wait_on_bit+0x7c/0x90
[  818.882978]  [<ffffffff8108aa90>] ? autoremove_wake_function+0x40/0x40
[  818.882985]  [<ffffffffa02d4467>] gfs2_glock_wait+0x47/0x90 [gfs2]
[  818.882992]  [<ffffffffa02d5d48>] gfs2_glock_nq+0x318/0x440 [gfs2]
[  818.882998]  [<ffffffff81161cff>] ? kmem_cache_free+0x2f/0x110
[  818.883007]  [<ffffffffa02e3ccb>] gfs2_getattr+0xbb/0xf0 [gfs2]
[  818.883015]  [<ffffffffa02e3cc2>] ? gfs2_getattr+0xb2/0xf0 [gfs2]
[  818.883020]  [<ffffffff8117c79e>] vfs_getattr+0x4e/0x80
[  818.883023]  [<ffffffff8117c81e>] vfs_fstatat+0x4e/0x70
[  818.883026]  [<ffffffff8117c85e>] vfs_lstat+0x1e/0x20
[  818.883029]  [<ffffffff8117c9fa>] sys_newlstat+0x1a/0x40
[  818.883033]  [<ffffffff811971cf>] ? mntput+0x1f/0x30
[  818.883036]  [<ffffffff81182652>] ? path_put+0x22/0x30
[  818.883039]  [<ffffffff8119bc1b>] ? sys_lgetxattr+0x5b/0x70
[  818.883042]  [<ffffffff81661ec2>] system_call_fastpath+0x16/0x1b
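The trace above is from ls; during a freeze we can gather the same information for every blocked task with a quick loop like this (just a sketch: it lists any task in D state and dumps its kernel stack, which needs root to read):

```shell
# While the freeze is happening, list every task stuck in
# uninterruptible sleep (D state) and dump its kernel stack:
for p in $(ps -eo pid,stat | awk '$2 ~ /^D/ {print $1}'); do
    echo "=== pid $p ($(cat /proc/$p/comm 2>/dev/null)) ==="
    cat /proc/$p/stack 2>/dev/null
done
```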


What could be the problem?

Thank you.

Martin
