[Linux-cachefs] hang on __fscache_wait_on_page_write and __nfs_fscache_invalidate_page

Brian Kroth bpkroth at gmail.com
Tue Jan 24 17:06:17 UTC 2012


I thought I'd sent this in December, but I didn't see it in the 
archives, so I thought I'd take another crack at it.  

The problem seems to be spreading to clients mounting NFS exports 
hosted on Linux servers as well.

Any help would be appreciated.

Thanks,
Brian

----- Forwarded message from Brian Kroth <bpkroth at gmail.com> -----

Date: Fri, 16 Dec 2011 10:36:45 -0600
From: Brian Kroth <bpkroth at gmail.com>
To: linux-cachefs at redhat.com
Subject: hang on __fscache_wait_on_page_write and
	__nfs_fscache_invalidate_page
Reply-To: Brian Kroth <bpkroth at gmail.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
X-Operating-System: Linux 2.6.32-bpo.3-amd64 x86_64

Hello all.  I hope this is the right place for this question.  If not, 
please let me know where else it should go.

I've got some machines that from time to time will hang waiting on an 
umount.nfs command and eventually spit out a call trace referencing 
__fscache_wait_on_page_write and __nfs_fscache_invalidate_page.

The situation is as follows:

Machines are configured to NFS-automount a set of filesystems.  Some 
are served by an EMC Celerra NAS, others by Linux hosts, and others by 
Solaris hosts.

Most of the mounts are configured with the fsc option so that data 
read from them is cached locally by fscache for faster access the next 
time around.
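
For reference, the equivalent manual mount for the problem volume below 
would look roughly like this (the mount point here is just a 
placeholder; the actual automount map entry is shown further down):

# mount -t nfs -o hard,vers=3,nosuid,fsc filespace:/groups_cnerg /mnt/cnerg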

On one of these mounts, on a couple of client nodes, I've now seen the 
following situation about a half dozen times:

1) The automounter attempts to unmount a particular volume due to 
inactivity.  I've used lsof to confirm that nothing was using the 
mount, so in theory this should be a safe operation.

2) The umount.nfs process blocks with the following call trace:

[  722.136057] INFO: task umount.nfs:926 blocked for more than 120 seconds.
[  722.136061] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  722.136065] umount.nfs      D ffff88012bb807a0     0   926      1 0x00000000
[  722.136071]  ffff88012bb807a0 0000000000000082 ffffea0002179998 ffff8800c15ad1c0
[  722.136077]  0000000000013b40 ffff880016019fd8 ffff880016019fd8 0000000000013b40
[  722.136083]  ffff88012bb807a0 ffff880016018010 0000000000000246 0000000000000001
[  722.136088] Call Trace:
[  722.136100]  [<ffffffffa0ef8c3a>] ? __fscache_wait_on_page_write+0x83/0x9d [fscache]
[  722.136106]  [<ffffffff8105f3b1>] ? wake_up_bit+0x20/0x20
[  722.136120]  [<ffffffffa0f444c1>] ? __nfs_fscache_invalidate_page+0x45/0x7a [nfs]
[  722.136126]  [<ffffffff810bf556>] ? truncate_inode_page+0x45/0x78
[  722.136130]  [<ffffffff810bfa53>] ? truncate_inode_pages_range+0xca/0x2f2
[  722.136140]  [<ffffffffa0f1f3bd>] ? nfs_access_free_entry+0x15/0x1f [nfs]
[  722.136150]  [<ffffffffa0f21477>] ? nfs_access_zap_cache+0x116/0x12a [nfs]
[  722.136161]  [<ffffffffa0f2380c>] ? nfs_evict_inode+0x12/0x23 [nfs]
[  722.136166]  [<ffffffff8110e5b9>] ? evict+0x79/0x116
[  722.136169]  [<ffffffff8110e682>] ? dispose_list+0x2c/0x36
[  722.136173]  [<ffffffff8110e865>] ? evict_inodes+0xc7/0xea
[  722.136179]  [<ffffffff810fd872>] ? generic_shutdown_super+0x4e/0xd4
[  722.136183]  [<ffffffff810fd960>] ? kill_anon_super+0x9/0x40
[  722.136194]  [<ffffffffa0f27996>] ? nfs_kill_super+0x15/0x28 [nfs]
[  722.136199]  [<ffffffff810fd9d0>] ? deactivate_locked_super+0x1e/0x44
[  722.136203]  [<ffffffff81112c9c>] ? sys_umount+0x2ea/0x315
[  722.136207]  [<ffffffff81112039>] ? mntput_no_expire+0x8f/0x14f
[  722.136212]  [<ffffffff81339392>] ? system_call_fastpath+0x16/0x1b

(surprisingly early after boot, don't you think?)

3) At some point later, something tries to access the volume again, 
causing automount to spawn a mount process, which also hangs since 
there's already a umount.nfs process stuck for that volume.


Since the processes are blocked in the kernel, there's not much I can 
do to resolve the issue other than reboot the machine.
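
For what it's worth, this is roughly what I've been collecting before 
rebooting, plus a couple of extra things that might be worth grabbing 
(the mount point is a placeholder, the pid is from the trace above, the 
sysrq write assumes sysrq is enabled, and /proc/<pid>/stack assumes 
CONFIG_STACKTRACE):

# lsof /groups/cnerg              # double-check nothing still holds the mount
# cat /proc/926/stack             # where the stuck umount.nfs is sleeping
# echo w > /proc/sysrq-trigger    # dump all blocked (D-state) tasks to dmesg
# dmesg > /tmp/hung-umount.dmesg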

The clients (~200 of them) are all Debian Squeeze machines running a 
2.6.39 kernel (from debian-backports).  They run cachefilesd 0.9-3 
with the following configuration:
dir /var/cache/fscache
tag mycache
brun 15%
bcull 12%
bstop 10%
frun 15%
fcull 12%
fstop 10%
culltable 16
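
(As I understand it, the b* thresholds are percentages of free blocks 
and the f* thresholds percentages of free files on the cache 
filesystem, so a quick sanity check of where the cache partition sits 
relative to them is just:)

# df /var/cache/fscache     # block usage vs. brun/bcull/bstop
# df -i /var/cache/fscache  # inode usage vs. frun/fcull/fstop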

The backing store for the cache is mounted as follows:
# mount | grep fscache
/dev/mapper/vg-fscache on /var/cache/fscache type ext4 (rw,relatime,user_xattr)
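
(For completeness, the corresponding /etc/fstab entry would be roughly 
the following; as far as I know cachefiles needs extended attribute 
support (user_xattr) on the backing filesystem:)

/dev/mapper/vg-fscache  /var/cache/fscache  ext4  relatime,user_xattr  0  2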

The automount that seems to be having the problems looks like this:
cnerg		-hard,vers=3,nosuid,fsc	filespace:/groups_cnerg

It's served by the EMC Celerra.  However, none of the other 35 
automounts served by that host, nor any of the ones served by the 
Linux/Solaris hosts, seem to have exhibited this problem.  That's not 
necessarily a causal relationship, but it is suspicious.  It might just 
be the way the group uses the volume, though I haven't been able to 
pinpoint anything along those lines either.

More telling, I think, is that since I removed the fsc mount option 
from that automount, I haven't seen the problem again.  However, I 
would like to keep that option enabled.

The trouble with debugging this is that I don't know how to reproduce 
the problem.  If I did, I'd capture some NFS packet traces to verify 
that nothing odd is happening on the wire.
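
(If anyone has an idea for how to trigger it, the capture I have in 
mind is something along these lines; the interface name is a 
placeholder:)

# tcpdump -i eth0 -s 0 -w /tmp/cnerg-nfs.pcap host filespace and port 2049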

Given the call trace, however, it seems to me that fscache is failing 
for some reason.
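
(One thing I may try in the meantime, assuming the kernel was built 
with the fscache statistics/debugging options, is to watch fscache's 
stats and turn up its debug output; the debug value is a bitmask 
described in Documentation/filesystems/caching/fscache.txt:)

# cat /proc/fs/fscache/stats                      # needs CONFIG_FSCACHE_STATS
# echo 63 > /sys/module/fscache/parameters/debug  # needs CONFIG_FSCACHE_DEBUG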

That's what brings me here.  Does anyone have insight into known 
problems with a setup like this, or suggestions for other things I 
could look into?

Let me know if you need any more details.  I have a handful of ps, 
lsof, and dmesg dumps I can share.

Thanks,
Brian



----- End forwarded message -----