[Linux-cachefs] 2.6.30 kernel woes

Daire.Byrne at framestore.com Daire.Byrne at framestore.com
Wed Aug 26 11:54:01 UTC 2009


David,

I recently upgraded our kernel from 2.6.29 (+fscache-47 patches) to 2.6.30.5 and now FS-Cache seems to have become pretty unstable. I have already applied the following three patches:

  https://www.redhat.com/archives/linux-cachefs/2009-August/msg00004.html
  http://bugzilla.kernel.org/show_bug.cgi?id=13469
  https://www.redhat.com/archives/linux-cachefs/2009-July/msg00024.html

Basically at random times the entire system hangs and I get no useful debugging info.  Sometimes the system just hangs for 3-10 seconds, loses ticks and then returns but more often than not the entire system dies. If I turn off FS-Cache then everything seems to work fine so it does seem to be directly related to that. Are there any other patches for 2.6.30 that I should be applying other than the three mentioned above?

There also seems to be some weird interaction with automount/autofs. For example doing a umount on an fsc enabled mount after the timeout period invariably hangs:

Aug 26 12:13:05 syslap kernel: INFO: task umount.nfs:2010 blocked for more than 120 seconds.
Aug 26 12:13:05 syslap kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 26 12:13:05 syslap kernel: umount.nfs    D ffff88000100b920     0  2010      1
Aug 26 12:13:05 syslap kernel: ffff880061cefab8 0000000000000086 0000000000000002 0000000004b0062c
Aug 26 12:13:05 syslap kernel: ffff880061cefa68 0000000000000206 ffff880061cefa58 00000000810b0a37
Aug 26 12:13:05 syslap kernel: ffff88007e911af8 000000000000e2e8 ffff88007e911af8 0000000000012cc0
Aug 26 12:13:05 syslap kernel: Call Trace:
Aug 26 12:13:05 syslap kernel: [<ffffffff81497726>] schedule+0x21/0x49
Aug 26 12:13:05 syslap kernel: [<ffffffffa03442e3>] __fscache_wait_on_page_write+0x9c/0xcd [fscache]
Aug 26 12:13:05 syslap kernel: [<ffffffff81071b63>] ? autoremove_wake_function+0x0/0x5f
Aug 26 12:13:05 syslap kernel: [<ffffffffa043b3f4>] __nfs_fscache_invalidate_page+0x5e/0xbd [nfs]
Aug 26 12:13:05 syslap kernel: [<ffffffffa0412b14>] nfs_invalidate_page+0x6b/0x87 [nfs]
Aug 26 12:13:05 syslap kernel: [<ffffffff810df064>] do_invalidatepage+0x38/0x4e
Aug 26 12:13:05 syslap kernel: [<ffffffff810df796>] truncate_complete_page+0x39/0x7e
Aug 26 12:13:05 syslap kernel: [<ffffffff810df8d2>] truncate_inode_pages_range+0xf7/0x36d
Aug 26 12:13:05 syslap kernel: [<ffffffff810dfb6d>] truncate_inode_pages+0x25/0x3c
Aug 26 12:13:05 syslap kernel: [<ffffffff8112b072>] dispose_list+0x54/0x10f
Aug 26 12:13:05 syslap kernel: [<ffffffff8112b432>] invalidate_inodes+0xf1/0x123
Aug 26 12:13:05 syslap kernel: [<ffffffff811187d6>] generic_shutdown_super+0x62/0x112
Aug 26 12:13:05 syslap kernel: [<ffffffff8104f4e6>] ? default_wake_function+0x0/0x3b
Aug 26 12:13:05 syslap kernel: [<ffffffff81118915>] kill_anon_super+0x25/0x66
Aug 26 12:13:05 syslap kernel: [<ffffffffa0417ef3>] nfs_kill_super+0x3b/0x67 [nfs]
Aug 26 12:13:05 syslap kernel: [<ffffffff8111910a>] deactivate_super+0x68/0x96
Aug 26 12:13:05 syslap kernel: [<ffffffff8112f779>] mntput_no_expire+0x103/0x154
Aug 26 12:13:05 syslap kernel: [<ffffffff8112fdfc>] sys_umount+0x2fc/0x341
Aug 26 12:13:05 syslap kernel: [<ffffffff81117290>] ? fput+0x2d/0x43
Aug 26 12:13:05 syslap kernel: [<ffffffff81013082>] system_call_fastpath+0x16/0x1b

In fact I've noticed that you cannot restart autofs when there are active fsc mounts but you can when there are no fsc mounts active. I have also had a couple of instances of this:

Aug 25 18:16:21 syslap kernel: Pid: 11953, comm: kslowd Tainted: P           2.6.30.5-32.fc11.x86_64 #1 Precision M90
Aug 25 18:16:21 syslap kernel: RIP: 0010:[<ffffffffa10b7de6>]  [<ffffffffa10b7de6>] fscache_object_slow_work_execute+0x231/0x68b [fscache]
Aug 25 18:16:21 syslap kernel: RSP: 0000:ffff88003790fe60  EFLAGS: 00010246
Aug 25 18:16:21 syslap kernel: RAX: 0000000000000000 RBX: ffff8800744d58c0 RCX: ffff8800744d5948
Aug 25 18:16:21 syslap kernel: RDX: ffffffff816adde0 RSI: 0000000000000003 RDI: ffff8800744d58dc
Aug 25 18:16:21 syslap kernel: RBP: ffff88003790fea0 R08: ffff8800745adec8 R09: ffff88003790feb0
Aug 25 18:16:21 syslap kernel: R10: 0000000084908654 R11: ffff88003790faf0 R12: ffff8800744d5938
Aug 25 18:16:21 syslap kernel: R13: ffff8800744d5938 R14: 0000000000000001 R15: 0000000000000004
Aug 25 18:16:21 syslap kernel: FS:  0000000000000000(0000) GS:ffff880001010000(0000) knlGS:0000000000000000
Aug 25 18:16:21 syslap kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Aug 25 18:16:21 syslap kernel: CR2: 0000000000000040 CR3: 000000006a1b3000 CR4: 00000000000026e0
Aug 25 18:16:21 syslap kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 25 18:16:21 syslap kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 25 18:16:21 syslap kernel: Process kslowd (pid: 11953, threadinfo ffff88003790e000, task ffff88006a135d40)
Aug 25 18:16:21 syslap kernel: Stack:
Aug 25 18:16:21 syslap kernel: ffff88003790fea0 0000000084908654 ffff88003790fea0 0000000000000064
Aug 25 18:16:21 syslap kernel: ffff88003790feb0 ffff8800744d5938 0000000000000001 0000000000000004
Aug 25 18:16:21 syslap kernel: ffff88003790ff10 ffffffff810d2b96 0000000000000000 ffff88006a135d40
Aug 25 18:16:21 syslap kernel: Call Trace:
Aug 25 18:16:21 syslap kernel: [<ffffffff810d2b96>] slow_work_thread+0x2c7/0x495
Aug 25 18:16:21 syslap kernel: [<ffffffff81071bc7>] ? autoremove_wake_function+0x0/0x5f
Aug 25 18:16:21 syslap kernel: [<ffffffff810d28cf>] ? slow_work_thread+0x0/0x495
Aug 25 18:16:21 syslap kernel: [<ffffffff810716bd>] kthread+0x6d/0xae
Aug 25 18:16:21 syslap kernel: [<ffffffff8101413a>] child_rip+0xa/0x20
Aug 25 18:16:21 syslap kernel: [<ffffffff81013afd>] ? restore_args+0x0/0x30
Aug 25 18:16:21 syslap kernel: [<ffffffff81071650>] ? kthread+0x0/0xae
Aug 25 18:16:21 syslap kernel: [<ffffffff81014130>] ? child_rip+0x0/0x20
Aug 25 18:16:21 syslap kernel: Code: 44 24 b8 41 23 44 24 b0 0f bd c8 0f 44 ca 39 d1 89 c8 0f 8d 5b 03 00 00 e9 07 04 00 00 48 8d 7b 1c e8 4d 1f 3e e0 49 8b 44 24 f0 <f0> 0f ba 70 40 01 19 d2 85 d2 74 13 49 8b 7c 24 f0 be 01 00 00
Aug 25 18:16:21 syslap kernel: RIP  [<ffffffffa10b7de6>] fscache_object_slow_work_execute+0x231/0x68b [fscache]
Aug 25 18:16:21 syslap kernel: RSP <ffff88003790fe60>
Aug 25 18:16:21 syslap kernel: CR2: 0000000000000040

I will try and get some more useful logs to you. I'll probably have to downgrade to 2.6.29 again to get the system working again but I figured you would be interested in the reports anyway.

Cheers,

Daire




More information about the Linux-cachefs mailing list