[Linux-cluster] gfs2 bug

Steven Whitehouse swhiteho at redhat.com
Thu May 21 08:25:46 UTC 2009


Hi,

On Wed, 2009-05-20 at 19:43 +0200, Jürgen Knödlseder wrote:
> Dear all,
> 
> 
> I just got the attached kernel bug related to gfs2 handling. I have a
> 2 node cluster (PE 1950) connected to a MD3000 with several gfs2
> filesystems installed. I'm running kernel 2.6.27.7 together with
> cluster-2.03.11. Another machine has access to gfs2 via nfs. When the
> bug occurred, all machines were under substantial load, with
> correspondingly heavy disk access.
> 
> 
> Does this bug ring some bells? Any clues what's going on?
> 
I suspect that you have run into the bug that was introduced by the
patch removing glockd & scand, and that was recently fixed in Linus'
upstream kernel by the following patch:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0c7a531a200480c7bc447260376973d830da9069

It took a while to track down because it only happens when the entries
on the glock lru list are arranged in a particular order and memory
pressure is high enough to cause the glock shrinker to run. To make
matters worse, the oops never occurred at the point where the bug
actually was, since the ref count was always too high at that point.
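
To illustrate why a bug of this kind stays hidden, here is a minimal
generic sketch in C. It is not the GFS2 code and not the fix linked
above; struct obj, obj_put(), shrink_one() and run_scenario() are all
made-up names, and the "extra put" is only an assumed example of a ref
count imbalance. The object is marked as freed rather than actually
freed, so the demo stays well-defined.

```c
#include <stdbool.h>

/* Hypothetical refcounted object; NOT the real struct gfs2_glock. */
struct obj {
    int  ref;     /* number of outstanding references */
    bool on_lru;  /* true while the LRU list owns one reference */
    bool freed;   /* set instead of calling free() */
};

/* Drop one reference; the object is "freed" when the count hits zero. */
static void obj_put(struct obj *o)
{
    if (--o->ref == 0)
        o->freed = true;  /* real code would release the memory here */
}

/*
 * Shrinker pass over a single LRU entry. With buggy == true it drops
 * one reference more than the LRU list actually owns (assumed bug).
 */
static void shrink_one(struct obj *o, bool buggy)
{
    if (!o->on_lru)
        return;
    o->on_lru = false;
    obj_put(o);           /* the reference owned by the LRU list */
    if (buggy)
        obj_put(o);       /* extra put: silent while other holders remain */
}

/*
 * Users A and B each hold a reference and the object is on the LRU,
 * so the count starts at 3. Returns true if the object ends up freed
 * while B still holds its reference.
 */
bool run_scenario(bool buggy)
{
    struct obj o = { .ref = 3, .on_lru = true, .freed = false };

    shrink_one(&o, buggy);  /* 3 -> 2 (correct) or 3 -> 1 (buggy) */
    obj_put(&o);            /* A is done: 2 -> 1 or 1 -> 0 */
    return o.freed;         /* B has not dropped its reference yet */
}
```

In this sketch the count never reaches zero inside shrink_one() itself,
so nothing looks wrong there. With buggy set, the object only goes away
on A's perfectly legitimate put, and B is the one that would later
touch freed memory.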

So there is a good chance that this is what you have hit, and I'd
suggest using Linus' latest upstream kernel instead,

Steve.


> 
> Best regards,
> 
> 
> Jürgen
> - - - - - -
> 
> 
> May 20 17:41:40 einstein [87029.808089] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> May 20 17:41:40 einstein [87029.808095] IP: [<ffffffffa0031acf>] gfs2_glock_nq+0x133/0x28f [gfs2]
> May 20 17:41:40 einstein [87029.808112] PGD 4730f0067 PUD 48ec8c067 PMD 0
> May 20 17:41:40 einstein [87029.808115] Oops: 0000 [1] SMP
> May 20 17:41:40 einstein [87029.808117] CPU 0
> May 20 17:41:40 einstein [87029.808118] Modules linked in: gfs lock_dlm gfs2 dlm configfs
> May 20 17:41:40 einstein [87029.808123] Pid: 23868, comm: /opt/projects/g Not tainted 2.6.27.7 #3
> May 20 17:41:40 einstein [87029.808124] RIP: 0010:[<ffffffffa0031acf>] [<ffffffffa0031acf>] gfs2_glock_nq+0x133/0x28f [gfs2]
> May 20 17:41:40 einstein [87029.808135] RSP: 0000:ffff8803029edb78 EFLAGS: 00010202
> May 20 17:41:40 einstein [87029.808136] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000006
> May 20 17:41:40 einstein [87029.808137] RDX: 0000000000000000 RSI: 00007fff77f08540 RDI: 0000000000000040
> May 20 17:41:40 einstein [87029.808139] RBP: ffff8803029edbb8 R08: ffff8805a0d68e70 R09: ffff8804d08c4c90
> May 20 17:41:40 einstein [87029.808140] R10: ffff8803510d7938 R11: ffffffffa0047cc0 R12: ffff8803510d7978
> May 20 17:41:40 einstein [87029.808142] R13: ffff88058718d480 R14: 0000000000000000 R15: ffff88058718d480
> May 20 17:41:40 einstein [87029.808143] FS:  00007ff26feeb6f0(0000) GS:ffffffff817ca640(0000) knlGS:0000000000000000
> May 20 17:41:40 einstein [87029.808145] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> May 20 17:41:40 einstein [87029.808146] CR2: 0000000000000000 CR3: 0000000302805000 CR4: 00000000000006a0
> May 20 17:41:40 einstein [87029.808148] DR0: ffffffffff600000 DR1: ffffffffff600400 DR2: ffffffffff600800
> May 20 17:41:40 einstein [87029.808149] DR3: 0000000000000000 DR6: 00000000ffff0ff2 DR7: 0000000000000400
> May 20 17:41:40 einstein [87029.808151] Process /opt/projects/g (pid: 23868, threadinfo ffff8803029ec000, task ffff8804d08c4c40)
> May 20 17:41:40 einstein [87029.808152] Stack:  ffff8803510d7938 00000000634ff148 ffff880876431000 ffff8803510d7978
> May 20 17:41:40 einstein [87029.808155]  ffff880588556398 0000000000000000 ffff880876431000 ffff8803510d7800
> May 20 17:41:40 einstein [87029.808158]  ffff8803029edbd8 ffffffffa0032e8e ffff8805634feee0 ffff880588556398
> May 20 17:41:40 einstein [87029.808160] Call Trace:
> May 20 17:41:40 einstein [87029.808171]  [<ffffffffa0032e8e>] gfs2_glock_nq_init+0x17/0x2e [gfs2]
> May 20 17:41:40 einstein [87029.808181]  [<ffffffffa0033539>] gfs2_dinode_dealloc+0x116/0x1bd [gfs2]
> May 20 17:41:40 einstein [87029.808192]  [<ffffffffa003f601>] gfs2_delete_inode+0x111/0x1b5 [gfs2]
> May 20 17:41:40 einstein [87029.808204]  [<ffffffffa003f54c>] ? gfs2_delete_inode+0x5c/0x1b5 [gfs2]
> May 20 17:41:40 einstein [87029.808215]  [<ffffffffa003f4f0>] ? gfs2_delete_inode+0x0/0x1b5 [gfs2]
> May 20 17:41:40 einstein [87029.808220]  [<ffffffff810b9897>] generic_delete_inode+0xaa/0xff
> May 20 17:41:40 einstein [87029.808222]  [<ffffffff810b9901>] generic_drop_inode+0x15/0x11a
> May 20 17:41:40 einstein [87029.808233]  [<ffffffffa003f44b>] gfs2_drop_inode+0x54/0x58 [gfs2]
> May 20 17:41:40 einstein [87029.808235]  [<ffffffff810b8e6e>] iput+0x61/0x65
> May 20 17:41:40 einstein [87029.808236]  [<ffffffff810b6b88>] dentry_iput+0x8a/0x9a
> May 20 17:41:40 einstein [87029.808238]  [<ffffffff810b6c4d>] d_kill+0x38/0x58
> May 20 17:41:40 einstein [87029.808239]  [<ffffffff810b7e1e>] dput+0x101/0x10d
> May 20 17:41:40 einstein [87029.808242]  [<ffffffff810ae9d7>] do_revalidate+0x33/0x48
> May 20 17:41:40 einstein [87029.808244]  [<ffffffff810aebd2>] __lookup_hash+0x89/0xef
> May 20 17:41:40 einstein [87029.808245]  [<ffffffff810aec6d>] lookup_hash+0x35/0x3f
> May 20 17:41:40 einstein [87029.808247]  [<ffffffff810b099e>] sys_renameat+0x12d/0x1e3
> May 20 17:41:40 einstein [87029.808250]  [<ffffffff8107ef8b>] ? free_hot_page+0xb/0xd
> May 20 17:41:40 einstein [87029.808252]  [<ffffffff8107efa5>] ? __free_pages+0x18/0x21
> May 20 17:41:40 einstein [87029.808254]  [<ffffffff810a7088>] ? fsnotify_access+0x62/0x6a
> May 20 17:41:40 einstein [87029.808256]  [<ffffffff810a7d1a>] ? vfs_read+0xcd/0x102
> May 20 17:41:40 einstein [87029.808258]  [<ffffffff810b0a6a>] sys_rename+0x16/0x18
> May 20 17:41:40 einstein [87029.808260]  [<ffffffff8100c1fb>] system_call_fastpath+0x16/0x1b
> May 20 17:41:40 einstein [87029.808262]
> May 20 17:41:40 einstein [87029.808262]
> May 20 17:41:40 einstein [87029.808263] Code: 48 8d 73 30 bf 06 00 00 00 e8 66 df ff ff 85 c0 75 16 41 8b 54 24 24 31 c0 c1 ea 04 4d 85 f6 0f 94 c0 85 c2 4c 0f 45 f3 48 8b 1b <48> 8b 03 0f 18 08 49 8d 45 50 48 39 c3 0f 85 73 ff ff ff 4d 85
> May 20 17:41:40 einstein [87029.808282] RIP  [<ffffffffa0031acf>] gfs2_glock_nq+0x133/0x28f [gfs2]
> May 20 17:41:40 einstein [87029.808292]  RSP <ffff8803029edb78>
> May 20 17:41:40 einstein [87029.808293] CR2: 0000000000000000
> May 20 17:41:40 einstein [87029.808295] ---[ end trace af94d521028b5618 ]---
> 
> 



