
Re: [Linux-cluster] bug in GFS2?



Hi,

On Wed, 2013-09-25 at 16:25 +0200, Pavel Herrmann wrote:
> Hi
> 
> I am trying to build a two-node cluster for samba, but I'm having some GFS2
> issues.
> 
> The nodes themselves run as virtual machines in KVM (on different hosts), use
> gentoo kernel 3.10.7 (not sure what exact version of vanilla it is based on),
> and I use the cluster-next stack in somewhat minimal configuration (corosync-2
> with DLM-4, no pacemaker).
> 
> While testing my cluster (using smbtorture), everything works fine, but the
> moment I let users onto it, I get a kernel error that hangs the cluster
> (fencing is set up and working, but doesn't kick in for some reason).
> 
I suspect that this has been fixed, but without knowing exactly what
version of the kernel this is and what patches have been applied to it,
I'm afraid that I'm a bit in the dark. I don't think we've seen
anything like this recently relating to type 5 glocks.

Steve.

> this is what I get in kernel log:
> 
> Sep 25 07:10:12 fs2 kernel: [18024.888481] GFS2: fsid=fs_clust:homes.1: quota exceeded for user 104202
> Sep 25 07:10:18 fs2 kernel: [18030.335727] GFS2: fsid=fs_clust:homes.1: quota exceeded for user 104202
> Sep 25 07:10:23 fs2 kernel: [18035.994476] original: gfs2_inode_lookup+0x128/0x240 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.994482] pid: 25317
> Sep 25 07:10:23 fs2 kernel: [18035.994484] lock type: 5 req lock state : 3
> Sep 25 07:10:23 fs2 kernel: [18035.994491] new: gfs2_inode_lookup+0x128/0x240 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.994493] pid: 25317
> Sep 25 07:10:23 fs2 kernel: [18035.994494] lock type: 5 req lock state : 3
> Sep 25 07:10:23 fs2 kernel: [18035.994498]  G:  s:SH n:5/168b15e f:Iqob t:SH d:EX/0 a:0 v:0 r:4 m:50
> Sep 25 07:10:23 fs2 kernel: [18035.994506]   H: s:SH f:EH e:0 p:25317 [smbd] gfs2_inode_lookup+0x128/0x240 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.994549] general protection fault: 0000 [#1] SMP 
> Sep 25 07:10:23 fs2 kernel: [18035.994840] Modules linked in: iptable_filter ip_tables x_tables gfs2 dm_mod dlm sctp libcrc32c ipv6 configfs virtio_net i6300esb
> Sep 25 07:10:23 fs2 kernel: [18035.995617] CPU: 2 PID: 25317 Comm: smbd Not tainted 3.10.7-gentoo #10
> Sep 25 07:10:23 fs2 kernel: [18035.995910] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> Sep 25 07:10:23 fs2 kernel: [18035.996191] task: ffff8800b2aa1b00 ti: ffff8800a4a02000 task.ti: ffff8800a4a02000
> Sep 25 07:10:23 fs2 kernel: [18035.996546] RIP: 0010:[<ffffffff81053bcb>]  [<ffffffff81053bcb>] pid_task+0xb/0x40
> Sep 25 07:10:23 fs2 kernel: [18035.996999] RSP: 0018:ffff8800a4a03a10  EFLAGS: 00010206
> Sep 25 07:10:23 fs2 kernel: [18035.997253] RAX: 13270cbeaaf4957b RBX: ffff8800988f7710 RCX: 0000000000000006
> Sep 25 07:10:23 fs2 kernel: [18035.997592] RDX: 0000000000000007 RSI: 0000000000000000 RDI: 13270cbeaaf4957b
> Sep 25 07:10:23 fs2 kernel: [18035.997934] RBP: ffff8800a4b43ba0 R08: 000000000000000a R09: 0000000000000000
> Sep 25 07:10:23 fs2 kernel: [18035.998019] R10: 0000000000000191 R11: 0000000000000190 R12: 0000000000000000
> Sep 25 07:10:23 fs2 kernel: [18035.998019] R13: ffff8800a4b43bf0 R14: ffffffffa0133720 R15: ffff8800995bd988
> Sep 25 07:10:23 fs2 kernel: [18035.998019] FS:  00007f1846316740(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
> Sep 25 07:10:23 fs2 kernel: [18035.998019] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Sep 25 07:10:23 fs2 kernel: [18035.998019] CR2: 000000000122aae8 CR3: 000000009880c000 CR4: 00000000000007a0
> Sep 25 07:10:23 fs2 kernel: [18035.998019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep 25 07:10:23 fs2 kernel: [18035.998019] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Sep 25 07:10:23 fs2 kernel: [18035.998019] Stack:
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  ffffffffa0111f07 ffff8800b2aa1e70 ffffffffa011ffd8 0000000000000000
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  0000000000000000 0000000000000000 ffff880000000004 0000000000000032
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  ffff8800a4b43ba0 ffff8800a4b43bf0 00000000626f7149 ffff8800995bd988
> Sep 25 07:10:23 fs2 kernel: [18035.998019] Call Trace:
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa0111f07>] ? gfs2_dump_glock+0x1c7/0x360 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa011ffd8>] ? gfs2_inode_lookup+0x128/0x240 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81457b2b>] ? printk+0x4f/0x54
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81132e7d>] ? inode_init_always+0xed/0x1b0
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8145f245>] ? _raw_spin_lock+0x5/0x10
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa01138bb>] ? gfs2_glock_nq+0x30b/0x3e0 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa011ffe0>] ? gfs2_inode_lookup+0x130/0x240 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa0109195>] ? gfs2_dirent_search+0xe5/0x1c0 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa010a4aa>] ? gfs2_dir_search+0x4a/0x80 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa01202f7>] ? gfs2_lookupi+0xf7/0x1f0 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa01203b9>] ? gfs2_lookupi+0x1b9/0x1f0 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa0121821>] ? gfs2_lookup+0x21/0xa0 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8145f245>] ? _raw_spin_lock+0x5/0x10
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff811315e6>] ? d_alloc+0x76/0x90
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81124be3>] ? lookup_dcache+0xa3/0xd0
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff811246c4>] ? lookup_real+0x14/0x50
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81124c42>] ? __lookup_hash+0x32/0x50
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81459d64>] ? lookup_slow+0x3c/0xa2
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8145f245>] ? _raw_spin_lock+0x5/0x10
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81126edf>] ? path_lookupat+0x23f/0x780
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa011f169>] ? gfs2_getxattr+0x79/0xa0 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8112744f>] ? filename_lookup+0x2f/0xc0
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81125ccc>] ? getname_flags+0xbc/0x1a0
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8112a32c>] ? user_path_at_empty+0x5c/0xb0
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa01122c6>] ? gfs2_holder_uninit+0x16/0x30 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8111f8fd>] ? cp_new_stat+0x10d/0x120
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8111facf>] ? vfs_fstatat+0x3f/0x90
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8111fc02>] ? SYSC_newstat+0x12/0x30
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8105ed51>] ? lg_local_lock+0x11/0x20
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8145ff69>] ? system_call_fastpath+0x16/0x1b
> Sep 25 07:10:23 fs2 kernel: [18035.998019] Code: 31 f6 48 85 c0 74 0c 8b 50 04 48 c1 e2 05 48 8b 74 10 38 e9 28 ff ff ff 0f 1f 84 00 00 00 00 00 48 85 ff 74 23 89 f6 48 8d 04 f7 <48> 8b 40 08 48 85 c0 74 1c 48 8d 14 76 48 8d 14 d5 30 02 00 00 
> Sep 25 07:10:23 fs2 kernel: [18035.998019] RIP  [<ffffffff81053bcb>] pid_task+0xb/0x40
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  RSP <ffff8800a4a03a10>
> Sep 25 07:10:23 fs2 kernel: [18036.033702] ---[ end trace e5751bbc7d3a8d7c ]---
> 
> 
> Simple inspection of the gfs2 code showed this is caused by attempting a
> recursive lock. Two gfs2_inode_lookups are visible in the trace, not sure
> that is strictly relevant though.
> 
> This is followed by a (probably related) trace:
> 
> 
> Sep 25 07:10:24 fs2 kernel: [18036.162513] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
> Sep 25 07:10:24 fs2 kernel: [18036.164016] IP: [<ffffffffa011f7c6>] gfs2_permission+0x56/0x110 [gfs2]
> Sep 25 07:10:24 fs2 kernel: [18036.164016] PGD 989a3067 PUD 9886a067 PMD 0 
> Sep 25 07:10:24 fs2 kernel: [18036.164016] Oops: 0000 [#2] SMP 
> Sep 25 07:10:24 fs2 kernel: [18036.164016] Modules linked in: iptable_filter ip_tables x_tables gfs2 dm_mod dlm sctp libcrc32c ipv6 configfs virtio_net i6300esb
> Sep 25 07:10:24 fs2 kernel: [18036.164016] CPU: 1 PID: 25453 Comm: smbd Tainted: G      D      3.10.7-gentoo #10
> Sep 25 07:10:24 fs2 kernel: [18036.164016] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> Sep 25 07:10:24 fs2 kernel: [18036.164016] task: ffff8800afca0d80 ti: ffff8800a4a02000 task.ti: ffff8800a4a02000
> Sep 25 07:10:24 fs2 kernel: [18036.164016] RIP: 0010:[<ffffffffa011f7c6>]  [<ffffffffa011f7c6>] gfs2_permission+0x56/0x110 [gfs2]
> Sep 25 07:10:24 fs2 kernel: [18036.164016] RSP: 0018:ffff8800a4a03c08  EFLAGS: 00010286
> Sep 25 07:10:24 fs2 kernel: [18036.164016] RAX: ffffffff8145f245 RBX: 0000000000000040 RCX: 0000000000000000
> Sep 25 07:10:24 fs2 kernel: [18036.164016] RDX: ffff8800b5668f00 RSI: 0000000000000001 RDI: ffff8800a4b97ddc
> Sep 25 07:10:24 fs2 kernel: [18036.164016] RBP: ffff880099486e60 R08: 0000000000000061 R09: 0000000000000000
> Sep 25 07:10:24 fs2 kernel: [18036.164016] R10: ff48ad3954b34002 R11: d09e94939e979e85 R12: ffff8800a4b97ddc
> Sep 25 07:10:24 fs2 kernel: [18036.164016] R13: 0000000000000001 R14: ffff8800a4b97df8 R15: ffff8800afca0d80
> Sep 25 07:10:24 fs2 kernel: [18036.164016] FS:  00007f1846316740(0000) GS:ffff8800bfa80000(0000) knlGS:0000000000000000
> Sep 25 07:10:24 fs2 kernel: [18036.164016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Sep 25 07:10:24 fs2 kernel: [18036.164016] CR2: 0000000000000070 CR3: 000000009880c000 CR4: 00000000000007a0
> Sep 25 07:10:24 fs2 kernel: [18036.164016] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep 25 07:10:24 fs2 kernel: [18036.164016] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Sep 25 07:10:24 fs2 kernel: [18036.164016] Stack:
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  ffff8800994e0c00 ffffffff81125a8b ffff8800a4a03c18 ffff8800a4a03c18
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  0000000000000000 ffff8800bbba8d20 0000000800000003 0000000200000000
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  ffffffff8145f245 ffffffff8112ff5e ffff8800a4a03e08 0000000000000007
> Sep 25 07:10:24 fs2 kernel: [18036.164016] Call Trace:
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff81125a8b>] ? lookup_fast+0x1ab/0x2f0
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8145f245>] ? _raw_spin_lock+0x5/0x10
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8112ff5e>] ? dput+0x17e/0x220
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8112610a>] ? link_path_walk+0x23a/0x8b0
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff81126b9c>] ? path_init+0x30c/0x410
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff81126cf2>] ? path_lookupat+0x52/0x780
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8112744f>] ? filename_lookup+0x2f/0xc0
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff81125ccc>] ? getname_flags+0xbc/0x1a0
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8112a32c>] ? user_path_at_empty+0x5c/0xb0
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8111facf>] ? vfs_fstatat+0x3f/0x90
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8111fc02>] ? SYSC_newstat+0x12/0x30
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8111b420>] ? SyS_read+0x50/0xa0
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8145ff69>] ? system_call_fastpath+0x16/0x1b
> Sep 25 07:10:24 fs2 kernel: [18036.164016] Code: c6 50 65 48 8b 04 25 80 b7 00 00 48 8b 90 40 02 00 00 4c 39 f3 75 14 eb 1a 0f 1f 40 00 48 3b 53 18 74 12 48 8b 1b 49 39 de 74 08 <48> 8b 43 30 a8 40 75 ea 31 db 4c 89 e7 e8 e8 78 f0 e0 66 90 45 
> Sep 25 07:10:24 fs2 kernel: [18036.164016] RIP  [<ffffffffa011f7c6>] gfs2_permission+0x56/0x110 [gfs2]
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  RSP <ffff8800a4a03c08>
> Sep 25 07:10:24 fs2 kernel: [18036.164016] CR2: 0000000000000070
> Sep 25 07:10:24 fs2 kernel: [18036.218133] ---[ end trace e5751bbc7d3a8d7d ]---
> 
> Afterwards the log is filled with "INFO: rcu_sched self-detected stall" and
> NMI-caused backtraces.
> 
> Is this a known-and-fixed bug? Is there a way to prevent this?
> 
> 
> thanks
> Pavel Herrmann
> 
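
As a reading aid for the quoted trace, here is a minimal Python sketch that
decodes the "lock type: 5 req lock state : 3" lines and the "G:" glock dump
line. The function and table names are hypothetical (they are not part of any
GFS2 tool). The numeric tables reflect my understanding of the LM_TYPE_* and
LM_ST_* constants in the kernel's fs/gfs2/glock.h, so they should be checked
against the source of the kernel actually in use.

```python
# Assumed mapping of LM_TYPE_* values to short names (verify against glock.h).
GLOCK_TYPES = {
    1: "nondisk", 2: "inode", 3: "rgrp", 4: "meta", 5: "iopen",
    6: "flock", 7: "plock", 8: "quota", 9: "journal",
}

# Assumed mapping of LM_ST_* values to the abbreviations used in glock dumps.
LOCK_STATES = {0: "UN", 1: "EX", 2: "DF", 3: "SH"}


def parse_glock_line(line):
    """Split the 'G:' line of a glock dump into a dict of selected fields."""
    body = line.split("G:", 1)[1]
    # Every field has the form key:value, separated by whitespace.
    fields = dict(tok.split(":", 1) for tok in body.split())
    gtype, number = fields["n"].split("/")
    return {
        "state": fields["s"],
        "type": int(gtype),
        "type_name": GLOCK_TYPES.get(int(gtype), "unknown"),
        "number": int(number, 16),  # the glock number is printed in hex
        "flags": fields["f"],
        "target": fields["t"],
        "demote": fields["d"],
        "refs": int(fields["r"]),
    }


# The G: line exactly as it appears in the quoted log.
sample = ("Sep 25 07:10:23 fs2 kernel: [18035.994498]  "
          "G:  s:SH n:5/168b15e f:Iqob t:SH d:EX/0 a:0 v:0 r:4 m:50")
glock = parse_glock_line(sample)
print(glock["type_name"], glock["state"], LOCK_STATES[3])
```

Under these assumptions, the dump describes a type 5 (iopen) glock held in the
shared state, and the request that triggered the dump (state 3) was also for a
shared lock, from the same pid as the existing holder.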


