
[Linux-cluster] bug in GFS2?



Hi

I am trying to build a two-node cluster for Samba, but I'm having some GFS2
issues.

The nodes themselves run as virtual machines in KVM (on different hosts) and use
Gentoo kernel 3.10.7 (I am not sure which exact vanilla version it is based on).
I use the cluster-next stack in a somewhat minimal configuration (corosync-2
with DLM-4, no pacemaker).

While testing my cluster (using smbtorture), everything works fine, but the
moment I let users onto it, I get a kernel error that hangs the cluster
(fencing is set up and working, but doesn't kick in for some reason).

This is what I get in the kernel log:

Sep 25 07:10:12 fs2 kernel: [18024.888481] GFS2: fsid=fs_clust:homes.1: quota exceeded for user 104202
Sep 25 07:10:18 fs2 kernel: [18030.335727] GFS2: fsid=fs_clust:homes.1: quota exceeded for user 104202
Sep 25 07:10:23 fs2 kernel: [18035.994476] original: gfs2_inode_lookup+0x128/0x240 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.994482] pid: 25317
Sep 25 07:10:23 fs2 kernel: [18035.994484] lock type: 5 req lock state : 3
Sep 25 07:10:23 fs2 kernel: [18035.994491] new: gfs2_inode_lookup+0x128/0x240 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.994493] pid: 25317
Sep 25 07:10:23 fs2 kernel: [18035.994494] lock type: 5 req lock state : 3
Sep 25 07:10:23 fs2 kernel: [18035.994498]  G:  s:SH n:5/168b15e f:Iqob t:SH d:EX/0 a:0 v:0 r:4 m:50
Sep 25 07:10:23 fs2 kernel: [18035.994506]   H: s:SH f:EH e:0 p:25317 [smbd] gfs2_inode_lookup+0x128/0x240 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.994549] general protection fault: 0000 [#1] SMP 
Sep 25 07:10:23 fs2 kernel: [18035.994840] Modules linked in: iptable_filter ip_tables x_tables gfs2 dm_mod dlm sctp libcrc32c ipv6 configfs virtio_net i6300esb
Sep 25 07:10:23 fs2 kernel: [18035.995617] CPU: 2 PID: 25317 Comm: smbd Not tainted 3.10.7-gentoo #10
Sep 25 07:10:23 fs2 kernel: [18035.995910] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Sep 25 07:10:23 fs2 kernel: [18035.996191] task: ffff8800b2aa1b00 ti: ffff8800a4a02000 task.ti: ffff8800a4a02000
Sep 25 07:10:23 fs2 kernel: [18035.996546] RIP: 0010:[<ffffffff81053bcb>]  [<ffffffff81053bcb>] pid_task+0xb/0x40
Sep 25 07:10:23 fs2 kernel: [18035.996999] RSP: 0018:ffff8800a4a03a10  EFLAGS: 00010206
Sep 25 07:10:23 fs2 kernel: [18035.997253] RAX: 13270cbeaaf4957b RBX: ffff8800988f7710 RCX: 0000000000000006
Sep 25 07:10:23 fs2 kernel: [18035.997592] RDX: 0000000000000007 RSI: 0000000000000000 RDI: 13270cbeaaf4957b
Sep 25 07:10:23 fs2 kernel: [18035.997934] RBP: ffff8800a4b43ba0 R08: 000000000000000a R09: 0000000000000000
Sep 25 07:10:23 fs2 kernel: [18035.998019] R10: 0000000000000191 R11: 0000000000000190 R12: 0000000000000000
Sep 25 07:10:23 fs2 kernel: [18035.998019] R13: ffff8800a4b43bf0 R14: ffffffffa0133720 R15: ffff8800995bd988
Sep 25 07:10:23 fs2 kernel: [18035.998019] FS:  00007f1846316740(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
Sep 25 07:10:23 fs2 kernel: [18035.998019] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 25 07:10:23 fs2 kernel: [18035.998019] CR2: 000000000122aae8 CR3: 000000009880c000 CR4: 00000000000007a0
Sep 25 07:10:23 fs2 kernel: [18035.998019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 25 07:10:23 fs2 kernel: [18035.998019] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 25 07:10:23 fs2 kernel: [18035.998019] Stack:
Sep 25 07:10:23 fs2 kernel: [18035.998019]  ffffffffa0111f07 ffff8800b2aa1e70 ffffffffa011ffd8 0000000000000000
Sep 25 07:10:23 fs2 kernel: [18035.998019]  0000000000000000 0000000000000000 ffff880000000004 0000000000000032
Sep 25 07:10:23 fs2 kernel: [18035.998019]  ffff8800a4b43ba0 ffff8800a4b43bf0 00000000626f7149 ffff8800995bd988
Sep 25 07:10:23 fs2 kernel: [18035.998019] Call Trace:
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa0111f07>] ? gfs2_dump_glock+0x1c7/0x360 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa011ffd8>] ? gfs2_inode_lookup+0x128/0x240 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81457b2b>] ? printk+0x4f/0x54
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81132e7d>] ? inode_init_always+0xed/0x1b0
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8145f245>] ? _raw_spin_lock+0x5/0x10
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa01138bb>] ? gfs2_glock_nq+0x30b/0x3e0 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa011ffe0>] ? gfs2_inode_lookup+0x130/0x240 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa0109195>] ? gfs2_dirent_search+0xe5/0x1c0 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa010a4aa>] ? gfs2_dir_search+0x4a/0x80 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa01202f7>] ? gfs2_lookupi+0xf7/0x1f0 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa01203b9>] ? gfs2_lookupi+0x1b9/0x1f0 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa0121821>] ? gfs2_lookup+0x21/0xa0 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8145f245>] ? _raw_spin_lock+0x5/0x10
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff811315e6>] ? d_alloc+0x76/0x90
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81124be3>] ? lookup_dcache+0xa3/0xd0
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff811246c4>] ? lookup_real+0x14/0x50
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81124c42>] ? __lookup_hash+0x32/0x50
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81459d64>] ? lookup_slow+0x3c/0xa2
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8145f245>] ? _raw_spin_lock+0x5/0x10
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81126edf>] ? path_lookupat+0x23f/0x780
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa011f169>] ? gfs2_getxattr+0x79/0xa0 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8112744f>] ? filename_lookup+0x2f/0xc0
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81125ccc>] ? getname_flags+0xbc/0x1a0
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8112a32c>] ? user_path_at_empty+0x5c/0xb0
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa01122c6>] ? gfs2_holder_uninit+0x16/0x30 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8111f8fd>] ? cp_new_stat+0x10d/0x120
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8111facf>] ? vfs_fstatat+0x3f/0x90
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8111fc02>] ? SYSC_newstat+0x12/0x30
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8105ed51>] ? lg_local_lock+0x11/0x20
Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8145ff69>] ? system_call_fastpath+0x16/0x1b
Sep 25 07:10:23 fs2 kernel: [18035.998019] Code: 31 f6 48 85 c0 74 0c 8b 50 04 48 c1 e2 05 48 8b 74 10 38 e9 28 ff ff ff 0f 1f 84 00 00 00 00 00 48 85 ff 74 23 89 f6 48 8d 04 f7 <48> 8b 40 08 48 85 c0 74 1c 48 8d 14 76 48 8d 14 d5 30 02 00 00 
Sep 25 07:10:23 fs2 kernel: [18035.998019] RIP  [<ffffffff81053bcb>] pid_task+0xb/0x40
Sep 25 07:10:23 fs2 kernel: [18035.998019]  RSP <ffff8800a4a03a10>
Sep 25 07:10:23 fs2 kernel: [18036.033702] ---[ end trace e5751bbc7d3a8d7c ]---


A simple inspection of the gfs2 code showed this is caused by attempting a
recursive lock. Two gfs2_inode_lookup calls are visible in the trace, though
I am not sure that is strictly relevant.
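
To make the recursive-lock reading concrete, here is a minimal sketch that
parses the "original:" / "new:" lines printed just before the fault and checks
whether both lock requests come from the same pid for the same lock type. The
function names (parse_requests, looks_recursive) are hypothetical helpers made
up for illustration, and the check is only a heuristic on the log text, not
what the kernel itself does.

```python
import re

# Excerpt of the kernel log above, with the syslog prefix and timestamps trimmed.
LOG = """\
original: gfs2_inode_lookup+0x128/0x240 [gfs2]
pid: 25317
lock type: 5 req lock state : 3
new: gfs2_inode_lookup+0x128/0x240 [gfs2]
pid: 25317
lock type: 5 req lock state : 3
"""


def parse_requests(text):
    """Collect the 'original' and 'new' lock requests into a dict of dicts."""
    requests = {}
    current = None
    for line in text.splitlines():
        m = re.match(r"(original|new): (\S+)", line)
        if m:
            # Start a new record; following pid/lock lines belong to it.
            current = requests.setdefault(m.group(1), {"caller": m.group(2)})
            continue
        if current is None:
            continue
        m = re.match(r"pid: (\d+)", line)
        if m:
            current["pid"] = int(m.group(1))
            continue
        m = re.match(r"lock type: (\d+) req lock state : (\d+)", line)
        if m:
            current["type"] = int(m.group(1))
            current["state"] = int(m.group(2))
    return requests


def looks_recursive(requests):
    """Heuristic: same pid asks for the same lock type in both requests."""
    a = requests.get("original")
    b = requests.get("new")
    if a is None or b is None:
        return False
    return a.get("pid") == b.get("pid") and a.get("type") == b.get("type")


if __name__ == "__main__":
    print(looks_recursive(parse_requests(LOG)))
```

On the excerpt above both requests carry pid 25317, lock type 5 and requested
state 3, from the same call site, which is what led me to the recursive-lock
guess.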

This is followed by a (probably related) trace:


Sep 25 07:10:24 fs2 kernel: [18036.162513] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
Sep 25 07:10:24 fs2 kernel: [18036.164016] IP: [<ffffffffa011f7c6>] gfs2_permission+0x56/0x110 [gfs2]
Sep 25 07:10:24 fs2 kernel: [18036.164016] PGD 989a3067 PUD 9886a067 PMD 0 
Sep 25 07:10:24 fs2 kernel: [18036.164016] Oops: 0000 [#2] SMP 
Sep 25 07:10:24 fs2 kernel: [18036.164016] Modules linked in: iptable_filter ip_tables x_tables gfs2 dm_mod dlm sctp libcrc32c ipv6 configfs virtio_net i6300esb
Sep 25 07:10:24 fs2 kernel: [18036.164016] CPU: 1 PID: 25453 Comm: smbd Tainted: G      D      3.10.7-gentoo #10
Sep 25 07:10:24 fs2 kernel: [18036.164016] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Sep 25 07:10:24 fs2 kernel: [18036.164016] task: ffff8800afca0d80 ti: ffff8800a4a02000 task.ti: ffff8800a4a02000
Sep 25 07:10:24 fs2 kernel: [18036.164016] RIP: 0010:[<ffffffffa011f7c6>]  [<ffffffffa011f7c6>] gfs2_permission+0x56/0x110 [gfs2]
Sep 25 07:10:24 fs2 kernel: [18036.164016] RSP: 0018:ffff8800a4a03c08  EFLAGS: 00010286
Sep 25 07:10:24 fs2 kernel: [18036.164016] RAX: ffffffff8145f245 RBX: 0000000000000040 RCX: 0000000000000000
Sep 25 07:10:24 fs2 kernel: [18036.164016] RDX: ffff8800b5668f00 RSI: 0000000000000001 RDI: ffff8800a4b97ddc
Sep 25 07:10:24 fs2 kernel: [18036.164016] RBP: ffff880099486e60 R08: 0000000000000061 R09: 0000000000000000
Sep 25 07:10:24 fs2 kernel: [18036.164016] R10: ff48ad3954b34002 R11: d09e94939e979e85 R12: ffff8800a4b97ddc
Sep 25 07:10:24 fs2 kernel: [18036.164016] R13: 0000000000000001 R14: ffff8800a4b97df8 R15: ffff8800afca0d80
Sep 25 07:10:24 fs2 kernel: [18036.164016] FS:  00007f1846316740(0000) GS:ffff8800bfa80000(0000) knlGS:0000000000000000
Sep 25 07:10:24 fs2 kernel: [18036.164016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 25 07:10:24 fs2 kernel: [18036.164016] CR2: 0000000000000070 CR3: 000000009880c000 CR4: 00000000000007a0
Sep 25 07:10:24 fs2 kernel: [18036.164016] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 25 07:10:24 fs2 kernel: [18036.164016] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 25 07:10:24 fs2 kernel: [18036.164016] Stack:
Sep 25 07:10:24 fs2 kernel: [18036.164016]  ffff8800994e0c00 ffffffff81125a8b ffff8800a4a03c18 ffff8800a4a03c18
Sep 25 07:10:24 fs2 kernel: [18036.164016]  0000000000000000 ffff8800bbba8d20 0000000800000003 0000000200000000
Sep 25 07:10:24 fs2 kernel: [18036.164016]  ffffffff8145f245 ffffffff8112ff5e ffff8800a4a03e08 0000000000000007
Sep 25 07:10:24 fs2 kernel: [18036.164016] Call Trace:
Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff81125a8b>] ? lookup_fast+0x1ab/0x2f0
Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8145f245>] ? _raw_spin_lock+0x5/0x10
Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8112ff5e>] ? dput+0x17e/0x220
Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8112610a>] ? link_path_walk+0x23a/0x8b0
Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff81126b9c>] ? path_init+0x30c/0x410
Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff81126cf2>] ? path_lookupat+0x52/0x780
Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8112744f>] ? filename_lookup+0x2f/0xc0
Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff81125ccc>] ? getname_flags+0xbc/0x1a0
Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8112a32c>] ? user_path_at_empty+0x5c/0xb0
Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8111facf>] ? vfs_fstatat+0x3f/0x90
Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8111fc02>] ? SYSC_newstat+0x12/0x30
Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8111b420>] ? SyS_read+0x50/0xa0
Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8145ff69>] ? system_call_fastpath+0x16/0x1b
Sep 25 07:10:24 fs2 kernel: [18036.164016] Code: c6 50 65 48 8b 04 25 80 b7 00 00 48 8b 90 40 02 00 00 4c 39 f3 75 14 eb 1a 0f 1f 40 00 48 3b 53 18 74 12 48 8b 1b 49 39 de 74 08 <48> 8b 43 30 a8 40 75 ea 31 db 4c 89 e7 e8 e8 78 f0 e0 66 90 45 
Sep 25 07:10:24 fs2 kernel: [18036.164016] RIP  [<ffffffffa011f7c6>] gfs2_permission+0x56/0x110 [gfs2]
Sep 25 07:10:24 fs2 kernel: [18036.164016]  RSP <ffff8800a4a03c08>
Sep 25 07:10:24 fs2 kernel: [18036.164016] CR2: 0000000000000070
Sep 25 07:10:24 fs2 kernel: [18036.218133] ---[ end trace e5751bbc7d3a8d7d ]---

Afterwards the log is filled with "INFO: rcu_sched self-detected stall"
messages and NMI-caused backtraces.

Is this a known-and-fixed bug? Is there a way to prevent this?


thanks
Pavel Herrmann

