[Linux-cluster] GFS2 crashes - sys_rename

5hosting Team office at 5hosting.com
Wed Apr 17 19:48:58 UTC 2013


Hey Larry,

To which version did you roll back? Did you have to fsck the cluster, or did
it work out of the box? Is your cluster stable right now?

Thanks in advance, Jürgen

 

From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of laurence.schuler
Sent: Wednesday, 17 April 2013 21:35
To: linux-cluster at redhat.com
Subject: Re: [Linux-cluster] GFS2 crashes - sys_rename

 

There's a similar bug behind this same crash in the 358 kernel. It's a
different bug. I rolled back to the previous kernel for now; Red Hat should
have a fix soon.

--larry

On 04/17/2013 03:02 PM, 5hosting Team wrote:

Hey guys,

 

We run a 40-node web cluster (only Apache and PHP processes) and the nodes
keep crashing with a kernel panic. To me it looks like renaming a
file/directory isn't working. I found someone who posted the same problem a
few days ago; it was supposed to be fixed in kernel 2.6.32-358.2.1.el6, but
that's the kernel we're running. We also ran fsck last night to check the
file system for problems. So something doesn't seem right.

 

Here are three crash logs from three different nodes:

Apr 17 20:20:16 001 kernel: Modules linked in: gfs2 dlm configfs sg sd_mod
crc_t10dif ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr
iscsi_tcp iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6
nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables serio_raw
i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ahci video output
e1000e dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss
nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi
cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi
scsi_transport_iscsi [last unloaded: scsi_wait_scan]

Apr 17 20:20:16 001 kernel: 

Apr 17 20:20:16 001 kernel: Pid: 2915, comm: php-cgi Not tainted
2.6.32-358.2.1.el6.x86_64 #1 Supermicro X9SCL/X9SCM/X9SCL/X9SCM

Apr 17 20:20:16 001 kernel: RIP: 0010:[<ffffffffa04266ff>]
[<ffffffffa04266ff>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]

Apr 17 20:20:16 001 kernel: RSP: 0018:ffff880417e8ba58  EFLAGS: 00010283

Apr 17 20:20:16 001 kernel: RAX: ffff8804185a3da8 RBX: 0000000000000003 RCX:
000000000db41094

Apr 17 20:20:16 001 kernel: RDX: 000000000db41094 RSI: 000000000db21756 RDI:
ffff8804187ef440

Apr 17 20:20:16 001 kernel: RBP: ffff880417e8bb18 R08: 0000000000000000 R09:
0000000000000000

Apr 17 20:20:16 001 kernel: R10: 0000000000001000 R11: 0000000000000000 R12:
ffff8804187ef000

Apr 17 20:20:16 001 kernel: R13: 0000000000000000 R14: ffff88041519c3e0 R15:
ffff880417e8bb78

Apr 17 20:20:16 001 kernel: FS:  00007f07791ff7c0(0000)
GS:ffff880028200000(0000) knlGS:0000000000000000

Apr 17 20:20:16 001 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033

Apr 17 20:20:16 001 kernel: CR2: 0000000000000060 CR3: 0000000411e94000 CR4:
00000000001407f0

Apr 17 20:20:16 001 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000

Apr 17 20:20:16 001 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400

Apr 17 20:20:16 001 kernel: Process php-cgi (pid: 2915, threadinfo
ffff880417e8a000, task ffff8804125cc040)

Apr 17 20:20:16 001 kernel: Stack:

Apr 17 20:20:16 001 kernel: ffff8804163dbab0 ffffffffa04012d0
ffff880417e8ba78 ffff8804163dd0c0

Apr 17 20:20:16 001 kernel: <d> ffff8804163dbab0 00000115a04012d0
ffff880417e8ba98 ffff8804187ef000

Apr 17 20:20:16 001 kernel: <d> ffff880417e8baf8 ffffffffa0402931
ffff8804185a3da8 0000000000000000

Apr 17 20:20:16 001 kernel: Call Trace:

Apr 17 20:20:16 001 kernel: [<ffffffffa04012d0>] ?
gfs2_dirent_find_space+0x0/0x50 [gfs2]

Apr 17 20:20:16 001 kernel: [<ffffffffa0402931>] ?
gfs2_dirent_search+0x191/0x1a0 [gfs2]

Apr 17 20:20:16 001 kernel: [<ffffffffa041e041>] gfs2_rename+0x6b1/0x8c0
[gfs2]

Apr 17 20:20:16 001 kernel: [<ffffffffa041dab8>] ? gfs2_rename+0x128/0x8c0
[gfs2]

Apr 17 20:20:16 001 kernel: [<ffffffffa041dad6>] ? gfs2_rename+0x146/0x8c0
[gfs2]

Apr 17 20:20:16 001 kernel: [<ffffffffa041dafc>] ? gfs2_rename+0x16c/0x8c0
[gfs2]

Apr 17 20:20:16 001 kernel: [<ffffffffa040c44f>] ? gfs2_glock_put+0x3f/0x180
[gfs2]

Apr 17 20:20:16 001 kernel: [<ffffffffa040c8a3>] ?
gfs2_holder_uninit+0x23/0x40 [gfs2]

Apr 17 20:20:16 001 kernel: [<ffffffffa040da5e>] ?
gfs2_glock_dq_uninit+0x1e/0x30 [gfs2]

Apr 17 20:20:16 001 kernel: [<ffffffffa041c9dc>] ?
gfs2_permission+0x9c/0x100 [gfs2]

Apr 17 20:20:16 001 kernel: [<ffffffffa041da65>] ? gfs2_rename+0xd5/0x8c0
[gfs2]

Apr 17 20:20:16 001 kernel: [<ffffffff8118ffdb>] vfs_rename+0x3ab/0x440

Apr 17 20:20:16 001 kernel: [<ffffffff81191d0a>] sys_renameat+0x1da/0x240

Apr 17 20:20:16 001 kernel: [<ffffffff81277495>] ?
_atomic_dec_and_lock+0x55/0x80

Apr 17 20:20:16 001 kernel: [<ffffffff81186874>] ? cp_new_stat+0xe4/0x100

Apr 17 20:20:16 001 kernel: [<ffffffff81186c46>] ? sys_newstat+0x36/0x50

Apr 17 20:20:16 001 kernel: [<ffffffff810dc8f7>] ?
audit_syscall_entry+0x1d7/0x200

Apr 17 20:20:16 001 kernel: [<ffffffff81191d8b>] sys_rename+0x1b/0x20

Apr 17 20:20:16 001 kernel: [<ffffffff8100b072>]
system_call_fastpath+0x16/0x1b

Apr 17 20:20:16 001 kernel: Code: 0f 84 c1 fc ff ff e9 41 fb ff ff 48 8b 4d
a0 48 8b b1 10 03 00 00 48 8b bd 78 ff ff ff ba 01 00 00 00 e8 75 d6 ff ff
48 89 45 90 <49> 89 45 60 c7 45 9c 01 00 00 00 48 8b 45 90 e9 01 fb ff ff 48


Apr 17 20:20:16 001 kernel: RIP  [<ffffffffa04266ff>]
gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]

Apr 17 20:20:16 001 kernel: RSP <ffff880417e8ba58>

Apr 17 20:20:16 001 kernel: CR2: 0000000000000060

Apr 17 20:20:16 001 kernel: ---[ end trace 0647d0d2004566f6 ]---

 

 

Apr 17 20:21:00 002 kernel: Modules linked in: gfs2 dlm configfs sg sd_mod
crc_t10dif ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr
iscsi_tcp iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6
nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables serio_raw
i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ahci video output
e1000e dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss
nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi
cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi
scsi_transport_iscsi [last unloaded: scsi_wait_scan]

Apr 17 20:21:00 002 kernel: 

Apr 17 20:21:00 002 kernel: Pid: 2839, comm: php-cgi Not tainted
2.6.32-358.2.1.el6.x86_64 #1 Supermicro X9SCL/X9SCM/X9SCL/X9SCM

Apr 17 20:21:00 002 kernel: RIP: 0010:[<ffffffffa04266ff>]
[<ffffffffa04266ff>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]

Apr 17 20:21:00 002 kernel: RSP: 0000:ffff8803f518ba58  EFLAGS: 00010283

Apr 17 20:21:00 002 kernel: RAX: ffff88041447bda8 RBX: 0000000000000003 RCX:
000000000db41094

Apr 17 20:21:00 002 kernel: RDX: 000000000db41094 RSI: 000000000db21756 RDI:
ffff880414cd5440

Apr 17 20:21:00 002 kernel: RBP: ffff8803f518bb18 R08: 0000000000000000 R09:
0000000000000000

Apr 17 20:21:00 002 kernel: R10: 0000000000001000 R11: 0000000000000000 R12:
ffff880414cd5000

Apr 17 20:21:00 002 kernel: R13: 0000000000000000 R14: ffff8803f9e918c0 R15:
ffff8803f518bb78

Apr 17 20:21:00 002 kernel: FS:  00007f6a7e8a27c0(0000)
GS:ffff8800282c0000(0000) knlGS:0000000000000000

Apr 17 20:21:00 002 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033

Apr 17 20:21:00 002 kernel: CR2: 0000000000000060 CR3: 00000003f6313000 CR4:
00000000001407e0

Apr 17 20:21:00 002 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000

Apr 17 20:21:00 002 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400

Apr 17 20:21:00 002 kernel: Process php-cgi (pid: 2839, threadinfo
ffff8803f518a000, task ffff8803f5189540)

Apr 17 20:21:00 002 kernel: Stack:

Apr 17 20:21:00 002 kernel: ffff880411bfbcb0 ffffffffa04012d0
ffff8803f518ba78 ffff88041518eb60

Apr 17 20:21:00 002 kernel: <d> ffff880411bfbcb0 00000115a04012d0
ffff8803f518ba98 ffff880414cd5000

Apr 17 20:21:00 002 kernel: <d> ffff8803f518baf8 ffffffffa0402931
ffff88041447bda8 0000000000000000

Apr 17 20:21:00 002 kernel: Call Trace:

Apr 17 20:21:00 002 kernel: [<ffffffffa04012d0>] ?
gfs2_dirent_find_space+0x0/0x50 [gfs2]

Apr 17 20:21:00 002 kernel: [<ffffffffa0402931>] ?
gfs2_dirent_search+0x191/0x1a0 [gfs2]

Apr 17 20:21:00 002 kernel: [<ffffffffa041e041>] gfs2_rename+0x6b1/0x8c0
[gfs2]

Apr 17 20:21:00 002 kernel: [<ffffffffa041dab8>] ? gfs2_rename+0x128/0x8c0
[gfs2]

Apr 17 20:21:00 002 kernel: [<ffffffffa041dad6>] ? gfs2_rename+0x146/0x8c0
[gfs2]

Apr 17 20:21:00 002 kernel: [<ffffffffa041dafc>] ? gfs2_rename+0x16c/0x8c0
[gfs2]

Apr 17 20:21:00 002 kernel: [<ffffffffa040c44f>] ? gfs2_glock_put+0x3f/0x180
[gfs2]

Apr 17 20:21:00 002 kernel: [<ffffffffa040c8a3>] ?
gfs2_holder_uninit+0x23/0x40 [gfs2]

Apr 17 20:21:00 002 kernel: [<ffffffffa040da5e>] ?
gfs2_glock_dq_uninit+0x1e/0x30 [gfs2]

Apr 17 20:21:00 002 kernel: [<ffffffffa041c9dc>] ?
gfs2_permission+0x9c/0x100 [gfs2]

Apr 17 20:21:00 002 kernel: [<ffffffffa041da65>] ? gfs2_rename+0xd5/0x8c0
[gfs2]

Apr 17 20:21:00 002 kernel: [<ffffffff8118ffdb>] vfs_rename+0x3ab/0x440

Apr 17 20:21:00 002 kernel: [<ffffffff81191d0a>] sys_renameat+0x1da/0x240

Apr 17 20:21:00 002 kernel: [<ffffffff81277495>] ?
_atomic_dec_and_lock+0x55/0x80

Apr 17 20:21:00 002 kernel: [<ffffffff81186874>] ? cp_new_stat+0xe4/0x100

Apr 17 20:21:00 002 kernel: [<ffffffff81186c46>] ? sys_newstat+0x36/0x50

Apr 17 20:21:00 002 kernel: [<ffffffff810dc8f7>] ?
audit_syscall_entry+0x1d7/0x200

Apr 17 20:21:00 002 kernel: [<ffffffff81191d8b>] sys_rename+0x1b/0x20

Apr 17 20:21:00 002 kernel: [<ffffffff8100b072>]
system_call_fastpath+0x16/0x1b

Apr 17 20:21:00 002 kernel: Code: 0f 84 c1 fc ff ff e9 41 fb ff ff 48 8b 4d
a0 48 8b b1 10 03 00 00 48 8b bd 78 ff ff ff ba 01 00 00 00 e8 75 d6 ff ff
48 89 45 90 <49> 89 45 60 c7 45 9c 01 00 00 00 48 8b 45 90 e9 01 fb ff ff 48


Apr 17 20:21:00 002 kernel: RIP  [<ffffffffa04266ff>]
gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]

Apr 17 20:21:00 002 kernel: RSP <ffff8803f518ba58>

Apr 17 20:21:00 002 kernel: CR2: 0000000000000060

Apr 17 20:21:00 002 kernel: ---[ end trace 1425fd0e2954015a ]---

 

 

Apr 17 20:12:49 003 kernel: BUG: unable to handle kernel NULL pointer
dereference at 0000000000000060

Apr 17 20:12:49 003 kernel: IP: [<ffffffffa04236ff>]
gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]

Apr 17 20:12:49 003 kernel: PGD 3d96fc067 PUD 3d2c0a067 PMD 0 

Apr 17 20:12:49 003 kernel: Oops: 0002 [#1] SMP 

Apr 17 20:12:49 003 kernel: last sysfs file: /sys/kernel/dlm/b1/control

Apr 17 20:12:49 003 kernel: CPU 1 

Apr 17 20:12:49 003 kernel: Modules linked in: gfs2 dlm configfs sg sd_mod
crc_t10dif ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr
iscsi_tcp iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6
nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables serio_raw
i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ahci video output
e1000e dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss
nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi
cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi
scsi_transport_iscsi [last unloaded: scsi_wait_scan]

Apr 17 20:12:49 003 kernel: 

Apr 17 20:12:49 003 kernel: Pid: 3386, comm: php-cgi Not tainted
2.6.32-358.2.1.el6.x86_64 #1 Supermicro X9SCL/X9SCM/X9SCL/X9SCM

Apr 17 20:12:49 003 kernel: RIP: 0010:[<ffffffffa04236ff>]
[<ffffffffa04236ff>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]

Apr 17 20:12:49 003 kernel: RSP: 0018:ffff8803d1c27a58  EFLAGS: 00010283

Apr 17 20:12:49 003 kernel: RAX: ffff880416771da8 RBX: 0000000000000003 RCX:
000000000db41094

Apr 17 20:12:49 003 kernel: RDX: 000000000db41094 RSI: 000000000db21756 RDI:
ffff88041277b440

Apr 17 20:12:49 003 kernel: RBP: ffff8803d1c27b18 R08: 0000000000000000 R09:
0000000000000000

Apr 17 20:12:49 003 kernel: R10: 0000000000001000 R11: 0000000000000000 R12:
ffff88041277b000

Apr 17 20:12:49 003 kernel: R13: 0000000000000000 R14: ffff8803a9c181c0 R15:
ffff8803d1c27b78

Apr 17 20:12:49 003 kernel: FS:  00007fd494d017c0(0000)
GS:ffff880028240000(0000) knlGS:0000000000000000

Apr 17 20:12:49 003 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033

Apr 17 20:12:49 003 kernel: CR2: 0000000000000060 CR3: 00000003d170c000 CR4:
00000000001407e0

Apr 17 20:12:49 003 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000

Apr 17 20:12:49 003 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400

Apr 17 20:12:49 003 kernel: Process php-cgi (pid: 3386, threadinfo
ffff8803d1c26000, task ffff8803d1450080)

Apr 17 20:12:49 003 kernel: Stack:

Apr 17 20:12:49 003 kernel: ffff8803a9c2e270 ffffffffa03fe2d0
ffff8803d1c27a78 ffff8803cee73800

Apr 17 20:12:49 003 kernel: <d> ffff8803a9c2e270 00000115a03fe2d0
ffff8803d1c27a98 ffff88041277b000

Apr 17 20:12:49 003 kernel: <d> ffff8803d1c27af8 ffffffffa03ff931
ffff880416771da8 0000000000000000

Apr 17 20:12:49 003 kernel: Call Trace:

Apr 17 20:12:49 003 kernel: [<ffffffffa03fe2d0>] ?
gfs2_dirent_find_space+0x0/0x50 [gfs2]

Apr 17 20:12:49 003 kernel: [<ffffffffa03ff931>] ?
gfs2_dirent_search+0x191/0x1a0 [gfs2]

Apr 17 20:12:49 003 kernel: [<ffffffffa041b041>] gfs2_rename+0x6b1/0x8c0
[gfs2]

Apr 17 20:12:49 003 kernel: [<ffffffffa041aab8>] ? gfs2_rename+0x128/0x8c0
[gfs2]

Apr 17 20:12:49 003 kernel: [<ffffffffa041aad6>] ? gfs2_rename+0x146/0x8c0
[gfs2]

Apr 17 20:12:49 003 kernel: [<ffffffffa041aafc>] ? gfs2_rename+0x16c/0x8c0
[gfs2]

Apr 17 20:12:49 003 kernel: [<ffffffffa040944f>] ? gfs2_glock_put+0x3f/0x180
[gfs2]

Apr 17 20:12:49 003 kernel: [<ffffffffa04098a3>] ?
gfs2_holder_uninit+0x23/0x40 [gfs2]

Apr 17 20:12:49 003 kernel: [<ffffffffa040aa5e>] ?
gfs2_glock_dq_uninit+0x1e/0x30 [gfs2]

Apr 17 20:12:49 003 kernel: [<ffffffffa04199dc>] ?
gfs2_permission+0x9c/0x100 [gfs2]

Apr 17 20:12:49 003 kernel: [<ffffffffa041aa65>] ? gfs2_rename+0xd5/0x8c0
[gfs2]

Apr 17 20:12:49 003 kernel: [<ffffffff8118ffdb>] vfs_rename+0x3ab/0x440

Apr 17 20:12:49 003 kernel: [<ffffffff81191d0a>] sys_renameat+0x1da/0x240

Apr 17 20:12:49 003 kernel: [<ffffffff81277495>] ?
_atomic_dec_and_lock+0x55/0x80

Apr 17 20:12:49 003 kernel: [<ffffffff81186874>] ? cp_new_stat+0xe4/0x100

Apr 17 20:12:49 003 kernel: [<ffffffff81186c46>] ? sys_newstat+0x36/0x50

Apr 17 20:12:49 003 kernel: [<ffffffff810dc8f7>] ?
audit_syscall_entry+0x1d7/0x200

Apr 17 20:12:49 003 kernel: [<ffffffff81191d8b>] sys_rename+0x1b/0x20

Apr 17 20:12:49 003 kernel: [<ffffffff8100b072>]
system_call_fastpath+0x16/0x1b

Apr 17 20:12:49 003 kernel: Code: 0f 84 c1 fc ff ff e9 41 fb ff ff 48 8b 4d
a0 48 8b b1 10 03 00 00 48 8b bd 78 ff ff ff ba 01 00 00 00 e8 75 d6 ff ff
48 89 45 90 <49> 89 45 60 c7 45 9c 01 00 00 00 48 8b 45 90 e9 01 fb ff ff 48


Apr 17 20:12:49 003 kernel: RIP  [<ffffffffa04236ff>]
gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]

Apr 17 20:12:49 003 kernel: RSP <ffff8803d1c27a58>

Apr 17 20:12:49 003 kernel: CR2: 0000000000000060

Apr 17 20:12:49 003 kernel: ---[ end trace 06b117dc4fff0890 ]---
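
A note on reading the node 003 log: the number after "Oops:" is the x86 page-fault error code. The following is a minimal Python sketch of decoding its three low bits. The helper name decode_oops is made up for illustration and is not part of any kernel tooling.

```python
# x86 page-fault error code bits, as printed after "Oops:" in the log.
PF_PROT = 0x1   # 0: page not present, 1: protection violation
PF_WRITE = 0x2  # 0: read access,      1: write access
PF_USER = 0x4   # 0: kernel mode,      1: user mode


def decode_oops(code_hex):
    """Decode the low three bits of an x86 page-fault error code (hypothetical helper)."""
    code = int(code_hex, 16)
    return {
        "cause": "protection violation" if code & PF_PROT else "page not present",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "kernel",
    }


print(decode_oops("0002"))
```

For 0002 this gives a kernel-mode write to a page that is not present, which agrees with the "NULL pointer dereference at 0000000000000060" line and with CR2 being 0x60 in all three dumps. It also fits R13 being 0 in every register dump, if the marked bytes in the Code: line (49 89 45 60) are read as a store of RAX to offset 0x60 from R13.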

 

 

 

The call trace looks pretty much the same to me on all nodes. After we
rebooted ALL 40 nodes, the “bug” seems to be gone and the system is running
fine right now. (It has been running for 20 minutes without a reboot; before
that we had a reboot every half minute.)
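
One way to check that the nodes really die at the same spot is to group the final "RIP" line of each oops by symbol and offset. This is only a sketch: it assumes the raw, unwrapped syslog lines (the copies above are wrapped by the mail client), and the names RIP_RE and crash_sites are invented for this example.

```python
import re
from collections import defaultdict

# Matches the closing "RIP  [<addr>] symbol+0xoff/0xsize" line of an oops in
# the syslog layout "Mon DD HH:MM:SS host kernel: ...". The earlier
# "RIP: 0010:[<...>]" line is skipped on purpose because of its colon.
RIP_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+kernel:\s+RIP\s+"
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<sym>[\w.]+)\+(?P<off>0x[0-9a-f]+)/"
)


def crash_sites(lines):
    """Map 'symbol+offset' to the set of hosts that crashed there."""
    sites = defaultdict(set)
    for line in lines:
        m = RIP_RE.match(line)
        if m:
            sites[f"{m['sym']}+{m['off']}"].add(m["host"])
    return dict(sites)


sample = [
    "Apr 17 20:20:16 001 kernel: RIP  [<ffffffffa04266ff>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]",
    "Apr 17 20:21:00 002 kernel: RIP  [<ffffffffa04266ff>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]",
    "Apr 17 20:12:49 003 kernel: RIP  [<ffffffffa04236ff>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]",
]
print(crash_sites(sample))
```

Grouping by symbol plus offset rather than by raw address matters here, because the gfs2 module is loaded at a different address on node 003 (ffffffffa04236ff versus ffffffffa04266ff) while the offset 0x54f into gfs2_inplace_reserve is the same on all three nodes.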

 

Do you know anything about this? How can we fix it?

It’s a web cluster that should be online 24/7, so crashes like these are a
serious problem for us.

 

Thanks in advance, Jürgen






-- 
Laurence Schuler (Larry)                       Laurence.Schuler at nasa.gov
Systems Support                                       ADNET Systems, Inc
Scientific Visualization Studio                 http://svs.gsfc.nasa.gov
NASA/Goddard Space Flight Center, Code 606.4       phone: 1-301-286-1799
Greenbelt, MD 20771                                  fax: 1-301-286-1634
Note: I am not a government employee and have no authority to obligate
any federal, state or local government to perform any action or payment.