[Linux-cluster] GFS2 crashes after upgrade RHEL 6.4

Wed Mar 13 17:12:59 UTC 2013

Hi,

On Wed, 2013-03-13 at 10:09 -0700, Scooter Morris wrote:
> Hi all,
>      We've been seeing gfs2 crashes since we upgraded to RHEL 6.4.  The
> traceback is:
> 
There is a fix available for that: bug #908398. Please open a ticket
with our support team and quote that bug number; they should be able
to send you the fix. It is already fixed upstream.

Steve.

> [2013-03-13 08:48:24]BUG: unable to handle kernel NULL pointer
> dereference at 0000000000000060
> [2013-03-13 08:48:24]IP: [<ffffffffa04d66ef>]
> gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
> [2013-03-13 08:48:24]PGD 0
> [2013-03-13 08:48:24]Oops: 0002 [#1] SMP
> [2013-03-13 08:48:24]last sysfs file:
> /sys/devices/pci0000:00/0000:00:06.0/0000:0b:00.0/0000:0c:09.0/0000:0d:00.1/host3/rport-3:0-4/target3:0:3/3:0:3:14/state
> [2013-03-13 08:48:24]CPU 0
> [2013-03-13 08:48:24]Modules linked in: autofs4 gfs2 dlm configfs sunrpc 
> p4_clockmod freq_table speedstep_lib arpt_mangle arptable_filter 
> arp_tables ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_recent ipt_LOG 
> iptable_filter ip_tables nf_conntrack_netbios_ns nf_conntrack_broadcast 
> ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack 
> ip6table_filter ip6_tables ipv6 uinput hpwdt hpilo microcode iTCO_wdt 
> iTCO_vendor_support i7300_edac edac_core bnx2 sg shpchp ext4 mbcache 
> jbd2 dm_round_robin sd_mod crc_t10dif sr_mod cdrom qla2xxx 
> scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix hpsa cciss 
> radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_multipath 
> dm_mirror dm_region_hash dm_log dm_mod [last unloaded: mperf]
> [2013-03-13 08:48:24]
> [2013-03-13 08:48:24]Pid: 9888, comm: smbd Not tainted
> 2.6.32-358.0.1.el6.x86_64 #1 HP ProLiant DL580 G5
> [2013-03-13 08:48:24]RIP: 0010:[<ffffffffa04d66ef>] [<ffffffffa04d66ef>]
> gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
> [2013-03-13 08:48:24]RSP: 0018:ffff880dce0f9c98  EFLAGS: 00010287
> [2013-03-13 08:48:24]RAX: ffff880ff78999a8 RBX: ffff880dae61d7c0 RCX:
> 00000000006c0762
> [2013-03-13 08:48:24]RDX: 00000000006c0762 RSI: 00000000006c075b RDI:
> ffff88100b2b6440
> [2013-03-13 08:48:24]RBP: ffff880dce0f9d58 R08: 1050000000000000 R09:
> f213f3d57bbf820a
> [2013-03-13 08:48:24]R10: 0000000000000000 R11: 0000000000000246 R12:
> 0000000000001000
> [2013-03-13 08:48:24]R13: 0000000000000000 R14: 0000000000000001 R15:
> 0000000000000000
> [2013-03-13 08:48:24]FS:  00007f3ac254c7c0(0000)
> GS:ffff880061a00000(0000) knlGS:0000000000000000
> [2013-03-13 08:48:24]CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [2013-03-13 08:48:24]CR2: 0000000000000060 CR3: 0000000dce153000 CR4:
> 00000000000007f0
> [2013-03-13 08:48:24]DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [2013-03-13 08:48:24]DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [2013-03-13 08:48:24]Process smbd (pid: 9888, threadinfo
> ffff880dce0f8000, task ffff88100b0b6ae0)
> [2013-03-13 08:48:24]Stack:
> [2013-03-13 08:48:24] ffff880dce0f9e08 000000000000000a ffff880dce0f9cc8
> ffffffff81096c8f
> [2013-03-13 08:48:24]<d> ffff880dce0f9dd8 00000007b078eaf8
> ffff880dce0f9cd8 ffff88100b2b6000
> [2013-03-13 08:48:24]<d> ffff880dce0f9d28 ffffffffa04be2a8
> ffff880ff78999a8 0000000000000000
> [2013-03-13 08:48:24]Call Trace:
> [2013-03-13 08:48:24] [<ffffffff81096c8f>] ? wake_up_bit+0x2f/0x40
> [2013-03-13 08:48:24] [<ffffffffa04be2a8>] ? do_promote+0x208/0x330 [gfs2]
> [2013-03-13 08:48:24] [<ffffffffa04b106e>] gfs2_setattr_size+0xce/0x210
> [gfs2]
> [2013-03-13 08:48:24] [<ffffffffa04cd534>] gfs2_setattr+0x214/0x330 [gfs2]
> [2013-03-13 08:48:24] [<ffffffffa04cd366>] ? gfs2_setattr+0x46/0x330
> [gfs2]
> [2013-03-13 08:48:24] [<ffffffff8119e768>] notify_change+0x168/0x340
> [2013-03-13 08:48:24] [<ffffffff8117f1e4>] do_truncate+0x64/0xa0
> [2013-03-13 08:48:24] [<ffffffff8117f520>] sys_ftruncate+0x120/0x130
> [2013-03-13 08:48:24] [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
> [2013-03-13 08:48:24]Code: 0f 84 c1 fc ff ff e9 41 fb ff ff 48 8b 4d a0
> 48 8b b1 10 03 00 00 48 8b bd 78 ff ff ff ba 01 00 00 00 e8 75 d6 ff ff
> 48 89 45 90 <49> 89 45 60 c7 45 9c 01 00 00 00 48 8b 45 90 e9 01 fb ff
> ff 48
> [2013-03-13 08:48:24]RIP  [<ffffffffa04d66ef>]
> gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
> [2013-03-13 08:48:24] RSP <ffff880dce0f9c98>
> [2013-03-13 08:48:24]CR2: 0000000000000060
> 
> 
> We've seen this from both svn and smbd now, and on a couple of different 
> nodes in our cluster.   We brought the cluster down last night and ran 
> fsck.gfs2 on all filesystems, but the problem persists.
> 
> Has anyone seen this before?  Is there a workaround or should we drop 
> back to the previous kernel?
> 
> -- scooter
> 
>
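
For anyone reading the quoted oops: the "Oops: 0002" value is the x86
page-fault error code and CR2 is the faulting address. The marked bytes
in the Code line (<49> 89 45 60) appear to decode to a 64-bit store of
RAX to offset 0x60 from R13, and R13 is zero in the register dump, which
would be consistent with a write to address 0x60. The sketch below is
only an illustration of how to read those two fields; the function names
are made up for this example and are not part of any kernel or GFS2
tooling.

```python
import re

# x86 page-fault error-code bits, as printed after "Oops:" in the trace.
# Each entry: (mask, meaning when set, meaning when clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_error_code(code):
    """Return human-readable meanings for a page-fault error code.

    Hypothetical helper written for this example.
    """
    meanings = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            meanings.append(if_set)
        elif if_clear is not None:
            meanings.append(if_clear)
    return meanings


def summarize_oops(text):
    """Extract the error code and faulting address (CR2) from oops text.

    Hypothetical helper; assumes both fields are present in the text.
    """
    code = int(re.search(r"Oops:\s*([0-9a-fA-F]+)", text).group(1), 16)
    cr2 = int(re.search(r"CR2:\s*([0-9a-fA-F]+)", text).group(1), 16)
    return {"cr2": cr2, "meanings": decode_error_code(code)}


# Two lines copied from the trace in this thread.
SAMPLE = """\
[2013-03-13 08:48:24]Oops: 0002 [#1] SMP
[2013-03-13 08:48:24]CR2: 0000000000000060
"""

summary = summarize_oops(SAMPLE)
print(hex(summary["cr2"]), ", ".join(summary["meanings"]))
```

For this trace the decoded fields describe a kernel-mode write to a
not-present page at a small offset from NULL, which matches the "NULL
pointer dereference at 0000000000000060" message.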