[Linux-cluster] GFS2 panic on current release

Allen Belletti allen at isye.gatech.edu
Wed Nov 18 16:06:38 UTC 2009


Hi All,

A few weeks ago I discovered that I'd had an obsolete gfs2 kernel module 
loaded.  I removed it, which brought gfs2 up to the revision included in 
the current kernel.  I was hoping that all was well, but yesterday 
morning one of the nodes panicked as follows:

original: gfs2_rename+0x19d/0x63b [gfs2]
pid : 12810
lock type: 3 req lock state : 1
new: gfs2_rlist_alloc+0x5c/0x6a [gfs2]
pid: 12810
lock type: 3 req lock state : 1
  G:  s:EX n:3/33d0327 f:y t:EX d:EX/0 l:0 a:5 r:4
   H: s:EX f:H e:0 p:12810 [imap] gfs2_rename+0x19d/0x63b [gfs2]
   R: n:54330151 f:05 b:274/274 i:1121
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/gfs2/glock.c:1074
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:0a.0/0000:02:02.0/irq
CPU 1
Modules linked in: nfs fscache nfs_acl lock_dlm gfs2 dlm configfs lockd 
sunrpc ipv6 xfrm_nalgo crypto_api ipt_LOG xt_state ip_conntrack 
nfnetlink xt_tcpudp iptable_filter ip_tables x_tables 8021q dm_multipath 
scsi_dh video backlight sbs i2c_ec button battery asus_acpi 
acpi_memhotplug ac parport_pc lp parport i2c_amd756 k8temp ide_cd 
i2c_core hwmon sg amd_rng cdrom k8_edac pcspkr tg3 floppy edac_mc e1000 
dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero 
dm_mirror dm_log dm_mod qla2xxx scsi_transport_fc shpchp mptspi mptscsih 
mptbase scsi_transport_spi sd_mod scsi_mod raid1 ext3 jbd uhci_hcd 
ohci_hcd ehci_hcd
Pid: 12810, comm: imap Not tainted 2.6.18-164.6.1.el5 #1
RIP: 0010:[<ffffffff8862a6df>]  [<ffffffff8862a6df>] :gfs2:gfs2_glock_nq+0x231/0x273
RSP: 0018:ffff8101ba8d9868  EFLAGS: 00010292
RAX: 0000000000000000 RBX: ffff8101ba8d9cb0 RCX: 0000000000000461
RDX: ffff8101ffe27a98 RSI: ffffffff80309c28 RDI: ffffffff80309c20
RBP: ffff8101860b1340 R08: ffffffff80309c28 R09: 000000000000003f
R10: ffff8101ba8d9368 R11: 0000000000000000 R12: ffff8100e87ea590
R13: ffff8100e87ea590 R14: ffff8100ed24e000 R15: 0000000000000000
FS:  00002b18a78ac530(0000) GS:ffff810103901940(0000) knlGS:00000000acbfbb90
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b70cf5cf000 CR3: 00000001b4d4a000 CR4: 00000000000006e0
Process imap (pid: 12810, threadinfo ffff8101ba8d8000, task ffff8101ffe277e0)
Stack:  ffff8101860b1340 0000000000000001 ffff8100b3e1b000 ffff8100b3e1a0e8
  0000000000000000 ffffffff8862a74e 0000000000000038 ffff810184e88368
  0000000000000001 ffffffff800caa0b 0000000000000005 ffff810184e88368
Call Trace:
  [<ffffffff8862a74e>] :gfs2:gfs2_glock_nq_m+0x2d/0xf4
  [<ffffffff800caa0b>] __kzalloc+0x9/0x21
  [<ffffffff88622831>] :gfs2:do_strip+0x175/0x349
  [<ffffffff886217e2>] :gfs2:recursive_scan+0xf2/0x175
  [<ffffffff886218fe>] :gfs2:trunc_dealloc+0x99/0xe7
  [<ffffffff886226bc>] :gfs2:do_strip+0x0/0x349
  [<ffffffff80090000>] sched_exit+0xb4/0xb5
  [<ffffffff88638dda>] :gfs2:gfs2_delete_inode+0xdd/0x191
  [<ffffffff88638d43>] :gfs2:gfs2_delete_inode+0x46/0x191
  [<ffffffff88628e77>] :gfs2:gfs2_glock_schedule_for_reclaim+0x5d/0x9a
  [<ffffffff88638cfd>] :gfs2:gfs2_delete_inode+0x0/0x191
  [<ffffffff8002f48f>] generic_delete_inode+0xc6/0x143
  [<ffffffff8863d9a4>] :gfs2:gfs2_inplace_reserve_i+0x63b/0x691
  [<ffffffff886248c4>] :gfs2:gfs2_dirent_find_space+0x0/0x41
  [<ffffffff88623983>] :gfs2:gfs2_dirent_search+0x147/0x16e
  [<ffffffff886377c5>] :gfs2:gfs2_rename+0x3be/0x63b
  [<ffffffff88637506>] :gfs2:gfs2_rename+0xff/0x63b
  [<ffffffff8863754c>] :gfs2:gfs2_rename+0x145/0x63b
  [<ffffffff88637571>] :gfs2:gfs2_rename+0x16a/0x63b
  [<ffffffff886375a4>] :gfs2:gfs2_rename+0x19d/0x63b
  [<ffffffff88629e29>] :gfs2:gfs2_holder_uninit+0xd/0x1f
  [<ffffffff886385bf>] :gfs2:gfs2_permission+0xaf/0xd4
  [<ffffffff88633124>] :gfs2:gfs2_drevalidate+0x158/0x214
  [<ffffffff8000d902>] permission+0x81/0xc8
  [<ffffffff8002a7d9>] vfs_rename+0x2f4/0x471
  [<ffffffff80036c20>] sys_renameat+0x180/0x1eb
  [<ffffffff800b66f5>] audit_syscall_entry+0x180/0x1b3
  [<ffffffff8005d28d>] tracesys+0xd5/0xe0


Code: 0f 0b 68 f8 27 64 88 c2 32 04 be 01 00 00 00 4c 89 ef e8 df
RIP  [<ffffffff8862a6df>] :gfs2:gfs2_glock_nq+0x231/0x273
  RSP <ffff8101ba8d9868>
<0>Kernel panic - not syncing: Fatal exception
  Killed by signal 15.
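
A rough sketch for reading the glock lines at the top of the dump.  It 
assumes the "n:" field on the "G:" line is lock type/number with the 
number in hex, and that "n:" on the "R:" line is a block number in 
decimal; both are assumptions about the dump format, and parse_fields is 
a made-up helper, not anything from gfs2.  Both the "original" and "new" 
holders show pid 12810 and lock type 3, which suggests the same process 
asked for the same glock twice.

```python
import re

# Lines copied verbatim from the dump above.
glock_line = "G:  s:EX n:3/33d0327 f:y t:EX d:EX/0 l:0 a:5 r:4"
rgrp_line = "R: n:54330151 f:05 b:274/274 i:1121"


def parse_fields(line):
    """Split a dump line like 'G:  s:EX n:3/33d0327' into (tag, {key: value}).

    Hypothetical helper for illustration only.
    """
    tag, rest = line.strip().split(":", 1)
    fields = dict(re.findall(r"(\w+):(\S+)", rest))
    return tag, fields


_, g = parse_fields(glock_line)
_, r = parse_fields(rgrp_line)

# Assumption: the G: line's n: field is "<lock type>/<number in hex>".
lock_type, number_hex = g["n"].split("/")
glock_number = int(number_hex, 16)

# Assumption: the R: line's n: field is the same number in decimal.
same_block = glock_number == int(r["n"])

print(lock_type, glock_number, same_block)  # prints: 3 54330151 True
```

If those assumptions hold, the "G:" and "R:" lines describe the same 
object, and the lock type matches the "lock type: 3" printed for both 
holders.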

It seems possible that running the old code left some filesystem damage, 
so I'm going to fsck this weekend, but I wanted to post this in case it 
reveals an obvious problem to anyone.  The "invalid opcode: 0000" makes 
me think we ended up executing code that was actually data, but beyond 
that I'm clueless.
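
One way to test the "code that was actually data" guess is to decode the 
"Code:" bytes by hand.  On x86, BUG() is normally built on the ud2 
instruction (bytes 0f 0b), which raises an invalid-opcode trap on 
purpose, so "invalid opcode: 0000" would be expected alongside any 
"Kernel BUG at" report.  The sketch below assumes the older x86_64 layout 
in which ud2 is followed by a push imm32 (pointer to the file name) and a 
ret imm16 (the line number); that layout is an assumption and was not 
checked against this kernel's source.

```python
# "Code:" line copied from the oops above.
code = "0f 0b 68 f8 27 64 88 c2 32 04 be 01 00 00 00 4c 89 ef e8 df"
raw = bytes.fromhex(code.replace(" ", ""))

# ud2 is the architecturally defined "undefined instruction".
is_ud2 = raw[0:2] == b"\x0f\x0b"

# Assumed layout after ud2: push imm32 (opcode 0x68), then ret imm16 (0xc2).
is_push_imm32 = raw[2] == 0x68
is_ret_imm16 = raw[7] == 0xC2

# The 16-bit immediate of ret is little-endian: bytes 32 04 -> 0x0432.
line_number = int.from_bytes(raw[8:10], "little")

print(is_ud2, is_push_imm32, is_ret_imm16, line_number)
# prints: True True True 1074
```

The decoded 1074 matches "Kernel BUG at fs/gfs2/glock.c:1074", which 
points to a deliberate assertion in the glock code rather than a jump 
into data.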

Thanks,
Allen

-- 
Allen Belletti
allen at isye.gatech.edu                             404-894-6221 Phone
Industrial and Systems Engineering                404-385-2988 Fax
Georgia Institute of Technology



