[Linux-cluster] Cluster Node Crash

Steve Rigler srigler at marathonoil.com
Fri Jul 27 19:21:37 UTC 2007


Hello All,

We are running GFS on RHEL4U3 (x86_64).  One of our cluster nodes
crashed this afternoon.  We were able to capture some of the messages
from netdump (pasted below) before fencing killed the node.

Any advice would be appreciated.

Thanks,
Steve


lock_dlm:  Assertion failed on line 357 of file /usr/src/build/714650-x86_64/BUILD/gfs-kernel-2.6.9-49/smp/src/dlm/lock.c
lock_dlm:  assertion:  "!error"
lock_dlm:  time = 5441671088
HOME: error=-22 num=3,bcd97c3 lkf=9 flags=84
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at lock:357
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: lockd parport netconsole netdump lock_dlm(U)(U) dlm
md5 ipv6 battery ac tg3 floppy ext3 qla2300 scsi_transport_fc scsi_mod
Pid: 3221, comm: gfs_glockd Not tainted 2.6.9-34.ELsmp
RSP: 0018:00000101123c1dd8  EFLAGS: 00010212
RAX: 0000000000000001 RBX: 000001004df39d80 RCX: 0000000000000246
RDX: 000000000000a997 RSI: 0000000000000246 RDI: ffffffff803d9e60
RBP: 00000000ffffffea R08: 0000000000000004 R09: 000001004df39d80
R10: 0000000000000000 R11: 00000000000000e4 R12: 00000100d4a9a974
R13: ffffff0010182000 R14: ffffffffa0264e60 R15: 00000100d4a9a948
FS:  0000002a95575b00(0000) GS:ffffffff804d7b00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000062fafc CR3: 0000000000101000 CR4: 00000000000006e0
Stack: 000001001d546c80 ffffff0010182000 00000100d4a9a948 ffffffffa0268b96

Call Trace:<ffffffffa022f97c>{:gfs:gfs_lm_unlock+41}
<ffffffffa02263d9>{:gfs:gfs_glock_drop_th+290}
<ffffffffa0224b7c>{:gfs:run_queue+314}
<ffffffffa0224dd0>{:gfs:unlock_on_glock+37}
<ffffffffa0224ec6>{:gfs:gfs_reclaim_glock+234}
<ffffffffa021975a>{:gfs:gfs_glockd+61}
<ffffffff801333c8>{default_wake_function+0}
<ffffffff801333c8>{default_wake_function+0}
<ffffffff80110e17>{child_rip+8}
<ffffffff801f64af>{vgacon_cursor+0}
<ffffffffa021971d>{:gfs:gfs_glockd+0}
<ffffffff80110e0f>{child_rip+0}

Code: 0f 0b 52 d1 26 a0 ff ff ff ff 65 01 48 c7 c7 57 d1 26 a0 31
RIP <ffffffffa0268819>{:lock_dlm:do_dlm_unlock+167} RSP <00000101123c1dd8>



