
Re: [Linux-cluster] Cluster Node Crash

On Fri, 2007-07-27 at 14:21 -0500, Steve Rigler wrote:
> Hello All,
> We are running GFS on RHEL4U3 (x86_64).  One of our cluster nodes
> crashes this afternoon.  We are able to capture some of the message from
> netdump (pasted below) before fencing killed the node.
> Any advice would be appreciated.
> Thanks,
> Steve

As a follow-up: that should have been in the past tense ("crashes" should
have been "crashed").  One of the other nodes then panicked when the first
one tried to rejoin the cluster (this is a 3-node cluster).

The dump from that node had these messages near the beginning of its
WARNING: dlm_emergency_shutdown
WARNING: dlm_emergency_shutdown
SM: 00000001 sm_stop: SG still joined
SM: 01000002 sm_stop: SG still joined
SM: 02000004 sm_stop: SG still joined
SM: 0300000d sm_stop: SG still joined

Followed by this:

lock_dlm:  Assertion failed on line 428 of file /usr/src/build/714650-
lock_dlm:  assertion:  "!error"
lock_dlm:  time = 5442621324
STUL03E: num=1,2 err=-22 cur=-1 req=3 lkf=0

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at lock:428
invalid operand: 0000 [1] SMP
Modules linked in: nfsd exportfs nfs lockd nfs_acl parport_pc lp parport
netconsole netdump autofs4 i2c_dev i2c_core lock_dlm(U) gfs(U)
lock_harness(U) dlm(U) cman(U) md5 ipv6 sunrpc ds yenta_socket
pcmcia_core dm_mirror dm_round_robin dm_multipath button battery ac
uhci_hcd ehci_hcd hw_random tg3 floppy ext3 jbd dm_mod qla2300 qla2xxx
scsi_transport_fc cciss sd_mod scsi_mod
Pid: 30604, comm: umount Not tainted 2.6.9-34.ELsmp
RIP: 0010:[<ffffffffa02689e7>] <ffffffffa02689e7>{:lock_dlm:do_dlm_lock
RSP: 0018:000001002ab6dc38  EFLAGS: 00010216
RAX: 0000000000000001 RBX: 00000000ffffffea RCX: 0000000000000246
RDX: 000000000000996e RSI: 0000000000000246 RDI: ffffffff803d9e60
RBP: 0000010117945c80 R08: 0000000000000004 R09: 00000000ffffffea
R10: 0000000000000000 R11: 00000000000000e4 R12: 00000100dfd23400
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000003
FS:  0000002a95575b00(0000) GS:ffffffff804d7b00(0000)
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003f95fc60c0 CR3: 0000000000101000 CR4: 00000000000006e0
Process umount (pid: 30604, threadinfo 000001002ab6c000, task
Stack: 0000000000000003 0000000000000000 3120202020202020
       3220202020202020 0000000000000018 0000010117945c80
       0000000000000003 0000000000000000
Call Trace:<ffffffffa0268b2a>{:lock_dlm:lm_dlm_lock+214}
       <ffffffff80192537>{sys_umount+925} <ffffffff80180264>{sys_newstat
       <ffffffff80110c61>{error_exit+0} <ffffffff801101c6>{system_call
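
For anyone reading the dump above, here is a minimal sketch for decoding the
error code in the lock_dlm debug line. It assumes the "err=" field is a
negated Linux errno value, which is the usual kernel convention but is not
confirmed anywhere in this thread. The helper names (parse_fields,
to_signed32) are hypothetical and exist only for illustration.

```python
import errno
import re


def parse_fields(line):
    """Collect the key=value pairs from a lock_dlm debug line into a dict."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# Debug line copied verbatim from the dump.
fields = parse_fields("STUL03E: num=1,2 err=-22 cur=-1 req=3 lkf=0")

# Assumption: err is a negated errno, so look up the positive value.
err = int(fields["err"])
err_name = errno.errorcode.get(-err, "unknown")

# RBX value copied from the register dump.
rbx = to_signed32(0x00000000FFFFFFEA)

print(err, err_name, rbx)
```

Under that assumption the code is EINVAL, and the low 32 bits of RBX hold the
same value. This only names the error; it does not explain why the lock
request was rejected.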
