[Linux-cluster] Kernel oops

Mon Aug 9 20:16:29 UTC 2004

While running several instances of
'while [ 0 ]; do relocate_resource_group foo; done'
simultaneously, I triggered the assertion failure below in the DLM.

I haven't updated my tree since last week; I will do so and attempt to
reproduce. This is just a heads-up.

-- Lon

DLM:  Assertion failed on line 328 of file cluster/dlm/lockqueue.c
DLM:  assertion:  "rsb->res_nodeid == -1 || rsb->res_nodeid == 0"
DLM:  time = 2154223
dlm: lkb
id 200ca
remid 0
flags 0
status 0
rqmode 5
grmode -1
nodeid 4294967295
lqstate 0
lqflags 0
dlm: rsb
name "usrm::vf"
nodeid 1
ref 2
dlm: reply
rh_cmd 5
rh_lkid 200ca
lockstate 0
nodeid 1
status 0
lkid c02bf515

------------[ cut here ]------------
kernel BUG at cluster/dlm/lockqueue.c:328!
invalid operand: 0000 [#1]
PREEMPT SMP 
Modules linked in: dlm cman ipv6
CPU:    0
EIP:    0060:[<d099cd16>]    Not tainted
EFLAGS: 00010286   (2.6.7cman20040804) 
EIP is at process_lockqueue_reply+0x5e6/0x720 [dlm]
eax: 00000001   ebx: 00000001   ecx: c039df74   edx: 00000282
esi: c9bbb04c   edi: c9bbc708   ebp: caf55e24   esp: caf55dfc
ds: 007b   es: 007b   ss: 0068
Process dlm_recvd (pid: 2109, threadinfo=caf54000 task=cad33360)
Stack: d09ad257 00000148 d09ad23f d09ae8a0 0020deef c1398200 000200ca c9bbc708 
       c1398200 caf55ee0 caf55eac d099de86 c9bbc708 caf55ee0 00000001 c03b94c0 
       caf55f88 caf55e90 caf55e74 c030a4d4 caf55e90 00000000 00000000 00000fc4 
Call Trace:
 [<c010718f>] show_stack+0x7f/0xa0
 [<c010733e>] show_registers+0x15e/0x1c0
 [<c01074f2>] die+0xa2/0x120
 [<c0107955>] do_invalid_op+0xb5/0xc0
 [<c0106e09>] error_code+0x2d/0x38
 [<d099de86>] process_cluster_request+0x746/0xde0 [dlm]
 [<d09a2be7>] midcomms_process_incoming_buffer+0x167/0x250 [dlm]
 [<d099ffd9>] receive_from_sock+0x189/0x360 [dlm]
 [<d09a12c8>] process_sockets+0xd8/0x110 [dlm]
 [<d09a16ad>] dlm_recvd+0xad/0x110 [dlm]
 [<c0104455>] kernel_thread_helper+0x5/0x10

Code: 0f 0b 48 01 3f d2 9a d0 e9 0d 01 00 00 e8 e8 f0 ff ff e8 33
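
For anyone reading the dump above, here is a rough sketch of how a few of the numbers can be decoded. It is written in Python purely for illustration and is not DLM code. It rests on three assumptions: that the lkb "nodeid 4294967295" is a signed 32-bit -1 printed as unsigned, that the "nodeid 1" under "dlm: rsb" is the rsb->res_nodeid named in the assertion, and that the two bytes after the "0f 0b" (ud2) opcode in the Code: line hold the little-endian source line number. The helper names are made up for this example.

```python
import ctypes


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return ctypes.c_int32(value).value


def assertion_holds(res_nodeid):
    """Mirror of the expression quoted in the oops:
    rsb->res_nodeid == -1 || rsb->res_nodeid == 0
    """
    return res_nodeid == -1 or res_nodeid == 0


def bug_line_from_code(code_bytes):
    """Assumed layout: ud2 (0f 0b) followed by a 16-bit little-endian line number."""
    if code_bytes[:2] != bytes([0x0F, 0x0B]):
        raise ValueError("Code: bytes do not start with ud2")
    return int.from_bytes(code_bytes[2:4], "little")


lkb_nodeid = as_signed32(4294967295)  # "nodeid 4294967295" from the lkb dump
rsb_nodeid = 1                        # "nodeid 1" from the rsb dump (assumed res_nodeid)
code = bytes([0x0F, 0x0B, 0x48, 0x01])  # first four bytes of the Code: line

print("lkb nodeid as signed:", lkb_nodeid)
print("assertion holds for rsb nodeid:", assertion_holds(rsb_nodeid))
print("line number after ud2:", bug_line_from_code(code))
```

Under those assumptions, the rsb nodeid of 1 is neither -1 nor 0, which is consistent with the assertion firing, and the decoded line number agrees with the 328 reported in the BUG message.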