[Linux-cluster] segfault during cman_tool services

Thu May 19 17:48:20 UTC 2005

I've been having some problems with my fs where a node
is mysteriously removed from the cluster, even though
the node is still up.  Here's what I see in syslog:

CMAN: node blade11 has been removed from the cluster : No response to messages
CMAN: killed by NODEDOWN message
CMAN: we are leaving the cluster. No response to messages
dlm: proj_lv: restbl_rsb_update failed -105
dlm: home_lv: rebuild_rsbs_send failed -105
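
The -105 in the two dlm lines (and the error=-22 in the assertion output further down) look like negated Linux errno values. A minimal sketch for decoding them, assuming the generic Linux errno numbering; the table is hand-copied for just these two codes, and `decode_kernel_error` is a hypothetical helper name, not part of any cluster tool:

```python
# Hand-copied subset of the generic Linux errno table (an assumption:
# other platforms number these differently).
LINUX_ERRNO = {
    22: ("EINVAL", "Invalid argument"),
    105: ("ENOBUFS", "No buffer space available"),
}


def decode_kernel_error(code):
    """Map a negative kernel return code such as -105 to (name, text)."""
    return LINUX_ERRNO.get(-code, ("UNKNOWN", "not in this table"))


for code in (-105, -22):
    name, text = decode_kernel_error(code)
    print(f"{code}: {name} ({text})")
```

If that reading is right, the dlm failures are ENOBUFS and the failed unlock is EINVAL, though what triggers them is not something the log alone shows.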

So from blade11, I try to see what's going on, and when I run:

> cman_tool services

I get the output pasted at the end of this message.  A while back I noticed
there were some code updates/patches, but I don't know where to find the
"Changes".  Would a cvs update on the sources help?  Let me know if
you need more info on the system I'm running.

regards,
dan

--

lock_dlm:  Assertion failed on line 353 of file /usr/src/cluster-2.6.8.1/gfs-kernel/src/dlm/lock.c
lock_dlm:  assertion:  "!error"
lock_dlm:  time = 80864198
proj_lv: error=-22 num=2,1a lkf=10000 flags=84

------------[ cut here ]------------
kernel BUG at /usr/src/cluster-2.6.8.1/gfs-kernel/src/dlm/lock.c:353!
invalid operand: 0000 [#1]
Modules linked in: ipv6 evdev pcspkr psmouse sworks_agp agpgart ohci_hcd usbcore tg3 firmware_class lock_dlm dlm cman gfs lock_harness dm_mod qla2300 qla2xxx scsi_transport_fc sg sr_mod sd_mod scsi_mod ide_cd cdrom genrtc ext3 jbd mbcache ide_generic via82cxxx trm290 triflex slc90e66 sis5513 siimage serverworks sc1200 rz1000 piix pdc202xx_old pdc202xx_new opti621 ns87415 hpt366 ide_disk hpt34x generic cy82c693 cs5530 cs5520 cmd64x atiixp amd74xx alim15x3 aec62xx ide_core unix
CPU:    0
EIP:    0060:[<f89e6b46>]    Tainted: GF
EFLAGS: 00010286   (2.6.8.1)
EIP is at do_dlm_unlock+0x106/0x120 [lock_dlm]
eax: 00000001   ebx: ffffffea   ecx: c02b4870   edx: 000053ec
esi: f432aa00   edi: f8b301c0   ebp: f43ce000   esp: f43cfedc
ds: 007b   es: 007b   ss: 0068
Process gfs_glockd (pid: 2174, threadinfo=f43ce000 task=f43b4dd0)
Stack: f89ed876 f431cde0 ffffffea 00000002 0000001a 00000000 00010000 00000084
       f8ba8000 f8ba8000 f89e6eef f432aa00 f8b01718 f432aa00 00000003 f431dbd0
       f8af51f9 f8ba8000 f432aa00 00000003 00000000 f8bb83f4 f8b301c0 00000000
Call Trace:
 [<f89e6eef>] lm_dlm_unlock+0x1f/0x30 [lock_dlm]
 [<f8b01718>] gfs_lm_unlock+0x38/0x60 [gfs]
 [<f8af51f9>] gfs_glock_drop_th+0x69/0x1a0 [gfs]
 [<f8af4588>] rq_demote+0x98/0xb0 [gfs]
 [<f8af468c>] run_queue+0xac/0xe0 [gfs]
 [<f8af6ed4>] demote_ok+0x74/0x80 [gfs]
 [<f8af700d>] gfs_reclaim_glock+0x7d/0x130 [gfs]
 [<f8ae80ca>] gfs_glockd+0x10a/0x120 [gfs]
 [<c0115950>] default_wake_function+0x0/0x20
 [<c0105d72>] ret_from_fork+0x6/0x14
 [<c0115950>] default_wake_function+0x0/0x20
 [<f8ae7fc0>] gfs_glockd+0x0/0x120 [gfs]
 [<c01042ad>] kernel_thread_helper+0x5/0x18
Code: 0f 0b 61 01 e0 c8 9e f8 c7 04 24 20 c9 9e f8 e8 16 18 73 c7
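
Both ebx and the third word of the stack dump hold ffffffea, which, read as a signed 32-bit value, matches the error=-22 printed by the assertion. A small sketch of that two's-complement conversion; `to_signed32` is a hypothetical helper name used only for illustration:

```python
def to_signed32(word_hex):
    """Interpret a 32-bit hex word from an oops dump as a signed integer."""
    value = int(word_hex, 16)
    # If the top bit is set, the word encodes a negative number.
    return value - (1 << 32) if value & 0x80000000 else value


print(to_signed32("ffffffea"))  # -22
print(to_signed32("00000084"))  # 132
```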

--