[Linux-cluster] CS5 : clurgmgrd[28359]: segfault

Alain Moulle Alain.Moulle at bull.net
Wed Jan 9 14:04:03 UTC 2008


Hi

While testing CS5 on a two-node cluster with a quorum disk, I ran the ifdown test on the heartbeat interface and got a segfault in the log:

Jan  9 09:45:16 s_sys@am1 avahi-daemon[3106]: Interface eth0.IPv6 no longer relevant for mDNS.
Jan  9 09:45:18 s_sys@am1 qdiskd[28265]: <debug> Heuristic: 'ping -t1 -c1 172.19.1.99' missed (1/3)
Jan  9 09:45:25 s_sys@am1 openais[28300]: [TOTEM] The token was lost in the OPERATIONAL state.
Jan  9 09:45:25 s_sys@am1 openais[28300]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Jan  9 09:45:25 s_sys@am1 openais[28300]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Jan  9 09:45:25 s_sys@am1 openais[28300]: [TOTEM] The network interface is down.
Jan  9 09:45:25 s_sys@am1 openais[28300]: [TOTEM] entering GATHER state from 15.
Jan  9 09:45:25 s_sys@am1 openais[28300]: [TOTEM] entering GATHER state from 2.
Jan  9 09:45:28 s_sys@am1 qdiskd[28265]: <debug> Heuristic: 'ping -t1 -c1 172.19.1.99' missed (2/3)
Jan  9 09:45:30 s_sys@am1 openais[28300]: [TOTEM] entering GATHER state from 0.
Jan  9 09:45:30 s_sys@am1 openais[28300]: [TOTEM] Creating commit token because I am the rep.
Jan  9 09:45:30 s_sys@am1 openais[28300]: [TOTEM] Saving state aru 5c high seq received 5c
Jan  9 09:45:30 s_sys@am1 openais[28300]: [TOTEM] Storing new sequence id for ring 12c
Jan  9 09:45:30 s_sys@am1 openais[28300]: [TOTEM] entering COMMIT state.
Jan  9 09:45:30 s_sys@am1 openais[28300]: [TOTEM] entering RECOVERY state.
Jan  9 09:45:30 s_sys@am1 openais[28300]: [TOTEM] position [0] member 127.0.0.1:
Jan  9 09:45:30 s_sys@am1 openais[28300]: [TOTEM] previous ring seq 296 rep 172.19.1.78
Jan  9 09:45:30 s_sys@am1 openais[28300]: [TOTEM] aru 5c high delivered 5c received flag 1
Jan  9 09:45:30 s_sys@am1 openais[28300]: [TOTEM] Did not need to originate any messages in recovery.
Jan  9 09:45:30 s_sys@am1 openais[28300]: [TOTEM] Sending initial ORF token
Jan  9 09:45:30 s_sys@am1 openais[28300]: [CLM  ] CLM CONFIGURATION CHANGE
Jan  9 09:45:30 s_sys@am1 openais[28300]: [CLM  ] New Configuration:
Jan  9 09:45:30 s_sys@am1 openais[28300]: [CLM  ]       r(0) ip(127.0.0.1)
Jan  9 09:45:30 s_sys@am1 openais[28300]: [CLM  ] Members Left:
Jan  9 09:45:30 s_sys@am1 openais[28300]: [CLM  ]       r(0) ip(172.19.1.79)
Jan  9 09:45:30 s_sys@am1 openais[28300]: [CLM  ] Members Joined:
Jan  9 09:45:30 s_sys@am1 openais[28300]: [CLM  ] CLM CONFIGURATION CHANGE
Jan  9 09:45:30 s_sys@am1 openais[28300]: [CLM  ] New Configuration:
Jan  9 09:45:30 s_sys@am1 openais[28300]: [CLM  ]       r(0) ip(127.0.0.1)
Jan  9 09:45:30 s_sys@am1 openais[28300]: [CLM  ] Members Left:
Jan  9 09:45:30 s_sys@am1 openais[28300]: [CLM  ] Members Joined:
Jan  9 09:45:30 s_sys@am1 openais[28300]: [SYNC ] This node is within the primary component and will provide service.
Jan  9 09:45:30 s_sys@am1 openais[28300]: [TOTEM] entering OPERATIONAL state.
Jan  9 09:45:30 s_sys@am1 openais[28300]: [CLM  ] got nodejoin message 172.16.101.91
Jan  9 09:45:30 s_sys@am1 openais[28300]: [EVT  ] recovery error node: r(0) ip(127.0.0.1)  not found
Jan  9 09:45:30 s_kernel@am1 kernel: clurgmgrd[28359]: segfault at 0000000000000000 rip 0000000000408c4a rsp 00007fff04a2c450 error 4
Jan  9 09:45:30 s_sys@am1 gfs_controld[28328]: cluster is down, exiting
Jan  9 09:45:30 s_kernel@am1 kernel: dlm: closing connection to node 2
Jan  9 09:45:30 s_kernel@am1 kernel: dlm: closing connection to node 0
Jan  9 09:45:30 s_kernel@am1 kernel: dlm: closing connection to node 1
Jan  9 09:45:30 s_sys@am1 dlm_controld[28322]: cluster is down, exiting
Jan  9 09:45:30 s_sys@am1 fenced[28316]: cman_get_nodes error -1 104
Jan  9 09:45:30 s_sys@am1 fenced[28316]: cluster is down, exiting
Jan  9 09:45:30 s_sys@am1 clurgmgrd[28358]: <crit> Watchdog: Daemon died, rebooting...
Jan  9 09:45:30 s_sys@am1 shutdown[18377]: shutting down for system halt
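
For reference, the qdiskd heuristic seen above comes from the quorumd section of
my cluster.conf, roughly like the sketch below (quoted from memory, not my exact
file; the label and the interval/score values are only indicative, and the exact
attribute set may differ with the CS5 version):

    <quorumd interval="1" tko="10" votes="1" label="qdisk">
            <heuristic program="ping -t1 -c1 172.19.1.99" score="1" interval="2"/>
    </quorumd>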

Is it already a known problem?
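
If it is not already known, I can try to capture a core and a backtrace from
clurgmgrd on the next run, along these lines (only a sketch on my side; the core
location, the debuginfo package name and the clurgmgrd path are assumptions, and
the watchdog reboot may get in the way of writing the core):

    # send cores to disk with a recognizable name
    echo "/var/tmp/core.%e.%p" > /proc/sys/kernel/core_pattern
    # raise the core size limit (only affects daemons (re)started from this shell)
    ulimit -c unlimited

    # debug symbols for rgmanager (debuginfo-install comes from yum-utils;
    # the package name is my assumption)
    debuginfo-install rgmanager

    # reproduce the failure as above
    ifdown eth0

    # after the node comes back up, load the core left by clurgmgrd
    gdb /usr/sbin/clurgmgrd /var/tmp/core.clurgmgrd.<pid>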

Thanks
Regards
Alain Moullé




