[Linux-cluster] Cluster service is not relocating to the second node.

Shankar Jha shankar.jha at gmail.com
Thu Jun 9 10:27:22 UTC 2011


Hi,

I have a problem with a RHEL 5.5 cluster.
The mysqld service runs under the cluster. When there is any issue with
the cluster, the service (hell) does not relocate automatically. I have
even tried to enable it on the second node, but that fails. In that case
we need to reboot both nodes and enable the service manually on one of
them. HP iLO fencing is not working.
Please find the /var/log/messages output below and suggest a fix.


Jun  9 02:46:25 indls0040 clurgmgrd[6530]: <notice> Stopping service
service:hell
Jun  9 02:46:27 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
Jun  9 02:46:44 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
Jun  9 02:46:45 indls0040 ccsd[5222]: Unable to connect to cluster
infrastructure after 19710 seconds.
Jun  9 02:46:55 indls0040 clurgmgrd[6530]: <err> #52: Failed changing RG status
Jun  9 02:47:03 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
Jun  9 02:47:05 indls0040 clurgmgrd: [6530]: <warning> 10.48.64.82 is
not configured
Jun  9 02:47:05 indls0040 clurgmgrd[6530]: <notice> Stopping service
service:hell
Jun  9 02:47:15 indls0040 ccsd[5222]: Unable to connect to cluster
infrastructure after 19740 seconds.
Jun  9 02:47:20 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
Jun  9 02:47:35 indls0040 clurgmgrd[6530]: <err> #52: Failed changing RG status
Jun  9 02:47:38 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
Jun  9 02:47:45 indls0040 clurgmgrd: [6530]: <warning> 10.48.64.82 is
not configured
Jun  9 02:47:45 indls0040 clurgmgrd[6530]: <notice> Stopping service
service:hell
Jun  9 02:47:45 indls0040 ccsd[5222]: Unable to connect to cluster
infrastructure after 19770 seconds.
Jun  9 02:47:50 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
Jun  9 02:48:14 indls0040 last message repeated 2 times
Jun  9 02:48:15 indls0040 ccsd[5222]: Unable to connect to cluster
infrastructure after 19800 seconds.
Jun  9 02:48:15 indls0040 clurgmgrd[6530]: <err> #52: Failed changing RG status
Jun  9 02:48:23 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
Jun  9 02:48:25 indls0040 clurgmgrd: [6530]: <warning> 10.48.64.82 is
not configured
Jun  9 02:48:25 indls0040 clurgmgrd[6530]: <notice> Stopping service
service:hell
Jun  9 02:48:37 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
Jun  9 02:48:45 indls0040 ccsd[5222]: Unable to connect to cluster
infrastructure after 19830 seconds.
Jun  9 02:48:55 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
Jun  9 02:48:55 indls0040 clurgmgrd[6530]: <err> #52: Failed changing RG status
Jun  9 02:49:05 indls0040 clurgmgrd: [6530]: <warning> 10.48.64.82 is
not configured
Jun  9 02:49:05 indls0040 clurgmgrd[6530]: <notice> Stopping service
service:hell
Jun  9 02:49:13 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
Jun  9 02:49:15 indls0040 ccsd[5222]: Unable to connect to cluster
infrastructure after 19860 seconds.
Jun  9 02:49:26 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
Jun  9 02:49:35 indls0040 clurgmgrd[6530]: <err> #52: Failed changing RG status
Jun  9 02:49:45 indls0040 clurgmgrd: [6530]: <warning> 10.48.64.82 is
not configured
Jun  9 02:49:45 indls0040 clurgmgrd[6530]: <notice> Stopping service
service:hell
Jun  9 02:49:45 indls0040 ccsd[5222]: Unable to connect to cluster
infrastructure after 19890 seconds.
Jun  9 02:49:47 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
Jun  9 02:50:10 indls0040 last message repeated 2 times
Jun  9 02:50:15 indls0040 clurgmgrd[6530]: <err> #52: Failed changing RG status


Jun  9 10:03:59 indls0040 openais[23169]: [MAIN ] Using default
multicast address of 239.192.67.158
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] Token Timeout (10000
ms) retransmit timeout (495 ms)
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] token hold (386 ms)
retransmits before loss (20 retrans)
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] join (60 ms)
send_join (0 ms) consensus (4800 ms) merge (200 ms)
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] downcheck (1000 ms)
fail to recv const (50 msgs)
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] seqno unchanged
const (30 rotations) Maximum network MTU 1402
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] window size per
rotation (50 messages) maximum messages per rotation (17 messages)
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] missed count const
(5 messages)
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] send threads (0 threads)
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] RRP token expired
timeout (495 ms)
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] RRP token problem
counter (2000 ms)
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] RRP threshold (10
problem count)
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] RRP mode set to none.
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] heartbeat_failures_allowed (0)
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] max_network_delay (50 ms)
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] HeartBeat is
Disabled. To enable set heartbeat_failures_allowed > 0
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] Receive multicast
socket recv buffer size (320000 bytes).
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] Transmit multicast
socket send buffer size (262142 bytes).
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] The network
interface [10.48.65.54] is now up.
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] Created or loaded
sequence id 7136704.10.48.65.54 for this ring.
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] entering GATHER state from 15.
Jun  9 10:04:00 indls0040 openais[23169]: [CMAN ] CMAN 2.0.115 (built
Jul 28 2010 19:18:41) started
Jun  9 10:04:00 indls0040 openais[23169]: [MAIN ] Service initialized
'openais CMAN membership service 2.01'
Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
'openais extended virtual synchrony service'
Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
'openais cluster membership service B.01.01'
Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
'openais availability management framework B.01.01'

Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
'openais checkpoint service B.01.01'
Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
'openais event service B.01.01'
Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
'openais distributed locking service B.01.01'
Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
'openais message service B.01.01'
Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
'openais configuration service'
Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
'openais cluster closed process group service v1.01'
Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
'openais cluster config database access v1.01'
Jun  9 10:04:00 indls0040 openais[23169]: [SYNC ] Not using a virtual
synchrony filter.
Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] Creating commit
token because I am the rep.


Thanks,
Shankar



Jun  9 10:04:01 indls0040 openais[23169]: [CLM  ]       r(0) ip(10.48.64.67)
Jun  9 10:04:01 indls0040 openais[23169]: [SYNC ] This node is within
the primary component and will provide service.
Jun  9 10:04:01 indls0040 openais[23169]: [TOTEM] entering OPERATIONAL state.
Jun  9 10:04:02 indls0040 openais[23169]: [CLM  ] got nodejoin message
10.48.64.67
Jun  9 10:04:02 indls0040 openais[23169]: [CLM  ] got nodejoin message
10.48.65.54
Jun  9 10:04:02 indls0040 openais[23169]: [CMAN ] cman killed by node
2 because we were killed by cman_tool or other application
Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading all
openais components
Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
component: openais_confdb v0 (19/10)
Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
component: openais_cpg v0 (18/8)
Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
component: openais_cfg v0 (17/7)
Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
component: openais_msg v0 (16/6)
Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
component: openais_lck v0 (15/5)
Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
component: openais_evt v0 (14/4)
Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
component: openais_ckpt v0 (13/3)
Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
component: openais_amf v0 (12/2)
Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
component: openais_clm v0 (11/1)
Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
component: openais_evs v0 (10/0)
Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
component: openais_cman v0 (9/9)
Jun  9 10:04:03 indls0040 dlm_controld[23196]: cluster is down, exiting
Jun  9 10:04:03 indls0040 fenced[23188]: cluster is down, exiting
Jun  9 10:04:03 indls0040 kernel: dlm: closing connection to node 1
Jun  9 10:04:03 indls0040 gfs_controld[23203]: cpg_join error 2
Jun  9 10:04:06 indls0040 fence_node[23194]: Fence of
"indls0040.qdx.in" was unsuccessful
Jun  9 10:04:15 indls0040 ccsd[5222]: Unable to connect to cluster
infrastructure after 45930 seconds.
Jun  9 10:04:16 indls0040 clurgmgrd[6530]: <err> #52: Failed changing RG status
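
The ccsd lines in the log above carry a running counter ("after N seconds"), so subtracting that counter from the line's timestamp gives a rough estimate of when ccsd last had contact with the cluster infrastructure. The following is only a minimal sketch of that arithmetic, not a diagnostic tool. It assumes the year 2011 (taken from the mail date, since syslog timestamps carry no year), and the function name estimate_disconnect is made up for illustration. The two sample lines are copied from the log with the mail line wrapping undone.

```python
import re
from datetime import datetime, timedelta

# Two ccsd entries copied from the log above, with line wrapping undone.
SAMPLE = [
    "Jun  9 02:46:45 indls0040 ccsd[5222]: Unable to connect to cluster "
    "infrastructure after 19710 seconds.",
    "Jun  9 10:04:15 indls0040 ccsd[5222]: Unable to connect to cluster "
    "infrastructure after 45930 seconds.",
]

# Captures the syslog timestamp and the ccsd counter value.
PATTERN = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d) \S+ ccsd\[\d+\]: "
    r"Unable to connect to cluster infrastructure after (\d+) seconds"
)


def estimate_disconnect(line, year=2011):
    """Return timestamp minus the ccsd counter, or None if the line does not match.

    The year is an assumption: syslog lines do not record it.
    """
    match = PATTERN.match(line)
    if match is None:
        return None
    # Collapse the padded day field ("Jun  9") before parsing.
    stamp_text = " ".join(match.group(1).split())
    stamp = datetime.strptime(f"{year} {stamp_text}", "%Y %b %d %H:%M:%S")
    return stamp - timedelta(seconds=int(match.group(2)))


if __name__ == "__main__":
    for entry in SAMPLE:
        print(estimate_disconnect(entry))
```

Both sample lines point back to roughly 21:18 on Jun 8, which suggests the connection to the cluster infrastructure was lost the previous evening, well before the first log entries shown here.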
-------------- next part --------------
A non-text attachment was scrubbed...
Name: logs.docx
Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Size: 147479 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20110609/2844ec81/attachment.docx>

