[Linux-cluster] cluster is not relocation on second node.

Thu Jun 9 10:40:28 UTC 2011

On 09/06/11 11:27, Shankar Jha wrote:
> Hi,
>
> I have a problem with a RHEL 5.5 cluster.
> The mysqld service runs under the cluster. When there is any issue with
> the cluster, the service (hell) does not relocate automatically. I have
> also tried to enable it on the second node, but that fails. In that case
> we need to reboot both nodes and enable the service manually on one of
> them. HP iLO fencing is not working.

You answered your own question. Fix fencing and the failover should work 
fine :-)

Chrissie
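
A rough way to confirm from the logs that fencing is the blocker is to
count the failure signatures that appear in the quoted excerpt. The Python
sketch below is only an illustration: the label names and the summarize()
helper are made up for this example, the patterns are copied from the log
lines in this thread, and it assumes each syslog entry sits on a single
line, as it would in the log file itself rather than in the wrapped quote.

```python
import re
from collections import Counter

# Patterns taken from the log excerpt in this thread. The labels on the
# left are hypothetical names chosen for this sketch.
SIGNATURES = {
    "fence_failed": re.compile(
        r"fence_node\[\d+\]: Fence of .* was unsuccessful"),
    "ccsd_disconnected": re.compile(
        r"ccsd\[\d+\]: Unable to connect to cluster"),
    "rgmanager_stuck": re.compile(
        r"clurgmgrd\[\d+\]: ?<err>\s+#52: Failed changing RG status"),
}


def summarize(lines):
    """Count how many lines match each failure signature."""
    counts = Counter()
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1
    return counts
```

Something like summarize(open("/var/log/messages")) would then give a
per-signature count. A non-zero "fence_failed" count alongside repeated
"rgmanager_stuck" entries would be consistent with rgmanager waiting on a
fence operation that never succeeds.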

> Please find the /var/log/messages output below and advise.
>
>
> Jun  9 02:46:25 indls0040 clurgmgrd[6530]:<notice>  Stopping service
> service:hell
> Jun  9 02:46:27 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
> Jun  9 02:46:44 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
> Jun  9 02:46:45 indls0040 ccsd[5222]: Unable to connect to cluster
> infrastructure after 19710 seconds.
> Jun  9 02:46:55 indls0040 clurgmgrd[6530]:<err>  #52: Failed changing RG status
> Jun  9 02:47:03 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
> Jun  9 02:47:05 indls0040 clurgmgrd: [6530]:<warning>  10.48.64.82 is
> not configured
> Jun  9 02:47:05 indls0040 clurgmgrd[6530]:<notice>  Stopping service
> service:hell
> Jun  9 02:47:15 indls0040 ccsd[5222]: Unable to connect to cluster
> infrastructure after 19740 seconds.
> Jun  9 02:47:20 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
> Jun  9 02:47:35 indls0040 clurgmgrd[6530]:<err>  #52: Failed changing RG status
> Jun  9 02:47:38 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
> Jun  9 02:47:45 indls0040 clurgmgrd: [6530]:<warning>  10.48.64.82 is
> not configured
> Jun  9 02:47:45 indls0040 clurgmgrd[6530]:<notice>  Stopping service
> service:hell
> Jun  9 02:47:45 indls0040 ccsd[5222]: Unable to connect to cluster
> infrastructure after 19770 seconds.
> Jun  9 02:47:50 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
> Jun  9 02:48:14 indls0040 last message repeated 2 times
> Jun  9 02:48:15 indls0040 ccsd[5222]: Unable to connect to cluster
> infrastructure after 19800 seconds.
> Jun  9 02:48:15 indls0040 clurgmgrd[6530]:<err>  #52: Failed changing RG status
> Jun  9 02:48:23 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
> Jun  9 02:48:25 indls0040 clurgmgrd: [6530]:<warning>  10.48.64.82 is
> not configured
> Jun  9 02:48:25 indls0040 clurgmgrd[6530]:<notice>  Stopping service
> service:hell
> Jun  9 02:48:37 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
> Jun  9 02:48:45 indls0040 ccsd[5222]: Unable to connect to cluster
> infrastructure after 19830 seconds.
> Jun  9 02:48:55 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
> Jun  9 02:48:55 indls0040 clurgmgrd[6530]:<err>  #52: Failed changing RG status
> Jun  9 02:49:05 indls0040 clurgmgrd: [6530]:<warning>  10.48.64.82 is
> not configured
> Jun  9 02:49:05 indls0040 clurgmgrd[6530]:<notice>  Stopping service
> service:hell
> Jun  9 02:49:13 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
> Jun  9 02:49:15 indls0040 ccsd[5222]: Unable to connect to cluster
> infrastructure after 19860 seconds.
> Jun  9 02:49:26 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
> Jun  9 02:49:35 indls0040 clurgmgrd[6530]:<err>  #52: Failed changing RG status
> Jun  9 02:49:45 indls0040 clurgmgrd: [6530]:<warning>  10.48.64.82 is
> not configured
> Jun  9 02:49:45 indls0040 clurgmgrd[6530]:<notice>  Stopping service
> service:hell
> Jun  9 02:49:45 indls0040 ccsd[5222]: Unable to connect to cluster
> infrastructure after 19890 seconds.
> Jun  9 02:49:47 indls0040 dhclient: DHCPREQUEST on eth7 to 10.48.64.13 port 67
> Jun  9 02:50:10 indls0040 last message repeated 2 times
> Jun  9 02:50:15 indls0040 clurgmgrd[6530]:<err>  #52: Failed changing RG status
>
>
> Jun  9 10:03:59 indls0040 openais[23169]: [MAIN ] Using default
> multicast address of 239.192.67.158
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] Token Timeout (10000
> ms) retransmit timeout (495 ms)
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] token hold (386 ms)
> retransmits before loss (20 retrans)
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] join (60 ms)
> send_join (0 ms) consensus (4800 ms) merge (200 ms)
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] downcheck (1000 ms)
> fail to recv const (50 msgs)
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] seqno unchanged
> const (30 rotations) Maximum network MTU 1402
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] window size per
> rotation (50 messages) maximum messages per rotation (17 messages)
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] missed count const
> (5 messages)
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] send threads (0 threads)
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] RRP token expired
> timeout (495 ms)
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] RRP token problem
> counter (2000 ms)
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] RRP threshold (10
> problem count)
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] RRP mode set to none.
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] heartbeat_failures_allowed (0)
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] max_network_delay (50 ms)
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] HeartBeat is
> Disabled. To enable set heartbeat_failures_allowed > 0
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] Receive multicast
> socket recv buffer size (320000 bytes).
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] Transmit multicast
> socket send buffer size (262142 bytes).
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] The network
> interface [10.48.65.54] is now up.
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] Created or loaded
> sequence id 7136704.10.48.65.54 for this ring.
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] entering GATHER state from 15.
> Jun  9 10:04:00 indls0040 openais[23169]: [CMAN ] CMAN 2.0.115 (built
> Jul 28 2010 19:18:41) started
> Jun  9 10:04:00 indls0040 openais[23169]: [MAIN ] Service initialized
> 'openais CMAN membership service 2.01'
> Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
> 'openais extended virtual synchrony service'
> Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
> 'openais cluster membership service B.01.01'
> Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
> 'openais availability management framework B.01.01'
>
> Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
> 'openais checkpoint service B.01.01'
> Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
> 'openais event service B.01.01'
> Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
> 'openais distributed locking service B.01.01'
> Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
> 'openais message service B.01.01'
> Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
> 'openais configuration service'
> Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
> 'openais cluster closed process group service v1.01'
> Jun  9 10:04:00 indls0040 openais[23169]: [SERV ] Service initialized
> 'openais cluster config database access v1.01'
> Jun  9 10:04:00 indls0040 openais[23169]: [SYNC ] Not using a virtual
> synchrony filter.
> Jun  9 10:04:00 indls0040 openais[23169]: [TOTEM] Creating commit
> token because I am the rep.
>
>
> Thanks-
> Shankar
>
>
>
> Jun  9 10:04:01 indls0040 openais[23169]: [CLM  ]       r(0) ip(10.48.64.67)
> Jun  9 10:04:01 indls0040 openais[23169]: [SYNC ] This node is within
> the primary component and will provide service.
> Jun  9 10:04:01 indls0040 openais[23169]: [TOTEM] entering OPERATIONAL state.
> Jun  9 10:04:02 indls0040 openais[23169]: [CLM  ] got nodejoin message
> 10.48.64.67
> Jun  9 10:04:02 indls0040 openais[23169]: [CLM  ] got nodejoin message
> 10.48.65.54
> Jun  9 10:04:02 indls0040 openais[23169]: [CMAN ] cman killed by node
> 2 because we were killed by cman_tool or other application
> Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading all
> openais components
> Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
> component: openais_confdb v0 (19/10)
> Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
> component: openais_cpg v0 (18/8)
> Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
> component: openais_cfg v0 (17/7)
> Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
> component: openais_msg v0 (16/6)
> Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
> component: openais_lck v0 (15/5)
> Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
> component: openais_evt v0 (14/4)
> Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
> component: openais_ckpt v0 (13/3)
> Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
> component: openais_amf v0 (12/2)
> Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
> component: openais_clm v0 (11/1)
> Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
> component: openais_evs v0 (10/0)
> Jun  9 10:04:03 indls0040 openais[23169]: [SERV ] Unloading openais
> component: openais_cman v0 (9/9)
> Jun  9 10:04:03 indls0040 dlm_controld[23196]: cluster is down, exiting
> Jun  9 10:04:03 indls0040 fenced[23188]: cluster is down, exiting
> Jun  9 10:04:03 indls0040 kernel: dlm: closing connection to node 1
> Jun  9 10:04:03 indls0040 gfs_controld[23203]: cpg_join error 2
> Jun  9 10:04:06 indls0040 fence_node[23194]: Fence of
> "indls0040.qdx.in" was unsuccessful
> Jun  9 10:04:15 indls0040 ccsd[5222]: Unable to connect to cluster
> infrastructure after 45930 seconds.
> Jun  9 10:04:16 indls0040 clurgmgrd[6530]:<err>  #52: Failed changing RG status