
Re: [Linux-cluster] pull plug on node, service never relocates



Thank you for that -- excellent tip.

Yesterday evening I forced a reinstall of all cluster-related RPMs in case of some sort of binary corruption. I am still getting the same result. The log below is from yesterday, after increasing the rgmanager log level.

This is the log from the node that did the fencing. The "spare" machine did not pick up the service until all the other nodes had seen the "failed" node come back with a "clurgmgrd[5234]: <info> State change: 192.168.1.101 UP" message, which is, of course, after the node had been fenced, had rebooted, and had rejoined the cluster. Really weird issue.

May 19 16:19:13 c1n2 root: MARK I fail c1n1 running core1 by ifconfigging its ethernet ports off
May 19 16:19:35 c1n2 openais[4660]: [TOTEM] The token was lost in the OPERATIONAL state.
May 19 16:19:35 c1n2 openais[4660]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
May 19 16:19:35 c1n2 openais[4660]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
May 19 16:19:35 c1n2 openais[4660]: [TOTEM] entering GATHER state from 2.
May 19 16:19:40 c1n2 openais[4660]: [TOTEM] entering GATHER state from 11.
May 19 16:19:40 c1n2 openais[4660]: [TOTEM] Saving state aru 8b high seq received 8b
May 19 16:19:40 c1n2 openais[4660]: [TOTEM] Storing new sequence id for ring d0c
May 19 16:19:40 c1n2 openais[4660]: [TOTEM] entering COMMIT state.
May 19 16:19:40 c1n2 openais[4660]: [TOTEM] entering RECOVERY state.
May 19 16:19:40 c1n2 openais[4660]: [TOTEM] position [0] member 192.168.1.103:
May 19 16:19:40 c1n2 openais[4660]: [TOTEM] previous ring seq 3336 rep 192.168.1.103
May 19 16:19:40 c1n2 openais[4660]: [TOTEM] aru 8b high delivered 8b received flag 1
May 19 16:19:40 c1n2 openais[4660]: [TOTEM] position [1] member 192.168.1.104:
May 19 16:19:40 c1n2 openais[4660]: [TOTEM] previous ring seq 3336 rep 192.168.1.103
May 19 16:19:40 c1n2 openais[4660]: [TOTEM] aru 8b high delivered 8b received flag 1
May 19 16:19:40 c1n2 openais[4660]: [TOTEM] position [2] member 192.168.1.105:
May 19 16:19:40 c1n2 openais[4660]: [TOTEM] previous ring seq 3336 rep 192.168.1.103
May 19 16:19:40 c1n2 openais[4660]: [TOTEM] aru 8b high delivered 8b received flag 1
May 19 16:19:40 c1n2 openais[4660]: [TOTEM] position [3] member 192.168.1.102:
May 19 16:19:40 c1n2 openais[4660]: [TOTEM] previous ring seq 3336 rep 192.168.1.103
May 19 16:19:40 c1n2 openais[4660]: [TOTEM] aru 8b high delivered 8b received flag 1
May 19 16:19:40 c1n2 openais[4660]: [TOTEM] Did not need to originate any messages in recovery.
May 19 16:19:40 c1n2 openais[4660]: [CLM  ] CLM CONFIGURATION CHANGE
May 19 16:19:40 c1n2 openais[4660]: [CLM  ] New Configuration:
May 19 16:19:40 c1n2 openais[4660]: [CLM  ]     r(0) ip(192.168.1.103) 
May 19 16:19:40 c1n2 openais[4660]: [CLM  ]     r(0) ip(192.168.1.104) 
May 19 16:19:40 c1n2 openais[4660]: [CLM  ]     r(0) ip(192.168.1.105) 
May 19 16:19:40 c1n2 openais[4660]: [CLM  ]     r(0) ip(192.168.1.102) 
May 19 16:19:40 c1n2 openais[4660]: [CLM  ] Members Left:
May 19 16:19:40 c1n2 openais[4660]: [CLM  ]     r(0) ip(192.168.1.101) 
May 19 16:19:40 c1n2 openais[4660]: [CLM  ] Members Joined:
May 19 16:19:40 c1n2 openais[4660]: [CLM  ] CLM CONFIGURATION CHANGE
May 19 16:19:40 c1n2 openais[4660]: [CLM  ] New Configuration:
May 19 16:19:40 c1n2 openais[4660]: [CLM  ]     r(0) ip(192.168.1.103) 
May 19 16:19:40 c1n2 openais[4660]: [CLM  ]     r(0) ip(192.168.1.104) 
May 19 16:19:40 c1n2 openais[4660]: [CLM  ]     r(0) ip(192.168.1.105) 
May 19 16:19:40 c1n2 openais[4660]: [CLM  ]     r(0) ip(192.168.1.102) 
May 19 16:19:40 c1n2 openais[4660]: [CLM  ] Members Left:
May 19 16:19:40 c1n2 openais[4660]: [CLM  ] Members Joined:
May 19 16:19:40 c1n2 openais[4660]: [SYNC ] This node is within the primary component and will provide service.
May 19 16:19:40 c1n2 openais[4660]: [TOTEM] entering OPERATIONAL state.
May 19 16:19:40 c1n2 kernel: dlm: closing connection to node 1
May 19 16:19:40 c1n2 clurgmgrd[5234]: <info> State change: 192.168.1.101 DOWN
May 19 16:19:40 c1n2 openais[4660]: [CLM  ] got nodejoin message 192.168.1.103
May 19 16:19:40 c1n2 openais[4660]: [CLM  ] got nodejoin message 192.168.1.104
May 19 16:19:40 c1n2 openais[4660]: [CLM  ] got nodejoin message 192.168.1.105
May 19 16:19:40 c1n2 openais[4660]: [CLM  ] got nodejoin message 192.168.1.102
May 19 16:19:40 c1n2 openais[4660]: [CPG  ] got joinlist message from node 5
May 19 16:19:40 c1n2 openais[4660]: [CPG  ] got joinlist message from node 2
May 19 16:19:40 c1n2 openais[4660]: [CPG  ] got joinlist message from node 3
May 19 16:19:40 c1n2 openais[4660]: [CPG  ] got joinlist message from node 4
May 19 16:19:43 c1n2 fenced[4680]: 192.168.1.101 not a cluster member after 3 sec post_fail_delay
May 19 16:19:43 c1n2 fenced[4680]: fencing node "192.168.1.101"
May 19 16:19:45 c1n2 clurgmgrd[5234]: <info> Waiting for node #1 to be fenced
May 19 16:19:47 c1n2 fenced[4680]: fence "192.168.1.101" success
May 19 16:19:47 c1n2 clurgmgrd[5234]: <info> Node #1 fenced; continuing
May 19 16:20:05 c1n2 clurgmgrd: [5234]: <info> Executing /ha/bin/ha-hpss-mover1 status
May 19 16:22:37 c1n2 clurgmgrd: [5234]: <info> Executing /ha/bin/ha-hpss-mover1 status
May 19 16:23:27 c1n2 last message repeated 3 times
May 19 16:24:57 c1n2 last message repeated 3 times
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] entering GATHER state from 11.
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] Saving state aru 3e high seq received 3e
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] Storing new sequence id for ring d10
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] entering COMMIT state.
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] entering RECOVERY state.
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] position [0] member 192.168.1.103:
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] previous ring seq 3340 rep 192.168.1.103
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] aru 3e high delivered 3e received flag 1
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] position [1] member 192.168.1.104:
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] previous ring seq 3340 rep 192.168.1.103
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] aru 3e high delivered 3e received flag 1
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] position [2] member 192.168.1.105:
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] previous ring seq 3340 rep 192.168.1.103
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] aru 3e high delivered 3e received flag 1
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] position [3] member 192.168.1.101:
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] previous ring seq 3340 rep 192.168.1.101
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] aru a high delivered a received flag 1
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] position [4] member 192.168.1.102:
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] previous ring seq 3340 rep 192.168.1.103
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] aru 3e high delivered 3e received flag 1
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] Did not need to originate any messages in recovery.
May 19 16:25:17 c1n2 openais[4660]: [CLM  ] CLM CONFIGURATION CHANGE
May 19 16:25:17 c1n2 openais[4660]: [CLM  ] New Configuration:
May 19 16:25:17 c1n2 openais[4660]: [CLM  ]     r(0) ip(192.168.1.103) 
May 19 16:25:17 c1n2 openais[4660]: [CLM  ]     r(0) ip(192.168.1.104) 
May 19 16:25:17 c1n2 openais[4660]: [CLM  ]     r(0) ip(192.168.1.105) 
May 19 16:25:17 c1n2 openais[4660]: [CLM  ]     r(0) ip(192.168.1.102) 
May 19 16:25:17 c1n2 openais[4660]: [CLM  ] Members Left:
May 19 16:25:17 c1n2 openais[4660]: [CLM  ] Members Joined:
May 19 16:25:17 c1n2 openais[4660]: [CLM  ] CLM CONFIGURATION CHANGE
May 19 16:25:17 c1n2 openais[4660]: [CLM  ] New Configuration:
May 19 16:25:17 c1n2 openais[4660]: [CLM  ]     r(0) ip(192.168.1.103) 
May 19 16:25:17 c1n2 openais[4660]: [CLM  ]     r(0) ip(192.168.1.104) 
May 19 16:25:17 c1n2 openais[4660]: [CLM  ]     r(0) ip(192.168.1.105) 
May 19 16:25:17 c1n2 openais[4660]: [CLM  ]     r(0) ip(192.168.1.101) 
May 19 16:25:17 c1n2 openais[4660]: [CLM  ]     r(0) ip(192.168.1.102) 
May 19 16:25:17 c1n2 openais[4660]: [CLM  ] Members Left:
May 19 16:25:17 c1n2 openais[4660]: [CLM  ] Members Joined:
May 19 16:25:17 c1n2 openais[4660]: [CLM  ]     r(0) ip(192.168.1.101) 
May 19 16:25:17 c1n2 openais[4660]: [SYNC ] This node is within the primary component and will provide service.
May 19 16:25:17 c1n2 openais[4660]: [TOTEM] entering OPERATIONAL state.
May 19 16:25:17 c1n2 openais[4660]: [CLM  ] got nodejoin message 192.168.1.103
May 19 16:25:17 c1n2 openais[4660]: [CLM  ] got nodejoin message 192.168.1.104
May 19 16:25:17 c1n2 openais[4660]: [CLM  ] got nodejoin message 192.168.1.105
May 19 16:25:17 c1n2 openais[4660]: [CLM  ] got nodejoin message 192.168.1.101
May 19 16:25:17 c1n2 openais[4660]: [CLM  ] got nodejoin message 192.168.1.102
May 19 16:25:17 c1n2 openais[4660]: [CPG  ] got joinlist message from node 2
May 19 16:25:17 c1n2 openais[4660]: [CPG  ] got joinlist message from node 3
May 19 16:25:17 c1n2 openais[4660]: [CPG  ] got joinlist message from node 4
May 19 16:25:17 c1n2 openais[4660]: [CPG  ] got joinlist message from node 5
May 19 16:25:24 c1n2 kernel: dlm: connecting to 1
May 19 16:25:27 c1n2 clurgmgrd: [5234]: <info> Executing /ha/bin/ha-hpss-mover1 status
May 19 16:25:57 c1n2 clurgmgrd: [5234]: <info> Executing /ha/bin/ha-hpss-mover1 status
May 19 16:26:00 c1n2 clurgmgrd[5234]: <info> State change: 192.168.1.101 UP
May 19 16:26:27 c1n2 clurgmgrd: [5234]: <info> Executing /ha/bin/ha-hpss-mover1 status
May 19 16:26:56 c1n2 xinetd[9002]: Exiting...
May 19 16:26:56 c1n2 xinetd[2236]: xinetd Version 2.3.14 started with libwrap loadavg labeled-networking options compiled in.
May 19 16:26:56 c1n2 xinetd[2236]: Started working: 1 available service
May 19 16:26:57 c1n2 clurgmgrd: [5234]: <info> Executing /ha/bin/ha-hpss-mover1 status
May 19 16:28:57 c1n2 last message repeated 2 times
May 19 16:28:58 c1n2 root: MARK II - end of test
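
To make the timeline in the log above easier to read, here is a minimal Python sketch that pulls out the key events and reports the gaps between them. The event labels, the function names, and the assumed year (syslog timestamps carry none) are illustrative choices, not part of any cluster tool; the message patterns are copied from the log lines above.

```python
import re
from datetime import datetime

# Syslog prefix, e.g. "May 19 16:19:40 c1n2 <rest of message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+(.*)$")

# Hypothetical labels for the messages of interest in this thread.
EVENTS = [
    ("token_lost", re.compile(r"The token was lost")),
    ("node_down", re.compile(r"State change: \S+ DOWN")),
    ("fence_ok", re.compile(r'fence "\S+" success')),
    ("node_up", re.compile(r"State change: \S+ UP")),
]


def extract_events(lines, year=2010):
    """Return (label, datetime) pairs for matching lines, in log order."""
    found = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp, _host, rest = match.groups()
        # The year is an assumption because syslog does not record it.
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        for label, pattern in EVENTS:
            if pattern.search(rest):
                found.append((label, when))
    return found


def seconds_between(events, first, second):
    """Seconds from the first occurrence of `first` to that of `second`."""
    times = {}
    for label, when in events:
        times.setdefault(label, when)  # keep the earliest occurrence
    return (times[second] - times[first]).total_seconds()


sample = [
    'May 19 16:19:35 c1n2 openais[4660]: [TOTEM] The token was lost in the OPERATIONAL state.',
    'May 19 16:19:40 c1n2 clurgmgrd[5234]: <info> State change: 192.168.1.101 DOWN',
    'May 19 16:19:47 c1n2 fenced[4680]: fence "192.168.1.101" success',
    'May 19 16:26:00 c1n2 clurgmgrd[5234]: <info> State change: 192.168.1.101 UP',
]

events = extract_events(sample)
print(seconds_between(events, "token_lost", "fence_ok"))  # 12.0
print(seconds_between(events, "fence_ok", "node_up"))     # 373.0
```

On these four lines, fencing completes 12 seconds after the token loss, and the node is reported UP a little over six minutes after that; that second interval is the one during which the service stayed put.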

On Wed, May 19, 2010 at 2:42 PM, Alfredo Moralejo <amoralej redhat com> wrote:
What is the state of the service that was running on the node after you pull the power cables: stopped or failed?

Set rgmanager to verbose mode with <rm log_level="7" log_facility="local4">

Regards

Alfredo



On 05/19/2010 07:08 PM, Dusty wrote:
In the interest of troubleshooting, I've taken all the failover domains out of the configuration.

This resulted in no change:

The service on a failed node does not relocate until the failed node reboots.

To reiterate: a similar cluster configuration on similar hardware worked perfectly on RHEL5U3.


--

Alfredo Moralejo
Red Hat - Senior consultant

Office: +34 914148838
Cell: +34 607909535
Email: alfredo moralejo redhat com

Business address: C/Jose Bardasano Baos, 9, Edif. Gorbea 3, planta 3ºD, 28016 Madrid, Spain
Registered address: Red Hat S.L., C/ Velazquez 63, Madrid 28001, Spain
Registered in the Madrid Mercantile Registry – C.I.F. B82657941

--
Linux-cluster mailing list
Linux-cluster redhat com
https://www.redhat.com/mailman/listinfo/linux-cluster

