
[Linux-cluster] RHEL5 failover domain



I've found a problem with a RHEL5 cluster. When I use a prioritized failover domain and then reset the node that has priority 1, the cluster relocates the service to the node with priority 2. Then, when node 1 comes back, the cluster tries to relocate the service back to the primary node. In the log file I always find:

Nov 19 12:32:26 l2 openais[1977]: [TOTEM] entering OPERATIONAL state.
Nov 19 12:32:26 l2 openais[1977]: [CLM  ] got nodejoin message 192.168.10.10
Nov 19 12:32:26 l2 openais[1977]: [CLM  ] got nodejoin message 192.168.10.11
Nov 19 12:32:26 l2 openais[1977]: [CPG  ] got joinlist message from node 1
Nov 19 12:32:26 l2 clurgmgrd[2687]: <notice> Stopping service service:vsftpd
Nov 19 12:32:41 l2 clurgmgrd[2687]: <err> #52: Failed changing RG status
Nov 19 12:32:56 l2 clurgmgrd[2687]: <err> #57: Failed changing RG status
Nov 19 12:32:57 l2 clurgmgrd: [2687]: <info> Executing /etc/init.d/vsftpd status

I tested this many times, and in this case clurgmgrd does not try to run the script with the stop parameter. However, when I relocate the service manually using clusvcadm, or when both nodes have priority 1, everything succeeds. It also succeeds if I restart the node using reboot. I think the automatic relocation (after a crash) starts too early: in my opinion, the cluster does not wait for rgmanager to start on the rejoining node.

So far I have tried these packages:
   rgmanager-2.0.23-1 or rgmanager-2.0.28-1.el5
   cman-2.0.73-1.el5_1.1, cman-2.0.64 and earlier from RHEL5 CD
   openais-0.80.3-7.el5 and earlier from RHEL5 CD


My cluster.conf file:

<?xml version="1.0"?>
<cluster alias="OBN_HA" config_version="26" name="OBN_HA">
   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
   <clusternodes>
       <clusternode name="l2.local" nodeid="1" votes="1">
           <fence>
               <method name="1">
                   <device name="l2_fence" nodename="l2.local"/>
               </method>
           </fence>
       </clusternode>
       <clusternode name="l1.local" nodeid="2" votes="1">
           <fence>
               <method name="1">
                   <device name="l1_fence" nodename="l1.local"/>
               </method>
           </fence>
       </clusternode>
   </clusternodes>
   <cman expected_votes="1" two_node="1"/>
   <fencedevices>
       <fencedevice agent="fence_manual" name="l1_fence"/>
       <fencedevice agent="fence_manual" name="l2_fence"/>
   </fencedevices>
   <rm>
       <failoverdomains>
           <failoverdomain name="OBN" ordered="1" restricted="0">
               <failoverdomainnode name="l1.local" priority="1"/>
               <failoverdomainnode name="l2.local" priority="2"/>
           </failoverdomain>
       </failoverdomains>
       <resources/>
       <service autostart="1" domain="OBN" name="vsftpd" recovery="relocate">
           <script file="/etc/init.d/vsftpd" name="vsftpd"/>
       </service>
   </rm>
</cluster>
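As a reading aid for the configuration above, here is a minimal Python sketch that lists the nodes of a failover domain in preference order. It assumes that a lower `priority` number means a more preferred node when `ordered="1"`, which matches the behaviour described in this post. The function name `preferred_order` and the trimmed copy of the config are illustrative only and are not part of any cluster tool.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <rm> section from the cluster.conf above.
CLUSTER_CONF = """<cluster name="OBN_HA" config_version="26">
  <rm>
    <failoverdomains>
      <failoverdomain name="OBN" ordered="1" restricted="0">
        <failoverdomainnode name="l1.local" priority="1"/>
        <failoverdomainnode name="l2.local" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <service autostart="1" domain="OBN" name="vsftpd" recovery="relocate"/>
  </rm>
</cluster>"""


def preferred_order(conf_xml, domain_name):
    """Return the node names of a failover domain, most preferred first.

    Hypothetical helper: assumes a lower priority number wins when the
    domain is ordered, and keeps document order otherwise.
    """
    root = ET.fromstring(conf_xml)
    for dom in root.iter("failoverdomain"):
        if dom.get("name") != domain_name:
            continue
        nodes = [
            (int(node.get("priority", "0")), node.get("name"))
            for node in dom.findall("failoverdomainnode")
        ]
        if dom.get("ordered") == "1":
            nodes.sort()  # sort by priority number, ascending
        return [name for _, name in nodes]
    raise KeyError(domain_name)


print(preferred_order(CLUSTER_CONF, "OBN"))
```

With this configuration l1.local comes first, so the vsftpd service is expected to move back to l1.local as soon as that node rejoins the cluster.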

Full /var/log/messages log:
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] Sending initial ORF token
Nov 19 12:27:39 l2 openais[1977]: [CLM  ] CLM CONFIGURATION CHANGE
Nov 19 12:27:39 l2 openais[1977]: [CLM  ] New Configuration:
Nov 19 12:27:39 l2 openais[1977]: [CLM  ] Members Left:
Nov 19 12:27:39 l2 openais[1977]: [CLM  ] Members Joined:
Nov 19 12:27:39 l2 openais[1977]: [CLM  ] CLM CONFIGURATION CHANGE
Nov 19 12:27:39 l2 openais[1977]: [CLM  ] New Configuration:
Nov 19 12:27:39 l2 openais[1977]: [CLM  ] r(0) ip(192.168.10.11)
Nov 19 12:27:39 l2 openais[1977]: [CLM  ] Members Left:
Nov 19 12:27:39 l2 openais[1977]: [CLM  ] Members Joined:
Nov 19 12:27:39 l2 openais[1977]: [CLM  ] r(0) ip(192.168.10.11)
Nov 19 12:27:39 l2 openais[1977]: [SYNC ] This node is within the primary component and will provide service.
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] entering OPERATIONAL state.
Nov 19 12:27:39 l2 openais[1977]: [CMAN ] quorum regained, resuming activity
Nov 19 12:27:39 l2 openais[1977]: [CLM  ] got nodejoin message 192.168.10.11
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] entering GATHER state from 11.
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] Saving state aru 9 high seq received 9
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] Storing new sequence id for ring e4
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] entering COMMIT state.
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] entering RECOVERY state.
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] position [0] member 192.168.10.10:
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] previous ring seq 224 rep 192.168.10.10
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] aru 1a high delivered 1a received flag 1
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] position [1] member 192.168.10.11:
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] previous ring seq 224 rep 192.168.10.11
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] aru 9 high delivered 9 received flag 1
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] Did not need to originate any messages in recovery.
Nov 19 12:27:40 l2 openais[1977]: [CLM  ] CLM CONFIGURATION CHANGE
Nov 19 12:27:40 l2 openais[1977]: [CLM  ] New Configuration:
Nov 19 12:27:40 l2 openais[1977]: [CLM  ] r(0) ip(192.168.10.11)
Nov 19 12:27:40 l2 openais[1977]: [CLM  ] Members Left:
Nov 19 12:27:40 l2 openais[1977]: [CLM  ] Members Joined:
Nov 19 12:27:40 l2 openais[1977]: [CLM  ] CLM CONFIGURATION CHANGE
Nov 19 12:27:40 l2 openais[1977]: [CLM  ] New Configuration:
Nov 19 12:27:40 l2 openais[1977]: [CLM  ] r(0) ip(192.168.10.10)
Nov 19 12:27:40 l2 openais[1977]: [CLM  ] r(0) ip(192.168.10.11)
Nov 19 12:27:40 l2 openais[1977]: [CLM  ] Members Left:
Nov 19 12:27:40 l2 openais[1977]: [CLM  ] Members Joined:
Nov 19 12:27:40 l2 openais[1977]: [CLM  ] r(0) ip(192.168.10.10)
Nov 19 12:27:40 l2 openais[1977]: [SYNC ] This node is within the primary component and will provide service.
Nov 19 12:27:40 l2 openais[1977]: [TOTEM] entering OPERATIONAL state.
Nov 19 12:27:40 l2 openais[1977]: [CLM  ] got nodejoin message 192.168.10.10
Nov 19 12:27:40 l2 openais[1977]: [CLM  ] got nodejoin message 192.168.10.11
Nov 19 12:27:40 l2 openais[1977]: [CPG  ] got joinlist message from node 2
Nov 19 12:27:40 l2 ccsd[1941]: Initial status:: Quorate
[...]
Nov 19 12:28:37 l2 kernel: dlm: Using TCP for communications
Nov 19 12:28:37 l2 kernel: dlm: connecting to 2
Nov 19 12:28:38 l2 clurgmgrd[2687]: <notice> Resource Group Manager Starting
Nov 19 12:28:38 l2 kernel: dlm: got connection from 2
[...]
Nov 19 12:28:47 l2 clurgmgrd: [2687]: <info> Executing /etc/init.d/vsftpd stop
Nov 19 12:28:47 l2 vsftpd: script param: stop
Nov 19 12:30:27 l2 openais[1977]: [TOTEM] The token was lost in the OPERATIONAL state.
Nov 19 12:30:27 l2 openais[1977]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Nov 19 12:30:27 l2 openais[1977]: [TOTEM] Transmit multicast socket send buffer size (219136 bytes).
Nov 19 12:30:27 l2 openais[1977]: [TOTEM] entering GATHER state from 2.
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] entering GATHER state from 0.
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] Creating commit token because I am the rep.
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] Saving state aru 28 high seq received 28
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] Storing new sequence id for ring e8
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] entering COMMIT state.
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] entering RECOVERY state.
Nov 19 12:30:32 l2 fenced[1993]: l1.local not a cluster member after 0 sec post_fail_delay
Nov 19 12:30:32 l2 kernel: dlm: closing connection to node 2
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] position [0] member 192.168.10.11:
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] previous ring seq 228 rep 192.168.10.10
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] aru 28 high delivered 28 received flag 1
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] Did not need to originate any messages in recovery.
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] Sending initial ORF token
Nov 19 12:30:32 l2 fenced[1993]: fencing node "l1.local"
Nov 19 12:30:32 l2 openais[1977]: [CLM  ] CLM CONFIGURATION CHANGE
Nov 19 12:30:32 l2 fence_manual: Node l1.local needs to be reset before recovery can procede. Waiting for l1.local to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n l1.local)
Nov 19 12:30:32 l2 openais[1977]: [CLM  ] New Configuration:
Nov 19 12:30:32 l2 openais[1977]: [CLM  ] r(0) ip(192.168.10.11)
Nov 19 12:30:32 l2 openais[1977]: [CLM  ] Members Left:
Nov 19 12:30:32 l2 openais[1977]: [CLM  ] r(0) ip(192.168.10.10)
Nov 19 12:30:32 l2 openais[1977]: [CLM  ] Members Joined:
Nov 19 12:30:32 l2 openais[1977]: [CLM  ] CLM CONFIGURATION CHANGE
Nov 19 12:30:32 l2 openais[1977]: [CLM  ] New Configuration:
Nov 19 12:30:32 l2 openais[1977]: [CLM  ] r(0) ip(192.168.10.11)
Nov 19 12:30:32 l2 openais[1977]: [CLM  ] Members Left:
Nov 19 12:30:32 l2 openais[1977]: [CLM  ] Members Joined:
Nov 19 12:30:32 l2 openais[1977]: [SYNC ] This node is within the primary component and will provide service.
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] entering OPERATIONAL state.
Nov 19 12:30:32 l2 openais[1977]: [CLM ] got nodejoin message 192.168.10.11
Nov 19 12:30:32 l2 openais[1977]: [CPG  ] got joinlist message from node 1
Nov 19 12:30:52 l2 fenced[1993]: fence "l1.local" success
Nov 19 12:30:58 l2 clurgmgrd[2687]: <notice> Taking over service service:vsftpd from down member l1.local
Nov 19 12:30:58 l2 clurgmgrd: [2687]: <info> Executing /etc/init.d/vsftpd start
Nov 19 12:30:58 l2 vsftpd: script param: start
Nov 19 12:30:59 l2 clurgmgrd[2687]: <notice> Service service:vsftpd started
Nov 19 12:31:07 l2 clurgmgrd: [2687]: <info> Executing /etc/init.d/vsftpd status
Nov 19 12:31:07 l2 vsftpd: script param: status
Nov 19 12:31:37 l2 clurgmgrd: [2687]: <info> Executing /etc/init.d/vsftpd status
Nov 19 12:31:37 l2 vsftpd: script param: status
Nov 19 12:32:07 l2 clurgmgrd: [2687]: <info> Executing /etc/init.d/vsftpd status
Nov 19 12:32:07 l2 vsftpd: script param: status
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] entering GATHER state from 11.
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] Saving state aru 18 high seq received 18
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] Storing new sequence id for ring ec
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] entering COMMIT state.
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] entering RECOVERY state.
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] position [0] member 192.168.10.10:
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] previous ring seq 232 rep 192.168.10.10
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] aru 9 high delivered 8 received flag 1
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] position [1] member 192.168.10.11:
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] previous ring seq 232 rep 192.168.10.11
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] aru 18 high delivered 18 received flag 1
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] Did not need to originate any messages in recovery.
Nov 19 12:32:25 l2 openais[1977]: [CLM  ] CLM CONFIGURATION CHANGE
Nov 19 12:32:25 l2 openais[1977]: [CLM  ] New Configuration:
Nov 19 12:32:25 l2 openais[1977]: [CLM  ] r(0) ip(192.168.10.11)
Nov 19 12:32:25 l2 openais[1977]: [CLM  ] Members Left:
Nov 19 12:32:25 l2 openais[1977]: [CLM  ] Members Joined:
Nov 19 12:32:25 l2 openais[1977]: [CLM  ] CLM CONFIGURATION CHANGE
Nov 19 12:32:25 l2 openais[1977]: [CLM  ] New Configuration:
Nov 19 12:32:25 l2 openais[1977]: [CLM  ] r(0) ip(192.168.10.10)
Nov 19 12:32:25 l2 openais[1977]: [CLM  ] r(0) ip(192.168.10.11)
Nov 19 12:32:25 l2 openais[1977]: [CLM  ] Members Left:
Nov 19 12:32:25 l2 openais[1977]: [CLM  ] Members Joined:
Nov 19 12:32:25 l2 openais[1977]: [CLM  ] r(0) ip(192.168.10.10)
Nov 19 12:32:26 l2 openais[1977]: [SYNC ] This node is within the primary component and will provide service.
Nov 19 12:32:26 l2 openais[1977]: [TOTEM] entering OPERATIONAL state.
Nov 19 12:32:26 l2 openais[1977]: [CLM  ] got nodejoin message 192.168.10.10
Nov 19 12:32:26 l2 openais[1977]: [CLM  ] got nodejoin message 192.168.10.11
Nov 19 12:32:26 l2 openais[1977]: [CPG  ] got joinlist message from node 1
Nov 19 12:32:26 l2 clurgmgrd[2687]: <notice> Stopping service service:vsftpd
Nov 19 12:32:41 l2 clurgmgrd[2687]: <err> #52: Failed changing RG status
Nov 19 12:32:56 l2 clurgmgrd[2687]: <err> #57: Failed changing RG status
Nov 19 12:32:57 l2 clurgmgrd: [2687]: <info> Executing /etc/init.d/vsftpd status
Nov 19 12:32:57 l2 vsftpd: script param: status
Nov 19 12:33:20 l2 kernel: dlm: connecting to 2
Nov 19 12:33:20 l2 kernel: dlm: got connection from 2
Nov 19 12:33:36 l2 clurgmgrd: [2687]: <info> Executing /etc/init.d/vsftpd status
Nov 19 12:33:36 l2 vsftpd: script param: status
Nov 19 12:34:06 l2 clurgmgrd: [2687]: <info> Executing /etc/init.d/vsftpd status
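
To put a number on the "relocation starts too early" suspicion, here is a rough Python sketch that measures how long after the "Stopping service" notice the dlm connection to node 2 shows up in the log above. Treating the dlm connection as a sign that rgmanager on the rejoining node is coming up is only an assumption. It is based on the first boot in this log, where "dlm: connecting to 2" appears one second before "Resource Group Manager Starting". The helper `first_time` is hypothetical.

```python
from datetime import datetime

# Lines copied from the log above (second join of l1.local).
LOG = """\
Nov 19 12:32:26 l2 clurgmgrd[2687]: <notice> Stopping service service:vsftpd
Nov 19 12:32:41 l2 clurgmgrd[2687]: <err> #52: Failed changing RG status
Nov 19 12:32:56 l2 clurgmgrd[2687]: <err> #57: Failed changing RG status
Nov 19 12:33:20 l2 kernel: dlm: connecting to 2
Nov 19 12:33:20 l2 kernel: dlm: got connection from 2
"""


def first_time(log, needle):
    """Return the time of day of the first log line containing needle.

    Only the HH:MM:SS field (characters 7..14 of a syslog line) is parsed,
    so this assumes all lines are from the same day.
    """
    for line in log.splitlines():
        if needle in line:
            return datetime.strptime(line[7:15], "%H:%M:%S")
    return None


stop_requested = first_time(LOG, "Stopping service service:vsftpd")
dlm_connected = first_time(LOG, "dlm: connecting to 2")
gap_seconds = (dlm_connected - stop_requested).total_seconds()
print(gap_seconds)
```

On these lines the gap is 54 seconds, and both "Failed changing RG status" errors fall inside that window. That is consistent with the suspicion above, but it does not prove it.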

daro

begin:vcard
fn:Dariusz Skorupa
n:Skorupa;Dariusz
org:WASKO S.A;DWS/SII
adr;dom:;;Barlickiego 18;Gliwice;;44 -100 
email;internet:d skorupa wasko pl
title;quoted-printable:In=C5=BCynier Serwisu
tel;work:+48 32 3325-682
x-mozilla-html:FALSE
version:2.1
end:vcard

