
[Linux-cluster] Killing node XXX because it has rejoined the cluster with existing state



Hi,

 

I have a problem with a two-node cluster. When I force one node to fail, the second node fences the first one. When the first node rejoins the cluster, cman shuts down on both nodes, saying:

 

Sep 28 17:29:36 s64lmwbig3c openais[7273]: [MAIN ] Killing node s64lmwbig3b because it has rejoined the cluster with existing state

Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CMAN ] cman killed by node 1 because we rejoined the cluster without a full restart

 

 

Logs:

See attached

 

Conf:

<?xml version="1.0"?>
<cluster config_version="12" name="u64lmwbig8r">
        <cman expected_votes="1" two_node="1">
                <multicast addr="239.192.0.11"/>
        </cman>
        <clusternodes>
                <clusternode name="s64lmwbig3b" nodeid="1" votes="1">
                        <fence>
                                <method name="single">
                                        <device name="fenceHP_g3b"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="s64lmwbig3c" nodeid="2" votes="1">
                        <fence>
                                <method name="single">
                                        <device name="fenceHP_g3c"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <fencedevices>
                <fencedevice agent="fence_ipmilan" ipaddr="XXXXX" lanplus="1" login="user" name="fenceHP_g3b" passwd="password" verbose="yes"/>
                <fencedevice agent="fence_ipmilan" ipaddr="XXXXX" lanplus="1" login="user" name="fenceHP_g3c" passwd="password" verbose="yes"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
        <fence_daemon clean_start="0" post_fail_delay="20" post_join_delay="60"/>
</cluster>
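
As a quick sanity check of the configuration above, here is a minimal Python sketch that parses a trimmed copy of it with the standard library and confirms that every fence device referenced by a node is actually defined. The function name `summarize` and the shape of the returned dictionary are my own assumptions for illustration; this is not part of cman or any cluster tool, and it only checks internal consistency, not whether the settings are the right ones.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf above (the <rm> block and the
# <multicast> element are left out; everything else is copied as posted).
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster config_version="12" name="u64lmwbig8r">
  <cman expected_votes="1" two_node="1"/>
  <clusternodes>
    <clusternode name="s64lmwbig3b" nodeid="1" votes="1">
      <fence><method name="single"><device name="fenceHP_g3b"/></method></fence>
    </clusternode>
    <clusternode name="s64lmwbig3c" nodeid="2" votes="1">
      <fence><method name="single"><device name="fenceHP_g3c"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="XXXXX" name="fenceHP_g3b"/>
    <fencedevice agent="fence_ipmilan" ipaddr="XXXXX" name="fenceHP_g3c"/>
  </fencedevices>
  <fence_daemon clean_start="0" post_fail_delay="20" post_join_delay="60"/>
</cluster>
"""


def summarize(conf_text):
    """Hypothetical helper: collect the settings discussed in this post."""
    root = ET.fromstring(conf_text)
    cman = root.find("cman")
    fence_daemon = root.find("fence_daemon")

    # Names of all fence devices declared under <fencedevices>.
    defined = {dev.get("name") for dev in root.iter("fencedevice")}

    # Fence devices each node refers to inside its <fence> block.
    nodes = {
        node.get("name"): [dev.get("name") for dev in node.iter("device")]
        for node in root.iter("clusternode")
    }

    undefined = sorted(
        name for used in nodes.values() for name in used if name not in defined
    )
    return {
        "two_node": cman.get("two_node"),
        "expected_votes": cman.get("expected_votes"),
        "post_fail_delay": int(fence_daemon.get("post_fail_delay")),
        "post_join_delay": int(fence_daemon.get("post_join_delay")),
        "nodes": nodes,
        "undefined_fence_devices": undefined,
    }


if __name__ == "__main__":
    for key, value in summarize(CLUSTER_CONF).items():
        print(key, "=", value)
```

For this configuration the list of undefined fence devices comes back empty, so the node-to-device references at least match up.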

 

Do you know what I missed?

 

Thanks

Regards,


 

Jean-Daniel BONNETOT

 

Sep 28 17:25:23 s64lmwbig3c fenced[7294]: s64lmwbig3b not a cluster member after 20 sec post_fail_delay
Sep 28 17:25:23 s64lmwbig3c fenced[7294]: fencing node "s64lmwbig3b"
Sep 28 17:25:34 s64lmwbig3c fenced[7294]: fence "s64lmwbig3b" success
…
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] entering GATHER state from 11.
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] Saving state aru 13 high seq received 13
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] Storing new sequence id for ring 1c8
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] entering COMMIT state.
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] entering RECOVERY state.
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] position [0] member 10.151.231.215:
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] previous ring seq 452 rep 10.151.231.215
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] aru c high delivered c received flag 1
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] position [1] member 10.151.231.216:
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] previous ring seq 452 rep 10.151.231.216
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] aru 13 high delivered 13 received flag 1
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] Did not need to originate any messages in recovery.
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ] CLM CONFIGURATION CHANGE
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ] New Configuration:
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ]      r(0) ip(10.151.231.216)
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ] Members Left:
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ] Members Joined:
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ] CLM CONFIGURATION CHANGE
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ] New Configuration:	
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ]      r(0) ip(10.151.231.215)
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ]      r(0) ip(10.151.231.216)
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ] Members Left:
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ] Members Joined:
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ]      r(0) ip(10.151.231.215)
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [SYNC ] This node is within the primary component and will provide service.
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] entering OPERATIONAL state.
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ] got nodejoin message 10.151.231.215
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ] got nodejoin message 10.151.231.216
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CPG  ] got joinlist message from node 2
Sep 28 17:29:20 s64lmwbig3c kernel: dlm: got connection from 1
…
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] entering GATHER state from 11.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] Creating commit token because I am the rep.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] Saving state aru 2f high seq received 2f
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] Storing new sequence id for ring 1cc
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] entering COMMIT state.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] entering RECOVERY state.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] position [0] member 10.151.231.216:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] previous ring seq 456 rep 10.151.231.215
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] aru 2f high delivered 2f received flag 1
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] Did not need to originate any messages in recovery.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] Sending initial ORF token
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] CLM CONFIGURATION CHANGE
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] New Configuration:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ]      r(0) ip(10.151.231.216)
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] Members Left:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ]      r(0) ip(10.151.231.215)
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] Members Joined:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] CLM CONFIGURATION CHANGE
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] New Configuration:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ]      r(0) ip(10.151.231.216)
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] Members Left:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] Members Joined:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [SYNC ] This node is within the primary component and will provide service.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] entering OPERATIONAL state.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] got nodejoin message 10.151.231.216
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CPG  ] got joinlist message from node 2
Sep 28 17:29:36 s64lmwbig3c kernel: dlm: closing connection to node 1
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] entering GATHER state from 9.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] Saving state aru e high seq received e
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] Storing new sequence id for ring 1d0
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] entering COMMIT state.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] entering RECOVERY state.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] position [0] member 10.151.231.215:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] previous ring seq 460 rep 10.151.231.215
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] aru f high delivered f received flag 1
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] position [1] member 10.151.231.216:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] previous ring seq 460 rep 10.151.231.216
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] aru e high delivered e received flag 1
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] Did not need to originate any messages in recovery.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] CLM CONFIGURATION CHANGE
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] New Configuration:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ]      r(0) ip(10.151.231.216)
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] Members Left:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] Members Joined:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] CLM CONFIGURATION CHANGE
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] New Configuration:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ]      r(0) ip(10.151.231.215)
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ]      r(0) ip(10.151.231.216)
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] Members Left:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] Members Joined:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ]      r(0) ip(10.151.231.215)
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [SYNC ] This node is within the primary component and will provide service.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] entering OPERATIONAL state.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [MAIN ] Killing node s64lmwbig3b because it has rejoined the cluster with existing state
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CMAN ] cman killed by node 1 because we rejoined the cluster without a full restart
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] got nodejoin message 10.151.231.215
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] got nodejoin message 10.151.231.216
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CPG  ] got joinlist message from node 1
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CPG  ] got joinlist message from node 2
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading all openais components
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_confdb v0 (20/10)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_cpg v0 (19/8)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_cfg v0 (18/7)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_msg v0 (17/6)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_lck v0 (16/5)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_evt v0 (15/4)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_ckpt v0 (14/3)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_amf v0 (13/2)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_clm v0 (12/1)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_evs v0 (11/0)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_cman v0 (10/9)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] AIS Executive exiting (reason: CMAN kill requested, exiting).
Sep 28 17:29:37 s64lmwbig3c gfs_controld[7306]: cluster is down, exiting
Sep 28 17:29:37 s64lmwbig3c dlm_controld[7300]: cluster is down, exiting
Sep 28 17:29:37 s64lmwbig3c kernel: dlm: closing connection to node 2
Sep 28 17:29:37 s64lmwbig3c clurgmgrd[8204]: <warning> #67: Shutting down uncleanly
Sep 28 17:29:37 s64lmwbig3c clurgmgrd[8204]: <notice> Shutdown complete, exiting
Sep 28 17:29:37 s64lmwbig3c syslogd: /dev/console: Invalid argument
Sep 28 17:30:03 s64lmwbig3c ccsd[7263]: Unable to connect to cluster infrastructure after 30 seconds.
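
To make the membership changes in the log above easier to follow, here is a minimal Python sketch that pulls out only the fence result, the CLM "Members Left" and "Members Joined" entries, and the kill message. It runs on an abbreviated excerpt of the log (lines copied verbatim, most lines omitted). The function name `membership_timeline` and the event labels are assumptions made for illustration; this is not an openais or cman tool, and it does not explain the cause of the kill.

```python
import re

# Abbreviated excerpt of the log above: lines are copied as posted,
# but most of the original lines are left out.
LOG = """\
Sep 28 17:25:34 s64lmwbig3c fenced[7294]: fence "s64lmwbig3b" success
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ] New Configuration:
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ]      r(0) ip(10.151.231.215)
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ]      r(0) ip(10.151.231.216)
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ] Members Left:
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ] Members Joined:
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM  ]      r(0) ip(10.151.231.215)
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] Members Left:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ]      r(0) ip(10.151.231.215)
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] Members Joined:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ] Members Joined:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM  ]      r(0) ip(10.151.231.215)
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [MAIN ] Killing node s64lmwbig3b because it has rejoined the cluster with existing state
"""

# "Sep 28 17:29:36 host daemon[pid]: message" -> (timestamp, message)
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s[\d:]{8})\s\S+\s[^:]+:\s(.*)$")
IP_RE = re.compile(r"r\(0\) ip\(([\d.]+)\)")
FENCE_RE = re.compile(r'fence "([^"]+)" success')
KILL_RE = re.compile(r"Killing node (\S+)")


def membership_timeline(log_text):
    """Hypothetical helper: return (timestamp, event, subject) tuples."""
    events = []
    section = None  # "left" or "joined" while inside such a CLM list
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp, msg = match.groups()
        ip = IP_RE.search(msg)
        if "Members Left:" in msg:
            section = "left"
        elif "Members Joined:" in msg:
            section = "joined"
        elif ip:
            # Addresses under "New Configuration:" are skipped (section is None).
            if section:
                events.append((stamp, section, ip.group(1)))
        else:
            section = None
            hit = FENCE_RE.search(msg)
            if hit:
                events.append((stamp, "fenced", hit.group(1)))
            hit = KILL_RE.search(msg)
            if hit:
                events.append((stamp, "killed", hit.group(1)))
    return events


if __name__ == "__main__":
    for stamp, event, subject in membership_timeline(LOG):
        print(stamp, event, subject)
```

Read this way, the excerpt shows 10.151.231.215 joining at 17:29:15 after the fence, then leaving and joining again within the same second at 17:29:36, immediately before the kill message.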
-------
This message and any attachments are intended solely for the addressees and are confidential. SNCF may not be held responsible for their contents whose accuracy and completeness cannot be guaranteed over the Internet. Unauthorized use, disclosure, distribution, copying, or any part thereof is strictly prohibited. If you are not the intended recipient of this message, please notify the sender immediately and delete it. 
