
Re: [Linux-cluster] can't re-join cluster after upgrade



openais from RHEL 5.2 and 5.3 cannot talk to each other. There is a Bugzilla ticket on this (but I cannot find the ID right now) and several requests to RH support.

While RH support is still working on it, preliminary information indicates that there won't be a fix for this, and I also doubt that there will be a workaround.

Shutting down and restarting the entire cluster solves the problem.

Alternatively, if you want the upgraded node up and can't shut down node 2 as well, installing the openais package from RHEL 5.2 on the upgraded node will let it join the cluster.
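Before trying to rejoin a node after a partial upgrade, it may help to confirm which openais build each node is actually running. Below is a minimal sketch in Python. It assumes you have copied the syslog text from each node, pulls out the `RELEASE 'subrev N version X'` banner that openais writes at startup (the upgraded node in the quoted log reports `subrev 1358 version 0.80.3`), and compares the two. The function names are hypothetical. The thread does not say which subrev the RHEL 5.2 package reports, so a mismatch here is only a hint, not proof of the incompatibility described above.

```python
import re
from typing import Optional, Tuple

# Matches the openais startup banner, e.g.
# "[MAIN ] AIS Executive Service RELEASE 'subrev 1358 version 0.80.3'"
RELEASE_RE = re.compile(r"RELEASE 'subrev (\d+) version ([\d.]+)'")


def openais_release(log_text: str) -> Optional[Tuple[int, str]]:
    """Return (subrev, version) from the LAST openais banner in log_text, or None."""
    matches = RELEASE_RE.findall(log_text)  # list of (subrev, version) string tuples
    if not matches:
        return None
    subrev, version = matches[-1]  # the most recent start is what is running now
    return int(subrev), version


def same_openais_build(log_a: str, log_b: str) -> bool:
    """True only if both logs contain a banner and report identical subrev and version."""
    a = openais_release(log_a)
    b = openais_release(log_b)
    return a is not None and a == b
```

For example, you might call `same_openais_build(open("node1-messages").read(), open("node2-messages").read())`, where the file names are placeholders for wherever you saved each node's `/var/log/messages`. The sketch uses the last banner in each log because a node may have been restarted several times while you were troubleshooting.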

best regards, Gunther


Ramiro Blanco wrote:
Hi, I've just upgraded one node of my 2-node cluster to RHEL 5.3, and now that
node can't join the cluster. Can I upgrade one node at a time?
Here's the output of /var/log/messages:

...
Mar  9 03:26:34 web1 ccsd[29129]: Starting ccsd 2.0.98:
Mar  9 03:26:34 web1 ccsd[29129]:  Built: Dec  3 2008 16:32:30
Mar  9 03:26:34 web1 ccsd[29129]:  Copyright (C) Red Hat, Inc.  2004 All rights reserved.
Mar  9 03:26:34 web1 ccsd[29129]: cluster.conf (cluster name = cluster_web, version = 3) found.
Mar  9 03:26:34 web1 ccsd[29129]: Remote copy of cluster.conf is from quorate node.
Mar  9 03:26:34 web1 ccsd[29129]:  Local version # : 3
Mar  9 03:26:34 web1 ccsd[29129]:  Remote version #: 3
Mar  9 03:26:34 web1 ccsd[29129]: Remote copy of cluster.conf is from quorate node.
Mar  9 03:26:34 web1 ccsd[29129]:  Local version # : 3
Mar  9 03:26:34 web1 ccsd[29129]:  Remote version #: 3
Mar  9 03:26:34 web1 ccsd[29129]: Remote copy of cluster.conf is from quorate node.
Mar  9 03:26:34 web1 ccsd[29129]:  Local version # : 3
Mar  9 03:26:34 web1 ccsd[29129]:  Remote version #: 3
Mar  9 03:26:34 web1 ccsd[29129]: Remote copy of cluster.conf is from quorate node.
Mar  9 03:26:34 web1 ccsd[29129]:  Local version # : 3
Mar  9 03:26:34 web1 ccsd[29129]:  Remote version #: 3
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] AIS Executive Service RELEASE 'subrev 1358 version 0.80.3'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] AIS Executive Service: started and ready to provide service.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Using default multicast address of 239.192.73.137
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component openais_cpg loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service handler 'openais cluster closed process group service v1.01'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component openais_cfg loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service handler 'openais configuration service'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component openais_msg loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service handler 'openais message service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component openais_lck loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service handler 'openais distributed locking service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component openais_evt loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service handler 'openais event service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component openais_ckpt loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service handler 'openais checkpoint service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component openais_amf loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service handler 'openais availability management framework B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component openais_clm loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service handler 'openais cluster membership service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component openais_evs loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service handler 'openais extended virtual synchrony service'
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] openais component openais_cman loaded.
Mar  9 03:26:34 web1 openais[29135]: [MAIN ] Registering service handler 'openais CMAN membership service 2.01'
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] send threads (0 threads)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] RRP token expired timeout (495 ms)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] RRP token problem counter (2000 ms)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] RRP threshold (10 problem count)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] RRP mode set to none.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] heartbeat_failures_allowed (0)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] max_network_delay (50 ms)
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] The network interface [192.168.10.3] is now up.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Created or loaded sequence id 280.192.168.10.3 for this ring.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering GATHER state from 15.
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service handler 'openais extended virtual synchrony service'
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service handler 'openais cluster membership service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service handler 'openais availability management framework B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service handler 'openais checkpoint service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service handler 'openais event service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service handler 'openais distributed locking service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service handler 'openais message service B.01.01'
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service handler 'openais configuration service'
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service handler 'openais cluster closed process group service v1.01'
Mar  9 03:26:34 web1 openais[29135]: [SERV ] Initialising service handler 'openais CMAN membership service 2.01'
Mar  9 03:26:34 web1 openais[29135]: [CMAN ] CMAN 2.0.98 (built Dec  3 2008 16:32:34) started
Mar  9 03:26:34 web1 openais[29135]: [SYNC ] Not using a virtual synchrony filter.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Creating commit token because I am the rep.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Saving state aru 0 high seq received 0
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Storing new sequence id for ring 11c
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering COMMIT state.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering RECOVERY state.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] position [0] member 192.168.10.3:
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] previous ring seq 280 rep 192.168.10.3
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] aru 0 high delivered 0 received flag 1
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Did not need to originate any messages in recovery.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Sending initial ORF token
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] CLM CONFIGURATION CHANGE
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] New Configuration:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Left:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Joined:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] CLM CONFIGURATION CHANGE
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] New Configuration:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ]    r(0) ip(192.168.10.3)
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Left:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Joined:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ]    r(0) ip(192.168.10.3)
Mar  9 03:26:34 web1 openais[29135]: [SYNC ] This node is within the primary component and will provide service.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering OPERATIONAL state.
Mar  9 03:26:34 web1 openais[29135]: [CMAN ] quorum regained, resuming activity
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] got nodejoin message 192.168.10.3
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering GATHER state from 11.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Creating commit token because I am the rep.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Saving state aru a high seq received a
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Storing new sequence id for ring 120
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering COMMIT state.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering RECOVERY state.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] position [0] member 192.168.10.3:
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] previous ring seq 284 rep 192.168.10.3
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] aru a high delivered a received flag 1
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] position [1] member 192.168.10.4:
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] previous ring seq 284 rep 192.168.10.4
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] aru 8e high delivered 8e received flag 1
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Did not need to originate any messages in recovery.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] Sending initial ORF token
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] CLM CONFIGURATION CHANGE
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] New Configuration:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ]    r(0) ip(192.168.10.3)
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Left:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Joined:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] CLM CONFIGURATION CHANGE
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] New Configuration:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ]    r(0) ip(192.168.10.3)
Mar  9 03:26:34 web1 openais[29135]: [CLM  ]    r(0) ip(192.168.10.4)
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Left:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] Members Joined:
Mar  9 03:26:34 web1 openais[29135]: [CLM  ]    r(0) ip(192.168.10.4)
Mar  9 03:26:34 web1 openais[29135]: [SYNC ] This node is within the primary component and will provide service.
Mar  9 03:26:34 web1 openais[29135]: [TOTEM] entering OPERATIONAL state.
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] got nodejoin message 192.168.10.3
Mar  9 03:26:34 web1 openais[29135]: [CLM  ] got nodejoin message 192.168.10.4
..

Any help would be appreciated.




--
Gunther Schlegel
Manager IT Infrastructure


.............................................................
Riege Software International GmbH  Fon: +49 (2159) 9148 0
Mollsfeld 10                       Fax: +49 (2159) 9148 11
40670 Meerbusch                    Web: www.riege.com
Germany                            E-Mail: schlegel riege com
---                                ---
Handelsregister:                   Managing Directors:
Amtsgericht Neuss HRB-NR 4207      Christian Riege
USt-ID-Nr.: DE120585842            Gabriele  Riege
                                  Johannes  Riege
.............................................................
YOU CARE FOR FREIGHT, WE CARE FOR YOU



