
Re: [Linux-cluster] cman startup after update to 5.3



Rolling back to openais-0.80.3-15.el5 worked for me as well.

Still, this is a blocker for the 5.3 update, as it prevents rolling upgrades -- and that is why you run a cluster, isn't it?

I also have no clue whether a native 5.3 / openais-0.80.3-22el5 system will work. Can anyone confirm this?


regards, Gunther


Dave Costakos wrote:
Confirmed. Same here. It still seems like a bug to me, though. I would hope we have the ability to do rolling upgrades of openais in our RHEL clusters.
2009/1/28 Alan A <alan zg gmail com>

    Rolling back to the previous openais package (from
    openais-0.80.3-22el5 to openais-0.80.3-15.el5) allowed me to
    restart cman.
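For anyone who wants to repeat the rollback described above, here is a minimal sketch of the rpm invocation involved. It only builds and prints the command; it does not run it. The `--oldpackage` option is what allows `rpm -U` to replace a newer installed package with an older one. The function name and the package file name (including the `x86_64` architecture) are assumptions for illustration, since the thread gives only the version and release.

```python
import shlex


def openais_downgrade_command(rpm_file):
    """Return the rpm command line that installs an older package over a newer one."""
    # -U upgrades/installs, -v and -h give verbose output with a progress bar,
    # --oldpackage permits replacing a newer installed version with an older one.
    return ["rpm", "-Uvh", "--oldpackage", rpm_file]


# Hypothetical file name: the architecture suffix is not stated in the thread.
cmd = openais_downgrade_command("openais-0.80.3-15.el5.x86_64.rpm")
print(" ".join(shlex.quote(part) for part in cmd))
```

Stopping the cluster services on the node before swapping the package, and starting cman again afterwards, would presumably still be needed; the thread does not spell out those steps.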


    2009/1/28 Dave Costakos <david costakos gmail com>

        Like you, I've run into this same issue.  I have 2 clusters that
        I'm trying to update in our lab.  On one, I only updated the
        cman and rgmanager packages: this update was successful.  On
        the other I did a full update to 5.3 and ran into what appears to
        be this same problem.  I've noticed that manually attempting to
        start cman via 'cman_tool -d join' prints out this message right
        before cman fails.

        aisexec: ckpt.c:3961: message_handler_req_exec_ckpt_sync_checkpoint_refcount:Assertion `checkpoint != ((void *)0)' failed





        I suspect an openais issue. Would someone be able to confirm that?

        Also, I'm going to try downgrading openais back to the version from RHEL 5.2 to see if that fixes it (though I won't get to that until the end of today).  If that works, I'll report back.





        2009/1/27 Alan A <alan zg gmail com>

            I just opened RHEL case number 1890184 regarding the same
            issue. First the kernel would not start due to the HP iLO
            driver conflict, and at the same time CMAN broke and fencing
            fails. I rolled back the cman rpm to the previous version,
            but the problem persists. Something else must have changed
            that keeps CMAN from starting.

            2009/1/27 Gunther Schlegel <schlegel riege com>

                Hello,

                I updated one node from 5.2 to 5.3 using yum update and
                now cman does not start up anymore -- looks like ccsd
                has some problems:

                [root motel6 /]# /sbin/ccsd -4 -n
                Starting ccsd 2.0.98:
                 Built: Dec  3 2008 16:32:30
                 Copyright (C) Red Hat, Inc.  2004  All rights reserved.
                 IP Protocol:: IPv4 only
                 No Daemon:: SET

                Cluster is not quorate.  Refusing connection.
                Error while processing connect: Connection refused
                Cluster is not quorate.  Refusing connection.
                Error while processing connect: Connection refused
                Unable to connect to cluster infrastructure after 30 seconds.
                Unable to connect to cluster infrastructure after 60 seconds.


                When starting ccsd using /etc/init.d/cman it reports all
                three nodes to be on cluster.conf version 78, so I guess
                it is not a network connectivity problem.

                The other two nodes (still on 5.2z) of the cluster are
                up and running with quorum. Openais is talking to those
                2 other nodes and it looks fine to me:

                Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] Members Joined:
                Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] #011r(0) ip(10.11.5.22)
                Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] #011r(0) ip(10.11.5.23)
                Jan 27 21:05:26 motel6 openais[1278]: [SYNC ] This node is within the primary component and will provide service.
                Jan 27 21:05:26 motel6 openais[1278]: [TOTEM] entering OPERATIONAL state.
                Jan 27 21:05:26 motel6 openais[1278]: [CMAN ] quorum regained, resuming activity
                Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] got nodejoin message 10.11.5.21
                Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] got nodejoin message 10.11.5.22
                Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] got nodejoin message 10.11.5.23
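The log excerpt above is what suggests that openais membership is intact even though ccsd keeps reporting "not quorate". A minimal sketch of how such an excerpt could be checked mechanically follows. The regular expressions are written against the lines quoted in this message only and are an assumption about the log format in general; the function name is made up for illustration.

```python
import re

# Log lines copied from the message above (mail line wrapping undone).
LOG = """\
Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] Members Joined:
Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] #011r(0) ip(10.11.5.22)
Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] #011r(0) ip(10.11.5.23)
Jan 27 21:05:26 motel6 openais[1278]: [CMAN ] quorum regained, resuming activity
Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] got nodejoin message 10.11.5.21
Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] got nodejoin message 10.11.5.22
Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] got nodejoin message 10.11.5.23
"""

NODEJOIN = re.compile(r"\[CLM\s*\] got nodejoin message (\d+\.\d+\.\d+\.\d+)")
QUORUM = re.compile(r"\[CMAN\s*\] quorum regained")


def summarize(log_text):
    """Return (addresses seen in nodejoin messages, whether quorum was regained)."""
    joined = []
    quorate = False
    for line in log_text.splitlines():
        match = NODEJOIN.search(line)
        if match:
            joined.append(match.group(1))
        elif QUORUM.search(line):
            quorate = True
    return joined, quorate


joined, quorate = summarize(LOG)
print("nodejoin from:", joined)
print("quorum regained:", quorate)
```

If all three addresses show up and quorum is reported as regained, as in this excerpt, the totem layer itself is probably not the part that is failing.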


                I am a bit lost...

                cluster.conf:
                [root motel6 init.d]# cat /etc/cluster/cluster.conf
                <?xml version="1.0"?>
                <cluster alias="RSIXENCluster2" config_version="87" name="RSIXENCluster2">
                    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
                    <clusternodes>
                        <clusternode name="concorde.riege.de" nodeid="1" votes="1">
                            <fence>
                                <method name="1">
                                    <device name="Concorde_IPMI"/>
                                </method>
                            </fence>
                        </clusternode>
                        <clusternode name="motel6.riege.de" nodeid="2" votes="1">
                            <fence>
                                <method name="1">
                                    <device name="Motel6_IPMI"/>
                                </method>
                            </fence>
                        </clusternode>
                        <clusternode name="mercure.riege.de" nodeid="3" votes="1">
                            <fence>
                                <method name="1">
                                    <device name="Mercure_IPMI"/>
                                </method>
                            </fence>
                        </clusternode>
                    </clusternodes>
                    <fencedevices>
                        <fencedevice agent="fence_ipmilan" ipaddr="10.11.5.132" login="root" name="Concorde_IPMI" passwd="XXX"/>
                        <fencedevice agent="fence_ipmilan" ipaddr="10.11.5.131" login="root" name="Motel6_IPMI" passwd="xxx"/>
                        <fencedevice agent="fence_ipmilan" ipaddr="10.11.5.133" login="root" name="Mercure_IPMI" passwd="XXX"/>
                    </fencedevices>
                    <rm>
                        <failoverdomains>
                            <failoverdomain name="Earth" nofailback="1" ordered="1" restricted="1">
                                <failoverdomainnode name="concorde.riege.de" priority="1"/>
                                <failoverdomainnode name="motel6.riege.de" priority="1"/>
                                <failoverdomainnode name="mercure.riege.de" priority="1"/>
                            </failoverdomain>
                            <failoverdomain name="Europe" nofailback="0" ordered="1" restricted="0">
                                <failoverdomainnode name="concorde.riege.de" priority="2"/>
                            </failoverdomain>
                            <failoverdomain name="North America" nofailback="0" ordered="1" restricted="0">
                                <failoverdomainnode name="motel6.riege.de" priority="2"/>
                            </failoverdomain>
                            <failoverdomain name="Africa" nofailback="0" ordered="1" restricted="0">
                                <failoverdomainnode name="mercure.riege.de" priority="1"/>
                            </failoverdomain>
                        </failoverdomains>
                        <resources/>
                        <vm autostart="1" domain="Africa" exclusive="0" migrate="live" name="vm64.test.riege.de_64" path="/etc/xen" recovery="restart"/>
                        <vm autostart="1" domain="North America" exclusive="0" migrate="pause" name="rt.test.riege.de_32" path="/etc/xen" recovery="restart"/>
                        <vm autostart="1" domain="Africa" exclusive="0" migrate="pause" name="poincare.riege.de_32" path="/etc/xen" recovery="restart"/>
                        <vm autostart="1" domain="North America" exclusive="0" migrate="live" name="jboss.dev.riege.de_64" path="/etc/xen" recovery="relocate"/>
                        <vm autostart="1" domain="Africa" exclusive="0" migrate="live" name="master.cc3.dev.riege.de_64" path="/etc/xen" recovery="relocate"/>
                        <vm autostart="1" domain="Europe" exclusive="0" migrate="pause" name="test.alphatrans.scope.riege.com_32" path="/etc/xen" recovery="relocate"/>
                        <vm autostart="1" domain="North America" exclusive="0" migrate="live" name="slave.cc3.dev.riege.de_64" path="/etc/xen" recovery="restart"/>
                        <vm autostart="1" domain="North America" exclusive="0" migrate="live" name="webmail.riege.com_64" path="/etc/xen" recovery="relocate"/>
                        <vm autostart="1" domain="Europe" exclusive="0" migrate="live" name="live.rsi.scope.riege.com_64" path="/etc/xen" recovery="relocate"/>
                        <vm autostart="1" domain="Europe" exclusive="0" migrate="pause" name="qa-16.rsi.scope.riege.com_32" path="/etc/xen" recovery="relocate"/>
                        <vm autostart="1" domain="Africa" exclusive="0" migrate="pause" name="qa-18.rsi.scope.riege.com_32" path="/etc/xen" recovery="relocate"/>
                        <vm autostart="1" domain="Africa" exclusive="0" migrate="pause" name="vm32.test.riege.de_32" path="/etc/xen" recovery="restart"/>
                        <vm autostart="1" domain="Europe" exclusive="0" migrate="pause" name="qa-head.rsi.scope.riege.com_32" path="/etc/xen" recovery="restart"/>
                        <vm autostart="1" domain="North America" exclusive="0" migrate="live" name="mq.dev.riege.de_64" path="/etc/xen" recovery="relocate"/>
                        <vm autostart="1" domain="Europe" exclusive="0" migrate="live" name="archive.dev.riege.de_64" path="/etc/xen" recovery="restart"/>
                    </rm>
                    <cman quorum_dev_poll="50000"/>
                    <totem consensus="4800" join="60" token="60000" token_retransmits_before_loss_const="20"/>
                    <quorumd device="/dev/mapper/Quorum_Partition" interval="3" min_score="1" tko="10" votes="2"/>
                </cluster>
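As a rough cross-check of the "Cluster is not quorate" messages, the vote counts in this cluster.conf can be added up. The sketch below uses a trimmed copy of the configuration (only the elements that carry votes) and assumes that expected votes are the node votes plus the quorum disk votes and that quorum is a simple majority of that total. That rule is an assumption for illustration, not something stated in this thread, and the function name is made up.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf above: only the vote-carrying elements.
CONF = """<?xml version="1.0"?>
<cluster alias="RSIXENCluster2" config_version="87" name="RSIXENCluster2">
  <clusternodes>
    <clusternode name="concorde.riege.de" nodeid="1" votes="1"/>
    <clusternode name="motel6.riege.de" nodeid="2" votes="1"/>
    <clusternode name="mercure.riege.de" nodeid="3" votes="1"/>
  </clusternodes>
  <quorumd device="/dev/mapper/Quorum_Partition" interval="3" min_score="1" tko="10" votes="2"/>
</cluster>
"""


def vote_summary(conf_xml):
    """Return (node votes, quorum disk votes, expected votes, quorum threshold)."""
    root = ET.fromstring(conf_xml)
    # Treat a missing votes attribute as one vote per node (assumed default).
    node_votes = sum(int(node.get("votes", "1")) for node in root.iter("clusternode"))
    quorumd = root.find("quorumd")
    qdisk_votes = int(quorumd.get("votes", "0")) if quorumd is not None else 0
    expected = node_votes + qdisk_votes
    # Assumed rule: quorum is a simple majority of the expected votes.
    quorum = expected // 2 + 1
    return node_votes, qdisk_votes, expected, quorum


node_votes, qdisk_votes, expected, quorum = vote_summary(CONF)
print("node votes:", node_votes)
print("quorum disk votes:", qdisk_votes)
print("expected votes:", expected)
print("quorum threshold:", quorum)
```

Under these assumptions a node that cannot complete the join contributes only its own single vote, which is below the threshold, so ccsd refusing connections on that node would be a symptom of the failed join rather than a separate fault.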

                best regards, Gunther

-- .............................................................
                Riege Software International GmbH  Fon: +49 (2159) 9148 0
                Mollsfeld 10                       Fax: +49 (2159) 9148 11
                40670 Meerbusch                    Web: www.riege.com
                <http://www.riege.com>
                Germany                            E-Mail:
                schlegel riege com <mailto:schlegel riege com>
                ---                                ---
                Handelsregister:                   Managing Directors:
                Amtsgericht Neuss HRB-NR 4207      Christian Riege
                USt-ID-Nr.: DE120585842            Gabriele  Riege
                                                 Johannes  Riege
                .............................................................
YOU CARE FOR FREIGHT, WE CARE FOR YOU


                --
                Linux-cluster mailing list
                Linux-cluster redhat com <mailto:Linux-cluster redhat com>
                https://www.redhat.com/mailman/listinfo/linux-cluster




-- Alan A.

-- Dave Costakos
        mailto:david costakos gmail com <mailto:david costakos gmail com>






--
Gunther Schlegel
Manager IT Infrastructure





