[Linux-cluster] Cluster services die when nonactive node is rebooted

Eric Schneider eschneid at uccs.edu
Fri Jul 23 22:20:32 UTC 2010


I have a few two-node clusters, and I have noticed that recently they lose
quorum when I reboot the node that is not running any services.  I could do
this in the past without any problems.  The nodes run CentOS 5.5 on ESX 4.0
U1.  Could this be a bug in a newer kernel or in the cman software?

 

The surviving node (happy5) logs the following as soon as the other node
goes down for the reboot:

Jul 23 16:02:32 happy5 clurgmgrd[4269]: <notice> Member 2 shutting down
Jul 23 16:02:52 happy5 qdiskd[3562]: <info> Node 2 shutdown
Jul 23 16:03:02 happy5 qdiskd[3562]: <info> Assuming master role
Jul 23 16:03:03 happy5 clurgmgrd[4269]: <emerg> #1: Quorum Dissolved
Jul 23 16:03:03 happy5 openais[3533]: [CMAN ] lost contact with quorum device
Jul 23 16:03:03 happy5 openais[3533]: [CMAN ] quorum lost, blocking activity
Jul 23 16:03:03 happy5 ccsd[3493]: Cluster is not quorate.  Refusing connection.
Jul 23 16:03:03 happy5 ccsd[3493]: Error while processing connect: Connection refused
Jul 23 16:03:03 happy5 ccsd[3493]: Cluster is not quorate.  Refusing connection.
Jul 23 16:03:03 happy5 ccsd[3493]: Error while processing connect: Connection refused
Jul 23 16:03:03 happy5 ccsd[3493]: Invalid descriptor specified (-111).
Jul 23 16:03:03 happy5 ccsd[3493]: Someone may be attempting something evil.
Jul 23 16:03:03 happy5 ccsd[3493]: Error while processing get: Invalid request descriptor
Jul 23 16:03:03 happy5 ccsd[3493]: Invalid descriptor specified (-111).
Jul 23 16:03:03 happy5 ccsd[3493]: Someone may be attempting something evil.
Jul 23 16:03:03 happy5 ccsd[3493]: Error while processing get: Invalid request descriptor
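
To make the timing in the log easier to read, here is a minimal Python sketch that turns the timestamps above into offsets from the first message. The event list is copied by hand from the log; the function name seconds_since_first is hypothetical and not part of any cluster tool.

```python
from datetime import datetime

# Timestamps and messages copied by hand from the log excerpt above.
EVENTS = [
    ("16:02:32", "clurgmgrd: Member 2 shutting down"),
    ("16:02:52", "qdiskd: Node 2 shutdown"),
    ("16:03:02", "qdiskd: Assuming master role"),
    ("16:03:03", "clurgmgrd: Quorum Dissolved"),
]


def seconds_since_first(events):
    """Return (offset_in_seconds, message) pairs relative to the first event."""
    fmt = "%H:%M:%S"
    start = datetime.strptime(events[0][0], fmt)
    result = []
    for stamp, message in events:
        delta = datetime.strptime(stamp, fmt) - start
        result.append((int(delta.total_seconds()), message))
    return result


if __name__ == "__main__":
    for offset, message in seconds_since_first(EVENTS):
        print(f"+{offset:>3}s  {message}")
```

For comparison only: the quorumd settings in the configuration below give interval x tko = 5 x 6 = 30 seconds. Whether that window has anything to do with the roughly 30 seconds between the first message and the loss of quorum is a guess that I have not verified.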

 

<?xml version="1.0"?>
<cluster alias="delta_cluster" config_version="40" name="delta_cluster">
        <fence_daemon post_fail_delay="5" post_join_delay="120"/>
        <quorumd interval="5" label="delta_qdisk" min_score="1" tko="6" votes="1">
                <heuristic interval="5" program="ping -t1 -c1 192.168.1.1" score="1"/>
        </quorumd>
        <clusternodes>
                <clusternode name="node1" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="node1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="node2" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="node2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="3">
                <multicast addr="224.0.0.1" interface="eth0"/>
        </cman>
        <fencedevices>
                <fencedevice agent="fence_manual" name="fence_manual"/>
                <fencedevice agent="fence_vmware" ipaddr="bob" login="username" name="node1" passwd="password" port="node1"/>
                <fencedevice agent="fence_vmware" ipaddr="bob" login="username" name="node2" passwd="password" port="node2"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="node1" ordered="0" restricted="1">
                                <failoverdomainnode name="node1" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="node2" restricted="1">
                                <failoverdomainnode name="node2" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="failover_pro-http" restricted="0">
                                <failoverdomainnode name="node1" priority="1"/>
                                <failoverdomainnode name="node2" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
        </rm>
        <totem token="21000"/>
</cluster>
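
For reference, the vote arithmetic implied by this configuration can be sketched as follows. This is a minimal Python sketch, assuming cman applies the usual majority rule (quorum = expected_votes // 2 + 1). The vote counts are taken from the cluster.conf above; the function names quorum_threshold and is_quorate are hypothetical and not part of any cluster tool.

```python
# Vote counts copied from the cluster.conf above.
EXPECTED_VOTES = 3   # <cman expected_votes="3">
NODE_VOTES = 1       # each <clusternode ... votes="1">
QDISK_VOTES = 1      # <quorumd ... votes="1">


def quorum_threshold(expected_votes):
    """Assumed majority rule: strictly more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(nodes_up, qdisk_available):
    """Return True if the votes currently present reach the threshold."""
    votes = nodes_up * NODE_VOTES
    if qdisk_available:
        votes += QDISK_VOTES
    return votes >= quorum_threshold(EXPECTED_VOTES)


if __name__ == "__main__":
    print("threshold:", quorum_threshold(EXPECTED_VOTES))
    print("one node + qdisk:", is_quorate(nodes_up=1, qdisk_available=True))
    print("one node, no qdisk:", is_quorate(nodes_up=1, qdisk_available=False))
```

Under that assumption, one node plus the quorum disk holds 2 of 3 votes and should stay quorate, while one node on its own does not. That would be consistent with the "lost contact with quorum device" message being the point at which the surviving node loses quorum.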

 

Thanks,
Eric

 
