[Linux-cluster] Can not establish new cluster 5.3 with luci - quorum error

Alan A alan.zg at gmail.com
Thu Aug 27 13:59:52 UTC 2009


Sorry about the partial post - the web browser did that for me.

On Thu, Aug 27, 2009 at 8:58 AM, Alan A <alan.zg at gmail.com> wrote:

> I decided this morning to start checking packages/versions first. Here are
> some details about the system thus far:
>
> CONF:
> <?xml version="1.0" ?>
> <cluster alias="mrcluster" config_version="2" name="mrcluster">
>     <fence_daemon post_fail_delay="0" post_join_delay="30"/>
>     <clusternodes>
>         <clusternode name="clxmrcati12.xxxxxx.com" nodeid="1" votes="1">
>             <fence>
>                 <method name="1">
>                     <device name="apcps05" option="off" port="3"
> switch="3"/>
>                     <device name="apcps06" option="off" port="3"
> switch="3"/>
>                     <device name="apcps05" option="on" port="3"
> switch="3"/>
>                     <device name="apcps06" option="on" port="3"
> switch="3"/>
>                 </method>
>             </fence>
>         </clusternode>
>         <clusternode name="clxmrcati11.xxxxxx.com" nodeid="2" votes="1">
>             <fence>
>                 <method name="1">
>                     <device name="apcps05" option="off" port="4"
> switch="4"/>
>                     <device name="apcps06" option="off" port="4"
> switch="4"/>
>                     <device name="apcps05" option="on" port="4"
> switch="4"/>
>                     <device name="apcps06" option="on" port="4"
> switch="4"/>
>                 </method>
>             </fence>
>         </clusternode>
>         <clusternode name="clxmrweb20.xxxxxx.com" nodeid="3" votes="1">
>             <fence>
>                 <method name="1">
>                     <device name="apcps05" option="off" port="2"
> switch="2"/>
>                     <device name="apcps06" option="off" port="2"
> switch="2"/>
>                     <device name="apcps05" option="on" port="2"
> switch="2"/>
>                     <device name="apcps06" option="on" port="2"
> switch="2"/>
>                 </method>
>             </fence>
>         </clusternode>
>     </clusternodes>
>     <cman/>
>     <fencedevices>
>         <fencedevice agent="fence_apc" ipaddr="172.XX.XX.27" login="apc" name="apcps05" passwd="xxx"/>
>         <fencedevice agent="fence_apc" ipaddr="172.XX.XX.28" login="apc" name="apcps06" passwd="xxx"/>
>     </fencedevices>
>     <rm>
>         <failoverdomains/>
>         <resources/>
>     </rm>
> </cluster>
>
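> As a sanity check on the math: with three nodes at one vote each, cman
> should report Expected votes: 3 and Quorum: 2, so any two nodes that
> can see each other ought to be quorate. Assuming stock RHEL 5.3 cman
> tooling, the quickest way to see what each node actually thinks is:
>
>     cman_tool status    # reports Expected votes / Quorum / membership
>     cman_tool nodes     # all three nodes should show state 'M'
>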
> -------------------------------------------------------------------------------------------
> Host Files:
> From Luci Node clxmrcati11:
> 127.0.0.1    localhost.localdomain    localhost
> 172.XX.XX.18    clxmrcati11.xxxxxx.com       clxmrcati11
> 172.XX.XX.19    clxmrcati12.xxxxxx.com       clxmrcati12
> 172.XX.XX.20    clxmrrpt10.xxxxxx.com         clxmrrpt10
> 172.XX.XX.21    clxmrweb20.xxxxxx.com      clxmrweb20
>
> From ricci node clxmrcati12:
> 127.0.0.1    localhost.localdomain    localhost
> 172.XX.XX.19    clxmrcati12.xxxxxx.com               clxmrcati12
> 172.XX.XX.21    clxmrweb20.xxxxxx.com              clxmrweb20
> 172.XX.XX.20    clxmrrpt10.xxxxxx.com                 clxmrrpt10
> 172.XX.XX.18    clxmrcati11.xxxxxx.com               clxmrcati11
>
> From ricci node clxmrweb20:
> 127.0.0.1    localhost.localdomain    localhost
> 172.XX.XX.21    clxmrweb20.xxxxxx.com             clxmrweb20
> 172.XX.XX.20    clxmrrpt10.xxxxxx.com                clxmrrpt10
> 172.XX.XX.18    clxmrcati11.xxxxxx.com              clxmrcati11
> 172.XX.XX.19    clxmrcati12.xxxxxx.com              clxmrcati12
>
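> Since cman matches members against the node names in cluster.conf, it
> is worth confirming that all three boxes resolve those exact names to
> the same addresses - a quick check (using the redacted names above)
> could be:
>
>     for h in clxmrcati11 clxmrcati12 clxmrweb20; do
>         getent hosts $h.xxxxxx.com
>     done
>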
> Mostly this in /var/log/messages:
> Aug 25 09:36:12 fenclxmrcati11 dlm_controld[2267]: connect to ccs error
> -111, check ccsd or cluster status
> Aug 25 09:36:12 fenclxmrcati11 ccsd[3758]: Cluster is not quorate.
> Refusing connection.
> Aug 25 09:36:12 fenclxmrcati11 ccsd[3758]: Error while processing connect:
> Connection refused
> Aug 25 09:36:12 fenclxmrcati11 gfs_controld[2273]: connect to ccs error
> -111, check ccsd or cluster status
> Aug 25 09:36:12 fenclxmrcati11 ccsd[3758]: Cluster is not quorate.
> Refusing connection.
> Aug 25 09:36:12 fenclxmrcati11 ccsd[3758]: Error while processing connect:
> Connection refused
> Aug 25 09:36:13 fenclxmrcati11 ccsd[3758]: Cluster is not quorate.
> Refusing connection.
> Aug 25 09:36:13 fenclxmrcati11 ccsd[3758]: Error while processing connect:
> Connection refused
> Aug 25 09:36:13 fenclxmrcati11 ccsd[3758]: Cluster is not quorate.
> Refusing connection.
> Aug 25 09:36:13 fenclxmrcati11 ccsd[3758]: Error while processing connect:
> Connection refused
> Aug 25 09:36:13 fenclxmrcati11 ccsd[3758]: Cluster is not quorate.
> Refusing connection.
> Aug 25 09:36:13 fenclxmrcati11 ccsd[3758]: Error while processing connect:
> Connection refused
> Aug 25 09:36:14 fenclxmrcati11 ccsd[3758]: Cluster is not quorate.
> Refusing connection.
> Aug 25 09:36:14 fenclxmrcati11 ccsd[3758]: Error while processing connect:
> Connection refused
> Aug 25 09:36:14 fenclxmrcati11 ccsd[3758]: Cluster is not quorate.
> Refusing connection.
> Aug 25 09:36:14 fenclxmrcati11 ccsd[3758]: Error while processing connect:
> Connection refused
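>
> (For what it's worth, error -111 is ECONNREFUSED: ccsd deliberately
> refuses connections from dlm_controld and gfs_controld while the
> cluster is inquorate, so these messages look like a symptom of the
> missing quorum rather than a separate failure.)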
>
> On Thu, Aug 27, 2009 at 3:27 AM, Jakov Sosic <jakov.sosic at srce.hr> wrote:
>
>> On Wed, 26 Aug 2009 18:36:26 -0500
>> Alan A <alan.zg at gmail.com> wrote:
>>
>> > I have tried almost everything at this point to troubleshoot this
>> > further. I can't create a new cluster with luci.
>> >
>> > I have torn down and reconfigured the 3-node cluster at least 6
>> > times.
>> >
>> > I have noticed the nodes taking exceptionally long to initialize
>> > fencing when cman starts. I tried with fencing both defined and
>> > undefined; the amount of time needed is still the same. Even after
>> > fencing completes in /var/log/messages, the nodes refuse to join
>> > the cluster because they are 'not in quorum' during the join
>> > process. I upped post_join_delay as high as 150, but the result is
>> > the same.
>> >
>> > Fencing - I use APC power switches. I can log in to the APC switch
>> > from a node, and I can even fence the other node, but when cman
>> > starts it looks like it is almost timing out on starting fencing.
>> >
>> > If I issue 'cman_tool nodes' it lists the local node as a member of
>> > the cluster and the other two with state 'X'. If I try 'cman_tool
>> > join clustername', it tells me the node is already in that cluster,
>> > but the cluster as a whole never forms. Each node thinks it is the
>> > only working member of the cluster.
>> >
>> >
>> > Any pointers?
>>
>> Looks like a network issue to me.
>>
>> Are you sure your network is operational in the sense of multicast /
>> IGMP? Try forcing IGMP v1 in sysctl.conf - and if you have Cisco
>> equipment, take a look at the openais FAQ (sparse-dense mode).
>>
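>> For example (untested here, and eth0 is just a guess at the interface
>> name), forcing IGMPv1 in /etc/sysctl.conf would look like:
>>
>>     net.ipv4.conf.all.force_igmp_version = 1
>>     net.ipv4.conf.eth0.force_igmp_version = 1
>>
>> followed by 'sysctl -p' to apply. Watching whether the nodes actually
>> exchange multicast traffic can also help narrow it down:
>>
>>     tcpdump -i eth0 -n ip multicast
>>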
>>
>> --
>> |    Jakov Sosic    |    ICQ: 28410271    |   PGP: 0x965CAE2D   |
>> =================================================================
>> | start fighting cancer -> http://www.worldcommunitygrid.org/   |
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
>
>
> --
> Alan A.
>



-- 
Alan A.