[Linux-cluster] 3 node cluster problems

Panigrahi, Santosh Kumar Santosh.Panigrahi at in.unisys.com
Tue Mar 25 12:33:29 UTC 2008


If you are configuring your cluster with system-config-cluster, there
is no need to run ricci/luci. Ricci and luci are only needed when
configuring the cluster through Conga. You can configure it either way.

Looking at your clustat outputs, it seems the cluster is partitioned
(split brain) into two sub-clusters [Sub 1: csarcsys1-eth0 and
csarcsys2-eth0; Sub 2: csarcsys3-eth0]. Without a quorum device you
will face this situation more often. To avoid it, you can configure a
quorum device with a heuristic such as a ping test. See
http://www.redhatmagazine.com/2007/12/19/enhancing-cluster-quorum-with-qdisk/
for configuring a quorum disk in RHCS.
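
For a rough idea of what that article sets up (the device path, label,
and ping target below are placeholders, not values from your setup):
the quorum disk is initialized once on a small shared partition and
then declared in cluster.conf with a ping heuristic, something like:

    # run once, on one node; /dev/sdX is a placeholder shared partition
    mkqdisk -c /dev/sdX -l csarcqdisk

    <quorumd interval="1" tko="10" votes="1" label="csarcqdisk">
            <heuristic program="ping -c1 -t1 172.xx.xx.1" score="1"
             interval="2" tko="3"/>
    </quorumd>

With three 1-vote nodes plus a 1-vote qdisk you would also set
expected_votes="4" on the <cman/> element.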

Thanks,
S

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Dalton, Maurice
Sent: Tuesday, March 25, 2008 5:18 PM
To: linux clustering
Subject: RE: [Linux-cluster] 3 node cluster problems

Still no change. Same as below. 

I completely rebuilt the cluster using system-config-cluster. The
cluster software was installed from RHN; luci and ricci are running.

This is the new config file, and it has been copied to the other two
systems:



[root at csarcsys1-eth0 cluster]# more cluster.conf
<?xml version="1.0"?>
<cluster config_version="5" name="csarcsys5">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="172.xx.xx.xxx" monitor_link="1"/>
                        <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
                </resources>
        </rm>
</cluster>

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 4:17 PM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

Did you load the cluster software via Conga or manually? You would
have had to load luci on one node and ricci on all three.

Try copying the modified /etc/cluster/cluster.conf from csarcsys1 to
the other two nodes. Make sure you can ping the private interface
to/from all nodes, then reboot. If this does not work, post your
/etc/cluster/cluster.conf file again.
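
Something along these lines (a sketch reusing the node names from this
thread) covers both steps:

    # push the config from csarcsys1 to the other two nodes
    scp /etc/cluster/cluster.conf csarcsys2-eth0:/etc/cluster/
    scp /etc/cluster/cluster.conf csarcsys3-eth0:/etc/cluster/

    # confirm all three copies are identical
    for n in csarcsys1-eth0 csarcsys2-eth0 csarcsys3-eth0; do
            ssh $n md5sum /etc/cluster/cluster.conf
    done

    # check that the private interfaces answer from this node
    ping -c3 csarcsys2-eth0
    ping -c3 csarcsys3-eth0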


Dalton, Maurice wrote:
> Yes
> I also rebooted again just now to be sure.
>
>
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bennie Thomas
> Sent: Monday, March 24, 2008 3:33 PM
> To: linux clustering
> Subject: Re: [Linux-cluster] 3 node cluster problems
>
> When you changed the node names in /etc/cluster/cluster.conf and made
> sure the /etc/hosts file had the correct node names (i.e. 10.0.0.100
> csarcsys1-eth0 csarcsys1-eth0.xxxx.xxxx.xxx), did you reboot all the
> nodes at the same time?
>
> Dalton, Maurice wrote:
>   
>> No luck. It seems as if csarcsys3 thinks it's in its own cluster.
>> I renamed all the config files and rebuilt from system-config-cluster.
>>
>> Clustat command from csarcsys3:
>>
>>
>> [root at csarcsys3-eth0 cluster]# clustat
>> msg_open: No such file or directory
>> Member Status: Inquorate
>>
>>   Member Name                        ID   Status
>>   ------ ----                        ---- ------
>>   csarcsys1-eth0                        1 Offline
>>   csarcsys2-eth0                        2 Offline
>>   csarcsys3-eth0                        3 Online, Local
>>
>> Clustat command from csarcsys2:
>>
>> [root at csarcsys2-eth0 cluster]# clustat
>> msg_open: No such file or directory
>> Member Status: Quorate
>>
>>   Member Name                        ID   Status
>>   ------ ----                        ---- ------
>>   csarcsys1-eth0                        1 Online
>>   csarcsys2-eth0                        2 Online, Local
>>   csarcsys3-eth0                        3 Offline
>>
>>
>> -----Original Message-----
>> From: linux-cluster-bounces at redhat.com
>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bennie Thomas
>> Sent: Monday, March 24, 2008 2:25 PM
>> To: linux clustering
>> Subject: Re: [Linux-cluster] 3 node cluster problems
>>
>> You will also need to make sure the clustered node names are in your
>> /etc/hosts file.
>> Also, make sure your cluster network interface is up on all nodes and
>> that /etc/cluster/cluster.conf is the same on all nodes.
>>
>>
>>
>> Dalton, Maurice wrote:
>>> The last post is incorrect.
>>>
>>> Fence is still hanging at startup.
>>>
>>> Here are more log messages:
>>>
>>> Mar 24 19:03:14 csarcsys3-eth0 ccsd[6425]: Error while processing connect: Connection refused
>>>
>>> Mar 24 19:03:15 csarcsys3-eth0 dlm_controld[6453]: connect to ccs error -111, check ccsd or cluster status
>>>
>>> From: linux-cluster-bounces at redhat.com
>>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bennie Thomas
>>> Sent: Monday, March 24, 2008 11:22 AM
>>> To: linux clustering
>>> Subject: Re: [Linux-cluster] 3 node cluster problems
>>>
>>> Try removing the fully qualified hostnames from the cluster.conf file.
>>>
>>>
>>> Dalton, Maurice wrote:
>>>
>>> I have NO fencing equipment.
>>>
>>> I have been tasked with setting up a 3-node cluster.
>>>
>>> Currently I am having problems getting cman (fence) to start.
>>>
>>> Fence tries to start during cman startup but fails.
>>>
>>> I tried to run /sbin/fenced -D and I get the following:
>>>
>>> 1206373475 cman_init error 0 111
>>>
>>> Here's my cluster.conf file:
>>>
>>> <?xml version="1.0"?>
>>> <cluster alias="csarcsys51" config_version="26" name="csarcsys51">
>>>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>         <clusternodes>
>>>                 <clusternode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" nodeid="1" votes="1">
>>>                         <fence/>
>>>                 </clusternode>
>>>                 <clusternode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" nodeid="2" votes="1">
>>>                         <fence/>
>>>                 </clusternode>
>>>                 <clusternode name="csarcsys3-eth0.xxx.xxxxnasa.gov" nodeid="3" votes="1">
>>>                         <fence/>
>>>                 </clusternode>
>>>         </clusternodes>
>>>         <cman/>
>>>         <fencedevices/>
>>>         <rm>
>>>                 <failoverdomains>
>>>                         <failoverdomain name="csarcsys-fo" ordered="1" restricted="0">
>>>                                 <failoverdomainnode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" priority="1"/>
>>>                                 <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
>>>                                 <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
>>>                         </failoverdomain>
>>>                 </failoverdomains>
>>>                 <resources>
>>>                         <ip address="xxx.xxx.xxx.xxx" monitor_link="1"/>
>>>                         <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
>>>                         <nfsexport name="csarcsys-export"/>
>>>                         <nfsclient name="csarcsys-nfs-client" options="no_root_squash,rw" path="/csarc-test" target="xxx.xxx.xxx.*"/>
>>>                 </resources>
>>>         </rm>
>>> </cluster>
>>>
>>> Messages from the logs:
>>>
>>> Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
>>> Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
>>> Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
>>> Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
>>> Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
>>> Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
>>> Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
>>> Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
>>> Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
>>> Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster