
Re: [Linux-cluster] 3 node cluster problems



I believe some of the Cisco switches do not have multicast enabled by default, which would prevent some of the cluster communications from getting through properly.

   http://kbase.redhat.com/faq/FAQ_51_11755
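One way to sanity-check that multicast actually passes between cluster interfaces is a small send/receive probe. Below is a minimal Python sketch, assuming group 239.192.0.1 and port 5405 as stand-ins resembling cman/openais defaults (they are not values from this thread). As written it loops a datagram back over the loopback interface as a self-test; running the receiver half on one node and the sender half on another, with IFACE set to each node's cluster IP, exercises the switch path instead:

```python
import socket

GROUP, PORT = "239.192.0.1", 5405  # placeholder group/port, not taken from this thread
IFACE = "127.0.0.1"                # loopback for a self-test; use a node's cluster IP for real checks

# Receiver: bind the port and join the multicast group on the chosen interface.
recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
recv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
recv.bind(("", PORT))
mreq = socket.inet_aton(GROUP) + socket.inet_aton(IFACE)
recv.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
recv.settimeout(5)

# Sender: emit one datagram to the group via the same interface.
send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
send.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF, socket.inet_aton(IFACE))
send.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
send.sendto(b"cluster-mcast-probe", (GROUP, PORT))

# If multicast delivery works, the probe comes back; a timeout here suggests
# multicast is being dropped somewhere on the path.
data, addr = recv.recvfrom(1024)
print(data.decode())
```

If this times out between two nodes while working locally, the switch (IGMP snooping, missing querier, or multicast disabled) is the usual suspect.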

John

Bennie Thomas wrote:
Are you using a private VLAN for your cluster communications? If not, you should be; the communications between the clustered nodes are very chatty. Just my opinion.

These are my opinions and experiences.

Any views or opinions presented are solely those of the author and do not necessarily represent those of Raytheon unless specifically stated. Electronic communications including email might be monitored by Raytheon for operational or business reasons.


Dalton, Maurice wrote:
Cisco 3550


-----Original Message-----
From: linux-cluster-bounces redhat com
[mailto:linux-cluster-bounces redhat com] On Behalf Of Bennie Thomas
Sent: Thursday, March 27, 2008 9:53 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

What is the switch brand? I have read that RHCS has problems with certain switches.

Dalton, Maurice wrote:
Switches

Storage is fiber


-----Original Message-----
From: linux-cluster-bounces redhat com
[mailto:linux-cluster-bounces redhat com] On Behalf Of Bennie Thomas
Sent: Thursday, March 27, 2008 9:04 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

How are your cluster connections made? (i.e. are you using a hub, a switch, or directly connecting the heartbeat cables?)

Dalton, Maurice wrote:
Still having the problem; I can't figure it out.
I just upgraded to the latest 5.1 cman. No help!


-----Original Message-----
From: linux-cluster-bounces redhat com
[mailto:linux-cluster-bounces redhat com] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:57 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems


Glad they are working. I have not used LVM with our clusters. You have now piqued my curiosity, and I will have to try building one. So were you also using GFS?

Dalton, Maurice wrote:
Sorry, but security here will not allow me to send the hosts files.

BUT:


I was getting this in /var/log/messages on csarcsys3:

Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate. Refusing connection.
Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Error while processing connect: Connection refused
Mar 25 15:26:12 csarcsys3-eth0 dlm_controld[7476]: connect to ccs error -111, check ccsd or cluster status
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate. Refusing connection.
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Error while processing connect: Connection refused


I had /dev/vg0/gfsvol on these systems.

I did an lvremove and restarted cman on all systems, and for some strange reason my clusters are working.

It doesn't make any sense.

I can't thank you enough for your help!


Thanks.


-----Original Message-----
From: linux-cluster-bounces redhat com
[mailto:linux-cluster-bounces redhat com] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:27 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

I am currently running several 3-node clusters without a quorum disk.

However, if you want your cluster to stay up when only one node is running, you will need a quorum disk. Can you send your /etc/hosts file for all systems? Also, could there be another node named csarcsys3-eth0 in your NIS or DNS?
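The quorum arithmetic behind that advice can be sketched quickly (this assumes the common "majority of expected votes" rule, quorum = expected_votes // 2 + 1, and one vote per node as in the configs posted in this thread):

```python
# Sketch of CMAN-style quorum arithmetic for a 3-node cluster.
def quorum(expected_votes):
    # Majority rule: more than half of the expected votes.
    return expected_votes // 2 + 1

node_votes = 3 * 1                       # three nodes, votes="1" each
print(quorum(node_votes))                # -> 2: without a qdisk, two nodes must be up

qdisk_votes = 2                          # quorumd votes="2" as in the posted config
print(quorum(node_votes + qdisk_votes))  # -> 3: one node (1) + qdisk (2) = 3, still quorate
```

So with a 2-vote quorum disk, a lone surviving node plus the qdisk reaches the 3-vote quorum.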

I configured some using Conga and some with system-config-cluster. When using system-config-cluster, I basically run the config on all nodes, just adding the nodenames and cluster name. I reboot all nodes to make sure they see each other, then go back and modify the config files.

The file /var/log/messages should also shed some light on the problem.
Dalton, Maurice wrote:
Same problem.

I now have qdiskd running.

I have run diffs on all three cluster.conf files; all are the same.

[root csarcsys1-eth0 cluster]# more cluster.conf

<?xml version="1.0"?>
<cluster config_version="6" name="csarcsys5">
    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm>
        <failoverdomains>
            <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="172.24.86.177" monitor_link="1"/>
            <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
        </resources>
    </rm>
    <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2"/>
</cluster>

More info from csarcsys3:

[root csarcsys3-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Inquorate

 Member Name                ID   Status
 ------ ----                ---- ------
 csarcsys1-eth0             1    Offline
 csarcsys2-eth0             2    Offline
 csarcsys3-eth0             3    Online, Local
 /dev/sdd1                  0    Offline

[root csarcsys3-eth0 cluster]# mkqdisk -L
mkqdisk v0.5.1
/dev/sdd1:
    Magic:   eb7a62c2
    Label:   csarcsysQ
    Created: Wed Feb 13 13:44:35 2008
    Host:    csarcsys1-eth0.xxx.xxx.nasa.gov

[root csarcsys3-eth0 cluster]# ls -l /dev/sdd1
brw-r----- 1 root disk 8, 49 Mar 25 14:09 /dev/sdd1

clustat from csarcsys1:

msg_open: No such file or directory
Member Status: Quorate

 Member Name                ID   Status
 ------ ----                ---- ------
 csarcsys1-eth0             1    Online, Local
 csarcsys2-eth0             2    Online
 csarcsys3-eth0             3    Offline
 /dev/sdd1                  0    Offline, Quorum Disk

[root csarcsys1-eth0 cluster]# ls -l /dev/sdd1
brw-r----- 1 root disk 8, 49 Mar 25 14:19 /dev/sdd1

mkqdisk v0.5.1
/dev/sdd1:
    Magic:   eb7a62c2
    Label:   csarcsysQ
    Created: Wed Feb 13 13:44:35 2008
    Host:    csarcsys1-eth0.xxx.xxx.nasa.gov

Info from csarcsys2:

[root csarcsys2-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Quorate

 Member Name                ID   Status
 ------ ----                ---- ------
 csarcsys1-eth0             1    Offline
 csarcsys2-eth0             2    Online, Local
 csarcsys3-eth0             3    Offline
 /dev/sdd1                  0    Online, Quorum Disk

From: linux-cluster-bounces redhat com [mailto:linux-cluster-bounces redhat com] On Behalf Of Panigrahi, Santosh Kumar
Sent: Tuesday, March 25, 2008 7:33 AM
To: linux clustering
Subject: RE: [Linux-cluster] 3 node cluster problems

If you are configuring your cluster with system-config-cluster, there is no need to run ricci/luci; ricci and luci are needed only when configuring the cluster with Conga. You can configure it either way.

Looking at your clustat outputs, it seems the cluster is partitioned (split brain) into two sub-clusters [1 - (csarcsys1-eth0, csarcsys2-eth0); 2 - csarcsys3-eth0]. Without a quorum device you can face this situation more often. To avoid it, you can configure a quorum device with a heuristic such as a ping. See http://www.redhatmagazine.com/2007/12/19/enhancing-cluster-quorum-with-qdisk/ for configuring a quorum disk in RHCS.
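For reference, a quorumd stanza with a ping heuristic might look roughly like the sketch below. The gateway address 172.24.86.1 is a placeholder (pick something on the cluster network that only a healthy partition can reach), and the interval/tko values are illustrative, not tuned:

```xml
<!-- Hedged sketch: qdisk with a ping heuristic; 172.24.86.1 and the
     interval/tko values are placeholders, adjust for your environment. -->
<quorumd interval="2" tko="10" votes="2" label="csarcsysQ">
    <heuristic program="ping -c1 -w1 172.24.86.1" score="1" interval="2" tko="3"/>
</quorumd>
```

The node (or partition) that can still ping the chosen address keeps the heuristic score, holds the quorum disk's votes, and wins the split.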

Thanks,

S

-----Original Message-----
From: linux-cluster-bounces redhat com [mailto:linux-cluster-bounces redhat com] On Behalf Of Dalton, Maurice
Sent: Tuesday, March 25, 2008 5:18 PM
To: linux clustering
Subject: RE: [Linux-cluster] 3 node cluster problems

Still no change. Same as below.

I completely rebuilt the cluster using system-config-cluster.

The cluster software was installed from RHN; luci and ricci are running.
This is the new config file, and it has been copied to the two other systems.

[root csarcsys1-eth0 cluster]# more cluster.conf

<?xml version="1.0"?>
<cluster config_version="5" name="csarcsys5">
    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm>
        <failoverdomains>
            <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="172.xx.xx.xxx" monitor_link="1"/>
            <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
        </resources>
    </rm>
</cluster>

-----Original Message-----
From: linux-cluster-bounces redhat com [mailto:linux-cluster-bounces redhat com] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 4:17 PM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

Did you load the Cluster software via Conga or manually? You would have had to load luci on one node and ricci on all three.

Try copying the modified /etc/cluster/cluster.conf from csarcsys1 to the other two nodes.

Make sure you can ping the private interface to/from all nodes, and reboot. If this does not work, post your /etc/cluster/cluster.conf file again.

Dalton, Maurice wrote:

Yes
      I also rebooted again just now to be sure.
      -----Original Message-----
      From: linux-cluster-bounces redhat com
      [mailto:linux-cluster-bounces redhat com] On Behalf Of Bennie
Thomas
      Sent: Monday, March 24, 2008 3:33 PM
      To: linux clustering
      Subject: Re: [Linux-cluster] 3 node cluster problems
When you changed the nodenames in /etc/cluster/cluster.conf and made sure the /etc/hosts file had the correct nodenames (i.e. 10.0.0.100 csarcsys1-eth0 csarcsys1-eth0.xxxx.xxxx.xxx), did you reboot all the nodes at the same time?
      Dalton, Maurice wrote:
No luck. It seems as if csarcsys3 thinks it's in its own cluster. I renamed all config files and rebuilt from system-config-cluster.
        Clustat command from csarcsys3:

        [root csarcsys3-eth0 cluster]# clustat
        msg_open: No such file or directory
        Member Status: Inquorate

         Member Name                ID   Status
         ------ ----                ---- ------
         csarcsys1-eth0             1    Offline
         csarcsys2-eth0             2    Offline
         csarcsys3-eth0             3    Online, Local

        clustat command from csarcsys2:

        [root csarcsys2-eth0 cluster]# clustat
        msg_open: No such file or directory
        Member Status: Quorate

         Member Name                ID   Status
         ------ ----                ---- ------
         csarcsys1-eth0             1    Online
         csarcsys2-eth0             2    Online, Local
         csarcsys3-eth0             3    Offline
        -----Original Message-----
        From: linux-cluster-bounces redhat com [mailto:linux-cluster-bounces redhat com] On Behalf Of Bennie Thomas
        Sent: Monday, March 24, 2008 2:25 PM
        To: linux clustering
        Subject: Re: [Linux-cluster] 3 node cluster problems

You will also need to make sure the cluster nodenames are in your /etc/hosts file. Also, make sure your cluster network interface is up on all nodes and that /etc/cluster/cluster.conf is the same on all nodes.
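As a sketch, the /etc/hosts entries on every node might look like this (the 10.0.0.x addresses are placeholders; the point is that each nodename used in cluster.conf resolves to that node's cluster-network address identically on all nodes):

```
# Placeholder addresses -- each cluster.conf nodename must resolve
# to that node's cluster-network address, the same way on every node.
10.0.0.101   csarcsys1-eth0
10.0.0.102   csarcsys2-eth0
10.0.0.103   csarcsys3-eth0
```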
        Dalton, Maurice wrote:
The last post is incorrect.
          Fence is still hanging at startup.

          Here's another log message:

          Mar 24 19:03:14 csarcsys3-eth0 ccsd[6425]: Error while processing connect: Connection refused
          Mar 24 19:03:15 csarcsys3-eth0 dlm_controld[6453]: connect to ccs error -111, check ccsd or cluster status
          From: linux-cluster-bounces redhat com [mailto:linux-cluster-bounces redhat com] On Behalf Of Bennie Thomas
          Sent: Monday, March 24, 2008 11:22 AM
          To: linux clustering
          Subject: Re: [Linux-cluster] 3 node cluster problems

          Try removing the fully qualified hostname from the cluster.conf file.

Dalton, Maurice wrote:
          I have NO fencing equipment.
          I have been tasked to set up a 3-node cluster.
          Currently I am having problems getting cman (fence) to start: fence tries to start during cman startup but fails.
          I tried to run /sbin/fenced -D and I get the following:
          1206373475 cman_init error 0 111
          Here's my cluster.conf file:

          <?xml version="1.0"?>
          <cluster alias="csarcsys51" config_version="26" name="csarcsys51">
              <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
              <clusternodes>
                  <clusternode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" nodeid="1" votes="1">
                      <fence/>
                  </clusternode>
                  <clusternode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" nodeid="2" votes="1">
                      <fence/>
                  </clusternode>
                  <clusternode name="csarcsys3-eth0.xxx.xxxxnasa.gov" nodeid="3" votes="1">
                      <fence/>
                  </clusternode>
              </clusternodes>
              <cman/>
              <fencedevices/>
              <rm>
                  <failoverdomains>
                      <failoverdomain name="csarcsys-fo" ordered="1" restricted="0">
                          <failoverdomainnode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" priority="1"/>
                          <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
                          <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
                      </failoverdomain>
                  </failoverdomains>
                  <resources>
                      <ip address="xxx.xxx.xxx.xxx" monitor_link="1"/>
                      <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
                      <nfsexport name="csarcsys-export"/>
                      <nfsclient name="csarcsys-nfs-client" options="no_root_squash,rw" path="/csarc-test" target="xxx.xxx.xxx.*"/>
                  </resources>
              </rm>
          </cluster>
          Messages from the logs:

          Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
          Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
          Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
          Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
          Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
          Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
          Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
          Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
          Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
          Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
--
Linux-cluster mailing list
Linux-cluster redhat com
https://www.redhat.com/mailman/listinfo/linux-cluster