
Re: [Linux-cluster] having problems trying to setup a two node cluster



vahram wrote:
Rick Stevens wrote:


I had a similar issue. The problem was with the multicast routing. I was using two NICs on each node...one public (eth0) and one private (eth1), with the default gateway going out eth0.

The route for multicast (224.x.x.x) was going out the default
gateway and not reaching the other machine.  By putting in a fixed
route for multicast:

route add -net 224.0.0.0/8 dev eth1

it all started working.  This was my fix; it may not work for you.
Also, I use the CVS code from http://sources.redhat.com/cluster, not
the source RPMs from where you specified.
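If you want to check multicast delivery independently of cman, one way is a small send/receive pair in Python. This is just a sketch of my own, not part of the cluster tools: the group 239.255.0.1 and port 5007 are arbitrary test values, and it runs over loopback; on real nodes you'd point IFACE at eth1's address on each box.

```python
# Minimal multicast smoke test: a receiver joins a group, a sender
# transmits to it, and we check the datagram arrives.  Group/port are
# arbitrary test values (NOT from cluster.conf); on a real cluster
# node, replace IFACE with the private interface's (eth1) address.
import socket

GROUP = "239.255.0.1"   # admin-scoped test group, hypothetical choice
PORT = 5007
IFACE = "127.0.0.1"     # loopback for a single-host test

# Receiver: bind the port and join the multicast group on IFACE.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
rx.bind(("", PORT))
mreq = socket.inet_aton(GROUP) + socket.inet_aton(IFACE)
rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
rx.settimeout(5)

# Sender: force multicast out of IFACE and enable loopback so a
# receiver on this same host sees a copy.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
              socket.inet_aton(IFACE))
tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
tx.sendto(b"cluster-mcast-test", (GROUP, PORT))

data, addr = rx.recvfrom(1024)
print("received %r from %s" % (data, addr[0]))
```

If the receive times out when run across two nodes, the multicast route (or a switch that filters multicast) is the first thing to suspect.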
----------------------------------------------------------------------
- Rick Stevens, Senior Systems Engineer     rstevens vitalstream com -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
-     Veni, Vidi, VISA:  I came, I saw, I did a little shopping.     -
----------------------------------------------------------------------

--
Linux-cluster mailing list
Linux-cluster redhat com
http://www.redhat.com/mailman/listinfo/linux-cluster


Yeap, both boxes have two NICs: eth0 is public, and eth1 is private (192.168.2.x). I tried adding the route, and that didn't fix it. I've also tried disabling the private NIC and running with just the public NIC, and that didn't fix it either. One other interesting thing I noticed: when I run cman_tool join on nodeA, netstat shows ccsd doing this:

tcp 0 0 127.0.0.1:50006 127.0.0.1:739 TIME_WAIT -
tcp 0 0 127.0.0.1:50006 127.0.0.1:738 TIME_WAIT -
tcp 0 0 127.0.0.1:50006 127.0.0.1:737 TIME_WAIT -
tcp 0 0 127.0.0.1:50006 127.0.0.1:736 TIME_WAIT -
tcp 0 0 127.0.0.1:50006 127.0.0.1:743 TIME_WAIT -
tcp 0 0 127.0.0.1:50006 127.0.0.1:742 TIME_WAIT -
tcp 0 0 127.0.0.1:50006 127.0.0.1:741 TIME_WAIT -
tcp 0 0 127.0.0.1:50006 127.0.0.1:740 TIME_WAIT -
tcp 0 0 127.0.0.1:50006 127.0.0.1:727 TIME_WAIT -
tcp 0 0 127.0.0.1:50006 127.0.0.1:731 TIME_WAIT -
tcp 0 0 127.0.0.1:50006 127.0.0.1:730 TIME_WAIT -
tcp 0 0 127.0.0.1:50006 127.0.0.1:729 TIME_WAIT -
tcp 0 0 127.0.0.1:50006 127.0.0.1:728 TIME_WAIT -
tcp 0 0 127.0.0.1:50006 127.0.0.1:735 TIME_WAIT -
tcp 0 0 127.0.0.1:50006 127.0.0.1:734 TIME_WAIT -
tcp 0 0 127.0.0.1:50006 127.0.0.1:733 TIME_WAIT -
tcp 0 0 127.0.0.1:50006 127.0.0.1:732 TIME_WAIT -
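Those TIME_WAIT entries show ccsd cycling through short-lived connections to port 50006 on loopback. Purely as an illustration (a generic helper, not part of ccsd or the cluster tools), here's a quick way to check whether anything is actually accepting connections on that port:

```python
# Check whether a TCP port on a host is accepting connections.
# Port 50006 is taken from the netstat output above; the helper
# itself is a generic illustration, not part of ccsd.
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connect to (host, port) succeeds."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        # connect_ex returns 0 on success, an errno otherwise.
        return s.connect_ex((host, port)) == 0
    finally:
        s.close()

if __name__ == "__main__":
    print("port 50006 open:", port_open("127.0.0.1", 50006))
```

If it reports the port closed while ccsd is supposedly running, ccsd isn't listening where cman_tool expects it.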



Looking back at your cluster.conf, I see you're using broadcast. I used multicast because, in the first CVS checkout I did, broadcast didn't work properly. It's possible your SRPMs have the same flaw. Why not try multicast and see if that works? Add the route I mentioned, and here's my cluster.conf, which you can crib from:

<?xml version="1.0"?>
<cluster name="test" config_version="1">
  <cman two-node="1" expected_votes="1">
    <multicast addr="224.0.0.1"/>
  </cman>

  <nodes>
    <node name="gfs-01-001" votes="1">
      <multicast addr="224.0.0.1" interface="eth1"/>
      <fence>
        <method name="single">
          <device name="human" ipaddr="gfs-01-001"/>
        </method>
      </fence>
    </node>

    <node name="gfs-01-002" votes="1">
      <multicast addr="224.0.0.1" interface="eth1"/>
      <fence>
        <method name="single">
          <device name="human" ipaddr="gfs-01-002"/>
        </method>
      </fence>
    </node>
  </nodes>

  <fence_devices>
    <device name="human" agent="fence_manual"/>
  </fence_devices>
</cluster>
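Before feeding a hand-edited cluster.conf to ccsd, it's worth making sure the XML is at least well-formed, since a malformed file can fail in confusing ways. A quick structural check with Python's standard xml.etree (just a habit of mine, not something the cluster tools require; it checks structure only, not whether the values suit your cluster):

```python
# Sanity-check that a cluster.conf is well-formed XML and contains the
# node entries we expect.  Structure-only: values aren't validated.
import xml.etree.ElementTree as ET

conf = """\
<?xml version="1.0"?>
<cluster name="test" config_version="1">
  <cman two-node="1" expected_votes="1">
    <multicast addr="224.0.0.1"/>
  </cman>
  <nodes>
    <node name="gfs-01-001" votes="1">
      <multicast addr="224.0.0.1" interface="eth1"/>
      <fence><method name="single">
        <device name="human" ipaddr="gfs-01-001"/>
      </method></fence>
    </node>
    <node name="gfs-01-002" votes="1">
      <multicast addr="224.0.0.1" interface="eth1"/>
      <fence><method name="single">
        <device name="human" ipaddr="gfs-01-002"/>
      </method></fence>
    </node>
  </nodes>
  <fence_devices>
    <device name="human" agent="fence_manual"/>
  </fence_devices>
</cluster>
"""

# ET.fromstring raises ParseError on malformed XML, which is the check.
root = ET.fromstring(conf)
nodes = root.findall("./nodes/node")
print("cluster:", root.get("name"))
print("nodes:", [n.get("name") for n in nodes])
```

A typo like an unclosed tag shows up immediately as a ParseError with a line number, which beats guessing from ccsd's behavior.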

----------------------------------------------------------------------
- Rick Stevens, Senior Systems Engineer     rstevens vitalstream com -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
-  What's small, yellow and very, VERY dangerous?  The root canary!  -
----------------------------------------------------------------------

