Re: [Linux-cluster] node failing

On Wed, 2004-07-14 at 11:48 +1200, Royce Brown wrote:

> I am trying to track down a problem I’ve been having with the
> clustering software on redhat 3.0 (supplied RPMs).

This would be taroon-list material, actually.

> I am running a 2-node cluster using Multicast Heartbeat and a Network
> Tiebreaker IP address, and have bonded Ethernet interfaces to different
> switches.

Good.  Try running in HA-bonded/failover mode if you're not already.

> There are no networking problems that I can see. On the bad node I can
> ping the other node by its address and the multicast address. I have
> full debug mode on, but the log files don’t show anything.

You should file a support ticket with Red Hat Support:


> Has anyone else seen this problem, or can anyone give me some tips on
> what to look at next?

Try the latest package from the RHN beta channel if you have access to
it.  It fixes a problem which causes membership to enter an infinite loop
in some cases where timeouts occur.  The infinite loop causes
multiple clumembd (or cluquorumd) processes to appear.
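
One way to check for that symptom is to count the daemons by name on
both nodes and compare.  The following is only a minimal Python sketch,
assuming a Linux /proc layout where /proc/<pid>/stat holds the command
name in parentheses.  The function name count_daemons is made up for
illustration, and how many copies count as normal is not something this
script knows, so use the healthy node as the baseline.

```python
import os

# Daemon names taken from the paragraph above.
DAEMONS = ("clumembd", "cluquorumd")


def count_daemons(names=DAEMONS, proc_root="/proc"):
    """Count running processes per command name.

    Reads <proc_root>/<pid>/stat for every numeric directory entry.
    count_daemons is a hypothetical helper, not part of any cluster tool.
    """
    counts = {name: 0 for name in names}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "net"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # The command name sits between the first "(" and the last ")".
        start, end = stat.find("("), stat.rfind(")")
        if start == -1 or end <= start:
            continue
        comm = stat[start + 1:end]
        if comm in counts:
            counts[comm] += 1
    return counts


if __name__ == "__main__":
    for name, n in count_daemons().items():
        print(f"{name}: {n}")
```

Run it on the good node and the bad node at the same time.  A count on
the bad node that is higher than on the good node, or that keeps
growing, would be consistent with the looping behaviour described above.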

Here's a ref to the bugzilla:


-- Lon

