Re: [Linux-cluster] Odd cluster problems

Lon Hohberger wrote:
> On Tue, Jul 31, 2007 at 10:48:44AM -0500, Jay Leafey wrote:
>> I've got a 3-node cluster running CentOS 4.5 and I cannot communicate with the resource group manager. When I use the clustat command I get a timeout:
>>
>> [root@rapier ~]# clustat
>> Timed out waiting for a response from Resource Group Manager
>> Member Status: Quorate
>>
>>  Member Name                              Status
>>  ------ ----                              ------
>>  rapier.utmem.edu                         Online, Local, rgmanager
>>  thorax.utmem.edu                         Offline
>>  cyclops.utmem.edu                        Online, rgmanager
>>
>> Fence Domain:    "default"                           2   2 recover 4 -
>> [1 2]
>
> Until fencing completes, rgmanager won't respond.
>
> fence_ack_manual needs to be run.
>
>> User:            "usrm::manager"                    10  10 recover 2 -
>> [1 2]
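
For anyone hitting the same symptom, here is a rough sketch of how the service lines quoted above could be scanned for groups stuck in "recover". It only assumes the line layout shown in this thread; the function name and the regular expression are mine, not part of any cluster tool, and the two integer columns are captured without interpretation.

```python
import re

# Matches lines shaped like the ones quoted in this thread, e.g.:
#   Fence Domain:    "default"    2   2 recover 4 -
# The pattern is an assumption based only on that sample output.
SERVICE_LINE = re.compile(
    r'^(?P<service>[^:"]+):\s+"(?P<name>[^"]+)"\s+'
    r'(?P<ids>\d+\s+\d+)\s+(?P<state>\S+)'
)


def groups_in_recovery(status_text):
    """Return (service, name) pairs whose state column reads 'recover'."""
    stuck = []
    for line in status_text.splitlines():
        match = SERVICE_LINE.match(line.strip())
        if match and match.group("state") == "recover":
            stuck.append((match.group("service"), match.group("name")))
    return stuck


# The two service lines from the original report, without quote markers.
sample = '''\
Fence Domain:    "default"                           2   2 recover 4 -
[1 2]
User:            "usrm::manager"                    10  10 recover 2 -
[1 2]
'''

print(groups_in_recovery(sample))
```

If the fence domain shows up in that list, rgmanager is probably waiting on fencing, which matches what Lon described.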

Your reply was a bit confusing at first, but looking deeper showed you were right on the mark. The systems (using HP ILO fencing) could barely communicate with each other and could not reach the ILO ports at all. It turns out some of the ports they were configured on had been moved to a different VLAN, so the network was split between the ILOs and the host ports.
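
In case it helps someone else, a quick check along these lines would have pointed at the problem sooner. It is only a sketch: it tries a plain TCP connection from a cluster node to each ILO address. The default port of 443 is my assumption about where the ILO management interface listens, and the helper names are hypothetical, so adjust both to match your setup.

```python
import socket


def can_connect(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_all(hosts, port=443, timeout=3.0):
    """Map each host to True (reachable) or False (unreachable)."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Run check_all() on each node with the ILO addresses from your fence device configuration. A node that cannot reach any ILO, as happened here, suggests a network or VLAN problem rather than a cluster software problem.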

Configuring the ports properly seems to have resolved the issue; everything is working fine now. I guess I just need to keep the rubber hose handy for "discussions" with the network guys! (grin!)

Jay Leafey - University of Tennessee
E-Mail:  jleafey utmem edu  Phone:  901-448-6534  FAX:  901-448-8199

