<!-- This is just a wrapper for LSB init scripts, so monitor
and status can't have a timeout, nor do they do any extra
work regardless of the depth -->
<action name="status" interval="30s" timeout="0"/>
<action name="monitor" interval="30s" timeout="0"/>
<action name="meta-data" timeout="0"/>
<action name="verify-all" timeout="0"/>
I am running a four-node GFS cluster with about 20 services per node. All four nodes belong to the same failover domain, and each has a priority of 1. My shared storage is an iSCSI SAN. After rgmanager has been running for a couple of days, clustat produces the following result on all four nodes:
Timed out waiting for a response from Resource Group Manager
Member Status: Quorate
Member Name Status
------ ---- ------
node01 Online, rgmanager
node02 Online, Local, rgmanager
node03 Online, rgmanager
node04 Online, rgmanager
I also get a timeout when I try to determine the status of a particular service with "clustat -s servicename".
All of the services seem to be up and running, but clustat does not work. Is there something wrong? Is there a way for me to increase the timeout?
clurgmgrd and dlm_recvd seem to be using a lot of CPU on node02: about 40 and 60 percent, respectively.
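In case it helps anyone reproduce or watch for this condition, here is a minimal sketch of a script that summarizes clustat output like the sample above. It only assumes the plain-text layout shown in this post (the timeout message, the "Member Status:" line, and member rows that start with a node name followed by Online or Offline); the function name `parse_clustat` and the `SAMPLE` constant are my own and are not part of any cluster tool.

```python
import re

# Message printed by clustat when rgmanager does not answer (copied from the output above).
TIMEOUT_MARKER = "Timed out waiting for a response from Resource Group Manager"

# Sample input: the clustat output quoted in this post.
SAMPLE = """\
Timed out waiting for a response from Resource Group Manager
Member Status: Quorate
Member Name Status
------ ---- ------
node01 Online, rgmanager
node02 Online, Local, rgmanager
node03 Online, rgmanager
node04 Online, rgmanager
"""


def parse_clustat(text):
    """Summarize plain-text clustat output (layout assumed from the sample above)."""
    summary = {"rgmanager_timed_out": False, "quorate": False, "members": {}}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if TIMEOUT_MARKER in line:
            summary["rgmanager_timed_out"] = True
        elif line.startswith("Member Status:"):
            summary["quorate"] = line.split(":", 1)[1].strip() == "Quorate"
        else:
            # Member rows look like "node02 Online, Local, rgmanager".
            # The header and separator rows do not match this pattern.
            match = re.match(r"^(\S+)\s+(Online|Offline)\b(.*)$", line)
            if match:
                flags = [f.strip() for f in match.group(3).split(",") if f.strip()]
                summary["members"][match.group(1)] = [match.group(2)] + flags
    return summary


result = parse_clustat(SAMPLE)
print(result["rgmanager_timed_out"], result["quorate"], sorted(result["members"]))
```

On a live node the same function could be fed real output, for example by reading `sys.stdin` in a script that clustat is piped into. This only detects the timeout; it does not change how long clustat waits for rgmanager.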
Thank you for your help.