[Linux-cluster] RHEL/RHCS3: /usr/lib/clumanager/services/service status # stays up

Thu Aug 18 20:12:25 UTC 2005

I'm running a new instance of RHCS on RHEL3 and am having an issue
where, over however many days the machine has been running, many
instances of the following accumulate:

/bin/bash /usr/lib/clumanager/services/service status 1

They all show up when I run ps auxww | grep status. The number at the
end varies, but on my system right now I have both status 1 and
status 0 "stuck" running. These happen to be the checks for mysql (1)
and httpd (0), both of which use the standard Red Hat
startup/shutdown/status scripts.

If I kill them, the associated service restarts, because the cluster
assumes the status check did not return correctly. That is not
desirable, since the service is actually up.

The number of occurrences dropped sharply after I increased the
interval between checks from 1 to 10. I hadn't realized the value was
in seconds (RTFM), so I'll probably raise it to 30 or 60 seconds.

Anyway, to debug this, I started a while loop that called the above
command with -x after the bash, and found that the command
occasionally hangs at:

+ retVal=0
+ '[' -n 6 -a -n 5 -a 6 -le 5 ']'
+ return 3
+ return 0
+ rm -f /tmp/cluster-httpd_status.z16209
+ ip status 0
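For anyone who wants to reproduce this, here is a minimal sketch of the kind of debug loop described above, written as a bash function. check_for_hangs and its arguments are hypothetical (not part of clumanager), and it assumes the service script accepts "status N" as in the ps output above and that mktemp is available. It runs the check under bash -x, keeps the trace of any run still alive after a time limit, and prints the number of hung runs. It only kills the copy it started itself, not the checks clumanager spawns, and any commands a hung run had started may outlive it.

```shell
#!/bin/bash
# check_for_hangs: hypothetical debugging helper, not part of clumanager.
# Usage: check_for_hangs SCRIPT SERVICE_ID LIMIT_SECONDS RUNS
# Runs "SCRIPT status SERVICE_ID" under bash -x RUNS times, keeps the
# xtrace of any run still alive after LIMIT_SECONDS, and prints the
# number of hung runs on stdout (details go to stderr).
check_for_hangs() {
    local script=$1 id=$2 limit=$3 runs=$4
    local i pid elapsed trace hung=0
    for ((i = 1; i <= runs; i++)); do
        trace=$(mktemp /tmp/status-trace.XXXXXX)
        # Start the check in the background, capturing its xtrace output.
        /bin/bash -x "$script" status "$id" >"$trace" 2>&1 &
        pid=$!
        elapsed=0
        # Poll once a second until the check exits or the limit passes.
        while kill -0 "$pid" 2>/dev/null && ((elapsed < limit)); do
            sleep 1
            elapsed=$((elapsed + 1))
        done
        if kill -0 "$pid" 2>/dev/null; then
            # Still running: show the last traced commands, keep the file.
            echo "run $i hung after ${limit}s; trace kept in $trace" >&2
            tail -n 6 "$trace" >&2
            kill "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null
            hung=$((hung + 1))
        else
            # Finished in time: reap it and discard the trace.
            wait "$pid"
            rm -f "$trace"
        fi
    done
    echo "$hung"
}
```

For example, check_for_hangs /usr/lib/clumanager/services/service 0 30 100 would run the httpd check 100 times and report any run that takes longer than 30 seconds, with the tail of its trace showing where it stopped.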

Can anybody venture a guess as to why this might be happening? And
are my check intervals too low?

Thanks,
Tarun