
Re: [Linux-cluster] (new) problems with qdisk, running test rpms



On Wed, May 02, 2007 at 02:44:06PM +0100, Frederik Ferner wrote:
> Hi,
> 
> I finally had a chance to experiment with the test rpms for cman[1] that 
> should solve the problem I had with multiple masters.
> 
> For these tests I was using the following rpms on RHEL4U4:
> 
> kernel-smp-2.6.9-42.0.3.EL
> cman-kernel-smp-2.6.9-45.8.1TEST
> cman-1.0.11-0.4.1qdisk
> rgmanager-1.9.54-1
> 
> To test this I have two servers connected to one switch, with nothing else 
> connected except one uplink. As heuristics for qdiskd I'm pinging a few IP 
> addresses outside of this switch. When I unplug the uplink with the old 
> cman installed, qdiskd on both servers immediately notices this and lowers 
> the score accordingly.


> With the new version of qdiskd it seems the heuristics are no longer 
> tested once a sufficient score has been reached. When the outside 
> network is lost, qdiskd on both servers still claims the same score in the 
> status file and both servers report the votes for the qdisk to cman.

Hmm, could you add 'tko="1"' to your cluster.conf for the heuristics?  I
wonder if it's an initialization problem.

> If qdiskd is started while the outside network is unreachable, the score 
> starts without the contributions of the failing heuristics. Once the 
> network is restored, the score jumps to at least the minimum required for 
> operation and once again stays there.

> 
> Is this a bug that will be fixed in the upcoming RHEL4U5 release or 
> could there be something else wrong with my setup?

This seems to work for me:

[10538] debug: Heuristic: 'ping 192.168.79.254 -c1 -t3' missed (1/3)
[10538] debug: Heuristic: 'ping 192.168.79.254 -c1 -t3' missed (2/3)
[10538] info: Heuristic: 'ping 192.168.79.254 -c1 -t3' DOWN (3/3)
[10537] notice: Score insufficient for master operation (0/11; required=6); downgrading

Message from syslogd green at Mon May  7 10:36:43 2007 ...
green clurgmgrd[7305]: <emerg> #1: Quorum Dissolved 

(machine rebooted)
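
For reference, the counting in that log can be modelled roughly as below.
This is only an illustrative sketch, not qdiskd's code: the Heuristic class
and the two helper functions are made-up names, and required_score() assumes
the documented default of floor((n+1)/2) when min_score is not set. The
single heuristic with score 11 is likewise an assumption, chosen only to
reproduce the 0/11, required=6 numbers above.

```python
from dataclasses import dataclass


@dataclass
class Heuristic:
    """Hypothetical model of one <heuristic> entry (not qdiskd's real struct)."""
    program: str
    score: int = 1
    tko: int = 1            # consecutive misses before the heuristic is DOWN
    misses: int = 0
    available: bool = False

    def record(self, ok: bool) -> str:
        """Record one run of the program and return a log-like status line."""
        if ok:
            self.misses = 0
            self.available = True
            return f"Heuristic: '{self.program}' UP"
        # Count the miss, but never beyond tko.
        self.misses = min(self.misses + 1, self.tko)
        if self.misses >= self.tko:
            self.available = False
            return f"Heuristic: '{self.program}' DOWN ({self.misses}/{self.tko})"
        return f"Heuristic: '{self.program}' missed ({self.misses}/{self.tko})"


def current_score(heuristics):
    """Sum the scores of the heuristics that are currently available."""
    return sum(h.score for h in heuristics if h.available)


def required_score(heuristics):
    """Assumed default minimum: floor((n + 1) / 2), n = sum of all scores."""
    total = sum(h.score for h in heuristics)
    return (total + 1) // 2


# Three failed pings in a row with tko=3, starting from an available heuristic.
h = Heuristic("ping 192.168.79.254 -c1 -t3", score=11, tko=3, available=True)
for _ in range(3):
    print(h.record(False))
print(f"score {current_score([h])}/11; required={required_score([h])}")
```

The point of the model is that a heuristic keeps contributing its score
until tko consecutive runs have failed, so the total only drops on the last
miss.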

> Here's my quorumd section from cluster.conf
> 
> -----
> 	<quorumd interval="1" tko="5" votes="3" log_level="9" 
> log_facility="local4" status_file="/tmp/qdisk_status" 
> device="/dev/emcpowerq1">
> 		<heuristic program="ping 172.23.4.254 -c1 -t1" score="1" 
> 		interval="2"/>
> 		<heuristic program="ping 130.246.8.13 -c1 -t3" score="1" 
> 		interval="2"/>
> 		<heuristic program="ping 130.246.72.21 -c1 -t3" score="1" 
> 		interval="2"/>
> 		<heuristic program="ping 172.23.5.120 -c1 -t1" score="1" 
> 		interval="2"/>
> 		<heuristic program="ping 172.23.6.229 -c1 -t1" score="1" 
> 		interval="2"/>
> 		<heuristic program="ping 172.23.7.34 -c1 -t1" score="1" 
> 		interval="2"/>
> 		<heuristic program="ping 172.23.7.35 -c1 -t1" score="1" 
> 		interval="2"/>
> 		<heuristic program="ping 172.23.6.233 -c1 -t1" score="1" 
> 		interval="2"/>
> 	</quorumd>
> -----

> If you need any more information, I am happy to provide it.

Hmm, try adding tko="3" to each of your ping heuristics, like this:

 		<heuristic program="ping 172.23.6.233 -c1 -t1" score="1" 
 		interval="2" tko="3"/>
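
If editing all eight entries by hand is tedious, something like the
following could do it. Treat it as a rough sketch: add_heuristic_tko is a
made-up helper, it works on a string rather than on /etc/cluster/cluster.conf
directly, and ElementTree drops comments and rewrites the formatting, so
review the result before using it. The sample fragment is shortened from
your config for illustration.

```python
import xml.etree.ElementTree as ET


def add_heuristic_tko(conf_xml: str, tko: int = 3) -> str:
    """Return conf_xml with tko set on every <heuristic> element.

    An existing tko attribute on a heuristic is overwritten; attributes on
    other elements (such as tko on <quorumd>) are left alone.
    """
    root = ET.fromstring(conf_xml)
    for heuristic in root.iter("heuristic"):
        heuristic.set("tko", str(tko))
    return ET.tostring(root, encoding="unicode")


# Shortened sample; only the attributes needed for the demonstration.
sample = """<quorumd interval="1" tko="5" votes="3">
  <heuristic program="ping 172.23.4.254 -c1 -t1" score="1" interval="2"/>
  <heuristic program="ping 130.246.8.13 -c1 -t3" score="1" interval="2"/>
</quorumd>"""

print(add_heuristic_tko(sample))
```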

-- Lon

-- 
Lon Hohberger - Software Engineer - Red Hat, Inc.

