
Re: [Linux-cluster] Help with a two node cluster for a webserver needed



Hello Lon,

On Thursday, 24 January 2008 00:12:41, Lon Hohberger wrote:
> On Wed, 2008-01-23 at 17:53 -0500, Lon Hohberger wrote:
> > On Tue, 2008-01-15 at 15:45 +0100, Holger L. Ratzel wrote:
> > > Hi,
> > >
> > > On Monday, 14 January 2008 21:03:46, Lon Hohberger wrote:
> > > > So, what was happening was this:
> > >
> > > [...]
> > >
> > > > First, let's ping the router with the cable unplugged to see how long
> > > > it takes for our heuristic to complete when things are "broken".  On
> > > > my machine:
> > > >
> > > > [lhh ayanami ~]$ time ping -c1 -t1 frederick
> > > > PING frederick (12.1.2.99) 56(84) bytes of data.
> > > >
> > > > From ayanami (12.1.2.37) icmp_seq=1 Destination Host Unreachable
> >
> > Holger,
> >
> > Digging deeper -- for some reason, ping occasionally doesn't exit if
> > you make the dest IP unreachable, but only if started from the init
> > script - e.g. 'service qdiskd start'.
> >
> > I was working with someone today and we reproduced it.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=429927
>
> Very, very strange indeed.

I've tried to implement the workaround given in the bug report:

- I've created a wrapper for ping (copied your attachment)
- Changed cluster.conf to give qdiskd more time to finish its job
  (see attached cluster.conf)
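
For readers without access to the bug report, the idea behind such a wrapper can be sketched as follows. This is a hypothetical illustration in Python, not the attachment from the bug report: the function names and the 4-second deadline are assumptions, the deadline being chosen only so that it stays below the 5-second heuristic interval in the attached cluster.conf.

```python
import subprocess
import sys


def run_with_deadline(cmd, deadline_s):
    """Run cmd and return its exit status.

    Returns 1 if the command does not finish within deadline_s seconds
    or cannot be started, so a hung ping is reported as a failure
    instead of blocking the caller.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=deadline_s,  # the child is killed when this expires
        )
        return result.returncode
    except subprocess.TimeoutExpired:
        return 1
    except OSError:
        # e.g. the binary does not exist or is not executable
        return 1


def main(argv):
    # Hypothetical entry point: pass all arguments through to ping.
    # The 4.0 s deadline is an assumed value, not taken from the bug report.
    return run_with_deadline(["ping"] + list(argv), deadline_s=4.0)
```

Installed as a script, it would end with `sys.exit(main(sys.argv[1:]))`, so that the heuristic sees a non-zero exit status whenever ping fails or hangs.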

Now qdiskd occasionally reports the heuristic as down (the network isn't
touched, no cable pulled):

Jan 25 15:17:34 testcluster-2 qdiskd[2151]: <info> Heuristic: 'ping-wrap -c3 -t1 10.200.10.1' DOWN (1/1)
Jan 25 15:17:36 testcluster-2 qdiskd[2151]: <notice> Score insufficient for master operation (0/1; required=1); downgrading

The result is that this node gets fenced and reboots. After some time the
same thing happens on the other node, creating an endless loop.
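
The "(0/1; required=1)" in the log can be reproduced with a little arithmetic. The sketch below assumes, as I understand the qdisk(5) man page, that when min_score is not set the required score defaults to floor((n+1)/2), where n is the sum of all heuristic scores. The function names are made up for illustration and are not part of qdiskd.

```python
def required_score(heuristic_scores, min_score=0):
    """Score a node needs to stay eligible.

    Assumption: with min_score unset (0), the default is
    floor((n + 1) / 2), n being the sum of all heuristic scores.
    """
    total = sum(heuristic_scores)
    return min_score if min_score > 0 else (total + 1) // 2


def current_score(heuristics):
    """Sum the scores of the heuristics that are currently up.

    heuristics is a list of (score, is_up) pairs.
    """
    return sum(score for score, is_up in heuristics if is_up)


# One heuristic with score=1, currently reported DOWN, as in the log.
heuristics = [(1, False)]
total = sum(score for score, _ in heuristics)
have = current_score(heuristics)
need = required_score([score for score, _ in heuristics])
print(f"{have}/{total}; required={need}")
```

With a single heuristic and tko="1" there is no margin: one failed or late run of ping-wrap drops the score from 1 to 0, which is below the required 1.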

Do you know when your fix will make it into the regular upgrades for RHEL5?

Regards,

	Holger

-- 
----------------- SHE - IT-Sicherheit von Experten ------------------
SHE Informationstechnologie AG
Holger L. Ratzel                               Fon:+49 621 5200 - 210 
Service Delivery & Support                     Fax:+49 621 5200 - 555
Donnersbergweg 3                                holger ratzel she net
D-67059 Ludwigshafen                              http://www.she.net/
Registered office and court of registration: Ludwigshafen, HRB 4593
Chairman of the Supervisory Board: Ulrich Engelhardt
Executive Board: Klaus Schulz
-------------------- while( !asleep( ) ) ++sheep; -------------------

PGP-Fingerprint:
9A 73 40 22 72 64 BE D1  D8 1A 54 3C 5B 64 AF C3  CC E3 CA A8
Get my PGP public key at: http://pgp.she.net/
<?xml version="1.0"?>
<cluster alias="Test" config_version="30" name="Test">
	<quorumd interval="5" label="Qdisk1" tko="3" votes="1">
		<heuristic interval="5" program="ping-wrap -c3 -t1 10.200.10.1" score="1" tko="1"/>
	</quorumd>
	<fence_daemon post_fail_delay="0" post_join_delay="3"/>
	<clusternodes>
		<clusternode name="testcluster-2" nodeid="2" votes="1">
			<fence>
				<method name="1">
					<device name="RPS"/>
				</method>
			</fence>
			<multicast addr="224.0.0.10" interface="eth0"/>
		</clusternode>
		<clusternode name="testcluster-1" nodeid="1" votes="1">
			<fence>
				<method name="1">
					<device name="RPS"/>
				</method>
			</fence>
			<multicast addr="224.0.0.10" interface="eth0"/>
		</clusternode>
	</clusternodes>
	<cman expected_votes="3" two_node="0">
		<multicast addr="224.0.0.10"/>
	</cman>
	<fencedevices>
		<fencedevice agent="fence_rps10" device="/dev/ttyS0" name="RPS" option="reboot" port="0"/>
	</fencedevices>
	<rm>
		<failoverdomains>
			<failoverdomain name="Apache" ordered="1" restricted="1">
				<failoverdomainnode name="testcluster-1" priority="1"/>
				<failoverdomainnode name="testcluster-2" priority="2"/>
			</failoverdomain>
		</failoverdomains>
		<resources>
			<ip address="10.200.10.189" monitor_link="1"/>
			<script file="/etc/init.d/httpd" name="Apache"/>
			<fs device="/dev/sdb1" force_fsck="0" force_unmount="1" fsid="26076" fstype="ext3" mountpoint="/data/httpd" name="DISK_Apache" options="" self_fence="0"/>
		</resources>
		<service autostart="1" domain="Apache" name="HTTPD">
			<ip ref="10.200.10.189"/>
			<script ref="Apache"/>
			<fs ref="DISK_Apache"/>
		</service>
	</rm>
	<totem token="40000"/>
</cluster>
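
The timing attributes in the cluster.conf above can be checked against each other with a short script. This is only a sketch: it embeds a trimmed copy of the relevant elements, and the rule that the totem token timeout should be at least twice the qdiskd timeout (interval * tko) is the commonly cited recommendation as I remember it, so treat it as an assumption and verify it against qdisk(5) for your release.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the timing-related elements from the cluster.conf above.
CONF = """<cluster alias="Test" config_version="30" name="Test">
  <quorumd interval="5" label="Qdisk1" tko="3" votes="1">
    <heuristic interval="5" program="ping-wrap -c3 -t1 10.200.10.1" score="1" tko="1"/>
  </quorumd>
  <totem token="40000"/>
</cluster>"""


def timings(conf_xml):
    """Extract the qdiskd, heuristic and totem timeouts in seconds."""
    root = ET.fromstring(conf_xml)
    quorumd = root.find("quorumd")
    heuristic = quorumd.find("heuristic")
    token_ms = int(root.find("totem").get("token"))
    return {
        # time until qdiskd declares a silent node dead
        "qdisk_timeout_s": int(quorumd.get("interval")) * int(quorumd.get("tko")),
        # time until a failing heuristic is marked down
        "heuristic_timeout_s": int(heuristic.get("interval")) * int(heuristic.get("tko")),
        # totem token is given in milliseconds
        "token_s": token_ms / 1000,
    }


t = timings(CONF)
# Assumed rule of thumb: token timeout >= 2 * qdiskd timeout.
token_ok = t["token_s"] >= 2 * t["qdisk_timeout_s"]
print(t, "token >= 2 * qdisk timeout:", token_ok)
```

For this configuration the qdiskd timeout is 15 seconds and the token timeout is 40 seconds, so that rule is satisfied; the heuristic, however, is marked down after a single missed 5-second cycle.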
