
[Linux-cluster] cluster service not running any more



Hi folks,

I have set up a cluster on 5.2 with system-config-cluster. It is quite simple: the only service is an ip resource that is switched between the nodes.

The cluster started up fine the first time, and the virtual ip was where it belonged. Since then I have not changed anything; I simply had to restart the machines for other reasons.

Now nothing works as it should:
- Shutting down clurgmgrd normally (service rgmanager stop) is impossible; even kill -9 does not work. I have to call "reboot" twice to force a reboot and stop clurgmgrd.
- After the reboot I can start the cluster again manually (I did not dare to do it at system startup). The daemons start and nothing unusual is logged, but
  a) the service containing the ip resource is not started
  b) clustat on the primary node complains "timed out trying to connect to Resource Group Manager"
  c) clustat on both nodes shows the node state, but does not list the service

I have tried everything to get the environment clean (shut down the firewall, set selinux to permissive, etc.), but the result is always the same. Since I did not change anything after the first successful start of the cluster, I wonder
- if there is some runtime data or temporary files that the resource group manager writes to disk and tries to reread after a reboot (remember, I had to kill it by force to be able to reboot my machines)
- if it is possible at all to successfully run a cluster with cman and clurgmgrd.

In case it helps here is my cluster.conf:

<?xml version="1.0" ?>
<cluster config_version="5" name="GatewayCluster">
	<fence_daemon post_fail_delay="0" post_join_delay="3"/>
	<clusternodes>
		<clusternode name="rtr1hb" nodeid="1" votes="1">
			<fence>
				<method name="1">
					<device name="fence1" nodename="rtr1hb"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="rtr2hb" nodeid="2" votes="1">
			<fence>
				<method name="1">
					<device name="fence2" nodename="rtr2hb"/>
				</method>
			</fence>
		</clusternode>
	</clusternodes>
	<cman expected_votes="1" two_node="1"/>
	<fencedevices>
		<fencedevice agent="fence_manual" name="fence1"/>
		<fencedevice agent="fence_manual" name="fence2"/>
	</fencedevices>
	<rm>
		<failoverdomains>
			<failoverdomain name="Gateway1" ordered="1" restricted="1">
				<failoverdomainnode name="rtr1hb" priority="1"/>
				<failoverdomainnode name="rtr2hb" priority="2"/>
			</failoverdomain>
		</failoverdomains>
		<resources>
			<ip address="IP Address" monitor_link="1"/>
		</resources>
		<service autostart="1" domain="Gateway1" name="Gateway1-IP">
			<ip ref="IP Address"/>
		</service>
	</rm>
</cluster>
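
To rule out a simple naming mismatch in the file above, the cross-references can be checked with a few lines of Python. This is only a minimal sketch, not a tool from the cluster suite: check_cluster_conf is a hypothetical helper name, the XML is a trimmed copy of the config above, and it assumes that an <ip ref="..."/> inside a service points at the address attribute of an <ip> under <resources>. It says nothing about why clurgmgrd hangs; it only looks for dangling names.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf above. "IP Address" is the placeholder
# used in the post, not a real address.
CONF = """<?xml version="1.0" ?>
<cluster config_version="5" name="GatewayCluster">
  <clusternodes>
    <clusternode name="rtr1hb" nodeid="1" votes="1">
      <fence><method name="1"><device name="fence1" nodename="rtr1hb"/></method></fence>
    </clusternode>
    <clusternode name="rtr2hb" nodeid="2" votes="1">
      <fence><method name="1"><device name="fence2" nodename="rtr2hb"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_manual" name="fence1"/>
    <fencedevice agent="fence_manual" name="fence2"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="Gateway1" ordered="1" restricted="1">
        <failoverdomainnode name="rtr1hb" priority="1"/>
        <failoverdomainnode name="rtr2hb" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="IP Address" monitor_link="1"/>
    </resources>
    <service autostart="1" domain="Gateway1" name="Gateway1-IP">
      <ip ref="IP Address"/>
    </service>
  </rm>
</cluster>
"""


def check_cluster_conf(xml_text):
    """Return a list of dangling name references found in a cluster.conf."""
    root = ET.fromstring(xml_text)
    problems = []

    # Collect every name that other elements are allowed to refer to.
    nodes = {n.get("name") for n in root.findall("clusternodes/clusternode")}
    fence_devices = {d.get("name") for d in root.findall("fencedevices/fencedevice")}
    domains = {d.get("name") for d in root.findall("rm/failoverdomains/failoverdomain")}
    # Assumption: ip resources are referenced by their "address" attribute.
    ip_resources = {r.get("address") for r in root.findall("rm/resources/ip")}

    # Fence methods must name a defined fence device.
    for dev in root.findall("clusternodes/clusternode/fence/method/device"):
        if dev.get("name") not in fence_devices:
            problems.append("unknown fence device: %s" % dev.get("name"))

    # Failover domain members must be cluster nodes.
    for member in root.findall("rm/failoverdomains/failoverdomain/failoverdomainnode"):
        if member.get("name") not in nodes:
            problems.append("unknown node in failover domain: %s" % member.get("name"))

    # Services must name a defined domain and defined ip resources.
    for svc in root.findall("rm/service"):
        domain = svc.get("domain")
        if domain is not None and domain not in domains:
            problems.append("unknown failover domain: %s" % domain)
        for ip in svc.findall("ip"):
            ref = ip.get("ref")
            if ref is not None and ref not in ip_resources:
                problems.append("unknown ip resource: %s" % ref)

    return problems


problems = check_cluster_conf(CONF)
print(problems if problems else "no dangling references found")
```

If the list comes back empty, as it should for the file as posted, the names in the config are at least consistent with each other and the cause has to be looked for elsewhere.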

The logs show the nodes successfully joining the cluster and, as the last entry, clurgmgrd starting; after that there is nothing more from the cluster daemons.

Any hint or help is appreciated. I am stuck and do not know where to look.

Dirk

