Re: [Linux-cluster] Problems with failover and services

Hi Robert,

Thanks for this. The init scripts run fine from the command line, but rgmanager never starts them: it never executes /etc/init.d/httpd start, for example, and the services were never visible in the "Cluster Management" tab.

I think the root of the problem was mismatched versions. I had kernel 2.6.9-34.0.1, cman-kernel-smp-2.6.9-43.8.3.i686, cman-kernheaders-2.6.9-43.8.3.i686 and rgmanager-1.9.54-1.i386. Rolling back to kernel version 2.6.9-43 and cman version 2.6.9-43.8, while keeping rgmanager-1.9.54-1.i386, seems to give better results: I can now view and manage services. I'm still testing failover, so I will post updates as I get them :)

The rgmanager lockup problem seems to resolve itself when I send about 10 to 15 "kill -9"s to the rgmanager pid.


Robert Peterson wrote:

Jonathan Daniels wrote:

Hi Linux Clusterers,

I have set up the following cluster environment:

2 x HP DL385, with RedHat EL4 Update 3. These are the clustered nodes
RedHat Cluster Suite 4 on each node
Apache 2.2.2 on each node
A dummy daemon on each node

Initial problem:

RHEL4 U3 kernel version 2.6.9-34
CMan kernel/headers 2.6.9-43.8

I created a simple 2-node cluster running the Apache httpd server. On a normal startup the virtual IP was in place and the apache daemon was running on the 'owning' server. However, whenever I forced a failover (by shutting down network services), the floating IP was not assigned to the standby server, and the apache daemon never started on that standby server.
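
For anyone repeating the failover test described above, one way to confirm whether the floating IP actually moved is to look for it in the address list on the standby node. This is only a sketch: has_ip is a hypothetical helper name, and 192.0.2.10 is a placeholder address, not the one used in this cluster.

```shell
#!/bin/sh
# has_ip is a hypothetical helper, not part of RHCS.
# It reads `ip -o addr show` style output on stdin and succeeds
# if the given IPv4 address ($1) is configured on some interface.
has_ip() {
    # -F: match the address literally (dots are not wildcards)
    # The trailing "/" stops 192.0.2.1 from matching 192.0.2.10
    grep -qF "inet $1/"
}

# Example, with 192.0.2.10 standing in for the cluster's floating IP:
#   ip -o addr show | has_ip 192.0.2.10 && echo "floating IP is on this node"
```

Running this on both nodes before and after the failover shows which node, if any, holds the address.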

I was also having deadlocks between CMan and RGManager, and found that this was due to a known and fixed bug in RHEL4U3 and CMan, so I upgraded them to the following:

RHEL4 U3 kernel version 2.6.9-34.0.1
CMan kernel/headers 2.6.9-43.8.3

Now I start up "system-config-cluster" and see no services at all. I also removed GFS, but I have known RHCS to run without GFS, and in any case the two apache servers and dummy daemons do not share storage. For now I simply want to get the failover working.

Anyone have any workarounds?

Many thanks,

Hi Jon,

You may also be another victim of the init-scripts-not-returning-zero thing.
See: http://sources.redhat.com/cluster/faq.html#rgm_wontrestart
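
A quick way to check for this is to run each action by hand and look at the exit status, not just at whether the daemon came up. The sketch below is only an illustration: check_action is a hypothetical helper name, and it assumes (as the FAQ entry is commonly read) that rgmanager wants a zero exit status from the script, including from "stop" on a service that is already stopped.

```shell
#!/bin/sh
# check_action is a hypothetical helper, not part of rgmanager.
# It runs one init script action, prints the exit status the script
# returned, and passes that status back to the caller.
check_action() {
    script=$1
    action=$2
    "$script" "$action" >/dev/null 2>&1
    rc=$?
    echo "$action: exit status $rc"
    return "$rc"
}

# Example (run as root on a test node):
#   check_action /etc/init.d/httpd stop
#   check_action /etc/init.d/httpd stop     # second stop, service already down
#   check_action /etc/init.d/httpd start
```

A script that starts the daemon correctly but reports a non-zero status on one of these actions would look fine from the command line and could still be rejected by the resource manager.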


Bob Peterson
Red Hat Cluster Suite

