[Linux-cluster] Problems with failover and services

Jonathan Daniels jon.daniels at voxsurf.com
Tue Oct 17 16:54:45 UTC 2006


Hi Robert,

Thanks for this. However, the init scripts run OK from the command line; 
rgmanager simply never starts them, i.e. it never executes 
/etc/init.d/httpd start (for example), and the services were never 
visible in the "Cluster Management" tab.
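
For anyone who wants to check the exit codes as well, and not just whether 
the scripts run, here is a minimal sketch. It assumes the LSB-style 
convention that, as far as I understand it, rgmanager relies on: "start", 
"status" on a running service, "stop", and a repeated "stop" should all 
return 0, while "status" on a stopped service should return non-zero. The 
function names (run_action, check_sequence) are made up for illustration, 
and the script really does start and stop the service, so use it on a test 
node only.

```python
import subprocess
import sys


def run_action(script, action):
    """Run '<script> <action>' and return its exit status."""
    return subprocess.call([script, action])


def check_sequence(run, script):
    """Drive start/status/stop/stop/status and report each exit status.

    Returns a list of (action, exit_status, ok) tuples. 'ok' means the
    status matched the ASSUMED convention described above; it is not an
    authoritative statement of what rgmanager checks.
    """
    # (action, True if exit status 0 is expected at this step)
    steps = [
        ("start", True),
        ("status", True),    # service should now be running
        ("stop", True),
        ("stop", True),      # stopping an already stopped service
        ("status", False),   # service should now be stopped
    ]
    report = []
    for action, want_zero in steps:
        code = run(script, action)
        report.append((action, code, (code == 0) == want_zero))
    return report


if __name__ == "__main__":
    if len(sys.argv) == 2:
        for action, code, ok in check_sequence(run_action, sys.argv[1]):
            print("%-6s exit=%d %s" % (action, code, "ok" if ok else "UNEXPECTED"))
    else:
        print("usage: check_init.py /etc/init.d/<service>")
```

Run it as root with the init script path as the only argument, e.g. 
/etc/init.d/httpd. Any line marked UNEXPECTED is worth comparing against 
the FAQ entry Bob links to below.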

I think the root of the problem here was some mismatched versioning. I 
had kernel 2.6.9-34.0.1, cman-kernel-smp-2.6.9-43.8.3.i686, 
cman-kernheaders-2.6.9-43.8.3.i686 and rgmanager-1.9.54-1.i386. Rolling 
back to kernel version 2.6.9-43, cman version 2.6.9-43.8, while keeping 
rgmanager-1.9.54-1.i386 seems to yield some better results in terms of 
the ability to view and manage services. I'm still running tests on the 
failover so I will post updates as I get them :)

The rgmanager lockup problem seems to clear once I send about 
10 - 15 "kill -9"s to the rgmanager pid.

Thanks,
Jon




Robert Peterson wrote:

> Jonathan Daniels wrote:
>
>> Hi Linux Clusterers,
>>
>> I have set up the following cluster environment:
>>
>> 2 x HP DL385, with RedHat EL4 Update 3. These are the clustered nodes
>> RedHat Cluster Suite 4 on each node
>> Apache 2.2.2 on each node
>> A dummy daemon on each node
>>
>> Initial problem:
>>
>> RHEL4 U3 kernel version 2.6.9-34
>> CMan kernel/headers 2.6.9-43.8
>>
>> I created a simple 2 node cluster running Apache httpd server. When 
>> it started up as normal the virtual IP was in place and the apache 
>> daemon was running on the 'owning' server. However whenever I failed 
>> over (by shutting down network services), the floating IP doesn't get 
>> assigned to the standby server, and the apache daemon never starts on 
>> that standby server.
>>
>> I was also having deadlocks between CMan and RGManager and found that 
>> this was due to a known and fixed bug in RHEL4U3 and Cman so I 
>> upgraded them to the following:
>>
>> RHEL4 U3 kernel version 2.6.9-34.0.1
>> CMan kernel/headers 2.6.9-43.8.3
>>
>> Now I start up the "system-cluster-config" and see no services at 
>> all. I also removed GFS but I have known RHCS to run without GFS, and 
>> in any case the two apache servers and dummy daemons do not share 
>> storage - I simply want to perform the failover initially.
>>
>> Anyone have any workarounds?
>>
>> Many thanks,
>> Jon
>>
>> -- 
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
> Hi Jon,
>
> You may also be another victim of the init-scripts-not-returning-zero 
> thing.
> See: http://sources.redhat.com/cluster/faq.html#rgm_wontrestart
>
> Regards,
>
> Bob Peterson
> Red Hat Cluster Suite
>



