[Linux-cluster] Testing Failover - Failing in few cases

Wed May 30 15:15:13 UTC 2007

Has the cluster suite finally been released for RHEL4u5 at this point? 
I'm still not seeing it as an ISO download on RHN.

rhurst at bidmc.harvard.edu wrote:
> We second that motion -- skip U4 altogether, go directly to U5.
> 
> 
> On Wed, 2007-05-30 at 16:55 +0200, Hagmann, Michael wrote:
>
>> Hi
>>
>> First of all, if you really are running RHEL4 Update 4, you should
>> update to RHEL4 Update 5 before you do any more testing.
>>
>> There are a lot of bugs in RHEL4 CS Update 4!
>>
>> Mike
>>
>> ------------------------------------------------------------------------
>>
>> *From:* linux-cluster-bounces at redhat.com
>> [mailto:linux-cluster-bounces at redhat.com] *On Behalf Of* Satya Daragani
>> *Sent:* Monday, 28 May 2007 15:12
>> *To:* Linux-cluster at redhat.com
>> *Subject:* [Linux-cluster] Testing Failover - Failing in few cases
>>
>>
>>
>> Hi Linux-Cluster Team,
>>
>> Please help me test failover with RHEL Cluster Suite 4 Update 4.
>> The details of the cluster nodes and configuration are below.
>> Kindly suggest how to proceed.
>>
>> Two nodes, each an IBM Lenovo ThinkCentre with an AMD Opteron 64-bit processor
>>
>> 256 MB RAM
>>
>> One NIC
>>
>>    1. Installed RHEL AS 4 Update 4 on both nodes.
>>    2. Configured the NIC with an IP in the 192.168.1.x range
>>       (node1: 192.168.1.1, node2: 192.168.1.2).
>>    3. Configured /etc/hosts.
>>    4. Installed RHEL Cluster Suite 4 Update 4 on both nodes.
>>    5. Added both nodes to the cluster manager with one quorum vote.
>>    6. No fence devices configured (chkconfig --del fenced).
>>    7. Configured a restricted, ordered failover domain with node
>>       priorities node1 = 1, node2 = 2.
>>    8. Configured a shared IP address resource (192.168.1.5) with the
>>       Monitor Link option enabled.
>>    9. Created a service named httpd and configured it as follows:
>>          1. Checked "Autostart this service".
>>          2. Selected the failover domain configured in step 7.
>>          3. Selected Relocate as the recovery policy.
>>          4. Added the shared IP resource from step 8 and, under it,
>>             a private script resource (/etc/rc.d/init.d/httpd).
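For reference, steps 5 to 9 would translate roughly into an /etc/cluster/cluster.conf like the one below. This is a sketch assuming the RHEL4 rgmanager schema as written by system-config-cluster; the cluster name "testcluster" and the domain name "httpd_domain" are hypothetical, and the two_node/expected_votes settings are an assumption for a two-node cluster in which each node has one vote.

```xml
<?xml version="1.0"?>
<!-- Sketch only: cluster and domain names are hypothetical. -->
<cluster name="testcluster" config_version="1">
  <!-- Assumed two-node mode: one vote per node, quorum with one node -->
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <!-- Step 6: no fence devices, so each node has an empty fence block -->
    <clusternode name="node1" votes="1">
      <fence/>
    </clusternode>
    <clusternode name="node2" votes="1">
      <fence/>
    </clusternode>
  </clusternodes>
  <fencedevices/>
  <rm>
    <failoverdomains>
      <!-- Step 7: restricted, ordered domain; priority 1 is the most preferred node -->
      <failoverdomain name="httpd_domain" ordered="1" restricted="1">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node2" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <!-- Step 8: shared IP resource with link monitoring enabled -->
      <ip address="192.168.1.5" monitor_link="1"/>
    </resources>
    <!-- Step 9: autostart, bound to the domain, relocate on failure -->
    <service name="httpd" autostart="1" domain="httpd_domain" recovery="relocate">
      <!-- Shared IP referenced by address; the nested script is private to this service -->
      <ip ref="192.168.1.5">
        <script name="httpd" file="/etc/rc.d/init.d/httpd"/>
      </ip>
    </service>
  </rm>
</cluster>
```

The shared IP lives under <resources> and is referenced by its address, while the script defined inline under it belongs only to the httpd service, mirroring the shared/private split in step 9.4.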
>>
>>
>>  
>>
>> Checking the failover:
>>
>> 1st case
>>
>> With the above configuration, node1 is the primary node for the
>> httpd service.
>>
>> If I restart node1, the service fails over to node2, and once node1
>> comes up again, the service fails back to node1 (as the priorities
>> are configured).
>>
>>
>> 2nd case
>>
>> While node1 is running the httpd service, if I bring down its network
>> interface (ifconfig eth0 down), the httpd service fails over to node2.
>>
>> If I then bring the interface back up on node1 (ifconfig eth0 up), the
>> service does not fail back to node1, and /var/log/messages says
>> "unable to contact the cluster infrastructure". *Need your help here.*
>>
>> If I restart the cluster services on node1, the service starts on
>> node1 again.
>>
>>
>> 3rd case
>>
>> While node1 is running the httpd service, if I pull the power cord
>> (i.e., an unclean shutdown), the service goes into recovery mode and
>> does not start on node2. *Need your help here.*
>>
>>
>> 4th case
>>
>> While node1 is running the httpd service, if I stop httpd (service
>> httpd stop) or kill it with killall, no failover happens. *Need your
>> help here.*
>>
>>
>> -- 
>> Thanx
>> Satya Daragani
>> satya.daragani at gmail.com
>> +91 98850 58366
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
> *Robert Hurst, Sr. Caché Administrator*
> *Beth Israel Deaconess Medical Center*
> *1135 Tremont Street, REN-7*
> *Boston, Massachusetts   02120-2140*
> *617-754-8754 ∙ Fax: 617-754-8730 ∙ Cell: 401-787-3154*
> Any technology distinguishable from magic is insufficiently advanced.
> 
> 

-- 
James A. McOrmond (jamesm at xandros.com)
Hardware QA Lead & Network Administrator
Xandros Corporation, Ottawa, Canada.
Morpheus: ...after a century of war I remember that which matters most:
  *We are still HERE!*