I am using two VMs hosted in my internal lab. Each has two interfaces, one configured with a valid IP and the other down. I have kept the VIP in the same network as well. My intention is to have Apache configured as a cluster service on these two nodes and to fail over when the node or the interface goes down. I am trying to use fence_vmware as the fencing device. Both VMs run on an ESX 4.1 host, and the guest OS in the VMs is RHEL 6.0 32-bit.
I am seeing the following problems in my setup now ...
1. When I start the Apache service from LUCI, it starts fine on a node. But if I manually kill the httpd process on that node, the cluster does not detect that the service is down, so it neither restarts nor relocates it.
2. The same happens if I do "ip addr del <VIP>"; it just detects the node is down but does not restart or relocate the service.
3. Whenever I reboot the nodes, they come back online, the service starts fine on either node, and both nodes are in quorum, but the fail-over never happens if I stop the active node.
4. I am not sure what fence format I must put in cluster.conf, since I have no way to test whether it works at all.
Manual tests :
1. I manually run something like this
"fence_vmware --action="" --ip=10.72.145.145 --username=<login> --password=<password> --plug=<vm-name>" which works fine on both the nodes.
2. Apache starts and stops fine on both nodes when I do
"rg_test test /etc/cluster/cluster.conf start service WEB"
Cluster.conf is attached herewith.
rgmanager.log is attached herewith.
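For anyone who wants to repeat the two manual tests above, here is a rough sketch of how they could be wrapped in a small shell helper. It is only an illustration under some assumptions: the "status" action is an assumption (the action I actually used is not shown above), the VM name passed in is a placeholder, and the function names are made up for this example. By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: wraps the two manual checks described in this post.
# Assumptions: FENCE_ACTION=status is a guess at a harmless action;
# FENCE_USER, FENCE_PASS and the VM name are placeholders to replace.
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print commands, 0 = execute them
FENCE_ACTION="${FENCE_ACTION:-status}"
FENCE_IP="${FENCE_IP:-10.72.145.145}"   # address used in the manual test above
FENCE_USER="${FENCE_USER:-<login>}"
FENCE_PASS="${FENCE_PASS:-<password>}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Manual test 1: call the fence agent for one VM (argument: VM name).
check_fence() {
    run fence_vmware --action="$FENCE_ACTION" --ip="$FENCE_IP" \
        --username="$FENCE_USER" --password="$FENCE_PASS" --plug="$1"
}

# Manual test 2: start a service through rg_test (argument: service name).
check_service() {
    run rg_test test /etc/cluster/cluster.conf start service "$1"
}
```

With the defaults, calling "check_service WEB" only echoes the rg_test command line; setting DRY_RUN=0 on a cluster node would actually execute it.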
Please let me know of any specific debug commands that I can run manually to find out what is going wrong here, particularly with the "relocation" of the service and the "fencing"; both consistently fail.
Please help. I have spent more than 10 days now trying to set this up in my internal lab as a proof of concept, to show my business heads that RHEL cluster is worth buying and indeed works for our production requirement.