[Linux-cluster] Event in one failover domain affecting another separate failover domain

Mon Apr 21 17:22:21 UTC 2008

I have a three-node RHEL 4.6 cluster with two ordered failover domains.
The idea is that two of the nodes are primary for their respective
services and the remaining node is a shared failover node for both
services. Here is an example of how the two domains are configured:

DOMAIN_ONE (service_one)
  server_a (priority=1)
  server_b (priority=2)

DOMAIN_TWO (service_two)
  server_c (priority=1)
  server_b (priority=2)
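
To make the intended behavior concrete, here is a minimal Python sketch
of how I understand ordered domains should place services: each service
runs on the online member of its own domain with the lowest priority
number. This is only a model of my expectation, not rgmanager's actual
logic, and the names DOMAINS, preferred_node and placement are made up
for illustration.

```python
# Hypothetical model of the two ordered failover domains described above.
# Lower priority number = more preferred node. This is NOT rgmanager code.
DOMAINS = {
    "service_one": {"server_a": 1, "server_b": 2},  # DOMAIN_ONE
    "service_two": {"server_c": 1, "server_b": 2},  # DOMAIN_TWO
}


def preferred_node(service, online):
    """Return the most preferred online node for a service, or None."""
    candidates = [n for n in DOMAINS[service] if n in online]
    if not candidates:
        return None
    return min(candidates, key=lambda n: DOMAINS[service][n])


def placement(online):
    """Map every service to its expected owner given the online nodes."""
    return {svc: preferred_node(svc, online) for svc in DOMAINS}


# Expected owners with all nodes up, and again with server_c down.
all_up = placement({"server_a", "server_b", "server_c"})
c_down = placement({"server_a", "server_b"})
print(all_up)
print(c_down)
```

Under this model, losing server_c should only move service_two to
server_b; service_one should stay on server_a in both cases.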

The issue I have observed is that when server_c (DOMAIN_TWO) had a
problem that led to it being fenced, service_one, which was running on
server_a, immediately stopped and relocated to server_b (the recovery
action is set to "relocate" for both services). What I don't understand
is how a failure in DOMAIN_TWO, with service_two on server_c, could
affect service_one running on server_a in DOMAIN_ONE. The logs do not
provide any obvious hints. Here is a snippet from the messages log on
server_a for this time period:

11:10:56 server_a fenced[11638]: fencing node "server_c"
11:12:03 server_a fenced[11638]: fence "server_c" success
11:12:04 server_a clurgmgrd[11776]: <info> Magma Event: Membership Change
11:12:04 server_a clurgmgrd[11776]: <info> State change: server_c DOWN
11:12:04 server_a clurgmgrd[11776]: <notice> Stopping service service_one
11:12:04 server_a clurgmgrd: [11776]: <info> Executing /etc/init.d/service_one stop

As you can see, there is no indication as to why service_one is being
stopped. The last two events in the above log should not have occurred.
Has anyone else ever had this sort of issue? I'm not sure if this is a
bug or a config problem.

Thanks,
Sam