[Linux-cluster] Event in one failover domain affecting another separate failover domain

Mon Apr 21 18:45:21 UTC 2008

OK, I've set the log level to debug, so hopefully I can get more info
the next time this happens. Of course, this is a production cluster, so
there is only so much I can do in terms of testing. Here is the
cluster.conf (sanitized but otherwise accurate):

<?xml version="1.0"?>
<cluster alias="cluster_a" config_version="2" name="cluster_a">
        <quorumd device="/dev/mapper/mpath5p1" interval="3" tko="23" votes="3"/>
        <cman deadnode_timeout="135" expected_votes="6">
                <multicast addr="239.0.0.10"/>
        </cman>
        <fence_daemon post_fail_delay="0" post_join_delay="30"/>
        <clusternodes>
                <clusternode name="server_a" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="server_a-ilo"/>
                                </method>
                        </fence>
                        <multicast addr="239.0.0.10" interface="bond0"/>
                </clusternode>
                <clusternode name="server_b" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="server_b-ilo"/>
                                </method>
                        </fence>
                        <multicast addr="239.0.0.10" interface="bond0"/>
                </clusternode>
                <clusternode name="server_c" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="server_c-ilo"/>
                                </method>
                        </fence>
                        <multicast addr="239.0.0.10" interface="bond0"/>
                </clusternode>
        </clusternodes>
        <fencedevices>
                <fencedevice agent="fence_ilo" hostname="server_a-ilo"
                             login="clu_user" name="server_a-ilo" passwd="..removed.."/>
                <fencedevice agent="fence_ilo" hostname="server_b-ilo"
                             login="clu_user" name="server_b-ilo" passwd="..removed.."/>
                <fencedevice agent="fence_ilo" hostname="server_c-ilo"
                             login="clu_user" name="server_c-ilo" passwd="..removed.."/>
        </fencedevices>
        <rm log_level="7">
                <failoverdomains>
                        <failoverdomain name="DOMAIN_ONE" ordered="1" restricted="1">
                                <failoverdomainnode name="server_a" priority="1"/>
                                <failoverdomainnode name="server_b" priority="2"/>
                        </failoverdomain>
                        <failoverdomain name="DOMAIN_TWO" ordered="1" restricted="1">
                                <failoverdomainnode name="server_c" priority="1"/>
                                <failoverdomainnode name="server_b" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                </resources>
                <service autostart="1" domain="DOMAIN_ONE"
                         name="service_one" recovery="relocate">
                        <script file="/etc/init.d/service_one"
                                name="script-service_one"/>
                        <lvm lv_name="lvapp1" name="app1-lvm"
                             vg_name="cluvg-app1"/>
                        <ip address="xxx.xxx.xxx.100" monitor_link="1"/>
                        <fs device="/dev/cluvg-app1/lvapp1"
                            force_fsck="1" force_unmount="1" fsid="64050" fstype="ext3"
                            mountpoint="/app1" name="app1-fs" options="" self_fence="0"/>
                </service>
                <service autostart="1" domain="DOMAIN_TWO"
                         name="service_two" recovery="relocate">
                        <script file="/etc/init.d/service_two"
                                name="script-service_two"/>
                        <lvm lv_name="lvapp2" name="app2-lvm"
                             vg_name="cluvg-app2"/>
                        <lvm lv_name="lvapp2_data" name="app2-data-lvm"
                             vg_name="cluvg-app2-data"/>
                        <ip address="xxx.xxx.xxx.200" monitor_link="1"/>
                        <fs device="/dev/cluvg-app2/lvapp2"
                            force_fsck="1" force_unmount="1" fsid="45751" fstype="ext3"
                            mountpoint="/app2" name="app2-fs" options="" self_fence="0"/>
                        <fs device="/dev/cluvg-app2-data/lvapp2_data"
                            force_fsck="1" force_unmount="1" fsid="985" fstype="ext3"
                            mountpoint="/app2/data" name="app2-data-fs" options="" self_fence="0"/>
                </service>
        </rm>
</cluster>
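
For reference, the debug logging mentioned above is just the log_level
attribute on the rm element. A minimal fragment, taken from the config
above (7 is the syslog debug level; the comments are added here for
illustration and are not in the production file):

```xml
<!-- rgmanager section only; everything else is unchanged -->
<rm log_level="7">
        <!-- failoverdomains, resources and service definitions as in the full file -->
</rm>
```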

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger
Sent: Monday, April 21, 2008 2:02 PM
To: linux clustering
Subject: Re: [Linux-cluster] Event in one failover domain affecting
another separate failover domain

On Mon, 2008-04-21 at 13:22 -0400, Kielek, Samuel wrote:

> The issue I have observed is that when server_c (DOMAIN_TWO) had an
> issue that led to it being fenced, the service running on server_a
> (service_one) immediately stopped and relocated to server_b (the
> recovery action is set to "relocate" for both services).

Your cluster.conf would be helpful.

Also, you can increase the log level to 'debug' which would tell you
more; see "Logging Configuration":

  http://sources.redhat.com/cluster/wiki/RGManager

...for more information.

-- Lon
