
Re: [Linux-cluster] Question regarding typed resources' parent child vs. sibling ordering


On 02/11/2014 09:17 PM, Ralph Grothe itdz-berlin de wrote:

My actions and questions relate to a RHEL 5 RHCS cluster.

Though I studied carefully the official RHEL 5 Cluster Admin
Guide with special emphasis on the chapter "HA Resource Behavior"
there remain certain things unclear to me.

First of all, I have to mention that my cluster.conf's
parent-child-sibling hierarchies within the service scopes could
successfully be checked in as valid cluster configuration (i.e.
"ccs_tool update /etc/cluster/cluster.conf" succeeded).

My first question is whether it is feasible to use the
<resources> tag, which originally is meant to map inheritance,
and populate such a block although I don't make any use of
inheritance in my configuration?
I simply find that its use makes the appearance of the <rm> block
much more readable and tidier.

I don't entirely follow, but I'll take a guess that you are asking whether it is compulsory to define resources in the <resources></resources> section and then reference them in the <service></service> section.

If that is what you mean, then I can confirm that the recommended method is to define resources in the <resources></resources> tags and to reference those definitions in the <service></service> tags. But it is also possible to leave the <resources></resources> section blank, and declare the resources when they are specified in the <service></service> section. Both are possible.
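
As a rough sketch of the two styles (every resource name, device path and mount point below is made up for illustration and is not taken from your configuration):

```xml
<rm>
  <!-- Style 1: define each resource once in <resources> ... -->
  <resources>
    <!-- hypothetical volume group and filesystem -->
    <lvm name="vg-example" vg_name="vg-example"/>
    <fs name="fs-example" device="/dev/vg-example/lv-example"
        mountpoint="/mnt/example" fstype="ext3"/>
  </resources>
  <!-- ... and reference the definitions by name inside the service -->
  <service name="svc-by-reference">
    <lvm ref="vg-example">
      <fs ref="fs-example"/>
    </lvm>
  </service>

  <!-- Style 2: declare the resources inline, with nothing in <resources> -->
  <service name="svc-inline">
    <lvm name="vg-inline" vg_name="vg-inline">
      <fs name="fs-inline" device="/dev/vg-inline/lv-inline"
          mountpoint="/mnt/inline" fstype="ext3"/>
    </lvm>
  </service>
</rm>
```

The first style keeps the service blocks short and lets several services share one definition; the second keeps everything for a service in one place.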

Maybe I misunderstood you, or I misunderstood what you were referring to when you mentioned inheritance.  Please clarify if I did not answer your question.

Now to my main concern.

Would such a <service> block be valid and start and stop
resources in the proper order (i.e. according to my intention)?


   <service name="srv-a" ...>
      <ip ref="">
        <lvm ref="vg-a" ...>
           <fs ref="fs_srv-a_vg-a_lv-a"/>
           <fs ref="fs_srv-a_vg-a_lv-b"/>
           <fs ref="fs_srv-a_vg-a_lv-c"/>
        </lvm>
        <lvm ref="vg_b" ...>
           <fs ref="fs_srv-a_vg-b_lv-a"/>
           <fs ref="fs_srv-a_vg-b_lv-b"/>
           <fs ref="fs_srv-a_vg-b_lv-c"/>
        </lvm>
      </ip>
      <oracledb ref="SID-A">
        <script ref="oracle_em" __independent_subtree="1"
                __max_restarts="2" __restart_expire_time="0"/>
      </oracledb>
      <script ref="sid-a_statechg_notify"
              __independent_subtree="1" __max_restarts="2" .../>
   </service>
   <service name="srv-b" ...>

You have not stated your intended start order (to this point in the email), but my understanding is that this service will start in the following order:
1. <ip ref="">
2. <lvm ref="vg-a" ...>
3. <fs ref="fs_srv-a_vg-a_lv-a"/>
4. <fs ref="fs_srv-a_vg-a_lv-b"/>
5. <fs ref="fs_srv-a_vg-a_lv-c"/>
6. <lvm ref="vg_b" ...>
7. <fs ref="fs_srv-a_vg-b_lv-a"/>
8. <fs ref="fs_srv-a_vg-b_lv-b"/>
9. <fs ref="fs_srv-a_vg-b_lv-c"/>
10. <script ref="sid-a_statechg_notify".../>
11. <oracledb ref="SID-A">
12. <script ref="oracle_em"...>

Later you mention that you expected #11 and #12 to start before #10.  I explain below why they start in this order.

I am asking because I read in the mentioned doc above that for a
typed resource (such as ip, lvm, fs,...) there exists a strict
start and stop sequence for siblings.
In my parent-child hierarchy above I am reversing this start
order by making the ip resource a parent of the lvm resource
which in sibling context would have a higher starting precedence
than the ip resource.

This is correct, and is the recommended method of overriding the default resource starting order to create dependencies.  The defaults do not work for everyone.
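
To illustrate the difference, here is a minimal sketch. The service names and the address 192.0.2.10 are placeholders I made up, and it assumes matching <ip> and <lvm> definitions exist in <resources>:

```xml
<rm>
  <!-- Siblings: the per-type defaults apply, so lvm starts before ip -->
  <service name="example-siblings">
    <ip ref="192.0.2.10"/>
    <lvm ref="vg-a"/>
  </service>

  <!-- Parent-child: a parent starts before its children, so ip starts
       before lvm regardless of the per-type defaults -->
  <service name="example-nested">
    <ip ref="192.0.2.10">
      <lvm ref="vg-a"/>
    </ip>
  </service>
</rm>
```

On stop the nesting is unwound in reverse: the child lvm resource is stopped before its parent ip resource.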

Of course I had a second thought in mind when rigging up this
seemingly oblique hierarchy of typed resources.

Because there are scheduled maintenance downtimes I wanted to
ease the activation of a whole bunch of a service's resources
like LVM LVs, mountpoints and IP addresses with a single rg_test
invocation when a service has previously been disabled.
I then could issue according to the above config snippet just a

e.g.  rg_test test /etc/cluster/cluster.conf start ip

and have all resources activated apart from the Oracle DB

You are correct that rg_test can be used to start and stop individual resources within a FROZEN service. You need to make sure that the <oracledb> and <script> resources can operate if the IP, filesystems and LVM resources are unavailable before you use this method to stop IP, LVM and filesystem resources.

Assuming the <oracledb> and both <script> resources require the IP resource but can operate without the LVM and filesystem resources, you could leave the IP, oracledb and both script resources running and just stop the filesystems and LVM resources like this:
# clusvcadm -Z srv-a          # freezes the service so it won't fail over
# rg_test test /etc/cluster/cluster.conf stop lvm vg-a
# rg_test test /etc/cluster/cluster.conf stop lvm vg_b

The <ip>, <oracledb> and both <script> resources should still be running (they won't be checked for failure, however, because the service is frozen). To reactivate after maintenance:
# rg_test test /etc/cluster/cluster.conf start lvm vg-a
# rg_test test /etc/cluster/cluster.conf start lvm vg_b
# clusvcadm -U srv-a

There is yet another issue that puzzles me.
If I look at the starting sequence by issuing

rg_test noop /etc/cluster/cluster.conf start service srv-a

then the resource script:sid-a_statechg_notify gets executed
before the resources oracledb:SID-A and script:oracle_em.

This would imply to me that any resource of type script has a
higher starting precedence than any resource of type oracledb
because in my config above they are siblings.
I actually would have thought it to be the other way round, i.e.
that script resources have the lowest starting precedence of all.

Unfortunately, in  Table D.1. "Child Resource Type Start and Stop
Order" on page 112 of the cluster administration guide the typed
resource oracledb does not appear.

The <script> resource starts before the <oracledb> resource because oracledb resources have no default start order, so they start after all resource types that do have one. From /usr/share/cluster/service.sh:
    <special tag="rgmanager">
        <attributes root="1" maxinstances="1"/>
        <child type="lvm" start="1" stop="9"/>
        <child type="fs" start="2" stop="8"/>
        <child type="clusterfs" start="3" stop="7"/>
        <child type="netfs" start="4" stop="6"/>
        <child type="nfsexport" start="5" stop="5"/>

        <child type="nfsclient" start="6" stop="4"/>

        <child type="ip" start="7" stop="2"/>
        <child type="smb" start="8" stop="3"/>
        <child type="script" start="9" stop="1"/>

There is no entry for oracledb resources, but there is one for script resources. As a result, script resources start with priority 9, and resources of types without an entry (including oracledb) start afterwards, in the order in which they are listed.

You could argue that the omission of the oracledb resource is a bug or a missing feature. In the meantime, however, you will have to force the ordering of resources by using parent-child relationships (i.e. if you want the oracledb resource to start before the script resource, the script resource should become a child of the oracledb resource).
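
For your service that could look roughly like the sketch below. It only shows the oracledb part of srv-a, leaves out the <ip>, <lvm> and <fs> resources as well as the restart attributes, and I have not tested it against your configuration:

```xml
<service name="srv-a">
  <!-- ip, lvm and fs resources omitted from this sketch -->
  <oracledb ref="SID-A">
    <!-- both scripts are now children of oracledb, so they start
         after it and stop before it -->
    <script ref="oracle_em" __independent_subtree="1"/>
    <script ref="sid-a_statechg_notify" __independent_subtree="1"/>
  </oracledb>
</service>
```

You can check the resulting sequence with the same "rg_test noop ... start service srv-a" invocation you used before.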

Many thanks for your patience having read this far.

If you have active RHEL subscriptions with addons, Red Hat Support can assist specifically with these issues.


Ryan Mitchell
Red Hat Global Support Services
