[Linux-cluster] dboracle agent would attempt restart of failed instance forever rather than failing service over, why, and are there maxrestart params available per resource?

Ralph.Grothe at itdz-berlin.de Ralph.Grothe at itdz-berlin.de
Mon Aug 13 14:08:49 UTC 2012


Hello Cluster Gurus,

while making tests with our RHCS cluster we observed a behavior
that I hadn't expected.

In one of the tests we shut down the running Oracle DB instance
manually and renamed its pfile so that the oracledb agent (as
provided by Red Hat, unmodified by us) could not restart it. At
that time the service was running on one node, and the failover
node was also a cluster member and would have been able to take
over the service's resources.

Tailing the messages log and watching for clurgmgrd entries on
the node where the service was running, we could observe endless
attempts by clurgmgrd to restart the failed oracledb instance.

Honestly, I would have rather expected the agent to try a restart
of the downed instance at most three times (or whatever default
max restart attempts this resource had defined, if any) on the
local node, and after that to pull down all resources of that
service and try to relocate the whole service to the standby
failover node.
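
To make that expectation concrete, here is a minimal Python
sketch of the policy I had in mind. It is only an illustration,
not rgmanager code: the names recover, try_restart and
max_restarts are made up for this example, and the limit of three
is simply the number mentioned above.

```python
from typing import Callable


def recover(try_restart: Callable[[], bool], max_restarts: int = 3) -> str:
    """Try a local restart up to max_restarts times, then give up locally.

    try_restart is a hypothetical callback that returns True when the
    instance came back up and False when the restart attempt failed.
    """
    for attempt in range(1, max_restarts + 1):
        if try_restart():
            return f"restarted locally on attempt {attempt}"
    # All local attempts failed: hand the whole service to the other node.
    return "relocate service to failover node"


# Simulate the test case: the pfile is renamed, so no restart can succeed.
print(recover(lambda: False))  # prints: relocate service to failover node
```

With a restart callback that always fails, the loop ends after
three attempts and reports that the service should be relocated,
which is the outcome I would have expected from the cluster.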

So, is this normal behavior, or is the oracledb agent written so
poorly that it cannot bail out with an error exit code that
would signal the clusterware to relocate the whole service?

In order to avoid endless restart attempts, is it possible to
assign a max_restart attribute to each single resource of a
service, as one can to the whole service tag?
Well, I fear not, because the RelaxNG XML parser complained when I
inserted this in the oracledb tag.
So what would be the recommended practice then?


In this RHCS cluster we have this rgmanager release:

# rpm -q rgmanager
rgmanager-2.0.52-9.el5


Also, the meta-data parameters of the oracledb agent don't list a
max_restart attribute (which, I suppose, is why the XML parser
complained).


# /usr/share/cluster/oracledb.sh meta-data|grep parameter\ name
        <parameter name="name" primary="1">
        <parameter name="listener_name" unique="1">
        <parameter name="user" required="1">
        <parameter name="home" required="1">
        <parameter name="type" required="0">
        <parameter name="vhost" required="0" unique="1">
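
The same check can be done without grep. The following Python
sketch parses agent meta-data and tests whether a given parameter
name is present. Note the assumptions: SAMPLE is a hand-reduced
stand-in that contains only the six parameter lines shown above,
wrapped in an assumed resource-agent/parameters structure; the
real meta-data output contains more than this. parameter_names is
a made-up helper name.

```python
import xml.etree.ElementTree as ET
from typing import List

# Reduced sample built from the grep output above; the wrapper elements
# are an assumption, and the real agent meta-data contains more detail.
SAMPLE = """<resource-agent name="oracledb">
  <parameters>
    <parameter name="name" primary="1"/>
    <parameter name="listener_name" unique="1"/>
    <parameter name="user" required="1"/>
    <parameter name="home" required="1"/>
    <parameter name="type" required="0"/>
    <parameter name="vhost" required="0" unique="1"/>
  </parameters>
</resource-agent>"""


def parameter_names(metadata_xml: str) -> List[str]:
    """Return the name attribute of every <parameter> element, in order."""
    root = ET.fromstring(metadata_xml)
    return [p.get("name") for p in root.iter("parameter")]


names = parameter_names(SAMPLE)
print(names)
print("max_restart" in names)  # prints: False
```

To check a real agent, one would feed the function the text
printed by the meta-data call above instead of SAMPLE.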



Regards
Ralph



