[Linux-cluster] RHCS resource agent: status interval vs. monitor interval

Tue Aug 9 03:35:51 UTC 2011

On Thu, Jul 28, 2011 at 05:39:24PM -0400, I wrote:
> In the <actions> section of a RHCS resource agent's meta-data,
> there are nodes for both action name="status" and action name="monitor".
> Both of them have an interval and a timeout.  For example, in ip.sh:
> 
>         <!-- Checks to see if the IP is up and (optionally) the link is
>              working -->
>         <action name="status" interval="20" timeout="10"/>
>         <action name="monitor" interval="20" timeout="10"/>
> 
>         <!-- Checks to see if we can ping the IP address locally -->
>         <action name="status" depth="10" interval="60" timeout="20"/>
>         <action name="monitor" depth="10" interval="60" timeout="20"/>
> 
> I assume that one of them controls how often rgmanager runs the
> resource agent to check the resource status, but which one, and
> what's the point of the other one?
> 
> I tried to find the answer in:
>   https://fedorahosted.org/cluster/wiki/ResourceActions
>   http://www.opencf.org/cgi-bin/viewcvs.cgi/*checkout*/specs/ra/resource-agent-api.txt?rev=1.10
> 
> Neither of them explains why there are separate "status" and "monitor" actions.

Ralph.Grothe at itdz-berlin.de was the only person to respond.  He said
he thinks that under RHCS, "monitor" is ignored and only "status" is
used, but he's not sure.
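
For anyone who wants to compare the two sets of entries side by side,
here is a minimal sketch in Python that lists the <action> nodes from
an agent's meta-data.  It only reads the XML; it says nothing about
how rgmanager itself interprets these values.  The function name
list_actions is my own, the <actions> wrapper is added so the fragment
parses on its own, and treating a missing depth as 0 is an assumption.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the ip.sh meta-data quoted above, wrapped in an
# <actions> element so that it parses as a standalone document.
METADATA = """
<actions>
    <action name="status" interval="20" timeout="10"/>
    <action name="monitor" interval="20" timeout="10"/>
    <action name="status" depth="10" interval="60" timeout="20"/>
    <action name="monitor" depth="10" interval="60" timeout="20"/>
</actions>
"""


def list_actions(xml_text):
    """Return a (name, depth, interval, timeout) tuple per <action> node.

    Expects every node to carry interval and timeout, as in the
    fragment above.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.iter("action"):
        rows.append((
            node.get("name"),
            int(node.get("depth", "0")),  # assumption: no depth means 0
            int(node.get("interval")),
            int(node.get("timeout")),
        ))
    return rows


if __name__ == "__main__":
    for name, depth, interval, timeout in list_actions(METADATA):
        print(f"{name:8} depth={depth:<3} interval={interval:<3} timeout={timeout}")
```

Run against the fragment, this shows that "status" and "monitor" carry
identical interval and timeout values at each depth, which is exactly
why I can't tell which of the two rgmanager actually consults.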

Separately, I've since come to understand that "interval" and
"timeout" do not control what I thought they did.  I believed
that rgmanager would attempt a status (or monitor?) check every
"interval" seconds.  That does not appear to be true.

I've been unable to find documentation of any of this on the wiki
or anywhere else I've searched.  Some pages mention things like how
to change the status interval, but what the interval actually does
is implied, not stated.

Is there any real documentation anywhere of how rgmanager reads and
uses this metadata, and of how it performs status checks?
Or is diving into the source the only way of figuring this out?

(I'm resistant to that partly because I'm not really a programmer,
and partly because code often contains bugs or hidden assumptions
and doesn't really document how things are intended to work; some
answers from the source may turn out to be ephemeral, others partial.)
  -- Cos