
Re: [Linux-cluster] Monitoring services/customize failure criteria



Jeff Stoner wrote:
-----Original Message-----
It is also the detail of status/monitor that implementers most frequently
get wrong. "But it's either running or not!" ... which is clearly not
true, or at least such a binary view cannot protect against certain
failure modes (such as a resource being active on several nodes at once,
which is likely to _also_ count as failed).

Ok. I think I understand where the confusion lies.

LSB is strictly for init scripts.
OCF is strictly for a cluster-managed resource.

They are similar but have significant differences. For example, LSB
scripts are required to implement a 'status' action while OCF scripts
are required to implement a 'monitor' action. This difference alone
means, technically, you can't interchange LSB and OCF scripts unless
they implement both (in some fashion).
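
To illustrate the status/monitor difference, here is a minimal sketch of the
kind of translation a wrapper would need. The function name and the action
set are hypothetical, chosen only for illustration; this is not taken from
any real cluster agent.

```python
# Hypothetical helper: translate an OCF-style action name into the action
# an LSB init script understands. Names here are illustrative assumptions.
LSB_ACTIONS = {"start", "stop", "status", "restart"}


def to_lsb_action(action: str) -> str:
    """Map OCF 'monitor' onto LSB 'status'; pass known LSB actions through."""
    if action == "monitor":
        return "status"
    if action in LSB_ACTIONS:
        return action
    raise ValueError(f"unsupported action: {action}")
```

A script that accepts both names "in some fashion" would do the equivalent
of this mapping internally.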

I think this is the missing link in our conversation: the script
resource type in Cluster Services is an attempt to make an LSB-compliant
script into an OCF-compliant script. So /usr/share/cluster/script.sh
expects the script you specify to behave like an LSB script, not an OCF
script. As such, the script resource type falls back to LSB conventions
and uses a binary approach to a resource's start/stop/status actions:
zero for success and non-zero for any failure. Other resource types
(file system, nfs, ip, mysql, samba, etc.) may implement the full OCF RA
API exit codes.
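
The binary approach described above can be sketched roughly as follows. This
is an illustration, not the contents of script.sh: the function names are
hypothetical, and reporting every failure as the generic OCF error (1) is an
assumption made for the example.

```python
import subprocess

OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1  # assumption: every failure is reported as generic


def binary_result(exit_code: int) -> int:
    """Collapse any exit code to success (0) or a single failure value."""
    return OCF_SUCCESS if exit_code == 0 else OCF_ERR_GENERIC


def run_action(script: str, action: str) -> int:
    """Run an LSB-style script with one action and collapse its exit code."""
    completed = subprocess.run([script, action], check=False)
    return binary_result(completed.returncode)
```

Under this scheme an LSB status code of 3 (not running) and any other
non-zero code look the same to the caller, which is the loss of detail the
thread is discussing.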

Does this help?


Also, internally rgmanager can recognize other non-zero OCF return codes:

http://git.fedorahosted.org/git/cluster.git?p=cluster.git&a=search&h=HEAD&st=grep&s=OCF_RA
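
For reference, the standard OCF RA API exit codes can be written down as a
small lookup. The numeric values are the ones defined by the OCF resource
agent API; the class and function names are hypothetical, and which of
these codes rgmanager actually acts on is not stated in this thread.

```python
from enum import IntEnum


class OcfExit(IntEnum):
    """Standard OCF resource agent exit codes."""
    SUCCESS = 0
    ERR_GENERIC = 1
    ERR_ARGS = 2
    ERR_UNIMPLEMENTED = 3
    ERR_PERM = 4
    ERR_INSTALLED = 5
    ERR_CONFIGURED = 6
    NOT_RUNNING = 7


def describe(code: int) -> str:
    """Return the OCF name for an exit code, or 'UNKNOWN' if not defined."""
    try:
        return OcfExit(code).name
    except ValueError:
        return "UNKNOWN"
```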

-subhendu

