[Linux-cluster] Monitoring Failovers

Fri Feb 20 19:25:06 UTC 2009

I have two thoughts.  I was looking at the output of the clustat -l
command and noticed it has a 'last transition' timestamp.  I was
thinking about using it to create an alarm that says "The last
transition happened less than X minutes ago" or something like that.
It is a little sketchy, but could be a possibility, and most of the
code needed has already been written by the author of the check_rhcs
script that I found, which I believe was authored by Chris St. Pierre.
(If this is incorrect, please let me know, as the author should be
credited.)
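
A rough sketch of this first idea in Python, not taken from check_rhcs.
It assumes that clustat -l prints a "Service Name" line followed by a
"Last Transition" line with a ctime-style timestamp; both the layout
and the timestamp format are assumptions to verify against real output.
The function names and the choice of WARNING as the alarm state are
made up for illustration.

```python
import re
from datetime import datetime

# ASSUMPTION: `clustat -l` prints blocks shaped like
#   Service Name      : service:web
#     Last Transition : Fri Feb 20 10:00:00 2009
# Check this against your own clustat output before relying on it.
NAME_RE = re.compile(r"^\s*Service Name\s*:\s*(\S+)")
TRANS_RE = re.compile(r"^\s*Last Transition\s*:\s*(.+?)\s*$")
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"  # assumed ctime-style timestamp

OK, WARNING, UNKNOWN = 0, 1, 3  # standard Nagios plugin exit codes


def last_transitions(clustat_text):
    """Map each service name to its last transition time (naive local time)."""
    result, current = {}, None
    for line in clustat_text.splitlines():
        m = NAME_RE.match(line)
        if m:
            current = m.group(1)
            continue
        m = TRANS_RE.match(line)
        if m and current:
            try:
                result[current] = datetime.strptime(m.group(1), TIME_FORMAT)
            except ValueError:
                pass  # timestamp not in the assumed format: skip it
    return result


def check_recent_transition(clustat_text, now, max_age_minutes):
    """Return (exit_code, message) for a Nagios-style check."""
    transitions = last_transitions(clustat_text)
    if not transitions:
        return UNKNOWN, "UNKNOWN: no 'Last Transition' lines found"
    recent = sorted(name for name, when in transitions.items()
                    if (now - when).total_seconds() < max_age_minutes * 60)
    if recent:
        return WARNING, "WARNING: transition within %d min: %s" % (
            max_age_minutes, ", ".join(recent))
    return OK, "OK: no transition in the last %d min" % max_age_minutes
```

To use it as a plugin, one would feed it the text of clustat -l (for
example via subprocess.check_output) together with datetime.now(),
print the message, and exit with the returned code.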

My second thought is a little more complicated and would require more
work: using syslog-ng's parsing capabilities, I would send all cluster
service messages to a script that would parse them and look for
failover messages.  The resulting alerts could be sent to Nagios via
NSCA.
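
A minimal sketch of the second idea, assuming syslog-ng pipes the
cluster messages to the script's stdin (its program() destination does
this) and that send_nsca is installed on the node.  The match patterns
are placeholders, not real rgmanager log strings, and the config path,
service name and function names are hypothetical.

```python
import re
import subprocess
import sys

# HYPOTHETICAL patterns: replace with the failover messages that actually
# appear in your cluster's syslog. These strings are placeholders.
FAILOVER_PATTERNS = [
    re.compile(r"relocat", re.IGNORECASE),
    re.compile(r"failover", re.IGNORECASE),
]

CRITICAL = 2  # Nagios return code reported for a detected failover


def passive_result(line, host, service):
    """Return a send_nsca input record for a failover line, else None."""
    text = line.strip()
    if not any(p.search(text) for p in FAILOVER_PATTERNS):
        return None
    # send_nsca expects tab-separated fields: host, service, code, output.
    text = text.replace("\t", " ")
    return "%s\t%s\t%d\t%s\n" % (host, service, CRITICAL, text)


def submit(record, nsca_host, config="/etc/nagios/send_nsca.cfg"):
    """Pipe one record to send_nsca (config path is an assumption)."""
    subprocess.run(["send_nsca", "-H", nsca_host, "-c", config],
                   input=record.encode(), check=True)


def main(host, service, nsca_host):
    # Read log messages from stdin, one per line, as fed by syslog-ng.
    for line in sys.stdin:
        record = passive_result(line, host, service)
        if record:
            submit(record, nsca_host)
```

The script would be started by syslog-ng and call main() with the
node's Nagios host name, a passive service name defined on the Nagios
server, and the address of the NSCA daemon.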

I am going to try the first one, and see if it can meet my needs, then
maybe work on the second.

Thanks,
B

On Fri, Feb 20, 2009 at 11:41 AM, Martin Fuerstenau
<martin.fuerstenau at oce.com> wrote:
> It is a little bit hard to do. It is on my todo list too. The problem is
> to determine the old state. So for example if you switch an ip address
> and you have a service bound to that address you have nearly no chance
> to monitor it from the Nagios side.
>
> I have tested using the MAC address and arp but this is awful if you
> have bonding, because if the MAC switches it may be the bonding
> that switched or the cluster that switched. And hardcoded MAC addresses
> in the monitor script would not be a good idea.
>
> Too much trouble in maintenance.
>
> If anyone has a good idea I will write the plugin and post it on
> Nagiosexchange.
>
> Martin Fuerstenau
>
> On Fri, 2009-02-20 at 11:04 -0500, Burton Simonds wrote:
>> I am in the process of setting up Nagios for system monitoring, and I
>> would like to have a way to know if a failover has occurred.  If
>> everything works as it should, there should be minimal impact on the
>> services.  Right now it looks like my best bet is to scrape the
>> logs, look for the failover messages there, and trigger an alarm.
>>
>> I was wondering if anyone else has done anything like this.  I found
>> in an archive a check_rhcs script that I am going to employ (which
>> looks pretty cool), but that just looks at the status of the services.
>> I want to either compare the current status to the previous status or
>> have something monitoring the cluster that pushes the alert to Nagios.
>>
>> Thanks,
>> B
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>