
Re: [Linux-cluster] Monitoring services with Nagios



I've attached my (very basic) check_rhcs script that I use with
Nagios.  HTH.

Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University

On Fri, 11 Jul 2008, Finnur Örn Guðmundsson - TM Software wrote:

Hi,



I was planning on monitoring the status of a service from clustat (run clustat, grab the output).

However, since I am running an x86_64 system, snmpd cannot seem to load the library it needs to read any cluster data:

snmpd[30150]: dlopen failed: /usr/lib64/cluster-snmp/libClusterMonitorSnmp.so: undefined symbol: _ZN17ClusterMonitoring7Cluster15runningServicesEv



How do you monitor your cluster with Nagios or other open source solutions? (What scripts do you use, etc.?)



Kær kveðja / Best Regards,

Finnur Örn Guðmundsson
Network Engineer - Network Operations
fog t is <mailto:fog t is>

TM Software
Urðarhvarf 6, IS-203 Kópavogur, Iceland
Tel: +354 545 3000 - fax +354 545 3610
www.tm-software.is <http://www.tm-software.is/>


#! /usr/bin/perl -w
#
# $Id: check_rhcs 11710 2008-06-25 19:50:44Z stpierre $
#
# check_rhcs
#
# Nagios host script to check a Red Hat Cluster Suite cluster

require 5.004;
use strict;
use lib qw(/usr/lib/nagios/plugins /usr/lib64/nagios/plugins /usr/local/nagios/libexec);
use utils qw($TIMEOUT %ERRORS &print_revision &support &usage);
use XML::Simple;

sub cleanup($$);

my $PROGNAME = "check_rhcs";

my $clustat = "/usr/sbin/clustat";

if (!-e $clustat) {
    cleanup("UNKNOWN", "$clustat not found");
} elsif (!-x $clustat) {
    cleanup("UNKNOWN", "$clustat not executable");
}

# Just in case of problems, let's not hang Nagios
$SIG{'ALRM'} = sub {
    cleanup("UNKNOWN", "clustat timed out");
};
alarm($TIMEOUT);

my $output = `$clustat -x`;
my $retval = $?;

# Turn off alarm
alarm(0);

if ($output =~ /cman is not running/) {
    cleanup("CRITICAL", $output);
} else {
    # Force 'node' as well as 'group' into arrays so that a cluster with a
    # single node or a single service is still folded into a hash keyed by name
    my $status = XMLin($output, ForceArray => ['group', 'node']);

    # check quorum
    if (!$status->{'quorum'}->{'quorate'}) {
        cleanup("CRITICAL", "Cluster is not quorate");
    }

    # check nodes (fall back to an empty hash if the section is missing)
    my %nodes = %{$status->{'nodes'}->{'node'} || {}};
    foreach my $node (keys(%nodes)) {
        if (!$nodes{$node}->{'state'}) {
            cleanup("WARNING", "Node $node is down");
        } elsif (!$nodes{$node}->{'rgmanager'}) {
            cleanup("WARNING", "rgmanager is not running on node $node");
        }
    }

    # check services (the groups section may be absent, so fall back to
    # an empty hash rather than dereferencing undef under 'use strict')
    my %svcs = %{$status->{'groups'}->{'group'} || {}};
    foreach my $svc (keys(%svcs)) {
        if ($svcs{$svc}->{'state_str'} ne 'started') {
            cleanup("CRITICAL", "$svc is in state " . $svcs{$svc}->{'state_str'});
        }
    }

    # check return value; $? is a raw wait status, so shift to get the exit code
    if ($retval) {
        cleanup("UNKNOWN",
                "Cluster appeared okay, but clustat returned " . ($retval >> 8));
    }
}

cleanup("OK", "Cluster is sound");

##############################
#   Subroutines start here   #
##############################
sub cleanup ($$) {
    my ($state, $answer) = @_;
    print "Cluster $state: $answer\n";
    exit $ERRORS{$state};
}
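
Because cleanup() exits on the first problem it finds, a WARNING for a node is reported before the services are ever checked, so it can hide a CRITICAL service state. One possible variation is to collect every finding and report the most severe. The following is only a minimal sketch of that idea: summarize() is a hypothetical helper that is not part of check_rhcs, the findings passed to it are made up rather than real clustat output, and the severity ranking (CRITICAL above UNKNOWN above WARNING) is an assumption, not something Nagios mandates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard Nagios plugin exit codes (the values utils.pm exports as %ERRORS).
my %ERRORS = (OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3);

# Hypothetical helper: takes a list of [state, message] pairs and returns
# the most severe state plus all messages joined into one status line.
sub summarize {
    my @problems = @_;
    return ("OK", "Cluster is sound") unless @problems;

    # Assumed ranking: CRITICAL outranks UNKNOWN, which outranks WARNING.
    my %rank = (OK => 0, WARNING => 1, UNKNOWN => 2, CRITICAL => 3);
    my ($worst) = sort { $rank{$b} <=> $rank{$a} } map { $_->[0] } @problems;
    my $text = join("; ", map { $_->[1] } @problems);
    return ($worst, $text);
}

# Made-up findings for illustration only.
my ($state, $answer) = summarize(
    ["WARNING",  "Node node2 is down"],
    ["CRITICAL", "service:web is in state failed"],
);
print "Cluster $state: $answer\n";

# A real plugin would finish with: exit $ERRORS{$state};
```

In the script above, the node and service loops would push onto a shared list instead of calling cleanup() directly, and a single call at the end would print the combined line and exit with the matching code.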
