Slightly [OT] Network Monitoring/Alerting tools

Fri Aug 22 20:41:47 UTC 2008

Adam Hough wrote:
>
>> Polling is the best way to know if a service is actually working, but
>> OpenNMS also listens for SNMP traps, syslog messages, or xmlrpc events if
>> you want to send things to it.
> 
> So I have seen polling systems fail to properly diagnose a system
> under very heavy load (say, a process using almost all available
> resources) when the system was just too slow to respond to the poll
> request.  The monitoring system (Nagios) would simply mark the system
> as down.  A client/server system gives you a better chance than a
> polling system of figuring out that you are running out of memory.

In practical terms, it only matters whether your service is responding 
in a reasonable amount of time or not.  It doesn't really matter if the 
server is still barely alive and some other code is still telling you it 
is OK.  OpenNMS lets you tune what a 'reasonable amount of time' is in 
the polling step and also before any notifications are sent.  Of course, 
if your server is prone to running out of resources (cpu, memory, disk, 
bandwidth), you should also be collecting that data via SNMP with 
thresholds to notify you before you have an outage, or at least graphing 
it so you'll understand the trends.
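
As a rough illustration of the idea (not OpenNMS code or configuration), 
here is a minimal Python sketch of a poll with a tunable timeout, plus a 
rule that only notifies after several consecutive failed polls.  The 
names poll_tcp and should_notify, the 3 second timeout and the count of 
3 failures are all assumptions made up for this example.

```python
import socket
import time


def poll_tcp(host, port, timeout=3.0):
    """Return the connect time in seconds, or None if the service did not
    answer within `timeout` (hypothetical default of 3 seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Refused, unreachable or timed out: all count as a failed poll.
        return None


def should_notify(results, failures_needed=3):
    """True when the most recent `failures_needed` polls all failed.

    `results` is a list of poll_tcp() return values, oldest first.
    """
    if len(results) < failures_needed:
        return False
    return all(r is None for r in results[-failures_needed:])
```

With this rule a server that is merely slow for one poll does not page 
anyone, but one that stays unresponsive across several polls does.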

>>> OpenNMS from
>>> what I can tell still does not give me the flexibility that I want or
>>> need that I get from other systems such as Hobbit (BB) or Nagios.
>> Example?  Stock SNMP will report most of the usual stuff (interface
>> bandwidth/errors, memory/disk/cpu use, etc.) and there are ways to extend it
>> to other values.
> 
> But when I was using Big Brother as a monitoring server we were able
> to easily write scripts to extend the information that was reported to
> the monitoring server.  We were able to use scripts to monitor database
> operations for some of our users so they would know how many and which
> ones were running.  We were able to use the monitor to look for
> hardware problems (AIX/pseries) and dump the hardware reporting log to
> the monitoring server.  We were able to monitor when backups were
> running on the system and whether they had been running for an
> unusually long time.

You can do all of that with OpenNMS too; the real question is how 
difficult it is.  Most of the common things you would want are already 
built in so you don't have to script them - or install scripts on all 
the clients.  Some of the less common things might be more difficult but 
the framework is there to extend.

> See below, as I have never tried to configure SNMP other than to get
> the basic system information, but I think it would be much harder to
> set up SNMP to do some of those tasks than to write a simple script
> in bash, korn, perl, or python.

The server can run a script to poll something if you don't mind the 
efficiency hit - or you can run a script elsewhere and use the included 
send_event.pl script to send the result to the server.
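
For the "run a script elsewhere" case, the check itself can be very 
small.  Below is a hedged Python sketch of the long-running-backup test 
mentioned above.  The function name backup_status, the OK/WARN labels 
and the 6 hour limit are hypothetical, and the step that hands the 
result to send_event.pl is deliberately left out because its arguments 
are not given in this thread.

```python
import time


def backup_status(started_at, now=None, max_hours=6.0):
    """Classify a running backup by how long it has been running.

    `started_at` and `now` are epoch seconds; `max_hours` is a
    hypothetical limit chosen for this example.
    Returns a (status, message) tuple.
    """
    if now is None:
        now = time.time()
    hours = (now - started_at) / 3600.0
    if hours > max_hours:
        return "WARN", f"backup running for {hours:.1f}h (limit {max_hours:.1f}h)"
    return "OK", f"backup running for {hours:.1f}h"
```

The returned status and message are what you would then forward to the 
monitoring server by whatever mechanism you choose.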

>>> Though I will admit I had not known all that much about snmp other than
>>> to make sure that it is turned off on systems I install, to give bots
>>> one less attack point if they make it past my iptables rules in some
>>> manner.
>> Don't turn read access off, just use a hard-to-guess community string.
>> Usually you would block inbound access at your internet firewalls anyway.
> 
> My machines live on a university network, and those are notoriously
> unsafe.  Furthermore, since I deal with systems devoted to research, I
> have to allow (ssh) access to the machines from other universities all
> over the world.   I cannot trust my public networks and can only trust
> my non-routable networks to the extent that no user has used an
> easy-to-guess password.  Add to that the fact that SNMP has had a
> history of security issues, though from what I have read SNMPv3 has
> actually added security.

Can you name a service that does not have a history of security issues? 
  That means you update them to get the fixes, not that you stop using 
them. If I thought someone could get rich by reading my CPU usage I'd 
worry about it...  If your network is subject to sniffing between the 
server and targets, you might want v3, though.

> Running SNMP just seems like an unnecessary
> risk when you can have your monitored systems pushing data to the
> monitor server(s) and just have to secure the monitor server(s).

The big advantages are that you can monitor the hosts, network 
equipment, UPSes, etc. with the same tool and the same notification 
setup, and with a little extra configuration it can understand the 
topology for drawing maps and for restricting notifications to a 
router/switch instead of the hundreds of now-unreachable hosts/services 
behind it.
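
To make the topology point concrete, here is a small Python sketch of 
the general idea of suppressing notifications for nodes that sit behind 
a failed device.  It is not how OpenNMS implements it.  The function 
name root_causes and the parent map are assumptions for illustration, 
and the sketch assumes each node has a single upstream parent.

```python
def root_causes(parent, down):
    """Return the down nodes whose upstream parent is not itself down.

    `parent` maps each node to its upstream node (None for the top of
    the tree); `down` is an iterable of nodes that failed their polls.
    """
    down = set(down)
    # A node is a root cause if its immediate parent is still up
    # (or it has no parent at all).
    return sorted(n for n in down if parent.get(n) not in down)


# Hypothetical topology: two hosts behind one switch behind one router.
parent = {
    "router1": None,
    "switch1": "router1",
    "host1": "switch1",
    "host2": "switch1",
}
print(root_causes(parent, ["switch1", "host1", "host2"]))
```

Here only switch1 is reported, since host1 and host2 are unreachable 
simply because the switch in front of them is down.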

-- 
   Les Mikesell
    lesmikesell at gmail.com