OpenNMS

Dennis Leeuw

Permission to use, copy, modify and distribute the OpenNMS Guide and its accompanying documentation for any purpose and without fee is hereby granted in perpetuity, provided that the above copyright notice and this paragraph appear in all copies.

The copyright holders make no representation about the suitability of this Guide for any purpose. It is provided "as is" without expressed or implied warranty.


Table of Contents
1. Introduction
2. Installation
CentOS 5.2
SMTP configuration
User configuration
Basic configuration
The different views
Home view
Node List
Surveillance
Dashboard
3. First steps
Adding nodes
Configure Discovery
Add interface
Labeling
Adding monitors
Adjust Home view
Notifications
Setting up notification actions
4. More on SNMP
Linux NET-SNMP installation
Windows installation
Mac OS X
Using the SNMP data
Add additional MIB information for NET-SNMP
5. Other ways of collecting data
NRPE
NRPE on the system to be monitored
NRPE testing from the OpenNMS machine
NRPE and OpenNMS
How To use NRPE and OpenNMS
List of Tables
5-1. Exit codes

Chapter 1. Introduction


Chapter 2. Installation

CentOS 5.2

rpm -Uvh http://yum.opennms.org/repofiles/opennms-repo-stable-rhel5.noarch.rpm
yum -y install postgresql-server
yum --nogpgcheck install opennms

Add enabled=0 to

/etc/yum.repos.d/opennms*
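Each repository section in those files should then end with enabled=0, roughly like this (the section name and baseurl below are illustrative, take them from the file you actually have):

[opennms-stable-common]
name=OpenNMS stable
baseurl=http://yum.opennms.org/stable/common
gpgcheck=1
enabled=0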

export OPENNMS_HOME=/opt/opennms
service postgresql start
vi /var/lib/pgsql/data/pg_hba.conf
change all "ident sameuser" to "trust". Adjust postgresql.conf with
listen_addresses = 'localhost'
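For reference, the adjusted pg_hba.conf entries typically end up looking like this (a sketch; the exact lines differ per PostgreSQL version, only the method column changes to trust):

local   all         all                               trust
host    all         all         127.0.0.1/32          trust
host    all         all         ::1/128               trust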

sudo -u postgres createdb -U postgres -E UNICODE opennms
service postgresql restart

yum install iplike.x86_64
install_iplike.sh
$OPENNMS_HOME/bin/runjava -s
$OPENNMS_HOME/bin/install -dis
The files that contain the information about the Java libraries and the JVM on your system are /opt/opennms/etc/libraries.properties (opennms.library.jicmp) and /opt/opennms/etc/java.conf (the path to jdk/bin/java).

service opennms start
Open port 8980 in the firewall, browse to http://<server>:8980/opennms and log in with admin/admin.
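On a stock CentOS install with iptables, opening the port can be done like this (a sketch; adapt to your own firewall setup):

iptables -I INPUT -p tcp --dport 8980 -j ACCEPT
service iptables save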


SMTP configuration

Use the /opt/opennms/etc/javamail-configuration.properties file to set up how OpenNMS should send e-mail.
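A minimal example, assuming your mail relay is mail.example.com (the property names below are the ones used in the 1.6 era; check the comments in the file itself):

org.opennms.core.utils.fromAddress=opennms@example.com
org.opennms.core.utils.mailHost=mail.example.com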


User configuration

The first thing you should do is to change the password for the admin account. To do this go to the Admin section, select "Configure Users, Groups and Roles", click "Configure Users" and press Modify. Here you should use "Reset Password" to change the Admin password.

And while at it, also fill in an e-mail address for the Admin user. This e-mail address will be used to send notifications to. When done click the Finish button for the changes to take effect.

You could add more users if you like and give them read only access or read and write access.

Data is stored in /opt/opennms/etc/users.xml
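An individual user entry in users.xml looks roughly like this (a sketch of the 1.6-era format; the password is an MD5 hash, the one below is the hash of "admin", so prefer editing users through the web interface):

<user>
    <user-id>admin</user-id>
    <full-name>Administrator</full-name>
    <password>21232F297A57A5A743894A0E4A801FC3</password>
    <contact type="email" info="admin@example.com"/>
</user>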


Basic configuration

In the Admin menu set "Notification Status" to On and press Update. From now on the system will send you notifications of changes.

You most likely want to add some security when connecting to your OpenNMS system, such as using HTTPS instead of plain HTTP for the login. To change these basic settings use the /opt/opennms/etc/opennms.properties file.

FIXME

More on security: http://www.opennms.org/index.php/Security_Considerations

In /opt/opennms/jetty-webapps/opennms/WEB-INF/web.xml change all occurrences of localhost:8080 to localhost:8980, to prevent messages like:

2008-12-30 13:11:59,885 WARN  [RTC Updater Pool-fiber3] DataSender: DataSender:  Unable to send category 'Overall Service Availability' to URL 'http://localhost:8980/opennms/rtc/post/Overall+Service+Availability': 
java.net.ConnectException: Connection refused
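One way to make this change in one go is with sed (back up web.xml first):

sed -i 's/localhost:8080/localhost:8980/g' /opt/opennms/jetty-webapps/opennms/WEB-INF/web.xml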


The different views


Node List



Surveillance

The Surveillance view is a stripped-down version of the Dashboard: only the upper part of the Dashboard is visible.


Dashboard

To add nodes to the Dashboard or Surveillance view there is currently only one right way of doing it: to appear in the Surveillance view a node needs to be a member of Production, Testing or Development plus a category. "Manage Surveillance Categories" in the Admin section does not allow you to set the Production/Testing/Development part, and the Category option on the Assets page has nothing to do with the Dashboard or the Surveillance view.

The right way to do it: select Node List -> click on a node -> use Surveillance Category Memberships (click Edit). Select the Function and one of Production/Test/Development. Add nodes to the category (with Ctrl you can select or deselect a single entry, with Shift you can select a range).

To add, rename or remove categories: Admin -> Manage Surveillance Categories


Chapter 3. First steps

Adding nodes

There are several ways to add hosts to the OpenNMS system. The easiest way is to add a range of IP addresses for autodetection. In addition you can add a single host for automatic discovery, or a single interface.

All these options are available from the Admin tab.


Configure Discovery

Configure Discovery: Set up the IP addresses (individual addresses and/or ranges) that you want OpenNMS to scan for new nodes.

On the Modify Configuration page, click the 'Add New' button. In the window that pops up, enter the beginning and ending IP addresses of the range that you wish to include for discovery. The default values for Retries and Timeout are usually appropriate. Click the 'Add' button; the popup window will close and the new range will show up in the Include Ranges section. Click the 'Save and Restart Discovery' button to apply your changes. Discovery of the newly added range will begin within a few seconds; the ping requests and service scans are spread out over time to avoid flooding your network, so it will take some time for all nodes in your newly added range to be scanned and discovered.
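Behind the web form this ends up in /opt/opennms/etc/discovery-configuration.xml; a range added by hand looks roughly like this (the addresses, retries and timeout are examples):

<include-range retries="1" timeout="2000">
    <begin>192.168.1.1</begin>
    <end>192.168.1.254</end>
</include-range>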

Click on the Home link and see the number of hosts added to the OpenNMS system. Jump around in joy, your first steps are done.


Add interface

Add Interface: Add an interface to the database. If the IP address of the interface is contained in the ipAddrTable of an existing node, the interface will be added to that node. Otherwise, a new node will be created.

Enter a valid IP address to generate a newSuspectEvent. This will add a node to the OpenNMS database for this device. Note: if the IP address already exists in OpenNMS, use "Rescan" from the node page to update it. Also, if no services exist for this IP, it will still be added.
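The web form simply generates a newSuspect event; you can also send one from the command line with the send-event.pl tool that ships with OpenNMS (the IP address below is an example):

/opt/opennms/bin/send-event.pl --interface 192.168.1.10 uei.opennms.org/internal/discovery/newSuspect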


Labeling

Some nodes will not be labeled correctly (that is, by their FQDN), due to an incorrect DNS configuration or due to their SNMP name. There are two ways to solve this problem. The first is to fix the problem at the originating end, meaning fix the DNS and SNMP configuration, which of course is always the best thing to do: solve issues at the source.

But sometimes you cannot fix these problems, because you are not the maintainer of these systems. The second option is to tell OpenNMS what the FQDN of the system should be:

Click "Node List" in the top-bar of the web interface. Click on the IP address that you want to change. Select Admin and then Change Node Label. Change from Automatic to User Defined and fill in the details. Click Change Label and you are done.


Adding monitors

For our tests we are going to add printer monitoring. For this we will check for IPP, JetDirect and LPD support on the monitored systems.

Create checks for these services in capsd-configuration.xml:

<protocol-plugin protocol="JetDirect" class-name="org.opennms.netmgt.capsd.plugins.TcpPlugin" scan="on">
        <property key="port" value="9100" />
        <property key="timeout" value="3000" />
        <property key="retry" value="1" />
</protocol-plugin>

<protocol-plugin protocol="LPD" class-name="org.opennms.netmgt.capsd.plugins.TcpPlugin" scan="on">
        <property key="port" value="515" />
        <property key="timeout" value="3000" />
        <property key="retry" value="1" />
</protocol-plugin>

<protocol-plugin protocol="IPP" class-name="org.opennms.netmgt.capsd.plugins.TcpPlugin" scan="on">
        <property key="port" value="631" />
        <property key="timeout" value="3000" />
        <property key="retry" value="1" />
</protocol-plugin>

And in poller-configuration.xml:

<service name="LPD" interval="300000" user-defined="true" status="on">
         <parameter key="retry" value="1"/>
         <parameter key="timeout" value="3000"/>
         <parameter key="port" value="515"/>
         <parameter key="rrd-repository" value="/var/opennms/rrd/response"/>
         <parameter key="ds-name" value="lpd"/>
</service>

<service name="JetDirect" interval="300000" user-defined="true" status="on">
        <parameter key="retry" value="1"/>
        <parameter key="timeout" value="3000"/>
        <parameter key="port" value="9100"/>
        <parameter key="rrd-repository" value="/var/opennms/rrd/response"/>
        <parameter key="ds-name" value="jetdirect"/>
</service>

<service name="IPP" interval="300000" user-defined="true" status="on">
        <parameter key="retry" value="1"/>
        <parameter key="timeout" value="3000"/>
        <parameter key="port" value="631"/>
        <parameter key="url" value="/printers"/>
        <parameter key="rrd-repository" value="/var/opennms/rrd/response"/>
        <parameter key="ds-name" value="ipp"/>
</service>

<monitor service="LPD" class-name="org.opennms.netmgt.poller.monitors.TcpMonitor"/>
<monitor service="JetDirect" class-name="org.opennms.netmgt.poller.monitors.TcpMonitor"/>
<monitor service="IPP" class-name="org.opennms.netmgt.poller.monitors.TcppMonitor"/>

Restart OpenNMS. After some time the new services will be found.


Adjust Home view

Define this section in categories.xml:

<category>
      <label><![CDATA[Print Servers]]></label>
      <comment>This category includes all managed interfaces which are
      running an IPP, LPD or JetDirect services.</comment>
      <normal>99.99</normal>
      <warning>97</warning>
      <service>IPP</service>
      <service>JetDirect</service>
      <rule><![CDATA[isIPP | isJetDirect | isLPD]]></rule>
</category>

Most of our print servers also support an HTTP interface, so we want to filter these out of the Web Servers category. Adjust the categories.xml file so the Web Servers section reads:

<category>
    <label><![CDATA[Web Servers]]></label>
    <comment>This category includes all managed interfaces which are running an HTTP (Web) server on port 80 or other common ports.</comment>
    <normal>99.9</normal>
    <warning>97</warning>
    <service>HTTP</service>
    <service>HTTPS</service>
    <service>HTTP-8000</service>
    <service>HTTP-8080</service>
    <rule><![CDATA[( isHTTP | isHTTPS | isHTTP-8000 | isHTTP-8080 ) & ( notisJetDirect | notisIPP | notisLPD )]]></rule>
</category>

Add a section to viewsdisplay.xml:

<category><![CDATA[Print Servers]]></category>

Restart OpenNMS. At first all printers were part of both the Web Servers and Print Servers categories; after a couple of days most printers were no longer visible in the Web Servers section. If some printers remain there, check the rule above: the notis* tests must be combined with &, because with | a printer is only excluded when it runs all three printer services.


Notifications

To make filtering of OpenNMS messages easier, it helps when OpenNMS subjects start with an identifier. To do this edit the notifications.xml file and change all lines that start with <subject> to <subject>[OpenNMS].

Messages will then look like: [OpenNMS] Notice # for problem reporting and RESOLVED: [OpenNMS] Notice # for the reporting of solved problems.
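A quick way to do this for all notifications at once is with sed (back up the file first; this assumes the default /opt/opennms/etc location):

sed -i 's/<subject>/<subject>[OpenNMS] /g' /opt/opennms/etc/notifications.xml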


Setting up notification actions

Using the web interface or the users.xml and groups.xml files one can create users and make them part of groups. With this information we can define destination paths for notifications. The destinationPaths.xml file describes these destination paths. The following example is a default one from this file:

<path name="Email-Admin">
   <target>
      <name>Admin</name>
      <command>javaEmail</command>
   </target>
</path>

The sections that may occur in this file are: destinationPaths, escalate, path, and target. destinationPaths is the main section, which is always there unless you are starting the file from scratch. Within this section there is the path section, which uniquely defines a path; the name parameter is used as the unique identifier. In the above example javaEmail (see the notificationCommands.xml file for predefined commands) is used to send notifications to Admin, where Admin can be a group or a user. By default admin (lowercase) is the user and Admin is the group in OpenNMS, so in this case the Admin group is notified.

Note

If you are part of the Admin group and your e-mail address is the one used for the admin user you will receive all messages twice!

The path section can also contain an escalate section. The escalate section is used to notify somebody else when the first person does not respond to the notification, meaning he or she does not acknowledge the problem. More about acknowledgements can be found in the next section.

The escalate section contains normal target sections:

<escalate delay="string">
	<target> ... </target>
</escalate>
The delay parameter tells the system how long (presumably in seconds) it has to wait before sending notifications to the escalate targets.
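For example, a path that first mails the Admin group and then escalates to a hypothetical Managers group when nobody acknowledges could look like this (the delay value assumes seconds, as described above):

<path name="Email-Admin-Escalate">
   <target>
      <name>Admin</name>
      <command>javaEmail</command>
   </target>
   <escalate delay="900">
      <target>
         <name>Managers</name>
         <command>javaEmail</command>
      </target>
   </escalate>
</path>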

With these destination paths we can tell where notifications have to go in the notifications.xml file:

<notification name="interfaceDown" status="on">
   <uei>uei.opennms.org/nodes/interfaceDown</uei>
   <rule>IPADDR != '0.0.0.0'</rule>
   <destinationPath>Email-Admin</destinationPath>
   <text-message>All services are down on interface %interfaceresolve% (%interface%) 
on node %nodelabel%.  New Outage records have been created 
and service level availability calculations will be impacted 
until this outage is resolved.  
   </text-message>
   <subject>[OpenNMS] Notice #%noticeid%: %interfaceresolve% (%interface%) on node %nodelabel% down.</subject>
   <numeric-message>111-%noticeid%</numeric-message>
</notification>

The top-level section of notifications.xml is the notifications tag. Within this container you will find several notification definitions. Elements that can be found in the notification container are: uei, description, rule, notice-queue, destinationPath, text-message, subject, numeric-message, event-severity, parameter, and varbind.

But first we will start with the parameters of the notification tag. The name is an identifier and the status can be on, auto, or off. A status of on means that a message will be sent, off means no message will be sent; what auto does exactly is unclear. Not shown in the example is the writeable parameter, which can be set to yes or no, but I could not find any explanation of it.

The uei defines the trigger for the notification. In this case a message will be sent as soon as an interface goes down. The different events can be found in eventconf.xml. Besides the fact that there has to be an interfaceDown event, it also has to comply with the rule filter; in this case the IP address of the node may not be equal to 0.0.0.0.

The other elements from our example define what the message looks like and to whom it will be sent (destinationPath), and they all speak for themselves. The additional tags mentioned before need some explanation. Events also have an associated severity, ranging from Normal for interesting but non-problematic events, through various levels of importance, up to Critical. Also, arbitrary text can be attached to an event for use as operator instructions on how to deal with the event. For the notifications I assume that event-severity acts as a filter, but I found no documentation on this.

The events predefined severity levels are:

Critical

This is the highest severity level possible. Probably only useful when your server room is on fire.

Seriously

Mostly not used on its own, but used to escalate a Major problem.

Major

This means a critical problem. Immediate action is needed.

Minor

Action is needed, but the device is still functional. It will require your attention or may fail completely.

Warning

This one needs your attention, but might not need any action. Warnings might resolve themselves.

Normal

"For your information" messages. Things like logins etc. should be reported with this kind of severity.

Cleared

A problem of any severity above normal is resolved.

Parameter entries are passed to the notification command as switches and look like this:

>parameter name="trapVersion" value="v1" /<
>parameter name="trapTransport" value="UDP" /<
>parameter name="trapHost" value="my-trap-host.mydomain.org" /<
>parameter name="trapPort" value="162" /<
>parameter name="trapCommunity" value="public" /<
>parameter name="trapEnterprise" value=".1.3.6.1.4.1.5813" /<
>parameter name="trapGeneric" value="6" /<
>parameter name="trapSpecific" value="1" /<
>parameter name="trapVarbind" value="Node: %nodelabel%" /<
This creates a trap to be sent. The destinationPath should then of course be trapNotifier (see also http://www.opennms.org/index.php/Notification_Configuration_How-To).

If the notification has a varbind configured with a name and value, it is used for a case-sensitive match against the beginning of an event parameter of the same name; beyond that the exact behaviour is undocumented.


Chapter 4. More on SNMP

Linux NET-SNMP installation

yum install net-snmp
vi /etc/snmp/snmpd.conf

TODO ---> CentOS heeft geen snmpconf

Fill in:

syslocation <fill in the room or other location details like GPS coordinates>
syscontact <fill in the contact details like: User One <uone@somewhere.com>>

More on NET-SNMP: http://www.net-snmp.org/


Windows installation

So only install the SNMP service and change the settings of the service: set the accepted community names to public and accept SNMP packets from the OpenNMS server.


Using the SNMP data

Now that everybody talks SNMP we can start collecting data. OpenNMS will automatically find the new services on the host, or you can force a rescan. To force a rescan go to the host-page by selecting "Node List", click on the link to the node and press the Rescan link. Confirm by clicking the Rescan button.

From the node page select the "Resource Graphs" link. Select one or more items and press Submit. Using the Ctrl key one can select or deselect additional items.


Add additional MIB information for NET-SNMP

To also display the NET-SNMP disk storage, add the following line to the sections <systemDef name="Net-SNMP (UCD)"> and <systemDef name="Net-SNMP"> in /opt/opennms/etc/datacollection-config.xml: <includeGroup>mib2-host-resources-storage</includeGroup>
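In context the section then looks roughly like this (shortened sketch: the existing includeGroup lines are omitted, the storage line is the one you add):

<systemDef name="Net-SNMP">
    <sysoidMask>.1.3.6.1.4.1.8072.</sysoidMask>
    <collect>
        <includeGroup>mib2-host-resources-storage</includeGroup>
    </collect>
</systemDef>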

After a restart the "Resource Graphs" node overview will be expanded with "Storage (MIB-2 Host Resources)", where you can select one or more disk partitions to graph.


Chapter 5. Other ways of collecting data

NRPE

This section describes the use of NRPE with the OpenNMS system.


NRPE on the system to be monitored

# yum install nrpe nagios-plugins-nrpe nagios-plugins

Edit /etc/nagios/nrpe.cfg. You should change the line reading allowed_hosts=127.0.0.1 and add the IP address of the OpenNMS system, like: allowed_hosts=127.0.0.1,192.168.1.5. You should also adjust your firewall to allow connections from the OpenNMS system on port 5666.
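With the stock iptables firewall that could be done like this (a sketch; 192.168.1.5 is the example OpenNMS address used above):

iptables -I INPUT -p tcp -s 192.168.1.5 --dport 5666 -j ACCEPT
service iptables save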

# service nrpe start


NRPE testing from the OpenNMS machine

# yum install nagios-plugins-nrpe nagios-plugins

As of version 1.3.10 OpenNMS supports SSL for the NRPE plugin. To check the functionality of NRPE we check with SSL support:

$ /usr/lib/nagios/plugins/check_nrpe -H <to be monitored IP> -c _NRPE_CHECK
This should return the version of NRPE.

Some documentation still refers to the old non-SSL behaviour. In these documents check_nrpe is called with the -n option. If you do that on an SSL enabled system you get:

/usr/lib/nagios/plugins/check_nrpe -n -H <to be monitored IP> -c _NRPE_CHECK
CHECK_NRPE: Error receiving data from daemon.
In the logs on the monitored system you get something like:
nrpe[17208]: Error: Could not complete SSL handshake. 1
This means SSL is turned on on the server side, while check_nrpe is not using SSL. Remove the -n from the command.


NRPE and OpenNMS

If you installed the latest 1.6.1 release from the RPMs, the basic NRPE checks are already configured for you, so no configuration has to be done on the OpenNMS side. Just clicking Rescan on the node should show you NRPE in the availability list.

You now have a check on the presence of NRPE, but nothing else: you know it is running and that it is healthy. Before we can use scripts on the monitored system, a couple of things have to be explained. We will do so through the use of shell scripts, but the same holds for every kind of test tool; I am just more familiar with shell scripts than with anything else.

NRPE uses exit codes to tell the system the severity of the problem that occurred. However, this is currently not how OpenNMS treats them. When using NRPE with OpenNMS you should regard the system as an on/off system, meaning that your tests are either "everything is ok" or "there is a problem" and nothing in between. To give you an overview of the relationship between the NRPE status codes and the OpenNMS events, here is a little table:

Table 5-1. Exit codes

NRPE status       NRPE exit   OpenNMS event   OpenNMS code report
-                 -1          normal          code=3
STATE_OK          0           normal          -
STATE_WARNING     1           minor           code=1
STATE_CRITICAL    2           minor           code=2
STATE_UNKNOWN     3           minor           code=3
STATE_DEPENDENT   4           minor           code=3

Everything else is reported as minor, code 3, no matter what exit code you use in your tools. The official Nagios Developer Guidelines (http://nagiosplug.sourceforge.net/developer-guidelines.html#AEN78) only describe codes 0-3, so those should be sufficient. STATE_DEPENDENT and -1 are only mentioned for completeness, since a couple of scripts (utils.sh in the nagios/plugins directory, for example) still use state 4 and the old behaviour of returning -1 is still in use.


How To use NRPE and OpenNMS

The easiest way to test the system is by creating a little test script. In the /usr/lib/nagios/plugins directory create a script called check_test.sh with the following content:

#!/bin/sh

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
STATE_DEPENDENT=4

echo "check test"

exit ${STATE_OK}
Make the script executable (chmod +x /usr/lib/nagios/plugins/check_test.sh). Now we have to tell the nrpe daemon on the monitored machine that there is a new command that it should support, so add the following line to /etc/nagios/nrpe.cfg:
command[check_test]=/usr/lib/nagios/plugins/check_test.sh
Restart the nrpe daemon for the changes to take effect.

First we have to tell OpenNMS there is something new to watch, so add to /opt/opennms/etc/capsd-configuration.xml the following protocol:

    <protocol-plugin protocol="nrpe-test" class-name="org.opennms.netmgt.capsd.plugins.NrpePlugin" scan="on">
        <property key="banner" value="*" />
        <property key="port" value="5666" />
        <property key="timeout" value="3000" />
        <property key="retry" value="2" />
        <property key="command" value="_NRPE_CHECK" />
    </protocol-plugin>
Then tell OpenNMS that there is a new service to watch and add to /opt/opennms/etc/poller-configuration.xml the following service definition:
        <service name="nrpe-test" interval="300000" user-defined="true" status="on">
            <parameter key="retry" value="3"/>
            <parameter key="timeout" value="3000"/>
            <parameter key="port" value="5666"/>
            <parameter key="command" value="check_test"/>
            <parameter key="padding" value="2"/>
            <parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
            <parameter key="ds-name" value="nrpe"/>
        </service>
And last, add a monitor line to the same file so that the service is actually monitored:
<monitor service="nrpe-test" class-name="org.opennms.netmgt.poller.monitors.NrpeMonitor"/>
Restart OpenNMS for the changes to take effect. Changing the exit line in our check_test.sh script will now make OpenNMS react to the different states and send events. From this point on you can monitor anything on the remote system. Note that for every script you write you need to adjust capsd-configuration.xml and poller-configuration.xml. I think it is a good habit to prefix everything you add with nrpe-, like I did with nrpe-test, but of course you can do it as you like.

More NRPE and SSL: http://www.opennms.org/index.php/NRPE_SSL_Support