roddie.digital

Handling SNMP Traps in Nagios

I'm writing this to help anyone else trying to do this in 2019, as the best guide I could find was a 2010 post from Askar Ali Khan's blog, Nothing but Linux. I am indebted primarily to that post and to the book apparently used (but not credited) as the basis for it, Pro Nagios 2.0 by James Turnbull (2006), but also to countless other posts on the Nagios support forums, Stack Overflow and elsewhere. This is my experience of configuring it with Nagios Core 4.3.2 on Red Hat Enterprise Linux 7.4, with a Sophos Email Appliance as the device sending the traps. Nagios was installed from the official RHEL repos to /etc/nagios, with the plugins in /usr/lib64/nagios/plugins.

Although Nagios is fine for active SNMP checks, where it polls the target device (through the standard check_snmp plugin), it cannot natively handle SNMP traps, where Nagios waits for the target device to initiate an alert (known as a passive check). For this, we need a daemon to capture SNMP messages (snmptrapd) and a daemon to translate them and pass them to Nagios (snmptt). Both are available from the RHEL repos. I'm writing this mostly for the benefit of people who are familiar with Nagios but not with SNMP traps, so I'll try not to waste time going into the basics of Nagios along the way.

For SNMPTRAPD on a Linux distribution using the systemd init system, the file you need to modify is /etc/sysconfig/snmptrapd.options with the line OPTIONS="-On -Lsd -p /var/run/snmptrapd.pid" and then edit /etc/snmp/snmptrapd.conf to include the following lines:

traphandle default /usr/sbin/snmptthandler
disableAuthorization yes

With SNMPTT installed from the RHEL repos, the only configuration needed should be adding the following lines to /etc/snmp/snmptt.ini:

mode = daemon
daemon_fork = 1
daemon_uid = snmptt
spool_directory = /var/spool/snmptt/
sleep = 5
dns_enable = 1
strip_domain = 1
log_enable = 1
syslog_enable = 0
exec_enable = 1
snmptt_conf_files = <<END
/etc/snmp/snmptt.conf
END

MIBs containing the reference data for SNMP OIDs should by default be in /usr/share/snmp/mibs/. This is where I had mine anyway, but some people in the comments on the Nothing but Linux blog post noted that the following did not work until they moved their MIBs into this directory. The snmpttconvertmib command takes the traps from a given MIB and creates the config SNMPTT needs to pass them on to Nagios.

Under my install of Nagios Core (from the RHEL repos), the plugins are in /usr/lib64/nagios/plugins rather than /usr/local/nagios/libexec. My install also did not include the submit_check_result script, but you can grab it from the source, copy it to the eventhandlers directory and make it executable. Here is an example command for the Sophos Email Appliance MIB I'd downloaded from its web interface:

snmpttconvertmib --in=/usr/share/snmp/mibs/SOPHOS.txt --out=/etc/snmp/snmptt.conf --exec='/usr/lib64/nagios/plugins/eventhandlers/submit_check_result $r "SNMP Alerts" 1'
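After running the conversion, it can be worth checking that the output file actually contains trap definitions and that each one carries an EXEC line. The following is only a rough sketch: the function name summarise_snmptt_conf is made up for illustration, and the sample entry uses the NET-SNMP example heartbeat notification rather than anything from the Sophos MIB.

```shell
#!/bin/sh
# Count EVENT and EXEC entries in an snmptt.conf-style file.
# Usage: summarise_snmptt_conf [FILE]  (FILE defaults to /etc/snmp/snmptt.conf)
# summarise_snmptt_conf is a hypothetical helper, not part of SNMPTT.
summarise_snmptt_conf() {
    conf_file=${1:-/etc/snmp/snmptt.conf}
    events=$(grep -c '^EVENT ' "$conf_file")
    execs=$(grep -c '^EXEC ' "$conf_file")
    echo "events=$events execs=$execs"
}

# Demonstrate against a throwaway file with one illustrative entry
# (assumed sample data, not real Sophos output).
demo=$(mktemp)
cat > "$demo" <<'EOF'
EVENT netSnmpExampleHeartbeatNotification .1.3.6.1.4.1.8072.2.3.0.1 "Status Events" Normal
FORMAT Example heartbeat $*
EXEC /usr/lib64/nagios/plugins/eventhandlers/submit_check_result $r "SNMP Alerts" 1
EOF
summarise_snmptt_conf "$demo"
rm -f "$demo"
```

If the two counts differ for your real /etc/snmp/snmptt.conf, some traps would be logged by SNMPTT but never passed on to Nagios.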

The snmpttconvertmib command above assumes that the host sending the traps has a service with the description "SNMP Alerts" and that you want an exit code of 1, which translates to a warning status in Nagios. It's useful to create a service template for SNMP traps in /etc/nagios/objects/templates.cfg.

In Askar's template, he disabled active checks, since you are only interested in receiving passive checks and you don't want to inadvertently clear an alert. Unfortunately, this results in an ugly question mark icon displaying next to the service in the Nagios CGI, so I instead elected to enable active checks but set a check interval of 60 years so that one would never be triggered on schedule. I've also replaced the deprecated directives normal_check_interval and retry_check_interval with check_interval and retry_interval respectively.

The check_command can be used to reset the service status to OK. Askar had used check-host-alive, which (by default) is ping-based, but I have used an SNMP availability check as a slight improvement. Here is an example you can use in your /etc/nagios/objects/commands.cfg or adjust accordingly:

define command{
        command_name    check_snmp_public
        command_line    $USER1$/check_snmp -H $HOSTADDRESS$ -C public -o SNMPv2-MIB::sysDescr.0 -l Availability:
        }

To reset the status, though, I personally prefer to use the "Submit a passive check result" option on the service status details page, as you can supply a comment as to why the trap was sent or what you did to correct it (which can be useful when looking back through the alert history).

# SNMP Trap service template

define service{
        name                    trap-service
        use                     generic-service         ; Otherwise inheriting values from your standard service
        register                0                       ; This must be 0 since it's a template not a service!
        service_description     SNMP Alerts             ; Make sure this matches your snmpttconvertmib command
        is_volatile             1
        check_command           check_snmp_public       ; Used to reset the status to OK when 'Schedule an immediate check of this service' is selected.
        flap_detection_enabled  0                       ; Flap detection is disabled
        process_perf_data       0                       ; Do not Process performance data
        max_check_attempts      1                       ; Leave as 1
        check_interval          31536000                ; Active checks enabled and interval set to 60 years
        retry_interval          1                       ; Leave as 1
        passive_checks_enabled  1                       ; Enables passive checks
        check_period            24x7
        notification_interval   31536000                ; Notification interval. Set to a very high number to prevent you from getting pages of previously received traps (about 60 years at the default interval_length of 60 seconds - do not set to 0!).
        active_checks_enabled   1                       ; Active checks stay enabled to avoid the question mark icon; the huge check_interval stops them from running on schedule.
        notification_options    w,u,c                   ; Notify on warning, unknown and critical.
        contact_groups          admins
        }
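As a sanity check on the "60 years" figure: Nagios multiplies check_interval by interval_length from nagios.cfg, which is 60 seconds by default, so 31536000 units means 31536000 minutes. Here is a minimal sketch of that arithmetic, assuming the default interval_length and a 365-day year (the function name interval_years is made up for illustration):

```shell
#!/bin/sh
# Convert a Nagios interval (in time units) to whole years.
# Arguments: interval, then interval_length in seconds (defaults to 60).
# Assumes a 365-day year; interval_years is a hypothetical helper.
interval_years() {
    units=$1
    interval_length=${2:-60}
    seconds_per_year=$(( 365 * 24 * 60 * 60 ))
    echo $(( units * interval_length / seconds_per_year ))
}

interval_years 31536000      # with the default interval_length of 60 seconds
interval_years 31536000 1    # if interval_length were set to 1 second
```

The first call gives 60 and the second gives 1, so the same value only means "1 year" if interval_length has been changed to 1 second.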

Then in your /etc/nagios/objects/services.cfg file you can define the following service (ensuring the host or hostgroups are already defined in their relevant files):

define service{
        service_description     SNMP Alerts
        host_name               servername01
        use                     trap-service
        }

One of the problems I had at this point was the name of the device not matching the name of the host as defined in Nagios. You can check how the traps look with tail -f /var/log/messages | grep snmptrapd. If you have dns_enable = 0 set in /etc/snmp/snmptt.ini then it will come through as the IP address and likely not match your host name as defined in Nagios. If you have dns_enable = 1 and strip_domain = 1 then you may see SERVERNAME01.DOMAIN.COM come through as SERVERNAME01 which will still not match your host if it is defined in Nagios in lowercase (annoyingly, it still won't match it even if you have an alias defined for your host in uppercase). I'm hoping to come up with a more manageable solution but unfortunately in the meantime I have had to resort to a section in /etc/hosts of the Nagios server which it will use for a reverse lookup with dns_enable = 1.

# reverse lookups used by snmp so that trap events resolve to correct host name

10.1.1.10    servername01
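To confirm which name an address maps to in a hosts-format file, something like the sketch below may help. The function name hosts_name_for is an assumption of mine, and it only reads the file you point it at, so it ignores DNS and the rest of the resolver configuration.

```shell
#!/bin/sh
# Print the first name that a hosts-format file maps an IP address to.
# Usage: hosts_name_for IP [FILE]  (FILE defaults to /etc/hosts)
# hosts_name_for is a hypothetical helper for illustration only.
hosts_name_for() {
    lookup_ip=$1
    hosts_file=${2:-/etc/hosts}
    # Strip comments, then match the address in column 1 and print column 2.
    awk -v ip="$lookup_ip" '{ sub(/#.*/, "") } $1 == ip { print $2; exit }' "$hosts_file"
}

# Demonstrate against a throwaway file rather than the real /etc/hosts.
demo=$(mktemp)
printf '%s\n' '# reverse lookups used by snmp' '10.1.1.10    servername01' > "$demo"
hosts_name_for 10.1.1.10 "$demo"
rm -f "$demo"
```

On the Nagios server itself, getent hosts 10.1.1.10 should report the name the system resolver returns, which is closer to what SNMPTT will see with dns_enable = 1.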

Another option I've considered but not yet deployed is a customised version of the submit_check_result script that forces the hostname to lowercase with bash parameter expansion (use hostname="${1^^}" instead if your Nagios hosts are named in uppercase):

#!/bin/bash
# (bash rather than sh: the ${1,,} expansion below needs bash 4 or later)

# force the first argument to lowercase
hostname="${1,,}"

echocmd="/bin/echo"

# adjust this to match command_file in your nagios.cfg
CommandFile="/usr/local/nagios/var/rw/nagios.cmd"

# get the current date/time in seconds since UNIX epoch
datetime=$(date +%s)

# create the command line to add to the command file
cmdline="[$datetime] PROCESS_SERVICE_CHECK_RESULT;$hostname;$2;$3;$4"

# append the command to the end of the command file
"$echocmd" "$cmdline" >> "$CommandFile"
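Since I haven't deployed this, a dry run that prints the line instead of writing it to the command file seems a sensible first step. This is only a sketch: format_passive_result is a made-up name, and it lowercases with tr rather than the bash expansion so that it also runs under a plain POSIX sh.

```shell
#!/bin/sh
# Build a PROCESS_SERVICE_CHECK_RESULT line with the host name lowercased.
# Arguments: host, service description, return code, plugin output.
# format_passive_result is a hypothetical helper, not part of Nagios.
format_passive_result() {
    host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    now=$(date +%s)
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$now" "$host" "$2" "$3" "$4"
}

# Dry run: print the line rather than appending it to the Nagios command file.
format_passive_result SERVERNAME01 "SNMP Alerts" 1 "test trap"
```

The host field in the printed line should read servername01, matching a host defined in lowercase in Nagios.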

Please send any corrections, clarifications or suggestions to me on Twitter: @roddie_digital