Handling SNMP Traps in Nagios
Last updated 25 Feb 2021
I'm writing this to help anyone else trying to do this in 2019, as the best guide I could find was a 2010 post on Askar Ali Khan's blog, Nothing but Linux. I am indebted primarily to this and to the book it apparently used (without credit) as its basis (Pro Nagios 2.0 by James Turnbull, from 2006), but also to countless other posts on the Nagios support forums, Stack Overflow, etc. This is my experience of configuring it with Nagios Core 4.3.2 on Red Hat Enterprise Linux 7.4, with a Sophos Email Appliance as the device sending the traps. Nagios was installed from the official RHEL repos to /etc/nagios with the plugins in /usr/lib64/nagios/plugins.
Although Nagios is fine for active SNMP checks, where it polls the target device (through the standard check_snmp plugin), it cannot natively handle SNMP traps, where Nagios waits for the target device to initiate an alert (known as a passive check). For this, we need a daemon to capture SNMP messages (snmptrapd) and a daemon to translate them and pass them to Nagios (snmptt); both are available from the RHEL repos. I'm writing this mostly for the benefit of people who are familiar with Nagios but not with SNMP traps, so I'll try not to waste time going into the basics of Nagios along the way.
For snmptrapd on a Linux distribution using the systemd init system, the file you need to modify is /etc/sysconfig/snmptrapd.options with the line OPTIONS="-On -Lsd -p /var/run/snmptrapd.pid" (-On prints OIDs numerically, -Lsd logs to syslog with the daemon facility, and -p writes a PID file). Then edit /etc/snmp/snmptrapd.conf to include the following lines:
traphandle default /usr/sbin/snmptthandler
disableAuthorization yes
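If you script this step, a small helper keeps the edit idempotent, so running it twice doesn't duplicate the lines. This is a minimal Bash sketch of my own rather than anything shipped with net-snmp; the function name add_line_once is hypothetical, and the restart in the usage comment assumes the snmptrapd systemd unit provided by the RHEL 7 net-snmp package.

```shell
#!/bin/bash
# Hypothetical helper: append a line to a config file only if an identical
# line is not already present (exact, whole-line, fixed-string match).
add_line_once() {
    local file="$1" line="$2"
    # -q quiet, -x whole line, -F fixed string (no regex interpretation);
    # a missing file counts as "not present" and is created by the append
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example usage (as root), then restart snmptrapd to pick up the change:
# add_line_once /etc/snmp/snmptrapd.conf 'traphandle default /usr/sbin/snmptthandler'
# add_line_once /etc/snmp/snmptrapd.conf 'disableAuthorization yes'
# systemctl restart snmptrapd
```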
With SNMPTT installed from the RHEL repos, the only configuration needed should be adding the following lines to /etc/snmp/snmptt.ini:
mode = daemon
daemon_fork = 1
daemon_uid = snmptt
spool_directory = /var/spool/snmptt/
sleep = 5
dns_enable = 1
strip_domain = 1
log_enable = 1
syslog_enable = 0
exec_enable = 1
snmptt_conf_files = <<END
/etc/snmp/snmptt.conf
END
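Since the packaged snmptt.ini most likely already contains several of these keys with their default values, it's worth checking which value each key actually ends up with after editing. A rough Bash sketch: ini_get is a hypothetical helper, it ignores [section] headers, and it simply reports the last uncommented assignment it finds. That is a simplification for spotting duplicates, not a description of how snmptt itself parses the file.

```shell
#!/bin/bash
# Hypothetical helper: print the value of KEY from an snmptt.ini-style file.
# Section headers are ignored; if a key is assigned more than once, the last
# uncommented assignment is reported (this sketch's rule, not snmptt's).
ini_get() {
    local file="$1" key="$2"
    awk -v k="$key" '
        /^[[:space:]]*[#;]/ { next }                 # skip comment lines
        {
            eq = index($0, "=")
            if (eq == 0) next                        # not a "key = value" line
            name = substr($0, 1, eq - 1)
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", name)
            if (name != k) next
            val = substr($0, eq + 1)
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", val)
            found = val; seen = 1                    # later assignments overwrite earlier ones
        }
        END { if (seen) print found }
    ' "$file"
}

# Example: ini_get /etc/snmp/snmptt.ini dns_enable
```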
MIBs containing the reference data for SNMP OIDs should by default be in /usr/share/snmp/mibs/. This is where I had mine anyway, but some people in the comments on the NbL blog post noted that the following did not work until they moved their MIBs into this directory. The snmpttconvertmib command will take the traps from a given MIB and create the necessary config for SNMPTT to pass on to Nagios. Under my install of Nagios Core (from the RHEL repos), my plugins are in /usr/lib64/nagios/plugins, not /usr/local/nagios/libexec. Also, my install did not include the submit_check_result script, but you can grab it from the Nagios Core source, copy it to the eventhandlers directory under the plugins directory and make it executable. Here is an example command to run for the Sophos Email Appliance MIB I'd downloaded from its web interface:
snmpttconvertmib --in=/usr/share/snmp/mibs/SOPHOS.txt --out=/etc/snmp/snmptt.conf --exec='/usr/lib64/nagios/plugins/eventhandlers/submit_check_result $r "SNMP Alerts" 1'
This assumes that the host sending the traps has a service with the description "SNMP Alerts" and that you want an exit code of 1, which translates to a warning status in Nagios. It's useful to create a service template for SNMP traps in /etc/nagios/objects/templates.cfg. In Askar's template, he disabled active checks, since you are only interested in receiving passive checks and you don't want to inadvertently clear an alert. Unfortunately, this results in an ugly question mark icon next to the service in the Nagios CGI, so I instead elected to enable active checks but set a check interval of 60 years so that one is never triggered. I've also replaced the deprecated directives normal_check_interval and retry_check_interval with check_interval and retry_interval respectively. The check_command can be used to reset the service status to OK: Askar had used check-host-alive, which (by default) is ping-based, but I have used an SNMP availability check as a slight improvement. Here is an example you can use in your /etc/nagios/objects/commands.cfg, or adjust accordingly:
define command {
command_name check_snmp_public
command_line $USER1$/check_snmp -H $HOSTADDRESS$ -C public -o SNMPv2-MIB::sysDescr.0 -l Availability:
}
Personally, to reset the status I prefer to use the "Submit a passive check result" option on the service status details page, as you can supply a comment on why the trap was sent or what you did to correct it (which can be useful when looking back through the alert history).
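If you would rather reset the service from a shell, the same PROCESS_SERVICE_CHECK_RESULT external command that submit_check_result writes can be sent with a return code of 0 (OK). A minimal sketch: reset_trap_service is a hypothetical helper, and the command file path in the usage comment is an assumption, so check command_file in /etc/nagios/nagios.cfg. There is no separate comment field this way, so any note about why you reset it has to go in the plugin output text.

```shell
#!/bin/bash
# Hypothetical helper: submit a passive OK (return code 0) result for the
# trap service by appending a PROCESS_SERVICE_CHECK_RESULT external command.
# Usage: reset_trap_service CMD_FILE HOST [SERVICE] [OUTPUT]
reset_trap_service() {
    local cmd_file="$1" host="$2"
    local service="${3:-SNMP Alerts}"          # must match the service_description
    local output="${4:-Manually reset to OK}"  # shows up as the plugin output
    local now
    now=$(date +%s)                            # seconds since UNIX epoch
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;0;%s\n' \
        "$now" "$host" "$service" "$output" >> "$cmd_file"
}

# Example (assumed command file path; check command_file in nagios.cfg):
# reset_trap_service /var/spool/nagios/cmd/nagios.cmd servername01 "SNMP Alerts" "Reset after investigating trap"
```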
# SNMP Trap service template
define service {
name trap-service
use generic-service ; Otherwise inheriting values from your standard service
register 0 ; This must be 0 since it's a template not a service!
service_description SNMP Alerts ; Make sure this matches your snmpttconvertmib command
is_volatile 1
check_command check_snmp_public ; Used to reset the status to OK when 'Schedule an immediate check of this service' is selected.
flap_detection_enabled 0 ; Flap detection is disabled
process_perf_data 0 ; Do not process performance data
max_check_attempts 1 ; Leave as 1
check_interval 31536000 ; Active checks enabled and interval set to 60 years (31536000 minutes with the default interval_length of 60 seconds)
retry_interval 1 ; Leave as 1
passive_checks_enabled 1 ; Enables passive checks
check_period 24x7
notification_interval 31536000 ; Notification interval. Set to a very high number to prevent repeat notifications for previously received traps (31536000 minutes, about 60 years with the default interval_length of 60 seconds - do not set to 0!).
active_checks_enabled 1 ; Active checks left enabled to avoid the passive-only icon in the CGI; the huge check_interval stops them from ever running.
notification_options w,u,c ; Notify on warning, unknown and critical.
contact_groups admins
}
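Both check_interval and notification_interval are counted in units of interval_length, which defaults to 60 seconds in nagios.cfg, so 31536000 means 31,536,000 minutes rather than seconds. A quick Bash check of the arithmetic, assuming that default and a 365-day year (interval_to_years is just an illustrative name):

```shell
#!/bin/bash
# Convert a Nagios interval value into whole years.
# Assumes interval_length=60 (the nagios.cfg default) unless a second
# argument is given, and uses 365-day years.
interval_to_years() {
    local value="$1" interval_length="${2:-60}"
    echo $(( value * interval_length / (365 * 24 * 60 * 60) ))
}

interval_to_years 31536000    # prints 60
```

By the same arithmetic, the notification_interval of 31536000 in the template also works out at roughly 60 years rather than one.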
Then in your /etc/nagios/objects/services.cfg file you can define the following service (ensuring the host or hostgroups are already defined in their relevant files):
define service {
service_description SNMP Alerts
host_name servername01
use trap-service
}
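Because submit_check_result is called with the service description baked into each EXEC line, every EVENT that snmpttconvertmib wrote to /etc/snmp/snmptt.conf needs an EXEC line naming exactly the service_description defined here ("SNMP Alerts"). A rough Bash sketch to list any EVENT that doesn't: check_exec_lines is a hypothetical helper, it assumes the EVENT/EXEC line layout of snmptt.conf, and it matches the service name as a plain substring.

```shell
#!/bin/bash
# Hypothetical sanity check for snmptt.conf: print every EVENT whose block
# has no EXEC line containing SERVICE. Returns 0 if all EVENTs are covered,
# 1 otherwise.
check_exec_lines() {
    local conf="$1" service="$2"
    awk -v svc="$service" '
        # flag the previous EVENT if none of its EXEC lines named the service
        function report() {
            if (name != "" && !ok) { print "no EXEC for " name; bad = 1 }
        }
        $1 == "EVENT" { report(); name = $2; ok = 0; next }
        $1 == "EXEC" && index($0, svc) > 0 { ok = 1 }
        END { report(); exit bad }
    ' "$conf"
}

# Example: check_exec_lines /etc/snmp/snmptt.conf "SNMP Alerts"
```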
One of the problems I had at this point was the name of the device not matching the name of the host as defined in Nagios. You can check how the traps look with tail -f /var/log/messages | grep snmptrapd. If you have dns_enable = 0 set in /etc/snmp/snmptt.ini, the host will come through as its IP address and will likely not match your host name as defined in Nagios. If you have dns_enable = 1 and strip_domain = 1, you may see SERVERNAME01.DOMAIN.COM come through as SERVERNAME01, which will still not match your host if it is defined in Nagios in lowercase (annoyingly, it won't match even if you have an alias defined for your host in uppercase). I'm hoping to come up with a more manageable solution, but in the meantime I have had to resort to a section in /etc/hosts on the Nagios server, which it will use for the reverse lookup when dns_enable = 1:
# reverse lookups used by snmp so that trap events resolve to correct host name
10.1.1.10 servername01
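To catch this kind of mismatch before traps start arriving, you can compare the name you see in the logs against the host_name values in your object files. A rough Bash sketch: check_host_name is a hypothetical helper, and it only understands host definitions that open with define host { on one line and have a single host_name value per line.

```shell
#!/bin/bash
# Hypothetical check: does NAME (as it arrives from snmptt) exactly match a
# host_name inside a "define host" block in the given object files?
# Returns 0 on an exact match, 1 if it only matches ignoring case, 2 if none.
check_host_name() {
    local name="$1"; shift
    local defined
    # collect host_name values from host definitions only (not services)
    defined=$(awk '
        /^[[:space:]]*define[[:space:]]+host[[:space:]]*[{]/ { inhost = 1 }
        inhost && $1 == "host_name" { print $2 }
        /[}]/ { inhost = 0 }
    ' "$@" | sort -u)
    if printf '%s\n' "$defined" | grep -qxF -- "$name"; then
        echo "exact match: $name"
    elif printf '%s\n' "$defined" | grep -qixF -- "$name"; then
        echo "case mismatch: $name vs $(printf '%s\n' "$defined" | grep -ixF -- "$name" | head -n 1)"
        return 1
    else
        echo "no host_name matches $name"
        return 2
    fi
}

# Example: check_host_name SERVERNAME01 /etc/nagios/objects/*.cfg
```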
Another option I've considered, but not yet deployed, is a customised version of the submit_check_result script that forces the host name to lowercase with Bash parameter expansion (or use hostname="${1^^}" if your Nagios hosts are named in uppercase):
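#!/bin/bash
# Customised submit_check_result: forces the host name to lowercase.
# ${1,,} is a Bash 4+ expansion, so this needs bash rather than plain sh.
# Arguments: host name, service description, return code, plugin output
hostname="${1,,}"
# adjust to match command_file in your nagios.cfg
CommandFile="/usr/local/nagios/var/rw/nagios.cmd"
# get the current date/time in seconds since UNIX epoch
datetime=$(date +%s)
# create the command line to add to the command file
cmdline="[$datetime] PROCESS_SERVICE_CHECK_RESULT;$hostname;$2;$3;$4"
# append the command to the command file (quoted so spacing in the output is preserved)
printf '%s\n' "$cmdline" >> "$CommandFile"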