Nagios
Path / Command | Description |
---|---|
/etc/nagios3/conf.d | Config files |
/etc/nagios-plugins/config | Plugin commands |
/usr/lib/nagios/plugins | Plugin executables |
nagios3 -v /etc/nagios3/nagios.cfg | Config check |
service nagios3 restart | Restart service (reloads config) |
./usr/share/nagios ./usr/lib/nagios ./var/lib/nagios
define service{
    use                  generic-service ; Inherit default values from a template
    hostgroup_name       zimbra-servers
    service_description  IMAP
    check_command        check_imap
}
define service{
    use                  generic-service ; Inherit default values from a template
    hostgroup_name       zimbra-servers
    service_description  SMTP
    check_command        check_smtp
}
# check that MySQL services are up
define service {
    hostgroup_name         mysql-servers
    service_description    MySQL
    check_command          check_mysql
    use                    generic-service
    notification_interval  0 ; set > 0 if you want to be renotified
}
define command{
    command_name  check_http_auth
    command_line  /usr/lib/nagios/plugins/check_http -H '$HOSTADDRESS$' -I '$HOSTADDRESS$' -a '$ARG1$'
}
define service{
    use                  generic-service ; Name of service template to use
    host_name            localhost
    service_description  HTTP
    check_command        check_http_auth!user:pass ; Enter actual user/pass
}
define hostextinfo{
    hostgroup_name   debian-servers
    notes            Debian GNU/Linux servers
    icon_image       base/debian.png
    icon_image_alt   Debian GNU/Linux
    vrml_image       debian.png
    statusmap_image  base/debian.gd2
}
define hostextinfo{
    hostgroup_name   ubuntu-servers
    notes            Ubuntu servers
    icon_image       base/ubuntu.png
    icon_image_alt   Ubuntu
    vrml_image       ubuntu.png
    statusmap_image  base/ubuntu.gd2
}
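The service and hostextinfo blocks above attach to host groups rather than to individual hosts, so every hostgroup_name they mention has to be defined somewhere. A minimal sketch of matching definitions follows. The member host names (mail1, db1) and the aliases are placeholders, not taken from this setup, and each member is assumed to have its own host definition elsewhere.

```cfg
# Hypothetical host groups matching the hostgroup_name values used above.
# Member hosts (mail1, db1) are placeholders and must exist as "define host" entries.
define hostgroup{
    hostgroup_name  zimbra-servers
    alias           Zimbra mail servers
    members         mail1
}
define hostgroup{
    hostgroup_name  mysql-servers
    alias           MySQL servers
    members         db1
}
define hostgroup{
    hostgroup_name  debian-servers
    alias           Debian GNU/Linux servers
    members         localhost
}
```

Running the config check (nagios3 -v /etc/nagios3/nagios.cfg) after adding a group should report any member that has no host definition.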
Create SNMP Checks
This section sets up a series of checks for my QNAP NAS, which serves as the worked example.
Define OIDs to Poll
Before you start, you need to know which SNMP OIDs you want to poll and what their values should be. For common devices and metrics you can often get by with a Google search or two, but anything less common soon needs more digging.
When it comes to investigating which OIDs you can poll for a specific device, your friend is [1].
By way of example, my QNAP NAS is checked for the temperatures of the disks and system, the status of the disks and volume, and the free space on the volume (service-level checks for things like FTP access aren't done by SNMP). Having downloaded the MIB and done some probing with GetIf, I've decided I need to monitor the following OIDs...
OID | Description | Example Return Data |
---|---|---|
.1.3.6.1.4.1.24681.1.2.6.0 | System Temperature | 41 C/105 F |
.1.3.6.1.4.1.24681.1.2.17.1.6.1 | System Volume 1 Status | Ready |
.1.3.6.1.4.1.24681.1.2.17.1.5.1 | System Volume 1 Space | 1.74 TB |
.1.3.6.1.4.1.24681.1.2.11.1.7.1 | Physical Disk 1 SMART Status | GOOD |
.1.3.6.1.4.1.24681.1.2.11.1.3.1 | Physical Disk 1 Temperature | 35 C/95 F |
Create Commands
Each type of check needs a command defined for it. Where several checks are similar (e.g. the status of disk 1, disk 2, etc.), the command can take arguments whose values are supplied later, when the check itself is defined. I created a new file, /etc/nagios3/conf.d/commands_qnap.cfg, and added the following...
define command{
    command_name  check_qnap_sys_temp
    command_line  /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.6.0 -w 45 -c 55 -l Temp -u C
}
-H '$HOSTADDRESS$'
- A standard macro available to all check commands; Nagios substitutes the device's address (typically its IP address)
-o .1.3.6.1.4.1.24681.1.2.6.0
- The SNMP OID being checked (.iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemTemperature.0)
-w 45
- The warning threshold
-c 55
- The critical threshold
-l Temp
- A label for the check (appears in the check's Status Information column in the Nagios display)
-u C
- The units of the metric being checked (appears in the check's Status Information column in the Nagios display)
define command{
    command_name  check_qnap_sysvol_status
    command_line  /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.17.1.6.$ARG1$ -l "Volume Status"
}
-o .1.3.6.1.4.1.24681.1.2.17.1.6.$ARG1$
- The SNMP OID being checked (.iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemVolumeTable.SysVolumeEntry.SysVolumeStatus.$ARG1$). $ARG1$ is an argument macro standing in for the volume number, so that if I had more than one volume I could repeat the check for volume 1, 2, etc. without creating a separate check command for each.
define command{
    command_name  check_qnap_sysvol_space
    command_line  /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.17.1.5.$ARG1$ -w $ARG2$: -c $ARG3$: -l "Volume Space" -u TB
}
-o .1.3.6.1.4.1.24681.1.2.17.1.5.$ARG1$
- The SNMP OID being checked (.iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemVolumeTable.SysVolumeEntry.SysVolumeFreeSize.$ARG1$). As above, $ARG1$ is an argument macro standing in for the volume number, so that if I had more than one volume I could repeat the check for volume 1, 2, etc. without creating a separate check command for each.
define command{
    command_name  check_qnap_disk_status
    command_line  /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.11.1.7.$ARG1$ -l "SMART Info State"
}
define command{
    command_name  check_qnap_disk_temp
    command_line  /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.11.1.3.$ARG1$ -w 45 -c 55 -l Temp -u C
}
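To show how the argument macros in these commands get their values, here is a rough sketch of service definitions that call them. Everything after each ! in check_command is passed in order as $ARG1$, $ARG2$, $ARG3$. The host name qnap-nas, the address 192.0.2.10, the service descriptions, and the 0.5/0.2 thresholds are illustrative assumptions, and the generic-host and generic-service templates are assumed to exist as in a stock install.

```cfg
# Hypothetical host entry for the NAS; name and address are placeholders
define host{
    use        generic-host
    host_name  qnap-nas
    alias      QNAP NAS
    address    192.0.2.10
}
define service{
    use                  generic-service
    host_name            qnap-nas
    service_description  System Temperature
    check_command        check_qnap_sys_temp ; no arguments, thresholds are fixed in the command
}
define service{
    use                  generic-service
    host_name            qnap-nas
    service_description  Volume 1 Status
    check_command        check_qnap_sysvol_status!1 ; $ARG1$ = volume number
}
define service{
    use                  generic-service
    host_name            qnap-nas
    service_description  Volume 1 Free Space
    check_command        check_qnap_sysvol_space!1!0.5!0.2 ; volume 1, warn below 0.5, critical below 0.2 (example values)
}
define service{
    use                  generic-service
    host_name            qnap-nas
    service_description  Disk 1 SMART Status
    check_command        check_qnap_disk_status!1 ; $ARG1$ = disk number
}
define service{
    use                  generic-service
    host_name            qnap-nas
    service_description  Disk 1 Temperature
    check_command        check_qnap_disk_temp!1 ; $ARG1$ = disk number
}
```

The trailing colon in -w $ARG2$: and -c $ARG3$: follows the Nagios plugin range syntax, where N: raises the alert when the value drops below N. That suits a free-space check, whereas the plain -w 45 -c 55 on the temperature commands alerts when the value rises above the threshold. A second disk or volume only needs another service block with a different number after the !.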