Nagios: Difference between revisions


Revision as of 08:12, 31 August 2011

Path / Command | Description
/etc/nagios3/conf.d | Config files
/etc/nagios-plugins/config | Plugin commands
/usr/lib/nagios/plugins | Plugin executables
nagios3 -v /etc/nagios3/nagios.cfg | Config check
service nagios3 restart | Restart service (reloads config)

./usr/share/nagios ./usr/lib/nagios ./var/lib/nagios


define service{
       use                             generic-service         ; Inherit default values from a template
       hostgroup_name                  zimbra-servers
       service_description             IMAP
       check_command                   check_imap
       }

define service{
       use                             generic-service         ; Inherit default values from a template
       hostgroup_name                  zimbra-servers
       service_description             SMTP
       check_command                   check_smtp
       }

# check that MySQL services are up

define service {

       hostgroup_name                  mysql-servers
       service_description             MySQL
       check_command                   check_mysql
       use                             generic-service
       notification_interval           0 ; set > 0 if you want to be renotified

}


define command{

       command_name    check_http_auth
       command_line    /usr/lib/nagios/plugins/check_http -H '$HOSTADDRESS$' -I '$HOSTADDRESS$' -a '$ARG1$'
       } 


define service{

       use                             generic-service         ; Name of service template to use
       host_name                       localhost
       service_description             HTTP
       check_command                   check_http_auth!user:pass  ; Enter actual user/pass
       }


define hostextinfo{

       hostgroup_name   debian-servers
       notes            Debian GNU/Linux servers
#      notes_url        http://webserver.localhost.localdomain/hostinfo.pl?host=netware1
       icon_image       base/debian.png
       icon_image_alt   Debian GNU/Linux
       vrml_image       debian.png
       statusmap_image  base/debian.gd2
       }

define hostextinfo{

       hostgroup_name   ubuntu-servers
       notes            Ubuntu servers
       icon_image       base/ubuntu.png
       icon_image_alt   Ubuntu
       vrml_image       ubuntu.png
       statusmap_image  base/ubuntu.gd2
       }


Create SNMP Checks

This section walks through creating SNMP checks, using my QNAP NAS as the example device.

Define OIDs to Poll

Before you start, you need to know which SNMP OIDs you want to poll and what their values should be. For common devices and metrics a Google search or two is often enough, but it doesn't take much before you need to dig a bit deeper.

When it comes to investigating which OIDs you can poll on a specific device, your friend is GetIf (http://www.wtcs.org/snmp4tpc/getif.htm).

Having downloaded the MIB and done some probing with GetIf, I've decided I need to monitor the following OIDs...

OID | Description | Example Return Data
.1.3.6.1.4.1.24681.1.2.6.0 | System Temperature | 41 C/105 F
.1.3.6.1.4.1.24681.1.2.17.1.6.1 | System Volume 1 Status | Ready
.1.3.6.1.4.1.24681.1.2.17.1.5.1 | System Volume 1 Space | 1.74 TB
.1.3.6.1.4.1.24681.1.2.11.1.7.1 | Physical Disk 1 SMART Status | GOOD
.1.3.6.1.4.1.24681.1.2.11.1.3.1 | Physical Disk 1 Temperature | 35 C/95 F

Create Commands

Each type of check needs a command defined for it. Where several checks will be similar (e.g. the status of disk 1, disk 2, etc.), the command can take arguments whose values are supplied later, when the checks themselves are defined. I created a new file called /etc/nagios3/conf.d/commands_qnap.cfg and added the following...

define command{
        command_name    check_qnap_sys_temp
        command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.6.0 -w 45 -c 55 -l Temp -u C
        }
  • -H '$HOSTADDRESS$' - A standard macro available to all check commands; Nagios substitutes the device's address
  • -o .1.3.6.1.4.1.24681.1.2.6.0 - The SNMP OID being checked
    • .iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemTemperature.0
  • -w 45 - The warning threshold
  • -c 55 - The critical threshold
  • -l Temp - A label for the check (appears in the check's Status Information column in the Nagios display)
  • -u C - The units of the metric being checked (appears in the check's Status Information column in the Nagios display)
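
To illustrate how the -w and -c thresholds above turn a reading into a Nagios state, here is a minimal Python sketch. It is not how check_snmp is implemented. It assumes the first number in the returned string (41 in "41 C/105 F") is the value that gets compared, and the function names first_number and temp_state are hypothetical.

```python
import re

# Nagios plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL
OK, WARNING, CRITICAL = 0, 1, 2


def first_number(snmp_value):
    """Return the first number found in an SNMP string value.

    Assumption: this is the figure the thresholds are compared against.
    """
    match = re.search(r"-?\d+(?:\.\d+)?", snmp_value)
    if match is None:
        raise ValueError("no numeric value in %r" % snmp_value)
    return float(match.group())


def temp_state(snmp_value, warn=45, crit=55):
    """Map a temperature reading to a state using simple upper thresholds."""
    value = first_number(snmp_value)
    if value > crit:
        return CRITICAL
    if value > warn:
        return WARNING
    return OK


print(temp_state("41 C/105 F"))  # below both thresholds
print(temp_state("50 C/122 F"))  # above the warning threshold only
```

The critical threshold is tested first so that a reading above both limits reports the more severe state.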


define command{
        command_name    check_qnap_sysvol_status
        command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.17.1.6.$ARG1$ -l "Volume Status"
        }
  • -o .1.3.6.1.4.1.24681.1.2.17.1.6.$ARG1$ - The SNMP OID being checked. $ARG1$ is an argument placeholder, so if I had more than one volume I could repeat the check for volume 1, 2, etc. without creating a separate check command for each.
    • .iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemVolumeTable.SysVolumeEntry.SysVolumeStatus.$ARG1$
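
As a rough illustration of the substitution, the Python sketch below mimics how the macros are filled in: a service's check_command value is split on "!", the first part names the command, and the remaining parts become $ARG1$, $ARG2$ and so on. The function name expand_command and the address 192.0.2.10 are placeholders for illustration, not part of the original setup, and this is only a sketch of the idea rather than Nagios's own code.

```python
def expand_command(command_line, host_address, check_command):
    """Substitute $HOSTADDRESS$ and $ARGn$ macros into a command line.

    check_command is the service's value, e.g. "check_qnap_sysvol_status!1";
    everything after the command name, split on "!", becomes $ARG1$, $ARG2$...
    """
    name, *args = check_command.split("!")
    expanded = command_line.replace("$HOSTADDRESS$", host_address)
    for index, arg in enumerate(args, start=1):
        expanded = expanded.replace("$ARG%d$" % index, arg)
    return name, expanded


template = ("/usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' "
            "-o .1.3.6.1.4.1.24681.1.2.17.1.6.$ARG1$ -l \"Volume Status\"")

# 192.0.2.10 is a placeholder address for illustration only
for volume in ("1", "2"):
    _, line = expand_command(template, "192.0.2.10",
                             "check_qnap_sysvol_status!" + volume)
    print(line)
```

One command definition therefore serves every volume; only the argument after the "!" changes.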


define command{
        command_name    check_qnap_sysvol_space
        command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.17.1.5.$ARG1$ -w $ARG2$: -c $ARG3$: -l "Volume Space" -u TB
        }
  • -o .1.3.6.1.4.1.24681.1.2.17.1.5.$ARG1$ - The SNMP OID being checked. As above, $ARG1$ is an argument placeholder, so if I had more than one volume I could repeat the check for volume 1, 2, etc. without creating a separate check command for each.
    • .iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemVolumeTable.SysVolumeEntry.SysVolumeFreeSize.$ARG1$
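
The trailing colon in -w $ARG2$: and -c $ARG3$: matters for a free-space check. Assuming check_snmp follows the standard Nagios plugin threshold range format, "N:" means "alert when the value falls below N", whereas a bare "N" means "alert when the value rises above N". The Python sketch below models only the "N", "N:" and "N:M" forms; the function names and the 0.5 and 0.2 TB thresholds are hypothetical.

```python
def outside_range(value, spec):
    """Return True if value should raise an alert for a threshold spec.

    Handles only "N" (alert below 0 or above N), "N:" (alert below N)
    and "N:M" (alert below N or above M).
    """
    if ":" in spec:
        low_text, high_text = spec.split(":", 1)
        low = float(low_text)
        high = float(high_text) if high_text else float("inf")
    else:
        low, high = 0.0, float(spec)
    return value < low or value > high


def space_state(free_tb, warn_spec, crit_spec):
    """Map free space (in TB) to a state name, checking critical first."""
    if outside_range(free_tb, crit_spec):
        return "CRITICAL"
    if outside_range(free_tb, warn_spec):
        return "WARNING"
    return "OK"


# Hypothetical thresholds: warn below 0.5 TB free, critical below 0.2 TB free
for free in (1.74, 0.4, 0.1):
    print(free, space_state(free, "0.5:", "0.2:"))
```

Without the colon, the same numbers would be treated as upper limits and a healthy volume with plenty of free space would raise an alert.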


define command{
        command_name    check_qnap_disk_status
        command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.11.1.7.$ARG1$ -l "SMART Info State"
        }


define command{
        command_name    check_qnap_disk_temp
        command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.11.1.3.$ARG1$ -w 45 -c 55 -l Temp -u C
        }