Difference between revisions of "Nagios"

4,891 bytes added ,  15:01, 30 August 2011
{|cellpadding="4" cellspacing="0" border="1"
|- style="background-color:#bbddff;"
! Path  !! Description
|-
| <code> /etc/nagios3/conf.d </code>  || Config files
|-
| <code> /etc/nagios-plugins/config </code>  || Plugin commands
|-
| <code> /usr/lib/nagios/plugins </code>  || Plugin executables
|-
| <code> nagios3 -v /etc/nagios3/nagios.cfg </code>  || Config check
|-
| <code> service nagios3 restart </code>  || Restart service (reloads config)
|}


./usr/share/nagios
         statusmap_image  base/ubuntu.gd2
         }
=== Create SNMP Checks ===
Everything here creates various checks for my '''QNAP NAS''', which I've used as an example.
==== Define OIDs to Poll ====
Before you start you need to know which SNMP OIDs you want to poll, and what their values should be.  For common devices and metrics you can often get by with a Google search or two, but it doesn't take much before you need to get a bit more involved.
When it comes to investigating which OIDs you can poll for a specific device, your friend is [http://www.wtcs.org/snmp4tpc/getif.htm GetIf].
By way of example, my QNAP setup checks the temperatures of the disks and system, the status of the disks and volume, and the space on the volume (service-level checks for things like FTP access aren't done by SNMP). Having downloaded the MIB and done some probing with GetIf, I've decided I need to monitor the following OIDs...
{|cellpadding="4" cellspacing="0" border="1"
|- style="background-color:#bbddff;"
! OID  !! Description    !! Example Return Data
|-
| <code> .1.3.6.1.4.1.24681.1.2.6.0 </code>  || System Temperature || <code> 41 C/105 F </code>
|-
| <code> .1.3.6.1.4.1.24681.1.2.17.1.6.1 </code>  || System Volume 1 Status || <code> Ready </code>
|-
| <code> .1.3.6.1.4.1.24681.1.2.17.1.5.1 </code>  || System Volume 1 Space || <code> 1.74 TB </code>
|-
| <code> .1.3.6.1.4.1.24681.1.2.11.1.7.1 </code>  || Physical Disk 1 SMART Status || <code> GOOD </code>
|-
| <code> .1.3.6.1.4.1.24681.1.2.11.1.3.1 </code>  || Physical Disk 1 Temperature || <code> 35 C/95 F </code>
|}
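Note that several of these OIDs return text with units attached rather than a bare number, while warning and critical thresholds are numeric. The following is only a rough sketch of how the leading number could be pulled out of such a value; it is not how <code>check_snmp</code> itself is implemented, and the function name <code>leading_number</code> is made up for illustration.

```python
import re

# Example return strings copied from the OID table above.
SAMPLES = ["41 C/105 F", "Ready", "1.74 TB", "GOOD", "35 C/95 F"]


def leading_number(value):
    """Return the number at the start of an SNMP string value, or None."""
    match = re.match(r"\s*(-?\d+(?:\.\d+)?)", value)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    for sample in SAMPLES:
        # Status strings such as "Ready" have no number, so they print None.
        print(f"{sample!r:14} -> {leading_number(sample)}")
```

Values with no leading number (the two status OIDs) can only be compared as text, which is why the status commands further down carry no <code>-w</code> or <code>-c</code> thresholds.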
==== Create Commands ====
Each type of check needs a command defined for it. Where several checks will be similar (e.g. the status of disk 1, disk 2, etc.), the command can take arguments, and the values for those arguments are supplied later on when the individual checks are defined.  I created a new file called <code>/etc/nagios3/conf.d/commands_qnap.cfg</code> and added the following...
define command{
        command_name    check_qnap_sys_temp
        command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.6.0 -w 45 -c 55 -l Temp -u C
        }
* <code> -H '$HOSTADDRESS$' </code> - A standard macro available to all check commands; Nagios substitutes the device's IP address
* <code> -o .1.3.6.1.4.1.24681.1.2.6.0 </code> - The SNMP OID being checked
** <code>.iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemTemperature.0</code>
* <code> -w 45 </code> - The warning threshold
* <code> -c 55 </code> - The critical threshold
* <code> -l Temp </code> - A label for the check (appears in the check's Status Information column in the Nagios display)
* <code> -u C </code> - The units of the metric being checked (appears in the check's Status Information column in the Nagios display)
define command{
        command_name    check_qnap_sysvol_status
        command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.17.1.6.$ARG1$ -l "Volume Status"
        }
* <code> -o .1.3.6.1.4.1.24681.1.2.17.1.6.$ARG1$ </code> - The SNMP OID being checked. <code>$ARG1$</code> is an argument macro that stands in for the volume number, so that if I had more than one volume I could repeat the check for volume 1, 2, etc. without creating a separate check command for each.
** <code>.iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemVolumeTable.SysVolumeEntry.SysVolumeStatus.$ARG1$</code>
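To make the substitution concrete, here is a simplified sketch of how the macros in a <code>command_line</code> get filled in. It assumes the usual <code>name!arg1!arg2</code> form that a check takes when it references a command. The function <code>expand_command</code> and the address <code>192.0.2.10</code> are placeholders for illustration only; Nagios does this expansion itself, with more macros and more care than shown here.

```python
import re


def expand_command(command_line, check_command, host_address):
    """Substitute $HOSTADDRESS$ and $ARGn$ macros into a command_line.

    check_command uses the "name!arg1!arg2" form; everything after the
    first "!" becomes $ARG1$, $ARG2$ and so on.
    """
    _name, *args = check_command.split("!")
    macros = {"HOSTADDRESS": host_address}
    for position, value in enumerate(args, start=1):
        macros[f"ARG{position}"] = value
    # Macros this sketch does not know about are left untouched.
    return re.sub(
        r"\$(\w+)\$",
        lambda m: macros.get(m.group(1), m.group(0)),
        command_line,
    )


COMMAND_LINE = (
    "/usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' "
    "-o .1.3.6.1.4.1.24681.1.2.17.1.6.$ARG1$ -l \"Volume Status\""
)

if __name__ == "__main__":
    # 192.0.2.10 is a placeholder address, not a real NAS.
    for volume in ("1", "2"):
        check = f"check_qnap_sysvol_status!{volume}"
        print(expand_command(COMMAND_LINE, check, "192.0.2.10"))
```

The same command definition therefore produces a different OID for each volume, with only the argument after the <code>!</code> changing.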
define command{
        command_name    check_qnap_sysvol_space
        command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.17.1.5.$ARG1$ -w $ARG2$: -c $ARG3$: -l "Volume Space" -u TB
        }
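* <code> -o .1.3.6.1.4.1.24681.1.2.17.1.5.$ARG1$ </code> - The SNMP OID being checked. As above, <code>$ARG1$</code> stands in for the volume number, so that if I had more than one volume I could repeat the check for volume 1, 2, etc. without creating a separate check command for each.
** <code>.iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemVolumeTable.SysVolumeEntry.SysVolumeFreeSize.$ARG1$</code>
* <code> -w $ARG2$: -c $ARG3$: </code> - The warning and critical thresholds, passed in as arguments. The trailing colon makes each value a lower bound, so the check alerts when the free space drops below it rather than when it rises above it.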
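The difference between a threshold written as <code>45</code> and one written as <code>$ARG2$:</code> is easier to see in code. This sketch follows the range notation described in the Nagios plugin development guidelines (a bare number means "alert outside 0 to N", a trailing colon means "alert below N"). It is an illustration only, the <code>check_snmp</code> version you have installed may differ in edge cases, and the space thresholds <code>0.5</code> and <code>0.2</code> are invented values.

```python
import math


def parse_range(spec):
    """Parse a threshold such as "45", "0.5:", "~:10" or "@10:20".

    Returns (low, high, inside). The check alerts when the value is
    outside [low, high], or inside it when the spec starts with "@".
    """
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low_text, high_text = spec.split(":", 1)
        if low_text == "~":
            low = -math.inf
        elif low_text == "":
            low = 0.0
        else:
            low = float(low_text)
        high = float(high_text) if high_text else math.inf
    else:
        # A bare number is shorthand for the range 0..N.
        low, high = 0.0, float(spec)
    return low, high, inside


def alerts(value, spec):
    """Return True if value should raise an alert for this threshold."""
    low, high, inside = parse_range(spec)
    in_range = low <= value <= high
    return in_range if inside else not in_range


def state(value, warning, critical):
    """Map a value to OK, WARNING or CRITICAL; critical takes priority."""
    if alerts(value, critical):
        return "CRITICAL"
    if alerts(value, warning):
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Temperature style thresholds (-w 45 -c 55): alert when too high.
    for temp in (41, 50, 60):
        print(temp, state(temp, "45", "55"))
    # Free space style thresholds (-w 0.5: -c 0.2:): alert when too low.
    for free_tb in (1.74, 0.3, 0.1):
        print(free_tb, state(free_tb, "0.5:", "0.2:"))
```

For a "lower is worse" metric like free space, the warning value needs to be the larger of the two numbers, which is the opposite way round from the temperature checks.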
define command{
        command_name    check_qnap_disk_status
        command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.11.1.7.$ARG1$ -l "SMART Info State"
        }
define command{
        command_name    check_qnap_disk_temp
        command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.11.1.3.$ARG1$ -w 45 -c 55 -l Temp -u C
        }
[[Category:Nagios]]
[[Category:SNMP]]
[[Category:QNAP]]
