Difference between revisions of "Nagios"

1,745 bytes added ,  14:54, 3 October 2013
m
Reverted edits by Ipodsoft (talk) to last revision by Sstrutt
(Added "Check Tuning" and Meta)
m (Reverted edits by Ipodsoft (talk) to last revision by Sstrutt)
 
(3 intermediate revisions by 2 users not shown)
Line 46: Line 46:
|-
|-
| <code> .1.3.6.1.4.1.24681.1.2.17.1.5.1 </code>  || System Volume 1 Space || <code> 1.74 TB </code>
| <code> .1.3.6.1.4.1.24681.1.2.17.1.5.1 </code>  || System Volume 1 Space || <code> 1.74 TB </code>
|-
| <code> .1.3.6.1.4.1.24681.1.2.11.1.4.1 </code>  || Physical Disk 1 Status || <code> ready </code>
|-
|-
| <code> .1.3.6.1.4.1.24681.1.2.11.1.7.1 </code>  || Physical Disk 1 SMART Status || <code> GOOD </code>
| <code> .1.3.6.1.4.1.24681.1.2.11.1.7.1 </code>  || Physical Disk 1 SMART Status || <code> GOOD </code>
Line 58: Line 60:
I created a new file, called <code>/etc/nagios3/conf.d/commands_qnap.cfg</code> and added the following...
I created a new file, called <code>/etc/nagios3/conf.d/commands_qnap.cfg</code> and added the following...


==== System Temperature ====
  define command{
  define command{
         command_name    check_qnap_sys_temp
         command_name    check_qnap_sys_temp
Line 70: Line 73:
* <code> -u C </code> - The units of the metric being checked (appears in the check's Status Information column in Nagios display)
* <code> -u C </code> - The units of the metric being checked (appears in the check's Status Information column in Nagios display)


 
==== Volume Status ====
  define command{
  define command{
         command_name    check_qnap_sysvol_status
         command_name    check_qnap_sysvol_status
Line 79: Line 82:
* <code> -r "Ready" </code> - The text expected back from the poll, anything else causes a critical error
* <code> -r "Ready" </code> - The text expected back from the poll, anything else causes a critical error


 
==== Volume Space ====
  define command{
  define command{
         command_name    check_qnap_sysvol_space
         command_name    check_qnap_sysvol_space
Line 89: Line 92:
** <code>.iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemVolumeTable.SysVolumeEntry.SysVolumeFreeSize.$ARG1$</code>
** <code>.iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemVolumeTable.SysVolumeEntry.SysVolumeFreeSize.$ARG1$</code>


==== Disk Status ====
define command{
        command_name    check_qnap_disk_status
        command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.11.1.4.$ARG1$ -m /etc/nagios3/mibs/QNAP-NAS.mib -l "Disk Status" -r 0
        }
* <code> -o .1.3.6.1.4.1.24681.1.2.11.1.4.$ARG1$ </code> - The SNMP OID being checked.  As above, <code>$ARG1$</code> is used as a command parameter so that I can create separate checks for the individual disks without creating a separate check command for each.
** <code>.iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemHdTable.HdEntry.HdStatus.$ARG1$</code>
* <code> -m /etc/nagios3/mibs/QNAP-NAS.mib </code> - Path to the QNAP MIB file.  The value returned is an integer, 0 for ready/good, a negative value for a fault.  In order to translate the value (e.g. <code>-9</code>) to its actual meaning (e.g. <code>rwError</code>), Nagios needs access to the MIB file.  You will need to download it from your NAS (from the Network Services | SNMP Settings page), and copy it to the path indicated on your Nagios server.
* <code> -r 0 </code> - The data expected back from the poll.  0 maps to <code>ready</code>; anything else causes a critical error


==== Disk SMART Status ====
  define command{
  define command{
         command_name    check_qnap_disk_status
         command_name    check_qnap_disk_smart_status
         command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.11.1.7.$ARG1$ -l "SMART Info State" -r "GOOD"
         command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.11.1.7.$ARG1$ -l "SMART Info State" -r "GOOD"
         }
         }
Line 98: Line 111:
* <code> -r "GOOD" </code> - The text expected back from the poll, anything else causes a critical error
* <code> -r "GOOD" </code> - The text expected back from the poll, anything else causes a critical error


==== Disk Temperature ====
  define command{
  define command{
         command_name    check_qnap_disk_temp
         command_name    check_qnap_disk_temp
Line 104: Line 118:
* <code> -o .1.3.6.1.4.1.24681.1.2.11.1.3.$ARG1$ </code> - The SNMP OID being checked, as above $ARG1$ is used as a command parameter so that I can create separate checks for the individual disks without creating a separate check command for each.
* <code> -o .1.3.6.1.4.1.24681.1.2.11.1.3.$ARG1$ </code> - The SNMP OID being checked, as above $ARG1$ is used as a command parameter so that I can create separate checks for the individual disks without creating a separate check command for each.
** <code>.iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemHdTable.HdEntry.HdTemperature.$ARG1$</code>
** <code>.iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemHdTable.HdEntry.HdTemperature.$ARG1$</code>


=== Create Services ===
=== Create Services ===
Line 144: Line 157:
         service_description    Status Disk 1
         service_description    Status Disk 1
         check_command          check_qnap_disk_status!1
         check_command          check_qnap_disk_status!1
        }
define service{
        use                    generic-service
        hostgroup_name          qnap-nas
        service_description    SMART Disk 1
        check_command          check_qnap_disk_smart_status!1
         }
         }
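
Because the disk number is passed as <code>$ARG1$</code>, the same two commands can be reused for further disks by changing the value after the <code>!</code>.  A minimal sketch for a second disk, assuming the NAS exposes it at SNMP index 2 (the service descriptions are illustrative)...
  # Assumption: the second physical disk is reported at index 2 of the QNAP disk table
  define service{
          use                    generic-service
          hostgroup_name          qnap-nas
          service_description    Status Disk 2
          check_command          check_qnap_disk_status!2
          }
  define service{
          use                    generic-service
          hostgroup_name          qnap-nas
          service_description    SMART Disk 2
          check_command          check_qnap_disk_smart_status!2
          }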


Line 309: Line 329:


== NRPE ==
== NRPE ==
The Nagios Remote Plugin Executor allows Nagios checks to be completed on remote servers in a similar fashion to performing checks on the Nagios server.  Whilst it's not always necessary, as many remote checks can be performed by probing remotely accessible services such as SNMP or HTTP, there are times when such checks are not suitable, for example...
The '''Nagios Remote Plugin Executor''' allows Nagios checks to be completed on remote servers in a similar fashion to performing checks on the Nagios server.  Whilst it's not always necessary, as many remote checks can be performed by probing remotely accessible services (such as SNMP or HTTP), there are times when such checks are not suitable, for example...
* Running checks that aren't easily achievable via SNMP
* Running checks that aren't easily achievable via SNMP
* Checking services such as MySQL that should only be accessible local to the server
* Checking local services such as MySQL that aren't accessible remotely from the server
* Running HTTP checks to test your web servers from more than one location
* Running HTTP checks to test your web servers from more than one location
** e.g. local to the server to ensure the web server itself is OK, and remotely to check that access is likely to be OK for global users
** e.g. local to the server to ensure the web server itself is OK, and remotely to check that access is likely to be OK for global users
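
On the Nagios server, an NRPE-based check is defined much like the SNMP checks above, except that the plugin asks the remote NRPE daemon to run a named command and relays its result.  A minimal sketch, assuming the <code>check_nrpe</code> plugin is installed alongside the other plugins, that the remote NRPE configuration defines a command called <code>check_load</code>, and that a host called <code>remote-server</code> exists (the command name <code>check_nrpe_cmd</code>, the host name and the service description are hypothetical, not taken from this page)...
  # Assumption: check_nrpe lives in the same plugin directory as check_snmp
  define command{
          command_name    check_nrpe_cmd
          command_line    /usr/lib/nagios/plugins/check_nrpe -H '$HOSTADDRESS$' -c '$ARG1$'
          }
  # Assumption: "remote-server" is an existing host and check_load is defined in its NRPE config
  define service{
          use                    generic-service
          host_name              remote-server
          service_description    Load
          check_command          check_nrpe_cmd!check_load
          }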


The NRPE server that runs on remote monitored machines does require quite a few additional packages to be installed (see below for a non-exhaustive list), and if you are concerned you try the alternative approach of getting data back from your remote server via SNMP as described in this example [[#Ubuntu_Software_Updates_Monitor|Ubuntu Software Updates Monitor]].  This can make for a more lightweight solution, but will require you to write your own monitoring scripts to be called by the SNMP daemon. Swings and roundabouts.
The NRPE server that runs on remote monitored machines does require quite a few additional packages to be installed (see below for a non-exhaustive list), and if that concerns you, you can try the alternative approach of getting data back from your remote server via SNMP as described in this example [[#Ubuntu_Software_Updates_Monitor|Ubuntu Software Updates Monitor]].  This can make for a more lightweight solution, but will require you to write your own monitoring scripts to be called by the SNMP daemon.
 
Additional packages required by NRPE...
* mysql-common
* mysql-common
* radiusclient1
* radiusclient1
Line 322: Line 344:
* snmp
* snmp


=== Setup ===
The procedures below will get NRPE running to monitor disk space, load and MySQL service availability on a remote server.
The procedures below will get NRPE running to monitor disk space, load and MySQL service availability on a remote server.

