(Added "Check Tuning" and Meta) |
|||
(3 intermediate revisions by 2 users not shown) | |||
Line 46: | Line 46: | ||
|- | |- | ||
| <code> .1.3.6.1.4.1.24681.1.2.17.1.5.1 </code> || System Volume 1 Space || <code> 1.74 TB </code> | | <code> .1.3.6.1.4.1.24681.1.2.17.1.5.1 </code> || System Volume 1 Space || <code> 1.74 TB </code> | ||
|- | |||
| <code> .1.3.6.1.4.1.24681.1.2.11.1.4.1 </code> || Physical Disk 1 Status || <code> ready </code> | |||
|- | |- | ||
| <code> .1.3.6.1.4.1.24681.1.2.11.1.7.1 </code> || Physical Disk 1 SMART Status || <code> GOOD </code> | | <code> .1.3.6.1.4.1.24681.1.2.11.1.7.1 </code> || Physical Disk 1 SMART Status || <code> GOOD </code> | ||
Line 58: | Line 60: | ||
I created a new file, called <code>/etc/nagios3/conf.d/commands_qnap.cfg</code> and added the following... | I created a new file, called <code>/etc/nagios3/conf.d/commands_qnap.cfg</code> and added the following... | ||
==== System Temperature ==== | |||
define command{ | define command{ | ||
command_name check_qnap_sys_temp | command_name check_qnap_sys_temp | ||
Line 70: | Line 73: | ||
* <code> -u C </code> - The units of the metric being checked (appears in the check's Status Information column in Nagios display) | * <code> -u C </code> - The units of the metric being checked (appears in the check's Status Information column in Nagios display) | ||
==== Volume Status ==== | |||
define command{ | define command{ | ||
command_name check_qnap_sysvol_status | command_name check_qnap_sysvol_status | ||
Line 79: | Line 82: | ||
* <code> -r "Ready" </code> - The text expected back from the poll, anything else causes a critical error | * <code> -r "Ready" </code> - The text expected back from the poll, anything else causes a critical error | ||
==== Volume Space ==== | |||
define command{ | define command{ | ||
command_name check_qnap_sysvol_space | command_name check_qnap_sysvol_space | ||
Line 89: | Line 92: | ||
** <code>.iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemVolumeTable.SysVolumeEntry.SysVolumeFreeSize.$ARG1$</code> | ** <code>.iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemVolumeTable.SysVolumeEntry.SysVolumeFreeSize.$ARG1$</code> | ||
==== Disk Status ==== | |||
define command{ | |||
command_name check_qnap_disk_status | |||
command_line /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.11.1.4.$ARG1$ -m /etc/nagios3/mibs/QNAP-NAS.mib -l "Disk Status" -r 0 | |||
} | |||
* <code> -o .1.3.6.1.4.1.24681.1.2.11.1.4.$ARG1$ </code> - The SNMP OID being checked. As above, $ARG1$ is used as a command parameter so that I can create separate checks for the individual disks without creating a separate check command for each. | |||
** <code>.iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemHdTable.HdEntry.HdStatus.$ARG1$</code> | |||
* <code> -m /etc/nagios3/mibs/QNAP-NAS.mib </code> - Path to the QNAP MIB file. The value returned is an integer, 0 for ready/good, a negative value for a fault. In order to translate the value (e.g. <code>-9</code>) to its actual meaning (e.g. <code>rwError</code>), Nagios needs access to the MIB file. You will need to download it from your NAS (from the Network Services | SNMP Settings page), and copy it to the path indicated on your Nagios server. | |||
* <code> -r 0 </code> - The data expected back from the poll. 0 maps to <code>ready</code>; anything else causes a critical error | |||
==== Disk SMART Status ==== | |||
define command{ | define command{ | ||
command_name | command_name check_qnap_disk_smart_status | ||
command_line /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.11.1.7.$ARG1$ -l "SMART Info State" -r "GOOD" | command_line /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.11.1.7.$ARG1$ -l "SMART Info State" -r "GOOD" | ||
} | } | ||
Line 98: | Line 111: | ||
* <code> -r "GOOD" </code> - The text expected back from the poll, anything else causes a critical error | * <code> -r "GOOD" </code> - The text expected back from the poll, anything else causes a critical error | ||
==== Disk Temperature ==== | |||
define command{ | define command{ | ||
command_name check_qnap_disk_temp | command_name check_qnap_disk_temp | ||
Line 104: | Line 118: | ||
* <code> -o .1.3.6.1.4.1.24681.1.2.11.1.3.$ARG1$ </code> - The SNMP OID being checked, as above $ARG1$ is used as a command parameter so that I can create separate checks for the individual disks without creating a separate check command for each. | * <code> -o .1.3.6.1.4.1.24681.1.2.11.1.3.$ARG1$ </code> - The SNMP OID being checked, as above $ARG1$ is used as a command parameter so that I can create separate checks for the individual disks without creating a separate check command for each. | ||
** <code>.iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemHdTable.HdEntry.HdTemperature.$ARG1$</code> | ** <code>.iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemHdTable.HdEntry.HdTemperature.$ARG1$</code> | ||
=== Create Services === | === Create Services === | ||
Line 144: | Line 157: | ||
service_description Status Disk 1 | service_description Status Disk 1 | ||
check_command check_qnap_disk_status!1 | check_command check_qnap_disk_status!1 | ||
} | |||
define service{ | |||
use generic-service | |||
hostgroup_name qnap-nas | |||
service_description SMART Disk 1 | |||
check_command check_qnap_disk_smart_status!1 | |||
} | } | ||
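Because both disk commands take the disk index as <code>$ARG1$</code>, covering a further disk should only need more service definitions, not new commands. The following is a minimal sketch, assuming the NAS has a second disk at index 2; it reuses the <code>generic-service</code> template and <code>qnap-nas</code> hostgroup from the definitions above, and the service descriptions are illustrative only.

```
# Assumption: a second physical disk exists at index 2 in the QNAP disk table
define service{
        use                     generic-service
        hostgroup_name          qnap-nas
        service_description     Status Disk 2
        check_command           check_qnap_disk_status!2
        }

define service{
        use                     generic-service
        hostgroup_name          qnap-nas
        service_description     SMART Disk 2
        check_command           check_qnap_disk_smart_status!2
        }
```

The value after the <code>!</code> is what Nagios substitutes for <code>$ARG1$</code> in the command line, so it ends up as the final element of the OID being polled.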
Line 309: | Line 329: | ||
== NRPE == | == NRPE == | ||
The Nagios Remote Plugin Executor allows Nagios checks to be completed on remote servers in a similar fashion to performing checks on the Nagios server. Whilst it's not always necessary, as many remote checks can be performed by probing remotely accessible services such as SNMP or HTTP, there are times when such checks are not suitable, for example... | The '''Nagios Remote Plugin Executor''' allows Nagios checks to be completed on remote servers in a similar fashion to performing checks on the Nagios server. Whilst it's not always necessary, as many remote checks can be performed by probing remotely accessible services (such as SNMP or HTTP), there are times when such checks are not suitable, for example... | ||
* Running checks that aren't easily achievable via SNMP | * Running checks that aren't easily achievable via SNMP | ||
* Checking services such as MySQL that | * Checking local services such as MySQL that aren't accessible remotely from the server | ||
* Running HTTP checks to test your web servers from more than one location | * Running HTTP checks to test your web servers from more than one location | ||
** e.g. local to the server to ensure the web server itself is OK, and remotely to check that access is likely to be OK for global users | ** e.g. local to the server to ensure the web server itself is OK, and remotely to check that access is likely to be OK for global users | ||
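As a rough illustration of how such a check is wired up on the Nagios server side, the sketch below calls the <code>check_nrpe</code> plugin, which asks the NRPE daemon on the remote host to run one of its locally defined commands and return the result. The plugin path is assumed to match the <code>/usr/lib/nagios/plugins</code> location used elsewhere on this page. The command name <code>check_nrpe_cmd</code>, the hostgroup <code>nrpe-servers</code> and the remote command name <code>check_mysql</code> are hypothetical, and the remote host's NRPE configuration would need a matching command definition.

```
# Assumption: check_nrpe is installed alongside the other Nagios plugins
define command{
        command_name    check_nrpe_cmd
        command_line    /usr/lib/nagios/plugins/check_nrpe -H '$HOSTADDRESS$' -c '$ARG1$'
        }

# Hypothetical service: runs the remote host's "check_mysql" NRPE command
define service{
        use                     generic-service
        hostgroup_name          nrpe-servers
        service_description     MySQL
        check_command           check_nrpe_cmd!check_mysql
        }
```

Passing the remote command name as <code>$ARG1$</code> follows the same pattern as the QNAP disk checks, so one command definition can serve any number of NRPE-based services.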
The NRPE server that runs on remote monitored machines does require quite a few additional packages to be installed (see below for a non-exhaustive list), and if you are concerned you try the alternative approach of getting data back from your remote server via SNMP as described in this example [[#Ubuntu_Software_Updates_Monitor|Ubuntu Software Updates Monitor]]. This can make for a more lightweight solution, but will require you to write your own monitoring scripts to be called by the SNMP daemon. | The NRPE server that runs on remote monitored machines does require quite a few additional packages to be installed (see below for a non-exhaustive list), and if you are concerned you can try the alternative approach of getting data back from your remote server via SNMP as described in this example [[#Ubuntu_Software_Updates_Monitor|Ubuntu Software Updates Monitor]]. This can make for a more lightweight solution, but will require you to write your own monitoring scripts to be called by the SNMP daemon. | ||
Additional packages required by NRPE... | |||
* mysql-common | * mysql-common | ||
* radiusclient1 | * radiusclient1 | ||
Line 322: | Line 344: | ||
* snmp | * snmp | ||
=== Setup === | |||
The procedures below will get NRPE running to monitor disk space, load and MySQL service availability on a remote server. | The procedures below will get NRPE running to monitor disk space, load and MySQL service availability on a remote server. | ||