Nagios is centred around device polling (it can receive SNMP traps, but that's a more advanced feature) and the presentation of state data. The first thing to appreciate, though, is that Nagios doesn't actually do any monitoring; at its core it's a task scheduling and state management engine. It needs third-party '''plugins''', which do the actual monitoring and report the state of the host you're monitoring back to it. There are plugins provided out-of-the-box, which will probably achieve most (if not all) of what you want.
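To make the plugin idea concrete, here is a minimal sketch of the contract between Nagios and a plugin: the plugin prints one line of status text and signals the state through its exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The <code>DEMO</code> label, the function name and the sample thresholds are made up for illustration; a real plugin would poll the host instead of taking a value as a parameter.

```python
# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Return (exit_code, status_line) for a 'higher is worse' metric.

    'value' is a hypothetical reading; None stands for a failed poll.
    """
    if value is None:
        return UNKNOWN, "DEMO UNKNOWN - no reading available"
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    return code, f"DEMO {STATE_NAMES[code]} - value={value}"


# Example with an illustrative reading of 42 against thresholds 50 / 60
code, line = evaluate(42, warn=50, crit=60)
print(line)
```

A real plugin would finish by calling <code>sys.exit(code)</code>, as the exit code is what Nagios uses to set the state of the check.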
== Terminology ==
* '''command''' - A command is a command-line call of a plugin with one or more parameters, which defines how you might use a plugin to test a host.
* '''service''' - A service is something that you care about on a host, that you want to test (eg web server response, ping, disk space, CPU,
== Useful Paths etc ==
| <code> service nagios3 restart </code> || Restart service (reloads config - will fail if config is invalid!)
|}
== Create SNMP Checks ==
Everything here creates various checks for my '''QNAP NAS''', which I've used as an example.
=== Define OID's to Poll ===
Having downloaded the MIB and done some probing with GetIf, I've decided I need to monitor the following OIDs...
{|class="vwikitable"
I created a new file, called <code>/etc/nagios3/conf.d/commands_qnap.cfg</code>, and added the following...
==== System Temperature ====
* <code> -l Temp </code> - A label for the check (appears in the check's Status Information column in the Nagios display)
* <code> -u C </code> - The units of the metric being checked (appears in the check's Status Information column in the Nagios display)
==== Volume Status ====
** <code>.iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemVolumeTable.SysVolumeEntry.SysVolumeStatus.$ARG1$</code>
* <code> -r "Ready" </code> - The text expected back from the poll; anything else causes a critical error
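The behaviour of the <code>-r</code> option described above can be sketched as follows. This is only an illustration of the logic (match means OK, anything else is critical), not the plugin's actual code; the function name and the sample strings other than "Ready" are assumptions.

```python
import re

OK, CRITICAL = 0, 2  # Nagios exit codes for the two outcomes


def check_string(polled_value, expected_pattern):
    """Return OK if the pattern is found in the polled text, else CRITICAL."""
    if re.search(expected_pattern, polled_value):
        return OK
    return CRITICAL


# "Degraded" is an illustrative value, not taken from the QNAP MIB
print(check_string("Ready", "Ready"))
print(check_string("Degraded", "Ready"))
```

Because the pattern is treated as a regular expression, it only needs to match part of the returned text.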
==== Volume Space ====
* <code> -c $ARG2$: </code> - The critical threshold. Defining it as a command parameter allows me to alter the service threshold without altering the command definition. The trailing <code> : </code> makes it a ''should be more than'' check rather than the normal ''should be less than'' check.
** <code>.iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemVolumeTable.SysVolumeEntry.SysVolumeFreeSize.$ARG1$</code>
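The effect of the trailing colon is easier to see with a small sketch of the standard Nagios threshold range format (<code>[@]start:end</code>), where an alert is raised when the value falls outside the range. This is a simplified illustration written for this page, not the plugin's own parser, and the sample values are arbitrary.

```python
import math


def parse_range(spec):
    """Parse a Nagios-style threshold range into (start, end, inside)."""
    inside = spec.startswith("@")  # "@" inverts: alert when INSIDE the range
    if inside:
        spec = spec[1:]
    if ":" in spec:
        start_s, end_s = spec.split(":", 1)
    else:
        start_s, end_s = "0", spec  # a bare "10" means the range 0..10
    if start_s == "~":
        start = -math.inf           # "~" means negative infinity
    elif start_s == "":
        start = 0.0
    else:
        start = float(start_s)
    end = float(end_s) if end_s else math.inf  # "10:" means 10..infinity
    return start, end, inside


def alerts(value, spec):
    """Return True if 'value' should raise an alert for threshold 'spec'."""
    start, end, inside = parse_range(spec)
    in_range = start <= value <= end
    return in_range if inside else not in_range


print(alerts(5, "10:"))   # free space of 5 against a "should be more than 10" check
print(alerts(15, "10"))   # 15 against a "should be less than 10" check
```

So a threshold of <code>10</code> alerts when the value rises above 10, whereas <code>10:</code> alerts when it drops below 10, which is what you want for free space.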
==== Disk Status ====
** <code>.iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemHdTable.HdEntry.HdSmartInfo.$ARG1$</code>
* <code> -r "GOOD" </code> - The text expected back from the poll; anything else causes a critical error
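The <code>$ARG1$</code> placeholder in the OID is filled in from the service's <code>check_command</code> line, where arguments are separated from the command name by <code>!</code>. The following is a rough sketch of that substitution, assuming a made-up command line template (<code>example.oid</code> is a placeholder, not a real OID); Nagios does this itself, so the code is purely to show the mechanics.

```python
import re


def expand_command(command_line, check_command):
    """Expand $ARGn$ macros using a 'name!arg1!arg2' check_command value."""
    name, *args = check_command.split("!")

    def replace(match):
        index = int(match.group(1))
        # In this sketch, arguments that were not supplied expand to nothing
        return args[index - 1] if 1 <= index <= len(args) else ""

    return name, re.sub(r"\$ARG(\d+)\$", replace, command_line)


# Hypothetical command line template; only the check_command form mirrors the page
template = "check_snmp -o example.oid.$ARG1$ -r GOOD"
print(expand_command(template, "check_qnap_disk_temp!1"))
```

This is why one command definition can serve every disk or volume: each service just passes a different index after the <code>!</code>.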
==== Disk Temperature ====
check_command check_qnap_disk_temp!1
}
In general it's better to make such changes to generic templates, which can then be applied to one or more service checks. You can then make changes centrally, rather than going round and updating individual services. Templates can be daisy-chained so that subsequent templates override or add to config (see http://nagios.sourceforge.net/docs/3_0/objectinheritance.html for further info).
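The daisy-chaining can be pictured with a small sketch of how a chain of <code>use</code> references resolves, with the more specific definition winning. This is a deliberate simplification (single parent only, no additive <code>+</code> values), and all the template names and settings below are hypothetical rather than taken from my config.

```python
def resolve(name, objects):
    """Merge a definition with the chain of templates named by 'use'."""
    obj = objects[name]
    parent = obj.get("use")
    # Start from the fully resolved parent, then let local values override it
    merged = dict(resolve(parent, objects)) if parent else {}
    merged.update({k: v for k, v in obj.items() if k != "use"})
    return merged


# Hypothetical templates: each level overrides or adds to the one before
templates = {
    "generic-service": {"check_interval": "5", "max_check_attempts": "4"},
    "slow-service": {"use": "generic-service", "check_interval": "60"},
    "qnap-volume": {"use": "slow-service", "service_description": "Volume Space"},
}

print(resolve("qnap-volume", templates))
```

Changing <code>check_interval</code> in the middle template would then alter every service that uses it, without touching the services themselves.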
=== Check Frequency ===
check_command check_wib_svc
}
== Ubuntu Software Updates Monitor ==
=== SNMP Based (Michal Ludvig) ===
'''The check script that is called by SNMP doesn't work! I've left this here for the time being as the remote SNMP exec mechanism does work, and I expect to use it at some point. When I do, I'll remove this, and document that instead.'''
notification_interval 0 ; set > 0 if you want to be renotified
}
== NRPE ==
* smbclient
* snmp
=== Setup ===
notification_interval 0 ; set > 0 if you want to be renotified
}
== Web Site Content and Response Time Monitoring ==
Therefore I took one that almost did, <code>[http://exchange.nagios.org/directory/Plugins/Websites%2C-Forms-and-Transactions/check_http_content/details check_http_content]</code>, and modified it to match my requirements (which I'll upload to the exchange once I've got it working with the <code>Nagios::Plugin</code> Perl module), and called it <code>[http://dl.sandfordit.com/scripts/check_url_content check_url_content]</code> (for the time being it's available via the previous link).
=== Script Options ===
| Host (optional when Username specified), should be in the following format 'www.domain.com:443'
|}
=== Examples ===