Nagios
Revision as of 10:07, 31 August 2011
Path / Command | Description |
---|---|
/etc/nagios3/conf.d | Config files |
/etc/nagios-plugins/config | Plugin commands |
/usr/lib/nagios/plugins | Plugin executables |
nagios3 -v /etc/nagios3/nagios.cfg | Config check |
service nagios3 restart | Restart service (reloads config) |
./usr/share/nagios ./usr/lib/nagios ./var/lib/nagios
define service{
use generic-service ; Inherit default values from a template
hostgroup_name zimbra-servers
service_description IMAP
check_command check_imap
}
define service{
use generic-service ; Inherit default values from a template
hostgroup_name zimbra-servers
service_description SMTP
check_command check_smtp
}
# check that MySQL services are up
define service {
hostgroup_name mysql-servers
service_description MySQL
check_command check_mysql
use generic-service
notification_interval 0 ; set > 0 if you want to be renotified
}
define command{
command_name check_http_auth
command_line /usr/lib/nagios/plugins/check_http -H '$HOSTADDRESS$' -I '$HOSTADDRESS$' -a '$ARG1$'
}
define service{
use generic-service ; Name of service template to use
host_name localhost
service_description HTTP
check_command check_http_auth!user:pass ; Enter actual user/pass
}
define hostextinfo{
hostgroup_name debian-servers
notes Debian GNU/Linux servers
icon_image base/debian.png
icon_image_alt Debian GNU/Linux
vrml_image debian.png
statusmap_image base/debian.gd2
}
define hostextinfo{
hostgroup_name ubuntu-servers
notes Ubuntu servers
icon_image base/ubuntu.png
icon_image_alt Ubuntu
vrml_image ubuntu.png
statusmap_image base/ubuntu.gd2
}
Create SNMP Checks
This section creates various checks for my QNAP NAS, which I've used as the example throughout.
Define OIDs to Poll
Before you start you need to know which SNMP OIDs you want to poll, and what their values should be. For common devices and metrics a Google search or two is often enough, but it doesn't take much before you need to dig a bit deeper.
When it comes to investigating which OIDs you can poll on a specific device, GetIf is your friend.
Having downloaded the MIB and done some probing with GetIf, I decided I need to monitor the following OIDs...
OID | Description | Example Return Data |
---|---|---|
.1.3.6.1.4.1.24681.1.2.6.0 | System Temperature | 41 C/105 F |
.1.3.6.1.4.1.24681.1.2.17.1.6.1 | System Volume 1 Status | Ready |
.1.3.6.1.4.1.24681.1.2.17.1.5.1 | System Volume 1 Space | 1.74 TB |
.1.3.6.1.4.1.24681.1.2.11.1.7.1 | Physical Disk 1 SMART Status | GOOD |
.1.3.6.1.4.1.24681.1.2.11.1.3.1 | Physical Disk 1 Temperature | 35 C/95 F |
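The numeric OIDs in the table line up with the symbolic names listed in the command notes further down: each numeric component is a node in the MIB tree, and for table columns such as HdTemperature the final component is the row index (disk 1, volume 1 and so on). The sketch below translates the OIDs using only the node names that appear on this page. QNAP_NODES and translate are hypothetical helpers, not part of any SNMP library; a real tool (GetIf, or net-snmp's snmptranslate with the QNAP MIB loaded) reads the names from the MIB itself.

```python
# Numeric prefix for .iso.org.dod.internet.private.enterprises
ENTERPRISES = ".1.3.6.1.4.1"
ENTERPRISES_NAME = ".iso.org.dod.internet.private.enterprises"

# Only the QNAP nodes named on this page, keyed by their numeric path below
# the enterprises node. This is a hand-made subset, not the full QNAP MIB.
QNAP_NODES = {
    (24681,): "storage",
    (24681, 1): "storageSystem",
    (24681, 1, 2): "SystemInfo",
    (24681, 1, 2, 6): "SystemTemperature",
    (24681, 1, 2, 11): "SystemHdTable",
    (24681, 1, 2, 11, 1): "HdEntry",
    (24681, 1, 2, 11, 1, 3): "HdTemperature",
    (24681, 1, 2, 11, 1, 7): "HdSmartInfo",
    (24681, 1, 2, 17): "SystemVolumeTable",
    (24681, 1, 2, 17, 1): "SysVolumeEntry",
    (24681, 1, 2, 17, 1, 5): "SysVolumeFreeSize",
    (24681, 1, 2, 17, 1, 6): "SysVolumeStatus",
}


def translate(oid: str) -> str:
    """Turn a numeric QNAP OID into its symbolic form.

    Components with no entry in QNAP_NODES, such as a table row index
    (disk 1, volume 1) or the .0 suffix of a scalar, stay numeric.
    """
    if not oid.startswith(ENTERPRISES + "."):
        raise ValueError(f"OID is not under {ENTERPRISES}: {oid}")
    parts = [int(p) for p in oid[len(ENTERPRISES) + 1:].split(".")]
    names = [QNAP_NODES.get(tuple(parts[: i + 1]), str(parts[i]))
             for i in range(len(parts))]
    return ENTERPRISES_NAME + "." + ".".join(names)


print(translate(".1.3.6.1.4.1.24681.1.2.11.1.3.1"))
# .iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemHdTable.HdEntry.HdTemperature.1
```

Changing the last component (for example .3.2 instead of .3.1) selects the same column for the next row, which is why the commands below take the index as an argument.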
Create Commands
Each type of check needs a command defined for it, which is where the SNMP OID to be checked is specified. Commands are not specific to a particular host, so they can be run against any system for which the check is valid. There is some flexibility here: if you have several similar checks (e.g. the status of disk 1, disk 2 and so on), you can add arguments to the command whose values are supplied later, in the service definition.
I created a new file, called /etc/nagios3/conf.d/commands_qnap.cfg, and added the following...
define command{
command_name check_qnap_sys_temp
command_line /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.6.0 -w 45 -c 55 -l Temp -u C
}
- -H '$HOSTADDRESS$' - This is a standard macro for all check commands; Nagios substitutes the device's IP address (the address from its host definition)
- -o .1.3.6.1.4.1.24681.1.2.6.0 - The SNMP OID being checked
  - .iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemTemperature.0
- -w 45 - The warning threshold
- -c 55 - The critical threshold
- -l Temp - A label for the check (appears in the check's Status Information column in the Nagios display)
- -u C - The units of the metric being checked (appears in the check's Status Information column in the Nagios display)
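Nagios takes a check's state from the plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below models what -w 45 -c 55 mean for the system temperature. It rests on two assumptions: that the reply string (41 C/105 F in the table above) is compared using its leading number, via a hypothetical leading_number helper, and that a bare threshold such as 45 follows the Nagios plugin range convention, where it means the range 0 to 45 and anything above it (or below zero) alerts. It is a model for understanding the thresholds, not check_snmp's own code.

```python
import re
from typing import Optional

# Exit codes Nagios uses to decide a check's state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def leading_number(reply: str) -> Optional[float]:
    """Return the first number in a reply such as '41 C/105 F'.

    Assumption: the leading (Celsius) figure is what gets compared.
    """
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    return float(match.group()) if match else None


def temp_state(reply: str, warn: float = 45, crit: float = 55) -> int:
    """Model -w 45 -c 55, where a bare N means the range 0..N."""
    value = leading_number(reply)
    if value is None:
        return UNKNOWN
    # Critical is tested first so the more severe state wins.
    if value < 0 or value > crit:
        return CRITICAL
    if value > warn:
        return WARNING
    return OK
```

With these defaults, temp_state("41 C/105 F") gives 0 (OK), a 50 C reading gives 1 (WARNING) and a 60 C reading gives 2 (CRITICAL).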
define command{
command_name check_qnap_sysvol_status
command_line /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.17.1.6.$ARG1$ -l "Volume Status"
}
- -o .1.3.6.1.4.1.24681.1.2.17.1.6.$ARG1$ - The SNMP OID being checked. $ARG1$ is a command argument, so if I had more than one volume I could repeat the check for volume 1, 2, etc. without creating a separate check command for each.
  - .iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemVolumeTable.SysVolumeEntry.SysVolumeStatus.$ARG1$
define command{
command_name check_qnap_sysvol_space
command_line /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.17.1.5.$ARG1$ -w $ARG2$: -c $ARG3$: -l "Volume Space" -u TB
}
- -o .1.3.6.1.4.1.24681.1.2.17.1.5.$ARG1$ - The SNMP OID being checked. As above, $ARG1$ is a command argument, so if I had more than one volume I could repeat the check for volume 1, 2, etc. without creating a separate check command for each.
  - .iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemVolumeTable.SysVolumeEntry.SysVolumeFreeSize.$ARG1$
- -w $ARG2$: - The warning threshold. Defining it as a command argument allows me to alter the service threshold without altering the command definition. The trailing : makes it a "should be more than" check rather than the normal "should be less than" check.
- -c $ARG3$: - The critical threshold, defined as a command argument for the same reason and with the same trailing : so that it is also a "should be more than" check.
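The trailing colon comes from the Nagios plugin threshold range syntax, in which N: means "alert if the value is below N". A simplified parser for that syntax follows; Range, parse_range and evaluate are hypothetical names, and the real plugins handle details this sketch skips. It reproduces the volume space service defined later (warning below 0.5 TB, critical below 0.25 TB).

```python
import math
from dataclasses import dataclass


@dataclass
class Range:
    start: float
    end: float
    inside: bool  # True for '@' ranges: alert when the value is inside

    def alerts(self, value: float) -> bool:
        within = self.start <= value <= self.end  # both ends inclusive
        return within if self.inside else not within


def parse_range(spec: str) -> Range:
    """Parse a Nagios plugin threshold range (simplified).

    '10'     alert if value < 0 or > 10
    '10:'    alert if value < 10
    '~:10'   alert if value > 10
    '10:20'  alert if value < 10 or > 20
    '@10:20' alert if 10 <= value <= 20
    """
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low, high = spec.split(":", 1)
        start = -math.inf if low == "~" else float(low or 0)
        end = math.inf if high == "" else float(high)
    else:
        start, end = 0.0, float(spec)
    return Range(start, end, inside)


def evaluate(value: float, warn: str, crit: str) -> int:
    """Return 2 (CRITICAL), 1 (WARNING) or 0 (OK); critical wins."""
    if parse_range(crit).alerts(value):
        return 2
    if parse_range(warn).alerts(value):
        return 1
    return 0
```

With the service's values, evaluate(1.74, ".5:", ".25:") is 0, evaluate(0.4, ".5:", ".25:") is 1 and evaluate(0.2, ".5:", ".25:") is 2.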
define command{
command_name check_qnap_disk_status
command_line /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.11.1.7.$ARG1$ -l "SMART Info State"
}
- -o .1.3.6.1.4.1.24681.1.2.11.1.7.$ARG1$ - The SNMP OID being checked. Similar to above, $ARG1$ is used as a command parameter so that I can create separate checks for the individual disks without creating a separate check command for each.
  - .iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemHdTable.HdEntry.HdSmartInfo.$ARG1$
define command{
command_name check_qnap_disk_temp
command_line /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -o .1.3.6.1.4.1.24681.1.2.11.1.3.$ARG1$ -w 45 -c 55 -l Temp -u C
}
- -o .1.3.6.1.4.1.24681.1.2.11.1.3.$ARG1$ - The SNMP OID being checked. As above, $ARG1$ is used as a command parameter so that I can create separate checks for the individual disks without creating a separate check command for each.
  - .iso.org.dod.internet.private.enterprises.storage.storageSystem.SystemInfo.SystemHdTable.HdEntry.HdTemperature.$ARG1$
Create Services
A service applies a generic check command to a specific set of hosts, with specific parameters. For example, you could define two separate disk space services that use the same command definition but have different alerting thresholds, depending on your requirements.
Services need to be defined with...
- hostgroup_name - The hostgroup defines which servers the service check is applied to. For a host to be checked for the service it needs to be a member of the hostgroup; see Create Hostgroup below for further info.
- service_description - A name for the service check; this is what is displayed in the Service column of the Nagios display.
- check_command - The command (and its arguments, if any) used to perform the check.
I created a new file, called /etc/nagios3/conf.d/services_qnap.cfg, in which to add service definitions, examples of which are below...
define service{
use generic-service
hostgroup_name qnap-nas
service_description Temp Sys
check_command check_qnap_sys_temp
}
define service{
use generic-service
hostgroup_name qnap-nas
service_description Status SysVol 1
check_command check_qnap_sysvol_status!1
}
- Note the !1 at the end of the command, which passes a parameter of 1 (i.e. the 1st volume) to the command
define service{
use generic-service
hostgroup_name qnap-nas
service_description Space SysVol 1
check_command check_qnap_sysvol_space!1!.5!.25
}
- Note the !1!.5!.25 at the end of the command, which passes parameters for volume 1, a warning threshold of .5 TB and a critical threshold of .25 TB to the command
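In check_command, the text before the first ! names the command and each !-separated value after it fills $ARG1$, $ARG2$ and so on, while $HOSTADDRESS$ comes from the host's address. The sketch below models that substitution for the volume space service, using the address from the host definition at the end of this page. expand and COMMANDS are hypothetical names; Nagios's real macro processing covers many more macros and edge cases.

```python
# The command_line from check_qnap_sysvol_space, copied from above.
COMMANDS = {
    "check_qnap_sysvol_space": (
        "/usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' "
        "-o .1.3.6.1.4.1.24681.1.2.17.1.5.$ARG1$ "
        "-w $ARG2$: -c $ARG3$: -l \"Volume Space\" -u TB"
    ),
}


def expand(check_command: str, host_address: str, commands=COMMANDS) -> str:
    """Model how a service's check_command becomes a command line."""
    name, *args = check_command.split("!")
    line = commands[name].replace("$HOSTADDRESS$", host_address)
    # The values after each '!' fill $ARG1$, $ARG2$, ... in order.
    for n, value in enumerate(args, start=1):
        line = line.replace(f"$ARG{n}$", value)
    return line


print(expand("check_qnap_sysvol_space!1!.5!.25", "192.168.1.200"))
# /usr/lib/nagios/plugins/check_snmp -H '192.168.1.200' -o .1.3.6.1.4.1.24681.1.2.17.1.5.1 -w .5: -c .25: -l "Volume Space" -u TB
```

A second volume with tighter thresholds would only need another service with, say, check_qnap_sysvol_space!2!1!.5; the command definition stays the same.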
define service{
use generic-service
hostgroup_name qnap-nas
service_description Status Disk 1
check_command check_qnap_disk_status!1
}
define service{
use generic-service
hostgroup_name qnap-nas
service_description Temp Disk 1
check_command check_qnap_disk_temp!1
}
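Because the disk commands take the disk number as $ARG1$, adding services for more bays just repeats the two disk blocks above with a different number. A small hypothetical generator could produce that text for services_qnap.cfg; service_block and disk_services are illustrative names, and the disk count of 4 in the usage line is an assumption, so use however many bays your NAS has.

```python
def service_block(description: str, check_command: str,
                  hostgroup: str = "qnap-nas") -> str:
    """Render one 'define service' block like the examples above."""
    return (
        "define service{\n"
        "    use                     generic-service\n"
        f"    hostgroup_name          {hostgroup}\n"
        f"    service_description     {description}\n"
        f"    check_command           {check_command}\n"
        "    }\n"
    )


def disk_services(disk_count: int) -> str:
    """Status and temperature services for disks 1..disk_count."""
    blocks = []
    for disk in range(1, disk_count + 1):
        blocks.append(service_block(f"Status Disk {disk}",
                                    f"check_qnap_disk_status!{disk}"))
        blocks.append(service_block(f"Temp Disk {disk}",
                                    f"check_qnap_disk_temp!{disk}"))
    return "\n".join(blocks)


print(disk_services(4))
```

The generated definitions would still need the config check (nagios3 -v /etc/nagios3/nagios.cfg) and a restart before Nagios picks them up.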
Create Hostgroup
The hostgroup definition lets you group one or more hosts together so that service checks can be run against them. Above, I created services that apply to hosts in the qnap-nas hostgroup, so I can add my NAS server to this hostgroup in order for it to be monitored (hostgroup definitions are normally found in /etc/nagios3/conf.d/hostgroups_nagios2.cfg).
define hostgroup {
hostgroup_name qnap-nas
alias QNAP NAS
members nas
}
If I wanted to monitor more than one NAS I could just add further members (comma separated, no spaces). Note that any hosts specified in a hostgroup must themselves have a host definition (normally found in /etc/nagios3/conf.d/hosts.cfg), for example...
define host{
use generic-host
host_name nas
alias NAS
address 192.168.1.200
}
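To see how the pieces fit together, the sketch below works out which service checks each host receives through its hostgroups, and refuses a hostgroup member that has no host definition, as the note above requires. It is a simplified model with hypothetical names (parse_members, services_for_hosts), not how Nagios itself reads its configuration; members are split on commas following the no-spaces convention above.

```python
def parse_members(members: str) -> list:
    """Split a hostgroup 'members' value, e.g. 'nas,nas2'."""
    return [name for name in members.split(",") if name]


def services_for_hosts(hosts: dict, hostgroups: dict, services: list) -> dict:
    """Map each host to the service descriptions it gets via hostgroups.

    hosts:      {host_name: address}, i.e. the 'define host' entries
    hostgroups: {hostgroup_name: members string}
    services:   [(hostgroup_name, service_description), ...]
    """
    members = {group: parse_members(m) for group, m in hostgroups.items()}
    # Every hostgroup member needs its own host definition.
    for group, names in members.items():
        missing = [n for n in names if n not in hosts]
        if missing:
            raise ValueError(f"hostgroup {group} lists undefined hosts: {missing}")
    checks = {}
    for group, description in services:
        for host in members.get(group, []):
            checks.setdefault(host, []).append(description)
    return checks


hosts = {"nas": "192.168.1.200"}
hostgroups = {"qnap-nas": "nas"}
services = [("qnap-nas", "Temp Sys"), ("qnap-nas", "Status SysVol 1")]
print(services_for_hosts(hosts, hostgroups, services))
# {'nas': ['Temp Sys', 'Status SysVol 1']}
```

Adding a second NAS to the members list without a matching host definition raises an error in this model, mirroring the requirement above.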