1# Service Monitoring <a id="service-monitoring"></a>
2
3The power of Icinga 2 lies in its modularity. There are thousands of
4community plugins available next to the standard plugins provided by
5the [Monitoring Plugins project](https://www.monitoring-plugins.org).
6
7Start your research on [Icinga Exchange](https://exchange.icinga.com)
and check which services are already [covered](05-service-monitoring.md#service-monitoring-overview).
9
10The [requirements chapter](05-service-monitoring.md#service-monitoring-requirements) guides you
11through the plugin setup, tests and their integration with an [existing](05-service-monitoring.md#service-monitoring-plugin-checkcommand)
12or [new](05-service-monitoring.md#service-monitoring-plugin-checkcommand-new) CheckCommand object
13and host/service objects inside the [Director](05-service-monitoring.md#service-monitoring-plugin-checkcommand-integration-director)
14or [Icinga config files](05-service-monitoring.md#service-monitoring-plugin-checkcommand-integration-config-files).
15It also adds hints on [modifying](05-service-monitoring.md#service-monitoring-plugin-checkcommand-modify) existing commands.
16
Plugins follow the [Plugin API specification](05-service-monitoring.md#service-monitoring-plugin-api),
which this chapter enriches with examples and code to get you started with
[your own plugin](05-service-monitoring.md#service-monitoring-plugin-new).
20
21
22
23## Requirements <a id="service-monitoring-requirements"></a>
24
25### Plugins <a id="service-monitoring-plugins"></a>
26
27All existing Icinga or Nagios plugins work with Icinga 2. Community
28plugins can be found for example on [Icinga Exchange](https://exchange.icinga.com).
29
30The recommended way of setting up these plugins is to copy them
31into the `PluginDir` directory.
32
33If you have plugins with many dependencies, consider creating a
34custom RPM/DEB package which handles the required libraries and binaries.
35
36Configuration management tools such as Puppet, Ansible, Chef or Saltstack
37also help with automatically installing the plugins on different
38operating systems. They can also help with installing the required
39dependencies, e.g. Python libraries, Perl modules, etc.
40
41### Plugin Setup <a id="service-monitoring-plugins-setup"></a>
42
Good plugins provide installation and configuration instructions
in their docs and/or README on GitHub.
45
46Sometimes dependencies are not listed, or your distribution differs from the one
47described. Try running the plugin after setup and [ensure it works](05-service-monitoring.md#service-monitoring-plugins-it-works).
48
49#### Ensure it works <a id="service-monitoring-plugins-it-works"></a>
50
51Prior to using the check plugin with Icinga 2 you should ensure that it is working properly
52by trying to run it on the console using whichever user Icinga 2 is running as:
53
54RHEL/CentOS/Fedora
55
56```bash
57sudo -u icinga /usr/lib64/nagios/plugins/check_mysql_health --help
58```
59
60Debian/Ubuntu
61
62```bash
63sudo -u nagios /usr/lib/nagios/plugins/check_mysql_health --help
64```
65
66Additional libraries may be required for some plugins. Please consult the plugin
67documentation and/or the included README file for installation instructions.
Sometimes plugins contain hard-coded paths to other components. Instead of changing
the plugin, which would be overwritten during the next update, it might be easier
to create a symbolic link.
71
72Sometimes there are plugins which do not exactly fit your requirements.
73In that case you can modify an existing plugin or just write your own.
74
75#### Plugin Dependency Errors <a id="service-monitoring-plugins-setup-dependency-errors"></a>
76
77Plugins can be scripts (Shell, Python, Perl, Ruby, PHP, etc.)
78or compiled binaries (C, C++, Go).
79
These scripts/binaries may require additional libraries
which must be installed on every system where they are executed.
82
83> **Tip**
84>
> Don't test the plugins on your master instance. Instead, test them
> on the satellites and clients which execute the checks.
88
89There are errors, now what? Typical errors are missing libraries,
90binaries or packages.
91
92##### Python Example
93
94Example for a Python plugin which uses the `tinkerforge` module
95to query a network service:
96
97```
98ImportError: No module named tinkerforge.ip_connection
99```
100
101Its [documentation](https://github.com/NETWAYS/check_tinkerforge#installation)
102points to installing the `tinkerforge` Python module.
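A plugin can also catch such import errors itself and print a readable one-line message instead of a traceback. The following is a minimal sketch, not taken from check_tinkerforge; the helper name `require_module` is hypothetical, and treating a missing dependency as `Unknown` (exit code 3) is an assumption based on the status table later in this chapter.

```python
import importlib
import sys


def require_module(name):
    """Import a module by name, or exit with the UNKNOWN state (3) if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError as err:
        # Print a short, single-line hint instead of a Python traceback.
        print("UNKNOWN - Missing Python module '%s': %s" % (name, err))
        sys.exit(3)
```

At the top of the plugin, `ip_connection = require_module("tinkerforge.ip_connection")` would then either return the module or terminate with a clear Unknown state.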
103
104##### Perl Example
105
106Example for a Perl plugin which uses SNMP:
107
108```
109Can't locate Net/SNMP.pm in @INC (you may need to install the Net::SNMP module)
110```
111
112Prior to installing the Perl module via CPAN, look for a distribution
113specific package, e.g. `libnet-snmp-perl` on Debian/Ubuntu or `perl-Net-SNMP`
114on RHEL/CentOS.
115
116
117#### Optional: Custom Path <a id="service-monitoring-plugins-custom-path"></a>
118
If you are not using the default `PluginDir` directory, you
can create a custom plugin directory and a matching constant,
and reference the constant in your CheckCommand objects.
122
123Create a common directory e.g. `/opt/monitoring/plugins`
124and install the plugin there.
125
126```bash
127mkdir -p /opt/monitoring/plugins
128cp check_snmp_int.pl /opt/monitoring/plugins
129chmod +x /opt/monitoring/plugins/check_snmp_int.pl
130```
131
132Next create a new global constant, e.g. `CustomPluginDir`
133in your [constants.conf](04-configuration.md#constants-conf)
134configuration file:
135
136```
137vim /etc/icinga2/constants.conf
138
139const PluginDir = "/usr/lib/nagios/plugins"
140const CustomPluginDir = "/opt/monitoring/plugins"
141```
142
143### CheckCommand Definition <a id="service-monitoring-plugin-checkcommand"></a>
144
145Each plugin requires a [CheckCommand](09-object-types.md#objecttype-checkcommand) object in your
146configuration which can be used in the [Service](09-object-types.md#objecttype-service) or
147[Host](09-object-types.md#objecttype-host) object definition.
148
149Please check if the Icinga 2 package already provides an
150[existing CheckCommand definition](10-icinga-template-library.md#icinga-template-library).
151
152If that's the case, thoroughly check the required parameters and integrate the check command
153into your host and service objects. Best practice is to run the plugin on the CLI
154with the required parameters first.
155
156Example for database size checks with [check_mysql_health](10-icinga-template-library.md#plugin-contrib-command-mysql_health).
157
158```bash
159/usr/lib64/nagios/plugins/check_mysql_health --hostname '127.0.0.1' --username root --password icingar0xx --mode sql --name 'select sum(data_length + index_length) / 1024 / 1024 from information_schema.tables where table_schema = '\''icinga'\'';' '--name2' 'db_size' --units 'MB' --warning 4096 --critical 8192
160```
161
162The parameter names inside the ITL commands follow the
163`<command name>_<parameter name>` schema.
164
165#### Icinga Director Integration <a id="service-monitoring-plugin-checkcommand-integration-director"></a>
166
167Navigate into `Commands > External Commands` and search for `mysql_health`.
168Select `mysql_health` and navigate into the `Fields` tab.
169
170In order to access the parameters, the Director requires you to first
171define the needed custom data fields:
172
173* `mysql_health_hostname`
174* `mysql_health_username` and `mysql_health_password`
175* `mysql_health_mode`
176* `mysql_health_name`, `mysql_health_name2` and `mysql_health_units`
177* `mysql_health_warning` and `mysql_health_critical`
178
Create a new host template and object where you define generic
settings like `mysql_health_hostname` (if it differs from the host's
`address` attribute), `mysql_health_username` and `mysql_health_password`.
182
Create a new service template for `mysql-health` and set `mysql_health`
as the check command. You can also define a default for `mysql_health_mode`.
185
186Next, create a service apply rule or a new service set which gets assigned
187to matching host objects.
188
189
190#### Icinga Config File Integration <a id="service-monitoring-plugin-checkcommand-integration-config-files"></a>
191
192Create or modify a host object which stores
193the generic database defaults and prepares details
194for a service apply for rule.
195
196```
197object Host "icinga2-master1.localdomain" {
198  check_command = "hostalive"
199  address = "..."
200
201  // Database listens locally, not external
202  vars.mysql_health_hostname = "127.0.0.1"
203
204  // Basic database size checks for Icinga DBs
205  vars.databases["icinga"] = {
206    mysql_health_warning = 4096 //MB
207    mysql_health_critical = 8192 //MB
208  }
209  vars.databases["icingaweb2"] = {
210    mysql_health_warning = 4096 //MB
211    mysql_health_critical = 8192 //MB
212  }
213}
214```
215
The host object already prepares the database details and thresholds
for an advanced [apply for](03-monitoring-basics.md#using-apply-for) rule. The apply rule
below uses conditions to fetch host-specific values, or to set default values otherwise.
219
220```
221apply Service "db-size-" for (db_name => config in host.vars.databases) {
222  check_interval = 1m
223  retry_interval = 30s
224
225  check_command = "mysql_health"
226
227  if (config.mysql_health_username) {
    vars.mysql_health_username = config.mysql_health_username
229  } else {
230    vars.mysql_health_username = "root"
231  }
232  if (config.mysql_health_password) {
    vars.mysql_health_password = config.mysql_health_password
234  } else {
235    vars.mysql_health_password = "icingar0xx"
236  }
237
238  vars.mysql_health_mode = "sql"
239  vars.mysql_health_name = "select sum(data_length + index_length) / 1024 / 1024 from information_schema.tables where table_schema = '" + db_name + "';"
240  vars.mysql_health_name2 = "db_size"
241  vars.mysql_health_units = "MB"
242
243  if (config.mysql_health_warning) {
244    vars.mysql_health_warning = config.mysql_health_warning
245  }
246  if (config.mysql_health_critical) {
247    vars.mysql_health_critical = config.mysql_health_critical
248  }
249
250  vars += config
251}
252```
253
254#### New CheckCommand <a id="service-monitoring-plugin-checkcommand-new"></a>
255
256This chapter describes how to add a new CheckCommand object for a plugin.
257
258Please make sure to follow these conventions when adding a new command object definition:
259
* Use [command arguments](03-monitoring-basics.md#command-arguments) whenever possible. The `command` attribute
  must be an array in `[ ... ]` for shell escaping.
* Define a unique `prefix` for the command's specific arguments. Best practice is to follow this schema:

  ```
  <command name>_<parameter name>
  ```

  That way you can safely set them on host/service level and you'll always know which command they control.
* Use command argument default values, e.g. for thresholds.
* Use [advanced conditions](09-object-types.md#objecttype-checkcommand) like `set_if` definitions.
271
272Before starting with the CheckCommand definition, please check
273the existing objects available inside the ITL. They follow best
274practices and are maintained by developers and our community.
275
276This example picks a new plugin called [check_systemd](https://exchange.icinga.com/joseffriedrich/check_systemd)
277uploaded to Icinga Exchange in June 2019.
278
279First, [install](05-service-monitoring.md#service-monitoring-plugins-setup) the plugin and ensure
280that [it works](05-service-monitoring.md#service-monitoring-plugins-it-works). Then run it with the
281`--help` parameter to see the actual parameters (docs might be outdated).
282
283```
284./check_systemd.py --help
285
286usage: check_systemd.py [-h] [-c SECONDS] [-e UNIT | -u UNIT] [-v] [-V]
287                        [-w SECONDS]
288
289...
290
291optional arguments:
292  -h, --help            show this help message and exit
293  -c SECONDS, --critical SECONDS
294                        Startup time in seconds to result in critical status.
295  -e UNIT, --exclude UNIT
296                        Exclude a systemd unit from the checks. This option
297                        can be applied multiple times. For example: -e mnt-
298                        data.mount -e task.service.
299  -u UNIT, --unit UNIT  Name of the systemd unit that is beeing tested.
300  -v, --verbose         Increase output verbosity (use up to 3 times).
301  -V, --version         show program's version number and exit
302  -w SECONDS, --warning SECONDS
303                        Startup time in seconds to result in warning status.
304```
305
The argument descriptions are important: you create the command
arguments based on them.
308
309> **Tip**
310>
311> When you are using the Director, you can prepare the commands as files
312> e.g. inside the `global-templates` zone. Then run the kickstart wizard
313> again to import the commands as external reference.
314>
315> If you prefer to use the Director GUI/CLI, please apply the steps
316> in the `Add Command` form.
317
318Start with the basic plugin call without any parameters.
319
320```
321object CheckCommand "systemd" { // Plugin name without 'check_' prefix
322  command = [ PluginContribDir + "/check_systemd.py" ] // Use the 'PluginContribDir' constant, see the contributed ITL commands
323}
324```
325
Run a config validation with `icinga2 daemon -C` to see if that works.
327
Next, analyse the plugin parameters. Plugins with a good help output show
optional parameters in square brackets. This is the case for all parameters
of this plugin. If there are required parameters, use the `required` key
inside the argument.
332
333The `arguments` attribute is a dictionary which takes the parameters as keys.
334
335```
336  arguments = {
337    "--unit" = { ... }
338  }
339```
340
If long parameter names are available, prefer them. This increases
readability in both the configuration and the executed command line.
343
The argument value itself is a nested dictionary with additional keys:
345
346* `value` which references the runtime macro string
347* `description` where you copy the plugin parameter help text into
348* `required`, `set_if`, etc. for advanced parameters, check the [CheckCommand object](09-object-types.md#objecttype-checkcommand) chapter.
349
350The runtime macro syntax is required to allow value extraction when
351the command is executed.
352
353> **Tip**
354>
355> Inside the Director, store the new command first in order to
356> unveil the `Arguments` tab.
357
358Best practice is to use the command name as prefix, in this specific
359case e.g. `systemd_unit`.
360
361```
362  arguments = {
363    "--unit" = {
364      value = "$systemd_unit$" // The service parameter would then be defined as 'vars.systemd_unit = "icinga2"'
      description = "Name of the systemd unit that is being tested."
366    }
367    "--warning" = {
368      value = "$systemd_warning$"
369      description = "Startup time in seconds to result in warning status."
370    }
371    "--critical" = {
372      value = "$systemd_critical$"
373      description = "Startup time in seconds to result in critical status."
374    }
375  }
376```
377
This may take a while. Validate the configuration in between until
the CheckCommand definition is done.
380
381Then test and integrate it into your monitoring configuration.
382
383Remember: Do it once and right, and never touch the CheckCommand again.
384Optional arguments allow different use cases and scenarios.
385
386
387Once you have created your really good CheckCommand, please consider
388sharing it with our community by creating a new PR on [GitHub](https://github.com/Icinga/icinga2/blob/master/CONTRIBUTING.md).
389_Please also update the documentation for the ITL._
390
391
392> **Tip**
393>
394> Inside the Director, you can render the configuration in the Deployment
395> section. Extract the static configuration object and use that as a source
396> for sending it upstream.
397
398
399
400#### Modify Existing CheckCommand <a id="service-monitoring-plugin-checkcommand-modify"></a>
401
Sometimes an existing CheckCommand inside the ITL is missing a parameter,
or it sets a default parameter value you don't need.
404
405Instead of copying the entire configuration object, you can import
406an object into another new object.
407
408```
409object CheckCommand "http-custom" {
410  import "http" // Import existing http object
411
412  arguments += { // Use additive assignment to add missing parameters
413    "--key" = {
414      value = "$http_..." // Keep the parameter name the same as with http
415    }
416  }
417
418  // Override default parameters
419  vars.http_address = "..."
420}
421```
422
423This CheckCommand can then be referenced in your host/service object
424definitions.
425
426
427### Plugin API <a id="service-monitoring-plugin-api"></a>
428
429Icinga 2 supports the native plugin API specification from the Monitoring Plugins project.
430It is defined in the [Monitoring Plugins](https://www.monitoring-plugins.org) guidelines.
431
432The Icinga documentation revamps the specification into our
433own guideline enriched with examples and best practices.
434
435#### Output <a id="service-monitoring-plugin-api-output"></a>
436
437The output should be as short and as detailed as possible. The
438most common cases include:
439
440- Viewing a problem list in Icinga Web and dashboards
441- Getting paged about a problem
442- Receiving the alert on the CLI or forwarding it to external (ticket) systems
443
444Examples:
445
446```
<STATUS>: <A short description of what happened>
448
449OK: MySQL connection time is fine (0.0002s)
450WARNING: MySQL connection time is slow (0.5s > 0.1s threshold)
451CRITICAL: MySQL connection time is causing degraded performance (3s > 0.5s threshold)
452```
453
Icinga supports reading multi-line output. Icinga Web only shows
the first line in the listings and the full output in the detail view.
456
457Example for an end2end check with many smaller test cases integrated:
458
459```
460OK: Online banking works.
461Testcase 1: Site reached.
462Testcase 2: Attempted login, JS loads.
463Testcase 3: Login succeeded.
464Testcase 4: View current state works.
465Testcase 5: Transactions fine.
466```
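Producing such output only requires joining the summary line and the detail lines with newlines. A minimal Python sketch; the function name `build_output` is hypothetical:

```python
def build_output(summary, details):
    """Return plugin output: the summary line first, then one line per detail."""
    return "\n".join([summary] + list(details))


testcases = [
    "Testcase 1: Site reached.",
    "Testcase 2: Attempted login, JS loads.",
]
print(build_output("OK: Online banking works.", testcases))
```

Keeping the state and the short summary on the first line ensures the listings in Icinga Web stay readable.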
467
If the extended output should only be visible for testing and not in your monitoring,
it is recommended to implement the `--verbose` plugin parameter to allow
developers and users to debug further. See the [Verbose](05-service-monitoring.md#service-monitoring-plugin-api-verbose)
section for more implementation tips.
472
473> **Tip**
474>
475> More debug output also helps when implementing your plugin.
476>
477> Best practice is to have the plugin parameter and handling implemented first,
478> then add it anywhere you want to see more, e.g. from initial database connections
479> to actual query results.
480
481
482#### Status <a id="service-monitoring-plugin-api-status"></a>
483
484Value | Status    | Description
485------|-----------|-------------------------------
4860     | OK        | The check went fine and everything is considered working.
4871     | Warning   | The check is above the given warning threshold, or anything else is suspicious requiring attention before it breaks.
4882     | Critical  | The check exceeded the critical threshold, or something really is broken and will harm the production environment.
4893     | Unknown   | Invalid parameters, low level resource errors (IO device busy, no fork resources, TCP sockets, etc.) preventing the actual check. Higher level errors such as DNS resolving, TCP connection timeouts should be treated as `Critical` instead. Whenever the plugin reaches its timeout (best practice) it should also terminate with `Unknown`.
490
491Keep in mind that these are service states. Icinga automatically maps
492the [host state](03-monitoring-basics.md#check-result-state-mapping) from the returned plugin states.
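The status values map directly to plugin exit codes. A minimal Python sketch, assuming the hypothetical helper names `State` and `exit_with` (they are not part of any Icinga library):

```python
import sys
from enum import IntEnum


class State(IntEnum):
    """Plugin exit codes as listed in the status table above."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def exit_with(state, message):
    """Print '<STATUS>: <message>' and terminate with the state as exit code."""
    print("%s: %s" % (state.name, message))
    sys.exit(int(state))
```

For example, `exit_with(State.WARNING, "MySQL connection time is slow (0.5s > 0.1s threshold)")` prints the output line and exits with code 1.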
493
494#### Thresholds <a id="service-monitoring-plugin-api-thresholds"></a>
495
A plugin calculates specific values and may decide about the exit state on its own.
This is done with thresholds: warning and critical values which are compared with
the actual value. The result of this comparison determines the exit state.
499
500Imagine the following value and defined thresholds:
501
502```
503ptc_value = 57.8
504
505warning = 50
506critical = 60
507```
508
509Whenever `ptc_value` is higher than warning or critical, it should return
510the appropriate [state](05-service-monitoring.md#service-monitoring-plugin-api-status).
511
512The threshold evaluation order also is important:
513
* Critical thresholds are evaluated first and supersede everything else.
* Warning thresholds are evaluated second.
* If no threshold is matched, return the OK state.
517
518Avoid using hardcoded threshold values in your plugins, always
519add them to the argument parser.
520
521Example for Python:
522
523```python
524import argparse
525import signal
526import sys
527
528if __name__ == '__main__':
529    parser = argparse.ArgumentParser()
530
531    parser.add_argument("-w", "--warning", help="Warning threshold. Single value or range, e.g. '20:50'.")
    parser.add_argument("-c", "--critical", help="Critical threshold. Single value or range, e.g. '25:45'.")
533
534    args = parser.parse_args()
535```
536
537Users might call plugins only with the critical threshold parameter,
538leaving out the warning parameter. Keep this in mind when evaluating
539the thresholds, always check if the parameters have been defined before.
540
541```python
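    # argparse returns strings: convert them before comparing with the value.
    # This simplified example only handles single threshold values, not ranges.
    if args.critical is not None:
        if ptc_value > float(args.critical):
            print("CRITICAL - ...")
            sys.exit(2) # Critical

    if args.warning is not None:
        if ptc_value > float(args.warning):
            print("WARNING - ...")
            sys.exit(1) # Warning

    print("OK - ...")
    sys.exit(0) # OK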
554```
555
556The above is a simplified example for printing the [output](05-service-monitoring.md#service-monitoring-plugin-api-output)
557and using the [state](05-service-monitoring.md#service-monitoring-plugin-api-status)
558as exit code.
559
560Before diving into the implementation, learn more about required
561[performance data metrics](05-service-monitoring.md#service-monitoring-plugin-api-performance-data-metrics)
562and more best practices below.
563
564##### Threshold Ranges <a id="service-monitoring-plugin-api-thresholds-ranges"></a>
565
566Threshold ranges can be used to specify an alert window, e.g. whenever a calculated
567value is between a lower and higher critical threshold.
568
569The schema for threshold ranges looks as follows. The `@` character in square brackets
570is optional.
571
572```
573[@]start:end
574```
575
576There are a few requirements for ranges:
577
578* `start <= end`. Add a check in your code and let the user know about problematic values.
579
580```
58110:20 	# OK
582
58330:10 	# Error
584```
585
586* `start:` can be omitted if its value is 0. This is the default handling for single threshold values too.
587
588```
10 	# Every value > 10 or < 0, outside of 0..10
590```
591
592* If `end` is omitted, assume end is infinity.
593
594```
59510: 	# < 10, outside of 10..∞
596```
597
598* In order to specify negative infinity, use the `~` character.
599
600```
601~:10	# > 10, outside of -∞..10
602```
603
604* Raise alert if value is outside of the defined range.
605
606```
60710:20 	# < 10 or > 20, outside of 10..20
608```
609
610* Start with `@` to raise an alert if the value is **inside** the defined range, inclusive start/end values.
611
612```
613@10:20	# >= 10 and <= 20, inside of 10..20
614```
615
616Best practice is to either implement single threshold values, or fully support ranges.
617This requires parsing the input parameter values, therefore look for existing libraries
618already providing this functionality.
619
620[check_tinkerforge](https://github.com/NETWAYS/check_tinkerforge/blob/master/check_tinkerforge.py)
621implements a simple parser to avoid dependencies.
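The range rules above can be sketched as a small Python parser. This is a minimal illustration, not the check_tinkerforge implementation. The names `ThresholdRange` and `parse_range` are hypothetical, and the values are parsed with `float()`, so the sketch assumes plain numeric input without UoM suffixes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdRange:
    start: float
    end: float
    inside: bool  # True when the range was prefixed with '@'

    def alerts(self, value):
        """Return True if the value should raise an alert for this range."""
        if self.inside:
            # '@start:end': alert inside the range, start/end inclusive.
            return self.start <= value <= self.end
        # 'start:end': alert outside the range.
        return value < self.start or value > self.end


def parse_range(spec):
    """Parse '[@]start:end' where start defaults to 0, '~' is -inf and a missing end is +inf."""
    spec = spec.strip()
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        start_s, end_s = spec.split(":", 1)
    else:
        start_s, end_s = "0", spec  # a single value means 0..value
    if start_s == "~":
        start = -math.inf
    elif start_s == "":
        start = 0.0
    else:
        start = float(start_s)
    end = math.inf if end_s == "" else float(end_s)
    if start > end:
        raise ValueError("invalid range %r: start must be <= end" % spec)
    return ThresholdRange(start, end, inside)
```

A plugin could then check `parse_range(args.critical).alerts(value)` before the warning range, following the evaluation order described earlier.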
622
623
624#### Performance Data Metrics <a id="service-monitoring-plugin-api-performance-data-metrics"></a>
625
626Performance data metrics must be appended to the plugin output with a preceding `|` character.
627The schema is as follows:
628
629```
630<output> | 'label'=value[UOM];[warn];[crit];[min];[max]
631```
632
The label should be enclosed in single quotes. Avoid spaces or special characters such
as `%` in labels, as they could lead to problems with metric receivers such as Graphite.
635
636Labels must not include `'` and `=` characters. Keep the label length as short and unique as possible.
637
638Example:
639
640```
641'load1'=4.7
642```
643
644Values must respect the C/POSIX locale and not implement e.g. German locale for floating point numbers with `,`.
645Icinga sets `LC_NUMERIC=C` to enforce this locale on plugin execution.
646
647##### Unit of Measurement (UOM) <a id="service-monitoring-plugin-api-performance-data-metrics-uom"></a>
648
649```
650'rta'=12.445000ms 'pl'=0%
651```
652
653The UoMs are written as-is into the [core backends](14-features.md#core-backends)
654(IDO, API). I.e. 12.445000ms remain 12.445000ms.
655
656In contrast, the [metric backends](14-features.md#metrics)
657(Graphite, InfluxDB, etc.) get perfdata (including warn, crit, min, max)
658normalized by Icinga. E.g. 12.445000ms become 0.012445 seconds.
659
Some plugins change the UoM for different sizing, e.g. returning the disk usage in MB and later GB
for the same performance data label. Normalizing the values ensures that graphs always look the same.
662
663[Icinga DB](14-features.md#core-backends-icingadb) gets both the as-is and the normalized perfdata.
664
665What metric backends get... | ... from which perfdata UoMs (case-insensitive if possible)
666----------------------------|---------------------------------------
667bytes (B)                   | B, KB, MB, ..., YB, KiB, MiB, ..., YiB
668bits (b)                    | b, kb, mb, ..., yb, kib, mib, ..., yib
669packets                     | packets
670seconds (s)                 | ns, us, ms, s, m, h, d
671percent                     | %
672amperes (A)                 | nA, uA, mA, A, kA, MA, GA, ..., YA
673ohms (O)                    | nO, uO, mO, O, kO, MO, GO, ..., YO
674volts (V)                   | nV, uV, mV, V, kV, MV, GV, ..., YV
675watts (W)                   | nW, uW, mW, W, kW, MW, GW, ..., YW
676ampere seconds (As)         | nAs, uAs, mAs, As, kAs, MAs, GAs, ..., YAs
677ampere seconds              | nAm, uAm, mAm, Am (ampere minutes), kAm, MAm, GAm, ..., YAm
678ampere seconds              | nAh, uAh, mAh, Ah (ampere hours), kAh, MAh, GAh, ..., YAh
679watt hours                  | nWs, uWs, mWs, Ws (watt seconds), kWs, MWs, GWs, ..., YWs
680watt hours                  | nWm, uWm, mWm, Wm (watt minutes), kWm, MWm, GWm, ..., YWm
681watt hours (Wh)             | nWh, uWh, mWh, Wh, kWh, MWh, GWh, ..., YWh
682lumens                      | lm
683decibel-milliwatts          | dBm
684grams (g)                   | ng, ug, mg, g, kg, t
685degrees Celsius             | C
686degrees Fahrenheit          | F
687degrees Kelvin              | K
688liters (l)                  | ml, l, hl
689
690The UoM "c" represents a continuous counter (e.g. interface traffic counters).
691
Unknown UoMs are discarded (as if none was given).
693A value without any UoM may be an integer or floating point number
694for any type (processes, users, etc.).
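To illustrate the normalization for metric backends, here is a minimal Python sketch of the "seconds" row of the table above, so that 12.445000ms becomes 0.012445 seconds. It is not Icinga's implementation. The names `SECONDS_PER_UOM` and `normalize_seconds` are hypothetical, and the lookup is case-sensitive to keep `m` (minutes) and `ms` apart.

```python
# Factors to convert the time UoMs from the table above into seconds.
SECONDS_PER_UOM = {
    "ns": 1e-9,
    "us": 1e-6,
    "ms": 1e-3,
    "s": 1.0,
    "m": 60.0,
    "h": 3600.0,
    "d": 86400.0,
}


def normalize_seconds(value, uom):
    """Convert a time value with its UoM into seconds."""
    try:
        return value * SECONDS_PER_UOM[uom]
    except KeyError:
        raise ValueError("not a time UoM: %r" % uom) from None
```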
695
696##### Thresholds and Min/Max <a id="service-monitoring-plugin-api-performance-data-metrics-thresholds-min-max"></a>
697
Next to the performance data value, warn, crit, min and max can optionally be provided. They must be separated
with the semicolon `;` character and share the same UoM as the performance data value.
700
701```
702$ check_ping -4 -H icinga.com -c '200,15%' -w '100,5%'
703
704PING OK - Packet loss = 0%, RTA = 12.44 ms|rta=12.445000ms;100.000000;200.000000;0.000000 pl=0%;5;15;0
705```
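A plugin can build a metric following this schema with a small helper. The sketch below is a minimal illustration under the rules described above (quoted label, no `'` or `=` in it, C locale numbers); the function name `format_perfdata` is hypothetical. Python's `%s` formatting of numbers does not depend on the locale, so the decimal separator is always `.`.

```python
def format_perfdata(label, value, uom="", warn=None, crit=None, minimum=None, maximum=None):
    """Format one metric as 'label'=value[UOM];[warn];[crit];[min];[max]."""
    if "'" in label or "=" in label:
        raise ValueError("label must not contain ' or = characters: %r" % label)
    fields = ["'%s'=%s%s" % (label, value, uom)]
    fields += ["" if f is None else str(f) for f in (warn, crit, minimum, maximum)]
    # Drop trailing empty fields so that unset values don't leave dangling ';'.
    while len(fields) > 1 and fields[-1] == "":
        fields.pop()
    return ";".join(fields)
```

For example, `format_perfdata("rta", 12.445, "ms", 100, 200, 0)` returns `'rta'=12.445ms;100;200;0`.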
706
707##### Multiple Performance Data Values <a id="service-monitoring-plugin-api-performance-data-metrics-multiple"></a>
708
709Multiple performance data values must be joined with a space character. The below example
710is from the [check_load](10-icinga-template-library.md#plugin-check-command-load) plugin.
711
712```
713load1=4.680;1.000;2.000;0; load5=0.000;5.000;10.000;0; load15=0.000;10.000;20.000;0;
714```
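When testing a plugin, it can help to parse its performance data back into fields. The following Python sketch splits the space-separated metrics with `shlex.split`, which also handles quoted labels containing spaces. The names `parse_perfdata` and `VALUE_RE` are hypothetical; warn/crit stay strings because they may be ranges, and special values such as `U` are not handled.

```python
import re
import shlex

# Numeric value followed by an optional UoM, e.g. "12.445000ms" or "0%".
VALUE_RE = re.compile(r"^([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)(.*)$")


def parse_perfdata(perfdata):
    """Split a perfdata string into a list of metric dictionaries."""
    metrics = []
    for token in shlex.split(perfdata):  # removes the single quotes around labels
        label, _, rest = token.partition("=")
        fields = rest.split(";")
        match = VALUE_RE.match(fields[0])
        if match is None:
            raise ValueError("cannot parse value in %r" % token)
        # Pad warn/crit/min/max to four entries; empty fields become None.
        extra = [f if f else None for f in fields[1:5]]
        extra += [None] * (4 - len(extra))
        warn, crit, minimum, maximum = extra
        metrics.append({
            "label": label,
            "value": float(match.group(1)),
            "uom": match.group(2),
            "warn": warn,
            "crit": crit,
            "min": minimum,
            "max": maximum,
        })
    return metrics
```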
715
716#### Timeout <a id="service-monitoring-plugin-api-timeout"></a>
717
718Icinga has a safety mechanism where it kills processes running for too
719long. The timeout can be specified in [CheckCommand objects](09-object-types.md#objecttype-checkcommand)
720or on the host/service object.
721
722Best practice is to control the timeout in the plugin itself
723and provide a clear message followed by the Unknown state.
724
725Example in Python taken from [check_tinkerforge](https://github.com/NETWAYS/check_tinkerforge/blob/master/check_tinkerforge.py):
726
727```python
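import argparse
import signal
import sys
from functools import partial

STATES = {0: 'OK', 1: 'WARNING', 2: 'CRITICAL', 3: 'UNKNOWN'}

def output(text, code):
    # Simplified stand-in for the output() helper in check_tinkerforge:
    # print '<STATUS> - <text>' and exit with the state as exit code.
    print('%s - %s' % (STATES[code], text))
    sys.exit(code)

def handle_sigalrm(signum, frame, timeout=None):
    # Called once the alarm fires: terminate with the Unknown state.
    output('Plugin timed out after %d seconds' % timeout, 3)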
734
735if __name__ == '__main__':
736    parser = argparse.ArgumentParser()
737    # ... add more arguments
738    parser.add_argument("-t", "--timeout", help="Timeout in seconds (default 10s)", type=int, default=10)
739    args = parser.parse_args()
740
741    signal.signal(signal.SIGALRM, partial(handle_sigalrm, timeout=args.timeout))
742    signal.alarm(args.timeout)
743
744    # ... perform the check and generate output/status
745```
746
747#### Versions <a id="service-monitoring-plugin-api-versions"></a>
748
Plugins should provide a version via the `-V` or `--version` parameter
which is bumped on releases. This helps to identify problems with
outdated or too new versions in community support channels.
752
753Example in Python taken from [check_tinkerforge](https://github.com/NETWAYS/check_tinkerforge/blob/master/check_tinkerforge.py):
754
755```python
756import argparse
757import signal
758import sys
759
760__version__ = '0.9.1'
761
762if __name__ == '__main__':
763    parser = argparse.ArgumentParser()
764
765    parser.add_argument('-V', '--version', action='version', version='%(prog)s v' + sys.modules[__name__].__version__)
766```
767
768#### Verbose <a id="service-monitoring-plugin-api-verbose"></a>
769
770Plugins should provide a verbose mode with `-v` or `--verbose` in order
771to show more detailed log messages. This helps to debug and analyse the
772flow and execution steps inside the plugin.
773
Add the parameter before implementing the check logic in
the plugin.
776
777Example in Python taken from [check_tinkerforge](https://github.com/NETWAYS/check_tinkerforge/blob/master/check_tinkerforge.py):
778
779```python
780import argparse
781import signal
782import sys
783
784if __name__ == '__main__':
785    parser = argparse.ArgumentParser()
786
    parser.add_argument('-v', '--verbose', action='store_true')
    args = parser.parse_args()

    if args.verbose:
        print("Verbose debug output")
791```
792
793
794### Create a new Plugin <a id="service-monitoring-plugin-new"></a>
795
796Sometimes an existing plugin does not satisfy your requirements. You
797can either kindly contact the original author about plans to add changes
798and/or create a patch.
799
800If you just want to format the output and state of an existing plugin
801it might also be helpful to write a wrapper script. This script
802could pass all configured parameters, call the plugin script, parse
803its output/exit code and return your specified output/exit code.
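A minimal sketch of such a wrapper in Python, assuming the wrapped plugin is passed as a command list. `run_wrapped` and `rewrite` are hypothetical names, and the policy in `rewrite` (treating exit codes outside 0..3 as Unknown) is only an example of what a wrapper might change:

```python
import subprocess
import sys


def run_wrapped(command, timeout=30):
    """Run an existing plugin and return its (exit code, output)."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return 3, "UNKNOWN - Wrapped plugin timed out after %d seconds" % timeout
    except OSError as err:
        # Covers e.g. a missing or non-executable plugin binary.
        return 3, "UNKNOWN - Cannot execute wrapped plugin: %s" % err
    return result.returncode, result.stdout.strip()


def rewrite(code, output):
    """Example policy: treat exit codes outside 0..3 as UNKNOWN."""
    if code not in (0, 1, 2, 3):
        return 3, "UNKNOWN - Wrapped plugin returned exit code %d: %s" % (code, output)
    return code, output
```

A wrapper script could then call `code, output = rewrite(*run_wrapped(sys.argv[1:]))`, print `output` and exit with `code`, so that all parameters given to the wrapper are passed on to the original plugin.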
804
On the other hand, plugins for specific services and hardware might not yet
exist.
807
808> **Tip**
809>
810> Watch this presentation from Icinga Camp Berlin to learn more
811> about [How to write checks that don't suck](https://www.youtube.com/watch?v=Ey_APqSCoFQ).
812
813Common best practices:
814
815* Choose the programming language wisely
  * Scripting languages (Bash, Python, Perl, Ruby, PHP, etc.) are easier to write and set up, but their check execution might take longer (e.g. due to the overhead of invoking the script interpreter).
  * Plugins written in C/C++, Go, etc. improve check execution time but may add overhead for installation and packaging.
818* Use a modern VCS such as Git for developing the plugin, e.g. share your plugin on GitHub and let it sync to [Icinga Exchange](https://exchange.icinga.com).
819* **Look into existing plugins endorsed by community members.**
820
821Implementation hints:
822
* Add parameters with key-value pairs to your plugin. They should allow long names (e.g. `--host localhost`) and also short parameters (e.g. `-H localhost`).
  * `-h|--help` should print the version and all details about parameters and runtime invocation. Note: Python's `argparse` module provides this out of the box.
  * `--version` should print the plugin [version](05-service-monitoring.md#service-monitoring-plugin-api-versions).
826* Add a [verbose/debug output](05-service-monitoring.md#service-monitoring-plugin-api-verbose) functionality for detailed on-demand logging.
827* Respect the exit codes required by the [Plugin API](05-service-monitoring.md#service-monitoring-plugin-api).
828* Always add [performance data](05-service-monitoring.md#service-monitoring-plugin-api-performance-data-metrics) to your plugin output.
* Allow users to specify [warning/critical thresholds](05-service-monitoring.md#service-monitoring-plugin-api-thresholds) as parameters.
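
The parameter hints above could be implemented with Python's `argparse` module as in this minimal sketch. The option names `--host`, `--warning` and `--critical`, their defaults and the version string are hypothetical examples, and the thresholds are parsed as plain numbers rather than the full threshold range syntax.

```python
import argparse

__version__ = '0.1.0'  # hypothetical version string


def build_parser():
    """Build an argument parser following the implementation hints (sketch)."""
    parser = argparse.ArgumentParser(
        description='Example check plugin (hypothetical).')
    # Long and short names; argparse adds -h/--help automatically.
    parser.add_argument('-H', '--host', default='localhost',
                        help='host to check (default: %(default)s)')
    parser.add_argument('-w', '--warning', type=float, default=80.0,
                        help='warning threshold (default: %(default)s)')
    parser.add_argument('-c', '--critical', type=float, default=90.0,
                        help='critical threshold (default: %(default)s)')
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='print detailed debug output')
    parser.add_argument('-V', '--version', action='version',
                        version='%(prog)s v' + __version__)
    return parser
```

Calling `build_parser().parse_args()` in the plugin's entry point returns an object with `host`, `warning`, `critical` and `verbose` attributes.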
830
831Example skeleton:
832
833```
834# 1. include optional libraries
835# 2. global variables
836# 3. helper functions and/or classes
837# 4. define timeout condition
838
839if (<timeout_reached>) then
840  print "UNKNOWN - Timeout (...) reached | 'time'=30.0
841endif
842
843# 5. main method
844
845<execute and fetch data>
846
847if (<threshold_critical_condition>) then
848  print "CRITICAL - ... | 'time'=0.1 'myperfdatavalue'=5.0
849  exit(2)
850else if (<threshold_warning_condition>) then
851  print "WARNING - ... | 'time'=0.1 'myperfdatavalue'=3.0
852  exit(1)
853else
854  print "OK - ... | 'time'=0.2 'myperfdatavalue'=1.0
855endif
856```
857
There are various plugin libraries available which also help with
plugin execution and output formatting, for example the Python library
[nagiosplugin](https://pypi.python.org/pypi/nagiosplugin/).
861
862> **Note**
863>
> Make sure to test your plugin thoroughly, including special cases,
> before putting it into production!
866
Once you have finished your plugin, please upload or sync it to [Icinga Exchange](https://exchange.icinga.com/new).
868Thanks in advance!
869
870
871## Service Monitoring Overview <a id="service-monitoring-overview"></a>
872
873The following examples should help you to start implementing your own ideas.
There is a variety of plugins available. This collection is not complete.
If you have any updates, please send a documentation patch upstream.
876
877Please visit our [community forum](https://community.icinga.com) which
878may provide an answer to your use case already. If not, do not hesitate
879to create a new topic.
880
881### General Monitoring <a id="service-monitoring-general"></a>
882
883If the remote service is available (via a network protocol and port),
884and if a check plugin is also available, you don't necessarily need a local client.
885Instead, choose a plugin and configure its parameters and thresholds. The following examples are included in the [Icinga 2 Template Library](10-icinga-template-library.md#icinga-template-library):
886
887* [ping4](10-icinga-template-library.md#plugin-check-command-ping4), [ping6](10-icinga-template-library.md#plugin-check-command-ping6),
888[fping4](10-icinga-template-library.md#plugin-check-command-fping4), [fping6](10-icinga-template-library.md#plugin-check-command-fping6), [hostalive](10-icinga-template-library.md#plugin-check-command-hostalive)
889* [tcp](10-icinga-template-library.md#plugin-check-command-tcp), [udp](10-icinga-template-library.md#plugin-check-command-udp), [ssl](10-icinga-template-library.md#plugin-check-command-ssl)
890* [ntp_time](10-icinga-template-library.md#plugin-check-command-ntp-time)
891
892### Linux Monitoring <a id="service-monitoring-linux"></a>
893
894* [disk](10-icinga-template-library.md#plugin-check-command-disk)
895* [mem](10-icinga-template-library.md#plugin-contrib-command-mem), [swap](10-icinga-template-library.md#plugin-check-command-swap)
896* [procs](10-icinga-template-library.md#plugin-check-command-processes)
897* [users](10-icinga-template-library.md#plugin-check-command-users)
898* [running_kernel](10-icinga-template-library.md#plugin-contrib-command-running_kernel)
899* package management: [apt](10-icinga-template-library.md#plugin-check-command-apt), [yum](10-icinga-template-library.md#plugin-contrib-command-yum), etc.
900* [ssh](10-icinga-template-library.md#plugin-check-command-ssh)
901* performance: [iostat](10-icinga-template-library.md#plugin-contrib-command-iostat), [check_sar_perf](https://github.com/dnsmichi/icinga-plugins/blob/master/scripts/check_sar_perf.py)
902
903### Windows Monitoring <a id="service-monitoring-windows"></a>
904
905* [check_wmi_plus](https://edcint.co.nz/checkwmiplus/)
906* [NSClient++](https://www.nsclient.org) (in combination with the Icinga 2 client and either [check_nscp_api](10-icinga-template-library.md#nscp-check-api) or [nscp-local](10-icinga-template-library.md#nscp-plugin-check-commands) check commands)
* [Icinga 2 Windows Plugins](10-icinga-template-library.md#windows-plugins) (disk, load, memory, network, performance counters, ping, procs, service, swap, updates, uptime, users)
* VBScript and PowerShell scripts
909
910### Database Monitoring <a id="service-monitoring-database"></a>
911
912* MySQL/MariaDB: [mysql_health](10-icinga-template-library.md#plugin-contrib-command-mysql_health), [mysql](10-icinga-template-library.md#plugin-check-command-mysql), [mysql_query](10-icinga-template-library.md#plugin-check-command-mysql-query)
913* PostgreSQL: [postgres](10-icinga-template-library.md#plugin-contrib-command-postgres)
914* Oracle: [oracle_health](10-icinga-template-library.md#plugin-contrib-command-oracle_health)
915* MSSQL: [mssql_health](10-icinga-template-library.md#plugin-contrib-command-mssql_health)
916* DB2: [db2_health](10-icinga-template-library.md#plugin-contrib-command-db2_health)
917* MongoDB: [mongodb](10-icinga-template-library.md#plugin-contrib-command-mongodb)
918* Elasticsearch: [elasticsearch](10-icinga-template-library.md#plugin-contrib-command-elasticsearch)
919* Redis: [redis](10-icinga-template-library.md#plugin-contrib-command-redis)
920
921### SNMP Monitoring <a id="service-monitoring-snmp"></a>
922
923* [Manubulon plugins](10-icinga-template-library.md#snmp-manubulon-plugin-check-commands) (interface, storage, load, memory, process)
924* [snmp](10-icinga-template-library.md#plugin-check-command-snmp), [snmpv3](10-icinga-template-library.md#plugin-check-command-snmpv3)
925
926### Network Monitoring <a id="service-monitoring-network"></a>
927
928* [nwc_health](10-icinga-template-library.md#plugin-contrib-command-nwc_health)
929* [interfaces](10-icinga-template-library.md#plugin-contrib-command-interfaces)
930* [interfacetable](10-icinga-template-library.md#plugin-contrib-command-interfacetable)
931* [iftraffic](10-icinga-template-library.md#plugin-contrib-command-iftraffic), [iftraffic64](10-icinga-template-library.md#plugin-contrib-command-iftraffic64)
932
933### Web Monitoring <a id="service-monitoring-web"></a>
934
935* [http](10-icinga-template-library.md#plugin-check-command-http)
936* [ftp](10-icinga-template-library.md#plugin-check-command-ftp)
937* [webinject](10-icinga-template-library.md#plugin-contrib-command-webinject)
938* [squid](10-icinga-template-library.md#plugin-contrib-command-squid)
939* [apache-status](10-icinga-template-library.md#plugin-contrib-command-apache-status)
940* [nginx_status](10-icinga-template-library.md#plugin-contrib-command-nginx_status)
941* [kdc](10-icinga-template-library.md#plugin-contrib-command-kdc)
942* [rbl](10-icinga-template-library.md#plugin-contrib-command-rbl)
943
944* [Icinga Certificate Monitoring](https://icinga.com/products/icinga-certificate-monitoring/)
945
946### Java Monitoring <a id="service-monitoring-java"></a>
947
948* [jmx4perl](10-icinga-template-library.md#plugin-contrib-command-jmx4perl)
949
950### DNS Monitoring <a id="service-monitoring-dns"></a>
951
952* [dns](10-icinga-template-library.md#plugin-check-command-dns)
953* [dig](10-icinga-template-library.md#plugin-check-command-dig)
954* [dhcp](10-icinga-template-library.md#plugin-check-command-dhcp)
955
956### Backup Monitoring <a id="service-monitoring-backup"></a>
957
958* [check_bareos](https://github.com/widhalmt/check_bareos)
959
960### Log Monitoring <a id="service-monitoring-log"></a>
961
962* [check_logfiles](https://labs.consol.de/nagios/check_logfiles/)
963* [check_logstash](https://github.com/NETWAYS/check_logstash)
964* [check_graylog2_stream](https://github.com/Graylog2/check-graylog2-stream)
965
966### Virtualization Monitoring <a id="service-monitoring-virtualization"></a>
967
#### VMware Monitoring <a id="service-monitoring-virtualization-vmware"></a>
969
970* [Icinga Module for vSphere](https://icinga.com/products/icinga-module-for-vsphere/)
971* [esxi_hardware](10-icinga-template-library.md#plugin-contrib-command-esxi-hardware)
972* [VMware](10-icinga-template-library.md#plugin-contrib-vmware)
973
**Tip**: If you encounter timeouts using the VMware Perl SDK,
check [this blog entry](https://www.claudiokuenzler.com/blog/650/slow-vmware-perl-sdk-soap-request-error-libwww-version).
Ubuntu 16.04 LTS may have trouble with random entropy in Perl, as discussed [here](https://monitoring-portal.org/t/check-vmware-api-slow-when-run-multiple-times/2868).
In that case, [haveged](https://issihosts.com/haveged/) may help.
978
979### SAP Monitoring <a id="service-monitoring-sap"></a>
980
981* [check_sap_health](https://labs.consol.de/nagios/check_sap_health/index.html)
982* [SAP CCMS](https://sourceforge.net/projects/nagios-sap-ccms/)
983
984### Mail Monitoring <a id="service-monitoring-mail"></a>
985
986* [smtp](10-icinga-template-library.md#plugin-check-command-smtp), [ssmtp](10-icinga-template-library.md#plugin-check-command-ssmtp)
987* [imap](10-icinga-template-library.md#plugin-check-command-imap), [simap](10-icinga-template-library.md#plugin-check-command-simap)
988* [pop](10-icinga-template-library.md#plugin-check-command-pop), [spop](10-icinga-template-library.md#plugin-check-command-spop)
989* [mailq](10-icinga-template-library.md#plugin-check-command-mailq)
990
991### Hardware Monitoring <a id="service-monitoring-hardware"></a>
992
993* [hpasm](10-icinga-template-library.md#plugin-contrib-command-hpasm)
994* [ipmi-sensor](10-icinga-template-library.md#plugin-contrib-command-ipmi-sensor)
995
996### Metrics Monitoring <a id="service-monitoring-metrics"></a>
997
998* [graphite](10-icinga-template-library.md#plugin-contrib-command-graphite)
999