# Service Monitoring <a id="service-monitoring"></a>

The power of Icinga 2 lies in its modularity. There are thousands of
community plugins available next to the standard plugins provided by
the [Monitoring Plugins project](https://www.monitoring-plugins.org).

Start your research on [Icinga Exchange](https://exchange.icinga.com)
and see which services are already [covered](05-service-monitoring.md#service-monitoring-overview).

The [requirements chapter](05-service-monitoring.md#service-monitoring-requirements) guides you
through the plugin setup, tests and their integration with an [existing](05-service-monitoring.md#service-monitoring-plugin-checkcommand)
or [new](05-service-monitoring.md#service-monitoring-plugin-checkcommand-new) CheckCommand object
and host/service objects inside the [Director](05-service-monitoring.md#service-monitoring-plugin-checkcommand-integration-director)
or [Icinga config files](05-service-monitoring.md#service-monitoring-plugin-checkcommand-integration-config-files).
It also adds hints on [modifying](05-service-monitoring.md#service-monitoring-plugin-checkcommand-modify) existing commands.

Plugins follow the [Plugin API specification](05-service-monitoring.md#service-monitoring-plugin-api)
which is enriched with examples and code samples to get you started with
[your own plugin](05-service-monitoring.md#service-monitoring-plugin-new).



## Requirements <a id="service-monitoring-requirements"></a>

### Plugins <a id="service-monitoring-plugins"></a>

All existing Icinga or Nagios plugins work with Icinga 2. Community
plugins can be found for example on [Icinga Exchange](https://exchange.icinga.com).

The recommended way of setting up these plugins is to copy them
into the `PluginDir` directory.

If you have plugins with many dependencies, consider creating a
custom RPM/DEB package which handles the required libraries and binaries.

Configuration management tools such as Puppet, Ansible, Chef or Saltstack
also help with automatically installing the plugins on different
operating systems. They can also help with installing the required
dependencies, e.g. Python libraries, Perl modules, etc.

### Plugin Setup <a id="service-monitoring-plugins-setup"></a>

Good plugins provide installation and configuration instructions
in their docs and/or README on GitHub.

Sometimes dependencies are not listed, or your distribution differs from the one
described. Try running the plugin after setup and [ensure it works](05-service-monitoring.md#service-monitoring-plugins-it-works).

#### Ensure it works <a id="service-monitoring-plugins-it-works"></a>

Prior to using the check plugin with Icinga 2 you should ensure that it is working properly
by trying to run it on the console using whichever user Icinga 2 is running as:

RHEL/CentOS/Fedora

```bash
sudo -u icinga /usr/lib64/nagios/plugins/check_mysql_health --help
```

Debian/Ubuntu

```bash
sudo -u nagios /usr/lib/nagios/plugins/check_mysql_health --help
```

Additional libraries may be required for some plugins. Please consult the plugin
documentation and/or the included README file for installation instructions.
Sometimes plugins contain hard-coded paths to other components. Instead of changing
the plugin it might be easier to create a symbolic link to make sure it doesn't get
overwritten during the next update.

Sometimes there are plugins which do not exactly fit your requirements.
In that case you can modify an existing plugin or just write your own.

#### Plugin Dependency Errors <a id="service-monitoring-plugins-setup-dependency-errors"></a>

Plugins can be scripts (Shell, Python, Perl, Ruby, PHP, etc.)
or compiled binaries (C, C++, Go).

These scripts/binaries may require additional libraries
which must be installed on every system where they are executed.

> **Tip**
>
> Don't test the plugins on your master instance, instead
> do that on the satellites and clients which execute the
> checks.

There are errors, now what? Typical errors are missing libraries,
binaries or packages.

##### Python Example

Example for a Python plugin which uses the `tinkerforge` module
to query a network service:

```
ImportError: No module named tinkerforge.ip_connection
```

Its [documentation](https://github.com/NETWAYS/check_tinkerforge#installation)
points to installing the `tinkerforge` Python module.

##### Perl Example

Example for a Perl plugin which uses SNMP:

```
Can't locate Net/SNMP.pm in @INC (you may need to install the Net::SNMP module)
```

Prior to installing the Perl module via CPAN, look for a distribution
specific package, e.g. `libnet-snmp-perl` on Debian/Ubuntu or `perl-Net-SNMP`
on RHEL/CentOS.


#### Optional: Custom Path <a id="service-monitoring-plugins-custom-path"></a>

If you are not using the default `PluginDir` directory, you
can create a custom plugin directory and constant
and reference this in the created CheckCommand objects.

Create a common directory e.g. `/opt/monitoring/plugins`
and install the plugin there.

```bash
mkdir -p /opt/monitoring/plugins
cp check_snmp_int.pl /opt/monitoring/plugins
chmod +x /opt/monitoring/plugins/check_snmp_int.pl
```

Next create a new global constant, e.g. `CustomPluginDir`
in your [constants.conf](04-configuration.md#constants-conf)
configuration file:

```
vim /etc/icinga2/constants.conf

const PluginDir = "/usr/lib/nagios/plugins"
const CustomPluginDir = "/opt/monitoring/plugins"
```

### CheckCommand Definition <a id="service-monitoring-plugin-checkcommand"></a>

Each plugin requires a [CheckCommand](09-object-types.md#objecttype-checkcommand) object in your
configuration which can be used in the [Service](09-object-types.md#objecttype-service) or
[Host](09-object-types.md#objecttype-host) object definition.

Please check if the Icinga 2 package already provides an
[existing CheckCommand definition](10-icinga-template-library.md#icinga-template-library).

If that's the case, thoroughly check the required parameters and integrate the check command
into your host and service objects. Best practice is to run the plugin on the CLI
with the required parameters first.

Example for database size checks with [check_mysql_health](10-icinga-template-library.md#plugin-contrib-command-mysql_health).

```bash
/usr/lib64/nagios/plugins/check_mysql_health --hostname '127.0.0.1' --username root --password icingar0xx --mode sql --name 'select sum(data_length + index_length) / 1024 / 1024 from information_schema.tables where table_schema = '\''icinga'\'';' '--name2' 'db_size' --units 'MB' --warning 4096 --critical 8192
```

The parameter names inside the ITL commands follow the
`<command name>_<parameter name>` schema.

#### Icinga Director Integration <a id="service-monitoring-plugin-checkcommand-integration-director"></a>

Navigate into `Commands > External Commands` and search for `mysql_health`.
Select `mysql_health` and navigate into the `Fields` tab.

In order to access the parameters, the Director requires you to first
define the needed custom data fields:

* `mysql_health_hostname`
* `mysql_health_username` and `mysql_health_password`
* `mysql_health_mode`
* `mysql_health_name`, `mysql_health_name2` and `mysql_health_units`
* `mysql_health_warning` and `mysql_health_critical`

Create a new host template and object where you'll put generic
settings like `mysql_health_hostname` (if it differs from the host's
`address` attribute) and `mysql_health_username` and `mysql_health_password`.

Create a new service template for `mysql-health` and set `mysql_health`
as check command. You can also define a default for `mysql_health_mode`.

Next, create a service apply rule or a new service set which gets assigned
to matching host objects.


#### Icinga Config File Integration <a id="service-monitoring-plugin-checkcommand-integration-config-files"></a>

Create or modify a host object which stores
the generic database defaults and prepares details
for a service apply for rule.

```
object Host "icinga2-master1.localdomain" {
  check_command = "hostalive"
  address = "..."

  // Database listens locally, not external
  vars.mysql_health_hostname = "127.0.0.1"

  // Basic database size checks for Icinga DBs
  vars.databases["icinga"] = {
    mysql_health_warning = 4096 //MB
    mysql_health_critical = 8192 //MB
  }
  vars.databases["icingaweb2"] = {
    mysql_health_warning = 4096 //MB
    mysql_health_critical = 8192 //MB
  }
}
```

The host object prepares the database details and thresholds already
for advanced [apply for](03-monitoring-basics.md#using-apply-for) rules. It also uses
conditions to fetch host-specific values, or set default values.

```
apply Service "db-size-" for (db_name => config in host.vars.databases) {
  check_interval = 1m
  retry_interval = 30s

  check_command = "mysql_health"

  if (config.mysql_health_username) {
    vars.mysql_health_username = config.mysql_health_username
  } else {
    vars.mysql_health_username = "root"
  }
  if (config.mysql_health_password) {
    vars.mysql_health_password = config.mysql_health_password
  } else {
    vars.mysql_health_password = "icingar0xx"
  }

  vars.mysql_health_mode = "sql"
  vars.mysql_health_name = "select sum(data_length + index_length) / 1024 / 1024 from information_schema.tables where table_schema = '" + db_name + "';"
  vars.mysql_health_name2 = "db_size"
  vars.mysql_health_units = "MB"

  if (config.mysql_health_warning) {
    vars.mysql_health_warning = config.mysql_health_warning
  }
  if (config.mysql_health_critical) {
    vars.mysql_health_critical = config.mysql_health_critical
  }

  vars += config
}
```

#### New CheckCommand <a id="service-monitoring-plugin-checkcommand-new"></a>

This chapter describes how to add a new CheckCommand object for a plugin.

Please make sure to follow these conventions when adding a new command object definition:

* Use [command arguments](03-monitoring-basics.md#command-arguments) whenever possible. The `command` attribute
must be an array in `[ ... ]` for shell escaping.
* Define a unique `prefix` for the command's specific arguments. Best practice is to follow this schema:

```
<command name>_<parameter name>
```

That way you can safely set them on host/service level and you'll always know which command they control.
* Use command argument default values, e.g. for thresholds.
* Use [advanced conditions](09-object-types.md#objecttype-checkcommand) like `set_if` definitions.

Before starting with the CheckCommand definition, please check
the existing objects available inside the ITL. They follow best
practices and are maintained by developers and our community.

This example picks a new plugin called [check_systemd](https://exchange.icinga.com/joseffriedrich/check_systemd)
uploaded to Icinga Exchange in June 2019.

First, [install](05-service-monitoring.md#service-monitoring-plugins-setup) the plugin and ensure
that [it works](05-service-monitoring.md#service-monitoring-plugins-it-works). Then run it with the
`--help` parameter to see the actual parameters (docs might be outdated).

```
./check_systemd.py --help

usage: check_systemd.py [-h] [-c SECONDS] [-e UNIT | -u UNIT] [-v] [-V]
                        [-w SECONDS]

...

optional arguments:
  -h, --help            show this help message and exit
  -c SECONDS, --critical SECONDS
                        Startup time in seconds to result in critical status.
  -e UNIT, --exclude UNIT
                        Exclude a systemd unit from the checks. This option
                        can be applied multiple times. For example: -e mnt-
                        data.mount -e task.service.
  -u UNIT, --unit UNIT  Name of the systemd unit that is beeing tested.
  -v, --verbose         Increase output verbosity (use up to 3 times).
  -V, --version         show program's version number and exit
  -w SECONDS, --warning SECONDS
                        Startup time in seconds to result in warning status.
```

The argument description is important, based on this you need to create the
command arguments.

> **Tip**
>
> When you are using the Director, you can prepare the commands as files
> e.g. inside the `global-templates` zone. Then run the kickstart wizard
> again to import the commands as external reference.
>
> If you prefer to use the Director GUI/CLI, please apply the steps
> in the `Add Command` form.

Start with the basic plugin call without any parameters.

```
object CheckCommand "systemd" { // Plugin name without 'check_' prefix
  command = [ PluginContribDir + "/check_systemd.py" ] // Use the 'PluginContribDir' constant, see the contributed ITL commands
}
```

Run a config validation with `icinga2 daemon -C` to see if that works.

Next, analyse the plugin parameters. Plugins with a good help output show
optional parameters in square brackets. This is the case for all parameters
for this plugin. If there are required parameters, use the `required` key
inside the argument.

The `arguments` attribute is a dictionary which takes the parameters as keys.

```
  arguments = {
    "--unit" = { ... }
  }
```

If there are long parameter names available, prefer them. This increases
readability in both the configuration as well as the executed command line.

The argument value itself is a sub dictionary which has additional keys:

* `value` which references the runtime macro string
* `description` where you copy the plugin parameter help text into
* `required`, `set_if`, etc. for advanced parameters, check the [CheckCommand object](09-object-types.md#objecttype-checkcommand) chapter.

The runtime macro syntax is required to allow value extraction when
the command is executed.

> **Tip**
>
> Inside the Director, store the new command first in order to
> unveil the `Arguments` tab.

Best practice is to use the command name as prefix, in this specific
case e.g. `systemd_unit`.

```
  arguments = {
    "--unit" = {
      value = "$systemd_unit$" // The service parameter would then be defined as 'vars.systemd_unit = "icinga2"'
      description = "Name of the systemd unit that is beeing tested."
    }
    "--warning" = {
      value = "$systemd_warning$"
      description = "Startup time in seconds to result in warning status."
    }
    "--critical" = {
      value = "$systemd_critical$"
      description = "Startup time in seconds to result in critical status."
    }
  }
```

This may take a while -- validate the configuration in between up until
the CheckCommand definition is done.

Then test and integrate it into your monitoring configuration.

Remember: Do it once and right, and never touch the CheckCommand again.
Optional arguments allow different use cases and scenarios.


Once you have created your really good CheckCommand, please consider
sharing it with our community by creating a new PR on [GitHub](https://github.com/Icinga/icinga2/blob/master/CONTRIBUTING.md).
_Please also update the documentation for the ITL._


> **Tip**
>
> Inside the Director, you can render the configuration in the Deployment
> section. Extract the static configuration object and use that as a source
> for sending it upstream.



#### Modify Existing CheckCommand <a id="service-monitoring-plugin-checkcommand-modify"></a>

Sometimes an existing CheckCommand inside the ITL is missing a parameter.
Or you don't need a default parameter value being set.

Instead of copying the entire configuration object, you can import
an object into another new object.

```
object CheckCommand "http-custom" {
  import "http" // Import existing http object

  arguments += { // Use additive assignment to add missing parameters
    "--key" = {
      value = "$http_..." // Keep the parameter name the same as with http
    }
  }

  // Override default parameters
  vars.http_address = "..."
}
```

This CheckCommand can then be referenced in your host/service object
definitions.


### Plugin API <a id="service-monitoring-plugin-api"></a>

Icinga 2 supports the native plugin API specification from the Monitoring Plugins project.
It is defined in the [Monitoring Plugins](https://www.monitoring-plugins.org) guidelines.

The Icinga documentation revamps the specification into our
own guideline enriched with examples and best practices.

#### Output <a id="service-monitoring-plugin-api-output"></a>

The output should be as short and as detailed as possible. The
most common cases include:

- Viewing a problem list in Icinga Web and dashboards
- Getting paged about a problem
- Receiving the alert on the CLI or forwarding it to external (ticket) systems

Examples:

```
<STATUS>: <A short description what happened>

OK: MySQL connection time is fine (0.0002s)
WARNING: MySQL connection time is slow (0.5s > 0.1s threshold)
CRITICAL: MySQL connection time is causing degraded performance (3s > 0.5s threshold)
```

Icinga supports reading multi-line output where Icinga Web
only shows the first line in the listings and everything in the detail view.

Example for an end2end check with many smaller test cases integrated:

```
OK: Online banking works.
Testcase 1: Site reached.
Testcase 2: Attempted login, JS loads.
Testcase 3: Login succeeded.
Testcase 4: View current state works.
Testcase 5: Transactions fine.
```

If the extended output shouldn't be visible in your monitoring, but only for testing,
it is recommended to implement the `--verbose` plugin parameter to allow
developers and users to debug further. Check [here](05-service-monitoring.md#service-monitoring-plugin-api-verbose)
for more implementation tips.

> **Tip**
>
> More debug output also helps when implementing your plugin.
>
> Best practice is to have the plugin parameter and handling implemented first,
> then add it anywhere you want to see more, e.g. from initial database connections
> to actual query results.

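
The output conventions above could be implemented along these lines. This is a minimal sketch, not an official helper: the names `STATES`, `format_output` and `exit_with` are hypothetical, and the decision to show detail lines only in verbose mode is just one possible policy.

```python
import sys

# Exit codes of the plugin API mapped to the prefix used in the output.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def format_output(state, summary, details=None, verbose=False):
    """Build the plugin output: one summary line plus optional detail lines."""
    lines = ["%s: %s" % (STATES[state], summary)]
    # Assumption: detail lines are only wanted when the user asked for them.
    if verbose and details:
        lines.extend(details)
    return "\n".join(lines)


def exit_with(state, summary, details=None, verbose=False):
    """Print the output and terminate with the matching exit code."""
    print(format_output(state, summary, details, verbose))
    sys.exit(state)
```

Keeping the formatting separate from the exit call makes the summary line easy to test without terminating the interpreter.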


#### Status <a id="service-monitoring-plugin-api-status"></a>

Value | Status    | Description
------|-----------|-------------------------------
0     | OK        | The check went fine and everything is considered working.
1     | Warning   | The check is above the given warning threshold, or anything else is suspicious requiring attention before it breaks.
2     | Critical  | The check exceeded the critical threshold, or something really is broken and will harm the production environment.
3     | Unknown   | Invalid parameters, low level resource errors (IO device busy, no fork resources, TCP sockets, etc.) preventing the actual check. Higher level errors such as DNS resolving, TCP connection timeouts should be treated as `Critical` instead. Whenever the plugin reaches its timeout (best practice) it should also terminate with `Unknown`.

Keep in mind that these are service states. Icinga automatically maps
the [host state](03-monitoring-basics.md#check-result-state-mapping) from the returned plugin states.

#### Thresholds <a id="service-monitoring-plugin-api-thresholds"></a>

A plugin calculates specific values and may decide about the exit state on its own.
This is done with thresholds - warning and critical values which are compared with
the actual value. Based on this logic, the exit state is determined.

Imagine the following value and defined thresholds:

```
ptc_value = 57.8

warning = 50
critical = 60
```

Whenever `ptc_value` is higher than warning or critical, it should return
the appropriate [state](05-service-monitoring.md#service-monitoring-plugin-api-status).

The threshold evaluation order is also important:

* Critical thresholds are evaluated first and supersede everything else.
* Warning thresholds are evaluated second
* If no threshold is matched, return the OK state

Avoid using hardcoded threshold values in your plugins, always
add them to the argument parser.

Example for Python:

```python
import argparse
import signal
import sys

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    parser.add_argument("-w", "--warning", help="Warning threshold. Single value or range, e.g. '20:50'.")
    parser.add_argument("-c", "--critical", help="Critical threshold. Single value or range, e.g. '25:45'.")

    args = parser.parse_args()
```

Users might call plugins only with the critical threshold parameter,
leaving out the warning parameter. Keep this in mind when evaluating
the thresholds, always check if the parameters have been defined before.

```python
    # Simplified: thresholds are treated as single numeric values here,
    # the parsed arguments are strings and need to be converted first.
    if args.critical is not None:
        if ptc_value > float(args.critical):
            print("CRITICAL - ...")
            sys.exit(2) # Critical

    if args.warning is not None:
        if ptc_value > float(args.warning):
            print("WARNING - ...")
            sys.exit(1) # Warning

    print("OK - ...")
    sys.exit(0) # OK
```

The above is a simplified example for printing the [output](05-service-monitoring.md#service-monitoring-plugin-api-output)
and using the [state](05-service-monitoring.md#service-monitoring-plugin-api-status)
as exit code.

Before diving into the implementation, learn more about required
[performance data metrics](05-service-monitoring.md#service-monitoring-plugin-api-performance-data-metrics)
and more best practices below.

##### Threshold Ranges <a id="service-monitoring-plugin-api-thresholds-ranges"></a>

Threshold ranges can be used to specify an alert window, e.g. whenever a calculated
value is between a lower and higher critical threshold.

The schema for threshold ranges looks as follows. The `@` character in square brackets
is optional.

```
[@]start:end
```

There are a few requirements for ranges:

* `start <= end`. Add a check in your code and let the user know about problematic values.

```
10:20 	# OK

30:10 	# Error
```

* `start:` can be omitted if its value is 0. This is the default handling for single threshold values too.

```
10 	# < 0 or > 10, outside of 0..10
```

* If `end` is omitted, assume end is infinity.

```
10: 	# < 10, outside of 10..∞
```

* In order to specify negative infinity, use the `~` character.

```
~:10	# > 10, outside of -∞..10
```

* Raise alert if value is outside of the defined range.

```
10:20 	# < 10 or > 20, outside of 10..20
```

* Start with `@` to raise an alert if the value is **inside** the defined range, inclusive start/end values.

```
@10:20	# >= 10 and <= 20, inside of 10..20
```

Best practice is to either implement single threshold values, or fully support ranges.
This requires parsing the input parameter values, therefore look for existing libraries
already providing this functionality.

[check_tinkerforge](https://github.com/NETWAYS/check_tinkerforge/blob/master/check_tinkerforge.py)
implements a simple parser to avoid dependencies.


#### Performance Data Metrics <a id="service-monitoring-plugin-api-performance-data-metrics"></a>

Performance data metrics must be appended to the plugin output with a preceding `|` character.
The schema is as follows:

```
<output> | 'label'=value[UOM];[warn];[crit];[min];[max]
```

The label should be encapsulated with single quotes. Avoid spaces or special characters such
as `%` in there, this could lead to problems with metric receivers such as Graphite.

Labels must not include `'` and `=` characters. Keep the label length as short and unique as possible.
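
The schema and label rules above can be sketched in Python as follows. The function names `format_perfdata` and `with_perfdata` are hypothetical and only illustrate one way to build the string; they are not taken from an existing plugin library.

```python
def format_perfdata(label, value, uom="", warn=None, crit=None,
                    minimum=None, maximum=None):
    """Render one metric as 'label'=value[UOM];[warn];[crit];[min];[max]."""
    if "'" in label or "=" in label:
        raise ValueError("label must not contain ' or =: %r" % label)

    # str() on int/float always uses '.' as decimal separator,
    # independent of the locale the plugin runs in.
    fields = ["" if item is None else str(item)
              for item in (warn, crit, minimum, maximum)]

    # Trailing empty fields can be left out, inner ones must stay empty.
    while fields and fields[-1] == "":
        fields.pop()

    return ";".join(["'%s'=%s%s" % (label, value, uom)] + fields)


def with_perfdata(output, metrics):
    """Append space separated metrics to the output after a '|' character."""
    if not metrics:
        return output
    return "%s | %s" % (output, " ".join(metrics))
```

Rejecting invalid labels early is a design choice of this sketch: a malformed label would otherwise only surface later in the metric backend.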

Example:

```
'load1'=4.7
```

Values must respect the C/POSIX locale and not implement e.g. German locale for floating point numbers with `,`.
Icinga sets `LC_NUMERIC=C` to enforce this locale on plugin execution.

##### Unit of Measurement (UOM) <a id="service-monitoring-plugin-api-performance-data-metrics-uom"></a>

```
'rta'=12.445000ms 'pl'=0%
```

The UoMs are written as-is into the [core backends](14-features.md#core-backends)
(IDO, API). I.e. 12.445000ms remain 12.445000ms.

In contrast, the [metric backends](14-features.md#metrics)
(Graphite, InfluxDB, etc.) get perfdata (including warn, crit, min, max)
normalized by Icinga. E.g. 12.445000ms become 0.012445 seconds.

Some plugins change the UoM for different sizing, e.g. returning the disk usage in MB and later GB
for the same performance data label. Normalizing the values ensures that graphs always look the same.

[Icinga DB](14-features.md#core-backends-icingadb) gets both the as-is and the normalized perfdata.

What metric backends get... | ... from which perfdata UoMs (case-insensitive if possible)
----------------------------|---------------------------------------
bytes (B)                   | B, KB, MB, ..., YB, KiB, MiB, ..., YiB
bits (b)                    | b, kb, mb, ..., yb, kib, mib, ..., yib
packets                     | packets
seconds (s)                 | ns, us, ms, s, m, h, d
percent                     | %
amperes (A)                 | nA, uA, mA, A, kA, MA, GA, ..., YA
ohms (O)                    | nO, uO, mO, O, kO, MO, GO, ..., YO
volts (V)                   | nV, uV, mV, V, kV, MV, GV, ..., YV
watts (W)                   | nW, uW, mW, W, kW, MW, GW, ..., YW
ampere seconds (As)         | nAs, uAs, mAs, As, kAs, MAs, GAs, ..., YAs
ampere seconds              | nAm, uAm, mAm, Am (ampere minutes), kAm, MAm, GAm, ..., YAm
ampere seconds              | nAh, uAh, mAh, Ah (ampere hours), kAh, MAh, GAh, ..., YAh
watt hours                  | nWs, uWs, mWs, Ws (watt seconds), kWs, MWs, GWs, ..., YWs
watt hours                  | nWm, uWm, mWm, Wm (watt minutes), kWm, MWm, GWm, ..., YWm
watt hours (Wh)             | nWh, uWh, mWh, Wh, kWh, MWh, GWh, ..., YWh
lumens                      | lm
decibel-milliwatts          | dBm
grams (g)                   | ng, ug, mg, g, kg, t
degrees Celsius             | C
degrees Fahrenheit          | F
degrees Kelvin              | K
liters (l)                  | ml, l, hl

The UoM "c" represents a continuous counter (e.g. interface traffic counters).

Unknown UoMs are discarded (as if none was given).
A value without any UoM may be an integer or floating point number
for any type (processes, users, etc.).

##### Thresholds and Min/Max <a id="service-monitoring-plugin-api-performance-data-metrics-thresholds-min-max"></a>

Next to the performance data value, warn, crit, min, max can optionally be provided. They must be separated
with the semi-colon `;` character. They share the same UOM with the performance data value.

```
$ check_ping -4 -H icinga.com -c '200,15%' -w '100,5%'

PING OK - Packet loss = 0%, RTA = 12.44 ms|rta=12.445000ms;100.000000;200.000000;0.000000 pl=0%;5;15;0
```

##### Multiple Performance Data Values <a id="service-monitoring-plugin-api-performance-data-metrics-multiple"></a>

Multiple performance data values must be joined with a space character. The below example
is from the [check_load](10-icinga-template-library.md#plugin-check-command-load) plugin.

```
load1=4.680;1.000;2.000;0; load5=0.000;5.000;10.000;0; load15=0.000;10.000;20.000;0;
```

#### Timeout <a id="service-monitoring-plugin-api-timeout"></a>

Icinga has a safety mechanism where it kills processes running for too
long. The timeout can be specified in [CheckCommand objects](09-object-types.md#objecttype-checkcommand)
or on the host/service object.

Best practice is to control the timeout in the plugin itself
and provide a clear message followed by the Unknown state.

Example in Python taken from [check_tinkerforge](https://github.com/NETWAYS/check_tinkerforge/blob/master/check_tinkerforge.py):

```python
import argparse
import signal
import sys
from functools import partial

def handle_sigalrm(signum, frame, timeout=None):
    # check_tinkerforge calls its own output() helper here,
    # this example prints the message and exits with Unknown (3) directly.
    print('UNKNOWN - Plugin timed out after %d seconds' % timeout)
    sys.exit(3)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # ... add more arguments
    parser.add_argument("-t", "--timeout", help="Timeout in seconds (default 10s)", type=int, default=10)
    args = parser.parse_args()

    # Bind the configured timeout to the handler and arm the alarm
    signal.signal(signal.SIGALRM, partial(handle_sigalrm, timeout=args.timeout))
    signal.alarm(args.timeout)

    # ... perform the check and generate output/status
```

#### Versions <a id="service-monitoring-plugin-api-versions"></a>

Plugins should provide a version via `-V` or `--version` parameter
which is bumped on releases.
This helps to identify problems with
too old or too new versions on the community support channels.

Example in Python taken from [check_tinkerforge](https://github.com/NETWAYS/check_tinkerforge/blob/master/check_tinkerforge.py):

```python
import argparse
import signal
import sys

__version__ = '0.9.1'

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    parser.add_argument('-V', '--version', action='version', version='%(prog)s v' + sys.modules[__name__].__version__)
```

#### Verbose <a id="service-monitoring-plugin-api-verbose"></a>

Plugins should provide a verbose mode with `-v` or `--verbose` in order
to show more detailed log messages. This helps to debug and analyse the
flow and execution steps inside the plugin.

Make sure to add the parameter prior to implementing the check logic into
the plugin.

Example in Python taken from [check_tinkerforge](https://github.com/NETWAYS/check_tinkerforge/blob/master/check_tinkerforge.py):

```python
import argparse
import signal
import sys

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    parser.add_argument('-v', '--verbose', action='store_true')

    # Parse the command line before accessing args.verbose
    args = parser.parse_args()

    if args.verbose:
        print("Verbose debug output")
```


### Create a new Plugin <a id="service-monitoring-plugin-new"></a>

Sometimes an existing plugin does not satisfy your requirements. You
can either kindly contact the original author about plans to add changes
and/or create a patch.

If you just want to format the output and state of an existing plugin
it might also be helpful to write a wrapper script. This script
could pass all configured parameters, call the plugin script, parse
its output/exit code and return your specified output/exit code.

On the other hand plugins for specific services and hardware might not yet
exist.
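
The wrapper approach described above could look roughly like the following Python sketch. Everything here is an assumption for illustration: the function names `run_wrapped` and `main`, the 30 second default timeout, and the example policy of mapping Unknown to Critical are hypothetical choices, not part of Icinga.

```python
import subprocess
import sys


def run_wrapped(command, state_map=None, prefix="", timeout=30):
    """Run a plugin, optionally remap its exit code and prefix its output."""
    state_map = state_map or {}
    if not command:
        return 3, "UNKNOWN: no plugin command given"
    try:
        result = subprocess.run(command, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,
                                universal_newlines=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return 3, "UNKNOWN: could not run wrapped plugin: %s" % exc

    # Anything outside of 0..3 is not a valid plugin state, treat it as Unknown.
    state = result.returncode if result.returncode in (0, 1, 2, 3) else 3
    state = state_map.get(state, state)
    return state, prefix + result.stdout.strip()


def main():
    # Hypothetical policy: report Unknown (3) results as Critical (2).
    # Call this at the end of the wrapper script, e.g.
    #   ./wrapper.py /path/to/check_plugin --its --arguments
    state, text = run_wrapped(sys.argv[1:], state_map={3: 2})
    print(text)
    sys.exit(state)
```

The plugin output is passed through unchanged apart from the optional prefix; rewriting the text itself depends on the wrapped plugin's output format.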
807 808> **Tip** 809> 810> Watch this presentation from Icinga Camp Berlin to learn more 811> about [How to write checks that don't suck](https://www.youtube.com/watch?v=Ey_APqSCoFQ). 812 813Common best practices: 814 815* Choose the programming language wisely 816 * Scripting languages (Bash, Python, Perl, Ruby, PHP, etc.) are easier to write and setup but their check execution might take longer (invoking the script interpreter as overhead, etc.). 817 * Plugins written in C/C++, Go, etc. improve check execution time but may generate an overhead with installation and packaging. 818* Use a modern VCS such as Git for developing the plugin, e.g. share your plugin on GitHub and let it sync to [Icinga Exchange](https://exchange.icinga.com). 819* **Look into existing plugins endorsed by community members.** 820 821Implementation hints: 822 823* Add parameters with key-value pairs to your plugin. They should allow long names (e.g. `--host localhost`) and also short parameters (e.g. `-H localhost`) 824 * `-h|--help` should print the version and all details about parameters and runtime invocation. Note: Python's ArgParse class provides this OOTB. 825 * `--version` should print the plugin [version](05-service-monitoring.md#service-monitoring-plugin-api-versions). 826* Add a [verbose/debug output](05-service-monitoring.md#service-monitoring-plugin-api-verbose) functionality for detailed on-demand logging. 827* Respect the exit codes required by the [Plugin API](05-service-monitoring.md#service-monitoring-plugin-api). 828* Always add [performance data](05-service-monitoring.md#service-monitoring-plugin-api-performance-data-metrics) to your plugin output. 829* Allow to specify [warning/critical thresholds](05-service-monitoring.md#service-monitoring-plugin-api-thresholds) as parameters. 830 831Example skeleton: 832 833``` 834# 1. include optional libraries 835# 2. global variables 836# 3. helper functions and/or classes 837# 4. 
# define timeout condition

if (<timeout_reached>) then
  print "UNKNOWN - Timeout (...) reached | 'time'=30.0"
  exit(3)
endif

# 5. main method

<execute and fetch data>

if (<threshold_critical_condition>) then
  print "CRITICAL - ... | 'time'=0.1 'myperfdatavalue'=5.0"
  exit(2)
else if (<threshold_warning_condition>) then
  print "WARNING - ... | 'time'=0.1 'myperfdatavalue'=3.0"
  exit(1)
else
  print "OK - ... | 'time'=0.2 'myperfdatavalue'=1.0"
  exit(0)
endif
```

There are various plugin libraries available which help
with plugin execution and output formatting too, for example
[nagiosplugin for Python](https://pypi.python.org/pypi/nagiosplugin/).

> **Note**
>
> Test your plugin properly, including special cases, before putting it
> into production!

Once you've finished your plugin please upload/sync it to [Icinga Exchange](https://exchange.icinga.com/new).
Thanks in advance!


## Service Monitoring Overview <a id="service-monitoring-overview"></a>

The following examples should help you to start implementing your own ideas.
There is a variety of plugins available. This collection is not complete --
if you have any updates, please send a documentation patch upstream.

Please visit our [community forum](https://community.icinga.com) which
may provide an answer to your use case already. If not, do not hesitate
to create a new topic.

### General Monitoring <a id="service-monitoring-general"></a>

If the remote service is available (via a network protocol and port),
and if a check plugin is also available, you don't necessarily need a local client.
Instead, choose a plugin and configure its parameters and thresholds.
The following examples are included in the [Icinga 2 Template Library](10-icinga-template-library.md#icinga-template-library):

* [ping4](10-icinga-template-library.md#plugin-check-command-ping4), [ping6](10-icinga-template-library.md#plugin-check-command-ping6),
[fping4](10-icinga-template-library.md#plugin-check-command-fping4), [fping6](10-icinga-template-library.md#plugin-check-command-fping6), [hostalive](10-icinga-template-library.md#plugin-check-command-hostalive)
* [tcp](10-icinga-template-library.md#plugin-check-command-tcp), [udp](10-icinga-template-library.md#plugin-check-command-udp), [ssl](10-icinga-template-library.md#plugin-check-command-ssl)
* [ntp_time](10-icinga-template-library.md#plugin-check-command-ntp-time)

### Linux Monitoring <a id="service-monitoring-linux"></a>

* [disk](10-icinga-template-library.md#plugin-check-command-disk)
* [mem](10-icinga-template-library.md#plugin-contrib-command-mem), [swap](10-icinga-template-library.md#plugin-check-command-swap)
* [procs](10-icinga-template-library.md#plugin-check-command-processes)
* [users](10-icinga-template-library.md#plugin-check-command-users)
* [running_kernel](10-icinga-template-library.md#plugin-contrib-command-running_kernel)
* package management: [apt](10-icinga-template-library.md#plugin-check-command-apt), [yum](10-icinga-template-library.md#plugin-contrib-command-yum), etc.
* [ssh](10-icinga-template-library.md#plugin-check-command-ssh)
* performance: [iostat](10-icinga-template-library.md#plugin-contrib-command-iostat), [check_sar_perf](https://github.com/dnsmichi/icinga-plugins/blob/master/scripts/check_sar_perf.py)

### Windows Monitoring <a id="service-monitoring-windows"></a>

* [check_wmi_plus](https://edcint.co.nz/checkwmiplus/)
* [NSClient++](https://www.nsclient.org) (in combination with the Icinga 2 client and either [check_nscp_api](10-icinga-template-library.md#nscp-check-api) or [nscp-local](10-icinga-template-library.md#nscp-plugin-check-commands) check commands)
* [Icinga 2 Windows Plugins](10-icinga-template-library.md#windows-plugins) (disk, load, memory, network, performance counters, ping, procs, service, swap, updates, uptime, users)
* VBS and PowerShell scripts

### Database Monitoring <a id="service-monitoring-database"></a>

* MySQL/MariaDB: [mysql_health](10-icinga-template-library.md#plugin-contrib-command-mysql_health), [mysql](10-icinga-template-library.md#plugin-check-command-mysql), [mysql_query](10-icinga-template-library.md#plugin-check-command-mysql-query)
* PostgreSQL: [postgres](10-icinga-template-library.md#plugin-contrib-command-postgres)
* Oracle: [oracle_health](10-icinga-template-library.md#plugin-contrib-command-oracle_health)
* MSSQL: [mssql_health](10-icinga-template-library.md#plugin-contrib-command-mssql_health)
* DB2: [db2_health](10-icinga-template-library.md#plugin-contrib-command-db2_health)
* MongoDB: [mongodb](10-icinga-template-library.md#plugin-contrib-command-mongodb)
* Elasticsearch: [elasticsearch](10-icinga-template-library.md#plugin-contrib-command-elasticsearch)
* Redis: [redis](10-icinga-template-library.md#plugin-contrib-command-redis)

### SNMP Monitoring <a id="service-monitoring-snmp"></a>

* [Manubulon plugins](10-icinga-template-library.md#snmp-manubulon-plugin-check-commands) (interface, storage,
load, memory, process)
* [snmp](10-icinga-template-library.md#plugin-check-command-snmp), [snmpv3](10-icinga-template-library.md#plugin-check-command-snmpv3)

### Network Monitoring <a id="service-monitoring-network"></a>

* [nwc_health](10-icinga-template-library.md#plugin-contrib-command-nwc_health)
* [interfaces](10-icinga-template-library.md#plugin-contrib-command-interfaces)
* [interfacetable](10-icinga-template-library.md#plugin-contrib-command-interfacetable)
* [iftraffic](10-icinga-template-library.md#plugin-contrib-command-iftraffic), [iftraffic64](10-icinga-template-library.md#plugin-contrib-command-iftraffic64)

### Web Monitoring <a id="service-monitoring-web"></a>

* [http](10-icinga-template-library.md#plugin-check-command-http)
* [ftp](10-icinga-template-library.md#plugin-check-command-ftp)
* [webinject](10-icinga-template-library.md#plugin-contrib-command-webinject)
* [squid](10-icinga-template-library.md#plugin-contrib-command-squid)
* [apache-status](10-icinga-template-library.md#plugin-contrib-command-apache-status)
* [nginx_status](10-icinga-template-library.md#plugin-contrib-command-nginx_status)
* [kdc](10-icinga-template-library.md#plugin-contrib-command-kdc)
* [rbl](10-icinga-template-library.md#plugin-contrib-command-rbl)

* [Icinga Certificate Monitoring](https://icinga.com/products/icinga-certificate-monitoring/)

### Java Monitoring <a id="service-monitoring-java"></a>

* [jmx4perl](10-icinga-template-library.md#plugin-contrib-command-jmx4perl)

### DNS Monitoring <a id="service-monitoring-dns"></a>

* [dns](10-icinga-template-library.md#plugin-check-command-dns)
* [dig](10-icinga-template-library.md#plugin-check-command-dig)
* [dhcp](10-icinga-template-library.md#plugin-check-command-dhcp)

### Backup Monitoring <a id="service-monitoring-backup"></a>

* [check_bareos](https://github.com/widhalmt/check_bareos)

### Log
Monitoring <a id="service-monitoring-log"></a>

* [check_logfiles](https://labs.consol.de/nagios/check_logfiles/)
* [check_logstash](https://github.com/NETWAYS/check_logstash)
* [check_graylog2_stream](https://github.com/Graylog2/check-graylog2-stream)

### Virtualization Monitoring <a id="service-monitoring-virtualization"></a>

### VMware Monitoring <a id="service-monitoring-virtualization-vmware"></a>

* [Icinga Module for vSphere](https://icinga.com/products/icinga-module-for-vsphere/)
* [esxi_hardware](10-icinga-template-library.md#plugin-contrib-command-esxi-hardware)
* [VMware](10-icinga-template-library.md#plugin-contrib-vmware)

**Tip**: If you are encountering timeouts using the VMware Perl SDK,
check [this blog entry](https://www.claudiokuenzler.com/blog/650/slow-vmware-perl-sdk-soap-request-error-libwww-version).
Ubuntu 16.04 LTS can have trouble with random entropy in Perl, as discussed [here](https://monitoring-portal.org/t/check-vmware-api-slow-when-run-multiple-times/2868).
In that case, [haveged](https://issihosts.com/haveged/) may help.

### SAP Monitoring <a id="service-monitoring-sap"></a>

* [check_sap_health](https://labs.consol.de/nagios/check_sap_health/index.html)
* [SAP CCMS](https://sourceforge.net/projects/nagios-sap-ccms/)

### Mail Monitoring <a id="service-monitoring-mail"></a>

* [smtp](10-icinga-template-library.md#plugin-check-command-smtp), [ssmtp](10-icinga-template-library.md#plugin-check-command-ssmtp)
* [imap](10-icinga-template-library.md#plugin-check-command-imap), [simap](10-icinga-template-library.md#plugin-check-command-simap)
* [pop](10-icinga-template-library.md#plugin-check-command-pop), [spop](10-icinga-template-library.md#plugin-check-command-spop)
* [mailq](10-icinga-template-library.md#plugin-check-command-mailq)

### Hardware Monitoring <a id="service-monitoring-hardware"></a>

* [hpasm](10-icinga-template-library.md#plugin-contrib-command-hpasm)
* [ipmi-sensor](10-icinga-template-library.md#plugin-contrib-command-ipmi-sensor)

### Metrics Monitoring <a id="service-monitoring-metrics"></a>

* [graphite](10-icinga-template-library.md#plugin-contrib-command-graphite)