# Advanced Topics <a id="advanced-topics"></a>

This chapter covers a number of advanced topics. If you're new to Icinga, you
can safely skip over things you're not interested in.

## Downtimes <a id="downtimes"></a>

Downtimes can be scheduled for planned server maintenance or
any other targeted service outage you are aware of in advance.

Downtimes suppress notifications and can trigger other
downtimes too. If the downtime was set by accident, or the duration
exceeds the maintenance window, you can manually cancel the downtime.

### Scheduling a downtime <a id="scheduling-downtime"></a>

The most convenient way to schedule planned downtimes is to create
them in Icinga Web 2 inside the host/service detail view. Select
multiple hosts/services from the listing with the shift key to
schedule multiple downtimes.

![Downtime in Icinga Web 2](images/advanced-topics/icingaweb2_downtime_handled.png)

In addition to that you can schedule a downtime by using the Icinga 2 API action
[schedule-downtime](12-icinga2-api.md#icinga2-api-actions-schedule-downtime).
This is especially useful to schedule a downtime on-demand inside a (remote) backup
script, or create maintenance downtimes from a cron job for specific dates and intervals.

Multiple downtimes for a single object may overlap. This is useful
when you want to extend a maintenance window that takes longer than expected.
If there are multiple downtimes triggered for one object, the overall downtime depth
will be greater than `1`.

If the downtime was scheduled after the problem changed to a critical hard
state triggering a problem notification, and the service recovers during
the downtime window, the recovery notification won't be suppressed.

Planned downtimes are also taken into account for SLA reporting
tools calculating the SLAs based on the state and downtime history.

### Fixed and Flexible Downtimes <a id="fixed-flexible-downtimes"></a>

A `fixed` downtime will be activated at the defined start time, and
removed at the end time. During this time window the service state
will change to `NOT-OK` and then actually trigger the downtime.
Notifications are suppressed and the downtime depth is incremented.

Common scenarios are a planned distribution upgrade on your Linux
servers, or database updates in your warehouse. The customer knows
about a fixed downtime window between 23:00 and 24:00. After 24:00
all problems should be alerted again. The solution is simple:
schedule a `fixed` downtime starting at 23:00 and ending at 24:00.

Unlike a `fixed` downtime, a `flexible` downtime will be triggered
by the state change in the time span defined by start and end time,
and then last for the specified duration in minutes.

Imagine the following scenario: Your service is frequently polled
by users trying to grab free deleted domains for immediate registration.
Between 07:30 and 08:00 the impact will hit for 15 minutes and generate
a network outage visible to the monitoring. The service is still alive,
but answering too slowly to Icinga 2 service checks.
For that reason, you may want to schedule a downtime between 07:30 and
08:00 with a duration of 15 minutes. The downtime will then last from
its trigger time until the duration is over. After that, the downtime
is removed (may happen before or after the actual end time!).

#### Fixed Downtime <a id="fixed-downtime"></a>

If the host/service changes into a NOT-OK state between the start and
end time window, the downtime will be marked as `in effect` and
increases the downtime depth counter.

```
   |                    |                    |
start                   |                   end
                   trigger time
```

#### Flexible Downtime <a id="flexible-downtime"></a>

A flexible downtime defines a time window where the downtime may be
triggered from a host/service NOT-OK state change.
It will then last
until the specified time duration is reached. That way it can happen
that the downtime end time is already gone, but the downtime ends
at `trigger time + duration`.

```
   |                    |                    |
start                   |                   end               actual end time
                        |--------------duration--------|
                   trigger time
```

### Triggered Downtimes <a id="triggered-downtimes"></a>

This is optional when scheduling a downtime. If there is already a downtime
scheduled for a future maintenance, the current downtime can be triggered by
that downtime. This is useful if you have scheduled a host downtime and
are now scheduling a child host's downtime which gets triggered by the parent
downtime on `NOT-OK` state change.

### Recurring Downtimes <a id="recurring-downtimes"></a>

[ScheduledDowntime objects](09-object-types.md#objecttype-scheduleddowntime) can be used to set up
recurring downtimes for services.

Example:

```
apply ScheduledDowntime "backup-downtime" to Service {
  author = "icingaadmin"
  comment = "Scheduled downtime for backup"

  ranges = {
    monday = "02:00-03:00"
    tuesday = "02:00-03:00"
    wednesday = "02:00-03:00"
    thursday = "02:00-03:00"
    friday = "02:00-03:00"
    saturday = "02:00-03:00"
    sunday = "02:00-03:00"
  }

  assign where "backup" in service.groups
}
```

Icinga 2 attempts to find the next possible segment from a ScheduledDowntime object's
`ranges` attribute, and won't create multiple downtimes in the future. In case you need
all these downtimes planned and visible for the next days, weeks or months, schedule them
manually via the [REST API](12-icinga2-api.md#icinga2-api-actions-schedule-downtime) using
a script or cron job.

> **Note**
>
> If ScheduledDowntime objects are synced in a distributed high-availability setup,
> both will create the next possible downtime on their own.
> These runtime generated
> downtimes are synced among both zone instances, and you may see sort-of duplicate downtimes
> in Icinga Web 2.

## Comments <a id="comments-intro"></a>

Comments can be added at runtime and are persistent over restarts. You can
add useful information for others on repeating incidents (for example
"last time syslog at 100% cpu on 17.10.2013 due to stale nfs mount") which
is primarily accessible using web interfaces.

You can add a comment either by using the Icinga 2 API action
[add-comment](12-icinga2-api.md#icinga2-api-actions-add-comment) or
by sending an [external command](14-features.md#external-commands).

## Acknowledgements <a id="acknowledgements"></a>

If a problem persists and notifications have been sent, you can
acknowledge the problem. That way other users will get
a notification that you're aware of the issue and probably are
already working on a fix.

Note: Acknowledgements also add a new [comment](08-advanced-topics.md#comments-intro)
which contains the author and text fields.

You can send an acknowledgement either by using the Icinga 2 API action
[acknowledge-problem](12-icinga2-api.md#icinga2-api-actions-acknowledge-problem) or
by sending an [external command](14-features.md#external-commands).

### Sticky Acknowledgements <a id="sticky-acknowledgements"></a>

The acknowledgement is removed if a state change occurs or if the host/service
recovers (OK/Up state).

If you acknowledge a problem once you've received a `Critical` notification,
the acknowledgement will be removed if there is a state transition to `Warning`.

```
OK -> WARNING -> CRITICAL -> WARNING -> OK
```

If you prefer to keep the acknowledgement until the problem is resolved (`OK`
recovery) you need to enable the `sticky` parameter.

### Expiring Acknowledgements <a id="expiring-acknowledgements"></a>

Once a problem is acknowledged it may disappear from your `handled problems`
dashboard and no-one ever looks at it again since it will suppress
notifications too.

This `fire-and-forget` action is quite common. If you're sure that a
current problem should be resolved in the future at a defined time,
you can define an expiration time when acknowledging the problem.

Icinga 2 will clear the acknowledgement when expired and start to
re-notify, if the problem persists.

## Time Periods <a id="timeperiods"></a>

[Time Periods](09-object-types.md#objecttype-timeperiod) define
time ranges in Icinga where event actions are triggered, for
example whether a service check is executed or not within
the `check_period` attribute. Or a notification should be sent to
users or not, filtered by the `period` and `notification_period`
configuration attributes for `Notification` and `User` objects.

The `TimePeriod` attribute `ranges` may contain multiple directives,
including weekdays, days of the month, and calendar dates.
These types may overlap/override other types in your ranges dictionary.

The descending order of precedence is as follows:

* Calendar date (2008-01-01)
* Specific month date (January 1st)
* Generic month date (Day 15)
* Offset weekday of specific month (2nd Tuesday in December)
* Offset weekday (3rd Monday)
* Normal weekday (Tuesday)

If you don't set any `check_period` or `notification_period` attribute
on your configuration objects, Icinga 2 assumes `24x7` as time period
as shown below.

```
object TimePeriod "24x7" {
  display_name = "Icinga 2 24x7 TimePeriod"
  ranges = {
    "monday"    = "00:00-24:00"
    "tuesday"   = "00:00-24:00"
    "wednesday" = "00:00-24:00"
    "thursday"  = "00:00-24:00"
    "friday"    = "00:00-24:00"
    "saturday"  = "00:00-24:00"
    "sunday"    = "00:00-24:00"
  }
}
```

If your operation staff should only be notified during workhours,
create a new timeperiod named `workhours` defining a work day from
09:00 to 17:00.

```
object TimePeriod "workhours" {
  display_name = "Icinga 2 8x5 TimePeriod"
  ranges = {
    "monday"    = "09:00-17:00"
    "tuesday"   = "09:00-17:00"
    "wednesday" = "09:00-17:00"
    "thursday"  = "09:00-17:00"
    "friday"    = "09:00-17:00"
  }
}
```

### Across midnight <a id="timeperiods-across-midnight"></a>

If you want to specify a notification period across midnight,
you can define it the following way:

```
object TimePeriod "across-midnight" {
  display_name = "Nightly Notification"
  ranges = {
    "saturday" = "22:00-24:00"
    "sunday" = "00:00-03:00"
  }
}
```

Starting with v2.11 this can be shortened to using
the first day as start with an overlapping range into
the next day:

```
object TimePeriod "do-not-disturb" {
  display_name = "Weekend DND"
  ranges = {
    "saturday" = "22:00-06:00"
  }
}
```

### Across several days, weeks or months <a id="timeperiods-across-days-weeks-months"></a>

Below you can see another example for configuring timeperiods across several
days, weeks or months. This can be useful when taking components offline
for a distinct period of time.

```
object TimePeriod "standby" {
  display_name = "Standby"
  ranges = {
    "2016-09-30 - 2016-10-30" = "00:00-24:00"
  }
}
```

Please note that the spaces before and after the dash are mandatory.

Once your time period is configured you can use the `period` attribute
to assign time periods to `Notification` and `Dependency` objects:

```
apply Notification "mail-icingaadmin" to Service {
  import "mail-service-notification"
  user_groups = host.vars.notification.mail.groups
  users = host.vars.notification.mail.users

  period = "workhours"

  assign where host.vars.notification.mail
}
```

### Time Periods Inclusion and Exclusion <a id="timeperiods-includes-excludes"></a>

Sometimes it is necessary to exclude certain time ranges from
your default time period definitions, for example, if you don't
want to send out any notification during the holiday season,
or if you only want to allow small time windows for executed checks.

The [TimePeriod object](09-object-types.md#objecttype-timeperiod)
provides the `includes` and `excludes` attributes to solve this issue.
`prefer_includes` defines whether included or excluded time periods are
preferred.
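
As a minimal sketch of how these three attributes might fit together, the
following time period includes one time period and excludes another.
The names `workhours-adjusted`, `oncall-override` and `maintenance-freeze` are
placeholders made up for illustration; the two referenced TimePeriod objects are
assumed to be defined elsewhere in your configuration. With `prefer_includes`
set to `false`, the excluded period should win wherever both overlap.

```
object TimePeriod "workhours-adjusted" {
  /* Placeholder names: both TimePeriod objects are assumed to exist elsewhere. */
  includes = [ "oncall-override" ]
  excludes = [ "maintenance-freeze" ]

  /* Prefer the excluded ranges where includes and excludes overlap. */
  prefer_includes = false

  ranges = {
    "monday"    = "09:00-17:00"
    "tuesday"   = "09:00-17:00"
    "wednesday" = "09:00-17:00"
    "thursday"  = "09:00-17:00"
    "friday"    = "09:00-17:00"
  }
}
```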

The following example defines a time period called `holidays` where
notifications should be suppressed:

```
object TimePeriod "holidays" {
  ranges = {
    "january 1" = "00:00-24:00"            //new year's day
    "july 4" = "00:00-24:00"               //independence day
    "december 25" = "00:00-24:00"          //christmas
    "december 31" = "18:00-24:00"          //new year's eve (6pm+)
    "2017-04-16" = "00:00-24:00"           //easter 2017
    "monday -1 may" = "00:00-24:00"        //memorial day (last monday in may)
    "monday 1 september" = "00:00-24:00"   //labor day (1st monday in september)
    "thursday 4 november" = "00:00-24:00"  //thanksgiving (4th thursday in november)
  }
}
```

In addition to that the time period `weekends-excluded` defines an additional
time window which should be excluded from notifications:

```
object TimePeriod "weekends-excluded" {
  ranges = {
    "saturday" = "00:00-09:00,18:00-24:00"
    "sunday"   = "00:00-09:00,18:00-24:00"
  }
}
```

The time period `prod-notification` defines the default time ranges
and adds the excluded time period names as an array.

```
object TimePeriod "prod-notification" {
  excludes = [ "holidays", "weekends-excluded" ]

  ranges = {
    "monday"    = "00:00-24:00"
    "tuesday"   = "00:00-24:00"
    "wednesday" = "00:00-24:00"
    "thursday"  = "00:00-24:00"
    "friday"    = "00:00-24:00"
    "saturday"  = "00:00-24:00"
    "sunday"    = "00:00-24:00"
  }
}
```

### Time zone handling <a id="timeperiods-timezones"></a>

Icinga 2 takes the OS' time zone including DST changes into account.

Times inside DST changes are interpreted as before the DST changes.
I.e. for the time zone Europe/Berlin:

* On 2020-10-25 03:00 CEST the time jumps back to 02:00 CET.
  For Icinga 02:30 means 02:30 CEST.
* On 2021-03-28 02:00 CET the time jumps forward to 03:00 CEST.
  For Icinga (the actually not existing) 02:30 refers to CET
  and effectively means 03:30 CEST.

## External Passive Check Results <a id="external-check-results"></a>

Hosts or services which do not actively execute a check plugin to receive
the state and output are called "passive checks" or "external check results".
In this scenario an external client or script is sending in check results.

You can feed check results into Icinga 2 with the following transport methods:

* [process-check-result action](12-icinga2-api.md#icinga2-api-actions-process-check-result) available with the [REST API](12-icinga2-api.md#icinga2-api) (remote and local)
* External command sent via command pipe (local only)

Each time a new check result is received, the next expected check time
is updated. This means that if no check results are received from
the external source, Icinga 2 will execute [freshness checks](08-advanced-topics.md#check-result-freshness).

> **Note**
>
> The REST API action allows you to specify the `check_source` attribute
> which helps identify the external sender. This is also visible
> in Icinga Web 2 and the REST API queries.

## Check Result Freshness <a id="check-result-freshness"></a>

In Icinga 2 active check freshness is enabled by default. It is determined by the
`check_interval` attribute and no incoming check results in that period of time.

The threshold is calculated based on the last check execution time for actively executed checks:

```
(last check execution time + check interval) > current time
```

If this host/service receives check results from an [external source](08-advanced-topics.md#external-check-results),
the threshold is based on the last time a check result was received:

```
(last check result time + check interval) > current time
```

> **Tip**
>
> The [process-check-result](12-icinga2-api.md#icinga2-api-actions-process-check-result) REST API
> action allows to overrule the pre-defined check interval with a specified TTL in Icinga 2 v2.9+.

If the freshness checks fail, Icinga 2 will execute the defined check command unless active checks are disabled.

Best practice is to define a [dummy](10-icinga-template-library.md#itl-dummy) `check_command` which gets
executed when freshness checks fail.

```
apply Service "external-check" {
  check_command = "dummy"
  check_interval = 1m

  /* Set the state to UNKNOWN (3) if freshness checks fail. */
  vars.dummy_state = 3

  /* Use a runtime function to retrieve the last check time and more details. */
  vars.dummy_text = {{
    var service = get_service(macro("$host.name$"), macro("$service.name$"))
    var lastCheck = DateTime(service.last_check).to_string()

    return "No check results received. Last result time: " + lastCheck
  }}

  assign where "external" in host.vars.services
}
```

References: [get_service](18-library-reference.md#objref-get_service), [macro](18-library-reference.md#scoped-functions-macro), [DateTime](18-library-reference.md#datetime-type).

Example output in Icinga Web 2:

![Icinga 2 Freshness Checks](images/advanced-topics/icinga2_external_checks_freshness_icingaweb2.png)

## Check Flapping <a id="check-flapping"></a>

Icinga 2 supports optional detection of hosts and services that are "flapping".

Flapping occurs when a service or host changes state too frequently, which would result in a storm of problem and
recovery notifications. With flapping detection enabled a flapping notification will be sent while other notifications are
suppressed until it calms down after receiving the same status from checks a few times. Flapping detection can help detect
configuration problems (wrong thresholds), troublesome services or network problems.

Flapping detection can be enabled or disabled using the `enable_flapping` attribute.
The `flapping_threshold_high` and `flapping_threshold_low` attributes allow you to specify the thresholds that control
when a [host](09-object-types.md#objecttype-host) or [service](09-object-types.md#objecttype-service) is considered to be flapping.

The default thresholds are 30% for high and 25% for low. If the computed flapping value exceeds the high threshold a
host or service is considered flapping until it drops below the low flapping threshold.

The attribute `flapping_ignore_states` allows you to ignore state changes to specified states during the flapping calculation.

`FlappingStart` and `FlappingEnd` notifications will be sent out accordingly, if configured. See the chapter on
[notifications](alert-notifications) for details.

> Note: There is no distinction between hard and soft states with flapping. All state changes count and notifications
> will be sent out regardless of the object's state.

### How it works <a id="check-flapping-how-it-works"></a>

Icinga 2 saves the last 20 state changes for every host and service.
See the graphic below:

![Icinga 2 Flapping State Timeline](images/advanced-topics/flapping-state-graph.png)

All the states are weighted, with the most recent one being worth the most (1.15) and the 20th the least (0.8). The
states in between are fairly distributed. The final flapping value is the sum of the weighted state changes divided by the total
count of 20.

In the example above, the added states would have a total value of 7.82 (`0.84 + 0.86 + 0.88 + 0.9 + 0.98 + 1.06 + 1.12 + 1.18`).
This yields a flapping percentage of 39.1% (`7.82 / 20 * 100`). As the default upper flapping threshold is 30%, it would be
considered flapping.

If the next seven check results then would not be state changes, the flapping percentage would fall below the lower threshold
of 25% and therefore the host or service would recover from flapping.

## Volatile Services and Hosts <a id="volatile-services-hosts"></a>

The `volatile` option, if enabled for a host or service, makes it treat every [state change](03-monitoring-basics.md#hard-soft-states)
as a `HARD` state change. It is comparable to `max_check_attempts = 1`. With this any `NOT-OK` result will
ignore `max_check_attempts` and trigger notifications etc. It will further cause any additional `NOT-OK`
result to re-send notifications.

It may be reasonable to have a volatile service which stays in a `HARD` state if the service stays in a `NOT-OK`
state. That way each service recheck will automatically trigger a notification unless the service is acknowledged or
in a scheduled downtime.

Common examples are security checks where each `NOT-OK` check result should immediately trigger a notification.

The default for this option is `false` and it should only be enabled when required.

## Monitoring Icinga 2 <a id="monitoring-icinga"></a>

Why should you do that? Icinga and its components run like any other
service application on your server.
There are predictable issues
such as "disk space is running low" and your monitoring suffers from just
that.

You would also like to ensure that features and backends are running
and storing required data. Be it the database backend where Icinga Web 2
presents fancy dashboards, forwarded metrics to Graphite or InfluxDB or
the entire distributed setup.

This list isn't complete but should help with your own setup.
Windows client specific checks are highlighted.

Type | Description | Plugins and CheckCommands
----------------|-------------------------------|-----------------------------------------------------
System | Filesystem | [disk](10-icinga-template-library.md#plugin-check-command-disk), [disk-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
System | Memory, Swap | [mem](10-icinga-template-library.md#plugin-contrib-command-mem), [swap](10-icinga-template-library.md#plugin-check-command-swap), [memory](10-icinga-template-library.md#windows-plugins) (Windows Client)
System | Hardware | [hpasm](10-icinga-template-library.md#plugin-contrib-command-hpasm), [ipmi-sensor](10-icinga-template-library.md#plugin-contrib-command-ipmi-sensor)
System | Virtualization | [VMware](10-icinga-template-library.md#plugin-contrib-vmware), [esxi_hardware](10-icinga-template-library.md#plugin-contrib-command-esxi-hardware)
System | Processes | [procs](10-icinga-template-library.md#plugin-check-command-processes), [service-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
System | System Activity Reports | [sar-perf](10-icinga-template-library.md#plugin-contrib-command-sar-perf)
System | I/O | [iostat](10-icinga-template-library.md#plugin-contrib-command-iostat)
System | Network interfaces | [nwc_health](10-icinga-template-library.md#plugin-contrib-command-nwc_health), [interfaces](10-icinga-template-library.md#plugin-contrib-command-interfaces)
System | Users | [users](10-icinga-template-library.md#plugin-check-command-users), [users-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
System | Logs | Forward them to [Elastic Stack](14-features.md#elastic-stack-integration) or [Graylog](14-features.md#graylog-integration) and add your own alerts.
System | NTP | [ntp_time](10-icinga-template-library.md#plugin-check-command-ntp-time)
System | Updates | [apt](10-icinga-template-library.md#plugin-check-command-apt), [yum](10-icinga-template-library.md#plugin-contrib-command-yum)
Icinga | Status & Stats | [icinga](10-icinga-template-library.md#itl-icinga) (more below)
Icinga | Cluster & Clients | [health checks](06-distributed-monitoring.md#distributed-monitoring-health-checks)
Database | MySQL | [mysql_health](10-icinga-template-library.md#plugin-contrib-command-mysql_health)
Database | PostgreSQL | [postgres](10-icinga-template-library.md#plugin-contrib-command-postgres)
Database | Housekeeping | Check the database size and growth and analyse metrics to examine trends.
Database | DB IDO | [ido](10-icinga-template-library.md#itl-icinga-ido) (more below)
Webserver | Apache2, Nginx, etc. | [http](10-icinga-template-library.md#plugin-check-command-http), [apache-status](10-icinga-template-library.md#plugin-contrib-command-apache-status), [nginx_status](10-icinga-template-library.md#plugin-contrib-command-nginx_status)
Webserver | Certificates | [http](10-icinga-template-library.md#plugin-check-command-http), [Icinga certificate monitoring](https://icinga.com/products/icinga-certificate-monitoring/)
Webserver | Authorization | [http](10-icinga-template-library.md#plugin-check-command-http)
Notifications | Mail (queue) | [smtp](10-icinga-template-library.md#plugin-check-command-smtp), [mailq](10-icinga-template-library.md#plugin-check-command-mailq)
Notifications | SMS (GSM modem) | [check_sms3_status](https://exchange.icinga.com/netways/check_sms3status)
Notifications | Messengers, Cloud services | XMPP, Twitter, IRC, Telegram, PagerDuty, VictorOps, etc.
Metrics | PNP, RRDTool | [check_pnp_rrds](https://github.com/lingej/pnp4nagios/tree/master/scripts) checks for stale RRD files.
Metrics | Graphite | [graphite](10-icinga-template-library.md#plugin-contrib-command-graphite)
Metrics | InfluxDB | [check_influxdb](https://exchange.icinga.com/Mikanoshi/InfluxDB+data+monitoring+plugin)
Metrics | Elastic Stack | [elasticsearch](10-icinga-template-library.md#plugin-contrib-command-elasticsearch), [Elastic Stack integration](14-features.md#elastic-stack-integration)
Metrics | Graylog | [Graylog integration](14-features.md#graylog-integration)

The [icinga](10-icinga-template-library.md#itl-icinga) CheckCommand provides metrics for the runtime stats of
Icinga 2. You can forward them to your preferred graphing solution.
If you require more metrics you can also query the [REST API](12-icinga2-api.md#icinga2-api) and write
your own custom check plugin. Or you keep using the built-in [object accessor functions](08-advanced-topics.md#access-object-attributes-at-runtime)
to calculate stats in-memory.
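
A service using the `icinga` CheckCommand described above might look like the
following sketch. It assumes that a `generic-service` template exists in your
configuration and that the check should only run on the local node, identified
by the `NodeName` constant. The service name is an arbitrary choice.

```
apply Service "icinga" {
  /* Assumption: a "generic-service" template is defined in your configuration. */
  import "generic-service"

  check_command = "icinga"

  /* Only attach the check to the host object representing the local node. */
  assign where host.name == NodeName
}
```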

There is a built-in [ido](10-icinga-template-library.md#itl-icinga-ido) check available for DB IDO MySQL/PostgreSQL
which provides additional metrics for the IDO database.

```
apply Service "ido-mysql" {
  check_command = "ido"

  vars.ido_type = "IdoMysqlConnection"
  vars.ido_name = "ido-mysql" //the name defined in /etc/icinga2/features-enabled/ido-mysql.conf

  assign where match("master*.localdomain", host.name)
}
```

More specific database queries can be found in the [DB IDO](14-features.md#db-ido) chapter.

Distributed setups should include specific [health checks](06-distributed-monitoring.md#distributed-monitoring-health-checks).

You might also want to add additional checks for TLS certificate expiration.
This can be done using the [Icinga certificate monitoring](https://icinga.com/products/icinga-certificate-monitoring/) module.

## Advanced Configuration Hints <a id="advanced-configuration-hints"></a>

### Advanced Use of Apply Rules <a id="advanced-use-of-apply-rules"></a>

[Apply rules](03-monitoring-basics.md#using-apply) can be used to create a rule set which is
entirely based on host objects and their attributes.
In addition to that [apply for and custom variable override](03-monitoring-basics.md#using-apply-for)
extend the possibilities.

The following example defines a dictionary on the host object which contains
configuration attributes for multiple web servers. This is then used to add three checks:

* A `ping4` check using the local IP `address` of the web server.
* A `tcp` check querying the TCP port where the HTTP service is running on.
* If the `url` key is defined, the third apply for rule will create service objects using the `http` CheckCommand.
In addition to that you can optionally define the `ssl` attribute which enables HTTPS checks.

Host definition:

```
object Host "webserver01" {
  import "generic-host"
  address = "192.168.56.200"
  vars.os = "Linux"

  vars.webserver = {
    instance["status"] = {
      address = "192.168.56.201"
      port = "80"
      url = "/status"
    }
    instance["tomcat"] = {
      address = "192.168.56.202"
      port = "8080"
    }
    instance["icingaweb2"] = {
      address = "192.168.56.210"
      port = "443"
      url = "/icingaweb2"
      ssl = true
    }
  }
}
```

Service apply for definitions:

```
apply Service "webserver_ping" for (instance => config in host.vars.webserver.instance) {
  display_name = "webserver_" + instance
  check_command = "ping4"

  vars.ping_address = config.address

  assign where host.vars.webserver.instance
}

apply Service "webserver_port" for (instance => config in host.vars.webserver.instance) {
  display_name = "webserver_" + instance + "_" + config.port
  check_command = "tcp"

  vars.tcp_address = config.address
  vars.tcp_port = config.port

  assign where host.vars.webserver.instance
}

apply Service "webserver_url" for (instance => config in host.vars.webserver.instance) {
  display_name = "webserver_" + instance + "_" + config.url
  check_command = "http"

  vars.http_address = config.address
  vars.http_port = config.port
  vars.http_uri = config.url

  if (config.ssl) {
    vars.http_ssl = config.ssl
  }

  assign where config.url != ""
}
```

The variables defined in the host dictionary are not using the typical custom variable
prefix recommended for CheckCommand parameters. Instead they are re-used for multiple
service checks in this example.
In addition to defining check parameters this way, you can also enrich the `display_name`
attribute with more details. This will be shown in Icinga Web 2 for example.

### Use Functions in Object Configuration <a id="use-functions-object-config"></a>

There is a limited scope where functions can be used as object attributes such as:

* As value for [Custom Variables](03-monitoring-basics.md#custom-variables-functions)
* Returning boolean expressions for [set_if](08-advanced-topics.md#use-functions-command-arguments-setif) inside command arguments
* Returning a [command](08-advanced-topics.md#use-functions-command-attribute) array inside command objects

The other way around you can create objects dynamically using your own global functions.

> **Note**
>
> Functions called inside command objects share the same global scope as runtime macros.
> Therefore you can access host custom variables like `host.vars.os`, or any other
> object attribute from inside the function definition used for [set_if](08-advanced-topics.md#use-functions-command-arguments-setif) or [command](08-advanced-topics.md#use-functions-command-attribute).

Tips when implementing functions:

* Use [log()](18-library-reference.md#global-functions-log) to dump variables. You can see the output
inside the `icinga2.log` file depending on your log severity.
* Use the `icinga2 console` to test basic functionality (e.g. iterating over a dictionary).
* Build them step-by-step. You can always refactor your code later on.

#### Register and Use Global Functions <a id="use-functions-global-register"></a>

[Functions](17-language-reference.md#functions) can be registered into the global scope. This makes custom functions available
in objects and other functions. Keep in mind that these functions are not marked
as side-effect-free and as such are not available via the REST API.

Add a new configuration file `functions.conf` and include it into the [icinga2.conf](04-configuration.md#icinga2-conf)
configuration file in the very beginning, e.g. after `constants.conf`.
You can also manage global
functions inside `constants.conf` if you prefer.

The following function converts a given state parameter into a returned string value. The important
bits for registering it into the global scope are:

* `globals.<unique_function_name>` adds a new globals entry.
* `function()` specifies that a call to `state_to_string()` executes a function.
* Function parameters are defined inside the `function()` definition.

```
globals.state_to_string = function(state) {
  if (state == 2) {
    return "Critical"
  } else if (state == 1) {
    return "Warning"
  } else if (state == 0) {
    return "OK"
  } else if (state == 3) {
    return "Unknown"
  } else {
    log(LogWarning, "state_to_string", "Unknown state " + state + " provided.")
  }
}
```

The else-condition allows for better error handling. This warning will be shown in the Icinga 2
log file once the function is called with an unexpected state value.

> **Note**
>
> If these functions are used in a distributed environment, you must ensure that they are
> deployed everywhere they are needed.

In order to test-drive the newly created function, restart Icinga 2 and use the [debug console](11-cli-commands.md#cli-command-console)
to connect to the REST API.

```
$ ICINGA2_API_PASSWORD=icinga icinga2 console --connect 'https://root@localhost:5665/'
Icinga 2 (version: v2.11.0)
<1> => globals.state_to_string(1)
"Warning"
<2> => state_to_string(2)
"Critical"
```

You can see that this function is now registered into the [global scope](17-language-reference.md#variable-scopes). The function call
`state_to_string()` can be used in any object at static config compile time or inside runtime
lambda functions.

The following service object example uses the service state and converts it to string output.
The function definition is not optimized; it is written out in full for better readability and includes a log message.

```
object Service "state-test" {
  check_command = "dummy"
  host_name = NodeName

  vars.dummy_state = 2

  vars.dummy_text = {{
    var h = macro("$host.name$")
    var s = macro("$service.name$")

    var state = get_service(h, s).state

    log(LogInformation, "dummy_state", "Host: " + h + " Service: " + s + " State: " + state)

    return state_to_string(state)
  }}
}
```

#### Use Custom Functions as Attribute <a id="custom-functions-as-attribute"></a>

To use custom functions as attributes, the function must be defined in a
slightly unexpected way. The following example shows how to assign values
depending on group membership. All hosts in the `slow-lan` host group use 300
as value for `ping_wrta`, all other hosts use 100.

```
globals.group_specific_value = function(group, group_value, non_group_value) {
  return function() use (group, group_value, non_group_value) {
    if (group in host.groups) {
      return group_value
    } else {
      return non_group_value
    }
  }
}

apply Service "ping4" {
  import "generic-service"
  check_command = "ping4"

  vars.ping_wrta = group_specific_value("slow-lan", 300, 100)
  vars.ping_crta = group_specific_value("slow-lan", 500, 200)

  assign where true
}
```

#### Use Functions in Assign Where Expressions <a id="use-functions-assign-where"></a>

If a simple expression for matching a name or checking if an item
exists in an array or dictionary does not fit, you should consider
writing your own global [functions](17-language-reference.md#functions).
You can call them inside `assign where` and `ignore where` expressions
for [apply rules](03-monitoring-basics.md#using-apply-expressions) or
[group assignments](03-monitoring-basics.md#group-assign-intro) just like
any other global function, for example [match](18-library-reference.md#global-functions-match).
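
For comparison, the built-in case can be sketched as follows. This is only a minimal illustration: the host group name `web-hosts` and the pattern `web*` are hypothetical and not part of the examples in this chapter.

```
object HostGroup "web-hosts" {
  display_name = "Web Hosts"
  /* built-in global function: wildcard match against the host name */
  assign where match("web*", host.name)
}
```

A custom global function is called in exactly the same position of the `assign where` expression.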

The following example requires the host `myprinter` to be added
to the host group `printers-lexmark`, but only if the host uses
a template matching the name `lexmark*`.

```
template Host "lexmark-printer-host" {
  vars.printer_type = "Lexmark"
}

object Host "myprinter" {
  import "generic-host"
  import "lexmark-printer-host"

  address = "192.168.1.1"
}

/* register a global function for the assign where call */
globals.check_host_templates = function(host, search) {
  /* iterate over all host templates and check if the search matches */
  for (tmpl in host.templates) {
    if (match(search, tmpl)) {
      return true
    }
  }

  /* nothing matched */
  return false
}

object HostGroup "printers-lexmark" {
  display_name = "Lexmark Printers"
  /* call the global function and pass the arguments */
  assign where check_host_templates(host, "lexmark*")
}
```

Take a different, more complex example: All hosts with the
custom variable `vars_app` as nested dictionary should be
added to the host group `ABAP-app-server`. But only if the
`app_type` for all entries is set to `ABAP`.

It could read as a wildcard match for nested dictionaries:

```
 where host.vars.vars_app["*"].app_type == "ABAP"
```

The solution for this problem is to register a global
function which checks the `app_type` for all hosts
with the `vars_app` dictionary.

```
object Host "appserver01" {
  check_command = "dummy"
  vars.vars_app["ABC"] = { app_type = "ABAP" }
}
object Host "appserver02" {
  check_command = "dummy"
  vars.vars_app["DEF"] = { app_type = "ABAP" }
}

globals.check_app_type = function(host, type) {
  /* ensure that other hosts without the custom variable do not match */
  if (typeof(host.vars.vars_app) != Dictionary) {
    return false
  }

  /* iterate over the vars_app dictionary */
  for (key => val in host.vars.vars_app) {
    /* if the value is a dictionary and its app_type is the requested type */
    if (typeof(val) == Dictionary && val.app_type == type) {
      return true
    }
  }

  /* nothing matched */
  return false
}

object HostGroup "ABAP-app-server" {
  assign where check_app_type(host, "ABAP")
}
```

#### Use Functions in Command Arguments set_if <a id="use-functions-command-arguments-setif"></a>

The `set_if` attribute inside the command arguments definition in the
[CheckCommand object definition](09-object-types.md#objecttype-checkcommand) is primarily used to
evaluate whether the command parameter should be set or not.

By default you can evaluate runtime macros for their existence. If the result is not an empty
string, the command parameter is passed. This becomes fairly complicated when you want to evaluate
multiple conditions and attributes.

The following example was found on the community support channels. The user had defined a host
dictionary named `compellent` with the key `disks`. This was then used inside service apply for rules.

```
object Host "dict-host" {
  check_command = "check_compellent"
  vars.compellent["disks"] = {
    file = "/var/lib/check_compellent/san_disks.0.json",
    checks = ["disks"]
  }
}
```

The more significant problem was to only add the command parameter `--disks` to the plugin call
when the dictionary `compellent` contains the key `disks`, and omit it if not found.

By defining `set_if` as [abbreviated lambda function](17-language-reference.md#nullary-lambdas)
and evaluating whether the host custom variable `compellent` contains the `disks` key, this problem was
solved like this:

```
object CheckCommand "check_compellent" {
  command = [ "/usr/bin/check_compellent" ]
  arguments = {
    "--disks" = {
      set_if = {{
        var host_vars = host.vars
        log(host_vars)
        var compel = host_vars.compellent
        log(compel)
        compel.contains("disks")
      }}
    }
  }
}
```

This implementation uses the dictionary type method [contains](18-library-reference.md#dictionary-contains)
and will fail if `host.vars.compellent` is not of the type `Dictionary`.
Therefore you can extend the checks using the [typeof](17-language-reference.md#types) function.

You can test the types using the `icinga2 console`:

```
# icinga2 console
Icinga (version: v2.3.0-193-g3eb55ad)
<1> => srv_vars.compellent["check_a"] = { file="outfile_a.json", checks = [ "disks", "fans" ] }
null
<2> => srv_vars.compellent["check_b"] = { file="outfile_b.json", checks = [ "power", "voltages" ] }
null
<3> => typeof(srv_vars.compellent)
type 'Dictionary'
<4> =>
```

The more programmatic approach for `set_if` could look like this:

```
    "--disks" = {
      set_if = {{
        var srv_vars = service.vars
        if (len(srv_vars) > 0) {
          if (typeof(srv_vars.compellent) == Dictionary) {
            return srv_vars.compellent.contains("disks")
          } else {
            log(LogInformation, "checkcommand set_if", "custom variable compellent is not a dictionary, ignoring it.")
            return false
          }
        } else {
          log(LogWarning, "checkcommand set_if", "empty custom variables")
          return false
        }
      }}
    }
```

#### Use Functions as Command Attribute <a id="use-functions-command-attribute"></a>

This comes in handy for [NotificationCommands](09-object-types.md#objecttype-notificationcommand)
or [EventCommands](09-object-types.md#objecttype-eventcommand) which do not require
a returned check result including state/output.

The following example was taken from the community support channels. The requirement was to
specify a custom variable inside the notification apply rule and decide which notification
script to call based on that.

```
object User "short-dummy" {
}

object UserGroup "short-dummy-group" {
  assign where user.name == "short-dummy"
}

apply Notification "mail-admins-short" to Host {
  import "mail-host-notification"
  command = "mail-host-notification-test"
  user_groups = [ "short-dummy-group" ]
  vars.short = true
  assign where host.vars.notification.mail
}
```

The solution is fairly simple: The `command` attribute is implemented as a function returning
the array that the caller, Icinga 2, requires.
The local variable `mailscript` sets the default value for the notification script location.
If the notification custom variable `short` is set, it will override the local variable `mailscript`
with a new value.
The `mailscript` variable is then used to compute the final notification command array being
returned.

You can omit the `log()` calls; they only help with debugging.

```
object NotificationCommand "mail-host-notification-test" {
  command = {{
    log("command as function")
    var mailscript = "mail-host-notification-long.sh"
    if (notification.vars.short) {
      mailscript = "mail-host-notification-short.sh"
    }
    log("Running command")
    log(mailscript)

    var cmd = [ ConfigDir + "/scripts/" + mailscript ]
    log(LogCritical, "me", cmd)
    return cmd
  }}

  env = {
  }
}
```

### Access Object Attributes at Runtime <a id="access-object-attributes-at-runtime"></a>

The [Object Accessor Functions](18-library-reference.md#object-accessor-functions)
can be used to retrieve references to other objects by name.

This allows you to access configuration and runtime object attributes. A detailed
list can be found [here](09-object-types.md#object-types).
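
As a minimal sketch of this pattern, the following service looks up its own host object at runtime and reports a configuration attribute and a runtime attribute. The service name `host-address-info` is hypothetical, and the example assumes a host object named after the `NodeName` constant exists.

```
object Service "host-address-info" {
  check_command = "dummy"
  host_name = NodeName

  vars.dummy_state = 0

  vars.dummy_text = {{
    /* resolve the host name at runtime and fetch the object reference */
    var h = get_host(macro("$host.name$"))

    /* guard against a missing object before reading attributes */
    if (h == null) {
      return "Host object not found."
    }

    /* address is a configuration attribute, state a runtime attribute */
    return "Host " + h.name + " uses address " + h.address + ", current state: " + h.state
  }}
}
```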

#### Access Object Attributes at Runtime: Cluster Check <a id="access-object-attributes-at-runtime-cluster-check"></a>

This is a simple cluster example for accessing two host object states and calculating a virtual
cluster state and output:

```
object Host "cluster-host-01" {
  check_command = "dummy"
  vars.dummy_state = 2
  vars.dummy_text = "This host is down."
}

object Host "cluster-host-02" {
  check_command = "dummy"
  vars.dummy_state = 0
  vars.dummy_text = "This host is up."
}

object Host "cluster" {
  check_command = "dummy"
  vars.cluster_nodes = [ "cluster-host-01", "cluster-host-02" ]

  vars.dummy_state = {{
    var up_count = 0
    var down_count = 0
    var cluster_nodes = macro("$cluster_nodes$")

    for (node in cluster_nodes) {
      if (get_host(node).state > 0) {
        down_count += 1
      } else {
        up_count += 1
      }
    }

    if (up_count >= down_count) {
      return 0 //same up as down -> UP
    } else {
      return 2 //something is broken
    }
  }}

  vars.dummy_text = {{
    var output = "Cluster hosts:\n"
    var cluster_nodes = macro("$cluster_nodes$")

    for (node in cluster_nodes) {
      output += node + ": " + get_host(node).last_check_result.output + "\n"
    }

    return output
  }}
}
```

#### Time Dependent Thresholds <a id="access-object-attributes-at-runtime-time-dependent-thresholds"></a>

The following example sets time dependent thresholds for the load check based on the current
time of the day compared to the defined time period.

```
object TimePeriod "backup" {
  ranges = {
    monday = "02:00-03:00"
    tuesday = "02:00-03:00"
    wednesday = "02:00-03:00"
    thursday = "02:00-03:00"
    friday = "02:00-03:00"
    saturday = "02:00-03:00"
    sunday = "02:00-03:00"
  }
}

object Host "webserver-with-backup" {
  check_command = "hostalive"
  address = "127.0.0.1"
}

object Service "webserver-backup-load" {
  check_command = "load"
  host_name = "webserver-with-backup"

  vars.load_wload1 = {{
    if (get_time_period("backup").is_inside) {
      return 20
    } else {
      return 5
    }
  }}
  vars.load_cload1 = {{
    if (get_time_period("backup").is_inside) {
      return 40
    } else {
      return 10
    }
  }}
}
```

## Advanced Value Types <a id="advanced-value-types"></a>

In addition to the default value types Icinga 2 also uses a few other types
to represent its internal state. The following types are exposed via the [API](12-icinga2-api.md#icinga2-api).

### CheckResult <a id="advanced-value-types-checkresult"></a>

  Name                      | Type                  | Description
  --------------------------|-----------------------|----------------------------------
  exit\_status              | Number                | The exit status returned by the check execution.
  output                    | String                | The check output.
  performance\_data         | Array                 | Array of [performance data values](08-advanced-topics.md#advanced-value-types-perfdatavalue).
  check\_source             | String                | Name of the node executing the check.
  scheduling\_source        | String                | Name of the node scheduling the check.
  state                     | Number                | The current state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
  command                   | Value                 | Array of command with shell-escaped arguments or command line string.
  execution\_start          | Timestamp             | Check execution start time (as a UNIX timestamp).
  execution\_end            | Timestamp             | Check execution end time (as a UNIX timestamp).
  schedule\_start           | Timestamp             | Scheduled check execution start time (as a UNIX timestamp).
  schedule\_end             | Timestamp             | Scheduled check execution end time (as a UNIX timestamp).
  active                    | Boolean               | Whether the result is from an active or passive check.
  vars\_before              | Dictionary            | Internal attribute used for calculations.
  vars\_after               | Dictionary            | Internal attribute used for calculations.
  ttl                       | Number                | Time-to-live duration in seconds for this check result. The next expected check result is `now + ttl` where freshness checks are executed.

### PerfdataValue <a id="advanced-value-types-perfdatavalue"></a>

Icinga 2 parses performance data strings returned by check plugins and makes the information available to external interfaces (e.g. [GraphiteWriter](09-object-types.md#objecttype-graphitewriter) or the [Icinga 2 API](12-icinga2-api.md#icinga2-api)).

  Name                      | Type                  | Description
  --------------------------|-----------------------|----------------------------------
  label                     | String                | Performance data label.
  value                     | Number                | Normalized performance data value without unit.
  counter                   | Boolean               | Enabled if the original value contains `c` as unit. Defaults to `false`.
  unit                      | String                | Unit of measurement (`seconds`, `bytes`, `percent`) according to the [plugin API](05-service-monitoring.md#service-monitoring-plugin-api).
  crit                      | Value                 | Critical threshold value.
  warn                      | Value                 | Warning threshold value.
  min                       | Value                 | Minimum value returned by the check.
  max                       | Value                 | Maximum value returned by the check.
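
The attributes in both tables can also be read inside a runtime function. The following is a minimal sketch, not a definitive implementation. The service names `perfdata-info` and `load` are hypothetical. It also assumes that performance data may be stored as raw strings, and that the global function `parse_performance_data()` is available in your version to convert them into a PerfdataValue.

```
object Service "perfdata-info" {
  check_command = "dummy"
  host_name = NodeName

  vars.dummy_state = 0

  vars.dummy_text = {{
    /* "load" is a hypothetical service on the same host */
    var svc = get_service(macro("$host.name$"), "load")

    if (svc == null || svc.last_check_result == null) {
      return "No check result available yet."
    }

    var cr = svc.last_check_result
    var output = "Check source: " + cr.check_source + "\n"

    if (typeof(cr.performance_data) != Array) {
      return output + "No performance data."
    }

    for (pd in cr.performance_data) {
      var pdv = pd

      /* assumption: raw strings need to be parsed into a PerfdataValue first */
      if (typeof(pd) == String) {
        pdv = parse_performance_data(pd)
      }

      output += pdv.label + " = " + pdv.value + pdv.unit + "\n"
    }

    return output
  }}
}
```

Test such a function in the `icinga2 console` first, since the stored type of the performance data entries can differ between check commands.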