1# Advanced Topics <a id="advanced-topics"></a>
2
3This chapter covers a number of advanced topics. If you're new to Icinga, you
4can safely skip over things you're not interested in.
5
6## Downtimes <a id="downtimes"></a>
7
8Downtimes can be scheduled for planned server maintenance or
9any other targeted service outage you are aware of in advance.
10
Downtimes suppress notifications and can trigger other
downtimes too. If a downtime was set by accident, or its duration
exceeds the maintenance window, you can cancel it manually.
14
15### Scheduling a downtime <a id="scheduling-downtime"></a>
16
17The most convenient way to schedule planned downtimes is to create
18them in Icinga Web 2 inside the host/service detail view. Select
19multiple hosts/services from the listing with the shift key to
20schedule multiple downtimes.
21
22![Downtime in Icinga Web 2](images/advanced-topics/icingaweb2_downtime_handled.png)
23
24In addition to that you can schedule a downtime by using the Icinga 2 API action
25[schedule-downtime](12-icinga2-api.md#icinga2-api-actions-schedule-downtime).
26This is especially useful to schedule a downtime on-demand inside a (remote) backup
27script, or create maintenance downtimes from a cron job for specific dates and intervals.
28
Multiple downtimes for a single object may overlap. This is useful
when maintenance takes longer than expected and you need to extend the window.
If multiple downtimes are in effect for one object, the overall downtime depth
will be greater than `1`.
33
34If the downtime was scheduled after the problem changed to a critical hard
35state triggering a problem notification, and the service recovers during
36the downtime window, the recovery notification won't be suppressed.
37
38Planned downtimes are also taken into account for SLA reporting
39tools calculating the SLAs based on the state and downtime history.
40
41### Fixed and Flexible Downtimes <a id="fixed-flexible-downtimes"></a>
42
A `fixed` downtime will be activated at the defined start time, and
removed at the end time. If the service state changes to `NOT-OK`
during this time window, the downtime is triggered: notifications
are suppressed and the downtime depth is incremented.

Common scenarios are a planned distribution upgrade on your Linux
servers, or database updates in your warehouse. The customer knows
about a fixed downtime window between 23:00 and 24:00. After 24:00
all problems should be alerted again. The solution is simple:
schedule a `fixed` downtime starting at 23:00 and ending at 24:00.
53
54Unlike a `fixed` downtime, a `flexible` downtime will be triggered
55by the state change in the time span defined by start and end time,
56and then last for the specified duration in minutes.
57
Imagine the following scenario: Your service is frequently polled
by users trying to grab free deleted domains for immediate registration.
Between 07:30 and 08:00 the impact will hit for 15 minutes and generate
a network outage visible to the monitoring. The service is still alive,
but answers Icinga 2 service checks too slowly.
For that reason, you may want to schedule a downtime between 07:30 and
08:00 with a duration of 15 minutes. The downtime will then last from
its trigger time until the duration is over. After that, the downtime
is removed (this may happen before or after the actual end time!).
67
68#### Fixed Downtime <a id="fixed-downtime"></a>
69
If the host/service changes into a NOT-OK state between the start and
end time, the downtime will be marked as `in effect` and
the downtime depth counter is increased.
73
```
   |       |         |
start      |        end
       trigger time
```
79
80#### Flexible Downtime <a id="flexible-downtime"></a>
81
A flexible downtime defines a time window where the downtime may be
triggered from a host/service NOT-OK state change. It will then last
until the specified duration is reached. As a result, the downtime may
still be in effect after the scheduled end time has passed, since it ends
at `trigger time + duration`.
87
88
```
   |       |         |
start      |        end               actual end time
           |--------------duration--------|
       trigger time
```
95
96
97### Triggered Downtimes <a id="triggered-downtimes"></a>
98
Triggering is optional when scheduling a downtime. If there is already a downtime
scheduled for a future maintenance, the current downtime can be triggered by
that downtime. This is useful if you have scheduled a host downtime and
are now scheduling a child host's downtime which should be triggered by the parent
downtime on `NOT-OK` state change.
104
105### Recurring Downtimes <a id="recurring-downtimes"></a>
106
107[ScheduledDowntime objects](09-object-types.md#objecttype-scheduleddowntime) can be used to set up
108recurring downtimes for services.
109
110Example:
111
```
apply ScheduledDowntime "backup-downtime" to Service {
  author = "icingaadmin"
  comment = "Scheduled downtime for backup"

  ranges = {
    monday = "02:00-03:00"
    tuesday = "02:00-03:00"
    wednesday = "02:00-03:00"
    thursday = "02:00-03:00"
    friday = "02:00-03:00"
    saturday = "02:00-03:00"
    sunday = "02:00-03:00"
  }

  assign where "backup" in service.groups
}
```
130
Icinga 2 attempts to find the next possible segment from a ScheduledDowntime object's
`ranges` attribute, and won't create multiple downtimes in the future. In case you need
all these downtimes planned and visible for the next days, weeks or months, schedule them
manually via the [REST API](12-icinga2-api.md#icinga2-api-actions-schedule-downtime) using
a script or cron job.
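
Recurring downtimes can also be flexible. The following is a minimal sketch, assuming the
`fixed` and `duration` attributes of the ScheduledDowntime object type. The object name, the
`domain-check` service group and the time ranges are hypothetical and only mirror the
07:30 to 08:00 scenario described earlier.

```
apply ScheduledDowntime "morning-load-spike" to Service {
  author = "icingaadmin"
  comment = "Flexible downtime for the expected morning load spike"

  /* Flexible: triggered by a NOT-OK state change inside the range. */
  fixed = false

  /* Once triggered, the downtime lasts 15 minutes. */
  duration = 15m

  ranges = {
    monday = "07:30-08:00"
    tuesday = "07:30-08:00"
    wednesday = "07:30-08:00"
    thursday = "07:30-08:00"
    friday = "07:30-08:00"
  }

  /* Hypothetical service group, adjust to your environment. */
  assign where "domain-check" in service.groups
}
```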
136
137> **Note**
138>
> If ScheduledDowntime objects are synced in a distributed high-availability setup,
> both zone instances will create the next possible downtime on their own. These runtime-generated
> downtimes are synced among both zone instances, and you may see seemingly duplicate downtimes
> in Icinga Web 2.
143
144
145## Comments <a id="comments-intro"></a>
146
Comments can be added at runtime and are persistent over restarts. You can
add useful information for others on repeating incidents (for example
"last time syslog at 100% cpu on 17.10.2013 due to stale nfs mount").
Comments are primarily accessible using web interfaces.
151
152You can add a comment either by using the Icinga 2 API action
153[add-comment](12-icinga2-api.md#icinga2-api-actions-add-comment) or
154by sending an [external command](14-features.md#external-commands).
155
156## Acknowledgements <a id="acknowledgements"></a>
157
158If a problem persists and notifications have been sent, you can
159acknowledge the problem. That way other users will get
160a notification that you're aware of the issue and probably are
161already working on a fix.
162
163Note: Acknowledgements also add a new [comment](08-advanced-topics.md#comments-intro)
164which contains the author and text fields.
165
166You can send an acknowledgement either by using the Icinga 2 API action
167[acknowledge-problem](12-icinga2-api.md#icinga2-api-actions-acknowledge-problem) or
168by sending an [external command](14-features.md#external-commands).
169
170
171### Sticky Acknowledgements <a id="sticky-acknowledgements"></a>
172
173The acknowledgement is removed if a state change occurs or if the host/service
174recovers (OK/Up state).
175
176If you acknowledge a problem once you've received a `Critical` notification,
177the acknowledgement will be removed if there is a state transition to `Warning`.
```
OK -> WARNING -> CRITICAL -> WARNING -> OK
```
181
182If you prefer to keep the acknowledgement until the problem is resolved (`OK`
183recovery) you need to enable the `sticky` parameter.
184
185
186### Expiring Acknowledgements <a id="expiring-acknowledgements"></a>
187
188Once a problem is acknowledged it may disappear from your `handled problems`
189dashboard and no-one ever looks at it again since it will suppress
190notifications too.
191
192This `fire-and-forget` action is quite common. If you're sure that a
193current problem should be resolved in the future at a defined time,
194you can define an expiration time when acknowledging the problem.
195
196Icinga 2 will clear the acknowledgement when expired and start to
197re-notify, if the problem persists.
198
199
200## Time Periods <a id="timeperiods"></a>
201
[Time Periods](09-object-types.md#objecttype-timeperiod) define
time ranges in Icinga where event actions are triggered, for
example whether a service check is executed or not within
the `check_period` attribute, or whether a notification should be sent to
users or not, filtered by the `period` and `notification_period`
configuration attributes for `Notification` and `User` objects.
208
209The `TimePeriod` attribute `ranges` may contain multiple directives,
210including weekdays, days of the month, and calendar dates.
211These types may overlap/override other types in your ranges dictionary.
212
213The descending order of precedence is as follows:
214
215* Calendar date (2008-01-01)
216* Specific month date (January 1st)
217* Generic month date (Day 15)
218* Offset weekday of specific month (2nd Tuesday in December)
219* Offset weekday (3rd Monday)
220* Normal weekday (Tuesday)
221
222If you don't set any `check_period` or `notification_period` attribute
223on your configuration objects, Icinga 2 assumes `24x7` as time period
224as shown below.
225
```
object TimePeriod "24x7" {
  display_name = "Icinga 2 24x7 TimePeriod"
  ranges = {
    "monday"    = "00:00-24:00"
    "tuesday"   = "00:00-24:00"
    "wednesday" = "00:00-24:00"
    "thursday"  = "00:00-24:00"
    "friday"    = "00:00-24:00"
    "saturday"  = "00:00-24:00"
    "sunday"    = "00:00-24:00"
  }
}
```
240
241If your operation staff should only be notified during workhours,
242create a new timeperiod named `workhours` defining a work day from
24309:00 to 17:00.
244
```
object TimePeriod "workhours" {
  display_name = "Icinga 2 8x5 TimePeriod"
  ranges = {
    "monday"    = "09:00-17:00"
    "tuesday"   = "09:00-17:00"
    "wednesday" = "09:00-17:00"
    "thursday"  = "09:00-17:00"
    "friday"    = "09:00-17:00"
  }
}
```
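
As a sketch of how such a time period can be referenced, the following `User` object
limits notifications to the `workhours` time period via its `period` attribute.
The user name and the email address are hypothetical.

```
object User "oncall-admin" {
  display_name = "On-call Admin"

  /* Hypothetical address, replace with a real recipient. */
  email = "oncall-admin@example.com"

  /* Only notify this user inside the "workhours" time period. */
  period = "workhours"
}
```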
257
258### Across midnight <a id="timeperiods-across-midnight"></a>
259
260If you want to specify a notification period across midnight,
261you can define it the following way:
262
```
object TimePeriod "across-midnight" {
  display_name = "Nightly Notification"
  ranges = {
    "saturday" = "22:00-24:00"
    "sunday" = "00:00-03:00"
  }
}
```
272
273Starting with v2.11 this can be shortened to using
274the first day as start with an overlapping range into
275the next day:
276
```
object TimePeriod "do-not-disturb" {
  display_name = "Weekend DND"
  ranges = {
    "saturday" = "22:00-06:00"
  }
}
```
285
286### Across several days, weeks or months <a id="timeperiods-across-days-weeks-months"></a>
287
288Below you can see another example for configuring timeperiods across several
289days, weeks or months. This can be useful when taking components offline
290for a distinct period of time.
291
```
object TimePeriod "standby" {
  display_name = "Standby"
  ranges = {
    "2016-09-30 - 2016-10-30" = "00:00-24:00"
  }
}
```
300
301Please note that the spaces before and after the dash are mandatory.
302
Once your time period is configured you can use the `period` attribute
to assign time periods to `Notification` and `Dependency` objects:
305
```
apply Notification "mail-icingaadmin" to Service {
  import "mail-service-notification"
  user_groups = host.vars.notification.mail.groups
  users = host.vars.notification.mail.users

  period = "workhours"

  assign where host.vars.notification.mail
}
```
317
318### Time Periods Inclusion and Exclusion <a id="timeperiods-includes-excludes"></a>
319
320Sometimes it is necessary to exclude certain time ranges from
321your default time period definitions, for example, if you don't
322want to send out any notification during the holiday season,
323or if you only want to allow small time windows for executed checks.
324
325The [TimePeriod object](09-object-types.md#objecttype-timeperiod)
326provides the `includes` and `excludes` attributes to solve this issue.
327`prefer_includes` defines whether included or excluded time periods are
328preferred.
329
330The following example defines a time period called `holidays` where
331notifications should be suppressed:
332
```
object TimePeriod "holidays" {
  ranges = {
    "january 1" = "00:00-24:00"                 //new year's day
    "july 4" = "00:00-24:00"                    //independence day
    "december 25" = "00:00-24:00"               //christmas
    "december 31" = "18:00-24:00"               //new year's eve (6pm+)
    "2017-04-16" = "00:00-24:00"                //easter 2017
    "monday -1 may" = "00:00-24:00"             //memorial day (last monday in may)
    "monday 1 september" = "00:00-24:00"        //labor day (1st monday in september)
    "thursday 4 november" = "00:00-24:00"       //thanksgiving (4th thursday in november)
  }
}
```
347
In addition to that, the time period `weekends-excluded` defines an additional
time window which should be excluded from notifications:
350
```
object TimePeriod "weekends-excluded" {
  ranges = {
    "saturday"  = "00:00-09:00,18:00-24:00"
    "sunday"    = "00:00-09:00,18:00-24:00"
  }
}
```
359
360The time period `prod-notification` defines the default time ranges
361and adds the excluded time period names as an array.
362
```
object TimePeriod "prod-notification" {
  excludes = [ "holidays", "weekends-excluded" ]

  ranges = {
    "monday"    = "00:00-24:00"
    "tuesday"   = "00:00-24:00"
    "wednesday" = "00:00-24:00"
    "thursday"  = "00:00-24:00"
    "friday"    = "00:00-24:00"
    "saturday"  = "00:00-24:00"
    "sunday"    = "00:00-24:00"
  }
}
```
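
The `includes` and `prefer_includes` attributes can be combined with `excludes` in the same way.
The following is only a sketch: the `holiday-oncall` and `prod-notification-oncall` time periods
and their ranges are hypothetical, and `prefer_includes = true` is set explicitly on the assumption
that overlapping included ranges should win over excluded ones.

```
/* Hypothetical on-call window on new year's eve. */
object TimePeriod "holiday-oncall" {
  ranges = {
    "december 31" = "18:00-22:00"
  }
}

object TimePeriod "prod-notification-oncall" {
  includes = [ "holiday-oncall" ]
  excludes = [ "holidays" ]

  /* Where included and excluded time periods overlap, prefer the included one. */
  prefer_includes = true

  ranges = {
    "monday"    = "09:00-17:00"
    "tuesday"   = "09:00-17:00"
    "wednesday" = "09:00-17:00"
    "thursday"  = "09:00-17:00"
    "friday"    = "09:00-17:00"
  }
}
```

With this definition the `holidays` period still suppresses notifications in general, while the
overlapping on-call window on December 31 is intended to remain active.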
378
379### Time zone handling <a id="timeperiods-timezones"></a>
380
381Icinga 2 takes the OS' time zone including DST changes into account.
382
Times inside DST changes are interpreted as before the DST changes.
For example, for the time zone Europe/Berlin:

* On 2020-10-25 03:00 CEST the time jumps back to 02:00 CET.
  For Icinga 02:30 means 02:30 CEST.
* On 2021-03-28 02:00 CET the time jumps forward to 03:00 CEST.
  For Icinga the (actually non-existent) 02:30 refers to CET
  and effectively means 03:30 CEST.
391
392## External Passive Check Results <a id="external-check-results"></a>
393
394Hosts or services which do not actively execute a check plugin to receive
395the state and output are called "passive checks" or "external check results".
396In this scenario an external client or script is sending in check results.
397
398You can feed check results into Icinga 2 with the following transport methods:
399
400* [process-check-result action](12-icinga2-api.md#icinga2-api-actions-process-check-result) available with the [REST API](12-icinga2-api.md#icinga2-api) (remote and local)
401* External command sent via command pipe (local only)
402
Each time a new check result is received, the next expected check time
is updated. This means that if no check results are received from
the external source, Icinga 2 will execute [freshness checks](08-advanced-topics.md#check-result-freshness).
406
407> **Note**
408>
> The REST API action allows you to specify the `check_source` attribute
> which helps to identify the external sender. This is also visible
> in Icinga Web 2 and the REST API queries.
412
413## Check Result Freshness <a id="check-result-freshness"></a>
414
In Icinga 2, active check freshness is enabled by default. A check result is considered
stale when no new check result has been received within the period defined by the `check_interval` attribute.
417
418The threshold is calculated based on the last check execution time for actively executed checks:
419
```
(last check execution time + check interval) > current time
```
423
424If this host/service receives check results from an [external source](08-advanced-topics.md#external-check-results),
425the threshold is based on the last time a check result was received:
426
```
(last check result time + check interval) > current time
```
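
The inequality can be written as a small DSL function, for example to experiment with it in the
`icinga2 console`. This is only a sketch mirroring the formula above, not how Icinga 2 implements
freshness internally. The function name `is_result_fresh` is hypothetical, and the sketch assumes
the `last_check` and `check_interval` attributes and the global `get_time()` function.

```
globals.is_result_fresh = function(host_name, service_name) {
  var service = get_service(host_name, service_name)

  /* Unknown services are treated as not fresh. */
  if (!service) {
    return false
  }

  /* (last check result time + check interval) > current time */
  return service.last_check + service.check_interval > get_time()
}
```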
430
431> **Tip**
432>
> The [process-check-result](12-icinga2-api.md#icinga2-api-actions-process-check-result) REST API
> action allows you to overrule the pre-defined check interval with a specified TTL in Icinga 2 v2.9+.
435
436If the freshness checks fail, Icinga 2 will execute the defined check command unless active checks are disabled.
437
438Best practice is to define a [dummy](10-icinga-template-library.md#itl-dummy) `check_command` which gets
439executed when freshness checks fail.
440
```
apply Service "external-check" {
  check_command = "dummy"
  check_interval = 1m

  /* Set the state to UNKNOWN (3) if freshness checks fail. */
  vars.dummy_state = 3

  /* Use a runtime function to retrieve the last check time and more details. */
  vars.dummy_text = {{
    var service = get_service(macro("$host.name$"), macro("$service.name$"))
    var lastCheck = DateTime(service.last_check).to_string()

    return "No check results received. Last result time: " + lastCheck
  }}

  assign where "external" in host.vars.services
}
```
460
461References: [get_service](18-library-reference.md#objref-get_service), [macro](18-library-reference.md#scoped-functions-macro), [DateTime](18-library-reference.md#datetime-type).
462
463Example output in Icinga Web 2:
464
465![Icinga 2 Freshness Checks](images/advanced-topics/icinga2_external_checks_freshness_icingaweb2.png)
466
467
468## Check Flapping <a id="check-flapping"></a>
469
470Icinga 2 supports optional detection of hosts and services that are "flapping".
471
472Flapping occurs when a service or host changes state too frequently, which would result in a storm of problem and
473recovery notifications. With flapping detection enabled a flapping notification will be sent while other notifications are
474suppressed until it calms down after receiving the same status from checks a few times. Flapping detection can help detect
475configuration problems (wrong thresholds), troublesome services or network problems.
476
477Flapping detection can be enabled or disabled using the `enable_flapping` attribute.
The `flapping_threshold_high` and `flapping_threshold_low` attributes allow you to specify the thresholds that control
when a [host](09-object-types.md#objecttype-host) or [service](09-object-types.md#objecttype-service) is considered to be flapping.
480
481The default thresholds are 30% for high and 25% for low. If the computed flapping value exceeds the high threshold a
482host or service is considered flapping until it drops below the low flapping threshold.
483
The attribute `flapping_ignore_states` allows you to ignore state changes to specified states during the flapping calculation.
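
A minimal sketch of these attributes on a service object could look as follows. The service and
host names and the threshold values are hypothetical and chosen only to differ from the defaults.

```
object Service "flaky-http" {
  host_name = "webserver01"
  check_command = "http"

  /* Flapping detection is optional and must be enabled explicitly. */
  enable_flapping = true

  /* Thresholds in percent: start flapping above 40%, stop below 20%. */
  flapping_threshold_high = 40
  flapping_threshold_low = 20
}
```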
485
`FlappingStart` and `FlappingEnd` notifications will be sent out accordingly, if configured. See the chapter on
[notifications](03-monitoring-basics.md#alert-notifications) for details.

> Note: There is no distinction between hard and soft states with flapping. All state changes count and notifications
> will be sent out regardless of the object's state.
491
492### How it works <a id="check-flapping-how-it-works"></a>
493
494Icinga 2 saves the last 20 state changes for every host and service. See the graphic below:
495
496![Icinga 2 Flapping State Timeline](images/advanced-topics/flapping-state-graph.png)
497
All the states are weighted, with the most recent one being worth the most (1.15) and the 20th the least (0.8). The
states in between are evenly distributed. The final flapping value is the sum of the weighted state changes divided by the total
count of 20.
501
502In the example above, the added states would have a total value of 7.82 (`0.84 + 0.86 + 0.88 + 0.9 + 0.98 + 1.06 + 1.12 + 1.18`).
503This yields a flapping percentage of 39.1% (`7.82 / 20 * 100`). As the default upper flapping threshold is 30%, it would be
504considered flapping.
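
The arithmetic of this example can be reproduced with a few lines of DSL, for instance in the
`icinga2 console`. This is a sketch of the calculation described above only; the variable names
are hypothetical and the weights are copied from the example.

```
/* Weights of the state changes from the example above. */
var weights = [ 0.84, 0.86, 0.88, 0.9, 0.98, 1.06, 1.12, 1.18 ]

/* Sum up the weighted state changes. */
var total = 0
for (weight in weights) {
  total += weight
}

/* Divide by the total count of 20 and convert to a percentage. */
var flapping_percent = total / 20 * 100

/* Compare against the default high threshold of 30%. */
var is_flapping = flapping_percent > 30
```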
505
506If the next seven check results then would not be state changes, the flapping percentage would fall below the lower threshold
507of 25% and therefore the host or service would recover from flapping.
508
509## Volatile Services and Hosts <a id="volatile-services-hosts"></a>
510
511The `volatile` option, if enabled for a host or service, makes it treat every [state change](03-monitoring-basics.md#hard-soft-states)
512as a `HARD` state change. It is comparable to `max_check_attempts = 1`. With this any `NOT-OK` result will
513ignore `max_check_attempts` and trigger notifications etc. It will further cause any additional `NOT-OK`
514result to re-send notifications.
515
516It may be reasonable to have a volatile service which stays in a `HARD` state if the service stays in a `NOT-OK`
517state. That way each service recheck will automatically trigger a notification unless the service is acknowledged or
518in a scheduled downtime.
519
Common examples are security checks where each `NOT-OK` check result should immediately trigger a notification.
521
522The default for this option is `false` and should only be enabled when required.
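
As a sketch, a volatile service could be defined as shown below. The service and host names are
hypothetical, and the `dummy` CheckCommand only stands in for a real security check plugin.

```
object Service "security-audit" {
  host_name = "webserver01"

  /* Placeholder, replace with your actual security check. */
  check_command = "dummy"

  /* Treat every state change as a HARD state change. */
  volatile = true
}
```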
523
524
525## Monitoring Icinga 2 <a id="monitoring-icinga"></a>
526
Why should you do that? Icinga and its components run like any other
service application on your server. Predictable issues
such as "disk space is running low" affect your monitoring just like
any other application.

You would also like to ensure that features and backends are running
and storing required data, be it the database backend from which Icinga Web 2
presents dashboards, metrics forwarded to Graphite or InfluxDB, or
the entire distributed setup.
536
537This list isn't complete but should help with your own setup.
538Windows client specific checks are highlighted.
539
540Type		| Description			| Plugins and CheckCommands
541----------------|-------------------------------|-----------------------------------------------------
542System		| Filesystem			| [disk](10-icinga-template-library.md#plugin-check-command-disk), [disk-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
543System		| Memory, Swap			| [mem](10-icinga-template-library.md#plugin-contrib-command-mem), [swap](10-icinga-template-library.md#plugin-check-command-swap), [memory](10-icinga-template-library.md#windows-plugins) (Windows Client)
544System		| Hardware			| [hpasm](10-icinga-template-library.md#plugin-contrib-command-hpasm), [ipmi-sensor](10-icinga-template-library.md#plugin-contrib-command-ipmi-sensor)
545System		| Virtualization		| [VMware](10-icinga-template-library.md#plugin-contrib-vmware), [esxi_hardware](10-icinga-template-library.md#plugin-contrib-command-esxi-hardware)
546System		| Processes			| [procs](10-icinga-template-library.md#plugin-check-command-processes), [service-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
547System		| System Activity Reports	| [sar-perf](10-icinga-template-library.md#plugin-contrib-command-sar-perf)
548System		| I/O				| [iostat](10-icinga-template-library.md#plugin-contrib-command-iostat)
549System		| Network interfaces		| [nwc_health](10-icinga-template-library.md#plugin-contrib-command-nwc_health), [interfaces](10-icinga-template-library.md#plugin-contrib-command-interfaces)
550System		| Users				| [users](10-icinga-template-library.md#plugin-check-command-users), [users-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
551System		| Logs				| Forward them to [Elastic Stack](14-features.md#elastic-stack-integration) or [Graylog](14-features.md#graylog-integration) and add your own alerts.
552System		| NTP				| [ntp_time](10-icinga-template-library.md#plugin-check-command-ntp-time)
553System		| Updates			| [apt](10-icinga-template-library.md#plugin-check-command-apt), [yum](10-icinga-template-library.md#plugin-contrib-command-yum)
554Icinga		| Status & Stats		| [icinga](10-icinga-template-library.md#itl-icinga) (more below)
555Icinga		| Cluster & Clients		| [health checks](06-distributed-monitoring.md#distributed-monitoring-health-checks)
556Database	| MySQL				| [mysql_health](10-icinga-template-library.md#plugin-contrib-command-mysql_health)
557Database	| PostgreSQL			| [postgres](10-icinga-template-library.md#plugin-contrib-command-postgres)
558Database	| Housekeeping			| Check the database size and growth and analyse metrics to examine trends.
559Database	| DB IDO			| [ido](10-icinga-template-library.md#itl-icinga-ido) (more below)
560Webserver	| Apache2, Nginx, etc.		| [http](10-icinga-template-library.md#plugin-check-command-http), [apache-status](10-icinga-template-library.md#plugin-contrib-command-apache-status), [nginx_status](10-icinga-template-library.md#plugin-contrib-command-nginx_status)
561Webserver	| Certificates			| [http](10-icinga-template-library.md#plugin-check-command-http), [Icinga certificate monitoring](https://icinga.com/products/icinga-certificate-monitoring/)
562Webserver	| Authorization			| [http](10-icinga-template-library.md#plugin-check-command-http)
563Notifications	| Mail (queue)			| [smtp](10-icinga-template-library.md#plugin-check-command-smtp), [mailq](10-icinga-template-library.md#plugin-check-command-mailq)
564Notifications	| SMS (GSM modem)		| [check_sms3_status](https://exchange.icinga.com/netways/check_sms3status)
565Notifications	| Messengers, Cloud services	| XMPP, Twitter, IRC, Telegram, PagerDuty, VictorOps, etc.
566Metrics		| PNP, RRDTool			| [check_pnp_rrds](https://github.com/lingej/pnp4nagios/tree/master/scripts) checks for stale RRD files.
567Metrics		| Graphite			| [graphite](10-icinga-template-library.md#plugin-contrib-command-graphite)
568Metrics		| InfluxDB			| [check_influxdb](https://exchange.icinga.com/Mikanoshi/InfluxDB+data+monitoring+plugin)
569Metrics		| Elastic Stack			| [elasticsearch](10-icinga-template-library.md#plugin-contrib-command-elasticsearch), [Elastic Stack integration](14-features.md#elastic-stack-integration)
570Metrics		| Graylog			| [Graylog integration](14-features.md#graylog-integration)
571
572
573The [icinga](10-icinga-template-library.md#itl-icinga) CheckCommand provides metrics for the runtime stats of
574Icinga 2. You can forward them to your preferred graphing solution.
575If you require more metrics you can also query the [REST API](12-icinga2-api.md#icinga2-api) and write
576your own custom check plugin. Or you keep using the built-in [object accessor functions](08-advanced-topics.md#access-object-attributes-at-runtime)
577to calculate stats in-memory.
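
A minimal sketch for adding the `icinga` check is shown below. The service name and the
`master*.localdomain` host name pattern are assumptions and need to be adjusted to your setup.

```
apply Service "icinga" {
  check_command = "icinga"

  /* Assumed naming scheme for the master nodes. */
  assign where match("master*.localdomain", host.name)
}
```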
578
579There is a built-in [ido](10-icinga-template-library.md#itl-icinga-ido) check available for DB IDO MySQL/PostgreSQL
580which provides additional metrics for the IDO database.
581
```
apply Service "ido-mysql" {
  check_command = "ido"

  vars.ido_type = "IdoMysqlConnection"
  vars.ido_name = "ido-mysql" //the name defined in /etc/icinga2/features-enabled/ido-mysql.conf

  assign where match("master*.localdomain", host.name)
}
```
592
593More specific database queries can be found in the [DB IDO](14-features.md#db-ido) chapter.
594
595Distributed setups should include specific [health checks](06-distributed-monitoring.md#distributed-monitoring-health-checks).
596
597You might also want to add additional checks for TLS certificate expiration.
598This can be done using the [Icinga certificate monitoring](https://icinga.com/products/icinga-certificate-monitoring/) module.
599
600
601
602## Advanced Configuration Hints <a id="advanced-configuration-hints"></a>
603
604### Advanced Use of Apply Rules <a id="advanced-use-of-apply-rules"></a>
605
606[Apply rules](03-monitoring-basics.md#using-apply) can be used to create a rule set which is
607entirely based on host objects and their attributes.
608In addition to that [apply for and custom variable override](03-monitoring-basics.md#using-apply-for)
609extend the possibilities.
610
The following example defines a dictionary on the host object which contains
configuration attributes for multiple web servers. This is then used to add three checks:
613
614* A `ping4` check using the local IP `address` of the web server.
615* A `tcp` check querying the TCP port where the HTTP service is running on.
616* If the `url` key is defined, the third apply for rule will create service objects using the `http` CheckCommand.
617In addition to that you can optionally define the `ssl` attribute which enables HTTPS checks.
618
619Host definition:
620
```
object Host "webserver01" {
  import "generic-host"
  address = "192.168.56.200"
  vars.os = "Linux"

  vars.webserver = {
    instance["status"] = {
      address = "192.168.56.201"
      port = "80"
      url = "/status"
    }
    instance["tomcat"] = {
      address = "192.168.56.202"
      port = "8080"
    }
    instance["icingaweb2"] = {
      address = "192.168.56.210"
      port = "443"
      url = "/icingaweb2"
      ssl = true
    }
  }
}
```
646
647Service apply for definitions:
648
```
apply Service "webserver_ping" for (instance => config in host.vars.webserver.instance) {
  display_name = "webserver_" + instance
  check_command = "ping4"

  vars.ping_address = config.address

  assign where host.vars.webserver.instance
}

apply Service "webserver_port" for (instance => config in host.vars.webserver.instance) {
  display_name = "webserver_" + instance + "_" + config.port
  check_command = "tcp"

  vars.tcp_address = config.address
  vars.tcp_port = config.port

  assign where host.vars.webserver.instance
}

apply Service "webserver_url" for (instance => config in host.vars.webserver.instance) {
  display_name = "webserver_" + instance + "_" + config.url
  check_command = "http"

  vars.http_address = config.address
  vars.http_port = config.port
  vars.http_uri = config.url

  if (config.ssl) {
    vars.http_ssl = config.ssl
  }

  assign where config.url != ""
}
```
684
685The variables defined in the host dictionary are not using the typical custom variable
686prefix recommended for CheckCommand parameters. Instead they are re-used for multiple
687service checks in this example.
In addition to defining check parameters this way, you can also enrich the `display_name`
attribute with more details. This will be shown in Icinga Web 2, for example.
690
691### Use Functions in Object Configuration <a id="use-functions-object-config"></a>
692
693There is a limited scope where functions can be used as object attributes such as:
694
695* As value for [Custom Variables](03-monitoring-basics.md#custom-variables-functions)
696* Returning boolean expressions for [set_if](08-advanced-topics.md#use-functions-command-arguments-setif) inside command arguments
697* Returning a [command](08-advanced-topics.md#use-functions-command-attribute) array inside command objects
698
699The other way around you can create objects dynamically using your own global functions.
700
701> **Note**
702>
703> Functions called inside command objects share the same global scope as runtime macros.
704> Therefore you can access host custom variables like `host.vars.os`, or any other
705> object attribute from inside the function definition used for [set_if](08-advanced-topics.md#use-functions-command-arguments-setif) or [command](08-advanced-topics.md#use-functions-command-attribute).
706
707Tips when implementing functions:
708
* Use [log()](18-library-reference.md#global-functions-log) to dump variables. You can see the output
inside the `icinga2.log` file, depending on your log severity.
711* Use the `icinga2 console` to test basic functionality (e.g. iterating over a dictionary)
712* Build them step-by-step. You can always refactor your code later on.
713
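As a minimal sketch of the first two tips, the following lines can be pasted into the
`icinga2 console`. The dictionary `thresholds` and the facility name `function-debug` are
made up for illustration and are not part of any shipped configuration.

```
var thresholds = { warn = 5, crit = 10 }

/* Dump the whole dictionary; non-string values are logged as JSON. */
log(LogInformation, "function-debug", thresholds)

/* Iterate over the dictionary to verify the loop logic before using it in a function. */
for (key => value in thresholds) {
  log(LogInformation, "function-debug", key + " => " + value)
}
```
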
714#### Register and Use Global Functions <a id="use-functions-global-register"></a>
715
[Functions](17-language-reference.md#functions) can be registered into the global scope. This makes custom functions available
in objects and other functions. Keep in mind that these functions are not marked
as side-effect-free and as such are not available via the REST API.
719
720Add a new configuration file `functions.conf` and include it into the [icinga2.conf](04-configuration.md#icinga2-conf)
721configuration file in the very beginning, e.g. after `constants.conf`. You can also manage global
722functions inside `constants.conf` if you prefer.
723
724The following function converts a given state parameter into a returned string value. The important
725bits for registering it into the global scope are:
726
727* `globals.<unique_function_name>` adds a new globals entry.
728* `function()` specifies that a call to `state_to_string()` executes a function.
729* Function parameters are defined inside the `function()` definition.
730
731```
732globals.state_to_string = function(state) {
733  if (state == 2) {
734    return "Critical"
735  } else if (state == 1) {
736    return "Warning"
737  } else if (state == 0) {
738    return "OK"
739  } else if (state == 3) {
740    return "Unknown"
741  } else {
742    log(LogWarning, "state_to_string", "Unknown state " + state + " provided.")
743  }
744}
745```
746
747The else-condition allows for better error handling. This warning will be shown in the Icinga 2
748log file once the function is called.
749
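Note that the `else` branch above only logs a warning and does not return a value, so the
caller receives `null` for unknown states. A possible variant, sketched here under the
hypothetical name `state_to_string_safe`, looks the name up in an array and always returns
a string. It assumes that `state` is passed as an integer number.

```
globals.state_to_string_safe = function(state) {
  /* the array index corresponds to the numeric state */
  var names = [ "OK", "Warning", "Critical", "Unknown" ]

  if (typeof(state) == Number && state >= 0 && state < len(names)) {
    return names[state]
  }

  /* fall back to a fixed string instead of returning nothing */
  log(LogWarning, "state_to_string_safe", "Unknown state " + state + " provided.")
  return "Invalid"
}
```
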
750> **Note**
751>
> If these functions are used in a distributed environment, make sure to deploy them
> on every node that needs them.
754
755In order to test-drive the newly created function, restart Icinga 2 and use the [debug console](11-cli-commands.md#cli-command-console)
756to connect to the REST API.
757
758```
759$ ICINGA2_API_PASSWORD=icinga icinga2 console --connect 'https://root@localhost:5665/'
760Icinga 2 (version: v2.11.0)
761<1> => globals.state_to_string(1)
762"Warning"
763<2> => state_to_string(2)
764"Critical"
765```
766
767You can see that this function is now registered into the [global scope](17-language-reference.md#variable-scopes). The function call
768`state_to_string()` can be used in any object at static config compile time or inside runtime
769lambda functions.
770
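For the static case, a minimal sketch could look like the following. The host name
`state-label-demo` and the custom variable `expected_state` are hypothetical. The function
call is evaluated once while the configuration is compiled, so the custom variable simply
holds the returned string. This assumes that `functions.conf` is included before the file
defining the object.

```
object Host "state-label-demo" {
  check_command = "dummy"

  /* evaluated once at config compile time, not at check time */
  vars.expected_state = state_to_string(0)
}
```
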
771The following service object example uses the service state and converts it to string output.
The function definition is not optimized; it is written out in full for better readability and includes a log message.
773
774```
775object Service "state-test" {
776  check_command = "dummy"
777  host_name = NodeName
778
779  vars.dummy_state = 2
780
781  vars.dummy_text = {{
782    var h = macro("$host.name$")
783    var s = macro("$service.name$")
784
785    var state = get_service(h, s).state
786
787    log(LogInformation, "dummy_state", "Host: " + h + " Service: " + s + " State: " + state)
788
789    return state_to_string(state)
790  }}
791}
792```
793
794
795#### Use Custom Functions as Attribute <a id="custom-functions-as-attribute"></a>
796
797To use custom functions as attributes, the function must be defined in a
798slightly unexpected way. The following example shows how to assign values
799depending on group membership. All hosts in the `slow-lan` host group use 300
800as value for `ping_wrta`, all other hosts use 100.
801
802```
803globals.group_specific_value = function(group, group_value, non_group_value) {
804    return function() use (group, group_value, non_group_value) {
805        if (group in host.groups) {
806            return group_value
807        } else {
808            return non_group_value
809        }
810    }
811}
812
813apply Service "ping4" {
814    import "generic-service"
815    check_command = "ping4"
816
817    vars.ping_wrta = group_specific_value("slow-lan", 300, 100)
818    vars.ping_crta = group_specific_value("slow-lan", 500, 200)
819
820    assign where true
821}
822```
823
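One way to read this example: `group_specific_value()` itself runs when the apply rule is
evaluated and returns an inner function. The `use` keyword makes the three arguments
available inside that inner function, which is then evaluated at runtime when `host` is
in scope. The host group and host below are hypothetical and only illustrate which
objects would end up with the higher thresholds.

```
object HostGroup "slow-lan" {
  display_name = "Slow LAN"
}

/* hypothetical host: its ping4 service resolves ping_wrta to 300 and ping_crta to 500 */
object Host "branch-office-router" {
  import "generic-host"

  address = "192.0.2.10"
  groups = [ "slow-lan" ]
}
```
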
824#### Use Functions in Assign Where Expressions <a id="use-functions-assign-where"></a>
825
If a simple expression for matching a name or checking whether an item
exists in an array or dictionary does not fit, you should consider
writing your own global [functions](17-language-reference.md#functions).
You can call them inside `assign where` and `ignore where` expressions
for [apply rules](03-monitoring-basics.md#using-apply-expressions) or
[group assignments](03-monitoring-basics.md#group-assign-intro) just like
any other global function, for example [match](18-library-reference.md#global-functions-match).
833
The following example requires the host `myprinter` to be added
to the host group `printers-lexmark`, but only if the host uses
a template matching the name `lexmark*`.
837
838```
839template Host "lexmark-printer-host" {
840  vars.printer_type = "Lexmark"
841}
842
843object Host "myprinter" {
844  import "generic-host"
845  import "lexmark-printer-host"
846
847  address = "192.168.1.1"
848}
849
850/* register a global function for the assign where call */
851globals.check_host_templates = function(host, search) {
852  /* iterate over all host templates and check if the search matches */
853  for (tmpl in host.templates) {
854    if (match(search, tmpl)) {
855      return true
856    }
857  }
858
859  /* nothing matched */
860  return false
861}
862
863object HostGroup "printers-lexmark" {
864  display_name = "Lexmark Printers"
865  /* call the global function and pass the arguments */
866  assign where check_host_templates(host, "lexmark*")
867}
868```
869
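The same function can also be used in an `ignore where` expression. The following apply
rule is a sketch with a hypothetical service name `printer-ping`. It assumes the
`check_host_templates()` function from above is registered.

```
apply Service "printer-ping" {
  check_command = "ping4"

  assign where host.address
  /* skip every host that does not import a template matching "lexmark*" */
  ignore where !check_host_templates(host, "lexmark*")
}
```
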
Take a different, more complex example: All hosts with the
custom variable `vars_app` as a nested dictionary should be
added to the host group `ABAP-app-server`, but only if the
`app_type` for all entries is set to `ABAP`.

Written as a hypothetical wildcard match for nested dictionaries, this could read:
876
877```
878    where host.vars.vars_app["*"].app_type == "ABAP"
879```
880
881The solution for this problem is to register a global
882function which checks the `app_type` for all hosts
883with the `vars_app` dictionary.
884
885```
886object Host "appserver01" {
887  check_command = "dummy"
888  vars.vars_app["ABC"] = { app_type = "ABAP" }
889}
890object Host "appserver02" {
891  check_command = "dummy"
892  vars.vars_app["DEF"] = { app_type = "ABAP" }
893}
894
895globals.check_app_type = function(host, type) {
896  /* ensure that other hosts without the custom variable do not match */
897  if (typeof(host.vars.vars_app) != Dictionary) {
898    return false
899  }
900
901  /* iterate over the vars_app dictionary */
902  for (key => val in host.vars.vars_app) {
    /* match if the value is a dictionary whose app_type equals the requested type */
904    if (typeof(val) == Dictionary && val.app_type == type) {
905      return true
906    }
907  }
908
909  /* nothing matched */
910  return false
911}
912
913object HostGroup "ABAP-app-server" {
914  assign where check_app_type(host, "ABAP")
915}
916```
917
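Note that `check_app_type()` returns `true` as soon as one entry matches. If every entry
in `vars_app` has to carry the requested `app_type`, a stricter variant could look like the
following sketch. The function name `check_app_type_all` is hypothetical, and treating an
empty dictionary as a non-match is an assumption.

```
globals.check_app_type_all = function(host, type) {
  /* ensure that other hosts without the custom variable do not match */
  if (typeof(host.vars.vars_app) != Dictionary) {
    return false
  }

  /* assumption: an empty dictionary should not match */
  if (len(host.vars.vars_app) == 0) {
    return false
  }

  /* fail on the first entry which is not a dictionary or has a different app_type */
  for (key => val in host.vars.vars_app) {
    if (typeof(val) != Dictionary || val.app_type != type) {
      return false
    }
  }

  /* every entry matched */
  return true
}
```
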
918#### Use Functions in Command Arguments set_if <a id="use-functions-command-arguments-setif"></a>
919
920The `set_if` attribute inside the command arguments definition in the
921[CheckCommand object definition](09-object-types.md#objecttype-checkcommand) is primarily used to
922evaluate whether the command parameter should be set or not.
923
By default you can evaluate runtime macros for their existence. If the result is not an empty
string, the command parameter is passed. This becomes fairly complicated when you want to evaluate
multiple conditions and attributes.
927
The following example was found on the community support channels. The user had defined a host
dictionary named `compellent` with the key `disks`. This was then used inside service `apply for` rules.
930
931```
932object Host "dict-host" {
933  check_command = "check_compellent"
934  vars.compellent["disks"] = {
935    file = "/var/lib/check_compellent/san_disks.0.json",
936    checks = ["disks"]
937  }
938}
939```
940
The more significant problem was to only add the command parameter `--disks` to the plugin call
when the dictionary `compellent` contains the key `disks`, and omit it if not found.

By defining `set_if` as an [abbreviated lambda function](17-language-reference.md#nullary-lambdas)
and evaluating whether the host custom variable `compellent` contains the key `disks`, this problem was
solved like this:
947
948```
949object CheckCommand "check_compellent" {
950  command   = [ "/usr/bin/check_compellent" ]
951  arguments   = {
952    "--disks"  = {
953      set_if = {{
954        var host_vars = host.vars
955        log(host_vars)
956        var compel = host_vars.compellent
957        log(compel)
958        compel.contains("disks")
959      }}
960    }
961  }
962}
963```
964
965This implementation uses the dictionary type method [contains](18-library-reference.md#dictionary-contains)
966and will fail if `host.vars.compellent` is not of the type `Dictionary`.
967Therefore you can extend the checks using the [typeof](17-language-reference.md#types) function.
968
969You can test the types using the `icinga2 console`:
970
971```
972# icinga2 console
973Icinga (version: v2.3.0-193-g3eb55ad)
974<1> => srv_vars.compellent["check_a"] = { file="outfile_a.json", checks = [ "disks", "fans" ] }
975null
976<2> => srv_vars.compellent["check_b"] = { file="outfile_b.json", checks = [ "power", "voltages" ] }
977null
978<3> => typeof(srv_vars.compellent)
979type 'Dictionary'
980<4> =>
981```
982
983The more programmatic approach for `set_if` could look like this:
984
985```
    "--disks" = {
      set_if = {{
        var srv_vars = service.vars
        if (len(srv_vars) > 0) {
          if (typeof(srv_vars.compellent) == Dictionary) {
            return srv_vars.compellent.contains("disks")
          } else {
            log(LogInformation, "checkcommand set_if", "custom variable compellent is not a dictionary, ignoring it.")
            return false
          }
        } else {
          log(LogWarning, "checkcommand set_if", "empty custom variables")
          return false
        }
      }}
    }
1002```
1003
1004#### Use Functions as Command Attribute <a id="use-functions-command-attribute"></a>
1005
Using a function as the `command` attribute comes in handy for [NotificationCommands](09-object-types.md#objecttype-notificationcommand)
or [EventCommands](09-object-types.md#objecttype-eventcommand), which do not require
a returned check result including state/output.
1009
1010The following example was taken from the community support channels. The requirement was to
1011specify a custom variable inside the notification apply rule and decide which notification
1012script to call based on that.
1013
1014```
1015object User "short-dummy" {
1016}
1017
1018object UserGroup "short-dummy-group" {
1019  assign where user.name == "short-dummy"
1020}
1021
1022apply Notification "mail-admins-short" to Host {
1023   import "mail-host-notification"
1024   command = "mail-host-notification-test"
1025   user_groups = [ "short-dummy-group" ]
1026   vars.short = true
1027   assign where host.vars.notification.mail
1028}
1029```
1030
The solution is fairly simple: The `command` attribute is implemented as a function returning
the array that Icinga 2, as the caller, requires.
The local variable `mailscript` sets the default value for the notification script location.
If the notification custom variable `short` is set, it overrides the local variable `mailscript`
with a new value.
The `mailscript` variable is then used to compute the final notification command array that is
returned.
1038
You can omit the `log()` calls; they only help with debugging.
1040
1041```
1042object NotificationCommand "mail-host-notification-test" {
1043  command = {{
1044    log("command as function")
1045    var mailscript = "mail-host-notification-long.sh"
1046    if (notification.vars.short) {
1047       mailscript = "mail-host-notification-short.sh"
1048    }
1049    log("Running command")
1050    log(mailscript)
1051
1052    var cmd = [ ConfigDir + "/scripts/" + mailscript ]
1053    log(LogCritical, "me", cmd)
1054    return cmd
1055  }}
1056
1057  env = {
1058  }
1059}
1060```
1061
1062### Access Object Attributes at Runtime <a id="access-object-attributes-at-runtime"></a>
1063
1064The [Object Accessor Functions](18-library-reference.md#object-accessor-functions)
1065can be used to retrieve references to other objects by name.
1066
1067This allows you to access configuration and runtime object attributes. A detailed
1068list can be found [here](09-object-types.md#object-types).
1069
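As a small sketch, the following lines could be run in the `icinga2 console` connected to
the REST API. They assume a host object named after the `NodeName` constant exists. The
facility name `accessor-demo` is made up.

```
var h = get_host(NodeName)

/* get_host() returns null if no such host exists */
if (h) {
  log(LogInformation, "accessor-demo", "Host " + h.name + " is in state " + h.state)
} else {
  log(LogWarning, "accessor-demo", "Host " + NodeName + " does not exist.")
}
```
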
1070#### Access Object Attributes at Runtime: Cluster Check <a id="access-object-attributes-at-runtime-cluster-check"></a>
1071
1072This is a simple cluster example for accessing two host object states and calculating a virtual
1073cluster state and output:
1074
1075```
1076object Host "cluster-host-01" {
1077  check_command = "dummy"
1078  vars.dummy_state = 2
1079  vars.dummy_text = "This host is down."
1080}
1081
1082object Host "cluster-host-02" {
1083  check_command = "dummy"
1084  vars.dummy_state = 0
1085  vars.dummy_text = "This host is up."
1086}
1087
1088object Host "cluster" {
1089  check_command = "dummy"
1090  vars.cluster_nodes = [ "cluster-host-01", "cluster-host-02" ]
1091
1092  vars.dummy_state = {{
1093    var up_count = 0
1094    var down_count = 0
1095    var cluster_nodes = macro("$cluster_nodes$")
1096
1097    for (node in cluster_nodes) {
1098      if (get_host(node).state > 0) {
1099        down_count += 1
1100      } else {
1101        up_count += 1
1102      }
1103    }
1104
1105    if (up_count >= down_count) {
1106      return 0 //same up as down -> UP
1107    } else {
1108      return 2 //something is broken
1109    }
1110  }}
1111
1112  vars.dummy_text = {{
1113    var output = "Cluster hosts:\n"
1114    var cluster_nodes = macro("$cluster_nodes$")
1115
1116    for (node in cluster_nodes) {
1117      output += node + ": " + get_host(node).last_check_result.output + "\n"
1118    }
1119
1120    return output
1121  }}
1122}
1123```
1124
1125#### Time Dependent Thresholds <a id="access-object-attributes-at-runtime-time-dependent-thresholds"></a>
1126
1127The following example sets time dependent thresholds for the load check based on the current
1128time of the day compared to the defined time period.
1129
1130```
1131object TimePeriod "backup" {
1132  ranges = {
1133    monday = "02:00-03:00"
1134    tuesday = "02:00-03:00"
1135    wednesday = "02:00-03:00"
1136    thursday = "02:00-03:00"
1137    friday = "02:00-03:00"
1138    saturday = "02:00-03:00"
1139    sunday = "02:00-03:00"
1140  }
1141}
1142
1143object Host "webserver-with-backup" {
1144  check_command = "hostalive"
1145  address = "127.0.0.1"
1146}
1147
1148object Service "webserver-backup-load" {
1149  check_command = "load"
1150  host_name = "webserver-with-backup"
1151
1152  vars.load_wload1 = {{
1153    if (get_time_period("backup").is_inside) {
1154      return 20
1155    } else {
1156      return 5
1157    }
1158  }}
1159  vars.load_cload1 = {{
1160    if (get_time_period("backup").is_inside) {
1161      return 40
1162    } else {
1163      return 10
1164    }
1165  }}
1166}
1167```
1168
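The repeated lambda bodies above could be reduced by combining this example with a global
function that returns a function. This is only a sketch: the function name
`period_specific_value` and the service name `webserver-backup-load-compact` are
hypothetical, and the `backup` time period and the host from above are assumed to exist.

```
globals.period_specific_value = function(period, inside_value, outside_value) {
  /* the returned function is evaluated at runtime, each time the value is needed */
  return function() use (period, inside_value, outside_value) {
    if (get_time_period(period).is_inside) {
      return inside_value
    } else {
      return outside_value
    }
  }
}

object Service "webserver-backup-load-compact" {
  check_command = "load"
  host_name = "webserver-with-backup"

  vars.load_wload1 = period_specific_value("backup", 20, 5)
  vars.load_cload1 = period_specific_value("backup", 40, 10)
}
```
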
1169
1170## Advanced Value Types <a id="advanced-value-types"></a>
1171
1172In addition to the default value types Icinga 2 also uses a few other types
1173to represent its internal state. The following types are exposed via the [API](12-icinga2-api.md#icinga2-api).
1174
1175### CheckResult <a id="advanced-value-types-checkresult"></a>
1176
1177  Name                      | Type                  | Description
1178  --------------------------|-----------------------|----------------------------------
1179  exit\_status              | Number                | The exit status returned by the check execution.
1180  output                    | String                | The check output.
1181  performance\_data         | Array                 | Array of [performance data values](08-advanced-topics.md#advanced-value-types-perfdatavalue).
1182  check\_source             | String                | Name of the node executing the check.
1183  scheduling\_source        | String                | Name of the node scheduling the check.
1184  state                     | Number                | The current state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
1185  command                   | Value                 | Array of command with shell-escaped arguments or command line string.
1186  execution\_start          | Timestamp             | Check execution start time (as a UNIX timestamp).
1187  execution\_end            | Timestamp             | Check execution end time (as a UNIX timestamp).
1188  schedule\_start           | Timestamp             | Scheduled check execution start time (as a UNIX timestamp).
1189  schedule\_end             | Timestamp             | Scheduled check execution end time (as a UNIX timestamp).
1190  active                    | Boolean               | Whether the result is from an active or passive check.
1191  vars\_before              | Dictionary            | Internal attribute used for calculations.
1192  vars\_after               | Dictionary            | Internal attribute used for calculations.
  ttl                       | Number                | Time-to-live duration in seconds for this check result. The next check result is expected by `now + ttl`; after that, freshness checks are executed.
1194
1195### PerfdataValue <a id="advanced-value-types-perfdatavalue"></a>
1196
1197Icinga 2 parses performance data strings returned by check plugins and makes the information available to external interfaces (e.g. [GraphiteWriter](09-object-types.md#objecttype-graphitewriter) or the [Icinga 2 API](12-icinga2-api.md#icinga2-api)).
1198
1199  Name                      | Type                  | Description
1200  --------------------------|-----------------------|----------------------------------
1201  label                     | String                | Performance data label.
1202  value                     | Number                | Normalized performance data value without unit.
1203  counter                   | Boolean               | Enabled if the original value contains `c` as unit. Defaults to `false`.
  unit                      | String                | Unit of measurement (`seconds`, `bytes`, `percent`) according to the [plugin API](05-service-monitoring.md#service-monitoring-plugin-api).
1205  crit                      | Value                 | Critical threshold value.
1206  warn                      | Value                 | Warning threshold value.
1207  min                       | Value                 | Minimum value returned by the check.
1208  max                       | Value                 | Maximum value returned by the check.
1209
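To explore these attributes, a raw performance data string can be parsed in the
`icinga2 console` with the global function `parse_performance_data()`. The sample string
and the facility name `perfdata-demo` below are made up for illustration.

```
/* format: label=value;warn;crit;min */
var pd = parse_performance_data("load1=0.5;5;10;0")

/* access the parsed attributes listed in the table above */
log(LogInformation, "perfdata-demo", pd.label + ": value=" + pd.value + " warn=" + pd.warn + " crit=" + pd.crit)
```
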