# Icinga 2 Features <a id="icinga2-features"></a>

## Logging <a id="logging"></a>

Icinga 2 supports three different types of logging:

* File logging
* Syslog (on Linux/UNIX)
* Console logging (`STDOUT` on tty)

You can enable and disable loggers using the `icinga2 feature enable`
and `icinga2 feature disable` commands:

Feature         | Description
----------------|------------
debuglog        | Debug log (path: `/var/log/icinga2/debug.log`, severity: `debug` or higher)
mainlog         | Main log (path: `/var/log/icinga2/icinga2.log`, severity: `information` or higher)
syslog          | Syslog (severity: `warning` or higher)
windowseventlog | Windows Event Log (severity: `information` or higher)

By default, the `mainlog` feature is enabled. When running Icinga 2
on a terminal, log messages with severity `information` or higher are
written to the console.

### Log Rotation <a id="logging-logrotate"></a>

Packages provide a configuration file for [logrotate](https://linux.die.net/man/8/logrotate)
on Linux/Unix. Typically this is installed into `/etc/logrotate.d/icinga2`
and modifications won't be overridden on upgrade.

Instead of sending the reload HUP signal, logrotate
sends the USR1 signal to notify the Icinga daemon
that it has rotated the log files. Icinga then reopens
these log files:

* `/var/log/icinga2/icinga2.log` (requires `mainlog` enabled)
* `/var/log/icinga2/debug.log` (requires `debuglog` enabled)
* `/var/log/icinga2/error.log`

By default, log files are rotated daily.

## Core Backends <a id="core-backends"></a>

### REST API <a id="core-backends-api"></a>

The REST API is documented [here](12-icinga2-api.md#icinga2-api) as a core feature.

### Icinga DB <a id="core-backends-icingadb"></a>

Icinga DB provides a new core backend and aims to replace the IDO backend
output.
It consists of different components:

* Icinga 2 provides the `icingadb` feature which stores monitoring data in a memory database
* The [IcingaDB service](https://github.com/icinga/icingadb) collects and synchronizes monitoring data into its backend
* Icinga Web reads monitoring data from the new IcingaDB backend

Requirements:

* Local Redis instance
* MySQL/MariaDB server with `icingadb` database, user and schema imports
* Icinga 2's `icingadb` feature enabled
* IcingaDB service requires Redis and a MySQL/MariaDB server
* Icinga Web module

Consult the [Icinga DB section](02-installation.md#configuring-icinga-db) in the installation chapter for setup instructions.

We will deprecate the IDO and shift towards Icinga DB as the main backend,
but we will not drop the IDO for now.
We know that it takes time until Icinga DB is adopted
(maybe even up to one to two years)
and we won't drop the IDO until it is comfortable to do so.

### IDO Database (DB IDO) <a id="db-ido"></a>

The IDO (Icinga Data Output) feature for Icinga 2 takes care of exporting all
configuration and status information into a database. The IDO database is used
by Icinga Web 2 as its data backend.

Details on the installation can be found in the [Configuring DB IDO](02-installation.md#configuring-db-ido-mysql)
chapter. Details on the configuration can be found in the
[IdoMysqlConnection](09-object-types.md#objecttype-idomysqlconnection) and
[IdoPgsqlConnection](09-object-types.md#objecttype-idopgsqlconnection)
object configuration documentation.

#### DB IDO Health <a id="db-ido-health"></a>

If the monitoring health indicator is critical in Icinga Web 2,
you can use the following queries to manually check whether Icinga 2
is actually updating the IDO database.

Icinga 2 writes its current status to the `icinga_programstatus` table
every 10 seconds.
The query below checks 60 seconds into the past, which is a reasonable
amount of time -- adjust it to your requirements. If the condition is not met,
the query returns an empty result.

> **Tip**
>
> Use [check plugins](05-service-monitoring.md#service-monitoring-plugins) to monitor the backend.

Replace the `default` string with your instance name if different.

Example for MySQL:

```
# mysql -u root -p icinga -e "SELECT status_update_time FROM icinga_programstatus ps
  JOIN icinga_instances i ON ps.instance_id=i.instance_id
  WHERE (UNIX_TIMESTAMP(ps.status_update_time) > UNIX_TIMESTAMP(NOW())-60)
  AND i.instance_name='default';"

+---------------------+
| status_update_time  |
+---------------------+
| 2014-05-29 14:29:56 |
+---------------------+
```

Example for PostgreSQL:

```
# export PGPASSWORD=icinga; psql -U icinga -d icinga -c "SELECT ps.status_update_time FROM icinga_programstatus AS ps
  JOIN icinga_instances AS i ON ps.instance_id=i.instance_id
  WHERE ((SELECT extract(epoch from status_update_time) FROM icinga_programstatus) > (SELECT extract(epoch from now())-60))
  AND i.instance_name='default'";

status_update_time
------------------------
 2014-05-29 15:11:38+02
(1 row)
```

A detailed list of the available table attributes can be found in the [DB IDO Schema documentation](24-appendix.md#schema-db-ido).

#### DB IDO in Cluster HA Zones <a id="db-ido-cluster-ha"></a>

The DB IDO feature supports [High Availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-db-ido) in
the Icinga 2 cluster.

By default, both endpoints in a zone negotiate which endpoint activates
the feature; the other endpoint automatically pauses it. If the cluster
connection breaks at some point, the paused IDO feature automatically
does a failover.

You can disable this behaviour by setting `enable_ha = false`
in both feature configuration files.

#### DB IDO Cleanup <a id="db-ido-cleanup"></a>

Objects get deactivated when they are deleted from the configuration.
This is visible in the `is_active` column of the `icinga_objects` table.
Therefore all queries need to join this table and add `WHERE is_active=1` as
a condition. Deleted objects preserve their history table entries for later SLA
reporting.

Historical data isn't purged by default. You can specify the maximum
age of kept data via the `cleanup` configuration attribute of the
IDO features [IdoMysqlConnection](09-object-types.md#objecttype-idomysqlconnection)
and [IdoPgsqlConnection](09-object-types.md#objecttype-idopgsqlconnection).

Example if you prefer to keep notification history for 30 days:

```
  cleanup = {
    notifications_age = 30d
    contactnotifications_age = 30d
  }
```

The historical tables are populated depending on the data `categories` specified.
Some tables are empty by default.

#### DB IDO Tuning <a id="db-ido-tuning"></a>

As with any application database, there are ways to optimize and tune the database performance.

General tips for performance tuning:

* [MariaDB KB](https://mariadb.com/kb/en/library/optimization-and-tuning/)
* [PostgreSQL Wiki](https://wiki.postgresql.org/wiki/Performance_Optimization)

Re-creation of indexes, changed column values, etc. will increase the database size. Make sure
to add health checks for this, and monitor the trend in your Grafana dashboards.

There are different approaches to optimizing the tables. Always keep a
current backup and schedule a maintenance downtime for these kinds of tasks!

MySQL:

```
mariadb> OPTIMIZE TABLE icinga_statehistory;
```

> **Important**
>
> Tables might not support optimization at runtime.
> This can take a **long** time.
>
> `Table does not support optimize, doing recreate + analyze instead`.

If you want to optimize all tables in a specified database, there is a script called `mysqlcheck`.
It also allows you to repair broken tables in case of an emergency.

```bash
mysqlcheck --optimize icinga
```

PostgreSQL:

```
icinga=# vacuum;
VACUUM
```

> **Note**
>
> Don't use `VACUUM FULL` as this has a severe impact on performance.


## Metrics <a id="metrics"></a>

Whenever a host or service check is executed, or a check result is received via the REST API,
best practice is to provide performance data.

This data is parsed by features sending metrics to time series databases (TSDB):

* [Graphite](14-features.md#graphite-carbon-cache-writer)
* [InfluxDB](14-features.md#influxdb-writer)
* [OpenTSDB](14-features.md#opentsdb-writer)

Metrics, state changes and notifications can be managed with the following integrations:

* [Elastic Stack](14-features.md#elastic-stack-integration)
* [Graylog](14-features.md#graylog-integration)


### Graphite Writer <a id="graphite-carbon-cache-writer"></a>

[Graphite](13-addons.md#addons-graphing-graphite) is a tool stack for storing
metrics and needs to be running prior to enabling the `graphite` feature.

Icinga 2 writes parsed metrics directly to Graphite's Carbon Cache
TCP port, defaulting to `2003`.

You can enable the feature using

```bash
icinga2 feature enable graphite
```

By default the [GraphiteWriter](09-object-types.md#objecttype-graphitewriter) feature
expects the Graphite Carbon Cache to listen at `127.0.0.1` on TCP port `2003`.

#### Graphite Schema <a id="graphite-carbon-cache-writer-schema"></a>

The current naming schema is defined as follows.
The [Icinga Web 2 Graphite module](https://icinga.com/products/integrations/graphite/)
depends on this schema.

The default prefix for hosts and services is configured using
[runtime macros](03-monitoring-basics.md#runtime-macros) like this:

```
icinga2.$host.name$.host.$host.check_command$
icinga2.$host.name$.services.$service.name$.$service.check_command$
```

You can customize the prefix name by using the `host_name_template` and
`service_name_template` configuration attributes.

The additional levels allow fine-granular filtering and also template
capabilities, e.g. by using the check command `disk` for specific
graph templates in web applications rendering the Graphite data.

The following characters are escaped in prefix labels:

 Character    | Escaped character
 -------------|------------------
 whitespace   | _
 .            | _
 \            | _
 /            | _

Metric values are stored like this:

```
<prefix>.perfdata.<perfdata-label>.value
```

The following characters are escaped in performance labels
parsed from plugin output:

 Character    | Escaped character
 -------------|------------------
 whitespace   | _
 \            | _
 /            | _
 ::           | .

Note that labels may contain dots (`.`), which allows you to
add more levels inside the Graphite tree.
`::` adds support for [multi performance labels](http://my-plugin.de/wiki/projects/check_multi/configuration/performance)
and is therefore replaced by `.`.
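
The escaping rules above can be sketched in Python. This is an illustrative
sketch only; the helper names are hypothetical and not part of Icinga 2.

```python
# Sketch of the Graphite label escaping rules described above.
# Hypothetical helpers, not Icinga 2 source code.

def escape_prefix_label(label: str) -> str:
    """Escape a host/service name used in the Graphite prefix."""
    for ch in (" ", "\t", ".", "\\", "/"):
        label = label.replace(ch, "_")
    return label

def escape_perfdata_label(label: str) -> str:
    """Escape a perfdata label; '::' becomes a Graphite tree level."""
    label = label.replace("::", ".")  # handle '::' before the other rules
    for ch in (" ", "\t", "\\", "/"):
        label = label.replace(ch, "_")
    return label

print(escape_prefix_label("www01.example.org"))  # -> www01_example_org
print(escape_perfdata_label("disk::/var"))       # -> disk._var
```

Note how a host name's dots collapse into underscores, so they do not create
unintended levels in the Graphite tree, while `::` deliberately does.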

By enabling `enable_send_thresholds`, Icinga 2 automatically adds the following threshold metrics:

```
<prefix>.perfdata.<perfdata-label>.min
<prefix>.perfdata.<perfdata-label>.max
<prefix>.perfdata.<perfdata-label>.warn
<prefix>.perfdata.<perfdata-label>.crit
```

By enabling `enable_send_metadata`, Icinga 2 automatically adds the following metadata metrics:

```
<prefix>.metadata.current_attempt
<prefix>.metadata.downtime_depth
<prefix>.metadata.acknowledgement
<prefix>.metadata.execution_time
<prefix>.metadata.latency
<prefix>.metadata.max_check_attempts
<prefix>.metadata.reachable
<prefix>.metadata.state
<prefix>.metadata.state_type
```

Metadata metric overview:

 metric             | description
 -------------------|------------------------------------------
 current_attempt    | current check attempt
 max_check_attempts | maximum check attempts until the hard state is reached
 reachable          | whether the checked object is reachable
 downtime_depth     | number of downtimes this object is in
 acknowledgement    | whether the object is acknowledged or not
 execution_time     | check execution time
 latency            | check latency
 state              | current state of the checked object
 state_type         | 0=SOFT, 1=HARD state

The following example illustrates how to configure the storage schemas for Graphite Carbon
Cache.

```
[icinga2_default]
# intervals like PNP4Nagios uses them per default
pattern = ^icinga2\.
retentions = 1m:2d,5m:10d,30m:90d,360m:4y
```

#### Graphite in Cluster HA Zones <a id="graphite-carbon-cache-writer-cluster-ha"></a>

The Graphite feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
in cluster zones since 2.11.

By default, all endpoints in a zone will activate the feature and start
writing metrics to a Carbon Cache socket.
In HA enabled scenarios,
it is possible to set `enable_ha = true` in all feature configuration
files. This allows each endpoint to calculate the feature authority:
only one endpoint actively writes metrics while the other endpoints
pause the feature.

When the cluster connection breaks at some point, the remaining endpoint(s)
in that zone will automatically resume the feature. This built-in failover
mechanism ensures that metrics are written even if the cluster fails.

The recommended way of running Graphite in this scenario is a dedicated server
where Carbon Cache/Relay is running as receiver.


### InfluxDB Writer <a id="influxdb-writer"></a>

Once new metrics are available, Icinga 2 writes them directly to the
defined InfluxDB v1/v2 HTTP API.

You can enable the feature using

```bash
icinga2 feature enable influxdb
```

or

```bash
icinga2 feature enable influxdb2
```

By default the
[InfluxdbWriter](09-object-types.md#objecttype-influxdbwriter)/[Influxdb2Writer](09-object-types.md#objecttype-influxdb2writer)
features expect the InfluxDB daemon to listen at `127.0.0.1` on port `8086`.

Measurement names and tags are fully configurable by the end user. The Influxdb(2)Writer
object will automatically add a `metric` tag to each data point. This correlates to the
perfdata label. Fields (value, warn, crit, min, max, unit) are created from data if available
and the configuration allows it. If a value associated with a tag cannot be
resolved, it will be dropped and not sent to the target host.

Backslashes are allowed in tag keys, tag values and field keys. However, they are also
escape characters when followed by a space or comma, but cannot be escaped themselves.
As a result, all trailing backslashes in these fields are replaced with an underscore. This
predominantly affects Windows paths, e.g. `C:\` becomes `C:_`.
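
The trailing-backslash replacement can be sketched like this (a hypothetical
helper for illustration, not Icinga 2 code):

```python
def replace_trailing_backslash(value: str) -> str:
    # A trailing backslash would act as an escape character in InfluxDB
    # line protocol and cannot itself be escaped, so it is replaced
    # with an underscore, e.g. for Windows drive paths.
    if value.endswith("\\"):
        value = value[:-1] + "_"
    return value

print(replace_trailing_backslash("C:\\"))  # -> C:_
```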

The database/bucket is assumed to exist, so this object currently makes no attempt to create it.

If [SELinux](22-selinux.md#selinux) is enabled, it will not allow access for Icinga 2 to InfluxDB until the [boolean](22-selinux.md#selinux-policy-booleans)
`icinga2_can_connect_all` is set to true, as InfluxDB does not provide its own policy.

More configuration details can be found [here for v1](09-object-types.md#objecttype-influxdbwriter)
and [here for v2](09-object-types.md#objecttype-influxdb2writer).

#### Instance Tagging <a id="influxdb-writer-instance-tags"></a>

Consider the following service check:

```
apply Service "disk" for (disk => attributes in host.vars.disks) {
  import "generic-service"
  check_command = "disk"
  display_name = "Disk " + disk
  vars.disk_partitions = disk
  assign where host.vars.disks
}
```

This is a typical pattern for checking individual disks, NICs, TLS certificates etc. associated
with a host. What would be useful is to have the data points tagged with the specific instance
for that check. This would allow you to query time series data for a check on a host and for a
specific instance, e.g. /dev/sda. To do this, simply add the instance to the service variables:

```
apply Service "disk" for (disk => attributes in host.vars.disks) {
  ...
  vars.instance = disk
  ...
}
```

Then modify your writer configuration to add this tag to your data points if the instance variable
is associated with the service:

```
object InfluxdbWriter "influxdb" {
  ...
  service_template = {
    measurement = "$service.check_command$"
    tags = {
      hostname = "$host.name$"
      service = "$service.name$"
      instance = "$service.vars.instance$"
    }
  }
  ...
}
```

#### InfluxDB in Cluster HA Zones <a id="influxdb-writer-cluster-ha"></a>

The InfluxDB feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
in cluster zones since 2.11.

By default, all endpoints in a zone will activate the feature and start
writing metrics to the InfluxDB HTTP API. In HA enabled scenarios,
it is possible to set `enable_ha = true` in all feature configuration
files. This allows each endpoint to calculate the feature authority:
only one endpoint actively writes metrics while the other endpoints
pause the feature.

When the cluster connection breaks at some point, the remaining endpoint(s)
in that zone will automatically resume the feature. This built-in failover
mechanism ensures that metrics are written even if the cluster fails.

The recommended way of running InfluxDB in this scenario is a dedicated server
where the InfluxDB HTTP API, or Telegraf as a proxy, is running.

### Elastic Stack Integration <a id="elastic-stack-integration"></a>

[Icingabeat](https://icinga.com/products/integrations/elastic/) is an Elastic Beat that fetches data
from the Icinga 2 API and sends it either directly to [Elasticsearch](https://www.elastic.co/products/elasticsearch)
or [Logstash](https://www.elastic.co/products/logstash).

More integrations:

* [Logstash output](https://icinga.com/products/integrations/elastic/) for the Icinga 2 API.
* [Logstash Grok Pattern](https://icinga.com/products/integrations/elastic/) for Icinga 2 logs.

#### Elasticsearch Writer <a id="elasticsearch-writer"></a>

This feature forwards check results, state changes and notification events
to an [Elasticsearch](https://www.elastic.co/products/elasticsearch) installation over its HTTP API.

The check results include parsed performance data metrics if enabled.

> **Note**
>
> Elasticsearch 5.x or 6.x is required. This feature has been successfully tested with
> Elasticsearch 5.6.7 and 6.3.1.

Enable the feature and restart Icinga 2.

```bash
icinga2 feature enable elasticsearch
```

The default configuration expects an Elasticsearch instance running on `localhost` on port `9200`
and writes to an index called `icinga2`.

More configuration details can be found [here](09-object-types.md#objecttype-elasticsearchwriter).

#### Current Elasticsearch Schema <a id="elastic-writer-schema"></a>

The following event types are written to Elasticsearch:

* icinga2.event.checkresult
* icinga2.event.statechange
* icinga2.event.notification

Performance data metrics must be explicitly enabled with the `enable_send_perfdata`
attribute.

Metric values are stored like this:

```
check_result.perfdata.<perfdata-label>.value
```

The following characters are escaped in perfdata labels:

 Character    | Escaped character
 -------------|------------------
 whitespace   | _
 \            | _
 /            | _
 ::           | .

Note that perfdata labels may contain dots (`.`), which allows you to
add more levels inside the tree.
`::` adds support for [multi performance labels](http://my-plugin.de/wiki/projects/check_multi/configuration/performance)
and is therefore replaced by `.`.

Icinga 2 automatically adds the following threshold metrics
if present:

```
check_result.perfdata.<perfdata-label>.min
check_result.perfdata.<perfdata-label>.max
check_result.perfdata.<perfdata-label>.warn
check_result.perfdata.<perfdata-label>.crit
```

#### Elasticsearch in Cluster HA Zones <a id="elasticsearch-writer-cluster-ha"></a>

The Elasticsearch feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
in cluster zones since 2.11.

By default, all endpoints in a zone will activate the feature and start
writing events to the Elasticsearch HTTP API. In HA enabled scenarios,
it is possible to set `enable_ha = true` in all feature configuration
files. This allows each endpoint to calculate the feature authority:
only one endpoint actively writes events while the other endpoints
pause the feature.

When the cluster connection breaks at some point, the remaining endpoint(s)
in that zone will automatically resume the feature. This built-in failover
mechanism ensures that events are written even if the cluster fails.

The recommended way of running Elasticsearch in this scenario is a dedicated server
where you either have the Elasticsearch HTTP API, or a TLS secured HTTP proxy,
or Logstash for additional filtering.

### Graylog Integration <a id="graylog-integration"></a>

#### GELF Writer <a id="gelfwriter"></a>

The `Graylog Extended Log Format` (short: [GELF](https://docs.graylog.org/en/latest/pages/gelf.html))
can be used to send application logs directly to a TCP socket.

While it has been specified by the [Graylog](https://www.graylog.org) project as their
[input resource standard](https://docs.graylog.org/en/latest/pages/sending_data.html), other tools such as
[Logstash](https://www.elastic.co/products/logstash) also support `GELF` as an
[input type](https://www.elastic.co/guide/en/logstash/current/plugins-inputs-gelf.html).

You can enable the feature using

```bash
icinga2 feature enable gelf
```

By default the `GelfWriter` object expects the GELF receiver to listen at `127.0.0.1` on TCP port `12201`.
The default `source` attribute is set to `icinga2`. You can customize it for your needs if required.
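
To illustrate the wire format: a GELF message is a JSON object, and the TCP
transport delimits messages with a null byte. The following sketch only builds
such a payload; the field values are examples, and the helper is not part of
Icinga 2.

```python
import json

def build_gelf_payload(host: str, short_message: str, level: int = 6) -> bytes:
    # Minimal GELF 1.1 message. Over TCP, each message is terminated
    # by a null byte. Field values here are examples only.
    event = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "level": level,  # syslog severity, 6 = informational
    }
    return json.dumps(event).encode("utf-8") + b"\x00"

payload = build_gelf_payload("icinga2", "CHECK RESULT: ping OK")
```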

Currently these events are processed:

* Check results
* State changes
* Notifications

#### Graylog/GELF in Cluster HA Zones <a id="gelf-writer-cluster-ha"></a>

The GELF feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
in cluster zones since 2.11.

By default, all endpoints in a zone will activate the feature and start
writing events to the Graylog HTTP API. In HA enabled scenarios,
it is possible to set `enable_ha = true` in all feature configuration
files. This allows each endpoint to calculate the feature authority:
only one endpoint actively writes events while the other endpoints
pause the feature.

When the cluster connection breaks at some point, the remaining endpoint(s)
in that zone will automatically resume the feature. This built-in failover
mechanism ensures that events are written even if the cluster fails.

The recommended way of running Graylog in this scenario is a dedicated server
where you have the Graylog HTTP API listening.

### OpenTSDB Writer <a id="opentsdb-writer"></a>

While there are some OpenTSDB collector scripts and daemons like tcollector available for
Icinga 1.x, it is more reasonable to directly process the check and plugin performance data
in memory in Icinga 2. Once new metrics are available, Icinga 2 writes them directly
to the defined TSDB TCP socket.

You can enable the feature using

```bash
icinga2 feature enable opentsdb
```

By default the `OpenTsdbWriter` object expects the TSD to listen at
`127.0.0.1` on port `4242`.

The current default naming schema is:

```
icinga.host.<perfdata_metric_label>
icinga.service.<servicename>.<perfdata_metric_label>
```

for host and service checks. The tag `host` is always applied.
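
For reference, OpenTSDB's telnet-style protocol accepts one `put` line per data
point. A data point under the default schema above might be assembled roughly
like this (a sketch with example values; the helper is hypothetical, not part
of Icinga 2):

```python
def opentsdb_put_line(service: str, label: str, timestamp: int,
                      value: float, host: str) -> str:
    # Default naming schema for a service check, plus the always-applied
    # host tag. Sketch only; Icinga 2's internal implementation differs.
    metric = f"icinga.service.{service}.{label}"
    return f"put {metric} {timestamp} {value} host={host}"

print(opentsdb_put_line("ping", "rta", 1401370196, 0.12, "www-01"))
# -> put icinga.service.ping.rta 1401370196 0.12 host=www-01
```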

Icinga also sends perfdata warning, critical, minimum and maximum threshold values to OpenTSDB.
These are stored as new OpenTSDB metric names appended with `_warn`, `_crit`, `_min`, `_max`.
Values are only stored when the corresponding threshold exists in Icinga's perfdata.

Example:

```
icinga.service.<servicename>.<perfdata_metric_label>
icinga.service.<servicename>.<perfdata_metric_label>._warn
icinga.service.<servicename>.<perfdata_metric_label>._crit
icinga.service.<servicename>.<perfdata_metric_label>._min
icinga.service.<servicename>.<perfdata_metric_label>._max
```

To make sure Icinga 2 writes a valid metric into OpenTSDB, some characters are replaced
with `_` in the target name:

```
\ : (and space)
```

The resulting name in OpenTSDB might look like:

```
www-01 / http-cert / response time
icinga.http_cert.response_time
```

In addition to the performance data retrieved from the check plugin, Icinga 2 sends
internal check statistic data to OpenTSDB:

 metric             | description
 -------------------|------------------------------------------
 current_attempt    | current check attempt
 max_check_attempts | maximum check attempts until the hard state is reached
 reachable          | whether the checked object is reachable
 downtime_depth     | number of downtimes this object is in
 acknowledgement    | whether the object is acknowledged or not
 execution_time     | check execution time
 latency            | check latency
 state              | current state of the checked object
 state_type         | 0=SOFT, 1=HARD state

While `reachable`, `state` and `state_type` are metrics for the host or service, the
other metrics follow the naming schema

```
icinga.check.<metricname>
```

with the following tags:

 tag     | description
 --------|------------------------------------------
 type    | the check type, one of [host, service]
 host    | the host name the check ran on
 service | the service name (if type=service)

> **Note**
>
> You might want to set the `tsd.core.auto_create_metrics` setting to `true`
> in your `opentsdb.conf` configuration file.

#### OpenTSDB Metric Prefix <a id="opentsdb-metric-prefix"></a>

Functionality exists to modify the built-in OpenTSDB metric names that the plugin
writes to. By default these are `icinga.host` and `icinga.service.<servicename>`.

These prefixes can be modified as necessary to any arbitrary string. The prefix
configuration also supports Icinga macros, so if you would rather use `<checkcommand>`
or any other variable instead of `<servicename>`, you may do so.

To configure OpenTSDB metric name prefixes, create or modify the `host_template` and/or
`service_template` blocks in the `opentsdb.conf` file to add a `metric` definition.
These modifications go hand in hand with the **OpenTSDB Custom Tag Support** detailed below,
and more information around macro use can be found there.

Additionally, using custom metric prefixes or your own macros in the prefix may be
helpful if you are using the **OpenTSDB Generic Metric** functionality detailed below.

An example configuration which includes prefix name modification:

```
object OpenTsdbWriter "opentsdb" {
  host = "127.0.0.1"
  port = 4242
  host_template = {
    metric = "icinga.myhost"
    tags = {
      location = "$host.vars.location$"
      checkcommand = "$host.check_command$"
    }
  }
  service_template = {
    metric = "icinga.service.$service.check_command$"
  }
}
```

The above configuration will output the following naming schema:

```
icinga.myhost.<perfdata_metric_label>
icinga.service.<check_command_name>.<perfdata_metric_label>
```

Note how `<perfdata_metric_label>` is always appended in the default naming schema mode.
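
Whichever prefix is used, the character replacement from the OpenTSDB Writer
section still applies (backslash, colon and space become `_`). It can be
sketched like this; the helper name is hypothetical:

```python
def sanitize_metric_part(part: str) -> str:
    # Replace the characters listed in the OpenTSDB Writer section
    # that are invalid in OpenTSDB metric names.
    for ch in ("\\", ":", " "):
        part = part.replace(ch, "_")
    return part

print(sanitize_metric_part("response time"))  # -> response_time
```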
743 744#### OpenTSDB Generic Metric Naming Schema <a id="opentsdb-generic-metrics"></a> 745 746An alternate naming schema (`Generic Metrics`) is available where OpenTSDB metric names are more generic 747and do not include the Icinga perfdata label in the metric name. Instead, 748perfdata labels are stored in a tag `label` which is stored along with each perfdata value. 749 750This ultimately reduces the number of unique OpenTSDB metric names which may make 751querying aggregate data easier. This also allows you to store all perfdata values for a 752particular check inside one OpenTSDB metric name for each check. 753 754This alternate naming schema can be enabled by setting the following in the OpenTSDBWriter config: 755`enable_generic_metrics = true` 756 757> **Tip** 758> Consider using `Generic Metrics` along with the **OpenTSDB Metric Prefix** naming options 759> described above 760 761An example of this naming schema when compared to the default is: 762 763``` 764icinga.host 765icinga.service.<servicename> 766``` 767 768> **Note** 769> Note how `<perfdata_metric_label>` does not appear in the OpenTSDB metric name 770> when using `Generic Metrics`. Instead, a new tag `label` appears on each value written 771> to OpenTSDB which contains the perfdata label. 772 773#### Custom Tags <a id="opentsdb-custom-tags"></a> 774 775In addition to the default tags listed above, it is possible to send 776your own custom tags with your data to OpenTSDB. 777 778Note that custom tags are sent **in addition** to the default hostname, 779type and service name tags. If you do not include this section in the 780config file, no custom tags will be included. 781 782Custom tags can be custom attributes or built in attributes. 

Consider a host object:

```
object Host "my-server1" {
  address = "10.0.0.1"
  check_command = "hostalive"
  vars.location = "Australia"
}
```

and a service object:

```
object Service "ping" {
  host_name = "localhost"
  check_command = "my-ping"

  vars.ping_packets = 10
}
```

It is possible to send `vars.location` and `vars.ping_packets` along
with performance data. Additionally, any other attribute can be sent
as a tag, such as `check_command`.

You can make use of the `host_template` and `service_template` blocks
in the `opentsdb.conf` configuration file.

An example OpenTSDB configuration file which makes use of custom tags:

```
object OpenTsdbWriter "opentsdb" {
  host = "127.0.0.1"
  port = 4242
  host_template = {
    tags = {
      location = "$host.vars.location$"
      checkcommand = "$host.check_command$"
    }
  }
  service_template = {
    tags = {
      location = "$host.vars.location$"
      pingpackets = "$service.vars.ping_packets$"
      checkcommand = "$service.check_command$"
    }
  }
}
```

The keyword a macro begins with determines which attributes are
available in the macro context. The table below explains
what attributes are available, with links to each object type.

 start of macro | description
 ---------------|------------------------------------------
 \$host...$     | Attributes available on a [Host object](09-object-types.md#objecttype-host)
 \$service...$  | Attributes available on a [Service object](09-object-types.md#objecttype-service)
 \$icinga...$   | Attributes available on the [IcingaApplication object](09-object-types.md#objecttype-icingaapplication)

> **Note**
>
> Ensure you do not name your custom attributes with a dot in the name.
> Dots located inside a macro tell the interpreter to expand a
> dictionary.
>
> Do not do this in your object configuration:
>
> `vars["my.attribute"]`
>
> as you will be unable to reference `my.attribute` because it is not a
> dictionary.
>
> Instead, use underscores or another character:
>
> `vars.my_attribute` or `vars["my_attribute"]`


#### OpenTSDB in Cluster HA Zones <a id="opentsdb-writer-cluster-ha"></a>

The OpenTSDB feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
in cluster zones since 2.11.

By default, all endpoints in a zone will activate the feature and start
writing events to the OpenTSDB listener. In HA enabled scenarios,
it is possible to set `enable_ha = true` in all feature configuration
files. This allows each endpoint to calculate the feature authority:
only one endpoint actively writes metrics, while the other endpoints
pause the feature.

When the cluster connection breaks at some point, the remaining endpoint(s)
in that zone will automatically resume the feature. This built-in failover
mechanism ensures that metrics are written even if the cluster fails.

The recommended way of running OpenTSDB in this scenario is a dedicated server
where OpenTSDB is running.


### Writing Performance Data Files <a id="writing-performance-data-files"></a>

PNP and Graphios use performance data collector daemons to fetch
the current performance data files for their backend updates.

Therefore the Icinga 2 [PerfdataWriter](09-object-types.md#objecttype-perfdatawriter)
feature allows you to define the output template format for hosts and services,
filled in with Icinga 2 runtime macros.

```
host_format_template = "DATATYPE::HOSTPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tHOSTPERFDATA::$host.perfdata$\tHOSTCHECKCOMMAND::$host.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$"
service_format_template = "DATATYPE::SERVICEPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tSERVICEDESC::$service.name$\tSERVICEPERFDATA::$service.perfdata$\tSERVICECHECKCOMMAND::$service.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$\tSERVICESTATE::$service.state$\tSERVICESTATETYPE::$service.state_type$"
```

The default templates are already provided with the Icinga 2 feature configuration
which can be enabled using

```bash
icinga2 feature enable perfdata
```

By default, all performance data files are rotated at a 15 second interval into
the `/var/spool/icinga2/perfdata/` directory as `host-perfdata.<timestamp>` and
`service-perfdata.<timestamp>`.
External collectors need to parse the rotated performance data files and then
remove the processed files.

#### Perfdata Files in Cluster HA Zones <a id="perfdata-writer-cluster-ha"></a>

The Perfdata feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
in cluster zones since 2.11.

By default, all endpoints in a zone will activate the feature and start
writing metrics to the local spool directory. In HA enabled scenarios,
it is possible to set `enable_ha = true` in all feature configuration
files. This allows each endpoint to calculate the feature authority:
only one endpoint actively writes metrics, while the other endpoints
pause the feature.

When the cluster connection breaks at some point, the remaining endpoint(s)
in that zone will automatically resume the feature. This built-in failover
mechanism ensures that metrics are written even if the cluster fails.
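
The HA behaviour described above can be sketched in the feature configuration file. This is a minimal sketch; the object name and any other attributes should match your existing `perfdata.conf`:

```
object PerfdataWriter "perfdata" {
  // With HA enabled, only the endpoint holding the feature authority
  // writes perfdata files; the other endpoints in the zone pause the feature.
  enable_ha = true
}
```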

The recommended way of running Perfdata is to mount the perfdata spool
directory via NFS on a central server where PNP with the NPCD collector
is running.



## Deprecated Features <a id="deprecated-features"></a>

### Status Data Files <a id="status-data"></a>

> **Note**
>
> This feature is DEPRECATED and may be removed in future releases.
> Check the [roadmap](https://github.com/Icinga/icinga2/milestones).

Icinga 1.x writes object configuration data and status data at a cyclic
interval to its `objects.cache` and `status.dat` files. Icinga 2 provides
the `StatusDataWriter` object which dumps all configuration objects and
status updates at a regular interval.

```bash
icinga2 feature enable statusdata
```

If you are not using any web interface or addon which uses these files,
you can safely disable this feature.

### Compat Log Files <a id="compat-logging"></a>

> **Note**
>
> This feature is DEPRECATED and may be removed in future releases.
> Check the [roadmap](https://github.com/Icinga/icinga2/milestones).

The Icinga 1.x log format is provided as the `Compat Log` in Icinga 2
through the `CompatLogger` object.

These logs are used for informational representation in
external web interfaces parsing the logs, but also to generate
SLA reports and trends.
The [Livestatus](14-features.md#setting-up-livestatus) feature uses these logs
for answering queries to historical tables.

The `CompatLogger` object can be enabled with

```bash
icinga2 feature enable compatlog
```

By default, the Icinga 1.x log file called `icinga.log` is located
in `/var/log/icinga2/compat`. Rotated log files are moved into
`/var/log/icinga2/compat/archives`.

### External Command Pipe <a id="external-commands"></a>

> **Note**
>
> Please use the [REST API](12-icinga2-api.md#icinga2-api) as a modern and secure alternative
> for external actions.

> **Note**
>
> This feature is DEPRECATED and may be removed in future releases.
> Check the [roadmap](https://github.com/Icinga/icinga2/milestones).

Icinga 2 provides an external command pipe for processing commands
which trigger specific actions (for example rescheduling a service check
through the web interface).

In order to enable the `ExternalCommandListener` configuration, use the
following command and restart Icinga 2 afterwards:

```bash
icinga2 feature enable command
```

Icinga 2 creates the command pipe file as `/var/run/icinga2/cmd/icinga2.cmd`
using the default configuration.

Web interfaces and other Icinga addons are able to send commands to
Icinga 2 through the external command pipe, for example for rescheduling
a forced service check:

```
# /bin/echo "[`date +%s`] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;`date +%s`" >> /var/run/icinga2/cmd/icinga2.cmd

# tail -f /var/log/messages

Oct 17 15:01:25 icinga-server icinga2: Executing external command: [1382014885] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382014885
Oct 17 15:01:25 icinga-server icinga2: Rescheduling next check for service 'ping4'
```

A list of currently supported external commands can be found [here](24-appendix.md#external-commands-list-detail).

Detailed information on the commands and their required parameters can be found
in the [Icinga 1.x documentation](https://docs.icinga.com/latest/en/extcommands2.html).


### Check Result Files <a id="check-result-files"></a>

> **Note**
>
> This feature is DEPRECATED and may be removed in future releases.
> Check the [roadmap](https://github.com/Icinga/icinga2/milestones).

Icinga 1.x writes its check result files to a temporary spool directory
where they are processed at a regular interval.
While this is extremely inefficient performance-wise, it proved useful
for passing passive check results directly into Icinga 1.x,
skipping the external command pipe.

Several clustered/distributed environments and check-aggregation addons
use that method. In order to support step-by-step migration of these
environments, Icinga 2 supports the `CheckResultReader` object.

There is no feature configuration available; instead, the object must be defined
on demand in your Icinga 2 objects configuration.

```
object CheckResultReader "reader" {
  spool_dir = "/data/check-results"
}
```

### Livestatus <a id="setting-up-livestatus"></a>

> **Note**
>
> This feature is DEPRECATED and may be removed in future releases.
> Check the [roadmap](https://github.com/Icinga/icinga2/milestones).

The [MK Livestatus](https://mathias-kettner.de/checkmk_livestatus.html) project
implements a query protocol that lets users query their Icinga instance for
status information. It can also be used to send commands.

The Livestatus component that is distributed as part of Icinga 2 is a
re-implementation of the Livestatus protocol which is compatible with MK
Livestatus.

> **Tip**
>
> Only install the Livestatus feature if your web interface or addon requires
> you to do so.
> [Icinga Web 2](02-installation.md#setting-up-icingaweb2) does not need
> Livestatus.

Details on the available tables and attributes with Icinga 2 can be found
in the [Livestatus Schema](24-appendix.md#schema-livestatus) section.

You can enable Livestatus using `icinga2 feature enable`:

```bash
icinga2 feature enable livestatus
```

After that you will have to restart Icinga 2:

```bash
systemctl restart icinga2
```

By default the Livestatus socket is available in `/var/run/icinga2/cmd/livestatus`.

In order for queries and commands to work, you will need to add your query user
(e.g. your web server) to the `icingacmd` group:

```bash
usermod -a -G icingacmd www-data
```

The Debian packages use `nagios` as the user and group name. Make sure to change `icingacmd` to
`nagios` if you're using Debian.

Change `www-data` to the user you're using to run queries.

In order to use the historical tables provided by the Livestatus feature (for example, the
`log` table) you need to have the `CompatLogger` feature enabled. By default these logs
are expected to be in `/var/log/icinga2/compat`. A different path can be set using the
`compat_log_path` configuration attribute.

```bash
icinga2 feature enable compatlog
```

#### Livestatus Sockets <a id="livestatus-sockets"></a>

In contrast to the Icinga 1.x addon, Icinga 2 supports two socket types:

* Unix socket (default)
* TCP socket

Details on the configuration can be found in the [LivestatusListener](09-object-types.md#objecttype-livestatuslistener)
object configuration.

#### Livestatus GET Queries <a id="livestatus-get-queries"></a>

> **Note**
>
> All Livestatus queries require an additional empty line as query end identifier.
> The `nc` tool (`netcat`) provides the `-U` parameter to communicate using
> a Unix socket.

There also is a Perl module available on CPAN for accessing the Livestatus socket
programmatically: [Monitoring::Livestatus](https://metacpan.org/release/NIERLEIN/Monitoring-Livestatus-0.74)


Example using the Unix socket:

```
# echo -e "GET services\n" | /usr/bin/nc -U /var/run/icinga2/cmd/livestatus
```

Example using the TCP socket listening on port `6558`:

```
# echo -e 'GET services\n' | netcat 127.0.0.1 6558

# cat > servicegroups <<EOF
GET servicegroups

EOF

(cat servicegroups; sleep 1) | netcat 127.0.0.1 6558
```

#### Livestatus COMMAND Queries <a id="livestatus-command-queries"></a>

A list of available external commands and their parameters can be found [here](24-appendix.md#external-commands-list-detail).

```bash
echo -e 'COMMAND <externalcommandstring>' | netcat 127.0.0.1 6558
```

#### Livestatus Filters <a id="livestatus-filters"></a>

Filters can be combined with the logical operators `and`, `or` and `negate`.

 Operator | Negate | Description
 ----------|----------|-------------
 =         | !=       | Equality
 ~         | !~       | Regex match
 =~        | !=~      | Equality ignoring case
 ~~        | !~~      | Regex ignoring case
 <         |          | Less than
 >         |          | Greater than
 <=        |          | Less than or equal
 >=        |          | Greater than or equal


#### Livestatus Stats <a id="livestatus-stats"></a>

Schema: `Stats: aggregatefunction aggregateattribute`

 Aggregate Function | Description
 -------------------|--------------
 sum                | sum of the values
 min                | minimum value
 max                | maximum value
 avg                | sum / count
 std                | standard deviation
 suminv             | sum (1 / value)
 avginv             | suminv / count
 count              | default for any stats query if no aggregate function is defined

Example:

```
GET hosts
Filter: has_been_checked = 1
Filter: check_type = 0
Stats: sum execution_time
Stats: sum latency
Stats: sum percent_state_change
Stats: min execution_time
Stats: min latency
Stats: min percent_state_change
Stats: max execution_time
Stats: max latency
Stats: max percent_state_change
OutputFormat: json
ResponseHeader: fixed16
```

#### Livestatus Output <a id="livestatus-output"></a>

* CSV

CSV output uses two levels of array separators: the members array separator
is a comma (1st level), while the extra info and host|service relation separator
is a pipe (2nd level).

Separators can be set using ASCII codes like:

```
Separators: 10 59 44 124
```

* JSON

Uses the default separators.

#### Livestatus Error Codes <a id="livestatus-error-codes"></a>

 Code      | Description
 ----------|--------------
 200       | OK
 404       | Table does not exist
 452       | Exception on query

#### Livestatus Tables <a id="livestatus-tables"></a>

 Table         | Join      | Description
 --------------|-----------|----------------------------
 hosts         |           | host config and status attributes, services counter
 hostgroups    |           | hostgroup config, status attributes and host/service counters
 services      | hosts     | service config and status attributes
 servicegroups |           | servicegroup config, status attributes and service counters
 contacts      |           | contact config and status attributes
 contactgroups |           | contact config, members
 commands      |           | command name and line
 status        |           | programstatus, config and stats
 comments      | services  | status attributes
 downtimes     | services  | status attributes
 timeperiods   |           | name and is inside flag
 endpoints     |           | config and status attributes
 log           | services, hosts, contacts, commands | parses [compatlog](09-object-types.md#objecttype-compatlogger) and shows log attributes
 statehist     | hosts, services | parses [compatlog](09-object-types.md#objecttype-compatlogger) and aggregates state change attributes
 hostsbygroup  | hostgroups | host attributes grouped by hostgroup and its attributes
 servicesbygroup | servicegroups | service attributes grouped by servicegroup and its attributes
 servicesbyhostgroup | hostgroups | service attributes grouped by hostgroup and its attributes

The `commands` table is populated with `CheckCommand`, `EventCommand` and `NotificationCommand` objects.

A detailed list of the available table attributes can be found in the [Livestatus Schema documentation](24-appendix.md#schema-livestatus).
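
As a minimal sketch, a query against one of the tables above can be assembled in a shell script. The column names and filter are illustrative examples; the socket path is the default mentioned earlier, and the trailing empty line is the query end identifier:

```shell
#!/bin/sh
# Build a Livestatus GET query for the 'services' table.
# Columns and Filter shown here are example headers; adjust to your needs.
query='GET services
Columns: host_name display_name state
Filter: state != 0
OutputFormat: json
ResponseHeader: fixed16
'

# printf adds the final newline, producing the empty line
# that terminates the query.
printf '%s\n' "$query"

# Sending it requires a running Livestatus listener, e.g.:
#   printf '%s\n' "$query" | nc -U /var/run/icinga2/cmd/livestatus
```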