# Technical Concepts <a id="technical-concepts"></a>

This chapter provides technical concepts and design insights
into specific Icinga 2 components such as:

* [Application](19-technical-concepts.md#technical-concepts-application)
* [Configuration](19-technical-concepts.md#technical-concepts-configuration)
* [Features](19-technical-concepts.md#technical-concepts-features)
* [Check Scheduler](19-technical-concepts.md#technical-concepts-check-scheduler)
* [Checks](19-technical-concepts.md#technical-concepts-checks)
* [Cluster](19-technical-concepts.md#technical-concepts-cluster)
* [TLS Network IO](19-technical-concepts.md#technical-concepts-tls-network-io)

## Application <a id="technical-concepts-application"></a>

### CLI Commands <a id="technical-concepts-application-cli-commands"></a>

The Icinga 2 application is managed with different CLI sub commands.
`daemon` takes care of loading the configuration files, running the
application as a daemon, etc.
Other sub commands allow you to enable features, generate and request
TLS certificates, or enter the debug console.

The main entry point for each CLI command parses the command line
parameters and then triggers the required actions.

### daemon CLI command <a id="technical-concepts-application-cli-commands-daemon"></a>

This CLI command loads the configuration files, starting with `icinga2.conf`.
The [configuration compiler](19-technical-concepts.md#technical-concepts-configuration) parses the
file and detects additional file includes, constants, and any other DSL-specific
declarations.

At this stage, the configuration is already checked against the
grammar defined in the scanner, and custom object validators are run as well.

If the user provided `-C/--validate`, the CLI command returns with the
validation exit code.

When running as a daemon, additional parameters are checked, e.g.
whether this application was triggered by a reload, needs to daemonize
with fork() involved, and must update the objects' authority. The latter is
important for HA-enabled cluster zones.

## Configuration <a id="technical-concepts-configuration"></a>

### Lexer <a id="technical-concepts-configuration-lexer"></a>

The lexer stage does not understand the DSL itself; it only
maps specific character sequences into identifiers.

This allows Icinga to detect the beginning of a string with `"`,
read the following characters, and determine the end of the
string with another `"`.

Other parts covered by the lexer are escape sequences inside a string,
e.g. `"\"abc"`.

The lexer also identifies logical operators, e.g. `&` or `in`,
specific keywords like `object`, `import`, etc., and comment blocks.

Please check `lib/config/config_lexer.ll` for details.

Icinga uses [Flex](https://github.com/westes/flex) in the first stage.

> Flex (The Fast Lexical Analyzer)
>
> Flex is a fast lexical analyser generator. It is a tool for generating programs
> that perform pattern-matching on text. Flex is a free (but non-GNU) implementation
> of the original Unix lex program.

### Parser <a id="technical-concepts-configuration-parser"></a>

The parser stage puts the identifiers from the lexer into more
context with flow control and sequences.

The following comparison is parsed into a left term, an operator
and a right term.

```
x > 5
```

The DSL contains many elements which require a specific order,
and sometimes only a left term, for example.

The parser also takes care of parsing an object declaration, for
example. It already knows from the lexer that `object` marks the
beginning of an object. It then expects a type string afterwards,
and the object name - which can be either a string with double quotes
or a previously defined constant.
An opening bracket `{` in this specific context starts the object
scope, which is also stored for later scope-specific variable access.

If there's an apply rule defined, this follows the same principle.
The config parser detects the scope of an apply rule and generates
Icinga 2 C++ code for the parsed string tokens.

```
assign where host.vars.sla == "24x7"
```

is parsed into an assign token identifier, and the string expression
is compiled into a new `ApplyExpression` object.

The flow control inside the parser ensures that, for example, `ignore where`
can only be defined when a previous `assign where` was given - or when
inside an apply for rule.

Another example is specific object types which allow assign expressions,
specifically group objects. Other objects must throw a configuration error.

Please check `lib/config/config_parser.yy` for more details,
and the [language reference](17-language-reference.md#language-reference) chapter for
documented DSL keywords and sequences.

> Icinga uses [Bison](https://en.wikipedia.org/wiki/GNU_bison) as parser generator
> which reads a specification of a context-free language, warns about any parsing
> ambiguities, and generates a parser in C++ which reads sequences of tokens and
> decides whether the sequence conforms to the syntax specified by the grammar.


### Compiler <a id="technical-concepts-configuration-compiler"></a>

The config compiler initializes the scanner inside the [lexer](19-technical-concepts.md#technical-concepts-configuration-lexer)
stage.

The configuration files are parsed into memory from inside the [daemon CLI command](19-technical-concepts.md#technical-concepts-application-cli-commands-daemon)
which invokes the config validation in `ValidateConfigFiles()`. This compiles the
files into an AST expression which is executed.
At this stage, the expressions generate so-called "config items" which
are a pre-stage of the later compiled object.

`ConfigItem::CommitItems` takes care of committing the items, and doing a
rollback on failure. It also checks against matching apply rules from the previous run
and generates statistics about the objects which can be seen by the config validation.

`ConfigItem::CommitNewItems` collects the registered types and items,
and checks for a specific required order, e.g. a service object needs
a host object first.

The following stages happen then:

- **Commit**: A workqueue commits the items in a parallel fashion for this specific type. The object gets its name, and the AST expression is executed. The created object is then registered in the item's `m_Object` member as a reference.
- **OnAllConfigLoaded**: Special signal for each object to pre-load required object attributes, resolve group membership, initialize functions and timers.
- **CreateChildObjects**: Run apply rules for this specific type.
- **CommitNewItems**: Apply rules may generate new config items; this ensures that they again run through the stages.

Note that the items are now committed and the configuration is validated and loaded
into memory. The final config objects are not yet activated though.

This only happens after the validation, when the application is about to be run
with `ConfigItem::ActivateItems`.

Each item has an object created in `m_Object` which is checked in a loop.
Again, the dependency order of activated objects is important here, e.g. logger features come first, then
config objects and last the checker, api, etc. features. This is done by sorting the objects
based on their type-specific activation priority.

The following signals are triggered in the stages:

- **PreActivate**: Sets the `active` flag for the config object.
- **Activate**: Calls `Start()` on the object, sets the local HA authority and notifies subscribers that this object is now activated (e.g. for config updates in the DB backend).


### References <a id="technical-concepts-configuration-references"></a>

* [The Icinga Config Compiler: An Overview](https://www.netways.de/blog/2018/07/12/the-icinga-config-compiler-an-overview/)
* [A parser/lexer/compiler for the Leonardo language](https://github.com/EmilGedda/Leonardo)
* [I wrote a programming language. Here’s how you can, too.](https://medium.freecodecamp.org/the-programming-language-pipeline-91d3f449c919)
* [http://onoffswitch.net/building-a-custom-lexer/](http://onoffswitch.net/building-a-custom-lexer/)
* [Writing an Interpreter with Lex, Yacc, and Memphis](http://memphis.compilertools.net/interpreter.html)
* [Flex](https://github.com/westes/flex)
* [GNU Bison](https://www.gnu.org/software/bison/)

## Core <a id="technical-concepts-core"></a>

### Core: Reload Handling <a id="technical-concepts-core-reload"></a>

The initial design of the reload state machine looks like this:

* receive reload signal SIGHUP
* fork a child process, start configuration validation in parallel work queues
* parent process continues with old configuration objects and the event scheduling
(doing checks, replicating cluster events, triggering alert notifications, etc.)
* validation NOT ok: child process terminates, parent process continues with old configuration state
* validation ok: child process signals parent process to terminate and save its current state (all events until now) into the icinga2 state file
* parent process shuts down, writing the icinga2.state file
* child process waits for the parent process to be gone, reads the icinga2 state file and synchronizes all historical and status data
* child becomes the new session leader

Since Icinga 2.6, there are two processes when checked with `ps aux | grep icinga2` or `pidof icinga2`.
This was to ensure that feature file descriptors don't leak into the plugin process (e.g. DB IDO MySQL sockets).

Icinga 2.9 changed the reload handling a bit with SIGUSR2 signals
and systemd notifies.

With systemd, it could occur that the process tree was broken, thus resulting
in killing all remaining processes on stop, instead of a clean exit.
You can read the full story [here](https://github.com/Icinga/icinga2/issues/7309).

With 2.11 you'll now see 3 processes:

- The umbrella process which takes care of signal handling and process spawning/stopping
- The main process with the check scheduler, notifications, etc.
- The execution helper process

During reload, the umbrella process spawns a new reload process which validates the configuration.
Once successful, the new reload process signals the umbrella process that it is finished.
The umbrella process forwards the signal and tells the old main process to shut down.
The old main process writes the icinga2.state file. The umbrella process signals
the reload process that the main process has terminated.

The reload process was in idle wait before, and now continues to read the written
state file and run the event loop (checks, notifications, "events", ...). The reload
process itself also spawns the execution helper process again.


## Features <a id="technical-concepts-features"></a>

Features are implemented in specific libraries and can be enabled
using CLI commands.

Features either write specific data or receive data.

Examples for writing data: [DB IDO](14-features.md#db-ido), [Graphite](14-features.md#graphite-carbon-cache-writer), [InfluxDB](14-features.md#influxdb-writer), [GELF](14-features.md#gelfwriter), etc.
Examples for receiving data: [REST API](12-icinga2-api.md#icinga2-api), etc.

The implementation of features makes use of existing libraries
and functionality. This makes the code more abstract, but shorter
and easier to read.

Features register callback functions on specific events they want
to handle. For example, the `GraphiteWriter` feature subscribes to
new CheckResult events.

Each time Icinga 2 receives and processes a new check result, this
event is triggered and forwarded to all subscribers.

The GraphiteWriter feature calls the registered function and processes
the received data. Features which connect Icinga 2 to external interfaces
normally parse and reformat the received data into an applicable format.

Since this check result signal is blocking, many of the features include a work queue
with asynchronous task handling.

The GraphiteWriter uses a TCP socket to communicate with the carbon cache
daemon of Graphite. The InfluxDBWriter instead writes bulk metric messages
to InfluxDB's HTTP API, similar to Elasticsearch.


## Check Scheduler <a id="technical-concepts-check-scheduler"></a>

The check scheduler starts a thread which loops forever. It waits for
check events being inserted into `m_IdleCheckables`.

If the current pending check event number is larger than the configured
max concurrent checks, the thread waits until there are free slots again.

In addition, further checks on enabled checks, check periods, etc.
are performed. Once all conditions have passed, the next check timestamp is
calculated and updated. This is also the timestamp where Icinga expects
a new check result ("freshness check").

The object is removed from the idle checkables list and inserted into the
pending checkables list. This can be seen via REST API metrics for the
checker component feature as well.

The actual check execution happens asynchronously using the application's
thread pool.

Once the check returns, it is removed from the pending checkables and again
inserted into the idle checkables. This ensures that the scheduler takes this
checkable event into account in the next iteration.

### Start <a id="technical-concepts-check-scheduler-start"></a>

When checkable objects get activated during the startup phase,
the checker feature registers a handler for this event. This is due
to the fact that the `checker` feature is fully optional, and e.g. not
used on command endpoint clients.

Whenever such an object activation signal is triggered, Icinga 2 checks
whether it is [authoritative for this object](19-technical-concepts.md#technical-concepts-cluster-ha-object-authority).
This means that inside an HA-enabled zone with two endpoints, only non-paused checkable objects are
actively inserted into the idle checkable list for the check scheduler.

### Initial Check <a id="technical-concepts-check-scheduler-initial"></a>

When a new checkable object (host or service) is initially added to the
configuration, Icinga 2 performs the following during startup:

* `Checkable::Start()` is called and calculates the first check time
* With a spread delta, the next check time is actually set.

If the next check should happen within a time frame of 60 seconds,
Icinga 2 calculates a delta from a random value. The minimum of `check_interval`
and 60 seconds is used as the basis, multiplied with a random value between 0 and 1.
In the best case, this check gets executed immediately after application start.
In the worst case, the check is scheduled 60 seconds after start
at the latest.

The reason for delaying and spreading checks during startup is that
the application typically needs more resources at this time (cluster connections,
feature warmup, initial syncs, etc.). Immediate execution of
thousands of checks could lead to performance problems, and additional
events for each received check result.

Therefore the initial check window is 60 seconds on application startup,
randomly seeded for all checkables. This is not predictable over multiple restarts
for specific checkable objects; the delta changes every time.

### Scheduling Offset <a id="technical-concepts-check-scheduler-offset"></a>

There's a high chance that many checkable objects get executed at the same time
and interval after startup. The initial scheduling spreads that a little, but
Icinga 2 also attempts to keep fixed intervals, even with high check latency.

During startup, Icinga 2 calculates the scheduling offset from a random number:

* `Checkable::Checkable()` calls `SetSchedulingOffset()` with `Utility::Random()`
* The offset is a pseudo-random integral value between `0` and `RAND_MAX`.

Whenever the next check time is updated with `Checkable::UpdateNextCheck()`,
the scheduling offset is taken into account.

Depending on the state type (SOFT or HARD), either the `retry_interval` or `check_interval`
is used. If the interval is greater than 1 second, the time adjustment is calculated in the
following way:

`now * 100 + offset` divided by `interval * 100`, using the remainder (that's what `fmod()` is for)
and dividing this again by 100.

Example: offset is 6500, interval 300, now is 1542190472.

```
1542190472 * 100 + 6500 = 154219053700
300 * 100 = 30000
154219053700 / 30000 = 5140635.123333
(5140635.123333 - 5140635.0) * 30000 = 3700
3700 / 100 = 37.00
```

37 seconds as an offset would be far too much, so this is again used as a calculation divider for the
real offset with the base of 5 times the actual interval.

Again, the remainder is calculated from the offset and `interval * 5`. This is divided by 100 again,
with an additional 0.5 seconds delay.

Example: offset is 6500, interval 300, i.e. `interval * 5 = 1500`.

```
6500 / 1500 = 4.333333333333333
(4.333333333333333 - 4.0) * 1500 = 500
500 / 100 = 5
5 + 0.5 = 5.5
```

The minimum value between the first adjustment and the second offset calculation based on the interval is
taken; in the above example `5.5` wins.

The actual next check time subtracts the adjusted time from the future interval addition to provide
a more widespread scheduling time among all checkable objects.

`nextCheck = now - adj + interval`

You may ask what other values can happen with this offset calculation. Consider calculating more examples
with different interval settings.

Example: offset is 34567, interval 60, now is 1542190472.

```
1542190472 * 100 + 34567 = 154219081767
60 * 100 = 6000
154219081767 / 6000 = 25703180.2945
(25703180.2945 - 25703180.0) * 6000 / 100 = 17.67

34567 / 300 = 115.223333333333333
(115.223333333333333 - 115.0) * 300 / 100 + 0.5 = 1.17
```

`1m` interval starts at `now + 1.2s`.

Example: offset is 12345, interval 86400, now is 1542190472.

```
1542190472 * 100 + 12345 = 154219059545
86400 * 100 = 8640000
154219059545 / 8640000 = 17849.428188078703704
(17849.428188078703704 - 17849) * 8640000 = 3699545
3699545 / 100 = 36995.45

12345 / 86400 = 0.142881944444444
0.142881944444444 * 86400 / 100 + 0.5 = 123.95
```

`1d` interval starts at `now + 2m4s`.

> **Note**
>
> In case you have a better algorithm at hand, feel free to discuss this in a PR on GitHub.
> It needs to fulfill two things: 1) spread and shuffle execution times on each `next_check` update
> 2) not too narrow a window for both long and short intervals
> Application startup and initial checks need to be handled with care in a slightly different
> fashion.

When `SetNextCheck()` is called, there are signals registered. One of them sits
inside the `CheckerComponent` class whose handler `CheckerComponent::NextCheckChangedHandler()`
deletes/inserts the next check event from the scheduling queue. This basically
is a list with multiple indexes, keyed by scheduling info and the object.


## Checks <a id="technical-concepts-checks"></a>

### Check Latency and Execution Time <a id="technical-concepts-checks-latency"></a>

Each check command execution logs the start and end time where
Icinga 2 (and the end user) is able to calculate the plugin execution time from it.

```cpp
GetExecutionEnd() - GetExecutionStart()
```

The higher the execution time, the higher the command timeout must be set. Furthermore,
users and developers are encouraged to look into plugin optimizations to minimize the
execution time. Sometimes it is better to let an external daemon/script do the checks
and feed them back via REST API.

Icinga 2 stores the scheduled start and end time for a check. If the actual
check execution time differs from the scheduled time, e.g.
due to performance problems or limited execution slots (concurrent checks), this value is stored
and computed from inside the check result.

The difference between the two deltas is called `check latency`.

```cpp
(GetScheduleEnd() - GetScheduleStart()) - CalculateExecutionTime()
```

### Severity <a id="technical-concepts-checks-severity"></a>

The severity attribute is introduced with Icinga v2.11 and provides
a bit mask calculated value from specific checkable object states.

The severity value is pre-calculated for visualization interfaces
such as Icinga Web which sorts the problem dashboard by severity by default.

The higher the severity number is, the more important the problem is.

Flags:

```cpp
/**
 * Severity Flags
 *
 * @ingroup icinga
 */
enum SeverityFlag
{
	SeverityFlagDowntime = 1,
	SeverityFlagAcknowledgement = 2,
	SeverityFlagHostDown = 4,
	SeverityFlagUnhandled = 8,
	SeverityFlagPending = 16,
	SeverityFlagWarning = 32,
	SeverityFlagUnknown = 64,
	SeverityFlagCritical = 128,
};
```


Host:

```cpp
	/* OK/Warning = Up, Critical/Unknown = Down */
	if (!HasBeenChecked())
		severity |= SeverityFlagPending;
	else if (state == ServiceUnknown)
		severity |= SeverityFlagCritical;
	else if (state == ServiceCritical)
		severity |= SeverityFlagCritical;

	if (IsInDowntime())
		severity |= SeverityFlagDowntime;
	else if (IsAcknowledged())
		severity |= SeverityFlagAcknowledgement;
	else
		severity |= SeverityFlagUnhandled;
```


Service:

```cpp
	if (!HasBeenChecked())
		severity |= SeverityFlagPending;
	else if (state == ServiceWarning)
		severity |= SeverityFlagWarning;
	else if (state == ServiceUnknown)
		severity |= SeverityFlagUnknown;
	else if (state == ServiceCritical)
		severity |= SeverityFlagCritical;

	if (IsInDowntime())
		severity |= SeverityFlagDowntime;
	else if (IsAcknowledged())
		severity |= SeverityFlagAcknowledgement;
	else if (m_Host->GetProblem())
		severity |= SeverityFlagHostDown;
	else
		severity |= SeverityFlagUnhandled;
```



## Cluster <a id="technical-concepts-cluster"></a>

This documentation refers to technical roles between cluster
endpoints.

- The `server` or `parent` role accepts incoming connection attempts and handles requests
- The `client` role actively connects to remote endpoints, receiving config/commands, requesting certificates, etc.

A client role is not necessarily bound to the Icinga agent.
It may also be a satellite which actively connects to the
master.

### Communication <a id="technical-concepts-cluster-communication"></a>

Icinga 2 uses its own certificate authority (CA) by default. The
public and private CA keys can be generated on the signing master.

Each node certificate must be signed by the private CA key.

Note: The following description uses `parent node` and `child node`.
This also applies to nodes in the same cluster zone.

During the connection attempt, a TLS handshake is performed.
If the public certificate of a child node is not signed by the same
CA, the child node is not trusted and the connection will be closed.

If the TLS handshake succeeds, the parent node reads the
certificate's common name (CN) of the child node and looks for
a local Endpoint object name configuration.

If there is no Endpoint object found, further communication
(runtime and config sync, etc.) is terminated.

The child node also checks the CN from the parent node's public
certificate. If the child node does not find any local Endpoint
object name configuration, it will not trust the parent node.

Both checks prevent accepting cluster messages from an untrusted
source endpoint.
If an Endpoint match was found, there is one additional security
mechanism in place: Endpoints belong to a Zone hierarchy.

Several cluster messages can only be sent "top down", others like
check results are allowed to be sent from the child to the parent node.

Once this check succeeds, the cluster messages are exchanged and processed.


### CSR Signing <a id="technical-concepts-cluster-csr-signing"></a>

In order to make things easier, Icinga 2 provides built-in methods
to allow child nodes to request a signed certificate from the
signing master.

Icinga 2 v2.8 introduced the possibility to request certificates
from indirectly connected nodes. This is required for multi-level
cluster environments with masters, satellites and agents.

CSR signing in general starts with the master setup. This step
ensures that the master is in a working CSR signing state with:

* public and private CA key in `/var/lib/icinga2/ca`
* private `TicketSalt` constant defined inside the `api` feature
* Cluster communication is ready and Icinga 2 listens on port 5665

The child node setup which is run with CLI commands will now
attempt to connect to the parent node. This is not necessarily
the signing master instance, but could also be a parent satellite node.

During this process the child node asks the user to verify the
parent node's public certificate to prevent MITM attacks.

There are two methods to request signed certificates:

* Add the ticket into the request. This ticket was generated on the master
beforehand and contains hashed details for which client it has been created.
The signing master uses this information to automatically sign the certificate
request.

* Do not add a ticket into the request. It will be sent to the signing master
which stores the pending request. Manual user interaction with CLI commands
is necessary to sign the request.
The certificate request is sent as a `pki::RequestCertificate` cluster
message to the parent node.

If the parent node is not the signing master, it stores the request
in `/var/lib/icinga2/certificate-requests` and forwards the
cluster message to its parent node.

Once the message arrives on the signing master, it first verifies that
the sent certificate request is valid. This is to prevent unwanted errors
or modified requests from the "proxy" node.

After verification, the signing master checks if the request contains
a valid signing ticket. It hashes the certificate's common name and
compares the value to the received ticket number.

If the ticket is valid, the certificate request is immediately signed
with the CA key. The request is sent back to the client inside a `pki::UpdateCertificate`
cluster message.

If the child node was not the certificate request origin, it only updates
the cached request for the child node and sends another cluster message
down to its child node (e.g. from a satellite to an agent).


If no ticket was specified, the signing master waits until the
`ca sign` CLI command has manually signed the certificate.

> **Note**
>
> Push notifications for manual request signing are not yet implemented (TODO).

Once the child node reconnects, it synchronizes all signed certificate requests.
This takes some minutes and requires all nodes to reconnect to each other.


#### CSR Signing: Clients without parent connection <a id="technical-concepts-cluster-csr-signing-clients-no-connection"></a>

There is an additional scenario: The setup on a child node does
not necessarily need a connection to the parent node.

This mode leaves the node in a semi-configured state. You need
to manually copy the master's public CA key into `/var/lib/icinga2/certs/ca.crt`
on the client before starting Icinga 2.
> **Note**
>
> The `client` in this case can be either a satellite or an agent.

The parent node needs to actively connect to the child node.
Once this connection succeeds, the child node will actively
request a signed certificate.

The update procedure works the same way as above.

### High Availability <a id="technical-concepts-cluster-ha"></a>

General high availability is automatically enabled between two endpoints in the same
cluster zone.

**This requires the same configuration and enabled features on both nodes.**

HA zone members trust each other and share event updates as cluster messages.
This includes for example check results, next check timestamp updates, acknowledgements
or notifications.

This ensures that both nodes are synchronized. If one node goes away, the
remaining node takes over and continues as normal.

#### High Availability: Object Authority <a id="technical-concepts-cluster-ha-object-authority"></a>

Cluster nodes automatically determine the authority for configuration
objects. By default, all config objects are set to `HARunEverywhere` and
as such the object authority is true for any config object on any instance.

Specific objects can override and influence this setting, e.g. with `HARunOnce`
instead, prior to config object activation.

This is done when the daemon starts and in a regular interval inside
the ApiListener class, specifically calling `ApiListener::UpdateObjectAuthority()`.

The algorithm works like this:

* Determine whether this instance is assigned to a local zone and endpoint.
* Collect all endpoints in this zone if they are connected.
* If there are two endpoints, but we only see ourselves and the application started less than 60 seconds ago, do nothing (wait for the cluster reconnect to take place, grace period).
* Sort the collected endpoints by name.
691* Iterate over all config types and their respective objects 692 * Ignore !active objects 693 * Ignore objects which are !HARunOnce. This means, they can run multiple times in a zone and don't need an authority update. 694 * If this instance doesn't have a local zone, set authority to true. This is for non-clustered standalone environments where everything belongs to this instance. 695 * Calculate the object authority based on the connected endpoint names. 696 * Set the authority (true or false) 697 698The object authority calculation works "offline" without any message exchange. 699Each instance alculates the SDBM hash of the config object name, puts that in contrast 700modulo the connected endpoints size. 701This index is used to lookup the corresponding endpoint in the connected endpoints array, 702including the local endpoint. Whether the local endpoint is equal to the selected endpoint, 703or not, this sets the authority to `true` or `false`. 704 705```cpp 706authority = endpoints[Utility::SDBM(object->GetName()) % endpoints.size()] == my_endpoint; 707``` 708 709`ConfigObject::SetAuthority(bool authority)` triggers the following events: 710 711* Authority is true and object now paused: Resume the object and set `paused` to `false`. 712* Authority is false, object not paused: Pause the object and set `paused` to true. 713 714**This results in activated but paused objects on one endpoint.** You can verify 715that by querying the `paused` attribute for all objects via REST API 716or debug console on both endpoints. 717 718Endpoints inside a HA zone calculate the object authority independent from each other. 719This object authority is important for selected features explained below. 720 721Since features are configuration objects too, you must ensure that all nodes 722inside the HA zone share the same enabled features. If configured otherwise, 723one might have a checker feature on the left node, nothing on the right node. 
This leads to late check results, because the node without the checker feature
still holds the authority for half of the objects, but never executes their checks.

By default, features are enabled to "Run-Everywhere". Specific features which
support HA awareness provide the `enable_ha` configuration attribute. When `enable_ha`
is set to `true` (usually the default), "Run-Once" is set and the feature pauses on one side.

```
vim /etc/icinga2/features-enabled/graphite.conf

object GraphiteWriter "graphite" {
  ...
  enable_ha = true
}
```

Once such a feature is paused, there won't be any more event handling, e.g. the Elasticsearch
feature won't process any check results nor write to the Elasticsearch REST API.

When the cluster connection drops, the feature configuration object is updated with
the new object authority by the ApiListener timer and resumes its operation. You can see
that by grepping the log file for `resumed` and `paused`.

```
[2018-10-24 13:28:28 +0200] information/GraphiteWriter: 'g-ha' paused.
```

```
[2018-10-24 13:28:28 +0200] information/GraphiteWriter: 'g-ha' resumed.
```

Specific features with HA capabilities are explained below.

#### High Availability: Checker <a id="technical-concepts-cluster-ha-checker"></a>

The `checker` feature only executes checks for `Checkable` objects (Host, Service)
where it is authoritative.

That way each node only executes checks for a segment of the overall configuration objects.

The cluster message routing ensures that all check results are synchronized
to nodes which are not authoritative for this configuration object.


#### High Availability: Notifications <a id="technical-concepts-cluster-notifications"></a>

The `notification` feature only sends notifications for `Notification` objects
where it is authoritative.

That way each node only sends notifications for a segment of all notification objects.

Notified users and other event details are synchronized throughout the cluster.
This is required if for example the DB IDO feature is active on the other node.

#### High Availability: DB IDO <a id="technical-concepts-cluster-ha-ido"></a>

If you don't have HA enabled for the IDO feature, both nodes will
write their status and historical data to their own separate database
backends.

In order to avoid data separation and a split view (each node would require its
own Icinga Web 2 installation on top), the high availability option was added
to the DB IDO feature. This is enabled by default with the `enable_ha` setting.

This requires a central database backend. Best practice is to use a MySQL cluster
with a virtual IP.

Both Icinga 2 nodes require the connection and credential details configured in
their DB IDO feature.

During startup, Icinga 2 calculates whether the feature configuration object
is authoritative on this node or not. The order is an alphanumeric
comparison, e.g. if you have `master1` and `master2`, Icinga 2 will enable
the DB IDO feature on `master2` by default.

If the connection between endpoints drops, the object authority is re-calculated.

In order to prevent data duplication in a split-brain scenario where both
nodes would write into the same database, there is another safety mechanism
in place.

The split-brain decision which node will write to the database is calculated
from a quorum inside the `programstatus` table. On database connect, each node
verifies that the `endpoint_name` column does not contain its own name.
In addition to that, the DB IDO feature compares the `last_update_time` column
against the current timestamp plus the configured `failover_timeout` offset.

That way only one active DB IDO feature writes to the database, even if they
are not currently connected in a cluster zone. This prevents data duplication
in historical tables.

### Health Checks <a id="technical-concepts-cluster-health-checks"></a>

#### cluster-zone <a id="technical-concepts-cluster-health-checks-cluster-zone"></a>

This built-in check provides the possibility to check for connectivity between
zones.

If you for example need to know whether the `master` zone is connected to and processing
messages from the child zone called `satellite` in this example, you can configure
the [cluster-zone](10-icinga-template-library.md#itl-icinga-cluster-zone) check as a new service on all `master` zone hosts.

```
vim /etc/zones.d/master/host1.conf

object Service "cluster-zone-satellite" {
  check_command = "cluster-zone"
  host_name = "host1"

  vars.cluster_zone = "satellite"
}
```

The check itself changes to NOT-OK if one or more child endpoints in the child zone
are not connected to parent zone endpoints.

In addition to the overall connectivity check, the log lag is calculated based
on the to-be-sent replay log. Each instance stores that for its configured endpoint
objects.

This health check iterates over the target zone (`cluster_zone`) and its endpoints.

The log lag is greater than zero if

* the replay log synchronization is in progress and not yet finished, or
* the endpoint is not connected and no replay log sync happened (obviously).

The final log lag value is the worst value detected. If satellite1 has a log lag of
`1.5` and satellite2 only has `0.5`, the computed value will be `1.5`.

You can control the check state by using optional warning and critical thresholds
for the log lag value.

If this service exists multiple times, e.g.
for each master host object, the log lag
may differ based on the execution time. This happens for example on restart of
an instance when the log replay is in progress and a health check is executed at different
times.
If the endpoint is not connected, both master instances may have saved a different log replay
position from the last synchronisation.

The lag value is returned as performance metric key `slave_lag`.

Icinga 2 v2.9+ adds more performance metrics for these values:

* `last_messages_sent` and `last_messages_received` as UNIX timestamp
* `sum_messages_sent_per_second` and `sum_messages_received_per_second`
* `sum_bytes_sent_per_second` and `sum_bytes_received_per_second`


### Config Sync <a id="technical-concepts-cluster-config-sync"></a>

The visible feature for the user is to put configuration files in `/etc/icinga2/zones.d/<zonename>`
and have them synced automatically to all involved zones and endpoints.

This not only includes host and service objects being checked
in a satellite zone, but also additional config objects such as
commands, groups, timeperiods and also templates.

Additional thoughts and complexity added:

- Putting files into zone directories removes the burden of setting the `zone` attribute on each object in this directory. This is done automatically by the config compiler.
- Inclusion of `zones.d` happens automatically, the user shouldn't be bothered with this.
- Before the REST API was created, only static configuration files in `/etc/icinga2/zones.d` existed. With the addition of config packages, additional `zones.d` targets must be registered (e.g. used by the Director).
- Only one config master is allowed. It identifies itself with configuration files in `/etc/icinga2/zones.d`. This is not necessarily the zone master seen in the debug logs; that one is important for message routing internally.
- Objects and templates which cannot be bound to a specific zone (e.g. hosts in the satellite zone) must be made available "globally".
- Users must be able to deny the synchronisation of specific zones, e.g. for security reasons.

#### Config Sync: Config Master <a id="technical-concepts-cluster-config-sync-config-master"></a>

All zones must be configured and included in the `zones.conf` config file beforehand.
The zone names are the identifiers for the directories underneath the `/etc/icinga2/zones.d`
directory. If a zone is not configured, it will not be included in the config sync - keep this
in mind for troubleshooting.

When the config master starts, the content of `/etc/icinga2/zones.d` is automatically
included. There's no need for an additional entry in `icinga2.conf` like `conf.d`.
You can verify this by running the config validation on debug level:

```
icinga2 daemon -C -x debug | grep 'zones.d'

[2019-06-19 15:16:19 +0200] notice/ConfigCompiler: Compiling config file: /etc/icinga2/zones.d/global-templates/commands.conf
```

Once the config validation succeeds, the startup routine for the daemon
copies the files into the "production" directory in `/var/lib/icinga2/api/zones`.
This directory is used on all endpoints to store the received configuration;
the config master is the exception, as it reads from `/etc/icinga2/zones.d` instead.

These operations are logged for better visibility.

```
[2019-06-19 15:26:38 +0200] information/ApiListener: Copying 1 zone configuration files for zone 'global-templates' to '/var/lib/icinga2/api/zones/global-templates'.
[2019-06-19 15:26:38 +0200] information/ApiListener: Updating configuration file: /var/lib/icinga2/api/zones/global-templates//_etc/commands.conf
```

The master is finished at this point.
Depending on the cluster configuration,
the next step happens once an endpoint has connected and passed the TLS handshake and certificate
authentication.

The master then calls `SendConfigUpdate(client)`, which sends the [config::Update](19-technical-concepts.md#technical-concepts-json-rpc-messages-config-update)
JSON-RPC message including all required zones and their configuration file content.


#### Config Sync: Receive Config <a id="technical-concepts-cluster-config-sync-receive-config"></a>

The secondary master endpoint and endpoints in a child zone will be connected to the config
master. The endpoint receives the [config::Update](19-technical-concepts.md#technical-concepts-json-rpc-messages-config-update)
JSON-RPC message and processes the content in `ConfigUpdateHandler()`. This method checks
whether the config should be accepted. In addition to that, it locks a local mutex to avoid race conditions
with multiple syncs in parallel.

After that, the received configuration content is analysed.

> **Note**
>
> The cluster design allows satellite endpoints to connect to the secondary master first.
> There is no immediate need to always connect to the config master first, especially since
> the satellite endpoints don't know which endpoint that is.
>
> The secondary master not only stores the master zone config files, but also all child zones.
> This is also the case for any HA enabled zone with more than one endpoint.


2.11 puts the received configuration files into a staging directory in
`/var/lib/icinga2/api/zones-stage`. Previous versions directly wrote the
files into production, which could have led to broken configuration on the
next manual restart.

```
[2019-06-19 16:08:29 +0200] information/ApiListener: New client connection for identity 'master1' to [127.0.0.1]:5665
[2019-06-19 16:08:30 +0200] information/ApiListener: Applying config update from endpoint 'master1' of zone 'master'.
[2019-06-19 16:08:30 +0200] information/ApiListener: Received configuration for zone 'agent' from endpoint 'master1'. Comparing the checksums.
[2019-06-19 16:08:30 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/agent//_etc/host.conf' for zone 'agent'.
[2019-06-19 16:08:30 +0200] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones-stage/agent' (176 Bytes).
[2019-06-19 16:08:30 +0200] information/ApiListener: Received configuration for zone 'master' from endpoint 'master1'. Comparing the checksums.
[2019-06-19 16:08:30 +0200] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones-stage/master' (17 Bytes).
[2019-06-19 16:08:30 +0200] information/ApiListener: Received configuration from endpoint 'master1' is different to production, triggering validation and reload.
```

It then validates the received configuration in its own config stage. There is
a parameter override in place which disables the automatic inclusion of the production
config in `/var/lib/icinga2/api/zones`.

Once completed, the reload is triggered. This follows the same configurable timeout
as the global reload.

```
[2019-06-19 16:52:26 +0200] information/ApiListener: Config validation for stage '/var/lib/icinga2/api/zones-stage/' was OK, replacing into '/var/lib/icinga2/api/zones/' and triggering reload.
[2019-06-19 16:52:27 +0200] information/Application: Got reload command: Started new instance with PID '19945' (timeout is 300s).
[2019-06-19 16:52:28 +0200] information/Application: Reload requested, letting new process take over.
```

Whenever the staged configuration validation fails, Icinga logs this including a reference
to the startup log file which includes additional errors.

```
[2019-06-19 15:45:27 +0200] critical/ApiListener: Config validation failed for staged cluster config sync in '/var/lib/icinga2/api/zones-stage/'. Aborting. Logs: '/var/lib/icinga2/api/zones-stage//startup.log'
```


#### Config Sync: Changes and Reload <a id="technical-concepts-cluster-config-sync-changes-reload"></a>

Whenever a new configuration is received, it is validated and upon success, the
daemon automatically reloads. While the daemon continues with checks, the reload
cannot hand over open TCP connections. That being said, reloading the daemon every time
a configuration is synchronized would lead to many disconnected endpoints.

Therefore the cluster config sync checks whether the configuration files actually
changed, and only triggers a reload when such a change happened.

2.11 calculates a checksum of each file's content and compares it to the
production configuration. Previous versions used additional metadata with timestamps from
files, which sometimes led to problems with out-of-sync timestamps.

> **Note**
>
> For compatibility reasons, the timestamp metadata algorithm is still intact, e.g.
> when the client is already on 2.11, but the parent endpoint is still on 2.10.

Icinga logs a warning when this happens.

```
Received configuration update without checksums from parent endpoint satellite1. This behaviour is deprecated. Please upgrade the parent endpoint to 2.11+
```


The debug log provides more details on the actual checksums and checks. Future output
may change, so use this solely for troubleshooting and debugging whenever the cluster
config sync fails.

```
[2019-06-19 16:13:16 +0200] information/ApiListener: Received configuration for zone 'agent' from endpoint 'master1'. Comparing the checksums.
[2019-06-19 16:13:16 +0200] debug/ApiListener: Checking for config change between stage and production.
Old (3): '{"/.checksums":"7ede1276a9a32019c1412a52779804a976e163943e268ec4066e6b6ec4d15d73","/.timestamp":"ec4354b0eca455f7c2ca386fddf5b9ea810d826d402b3b6ac56ba63b55c2892c","/_etc/host.conf":"35d4823684d83a5ab0ca853c9a3aa8e592adfca66210762cdf2e54339ccf0a44"}' vs. new (3): '{"/.checksums":"84a586435d732327e2152e7c9b6d85a340cc917b89ae30972042f3dc344ea7cf","/.timestamp":"0fd6facf35e49ab1b2a161872fa7ad794564eba08624373d99d31c32a7a4c7d3","/_etc/host.conf":"0d62075e89be14088de1979644b40f33a8f185fcb4bb6ff1f7da2f63c7723fcb"}'.
[2019-06-19 16:13:16 +0200] debug/ApiListener: Checking /_etc/host.conf for checksum: 35d4823684d83a5ab0ca853c9a3aa8e592adfca66210762cdf2e54339ccf0a44
[2019-06-19 16:13:16 +0200] debug/ApiListener: Path '/_etc/host.conf' doesn't match old checksum '0d62075e89be14088de1979644b40f33a8f185fcb4bb6ff1f7da2f63c7723fcb' with new checksum '35d4823684d83a5ab0ca853c9a3aa8e592adfca66210762cdf2e54339ccf0a44'.
```


#### Config Sync: Trust <a id="technical-concepts-cluster-config-sync-trust"></a>

The config sync follows the "top down" approach, where the master endpoint in the master
zone is allowed to synchronize configuration to the child zone, e.g. the satellite zone.

Endpoints in the same zone, e.g. a secondary master, receive configuration for the same
zone and all child zones.

Endpoints in the satellite zone trust the parent zone and will accept the pushed
configuration via JSON-RPC cluster messages. By default, this is disabled and must
be enabled with the `accept_config` attribute in the ApiListener feature (manually or with CLI
helpers).

The satellite zone will not only accept zone configuration for its own zone, but also
for all configured child zones. That is why it is important to configure the zone hierarchy
on the satellite as well.

Child zones are not allowed to sync configuration up to the parent zone.
Each Icinga instance
evaluates this at startup and knows on endpoint connect which config zones need to be synced.


Global zones have a special trust relationship: they are synced to all child zones, be it
a satellite zone or an agent zone. Since checkable objects such as a Host or a Service object
must have exactly one endpoint as their authority, they cannot be put into a global zone (this is denied
by the config compiler).

Apply rules and templates are allowed, since they are evaluated on the endpoint which received
the synced configuration. Keep in mind that there may be differences between the master and the satellite
when e.g. hostgroup membership is used for `assign where` expressions, but the groups are only
available on the master.


### Cluster: Message Routing <a id="technical-concepts-cluster-message-routing"></a>

One fundamental part of the cluster message routing is the MessageOrigin object.
This is created when a new JSON-RPC message is received in `JsonRpcConnection::MessageHandler()`.

It contains

- FromZone, extracted from the endpoint object which owns the JsonRpcConnection
- FromClient, the JsonRpcConnection bound to the endpoint object

These attributes are checked in the message receive API handlers for security access, e.g. whether a
message origin is from a child zone which is not allowed to send it, etc.
This is explained in the [JSON-RPC messages](19-technical-concepts.md#technical-concepts-json-rpc-messages) chapter.

Whenever such a message is processed on the client, it may trigger additional cluster events
which are sent back to other endpoints. Therefore it is key to always pass the MessageOrigin
`origin` when processing these messages locally.

Example:

- The client receives a CheckResult from another endpoint in the same zone, call it `sender` for now.
- It calls ProcessCheckResult() to store the CR and calculate states, notifications, etc.
- It calls the OnNewCheckResult() signal to trigger IDO updates.

OnNewCheckResult() also calls a registered cluster handler which forwards the CheckResult to other cluster members.

Without any origin details, this CheckResult would be relayed to the `sender` endpoint again,
which processes the message, calls ProcessCheckResult() and OnNewCheckResult(), sends it back, and so on.

That creates a loop which our cluster protocol needs to prevent at all cost.

RelayMessageOne() takes care of the routing. This involves fetching the targetZone for this message and its endpoints.

- Don't relay messages to ourselves.
- Don't relay messages to disconnected endpoints.
- Don't relay the message to the zone through more than one endpoint unless this is our own zone.
- Don't relay messages back to the endpoint which we got the message from. **This is the rule preventing the loop above.**
- Don't relay messages back to the zone which we got the message from.
- Only relay messages to the zone master if we're not currently the zone master.

```
 e1 is the zone master, e2 and e3 are zone members.

 Message is sent from e2 or e3:
   !isMaster == true
   targetEndpoint e1 is zone master -> send the message
   targetEndpoint e3 is not zone master -> skip it, avoid routing loops

 Message is sent from e1:
   !isMaster == false -> send the messages to e2 and e3, as we are the zone routing master.
```

By passing the `origin`, the following condition prevents sending a message back to the sender:

```cpp
if (origin && origin->FromClient && targetEndpoint == origin->FromClient->GetEndpoint()) {
```

This message then simply gets skipped for this specific Endpoint and is never sent.

This analysis originates from a long-lasting [downtime loop bug](https://github.com/Icinga/icinga2/issues/7198).

## TLS Network IO <a id="technical-concepts-tls-network-io"></a>

### TLS Connection Handling <a id="technical-concepts-tls-network-io-connection-handling"></a>

Icinga supports two connection directions, controlled via the `host` attribute
inside the Endpoint objects:

* Outgoing connection attempts
* Incoming connection handling

Once the connection is established, higher layers can exchange JSON-RPC and
HTTP messages. It doesn't matter in which direction these messages go.

This offers a big advantage over single-direction connections, such as
polling via HTTP only. Also, connections are kept alive as long as data
is transmitted.

When the master connects to the child zone member(s), this requires more
resources there. Keep this in mind when endpoints are not reachable: the
TCP timeout blocks other resources. Moving a satellite zone into the middle
between masters and agents helps to split the tasks - the master
processes and stores data, deploys configuration and serves the API. The
satellites schedule the checks, connect to the agents and receive
check results.

Agents/Clients can also connect to the parent endpoints - be it a master or
a satellite. This is the preferred way out of a DMZ, and it also reduces the
overhead of connecting to e.g. 2000 agents on the master. You can
benchmark this when TCP connections are broken and timeouts are encountered.
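The connection direction is purely a configuration decision: an Endpoint object with a `host` attribute is actively connected to, one without waits for incoming connections. A minimal sketch (the endpoint names and address are examples):

```
// This node actively connects to the satellite.
object Endpoint "satellite1" {
  host = "192.0.2.10"
  port = 5665
}

// No 'host' attribute: this node waits for "agent1" to connect.
object Endpoint "agent1" {
}
```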

#### Master Processes Incoming Connection <a id="technical-concepts-tls-network-io-connection-handling-incoming"></a>

* The node starts a new ApiListener, this invokes `AddListener()`
  * Set up the TLS context (SslContext)
  * Initialize the global I/O engine and create a TCP acceptor
  * Resolve bind host/port (optional)
  * Listen on IPv4 and IPv6
  * Re-use socket address and port
  * Listen on port 5665 with `INT_MAX` possible sockets
* Spawn a new Coroutine which listens for new incoming connections as a 'TCP server' pattern
  * Accept new connections asynchronously
  * Spawn a new Coroutine which handles the new client connection in a different context, Role: Server

#### Master Connects Outgoing <a id="technical-concepts-tls-network-io-connection-handling-outgoing"></a>

* The node starts a timer with a 10 second interval and `ApiReconnectTimerHandler()` as callback
  * Loop over all configured zones, excluding global zones and zones which are not direct parents/children
  * Get the endpoints configured in these zones, excluding: the local endpoint, endpoints without a 'host' attribute, endpoints already connected or in progress
  * Call `AddConnection()`
* Spawn a new Coroutine after creating the TLS context
  * Use the global I/O engine for socket I/O
  * Create the TLS stream
  * Connect to the endpoint host/port details
  * Handle the client connection, Role: Client

#### TLS Handshake <a id="technical-concepts-tls-network-io-connection-handling-handshake"></a>

* Create a TLS connection in sslConn and perform an asynchronous TLS handshake
* Get the peer certificate
* Verify the presented certificate: `ssl::verify_peer` and `ssl::verify_client_once`
* Get the certificate CN and compare it against the endpoint name - if they don't match, return and close the connection

#### Data Exchange <a id="technical-concepts-tls-network-io-connection-data-exchange"></a>

Everything runs through TLS, we don't
use any "raw" connections or plain message handling.

HTTP and JSON-RPC messages share the same port and API, so additional handling is required.

On a new connection, after a successful TLS handshake, the first byte is read. This either
is a JSON-RPC message in Netstring format, starting with a number, or plain HTTP.

```
HTTP/1.1

2:{}
```

Depending on this, `ClientJsonRpc` or `ClientHttp` is assigned.

JSON-RPC:

* Create a new JsonRpcConnection object
  * When the endpoint object is configured, spawn a Coroutine which takes care of syncing the client (file and runtime config, replay log, etc.)
  * Without a configured endpoint, treat this connection as an anonymous client, with a configurable limit. Such a client may send a CSR signing request for example.
  * Start the JsonRpcConnection - this spawns Coroutines to HandleIncomingMessages, WriteOutgoingMessages, HandleAndWriteHeartbeats and CheckLiveness

HTTP:

* Create a new HttpServerConnection
  * Start the HttpServerConnection - this spawns Coroutines to ProcessMessages and CheckLiveness


All the mentioned Coroutines run asynchronously using the global I/O engine's context.
More details on this topic can be found in [this blogpost](https://www.netways.de/blog/2019/04/04/modern-c-programming-coroutines-with-boost/).

The lower levels of context switching and sharing or event polling are
hidden in the Boost ASIO, Beast, Coroutine and Context libraries.

#### Data Exchange: Coroutines and I/O Engine <a id="technical-concepts-tls-network-io-connection-data-exchange-coroutines"></a>

Light-weight and fast operations such as connection handling or TLS handshakes
are performed in the default `IoBoundWorkSlot` pool inside the I/O engine.

The I/O engine has another pool available: `CpuBoundWork`.

This is used for processing CPU intensive tasks, such as handling an HTTP request.
Depending on the available CPU cores, this is limited to `std::thread::hardware_concurrency() * 3u / 2u`.

```
1 core   * 3 / 2 = 1
2 cores  * 3 / 2 = 3
8 cores  * 3 / 2 = 12
16 cores * 3 / 2 = 24
```

The I/O engine itself is used for all network I/O in Icinga, not only the cluster
and the REST API. Features such as Graphite, InfluxDB, etc. also consume its functionality.

There are `2 * CPU cores` threads available which run the event loop
in the I/O engine. This polls the I/O service with `m_IoService.run();`
and triggers asynchronous event progress for waiting coroutines.

<!--
## REST API <a id="technical-concepts-rest-api"></a>

Icinga 2 provides its own HTTP server which shares the port 5665 with
the JSON-RPC cluster protocol.
-->

## JSON-RPC Message API <a id="technical-concepts-json-rpc-messages"></a>

**The JSON-RPC message API is not a public API for end users.** In case you want
to interact with Icinga, use the [REST API](12-icinga2-api.md#icinga2-api).

This section describes the internal cluster messages exchanged between endpoints.

> **Tip**
>
> Debug builds with `icinga2 daemon -DInternal.DebugJsonRpc=1` unveil the JSON-RPC messages.
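All messages documented below share the same JSON-RPC 2.0 envelope: a `method` string and a `params` dictionary. As a hedged sketch, an `icinga::Hello` message could look like this (the `capabilities` value is illustrative, not an actual bitmask from the source; `21300` corresponds to v2.13.0 as noted below):

```
{
    "jsonrpc": "2.0",
    "method": "icinga::Hello",
    "params": {
        "capabilities": 1,
        "version": 21300
    }
}
```

On the wire, such a message is framed as a Netstring, i.e. prefixed with its byte length as shown in the `2:{}` example above.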

### Registered Handler Functions

Functions by example:

Event Sender: `Checkable::OnNewCheckResult`

```
On<xyz>.connect(&xyzHandler)
```

Event Receiver (Client): `CheckResultAPIHandler` in `REGISTER_APIFUNCTION`

```
<xyz>APIHandler()
```

### Messages

#### icinga::Hello <a id="technical-concepts-json-rpc-messages-icinga-hello"></a>

> Location: `apilistener.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | icinga::Hello
params    | Dictionary

##### Params

Key                  | Type        | Description
---------------------|-------------|------------------
capabilities         | Number      | Bitmask, see `lib/remote/apilistener.hpp`.
version              | Number      | Icinga 2 version, e.g. 21300 for v2.13.0.

##### Functions

Event Sender: When a new client connects in `NewClientHandlerInternal()`.
Event Receiver: `HelloAPIHandler`

##### Permissions

None, this is a required message.

#### event::Heartbeat <a id="technical-concepts-json-rpc-messages-event-heartbeat"></a>

> Location: `jsonrpcconnection-heartbeat.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | event::Heartbeat
params    | Dictionary

##### Params

Key       | Type          | Description
----------|---------------|------------------
timeout   | Number        | Heartbeat timeout, sender sets 120s.


##### Functions

Event Sender: `JsonRpcConnection::HeartbeatTimerHandler`
Event Receiver: `HeartbeatAPIHandler`

Both sender and receiver exchange this heartbeat message. If the sender detects
that a client endpoint hasn't sent anything in the updated timeout span, it disconnects
the client. This is to avoid stale connections with no message processing.

##### Permissions

None, this is a required message.

#### event::CheckResult <a id="technical-concepts-json-rpc-messages-event-checkresult"></a>

> Location: `clusterevents.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | event::CheckResult
params    | Dictionary

##### Params

Key       | Type          | Description
----------|---------------|------------------
host      | String        | Host name
service   | String        | Service name
cr        | Serialized CR | Check result

##### Functions

Event Sender: `Checkable::OnNewCheckResult`
Event Receiver: `CheckResultAPIHandler`

##### Permissions

The receiver will not process messages from endpoints which are not configured.

Message updates will be dropped when:

* The host/service does not exist.
* The origin is a remote command endpoint different to the configured one, and its zone is not allowed to access this checkable.

#### event::SetNextCheck <a id="technical-concepts-json-rpc-messages-event-setnextcheck"></a>

> Location: `clusterevents.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | event::SetNextCheck
params    | Dictionary

##### Params

Key         | Type          | Description
------------|---------------|------------------
host        | String        | Host name
service     | String        | Service name
next\_check | Timestamp     | Next scheduled check time as UNIX timestamp.

##### Functions

Event Sender: `Checkable::OnNextCheckChanged`
Event Receiver: `NextCheckChangedAPIHandler`

##### Permissions

The receiver will not process messages from endpoints which are not configured.

Message updates will be dropped when:

* The checkable does not exist.
* The origin endpoint's zone is not allowed to access this checkable.
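Putting the params table above together, a hedged sketch of such a message (the host, service and timestamp values are made up for illustration):

```
{
    "jsonrpc": "2.0",
    "method": "event::SetNextCheck",
    "params": {
        "host": "host1",
        "service": "cluster-zone-satellite",
        "next_check": 1561000000.0
    }
}
```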
#### event::SetLastCheckStarted <a id="technical-concepts-json-rpc-messages-event-setlastcheckstarted"></a>

> Location: `clusterevents.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | event::SetLastCheckStarted
params    | Dictionary

##### Params

Key                  | Type      | Description
---------------------|-----------|------------------
host                 | String    | Host name
service              | String    | Service name
last\_check\_started | Timestamp | Last check's start time as UNIX timestamp.

##### Functions

Event Sender: `Checkable::OnLastCheckStartedChanged`
Event Receiver: `LastCheckStartedChangedAPIHandler`

##### Permissions

The receiver will not process messages from not configured endpoints.

Message updates will be dropped when:

* Checkable does not exist.
* Origin endpoint's zone is not allowed to access this checkable.

#### event::SetSuppressedNotifications <a id="technical-concepts-json-rpc-messages-event-setsupressednotifications"></a>

> Location: `clusterevents.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | event::SetSuppressedNotifications
params    | Dictionary

##### Params

Key                      | Type   | Description
-------------------------|--------|------------------
host                     | String | Host name
service                  | String | Service name
supressed\_notifications | Number | Bitmask for suppressed notifications.

##### Functions

Event Sender: `Checkable::OnSuppressedNotificationsChanged`
Event Receiver: `SuppressedNotificationsChangedAPIHandler`

Used to sync the notification state of a host or service object within the same HA zone.

##### Permissions

The receiver will not process messages from not configured endpoints.
Message updates will be dropped when:

* Checkable does not exist.
* Origin endpoint is not within the local zone.

#### event::SetSuppressedNotificationTypes <a id="technical-concepts-json-rpc-messages-event-setsuppressednotificationtypes"></a>

> Location: `clusterevents.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | event::SetSuppressedNotificationTypes
params    | Dictionary

##### Params

Key                      | Type   | Description
-------------------------|--------|------------------
notification             | String | Notification name
supressed\_notifications | Number | Bitmask for suppressed notifications.

Used to sync the state of a notification object within the same HA zone.

##### Functions

Event Sender: `Notification::OnSuppressedNotificationsChanged`
Event Receiver: `SuppressedNotificationTypesChangedAPIHandler`

##### Permissions

The receiver will not process messages from not configured endpoints.

Message updates will be dropped when:

* Notification does not exist.
* Origin endpoint is not within the local zone.

#### event::SetNextNotification <a id="technical-concepts-json-rpc-messages-event-setnextnotification"></a>

> Location: `clusterevents.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | event::SetNextNotification
params    | Dictionary

##### Params

Key                | Type      | Description
-------------------|-----------|------------------
host               | String    | Host name
service            | String    | Service name
notification       | String    | Notification name
next\_notification | Timestamp | Next scheduled notification time as UNIX timestamp.
##### Functions

Event Sender: `Notification::OnNextNotificationChanged`
Event Receiver: `NextNotificationChangedAPIHandler`

##### Permissions

The receiver will not process messages from not configured endpoints.

Message updates will be dropped when:

* Notification does not exist.
* Origin endpoint's zone is not allowed to access this checkable.

#### event::SetForceNextCheck <a id="technical-concepts-json-rpc-messages-event-setforcenextcheck"></a>

> Location: `clusterevents.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | event::SetForceNextCheck
params    | Dictionary

##### Params

Key       | Type    | Description
----------|---------|------------------
host      | String  | Host name
service   | String  | Service name
forced    | Boolean | Forced next check (execute now)

##### Functions

Event Sender: `Checkable::OnForceNextCheckChanged`
Event Receiver: `ForceNextCheckChangedAPIHandler`

##### Permissions

The receiver will not process messages from not configured endpoints.

Message updates will be dropped when:

* Checkable does not exist.
* Origin endpoint's zone is not allowed to access this checkable.
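These permission rules repeat across the event messages above: discard messages from unknown endpoints, require the object to exist locally, then verify the origin zone may access it. A simplified Python sketch of that common filtering (the real checks live in `clusterevents.cpp`; the names here are illustrative):

```python
def accept_event(origin_endpoint, configured_endpoints,
                 origin_zone, allowed_zones, checkable):
    # 1. The receiver will not process messages from endpoints that are
    #    not part of its own configuration.
    if origin_endpoint not in configured_endpoints:
        return False
    # 2. The checkable must exist locally, otherwise the update is dropped.
    if checkable is None:
        return False
    # 3. The origin endpoint's zone must be allowed to access this checkable.
    return origin_zone in allowed_zones

configured = {"satellite1", "satellite2"}
```

For example, a message from an endpoint that is not configured is rejected in step 1 before any object lookup happens.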
#### event::SetForceNextNotification <a id="technical-concepts-json-rpc-messages-event-setforcenextnotification"></a>

> Location: `clusterevents.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | event::SetForceNextNotification
params    | Dictionary

##### Params

Key       | Type    | Description
----------|---------|------------------
host      | String  | Host name
service   | String  | Service name
forced    | Boolean | Forced next notification (execute now)

##### Functions

Event Sender: `Checkable::OnForceNextNotificationChanged`
Event Receiver: `ForceNextNotificationChangedAPIHandler`

##### Permissions

The receiver will not process messages from not configured endpoints.

Message updates will be dropped when:

* Checkable does not exist.
* Origin endpoint's zone is not allowed to access this checkable.

#### event::SetAcknowledgement <a id="technical-concepts-json-rpc-messages-event-setacknowledgement"></a>

> Location: `clusterevents.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | event::SetAcknowledgement
params    | Dictionary

##### Params

Key        | Type      | Description
-----------|-----------|------------------
host       | String    | Host name
service    | String    | Service name
author     | String    | Acknowledgement author name.
comment    | String    | Acknowledgement comment content.
acktype    | Number    | Acknowledgement type (0=None, 1=Normal, 2=Sticky)
notify     | Boolean   | Notification should be sent.
persistent | Boolean   | Whether the comment is persistent.
expiry     | Timestamp | Optional expire time as UNIX timestamp.
##### Functions

Event Sender: `Checkable::OnAcknowledgementSet`
Event Receiver: `AcknowledgementSetAPIHandler`

##### Permissions

The receiver will not process messages from not configured endpoints.

Message updates will be dropped when:

* Checkable does not exist.
* Origin endpoint's zone is not allowed to access this checkable.

#### event::ClearAcknowledgement <a id="technical-concepts-json-rpc-messages-event-clearacknowledgement"></a>

> Location: `clusterevents.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | event::ClearAcknowledgement
params    | Dictionary

##### Params

Key       | Type   | Description
----------|--------|------------------
host      | String | Host name
service   | String | Service name

##### Functions

Event Sender: `Checkable::OnAcknowledgementCleared`
Event Receiver: `AcknowledgementClearedAPIHandler`

##### Permissions

The receiver will not process messages from not configured endpoints.

Message updates will be dropped when:

* Checkable does not exist.
* Origin endpoint's zone is not allowed to access this checkable.

#### event::SendNotifications <a id="technical-concepts-json-rpc-messages-event-sendnotifications"></a>

> Location: `clusterevents.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | event::SendNotifications
params    | Dictionary

##### Params

Key       | Type          | Description
----------|---------------|------------------
host      | String        | Host name
service   | String        | Service name
cr        | Serialized CR | Check result
type      | Number        | enum NotificationType, same as `types` for notification objects.
author    | String        | Author name
text      | String        | Notification text

##### Functions

Event Sender: `Checkable::OnNotificationsRequested`
Event Receiver: `SendNotificationsAPIHandler`

Signals that notifications have to be sent within the same HA zone. This is relevant if the checkable and its
notifications are active on different endpoints.

##### Permissions

The receiver will not process messages from not configured endpoints.

Message updates will be dropped when:

* Checkable does not exist.
* Origin endpoint is not within the local zone.

#### event::NotificationSentUser <a id="technical-concepts-json-rpc-messages-event-notificationsentuser"></a>

> Location: `clusterevents.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | event::NotificationSentUser
params    | Dictionary

##### Params

Key           | Type          | Description
--------------|---------------|------------------
host          | String        | Host name
service       | String        | Service name
notification  | String        | Notification name.
user          | String        | Notified user name.
type          | Number        | enum NotificationType, same as `types` in Notification objects.
cr            | Serialized CR | Check result.
author        | String        | Notification author (for specific types)
text          | String        | Notification text (for specific types)
command       | String        | Notification command name.

##### Functions

Event Sender: `Checkable::OnNotificationSentToUser`
Event Receiver: `NotificationSentUserAPIHandler`

##### Permissions

The receiver will not process messages from not configured endpoints.

Message updates will be dropped when:

* Checkable does not exist.
* Origin endpoint's zone is not the same as the receiver's. This binds notification messages to the HA zone.
#### event::NotificationSentToAllUsers <a id="technical-concepts-json-rpc-messages-event-notificationsenttoallusers"></a>

> Location: `clusterevents.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | event::NotificationSentToAllUsers
params    | Dictionary

##### Params

Key                         | Type            | Description
----------------------------|-----------------|------------------
host                        | String          | Host name
service                     | String          | Service name
notification                | String          | Notification name.
users                       | Array of String | Notified user names.
type                        | Number          | enum NotificationType, same as `types` in Notification objects.
cr                          | Serialized CR   | Check result.
author                      | String          | Notification author (for specific types)
text                        | String          | Notification text (for specific types)
last\_notification          | Timestamp       | Last notification time as UNIX timestamp.
next\_notification          | Timestamp       | Next scheduled notification time as UNIX timestamp.
notification\_number        | Number          | Current notification number in problem state.
last\_problem\_notification | Timestamp       | Last problem notification time as UNIX timestamp.
no\_more\_notifications     | Boolean         | Whether to send future notifications when this notification becomes active on this HA node.

##### Functions

Event Sender: `Checkable::OnNotificationSentToAllUsers`
Event Receiver: `NotificationSentToAllUsersAPIHandler`

##### Permissions

The receiver will not process messages from not configured endpoints.

Message updates will be dropped when:

* Checkable does not exist.
* Origin endpoint's zone is not the same as the receiver's. This binds notification messages to the HA zone.
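Conceptually, the receiving HA node copies this scheduling state onto its local notification object so it can take over without restarting the escalation sequence. A minimal Python model of that idea (illustrative only, not the C++ implementation; the params mirror the table above):

```python
class NotificationState:
    """Local notification scheduling state (illustrative model)."""

    def __init__(self):
        self.last_notification = 0.0
        self.next_notification = 0.0
        self.notification_number = 0
        self.last_problem_notification = 0.0
        self.no_more_notifications = False

def apply_notification_sync(state, params):
    # Copy the synced scheduling state from the message params so a
    # standby HA node continues where the active node left off.
    state.last_notification = params["last_notification"]
    state.next_notification = params["next_notification"]
    state.notification_number = params["notification_number"]
    state.last_problem_notification = params["last_problem_notification"]
    state.no_more_notifications = params["no_more_notifications"]
    return state

state = apply_notification_sync(NotificationState(), {
    "last_notification": 1697000000.0,
    "next_notification": 1697001800.0,
    "notification_number": 3,
    "last_problem_notification": 1697000000.0,
    "no_more_notifications": False,
})
```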
#### event::ExecuteCommand <a id="technical-concepts-json-rpc-messages-event-executecommand"></a>

> Location: `clusterevents-check.cpp` and `checkable-check.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | event::ExecuteCommand
params    | Dictionary

##### Params

Key            | Type       | Description
---------------|------------|------------------
host           | String     | Host name.
service        | String     | Service name.
command\_type  | String     | `check_command` or `event_command`.
command        | String     | CheckCommand or EventCommand name.
check\_timeout | Number     | Check timeout of the checkable object, if specified as `check_timeout` attribute.
macros         | Dictionary | Command arguments as key/value pairs for remote execution.
endpoint       | String     | The endpoint to execute the command on.
deadline       | Number     | A UNIX timestamp indicating the execution deadline.
source         | String     | The execution UUID.

##### Functions

**Event Sender:** This gets constructed directly in `Checkable::ExecuteCheck()`, `Checkable::ExecuteEventHandler()` or `ApiActions::ExecuteCommand()` when a remote command endpoint is configured.

* `Get{CheckCommand,EventCommand}()->Execute()` simulates an execution and extracts all command arguments into the `macros` dictionary (inside lib/methods tasks).
* When the endpoint is connected, the message is constructed and sent directly.
* When the endpoint is not connected, the replay log is not being synced, and the application started more than five minutes ago, an UNKNOWN check result ("not connected") is generated for the user.

**Event Receiver:** `ExecuteCommandAPIHandler`

Special handling, calls `ClusterEvents::EnqueueCheck()` for command endpoint checks.
This function enqueues check tasks into a queue which is controlled in `RemoteCheckThreadProc()`.
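The enqueue/worker split just described can be modeled with a producer/consumer queue. A hedged Python sketch of the idea behind `EnqueueCheck()` and `RemoteCheckThreadProc()` (names and task layout are illustrative, not the C++ implementation):

```python
import queue
import threading

check_queue = queue.Queue()
results = []

def remote_check_worker():
    # Modeled after the idea behind RemoteCheckThreadProc(): pull queued
    # command endpoint checks and execute them one after another.
    while True:
        task = check_queue.get()
        if task is None:  # shutdown sentinel
            break
        results.append(f"executed {task['command']} for {task['host']}")
        check_queue.task_done()

worker = threading.Thread(target=remote_check_worker)
worker.start()

# The EnqueueCheck() counterpart: the API handler only queues the task
# and returns immediately instead of blocking on the check execution.
check_queue.put({"host": "agent1", "command": "hostalive"})
check_queue.put({"host": "agent1", "command": "disk"})
check_queue.put(None)
worker.join()
```

Decoupling the API handler from the check execution keeps the JSON-RPC connection responsive while checks run in the background.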
If the `endpoint` parameter is specified and is not equal to the local endpoint, the message is forwarded to the correct endpoint zone.

##### Permissions

The receiver will not process messages from not configured endpoints.

Message updates will be dropped when:

* Origin endpoint's zone is not a parent zone of the receiver endpoint.
* `accept_commands = false` is set in the `api` feature configuration. In this case, an UNKNOWN check result is sent back to the sender.

The receiver constructs a virtual host object and looks for the local CheckCommand object.

An UNKNOWN check result is returned to the sender

* when the CheckCommand object does not exist.
* when an exception was triggered during check execution, e.g. the plugin binary could not be executed or similar.

The returned messages are synced directly to the sender's endpoint, no cluster broadcast.

> **Note**: EventCommand errors are just logged on the remote endpoint.

#### event::UpdateExecutions <a id="technical-concepts-json-rpc-messages-event-updateexecutions"></a>

> Location: `clusterevents.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | event::UpdateExecutions
params    | Dictionary

##### Params

Key            | Type       | Description
---------------|------------|------------------
host           | String     | Host name.
service        | String     | Service name.
executions     | Dictionary | Executions to be updated.

##### Functions

**Event Sender:** `ClusterEvents::ExecutedCommandAPIHandler`, `ClusterEvents::UpdateExecutionsAPIHandler`, `ApiActions::ExecuteCommand`
**Event Receiver:** `ClusterEvents::UpdateExecutionsAPIHandler`

##### Permissions

The receiver will not process messages from not configured endpoints.

Message updates will be dropped when:

* Checkable does not exist.
* Origin endpoint's zone is not allowed to access this checkable.

#### event::ExecutedCommand <a id="technical-concepts-json-rpc-messages-event-executedcommand"></a>

> Location: `clusterevents.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | event::ExecutedCommand
params    | Dictionary

##### Params

Key            | Type   | Description
---------------|--------|------------------
host           | String | Host name.
service        | String | Service name.
execution      | String | The execution ID executed.
exitStatus     | Number | The command exit status.
output         | String | The command output.
start          | Number | The UNIX timestamp at the start of the command execution.
end            | Number | The UNIX timestamp at the end of the command execution.

##### Functions

**Event Sender:** `ClusterEvents::ExecuteCheckFromQueue`, `ClusterEvents::ExecuteCommandAPIHandler`
**Event Receiver:** `ClusterEvents::ExecutedCommandAPIHandler`

##### Permissions

The receiver will not process messages from not configured endpoints.

Message updates will be dropped when:

* Checkable does not exist.
* Origin endpoint's zone is not allowed to access this checkable.

#### config::Update <a id="technical-concepts-json-rpc-messages-config-update"></a>

> Location: `apilistener-filesync.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | config::Update
params    | Dictionary

##### Params

Key        | Type       | Description
-----------|------------|------------------
update     | Dictionary | Config file paths and their content.
update\_v2 | Dictionary | Additional meta config files introduced in 2.4+ for compatibility reasons.
##### Functions

**Event Sender:** `SendConfigUpdate()` called in `ApiListener::SyncClient()` when a new client endpoint connects.
**Event Receiver:** `ConfigUpdateHandler` reads the config update content and stores it in `/var/lib/icinga2/api`.
When it detects a configuration change, the function requests an application restart.

##### Permissions

The receiver will not process messages from not configured endpoints.

Message updates will be dropped when:

* The origin sender is not in a parent zone of the receiver.
* `api` feature does not accept config.

Config updates will be ignored when:

* The zone is not configured on the receiver endpoint.
* The zone is authoritative on this instance (this only happens on a master which has `/etc/icinga2/zones.d` populated, and prevents sync loops).

#### config::UpdateObject <a id="technical-concepts-json-rpc-messages-config-updateobject"></a>

> Location: `apilistener-configsync.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | config::UpdateObject
params    | Dictionary

##### Params

Key                  | Type       | Description
---------------------|------------|------------------
name                 | String     | Object name.
type                 | String     | Object type name.
version              | Number     | Object version.
config               | String     | Config file content for `_api` packages.
modified\_attributes | Dictionary | Modified attributes at runtime as key/value pairs.
original\_attributes | Array      | Original attributes as array of keys.

##### Functions

**Event Sender:** Either on client connect (full sync), or runtime created/updated object.

`ApiListener::SendRuntimeConfigObjects()` gets called when a new endpoint is connected
and runtime created config objects need to be synced.
This invokes a call to `UpdateConfigObject()` to only sync this JsonRpcConnection client.

`ConfigObject::OnActiveChanged` (created or deleted) or `ConfigObject::OnVersionChanged` (updated)
also call `UpdateConfigObject()`.

**Event Receiver:** `ConfigUpdateObjectAPIHandler` calls `ConfigObjectUtility::CreateObject()` in order
to create the object if it does not exist yet. Afterwards, all modified attributes are applied,
and where needed, original attributes are restored. The object version is set as well, keeping it in sync
with the sender.

##### Permissions

###### Sender

When a client endpoint connects:

The sender only syncs config object updates to a client which can access
the config object, in `ApiListener::SendRuntimeConfigObjects()`.

In addition, the client endpoint's zone is checked to determine whether it may access
the config object.

Runtime updated object:

Only if the config object belongs to the `_api` package.

###### Receiver

The receiver will not process messages from not configured endpoints.

Message updates will be dropped when:

* Origin sender endpoint's zone is in a child zone.
* `api` feature does not accept config.
* The received config object type does not exist (this is to prevent failures with older nodes and new object types).

Error handling:

* Log an error if `CreateObject` fails (only if the object does not already exist).
* Local object version is newer than the received version, the object will not be updated.
* Compare modified and original attributes and restore any type of change here.
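The version check and attribute handling above can be summarized in a small model. A hedged Python sketch of the receiver's decision logic (a simplification of `ConfigUpdateObjectAPIHandler` with a toy object layout; not the C++ code):

```python
def apply_update_object(local, params):
    """Illustrative receiver logic for config::UpdateObject."""
    if local is None:
        # CreateObject() path: the object does not exist locally yet.
        local = {"version": 0, "attrs": {}}
    elif local["version"] >= params["version"]:
        # The local object version is newer (or equal): drop the update.
        return local
    local["version"] = params["version"]
    # Apply runtime-modified attributes as key/value pairs ...
    local["attrs"].update(params.get("modified_attributes", {}))
    # ... and restore original attributes listed by key.
    for key in params.get("original_attributes", []):
        local["attrs"].pop(key, None)
    return local
```

Keeping the version check first prevents an older node from overwriting a newer runtime update during a full sync.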
#### config::DeleteObject <a id="technical-concepts-json-rpc-messages-config-deleteobject"></a>

> Location: `apilistener-configsync.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | config::DeleteObject
params    | Dictionary

##### Params

Key                 | Type   | Description
--------------------|--------|------------------
name                | String | Object name.
type                | String | Object type name.
version             | Number | Object version.

##### Functions

**Event Sender:**

`ConfigObject::OnActiveChanged` (created or deleted) or `ConfigObject::OnVersionChanged` (updated)
call `DeleteConfigObject()`.

**Event Receiver:** `ConfigDeleteObjectAPIHandler`

##### Permissions

###### Sender

Runtime deleted object:

Only if the config object belongs to the `_api` package.

###### Receiver

The receiver will not process messages from not configured endpoints.

Message updates will be dropped when:

* Origin sender endpoint's zone is in a child zone.
* `api` feature does not accept config.
* The received config object type does not exist (this is to prevent failures with older nodes and new object types).
* The object in question was not created at runtime; it does not belong to the `_api` package.
Error handling:

* Log an error if `DeleteObject` fails (only if the object does not already exist).

#### pki::RequestCertificate <a id="technical-concepts-json-rpc-messages-pki-requestcertificate"></a>

> Location: `jsonrpcconnection-pki.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | pki::RequestCertificate
params    | Dictionary

##### Params

Key           | Type   | Description
--------------|--------|------------------
ticket        | String | Own ticket, or as satellite in CA proxy from local store.
cert\_request | String | Certificate request content from local store, optional.

##### Functions

Event Sender: `RequestCertificateHandler`
Event Receiver: `RequestCertificateHandler`

##### Permissions

This is an anonymous request, and the number of anonymous clients can be configured
in the `api` feature.

Only valid certificate request messages are processed, and valid signed certificates
won't be signed again.

#### pki::UpdateCertificate <a id="technical-concepts-json-rpc-messages-pki-updatecertificate"></a>

> Location: `jsonrpcconnection-pki.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | pki::UpdateCertificate
params    | Dictionary

##### Params

Key                  | Type   | Description
---------------------|--------|------------------
status\_code         | Number | Status code, 0=ok.
cert                 | String | Signed certificate content.
ca                   | String | Public CA certificate content.
fingerprint\_request | String | Certificate fingerprint from the CSR.

##### Functions

**Event Sender:**

* When a client requests a certificate in `RequestCertificateHandler` and the satellite
already has a signed certificate, the `pki::UpdateCertificate` message is constructed and sent back.
* When the endpoint holding the master's CA private key (and TicketSalt private key) is able to sign
the request, the `pki::UpdateCertificate` message is constructed and sent back.

**Event Receiver:** `UpdateCertificateHandler`

##### Permissions

Message updates are dropped when:

* The origin sender is not in a parent zone of the receiver.
* The certificate fingerprint is in an invalid format.

#### log::SetLogPosition <a id="technical-concepts-json-rpc-messages-log-setlogposition"></a>

> Location: `apilistener.cpp` and `jsonrpcconnection.cpp`

##### Message Body

Key       | Value
----------|---------
jsonrpc   | 2.0
method    | log::SetLogPosition
params    | Dictionary

##### Params

Key                 | Type      | Description
--------------------|-----------|------------------
log\_position       | Timestamp | The endpoint's log position as UNIX timestamp.

##### Functions

**Event Sender:**

During log replay to a client endpoint in `ApiListener::ReplayLog()`, each processed
file generates a message which updates the log position timestamp.

`ApiListener::ApiTimerHandler()` invokes a check to keep all connected endpoints and
their log position in sync during replay log.

**Event Receiver:** `SetLogPositionHandler`

##### Permissions

The receiver will not process messages from not configured endpoints.
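The log position acts as a cursor into the replay log: the sender only replays files that can still contain messages newer than the client's acknowledged position. A minimal Python sketch of that selection (illustrative; file naming and bookkeeping differ in the real implementation):

```python
def files_to_replay(log_files, client_log_position):
    # log_files maps a replay log file name to the newest message
    # timestamp it contains; client_log_position is the UNIX timestamp
    # the client has confirmed via log::SetLogPosition.
    return [name for name, newest_ts in sorted(log_files.items())
            if newest_ts > client_log_position]

logs = {"1697000000": 1697000900.0, "1697001000": 1697001900.0}
```

Advancing the position after each replayed file means a reconnecting endpoint only receives the messages it actually missed.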