1# Technical Concepts <a id="technical-concepts"></a>
2
3This chapter provides technical concepts and design insights
4into specific Icinga 2 components such as:
5
6* [Application](19-technical-concepts.md#technical-concepts-application)
7* [Configuration](19-technical-concepts.md#technical-concepts-configuration)
8* [Features](19-technical-concepts.md#technical-concepts-features)
9* [Check Scheduler](19-technical-concepts.md#technical-concepts-check-scheduler)
10* [Checks](19-technical-concepts.md#technical-concepts-checks)
11* [Cluster](19-technical-concepts.md#technical-concepts-cluster)
12* [TLS Network IO](19-technical-concepts.md#technical-concepts-tls-network-io)
13
14## Application <a id="technical-concepts-application"></a>
15
16### CLI Commands <a id="technical-concepts-application-cli-commands"></a>
17
18The Icinga 2 application is managed with different CLI sub commands.
19`daemon` takes care about loading the configuration files, running the
20application as daemon, etc.
21Other sub commands allow to enable features, generate and request
22TLS certificates or enter the debug console.
23
24The main entry point for each CLI command parses the command line
25parameters and then triggers the required actions.
26
27### daemon CLI command <a id="technical-concepts-application-cli-commands-daemon"></a>
28
29This CLI command loads the configuration files, starting with `icinga2.conf`.
30The [configuration compiler](19-technical-concepts.md#technical-concepts-configuration) parses the
31file and detects additional file includes, constants, and any other DSL
32specific declaration.
33
34At this stage, the configuration will already be checked against the
35defined grammar in the scanner, and custom object validators will also be
36checked.
37
38If the user provided `-C/--validate`, the CLI command returns with the
39validation exit code.
40
41When running as daemon, additional parameters are checked, e.g. whether
42this application was triggered by a reload, needs to daemonize with fork()
43involved and update the object's authority. The latter is important for
44HA-enabled cluster zones.
45
46## Configuration <a id="technical-concepts-configuration"></a>
47
48### Lexer <a id="technical-concepts-configuration-lexer"></a>
49
50The lexer stage does not understand the DSL itself, it only
51maps specific character sequences into identifiers.
52
53This allows Icinga to detect the beginning of a string with `"`,
54reading the following characters and determining the end of the
55string with again `"`.
56
57Other parts covered by the lexer a escape sequences insides a string,
58e.g. `"\"abc"`.
59
60The lexer also identifiers logical operators, e.g. `&` or `in`,
61specific keywords like `object`, `import`, etc. and comment blocks.
62
63Please check `lib/config/config_lexer.ll` for details.
64
65Icinga uses [Flex](https://github.com/westes/flex) in the first stage.
66
67> Flex (The Fast Lexical Analyzer)
68>
69> Flex is a fast lexical analyser generator. It is a tool for generating programs
70> that perform pattern-matching on text. Flex is a free (but non-GNU) implementation
71> of the original Unix lex program.
72
73### Parser <a id="technical-concepts-configuration-parser"></a>
74
75The parser stage puts the identifiers from the lexer into more
76context with flow control and sequences.
77
78The following comparison is parsed into a left term, an operator
79and a right term.
80
81```
82x > 5
83```
84
85The DSL contains many elements which require a specific order,
86and sometimes only a left term for example.
87
88The parser also takes care of parsing an object declaration for
89example. It already knows from the lexer that `object` marks the
90beginning of an object. It then expects a type string afterwards,
91and the object name - which can be either a string with double quotes
92or a previously defined constant.
93
94An opening bracket `{` in this specific context starts the object
95scope, which also is stored for later scope specific variable access.
96
97If there's an apply rule defined, this follows the same principle.
98The config parser detects the scope of an apply rule and generates
99Icinga 2 C++ code for the parsed string tokens.
100
101```
102assign where host.vars.sla == "24x7"
103```
104
105is parsed into an assign token identifier, and the string expression
106is compiled into a new `ApplyExpression` object.
107
108The flow control inside the parser ensures that for example `ignore where`
109can only be defined when a previous `assign where` was given - or when
110inside an apply for rule.
111
112Another example are specific object types which allow assign expression,
113specifically group objects. Others objects must throw a configuration error.
114
115Please check `lib/config/config_parser.yy` for more details,
116and the [language reference](17-language-reference.md#language-reference) chapter for
117documented DSL keywords and sequences.
118
119> Icinga uses [Bison](https://en.wikipedia.org/wiki/GNU_bison) as parser generator
120> which reads a specification of a context-free language, warns about any parsing
121> ambiguities, and generates a parser in C++ which reads sequences of tokens and
122> decides whether the sequence conforms to the syntax specified by the grammar.
123
124
125### Compiler <a id="technical-concepts-configuration-compiler"></a>
126
127The config compiler initializes the scanner inside the [lexer](19-technical-concepts.md#technical-concepts-configuration-lexer)
128stage.
129
130The configuration files are parsed into memory from inside the [daemon CLI command](19-technical-concepts.md#technical-concepts-application-cli-commands-daemon)
131which invokes the config validation in `ValidateConfigFiles()`. This compiles the
132files into an AST expression which is executed.
133
134At this stage, the expressions generate so-called "config items" which
135are a pre-stage of the later compiled object.
136
137`ConfigItem::CommitItems` takes care of committing the items, and doing a
138rollback on failure. It also checks against matching apply rules from the previous run
139and generates statistics about the objects which can be seen by the config validation.
140
141`ConfigItem::CommitNewItems` collects the registered types and items,
142and checks for a specific required order, e.g. a service object needs
143a host object first.
144
145The following stages happen then:
146
147- **Commit**: A workqueue then commits the items in a parallel fashion for this specific type. The object gets its name, and the AST expression is executed. It is then registered into the item into `m_Object` as reference.
148- **OnAllConfigLoaded**: Special signal for each object to pre-load required object attributes, resolve group membership, initialize functions and timers.
149- **CreateChildObjects**: Run apply rules for this specific type.
150- **CommitNewItems**: Apply rules may generate new config items, this is to ensure that they again run through the stages.
151
152Note that the items are now committed and the configuration is validated and loaded
153into memory. The final config objects are not yet activated though.
154
155This only happens after the validation, when the application is about to be run
156with `ConfigItem::ActivateItems`.
157
158Each item has an object created in `m_Object` which is checked in a loop.
159Again, the dependency order of activated objects is important here, e.g. logger features come first, then
160config objects and last the checker, api, etc. features. This is done by sorting the objects
161based on their type specific activation priority.
162
163The following signals are triggered in the stages:
164
165- **PreActivate**: Setting the `active` flag for the config object.
166- **Activate**: Calls `Start()` on the object, sets the local HA authority and notifies subscribers that this object is now activated (e.g. for config updates in the DB backend).
167
168
169### References <a id="technical-concepts-configuration-references"></a>
170
171* [The Icinga Config Compiler: An Overview](https://www.netways.de/blog/2018/07/12/the-icinga-config-compiler-an-overview/)
172* [A parser/lexer/compiler for the Leonardo language](https://github.com/EmilGedda/Leonardo)
173* [I wrote a programming language. Here’s how you can, too.](https://medium.freecodecamp.org/the-programming-language-pipeline-91d3f449c919)
174* [http://onoffswitch.net/building-a-custom-lexer/](http://onoffswitch.net/building-a-custom-lexer/)
175* [Writing an Interpreter with Lex, Yacc, and Memphis](http://memphis.compilertools.net/interpreter.html)
176* [Flex](https://github.com/westes/flex)
177* [GNU Bison](https://www.gnu.org/software/bison/)
178
179## Core <a id="technical-concepts-core"></a>
180
181### Core: Reload Handling <a id="technical-concepts-core-reload"></a>
182
183The initial design of the reload state machine looks like this:
184
185* receive reload signal SIGHUP
186* fork a child process, start configuration validation in parallel work queues
187* parent process continues with old configuration objects and the event scheduling
188(doing checks, replicating cluster events, triggering alert notifications, etc.)
189* validation NOT ok: child process terminates, parent process continues with old configuration state
190* validation ok: child process signals parent process to terminate and save its current state (all events until now) into the icinga2 state file
191* parent process shuts down writing icinga2.state file
192* child process waits for parent process gone, reads the icinga2 state file and synchronizes all historical and status data
193* child becomes the new session leader
194
195Since Icinga 2.6, there are two processes when checked with `ps aux | grep icinga2` or `pidof icinga2`.
196This was to ensure that feature file descriptors don't leak into the plugin process (e.g. DB IDO MySQL sockets).
197
198Icinga 2.9 changed the reload handling a bit with SIGUSR2 signals
199and systemd notifies.
200
201With systemd, it could occur that the tree was broken thus resulting
202in killing all remaining processes on stop, instead of a clean exit.
203You can read the full story [here](https://github.com/Icinga/icinga2/issues/7309).
204
205With 2.11 you'll now see 3 processes:
206
207- The umbrella process which takes care about signal handling and process spawning/stopping
208- The main process with the check scheduler, notifications, etc.
209- The execution helper process
210
211During reload, the umbrella process spawns a new reload process which validates the configuration.
212Once successful, the new reload process signals the umbrella process that it is finished.
213The umbrella process forwards the signal and tells the old main process to shutdown.
214The old main process writes the icinga2.state file. The umbrella process signals
215the reload process that the main process terminated.
216
217The reload process was in idle wait before, and now continues to read the written
218state file and run the event loop (checks, notifications, "events", ...). The reload
219process itself also spawns the execution helper process again.
220
221
222## Features <a id="technical-concepts-features"></a>
223
224Features are implemented in specific libraries and can be enabled
225using CLI commands.
226
227Features either write specific data or receive data.
228
229Examples for writing data: [DB IDO](14-features.md#db-ido), [Graphite](14-features.md#graphite-carbon-cache-writer), [InfluxDB](14-features.md#influxdb-writer). [GELF](14-features.md#gelfwriter), etc.
230Examples for receiving data: [REST API](12-icinga2-api.md#icinga2-api), etc.
231
232The implementation of features makes use of existing libraries
233and functionality. This makes the code more abstract, but shorter
234and easier to read.
235
236Features register callback functions on specific events they want
237to handle. For example the `GraphiteWriter` feature subscribes to
238new CheckResult events.
239
240Each time Icinga 2 receives and processes a new check result, this
241event is triggered and forwarded to all subscribers.
242
243The GraphiteWriter feature calls the registered function and processes
244the received data. Features which connect Icinga 2 to external interfaces
245normally parse and reformat the received data into an applicable format.
246
247Since this check result signal is blocking, many of the features include a work queue
248with asynchronous task handling.
249
250The GraphiteWriter uses a TCP socket to communicate with the carbon cache
251daemon of Graphite. The InfluxDBWriter is instead writing bulk metric messages
252to InfluxDB's HTTP API, similar to Elasticsearch.
253
254
255## Check Scheduler <a id="technical-concepts-check-scheduler"></a>
256
257The check scheduler starts a thread which loops forever. It waits for
258check events being inserted into `m_IdleCheckables`.
259
260If the current pending check event number is larger than the configured
261max concurrent checks, the thread waits up until it there's slots again.
262
263In addition, further checks on enabled checks, check periods, etc. are
264performed. Once all conditions have passed, the next check timestamp is
265calculated and updated. This also is the timestamp where Icinga expects
266a new check result ("freshness check").
267
268The object is removed from idle checkables, and inserted into the
269pending checkables list. This can be seen via REST API metrics for the
270checker component feature as well.
271
272The actual check execution happens asynchronously using the application's
273thread pool.
274
275Once the check returns, it is removed from pending checkables and again
276inserted into idle checkables. This ensures that the scheduler takes this
277checkable event into account in the next iteration.
278
279### Start <a id="technical-concepts-check-scheduler-start"></a>
280
281When checkable objects get activated during the startup phase,
282the checker feature registers a handler for this event. This is due
283to the fact that the `checker` feature is fully optional, and e.g. not
284used on command endpoint clients.
285
286Whenever such an object activation signal is triggered, Icinga 2 checks
287whether it is [authoritative for this object](19-technical-concepts.md#technical-concepts-cluster-ha-object-authority).
288This means that inside an HA enabled zone with two endpoints, only non-paused checkable objects are
289actively inserted into the idle checkable list for the check scheduler.
290
291### Initial Check <a id="technical-concepts-check-scheduler-initial"></a>
292
293When a new checkable object (host or service) is initially added to the
294configuration, Icinga 2 performs the following during startup:
295
296* `Checkable::Start()` is called and calculates the first check time
297* With a spread delta, the next check time is actually set.
298
299If the next check should happen within a time frame of 60 seconds,
300Icinga 2 calculates a delta from a random value. The minimum of `check_interval`
301and 60 seconds is used as basis, multiplied with a random value between 0 and 1.
302
303In the best case, this check gets immediately executed after application start.
304The worst case scenario is that the check is scheduled 60 seconds after start
305the latest.
306
307The reasons for delaying and spreading checks during startup is that
308the application typically needs more resources at this time (cluster connections,
309feature warmup, initial syncs, etc.). Immediate check execution with
310thousands of checks could lead into performance problems, and additional
311events for each received check results.
312
313Therefore the initial check window is 60 seconds on application startup,
314random seed for all checkables. This is not predictable over multiple restarts
315for specific checkable objects, the delta changes every time.
316
317### Scheduling Offset <a id="technical-concepts-check-scheduler-offset"></a>
318
319There's a high chance that many checkable objects get executed at the same time
320and interval after startup. The initial scheduling spreads that a little, but
321Icinga 2 also attempts to ensure to keep fixed intervals, even with high check latency.
322
323During startup, Icinga 2 calculates the scheduling offset from a random number:
324
325* `Checkable::Checkable()` calls `SetSchedulingOffset()` with `Utility::Random()`
326* The offset is a pseudo-random integral value between `0` and `RAND_MAX`.
327
328Whenever the next check time is updated with `Checkable::UpdateNextCheck()`,
329the scheduling offset is taken into account.
330
331Depending on the state type (SOFT or HARD), either the `retry_interval` or `check_interval`
332is used. If the interval is greater than 1 second, the time adjustment is calculated in the
333following way:
334
335`now * 100 + offset` divided by `interval * 100`, using the remainder (that's what `fmod()` is for)
336and dividing this again onto base 100.
337
338Example: offset is 6500, interval 300, now is 1542190472.
339
340```
3411542190472 * 100 + 6500 = 154219053714
342300 * 100 = 30000
343154219053714 / 30000 = 5140635.1238
344
345(5140635.1238 - 5140635.0) * 30000 = 3714
3463714 / 100 = 37.14
347```
348
34937.15 seconds as an offset would be far too much, so this is again used as a calculation divider for the
350real offset with the base of 5 times the actual interval.
351
352Again, the remainder is calculated from the offset and `interval * 5`. This is divided onto base 100 again,
353with an additional 0.5 seconds delay.
354
355Example: offset is 6500, interval 300.
356
357```
3586500 / 300 = 21.666666666666667
359(21.666666666666667 - 21.0) * 300 = 200
360200 / 100 = 2
3612 + 0.5 = 2.5
362```
363
364The minimum value between the first adjustment and the second offset calculation based on the interval is
365taken, in the above example `2.5` wins.
366
367The actual next check time substracts the adjusted time from the future interval addition to provide
368a more widespread scheduling time among all checkable objects.
369
370`nextCheck = now - adj + interval`
371
372You may ask, what other values can happen with this offset calculation. Consider calculating more examples
373with different interval settings.
374
375Example: offset is 34567, interval 60, now is 1542190472.
376
377```
3781542190472 * 100 + 34567 = 154219081767
37960 * 100 = 6000
380154219081767 / 6000 = 25703180.2945
381(25703180.2945 - 25703180.0) * 6000 / 100 = 17.67
382
38334567 / 60 = 576.116666666666667
384(576.116666666666667 - 576.0) * 60 / 100 + 0.5 = 1.2
385```
386
387`1m` interval starts at `now + 1.2s`.
388
389Example: offset is 12345, interval 86400, now is 1542190472.
390
391```
3921542190472 * 100 + 12345 = 154219059545
39386400 * 100 = 8640000
394154219059545 / 8640000 = 17849.428188078703704
395(17849.428188078703704 - 17849) * 8640000 = 3699545
3963699545 / 100 = 36995.45
397
39812345 / 86400 = 0.142881944444444
3990.142881944444444 * 86400 / 100 + 0.5 = 123.95
400```
401
402`1d` interval starts at `now + 2m4s`.
403
404> **Note**
405>
406> In case you have a better algorithm at hand, feel free to discuss this in a PR on GitHub.
407> It needs to fulfill two things: 1) spread and shuffle execution times on each `next_check` update
408> 2) not too narrowed window for both long and short intervals
409> Application startup and initial checks need to be handled with care in a slightly different
410> fashion.
411
412When `SetNextCheck()` is called, there are signals registered. One of them sits
413inside the `CheckerComponent` class whose handler `CheckerComponent::NextCheckChangedHandler()`
414deletes/inserts the next check event from the scheduling queue. This basically
415is a list with multiple indexes with the keys for scheduling info and the object.
416
417
418## Checks<a id="technical-concepts-checks"></a>
419
420### Check Latency and Execution Time <a id="technical-concepts-checks-latency"></a>
421
422Each check command execution logs the start and end time where
423Icinga 2 (and the end user) is able to calculate the plugin execution time from it.
424
425```cpp
426GetExecutionEnd() - GetExecutionStart()
427```
428
429The higher the execution time, the higher the command timeout must be set. Furthermore
430users and developers are encouraged to look into plugin optimizations to minimize the
431execution time. Sometimes it is better to let an external daemon/script do the checks
432and feed them back via REST API.
433
434Icinga 2 stores the scheduled start and end time for a check. If the actual
435check execution time differs from the scheduled time, e.g. due to performance
436problems or limited execution slots (concurrent checks), this value is stored
437and computed from inside the check result.
438
439The difference between the two deltas is called `check latency`.
440
441```cpp
442(GetScheduleEnd() - GetScheduleStart()) - CalculateExecutionTime()
443```
444
445### Severity <a id="technical-concepts-checks-severity"></a>
446
447The severity attribute is introduced with Icinga v2.11 and provides
448a bit mask calculated value from specific checkable object states.
449
450The severity value is pre-calculated for visualization interfaces
451such as Icinga Web which sorts the problem dashboard by severity by default.
452
453The higher the severity number is, the more important the problem is.
454
455Flags:
456
457```cpp
458/**
459 * Severity Flags
460 *
461 * @ingroup icinga
462 */
463enum SeverityFlag
464{
465	SeverityFlagDowntime = 1,
466	SeverityFlagAcknowledgement = 2,
467	SeverityFlagHostDown = 4,
468	SeverityFlagUnhandled = 8,
469	SeverityFlagPending = 16,
470	SeverityFlagWarning = 32,
471	SeverityFlagUnknown = 64,
472	SeverityFlagCritical = 128,
473};
474```
475
476
477Host:
478
479```cpp
480	/* OK/Warning = Up, Critical/Unknown = Down */
481	if (!HasBeenChecked())
482		severity |= SeverityFlagPending;
483	else if (state == ServiceUnknown)
484		severity |= SeverityFlagCritical;
485	else if (state == ServiceCritical)
486		severity |= SeverityFlagCritical;
487
488	if (IsInDowntime())
489		severity |= SeverityFlagDowntime;
490	else if (IsAcknowledged())
491		severity |= SeverityFlagAcknowledgement;
492	else
493		severity |= SeverityFlagUnhandled;
494```
495
496
497Service:
498
499```cpp
500	if (!HasBeenChecked())
501		severity |= SeverityFlagPending;
502	else if (state == ServiceWarning)
503		severity |= SeverityFlagWarning;
504	else if (state == ServiceUnknown)
505		severity |= SeverityFlagUnknown;
506	else if (state == ServiceCritical)
507		severity |= SeverityFlagCritical;
508
509	if (IsInDowntime())
510		severity |= SeverityFlagDowntime;
511	else if (IsAcknowledged())
512		severity |= SeverityFlagAcknowledgement;
513	else if (m_Host->GetProblem())
514		severity |= SeverityFlagHostDown;
515	else
516		severity |= SeverityFlagUnhandled;
517```
518
519
520
521## Cluster <a id="technical-concepts-cluster"></a>
522
523This documentation refers to technical roles between cluster
524endpoints.
525
526- The `server` or `parent` role accepts incoming connection attempts and handles requests
527- The `client` role actively connects to remote endpoints receiving config/commands, requesting certificates, etc.
528
529A client role is not necessarily bound to the Icinga agent.
530It may also be a satellite which actively connects to the
531master.
532
533### Communication <a id="technical-concepts-cluster-communication"></a>
534
535Icinga 2 uses its own certificate authority (CA) by default. The
536public and private CA keys can be generated on the signing master.
537
538Each node certificate must be signed by the private CA key.
539
540Note: The following description uses `parent node` and `child node`.
541This also applies to nodes in the same cluster zone.
542
543During the connection attempt, a TLS handshake is performed.
544If the public certificate of a child node is not signed by the same
545CA, the child node is not trusted and the connection will be closed.
546
547If the TLS handshake succeeds, the parent node reads the
548certificate's common name (CN) of the child node and looks for
549a local Endpoint object name configuration.
550
551If there is no Endpoint object found, further communication
552(runtime and config sync, etc.) is terminated.
553
554The child node also checks the CN from the parent node's public
555certificate. If the child node does not find any local Endpoint
556object name configuration, it will not trust the parent node.
557
558Both checks prevent accepting cluster messages from an untrusted
559source endpoint.
560
561If an Endpoint match was found, there is one additional security
562mechanism in place: Endpoints belong to a Zone hierarchy.
563
564Several cluster messages can only be sent "top down", others like
565check results are allowed being sent from the child to the parent node.
566
567Once this check succeeds the cluster messages are exchanged and processed.
568
569
570### CSR Signing <a id="technical-concepts-cluster-csr-signing"></a>
571
572In order to make things easier, Icinga 2 provides built-in methods
573to allow child nodes to request a signed certificate from the
574signing master.
575
576Icinga 2 v2.8 introduces the possibility to request certificates
577from indirectly connected nodes. This is required for multi level
578cluster environments with masters, satellites and agents.
579
580CSR Signing in general starts with the master setup. This step
581ensures that the master is in a working CSR signing state with:
582
583* public and private CA key in `/var/lib/icinga2/ca`
584* private `TicketSalt` constant defined inside the `api` feature
585* Cluster communication is ready and Icinga 2 listens on port 5665
586
587The child node setup which is run with CLI commands will now
588attempt to connect to the parent node. This is not necessarily
589the signing master instance, but could also be a parent satellite node.
590
591During this process the child node asks the user to verify the
592parent node's public certificate to prevent MITM attacks.
593
594There are two methods to request signed certificates:
595
596* Add the ticket into the request. This ticket was generated on the master
597beforehand and contains hashed details for which client it has been created.
598The signing master uses this information to automatically sign the certificate
599request.
600
601* Do not add a ticket into the request. It will be sent to the signing master
602which stores the pending request. Manual user interaction with CLI commands
603is necessary to sign the request.
604
605The certificate request is sent as `pki::RequestCertificate` cluster
606message to the parent node.
607
608If the parent node is not the signing master, it stores the request
609in `/var/lib/icinga2/certificate-requests` and forwards the
610cluster message to its parent node.
611
612Once the message arrives on the signing master, it first verifies that
613the sent certificate request is valid. This is to prevent unwanted errors
614or modified requests from the "proxy" node.
615
616After verification, the signing master checks if the request contains
617a valid signing ticket. It hashes the certificate's common name and
618compares the value to the received ticket number.
619
620If the ticket is valid, the certificate request is immediately signed
621with CA key. The request is sent back to the client inside a `pki::UpdateCertificate`
622cluster message.
623
624If the child node was not the certificate request origin, it only updates
625the cached request for the child node and send another cluster message
626down to its child node (e.g. from a satellite to an agent).
627
628
629If no ticket was specified, the signing master waits until the
630`ca sign` CLI command manually signed the certificate.
631
632> **Note**
633>
634> Push notifications for manual request signing is not yet implemented (TODO).
635
636Once the child node reconnects it synchronizes all signed certificate requests.
637This takes some minutes and requires all nodes to reconnect to each other.
638
639
640#### CSR Signing: Clients without parent connection <a id="technical-concepts-cluster-csr-signing-clients-no-connection"></a>
641
642There is an additional scenario: The setup on a child node does
643not necessarily need a connection to the parent node.
644
645This mode leaves the node in a semi-configured state. You need
646to manually copy the master's public CA key into `/var/lib/icinga2/certs/ca.crt`
647on the client before starting Icinga 2.
648
649> **Note**
650>
651> The `client` in this case can be either a satellite or an agent.
652
653The parent node needs to actively connect to the child node.
654Once this connections succeeds, the child node will actively
655request a signed certificate.
656
657The update procedure works the same way as above.
658
659### High Availability <a id="technical-concepts-cluster-ha"></a>
660
661General high availability is automatically enabled between two endpoints in the same
662cluster zone.
663
664**This requires the same configuration and enabled features on both nodes.**
665
666HA zone members trust each other and share event updates as cluster messages.
667This includes for example check results, next check timestamp updates, acknowledgements
668or notifications.
669
670This ensures that both nodes are synchronized. If one node goes away, the
671remaining node takes over and continues as normal.
672
673#### High Availability: Object Authority <a id="technical-concepts-cluster-ha-object-authority"></a>
674
675Cluster nodes automatically determine the authority for configuration
676objects. By default, all config objects are set to `HARunEverywhere` and
677as such the object authority is true for any config object on any instance.
678
679Specific objects can override and influence this setting, e.g. with `HARunOnce`
680instead prior to config object activation.
681
682This is done when the daemon starts and in a regular interval inside
683the ApiListener class, specifically calling `ApiListener::UpdateObjectAuthority()`.
684
685The algorithm works like this:
686
687* Determine whether this instance is assigned to a local zone and endpoint.
688* Collects all endpoints in this zone if they are connected.
689* If there's two endpoints, but only us seeing ourselves and the application start is less than 60 seconds in the past, do nothing (wait for cluster reconnect to take place, grace period).
690* Sort the collected endpoints by name.
691* Iterate over all config types and their respective objects
692 * Ignore !active objects
693 * Ignore objects which are !HARunOnce. This means, they can run multiple times in a zone and don't need an authority update.
694 * If this instance doesn't have a local zone, set authority to true. This is for non-clustered standalone environments where everything belongs to this instance.
695 * Calculate the object authority based on the connected endpoint names.
696 * Set the authority (true or false)
697
698The object authority calculation works "offline" without any message exchange.
699Each instance alculates the SDBM hash of the config object name, puts that in contrast
700modulo the connected endpoints size.
701This index is used to lookup the corresponding endpoint in the connected endpoints array,
702including the local endpoint. Whether the local endpoint is equal to the selected endpoint,
703or not, this sets the authority to `true` or `false`.
704
705```cpp
706authority = endpoints[Utility::SDBM(object->GetName()) % endpoints.size()] == my_endpoint;
707```
708
709`ConfigObject::SetAuthority(bool authority)` triggers the following events:
710
711* Authority is true and object now paused: Resume the object and set `paused` to `false`.
712* Authority is false, object not paused: Pause the object and set `paused` to true.
713
714**This results in activated but paused objects on one endpoint.** You can verify
715that by querying the `paused` attribute for all objects via REST API
716or debug console on both endpoints.
717
718Endpoints inside a HA zone calculate the object authority independent from each other.
719This object authority is important for selected features explained below.
720
721Since features are configuration objects too, you must ensure that all nodes
722inside the HA zone share the same enabled features. If configured otherwise,
723one might have a checker feature on the left node, nothing on the right node.
724This leads to late check results because one half is not executed by the right
725node which holds half of the object authorities.
726
727By default, features are enabled to "Run-Everywhere". Specific features which
728support HA awareness, provide the `enable_ha` configuration attribute. When `enable_ha`
729is set to `true` (usually the default), "Run-Once" is set and the feature pauses on one side.
730
731```
732vim /etc/icinga2/features-enabled/graphite.conf
733
734object GraphiteWriter "graphite" {
735  ...
736  enable_ha = true
737}
738```
739
740Once such a feature is paused, there won't be any more event handling, e.g. the Elasticsearch
741feature won't process any checkresults nor write to the Elasticsearch REST API.
742
743When the cluster connection drops, the feature configuration object is updated with
744the new object authority by the ApiListener timer and resumes its operation. You can see
745that by grepping the log file for `resumed` and `paused`.
746
747```
748[2018-10-24 13:28:28 +0200] information/GraphiteWriter: 'g-ha' paused.
749```
750
751```
752[2018-10-24 13:28:28 +0200] information/GraphiteWriter: 'g-ha' resumed.
753```
754
755Specific features with HA capabilities are explained below.
756
757#### High Availability: Checker <a id="technical-concepts-cluster-ha-checker"></a>
758
759The `checker` feature only executes checks for `Checkable` objects (Host, Service)
760where it is authoritative.
761
762That way each node only executes checks for a segment of the overall configuration objects.
763
764The cluster message routing ensures that all check results are synchronized
765to nodes which are not authoritative for this configuration object.
766
767
768#### High Availability: Notifications <a id="technical-concepts-cluster-notifications"></a>
769
770The `notification` feature only sends notifications for `Notification` objects
771where it is authoritative.
772
773That way each node only executes notifications for a segment of all notification objects.
774
775Notified users and other event details are synchronized throughout the cluster.
776This is required if for example the DB IDO feature is active on the other node.
777
778#### High Availability: DB IDO <a id="technical-concepts-cluster-ha-ido"></a>
779
780If you don't have HA enabled for the IDO feature, both nodes will
781write their status and historical data to their own separate database
782backends.
783
784In order to avoid data separation and a split view (each node would require its
785own Icinga Web 2 installation on top), the high availability option was added
786to the DB IDO feature. This is enabled by default with the `enable_ha` setting.
787
788This requires a central database backend. Best practice is to use a MySQL cluster
789with a virtual IP.
790
791Both Icinga 2 nodes require the connection and credential details configured in
792their DB IDO feature.
793
794During startup Icinga 2 calculates whether the feature configuration object
795is authoritative on this node or not. The order is an alpha-numeric
796comparison, e.g. if you have `master1` and `master2`, Icinga 2 will enable
797the DB IDO feature on `master2` by default.
798
799If the connection between endpoints drops, the object authority is re-calculated.
800
801In order to prevent data duplication in a split-brain scenario where both
802nodes would write into the same database, there is another safety mechanism
803in place.
804
805The split-brain decision which node will write to the database is calculated
806from a quorum inside the `programstatus` table. Each node
807verifies whether the `endpoint_name` column is not itself on database connect.
808In addition to that the DB IDO feature compares the `last_update_time` column
809against the current timestamp plus the configured `failover_timeout` offset.
810
811That way only one active DB IDO feature writes to the database, even if they
812are not currently connected in a cluster zone. This prevents data duplication
813in historical tables.
814
815### Health Checks <a id="technical-concepts-cluster-health-checks"></a>
816
817#### cluster-zone <a id="technical-concepts-cluster-health-checks-cluster-zone"></a>
818
819This built-in check provides the possibility to check for connectivity between
820zones.
821
822If you for example need to know whether the `master` zone is connected and processing
823messages with the child zone called `satellite` in this example, you can configure
824the [cluster-zone](10-icinga-template-library.md#itl-icinga-cluster-zone) check as new service on all `master` zone hosts.
825
826```
827vim /etc/zones.d/master/host1.conf
828
829object Service "cluster-zone-satellite" {
830  check_command = "cluster-zone"
831  host_name = "host1"
832
833  vars.cluster_zone = "satellite"
834}
835```
836
837The check itself changes to NOT-OK if one or more child endpoints in the child zone
838are not connected to parent zone endpoints.
839
840In addition to the overall connectivity check, the log lag is calculated based
841on the to-be-sent replay log. Each instance stores that for its configured endpoint
842objects.
843
844This health check iterates over the target zone (`cluster_zone`) and their endpoints.
845
846The log lag is greater than zero if
847
848* the replay log synchronization is in progress and not yet finished or
849* the endpoint is not connected, and no replay log sync happened (obviously).
850
851The final log lag value is the worst value detected. If satellite1 has a log lag of
852`1.5` and satellite2 only has `0.5`, the computed value will be `1.5.`.
853
854You can control the check state by using optional warning and critical thresholds
855for the log lag value.
856
857If this service exists multiple times, e.g. for each master host object, the log lag
858may differ based on the execution time. This happens for example on restart of
859an instance when the log replay is in progress and a health check is executed at different
860times.
861If the endpoint is not connected, both master instances may have saved a different log replay
862position from the last synchronisation.
863
864The lag value is returned as performance metric key `slave_lag`.
865
866Icinga 2 v2.9+ adds more performance metrics for these values:
867
868* `last_messages_sent` and `last_messages_received` as UNIX timestamp
869* `sum_messages_sent_per_second` and `sum_messages_received_per_second`
870* `sum_bytes_sent_per_second` and `sum_bytes_received_per_second`
871
872
873### Config Sync <a id="technical-concepts-cluster-config-sync"></a>
874
875The visible feature for the user is to put configuration files in `/etc/icinga2/zones.d/<zonename>`
876and have them synced automatically to all involved zones and endpoints.
877
878This not only includes host and service objects being checked
879in a satellite zone, but also additional config objects such as
880commands, groups, timeperiods and also templates.
881
882Additional thoughts and complexity added:
883
884- Putting files into zone directory names removes the burden to set the `zone` attribute on each object in this directory. This is done automatically by the config compiler.
885- Inclusion of `zones.d` happens automatically, the user shouldn't be bothered about this.
886- Before the REST API was created, only static configuration files in `/etc/icinga2/zones.d` existed. With the addition of config packages, additional `zones.d` targets must be registered (e.g. used by the Director)
887- Only one config master is allowed. This one identifies itself with configuration files in `/etc/icinga2/zones.d`. This is not necessarily the zone master seen in the debug logs, that one is important for message routing internally.
888- Objects and templates which cannot be bound into a specific zone (e.g. hosts in the satellite zone) must be made available "globally".
889- Users must be able to deny the synchronisation of specific zones, e.g. for security reasons.
890
891#### Config Sync: Config Master <a id="technical-concepts-cluster-config-sync-config-master"></a>
892
893All zones must be configured and included in the `zones.conf` config file beforehand.
894The zone names are the identifier for the directories underneath the `/etc/icinga2/zones.d`
895directory. If a zone is not configured, it will not be included in the config sync - keep this
896in mind for troubleshooting.
897
898When the config master starts, the content of `/etc/icinga2/zones.d` is automatically
899included. There's no need for an additional entry in `icinga2.conf` like `conf.d`.
900You can verify this by running the config validation on debug level:
901
902```
903icinga2 daemon -C -x debug | grep 'zones.d'
904
905[2019-06-19 15:16:19 +0200] notice/ConfigCompiler: Compiling config file: /etc/icinga2/zones.d/global-templates/commands.conf
906```
907
908Once the config validation succeeds, the startup routine for the daemon
909copies the files into the "production" directory in `/var/lib/icinga2/api/zones`.
910This directory is used for all endpoints where Icinga stores the received configuration.
911With the exception of the config master retrieving this from `/etc/icinga2/zones.d` instead.
912
913These operations are logged for better visibility.
914
915```
916[2019-06-19 15:26:38 +0200] information/ApiListener: Copying 1 zone configuration files for zone 'global-templates' to '/var/lib/icinga2/api/zones/global-templates'.
917[2019-06-19 15:26:38 +0200] information/ApiListener: Updating configuration file: /var/lib/icinga2/api/zones/global-templates//_etc/commands.conf
918```
919
920The master is finished at this point. Depending on the cluster configuration,
921the next iteration is a connected endpoint after successful TLS handshake and certificate
922authentication.
923
924It calls `SendConfigUpdate(client)` which sends the [config::Update](19-technical-concepts.md#technical-concepts-json-rpc-messages-config-update)
925JSON-RPC message including all required zones and their configuration file content.
926
927
928#### Config Sync: Receive Config <a id="technical-concepts-cluster-config-sync-receive-config"></a>
929
930The secondary master endpoint and endpoints in a child zone will be connected to the config
931master. The endpoint receives the [config::Update](19-technical-concepts.md#technical-concepts-json-rpc-messages-config-update)
932JSON-RPC message and processes the content in `ConfigUpdateHandler()`. This method checks
933whether config should be accepted. In addition to that, it locks a local mutex to avoid race conditions
934with multiple syncs in parallel.
935
936After that, the received configuration content is analysed.
937
938> **Note**
939>
940> The cluster design allows that satellite endpoints may connect to the secondary master first.
941> There is no immediate need to always connect to the config master first, especially since
942> the satellite endpoints don't know that.
943>
944> The secondary master not only stores the master zone config files, but also all child zones.
945> This is also the case for any HA enabled zone with more than one endpoint.
946
947
9482.11 puts the received configuration files into a staging directory in
949`/var/lib/icinga2/api/zones-stage`. Previous versions directly wrote the
950files into production which could have led to broken configuration on the
951next manual restart.
952
953```
954[2019-06-19 16:08:29 +0200] information/ApiListener: New client connection for identity 'master1' to [127.0.0.1]:5665
955[2019-06-19 16:08:30 +0200] information/ApiListener: Applying config update from endpoint 'master1' of zone 'master'.
956[2019-06-19 16:08:30 +0200] information/ApiListener: Received configuration for zone 'agent' from endpoint 'master1'. Comparing the checksums.
957[2019-06-19 16:08:30 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/agent//_etc/host.conf' for zone 'agent'.
958[2019-06-19 16:08:30 +0200] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones-stage/agent' (176 Bytes).
959[2019-06-19 16:08:30 +0200] information/ApiListener: Received configuration for zone 'master' from endpoint 'master1'. Comparing the checksums.
960[2019-06-19 16:08:30 +0200] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones-stage/master' (17 Bytes).
961[2019-06-19 16:08:30 +0200] information/ApiListener: Received configuration from endpoint 'master1' is different to production, triggering validation and reload.
962```
963
964It then validates the received configuration in its own config stage. There is
965an parameter override in place which disables the automatic inclusion of the production
966config in `/var/lib/icinga2/api/zones`.
967
968Once completed, the reload is triggered. This follows the same configurable timeout
969as with the global reload.
970
971```
972[2019-06-19 16:52:26 +0200] information/ApiListener: Config validation for stage '/var/lib/icinga2/api/zones-stage/' was OK, replacing into '/var/lib/icinga2/api/zones/' and triggering reload.
973[2019-06-19 16:52:27 +0200] information/Application: Got reload command: Started new instance with PID '19945' (timeout is 300s).
974[2019-06-19 16:52:28 +0200] information/Application: Reload requested, letting new process take over.
975```
976
977Whenever the staged configuration validation fails, Icinga logs this including a reference
978to the startup log file which includes additional errors.
979
980```
981[2019-06-19 15:45:27 +0200] critical/ApiListener: Config validation failed for staged cluster config sync in '/var/lib/icinga2/api/zones-stage/'. Aborting. Logs: '/var/lib/icinga2/api/zones-stage//startup.log'
982```
983
984
985#### Config Sync: Changes and Reload <a id="technical-concepts-cluster-config-sync-changes-reload"></a>
986
987Whenever a new configuration is received, it is validated and upon success, the
988daemon automatically reloads. While the daemon continues with checks, the reload
989cannot hand over open TCP connections. That being said, reloading the daemon everytime
990a configuration is synchronized would lead into many not connected endpoints.
991
992Therefore the cluster config sync checks whether the configuration files actually
993changed, and will only trigger a reload when such a change happened.
994
9952.11 calculates a checksum from each file content and compares this to the
996production configuration. Previous versions used additional metadata with timestamps from
997files which sometimes led to problems with asynchronous dates.
998
999> **Note**
1000>
1001> For compatibility reasons, the timestamp metadata algorithm is still intact, e.g.
1002> when the client is 2.11 already, but the parent endpoint is still on 2.10.
1003
1004Icinga logs a warning when this happens.
1005
1006```
1007Received configuration update without checksums from parent endpoint satellite1. This behaviour is deprecated. Please upgrade the parent endpoint to 2.11+
1008```
1009
1010
1011The debug log provides more details on the actual checksums and checks. Future output
1012may change, use this solely for troubleshooting and debugging whenever the cluster
1013config sync fails.
1014
1015```
1016[2019-06-19 16:13:16 +0200] information/ApiListener: Received configuration for zone 'agent' from endpoint 'master1'. Comparing the checksums.
1017[2019-06-19 16:13:16 +0200] debug/ApiListener: Checking for config change between stage and production. Old (3): '{"/.checksums":"7ede1276a9a32019c1412a52779804a976e163943e268ec4066e6b6ec4d15d73","/.timestamp":"ec4354b0eca455f7c2ca386fddf5b9ea810d826d402b3b6ac56ba63b55c2892c","/_etc/host.conf":"35d4823684d83a5ab0ca853c9a3aa8e592adfca66210762cdf2e54339ccf0a44"}' vs. new (3): '{"/.checksums":"84a586435d732327e2152e7c9b6d85a340cc917b89ae30972042f3dc344ea7cf","/.timestamp":"0fd6facf35e49ab1b2a161872fa7ad794564eba08624373d99d31c32a7a4c7d3","/_etc/host.conf":"0d62075e89be14088de1979644b40f33a8f185fcb4bb6ff1f7da2f63c7723fcb"}'.
1018[2019-06-19 16:13:16 +0200] debug/ApiListener: Checking /_etc/host.conf for checksum: 35d4823684d83a5ab0ca853c9a3aa8e592adfca66210762cdf2e54339ccf0a44
1019[2019-06-19 16:13:16 +0200] debug/ApiListener: Path '/_etc/host.conf' doesn't match old checksum '0d62075e89be14088de1979644b40f33a8f185fcb4bb6ff1f7da2f63c7723fcb' with new checksum '35d4823684d83a5ab0ca853c9a3aa8e592adfca66210762cdf2e54339ccf0a44'.
1020```
1021
1022
1023#### Config Sync: Trust <a id="technical-concepts-cluster-config-sync-trust"></a>
1024
1025The config sync follows the "top down" approach, where the master endpoint in the master
1026zone is allowed to synchronize configuration to the child zone, e.g. the satellite zone.
1027
1028Endpoints in the same zone, e.g. a secondary master, receive configuration for the same
1029zone and all child zones.
1030
1031Endpoints in the satellite zone trust the parent zone, and will accept the pushed
1032configuration via JSON-RPC cluster messages. By default, this is disabled and must
1033be enabled with the `accept_config` attribute in the ApiListener feature (manually or with CLI
1034helpers).
1035
1036The satellite zone will not only accept zone configuration for its own zone, but also
1037all configured child zones. That is why it is important to configure the zone hierarchy
1038on the satellite as well.
1039
1040Child zones are not allowed to sync configuration up to the parent zone. Each Icinga instance
1041evaluates this in startup and knows on endpoint connect which config zones need to be synced.
1042
1043
1044Global zones have a special trust relationship: They are synced to all child zones, be it
1045a satellite zone or agent zone. Since checkable objects such as a Host or a Service object
1046must have only one endpoint as authority, they cannot be put into a global zone (denied by
1047the config compiler).
1048
1049Apply rules and templates are allowed, since they are evaluated in the endpoint which received
1050the synced configuration. Keep in mind that there may be differences on the master and the satellite
1051when e.g. hostgroup membership is used for assign where expressions, but the groups are only
1052available on the master.
1053
1054
1055### Cluster: Message Routing <a id="technical-concepts-cluster-message-routing"></a>
1056
1057One fundamental part of the cluster message routing is the MessageOrigin object.
1058This is created when a new JSON-RPC message is received in `JsonRpcConnection::MessageHandler()`.
1059
1060It contains
1061
1062- FromZone being extracted from the endpoint object which owns the JsonRpcConnection
1063- FromClient being the JsonRpcConnection bound to the endpoint object
1064
1065These attributes are checked in message receive api handlers for security access. E.g. whether a
1066message origin is from a child zone which is not allowed, etc.
1067This is explained in the [JSON-RPC messages](19-technical-concepts.md#technical-concepts-json-rpc-messages) chapter.
1068
1069Whenever such a message is processed on the client, it may trigger additional cluster events
1070which are sent back to other endpoints. Therefore it is key to always pass the MessageOrigin
1071`origin` when processing these messages locally.
1072
1073Example:
1074
1075- Client receives a CheckResult from another endpoint in the same zone, call it `sender` for now
1076- Calls ProcessCheckResult() to store the CR and calculcate states, notifications, etc.
1077- Calls the OnNewCheckResult() signal to trigger IDO updates
1078
1079OnNewCheckResult() also calls a registered cluster handler which forwards the CheckResult to other cluster members.
1080
1081Without any origin details, this CheckResult would be relayed to the `sender` endpoint again.
1082Which processes the message, ProcessCheckResult(), OnNewCheckResult(), sends back and so on.
1083
1084That creates a loop which our cluster protocol needs to prevent at all cost.
1085
1086RelayMessageOne() takes care of the routing. This involves fetching the targetZone for this message and its endpoints.
1087
1088- Don't relay messages to ourselves.
1089- Don't relay messages to disconnected endpoints.
1090- Don't relay the message to the zone through more than one endpoint unless this is our own zone.
1091- Don't relay messages back to the endpoint which we got the message from. **THIS**
1092- Don't relay messages back to the zone which we got the message from.
1093- Only relay message to the zone master if we're not currently the zone master.
1094
1095```
1096 e1 is zone master, e2 and e3 are zone members.
1097
1098 Message is sent from e2 or e3:
1099   !isMaster == true
1100   targetEndpoint e1 is zone master -> send the message
1101   targetEndpoint e3 is not zone master -> skip it, avoid routing loops
1102
1103 Message is sent from e1:
1104   !isMaster == false -> send the messages to e2 and e3 being the zone routing master.
1105```
1106
1107With passing the `origin` the following condition prevents sending a message back to sender:
1108
1109```cpp
1110if (origin && origin->FromClient && targetEndpoint == origin->FromClient->GetEndpoint()) {
1111```
1112
1113This message then simply gets skipped for this specific Endpoint and is never sent.
1114
1115This analysis originates from a long-lasting [downtime loop bug](https://github.com/Icinga/icinga2/issues/7198).
1116
1117## TLS Network IO <a id="technical-concepts-tls-network-io"></a>
1118
1119### TLS Connection Handling <a id="technical-concepts-tls-network-io-connection-handling"></a>
1120
1121Icinga supports two connection directions, controlled via the `host` attribute
1122inside the Endpoint objects:
1123
1124* Outgoing connection attempts
1125* Incoming connection handling
1126
1127Once the connection is established, higher layers can exchange JSON-RPC and
1128HTTP messages. It doesn't matter which direction these message go.
1129
1130This offers a big advantage over single direction connections, just like
1131polling via HTTP only. Also, connections are kept alive as long as data
1132is transmitted.
1133
1134When the master connects to the child zone member(s), this requires more
1135resources there. Keep this in mind when endpoints are not reachable, the
1136TCP timeout blocks other resources. Moving a satellite zone in the middle
1137between masters and agents helps to split the tasks - the master
1138processes and stores data, deploys configuration and serves the API. The
1139satellites schedule the checks, connect to the agents and receive
1140check results.
1141
1142Agents/Clients can also connect to the parent endpoints - be it a master or
1143a satellite. This is the preferred way out of a DMZ, and also reduces the
1144overhead with connecting to e.g. 2000 agents on the master. You can
1145benchmark this when TCP connections are broken and timeouts are encountered.
1146
1147#### Master Processes Incoming Connection <a id="technical-concepts-tls-network-io-connection-handling-incoming"></a>
1148
1149* The node starts a new ApiListener, this invokes `AddListener()`
1150    * Setup TLS Context (SslContext)
1151    * Initialize global I/O engine and create a TCP acceptor
1152    * Resolve bind host/port (optional)
1153    * Listen on IPv4 and IPv6
1154    * Re-use socket address and port
1155    * Listen on port 5665 with `INT_MAX` possible sockets
1156* Spawn a new Coroutine which listens for new incoming connections as 'TCP server' pattern
1157    * Accept new connections asynchronously
1158    * Spawn a new Coroutine which handles the new client connection in a different context, Role: Server
1159
1160#### Master Connects Outgoing <a id="technical-concepts-tls-network-io-connection-handling-outgoing"></a>
1161
1162* The node starts a timer in a 10 seconds interval with `ApiReconnectTimerHandler()` as callback
1163    * Loop over all configured zones, exclude global zones and not direct parent/child zones
1164    * Get the endpoints configured in the zones, exclude: local endpoint, no 'host' attribute, already connected or in progress
1165    * Call `AddConnection()`
1166* Spawn a new Coroutine after making the TLS context
1167    * Use the global I/O engine for socket I/O
1168    * Create TLS stream
1169    * Connect to endpoint host/port details
1170    * Handle the client connection, Role: Client
1171
1172#### TLS Handshake <a id="technical-concepts-tls-network-io-connection-handling-handshake"></a>
1173
1174* Create a TLS connection in sslConn and perform an asynchronous TLS handshake
1175* Get the peer certificate
1176* Verify the presented certificate: `ssl::verify_peer` and `ssl::verify_client_once`
1177* Get the certificate CN and compare it against the endpoint name - if not matching, return and close the connection
1178
1179#### Data Exchange <a id="technical-concepts-tls-network-io-connection-data-exchange"></a>
1180
1181Everything runs through TLS, we don't use any "raw" connections nor plain message handling.
1182
1183HTTP and JSON-RPC messages share the same port and API, so additional handling is required.
1184
1185On a new connection and successful TLS handshake, the first byte is read. This either
1186is a JSON-RPC message in Netstring format starting with a number, or plain HTTP.
1187
1188```
1189HTTP/1.1
1190
11912:{}
1192```
1193
1194Depending on this, `ClientJsonRpc` or `ClientHttp` are assigned.
1195
1196JSON-RPC:
1197
1198* Create a new JsonRpcConnection object
1199    * When the endpoint object is configured, spawn a Coroutine which takes care of syncing the client (file and runtime config, replay log, etc.)
1200    * No endpoint treats this connection as anonymous client, with a configurable limit. This client may send a CSR signing request for example.
1201    * Start the JsonRpcConnection - this spawns Coroutines to HandleIncomingMessages, WriteOutgoingMessages, HandleAndWriteHeartbeats and CheckLiveness
1202
1203HTTP:
1204
1205* Create a new HttpServerConnection
1206     * Start the HttpServerConnection - this spawns Coroutines to ProcessMessages and CheckLiveness
1207
1208
1209All the mentioned Coroutines run asynchronously using the global I/O engine's context.
1210More details on this topic can be found in [this blogpost](https://www.netways.de/blog/2019/04/04/modern-c-programming-coroutines-with-boost/).
1211
1212The lower levels of context switching and sharing or event polling are
1213hidden in Boost ASIO, Beast, Coroutine and Context libraries.
1214
1215#### Data Exchange: Coroutines and I/O Engine <a id="technical-concepts-tls-network-io-connection-data-exchange-coroutines"></a>
1216
1217Light-weight and fast operations such as connection handling or TLS handshakes
1218are performed in the default `IoBoundWorkSlot` pool inside the I/O engine.
1219
1220The I/O engine has another pool available: `CpuBoundWork`.
1221
1222This is used for processing CPU intensive tasks, such as handling a HTTP request.
1223Depending on the available CPU cores, this is limited to `std::thread::hardware_concurrency() * 3u / 2u`.
1224
1225```
12261 core * 3 / 2 = 1
12272 cores * 3 / 2 = 3
12288 cores * 3 / 2 = 12
122916 cores * 3 / 2 = 24
1230```
1231
1232The I/O engine itself is used with all network I/O in Icinga, not only the cluster
1233and the REST API. Features such as Graphite, InfluxDB, etc. also consume its functionality.
1234
1235There are 2 * CPU cores threads available which run the event loop
1236in the I/O engine. This polls the I/O service with `m_IoService.run();`
1237and triggers an asynchronous event progress for waiting coroutines.
1238
1239<!--
1240## REST API <a id="technical-concepts-rest-api"></a>
1241
1242Icinga 2 provides its own HTTP server which shares the port 5665 with
1243the JSON-RPC cluster protocol.
1244-->
1245
1246## JSON-RPC Message API <a id="technical-concepts-json-rpc-messages"></a>
1247
1248**The JSON-RPC message API is not a public API for end users.** In case you want
1249to interact with Icinga, use the [REST API](12-icinga2-api.md#icinga2-api).
1250
1251This section describes the internal cluster messages exchanged between endpoints.
1252
1253> **Tip**
1254>
1255> Debug builds with `icinga2 daemon -DInternal.DebugJsonRpc=1` unveils the JSON-RPC messages.
1256
1257### Registered Handler Functions
1258
1259Functions by example:
1260
1261Event Sender: `Checkable::OnNewCheckResult`
1262
1263```
1264On<xyz>.connect(&xyzHandler)
1265```
1266
1267Event Receiver (Client): `CheckResultAPIHandler` in `REGISTER_APIFUNCTION`
1268
1269```
1270<xyz>APIHandler()
1271```
1272
1273### Messages
1274
1275#### icinga::Hello <a id="technical-concepts-json-rpc-messages-icinga-hello"></a>
1276
1277> Location: `apilistener.cpp`
1278
1279##### Message Body
1280
1281Key       | Value
1282----------|---------
1283jsonrpc   | 2.0
1284method    | icinga::Hello
1285params    | Dictionary
1286
1287##### Params
1288
1289Key                  | Type        | Description
1290---------------------|-------------|------------------
1291capabilities         | Number      | Bitmask, see `lib/remote/apilistener.hpp`.
1292version              | Number      | Icinga 2 version, e.g. 21300 for v2.13.0.
1293
1294##### Functions
1295
1296Event Sender: When a new client connects in `NewClientHandlerInternal()`.
1297Event Receiver: `HelloAPIHandler`
1298
1299##### Permissions
1300
1301None, this is a required message.
1302
1303#### event::Heartbeat <a id="technical-concepts-json-rpc-messages-event-heartbeat"></a>
1304
1305> Location: `jsonrpcconnection-heartbeat.cpp`
1306
1307##### Message Body
1308
1309Key       | Value
1310----------|---------
1311jsonrpc   | 2.0
1312method    | event::Heartbeat
1313params    | Dictionary
1314
1315##### Params
1316
1317Key       | Type          | Description
1318----------|---------------|------------------
1319timeout   | Number        | Heartbeat timeout, sender sets 120s.
1320
1321
1322##### Functions
1323
1324Event Sender: `JsonRpcConnection::HeartbeatTimerHandler`
1325Event Receiver: `HeartbeatAPIHandler`
1326
1327Both sender and receiver exchange this heartbeat message. If the sender detects
1328that a client endpoint hasn't sent anything in the updated timeout span, it disconnects
1329the client. This is to avoid stale connections with no message processing.
1330
1331##### Permissions
1332
1333None, this is a required message.
1334
1335#### event::CheckResult <a id="technical-concepts-json-rpc-messages-event-checkresult"></a>
1336
1337> Location: `clusterevents.cpp`
1338
1339##### Message Body
1340
1341Key       | Value
1342----------|---------
1343jsonrpc   | 2.0
1344method    | event::CheckResult
1345params    | Dictionary
1346
1347##### Params
1348
1349Key       | Type          | Description
1350----------|---------------|------------------
1351host      | String        | Host name
1352service   | String        | Service name
1353cr        | Serialized CR | Check result
1354
1355##### Functions
1356
1357Event Sender: `Checkable::OnNewCheckResult`
1358Event Receiver: `CheckResultAPIHandler`
1359
1360##### Permissions
1361
1362The receiver will not process messages from not configured endpoints.
1363
1364Message updates will be dropped when:
1365
1366* Hosts/services do not exist
1367* Origin is a remote command endpoint different to the configured, and whose zone is not allowed to access this checkable.
1368
1369#### event::SetNextCheck <a id="technical-concepts-json-rpc-messages-event-setnextcheck"></a>
1370
1371> Location: `clusterevents.cpp`
1372
1373##### Message Body
1374
1375Key       | Value
1376----------|---------
1377jsonrpc   | 2.0
1378method    | event::SetNextCheck
1379params    | Dictionary
1380
1381##### Params
1382
1383Key         | Type          | Description
1384------------|---------------|------------------
1385host        | String        | Host name
1386service     | String        | Service name
1387next\_check | Timestamp     | Next scheduled time as UNIX timestamp.
1388
1389##### Functions
1390
1391Event Sender: `Checkable::OnNextCheckChanged`
1392Event Receiver: `NextCheckChangedAPIHandler`
1393
1394##### Permissions
1395
1396The receiver will not process messages from not configured endpoints.
1397
1398Message updates will be dropped when:
1399
1400* Checkable does not exist.
1401* Origin endpoint's zone is not allowed to access this checkable.
1402
1403#### event::SetLastCheckStarted <a id="technical-concepts-json-rpc-messages-event-setlastcheckstarted"></a>
1404
1405> Location: `clusterevents.cpp`
1406
1407##### Message Body
1408
1409Key       | Value
1410----------|---------
1411jsonrpc   | 2.0
1412method    | event::SetLastCheckStarted
1413params    | Dictionary
1414
1415##### Params
1416
1417Key                  | Type      | Description
1418---------------------|-----------|------------------
1419host                 | String    | Host name
1420service              | String    | Service name
1421last\_check\_started | Timestamp | Last check's start time as UNIX timestamp.
1422
1423##### Functions
1424
1425Event Sender: `Checkable::OnLastCheckStartedChanged`
1426Event Receiver: `LastCheckStartedChangedAPIHandler`
1427
1428##### Permissions
1429
1430The receiver will not process messages from not configured endpoints.
1431
1432Message updates will be dropped when:
1433
1434* Checkable does not exist.
1435* Origin endpoint's zone is not allowed to access this checkable.
1436
1437#### event::SetSuppressedNotifications <a id="technical-concepts-json-rpc-messages-event-setsupressednotifications"></a>
1438
1439> Location: `clusterevents.cpp`
1440
1441##### Message Body
1442
1443Key       | Value
1444----------|---------
1445jsonrpc   | 2.0
1446method    | event::SetSuppressedNotifications
1447params    | Dictionary
1448
1449##### Params
1450
1451Key         		 | Type          | Description
1452-------------------------|---------------|------------------
1453host        		 | String        | Host name
1454service     		 | String        | Service name
1455supressed\_notifications | Number 	 | Bitmask for suppressed notifications.
1456
1457##### Functions
1458
1459Event Sender: `Checkable::OnSuppressedNotificationsChanged`
1460Event Receiver: `SuppressedNotificationsChangedAPIHandler`
1461
1462Used to sync the notification state of a host or service object within the same HA zone.
1463
1464##### Permissions
1465
1466The receiver will not process messages from not configured endpoints.
1467
1468Message updates will be dropped when:
1469
1470* Checkable does not exist.
1471* Origin endpoint is not within the local zone.
1472
1473#### event::SetSuppressedNotificationTypes <a id="technical-concepts-json-rpc-messages-event-setsuppressednotificationtypes"></a>
1474
1475> Location: `clusterevents.cpp`
1476
1477##### Message Body
1478
1479Key       | Value
1480----------|---------
1481jsonrpc   | 2.0
1482method    | event::SetSuppressedNotificationTypes
1483params    | Dictionary
1484
1485##### Params
1486
1487Key         		 | Type   | Description
1488-------------------------|--------|------------------
1489notification             | String | Notification name
1490supressed\_notifications | Number | Bitmask for suppressed notifications.
1491
1492Used to sync the state of a notification object within the same HA zone.
1493
1494##### Functions
1495
1496Event Sender: `Notification::OnSuppressedNotificationsChanged`
1497Event Receiver: `SuppressedNotificationTypesChangedAPIHandler`
1498
1499##### Permissions
1500
1501The receiver will not process messages from not configured endpoints.
1502
1503Message updates will be dropped when:
1504
1505* Notification does not exist.
1506* Origin endpoint is not within the local zone.
1507
1508
1509#### event::SetNextNotification <a id="technical-concepts-json-rpc-messages-event-setnextnotification"></a>
1510
1511> Location: `clusterevents.cpp`
1512
1513##### Message Body
1514
1515Key       | Value
1516----------|---------
1517jsonrpc   | 2.0
1518method    | event::SetNextNotification
1519params    | Dictionary
1520
1521##### Params
1522
1523Key                | Type          | Description
1524-------------------|---------------|------------------
1525host               | String        | Host name
1526service            | String        | Service name
1527notification       | String        | Notification name
1528next\_notification | Timestamp     | Next scheduled notification time as UNIX timestamp.
1529
1530##### Functions
1531
1532Event Sender: `Notification::OnNextNotificationChanged`
1533Event Receiver: `NextNotificationChangedAPIHandler`
1534
1535##### Permissions
1536
1537The receiver will not process messages from not configured endpoints.
1538
1539Message updates will be dropped when:
1540
1541* Notification does not exist.
1542* Origin endpoint's zone is not allowed to access this checkable.
1543
1544#### event::SetForceNextCheck <a id="technical-concepts-json-rpc-messages-event-setforcenextcheck"></a>
1545
1546> Location: `clusterevents.cpp`
1547
1548##### Message Body
1549
1550Key       | Value
1551----------|---------
1552jsonrpc   | 2.0
1553method    | event::SetForceNextCheck
1554params    | Dictionary
1555
1556##### Params
1557
1558Key       | Type          | Description
1559----------|---------------|------------------
1560host      | String        | Host name
1561service   | String        | Service name
1562forced    | Boolean       | Forced next check (execute now)
1563
1564##### Functions
1565
1566Event Sender: `Checkable::OnForceNextCheckChanged`
1567Event Receiver: `ForceNextCheckChangedAPIHandler`
1568
1569##### Permissions
1570
1571The receiver will not process messages from not configured endpoints.
1572
1573Message updates will be dropped when:
1574
1575* Checkable does not exist.
1576* Origin endpoint's zone is not allowed to access this checkable.
1577
1578#### event::SetForceNextNotification <a id="technical-concepts-json-rpc-messages-event-setforcenextnotification"></a>
1579
1580> Location: `clusterevents.cpp`
1581
1582##### Message Body
1583
1584Key       | Value
1585----------|---------
1586jsonrpc   | 2.0
1587method    | event::SetForceNextNotification
1588params    | Dictionary
1589
1590##### Params
1591
1592Key       | Type          | Description
1593----------|---------------|------------------
1594host      | String        | Host name
1595service   | String        | Service name
1596forced    | Boolean       | Forced next check (execute now)
1597
1598##### Functions
1599
1600Event Sender: `Checkable::SetForceNextNotification`
1601Event Receiver: `ForceNextNotificationChangedAPIHandler`
1602
1603##### Permissions
1604
1605The receiver will not process messages from not configured endpoints.
1606
1607Message updates will be dropped when:
1608
1609* Checkable does not exist.
1610* Origin endpoint's zone is not allowed to access this checkable.
1611
1612#### event::SetAcknowledgement <a id="technical-concepts-json-rpc-messages-event-setacknowledgement"></a>
1613
1614> Location: `clusterevents.cpp`
1615
1616##### Message Body
1617
1618Key       | Value
1619----------|---------
1620jsonrpc   | 2.0
1621method    | event::SetAcknowledgement
1622params    | Dictionary
1623
1624##### Params
1625
1626Key        | Type          | Description
1627-----------|---------------|------------------
1628host       | String        | Host name
1629service    | String        | Service name
1630author     | String        | Acknowledgement author name.
1631comment    | String        | Acknowledgement comment content.
1632acktype    | Number        | Acknowledgement type (0=None, 1=Normal, 2=Sticky)
1633notify     | Boolean       | Notification should be sent.
1634persistent | Boolean       | Whether the comment is persistent.
1635expiry     | Timestamp     | Optional expire time as UNIX timestamp.
1636
1637##### Functions
1638
1639Event Sender: `Checkable::OnForceNextCheckChanged`
1640Event Receiver: `ForceNextCheckChangedAPIHandler`
1641
1642##### Permissions
1643
1644The receiver will not process messages from not configured endpoints.
1645
1646Message updates will be dropped when:
1647
1648* Checkable does not exist.
1649* Origin endpoint's zone is not allowed to access this checkable.
1650
1651#### event::ClearAcknowledgement <a id="technical-concepts-json-rpc-messages-event-clearacknowledgement"></a>
1652
1653> Location: `clusterevents.cpp`
1654
1655##### Message Body
1656
1657Key       | Value
1658----------|---------
1659jsonrpc   | 2.0
1660method    | event::ClearAcknowledgement
1661params    | Dictionary
1662
1663##### Params
1664
1665Key       | Type          | Description
1666----------|---------------|------------------
1667host      | String        | Host name
1668service   | String        | Service name
1669
1670##### Functions
1671
1672Event Sender: `Checkable::OnAcknowledgementCleared`
1673Event Receiver: `AcknowledgementClearedAPIHandler`
1674
1675##### Permissions
1676
1677The receiver will not process messages from not configured endpoints.
1678
1679Message updates will be dropped when:
1680
1681* Checkable does not exist.
1682* Origin endpoint's zone is not allowed to access this checkable.
1683
1684#### event::SendNotifications <a id="technical-concepts-json-rpc-messages-event-sendnotifications"></a>
1685
1686> Location: `clusterevents.cpp`
1687
1688##### Message Body
1689
1690Key       | Value
1691----------|---------
1692jsonrpc   | 2.0
1693method    | event::SendNotifications
1694params    | Dictionary
1695
1696##### Params
1697
1698Key       | Type          | Description
1699----------|---------------|------------------
1700host      | String        | Host name
1701service   | String        | Service name
1702cr        | Serialized CR | Check result
1703type      | Number        | enum NotificationType, same as `types` for notification objects.
1704author    | String        | Author name
1705text      | String        | Notification text
1706
1707##### Functions
1708
1709Event Sender: `Checkable::OnNotificationsRequested`
1710Event Receiver: `SendNotificationsAPIHandler`
1711
1712Signals that notifications have to be sent within the same HA zone. This is relevant if the checkable and its
1713notifications are active on different endpoints.
1714
1715##### Permissions
1716
1717The receiver will not process messages from not configured endpoints.
1718
1719Message updates will be dropped when:
1720
1721* Checkable does not exist.
1722* Origin endpoint is not within the local zone.
1723
1724#### event::NotificationSentUser <a id="technical-concepts-json-rpc-messages-event-notificationsentuser"></a>
1725
1726> Location: `clusterevents.cpp`
1727
1728##### Message Body
1729
1730Key       | Value
1731----------|---------
1732jsonrpc   | 2.0
1733method    | event::NotificationSentUser
1734params    | Dictionary
1735
1736##### Params
1737
1738Key           | Type            | Description
1739--------------|-----------------|------------------
1740host          | String          | Host name
1741service       | String          | Service name
1742notification  | String          | Notification name.
1743user          | String          | Notified user name.
1744type          | Number          | enum NotificationType, same as `types` in Notification objects.
1745cr            | Serialized CR   | Check result.
1746author        | String          | Notification author (for specific types)
1747text          | String          | Notification text (for specific types)
1748command       | String          | Notification command name.
1749
1750##### Functions
1751
1752Event Sender: `Checkable::OnNotificationSentToUser`
1753Event Receiver: `NotificationSentUserAPIHandler`
1754
1755##### Permissions
1756
1757The receiver will not process messages from not configured endpoints.
1758
1759Message updates will be dropped when:
1760
1761* Checkable does not exist.
1762* Origin endpoint's zone the same as the receiver. This binds notification messages to the HA zone.
1763
1764#### event::NotificationSentToAllUsers <a id="technical-concepts-json-rpc-messages-event-notificationsenttoallusers"></a>
1765
1766> Location: `clusterevents.cpp`
1767
1768##### Message Body
1769
1770Key       | Value
1771----------|---------
1772jsonrpc   | 2.0
1773method    | event::NotificationSentToAllUsers
1774params    | Dictionary
1775
1776##### Params
1777
1778Key                         | Type            | Description
1779----------------------------|-----------------|------------------
1780host                        | String          | Host name
1781service                     | String          | Service name
1782notification                | String          | Notification name.
1783users                       | Array of String | Notified user names.
1784type                        | Number          | enum NotificationType, same as `types` in Notification objects.
1785cr                          | Serialized CR   | Check result.
1786author                      | String          | Notification author (for specific types)
1787text                        | String          | Notification text (for specific types)
1788last\_notification          | Timestamp       | Last notification time as UNIX timestamp.
1789next\_notification          | Timestamp       | Next scheduled notification time as UNIX timestamp.
1790notification\_number        | Number          | Current notification number in problem state.
1791last\_problem\_notification | Timestamp       | Last problem notification time as UNIX timestamp.
1792no\_more\_notifications     | Boolean         | Whether to send future notifications when this notification becomes active on this HA node.
1793
1794##### Functions
1795
1796Event Sender: `Checkable::OnNotificationSentToAllUsers`
1797Event Receiver: `NotificationSentToAllUsersAPIHandler`
1798
1799##### Permissions
1800
1801The receiver will not process messages from not configured endpoints.
1802
1803Message updates will be dropped when:
1804
1805* Checkable does not exist.
1806* Origin endpoint's zone the same as the receiver. This binds notification messages to the HA zone.
1807
1808#### event::ExecuteCommand <a id="technical-concepts-json-rpc-messages-event-executecommand"></a>
1809
1810> Location: `clusterevents-check.cpp` and `checkable-check.cpp`
1811
1812##### Message Body
1813
1814Key       | Value
1815----------|---------
1816jsonrpc   | 2.0
1817method    | event::ExecuteCommand
1818params    | Dictionary
1819
1820##### Params
1821
1822Key            | Type          | Description
1823---------------|---------------|------------------
1824host           | String        | Host name.
1825service        | String        | Service name.
1826command\_type  | String        | `check_command` or `event_command`.
1827command        | String        | CheckCommand or EventCommand name.
1828check\_timeout | Number        | Check timeout of the checkable object, if specified as `check_timeout` attribute.
1829macros         | Dictionary    | Command arguments as key/value pairs for remote execution.
1830endpoint       | String        | The endpoint to execute the command on.
1831deadline       | Number        | A Unix timestamp indicating the execution deadline
1832source         | String        | The execution UUID
1833
1834
1835##### Functions
1836
1837**Event Sender:** This gets constructed directly in `Checkable::ExecuteCheck()`, `Checkable::ExecuteEventHandler()` or `ApiActions::ExecuteCommand()` when a remote command endpoint is configured.
1838
1839* `Get{CheckCommand,EventCommand}()->Execute()` simulates an execution and extracts all command arguments into the `macro` dictionary (inside lib/methods tasks).
1840* When the endpoint is connected, the message is constructed and sent directly.
1841* When the endpoint is not connected and not syncing replay logs and 5m after application start, generate an UNKNOWN check result for the user ("not connected").
1842
1843**Event Receiver:** `ExecuteCommandAPIHandler`
1844
1845Special handling, calls `ClusterEvents::EnqueueCheck()` for command endpoint checks.
1846This function enqueues check tasks into a queue which is controlled in `RemoteCheckThreadProc()`.
1847If the `endpoint` parameter is specified and is not equal to the local endpoint then the message is forwarded to the correct endpoint zone.
1848
1849##### Permissions
1850
1851The receiver will not process messages from not configured endpoints.
1852
1853Message updates will be dropped when:
1854
1855* Origin endpoint's zone is not a parent zone of the receiver endpoint.
1856* `accept_commands = false` in the `api` feature configuration sends back an UNKNOWN check result to the sender.
1857
1858The receiver constructs a virtual host object and looks for the local CheckCommand object.
1859
1860Returns UNKNOWN as check result to the sender
1861
1862* when the CheckCommand object does not exist.
1863* when there was an exception triggered from check execution, e.g. the plugin binary could not be executed or similar.
1864
1865The returned messages are synced directly to the sender's endpoint, no cluster broadcast.
1866
1867> **Note**: EventCommand errors are just logged on the remote endpoint.
1868
1869### event::UpdateExecutions <a id="technical-concepts-json-rpc-messages-event-updateexecutions"></a>
1870
1871> Location: `clusterevents.cpp`
1872
1873##### Message Body
1874
1875Key       | Value
1876----------|---------
1877jsonrpc   | 2.0
1878method    | event::UpdateExecutions
1879params    | Dictionary
1880
1881##### Params
1882
1883Key            | Type          | Description
1884---------------|---------------|------------------
1885host           | String        | Host name.
1886service        | String        | Service name.
1887executions     | Dictionary    | Executions to be updated
1888
1889##### Functions
1890
1891**Event Sender:** `ClusterEvents::ExecutedCommandAPIHandler`, `ClusterEvents::UpdateExecutionsAPIHandler`, `ApiActions::ExecuteCommand`
1892**Event Receiver:** `ClusterEvents::UpdateExecutionsAPIHandler`
1893
1894##### Permissions
1895
1896The receiver will not process messages from not configured endpoints.
1897
1898Message updates will be dropped when:
1899
1900* Checkable does not exist.
1901* Origin endpoint's zone is not allowed to access this checkable.
1902
1903### event::ExecutedCommand <a id="technical-concepts-json-rpc-messages-event-executedcommand"></a>
1904
1905> Location: `clusterevents.cpp`
1906
1907##### Message Body
1908
1909Key       | Value
1910----------|---------
1911jsonrpc   | 2.0
1912method    | event::ExecutedCommand
1913params    | Dictionary
1914
1915##### Params
1916
1917Key            | Type          | Description
1918---------------|---------------|------------------
1919host           | String        | Host name.
1920service        | String        | Service name.
1921execution      | String        | The execution ID executed.
1922exitStatus     | Number        | The command exit status.
1923output         | String        | The command output.
1924start          | Number        | The unix timestamp at the start of the command execution
1925end            | Number        | The unix timestamp at the end of the command execution
1926
1927##### Functions
1928
1929**Event Sender:** `ClusterEvents::ExecuteCheckFromQueue`, `ClusterEvents::ExecuteCommandAPIHandler`
1930**Event Receiver:** `ClusterEvents::ExecutedCommandAPIHandler`
1931
1932##### Permissions
1933
1934The receiver will not process messages from not configured endpoints.
1935
1936Message updates will be dropped when:
1937
1938* Checkable does not exist.
1939* Origin endpoint's zone is not allowed to access this checkable.
1940
1941#### config::Update <a id="technical-concepts-json-rpc-messages-config-update"></a>
1942
1943> Location: `apilistener-filesync.cpp`
1944
1945##### Message Body
1946
1947Key       | Value
1948----------|---------
1949jsonrpc   | 2.0
1950method    | config::Update
1951params    | Dictionary
1952
1953##### Params
1954
1955Key        | Type          | Description
1956-----------|---------------|------------------
1957update     | Dictionary    | Config file paths and their content.
1958update\_v2 | Dictionary    | Additional meta config files introduced in 2.4+ for compatibility reasons.
1959
1960##### Functions
1961
1962**Event Sender:** `SendConfigUpdate()` called in `ApiListener::SyncClient()` when a new client endpoint connects.
1963**Event Receiver:** `ConfigUpdateHandler` reads the config update content and stores them in `/var/lib/icinga2/api`.
1964When it detects a configuration change, the function requests and application restart.
1965
1966##### Permissions
1967
1968The receiver will not process messages from not configured endpoints.
1969
1970Message updates will be dropped when:
1971
1972* The origin sender is not in a parent zone of the receiver.
1973* `api` feature does not accept config.
1974
1975Config updates will be ignored when:
1976
1977* The zone is not configured on the receiver endpoint.
1978* The zone is authoritative on this instance (this only happens on a master which has `/etc/icinga2/zones.d` populated, and prevents sync loops)
1979
1980#### config::UpdateObject <a id="technical-concepts-json-rpc-messages-config-updateobject"></a>
1981
1982> Location: `apilistener-configsync.cpp`
1983
1984##### Message Body
1985
1986Key       | Value
1987----------|---------
1988jsonrpc   | 2.0
1989method    | config::UpdateObject
1990params    | Dictionary
1991
1992##### Params
1993
1994Key                  | Type        | Description
1995---------------------|-------------|------------------
1996name                 | String      | Object name.
1997type                 | String      | Object type name.
1998version              | Number      | Object version.
1999config               | String      | Config file content for `_api` packages.
2000modified\_attributes | Dictionary  | Modified attributes at runtime as key value pairs.
2001original\_attributes | Array       | Original attributes as array of keys.
2002
2003
2004##### Functions
2005
2006**Event Sender:** Either on client connect (full sync), or runtime created/updated object
2007
2008`ApiListener::SendRuntimeConfigObjects()` gets called when a new endpoint is connected
2009and runtime created config objects need to be synced. This invokes a call to `UpdateConfigObject()`
2010to only sync this JsonRpcConnection client.
2011
2012`ConfigObject::OnActiveChanged` (created or deleted) or `ConfigObject::OnVersionChanged` (updated)
2013also call `UpdateConfigObject()`.
2014
2015**Event Receiver:** `ConfigUpdateObjectAPIHandler` calls `ConfigObjectUtility::CreateObject()` in order
2016to create the object if it is not already existing. Afterwards, all modified attributes are applied
2017and in case, original attributes are restored. The object version is set as well, keeping it in sync
2018with the sender.
2019
2020##### Permissions
2021
2022###### Sender
2023
2024Client receiver connects:
2025
2026The sender only syncs config object updates to a client which can access
2027the config object, in `ApiListener::SendRuntimeConfigObjects()`.
2028
2029In addition to that, the client endpoint's zone is checked whether this zone may access
2030the config object.
2031
2032Runtime updated object:
2033
2034Only if the config object belongs to the `_api` package.
2035
2036
2037###### Receiver
2038
2039The receiver will not process messages from not configured endpoints.
2040
2041Message updates will be dropped when:
2042
2043* Origin sender endpoint's zone is in a child zone.
2044* `api` feature does not accept config
2045* The received config object type does not exist (this is to prevent failures with older nodes and new object types).
2046
2047Error handling:
2048
2049* Log an error if `CreateObject` fails (only if the object does not already exist)
2050* Local object version is newer than the received version, object will not be updated.
2051* Compare modified and original attributes and restore any type of change here.
2052
2053
2054#### config::DeleteObject <a id="technical-concepts-json-rpc-messages-config-deleteobject"></a>
2055
2056> Location: `apilistener-configsync.cpp`
2057
2058##### Message Body
2059
2060Key       | Value
2061----------|---------
2062jsonrpc   | 2.0
2063method    | config::DeleteObject
2064params    | Dictionary
2065
2066##### Params
2067
2068Key                 | Type        | Description
2069--------------------|-------------|------------------
2070name                | String      | Object name.
2071type                | String      | Object type name.
2072version             | Number      | Object version.
2073
2074##### Functions
2075
2076**Event Sender:**
2077
2078`ConfigObject::OnActiveChanged` (created or deleted) or `ConfigObject::OnVersionChanged` (updated)
2079call `DeleteConfigObject()`.
2080
2081**Event Receiver:** `ConfigDeleteObjectAPIHandler`
2082
2083##### Permissions
2084
2085###### Sender
2086
2087Runtime deleted object:
2088
2089Only if the config object belongs to the `_api` package.
2090
2091###### Receiver
2092
2093The receiver will not process messages from not configured endpoints.
2094
2095Message updates will be dropped when:
2096
2097* Origin sender endpoint's zone is in a child zone.
2098* `api` feature does not accept config
2099* The received config object type does not exist (this is to prevent failures with older nodes and new object types).
2100* The object in question was not created at runtime, it does not belong to the `_api` package.
2101
2102Error handling:
2103
2104* Log an error if `DeleteObject` fails (only if the object does not already exist)
2105
2106#### pki::RequestCertificate <a id="technical-concepts-json-rpc-messages-pki-requestcertificate"></a>
2107
2108> Location: `jsonrpcconnection-pki.cpp`
2109
2110##### Message Body
2111
2112Key       | Value
2113----------|---------
2114jsonrpc   | 2.0
2115method    | pki::RequestCertificate
2116params    | Dictionary
2117
2118##### Params
2119
2120Key           | Type          | Description
2121--------------|---------------|------------------
2122ticket        | String        | Own ticket, or as satellite in CA proxy from local store.
2123cert\_request | String        | Certificate request content from local store, optional.
2124
2125##### Functions
2126
2127Event Sender: `RequestCertificateHandler`
2128Event Receiver: `RequestCertificateHandler`
2129
2130##### Permissions
2131
2132This is an anonymous request, and the number of anonymous clients can be configured
2133in the `api` feature.
2134
2135Only valid certificate request messages are processed, and valid signed certificates
2136won't be signed again.
2137
2138#### pki::UpdateCertificate <a id="technical-concepts-json-rpc-messages-pki-updatecertificate"></a>
2139
2140> Location: `jsonrpcconnection-pki.cpp`
2141
2142##### Message Body
2143
2144Key       | Value
2145----------|---------
2146jsonrpc   | 2.0
2147method    | pki::UpdateCertificate
2148params    | Dictionary
2149
2150##### Params
2151
2152Key                  | Type          | Description
2153---------------------|---------------|------------------
2154status\_code         | Number        | Status code, 0=ok.
2155cert                 | String        | Signed certificate content.
2156ca                   | String        | Public CA certificate content.
2157fingerprint\_request | String        | Certificate fingerprint from the CSR.
2158
2159
2160##### Functions
2161
2162**Event Sender:**
2163
2164* When a client requests a certificate in `RequestCertificateHandler` and the satellite
2165already has a signed certificate, the `pki::UpdateCertificate` message is constructed and sent back.
2166* When the endpoint holding the master's CA private key (and TicketSalt private key) is able to sign
2167the request, the `pki::UpdateCertificate` message is constructed and sent back.
2168
2169**Event Receiver:** `UpdateCertificateHandler`
2170
2171##### Permissions
2172
2173Message updates are dropped when
2174
2175* The origin sender is not in a parent zone of the receiver.
2176* The certificate fingerprint is in an invalid format.
2177
2178#### log::SetLogPosition <a id="technical-concepts-json-rpc-messages-log-setlogposition"></a>
2179
2180> Location: `apilistener.cpp` and `jsonrpcconnection.cpp`
2181
2182##### Message Body
2183
2184Key       | Value
2185----------|---------
2186jsonrpc   | 2.0
2187method    | log::SetLogPosition
2188params    | Dictionary
2189
2190##### Params
2191
2192Key                 | Type          | Description
2193--------------------|---------------|------------------
2194log\_position       | Timestamp     | The endpoint's log position as UNIX timestamp.
2195
2196
2197##### Functions
2198
2199**Event Sender:**
2200
2201During log replay to a client endpoint in `ApiListener::ReplayLog()`, each processed
2202file generates a message which updates the log position timestamp.
2203
2204`ApiListener::ApiTimerHandler()` invokes a check to keep all connected endpoints and
2205their log position in sync during replay log.
2206
2207**Event Receiver:** `SetLogPositionHandler`
2208
2209##### Permissions
2210
2211The receiver will not process messages from not configured endpoints.
2212