// Copyright (C) 2017-2021 Internet Systems Consortium, Inc. ("ISC")
//
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at http://mozilla.org/MPL/2.0/.
/**

@page libdhcp_ha Kea High Availability Hooks Library

Welcome to the Kea High Availability Hooks Library. This documentation is
addressed to developers who are interested in the internal operation of the
library. This file provides the information needed to understand and perhaps
extend this library.
14
@section haOverview Overview

The High Availability (HA) hooks library is intended for DHCP deployments
in which the DHCP service must be sustained in the event that one
of the servers becomes unavailable as a result of a crash, power outage or
other unexpected situation. The other server belonging to this setup should
be able to handle all DHCP traffic directed to the system, including
the traffic that would normally be handled by the server which became
unavailable.

Many of the concepts behind the HA hooks library are derived from the
DHCP Failover protocol; however, this solution has a different architecture
and uses a different state machine and different message formats for
communication between the participating servers. This solution is not a DHCP
Failover implementation and, therefore, this documentation purposely avoids
using the word "Failover" in the context of this library.

The HA feature design can be found at the
<a href="https://gitlab.isc.org/isc-projects/kea/wikis/designs/High-Availability-Design">Kea HA Design page</a>.
34
@section haWhyHookLibrary Why Hook Library?

High Availability is a very important requirement for various DHCP
deployments. It is a valid question why such a generic feature is
placed in a hook library rather than implemented as an integral part of the
Kea DHCP servers. If HA is implemented in a loadable library,
users who don't use HA or who don't want to use this particular
solution for HA will simply not load this library. The server code
without the HA implementation is lighter and easier to understand and
debug. High Availability is a complex feature and will certainly
keep growing both in size and complexity. Keeping it in a separate
code base makes it easier to maintain and use. Also, the HA hooks
library requires the Kea lease_cmds hook library to be loaded on the
participating servers. It would clearly be a bad design for a feature
in the main Kea code to rely on the presence of a loadable (lease_cmds)
module.
51
@section haNotableDifferences Notable Differences to ISC DHCP

It is worth briefly explaining the major differences between the Kea HA
implementation and the failover implemented in ISC DHCP.

The first difference concerns protocol coverage. There are two protocols that
the IETF attempted to standardize. One is the
<a href="https://datatracker.ietf.org/doc/html/draft-ietf-dhc-failover">
DHCPv4 Failover draft</a>, an Internet Draft that expired in September 2003.
The other is <a href="https://tools.ietf.org/html/rfc8156">
RFC8156: DHCPv6 Failover</a>, which was published as a Proposed Standard.
ISC DHCP implemented the former, but not the latter. As such, ISC DHCP
is able to provide failover for DHCPv4 only, not DHCPv6. The Kea HA hooks
library, in contrast, works with both the DHCPv4 and the DHCPv6 servers.

The second major difference is that both IETF failover protocols are based on
MCLT (Maximum Client Lead Time), sometimes referred to as lazy
updates. This mechanism lets a server respond immediately, which improves
latency, but it does so at the cost of greatly increased complexity. The lease
is assigned with a very short lifetime, then an update is sent to the other
server with a lifetime greater than the client requested. Once the other server
confirms the lease, the lease is updated with a longer lifetime when the client
renews. This approach generates more traffic and causes lease lifetimes to
fluctuate greatly, despite an administrator setting them to a specific value.
Kea HA does not implement this complexity. Its operation is much simpler and
easier to use and understand, although the price to pay for this relative
simplicity is a longer response time and somewhat decreased performance.

The third difference is that in ISC DHCP the failover relationship is strictly
a pair of servers. Kea HA, on the other hand, is able to define additional
backup servers. While they do not technically participate in the HA
relationship, their databases are kept up to date and they can be used as
replacements that are almost ready to take over the traffic. However, replacing
the primary or secondary server with a backup requires manual intervention by
the administrator.

The fourth difference is that Kea HA does not support pool rebalancing yet.
When running in load balancing mode, Kea uses a hashing mechanism to segregate
clients into one of two pools. It is unlikely, but possible, that a network
would be visited by clients that are predominantly assigned to one server.
As a result, this server could run out of addresses, while its underutilized
partner could still have many addresses available. This unfortunate but
unlikely limitation will be removed in future Kea releases.
92
@section haAyncCommunication Asynchronous Communication with Boost Asio

One of the major technical problems with High Availability is that the
participating servers must constantly communicate with each other.
When one of the servers allocates a lease it must notify its peer about
this allocation and provide it with full information about the
allocated lease. The server which has allocated the lease must not
respond to the client until its partner confirms that it has saved
the lease in its database. This guarantees that, at any given time,
both servers hold the most current lease information and either of the
servers can take responsibility for managing existing leases if the
partner server becomes unavailable. This is similar to the requirement
on a single DHCP server, which must store the lease information in
persistent storage before responding to the client. Failing to do
so may cause the lease information to be lost if the server crashes
before writing it to the lease file.

The requirement for the partner to store the lease in its lease database
and to confirm this fact to the server allocating the lease results in
increased latency of the DHCP responses to the clients. In order to
minimize the latency, the idea of "parking" DHCP packets has been introduced.
This solution provides pseudo-parallel processing of multiple DHCP packets
and prevents a blocking wait during the communication with the other server.
When the HA hooks library needs to send a lease update to the partner,
the client's packet associated with this lease is "parked", waiting for
the communication with the partner to complete. Meanwhile, other incoming
DHCP packets are processed (and also parked if necessary). The client
which sent the DHCP packet still has to wait for the communication with
the partner to complete, but it doesn't have to wait for the server to
receive its packet (and start processing it) while a previous DHCP
transaction is still in progress.
124
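The parking idea can be illustrated with a minimal sketch. It is not the Kea
implementation: the @c ToyParkingLot class, its methods and the integer packet
identifiers are hypothetical names chosen for this example, and the partner's
confirmation is simulated by calling @c unpark directly. The point it shows is
that a second packet can be accepted and parked while the lease update for the
first one is still outstanding, and that responses are released in the order
in which the confirmations arrive.

@code
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parking lot: holds the "send response" action of each packet
// until the partner acknowledges the corresponding lease update.
class ToyParkingLot {
public:
    // Parks a packet by storing the action that sends its response.
    void park(int packet_id, std::function<void()> send_response) {
        parked_[packet_id] = std::move(send_response);
    }

    // Called when the partner confirms the lease update. Runs and removes
    // the stored action. Returns false if the packet was not parked.
    bool unpark(int packet_id) {
        auto it = parked_.find(packet_id);
        if (it == parked_.end()) {
            return (false);
        }
        // Take the action out of the map before running it.
        std::function<void()> send_response = std::move(it->second);
        parked_.erase(it);
        send_response();
        return (true);
    }

    // Number of packets still waiting for the partner.
    std::size_t size() const {
        return (parked_.size());
    }

private:
    std::map<int, std::function<void()>> parked_;
};

// Simulates two queries: both are parked before either lease update is
// confirmed, and the confirmations arrive in reverse order.
std::vector<std::string> simulateParking() {
    std::vector<std::string> sent;
    ToyParkingLot lot;
    lot.park(1, [&sent]() { sent.push_back("response-1"); });
    lot.park(2, [&sent]() { sent.push_back("response-2"); });
    // Nothing has been sent yet: packet 2 was accepted while the update
    // for packet 1 was still in flight.
    lot.unpark(2);
    lot.unpark(1);
    return (sent);
}
@endcode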
This solution requires that the communication between the servers is
asynchronous, and the most obvious framework for this was Boost ASIO,
as it is already used in many different areas of the code.

The DHCP servers process incoming packets synchronously (in a
loop), but each loop pass contains a call to:

@code
getIOService()->poll();
@endcode

which executes callbacks for completed asynchronous operations, such as
timers, asynchronous sends and receives. The instance of the IOService
is owned by the DHCP servers, but hooks libraries must have access to it
and must use this instance to schedule asynchronous tasks. This is why
the new hook points "dhcp4_srv_configured" and "dhcp6_srv_configured"
have been introduced. These hook points are used by the DHCPv4 and the
DHCPv6 servers respectively, to pass the instance of the IOService
(via the "io_context" argument) to the hooks libraries which need to
schedule asynchronous tasks.
145
It is also worth noting that the blocking reception of the DHCP packets
may cause delays of up to 1 second in the asynchronous operations. This is
due to the structure of the main server loop:

@code
bool
Dhcpv4Srv::run() {
    while (!shutdown_) {
        try {
            run_one();
            getIOService()->poll();
        } catch (const std::exception& e) {
            // General catch-all for exceptions that are not caught by more
            // specific catches. This one is for exceptions derived from
            // std::exception.
            LOG_ERROR(packet4_logger, DHCP4_PACKET_PROCESS_STD_EXCEPTION)
                .arg(e.what());
        } catch (...) {
            // General catch-all for exceptions that are not caught by more
            // specific catches. This one is for other exceptions, not derived
            // from std::exception.
            LOG_ERROR(packet4_logger, DHCP4_PACKET_PROCESS_EXCEPTION);
        }
    }

    return (true);
}
@endcode

The @c run_one() call includes a @c select() invocation with a timeout of
1 second. The @c poll() is not invoked for up to 1 second while the server
is performing this blocking @c select(). Future Kea releases should mitigate
this problem by introducing some mechanisms for concurrent reception and
processing of the DHCP packets.
179
180
@section haClientClassification Client Classification in Load Balancing

One of the top requirements for the HA was to support load balancing between
two participating servers. Even though the current implementation supports
only a 50/50 split of packets between the two servers, it can
easily be extended to support different splits.

Another supported mode of operation is the "hot-standby" mode in which
one of the servers handles the entire traffic and the other server
simply receives lease updates from it. If the first server fails,
the standby server can automatically switch to handling the
DHCP traffic directed to the system.

The "load-balancing" mode is more complex in that it requires isolation
of the address/prefix pools from which the respective servers allocate
leases for the clients. If the two servers were sharing address pools
they would frequently run into a conflict whereby both of them would
allocate the same address to different clients. This is not a problem in
the "hot-standby" mode because there is only one server allocating leases
at any given time.

The most challenging part of load balancing is the configuration
of the address pools on the respective servers. At the time when the HA design
was created, there was no requirement on the HA hooks library to be able
to rebalance the pools, e.g. in case one of the pools is nearly exhausted
and the other pool includes many available addresses or prefixes. This
requirement may come in the future, in which case the current approach
to the configuration may be enhanced.

The current approach uses the existing client classification mechanism to
statically split allocations across multiple pools. Client classification
was designed to serve as a generic framework to support various scenarios
in which clients need to be segregated and associated with selected
pools, subnets and shared networks. The load balancing in the HA hooks
library is simply another use case for client classification.
Should new requirements be created for the HA hooks library in the
future (e.g. rebalancing), the client classification will need to be
extended to accommodate those requirements.

In fact, client classification was already extended for the Kea 1.4.0
release to allow for selecting a specific pool based on combinations
of classes, rather than a single class associated with the server
by the HA load balancing algorithm. Examples of pools split
between different device types (e.g. laptops and telephones) and
between load balancing servers (e.g. "server1" and "server2") can
be found in the Kea Administrator's Manual.
227
@section haCodeStructure HA Hooks Library Code Structure

@subsection haService HA Service Class

The @c isc::ha::HAService class is the heart of the HA system. It implements the
HA state machine. It is derived from the @c isc::util::StateModel
class. The states are documented both in the Kea Administrator's
Manual and the HA design. The declarations of the states can be
found in the @c ha_service_states.h header file because they are
used by multiple C++ classes.

Besides running the state machine transitions, the @c HAService
class serves the following purposes:

- Assigns to the received DHCP packet the class appropriate for the server
  selected to process the packet as a result of load balancing.
- Measures the clock skew between the active servers. If the clock skew
  is too high, it can either log an error or stop the HA function.
- Sends lease updates to the partner and receives responses.
- Sends the heartbeat command to the partner to verify the partner's state
  and its notion of time (for clock skew).
- Controls whether the DHCP server should respond to the queries
  from clients or not.
- Synchronizes the local lease database by fetching the leases from the
  partner server.
- Controls which packets the server responds to (HA scopes).

As of the Kea 1.4.0 release, there is only one instance of the @c HAService
class created by the HA hooks library. In the future, multiple
@c HAService instances may co-exist, each handling an independent HA
relationship with another server. For example, a server could be
configured to respond to devices in two subnets and establish a
connection with two different servers for the respective subnets. Lease
updates pertaining to the first subnet would be sent via the first
connection and those pertaining to the second subnet would be sent
via the second connection. As of the Kea 1.4.0 release, there is exactly
one relationship that the Kea server instance can participate in.
265
@subsection haImplementation HA Implementation Class

The @c isc::ha::HAImpl class implements callouts and command handlers supported
by the HA hooks library. Its methods expect @c isc::hooks::CalloutHandle
as arguments and are usually directly called by the callout functions
such as @c pkt4_receive etc. This makes it more natural to unit test
those implementations because the tests can invoke methods of the @c HAImpl
class, rather than the "extern" functions.

Internally, the @c HAImpl class methods call methods of the @c HAService
class to perform certain actions, such as triggering lease updates,
sending a heartbeat to another server etc. However, the @c HAImpl still
includes a fair amount of logic to retrieve and validate the arguments
provided within the @c isc::hooks::CalloutHandle.

The @c isc::ha::HAImpl::buffer4Receive and @c isc::ha::HAImpl::buffer6Receive
functions deserve a more detailed explanation, because not only do they retrieve
the arguments provided to the callouts but they also parse the received
DHCP queries.

The DHCP query parsing is normally performed by the server. In most
cases a hooks library would not have to parse the DHCP packets on
its own. If the hooks library needs to access some information, e.g.
DHCP options or BOOTP message fields, it is sufficient to
implement the @c pkt4_receive or @c pkt6_receive callout, which is
invoked after the server has parsed the packet. However, this
approach would not work in the case of the HA hooks library. This
library assigns classes to the incoming packets as a result of the
load balancing. This assignment must take place before the server
evaluates classes specified in the configuration file, i.e.
before the @c pkt4_receive and @c pkt6_receive hook points. This
implies that the HA specific classification must be performed within
the @c buffer4_receive or @c buffer6_receive callouts. These callouts
must parse (unpack) the received buffers to gain access to the
data used by the load balancing algorithm, such as the MAC address, client
identifier or DUID.
302
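The delegation pattern described above can be sketched as follows. This is a
simplified illustration and not the code of the library: @c ToyCalloutHandle
is a hypothetical stand-in for @c isc::hooks::CalloutHandle, @c ToyHAImpl is a
stand-in for @c HAImpl, and the argument name "query4" is used here only as an
assumed example of an argument passed to the callout. The sketch shows why
tests can exercise the implementation method directly, on a locally created
object, without going through the "extern" function.

@code
#include <map>
#include <string>

// Stand-in for isc::hooks::CalloutHandle: a bag of named arguments.
struct ToyCalloutHandle {
    std::map<std::string, std::string> args;
};

// Stand-in for the implementation class holding the actual logic.
class ToyHAImpl {
public:
    // Retrieves and validates the argument, then performs the action.
    // Returns 0 on success and 1 when the expected argument is missing.
    int buffer4Receive(ToyCalloutHandle& handle) {
        auto it = handle.args.find("query4");
        if (it == handle.args.end()) {
            return (1);
        }
        ++received_;
        return (0);
    }

    // Number of queries accepted so far.
    int received() const {
        return (received_);
    }

private:
    int received_ = 0;
};

// Library-wide instance used by the callout function.
ToyHAImpl toy_impl;

// The callout function only delegates to the implementation class.
extern "C" int buffer4_receive(ToyCalloutHandle& handle) {
    return (toy_impl.buffer4Receive(handle));
}
@endcode

A unit test can create its own @c ToyHAImpl object and a handle, call
@c buffer4Receive and inspect the result, leaving the library-wide instance
untouched.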
@subsection haQueryFilter Query Filter Class

The @c isc::ha::QueryFilter class is used to control which DHCP queries are
to be processed by the respective servers. It implements the load
balancing algorithm, which the cooperating servers run against
each incoming packet and which results in assigning the packet to one of the
served "scopes". Scopes are associated with the servers and are named
after the servers. In the load balancing case there are two scopes,
e.g. "server1" and "server2". The load balancing algorithm selects
one of the scopes for the packet. During normal operation,
each server handles its own scope. In the "partner-down" state, the
surviving server handles both scopes. The selection of the
scopes to be served by the server instance is usually made
automatically as a result of transitioning to some new state within
the @c HAService class. However, the scope assignment can also be
made via the control channel as a result of an administrative action.
319
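The relationship between hashing, scopes and the served scopes can be
sketched as below. This is a simplified illustration under stated assumptions:
the FNV-1a hash is used only as a stand-in and is not the hash used by the
library, and @c ToyQueryFilter, @c hashIdentifier and @c selectScope are
hypothetical names. The scope names "server1" and "server2" are taken from
the example above.

@code
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Stand-in hash (FNV-1a). The library's load balancing hash is different;
// any hash with a reasonably even spread serves the illustration.
std::uint32_t hashIdentifier(const std::vector<std::uint8_t>& id) {
    std::uint32_t hash = 2166136261u;
    for (std::uint8_t byte : id) {
        hash ^= byte;
        hash *= 16777619u;
    }
    return (hash);
}

// Selects one of the two scopes for a client identifier, e.g. a MAC
// address. Testing the parity of the hash gives a 50/50 split.
std::string selectScope(const std::vector<std::uint8_t>& id) {
    return ((hashIdentifier(id) % 2 == 0) ? "server1" : "server2");
}

// Hypothetical filter holding the scopes served by this server instance.
class ToyQueryFilter {
public:
    explicit ToyQueryFilter(std::set<std::string> scopes)
        : scopes_(std::move(scopes)) {
    }

    // Returns true if this server should process the query.
    bool inScope(const std::vector<std::uint8_t>& id) const {
        return (scopes_.count(selectScope(id)) > 0);
    }

private:
    std::set<std::string> scopes_;
};
@endcode

In this sketch a server in normal operation would be created with its own
scope only, so each query is accepted by exactly one of the two servers. A
server in the "partner-down" state would be created with both scopes and
would therefore accept every query.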
@subsection haCommunicationState Communication State Class

The @c CommunicationState class is used by the @c HAService to
control all aspects of the communication between the active servers,
i.e.:

- Scheduling periodic heartbeat commands using Boost ASIO timers.
- Holding the state of the partner returned in response to the
  heartbeat command.
- Recording when the last successful heartbeat was sent, which indicates
  how long the partner server has been unresponsive.
- Analyzing DHCP queries to detect whether the partner server is
  not responsive by checking whether the values in the 'secs' field
  or Elapsed Time option are too high.
- Monitoring the clock skew between the active servers, which is
  calculated by subtracting the current time (on the local
  server) from the time returned by the partner in response to the
  heartbeat command.

A large part of this class is common to the DHCPv4 and DHCPv6 servers.
However, there are differences in how the DHCPv4 and the DHCPv6 messages
are analyzed to detect whether the partner server has stopped responding:

- The DHCPv4 server uses the 'secs' field, while the DHCPv6 server looks
  into the DHCPv6 specific Elapsed Time option.
- When recording a client whose query has not been responded to, the
  DHCPv4 server records both the client identifier and the MAC address.
  The DHCPv6 server uses the DUID to record the client.

Those differences led to the creation of DHCPv4 and DHCPv6 specific
derivations of the @c CommunicationState class, which differ in how
they analyze the queries.
353
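The DHCPv4 query analysis can be sketched as follows. It is an illustration
only: @c ToyUnackedTracker and its two limits (the highest acceptable 'secs'
value and the number of distinct clients tolerated) are hypothetical, and the
rule that the partner is considered unresponsive once more than the tolerated
number of distinct clients has been recorded is an assumption made for this
example. What the sketch takes from the text above is that a client is
recorded when its 'secs' value is too high and that a DHCPv4 client is
recorded by both its MAC address and its client identifier.

@code
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <utility>

// Hypothetical tracker of clients whose queries appear to go unanswered.
class ToyUnackedTracker {
public:
    ToyUnackedTracker(std::uint16_t max_secs, std::size_t max_clients)
        : max_secs_(max_secs), max_clients_(max_clients) {
    }

    // Analyzes one DHCPv4 query. The client is recorded, by MAC address
    // and client identifier, when its 'secs' value exceeds the limit.
    // Retransmissions from the same client are counted once.
    void analyze(const std::string& hwaddr, const std::string& client_id,
                 std::uint16_t secs) {
        if (secs > max_secs_) {
            unacked_.insert(std::make_pair(hwaddr, client_id));
        }
    }

    // Assumed rule: the partner is considered unresponsive once more than
    // max_clients distinct clients have been recorded.
    bool partnerUnresponsive() const {
        return (unacked_.size() > max_clients_);
    }

private:
    std::uint16_t max_secs_;
    std::size_t max_clients_;
    std::set<std::pair<std::string, std::string>> unacked_;
};
@endcode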
The clock skew is checked by the @c HAService class every time
it is updated as a result of receiving a response to the heartbeat.
If the clock skew is in the range of 30 to 60 seconds, the
@c clockSkewShouldWarn method returns true to indicate to the @c HAService
that a warning should be logged. In order to prevent too frequent
warnings (especially when heartbeats are sent frequently), this
method implements a simple gating algorithm, which does not return
true (trigger the warning) more often than every 60 seconds.

The @c isc::ha::CommunicationState::clockSkewShouldTerminate method indicates
whether the clock skew has exceeded 60 seconds, in which case the
@c HAService class transitions to the "terminated" state.
366
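The two checks can be sketched as follows. This is a minimal illustration and
not the code of the @c CommunicationState class: @c ToyClockSkew is a
hypothetical name, times are passed in as plain seconds to keep the example
deterministic, and the exact handling of the boundary values (30 and 60
seconds) is an assumption.

@code
#include <cstdint>

// Hypothetical tracker of the clock skew between the two servers.
class ToyClockSkew {
public:
    // Records the skew: the partner's time returned in the heartbeat
    // response minus the local time, both in seconds. The absolute
    // value is stored.
    void update(std::int64_t partner_time, std::int64_t local_time) {
        std::int64_t skew = partner_time - local_time;
        skew_ = (skew < 0) ? -skew : skew;
    }

    // Returns true if the skew exceeds 30 seconds and no warning has
    // been triggered during the last 60 seconds (the gating).
    bool shouldWarn(std::int64_t now) {
        if (skew_ <= 30) {
            return (false);
        }
        if (warned_ && (now - last_warning_ < 60)) {
            return (false);
        }
        warned_ = true;
        last_warning_ = now;
        return (true);
    }

    // Returns true if the skew exceeds 60 seconds.
    bool shouldTerminate() const {
        return (skew_ > 60);
    }

private:
    std::int64_t skew_ = 0;
    std::int64_t last_warning_ = 0;
    bool warned_ = false;
};
@endcode

With a skew of 45 seconds, the first call to @c shouldWarn in this sketch
returns true, a call 30 seconds later returns false because of the gating,
and a call 60 seconds after the first warning returns true again.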
@subsection haCommandCreator Command Creator Class

The @c CommandCreator is a collection of static methods which
create commands issued between the HA-enabled DHCP servers. These
JSON commands are sent over the @c isc::http::HttpClient from the
@c HAService class.
373
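A command created this way might look like the output of the sketch below.
The sketch is not the @c CommandCreator code: @c createCommand and
@c createHeartbeat are hypothetical free functions, the JSON text is assembled
by plain string concatenation on the assumption that the inputs need no
escaping, and the "ha-heartbeat" command name and the "service" list are
assumptions about the command format that should be checked against the Kea
Administrator's Manual.

@code
#include <string>

// Builds the JSON text of a command without arguments, addressed to the
// given service. Assumes that the inputs contain no characters which
// require JSON escaping.
std::string createCommand(const std::string& name,
                          const std::string& service) {
    return ("{ \"command\": \"" + name +
            "\", \"service\": [ \"" + service + "\" ] }");
}

// Builds the heartbeat command for the DHCPv4 or the DHCPv6 server.
std::string createHeartbeat(bool dhcp4) {
    return (createCommand("ha-heartbeat", dhcp4 ? "dhcp4" : "dhcp6"));
}
@endcode

Keeping such functions free of any state, as the static methods of the
@c CommandCreator are, makes them straightforward to test: the returned
text can be compared against the expected command.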
@section haShortcomings Future HA Hooks Library Improvement Ideas

The HA hooks library was first released with Kea 1.4.0. There are
numerous enhancements to this library considered for future releases.
Some of them are briefly described in this section.

@subsection haStateMachineControl Controlling State Machine

As of Kea 1.4.0, there are no control commands allowing for setting or
influencing the transitions between states. In particular, there is no
way to pause the HA state machine in a selected state to perform
some administrative actions before transitioning to the normal
operation state.
387
@subsection haNameUpdates DNS Updates are not Coordinated

When one of the servers allocates a lease, this server is responsible
for sending a DNS update if configured to send such updates. The partner
server receives the lease update (including the inserted hostname) so
it knows that the hostname was stored in the DNS. When this lease
subsequently expires, the hostname must be removed from the DNS. The
HA hooks library, however, has no means to record in the lease database
which server has allocated this lease. If recording such information
had been possible, the same server which allocated the lease would have
sent the removal name change request (NCR) to D2. Because this
information is unavailable, both servers will send the removal NCRs.
One of those NCRs will succeed and the other one will fail.

Addressing this issue requires two enhancements:

- Implementing "user context" for leases, which could be used for storing
  custom information, e.g. a server identifier, along with the leases.
- Implementing callouts for the "lease4_expire" and "lease6_expire" hook
  points via which the server removing the lease from the database could
  notify the partner about such removal.
409
@section haMTCompatibility Multi-Threading Compatibility

The High Availability hooks library is compatible with multi-threading.

*/