// Copyright (C) 2017-2021 Internet Systems Consortium, Inc. ("ISC")
//
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at http://mozilla.org/MPL/2.0/.
/**

@page libdhcp_ha Kea High Availability Hooks Library

Welcome to the Kea High Availability Hooks Library. This documentation is
addressed to developers who are interested in the internal operation of the
library. This file provides the information needed to understand and perhaps
extend this library.

@section haOverview Overview

The High Availability (HA) hooks library is intended for DHCP deployments
in which there is a need to sustain the DHCP service in the event that one
of the servers becomes unavailable as a result of a crash, power outage or
other unexpected situation. The other server belonging to this setup should
be able to handle the entire DHCP traffic directed to the system, including
the traffic that would normally be handled by the server which became
unavailable.

Many of the concepts behind the HA hooks library are derived from the
DHCP Failover protocol. However, this solution has a different architecture,
and uses a different state machine and different message formats for
communication between the participating servers. This solution is not a DHCP
Failover implementation and, therefore, this documentation purposely avoids
using the word "Failover" in the context of this library.

The HA feature design can be found at
<a href="https://gitlab.isc.org/isc-projects/kea/wikis/designs/High-Availability-Design">Kea HA Design page</a>.

@section haWhyHookLibrary Why Hook Library?

High Availability is a very important requirement for various DHCP
deployments. It is a valid question why such a generic feature is
placed in a hook library rather than implemented as an integral part of
the Kea DHCP servers.
If HA is implemented in a loadable library,
users who don't use HA or who don't want to use this particular
solution for HA will simply not load this library. The server code
without the HA implementation is lighter, and easier to understand and
debug. High Availability is a fairly complex feature and will certainly
keep growing both in size and complexity. Keeping it in a separate
code base makes it easier to maintain and use. Also, the HA hooks
library requires the Kea lease_cmds hook library to be loaded on the
participating servers. It would clearly be a bad design to introduce
into the main Kea code a feature relying on the presence of a loadable
(lease_cmds) module.

@section haNotableDifferences Notable Differences to ISC DHCP

It is worth briefly explaining the major differences between the Kea HA
implementation and the failover implemented in ISC DHCP.

The IETF attempted to standardize two protocols. The first one is the
<a href="https://datatracker.ietf.org/doc/html/draft-ietf-dhc-failover">
DHCPv4 Failover draft</a>, an Internet Draft which expired in
September 2003. The other one is <a href="https://tools.ietf.org/html/rfc8156">
RFC8156: DHCPv6 Failover</a>, which was published as a Proposed Standard.
ISC DHCP implemented the former, but not the latter. As such, ISC DHCP
is able to provide failover for DHCPv4 only, not DHCPv6.

The second major difference is that both IETF failover protocols are based on
MCLT (Maximum Client Lead Time), sometimes referred to as lazy
updates. This mechanism lets a server respond immediately, which improves
latency, but it does so at the cost of greatly increased complexity. The lease
is assigned with a very short lifetime, then an update is sent to the other
server with a lifetime greater than the client requested. Once the other server
confirms the lease, the client's renewal is granted a longer
lifetime.
This approach generates more traffic and causes lease lifetimes to
fluctuate greatly, despite an administrator setting them to a specific value.
Kea HA does not implement this complexity. It is much simpler to use and its
operation is easier to understand, although the price to pay for this relative
simplicity is a longer response time and somewhat decreased performance.

The third difference is that in ISC DHCP the failover relationship is strictly
a pair (i.e. two) of servers. Kea HA, on the other hand, is able to define
additional backup servers. While they're not technically participating in the HA
relationship, their databases are kept up to date and they can be used as
replacements that are almost ready to take over the traffic. However, replacing
the primary or secondary server with a backup requires manual intervention by
an administrator.

The fourth difference is that Kea HA does not support pool rebalancing yet.
When running in load balancing mode, Kea uses a hashing mechanism to segregate
clients into one of two pools. It is unlikely, but possible, that a network
would be visited by clients that are predominantly assigned to one server.
As a result, this server could run out of addresses, while its underutilized
partner could still have many addresses available. This unfortunate but
unlikely limitation will be removed in future Kea releases.

@section haAyncCommunication Asynchronous Communication with Boost Asio

One of the major technical problems with High Availability is that the
participating servers must constantly communicate with each other.
When one of the servers allocates a lease it must notify its peer about
this allocation and provide it with full information about the
allocated lease. The server which has allocated the lease must not
respond to the client until its partner confirms that it has saved
the lease in its database.
This guarantees that, at any given time,
both servers hold the most current lease information and either of the
servers can take responsibility for managing existing leases if the
partner server becomes unavailable. This is similar to the requirement
on a single DHCP server, which must store the lease information in
persistent storage before responding to the client. Failing to do
so may cause the lease information to be lost if the server crashes
before writing it to the lease file.

The requirement for the partner to store the lease in its lease database
and to confirm this fact to the server allocating the lease results in
increased latency of the DHCP responses to the clients. In order to
minimize the latency, the idea of "parking" DHCP packets has been introduced.
This is a solution for pseudo-parallel processing of multiple DHCP packets
which prevents a blocking wait during the communication with the other server.
When the HA hooks library needs to send a lease update to the partner,
the client's packet associated with this lease is "parked", waiting for
the communication with the partner to complete. Meanwhile, other incoming
DHCP packets are processed (and also parked if necessary). The client
which sent the DHCP packet still has to wait for the communication with
the partner to complete, but it doesn't have to wait for the server to
receive its packet (and start processing it) while a previous DHCP
transaction is still in progress.

This solution requires that the communication between the servers is
asynchronous, and the most obvious framework for this was Boost ASIO,
as it is already used in many different areas of the code.
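
The parking idea described above can be illustrated with a minimal sketch that
uses only the C++ standard library. It is not the Kea implementation: the
@c ParkedQueries class, its methods, the integer query identifiers and the
@c simulateParking function are hypothetical names chosen for this example,
and the partner's acknowledgment is simulated by a direct call rather than by
a Boost ASIO completion handler.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration of "parking"; not the Kea API.
// Each parked query holds the action which sends the response to the client.
class ParkedQueries {
public:
    // Park a query until the partner acknowledges the lease update.
    void park(int query_id, std::function<void()> send_response) {
        assert(send_response);  // an empty action could never be resumed
        parked_[query_id] = std::move(send_response);
    }

    // Called when the (simulated) acknowledgment arrives. Returns false
    // if the query is unknown, e.g. it was already unparked or dropped.
    bool unpark(int query_id) {
        auto it = parked_.find(query_id);
        if (it == parked_.end()) {
            return (false);
        }
        // Move the action out before erasing so it can be invoked safely.
        std::function<void()> send_response = std::move(it->second);
        parked_.erase(it);
        send_response();
        return (true);
    }

    // Drop a query without responding, e.g. when the lease update failed.
    bool drop(int query_id) {
        return (parked_.erase(query_id) > 0);
    }

    // Number of queries currently waiting for the partner.
    std::size_t size() const {
        return (parked_.size());
    }

private:
    std::map<int, std::function<void()>> parked_;
};

// Simulates two queries arriving back to back: both are parked, and the
// responses are sent in the order in which the acknowledgments arrive.
std::vector<std::string> simulateParking() {
    std::vector<std::string> sent;
    ParkedQueries lot;
    lot.park(1, [&sent]() { sent.push_back("response-1"); });
    lot.park(2, [&sent]() { sent.push_back("response-2"); });
    // The server keeps receiving packets while both updates are pending.
    lot.unpark(2);  // the partner acknowledged the second update first
    lot.unpark(1);
    return (sent);
}
```

In this sketch the second client is answered before the first one because its
acknowledgment arrived earlier, which is the behavior a blocking wait on the
first lease update would prevent.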

The DHCP servers process incoming packets synchronously (in a
loop), but each loop pass contains a call to:

@code
getIOService()->poll();
@endcode

which executes callbacks for completed asynchronous operations, such as
timers, asynchronous sends and receives. The instance of the IOService
is owned by the DHCP servers, but hooks libraries must have access to it
and must use this instance to schedule asynchronous tasks. This is why
the new hook points "dhcp4_srv_configured" and "dhcp6_srv_configured"
have been introduced. These hook points are used by the DHCPv4 and the
DHCPv6 servers respectively, to pass the instance of the IOService
(via the "io_context" argument) to the hooks libraries which need to
schedule asynchronous tasks.

It is also worth noting that the blocking reception of the DHCP packets
may cause delays of up to 1 second in the asynchronous operations. This is
due to the structure of the main server loop:

@code
bool
Dhcpv4Srv::run() {
    while (!shutdown_) {
        try {
            run_one();
            getIOService()->poll();
        } catch (const std::exception& e) {
            // General catch-all exception that are not caught by more specific
            // catches. This one is for exceptions derived from std::exception.
            LOG_ERROR(packet4_logger, DHCP4_PACKET_PROCESS_STD_EXCEPTION)
                .arg(e.what());
        } catch (...) {
            // General catch-all exception that are not caught by more specific
            // catches. This one is for other exceptions, not derived from
            // std::exception.
            LOG_ERROR(packet4_logger, DHCP4_PACKET_PROCESS_EXCEPTION);
        }
    }

    return (true);
}
@endcode

The @c run_one() call includes a @c select() invocation with a timeout of
1 second. As a result, @c poll() may not be invoked for up to 1 second while
the server is performing this blocking @c select().
Future Kea releases should mitigate
this problem by introducing some mechanisms for concurrent reception and
processing of the DHCP packets.


@section haClientClassification Client Classification in Load Balancing

One of the top requirements for HA was to support load balancing between
two participating servers. Even though the current implementation supports
only a 50/50 split of packets between two servers, the implementation can
easily be extended to support different splits.

Another supported mode of operation is the "hot-standby" mode in which
one of the servers handles the entire traffic and the other server
simply receives lease updates from it. If the first server fails, the
standby server can automatically switch to handling the
DHCP traffic directed to the system.

The "load-balancing" mode is more complex in that it requires isolation
of the address/prefix pools from which the respective servers allocate
leases for the clients. If the two servers were sharing address pools
they would frequently run into conflicts whereby both of them would
allocate the same address to different clients. This is not a problem in
the "hot-standby" mode because there is only one server allocating leases
at any given time.

The most challenging part of load balancing is the configuration
of the address pools on the respective servers. At the time when the HA design
was created, there was no requirement on the HA hooks library to be able
to rebalance the pools, e.g. in case one of the pools is nearly exhausted
and the other pool includes many available addresses or prefixes. This
requirement may come in the future, in which case the current approach
to the configuration may be enhanced.

The current approach uses the existing client classification mechanism to
statically split allocations across multiple pools.
Client classification
was designed to serve as a generic framework to support various scenarios
in which clients need to be segregated and associated with selected
pools, subnets and shared networks. The load balancing in the HA hooks
library is simply another use case for client classification.
Should new requirements be created for the HA hooks library in the
future (e.g. rebalancing), the client classification will need to be
extended to accommodate those requirements.

In fact, client classification was already extended for the Kea 1.4.0
release to allow for selecting a specific pool based on combinations
of classes, rather than a single class associated with the server
by the HA load balancing algorithm. Examples of pools split
between different device types (e.g. laptops and telephones) and
between load balancing servers (e.g. "server1" and "server2") can
be found in the Kea Administrator's Manual.

@section haCodeStructure HA Hooks Library Code Structure

@subsection haService HA Service Class

The @c isc::ha::HAService class is the heart of the HA system. It implements the
HA state machine. It is derived from the @c isc::util::StateModel
class. The states are documented both in the Kea Administrator's
Manual and the HA design. The declarations of the states can be
found in the @c ha_service_states.h header file because they are
used by multiple C++ classes.

Besides running the state machine transitions, the @c HAService
class serves the following purposes:

- Assigns a class to the received DHCP packet appropriate for the server
  selected to process the DHCP packet as a result of load balancing.
- Measures the clock skew between the active servers. If the clock skew
  is too high, it can either log an error or stop the HA function.
- Sends lease updates to the partner and receives responses.
- Sends the heartbeat command to the partner to verify the partner's state
  and its notion of time (for clock skew).
- Controls whether the DHCP server should respond to the queries
  from clients or not.
- Synchronizes the local lease database by fetching the leases from the
  partner server.
- Controls which packets the server responds to (HA scopes).

As of the Kea 1.4.0 release, there is only one instance of the @c HAService
class created by the HA hooks library. In the future, multiple
@c HAService instances may co-exist, each handling an independent HA
relationship with another server. For example: a server could be
configured to respond to devices in two subnets and establish a
connection with two different servers for the respective subnets. Lease
updates pertaining to the first subnet would be sent via the first
connection and those pertaining to the second subnet would be sent
via the second connection. As of the Kea 1.4.0 release, there is exactly
one relationship that the Kea server instance can participate in.

@subsection haImplementation HA Implementation Class

The @c isc::ha::HAImpl class implements callouts and command handlers supported
by the HA hooks library. Its methods expect @c isc::hooks::CalloutHandle
as arguments and are usually directly called by the callout functions
such as @c pkt4_receive etc. This makes it more natural to unit test
those implementations because the tests can invoke methods of the @c HAImpl
class, rather than the "extern" functions.

Internally, the @c HAImpl class methods call methods of the @c HAService
class to perform certain actions, such as triggering lease updates,
sending a heartbeat to another server etc. However, the @c HAImpl still
includes a fair amount of logic to retrieve and validate the arguments
provided within the @c isc::hooks::CalloutHandle.
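
The delegation from the "extern" callout functions to the implementation class
can be sketched as follows. This is a simplified illustration, not the
library's code: @c FakeCalloutHandle, @c SketchImpl, @c sketchImpl and
@c sketch_pkt4_receive are hypothetical stand-ins for
@c isc::hooks::CalloutHandle, @c HAImpl and a real callout, and the string
arguments are an assumption made to keep the example self-contained.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for isc::hooks::CalloutHandle: a bag of named
// string arguments. The real class carries typed arguments.
struct FakeCalloutHandle {
    std::map<std::string, std::string> args;
};

// Hypothetical stand-in for HAImpl: the logic lives in a regular method
// which a unit test can call directly.
class SketchImpl {
public:
    // Retrieves and validates the argument, then acts on it.
    void pkt4Receive(FakeCalloutHandle& handle) {
        auto it = handle.args.find("query4");
        if (it == handle.args.end() || it->second.empty()) {
            throw std::invalid_argument("missing query4 argument");
        }
        handle.args["processed_by"] = "SketchImpl";
    }
};

// The single implementation instance used by the "extern" entry point.
SketchImpl& sketchImpl() {
    static SketchImpl impl;
    return (impl);
}

// Thin C-linkage entry point in the style of a hooks callout. It only
// delegates and converts exceptions into a non-zero status code.
extern "C" int sketch_pkt4_receive(FakeCalloutHandle& handle) {
    try {
        sketchImpl().pkt4Receive(handle);
    } catch (const std::exception&) {
        return (1);
    }
    return (0);
}
```

With this split, a test can construct @c SketchImpl and call @c pkt4Receive
directly, while the entry point stays small enough to need little testing of
its own.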

The @c isc::ha::HAImpl::buffer4Receive and @c isc::ha::HAImpl::buffer6Receive
functions deserve a more detailed explanation, because not only do they retrieve
the arguments provided to the callouts but they also parse the received
DHCP queries.

The DHCP query parsing is normally performed by the server. In most
cases a hooks library does not have to parse the DHCP packets on
its own. If the hooks library needs to access some information, e.g.
DHCP options or BOOTP message fields, it is sufficient to
implement the @c pkt4_receive or @c pkt6_receive callout, which is
invoked after the server has parsed the packet. However, this
approach would not work for the HA hooks library. This
library assigns classes to the incoming packets as a result of the
load balancing. This assignment must take place before the server
evaluates classes specified in the configuration file, i.e.
before the @c pkt4_receive and @c pkt6_receive hook points. This
implies that the HA specific classification must be performed within
the @c buffer4_receive or @c buffer6_receive callouts. These callouts
must parse (unpack) the received buffers to gain access to the
data used by the load balancing algorithm, such as the MAC address, client
identifier or DUID.

@subsection haQueryFilter Query Filter Class

The @c isc::ha::QueryFilter class is used to control which DHCP queries are
to be processed by the respective servers. It implements the load
balancing algorithm which the cooperating servers run against
each incoming packet and which results in assigning the packet to one of the
served "scopes". Scopes are associated with the servers and are named
after the servers. In the load balancing case there are two scopes,
e.g. "server1" and "server2". The load balancing algorithm selects
one of the scopes for the packet. During normal operation,
each server handles its own scope.
In the "partner-down" state, the
surviving server handles both scopes. The selection of the
scopes to be served by the server instance is usually made
automatically as a result of transitioning to some new state within
the @c HAService class. However, the scope assignment can also be
made via the control channel as a result of an administrative action.

@subsection haCommunicationState Communication State Class

The @c CommunicationState class is used by the @c HAService to
control all aspects of the communication between the active servers,
i.e.:

- Scheduling periodic heartbeat commands using Boost ASIO timers.
- Holding the state of the partner returned in response to the
  heartbeat command.
- Recording when the last successful heartbeat was sent, i.e.
  how long the partner server has been unresponsive.
- Analyzing DHCP queries to detect whether the partner server is
  not responsive by checking whether the values in the 'secs' field
  or Elapsed Time option are too high.
- Monitoring the clock skew between the active servers, which is
  calculated by subtracting the current time (on the local
  server) from the time returned by the partner in response to the
  heartbeat command.

A large part of this class is common to the DHCPv4 and DHCPv6 servers.
However, there are differences in how the DHCPv4 and the DHCPv6 messages
are analyzed to detect whether the partner server has stopped responding:

- The DHCPv4 server uses the 'secs' field, while the DHCPv6 server looks
  into the DHCPv6 specific Elapsed Time option.
- When a DHCPv4 server fails to respond to a client's query, the client
  is recorded using both the client identifier and the MAC address.
  The DHCPv6 server uses the DUID to record the client.
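
The DHCPv4 variant of the query analysis listed above can be sketched as
follows. This is an illustration under stated assumptions rather than the
@c CommunicationState code: the @c UnackedTrackerV4 class, its @c max_secs and
@c max_unacked thresholds, and the string representation of the identifiers
are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <utility>

// Hypothetical sketch: counts distinct DHCPv4 clients whose 'secs' value
// suggests that the partner has not been answering them.
class UnackedTrackerV4 {
public:
    UnackedTrackerV4(std::uint16_t max_secs, std::size_t max_unacked)
        : max_secs_(max_secs), max_unacked_(max_unacked) {
    }

    // Analyzes one query directed to the partner. A client is identified
    // by the pair of MAC address and client identifier, so retransmissions
    // from the same client are counted only once.
    void analyze(const std::string& mac, const std::string& client_id,
                 std::uint16_t secs) {
        if (secs > max_secs_) {
            unacked_.insert(std::make_pair(mac, client_id));
        }
    }

    // True when more than max_unacked distinct clients waited too long.
    bool partnerSeemsDown() const {
        return (unacked_.size() > max_unacked_);
    }

    // Forgets the recorded clients, e.g. after a successful heartbeat.
    void clear() {
        unacked_.clear();
    }

private:
    std::uint16_t max_secs_;
    std::size_t max_unacked_;
    std::set<std::pair<std::string, std::string>> unacked_;
};
```

A DHCPv6 counterpart would follow the same shape but key the set on the DUID
and read the Elapsed Time option instead of the 'secs' field.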

Those differences led to the creation of DHCPv4 and DHCPv6 specific
derivations of the @c CommunicationState class, which differ in how they
analyze the queries.

The clock skew is checked every time
it is updated as a result of receiving a response to the heartbeat.
If the clock skew is in the range of 30 to 60 seconds,
@c isc::ha::CommunicationState::clockSkewShouldWarn returns true to indicate
to the @c HAService that a warning should be logged. In order to prevent too
frequent warnings (especially when heartbeats are sent frequently), this
method implements a simple gating algorithm, which does not return
true (trigger the warning) more often than every 60 seconds.

The @c isc::ha::CommunicationState::clockSkewShouldTerminate method informs
whether the clock skew has exceeded 60 seconds, in which case the
@c HAService class transitions to the "terminated" state.

@subsection haCommandCreator Command Creator Class

The @c CommandCreator is a collection of static methods which
create commands issued between the HA-enabled DHCP servers. These
JSON commands are sent over the @c isc::http::HttpClient from the
@c HAService class.

@section haShortcomings Future HA Hooks Library Improvement Ideas

The HA hooks library was first released with Kea 1.4.0. There are
numerous enhancements to this library considered for future releases.
Some of them are briefly described in this section.

@subsection haStateMachineControl Controlling State Machine

As of Kea 1.4.0, there are no control commands allowing for setting or
influencing the transitions between states. In particular, there is no
way to pause the HA state machine in a selected state to perform
some administrative actions before transitioning to the normal
operation state.

@subsection haNameUpdates DNS Updates are not Coordinated

When one of the servers allocates a lease, this server is responsible
for sending a DNS update if configured to send such updates. The partner
server receives the lease update (including the inserted hostname) so
it knows that the hostname was stored in the DNS. When this lease
subsequently expires, the hostname must be removed from the DNS. The
HA hooks library, however, has no means to record in the lease database
which server has allocated this lease. If recording such information
were possible, the same server which allocated the lease would
send the removal name change request (NCR) to D2. Because this
information is unavailable, both servers will send the removal NCRs.
One of those NCRs will succeed and the other one will fail.

Addressing this issue requires two enhancements:

- Implementing "user context" for leases, which could be used for storing
  custom information, e.g. a server identifier, along with the leases.
- Implementing callouts for the "lease4_expire" and "lease6_expire" hook
  points via which the server removing the lease from the database could
  notify the partner about such removal.

@section haMTCompatibility Multi-Threading Compatibility

The High Availability hooks library is compatible with multi-threading.

*/