This file gives a quick overview of all the modifications that have been made
for zone verification.


Configuring the verifier
========================

New configuration options (nsd.conf) were added. In the new "verify:" clause:
  enable:
  port:
  ip-address:
  verify-zones:
  verifier:
  verifier-count:
  verifier-feed-zone:
  verifier-timeout:

And in the "zone:" and "pattern:" clauses:
  verify-zone:
  verifier:
  verifier-feed-zone:
  verifier-timeout:

To parse the syntax of those options, configlexer.lex and configparser.y were
modified. To hold the configuration values, the structs nsd_options and
pattern_options in the file options.h were extended.

pattern_options::verifier has type char**, which is the argument-vector form
used by the execve family of functions. The helper type "struct component" is
defined to help parse a command with arguments. A zone_verifier is a list of
STRING tokens. A stack of components is built from those strings, which
configparser.y eventually converts into an argument vector.


Difffile modifications
======================

During a reload, updates for multiple different zones may be read. If some of
them should be loaded (because they verified or did not need to be verified)
and others should not, there is a problem: the database is updated with all
the updates (including the bad ones), and we cannot easily undo only the bad
updates.

To resolve this, the committed field of each transfer is used. Initially it is
assigned the value DIFF_NOT_COMMITTED (0). When an update is verified, this is
changed to DIFF_COMMITTED (1), DIFF_CORRUPT (2) or DIFF_INCONSISTENT (4),
depending on whether the update was applied and verified successfully.
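
Because the committed values listed above are distinct powers of two, they can
be tested as bit flags. The following is a minimal sketch under that
assumption; struct transfer, transfer_is_bad and keep_mergeable are
hypothetical names for illustration, not NSD's code. It keeps the updates
whose committed field is neither DIFF_CORRUPT nor DIFF_INCONSISTENT.

```c
#include <stdbool.h>
#include <stddef.h>

/* Values as listed in the text; assumed to be usable as bit flags. */
enum diff_committed {
	DIFF_NOT_COMMITTED = 0,
	DIFF_COMMITTED = 1,
	DIFF_CORRUPT = 2,
	DIFF_INCONSISTENT = 4
};

/* Hypothetical stand-in for one stored transfer. */
struct transfer {
	const char *zone;
	int committed; /* combination of enum diff_committed bits */
};

/* A transfer is bad when either failure bit is set. */
static bool transfer_is_bad(const struct transfer *t)
{
	return (t->committed & (DIFF_CORRUPT | DIFF_INCONSISTENT)) != 0;
}

/* Compact the array so that only transfers which may be merged again
 * remain at the front, preserving their order; returns how many. */
static size_t keep_mergeable(struct transfer *ts, size_t n)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++)
		if (!transfer_is_bad(&ts[i]))
			ts[kept++] = ts[i];
	return kept;
}
```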

When a reload results in one or more zones being corrupt or inconsistent, the
newly forked server quits with exit status NSD_RELOAD_FAILED and the parent
server initiates a new reload. At that point it is clear which updates should
be merged with the database: those whose committed field is neither
DIFF_CORRUPT nor DIFF_INCONSISTENT.

  The NSD_RELOAD_FAILED exit status of a child reload server is handled in
  server_main (server.c).

To allow updates to be applied again after a failure, xfrd now keeps all
updates for each zone around until a reload succeeds. The set of updates is
fixed once a reload has been initiated, to avoid a potentially infinite loop.
While a reload is in progress, xfrd still accepts and transfers updates, but
does not schedule them until the reload finishes. As a result, xfrd rather
than the server manages the updates stored on disk. Previously the server
simply removed each update during the reload process regardless of the
result, potentially causing the same transfer to be tried multiple times if
the set of updates contained a bad update.


Running verifiers
=================

In server_reload (in server.c), the function server_verify is called just
after all updates are merged into the (in-memory) database, but just before
the new database is served. server_verify sets up a temporary event loop and
calls verify_zone repeatedly to run the verifiers and mark each updated zone.
server_reload then inspects the update status of each zone and communicates
the number of good and bad zones in the update. Based on those numbers,
server_reload decides how to continue, as described above.

verify_zone is defined in verify.c (and .h). The function creates the
necessary pipes, starts the verifier, and then sets up the required events
and registers them with the event loop. The state of each verifier is
maintained in an array of struct verifier.
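
Such an array can be treated as a fixed pool of slots. The sketch below
assumes a reduced struct verifier holding only a process ID and a zone name
(the real struct holds more state); struct verifier_pool, pool_acquire and
pool_release are hypothetical helpers for illustration.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, reduced verifier state: one entry per slot. */
struct verifier {
	pid_t pid;        /* 0 while the slot is free */
	const char *zone; /* zone being verified in this slot */
};

/* The array plus its length, i.e. the number of verifiers that may run
 * at the same time. */
struct verifier_pool {
	struct verifier *slots;
	size_t count;
};

/* Return a free slot, or NULL when every slot is busy and starting another
 * verifier has to wait until a running one finishes or times out. */
static struct verifier *pool_acquire(struct verifier_pool *pool)
{
	for (size_t i = 0; i < pool->count; i++)
		if (pool->slots[i].pid == 0)
			return &pool->slots[i];
	return NULL;
}

/* Free a slot again once its verifier has exited. */
static void pool_release(struct verifier *v)
{
	v->pid = 0;
	v->zone = NULL;
}
```

With a pool of one slot, a second pool_acquire returns NULL until
pool_release has been called for the running verifier.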

The size of the array is set by "verifier-count:". Each verifier that runs
simultaneously with others occupies one slot. When no slot is free, starting
the next verifier waits until a running verifier has finished (or timed out)
and its slot has been freed. The default is to run just one verifier at a
time, which will probably be fine in most situations.

Once all verifiers have finished (or timed out), the event loop is exited and
server_reload communicates the status of each updated zone.


Environment variables for the verifiers
=======================================

Verifiers are told how a zone can be verified through environment variables.
The addresses and ports on which a verifier may query the zone under
assessment are known at startup: they are set just after the configuration is
read and the sockets are set up in nsd.c, by calling
setup_verifier_environment (also in nsd.c).

Verifiers are spawned (via verify_zone) with popen3. verify_zone sets the
zone-specific environment variables (VERIFY_ZONE and VERIFY_ZONE_ON_STDIN)
just before it executes the verifier with execvp. Server sockets are
automatically closed when the verifier is executed.


Logging a verifier's standard output and error streams
======================================================

Everything a verifier writes to stdout and stderr is logged in the nsd log
file. Handlers with handle_log_from_fd (verify.c) as their callback are set
up by server_verifiers_add. The log_from_fd_t struct is the user_data for the
handler. Besides the priority and the file descriptor, it contains variables
that handle_log_from_fd uses to make sure logged lines never exceed
LOGLINELEN in length, splitting them into parts if necessary.
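
The line-length bookkeeping can be sketched as follows. This assumes a
hypothetical struct log_stream in place of log_from_fd_t, a small
demonstration value for LOGLINELEN, and a pull-style helper that returns one
loggable part at a time; it is not the actual handle_log_from_fd.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical demonstration value; NSD defines its own LOGLINELEN. */
#define LOGLINELEN 16

/* Hypothetical reduced per-stream state (cf. log_from_fd_t). */
struct log_stream {
	char pending[LOGLINELEN]; /* bytes of the current line not yet logged */
	size_t len;
};

/* Copy the pending bytes out as one NUL-terminated part and reset. */
static void take_part(struct log_stream *s, char line[LOGLINELEN + 1])
{
	memcpy(line, s->pending, s->len);
	line[s->len] = '\0';
	s->len = 0;
}

/* Consume input until one loggable part is complete. A part ends at a
 * newline or when LOGLINELEN bytes have accumulated, so no part is longer
 * than LOGLINELEN. Returns false when the input runs out first; unfinished
 * bytes stay buffered for the next read. Empty lines are skipped. */
static bool log_stream_next(struct log_stream *s, const char **data,
                            size_t *n, char line[LOGLINELEN + 1])
{
	while (*n > 0) {
		char c = **data;

		(*data)++;
		(*n)--;
		if (c == '\n') {
			if (s->len > 0) {
				take_part(s, line);
				return true;
			}
			continue;
		}
		s->pending[s->len++] = c;
		if (s->len == LOGLINELEN) {
			take_part(s, line);
			return true;
		}
	}
	return false;
}

/* At end of stream, return whatever is left of an unterminated line. */
static bool log_stream_finish(struct log_stream *s, char line[LOGLINELEN + 1])
{
	if (s->len == 0)
		return false;
	take_part(s, line);
	return true;
}
```

A handler would call log_stream_next in a loop after each read() from the
pipe, passing every returned part to the logger, and call log_stream_finish
once the pipe reaches end of file.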

Note that in practice error messages typically end up in the log before
messages on standard output, because stdout is buffered and stderr is not. It
may be more convenient to make a verifier's stdout unbuffered too.


Feeding a zone to a verifier
============================

The complete zone may be fed to the standard input of a verifier when the
"verifier-feed-zone:" configuration option has the value "yes" (the default).
For this purpose, the verify_handle_feed (verify.c) handler is called when the
standard input file descriptor of the verifier is writable. The function uses
zone_rr_iter_next (verify.c) to get the next RR to write to the verifier. The
verifier_zone_feed struct maintains the state (the file handle, the RR
pretty-printing state and the zone iterator).


Serving a zone to a verifier
============================

The nsd struct (in nsd.h) is extended with two arrays of nsd_socket structs,
verify_tcp and verify_udp, and a size_t verify_ifs, which holds the number of
sockets for verifying. These mirror the tcp, udp and ifs members used for
normal serving. Several parts of the code that operate on the tcp and udp
arrays are simply reused with the verify_tcp and verify_udp arrays.

Furthermore, wherever server.c used the server_close_all_sockets (server.c)
function on the normal server sockets, the function is now also called for
the verify sockets. Also, server_start_xfrd closes the verifier sockets in the
xfrd child process, because xfrd has no need for them.


Verifier timeouts
=================

A handler for timeouts (as configured with the "verifier-timeout:" option) is
added by server_verifiers_add when the verifier is initialized. The callback
is handle_verifier_timeout (verify.c), and the verifier_state_type of the
verifier is used as user_data.
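
A minimal sketch of a timeout callback paired with exit handling, assuming a
hypothetical struct verifier_slot as user_data and assuming that a verifier
exiting with status 0 counts as a successful verification. The timeout
callback only sends SIGTERM and leaves the slot occupied; the slot is freed
once the exit has been reaped.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-verifier state, passed to the callbacks as user_data. */
struct verifier_slot {
	pid_t pid;      /* 0 while the slot is free */
	bool timed_out; /* set once the timeout has fired */
};

/* Timeout callback: only ask the verifier to stop. The slot is not freed
 * here, because the process may still be running. */
static void verifier_timeout_cb(void *user_data)
{
	struct verifier_slot *slot = user_data;

	/* kill() with pid <= 0 would signal a whole process group. */
	if (slot->pid <= 0 || slot->pid == getpid())
		return;
	slot->timed_out = true;
	(void)kill(slot->pid, SIGTERM);
}

/* Exit handling: reap the verifier and free its slot for the next one.
 * Returns true only if it exited with status 0 and did not time out
 * (assumed success criterion). waitpid() blocks here for simplicity. */
static bool verifier_exit_cb(struct verifier_slot *slot)
{
	int status = 0;
	bool ok = false;

	if (slot->pid <= 0)
		return false;
	if (waitpid(slot->pid, &status, 0) == slot->pid)
		ok = !slot->timed_out && WIFEXITED(status) &&
		     WEXITSTATUS(status) == 0;
	slot->pid = 0;
	slot->timed_out = false;
	return ok;
}
```

In an event loop, the exit side would run only once the child is known to
have exited (for example after SIGCHLD), so the wait would not block.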

verify_handle_timeout simply kills the verifier (by sending SIGTERM) and does
not clean up the verifier state for reuse. That is done in verify_handle_exit,
which is triggered once the verifier exits, because that handler is also
responsible for starting more verifiers to run simultaneously.


Aborting the reload process (and killing all running verifiers)
===============================================================

A reload may take some time, especially with a verifier. During that time the
parent server process could be asked to quit. If that happens while it has a
child reload server process, it sends the NSD_QUIT command over the
communication channel. This triggers verify_handle_command, which is
registered when the temporary event loop is created, and which sends a
SIGTERM signal to each of the verifiers.


Refreshing and expiring zones
=============================

When the SOA refresh timer runs out, xfrd tries to fetch a fresh copy of the
zone from the master server. If that fails, it tries again after each SOA
retry interval. To prevent a bad zone from being verified again and again,
xfrd remembers the serial number of the last version of the zone that did not
verify, and it will not try to transfer a zone with that bad serial number
again.

Previously, after reloading, the reload process informed xfrd which SOAs were
merged into the database, so that xfrd knew when zones needed to be
refreshed. This was adapted to also inform xfrd about bad zones. The function
inform_xfrd_new_soas in server.c is called for this. It communicates either
good or bad SOAs. When bad SOAs are communicated, the session starts with
NSD_BAD_SOA_BEGIN; when only good zones are communicated, it starts with
NSD_SOA_BEGIN. Each SOA is preceded by an NSD_SOA_INFO. When all SOAs have
been communicated, NSD_SOA_END is sent. xfrd handles the reception of these
messages in the function xfrd_handle_ipc_read in ipc.c.
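
The session format can be sketched as a sequence of messages. In this sketch
each SOA is reduced to its serial and carried in the same message as its
NSD_SOA_INFO marker; the enum values, struct soa_msg, struct soa_receiver and
both functions are assumptions for illustration, not NSD's wire format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Message names from the text; the numeric values are hypothetical. */
enum soa_msg_type { NSD_SOA_BEGIN, NSD_BAD_SOA_BEGIN, NSD_SOA_INFO, NSD_SOA_END };

/* Hypothetical message: the SOA itself is reduced to its serial. */
struct soa_msg {
	enum soa_msg_type type;
	uint32_t serial; /* only meaningful for NSD_SOA_INFO */
};

/* Sender side: one session carrying either good or bad SOAs.
 * out must have room for count + 2 messages; returns how many were written. */
static size_t build_soa_session(const uint32_t *serials, size_t count,
                                bool bad, struct soa_msg *out)
{
	size_t k = 0;

	out[k++] = (struct soa_msg){ bad ? NSD_BAD_SOA_BEGIN : NSD_SOA_BEGIN, 0 };
	for (size_t i = 0; i < count; i++)
		out[k++] = (struct soa_msg){ NSD_SOA_INFO, serials[i] };
	out[k++] = (struct soa_msg){ NSD_SOA_END, 0 };
	return k;
}

/* Receiver side: remembers which kind of session is in progress. */
struct soa_receiver {
	bool in_bad_session;
	size_t good, bad;
};

static void receive_soa_msg(struct soa_receiver *r, const struct soa_msg *m)
{
	switch (m->type) {
	case NSD_SOA_BEGIN:
		r->in_bad_session = false;
		break;
	case NSD_BAD_SOA_BEGIN:
		r->in_bad_session = true;
		break;
	case NSD_SOA_INFO:
		if (r->in_bad_session)
			r->bad++;
		else
			r->good++;
		break;
	case NSD_SOA_END:
		r->in_bad_session = false;
		break;
	}
}
```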

The boolean parent_bad_soa_infos was added to the xfrd_state struct (in
xfrd.h) to help with this control flow in the IPC code.

The SOAs are eventually processed by xfrd, via xfrd_handle_ipc_SOAINFO in
ipc.c, with the xfrd_handle_incoming_soa function in xfrd.c. That function
makes sure that a received bad SOA is remembered in the xfrd_zone struct. Two
new variables were added to this struct for that purpose: soa_bad and
soa_bad_acquired. Their values are written to and read from the xfrd.state
file with the functions xfrd_write_state_soa and xfrd_read_state,
respectively.

In xfrd.c, the function xfrd_parse_received_xfr_packet was adapted to make
sure that known bad serials are not transferred again, unless the transfer is
in response to a NOTIFY, and even then only when the SOA matches the one in
the NOTIFY (if the NOTIFY contained one; otherwise any SOA is accepted).
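
The acceptance rule for known bad serials can be sketched as a predicate. The
structs below are hypothetical reductions of the soa_bad and
soa_bad_acquired bookkeeping and of the NOTIFY state, and "matches" is
assumed to mean an equal SOA serial.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduction of the bad-SOA memory kept per zone. */
struct bad_soa_memory {
	bool known;      /* a serial failed verification earlier */
	uint32_t serial; /* that serial */
};

/* Hypothetical description of the NOTIFY a transfer answers, if any. */
struct notify_info {
	bool is_response; /* the transfer was started because of a NOTIFY */
	bool has_soa;     /* the NOTIFY carried a SOA record */
	uint32_t serial;  /* its serial, valid only if has_soa */
};

/* Decide whether a transferred zone with the given serial may be used. */
static bool accept_transfer_serial(const struct bad_soa_memory *bad,
                                   const struct notify_info *notify,
                                   uint32_t serial)
{
	if (!bad->known || serial != bad->serial)
		return true;  /* not a known bad serial */
	if (!notify->is_response)
		return false; /* known bad and not requested by a NOTIFY */
	if (!notify->has_soa)
		return true;  /* NOTIFY without a SOA: any SOA is accepted */
	return notify->serial == serial;
}
```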