xref: /openbsd/usr.sbin/nsd/doc/NSD-VERIFY-MODS (revision d415bd75)
This file gives a quick overview of all the modifications that have been made
for zone verification.


Configuring the verifier
========================

Configuration (nsd.conf) options were added. In the new "verify:" clause:
	enable:
	port:
	ip-address:
	verify-zones:
	verifier:
	verifier-count:
	verifier-feed-zone:
	verifier-timeout:

And in the "zone:" and "pattern:" clauses:
	verify-zone:
	verifier:
	verifier-feed-zone:
	verifier-timeout:

To parse these options, configlexer.lex and configparser.y were modified. To
hold their values, the structs nsd_options and pattern_options in options.h
were extended.

The type of pattern_options::verifier, char**, is an argument vector in the
form expected by the execve family of functions. The helper type
"struct component" is defined to help parse a command with arguments. In
configparser.y, a zone_verifier is a list of STRING tokens. A stack of
components is built from those tokens and eventually converted into the
argument vector.
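The conversion from STRING tokens to an execve-style vector can be sketched in C. This is a minimal sketch under stated assumptions, not the code in configparser.y. The struct layout and the names component_push, components_to_argv and free_argv are hypothetical. The sketch shows why a stack is sufficient: the last token ends up on top, so the vector is filled from its end.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for "struct component": one token of a verifier
 * command line, kept on a singly linked stack. */
struct component {
	struct component *prev; /* token pushed before this one */
	char *str;              /* copy of the token text */
};

/* Push a copy of one STRING token. Returns the new top of the stack, or
 * NULL on allocation failure (the old stack is then left untouched). */
static struct component *
component_push(struct component *top, const char *token)
{
	struct component *c = malloc(sizeof(*c));
	if (c == NULL)
		return NULL;
	if ((c->str = strdup(token)) == NULL) {
		free(c);
		return NULL;
	}
	c->prev = top;
	return c;
}

/* Convert the stack into a NULL-terminated vector for execve(2)/execvp(3).
 * The last token is on top, so the vector is filled from the end. The nodes
 * are freed and the strings move into the vector. On allocation failure,
 * NULL is returned and the stack is left intact. */
static char **
components_to_argv(struct component *top)
{
	size_t n = 0;
	for (struct component *c = top; c != NULL; c = c->prev)
		n++;
	char **argv = calloc(n + 1, sizeof(*argv)); /* argv[n] stays NULL */
	if (argv == NULL)
		return NULL;
	while (top != NULL) {
		struct component *prev = top->prev;
		argv[--n] = top->str;
		free(top);
		top = prev;
	}
	return argv;
}

static void
free_argv(char **argv)
{
	for (char **p = argv; p != NULL && *p != NULL; p++)
		free(*p);
	free(argv);
}
```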


Difffile modifications
======================

During a reload, updates for several different zones may be read. Suppose some
of them should be loaded (because they verified or did not need to be
verified) and others should not. That is a problem, because the database has
been updated with all of the updates (including the bad ones), and we cannot
easily undo only the bad ones.

To resolve this, the committed field of each transfer is used. It is initially
set to DIFF_NOT_COMMITTED (0). When an update is verified, the field is changed
to DIFF_COMMITTED (1), DIFF_CORRUPT (2) or DIFF_INCONSISTENT (4), depending on
whether the update was applied and verified successfully. A reload may result
in one or more zones being corrupt or inconsistent. In that case the newly
forked server quits with exit status NSD_RELOAD_FAILED, and the parent server
initiates a new reload. It is then clear which updates should be merged into
the database: those whose committed field is neither DIFF_CORRUPT nor
DIFF_INCONSISTENT.

	The NSD_RELOAD_FAILED exit status of a child reload server is handled
	in server_main (server.c).
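DIFF_CORRUPT and DIFF_INCONSISTENT occupy distinct bits, so the merge rule above can be expressed as a mask test. The following is a minimal sketch using the values quoted in this section. should_merge_on_retry and reload_failed are hypothetical helpers, not functions from difffile.c or server.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Values of the committed field as described in this section. */
enum {
	DIFF_NOT_COMMITTED = 0,
	DIFF_COMMITTED     = 1,
	DIFF_CORRUPT       = 2,
	DIFF_INCONSISTENT  = 4
};

/* Hypothetical helper: on the follow-up reload, an update is merged unless
 * it was marked corrupt or inconsistent. */
static bool
should_merge_on_retry(int committed)
{
	return (committed & (DIFF_CORRUPT | DIFF_INCONSISTENT)) == 0;
}

/* Hypothetical helper: the reload child gives up (and would exit with
 * NSD_RELOAD_FAILED) if any update in this reload turned out bad. */
static bool
reload_failed(const int *committed, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!should_merge_on_retry(committed[i]))
			return true;
	return false;
}
```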

To allow updates to be applied again after a failure, xfrd has been changed to
keep all updates for each zone until a reload succeeds. The set of updates is
fixed once a reload has been initiated, to avoid a potentially infinite loop.
While a reload is in progress, xfrd still accepts and transfers updates, but
does not schedule them until the reload finishes. As a result, the updates
stored on disk are now managed by xfrd rather than by the server. Previously,
the server simply removed each update during the reload process regardless of
the result. That could cause the same transfer to be tried multiple times if
the set of updates contained a bad update.
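The bookkeeping described above can be modelled with a small per-zone structure. This sketch makes simplifying assumptions: updates are identified only by serial number and stored in a fixed-size array. struct zone_updates and its functions are hypothetical names, and xfrd's real data structures differ.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_UPDATES 16 /* hypothetical capacity for the sketch */

/* Hypothetical per-zone record of the updates xfrd keeps on disk. */
struct zone_updates {
	unsigned int serial[MAX_UPDATES];
	size_t count;     /* updates kept */
	size_t in_reload; /* the first in_reload updates belong to the running reload */
	bool reloading;
};

/* Transfers are accepted at any time; returns false when the array is full. */
static bool
updates_add(struct zone_updates *z, unsigned int serial)
{
	if (z->count == MAX_UPDATES)
		return false;
	z->serial[z->count++] = serial;
	return true;
}

/* Updates are only scheduled (handed to a reload) when none is running. */
static bool
updates_may_schedule(const struct zone_updates *z)
{
	return !z->reloading && z->count > 0;
}

/* Starting a reload fixes the set of updates it will apply, so updates that
 * arrive meanwhile cannot extend it indefinitely. */
static void
updates_reload_begin(struct zone_updates *z)
{
	z->in_reload = z->count;
	z->reloading = true;
}

/* Only a successful reload removes the updates it applied. After a failure
 * all of them are kept, so the good ones can be applied again. */
static void
updates_reload_end(struct zone_updates *z, bool success)
{
	if (success) {
		size_t rest = z->count - z->in_reload;
		for (size_t i = 0; i < rest; i++)
			z->serial[i] = z->serial[z->in_reload + i];
		z->count = rest;
	}
	z->in_reload = 0;
	z->reloading = false;
}
```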


Running verifiers
=================

In server_reload (server.c), the function server_verify is called just after
all updates have been merged into the in-memory database, but before the new
database is served. server_verify sets up a temporary event loop and calls
verify_zone repeatedly to run the verifiers and mark each updated zone.
server_reload then inspects the update status of each zone and communicates
the number of good and bad zones in the update. Based on those numbers, it
decides how to continue, as described above.
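The decision at the end of server_reload can be sketched as a tally over the per-zone verification results. The enum values and function names here are hypothetical. The only rule taken from this document is that a single bad zone makes the reload fail, so that a new reload is started.

```c
#include <stddef.h>

/* Hypothetical per-zone outcome of verification. */
enum verify_status { VERIFY_NOT_NEEDED, VERIFY_OK, VERIFY_FAILED };

enum reload_action {
	RELOAD_SERVE, /* all updated zones are good: serve the new database */
	RELOAD_RETRY  /* a zone is bad: child quits, parent starts a new reload */
};

struct verify_tally { size_t good, bad; };

static struct verify_tally
tally_zones(const enum verify_status *status, size_t nzones)
{
	struct verify_tally t = { 0, 0 };
	for (size_t i = 0; i < nzones; i++) {
		if (status[i] == VERIFY_FAILED)
			t.bad++;
		else
			t.good++; /* verified, or did not need verification */
	}
	return t;
}

static enum reload_action
decide_reload(struct verify_tally t)
{
	return t.bad > 0 ? RELOAD_RETRY : RELOAD_SERVE;
}
```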

verify_zone is defined in verify.c (and verify.h). It creates the necessary
pipes and starts the verifier. It then sets up the required events and
registers them with the event loop.
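verify_zone starts a verifier via popen3, with its three standard streams connected to pipes. That step can be sketched with pipe(2), fork(2), dup2(2) and execvp(3). spawn_with_pipes and reap_child are hypothetical names. NSD's own popen3 may differ in signature and error handling, and registering the descriptors with the event loop is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical popen3-style helper: runs argv[0] with stdin, stdout and
 * stderr connected to pipes. On success it returns the child's pid and
 * stores the parent's ends in fds[0] (write: child's stdin), fds[1]
 * (read: child's stdout) and fds[2] (read: child's stderr). Returns -1 on
 * error. */
static pid_t
spawn_with_pipes(char *const argv[], int fds[3])
{
	int p[6]; /* p[0..1] stdin pipe, p[2..3] stdout pipe, p[4..5] stderr pipe */
	int opened = 0;
	pid_t pid;

	for (; opened < 6; opened += 2)
		if (pipe(&p[opened]) == -1)
			goto fail;
	if ((pid = fork()) == -1)
		goto fail;
	if (pid == 0) {
		/* Child: connect the pipe ends to fds 0, 1 and 2, then exec. */
		if (dup2(p[0], STDIN_FILENO) == -1 ||
		    dup2(p[3], STDOUT_FILENO) == -1 ||
		    dup2(p[5], STDERR_FILENO) == -1)
			_exit(127);
		for (int i = 0; i < 6; i++)
			if (p[i] > STDERR_FILENO)
				close(p[i]);
		execvp(argv[0], argv);
		_exit(127); /* only reached if exec failed */
	}
	/* Parent: close the child's ends and hand back our own. */
	close(p[0]);
	close(p[3]);
	close(p[5]);
	fds[0] = p[1];
	fds[1] = p[2];
	fds[2] = p[4];
	return pid;
fail:
	for (int i = 0; i < opened; i++)
		close(p[i]);
	return -1;
}

/* Wait for the child; returns its exit status, or -1 if it did not exit
 * normally (for example, because it was killed by a signal). */
static int
reap_child(pid_t pid)
{
	int status;
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```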

The state of each verifier is maintained in an array of struct verifier, which
has "verifier-count:" elements. Each verifier that runs concurrently is
assigned a slot. When no slot is free, the next verifier waits until a running
verifier has finished (or timed out). Its slot then becomes available, and the
next verifier runs alongside the verifiers already running. The default is to
run one verifier at a time, which will probably be fine in most situations.
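Slot handling can be sketched with a fixed array whose length is the "verifier-count:" value. struct verifier_slot, find_free_slot and release_slot are hypothetical simplifications of struct verifier, which carries much more state.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, stripped-down slot: pid 0 marks a free slot. */
struct verifier_slot {
	pid_t pid;
	const char *zone;
};

/* Index of a free slot, or -1 when all slots are busy and the next
 * verifier has to wait until a running one exits or times out. */
static int
find_free_slot(const struct verifier_slot *slots, size_t count)
{
	for (size_t i = 0; i < count; i++)
		if (slots[i].pid == 0)
			return (int)i;
	return -1;
}

/* Called when a verifier has exited: free its slot for the next one. */
static void
release_slot(struct verifier_slot *slots, size_t count, pid_t pid)
{
	for (size_t i = 0; i < count; i++) {
		if (slots[i].pid == pid) {
			slots[i].pid = 0;
			slots[i].zone = NULL;
			return;
		}
	}
}
```

With the default of one slot, a second verifier has to wait until the first one's slot is released.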

Once all verifiers have finished (or timed out), the event loop is exited and
server_reload communicates the status of each updated zone.


Environment variables for the verifiers
=======================================

Verifiers are told how a zone can be verified through environment variables.
The addresses and ports on which a verifier may query the zone under
assessment are known at startup. They are set by calling
setup_verifier_environment (nsd.c) just after the configuration has been read
and the sockets have been set up in nsd.c.

Verifiers are spawned (via verify_zone) with popen3. verify_zone sets the
zone-specific environment variables (VERIFY_ZONE and VERIFY_ZONE_ON_STDIN)
just before it executes the verifier with execvp. Server sockets are closed
automatically when the verifier is executed.
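Setting the zone-specific variables only in the child, just before exec, leaves the server's own environment untouched. Marking descriptors close-on-exec is one way to have server sockets closed automatically on exec. This is a sketch under stated assumptions. set_cloexec and run_verifier are hypothetical. The "yes"/"no" values for VERIFY_ZONE_ON_STDIN are an assumption, and this document does not say how NSD itself arranges for its sockets to be closed.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Mark a descriptor close-on-exec, so it is closed automatically in any
 * program started with execvp. Returns 0 on success, -1 on error. */
static int
set_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);
	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}

/* Hypothetical spawn: the zone-specific variables are set in the child only,
 * just before execvp. Returns the child's pid, or -1 if fork failed.
 * The "yes"/"no" values are an assumption of this sketch. */
static pid_t
run_verifier(char *const argv[], const char *zone, int feed_zone)
{
	pid_t pid = fork();
	if (pid != 0)
		return pid; /* parent, or -1 on error */
	if (setenv("VERIFY_ZONE", zone, 1) == -1 ||
	    setenv("VERIFY_ZONE_ON_STDIN", feed_zone ? "yes" : "no", 1) == -1)
		_exit(127);
	execvp(argv[0], argv);
	_exit(127); /* only reached if exec failed */
}
```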


Logging a verifier's standard output and error streams
======================================================

Everything a verifier writes to stdout and stderr is logged in the nsd log
file. Handlers with handle_log_from_fd (verify.c) as their callback are set up
by server_verifiers_add. The log_from_fd_t struct is the handler's user_data.
Besides the priority and the file descriptor, it holds the variables that
handle_log_from_fd uses to make sure logged lines never exceed LOGLINELEN in
length, splitting them into parts if necessary.
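The splitting behaviour can be sketched as a small state machine that keeps the current partial line between reads, so data may arrive in arbitrary chunks. The names below are hypothetical. LOGLINELEN is set deliberately small here to make the splitting visible. The parts go to a callback instead of NSD's logging functions, and the priority and file descriptor from log_from_fd_t are omitted.

```c
#include <stddef.h>
#include <string.h>

#define LOGLINELEN 16     /* deliberately tiny for the example */
#define MAX_SINK_LINES 8

/* Hypothetical per-stream state in the spirit of log_from_fd_t. */
struct log_state {
	char buf[LOGLINELEN]; /* current partial line; NUL added on flush */
	size_t len;
	void (*emit)(const char *part, void *arg);
	void *arg;
};

static void
log_flush(struct log_state *s)
{
	s->buf[s->len] = '\0';
	s->emit(s->buf, s->arg);
	s->len = 0;
}

/* Process bytes as they are read from the verifier's pipe. Complete lines
 * are emitted without their newline. A line longer than LOGLINELEN - 1
 * bytes is emitted in parts. */
static void
log_feed(struct log_state *s, const char *data, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (data[i] == '\n') {
			log_flush(s);
			continue;
		}
		if (s->len == LOGLINELEN - 1)
			log_flush(s); /* too long: emit what we have as one part */
		s->buf[s->len++] = data[i];
	}
}

/* At end of stream, emit a trailing line that had no newline. */
static void
log_eof(struct log_state *s)
{
	if (s->len > 0)
		log_flush(s);
}

/* Example sink that collects the parts in memory instead of logging them. */
struct line_sink {
	char lines[MAX_SINK_LINES][LOGLINELEN];
	size_t n;
};

static void
sink_emit(const char *part, void *arg)
{
	struct line_sink *k = arg;
	if (k->n < MAX_SINK_LINES)
		strcpy(k->lines[k->n++], part); /* part is < LOGLINELEN bytes */
}
```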

Note that in practice a verifier's error messages are usually logged before
its messages on standard output. This is because stdout is buffered (C stdio
fully buffers it when it is a pipe) and stderr is not. It may be more
convenient to make the verifier's stdout unbuffered too.
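For a verifier written in C, making stdout unbuffered takes a single setvbuf(3) call. The call must happen before anything is written to stdout. verifier_unbuffer_stdout is a hypothetical helper name.

```c
#include <stdio.h>

/* For a verifier written in C: call before writing anything to stdout.
 * With stdout unbuffered, each write reaches the pipe immediately, so stdout
 * and stderr messages are logged in the order they were produced. _IOLBF
 * (line buffering) would also suffice for line-oriented output.
 * Returns 0 on success, like setvbuf(3). */
static int
verifier_unbuffer_stdout(void)
{
	return setvbuf(stdout, NULL, _IONBF, 0);
}
```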


Feeding a zone to a verifier
============================

The complete zone is fed to the standard input of a verifier when the
"verifier-feed-zone:" option is set to "yes" (the default). For this purpose,
the verify_handle_feed (verify.c) handler is called whenever the verifier's
standard input file descriptor is writable. It uses zone_rr_iter_next
(verify.c) to get the next RR to write to the verifier. The verifier_zone_feed
struct maintains the state: the file handle, the RR pretty-printing state and
the zone iterator.
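The handler writes what the pipe accepts, remembers where writing stopped, and resumes on the next writable event. That pattern can be sketched as follows. struct zone_feed and feed_on_writable are hypothetical. A NULL-terminated array of pre-rendered RR lines stands in for zone_rr_iter_next and the pretty printer.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

enum feed_result { FEED_DONE, FEED_AGAIN, FEED_ERROR };

/* Hypothetical feed state, loosely modelled on verifier_zone_feed. */
struct zone_feed {
	int fd;                 /* verifier's stdin, normally non-blocking */
	const char *const *rrs; /* stand-in for the zone iterator, NULL-terminated */
	size_t next;            /* index of the next RR to fetch */
	const char *pending;    /* start of the bytes not yet written */
	size_t left;            /* number of bytes not yet written */
};

/* Call whenever fd is writable.
 * FEED_DONE: the whole zone was written; close fd so the verifier sees EOF.
 * FEED_AGAIN: the pipe is full; wait for the next writable event.
 * FEED_ERROR: write failed; see errno (EPIPE if the verifier closed its
 * stdin, provided SIGPIPE is ignored). */
static enum feed_result
feed_on_writable(struct zone_feed *f)
{
	for (;;) {
		if (f->left == 0) {
			if (f->rrs[f->next] == NULL)
				return FEED_DONE;
			f->pending = f->rrs[f->next++];
			f->left = strlen(f->pending);
			continue;
		}
		ssize_t n = write(f->fd, f->pending, f->left);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				return FEED_AGAIN;
			return FEED_ERROR;
		}
		f->pending += n;
		f->left -= (size_t)n;
	}
}
```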


Serving a zone to a verifier
============================

The nsd struct (nsd.h) is extended with two arrays of nsd_socket structs,
verify_tcp and verify_udp, and a size_t verify_ifs that holds the number of
sockets for verifying. These mirror the tcp, udp and ifs members used for
normal serving. Several parts of the code that operate on the tcp and udp
arrays are simply reused for the verify_tcp and verify_udp arrays.

Furthermore, wherever server.c called server_close_all_sockets (server.c) for
the normal server sockets, the function is now also called for the verify
sockets. In addition, server_start_xfrd closes the verifier sockets in the
xfrd child process, which has no need for them.
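The mirrored socket members and the shared close routine can be sketched with reduced types. struct sock, struct server, MAX_IFS and the helper names are hypothetical stand-ins for nsd_socket, struct nsd and server_close_all_sockets.

```c
#include <stddef.h>
#include <unistd.h>

#define MAX_IFS 4 /* hypothetical fixed capacity for the sketch */

/* Reduced stand-ins: the verify_* members mirror those used for serving. */
struct sock { int s; };

struct server {
	struct sock udp[MAX_IFS], tcp[MAX_IFS];
	size_t ifs;
	struct sock verify_udp[MAX_IFS], verify_tcp[MAX_IFS];
	size_t verify_ifs;
};

/* One routine serves both sets of sockets. */
static void
close_all_sockets(struct sock *socks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (socks[i].s != -1) {
			close(socks[i].s);
			socks[i].s = -1;
		}
	}
}

/* For example, in the xfrd child, which never needs the verifier sockets. */
static void
close_verify_sockets(struct server *srv)
{
	close_all_sockets(srv->verify_udp, srv->verify_ifs);
	close_all_sockets(srv->verify_tcp, srv->verify_ifs);
}
```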


Verifier timeouts
=================

A timeout handler (for the timeout configured with the "verifier-timeout:"
option) is added by server_verifiers_add when the verifier is initialized.
Its callback is handle_verifier_timeout (verify.c), and the verifier's
verifier_state_type is used as user_data.

verify_handle_timeout simply kills the verifier (by sending SIGTERM). It does
not clean up the verifier state for reuse. That is done in verify_handle_exit,
which is triggered once the verifier has exited, because that handler can also
start further verifiers that run simultaneously.
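The timeout callback only signals the verifier, while the exit callback reaps it and frees the slot. That split can be sketched as follows. struct verifier and both functions are hypothetical simplifications. Treating exit status 0 as success is an assumption made for the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical, reduced verifier state. */
struct verifier {
	pid_t pid; /* 0 when the slot is free */
	bool timed_out;
};

/* Timeout callback: only signal the verifier. The slot stays occupied
 * until the process has actually exited. */
static void
on_verifier_timeout(struct verifier *v)
{
	if (v->pid > 0) {
		v->timed_out = true;
		kill(v->pid, SIGTERM);
	}
}

/* Exit callback, given the status from waitpid(2): free the slot, which is
 * also where a waiting verifier could be started. Returns true if the
 * verifier counts as successful (exit status 0 is assumed to mean success). */
static bool
on_verifier_exit(struct verifier *v, int wstatus)
{
	bool ok = !v->timed_out && WIFEXITED(wstatus) && WEXITSTATUS(wstatus) == 0;
	v->pid = 0;
	v->timed_out = false;
	return ok;
}
```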


Aborting the reload process (and killing all running verifiers)
===============================================================

A reload may take some time, especially with a verifier. Meanwhile, the parent
server process could be asked to quit. If that happens while it has a child
reload server process, it sends the NSD_QUIT command over the communication
channel. This triggers verify_handle_command, which is registered when the
temporary event loop is created. verify_handle_command then sends SIGTERM to
each of the verifiers.
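On NSD_QUIT, every occupied slot is signalled and the processes are then reaped. kill_all_verifiers and reap_all are hypothetical names, and the sketch does not show how the command arrives over the communication channel.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send SIGTERM to every running verifier (pid > 0 marks an occupied slot).
 * Returns the number of verifiers signalled. */
static size_t
kill_all_verifiers(const pid_t *pids, size_t count)
{
	size_t signalled = 0;
	for (size_t i = 0; i < count; i++)
		if (pids[i] > 0 && kill(pids[i], SIGTERM) == 0)
			signalled++;
	return signalled;
}

/* Reap the signalled verifiers so none is left as a zombie. Returns how
 * many were terminated by SIGTERM. */
static size_t
reap_all(const pid_t *pids, size_t count)
{
	size_t terminated = 0;
	for (size_t i = 0; i < count; i++) {
		int status;
		if (pids[i] > 0 && waitpid(pids[i], &status, 0) == pids[i] &&
		    WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM)
			terminated++;
	}
	return terminated;
}
```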


Refreshing and expiring zones
=============================

When the SOA refresh timer runs out, xfrd tries to fetch a fresh copy of the
zone from the master server. If that fails, it tries again after every SOA
retry interval. To prevent a bad zone from being verified again and again,
xfrd remembers the serial number of the last zone that did not verify. It will
not try to transfer a zone with that bad serial number again.

Previously, after a reload, the reload process informed xfrd which SOAs had
been merged into the database, so that xfrd knew when each zone needed to be
refreshed. This has been adapted to also inform xfrd about bad zones, through
the function inform_xfrd_new_soas in server.c. It communicates either good or
bad SOAs. When bad SOAs are communicated, the session starts with
NSD_BAD_SOA_BEGIN; when only good zones are communicated, it starts with
NSD_SOA_BEGIN. Each SOA is preceded by an NSD_SOA_INFO. Once all SOAs have
been communicated, NSD_SOA_END is sent. xfrd handles the reception of these
messages in xfrd_handle_ipc_read (ipc.c). The boolean parent_bad_soa_infos was
added to the xfrd_state struct (xfrd.h) to help with this control flow in
ipc.c.
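The receiving side of this exchange can be sketched as a small state machine. The message names are those used above. Their numeric values, struct soa_rx and soa_rx_handle are hypothetical, and the SOA data carried with each NSD_SOA_INFO is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Message names from the text; the numeric values are hypothetical. */
enum ipc_msg { NSD_SOA_BEGIN, NSD_BAD_SOA_BEGIN, NSD_SOA_INFO, NSD_SOA_END };

/* Hypothetical receiver state, in the spirit of parent_bad_soa_infos. */
struct soa_rx {
	bool in_session;
	bool bad_soa_infos; /* the current session carries bad SOAs */
	size_t good, bad;   /* SOAs received of each kind */
};

/* Returns false for a message that is out of sequence. */
static bool
soa_rx_handle(struct soa_rx *rx, enum ipc_msg msg)
{
	switch (msg) {
	case NSD_SOA_BEGIN:
	case NSD_BAD_SOA_BEGIN:
		rx->in_session = true;
		rx->bad_soa_infos = (msg == NSD_BAD_SOA_BEGIN);
		return true;
	case NSD_SOA_INFO:
		if (!rx->in_session)
			return false;
		if (rx->bad_soa_infos)
			rx->bad++;  /* would be remembered as the zone's bad SOA */
		else
			rx->good++; /* would update the zone's current SOA */
		return true;
	case NSD_SOA_END:
		if (!rx->in_session)
			return false;
		rx->in_session = false;
		rx->bad_soa_infos = false;
		return true;
	}
	return false;
}
```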

xfrd eventually processes the SOAs with the xfrd_handle_incoming_soa function
in xfrd.c, via xfrd_handle_ipc_SOAINFO in ipc.c. xfrd_handle_incoming_soa makes
sure that a received bad SOA is remembered in the xfrd_zone struct. Two new
variables were added to that struct for this purpose: soa_bad and
soa_bad_acquired. Their values are written to and read from the xfrd.state
file by xfrd_write_state_soa and xfrd_read_state, respectively.

In xfrd.c, xfrd_parse_received_xfr_packet was adapted to make sure that known
bad serials are not transferred again unless the transfer is in response to a
notify. Even then, the transfer is only accepted when the SOA matches the one
in the notify (if the notify contained one; otherwise any SOA is accepted).
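The acceptance rule can be written as a single predicate. This is a sketch under stated assumptions. The structs and accept_transfer are hypothetical. Comparing only serial numbers, rather than the complete SOA, is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of what xfrd remembers per zone. */
struct zone_bad_soa {
	bool has_bad;     /* some serial failed verification */
	uint32_t soa_bad; /* serial of the zone that did not verify */
};

/* Hypothetical summary of the notify that triggered the transfer, if any. */
struct notify_info {
	bool received;   /* the transfer is a response to a notify */
	bool has_soa;    /* the notify contained an SOA */
	uint32_t serial; /* its serial, valid if has_soa */
};

/* May a transferred zone with this serial be used? */
static bool
accept_transfer(const struct zone_bad_soa *z, const struct notify_info *n,
    uint32_t serial)
{
	if (!z->has_bad || serial != z->soa_bad)
		return true;  /* not a known bad serial */
	if (!n->received)
		return false; /* known bad, and not prompted by a notify */
	/* Prompted by a notify: accept if it had no SOA, or if the SOA matches. */
	return !n->has_soa || n->serial == serial;
}
```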