This file gives a quick overview of all the modifications that have been made
for zone verification.


Configuring the verifier
========================

Configuration options (nsd.conf) were added. In the new "verify:" clause:
	enable:
	port:
	ip-address:
	verify-zones:
	verifier:
	verifier-count:
	verifier-feed-zone:
	verifier-timeout:

And for the "zone:" and "pattern:" clauses:
	verify-zone:
	verifier:
	verifier-feed-zone:
	verifier-timeout:

To parse the syntax for those options, configlexer.lex and configparser.y are
modified. To hold those configuration values, the structs nsd_options and
pattern_options in the file options.h are extended.

The type of pattern_options::verifier, char**, is the argument vector form
that can be used by the execve family of functions. The helper type "struct
component" is defined to help parse a command with arguments. A zone_verifier
is a list of STRING tokens. From those strings a stack of components is
constructed, which is eventually converted to an argument vector in
configparser.y.
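
As a rough illustration of that conversion, the sketch below pushes tokens
onto a singly linked stack and then turns the stack into a NULL-terminated
vector for execvp. The layout of struct component and the helper names
component_push and components_to_argv are hypothetical; NSD's own definitions
in configparser.y may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for "struct component": one STRING token of a
 * verifier command, pushed onto a stack while parsing. */
struct component {
	struct component *next; /* the token parsed before this one */
	char *str;
};

/* Push a copy of a token; the newest token ends up on top of the stack. */
static struct component *
component_push(struct component *top, const char *token)
{
	struct component *c = malloc(sizeof(*c));
	assert(c != NULL);
	c->str = strdup(token);
	assert(c->str != NULL);
	c->next = top;
	return c;
}

/* Convert the stack into a NULL-terminated argv suitable for execvp().
 * The stack holds the tokens in reverse order, so argv is filled from the
 * end. Ownership of the strings moves to argv; the stack nodes are freed. */
static char **
components_to_argv(struct component *top)
{
	size_t n = 0;
	for (struct component *c = top; c != NULL; c = c->next)
		n++;
	char **argv = calloc(n + 1, sizeof(*argv)); /* argv[n] stays NULL */
	assert(argv != NULL);
	while (top != NULL) {
		struct component *next = top->next;
		argv[--n] = top->str;
		free(top);
		top = next;
	}
	return argv;
}
```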


Difffile modifications
======================

It is possible that updates for multiple different zones are read during a
single reload. If some of them should be loaded (because they verified or did
not need to be verified) and some should not, there is a problem: the database
is updated with all the updates (including the bad ones), and we cannot easily
undo only the bad updates selectively.

To resolve this situation, the committed field of each transfer is used.
Initially it is assigned the value DIFF_NOT_COMMITTED (0). When an update is
verified, the value is changed to DIFF_COMMITTED (1), DIFF_CORRUPT (2) or
DIFF_INCONSISTENT (4), depending on whether the update was applied and
verified successfully. When a reload results in one or more zones being
corrupt or inconsistent, the newly forked server quits with exit status
NSD_RELOAD_FAILED and the parent server initiates a new reload. It is then
clear which updates should be merged with the database: those whose committed
field is neither DIFF_CORRUPT nor DIFF_INCONSISTENT.
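
A minimal sketch of the merge rule, using the constant values listed above.
The enum form and the helper names diff_should_merge and
reload_has_bad_updates are assumptions for illustration, not NSD's
definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Values as listed above; declaring them as an enum is an assumption. */
enum diff_committed {
	DIFF_NOT_COMMITTED = 0,
	DIFF_COMMITTED = 1,
	DIFF_CORRUPT = 2,
	DIFF_INCONSISTENT = 4
};

/* Should an update be merged on the next reload? Everything except
 * corrupt or inconsistent updates is merged. */
static int
diff_should_merge(int committed)
{
	return committed != DIFF_CORRUPT && committed != DIFF_INCONSISTENT;
}

/* Did the reload mark any update as bad? If so, the child reload server
 * would exit with NSD_RELOAD_FAILED. */
static int
reload_has_bad_updates(const int *committed, size_t n)
{
	assert(committed != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!diff_should_merge(committed[i]))
			return 1;
	return 0;
}
```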

	Handling of the NSD_RELOAD_FAILED exit status of a child reload server
	is in server_main (server.c).

To allow updates to be applied again on failure, xfrd has been changed to keep
all updates for each zone around until a reload succeeds. The set of updates
is fixed once a reload has been initiated, to avoid a potentially infinite
loop. While a reload is in progress, xfrd will accept and transfer updates,
but does not schedule them until the reload finishes. As a result, xfrd,
rather than the server, manages the updates stored on disk. Previously the
server simply removed each update during the reload process regardless of the
result, potentially resulting in the same transfer being tried multiple times
if the set of updates contained a bad update.
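
The bookkeeping can be pictured as a per-zone queue whose prefix is frozen
when a reload starts. This is a hypothetical model (struct update_queue, a
fixed capacity, serials standing in for the stored updates), not xfrd's
actual data structures.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_UPDATES 16 /* hypothetical fixed capacity for the sketch */

/* Hypothetical per-zone bookkeeping of updates kept on disk by xfrd. */
struct update_queue {
	unsigned int serial[MAX_UPDATES]; /* stored updates, oldest first */
	size_t count;                     /* updates stored */
	size_t in_reload;                 /* updates fixed for the running reload */
};

/* A transfer finished: store it. If a reload is running, the update is
 * only queued; it does not join the fixed set of that reload. */
static int
queue_add(struct update_queue *q, unsigned int serial)
{
	if (q->count == MAX_UPDATES)
		return 0;
	q->serial[q->count++] = serial;
	return 1;
}

/* Start a reload: fix the set of updates it will apply. */
static size_t
queue_start_reload(struct update_queue *q)
{
	assert(q->in_reload == 0);
	q->in_reload = q->count;
	return q->in_reload;
}

/* Reload finished. Only on success are the applied updates removed;
 * on failure everything is kept so it can be applied again. */
static void
queue_reload_done(struct update_queue *q, int success)
{
	if (success) {
		size_t rest = q->count - q->in_reload;
		for (size_t i = 0; i < rest; i++)
			q->serial[i] = q->serial[q->in_reload + i];
		q->count = rest;
	}
	q->in_reload = 0;
}
```

Updates that arrive during a reload land behind the frozen prefix, so a
failed reload can be retried with exactly the same set of updates.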


Running verifiers
=================

In server_reload (in server.c) the function server_verify is called just after
all updates are merged into the (in-memory) database, but just before the new
database is served. server_verify sets up a temporary event loop and calls
verify_zone repeatedly to run the verifiers and mark each updated zone.
server_reload then inspects the update status of each zone, communicates the
number of good and bad zones in the update, and decides how to continue based
on those numbers, as described above.
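
A sketch of that final decision, assuming each updated zone ends up with one
of three hypothetical status values. Any bad zone makes the reload fail,
matching the NSD_RELOAD_FAILED path described in the previous section; the
enum and reload_decide names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-zone verification outcome after server_verify. */
enum zone_status { ZONE_UNCHANGED, ZONE_VERIFIED, ZONE_BAD };

enum reload_action { RELOAD_SERVE, RELOAD_FAIL };

/* Count good and bad updated zones and decide how to continue: any bad
 * zone makes the reload fail. */
static enum reload_action
reload_decide(const enum zone_status *zones, size_t n,
    size_t *good, size_t *bad)
{
	assert(good != NULL && bad != NULL);
	*good = *bad = 0;
	for (size_t i = 0; i < n; i++) {
		if (zones[i] == ZONE_VERIFIED)
			(*good)++;
		else if (zones[i] == ZONE_BAD)
			(*bad)++;
	}
	return *bad > 0 ? RELOAD_FAIL : RELOAD_SERVE;
}
```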

verify_zone is defined in verify.c (and verify.h). The function creates the
necessary pipes, starts the verifier, and then sets up the required events and
registers them with the event loop.

The state of each verifier is maintained in an array of struct verifier. The
size of the array is given by "verifier-count:". Each verifier that runs
simultaneously is assigned a slot. When no free slot is available, the server
waits until a running verifier has finished (or timed out), which frees a slot
for the next verifier to run alongside the ones already running. The default
setting is to run just one verifier at a time, which will probably be fine in
most situations.
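
The slot handling could look roughly like this. struct verifier_slot, struct
verifier_pool and the pool_* helpers are hypothetical simplifications of
struct verifier, with a pid of 0 marking a free slot.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical subset of the per-verifier state; pid 0 marks a free slot. */
struct verifier_slot {
	pid_t pid;
	const char *zone;
};

struct verifier_pool {
	struct verifier_slot *slots;
	size_t count;               /* value of "verifier-count:" */
};

static int
pool_init(struct verifier_pool *p, size_t verifier_count)
{
	assert(verifier_count > 0);
	p->slots = calloc(verifier_count, sizeof(*p->slots));
	p->count = verifier_count;
	return p->slots != NULL;
}

/* Return a free slot, or NULL when every slot is busy; the caller then
 * has to wait until a running verifier exits or times out. */
static struct verifier_slot *
pool_acquire(struct verifier_pool *p)
{
	for (size_t i = 0; i < p->count; i++)
		if (p->slots[i].pid == 0)
			return &p->slots[i];
	return NULL;
}

static void
pool_release(struct verifier_slot *s)
{
	s->pid = 0;
	s->zone = NULL;
}
```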

Once all verifiers have finished (or timed out), the event loop is exited and
server_reload communicates the status of each updated zone.


Environment variables for the verifiers
=======================================

Environment variables tell a verifier how a zone can be verified. The
addresses and ports on which a verifier may query the zone under assessment
are known at startup; they are set just after the configuration is read and
the sockets are set up in nsd.c, by calling setup_verifier_environment (also
in nsd.c).

Verifiers are spawned (via verify_zone) with popen3. verify_zone sets the
zone-specific environment variables (VERIFY_ZONE and VERIFY_ZONE_ON_STDIN) just
before it executes the verifier with execvp. Server sockets are automatically
closed when the verifier is executed.
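
A simplified stand-in for this step, assuming a plain fork and execvp: it sets
the two zone-specific variables in the child only, so the server's own
environment is untouched. It leaves out the stdin/stdout/stderr pipes that
popen3 creates, the helper names are hypothetical, and the "yes"/"no" values
for VERIFY_ZONE_ON_STDIN are an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical simplified spawn: fork, set the zone-specific variables in
 * the child only, then execvp the verifier. Returns the child pid or -1.
 * The "yes"/"no" values for VERIFY_ZONE_ON_STDIN are an assumption. */
static pid_t
spawn_verifier(char *const argv[], const char *zone, int feed_zone)
{
	pid_t pid = fork();
	if (pid != 0)
		return pid;             /* parent, or -1 on error */
	if (setenv("VERIFY_ZONE", zone, 1) == -1 ||
	    setenv("VERIFY_ZONE_ON_STDIN", feed_zone ? "yes" : "no", 1) == -1)
		_exit(127);
	execvp(argv[0], argv);
	_exit(127);                     /* only reached if execvp failed */
}

/* Wait for the verifier; return its exit status, or -1 if it was killed. */
static int
wait_verifier(pid_t pid)
{
	int status;
	assert(pid > 0);
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```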


Logging a verifier's standard output and error streams
======================================================

Everything a verifier writes to stdout and stderr is logged in the nsd log
file. Handlers with handle_log_from_fd (verify.c) as the callback are set up
by server_verifiers_add. The log_from_fd_t struct is the user_data for the
handler. Besides the priority and the file descriptor, it contains variables
that handle_log_from_fd uses to make sure logged lines never exceed
LOGLINELEN in length and are split into parts if necessary.
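
The splitting rule can be sketched as follows. LOGLINELEN is set to a
deliberately small, hypothetical value, and the sketch handles one buffer at a
time, whereas log_from_fd_t also has to carry a partial line over from one
read to the next. log_split, log_sink and the collecting sink are
illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOGLINELEN 16 /* hypothetical small value to make splitting visible */

/* Hypothetical sink: in NSD this would be a log call with a priority. */
typedef void (*log_sink)(const char *line, void *arg);

/* Emit each newline-terminated line of buf, splitting lines longer than
 * LOGLINELEN - 1 characters into several parts (one byte is reserved for
 * the terminating NUL). */
static void
log_split(const char *buf, size_t len, log_sink sink, void *arg)
{
	char part[LOGLINELEN];
	size_t used = 0;
	assert(sink != NULL);
	for (size_t i = 0; i < len; i++) {
		if (buf[i] == '\n' || used == LOGLINELEN - 1) {
			part[used] = '\0';
			sink(part, arg);
			used = 0;
			if (buf[i] == '\n')
				continue;       /* the newline itself is not logged */
		}
		part[used++] = buf[i];
	}
	if (used > 0) {                 /* trailing text without a newline */
		part[used] = '\0';
		sink(part, arg);
	}
}

/* Example sink for demonstration: collects lines into a small array. */
struct line_list {
	char line[8][LOGLINELEN];
	int n;
};

static void
collect_line(const char *line, void *arg)
{
	struct line_list *l = arg;
	if (l->n < 8)
		strcpy(l->line[l->n++], line);
}
```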

Note that in practice error messages tend to be logged before messages on
standard output, because stdout is fully buffered when it is a pipe while
stderr is unbuffered. It may be more convenient for a verifier to make its
stdout unbuffered too.
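
For a verifier written in C, one way to follow that suggestion is to turn off
stdout buffering before printing anything. verifier_unbuffer_stdout is a
hypothetical helper name; setvbuf with _IONBF is standard C.

```c
#include <assert.h>
#include <stdio.h>

/* Call once at the start of a verifier, before any output is written.
 * With stdout unbuffered, each message is written to the pipe as soon as
 * it is printed instead of when the buffer fills or the program exits.
 * Returns 0 on success, like setvbuf itself. */
static int
verifier_unbuffer_stdout(void)
{
	return setvbuf(stdout, NULL, _IONBF, 0);
}
```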


Feeding a zone to a verifier
============================

The complete zone may be fed to the standard input of a verifier when the
"verifier-feed-zone:" configuration option has the value "yes" (the default).
For this purpose the verify_handle_feed (verify.c) handler is called when the
standard input file descriptor of the verifier is writable. The function uses
zone_rr_iter_next (verify.c) to get the next RR to write to the verifier. The
verifier_zone_feed struct is used to maintain the state (the file handle, the
RR pretty-printing state and the zone iterator).
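
A sketch of a writable-event handler in the same spirit, assuming the zone is
already available as an array of presentation-format RR strings (a stand-in
for zone_rr_iter_next and the pretty-printing state). It remembers the offset
of a partly written RR, so a full pipe simply means returning and waiting for
the next event. struct zone_feed and feed_writable are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical feed state: an array of RR lines stands in for the zone
 * iterator, plus the offset into the RR that was only partly written. */
struct zone_feed {
	const char *const *rrs;   /* NULL-terminated, no trailing newlines */
	size_t next;              /* index of the RR being written */
	size_t off;               /* bytes of rrs[next] plus '\n' written */
};

/* Called when fd is writable. Writes as much as the pipe accepts and
 * returns 1 when the whole zone has been written, 0 if more remains
 * (pipe full), -1 on error. */
static int
feed_writable(int fd, struct zone_feed *f)
{
	assert(f != NULL && f->rrs != NULL);
	while (f->rrs[f->next] != NULL) {
		const char *rr = f->rrs[f->next];
		size_t len = strlen(rr);
		ssize_t n;
		if (f->off < len)
			n = write(fd, rr + f->off, len - f->off);
		else
			n = write(fd, "\n", 1);     /* terminate the RR line */
		if (n == -1) {
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				return 0;   /* wait for the next writable event */
			if (errno == EINTR)
				continue;
			return -1;
		}
		f->off += (size_t)n;
		if (f->off == len + 1) {    /* RR and newline done */
			f->next++;
			f->off = 0;
		}
	}
	return 1;
}
```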


Serving a zone to a verifier
============================

The nsd struct (in nsd.h) is extended with two arrays of nsd_socket structs,
verify_tcp and verify_udp, and a size_t verify_ifs, which holds the number of
sockets for verifying. This mirrors the tcp, udp and ifs members that are used
for normal serving. Several parts of the code that operate on the tcp and udp
arrays are simply reused with the verify_tcp and verify_udp arrays.

Furthermore, wherever server.c previously called server_close_all_sockets
(server.c) for the normal server sockets, the function is now also called for
the verify sockets. In server_start_xfrd, the sockets for verifiers are also
closed in the xfrd child process, because it has no need for them.
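
The socket handling can be illustrated with reduced, hypothetical versions of
the structures: one close helper, applied to both the serving arrays and the
verify arrays. struct nsd_sockets, close_all_sockets and close_everything are
illustrative names, and the real nsd_socket has more members.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical minimal nsd_socket. */
struct nsd_socket {
	int s;                          /* descriptor, -1 when closed */
};

/* Hypothetical subset of struct nsd: serving and verifying sockets side
 * by side, each set with its own count. */
struct nsd_sockets {
	struct nsd_socket *udp, *tcp;
	size_t ifs;
	struct nsd_socket *verify_udp, *verify_tcp;
	size_t verify_ifs;
};

/* Same idea as server_close_all_sockets: close n sockets of one array. */
static void
close_all_sockets(struct nsd_socket *sockets, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (sockets[i].s != -1) {
			close(sockets[i].s);
			sockets[i].s = -1;
		}
	}
}

/* Wherever the serving sockets are closed, close the verify sockets too. */
static void
close_everything(struct nsd_sockets *nsd)
{
	assert(nsd != NULL);
	close_all_sockets(nsd->udp, nsd->ifs);
	close_all_sockets(nsd->tcp, nsd->ifs);
	close_all_sockets(nsd->verify_udp, nsd->verify_ifs);
	close_all_sockets(nsd->verify_tcp, nsd->verify_ifs);
}
```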


Verifier timeouts
=================

A handler for timeouts (as configured with the "verifier-timeout:" option) is
added by server_verifiers_add at verifier initialization time. The callback is
handle_verifier_timeout (verify.c) and the verifier_state_type for the
verifier is used as user_data.

verify_handle_timeout simply kills the verifier (by sending SIGTERM) and does
not clean up the verifier state for reuse. That is done in verify_handle_exit,
which is triggered once the verifier exits, because that handler also takes
care of starting further verifiers in the freed slot.
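
A sketch of that split between the two handlers, with hypothetical names
(struct verifier_slot, on_verifier_timeout, on_verifier_exit). It assumes, as
an illustration, that a zone counts as verified only if the verifier exits
with status 0 and did not time out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical slot state: pid 0 means the slot is free. */
struct verifier_slot {
	pid_t pid;
	int timed_out;
};

/* Timeout handler: only send SIGTERM; the slot stays in use until the
 * process has actually exited. */
static void
on_verifier_timeout(struct verifier_slot *v)
{
	assert(v->pid > 0);
	v->timed_out = 1;
	kill(v->pid, SIGTERM);
}

/* Exit handler: reap the process, free the slot for the next verifier and
 * report whether the zone verified (exit status 0 and no timeout). */
static int
on_verifier_exit(struct verifier_slot *v)
{
	int status, ok;
	if (waitpid(v->pid, &status, 0) == -1)
		return 0;
	ok = !v->timed_out && WIFEXITED(status) && WEXITSTATUS(status) == 0;
	v->pid = 0;
	v->timed_out = 0;
	return ok;
}
```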


Aborting the reload process (and killing all running verifiers)
===============================================================

A reload might take some time, especially with a verifier. Meanwhile the
parent server process could be asked to quit. If that happens while it has a
child reload server process, it sends the NSD_QUIT command over the
communication channel. verify_handle_command, which is registered when the
temporary event loop is created, is then triggered and sends a SIGTERM signal
to each of the verifiers.
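
Sending the signal to all running verifiers amounts to a loop over the slots.
The pid array and kill_all_verifiers are hypothetical; reaping the terminated
processes is left to the normal exit handling.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send SIGTERM to every running verifier (pid > 0 marks a busy slot) and
 * return how many were signalled. The processes are not reaped here. */
static size_t
kill_all_verifiers(const pid_t *pids, size_t count)
{
	size_t signalled = 0;
	assert(pids != NULL || count == 0);
	for (size_t i = 0; i < count; i++)
		if (pids[i] > 0 && kill(pids[i], SIGTERM) == 0)
			signalled++;
	return signalled;
}
```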


Refreshing and expiring zones
=============================

When the SOA refresh timer runs out, xfrd tries to fetch a fresh zone from
the master server. If that fails, it tries again every SOA retry interval. To
prevent a bad zone from being verified again and again, xfrd remembers the
serial number of the last zone that did not verify, and it will not try to
transfer a zone with that bad serial number again.

Previously, after reloading, the reload process informed xfrd which SOAs were
merged into the database, so that xfrd knew when a zone needed to be
refreshed. This has been adapted to also inform xfrd about bad zones. The
function inform_xfrd_new_soas in server.c is called for this. It communicates
either good or bad SOAs. When bad SOAs are communicated, a session starts with
NSD_BAD_SOA_BEGIN; for only good zones it starts with NSD_SOA_BEGIN. Each SOA
is preceded by an NSD_SOA_INFO. When all SOAs have been communicated,
NSD_SOA_END is sent. Reception of these messages by xfrd is handled by the
function xfrd_handle_ipc_read in ipc.c. The boolean parent_bad_soa_infos is
added to the xfrd_state struct (in xfrd.h) to help with this control flow in
the IPC handling.
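
The receiving side can be modelled as a small state machine. The message
names are NSD's, but their numeric values, struct soa_rx and soa_rx_handle
are hypothetical; counting SOAs stands in for what xfrd actually does with
them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical numeric values; only the names come from NSD. */
enum soa_msg { NSD_SOA_BEGIN, NSD_BAD_SOA_BEGIN, NSD_SOA_INFO, NSD_SOA_END };

/* Hypothetical receiver state mirroring xfrd_state::parent_bad_soa_infos. */
struct soa_rx {
	int parent_bad_soa_infos;  /* current session carries bad SOAs */
	int in_session;
	size_t good, bad;          /* SOAs received of each kind */
};

/* Process one message; returns 0 on a protocol error. */
static int
soa_rx_handle(struct soa_rx *rx, enum soa_msg msg)
{
	assert(rx != NULL);
	switch (msg) {
	case NSD_SOA_BEGIN:
	case NSD_BAD_SOA_BEGIN:
		rx->parent_bad_soa_infos = (msg == NSD_BAD_SOA_BEGIN);
		rx->in_session = 1;
		return 1;
	case NSD_SOA_INFO:
		if (!rx->in_session)
			return 0;
		if (rx->parent_bad_soa_infos)
			rx->bad++;      /* to be remembered as a bad SOA */
		else
			rx->good++;     /* merged SOA: refresh timing applies */
		return 1;
	case NSD_SOA_END:
		if (!rx->in_session)
			return 0;
		rx->in_session = 0;
		rx->parent_bad_soa_infos = 0;
		return 1;
	}
	return 0;
}
```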

The SOAs are eventually processed by xfrd, via xfrd_handle_ipc_SOAINFO in
ipc.c, with the xfrd_handle_incoming_soa function in xfrd.c. That function
makes sure that if a bad SOA was received, it is remembered in the xfrd_zone
struct. Two new variables are added to this struct for that purpose: soa_bad
and soa_bad_acquired. Their values are written to and read from the
xfrd.state file with the functions xfrd_write_state_soa and xfrd_read_state,
respectively.

In xfrd.c the function xfrd_parse_received_xfr_packet is adapted to make sure
that known bad serials are not transferred again, unless the transfer is a
response to a notify. Even then, the zone is only transferred when the SOA
matches the one in the notify (if the notify contained one; otherwise any SOA
is accepted).