.. Copyright (C) Internet Systems Consortium, Inc. ("ISC")
..
.. SPDX-License-Identifier: MPL-2.0
..
.. This Source Code Form is subject to the terms of the Mozilla Public
.. License, v. 2.0.  If a copy of the MPL was not distributed with this
.. file, you can obtain one at https://mozilla.org/MPL/2.0/.
..
.. See the COPYRIGHT file distributed with this work for additional
.. information regarding copyright ownership.

.. _dnssec_troubleshooting:

Basic DNSSEC Troubleshooting
----------------------------

In this chapter, we cover some basic troubleshooting
techniques, some common DNSSEC symptoms, and their causes and solutions. This
is not a comprehensive "how to troubleshoot any DNS or DNSSEC problem"
guide, because that could easily be an entire book by itself.

.. _troubleshooting_query_path:

Query Path
~~~~~~~~~~

The first step in troubleshooting any DNS or DNSSEC issue is to
determine the exact query path, so that the origin of the problem can be
identified.

End clients, such as laptop computers or mobile phones, are configured
to talk to a recursive name server, and the recursive name server may in
turn forward requests to other recursive name servers before they arrive at the
authoritative name server. The giveaway is the presence of the
Authoritative Answer (``aa``) flag in a query response: when present, we know we are talking
to the authoritative server; when missing, we are talking to a recursive
server. The example below shows an answer to a query for
``www.example.com`` without the Authoritative Answer flag:

::

   $ dig @10.53.0.3 www.example.com A

   ; <<>> DiG 9.16.0 <<>> @10.53.0.3 www.example.com a
   ; (1 server found)
   ;; global options: +cmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62714
   ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

   ;; OPT PSEUDOSECTION:
   ; EDNS: version: 0, flags:; udp: 4096
   ; COOKIE: c823fe302625db5b010000005e722b504d81bb01c2227259 (good)
   ;; QUESTION SECTION:
   ;www.example.com.       IN  A

   ;; ANSWER SECTION:
   www.example.com.    60  IN  A   10.1.0.1

   ;; Query time: 3 msec
   ;; SERVER: 10.53.0.3#53(10.53.0.3)
   ;; WHEN: Wed Mar 18 14:08:16 GMT 2020
   ;; MSG SIZE  rcvd: 88

Not only is the ``aa`` flag absent, but the ``ra`` (Recursion
Available) flag is present. This tells us that the
server we are talking to (10.53.0.3 in this example) is a recursive name
server: although we were able to get an answer for
``www.example.com``, we know that the answer came from somewhere else.

If we query the authoritative server directly, we get:

::

   $ dig @10.53.0.2 www.example.com A

   ; <<>> DiG 9.16.0 <<>> @10.53.0.2 www.example.com a
   ; (1 server found)
   ;; global options: +cmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39542
   ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
   ;; WARNING: recursion requested but not available
   ...

The ``aa`` flag tells us that we are now talking to the
authoritative name server for ``www.example.com``, and that this is not a
cached answer it obtained from some other name server; it served this
answer to us right from its own database. In fact,
the Recursion Available (``ra``) flag is not present, which means this
name server is not configured to perform recursion (at least not for
this client), so it could not have queried another name server to get
cached results.
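
When several servers along a query path need checking, the flag test
described above can be scripted. The following is a minimal Python
sketch, not part of BIND; the function names ``parse_flags`` and
``server_role`` are hypothetical, and the sketch assumes that the
``;; flags:`` header line has been copied out of ``dig`` output:

::

   def parse_flags(header_line):
       """Return the set of flags found on a dig ';; flags: ...;' line."""
       prefix = ";; flags:"
       if not header_line.startswith(prefix):
           raise ValueError("not a dig flags line")
       # The flag list ends at the first semicolon after the prefix.
       flag_text = header_line[len(prefix):].split(";", 1)[0]
       return set(flag_text.split())


   def server_role(flags):
       """Classify the responder by the presence of the aa flag."""
       if "aa" in flags:
           return "authoritative"
       # No aa flag: the answer came from somewhere else.
       return "recursive"

Passing the flags line from the first example in this section to
``parse_flags`` yields the set ``qr``, ``rd``, ``ra``, ``ad``, which
``server_role`` classifies as recursive.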

.. _troubleshooting_visible_symptoms:

Visible DNSSEC Validation Symptoms
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After determining the query path, it is necessary to
determine whether the problem is actually related to DNSSEC
validation. You can use the ``+cd`` flag in ``dig`` to disable
validation, as described in
:ref:`how_do_i_know_validation_problem`.

When there is indeed a DNSSEC validation problem, the visible symptoms,
unfortunately, are very limited. With DNSSEC validation enabled, if a
DNS response is not fully validated, it results in a generic
SERVFAIL message, as shown below when querying against a recursive name
server at 10.53.0.3:

::

   $ dig @10.53.0.3 www.example.org. A

   ; <<>> DiG 9.16.0 <<>> @10.53.0.3 www.example.org A
   ; (1 server found)
   ;; global options: +cmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 28947
   ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

   ;; OPT PSEUDOSECTION:
   ; EDNS: version: 0, flags:; udp: 4096
   ; COOKIE: d1301968aca086ad010000005e723a7113603c01916d136b (good)
   ;; QUESTION SECTION:
   ;www.example.org.       IN  A

   ;; Query time: 3 msec
   ;; SERVER: 10.53.0.3#53(10.53.0.3)
   ;; WHEN: Wed Mar 18 15:12:49 GMT 2020
   ;; MSG SIZE  rcvd: 72

With ``delv``, a "resolution failed" message is output instead:

::

   $ delv @10.53.0.3 www.example.org. A +rtrace
   ;; fetch: www.example.org/A
   ;; resolution failed: SERVFAIL

BIND 9 logging features may be useful when trying to identify
DNSSEC errors.
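
The ``+cd`` comparison mentioned at the start of this section reduces to
a simple rule: if a query returns SERVFAIL normally but returns an
answer when sent with ``+cd``, validation is the likely culprit. A
minimal Python sketch of that rule follows; the function names are
hypothetical, and the sketch assumes the two ``->>HEADER<<-`` lines have
been captured from ``dig`` runs without and with ``+cd``:

::

   import re


   def response_status(header_line):
       """Extract the status from a dig '->>HEADER<<-' line, or None."""
       match = re.search(r"status: ([A-Z]+)", header_line)
       return match.group(1) if match else None


   def likely_validation_failure(status_default, status_with_cd):
       """True if a query fails normally but not when sent with +cd."""
       return (
           status_default == "SERVFAIL"
           and status_with_cd is not None
           and status_with_cd != "SERVFAIL"
       )

If both queries return SERVFAIL, the cause is more likely to lie
elsewhere, for example an unreachable authoritative server.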

.. _troubleshooting_logging:

Basic Logging
~~~~~~~~~~~~~

DNSSEC validation error messages show up in ``syslog`` as a
query error by default. Here is an example of what it may look like:

::

   validating www.example.org/A: no valid signature found
   RRSIG failed to verify resolving 'www.example.org/A/IN': 10.53.0.2#53

Usually, this level of error logging is sufficient.
If it is not, :ref:`troubleshooting_logging_debug` describes how
to get more details about why DNSSEC validation may have
failed.
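
The second log line above names the upstream server that supplied the
failing answer, which ties the error back to the query path. As an
illustration only, the following Python sketch extracts those fields
from such a line; the regular expression and the function name
``parse_resolving_line`` are assumptions based on the sample messages in
this chapter, not a description of every message ``named`` can emit:

::

   import re

   # Matches: <reason> resolving '<name>/<type>/<class>': <server>#<port>
   _RESOLVING = re.compile(
       r"(?P<reason>.+?) resolving "
       r"'(?P<name>[^/']+)/(?P<rtype>[^/']+)/(?P<rclass>[^/']+)': "
       r"(?P<server>[^#\s]+)#(?P<port>\d+)$"
   )


   def parse_resolving_line(line):
       """Return a dict of fields from a 'resolving' log line, or None."""
       match = _RESOLVING.search(line.strip())
       return match.groupdict() if match else None

For the sample line, the ``server`` field is ``10.53.0.2``, which is the
server to query directly when investigating further. If the line carries
a ``syslog`` prefix such as ``named[126063]:``, that prefix ends up in
the ``reason`` field.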

.. _troubleshooting_logging_debug:

BIND DNSSEC Debug Logging
~~~~~~~~~~~~~~~~~~~~~~~~~

A word of caution: before you enable debug logging, be aware that this
may dramatically increase the load on your name servers. Enabling debug
logging is thus not recommended for production servers.

With that said, sometimes it may become necessary to temporarily enable
BIND debug logging to see more details of how and whether DNSSEC is
validating. DNSSEC-related messages are not recorded in ``syslog`` by default,
even if query logging is enabled; only DNSSEC errors show up in ``syslog``.

The example below shows how to enable debug level 3 (to see full DNSSEC
validation messages) in BIND 9 and send the output to ``syslog``:

::

   logging {
       channel dnssec_log {
           syslog daemon;
           severity debug 3;
           print-category yes;
       };
       category dnssec { dnssec_log; };
   };

The example below shows how to log DNSSEC messages to their own file
(here, ``/var/log/dnssec.log``):

::

   logging {
       channel dnssec_log {
           file "/var/log/dnssec.log";
           severity debug 3;
       };
       category dnssec { dnssec_log; };
   };

After turning on debug logging and restarting BIND, a large
number of log messages appear in
``syslog`` (or in the log file, if one was configured). The example below
shows the log messages that result from
successfully looking up and validating the domain name ``ftp.isc.org``.

::

   validating ./NS: starting
   validating ./NS: attempting positive response validation
     validating ./DNSKEY: starting
     validating ./DNSKEY: attempting positive response validation
     validating ./DNSKEY: verify rdataset (keyid=20326): success
     validating ./DNSKEY: marking as secure (DS)
   validating ./NS: in validator_callback_dnskey
   validating ./NS: keyset with trust secure
   validating ./NS: resuming validate
   validating ./NS: verify rdataset (keyid=33853): success
   validating ./NS: marking as secure, noqname proof not needed
   validating ftp.isc.org/A: starting
   validating ftp.isc.org/A: attempting positive response validation
   validating isc.org/DNSKEY: starting
   validating isc.org/DNSKEY: attempting positive response validation
     validating isc.org/DS: starting
     validating isc.org/DS: attempting positive response validation
   validating org/DNSKEY: starting
   validating org/DNSKEY: attempting positive response validation
     validating org/DS: starting
     validating org/DS: attempting positive response validation
     validating org/DS: keyset with trust secure
     validating org/DS: verify rdataset (keyid=33853): success
     validating org/DS: marking as secure, noqname proof not needed
   validating org/DNSKEY: in validator_callback_ds
   validating org/DNSKEY: dsset with trust secure
   validating org/DNSKEY: verify rdataset (keyid=9795): success
   validating org/DNSKEY: marking as secure (DS)
     validating isc.org/DS: in fetch_callback_dnskey
     validating isc.org/DS: keyset with trust secure
     validating isc.org/DS: resuming validate
     validating isc.org/DS: verify rdataset (keyid=33209): success
     validating isc.org/DS: marking as secure, noqname proof not needed
   validating isc.org/DNSKEY: in validator_callback_ds
   validating isc.org/DNSKEY: dsset with trust secure
   validating isc.org/DNSKEY: verify rdataset (keyid=7250): success
   validating isc.org/DNSKEY: marking as secure (DS)
   validating ftp.isc.org/A: in fetch_callback_dnskey
   validating ftp.isc.org/A: keyset with trust secure
   validating ftp.isc.org/A: resuming validate
   validating ftp.isc.org/A: verify rdataset (keyid=27566): success
   validating ftp.isc.org/A: marking as secure, noqname proof not needed

Note that these log messages indicate that the chain of trust has been
established and ``ftp.isc.org`` has been successfully validated.
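
Because the debug output is verbose, it can help to reduce it to the
list of RRsets that were marked secure. The following Python sketch
does that for lines in the format shown above; the function name
``secure_rrsets`` is hypothetical, and the pattern is an assumption
derived from this sample rather than a complete grammar of validator
messages:

::

   import re

   # Matches: validating <owner>/<TYPE>: <message>
   _VALIDATING = re.compile(
       r"validating (?P<owner>\S+)/(?P<rtype>[A-Z0-9]+): (?P<message>.*)$"
   )


   def secure_rrsets(log_lines):
       """Return (owner, type) pairs marked secure, in first-seen order."""
       secure = []
       for line in log_lines:
           match = _VALIDATING.search(line)
           if match is None:
               continue
           if match.group("message").startswith("marking as secure"):
               rrset = (match.group("owner"), match.group("rtype"))
               if rrset not in secure:
                   secure.append(rrset)
       return secure

Applied to the log above, the result should end with the pair
``("ftp.isc.org", "A")``, preceded by the DNSKEY and DS RRsets that form
the chain of trust.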

If validation had failed, you would see log messages indicating errors.
We cover some of the most common validation problems in the next section.

.. _troubleshooting_common_problems:

Common Problems
~~~~~~~~~~~~~~~

.. _troubleshooting_security_lameness:

Security Lameness
^^^^^^^^^^^^^^^^^

Similar to lame delegation in traditional DNS, security lameness refers to the
condition when the parent zone holds a set of DS records that point to
something that does not exist in the child zone. As a result,
the entire child zone may "disappear," having been marked as bogus by
validating resolvers.

Below is an example attempting to resolve the A record for a test domain
name ``www.example.net``. From the user's perspective, as described in
:ref:`how_do_i_know_validation_problem`, only a SERVFAIL
message is returned. On the validating resolver, we see the
following messages in ``syslog``:

::

   named[126063]: validating example.net/DNSKEY: no valid signature found (DS)
   named[126063]: no valid RRSIG resolving 'example.net/DNSKEY/IN': 10.53.0.2#53
   named[126063]: broken trust chain resolving 'www.example.net/A/IN': 10.53.0.2#53

This gives us a hint that it is a broken trust chain issue. Let's take a
look at the DS records that are published for the zone (with the keys
shortened for ease of display):

::

   $ dig @10.53.0.3 example.net. DS

   ; <<>> DiG 9.16.0 <<>> @10.53.0.3 example.net DS
   ; (1 server found)
   ;; global options: +cmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59602
   ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

   ;; OPT PSEUDOSECTION:
   ; EDNS: version: 0, flags:; udp: 4096
   ; COOKIE: 7026d8f7c6e77e2a010000005e735d7c9d038d061b2d24da (good)
   ;; QUESTION SECTION:
   ;example.net.           IN  DS

   ;; ANSWER SECTION:
   example.net.        256 IN  DS  14956 8 2 9F3CACD...D3E3A396

   ;; Query time: 0 msec
   ;; SERVER: 10.53.0.3#53(10.53.0.3)
   ;; WHEN: Thu Mar 19 11:54:36 GMT 2020
   ;; MSG SIZE  rcvd: 116

Next, we query for the DNSKEY and RRSIG of ``example.net`` to see if
there is anything wrong. Since we are having trouble validating, we
can use the ``+cd`` option to temporarily disable checking and return
results, even though they do not pass the validation tests. The
``+multiline`` option tells ``dig`` to print the key type, algorithm,
and key id for DNSKEY records. Again,
some long strings are shortened for ease of display:

::

   $ dig @10.53.0.3 example.net. DNSKEY +dnssec +cd +multiline

   ; <<>> DiG 9.16.0 <<>> @10.53.0.3 example.net DNSKEY +cd +multiline +dnssec
   ; (1 server found)
   ;; global options: +cmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42980
   ;; flags: qr rd ra cd; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

   ;; OPT PSEUDOSECTION:
   ; EDNS: version: 0, flags: do; udp: 4096
   ; COOKIE: 4b5e7c88b3680c35010000005e73722057551f9f8be1990e (good)
   ;; QUESTION SECTION:
   ;example.net.       IN DNSKEY

   ;; ANSWER SECTION:
   example.net.        287 IN DNSKEY 256 3 8 (
                   AwEAAbu3NX...ADU/D7xjFFDu+8WRIn
                   ) ; ZSK; alg = RSASHA256 ; key id = 35328
   example.net.        287 IN DNSKEY 257 3 8 (
                   AwEAAbKtU1...PPP4aQZTybk75ZW+uL
                   6OJMAF63NO0s1nAZM2EWAVasbnn/X+J4N2rLuhk=
                   ) ; KSK; alg = RSASHA256 ; key id = 27247
   example.net.        287 IN RRSIG DNSKEY 8 2 300 (
                   20811123173143 20180101000000 27247 example.net.
                   Fz1sjClIoF...YEjzpAWuAj9peQ== )
   example.net.        287 IN RRSIG DNSKEY 8 2 300 (
                   20811123173143 20180101000000 35328 example.net.
                   seKtUeJ4/l...YtDc1rcXTVlWIOw= )

   ;; Query time: 0 msec
   ;; SERVER: 10.53.0.3#53(10.53.0.3)
   ;; WHEN: Thu Mar 19 13:22:40 GMT 2020
   ;; MSG SIZE  rcvd: 962

Here is the problem: the parent zone is telling the world that
``example.net`` is using the key 14956, but the authoritative server
indicates that it is using keys 27247 and 35328. There are several
potential causes for this mismatch: one possibility is that a malicious
attacker has compromised one side and changed the data. A more likely
scenario is that the DNS administrator for the child zone did not upload
the correct key information to the parent zone.
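
The comparison made by eye above can also be automated. The following
Python sketch is illustrative only: it assumes the two ``dig`` outputs
have been saved as text, that the DNSKEY query used ``+multiline`` so
that the ``key id`` comments are present, and that the function names
are free to choose (they are hypothetical):

::

   import re


   def ds_key_tags(dig_ds_output):
       """Collect key tags from DS answer lines in dig output."""
       tags = set()
       for line in dig_ds_output.splitlines():
           line = line.strip()
           fields = line.split()
           # Skip comments such as the question section.
           if line.startswith(";") or "DS" not in fields:
               continue
           index = fields.index("DS")
           if index + 1 < len(fields) and fields[index + 1].isdigit():
               tags.add(int(fields[index + 1]))
       return tags


   def dnskey_key_ids(dig_dnskey_output):
       """Collect key ids from the '; key id = N' comments."""
       return {int(tag) for tag in re.findall(r"key id = (\d+)", dig_dnskey_output)}


   def orphaned_ds_tags(ds_tags, dnskey_ids):
       """Return DS key tags that match no DNSKEY in the child zone."""
       return ds_tags - dnskey_ids

For the outputs shown in this section, the DS tag set contains only
14956 and the DNSKEY id set contains 27247 and 35328, so 14956 is
reported as orphaned.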

.. _troubleshooting_incorrect_time:

Incorrect Time
^^^^^^^^^^^^^^

In DNSSEC, every record comes with at least one RRSIG, and each RRSIG
contains two timestamps: one indicating when it becomes valid, and
one when it expires. If the validating resolver's current system time does
not fall within the two RRSIG timestamps, error messages
appear in the BIND debug log.

The example below shows a log message when the RRSIG appears to have
expired. This could mean the validating resolver system time is
incorrectly set too far in the future, or the zone administrator has not
kept up with RRSIG maintenance.

::

   validating example.com/DNSKEY: verify failed due to bad signature (keyid=19036): RRSIG has expired

The log below shows that the RRSIG validity period has not yet begun. This could mean
the validating resolver's system time is incorrectly set too far in the past, or
the zone administrator has incorrectly generated signatures for this
domain name.

::

   validating example.com/DNSKEY: verify failed due to bad signature (keyid=4521): RRSIG validity period has not begun
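
To tell a clock problem from a signing problem, it can help to compare
the two RRSIG timestamps against a trusted time source. The Python
sketch below performs that comparison; it assumes the timestamps are in
the ``YYYYMMDDHHMMSS`` (UTC) form that ``dig`` prints, and the function
names are hypothetical:

::

   from datetime import datetime, timezone


   def parse_rrsig_time(stamp):
       """Parse a YYYYMMDDHHMMSS timestamp as printed by dig (UTC)."""
       return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


   def rrsig_time_status(inception, expiration, now=None):
       """Classify 'now' against the RRSIG validity window."""
       if now is None:
           now = datetime.now(timezone.utc)
       if now < parse_rrsig_time(inception):
           return "validity period has not begun"
       if now > parse_rrsig_time(expiration):
           return "expired"
       return "valid"

If the result is ``valid`` when a trusted time is passed as ``now``, but
the resolver still logs one of the errors above, the resolver's own
clock is the first thing to check.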

.. _troubleshooting_unable_to_load_keys:

Unable to Load Keys
^^^^^^^^^^^^^^^^^^^

This is a simple yet common issue. If the key files are present but
unreadable by ``named`` for some reason, ``named`` logs clear error
messages to ``syslog``, as shown below:

::

   named[32447]: zone example.com/IN (signed): reconfiguring zone keys
   named[32447]: dns_dnssec_findmatchingkeys: error reading key file Kexample.com.+008+06817.private: permission denied
   named[32447]: dns_dnssec_findmatchingkeys: error reading key file Kexample.com.+008+17694.private: permission denied
   named[32447]: zone example.com/IN (signed): next key event: 27-Nov-2014 20:04:36.521

However, if no keys are found, the error is not as obvious. The example
below shows the ``syslog`` messages after executing ``rndc
reload`` with the key files missing from the key directory:

::

   named[32516]: received control channel command 'reload'
   named[32516]: loading configuration from '/etc/bind/named.conf'
   named[32516]: reading built-in trusted keys from file '/etc/bind/bind.keys'
   named[32516]: using default UDP/IPv4 port range: [1024, 65535]
   named[32516]: using default UDP/IPv6 port range: [1024, 65535]
   named[32516]: sizing zone task pool based on 6 zones
   named[32516]: the working directory is not writable
   named[32516]: reloading configuration succeeded
   named[32516]: reloading zones succeeded
   named[32516]: all zones loaded
   named[32516]: running
   named[32516]: zone example.com/IN (signed): reconfiguring zone keys
   named[32516]: zone example.com/IN (signed): next key event: 27-Nov-2014 20:07:09.292

This happens to look exactly the same as if the keys were present and
readable, and appears to indicate that ``named`` loaded the keys and signed the zone. It
even generates the internal (raw) files:

::

   # cd /etc/bind/db
   # ls
   example.com.db  example.com.db.jbk  example.com.db.signed

If ``named`` really loaded the keys and signed the zone, you should see
the following files:

::

   # cd /etc/bind/db
   # ls
   example.com.db  example.com.db.jbk  example.com.db.signed  example.com.db.signed.jnl

So, unless you see the ``*.signed.jnl`` file, your zone has not been
signed.
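
Both checks in this section (key files readable, journal file present)
are easy to script. The following Python sketch shows one possible
approach; the function names are hypothetical, the key file naming
pattern is taken from the log messages above, and the readability test
reflects the permissions of the user running the script, so it is only
meaningful when run as the same user as ``named``:

::

   import os
   from pathlib import Path


   def private_key_files(key_dir, zone):
       """List K<zone>.+<alg>+<tag>.private files found in key_dir."""
       return sorted(Path(key_dir).glob(f"K{zone}.+*.private"))


   def unreadable_keys(key_dir, zone):
       """Return the private key files the current user cannot read."""
       return [
           path for path in private_key_files(key_dir, zone)
           if not os.access(path, os.R_OK)
       ]


   def has_signed_journal(zone_file):
       """True if <zone_file>.signed.jnl exists next to the zone file."""
       return Path(str(zone_file) + ".signed.jnl").exists()

An empty result from ``private_key_files`` corresponds to the "no keys
found" case, which ``named`` does not report as an error.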

.. _troubleshooting_invalid_trust_anchors:

Invalid Trust Anchors
^^^^^^^^^^^^^^^^^^^^^

In most cases, you never need to explicitly configure trust
anchors. ``named`` supplies the current root trust anchor and,
with the default setting of ``dnssec-validation``, updates it on the
infrequent occasions when it is changed.

However, in some circumstances you may need to explicitly configure
your own trust anchor. As we saw in the :ref:`trust_anchors_description`
section, whenever a DNSKEY is received by the validating resolver, it is
compared to the list of keys the resolver explicitly trusts to see if
further action is needed. If the two keys match, the validating resolver
stops performing further verification and returns the answer(s) as
validated.

But what if the key file on the validating resolver is misconfigured or
missing? Below we show some examples of log messages when things are not
working properly.

First of all, if the key you copied is malformed, BIND does not even
start and you will likely find this error message in ``syslog``:

::

   named[18235]: /etc/bind/named.conf.options:29: bad base64 encoding
   named[18235]: loading configuration: failure

If the key is a valid base64 string but the key algorithm is incorrect,
or if the wrong key is installed, the first thing you will notice is
that virtually all of your DNS lookups result in SERVFAIL, even when
you are looking up domain names that have not been DNSSEC-enabled. The
example below shows a query against the recursive server at 10.53.0.3:

::

   $ dig @10.53.0.3 www.example.org. A +dnssec

   ; <<>> DiG 9.16.0 <<>> @10.53.0.3 www.example.org A +dnssec
   ; (1 server found)
   ;; global options: +cmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 29586
   ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

   ;; OPT PSEUDOSECTION:
   ; EDNS: version: 0, flags: do; udp: 4096
   ; COOKIE: ee078fc321fa1367010000005e73a58bf5f205ca47e04bed (good)
   ;; QUESTION SECTION:
   ;www.example.org.       IN  A

``delv`` shows a similar result:

::

   $ delv @10.53.0.3 www.example.com. +rtrace
   ;; fetch: www.example.com/A
   ;; resolution failed: SERVFAIL

The next symptom you see is in the DNSSEC log messages:

::

   managed-keys-zone: DNSKEY set for zone '.' could not be verified with current keys
   validating ./DNSKEY: starting
   validating ./DNSKEY: attempting positive response validation
   validating ./DNSKEY: no DNSKEY matching DS
   validating ./DNSKEY: no DNSKEY matching DS
   validating ./DNSKEY: no valid signature found (DS)

These errors are indications that there are problems with the trust
anchor.
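
The first failure mode, a malformed key, can be caught before
``named`` is restarted by checking that the copied key text decodes as
base64. The Python sketch below is one way to do that; the function
name is hypothetical, and the check cannot detect the second failure
mode, since a wrong key or wrong algorithm is still valid base64:

::

   import base64
   import binascii


   def is_valid_base64_key(key_text):
       """True if key_text, ignoring whitespace, is well-formed base64."""
       # Key material is often wrapped over several lines; join it first.
       compact = "".join(key_text.split())
       if not compact:
           return False
       try:
           base64.b64decode(compact, validate=True)
       except (binascii.Error, ValueError):
           return False
       return True

A ``False`` result here suggests a copy-and-paste error, such as a
truncated string or a stray character.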

.. _troubleshooting_nta:

Negative Trust Anchors
~~~~~~~~~~~~~~~~~~~~~~

BIND 9.11 introduced Negative Trust Anchors (NTAs) as a means to
*temporarily* disable DNSSEC validation for a zone when you know that
the zone's DNSSEC is misconfigured.

NTAs are added using the ``rndc`` command, e.g.:

::

   $ rndc nta example.com
   Negative trust anchor added: example.com/_default, expires 19-Mar-2020 19:57:42.000

The list of currently configured NTAs can also be examined using
``rndc``, e.g.:

::

   $ rndc nta -dump
   example.com/_default: expiry 19-Mar-2020 19:57:42.000

The default lifetime of an NTA is one hour. By default, BIND also
polls the zone every five minutes to see whether it validates
correctly; once it does, the NTA expires automatically. Both the
default lifetime and the polling interval may be configured via
``named.conf``, and the lifetime can be overridden on a per-zone basis
using the ``-lifetime duration`` parameter to ``rndc nta``. Both timer
values have a permitted maximum value of one week.
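
When NTAs are added from a script, the one-week limit can be enforced
before ``rndc`` is called. The following Python sketch builds the
argument list for such a call; the function name is hypothetical, and
the sketch assumes that the lifetime is passed to ``-lifetime`` as a
plain number of seconds:

::

   ONE_WEEK_SECONDS = 7 * 24 * 60 * 60


   def rndc_nta_command(domain, lifetime_seconds=None):
       """Build the argv list for 'rndc nta', checking the lifetime cap."""
       argv = ["rndc", "nta"]
       if lifetime_seconds is not None:
           if not 0 < lifetime_seconds <= ONE_WEEK_SECONDS:
               raise ValueError("NTA lifetime must be between 1 second and one week")
           argv += ["-lifetime", str(lifetime_seconds)]
       argv.append(domain)
       return argv

The resulting list can be handed to ``subprocess.run`` on a host where
``rndc`` is configured to control the resolver.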

.. _troubleshooting_nsec3:

NSEC3 Troubleshooting
~~~~~~~~~~~~~~~~~~~~~

BIND includes a tool called ``nsec3hash`` that runs through the same
steps as a validating resolver, to generate the correct hashed name
based on NSEC3PARAM parameters. The command takes the following
parameters in order: salt, algorithm, iterations, and domain. For
example, if the salt is 1234567890ABCEDF, the hash algorithm is 1, and
the iteration count is 10, to get the NSEC3-hashed name for
``www.example.com`` we would execute a command like this:

::

   $ nsec3hash 1234567890ABCEDF 1 10 www.example.com
   RN7I9ME6E1I6BDKIP91B9TCE4FHJ7LKF (salt=1234567890ABCEDF, hash=1, iterations=10)

While it is unlikely you would construct a rainbow table of your own
zone data, this tool may be useful when troubleshooting NSEC3 problems.
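
To make the computation itself concrete, the hash can be reproduced in a
few lines. The sketch below is a minimal Python illustration of the
NSEC3 hash as specified in RFC 5155 for hash algorithm 1 (SHA-1); it is
not the ``nsec3hash`` source code, and the function names are
hypothetical:

::

   import base64
   import hashlib

   # Map the standard base32 alphabet onto the base32hex alphabet.
   _TO_BASE32HEX = str.maketrans(
       "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
       "0123456789ABCDEFGHIJKLMNOPQRSTUV",
   )


   def to_wire(name):
       """Encode a domain name in lowercase DNS wire format."""
       wire = b""
       for label in name.lower().rstrip(".").split("."):
           if label:
               raw = label.encode("ascii")
               wire += bytes([len(raw)]) + raw
       return wire + b"\x00"


   def nsec3_hash(name, salt_hex, iterations):
       """Return the base32hex NSEC3 hash of name (SHA-1 only)."""
       salt = b"" if salt_hex == "-" else bytes.fromhex(salt_hex)
       digest = hashlib.sha1(to_wire(name) + salt).digest()
       # Each additional iteration rehashes the digest with the salt.
       for _ in range(iterations):
           digest = hashlib.sha1(digest + salt).digest()
       return base64.b32encode(digest).decode("ascii").translate(_TO_BASE32HEX)

With the parameters from RFC 5155 Appendix A (salt ``aabbccdd``, 12
iterations), the name ``example`` hashes to
``0P9MHAVEQVM6T7VBL5LOP2U3T2RP3TOM``. Calling
``nsec3_hash("www.example.com", "1234567890ABCEDF", 10)`` is intended to
correspond to the ``nsec3hash`` invocation shown above.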