.. Copyright (C) Internet Systems Consortium, Inc. ("ISC")
..
.. SPDX-License-Identifier: MPL-2.0
..
.. This Source Code Form is subject to the terms of the Mozilla Public
.. License, v. 2.0.  If a copy of the MPL was not distributed with this
.. file, you can obtain one at https://mozilla.org/MPL/2.0/.
..
.. See the COPYRIGHT file distributed with this work for additional
.. information regarding copyright ownership.

.. _dnssec_troubleshooting:

Basic DNSSEC Troubleshooting
----------------------------

In this chapter, we cover some basic troubleshooting
techniques, some common DNSSEC symptoms, and their causes and solutions. This
is not a comprehensive "how to troubleshoot any DNS or DNSSEC problem"
guide, because that could easily be an entire book by itself.

.. _troubleshooting_query_path:

Query Path
~~~~~~~~~~

The first step in troubleshooting any DNS or DNSSEC issue is to
determine the exact query path, which helps identify the
origin of the problem.

End clients, such as laptop computers or mobile phones, are configured
to talk to a recursive name server, and the recursive name server may in
turn forward requests on to other recursive name servers before the query
arrives at the authoritative name server. To tell which kind of server
answered, look for the Authoritative Answer (``aa``) flag in the query
response: when present, we know we are talking
to the authoritative server; when missing, we are talking to a recursive
server.
The example below shows an answer to a query for
``www.example.com`` without the Authoritative Answer flag:

::

   $ dig @10.53.0.3 www.example.com A

   ; <<>> DiG 9.16.0 <<>> @10.53.0.3 www.example.com a
   ; (1 server found)
   ;; global options: +cmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62714
   ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

   ;; OPT PSEUDOSECTION:
   ; EDNS: version: 0, flags:; udp: 4096
   ; COOKIE: c823fe302625db5b010000005e722b504d81bb01c2227259 (good)
   ;; QUESTION SECTION:
   ;www.example.com.       IN  A

   ;; ANSWER SECTION:
   www.example.com.    60  IN  A   10.1.0.1

   ;; Query time: 3 msec
   ;; SERVER: 10.53.0.3#53(10.53.0.3)
   ;; WHEN: Wed Mar 18 14:08:16 GMT 2020
   ;; MSG SIZE  rcvd: 88

Not only do we not see the ``aa`` flag, we see an ``ra``
flag, which indicates Recursion Available. This indicates that the
server we are talking to (10.53.0.3 in this example) is a recursive name
server: although we were able to get an answer for
``www.example.com``, we know that the answer came from somewhere else.

If we query the authoritative server directly, we get:

::

   $ dig @10.53.0.2 www.example.com A

   ; <<>> DiG 9.16.0 <<>> @10.53.0.2 www.example.com a
   ; (1 server found)
   ;; global options: +cmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39542
   ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
   ;; WARNING: recursion requested but not available
   ...

The ``aa`` flag tells us that we are now talking to the
authoritative name server for ``www.example.com``, and that this is not a
cached answer it obtained from some other name server; it served this
answer to us right from its own database.
In fact,
the Recursion Available (``ra``) flag is not present, which means this
name server is not configured to perform recursion (at least not for
this client), so it could not have queried another name server to get
cached results.

.. _troubleshooting_visible_symptoms:

Visible DNSSEC Validation Symptoms
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After determining the query path, it is necessary to
determine whether the problem is actually related to DNSSEC
validation. You can use the ``+cd`` flag in ``dig`` to disable
validation, as described in
:ref:`how_do_i_know_validation_problem`.

When there is indeed a DNSSEC validation problem, the visible symptoms,
unfortunately, are very limited. With DNSSEC validation enabled, if a
DNS response is not fully validated, it results in a generic
SERVFAIL message, as shown below when querying against a recursive name
server at 10.53.0.3:

::

   $ dig @10.53.0.3 www.example.org. A

   ; <<>> DiG 9.16.0 <<>> @10.53.0.3 www.example.org A
   ; (1 server found)
   ;; global options: +cmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 28947
   ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

   ;; OPT PSEUDOSECTION:
   ; EDNS: version: 0, flags:; udp: 4096
   ; COOKIE: d1301968aca086ad010000005e723a7113603c01916d136b (good)
   ;; QUESTION SECTION:
   ;www.example.org.       IN  A

   ;; Query time: 3 msec
   ;; SERVER: 10.53.0.3#53(10.53.0.3)
   ;; WHEN: Wed Mar 18 15:12:49 GMT 2020
   ;; MSG SIZE  rcvd: 72

With ``delv``, a "resolution failed" message is output instead:

::

   $ delv @10.53.0.3 www.example.org. A +rtrace
   ;; fetch: www.example.org/A
   ;; resolution failed: SERVFAIL

BIND 9 logging features may be useful when trying to identify
DNSSEC errors.

.. _troubleshooting_logging:

Basic Logging
~~~~~~~~~~~~~

DNSSEC validation error messages show up in ``syslog`` as a
query error by default. Here is an example of what it may look like:

::

   validating www.example.org/A: no valid signature found
   RRSIG failed to verify resolving 'www.example.org/A/IN': 10.53.0.2#53

Usually, this level of error logging is sufficient.
:ref:`troubleshooting_logging_debug` describes how
to get more details about why DNSSEC validation may have
failed.

.. _troubleshooting_logging_debug:

BIND DNSSEC Debug Logging
~~~~~~~~~~~~~~~~~~~~~~~~~

A word of caution: before you enable debug logging, be aware that this
may dramatically increase the load on your name servers. Enabling debug
logging is thus not recommended for production servers.

With that said, sometimes it may become necessary to temporarily enable
BIND debug logging to see more details of how and whether DNSSEC is
validating. DNSSEC-related messages are not recorded in ``syslog`` by default,
even if query logging is enabled; only DNSSEC errors show up in ``syslog``.

The example below shows how to enable debug level 3 (to see full DNSSEC
validation messages) in BIND 9 and have it sent to ``syslog``:

::

   logging {
      channel dnssec_log {
           syslog daemon;
           severity debug 3;
           print-category yes;
      };
      category dnssec { dnssec_log; };
   };

The example below shows how to log DNSSEC messages to their own file
(here, ``/var/log/dnssec.log``):

::

   logging {
       channel dnssec_log {
           file "/var/log/dnssec.log";
           severity debug 3;
       };
       category dnssec { dnssec_log; };
   };

After turning on debug logging and restarting BIND, a large
number of log messages appear in
``syslog``.
The example below shows the log messages as a result of
successfully looking up and validating the domain name ``ftp.isc.org``.

::

   validating ./NS: starting
   validating ./NS: attempting positive response validation
     validating ./DNSKEY: starting
     validating ./DNSKEY: attempting positive response validation
     validating ./DNSKEY: verify rdataset (keyid=20326): success
     validating ./DNSKEY: marking as secure (DS)
   validating ./NS: in validator_callback_dnskey
   validating ./NS: keyset with trust secure
   validating ./NS: resuming validate
   validating ./NS: verify rdataset (keyid=33853): success
   validating ./NS: marking as secure, noqname proof not needed
   validating ftp.isc.org/A: starting
   validating ftp.isc.org/A: attempting positive response validation
     validating isc.org/DNSKEY: starting
     validating isc.org/DNSKEY: attempting positive response validation
       validating isc.org/DS: starting
       validating isc.org/DS: attempting positive response validation
     validating org/DNSKEY: starting
     validating org/DNSKEY: attempting positive response validation
       validating org/DS: starting
       validating org/DS: attempting positive response validation
       validating org/DS: keyset with trust secure
       validating org/DS: verify rdataset (keyid=33853): success
       validating org/DS: marking as secure, noqname proof not needed
     validating org/DNSKEY: in validator_callback_ds
     validating org/DNSKEY: dsset with trust secure
     validating org/DNSKEY: verify rdataset (keyid=9795): success
     validating org/DNSKEY: marking as secure (DS)
       validating isc.org/DS: in fetch_callback_dnskey
       validating isc.org/DS: keyset with trust secure
       validating isc.org/DS: resuming validate
       validating isc.org/DS: verify rdataset (keyid=33209): success
       validating isc.org/DS: marking as secure, noqname proof not needed
     validating isc.org/DNSKEY: in validator_callback_ds
     validating isc.org/DNSKEY: dsset with trust secure
     validating isc.org/DNSKEY: verify rdataset (keyid=7250): success
     validating isc.org/DNSKEY: marking as secure (DS)
   validating ftp.isc.org/A: in fetch_callback_dnskey
   validating ftp.isc.org/A: keyset with trust secure
   validating ftp.isc.org/A: resuming validate
   validating ftp.isc.org/A: verify rdataset (keyid=27566): success
   validating ftp.isc.org/A: marking as secure, noqname proof not needed

Note that these log messages indicate that the chain of trust has been
established and ``ftp.isc.org`` has been successfully validated.

If validation had failed, you would see log messages indicating errors.
We cover some of the most common validation problems in the next section.

.. _troubleshooting_common_problems:

Common Problems
~~~~~~~~~~~~~~~

.. _troubleshooting_security_lameness:

Security Lameness
^^^^^^^^^^^^^^^^^

Similar to lame delegation in traditional DNS, security lameness refers to the
condition when the parent zone holds a set of DS records that point to
something that does not exist in the child zone. As a result,
the entire child zone may "disappear," having been marked as bogus by
validating resolvers.

Below is an example attempting to resolve the A record for a test domain
name ``www.example.net``. From the user's perspective, as described in
:ref:`how_do_i_know_validation_problem`, only a SERVFAIL
message is returned. On the validating resolver, we see the
following messages in ``syslog``:

::

   named[126063]: validating example.net/DNSKEY: no valid signature found (DS)
   named[126063]: no valid RRSIG resolving 'example.net/DNSKEY/IN': 10.53.0.2#53
   named[126063]: broken trust chain resolving 'www.example.net/A/IN': 10.53.0.2#53

This gives us a hint that it is a broken trust chain issue.
Let's take a
look at the DS records that are published for the zone (with the keys
shortened for ease of display):

::

   $ dig @10.53.0.3 example.net. DS

   ; <<>> DiG 9.16.0 <<>> @10.53.0.3 example.net DS
   ; (1 server found)
   ;; global options: +cmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59602
   ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

   ;; OPT PSEUDOSECTION:
   ; EDNS: version: 0, flags:; udp: 4096
   ; COOKIE: 7026d8f7c6e77e2a010000005e735d7c9d038d061b2d24da (good)
   ;; QUESTION SECTION:
   ;example.net.           IN  DS

   ;; ANSWER SECTION:
   example.net.        256 IN  DS  14956 8 2 9F3CACD...D3E3A396

   ;; Query time: 0 msec
   ;; SERVER: 10.53.0.3#53(10.53.0.3)
   ;; WHEN: Thu Mar 19 11:54:36 GMT 2020
   ;; MSG SIZE  rcvd: 116

Next, we query for the DNSKEY and RRSIG of ``example.net`` to see if
there's anything wrong. Since we are having trouble validating, we
can use the ``+cd`` option to temporarily disable checking and return
results, even though they do not pass the validation tests. The
``+multiline`` option tells ``dig`` to print the type, algorithm type,
and key id for DNSKEY records. Again,
some long strings are shortened for ease of display:

::

   $ dig @10.53.0.3 example.net. DNSKEY +dnssec +cd +multiline

   ; <<>> DiG 9.16.0 <<>> @10.53.0.3 example.net DNSKEY +cd +multiline +dnssec
   ; (1 server found)
   ;; global options: +cmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42980
   ;; flags: qr rd ra cd; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

   ;; OPT PSEUDOSECTION:
   ; EDNS: version: 0, flags: do; udp: 4096
   ; COOKIE: 4b5e7c88b3680c35010000005e73722057551f9f8be1990e (good)
   ;; QUESTION SECTION:
   ;example.net.       IN DNSKEY

   ;; ANSWER SECTION:
   example.net.        287 IN DNSKEY 256 3 8 (
                   AwEAAbu3NX...ADU/D7xjFFDu+8WRIn
                   ) ; ZSK; alg = RSASHA256 ; key id = 35328
   example.net.        287 IN DNSKEY 257 3 8 (
                   AwEAAbKtU1...PPP4aQZTybk75ZW+uL
                   6OJMAF63NO0s1nAZM2EWAVasbnn/X+J4N2rLuhk=
                   ) ; KSK; alg = RSASHA256 ; key id = 27247
   example.net.        287 IN RRSIG DNSKEY 8 2 300 (
                   20811123173143 20180101000000 27247 example.net.
                   Fz1sjClIoF...YEjzpAWuAj9peQ== )
   example.net.        287 IN RRSIG DNSKEY 8 2 300 (
                   20811123173143 20180101000000 35328 example.net.
                   seKtUeJ4/l...YtDc1rcXTVlWIOw= )

   ;; Query time: 0 msec
   ;; SERVER: 10.53.0.3#53(10.53.0.3)
   ;; WHEN: Thu Mar 19 13:22:40 GMT 2020
   ;; MSG SIZE  rcvd: 962

Here is the problem: the parent zone is telling the world that
``example.net`` is using the key 14956, but the authoritative server
indicates that it is using keys 27247 and 35328. There are several
potential causes for this mismatch: one possibility is that a malicious
attacker has compromised one side and changed the data. A more likely
scenario is that the DNS administrator for the child zone did not upload
the correct key information to the parent zone.

.. _troubleshooting_incorrect_time:

Incorrect Time
^^^^^^^^^^^^^^

In DNSSEC, every record comes with at least one RRSIG, and each RRSIG
contains two timestamps: one indicating when it becomes valid, and
one when it expires. If the validating resolver's current system time does
not fall within the two RRSIG timestamps, error messages
appear in the BIND debug log.

The example below shows a log message when the RRSIG appears to have
expired. This could mean the validating resolver's system time is
incorrectly set too far in the future, or the zone administrator has not
kept up with RRSIG maintenance.

::

   validating example.com/DNSKEY: verify failed due to bad signature (keyid=19036): RRSIG has expired

The log below shows that the RRSIG validity period has not yet begun. This could mean
the validating resolver's system time is incorrectly set too far in the past, or
the zone administrator has incorrectly generated signatures for this
domain name.

::

   validating example.com/DNSKEY: verify failed due to bad signature (keyid=4521): RRSIG validity period has not begun

.. _troubleshooting_unable_to_load_keys:

Unable to Load Keys
^^^^^^^^^^^^^^^^^^^

This is a simple yet common issue. If the key files are present but
unreadable by ``named`` for some reason, ``syslog`` shows clear error
messages, as shown below:

::

   named[32447]: zone example.com/IN (signed): reconfiguring zone keys
   named[32447]: dns_dnssec_findmatchingkeys: error reading key file Kexample.com.+008+06817.private: permission denied
   named[32447]: dns_dnssec_findmatchingkeys: error reading key file Kexample.com.+008+17694.private: permission denied
   named[32447]: zone example.com/IN (signed): next key event: 27-Nov-2014 20:04:36.521

However, if no keys are found, the error is not as obvious.
The example below shows
the ``syslog`` messages after executing ``rndc reload``
with the key files missing from the key directory:

::

   named[32516]: received control channel command 'reload'
   named[32516]: loading configuration from '/etc/bind/named.conf'
   named[32516]: reading built-in trusted keys from file '/etc/bind/bind.keys'
   named[32516]: using default UDP/IPv4 port range: [1024, 65535]
   named[32516]: using default UDP/IPv6 port range: [1024, 65535]
   named[32516]: sizing zone task pool based on 6 zones
   named[32516]: the working directory is not writable
   named[32516]: reloading configuration succeeded
   named[32516]: reloading zones succeeded
   named[32516]: all zones loaded
   named[32516]: running
   named[32516]: zone example.com/IN (signed): reconfiguring zone keys
   named[32516]: zone example.com/IN (signed): next key event: 27-Nov-2014 20:07:09.292

This happens to look exactly the same as if the keys were present and
readable, and appears to indicate that ``named`` loaded the keys and signed the zone. It
even generates the internal (raw) files:

::

   # cd /etc/bind/db
   # ls
   example.com.db  example.com.db.jbk  example.com.db.signed

If ``named`` really loaded the keys and signed the zone, you should see
the following files:

::

   # cd /etc/bind/db
   # ls
   example.com.db  example.com.db.jbk  example.com.db.signed  example.com.db.signed.jnl

So, unless you see the ``*.signed.jnl`` file, your zone has not been
signed.

.. _troubleshooting_invalid_trust_anchors:

Invalid Trust Anchors
^^^^^^^^^^^^^^^^^^^^^

In most cases, you never need to explicitly configure trust
anchors. ``named`` supplies the current root trust anchor and,
with the default setting of ``dnssec-validation``, updates it on the
infrequent occasions when it is changed.

However, in some circumstances you may need to explicitly configure
your own trust anchor. As we saw in the :ref:`trust_anchors_description`
section, whenever a DNSKEY is received by the validating resolver, it is
compared to the list of keys the resolver explicitly trusts to see if
further action is needed. If the two keys match, the validating resolver
stops performing further verification and returns the answer(s) as
validated.

But what if the key file on the validating resolver is misconfigured or
missing? Below we show some examples of log messages when things are not
working properly.

First of all, if the key you copied is malformed, BIND does not even
start and you will likely find this error message in ``syslog``:

::

   named[18235]: /etc/bind/named.conf.options:29: bad base64 encoding
   named[18235]: loading configuration: failure

If the key is a valid base64 string but the key algorithm is incorrect,
or if the wrong key is installed, the first thing you will notice is
that virtually all of your DNS lookups result in SERVFAIL, even when
you are looking up domain names that have not been DNSSEC-enabled. The
example below shows a query against the recursive server at 10.53.0.3:

::

   $ dig @10.53.0.3 www.example.org. A +dnssec

   ; <<>> DiG 9.16.0 <<>> @10.53.0.3 www.example.org A +dnssec
   ; (1 server found)
   ;; global options: +cmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 29586
   ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

   ;; OPT PSEUDOSECTION:
   ; EDNS: version: 0, flags: do; udp: 4096
   ; COOKIE: ee078fc321fa1367010000005e73a58bf5f205ca47e04bed (good)
   ;; QUESTION SECTION:
   ;www.example.org.       IN  A

``delv`` shows a similar result:

::

   $ delv @10.53.0.3 www.example.com. +rtrace
   ;; fetch: www.example.com/A
   ;; resolution failed: SERVFAIL

The next symptom you see is in the DNSSEC log messages:

::

   managed-keys-zone: DNSKEY set for zone '.' could not be verified with current keys
   validating ./DNSKEY: starting
   validating ./DNSKEY: attempting positive response validation
   validating ./DNSKEY: no DNSKEY matching DS
   validating ./DNSKEY: no DNSKEY matching DS
   validating ./DNSKEY: no valid signature found (DS)

These errors are indications that there are problems with the trust
anchor.

.. _troubleshooting_nta:

Negative Trust Anchors
~~~~~~~~~~~~~~~~~~~~~~

BIND 9.11 introduced Negative Trust Anchors (NTAs) as a means to
*temporarily* disable DNSSEC validation for a zone when you know that
the zone's DNSSEC is misconfigured.

NTAs are added using the ``rndc`` command, e.g.:

::

   $ rndc nta example.com
    Negative trust anchor added: example.com/_default, expires 19-Mar-2020 19:57:42.000

The list of currently configured NTAs can also be examined using
``rndc``, e.g.:

::

   $ rndc nta -dump
    example.com/_default: expiry 19-Mar-2020 19:57:42.000

The default lifetime of an NTA is one hour, although by default, BIND
polls the zone every five minutes to see if the zone correctly
validates, at which point the NTA automatically expires. Both the
default lifetime and the polling interval may be configured via
``named.conf``, and the lifetime can be overridden on a per-zone basis
using the ``-lifetime duration`` parameter to ``rndc nta``. Both timer
values have a permitted maximum value of one week.

.. _troubleshooting_nsec3:

NSEC3 Troubleshooting
~~~~~~~~~~~~~~~~~~~~~

BIND includes a tool called ``nsec3hash`` that runs through the same
steps as a validating resolver, to generate the correct hashed name
based on NSEC3PARAM parameters.
The command takes the following
parameters in order: salt, algorithm, iterations, and domain. For
example, if the salt is 1234567890ABCEDF, the hash algorithm is 1, and
the iteration count is 10, to get the NSEC3-hashed name for ``www.example.com`` we
would execute a command like this:

::

   $ nsec3hash 1234567890ABCEDF 1 10 www.example.com
   RN7I9ME6E1I6BDKIP91B9TCE4FHJ7LKF (salt=1234567890ABCEDF, hash=1, iterations=10)

While it is unlikely you would construct a rainbow table of your own
zone data, this tool may be useful when troubleshooting NSEC3 problems.
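
The hashing that ``nsec3hash`` performs can also be reproduced by hand,
which may help when the tool is not installed on the machine you are
troubleshooting from. The following is a minimal Python sketch, assuming
hash algorithm 1 (SHA-1) as defined in RFC 5155; the function names
``name_to_wire`` and ``nsec3_hash`` are our own and are not part of BIND.
It takes the same inputs as ``nsec3hash`` (salt, iterations, and domain
name):

::

   import base64
   import hashlib

   # Map the standard base32 alphabet onto the "extended hex" alphabet
   # (RFC 4648) that NSEC3 owner names use.
   _B32_TO_B32HEX = str.maketrans(
       "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
       "0123456789ABCDEFGHIJKLMNOPQRSTUV",
   )


   def name_to_wire(name):
       """Encode a domain name in lowercase DNS wire format."""
       wire = b""
       for label in name.rstrip(".").lower().split("."):
           if label:
               raw = label.encode("ascii")
               wire += bytes([len(raw)]) + raw
       return wire + b"\x00"  # the root label terminates the name


   def nsec3_hash(name, salt_hex, iterations):
       """Return the base32hex NSEC3 hash of name (algorithm 1, SHA-1)."""
       # "-" denotes an empty salt, as in presentation format.
       salt = b"" if salt_hex == "-" else bytes.fromhex(salt_hex)
       # First pass hashes the wire-format name followed by the salt.
       digest = hashlib.sha1(name_to_wire(name) + salt).digest()
       # Each additional iteration re-hashes the digest plus the salt.
       for _ in range(iterations):
           digest = hashlib.sha1(digest + salt).digest()
       return base64.b32encode(digest).decode("ascii").translate(_B32_TO_B32HEX)


   if __name__ == "__main__":
       print(nsec3_hash("www.example.com", "1234567890ABCEDF", 10))

Running the script prints the hashed owner name for the same inputs as
the ``nsec3hash`` example above, so the two results can be compared
directly.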