==================
american fuzzy lop
==================

  Written and maintained by Michal Zalewski <lcamtuf@google.com>

  Copyright 2013, 2014, 2015, 2016 Google Inc. All rights reserved.
  Released under terms and conditions of Apache License, Version 2.0.

  For new versions and additional information, check out:
  http://lcamtuf.coredump.cx/afl/

  To compare notes with other users or get notified about major new features,
  send a mail to <afl-users+subscribe@googlegroups.com>.

  ** See QuickStartGuide.txt if you don't have time to read this file. **

1) Challenges of guided fuzzing
-------------------------------

Fuzzing is one of the most powerful and proven strategies for identifying
security issues in real-world software; it is responsible for the vast
majority of remote code execution and privilege escalation bugs found to date
in security-critical software.

Unfortunately, fuzzing is also relatively shallow; blind, random mutations
make it very unlikely to reach certain code paths in the tested code, leaving
some vulnerabilities firmly outside the reach of this technique.

There have been numerous attempts to solve this problem. One of the early
approaches - pioneered by Tavis Ormandy - is corpus distillation. The method
relies on coverage signals to select a subset of interesting seeds from a
massive, high-quality corpus of candidate files, and then fuzz them by
traditional means. The approach works exceptionally well, but requires such
a corpus to be readily available. In addition, block coverage measurements
provide only a very simplistic understanding of program state, and are less
useful for guiding the fuzzing effort in the long haul.

Other, more sophisticated research has focused on techniques such as program
flow analysis ("concolic execution"), symbolic execution, or static analysis.
All these methods are extremely promising in experimental settings, but tend
to suffer from reliability and performance problems in practical use - and
currently do not offer a viable alternative to "dumb" fuzzing techniques.

2) The afl-fuzz approach
------------------------

American Fuzzy Lop is a brute-force fuzzer coupled with an exceedingly simple
but rock-solid instrumentation-guided genetic algorithm. It uses a modified
form of edge coverage to effortlessly pick up subtle, local-scale changes to
program control flow.

Simplifying a bit, the overall algorithm can be summed up as:

  1) Load user-supplied initial test cases into the queue,

  2) Take the next input file from the queue,

  3) Attempt to trim the test case to the smallest size that doesn't alter
     the measured behavior of the program,

  4) Repeatedly mutate the file using a balanced and well-researched variety
     of traditional fuzzing strategies,

  5) If any of the generated mutations resulted in a new state transition
     recorded by the instrumentation, add the mutated output as a new entry
     in the queue,

  6) Go to 2.

The discovered test cases are also periodically culled to eliminate ones that
have been made obsolete by newer, higher-coverage finds, and they undergo
several other instrumentation-driven effort minimization steps.

As a side result of the fuzzing process, the tool creates a small,
self-contained corpus of interesting test cases. These are extremely useful
for seeding other, labor- or resource-intensive testing regimes - for example,
for stress-testing browsers, office applications, graphics suites, or
closed-source tools.

The fuzzer is thoroughly tested to deliver out-of-the-box performance far
superior to blind fuzzing or coverage-only tools.

3) Instrumenting programs for use with AFL
------------------------------------------

When source code is available, instrumentation can be injected by a companion
tool that works as a drop-in replacement for gcc or clang in any standard build
process for third-party code.

The instrumentation has a fairly modest performance impact; in conjunction with
other optimizations implemented by afl-fuzz, most programs can be fuzzed as fast
as, or even faster than, is possible with traditional tools.

The correct way to recompile the target program may vary depending on the
specifics of the build process, but a nearly universal approach would be:

$ CC=/path/to/afl/afl-gcc ./configure
$ make clean all

For C++ programs, you would also want to set CXX=/path/to/afl/afl-g++.

The clang wrappers (afl-clang and afl-clang++) can be used in the same way;
clang users may also opt to leverage a higher-performance instrumentation mode,
as described in llvm_mode/README.llvm.

When testing libraries, you need to find or write a simple program that reads
data from stdin or from a file and passes it to the tested library. In such a
case, it is essential to link this executable against a static version of the
instrumented library, or to make sure that the correct .so file is loaded at
runtime (usually by setting LD_LIBRARY_PATH). The simplest option is a static
build, usually possible via:

$ CC=/path/to/afl/afl-gcc ./configure --disable-shared

Setting AFL_HARDEN=1 when calling 'make' will cause the CC wrapper to
automatically enable code hardening options that make it easier to detect
simple memory bugs. Libdislocator, a helper library included with AFL (see
libdislocator/README.dislocator), can help uncover heap corruption issues, too.

PS. ASAN users are advised to review the notes_for_asan.txt file for important
caveats.

4) Instrumenting binary-only apps
---------------------------------

When source code is *NOT* available, the fuzzer offers experimental support for
fast, on-the-fly instrumentation of black-box binaries. This is accomplished
with a version of QEMU running in the lesser-known "user space emulation" mode.

QEMU is a project separate from AFL, but you can conveniently build the
feature by doing:

$ cd qemu_mode
$ ./build_qemu_support.sh

For additional instructions and caveats, see qemu_mode/README.qemu.

The mode is approximately 2-5x slower than compile-time instrumentation, is
less conducive to parallelization, and may have some other quirks.

5) Choosing initial test cases
------------------------------

To operate correctly, the fuzzer requires one or more starting files that
contain a good example of the input data normally expected by the targeted
application. There are two basic rules:

  - Keep the files small. Under 1 kB is ideal, although not strictly necessary.
    For a discussion of why size matters, see perf_tips.txt.

  - Use multiple test cases only if they are functionally different from
    each other. There is no point in using fifty different vacation photos
    to fuzz an image library.

You can find many good examples of starting files in the testcases/ subdirectory
that comes with this tool.

PS. If a large corpus of data is available for screening, you may want to use
the afl-cmin utility to identify a subset of functionally distinct files that
exercise different code paths in the target binary.

6) Fuzzing binaries
-------------------

The fuzzing process itself is carried out by the afl-fuzz utility. This program
requires a read-only directory with initial test cases, a separate place to
store its findings, plus a path to the binary to test.

For target binaries that accept input directly from stdin, the usual syntax is:

$ ./afl-fuzz -i testcase_dir -o findings_dir /path/to/program [...params...]

For programs that take input from a file, use '@@' to mark the location in
the target's command line where the input file name should be placed. The
fuzzer will substitute this for you:

$ ./afl-fuzz -i testcase_dir -o findings_dir /path/to/program @@

You can also use the -f option to have the mutated data written to a specific
file. This is useful if the program expects a particular file extension or
similar.

Non-instrumented binaries can be fuzzed in the QEMU mode (add -Q to the command
line) or in a traditional, blind-fuzzer mode (specify -n).

You can use -t and -m to override the default timeout and memory limit for the
executed process; rare examples of targets that may need these settings touched
include compilers and video decoders.

Tips for optimizing fuzzing performance are discussed in perf_tips.txt.

Note that afl-fuzz starts by performing an array of deterministic fuzzing
steps, which can take several days, but tend to produce neat test cases. If you
want quick & dirty results right away - akin to zzuf and other traditional
fuzzers - add the -d option to the command line.

7) Interpreting output
----------------------

See the status_screen.txt file for information on how to interpret the
displayed stats and monitor the health of the process. Be sure to consult this
file especially if any UI elements are highlighted in red.

The fuzzing process will continue until you press Ctrl-C. At minimum, you want
to allow the fuzzer to complete one queue cycle, which may take anywhere from a
couple of hours to a week or so.

There are three subdirectories created within the output directory and updated
in real time:

  - queue/   - test cases for every distinctive execution path, plus all the
               starting files given by the user. This is the synthesized corpus
               mentioned in section 2.

               Before using this corpus for any other purposes, you can shrink
               it to a smaller size using the afl-cmin tool. The tool will find
               a smaller subset of files offering equivalent edge coverage.

  - crashes/ - unique test cases that cause the tested program to receive a
               fatal signal (e.g., SIGSEGV, SIGILL, SIGABRT). The entries are
               grouped by the received signal.

  - hangs/   - unique test cases that cause the tested program to time out. The
               default time limit before something is classified as a hang is
               the larger of 1 second and the value of the -t parameter.
               The value can be fine-tuned by setting AFL_HANG_TMOUT, but this
               is rarely necessary.

Crashes and hangs are considered "unique" if the associated execution paths
involve any state transitions not seen in previously recorded faults. If a
single bug can be reached in multiple ways, there will be some count inflation
early in the process, but this should quickly taper off.

The file names for crashes and hangs are correlated with parent, non-faulting
queue entries. This should help with debugging.

When you can't reproduce a crash found by afl-fuzz, the most likely cause is
that you are not setting the same memory limit as used by the tool. Try:

$ LIMIT_MB=50
$ ( ulimit -Sv $((LIMIT_MB << 10)); /path/to/tested_binary ... )

Change LIMIT_MB to match the -m parameter passed to afl-fuzz (ulimit -v takes
kilobytes, hence the shift by 10). On OpenBSD, also change -Sv to -Sd.

Any existing output directory can also be used to resume aborted jobs; try:

$ ./afl-fuzz -i- -o existing_output_dir [...etc...]

If you have gnuplot installed, you can also generate some pretty graphs for any
active fuzzing task using afl-plot. For an example of what this looks like,
see http://lcamtuf.coredump.cx/afl/plot/.

8) Parallelized fuzzing
-----------------------

Every instance of afl-fuzz takes up roughly one core. This means that on
multi-core systems, parallelization is necessary to fully utilize the hardware.
For tips on how to fuzz a common target on multiple cores or multiple networked
machines, please refer to parallel_fuzzing.txt.

The parallel fuzzing mode also offers a simple way of interfacing AFL with other
fuzzers, with symbolic or concolic execution engines, and so forth; again, see
the last section of parallel_fuzzing.txt for tips.

9) Fuzzer dictionaries
----------------------

By default, the afl-fuzz mutation engine is optimized for compact data formats -
say, images, multimedia, compressed data, regular expression syntax, or shell
scripts. It is somewhat less suited for languages with particularly verbose and
redundant verbiage - notably including HTML, SQL, or JavaScript.

To avoid the hassle of building syntax-aware tools, afl-fuzz provides a way to
seed the fuzzing process with an optional dictionary of language keywords,
magic headers, or other special tokens associated with the targeted data
type - and use that to reconstruct the underlying grammar on the go:

  http://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html

To use this feature, you first need to create a dictionary in one of the two
formats discussed in dictionaries/README.dictionaries, and then point the fuzzer
to it via the -x option on the command line.

(Several common dictionaries are already provided in that subdirectory, too.)

There is no way to provide more structured descriptions of the underlying
syntax, but the fuzzer will likely figure out some of this based on the
instrumentation feedback alone. This actually works in practice; for example:

  http://lcamtuf.blogspot.com/2015/04/finding-bugs-in-sqlite-easy-way.html

PS. Even when no explicit dictionary is given, afl-fuzz will try to extract
existing syntax tokens in the input corpus by watching the instrumentation
very closely during deterministic byte flips. This works for some types of
parsers and grammars, but isn't nearly as good as the -x mode.

If a dictionary is really hard to come by, another option is to let AFL run
for a while, and then use the token capture library that comes as a companion
utility with AFL. For that, see libtokencap/README.tokencap.

10) Crash triage
----------------

The coverage-based grouping of crashes usually produces a small data set that
can be quickly triaged manually or with a very simple GDB or Valgrind script.
Every crash is also traceable to its parent non-crashing test case in the
queue, making it easier to diagnose faults.

Having said that, it's important to acknowledge that some fuzzing crashes can be
difficult to quickly evaluate for exploitability without a lot of debugging and
code analysis work. To assist with this task, afl-fuzz supports a unique
"crash exploration" mode enabled with the -C flag.

In this mode, the fuzzer takes one or more crashing test cases as the input,
and uses its feedback-driven fuzzing strategies to very quickly enumerate all
code paths that can be reached in the program while keeping it in the
crashing state.

Mutations that do not result in a crash are rejected; so are any changes that
do not affect the execution path.

The output is a small corpus of files that can be very rapidly examined to see
what degree of control the attacker has over the faulting address, or whether
it is possible to get past an initial out-of-bounds read - and see what lies
beneath.

Oh, one more thing: for test case minimization, give afl-tmin a try. The tool
can be operated in a very simple way:

$ ./afl-tmin -i test_case -o minimized_result -- /path/to/program [...]

The tool works with crashing and non-crashing test cases alike. In the crash
mode, it will happily accept instrumented and non-instrumented binaries. In the
non-crashing mode, the minimizer relies on standard AFL instrumentation to make
the file simpler without altering the execution path.

The minimizer accepts the -m, -t, -f and @@ syntax in a manner compatible with
afl-fuzz.

Another recent addition to AFL is the afl-analyze tool. It takes an input
file, attempts to sequentially flip bytes, and observes the behavior of the
tested program. It then color-codes the input based on which sections appear to
be critical, and which are not; while not bulletproof, it can often offer quick
insights into complex file formats. More info about its operation can be found
near the end of technical_details.txt.

11) Going beyond crashes
------------------------

Fuzzing is a wonderful and underutilized technique for discovering non-crashing
design and implementation errors, too.
Quite a few interesting bugs have been found by modifying the target
programs to call abort() when, say:

  - Two bignum libraries produce different outputs when given the same
    fuzzer-generated input,

  - An image library produces different outputs when asked to decode the same
    input image several times in a row,

  - A serialization / deserialization library fails to produce stable outputs
    when iteratively serializing and deserializing fuzzer-supplied data,

  - A compression library produces an output inconsistent with the input file
    when asked to compress and then decompress a particular blob.

Implementing these or similar sanity checks usually takes very little time;
if you are the maintainer of a particular package, you can make this code
conditional with #ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION (a flag also
shared with libFuzzer) or #ifdef __AFL_COMPILER (this one is just for AFL).

12) Common-sense risks
----------------------

Please keep in mind that, like many other computationally intensive tasks,
fuzzing may put strain on your hardware and on the OS. In particular:

  - Your CPU will run hot and will need adequate cooling. In most cases, if
    cooling is insufficient or stops working properly, CPU speeds will be
    automatically throttled. That said, especially when fuzzing on less
    suitable hardware (laptops, smartphones, etc.), it's not entirely impossible
    for something to blow up.

  - Targeted programs may end up erratically grabbing gigabytes of memory or
    filling up disk space with junk files. AFL tries to enforce basic memory
    limits, but can't prevent each and every possible mishap. The bottom line
    is that you shouldn't be fuzzing on systems where the prospect of data loss
    is not an acceptable risk.

  - Fuzzing involves billions of reads and writes to the filesystem.
    On modern systems, this will usually be heavily cached, resulting in
    fairly modest "physical" I/O - but there are many factors that may alter
    this equation. It is your responsibility to monitor for potential trouble;
    with very heavy I/O, the lifespan of many HDDs and SSDs may be reduced.

    A good way to monitor disk I/O on Linux is the 'iostat' command:

    $ iostat -d 3 -x -k [...optional disk ID...]

13) Known limitations & areas for improvement
---------------------------------------------

Here are some of the most important caveats for AFL:

  - AFL detects faults by checking for the first spawned process dying due to
    a signal (SIGSEGV, SIGABRT, etc.). Programs that install custom handlers for
    these signals may need to have the relevant code commented out. In the same
    vein, faults in child processes spawned by the fuzzed target may evade
    detection unless you manually add some code to catch that.

  - As with any other brute-force tool, the fuzzer offers limited coverage if
    encryption, checksums, cryptographic signatures, or compression are used to
    wholly wrap the actual data format to be tested.

    To work around this, you can comment out the relevant checks (see
    experimental/libpng_no_checksum/ for inspiration); if this is not possible,
    you can also write a postprocessor, as explained in
    experimental/post_library/.

  - There are some unfortunate trade-offs with ASAN and 64-bit binaries. This
    isn't due to any specific fault of afl-fuzz; see notes_for_asan.txt for
    tips.

  - There is no direct support for fuzzing network services, background
    daemons, or interactive apps that require UI interaction to work. You may
    need to make simple code changes to make them behave in a more traditional
    way.
    Preeny may offer a relatively simple option, too - see:
    https://github.com/zardus/preeny

    Some useful tips for modifying network-based services can also be found at:
    https://www.fastly.com/blog/how-to-fuzz-server-american-fuzzy-lop

  - AFL doesn't output human-readable coverage data. If you want to monitor
    coverage, use afl-cov from Michael Rash: https://github.com/mrash/afl-cov

  - Occasionally, sentient machines rise against their creators. If this
    happens to you, please consult http://lcamtuf.coredump.cx/prep/.

Beyond this, see INSTALL for platform-specific tips.

14) Special thanks
------------------

Many of the improvements to afl-fuzz wouldn't be possible without feedback,
bug reports, or patches from:

  Jann Horn                             Hanno Boeck
  Felix Groebert                        Jakub Wilk
  Richard W. M. Jones                   Alexander Cherepanov
  Tom Ritter                            Hovik Manucharyan
  Sebastian Roschke                     Eberhard Mattes
  Padraig Brady                         Ben Laurie
  @dronesec                             Luca Barbato
  Tobias Ospelt                         Thomas Jarosch
  Martin Carpenter                      Mudge Zatko
  Joe Zbiciak                           Ryan Govostes
  Michael Rash                          William Robinet
  Jonathan Gray                         Filipe Cabecinhas
  Nico Weber                            Jodie Cunningham
  Andrew Griffiths                      Parker Thompson
  Jonathan Neuschäfer                   Tyler Nighswander
  Ben Nagy                              Samir Aguiar
  Aidan Thornton                        Aleksandar Nikolich
  Sam Hakim                             Laszlo Szekeres
  David A. Wheeler                      Turo Lamminen
  Andreas Stieger                       Richard Godbee
  Louis Dassy                           teor2345
  Alex Moneger                          Dmitry Vyukov
  Keegan McAllister                     Kostya Serebryany
  Richo Healey                          Martijn Bogaard
  rc0r                                  Jonathan Foote
  Christian Holler                      Dominique Pelle
  Jacek Wielemborek                     Leo Barnes
  Jeremy Barnes                         Jeff Trull
  Guillaume Endignoux                   ilovezfs
  Daniel Godas-Lopez                    Franjo Ivancic
  Austin Seipp                          Daniel Komaromy
  Daniel Binderman                      Jonathan Metzman
  Vegard Nossum                         Jan Kneschke
  Kurt Roeckx                           Marcel Bohme
  Van-Thuan Pham                        Abhik Roychoudhury
  Joshua J. Drake                       Toby Hutton
  Rene Freingruber                      Sergey Davidoff
  Sami Liedes                           Craig Young
  Andrzej Jackowski                     Daniel Hodson

Thank you!

15) Contact
-----------

Questions? Concerns? Bug reports? The author can usually be reached at
<lcamtuf@google.com>.

There is also a mailing list for the project; to join, send a mail to
<afl-users+subscribe@googlegroups.com>. Or, if you prefer to browse
archives first, try:

  https://groups.google.com/group/afl-users

PS. If you wish to submit raw code to be incorporated into the project, please
be aware that the copyright on most of AFL is claimed by Google. While you do
retain copyright on your contributions, they do ask people to agree to a simple
CLA first:

  https://cla.developers.google.com/clas

Sorry about the hassle. Of course, no CLA is required for feature requests or
bug reports.