==================
american fuzzy lop
==================

  Written and maintained by Michal Zalewski <lcamtuf@google.com>

  Copyright 2013, 2014, 2015, 2016 Google Inc. All rights reserved.
  Released under terms and conditions of Apache License, Version 2.0.

  For new versions and additional information, check out:
  http://lcamtuf.coredump.cx/afl/

  To compare notes with other users or get notified about major new features,
  send a mail to <afl-users+subscribe@googlegroups.com>.

  ** See QuickStartGuide.txt if you don't have time to read this file. **

1) Challenges of guided fuzzing
-------------------------------

Fuzzing is one of the most powerful and proven strategies for identifying
security issues in real-world software; it is responsible for the vast
majority of remote code execution and privilege escalation bugs found to date
in security-critical software.

Unfortunately, fuzzing is also relatively shallow; blind, random mutations
make it very unlikely to reach certain code paths in the tested code, leaving
some vulnerabilities firmly outside the reach of this technique.

There have been numerous attempts to solve this problem. One of the early
approaches - pioneered by Tavis Ormandy - is corpus distillation. The method
relies on coverage signals to select a subset of interesting seeds from a
massive, high-quality corpus of candidate files, and then fuzz them by
traditional means. The approach works exceptionally well, but requires such
a corpus to be readily available. In addition, block coverage measurements
provide only a very simplistic understanding of program state, and are less
useful for guiding the fuzzing effort in the long haul.

Other, more sophisticated research has focused on techniques such as program
flow analysis ("concolic execution"), symbolic execution, or static analysis.
All these methods are extremely promising in experimental settings, but tend
to suffer from reliability and performance problems in practical uses - and
currently do not offer a viable alternative to "dumb" fuzzing techniques.

2) The afl-fuzz approach
------------------------

American Fuzzy Lop is a brute-force fuzzer coupled with an exceedingly simple
but rock-solid instrumentation-guided genetic algorithm. It uses a modified
form of edge coverage to effortlessly pick up subtle, local-scale changes to
program control flow.

Simplifying a bit, the overall algorithm can be summed up as:

  1) Load user-supplied initial test cases into the queue,

  2) Take next input file from the queue,

  3) Attempt to trim the test case to the smallest size that doesn't alter
     the measured behavior of the program,

  4) Repeatedly mutate the file using a balanced and well-researched variety
     of traditional fuzzing strategies,

  5) If any of the generated mutations resulted in a new state transition
     recorded by the instrumentation, add mutated output as a new entry in the
     queue.

  6) Go to 2.

The discovered test cases are also periodically culled to eliminate ones that
have been obsoleted by newer, higher-coverage finds; and undergo several other
instrumentation-driven effort minimization steps.
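
As a rough illustration of steps 1, 2, 4, 5 and 6, here is a minimal Python
sketch of a coverage-guided loop. It is not AFL's implementation: toy_target,
mutate and fuzz are hypothetical names, coverage is modeled as a plain set of
branch labels rather than AFL's edge map, and trimming (step 3) and culling are
left out entirely.

```python
import random


def toy_target(data: bytes) -> set:
    """Hypothetical instrumented program: returns the branch labels it hit."""
    hit = {"start"}
    if len(data) > 0 and data[0] == ord("F"):
        hit.add("F")
        if len(data) > 1 and data[1] == ord("U"):
            hit.add("FU")
            if len(data) > 2 and data[2] == ord("Z"):
                hit.add("FUZ")
    return hit


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one random byte (a stand-in for the real mutation stages)."""
    if not data:
        return bytes([rng.randrange(256)])
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seeds, target, rounds=20000, seed=0):
    """Return (queue, seen) after a fixed number of mutate-and-run rounds."""
    rng = random.Random(seed)
    queue = list(seeds)                     # step 1: load initial test cases
    seen = set()
    for entry in queue:
        seen |= target(entry)
    for i in range(rounds):                 # step 6: keep going
        parent = queue[i % len(queue)]      # step 2: take the next queue entry
        child = mutate(parent, rng)         # step 4: mutate it
        coverage = target(child)
        if not coverage <= seen:            # step 5: something new was hit
            seen |= coverage
            queue.append(child)
    return queue, seen
```

Calling fuzz([b"AAAA"], toy_target) returns the grown queue and the set of
labels reached. Entries are kept only when they add coverage, which is what
lets later mutations build on partial progress such as a matching first byte.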

As a side result of the fuzzing process, the tool creates a small,
self-contained corpus of interesting test cases. These are extremely useful
for seeding other, labor- or resource-intensive testing regimes - for example,
for stress-testing browsers, office applications, graphics suites, or
closed-source tools.

The fuzzer is thoroughly tested to deliver out-of-the-box performance far
superior to blind fuzzing or coverage-only tools.

3) Instrumenting programs for use with AFL
------------------------------------------

When source code is available, instrumentation can be injected by a companion
tool that works as a drop-in replacement for gcc or clang in any standard build
process for third-party code.

The instrumentation has a fairly modest performance impact; in conjunction with
other optimizations implemented by afl-fuzz, most programs can be fuzzed as fast
or even faster than possible with traditional tools.

The correct way to recompile the target program may vary depending on the
specifics of the build process, but a nearly-universal approach would be:

$ CC=/path/to/afl/afl-gcc ./configure
$ make clean all

For C++ programs, you would also want to set CXX=/path/to/afl/afl-g++.

The clang wrappers (afl-clang and afl-clang++) can be used in the same way;
clang users may also opt to leverage a higher-performance instrumentation mode,
as described in llvm_mode/README.llvm.

When testing libraries, you need to find or write a simple program that reads
data from stdin or from a file and passes it to the tested library. In such a
case, it is essential to link this executable against a static version of the
instrumented library, or to make sure that the correct .so file is loaded at
runtime (usually by setting LD_LIBRARY_PATH). The simplest option is a static
build, usually possible via:

$ CC=/path/to/afl/afl-gcc ./configure --disable-shared

Setting AFL_HARDEN=1 when calling 'make' will cause the CC wrapper to
automatically enable code hardening options that make it easier to detect
simple memory bugs. Libdislocator, a helper library included with AFL (see
libdislocator/README.dislocator) can help uncover heap corruption issues, too.

PS. ASAN users are advised to review the notes_for_asan.txt file for important
caveats.

4) Instrumenting binary-only apps
---------------------------------

When source code is *NOT* available, the fuzzer offers experimental support for
fast, on-the-fly instrumentation of black-box binaries. This is accomplished
with a version of QEMU running in the lesser-known "user space emulation" mode.

QEMU is a project separate from AFL, but you can conveniently build the
feature by doing:

$ cd qemu_mode
$ ./build_qemu_support.sh

For additional instructions and caveats, see qemu_mode/README.qemu.

The mode is approximately 2-5x slower than compile-time instrumentation, is
less conducive to parallelization, and may have some other quirks.

5) Choosing initial test cases
------------------------------

To operate correctly, the fuzzer requires one or more starting files that
contain a good example of the input data normally expected by the targeted
application. There are two basic rules:

  - Keep the files small. Under 1 kB is ideal, although not strictly necessary.
    For a discussion of why size matters, see perf_tips.txt.

  - Use multiple test cases only if they are functionally different from
    each other. There is no point in using fifty different vacation photos
    to fuzz an image library.

You can find many good examples of starting files in the testcases/ subdirectory
that comes with this tool.

PS. If a large corpus of data is available for screening, you may want to use
the afl-cmin utility to identify a subset of functionally distinct files that
exercise different code paths in the target binary.
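
To make the idea of a "functionally distinct" subset concrete, here is a small
Python sketch of a greedy selection over per-file coverage. It is only an
illustration and not necessarily the algorithm afl-cmin uses; pick_distinct and
the file names are hypothetical, and coverage is assumed to be available as a
set of edge IDs per file.

```python
def pick_distinct(coverage_by_file):
    """Greedily choose files until every observed edge is covered."""
    remaining = set().union(*coverage_by_file.values())
    chosen = []
    while remaining:
        # Take the file covering the most still-uncovered edges;
        # ties go to the name that sorts first.
        best = max(sorted(coverage_by_file),
                   key=lambda name: len(coverage_by_file[name] & remaining))
        chosen.append(best)
        remaining -= coverage_by_file[best]
    return chosen


# Hypothetical corpus: b.png adds nothing that a.png does not already cover.
corpus = {
    "a.png": {1, 2, 3},
    "b.png": {1, 2},
    "c.png": {3, 4},
}
print(pick_distinct(corpus))
```

In this made-up corpus, b.png is dropped because every edge it reaches is
already reached by a.png, which is the "fifty vacation photos" case in
miniature.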

6) Fuzzing binaries
-------------------

The fuzzing process itself is carried out by the afl-fuzz utility. This program
requires a read-only directory with initial test cases, a separate place to
store its findings, plus a path to the binary to test.

For target binaries that accept input directly from stdin, the usual syntax is:

$ ./afl-fuzz -i testcase_dir -o findings_dir /path/to/program [...params...]

For programs that take input from a file, use '@@' to mark the location in
the target's command line where the input file name should be placed. The
fuzzer will substitute this for you:

$ ./afl-fuzz -i testcase_dir -o findings_dir /path/to/program @@
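
The substitution itself amounts to a string replacement over the target's
argument list. A minimal Python sketch, assuming a hypothetical helper name
(build_argv) and a made-up input path; the file name the fuzzer really uses is
not specified here:

```python
def build_argv(template, input_path):
    """Return the target's argv with every '@@' replaced by input_path."""
    return [arg.replace("@@", input_path) for arg in template]


# Hypothetical command line and input file location.
argv = build_argv(["/path/to/program", "-r", "@@"], "/tmp/findings/input_file")
print(argv)
```

Arguments without the marker pass through unchanged, so a command line with no
'@@' at all is left as-is and the target is expected to read from stdin.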

You can also use the -f option to have the mutated data written to a specific
file. This is useful if, for example, the program expects a particular file
extension.

Non-instrumented binaries can be fuzzed in the QEMU mode (add -Q in the command
line) or in a traditional, blind-fuzzer mode (specify -n).

You can use -t and -m to override the default timeout and memory limit for the
executed process; rare examples of targets that may need these settings touched
include compilers and video decoders.

Tips for optimizing fuzzing performance are discussed in perf_tips.txt.

Note that afl-fuzz starts by performing an array of deterministic fuzzing
steps, which can take several days, but tend to produce neat test cases. If you
want quick & dirty results right away - akin to zzuf and other traditional
fuzzers - add the -d option to the command line.

7) Interpreting output
----------------------

See the status_screen.txt file for information on how to interpret the
displayed stats and monitor the health of the process. Be sure to consult this
file especially if any UI elements are highlighted in red.

The fuzzing process will continue until you press Ctrl-C. At minimum, you want
to allow the fuzzer to complete one queue cycle, which may take anywhere from a
couple of hours to a week or so.

There are three subdirectories created within the output directory and updated
in real time:

  - queue/   - test cases for every distinctive execution path, plus all the
               starting files given by the user. This is the synthesized corpus
               mentioned in section 2.

               Before using this corpus for any other purposes, you can shrink
               it to a smaller size using the afl-cmin tool. The tool will find
               a smaller subset of files offering equivalent edge coverage.

  - crashes/ - unique test cases that cause the tested program to receive a
               fatal signal (e.g., SIGSEGV, SIGILL, SIGABRT). The entries are
               grouped by the received signal.

  - hangs/   - unique test cases that cause the tested program to time out. The
               default time limit before something is classified as a hang is
               the larger of 1 second and the value of the -t parameter.
               The value can be fine-tuned by setting AFL_HANG_TMOUT, but this
               is rarely necessary.

Crashes and hangs are considered "unique" if the associated execution paths
involve any state transitions not seen in previously-recorded faults. If a
single bug can be reached in multiple ways, there will be some count inflation
early in the process, but this should quickly taper off.
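
The uniqueness rule can be pictured with a few lines of Python. This is a
simplified model, not the fuzzer's code: FaultDeduper is a hypothetical name,
and each state transition is represented as a (from, to) pair of made-up block
labels.

```python
class FaultDeduper:
    """Tracks the state transitions seen across all recorded faults."""

    def __init__(self):
        self.seen = set()

    def is_unique(self, transitions):
        """True if this fault involves a transition no earlier fault had."""
        new = set(transitions) - self.seen
        self.seen |= new
        return bool(new)


dedup = FaultDeduper()
print(dedup.is_unique({("A", "B"), ("B", "C")}))   # first fault
print(dedup.is_unique({("A", "B")}))               # nothing new
print(dedup.is_unique({("A", "B"), ("B", "D")}))   # reaches B->D
```

The third call shows where count inflation comes from: the same underlying bug
reached through a different route still contributes a new transition, so it is
recorded again.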

The file names for crashes and hangs are correlated with parent, non-faulting
queue entries. This should help with debugging.

When you can't reproduce a crash found by afl-fuzz, the most likely cause is
that you are not setting the same memory limit as used by the tool. Try:

$ LIMIT_MB=50
$ ( ulimit -Sv $((LIMIT_MB << 10)); /path/to/tested_binary ... )

Change LIMIT_MB to match the -m parameter passed to afl-fuzz. On OpenBSD,
also change -Sv to -Sd.
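
If you script reproduction attempts, the same limit can be applied from
Python. The sketch below is an assumption-laden equivalent of the shell
snippet, not something shipped with AFL: limit_kb and run_with_limit are
hypothetical names, and RLIMIT_AS is used as the counterpart of "ulimit -v" on
Linux (a system that needs -Sd would use RLIMIT_DATA instead).

```python
import resource
import subprocess


def limit_kb(limit_mb):
    """Same arithmetic as the shell snippet: megabytes to kilobytes."""
    return limit_mb << 10


def run_with_limit(argv, limit_mb, stdin_data=b""):
    """Run argv under a soft address-space limit; return its exit status."""
    limit_bytes = limit_mb << 20

    def apply_limit():
        # Runs in the child just before exec; the hard limit is left alone.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))

    proc = subprocess.run(argv, input=stdin_data, preexec_fn=apply_limit)
    return proc.returncode
```

A negative return value from run_with_limit means the child was killed by a
signal (for example, -11 for SIGSEGV), which is the condition you are trying
to reproduce.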

Any existing output directory can also be used to resume aborted jobs; try:

$ ./afl-fuzz -i- -o existing_output_dir [...etc...]

If you have gnuplot installed, you can also generate some pretty graphs for any
active fuzzing task using afl-plot. For an example of what this looks like,
see http://lcamtuf.coredump.cx/afl/plot/.

8) Parallelized fuzzing
-----------------------

Every instance of afl-fuzz takes up roughly one core. This means that on
multi-core systems, parallelization is necessary to fully utilize the hardware.
For tips on how to fuzz a common target on multiple cores or multiple networked
machines, please refer to parallel_fuzzing.txt.

The parallel fuzzing mode also offers a simple way for interfacing AFL to other
fuzzers, to symbolic or concolic execution engines, and so forth; again, see the
last section of parallel_fuzzing.txt for tips.

9) Fuzzer dictionaries
----------------------

By default, the afl-fuzz mutation engine is optimized for compact data formats -
say, images, multimedia, compressed data, regular expression syntax, or shell
scripts. It is somewhat less suited for languages with particularly verbose and
redundant verbiage - notably including HTML, SQL, or JavaScript.

To avoid the hassle of building syntax-aware tools, afl-fuzz provides a way to
seed the fuzzing process with an optional dictionary of language keywords,
magic headers, or other special tokens associated with the targeted data type
- and use that to reconstruct the underlying grammar on the go:

  http://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html

To use this feature, you first need to create a dictionary in one of the two
formats discussed in dictionaries/README.dictionaries; and then point the fuzzer
to it via the -x option in the command line.

(Several common dictionaries are already provided in that subdirectory, too.)
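
To give a feel for why tokens help, here is a Python sketch of one plausible
way a dictionary can feed the mutation engine: each token is spliced into the
input at every offset, either inserted or written over existing bytes. This is
an illustration under that assumption, not a description of the exact stages
afl-fuzz runs; dictionary_candidates is a hypothetical name.

```python
def dictionary_candidates(data, tokens):
    """Yield inputs with each token inserted or overwritten at each offset."""
    for tok in tokens:
        for pos in range(len(data) + 1):
            # Insert the token here, keeping all original bytes.
            yield data[:pos] + tok + data[pos:]
            # Overwrite in place when the token fits inside the input.
            if pos + len(tok) <= len(data):
                yield data[:pos] + tok + data[pos + len(tok):]


for candidate in dictionary_candidates(b"ab", [b"X"]):
    print(candidate)
```

With a keyword such as b"SELECT" in the dictionary, a single step of this kind
produces an input that random byte flips would be very unlikely to reach.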

There is no way to provide more structured descriptions of the underlying
syntax, but the fuzzer will likely figure out some of this based on the
instrumentation feedback alone. This works in practice; for example:

  http://lcamtuf.blogspot.com/2015/04/finding-bugs-in-sqlite-easy-way.html

PS. Even when no explicit dictionary is given, afl-fuzz will try to extract
existing syntax tokens in the input corpus by watching the instrumentation
very closely during deterministic byte flips. This works for some types of
parsers and grammars, but isn't nearly as good as the -x mode.

If a dictionary is really hard to come by, another option is to let AFL run
for a while, and then use the token capture library that comes as a companion
utility with AFL. For that, see libtokencap/README.tokencap.

10) Crash triage
----------------

The coverage-based grouping of crashes usually produces a small data set that
can be quickly triaged manually or with a very simple GDB or Valgrind script.
Every crash is also traceable to its parent non-crashing test case in the
queue, making it easier to diagnose faults.

Having said that, it's important to acknowledge that some fuzzing crashes can be
difficult to quickly evaluate for exploitability without a lot of debugging and
code analysis work. To assist with this task, afl-fuzz supports a unique
"crash exploration" mode enabled with the -C flag.

In this mode, the fuzzer takes one or more crashing test cases as the input,
and uses its feedback-driven fuzzing strategies to very quickly enumerate all
code paths that can be reached in the program while keeping it in the
crashing state.

Mutations that do not result in a crash are rejected; so are any changes that
do not affect the execution path.

The output is a small corpus of files that can be very rapidly examined to see
what degree of control the attacker has over the faulting address, or whether
it is possible to get past an initial out-of-bounds read - and see what lies
beneath.
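
The acceptance rule for this mode is easy to state in code. The Python sketch
below is a simplified model with hypothetical names (keep_in_crash_mode,
seen_paths); an execution path is represented as a frozenset of made-up block
labels, and "does not affect the execution path" is approximated as "produces
a path already recorded".

```python
def keep_in_crash_mode(crashed, path, seen_paths):
    """Decide whether a mutated input is kept during crash exploration."""
    if not crashed:
        return False            # the program no longer crashes: reject
    if path in seen_paths:
        return False            # crashes, but along an already-known path
    seen_paths.add(path)
    return True                 # crashes along a path not seen before


seen_paths = set()
print(keep_in_crash_mode(True, frozenset({"a", "b"}), seen_paths))
print(keep_in_crash_mode(True, frozenset({"a", "b"}), seen_paths))
print(keep_in_crash_mode(False, frozenset({"a", "c"}), seen_paths))
```

Compared with normal fuzzing, the only change is the extra "must still crash"
filter, which is why the resulting corpus stays small and focused on the fault.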

Oh, one more thing: for test case minimization, give afl-tmin a try. The tool
can be operated in a very simple way:

$ ./afl-tmin -i test_case -o minimized_result -- /path/to/program [...]

The tool works with crashing and non-crashing test cases alike. In the crash
mode, it will happily accept instrumented and non-instrumented binaries. In the
non-crashing mode, the minimizer relies on standard AFL instrumentation to make
the file simpler without altering the execution path.

The minimizer accepts the -m, -t, -f and @@ syntax in a manner compatible with
afl-fuzz.
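
The core idea behind this kind of minimization can be sketched in Python as
"delete chunks for as long as the behavior of interest is preserved". This is
a simplified stand-in, not afl-tmin's actual algorithm; minimize and
same_behavior are hypothetical names, and same_behavior would normally run the
target and compare the crash state or execution path.

```python
def minimize(data, same_behavior):
    """Remove ever-smaller chunks while same_behavior(candidate) holds."""
    chunk = max(len(data) // 2, 1)
    while chunk >= 1:
        pos = 0
        while pos < len(data):
            candidate = data[:pos] + data[pos + chunk:]
            if same_behavior(candidate):
                data = candidate    # chunk was not needed; retry same offset
            else:
                pos += chunk        # chunk is needed; move past it
        chunk //= 2
    return data


# Hypothetical target that "crashes" whenever the input contains b"BUG".
print(minimize(b"xxBUGyyyy", lambda d: b"BUG" in d))
```

In the crash mode, same_behavior only needs an exit status, which is why
non-instrumented binaries are acceptable there; preserving an execution path
requires instrumentation feedback.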

Another recent addition to AFL is the afl-analyze tool. It takes an input
file, attempts to sequentially flip bytes, and observes the behavior of the
tested program. It then color-codes the input based on which sections appear to
be critical, and which are not; while not bulletproof, it can often offer quick
insights into complex file formats. More info about its operation can be found
near the end of technical_details.txt.
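
A much-reduced Python model of this analysis: flip each byte in turn, rerun
the target, and label the byte by whether the observed behavior changed. The
names (classify_bytes, behavior) and the two labels are hypothetical, and the
real tool distinguishes more categories than this sketch does.

```python
def classify_bytes(data, behavior):
    """Label each byte 'critical' or 'no-op' based on a single byte flip."""
    baseline = behavior(data)
    labels = []
    for i in range(len(data)):
        flipped = bytearray(data)
        flipped[i] ^= 0xFF          # invert every bit of this one byte
        changed = behavior(bytes(flipped)) != baseline
        labels.append("critical" if changed else "no-op")
    return labels


# Hypothetical target whose behavior depends only on a two-byte magic value.
print(classify_bytes(b"PKzz", lambda d: d[:2] == b"PK"))
```

Here the first two bytes come out as critical and the rest as no-op, which is
the kind of map that helps when reverse-engineering an unfamiliar format.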

11) Going beyond crashes
------------------------

Fuzzing is a wonderful and underutilized technique for discovering non-crashing
design and implementation errors, too. Quite a few interesting bugs have been
found by modifying the target programs to call abort() when, say:

  - Two bignum libraries produce different outputs when given the same
    fuzzer-generated input,

  - An image library produces different outputs when asked to decode the same
    input image several times in a row,

  - A serialization / deserialization library fails to produce stable outputs
    when iteratively serializing and deserializing fuzzer-supplied data,

  - A compression library produces an output inconsistent with the input file
    when asked to compress and then decompress a particular blob.

Implementing these or similar sanity checks usually takes very little time;
if you are the maintainer of a particular package, you can make this code
conditional with #ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION (a flag also
shared with libfuzzer) or #ifdef __AFL_COMPILER (this one is just for AFL).
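
The compression case from the list above can be sketched as follows. Python
and its zlib module are used purely for illustration; in a real AFL target the
check would be C code inside one of the #ifdef guards mentioned above.
roundtrip_ok and check_or_abort are hypothetical names.

```python
import os
import zlib


def roundtrip_ok(blob, compress=zlib.compress, decompress=zlib.decompress):
    """True if compressing and then decompressing returns the original."""
    return decompress(compress(blob)) == blob


def check_or_abort(blob):
    """Turn a silent round-trip mismatch into a crash the fuzzer can see."""
    if not roundtrip_ok(blob):
        os.abort()              # raises SIGABRT in the current process


check_or_abort(b"fuzzer-supplied data")
```

The point of aborting is that the fuzzer only notices fatal signals; a
mismatch that merely returns an error code or prints a warning would go
unrecorded.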

12) Common-sense risks
----------------------

Please keep in mind that, similarly to many other computationally-intensive
tasks, fuzzing may put strain on your hardware and on the OS. In particular:

  - Your CPU will run hot and will need adequate cooling. In most cases, if
    cooling is insufficient or stops working properly, CPU speeds will be
    automatically throttled. That said, especially when fuzzing on less
    suitable hardware (laptops, smartphones, etc), it's not entirely impossible
    for something to blow up.

  - Targeted programs may end up erratically grabbing gigabytes of memory or
    filling up disk space with junk files. AFL tries to enforce basic memory
    limits, but can't prevent each and every possible mishap. The bottom line
    is that you shouldn't be fuzzing on systems where the prospect of data loss
    is not an acceptable risk.

  - Fuzzing involves billions of reads and writes to the filesystem. On modern
    systems, this will usually be heavily cached, resulting in fairly modest
    "physical" I/O - but there are many factors that may alter this equation.
    It is your responsibility to monitor for potential trouble; with very heavy
    I/O, the lifespan of many HDDs and SSDs may be reduced.

    A good way to monitor disk I/O on Linux is the 'iostat' command:

    $ iostat -d 3 -x -k [...optional disk ID...]

13) Known limitations & areas for improvement
---------------------------------------------

Here are some of the most important caveats for AFL:

  - AFL detects faults by checking for the first spawned process dying due to
    a signal (SIGSEGV, SIGABRT, etc). Programs that install custom handlers for
    these signals may need to have the relevant code commented out. In the same
    vein, faults in child processes spawned by the fuzzed target may evade
    detection unless you manually add some code to catch that.

  - As with any other brute-force tool, the fuzzer offers limited coverage if
    encryption, checksums, cryptographic signatures, or compression are used to
    wholly wrap the actual data format to be tested.

    To work around this, you can comment out the relevant checks (see
    experimental/libpng_no_checksum/ for inspiration); if this is not possible,
    you can also write a postprocessor, as explained in
    experimental/post_library/.

  - There are some unfortunate trade-offs with ASAN and 64-bit binaries. This
    isn't due to any specific fault of afl-fuzz; see notes_for_asan.txt for
    tips.

  - There is no direct support for fuzzing network services, background
    daemons, or interactive apps that require UI interaction to work. You may
    need to make simple code changes to make them behave in a more traditional
    way. Preeny may offer a relatively simple option, too - see:
    https://github.com/zardus/preeny

    Some useful tips for modifying network-based services can also be found at:
    https://www.fastly.com/blog/how-to-fuzz-server-american-fuzzy-lop

  - AFL doesn't output human-readable coverage data. If you want to monitor
    coverage, use afl-cov from Michael Rash: https://github.com/mrash/afl-cov

  - Occasionally, sentient machines rise against their creators. If this
    happens to you, please consult http://lcamtuf.coredump.cx/prep/.

Beyond this, see INSTALL for platform-specific tips.

14) Special thanks
------------------

Many of the improvements to afl-fuzz wouldn't be possible without feedback,
bug reports, or patches from:

  Jann Horn                   Hanno Boeck
  Felix Groebert              Jakub Wilk
  Richard W. M. Jones         Alexander Cherepanov
  Tom Ritter                  Hovik Manucharyan
  Sebastian Roschke           Eberhard Mattes
  Padraig Brady               Ben Laurie
  @dronesec                   Luca Barbato
  Tobias Ospelt               Thomas Jarosch
  Martin Carpenter            Mudge Zatko
  Joe Zbiciak                 Ryan Govostes
  Michael Rash                William Robinet
  Jonathan Gray               Filipe Cabecinhas
  Nico Weber                  Jodie Cunningham
  Andrew Griffiths            Parker Thompson
  Jonathan Neuschfer          Tyler Nighswander
  Ben Nagy                    Samir Aguiar
  Aidan Thornton              Aleksandar Nikolich
  Sam Hakim                   Laszlo Szekeres
  David A. Wheeler            Turo Lamminen
  Andreas Stieger             Richard Godbee
  Louis Dassy                 teor2345
  Alex Moneger                Dmitry Vyukov
  Keegan McAllister           Kostya Serebryany
  Richo Healey                Martijn Bogaard
  rc0r                        Jonathan Foote
  Christian Holler            Dominique Pelle
  Jacek Wielemborek           Leo Barnes
  Jeremy Barnes               Jeff Trull
  Guillaume Endignoux         ilovezfs
  Daniel Godas-Lopez          Franjo Ivancic
  Austin Seipp                Daniel Komaromy
  Daniel Binderman            Jonathan Metzman
  Vegard Nossum               Jan Kneschke
  Kurt Roeckx                 Marcel Bohme
  Van-Thuan Pham              Abhik Roychoudhury
  Joshua J. Drake             Toby Hutton
  Rene Freingruber            Sergey Davidoff
  Sami Liedes                 Craig Young
  Andrzej Jackowski           Daniel Hodson

Thank you!

15) Contact
-----------

Questions? Concerns? Bug reports? The author can usually be reached at
<lcamtuf@google.com>.

There is also a mailing list for the project; to join, send a mail to
<afl-users+subscribe@googlegroups.com>. Or, if you prefer to browse
archives first, try:

  https://groups.google.com/group/afl-users

PS. If you wish to submit raw code to be incorporated into the project, please
be aware that the copyright on most of AFL is claimed by Google. While you do
retain copyright on your contributions, they do ask people to agree to a simple
CLA first:

  https://cla.developers.google.com/clas

Sorry about the hassle. Of course, no CLA is required for feature requests or
bug reports.