## Tips for performance optimization

This file provides tips for troubleshooting slow or wasteful fuzzing jobs. See
README.md for the general instruction manual.

## 1. Keep your test cases small

This is probably the single most important step to take! Large test cases do
not merely take more time and memory to be parsed by the tested binary, but
also make the fuzzing process dramatically less efficient in several other
ways.

To illustrate, let's say that you're randomly flipping bits in a file, one bit
at a time. Let's assume that if you flip bit #47, you will hit a security bug;
flipping any other bit just results in an invalid document.

Now, if your starting test case is 100 bytes long, you will have a 71% chance of
triggering the bug within the first 1,000 execs - not bad! But if the test case
is 1 kB long, the probability that we will randomly hit the right pattern in
the same timeframe goes down to 11%. And if it has 10 kB of non-essential
cruft, the odds plunge to 1%.

On top of that, with larger inputs, the binary may now be running 5-10x
slower than before - so the overall drop in fuzzing efficiency may be easily
as high as 500x or so.

In practice, this means that you shouldn't fuzz image parsers with your
vacation photos. Generate a tiny 16x16 picture instead, and run it through
`jpegtran` or `pngcrush` for good measure. The same goes for most other types
of documents.

There are plenty of small starting test cases in ../testcases/ - try them out
or submit new ones!

If you want to start with a larger, third-party corpus, run `afl-cmin` with an
aggressive timeout on that data set first.

## 2. Use a simpler target

Consider using a simpler target binary in your fuzzing work.
For example, for
image formats, bundled utilities such as `djpeg`, `readpng`, or `gifhisto` are
considerably (10-20x) faster than the convert tool from ImageMagick - all while
exercising roughly the same library-level image parsing code.

Even if you don't have a lightweight harness for a particular target, remember
that you can always use another, related library to generate a corpus that will
then be manually fed to a more resource-hungry program later on.

Also note that reading the fuzzing input via stdin is faster than reading from
a file.

## 3. Use LLVM persistent instrumentation

The LLVM mode offers a "persistent", in-process fuzzing mode that can
work well for certain types of self-contained libraries, and for fast targets,
can offer performance gains up to 5-10x; and a "deferred fork server" mode
that can offer huge benefits for programs with high startup overhead. Both
modes require you to edit the source code of the fuzzed program, but the
changes often amount to just strategically placing a single line or two.

If there are important data comparisons performed (e.g.,
`strcmp(ptr, MAGIC_HDR)`), then using laf-intel (see
instrumentation/README.laf-intel.md) will help `afl-fuzz` a lot to get to the
important parts in the code.

If you are only interested in specific parts of the code being fuzzed, you can
supply a list of files to instrument so that only the parts that are actually
relevant get covered. This improves the speed and accuracy of afl. See
instrumentation/README.instrument_list.md.

## 4. Profile and optimize the binary

Check for any parameters or settings that obviously improve performance. For
example, the djpeg utility that comes with IJG jpeg and libjpeg-turbo can be
called with:

```bash
  -dct fast -nosmooth -onepass -dither none -scale 1/4
```

...and that will speed things up. There is a corresponding drop in the quality
of decoded images, but it's probably not something you care about.
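The percentages quoted in tip #1 above follow from simple probability: with
one "bug" bit among N total bits, the chance of hitting it at least once in E
uniformly random single-bit flips is 1 - (1 - 1/N)^E. A minimal sketch (plain
Python; the file sizes and the 1,000-exec budget are taken from the example in
tip #1, and the results closely reproduce the ~71% / 11% / 1% figures):

```python
# Chance of flipping the single "bug" bit at least once in `execs`
# uniformly random single-bit flips over an input of `size_bytes` bytes.
def hit_probability(size_bytes: int, execs: int = 1000) -> float:
    total_bits = size_bytes * 8
    return 1.0 - (1.0 - 1.0 / total_bits) ** execs

for size in (100, 1_000, 10_000):
    print(f"{size:>6} bytes: {hit_probability(size):.1%}")
```

This also makes the "keep test cases small" advice quantitative: the hit
probability falls roughly in proportion to input size once inputs get large.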
In some programs, it is possible to disable output altogether, or at least use
an output format that is computationally inexpensive. For example, with image
transcoding tools, converting to a BMP file will be a lot faster than to PNG.

With some laid-back parsers, enabling "strict" mode (i.e., bailing out after
the first error) may result in smaller files and improved run time without
sacrificing coverage; for example, for sqlite, you may want to specify -bail.

If the program is still too slow, you can use `strace -tt` or an equivalent
profiling tool to see if the targeted binary is doing anything silly.
Sometimes, you can speed things up simply by specifying `/dev/null` as the
config file, or disabling some compile-time features that aren't really needed
for the job (try `./configure --help`). One of the notoriously resource-consuming
things would be calling other utilities via `exec*()`, `popen()`, `system()`, or
equivalent calls; for example, tar can invoke external decompression tools
when it decides that the input file is a compressed archive.

Some programs may also intentionally call `sleep()`, `usleep()`, or `nanosleep()`;
vim is a good example of that. Other programs may attempt `fsync()` and so on.
There are third-party libraries that make it easy to get rid of such code,
e.g.:

  https://launchpad.net/libeatmydata

In programs that are slow due to unavoidable initialization overhead, you may
want to try the LLVM deferred forkserver mode (see README.llvm.md),
which can give you speed gains up to 10x, as mentioned above.

Last but not least, if you are using ASAN and the performance is unacceptable,
consider turning it off for now, and manually examining the generated corpus
with an ASAN-enabled binary later on.

## 5. Instrument just what you need

Instrument just the libraries you actually want to stress-test right now, one
at a time.
Let the program use system-wide, non-instrumented libraries for
any functionality you don't actually want to fuzz. For example, in most
cases, it doesn't make sense to instrument `libgmp` just because you're
testing a crypto app that relies on it for bignum math.

Beware of programs that come with oddball third-party libraries bundled with
their source code (SpiderMonkey is a good example of this). Check `./configure`
options to use non-instrumented system-wide copies instead.

## 6. Parallelize your fuzzers

The fuzzer is designed to need ~1 core per job. This means that on a, say,
4-core system, you can easily run four parallel fuzzing jobs with relatively
little performance hit. For tips on how to do that, see parallel_fuzzing.md.

The `afl-gotcpu` utility can help you understand if you still have idle CPU
capacity on your system. (It won't tell you about memory bandwidth, cache
misses, or similar factors, but they are less likely to be a concern.)

## 7. Keep memory use and timeouts in check

Consider setting low values for `-m` and `-t`.

For programs that are nominally very fast, but get sluggish for some inputs,
you can also try setting `-t` values that are more punishing than what `afl-fuzz`
dares to use on its own. On fast and idle machines, going down to `-t 5` may be
a viable plan.

The `-m` parameter is worth looking at, too. Some programs can end up spending
a fair amount of time allocating and initializing megabytes of memory when
presented with pathological inputs. Low `-m` values can make them give up sooner
and not waste CPU time.

## 8. Check OS configuration

There are several OS-level factors that may affect fuzzing speed:

  - If you have no risk of power loss then run your fuzzing on a tmpfs
    partition. This increases the performance noticeably.
    Alternatively you can use `AFL_TMPDIR` to point to a tmpfs location to
    just write the input file to a tmpfs.
  - High system load. Use idle machines where possible. Kill any non-essential
    CPU hogs (idle browser windows, media players, complex screensavers, etc.).
  - Network filesystems, either used for fuzzer input / output, or accessed by
    the fuzzed binary to read configuration files (pay special attention to the
    home directory - many programs search it for dot-files).
  - Disable all the spectre, meltdown etc. security countermeasures in the
    kernel if your machine is properly separated:

    ```
    ibpb=off ibrs=off kpti=off l1tf=off mds=off mitigations=off
    no_stf_barrier noibpb noibrs nopcid nopti nospec_store_bypass_disable
    nospectre_v1 nospectre_v2 pcid=off pti=off spec_store_bypass_disable=off
    spectre_v2=off stf_barrier=off
    ```

    In most Linux distributions you can put this into a `/etc/default/grub`
    variable.

The following changes are made when executing `afl-system-config`:

  - On-demand CPU scaling. The Linux `ondemand` governor performs its analysis
    on a particular schedule and is known to underestimate the needs of
    short-lived processes spawned by `afl-fuzz` (or any other fuzzer). On Linux,
    this can be fixed with:

    ```bash
    cd /sys/devices/system/cpu
    echo performance | tee cpu*/cpufreq/scaling_governor
    ```

    On other systems, the impact of CPU scaling will be different; when fuzzing,
    use OS-specific tools to find out if all cores are running at full speed.
  - Transparent huge pages. Some allocators, such as `jemalloc`, can incur a
    heavy fuzzing penalty when transparent huge pages (THP) are enabled in the
    kernel. You can disable this via:

    ```bash
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    ```

  - Suboptimal scheduling strategies.
The significance of this will vary from
    one target to another, but on Linux, you may want to make sure that the
    following options are set:

    ```bash
    echo 1 >/proc/sys/kernel/sched_child_runs_first
    echo 1 >/proc/sys/kernel/sched_autogroup_enabled
    ```

    Setting a different scheduling policy for the fuzzer process - say
    `SCHED_RR` - can usually speed things up, too, but needs to be done with
    care.
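One way to experiment with the `SCHED_RR` suggestion above is a small wrapper
that switches its own scheduling policy and then exec's the fuzzer, so the
policy is inherited. A minimal sketch (Linux-only; switching to a real-time
policy needs root or CAP_SYS_NICE, so the wrapper falls back gracefully - the
function name and priority value are illustrative, not part of any AFL tool):

```python
import os

def try_sched_rr(priority: int = 1) -> bool:
    """Best-effort switch of the current process to SCHED_RR.

    Returns True on success, False when the kernel refuses (typically
    because we lack root / CAP_SYS_NICE). Use with care: a runaway
    real-time process can starve the rest of the system.
    """
    try:
        os.sched_setscheduler(0, os.SCHED_RR, os.sched_param(priority))
        return True
    except OSError:
        return False

if __name__ == "__main__":
    print("SCHED_RR:", "enabled" if try_sched_rr() else "not permitted")
```

From here, one could `os.execvp()` the `afl-fuzz` command line; child
processes spawned by the fuzzer inherit the policy, which is exactly why this
needs to be done with care.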