# Getting started with fuzzing in Chromium

This document walks you through the basic steps to start fuzzing and offers
suggestions for improving your fuzz targets. If you're looking for more advanced
fuzzing topics, see the [main page](README.md).

[TOC]

## Getting started

### Setting up your build environment

Generate build files by using the `use_libfuzzer` [GN] argument together with a
sanitizer:

```bash
# AddressSanitizer is the default config we recommend testing with.
# Linux:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Upload Linux ASan' out/libfuzzer
# Chrome OS:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Upload Chrome OS ASan' out/libfuzzer
# Mac:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Upload Mac ASan' out/libfuzzer
# Windows:
python tools\mb\mb.py gen -m chromium.fuzz -b "Libfuzzer Upload Windows ASan" out\libfuzzer
```

*** note
**Note:** You can also invoke [AFL] by using the `use_afl` GN argument, but we
recommend libFuzzer for local development. Running libFuzzer locally doesn't
require any special configuration and gives quick, meaningful output for speed,
coverage, and other parameters.
***

It’s possible to run fuzz targets without sanitizers, but this is not
recommended, because sanitizers detect errors that might not otherwise result in
a crash. `use_libfuzzer` is supported in the following sanitizer configurations:

| GN Argument | Description | Supported OS |
|-------------|-------------|--------------|
| `is_asan=true` | Enables [AddressSanitizer] to catch problems like buffer overruns. | Linux, Windows, Mac, Chrome OS |
| `is_msan=true` | Enables [MemorySanitizer] to catch problems like uninitialized reads<sup>\[[\*](reference.md#MSan)\]</sup>. | Linux |
| `is_ubsan_security=true` | Enables [UndefinedBehaviorSanitizer] to catch<sup>\[[\*](reference.md#UBSan)\]</sup> undefined behavior like integer overflow. | Linux |

For more on builder and sanitizer configurations, see the [Integration
Reference] page.

*** note
**Hint**: Fuzz targets are built with minimal symbols by default. You can adjust
the symbol level by setting the `symbol_level` attribute.
***

### Creating your first fuzz target

After you set up your build environment, you can create your first fuzz target:

1. In the same directory as the code you are going to fuzz (or next to the tests
   for that code), create a new `<my_fuzzer>.cc` file.

   *** note
   **Note:** Do not use the `testing/libfuzzer/fuzzers` directory. This
   directory was used for initial sample fuzz targets but is no longer
   recommended for landing new targets.
   ***

2. In the new file, define a `LLVMFuzzerTestOneInput` function:

   ```cpp
   #include <stddef.h>
   #include <stdint.h>

   extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
     // Put your fuzzing code here and use |data| and |size| as input.
     return 0;
   }
   ```

3. In the `BUILD.gn` file, define a `fuzzer_test` GN target:

   ```python
   import("//testing/libfuzzer/fuzzer_test.gni")
   fuzzer_test("my_fuzzer") {
     sources = [ "my_fuzzer.cc" ]
     deps = [ ... ]
   }
   ```

*** note
**Note:** Most fuzz targets are small. They may perform one or a few API calls
using the data provided by the fuzzing engine as an argument. However, fuzz
targets may be more complex if a certain initialization procedure needs to be
performed. [quic_stream_factory_fuzzer.cc] is a good example of a complex fuzz
target.
***

### Running the fuzz target

After you create your fuzz target, build it with ninja and run it locally:

```bash
# Build the fuzz target.
ninja -C out/libfuzzer url_parse_fuzzer
# Create an empty corpus directory.
mkdir corpus
# Run the fuzz target.
./out/libfuzzer/url_parse_fuzzer corpus
# If you have other corpus directories, pass their paths as well:
./out/libfuzzer/url_parse_fuzzer corpus seed_corpus_dir_1 seed_corpus_dir_N
```

Your fuzz target should produce output like this:

```
INFO: Seed: 1511722356
INFO: Loaded 2 modules (115485 guards): 22572 [0x7fe8acddf560, 0x7fe8acdf5610), 92913 [0xaa05d0, 0xafb194),
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2 INITED cov: 961 ft: 48 corp: 1/1b exec/s: 0 rss: 48Mb
#3 NEW cov: 986 ft: 70 corp: 2/104b exec/s: 0 rss: 48Mb L: 103/103 MS: 1 InsertRepeatedBytes-
#4 NEW cov: 989 ft: 74 corp: 3/106b exec/s: 0 rss: 48Mb L: 2/103 MS: 1 InsertByte-
#6 NEW cov: 991 ft: 76 corp: 4/184b exec/s: 0 rss: 48Mb L: 78/103 MS: 2 CopyPart-InsertRepeatedBytes-
```

A `... NEW ...` line appears when libFuzzer finds new and interesting inputs. If
your fuzz target is efficient, it will find a lot of them quickly. A
`... pulse ...` line appears periodically to show the current status.

For more information about the output, see [libFuzzer's output documentation].

*** note
**Note:** If you observe an `odr-violation` error in the log, try setting the
following environment variable: `ASAN_OPTIONS=detect_odr_violation=0` and
running the fuzz target again.
***

#### Symbolizing a stacktrace

If your fuzz target crashes when running locally and you see a non-symbolized
stack trace, make sure you add the `third_party/llvm-build/Release+Asserts/bin/`
directory from Chromium’s Clang package to `$PATH`. This directory contains the
`llvm-symbolizer` binary.
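
The `$PATH` approach might look like the following minimal sketch. It assumes
the command is run from the root of the Chromium `src` checkout; that location
is an assumption, so adjust the prefix if your shell is elsewhere:

```bash
# Assumption: the current directory is the root of the Chromium src checkout.
export PATH="$PWD/third_party/llvm-build/Release+Asserts/bin:$PATH"

# Optional check: prints the resolved path if the symbolizer is found.
command -v llvm-symbolizer || echo "llvm-symbolizer not found on PATH"
```

The change only lasts for the current shell session.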

Alternatively, you can set an `external_symbolizer_path` via the `ASAN_OPTIONS`
environment variable:

```bash
ASAN_OPTIONS=external_symbolizer_path=/my/local/llvm/build/llvm-symbolizer \
  ./fuzzer ./crash-input
```

The same approach works with other sanitizers via `MSAN_OPTIONS`,
`UBSAN_OPTIONS`, etc.

### Submitting your fuzz target

ClusterFuzz and the build infrastructure automatically discover, build and
execute all `fuzzer_test` targets in the Chromium repository. Once you land your
fuzz target, ClusterFuzz will run it at scale. Check the [ClusterFuzz status]
page after a day or two.

If you want to better understand and optimize your fuzz target’s performance,
see the [Efficient Fuzzing Guide].

*** note
**Note:** It’s important to run fuzzers at scale, not just in your own
environment, because local fuzzing will catch fewer issues. If you run fuzz
targets at scale continuously, you’ll catch regressions and improve code
coverage over time.
***

## Optional improvements

### Common tricks

Your fuzz target may immediately discover interesting (i.e. crashing) inputs.
You can make it more effective with several easy steps:

* **Create a seed corpus**. You can guide the fuzzing engine to generate more
  relevant inputs by adding the `seed_corpus = "src/fuzz-testcases/"` attribute
  to your fuzz target and adding example files to the appropriate directory. For
  more, see the [Seed Corpus] section of the [Efficient Fuzzing Guide].

  *** note
  **Note:** Make sure your corpus files are appropriately licensed.
  ***

* **Create a mutation dictionary**. You can make mutations more effective by
  providing the fuzzer with a `dict = "protocol.dict"` GN attribute and a
  dictionary file that contains interesting strings / byte sequences for the
  target API. For more, see the [Fuzzer Dictionary] section of the [Efficient
  Fuzzing Guide].

* **Specify testcase length limits**. Long inputs can be problematic, because
  the fuzz target processes them more slowly and they enlarge the search
  space. By default, libFuzzer uses `-max_len=4096` or takes the longest
  testcase in the corpus if `-max_len` is not specified.

  ClusterFuzz uses different strategies for different fuzzing sessions,
  including different random values. Also, ClusterFuzz uses different fuzzing
  engines (e.g. AFL, which doesn't have a `-max_len` option). If your target has
  an input length limit that you would like to *strictly enforce*, add a sanity
  check to the beginning of your `LLVMFuzzerTestOneInput` function:

  ```cpp
  if (size < kMinInputLength || size > kMaxInputLength)
    return 0;
  ```

* **Generate a [code coverage report]**. See which code the fuzzer covered in
  recent runs, so you can gauge whether it hits the important code parts or not.

  **Note:** Since the code coverage of a fuzz target depends heavily on the
  corpus provided when running the target, we recommend running the fuzz target
  built with ASan locally for a little while (several minutes / hours) first.
  This will produce some corpus, which should be used for generating a code
  coverage report.

#### Disabling noisy error message logging

If the code you’re fuzzing generates a lot of error messages when encountering
incorrect or invalid data, the fuzz target will be slow and inefficient.

If the target uses Chromium logging APIs, you can silence errors by overriding
the environment used for logging in your fuzz target:

```cpp
#include <stddef.h>
#include <stdint.h>

#include "base/logging.h"

struct Environment {
  Environment() {
    // Only log fatal errors so that noisy messages don't slow down fuzzing.
    logging::SetMinLogLevel(logging::LOG_FATAL);
  }
};

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Constructed once, on the first call.
  static Environment env;

  // Put your fuzzing code here and use data+size as input.
  return 0;
}
```

### Mutating Multiple Inputs

By default, a fuzzing engine such as libFuzzer mutates a single input (`uint8_t*
data, size_t size`). However, APIs often accept multiple arguments of various
types, rather than a single buffer. You can use three different methods to
mutate multiple inputs at once.

#### libprotobuf-mutator (LPM)

If you need to mutate multiple inputs of various types and lengths, see [Getting
Started with libprotobuf-mutator in Chromium].

*** note
**Note:** This method works with APIs and data structures of any complexity, but
requires extra effort. You would need to write a `.proto` definition (unless you
fuzz an existing protobuf) and C++ code to pass the proto message to the API you
are fuzzing (you'll have a fuzzed protobuf message instead of a `data, size`
buffer).
***

#### FuzzedDataProvider (FDP)

[FuzzedDataProvider] is a class useful for splitting a fuzz input into multiple
parts of various types.

*** note
**Note:** FDP is much easier to use than LPM, but its downside is that the
format of the corpus becomes inconsistent. This doesn't matter if you don't have
a [Seed Corpus] (e.g. valid image files if you fuzz an image parser). FDP splits
your corpus files into several pieces to fuzz a broader range of input types, so
it can take longer to reach deeper code paths that surface more quickly if you
fuzz only a single input type.
***

To use FDP, add `#include <fuzzer/FuzzedDataProvider.h>` to your fuzz target
source file.

To learn more about `FuzzedDataProvider`, check out the [upstream documentation]
on it. It gives an overview of the available methods and links to a few example
fuzz targets.

#### Hash-based argument

If your API accepts a buffer with data and some integer value (e.g., a bitwise
combination of flags), you can calculate a hash value from (`data, size`) and
use it to fuzz an additional integer argument. For example:

```cpp
#include <stddef.h>
#include <stdint.h>

#include <functional>
#include <string>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Hash the whole input to derive the extra integer argument.
  std::string str(reinterpret_cast<const char*>(data), size);
  size_t data_hash = std::hash<std::string>()(str);
  // APIToBeFuzzed stands in for the API under test.
  APIToBeFuzzed(data, size, data_hash);
  return 0;
}
```

*** note
**Note:** The hash method doesn't have the corpus format issue mentioned in the
FDP section above, but it can lead to results that aren't as sophisticated as
LPM or FDP. The hash value derived from the data is a random value, rather than
a meaningful one controlled by the fuzzing engine. A single bit mutation might
lead to new code coverage, but the next mutation would generate a new hash
value and trigger another code path, without providing any real guidance to the
fuzzing engine.
***

[AFL]: AFL_integration.md
[AddressSanitizer]: http://clang.llvm.org/docs/AddressSanitizer.html
[ClusterFuzz status]: libFuzzer_integration.md#Status-Links
[Efficient Fuzzing Guide]: efficient_fuzzing.md
[FuzzedDataProvider]: https://cs.chromium.org/chromium/src/third_party/libFuzzer/src/utils/FuzzedDataProvider.h
[Fuzzer Dictionary]: efficient_fuzzing.md#Fuzzer-dictionary
[GN]: https://gn.googlesource.com/gn/+/master/README.md
[Getting Started with libprotobuf-mutator in Chromium]: libprotobuf-mutator.md
[Integration Reference]: reference.md
[MemorySanitizer]: http://clang.llvm.org/docs/MemorySanitizer.html
[Seed Corpus]: efficient_fuzzing.md#Seed-corpus
[UndefinedBehaviorSanitizer]: http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
[code coverage report]: efficient_fuzzing.md#Code-coverage
[crbug/598448]: https://bugs.chromium.org/p/chromium/issues/detail?id=598448
[upstream documentation]: https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#fuzzed-data-provider
[libFuzzer's output documentation]: http://llvm.org/docs/LibFuzzer.html#output
[quic_stream_factory_fuzzer.cc]: https://cs.chromium.org/chromium/src/net/quic/quic_stream_factory_fuzzer.cc
[sanitizers]: https://github.com/google/sanitizers