# Getting started with fuzzing in Chromium

This document walks you through the basic steps to start fuzzing and offers
suggestions for improving your fuzz targets. If you're looking for more advanced
fuzzing topics, see the [main page](README.md).

[TOC]

## Getting started

### Setting up your build environment

Generate build files by using the `use_libfuzzer` [GN] argument together with a
sanitizer:

```bash
# AddressSanitizer is the default config we recommend testing with.
# Linux:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Upload Linux ASan' out/libfuzzer
# Chrome OS:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Upload Chrome OS ASan' out/libfuzzer
# Mac:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Upload Mac ASan' out/libfuzzer
# Windows:
python tools\mb\mb.py gen -m chromium.fuzz -b "Libfuzzer Upload Windows ASan" out\libfuzzer
```

*** note
**Note:** You can also invoke [AFL] by using the `use_afl` GN argument, but we
recommend libFuzzer for local development. Running libFuzzer locally doesn't
require any special configuration and gives quick, meaningful output for speed,
coverage, and other parameters.
***

It’s possible to run fuzz targets without sanitizers, but we don't recommend it,
because sanitizers detect errors that might not otherwise result in a crash.
`use_libfuzzer` is supported in the following sanitizer configurations:

| GN Argument | Description | Supported OS |
|-------------|-------------|--------------|
| `is_asan=true` | Enables [AddressSanitizer] to catch problems like buffer overruns. | Linux, Windows, Mac, Chrome OS |
| `is_msan=true` | Enables [MemorySanitizer] to catch problems like uninitialized reads<sup>\[[\*](reference.md#MSan)\]</sup>. | Linux |
| `is_ubsan_security=true` | Enables [UndefinedBehaviorSanitizer] to catch undefined behavior like integer overflow<sup>\[[\*](reference.md#UBSan)\]</sup>. | Linux |

For more on builder and sanitizer configurations, see the [Integration
Reference] page.

*** note
**Hint**: Fuzz targets are built with minimal symbols by default. You can adjust
the symbol level by setting the `symbol_level` attribute.
***

### Creating your first fuzz target

After you set up your build environment, you can create your first fuzz target:

1. In the same directory as the code you are going to fuzz (or next to the tests
   for that code), create a new `<my_fuzzer>.cc` file.

   *** note
   **Note:** Do not use the `testing/libfuzzer/fuzzers` directory. This
   directory was used for initial sample fuzz targets but is no longer
   recommended for landing new targets.
   ***

2. In the new file, define a `LLVMFuzzerTestOneInput` function:

   ```cpp
   #include <stddef.h>
   #include <stdint.h>

   extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
     // Put your fuzzing code here and use |data| and |size| as input.
     return 0;
   }
   ```

3. In the `BUILD.gn` file, define a `fuzzer_test` GN target:

   ```python
   import("//testing/libfuzzer/fuzzer_test.gni")
   fuzzer_test("my_fuzzer") {
     sources = [ "my_fuzzer.cc" ]
     deps = [ ... ]
   }
   ```

*** note
**Note:** Most fuzz targets are small. They may perform one or a few API calls
using the data provided by the fuzzing engine as an argument. However, fuzz
targets may be more complex if a certain initialization procedure needs to be
performed. [quic_stream_factory_fuzzer.cc] is a good example of a complex fuzz
target.
***
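
To make the "small target" case concrete, here is a minimal sketch of a complete
fuzz target that makes a single API call. `ParseKeyValue` is a hypothetical
stand-in defined only for this illustration; in a real target you would call the
Chromium API you want to fuzz instead.

```cpp
#include <stddef.h>
#include <stdint.h>

#include <string>

// Hypothetical function under test (not a Chromium API). It splits an input
// of the form "key=value" at the first '=' and reports whether one was found.
bool ParseKeyValue(const std::string& input,
                   std::string* key,
                   std::string* value) {
  const size_t pos = input.find('=');
  if (pos == std::string::npos)
    return false;
  *key = input.substr(0, pos);
  *value = input.substr(pos + 1);
  return true;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Pass the raw bytes straight to the API. The target does not validate
  // them, because the API itself must tolerate arbitrary data.
  const std::string input(reinterpret_cast<const char*>(data), size);
  std::string key;
  std::string value;
  ParseKeyValue(input, &key, &value);
  return 0;
}
```

The target ignores the parser's return value on purpose: the fuzzing engine is
looking for crashes and sanitizer reports, not for parse failures.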

### Running the fuzz target

After you create your fuzz target, build it with ninja and run it locally:

```bash
# Build the fuzz target.
ninja -C out/libfuzzer url_parse_fuzzer
# Create an empty corpus directory.
mkdir corpus
# Run the fuzz target.
./out/libfuzzer/url_parse_fuzzer corpus
# If you have other corpus directories, pass their paths as well:
./out/libfuzzer/url_parse_fuzzer corpus seed_corpus_dir_1 seed_corpus_dir_N
```

Your fuzz target should produce output like this:

```
INFO: Seed: 1511722356
INFO: Loaded 2 modules   (115485 guards): 22572 [0x7fe8acddf560, 0x7fe8acdf5610), 92913 [0xaa05d0, 0xafb194),
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2  INITED cov: 961 ft: 48 corp: 1/1b exec/s: 0 rss: 48Mb
#3  NEW    cov: 986 ft: 70 corp: 2/104b exec/s: 0 rss: 48Mb L: 103/103 MS: 1 InsertRepeatedBytes-
#4  NEW    cov: 989 ft: 74 corp: 3/106b exec/s: 0 rss: 48Mb L: 2/103 MS: 1 InsertByte-
#6  NEW    cov: 991 ft: 76 corp: 4/184b exec/s: 0 rss: 48Mb L: 78/103 MS: 2 CopyPart-InsertRepeatedBytes-
```

A `... NEW ...` line appears when libFuzzer finds new and interesting inputs. If
your fuzz target is efficient, it will find a lot of them quickly. A `... pulse
...` line appears periodically to show the current status.

For more information about the output, see [libFuzzer's output documentation].

*** note
**Note:** If you observe an `odr-violation` error in the log, please try setting
the following environment variable: `ASAN_OPTIONS=detect_odr_violation=0` and
running the fuzz target again.
***

#### Symbolizing a stacktrace

If your fuzz target crashes when running locally and you see a non-symbolized
stack trace, make sure you add the `third_party/llvm-build/Release+Asserts/bin/`
directory from Chromium’s Clang package to `$PATH`. This directory contains the
`llvm-symbolizer` binary.

Alternatively, you can set an `external_symbolizer_path` via the `ASAN_OPTIONS`
environment variable:

```bash
ASAN_OPTIONS=external_symbolizer_path=/my/local/llvm/build/llvm-symbolizer \
  ./fuzzer ./crash-input
```

The same approach works with other sanitizers via `MSAN_OPTIONS`,
`UBSAN_OPTIONS`, etc.

### Submitting your fuzz target

ClusterFuzz and the build infrastructure automatically discover, build and
execute all `fuzzer_test` targets in the Chromium repository. Once you land your
fuzz target, ClusterFuzz will run it at scale. Check the [ClusterFuzz status]
page after a day or two.

If you want to better understand and optimize your fuzz target’s performance,
see the [Efficient Fuzzing Guide].

*** note
**Note:** It’s important to run fuzzers at scale, not just in your own
environment, because local fuzzing will catch fewer issues. If you run fuzz
targets at scale continuously, you’ll catch regressions and improve code
coverage over time.
***

## Optional improvements

### Common tricks

Your fuzz target may immediately discover interesting (i.e. crashing) inputs.
You can make it more effective with several easy steps:

* **Create a seed corpus**. You can guide the fuzzing engine to generate more
  relevant inputs by adding the `seed_corpus = "src/fuzz-testcases/"` attribute
  to your fuzz target and adding example files to the appropriate directory. For
  more, see the [Seed Corpus] section of the [Efficient Fuzzing Guide].

  *** note
  **Note:** Make sure your corpus files are appropriately licensed.
  ***

* **Create a mutation dictionary**. You can make mutations more effective by
  providing the fuzzer with a `dict = "protocol.dict"` GN attribute and a
  dictionary file that contains interesting strings / byte sequences for the
  target API. For more, see the [Fuzzer Dictionary] section of the [Efficient
  Fuzzing Guide].

* **Specify testcase length limits**. Long inputs can be problematic, because
  the fuzz target processes them more slowly and they increase the search
  space. By default, libFuzzer uses `-max_len=4096` or takes the longest
  testcase in the corpus if `-max_len` is not specified.

  ClusterFuzz uses different strategies for different fuzzing sessions,
  including different random values. Also, ClusterFuzz uses different fuzzing
  engines (e.g. AFL, which doesn't have a `-max_len` option). If your target has
  an input length limit that you would like to *strictly enforce*, add a sanity
  check to the beginning of your `LLVMFuzzerTestOneInput` function:

  ```cpp
  if (size < kMinInputLength || size > kMaxInputLength)
    return 0;
  ```

* **Generate a [code coverage report]**. See which code the fuzzer covered in
  recent runs, so you can gauge whether it hits the important code parts or not.

  **Note:** Since the code coverage of a fuzz target depends heavily on the
  corpus provided when running the target, we recommend running the fuzz target
  built with ASan locally for a little while (several minutes / hours) first.
  This will produce some corpus, which should be used for generating a code
  coverage report.
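
For the length-limit tip above, the following sketch shows where the check sits
in a complete target. The limit values and the `g_processed_inputs` counter are
assumptions made for illustration; the counter only stands in for a call to the
API under test.

```cpp
#include <stddef.h>
#include <stdint.h>

namespace {

// Illustrative limits; choose values that match what your API accepts.
constexpr size_t kMinInputLength = 4;
constexpr size_t kMaxInputLength = 1024;

// Counts inputs that pass the length check. It stands in for the API under
// test so that the sketch has observable behavior.
size_t g_processed_inputs = 0;

}  // namespace

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Reject out-of-range inputs before touching the API under test. This works
  // regardless of which fuzzing engine generated the input.
  if (size < kMinInputLength || size > kMaxInputLength)
    return 0;

  (void)data;            // A real target would pass |data| to the API here.
  ++g_processed_inputs;  // Placeholder for the real API call.
  return 0;
}
```

Because the check lives in the target itself, it applies equally to engines
that have no equivalent of `-max_len`.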

#### Disabling noisy error message logging

If the code you’re fuzzing generates a lot of error messages when encountering
incorrect or invalid data, the fuzz target will be slow and inefficient.

If the target uses Chromium logging APIs, you can silence errors by overriding
the environment used for logging in your fuzz target:

```cpp
#include <stddef.h>
#include <stdint.h>

#include "base/logging.h"

// Raises the minimum log level so that only fatal messages are emitted.
struct Environment {
  Environment() {
    logging::SetMinLogLevel(logging::LOG_FATAL);
  }
};

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Constructed once, on the first call.
  static Environment env;

  // Put your fuzzing code here and use |data| and |size| as input.
  return 0;
}
```
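
The `static` keyword matters here because the fuzzing engine calls
`LLVMFuzzerTestOneInput` once per input, and the setup should not be repeated
on every call. The sketch below illustrates that behavior without any Chromium
dependency: the logging call is replaced by a hypothetical counter,
`g_environment_setups`, which exists only for this demonstration.

```cpp
#include <stddef.h>
#include <stdint.h>

// Counts how often the constructor runs; for illustration only.
int g_environment_setups = 0;

struct Environment {
  Environment() {
    // One-time setup (for example, lowering the log level) would go here.
    ++g_environment_setups;
  }
};

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // A function-local static is constructed on the first call only, so the
  // setup cost is paid once rather than once per input.
  static Environment env;
  (void)env;
  (void)data;
  (void)size;
  return 0;
}
```

However many times the target is called, the counter stays at one.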

### Mutating Multiple Inputs

By default, a fuzzing engine such as libFuzzer mutates a single input (`uint8_t*
data, size_t size`). However, APIs often accept multiple arguments of various
types, rather than a single buffer. You can use three different methods to
mutate multiple inputs at once.

#### libprotobuf-mutator (LPM)

If you need to mutate multiple inputs of various types and lengths, see [Getting
Started with libprotobuf-mutator in Chromium].

*** note
**Note:** This method works with APIs and data structures of any complexity, but
requires extra effort. You would need to write a `.proto` definition (unless you
fuzz an existing protobuf) and C++ code to pass the proto message to the API you
are fuzzing (you'll have a fuzzed protobuf message instead of a `data, size`
buffer).
***

#### FuzzedDataProvider (FDP)

[FuzzedDataProvider] is a class useful for splitting a fuzz input into multiple
parts of various types.

*** note
**Note:** FDP is much easier to use than LPM, but its downside is that the
format of the corpus becomes inconsistent. This doesn't matter if you don't have
a [Seed Corpus] (e.g. valid image files if you fuzz an image parser). FDP splits
your corpus files into several pieces to fuzz a broader range of input types, so
it can take longer to reach deeper code paths that surface more quickly if you
fuzz only a single input type.
***

To use FDP, add `#include <fuzzer/FuzzedDataProvider.h>` to your fuzz target
source file.

To learn more about `FuzzedDataProvider`, check out the [upstream documentation]
on it. It gives an overview of the available methods and links to a few example
fuzz targets.

#### Hash-based argument

If your API accepts a buffer with data and some integer value (e.g., a bitwise
combination of flags), you can calculate a hash value from (`data, size`) and
use it to fuzz an additional integer argument. For example:

```cpp
#include <stddef.h>
#include <stdint.h>

#include <functional>
#include <string>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Derive the extra integer argument from a hash of the whole input.
  std::string str(reinterpret_cast<const char*>(data), size);
  std::size_t data_hash = std::hash<std::string>()(str);
  APIToBeFuzzed(data, size, data_hash);
  return 0;
}
```

*** note
**Note:** The hash method doesn't have the corpus format issue mentioned in the
FDP section above, but it can lead to results that aren't as sophisticated as
LPM or FDP. The hash value derived from the data is a random value, rather than
a meaningful one controlled by the fuzzing engine. A single bit mutation might
lead to new code coverage, but the next mutation would generate a new hash
value and trigger another code path, without providing any real guidance to the
fuzzing engine.
***

[AFL]: AFL_integration.md
[AddressSanitizer]: http://clang.llvm.org/docs/AddressSanitizer.html
[ClusterFuzz status]: libFuzzer_integration.md#Status-Links
[Efficient Fuzzing Guide]: efficient_fuzzing.md
[FuzzedDataProvider]: https://cs.chromium.org/chromium/src/third_party/libFuzzer/src/utils/FuzzedDataProvider.h
[Fuzzer Dictionary]: efficient_fuzzing.md#Fuzzer-dictionary
[GN]: https://gn.googlesource.com/gn/+/master/README.md
[Getting Started with libprotobuf-mutator in Chromium]: libprotobuf-mutator.md
[Integration Reference]: reference.md
[MemorySanitizer]: http://clang.llvm.org/docs/MemorySanitizer.html
[Seed Corpus]: efficient_fuzzing.md#Seed-corpus
[UndefinedBehaviorSanitizer]: http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
[code coverage report]: efficient_fuzzing.md#Code-coverage
[crbug/598448]: https://bugs.chromium.org/p/chromium/issues/detail?id=598448
[upstream documentation]: https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#fuzzed-data-provider
[libFuzzer's output documentation]: http://llvm.org/docs/LibFuzzer.html#output
[quic_stream_factory_fuzzer.cc]: https://cs.chromium.org/chromium/src/net/quic/quic_stream_factory_fuzzer.cc
[sanitizers]: https://github.com/google/sanitizers