=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

LLVM has a simple code coverage instrumentation built in (SanitizerCoverage).
It inserts calls to user-defined functions at the function, basic-block, and
edge levels. Default implementations of those callbacks are provided and
implement simple coverage reporting and visualization. However, if you need
*just* coverage visualization, you may want to use
:doc:`SourceBasedCodeCoverage <SourceBasedCodeCoverage>` instead.

Tracing PCs with guards
=======================

With ``-fsanitize-coverage=trace-pc-guard`` the compiler will insert the following code
on every edge:

.. code-block:: none

   __sanitizer_cov_trace_pc_guard(&guard_variable)

Every edge will have its own ``guard_variable`` (``uint32_t``).

The compiler will also insert calls to a module constructor:

.. code-block:: c++

   // The guards are [start, stop).
   // This function will be called at least once per DSO and may be called
   // more than once with the same values of start/stop.
   __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop);

With an additional ``...=trace-pc,indirect-calls`` flag
``__sanitizer_cov_trace_pc_indirect(void *callee)`` will be inserted on every indirect call.

The functions ``__sanitizer_cov_trace_pc_*`` should be defined by the user.

Example:

.. code-block:: c++

  // trace-pc-guard-cb.cc
  #include <stdint.h>
  #include <stdio.h>
  #include <sanitizer/coverage_interface.h>

  // This callback is inserted by the compiler as a module constructor
  // into every DSO. 'start' and 'stop' correspond to the
  // beginning and end of the section with the guards for the entire
  // binary (executable or DSO). The callback will be called at least
  // once per DSO and may be called multiple times with the same parameters.
  extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                      uint32_t *stop) {
    static uint64_t N;  // Counter for the guards.
    if (start == stop || *start) return;  // Initialize only once.
    printf("INIT: %p %p\n", start, stop);
    for (uint32_t *x = start; x < stop; x++)
      *x = ++N;  // Guards should start from 1.
  }

  // This callback is inserted by the compiler on every edge in the
  // control flow (some optimizations apply).
  // Typically, the compiler will emit the code like this:
  //    if(*guard)
  //      __sanitizer_cov_trace_pc_guard(guard);
  // But for large functions it will emit a simple call:
  //    __sanitizer_cov_trace_pc_guard(guard);
  extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (!*guard) return;  // Duplicate the guard check.
    // If you set *guard to 0 this code will not be called again for this edge.
    // Now you can get the PC and do whatever you want:
    //   store it somewhere or symbolize it and print right away.
    // The values of `*guard` are as you set them in
    // __sanitizer_cov_trace_pc_guard_init and so you can make them consecutive
    // and use them to dereference an array or a bit vector.
    void *PC = __builtin_return_address(0);
    char PcDescr[1024];
    // This function is a part of the sanitizer run-time.
    // To use it, link with AddressSanitizer or other sanitizer.
    __sanitizer_symbolize_pc(PC, "%p %F %L", PcDescr, sizeof(PcDescr));
    printf("guard: %p %x PC %s\n", guard, *guard, PcDescr);
  }

.. code-block:: c++

  // trace-pc-guard-example.cc
  void foo() { }
  int main(int argc, char **argv) {
    if (argc > 1) foo();
  }

.. code-block:: console

  clang++ -g -fsanitize-coverage=trace-pc-guard trace-pc-guard-example.cc -c
  clang++ trace-pc-guard-cb.cc trace-pc-guard-example.o -fsanitize=address
  ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out

.. code-block:: console

  INIT: 0x71bcd0 0x71bce0
  guard: 0x71bcd4 2 PC 0x4ecd5b in main trace-pc-guard-example.cc:2
  guard: 0x71bcd8 3 PC 0x4ecd9e in main trace-pc-guard-example.cc:3:7

.. code-block:: console

  ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out with-foo

.. code-block:: console

  INIT: 0x71bcd0 0x71bce0
  guard: 0x71bcd4 2 PC 0x4ecd5b in main trace-pc-guard-example.cc:3
  guard: 0x71bcdc 4 PC 0x4ecdc7 in main trace-pc-guard-example.cc:4:17
  guard: 0x71bcd0 1 PC 0x4ecd20 in foo() trace-pc-guard-example.cc:2:14

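The callback comments above mention that consecutive guard values can index an
array. A minimal sketch of that idea follows, as an alternative pair of
callbacks. The names ``kMaxEdges``, ``g_edge_hit``, ``g_num_guards`` and
``CoveredEdgeCount`` are hypothetical and not part of any sanitizer interface,
and the fixed capacity is an arbitrary assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-capacity storage: one hit flag per guard value.
// Plain arrays are used so that no constructor has to run before the
// module constructor calls the init callback.
constexpr uint32_t kMaxEdges = 1 << 16;
static uint8_t g_edge_hit[kMaxEdges];  // Index 0 is unused.
static uint32_t g_num_guards;          // Guards handed out so far.

extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  if (start == stop || *start) return;  // Initialize only once.
  for (uint32_t *g = start; g < stop; ++g) {
    // Guards start from 1; guards beyond the capacity stay 0 and are ignored.
    if (g_num_guards + 1 < kMaxEdges) *g = ++g_num_guards;
  }
}

extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;     // Duplicate the guard check.
  g_edge_hit[*guard] = 1;  // Use the guard value as an array index.
  *guard = 0;              // Do not report this edge again.
}

// Hypothetical helper: number of distinct edges recorded so far.
size_t CoveredEdgeCount() {
  size_t covered = 0;
  for (uint32_t i = 1; i <= g_num_guards; ++i) covered += g_edge_hit[i];
  return covered;
}
```

Because the guard is reset to 0 after the first hit, each edge is recorded at
most once; a hit counter instead of a flag would need the guard left intact.
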
Inline 8bit-counters
====================

**Experimental, may change or disappear in the future**

With ``-fsanitize-coverage=inline-8bit-counters`` the compiler will insert
inline counter increments on every edge.
This is similar to ``-fsanitize-coverage=trace-pc-guard`` but instead of a
callback the instrumentation simply increments a counter.

Users need to implement a single function to capture the counters at startup.

.. code-block:: c++

  extern "C"
  void __sanitizer_cov_8bit_counters_init(char *start, char *end) {
    // [start,end) is the array of 8-bit counters created for the current DSO.
    // Capture this array in order to read/modify the counters.
  }

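One way to capture the counters is sketched below. It assumes a single
instrumented DSO (a later call simply overwrites the saved range), and the
names ``g_counters_begin``, ``g_counters_end``, ``CountCoveredEdges`` and
``ResetCounters`` are hypothetical helpers, not part of the sanitizer
interface.

```cpp
#include <cstddef>

// Hypothetical globals holding the counter range of the last registered DSO.
static char *g_counters_begin;
static char *g_counters_end;

extern "C" void __sanitizer_cov_8bit_counters_init(char *start, char *end) {
  // Assumption: only one DSO is instrumented, so one range is enough.
  g_counters_begin = start;
  g_counters_end = end;
}

// Hypothetical helper: number of counters that are non-zero,
// i.e. edges executed at least once since the last reset.
size_t CountCoveredEdges() {
  size_t covered = 0;
  for (char *c = g_counters_begin; c != g_counters_end; ++c)
    if (*c != 0) ++covered;
  return covered;
}

// Hypothetical helper: clear all counters, e.g. between two test inputs.
void ResetCounters() {
  for (char *c = g_counters_begin; c != g_counters_end; ++c) *c = 0;
}
```

A tool that supports several DSOs would need to store every ``[start,end)``
range it receives rather than only the last one.
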
PC-Table
========

**Experimental, may change or disappear in the future**

**Note:** this instrumentation might be incompatible with dead code stripping
(``-Wl,-gc-sections``) for linkers other than LLD, thus resulting in a
significant binary size overhead. For more information, see
`Bug 34636 <https://bugs.llvm.org/show_bug.cgi?id=34636>`_.

With ``-fsanitize-coverage=pc-table`` the compiler will create a table of
instrumented PCs. Requires either ``-fsanitize-coverage=inline-8bit-counters`` or
``-fsanitize-coverage=trace-pc-guard``.

Users need to implement a single function to capture the PC table at startup:

.. code-block:: c++

  extern "C"
  void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg,
                                const uintptr_t *pcs_end) {
    // [pcs_beg,pcs_end) is the array of ptr-sized integers representing
    // pairs [PC,PCFlags] for every instrumented block in the current DSO.
    // Capture this array in order to read the PCs and their Flags.
    // The number of PCs and PCFlags for a given DSO is the same as the number
    // of 8-bit counters (-fsanitize-coverage=inline-8bit-counters) or
    // trace_pc_guard callbacks (-fsanitize-coverage=trace-pc-guard).
    // A PCFlags describes the basic block:
    //  * bit0: 1 if the block is the function entry block, 0 otherwise.
  }

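The pair layout described in the comments can be walked as in the following
sketch. It assumes a single instrumented DSO, and the names ``g_pcs_begin``,
``g_pcs_end``, ``NumTableEntries`` and ``CountFunctionEntries`` are
hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical globals holding the PC table of the last registered DSO.
static const uintptr_t *g_pcs_begin;
static const uintptr_t *g_pcs_end;

extern "C" void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg,
                                         const uintptr_t *pcs_end) {
  // Assumption: only one DSO is instrumented, so one range is enough.
  g_pcs_begin = pcs_beg;
  g_pcs_end = pcs_end;
}

// Hypothetical helper: number of [PC,PCFlags] pairs in the table.
size_t NumTableEntries() {
  return static_cast<size_t>(g_pcs_end - g_pcs_begin) / 2;
}

// Hypothetical helper: number of entries whose PCFlags has bit0 set,
// i.e. blocks that are function entry blocks.
size_t CountFunctionEntries() {
  size_t entries = 0;
  for (size_t i = 0; i < NumTableEntries(); ++i) {
    uintptr_t flags = g_pcs_begin[2 * i + 1];  // Second word of the pair.
    if (flags & 1) ++entries;
  }
  return entries;
}
```

Since the table has one entry per counter or guard, entry ``i`` of the table
describes the block behind counter or guard ``i`` of the same DSO.
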
Tracing PCs
===========

With ``-fsanitize-coverage=trace-pc`` the compiler will insert
``__sanitizer_cov_trace_pc()`` on every edge.
With an additional ``...=trace-pc,indirect-calls`` flag
``__sanitizer_cov_trace_pc_indirect(void *callee)`` will be inserted on every indirect call.
These callbacks are not implemented in the Sanitizer run-time and should be defined
by the user.
This mechanism is used for fuzzing the Linux kernel
(https://github.com/google/syzkaller).

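A user-side definition of these two callbacks might look like the sketch
below. The buffer names and sizes (``kMaxPCs``, ``g_pcs``, ``g_callees``) and
the helper ``RecordedPCCount`` are hypothetical, and the sketch assumes this
file itself is compiled without coverage instrumentation so the callbacks do
not call themselves.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size logs; entries beyond the capacity are dropped.
constexpr size_t kMaxPCs = 4096;
static uintptr_t g_pcs[kMaxPCs];
static size_t g_num_pcs;
static uintptr_t g_callees[kMaxPCs];
static size_t g_num_callees;

extern "C" void __sanitizer_cov_trace_pc() {
  // The return address identifies the instrumented edge that called us.
  void *pc = __builtin_return_address(0);
  if (g_num_pcs < kMaxPCs)
    g_pcs[g_num_pcs++] = reinterpret_cast<uintptr_t>(pc);
}

extern "C" void __sanitizer_cov_trace_pc_indirect(void *callee) {
  // Record the target of the indirect call.
  if (g_num_callees < kMaxPCs)
    g_callees[g_num_callees++] = reinterpret_cast<uintptr_t>(callee);
}

// Hypothetical helper: number of edge executions logged so far.
size_t RecordedPCCount() { return g_num_pcs; }
```

Unlike ``trace-pc-guard``, there is no guard to clear, so every execution of
an edge reaches the callback and any deduplication is up to the user.
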
Instrumentation points
======================

Sanitizer Coverage offers different levels of instrumentation.

* ``edge`` (default): edges are instrumented (see below).
* ``bb``: basic blocks are instrumented.
* ``func``: only the entry block of every function will be instrumented.

Use these flags together with ``trace-pc-guard`` or ``trace-pc``,
like this: ``-fsanitize-coverage=func,trace-pc-guard``.

When ``edge`` or ``bb`` is used, some of the edges/blocks may still be left
uninstrumented (pruned) if such instrumentation is considered redundant.
Use ``no-prune`` (e.g. ``-fsanitize-coverage=bb,no-prune,trace-pc-guard``)
to disable pruning. This could be useful for better coverage visualization.

Edge coverage
-------------

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains 3 basic blocks, let's name them A, B, C:

.. code-block:: none

    A
    |\
    | \
    |  B
    | /
    |/
    C

If blocks A, B, and C are all covered we know for certain that the edges A=>B
and B=>C were executed, but we still don't know if the edge A=>C was executed.
Such edges of the control flow graph are called
`critical <https://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_.
The edge-level coverage simply splits all critical edges by introducing new
dummy blocks and then instruments those blocks:

.. code-block:: none

    A
    |\
    | \
    D  B
    | /
    |/
    C

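To make the notion of a critical edge concrete, the sketch below finds such
edges in a small graph using the usual definition: the source block has more
than one successor and the destination block has more than one predecessor.
This only illustrates the concept and is not how the compiler pass is
implemented; ``Edge`` and ``FindCriticalEdges`` are hypothetical names.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// An edge from block `first` to block `second`; blocks are numbered from 0.
using Edge = std::pair<int, int>;

// Returns the edges whose source has several successors and whose
// destination has several predecessors.
std::vector<Edge> FindCriticalEdges(int num_blocks,
                                    const std::vector<Edge> &edges) {
  std::vector<int> num_succ(num_blocks, 0), num_pred(num_blocks, 0);
  for (const Edge &e : edges) {
    ++num_succ[e.first];
    ++num_pred[e.second];
  }
  std::vector<Edge> critical;
  for (const Edge &e : edges)
    if (num_succ[e.first] > 1 && num_pred[e.second] > 1)
      critical.push_back(e);
  return critical;
}
```

With A, B, C numbered 0, 1, 2 and the edges A=>B, A=>C, B=>C from the diagram,
only A=>C satisfies both conditions, which is the edge that block D splits.
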
Tracing data flow
=================

Support for data-flow-guided fuzzing.
With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra instrumentation
around comparison instructions and switch statements.
Similarly, with ``-fsanitize-coverage=trace-div`` the compiler will instrument
integer division instructions (to capture the right argument of division),
and with ``-fsanitize-coverage=trace-gep`` it will instrument the
`LLVM GEP instructions <https://llvm.org/docs/GetElementPtr.html>`_
(to capture array indices).

Unless the ``no-prune`` option is provided, some of the comparison instructions
will not be instrumented.

.. code-block:: c++

  // Called before a comparison instruction.
  // Arg1 and Arg2 are arguments of the comparison.
  void __sanitizer_cov_trace_cmp1(uint8_t Arg1, uint8_t Arg2);
  void __sanitizer_cov_trace_cmp2(uint16_t Arg1, uint16_t Arg2);
  void __sanitizer_cov_trace_cmp4(uint32_t Arg1, uint32_t Arg2);
  void __sanitizer_cov_trace_cmp8(uint64_t Arg1, uint64_t Arg2);

  // Called before a comparison instruction if exactly one of the arguments is constant.
  // Arg1 and Arg2 are arguments of the comparison, Arg1 is a compile-time constant.
  // These callbacks are emitted by -fsanitize-coverage=trace-cmp since 2017-08-11
  void __sanitizer_cov_trace_const_cmp1(uint8_t Arg1, uint8_t Arg2);
  void __sanitizer_cov_trace_const_cmp2(uint16_t Arg1, uint16_t Arg2);
  void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2);
  void __sanitizer_cov_trace_const_cmp8(uint64_t Arg1, uint64_t Arg2);

  // Called before a switch statement.
  // Val is the switch operand.
  // Cases[0] is the number of case constants.
  // Cases[1] is the size of Val in bits.
  // Cases[2:] are the case constants.
  void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

  // Called before a division statement.
  // Val is the second argument of division.
  void __sanitizer_cov_trace_div4(uint32_t Val);
  void __sanitizer_cov_trace_div8(uint64_t Val);

  // Called before a GetElementPtr (GEP) instruction
  // for every non-constant array index.
  void __sanitizer_cov_trace_gep(uintptr_t Idx);

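As an illustration of what a user might do inside these callbacks, the sketch
below tracks the smallest Hamming distance seen between the operands of 32-bit
comparisons and decodes the ``Cases`` array of a switch. The bookkeeping
variables are hypothetical, the choice of Hamming distance is just one
possible signal for a fuzzer, and ``__builtin_popcount`` is a GCC/Clang
builtin.

```cpp
#include <cstdint>

// Hypothetical bookkeeping, not part of any sanitizer interface.
static uint32_t g_min_cmp4_distance = 33;  // Larger than any 32-bit distance.
static uint64_t g_switch_case_hits;        // Val matched a case constant.
static uint64_t g_switch_case_misses;      // Val matched no case constant.

extern "C" void __sanitizer_cov_trace_cmp4(uint32_t Arg1, uint32_t Arg2) {
  // Number of differing bits; 0 means the operands are equal.
  uint32_t distance = static_cast<uint32_t>(__builtin_popcount(Arg1 ^ Arg2));
  if (distance < g_min_cmp4_distance) g_min_cmp4_distance = distance;
}

extern "C" void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases) {
  uint64_t num_cases = Cases[0];
  // Cases[1] holds the size of Val in bits; it is not needed here.
  for (uint64_t i = 0; i < num_cases; ++i) {
    if (Cases[2 + i] == Val) {
      ++g_switch_case_hits;
      return;
    }
  }
  ++g_switch_case_misses;
}
```

The remaining callbacks (other widths, ``const_cmp``, ``div``, ``gep``) would
need definitions as well in a build that enables the corresponding flags.
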
Default implementation
======================

The sanitizer run-times (AddressSanitizer, MemorySanitizer, etc.) provide
default implementations of some of the coverage callbacks.
You may use these implementations to dump the coverage to disk at process
exit.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=trace-pc-guard
    % ASAN_OPTIONS=coverage=1 ./a.out; wc -c *.sancov
    main
    SanitizerCoverage: ./a.out.7312.sancov 2 PCs written
    24 a.out.7312.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; wc -c *.sancov
    foo
    main
    SanitizerCoverage: ./a.out.7316.sancov 3 PCs written
    24 a.out.7312.sancov
    32 a.out.7316.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file will also be created for every DSO.

Sancov data format
------------------

The format of ``*.sancov`` files is very simple: the first 8 bytes are the magic
number, one of ``0xC0BFFFFFFFFFFF64`` and ``0xC0BFFFFFFFFFFF32``. The last byte
of the magic (``0x64`` or ``0x32``) defines the size of the following offsets
(64-bit or 32-bit). The rest of the data is the offsets in the corresponding
binary/DSO that were executed during the run.

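A reader for this layout can be sketched as follows. The function name
``ParseSancov`` is hypothetical, and the sketch assumes the file was written
on a machine with the same byte order as the one reading it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
constexpr uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;

// Hypothetical helper: decodes the raw bytes of a .sancov file into offsets.
// Returns false if the magic is unknown or the payload size is inconsistent.
// Assumption: the data uses the host byte order.
bool ParseSancov(const std::vector<unsigned char> &data,
                 std::vector<uint64_t> *offsets) {
  offsets->clear();
  if (data.size() < sizeof(uint64_t)) return false;
  uint64_t magic;
  std::memcpy(&magic, data.data(), sizeof(magic));
  size_t width;  // Size of each offset in bytes.
  if (magic == kMagic64) width = 8;
  else if (magic == kMagic32) width = 4;
  else return false;
  if ((data.size() - sizeof(magic)) % width != 0) return false;
  for (size_t pos = sizeof(magic); pos < data.size(); pos += width) {
    if (width == 8) {
      uint64_t value;
      std::memcpy(&value, data.data() + pos, sizeof(value));
      offsets->push_back(value);
    } else {
      uint32_t value;
      std::memcpy(&value, data.data() + pos, sizeof(value));
      offsets->push_back(value);
    }
  }
  return true;
}
```

For the 64-bit magic this matches the file sizes in the example above: 8 bytes
of magic plus 8 bytes per PC gives 24 bytes for 2 PCs and 32 bytes for 3 PCs.
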
Sancov Tool
-----------

A simple ``sancov`` tool is provided to process coverage files.
The tool is part of the LLVM project and is currently supported only on Linux.
It can handle symbolization tasks autonomously without any extra support
from the environment. You need to pass ``.sancov`` files (named
``<module_name>.<pid>.sancov``) and paths to all corresponding ELF binary files.
Sancov matches these files using module names and binary file names.

.. code-block:: console

    USAGE: sancov [options] <action> (<binary file>|<.sancov file>)...

    Action (required)
      -print                    - Print coverage addresses
      -covered-functions        - Print all covered functions.
      -not-covered-functions    - Print all not covered functions.
      -symbolize                - Symbolizes the report.

    Options
      -blacklist=<string>         - Blacklist file (sanitizer blacklist format).
      -demangle                   - Print demangled function name.
      -strip_path_prefix=<string> - Strip this prefix from file paths in reports

Coverage Reports
----------------

**Experimental**

``.sancov`` files do not contain enough information to generate a source-level
coverage report. The missing information is contained
in the debug info of the binary. Thus the ``.sancov`` file has to be symbolized
to produce a ``.symcov`` file first:

.. code-block:: console

    sancov -symbolize my_program.123.sancov my_program > my_program.123.symcov

The ``.symcov`` file can be browsed overlaid on the source code by
running the ``tools/sancov/coverage-report-server.py`` script, which starts
an HTTP server.

Output directory
----------------

By default, ``.sancov`` files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
    % ls -l /tmp/cov/*sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov
