README.tokencap
=========================================
strcmp() / memcmp() token capture library
=========================================

  (See ../docs/README for the general instruction manual.)

This Linux-only companion library allows you to instrument strcmp(), memcmp(),
and related functions to automatically extract syntax tokens passed to any of
these libcalls. The resulting list of tokens may then be given as a starting
dictionary to afl-fuzz (the -x option) to improve coverage on subsequent
fuzzing runs.

This may help improve coverage in some targets, and do precisely nothing in
others. In some cases, it may even make things worse: if libtokencap picks up
syntax tokens that are not used to process the input data, but that are a part
of - say - parsing a config file... well, you're going to end up wasting a lot
of CPU time trying them out in the input stream. In other words, use this
feature with care. Manually screening the resulting dictionary is almost
always necessary.

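Part of that screening can be scripted once you know which entries are noise.
Below is a minimal sketch. It assumes a hand-maintained blocklist file that
uses the same one-entry-per-line format as the captured dictionary.
screen_dictionary and the blocklist are hypothetical names, not part of
libtokencap.

```shell
  # Hypothetical helper: print the dictionary ($1) minus every line that
  # appears verbatim in the blocklist ($2).
  #   -v invert the match, -x match whole lines only,
  #   -F treat patterns as fixed strings, -f read patterns from a file
  screen_dictionary() {
    grep -vxF -f "$2" -- "$1"
  }
```

For example: screen_dictionary afl_dictionary.txt known_noise.txt >screened.txt.
This only removes entries you have already identified. It is not a substitute
for reading the list.
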
As for the actual operation: the library stores tokens, without any deduping,
by appending them to a file specified via AFL_TOKEN_FILE. If the variable is not
set, the tool uses stderr (which is probably not what you want).

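The library only ever appends to this file, so tokens from earlier sessions
pile up in it. Repeated entries also give a rough idea of how often each token
was captured. Below is a small sketch of two helpers built on that behaviour.
reset_token_file and top_tokens are our own, hypothetical names.

```shell
  # Truncate the token file so that a new capture session starts clean.
  reset_token_file() {
    : >"${AFL_TOKEN_FILE:?AFL_TOKEN_FILE is not set}"
  }

  # Print the N (default 20) most frequently captured lines, with counts.
  top_tokens() {
    sort -- "${AFL_TOKEN_FILE:?AFL_TOKEN_FILE is not set}" |
      uniq -c | sort -rn | head -n "${1:-20}"
  }
```
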
Like afl-tmin, the library is not "proprietary" and can be used with other
fuzzers or testing tools without the need for any code tweaks. It does not
require AFL-instrumented binaries to work.

To use the library, you *need* to make sure that your fuzzing target is compiled
with -fno-builtin and is linked dynamically. If you wish to automate the first
part without mucking with CFLAGS in Makefiles, you can set AFL_NO_BUILTIN=1
when using afl-gcc. This setting specifically adds the following flags:

  -fno-builtin-strcmp -fno-builtin-strncmp -fno-builtin-strcasecmp
  -fno-builtin-strncasecmp -fno-builtin-memcmp -fno-builtin-strstr
  -fno-builtin-strcasestr

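If the target is not built with afl-gcc, passing the same flags to gcc or
clang by hand should have the same effect. Below is a sketch that assumes an
autoconf-style project that honours CFLAGS. NO_BUILTIN_FLAGS and
build_without_builtins are hypothetical names.

```shell
  # The flags that afl-gcc adds when AFL_NO_BUILTIN=1 is set.
  NO_BUILTIN_FLAGS="-fno-builtin-strcmp -fno-builtin-strncmp \
  -fno-builtin-strcasecmp -fno-builtin-strncasecmp -fno-builtin-memcmp \
  -fno-builtin-strstr -fno-builtin-strcasestr"

  # Hypothetical build: append the flags to CFLAGS. Do NOT add -static,
  # since LD_PRELOAD only affects dynamically linked binaries.
  build_without_builtins() {
    CFLAGS="${CFLAGS:-} $NO_BUILTIN_FLAGS" ./configure && make
  }
```
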
The next step is simply loading this library via LD_PRELOAD. The optimal usage
pattern is to allow afl-fuzz to fuzz normally for a while and build up a corpus,
and then fire off the target binary, with libtokencap.so loaded, on every file
found by AFL in that earlier run. This demonstrates the basic principle:

  export AFL_TOKEN_FILE=$PWD/temp_output.txt

  for i in <out_dir>/queue/id*; do
    LD_PRELOAD=/path/to/libtokencap.so \
      /path/to/target/program [...params, including $i...]
  done

  sort -u temp_output.txt >afl_dictionary.txt

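The same loop can be wrapped in a reusable function. This sketch rests on a
few assumptions. The target takes its input file as its last argument, a
temporary file replaces temp_output.txt, and capture_tokens is a hypothetical
name.

```shell
  # capture_tokens LIBTOKENCAP QUEUE_DIR DICT_OUT TARGET [ARGS...]
  # Runs "TARGET [ARGS...] FILE" for every queue entry with the library
  # preloaded, then writes the deduplicated tokens to DICT_OUT.
  # The ( ) body runs in a subshell, so the exported variable does not leak.
  capture_tokens() (
    lib=$1 queue=$2 dict=$3
    shift 3
    AFL_TOKEN_FILE=$(mktemp) || exit 1
    export AFL_TOKEN_FILE
    for f in "$queue"/id*; do
      [ -e "$f" ] || continue          # the glob matched nothing
      # Keep going even if the target rejects or crashes on an input.
      LD_PRELOAD=$lib "$@" "$f" </dev/null >/dev/null || :
    done
    sort -u "$AFL_TOKEN_FILE" >"$dict"
    rm -f "$AFL_TOKEN_FILE"
  )
```

For example: capture_tokens /path/to/libtokencap.so <out_dir>/queue
afl_dictionary.txt /path/to/target/program. The resulting file is what gets
passed to afl-fuzz with the -x option.
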
If you don't get any results, the target is probably not using strcmp() and
memcmp() to parse input, you haven't compiled it with -fno-builtin, or the
whole thing isn't dynamically linked and LD_PRELOAD is having no effect.

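One way to check for the build-related causes is to see whether the binary
imports the comparison functions dynamically at all. The same check applies to
the shared library that does the parsing, if there is one. Below is a sketch
that assumes GNU nm from binutils is installed. check_imports is a
hypothetical name.

```shell
  # Print the hookable comparison functions that $1 imports dynamically.
  # grep exits non-zero if none are found, which suggests they were inlined
  # as builtins or that $1 is statically linked.
  check_imports() {
    nm -D --undefined-only -- "$1" 2>/dev/null |
      grep -E ' U (strn?cmp|strn?casecmp|memcmp|strstr|strcasestr)(@.*)?$'
  }
```

For example: check_imports /path/to/target/program. No output means that
LD_PRELOAD has nothing to intercept in that particular file.
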
PS. The library is Linux-only because there is probably no particularly portable
and non-invasive way to distinguish between read-only and read-write memory
mappings. The __tokencap_load_mappings() function is the only thing that would
need to be changed for other OSes. Porting to platforms with /proc/<pid>/maps
(e.g., FreeBSD) should be trivial.