These are a few debugging notes that may be helpful for anyone developing
Gumbo or trying to diagnose a tricky problem.  They will probably not be
necessary for normal clients of this library: Gumbo is relatively stable, and
the bugs that remain tend to be rare and obscure.  However, they're handy to
have as a reference, and may also provide useful Google fodder for people
searching for these tools.

Standard disclaimer: I use all of these techniques on my Ubuntu 14.04 computer
with gcc 4.8.2, clang 3.4, and gtest 1.6.0, but make no guarantee that they
work on other systems.  In particular, they're almost certain not to work on
Windows.

Debug output
============

Gumbo has a compile-time switch that dumps lots of debug output to stdout.
Compile with the GUMBO_DEBUG define enabled (run `make clean` first if you have
already built Gumbo, since make does not rebuild object files just because
CFLAGS changed):

```bash
$ make CFLAGS='-DGUMBO_DEBUG'
```

Setting CFLAGS on the make command line also replaces the flags chosen by
configure (typically `-g -O2`), so add those back if you need them.

Note that this spits *a lot* of debug information to the console and makes the
program run significantly slower, so it's usually helpful to first isolate the
specific HTML file or fragment that causes the bug.  In exchange, it lets you
trace each step of the tokenizer's and parser's state machines in depth.
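
If you repeat this often, a small shell function can do the clean rebuild and
save the trace to a file instead of the console.  This is a sketch, not part of
Gumbo: the function name `gumbo_debug_trace`, the default log file name, and the
assumption that the program under test takes the HTML file path as its only
argument are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Gumbo: rebuild with GUMBO_DEBUG and run one
# program on one input, saving the (very long) trace to a log file.
# Assumes PROGRAM takes the HTML file path as its only argument.
# Usage: gumbo_debug_trace PROGRAM INPUT_FILE [LOG_FILE]
gumbo_debug_trace() {
  local prog=$1 input=$2 log=${3:-gumbo-debug.log}
  if [ -z "$prog" ] || [ ! -f "$input" ]; then
    echo "usage: gumbo_debug_trace PROGRAM INPUT_FILE [LOG_FILE]" >&2
    return 2
  fi
  # Clean first: make does not notice a CFLAGS change on its own and would
  # keep object files that were compiled without GUMBO_DEBUG.
  make clean >/dev/null && make CFLAGS='-DGUMBO_DEBUG' >/dev/null || return 1
  # Capture stdout and stderr in the log instead of flooding the terminal.
  # The program's exit status is ignored: a crash is often the point.
  "$prog" "$input" >"$log" 2>&1
  echo "trace written to $log ($(wc -l <"$log" | tr -d ' ') lines)"
}
```

For example, `gumbo_debug_trace ./your_program bug.html`, where `./your_program`
is a placeholder for whichever binary reproduces the bug; the log can then be
searched with `grep` or paged with `less`.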

Unit tests
==========

As mentioned in the README, Gumbo relies on [googletest][] for unit tests.
Unzip the gtest ZIP distribution inside the Gumbo root directory and rename the
unpacked directory to `gtest`.  `make check` then runs the tests as usual, and
the results end up in `test-suite.log`:

```bash
$ make check
$ cat test-suite.log
```

If you need to debug a core dump, you'll probably want to run the test binary
directly, with core dumps enabled:

```bash
$ ulimit -c unlimited
$ make check
$ .libs/lt-gumbo_test
$ gdb .libs/lt-gumbo_test core
```

The same approach works for core dumps from the example binaries.
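
If `ulimit -c unlimited` still leaves no `core` file behind, the kernel may be
handing core dumps to another program.  The sketch below is a hypothetical
helper, not part of Gumbo: it reads Linux's `/proc/sys/kernel/core_pattern` and
reports where cores will go.  A pattern starting with `|` means cores are piped
to a handler (on Ubuntu this is often apport).  The optional file argument only
exists so the function can be pointed at a test file.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Gumbo: report where Linux will put core
# dumps, based on the kernel's core_pattern setting.
# Usage: core_dump_destination [PATTERN_FILE]
core_dump_destination() {
  local file=${1:-/proc/sys/kernel/core_pattern} pattern
  pattern=$(cat "$file" 2>/dev/null) || { echo "unknown" >&2; return 1; }
  case $pattern in
    # A leading '|' means the kernel pipes the core to this program.
    '|'*) echo "piped to: ${pattern#"|"}" ;;
    # Otherwise it is a file name template, relative to the crashing
    # process's working directory unless it is an absolute path.
    *)    echo "written as: $pattern" ;;
  esac
}
```

If it reports a pipe, the core will not show up as `./core` in the current
directory.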

To run only a single unit test, pass `--gtest_filter='SuiteName.TestName'` to
the `lt-gumbo_test` binary; the filter also accepts `*` wildcards.
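
A few thin wrappers make filtering and debugging a single test quicker.  This
is a sketch, not part of Gumbo: the function names and the `GUMBO_TEST_BIN`
variable are hypothetical, while `--gtest_list_tests`, `--gtest_filter` and
`--gtest_break_on_failure` are standard googletest flags.

```bash
#!/usr/bin/env bash
# Hypothetical wrappers, not part of Gumbo, around the libtool-built test binary.
# Override GUMBO_TEST_BIN if the binary lives somewhere else.
GUMBO_TEST_BIN=${GUMBO_TEST_BIN:-.libs/lt-gumbo_test}

# Print the name of every test the binary knows about.
list_gumbo_tests() {
  "$GUMBO_TEST_BIN" --gtest_list_tests
}

# Run only tests matching a googletest filter: 'Suite.Test', wildcards such
# as 'Suite.*', ':'-separated alternatives, and negative patterns after a
# '-' (for example 'Suite.*-Suite.Slow*').
run_gumbo_tests() {
  "$GUMBO_TEST_BIN" --gtest_filter="$1"
}

# Same filter, but under gdb; --gtest_break_on_failure makes the first
# failing assertion trap into the debugger instead of just being reported.
debug_gumbo_tests() {
  gdb --args "$GUMBO_TEST_BIN" --gtest_filter="$1" --gtest_break_on_failure
}
```

For example, `run_gumbo_tests 'GumboParserTest.*'` would run a single suite and
`run_gumbo_tests '*-GumboParserTest.*'` everything except it.  The suite name
here is an assumption; check the real names with `list_gumbo_tests`.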

Assertions
==========

Gumbo relies pretty heavily on assertions.  They're enabled by default; to
compile them out, define NDEBUG:

```bash
$ make CFLAGS='-DNDEBUG'
```

ASAN
====

Google's [address-sanitizer][] is a helpful tool for finding memory errors.
Its overhead is low enough that you can often run it in production.  Enabling
it for C/C++ binaries is fairly standard and is described on the ASAN
documentation pages.  It requires Clang >= 3.1 or GCC >= 4.8.

```bash
$ make \
    CFLAGS='-g -fsanitize=address -fno-omit-frame-pointer -fno-inline' \
    LDFLAGS='-fsanitize=address'
```

ASAN can also be used when Gumbo is compiled as a shared library and loaded
into a scripting language via FFI, but the ASAN authors don't support this use
case.  To do it, use LD_PRELOAD to make sure the ASAN runtime is loaded into
the process (`problem_text` below stands for the input that triggers the bug):

```bash
$ LD_PRELOAD=libasan.so.0 python -c 'import gumbo; gumbo.parse(problem_text)'
```
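
The `libasan.so.0` name matches GCC 4.8; other GCC releases ship the runtime
under a different soname.  Below is a sketch of a hypothetical helper (not part
of Gumbo) that asks GCC for the path instead, using its `-print-file-name`
option, which prints the bare name back when it can't find the file.  It only
covers GCC's runtime; clang names its ASAN runtime differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Gumbo: locate GCC's shared ASAN runtime so
# LD_PRELOAD does not depend on a particular soname.
# Usage: find_libasan [COMPILER]   (defaults to gcc)
find_libasan() {
  local cc=${1:-gcc} path
  path=$("$cc" -print-file-name=libasan.so) || return 1
  case $path in
    # Found: gcc printed an absolute path; make sure it really exists.
    /*) [ -e "$path" ] && printf '%s\n' "$path" ;;
    # Not found: gcc echoed the bare name back unchanged.
    *)  return 1 ;;
  esac
}
```

Usage would then be
`LD_PRELOAD=$(find_libasan) python -c 'import gumbo; gumbo.parse(problem_text)'`.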

Getting clean stack traces from this requires the llvm-symbolizer binary, which
ships with clang:

```bash
$ export ASAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer-3.4
$ export ASAN_OPTIONS=symbolize=1
$ LD_PRELOAD=libasan.so.0 python -c \
  'import gumbo; gumbo.parse(problem_text)' 2>&1 | head -100
$ killall llvm-symbolizer-3.4   # repeat until no symbolizer processes remain
```

This use case is even less officially supported than using ASAN with dynamic
shared objects; on my machine, it led to a recursive ASAN error about a
use-after-free in llvm-symbolizer, effectively fork-bombing the machine.  Have
the killall commands ready, and avoid letting the process run for too long
(e.g. by piping its output to `less`).
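
One way to put a hard limit on the run is coreutils `timeout`, which by default
signals the command's whole process group, so symbolizer processes started by
the command should normally be killed along with it (keep `killall` handy
anyway).  The sketch below is a hypothetical helper, not part of Gumbo; its
default symbolizer path is the Ubuntu 14.04 one used above.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Gumbo: run a command with ASAN
# symbolization enabled, capping both its run time and its output.
# Usage: run_asan_bounded SECONDS COMMAND [ARGS...]
run_asan_bounded() {
  local secs=$1
  shift
  # Keep any values the caller already exported; otherwise use the
  # settings from this document.
  ASAN_SYMBOLIZER_PATH=${ASAN_SYMBOLIZER_PATH:-/usr/bin/llvm-symbolizer-3.4} \
  ASAN_OPTIONS=${ASAN_OPTIONS:-symbolize=1} \
    timeout "$secs" "$@" 2>&1 | head -n 100
}
```

For example,
`run_asan_bounded 60 env LD_PRELOAD=libasan.so.0 python -c 'import gumbo; gumbo.parse(problem_text)'`.
Passing LD_PRELOAD through `env` means only the Python process gets the ASAN
runtime, not `timeout` or `head`.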

[googletest]: https://code.google.com/p/googletest/
[address-sanitizer]: https://code.google.com/p/address-sanitizer/