These are a couple of debugging notes that may be helpful for anyone developing
Gumbo or trying to diagnose a tricky problem. They will probably not be
necessary for normal clients of this library - Gumbo is relatively stable, and
bugs are often rare and obscure. However, they're handy to have as a reference,
and may also provide useful Google fodder to people searching for these tools.

Standard disclaimer: I use all of these techniques on my Ubuntu 14.04 computer
with gcc 4.8.2, clang 3.4, and gtest 1.6.0, but make no warranty about them
working on other systems. In particular, they're almost certain not to work on
Windows.

Debug output
============

Gumbo has a compile-time switch to dump lots of debug output onto stdout.
Compile with the GUMBO_DEBUG define enabled:

```bash
$ make CFLAGS='-DGUMBO_DEBUG'
```

Note that this spits *a lot* of debug information to the console and makes the
program run significantly slower, so it's usually helpful to isolate only the
specific HTML file or fragment that causes the bug. It lets us trace the
operation of each of the tokenizer & parser's state machines in depth, though.

Unit tests
==========

As mentioned in the README, Gumbo relies on [googletest][] for unit tests.
Unzip the gtest ZIP distribution inside the Gumbo root and rename it 'gtest'.
'make check' runs the tests, as normal.

```bash
$ make check
$ cat test-suite.log
```

If you need to debug a core dump, you'll probably want to run the test binary
directly:

```bash
$ ulimit -c unlimited
$ make check
$ .libs/lt-gumbo_test
$ gdb .libs/lt-gumbo_test core
```

The same goes for core dumps in other example binaries.

To run only a single unit test, pass the --gtest_filter='TestName' flag to the
lt-gumbo_test binary.

Assertions
==========

Gumbo relies pretty heavily on assertions. By default they're enabled at
run-time: to turn them off, define NDEBUG:

```bash
$ make CFLAGS='-DNDEBUG'
```

ASAN
====

Google's [address-sanitizer][] is a helpful tool that lets you find memory
errors with relatively low overhead: enough that you can often run it in
production. Enabling it for C/C++ binaries is pretty standard and described on
the ASAN documentation pages. It requires Clang >=3.1 or GCC >= 4.8.

```bash
$ make \
    CFLAGS='-fsanitize=address -fno-omit-frame-pointer -fno-inline' \
    LDFLAGS='-fsanitize=address'
```

ASAN can also be used when Gumbo is compiled as a shared library and linked into
a scripting language via FFI, but this use-case is unsupported by the ASAN
authors. To do it, use LD_PRELOAD to ensure the ASAN runtime support is
included in the process:

```bash
$ LD_PRELOAD=libasan.so.0 python -c 'import gumbo; gumbo.parse(problem_text)'
```

Getting clean stack traces from this requires the use of the llvm-symbolizer
binary, included with clang:

```bash
$ export ASAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer-3.4
$ export ASAN_OPTIONS=symbolize=1
$ LD_PRELOAD=libasan.so.0 python -c \
    'import gumbo; gumbo.parse(problem_text)' 2>&1 | head -100
$ killall llvm-symbolizer-3.4
$ killall llvm-symbolizer-3.4
$ killall llvm-symbolizer-3.4
```

This use case is even less officially supported than using it with dynamic
shared objects; on my machine, it led to a recursive ASAN error about a
use-after-free in llvm-symbolizer, effectively fork-bombing the machine. Have
the killalls ready, and avoid letting the process run for too long (eg. piping
it to 'less').

[googletest]: https://code.google.com/p/googletest/
[address-sanitizer]: https://code.google.com/p/address-sanitizer/