Debugging
=========

The link-grammar library has API calls to ease debugging and development.
The `link-parser` program has corresponding options for this API.

Only the `link-parser` options will be discussed here.
Options to `link-parser` at the command line are preceded with a `-` sign.
You can use a unique prefix of an option name instead of its full name. At
the **linkparser>** prompt or in batch files, options are preceded with
a `!` character.

For info on common options, see the "Special ! options" section of the
`link-grammar` manual. For a general help message use `link-parser -help`.


Debug options
-------------

### 1) -verbosity=N (-v=N)
Sets the verbosity level of the library to N (a small non-negative integer).

#### Verbosity levels
0: Certain informative messages are not printed by the
library. `link-parser` also doesn't print its usual **linkparser>**
prompt. This is the current default verbosity level for the Python
binding.

1: This is the library default. This is also the default for
`link-parser`.

2: Display the time taken by each parsing step. If the library issues an
error or warning, this may help in finding out at which step it happened.

3: Display some more Info messages:
- Freeing dictionaries.
- Number of insane-morphism linkages.
- A warning when all the linkages have a PP violation.

4: Display data file search and locale setup. It can be used to debug
problems with the locale setup or in finding the dictionary.

5-9: Show trace and debug messages regarding sentence handling. Higher
levels include the messages of the lower ones.

10-99: Also show trace and debug messages regarding reading the
dictionary.  As with levels greater than 4, higher levels include the
messages of the lower ones.

* 10: Basic dictionary debug.

100-...: Show only messages exactly at the specified level.
* 101: Print all the connectors, along with their length limit.
       A length limit of 0 means the value of the `short_length` option is
       used.

* 102: Print all disjuncts before and after pruning.

* 103: Show unsubscripted dictionary words and subscripted ones which share
       the same base word.

* 104: Memory pool statistics.
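
The level rules above can be modeled in a few lines of shell. This is a minimal sketch of the documented behavior, not the library's implementation: the function name `message_shown` is hypothetical, and treating 100 itself as an exact-match level is an assumption read off the `100-...` range.

```shell
# Hypothetical model of the verbosity rules described above.
# Usage: message_shown MSG_LEVEL VERBOSITY
# Succeeds (exit status 0) if a message tagged MSG_LEVEL would be shown.
message_shown() {
    msg_level=$1
    verbosity=$2
    if [ "$verbosity" -ge 100 ]; then
        # Levels 100 and up select exactly one message level.
        [ "$msg_level" -eq "$verbosity" ]
    else
        # Below 100 the levels are cumulative.
        [ "$msg_level" -le "$verbosity" ]
    fi
}
```

For example, `message_shown 5 7` succeeds because level 7 includes the level-5 messages, while `message_shown 5 102` fails because `-v=102` selects only level-102 messages.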

### 2) -debug=LOCATIONS (-de=LOCATIONS)
Show only messages from these LOCATIONS. The LOCATIONS string is a
comma-separated list of source file names (without specifying their
directory) and function names (fully qualified for C++) from which to
show the messages.

For example, to only show messages from the `flatten_wordgraph()` function
or the print.c file:

`link-parser -v=6 -debug=flatten_wordgraph,print.c`

Note that because print.c is used to produce certain messages, it currently
has to be added to the debug LOCATIONS list, unless you explicitly specify
the relevant function in print.c instead (to further restrict the messages).
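
One way to picture the LOCATIONS filter is as a membership test on the comma-separated list. The sketch below is a simplified model under that assumption, with a hypothetical function name `debug_selects`; it does not reproduce the library's actual matching (for instance, it ignores the print.c caveat above).

```shell
# Hypothetical model of the -debug=LOCATIONS filter.
# Usage: debug_selects LOCATIONS FILE FUNCTION
# Succeeds if FILE or FUNCTION appears in the comma-separated LOCATIONS list.
debug_selects() {
    locations=$1
    file=$2
    func=$3
    # Surround the list with commas so that only whole items can match.
    case ",$locations," in
        *",$file,"* | *",$func,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

In this model, `debug_selects "flatten_wordgraph,print.c" sane.c sane_linkage_morphism` fails, so messages from `sane_linkage_morphism()` would be filtered out by `-debug=flatten_wordgraph,print.c`.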

### 3) -test=FEATURES (-te=FEATURES)
Enable certain features. These can be debug aids, or new features that
are not yet official or fully-developed.

For example, to automatically show all linkages of a sentence, the
following can be done:

`link-parser -test=auto-next-linkage`

`link-parser` warns when tests are enabled. This way it is possible to see in
the linkage output which tests were enabled. This is particularly important
when examining output files. However, when doing benchmarks (with and
without tests) this is not desired because these warnings skew the timing.
If needed, suppress these warnings by adding the special test `@`, as in:
`-test=@,one-step-parse`.
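
The FEATURES string is a comma-separated list whose items may carry a value after a colon (as in `auto-next-linkage:20000`, which appears in the examples in this document). The sketch below models looking up one feature in such a string; the function name `test_feature_value` is hypothetical and the library's own parsing may differ. For example, `test_feature_value "@,auto-next-linkage:20000" auto-next-linkage` prints `20000`.

```shell
# Hypothetical model of a lookup in a -test=FEATURES string.
# Usage: test_feature_value FEATURES NAME
# If NAME is in the comma-separated FEATURES list, print the value that
# follows "NAME:" (an empty line if there is none) and succeed.
# Otherwise print nothing and fail.
test_feature_value() {
    list=",$1,"
    case $list in
        *",$2,"*)
            # Feature present without a value.
            echo ""
            return 0 ;;
        *",$2:"*)
            value=${list#*",$2:"}   # drop everything through "NAME:"
            echo "${value%%,*}"     # keep the text up to the next comma
            return 0 ;;
    esac
    return 1
}
```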

Useful examples
---------------

### -debug=...

1) See the tokens after flattening into the word array used by the parser:

```
echo "Let's test it" | \
link-parser -v=6 -debug=flatten_wordgraph,print_sentence_word_alternatives
```

2) Trace the work of `sane_linkage_morphism()`:

`link-parser -v=8 -debug=sane_linkage_morphism`

3) Same as (2) above, but also see other messages from sane.c:

`link-parser -v=8 -debug=sane.c`

(`sane_linkage_morphism()` happens to be in `sane.c` so this includes its
messages.)

4) Debug the tokenizer:

`link-parser -v=7 -debug=tokenize.c`

Or, in order to display the word array:

`link-parser -v=7 -debug=tokenize.c,print_sentence_word_alternatives`

5) Debug post-processing:

`link-parser -v=9 -debug=post-process.c`

6) Debug expression pruning:

`link-parser -v=9 -debug=expression_prune`

7) Debug reading the affix and knowledge files:

`link-parser -v=11`

### -test=...

1) Automatically show all linkages:

`link-parser -test=auto-next-linkage`

Try typing some sentences at the **linkparser>** prompt to see it in action.

2) Print more than 1024 linkages in `link-parser` (this is the maximum
`link-parser` would print by default), e.g. 20000:

`link-parser -test=auto-next-linkage:20000`

3) To print detailed linkages of **data/en/corpus-basic.batch**:

```
sed '/^*/d;/^!const/d;/^!batch/d' data/en/corpus-basic.batch | \
link-parser -test=auto-next-linkage
```

(If you copy and paste it to a terminal, remember to escape each of the "**!**"
characters with a backslash.)

This, along with "diff", "grep" etc., can be used in order to validate
that a change didn't cause undesired effects. Special care should be taken
if sentences with more than 1024 linkages are to be verified too (use a
larger `-limit=N` and `-test=auto-next-linkage:M`, where N>>M).

Note that this technique is not very effective if the order of the
linkages has changed (or if SAT-parser linkages need to be compared to the
classic-parser linkages). In that case the detailed linkage results need
to be filtered through a script which sorts them according to some
"canonical order" and also removes duplicates.
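
A minimal sketch of such a filter follows. It assumes that each linkage is printed as a block of lines separated from the next by a blank line, that plain byte-order sorting is an acceptable "canonical order", and that the control character used as a temporary line joiner never occurs in the output. The function name `canonicalize_blocks` is hypothetical, and any order-dependent text inside a block (such as a linkage counter) would have to be stripped before comparing. The idea is to pipe each of the two outputs through the function and then `diff` the results.

```shell
# Hypothetical filter: read blank-line-separated blocks on stdin and write
# them sorted in byte order, with exact duplicates removed.
canonicalize_blocks() {
    sep=$(printf '\034')   # joiner; assumed to be absent from the input
    # 1. Paragraph mode (RS = ""): join the lines of each block into one line.
    awk -v sep="$sep" '
        BEGIN { RS = ""; FS = "\n" }
        {
            block = $1
            for (i = 2; i <= NF; i++) block = block sep $i
            print block
        }' |
    # 2. Sort the one-line blocks and drop duplicates.
    LC_ALL=C sort -u |
    # 3. Split each block back into lines, followed by a blank line.
    awk -v sep="$sep" '
        {
            n = split($0, lines, sep)
            for (i = 1; i <= n; i++) print lines[i]
            print ""
        }'
}
```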

4) Display the wordgraph using `-wordgraph=N`, optionally using additional
wordgraph-display flags with `-test=wg:FLAGS`.

For more examples of how to use the wordgraph-display, see
[link-grammar/tokenize/README.md](/link-grammar/tokenize/README.md#word-graph-display)
and [msvc/README.md](/msvc/README.md).

5) Test the "trailing connector" hashing for short sentences too (e.g. for
all sentences with more than 10 tokens):

`link-parser -test=min-len-encoding:10`

Or optionally (in order to see relevant debug messages from `preparation.c`):

`link-parser -test=min-len-encoding:10 -v=5 -debug=preparation.c`

6) `-test=<values>` for SAT parser debugging:

- `linkage-disconnected` - Display also solutions which don't have a full linkage.
- `sat-stats` - Display the number of PP-violations and disconnected linkages.
- `no-pp_pruning_1` - Disable a partial CONTAINS_NONE_RULES pruning.

7) `-test=<values>` for the pruning subsystem:

- `len-multi-pruning:N` - Prune per null_count for more than N-token sentences.
- `always-parse` - Don't use a parse shortcut and always fully prune.
- `no-mlink` - Don't prune using an mlink table.

Debugging and STDIO streams
---------------------------
Messages at severity Info and higher (i.e. also Warning, Error and
Fatal) are printed to `stderr`. The other severities (Debug and below,
i.e. also Trace and None) are printed to `stdout`. The rationale is that
debugging messages, in order to be useful, need to appear along with the
regular output of the program, while errors are exceptional and need to
stand out when `link-parser`'s `stdout` is redirected to a file.

The C API includes the ability to set the severity level threshold above
which messages are printed to `stderr` (see
"Improved error notification facility"->"C API" in
[link-grammar/README.md](/link-grammar/README.md)).

Note that when debugging errors during a sentence batch run, it may be useful
to also redirect `stderr` to the same file (the error facility of the library
flushes `stdout` before printing in order to preserve output order).
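
In a POSIX shell, sending both streams to one file means writing `2>&1` after the `stdout` redirection. The helper below is a minimal sketch; the name `run_logged` and the log file name are hypothetical. It could be used, for example, as `run_logged parse.log link-parser -v=6 < data/en/corpus-basic.batch`.

```shell
# Hypothetical helper: run a command with stdout and stderr in one file.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    log=$1
    shift
    # Redirect stdout to the log first, then point stderr at the same place.
    # The reverse order would leave stderr on the terminal.
    "$@" >"$log" 2>&1
}
```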

Using debugger
--------------

### Configuring for debug

`configure --enable-debug`

It sets the DEBUG definition and removes the optimization flags of the
compiler. The DEBUG definition adds various validity checks, test
messages, and some debug functions (that can be invoked, for example, from
the debugger).


| `gdb` command | Description |
|---------------|-------------|
| `call wordgraph_show(sent, "")` | If something goes wrong, it may be useful to display the wordgraph. The second argument can include wordgraph display options.|
| `call print_all_disjuncts(sent)` | Print the disjuncts. |


FIXME: Document more debug functions.

Compilation definitions
-----------------------
Some debug-related compilation flags can be set using `configure`, or as `make`
arguments:

| Definition | Description |
| ---------- |-------------|
| `NO_SAN_DICT` | Don't use ASAN/UBSAN for dict reading. This causes a vast startup speedup when using ASAN/UBSAN. It is optional because it shouldn't normally be used when debugging the dictionary code. |
|`POOL_ALLOCATOR=0` | Pool allocator debug facility: A fake pool allocator that uses `malloc()` for each allocation is defined, in order that ASAN or valgrind can be used to find memory usage bugs. |
|`TRACON_SET_DEBUG` | Print tracon_set stats. |
|`DEBUG_PP_PRUNE` | PP pruning debug printout. |
|`DEBUG_TABLE_STAT`| Print count table stats. |
|`DO_COUNT_TRACE` | Detailed trace of do_count. |
|`DEBUG_X_TABLE` | Print x_table stats. |

### Specific SAT-parser debug

| Definition | Description |
| ---------- |-------------|
| `CONNECTIVITY_DEBUG` | Debug SAT connectivity. |
| `SAT_DEBUG`, `VARS` | Debug variables. |