# Automatic function documentation and autodoc

We use a specially formatted comment header on functions where we want
to have Markdown documentation automatically extracted from our .c
file. For example:

```
/* Function:  esl_json_Parse()
 * Synopsis:  Parse a complete JSON data object
 * Incept:    SRE, Sun 29 Jul 2018 [IB 6165 Madrid-Boston]
 *
 * Purpose:   Given an open input buffer <bf>, read the next
 *            complete JSON data object from it. Return the
 *            parse tree thru <*ret_pi>.
 *
 *            Upon successful return, the buffer <bf>'s point is
 *            sitting precisely on the next byte following the closing
 *            brace of the JSON object.
 *
 * Args:      bf     - open buffer for reading
 *            ret_pi - RETURN: JSON parse tree
 *
 * Returns:   <eslOK> on success, and <*ret_pi> points
 *            to the parse tree.
 *
 *            <eslEFORMAT> if the JSON data string is
 *            invalid. <bf->errbuf> is set to a user-friendly
 *            error message indicating why. <*ret_pi> is <NULL>.
 *
 * Throws:    <eslEMEM> on allocation failure.
 *
 *            On these exceptions, <*ret_pi> is returned <NULL>.
 */
int
esl_json_Parse(ESL_BUFFER *bf, ESL_JSON **ret_pi)
{
  ...
}

```

The `autodoc` script parses the .c file and extracts and formats
the documentation for each documented function in it.

```
    % ./devkit/autodoc.py esl_foo.c > esl_foo_funcs.md
```

The entire unit starting with `/* Function:` and ending with an
unindented closing brace followed by a blank line is called a **doc
block**. A doc block consists of a **comment header** (from
`/* Function:` to the closing comment ` */`) and the **implementation**
(code for one or more C functions). The comment header consists of
**elements**, such as `Function:`, `Synopsis:`, and `Purpose:`, that
`autodoc` extracts and reformats.

Usually a doc block contains a single documented function, but in some
cases we use one formatted comment header to document more than one
function at once, which is why we talk about a "block" as a more
general case.

## tl;dr summary

Everything in the comment header is treated as Markdown format, after
stripping out the leading part of each line (comment `*`, whitespace,
element names), with the exception that the function name(s) on the
`Function:` line are treated as verbatim code.

The Markdown format is GFM (GitHub-flavored Markdown) with MathJax
enabled (LaTeX mathematics work, with `$$` for inline equations), with
one major exception/addition: embedded code style is indicated by
angle brackets `<code>` instead of backquotes. (Sorry if this annoys
you; I just don't like the look of a bunch of backquotes in these
headers.)  Backquotes work too, but anything that matches the regex
`<(\S|\S.*?\S)>` has the angle brackets replaced by backquotes.
(Note that the text inside the brackets can't start or end with
whitespace, so freestanding greater/less than signs don't get subbed.)
The `autodoc` script has a `process()` function that does the angle
bracket substitutions.

The `process()` function also does the removal of the leading `*` and
whitespace on each line of the comment block.  Because leading
`^\s*\*\s+` is removed, Markdown features that depend on having zero
leading whitespace work fine (such as tables) even though they're
indented into our comment block.
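As a rough illustration of these two steps, here is a minimal Python
sketch. It is not the actual `process()` from `autodoc`: the name
`process_sketch` and the handling of bare ` *` separator lines are
assumptions made for the example, and it does not strip element names.
Only the two regexes are taken from the description above.

```python
import re

# Leading comment `*` plus the whitespace after it. Also accepting a bare
# ` *` line (no trailing whitespace) is an assumption of this sketch.
LEADER = re.compile(r'^\s*\*(?:\s+|$)')

# <code> style: the text inside the brackets may not start or end with whitespace.
ANGLED = re.compile(r'<(\S|\S.*?\S)>')

def process_sketch(text):
    """Hypothetical stand-in for autodoc's process(); the real one may differ."""
    out = []
    for line in text.split('\n'):
        line = LEADER.sub('', line)        # drop the comment leader
        line = ANGLED.sub(r'`\1`', line)   # <foo> becomes `foo`
        out.append(line)
    return '\n'.join(out)

if __name__ == '__main__':
    print(process_sketch(" *            Return the parse tree thru <*ret_pi>."))
```

One consequence of the lazy match, at least in this sketch: the
substitution closes at the first `>` that follows a non-whitespace
character, so a name that contains `->`, such as `<bf->errbuf>`, would
be closed at the arrow rather than at the final bracket.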

Short summary of the relevant elements of the comment header:

* **Function:** names the documented function(s). Extracted verbatim
  and treated as code (no Markdown).

* **Synopsis:** one-line short summary.

* **Purpose:** The main documentation extracted for the function(s).

* **Args:** Converted to a Markdown table with two columns, `arg` and
  `description`. Either a `:` or `-` is recognized as a separator; each
  line (after processing the leading comment piece out) is recognized by
  the regex `^(\S+)\s*[:-]\s*(.+?)\s*$` to split it into `arg` and
  `description`.

* **Returns:** Brief description of what the function returns when it
  succeeds or fails normally.

* **Throws:** Brief description of what exceptions the function can
  throw, and what state this leaves the returned stuff in.
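The `Args:` conversion can be sketched in a few lines of Python. This
is an illustration under assumptions, not the code in `autodoc`: the
function name `args_to_table` and the exact table layout are made up
here, and only the regex is taken from the description above. Lines
that don't match the regex are simply skipped in this sketch.

```python
import re

# Regex from the Args: description: name, a `:` or `-` separator, description.
ARG_LINE = re.compile(r'^(\S+)\s*[:-]\s*(.+?)\s*$')

def args_to_table(lines):
    """Hypothetical helper: turn already-stripped Args lines into a Markdown table."""
    rows = ['| arg | description |',
            '|-----|-------------|']
    for line in lines:
        m = ARG_LINE.match(line)
        if m:
            rows.append('| {} | {} |'.format(m.group(1), m.group(2)))
    return '\n'.join(rows)

if __name__ == '__main__':
    print(args_to_table(["bf     - open buffer for reading",
                         "ret_pi - RETURN: JSON parse tree"]))
```

Because the separator is the first `:` or `-` after the argument name,
a later colon in the description (as in `RETURN: JSON parse tree`)
stays part of the description.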

Comment headers can contain other elements that `autodoc` ignores,
such as:

* **Incept:** Who started writing the function and when -- and maybe
  where they were and what they were listening to at the time, just
  for fun.

* **Xref:** Cross-references in our code, or into someone's paper or
  electronic notes.

* **Notes:** Additional notes, such as plans for future improvements
  or issues that ought to be addressed (but don't rise to the level
  that someone calling the function needs to know about).

## syntactic details for a doc block

`autodoc` uses regular expressions to parse the .c file, not a
proper (context-free) C parser, so certain syntactic conventions need
to be obeyed to allow it to work.

The doc block is recognized by three pieces on four lines (the third
piece spans two lines):

1. An opening line starting with `/* Function:`. No leading space.
   The regex fragment that matches this is `^/\*\s+Function:`.

2. A line ` */` that closes the comment block, with one leading space.
   The regex fragment for this is `^ \*/`.

3. An unindented closing brace followed by a blank line.
   The regex fragment for this is `^\}\s*$^\s*$`.

Everything from 1 to 2 is treated as a structured comment header.
Everything after 2 up to the closing brace in 3 is treated as the
implementation.
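To make the three pieces concrete, here is a minimal Python sketch
that glues the three regex fragments into one pattern. How `autodoc`
actually combines them is not shown in this document, so the assembled
pattern, the flags, and the name `find_doc_blocks` are all assumptions
of the example.

```python
import re

# The three fragments quoted above, joined by lazy "anything" groups.
# MULTILINE lets ^ and $ match at line boundaries; DOTALL lets . cross lines.
DOC_BLOCK = re.compile(
    r'^/\*\s+Function:(.*?)'   # 1. opening line, then the rest of the header
    r'^ \*/'                   # 2. line that closes the comment
    r'(.*?)'                   #    implementation
    r'^\}\s*$^\s*$',           # 3. unindented brace followed by a blank line
    re.MULTILINE | re.DOTALL)

def find_doc_blocks(source):
    """Hypothetical helper: return (header, implementation) pairs from C source."""
    return [(m.group(1), m.group(2) + '}') for m in DOC_BLOCK.finditer(source)]

if __name__ == '__main__':
    src = ("/* Function:  foo()\n"
           " * Synopsis:  Do foo.\n"
           " */\n"
           "int\n"
           "foo(void)\n"
           "{\n"
           "  if (1) {\n"
           "    bar();\n"
           "  }\n"
           "  return 0;\n"
           "}\n"
           "\n")
    for header, impl in find_doc_blocks(src):
        print(header)
        print(impl)
```

The indented `  }` that closes the inner `if` is not mistaken for the
end of the block, because the third fragment only accepts a brace in
column zero with a blank line after it.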

The convention of a closing unindented brace + blank line is critical
for allowing `autodoc` to recognize the end of the block with a
regular expression. Only the outermost braces of a function are
unindented (in properly indented code), and if we want more than one
function under one doc comment we concatenate them without blank
lines. Relaxing this format (for example, to allow one-liner
implementations like `int myfunc(void) { foo(); }`) would require a
substantial change in the `autodoc` parsing strategy (such as using an
actual C syntax parser).

## elements of the structured comment header

### Function:

Names the documented function(s). **Mandatory**. Plaintext (formatted
as code).

The `autodoc` script looks for a function with this name in the C
implementation, and extracts its call syntax.

Examples:

```
   /* Function:  esl_json_Parse()

   /* Function:  esl_foo_Func1(), esl_foo_Func2()

   /* Function:  esl_foo_{DFILCWB}Func()
```

When the comment header documents a set of related functions instead
of just one, there are two ways to list the set. One is a
comma-separated list. The other (see `esl_vectorops` for an example)
gets used when we have related functions acting on different common
types. Easel naming conventions attach a one-letter signifier of the
type: D,F,I,L,C,W,B mean `double`, `float`, `int`, `int64_t`, `char`,
`int16_t` (word), and `int8_t` (byte), respectively. If the function
name contains a list `\{[DFILCWB]+\}`, the full set of function names
will be constructed from this list of characters before `autodoc`
searches for their syntax.
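The name expansion can be sketched as follows. This is a minimal
Python illustration, assuming the `Function:` field has already been
isolated as a string; the helper name `expand_names` and the choice to
drop the trailing `()` are assumptions, not a description of what
`autodoc` does internally.

```python
import re

# A braced list of one-letter type signifiers, as described above.
TYPE_SET = re.compile(r'\{([DFILCWB]+)\}')

def expand_names(function_field):
    """Hypothetical helper: list the function names a Function: field refers to."""
    names = []
    for name in function_field.split(','):
        name = name.strip()
        if name.endswith('()'):
            name = name[:-2]               # drop the decorative parentheses
        m = TYPE_SET.search(name)
        if m:
            # One concrete name per type letter inside the braces.
            for letter in m.group(1):
                names.append(name[:m.start()] + letter + name[m.end():])
        elif name:
            names.append(name)
    return names

if __name__ == '__main__':
    print(expand_names("esl_foo_{DFILCWB}Func()"))
    print(expand_names("esl_foo_Func1(), esl_foo_Func2()"))
```

With the third example above, this yields seven names, from
`esl_foo_DFunc` through `esl_foo_BFunc`, each of which could then be
searched for in the implementation.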

### Synopsis:

This needs to fit on one line. Optional. Markdown.

### Incept:

`autodoc` doesn't use this. Optional. Free text.

Sometimes useful, or at least historically interesting, to know who
first wrote the function and when. Less usefully (but I find it mildly
amusing), I'll often add a note about where I am on the planet, and
what I'm listening to.

### Purpose:

This is the main body of the documentation for the function. Optional
(sometimes the one-line synopsis suffices). Markdown.

### Args:

Table of arguments; `:` or `-` as a separator. Optional. Formatted as a
Markdown table.

### Returns:

Brief summary of the state of everything upon return, either
successful or on normal error. Optional. Markdown.

### Throws:

Brief summary of exceptions that can be thrown, and of the state of
everything if that happens. Optional. Markdown.

### Xref:

Cross-reference into our code, or someone's paper or electronic
notes. Optional. Free text. `autodoc` doesn't use this.

Something like `[SRE:H6/171]` is a cross-reference into my paper notes:
notebook Harvard 6, pg. 171.  Something like `SRE:2019/1117-foo` is a
cross-reference into my electronic notes. Scans or copies available
upon (reasonable) request.

### Notes:

Internal notes to myself or other future developers. Optional. Free
text. `autodoc` doesn't use this.

## emacs macro

I use an emacs macro, bound to `M-"`, to insert a structured comment
header:

```
(defun sre-get-name-and-time()
  "Insert my initials and then the date into the buffer"
  (interactive)
  (progn
    (insert "SRE, ")
    (insert (shell-command-to-string "echo -n $(date +'%a %d %b %Y')"))))

(defun sre-insert-my-function-header()
  "Insert my standard function documentation header in C mode"
  (interactive)
  (insert "/* Function:  \n")
  (insert " * Synopsis:  \n")
  (insert " * Incept:    ")
  (sre-get-name-and-time)
  (insert "\n")
  (insert " *\n")
  (insert " * Purpose:   \n")
  (insert " *\n")
  (insert " * Args:      \n")
  (insert " *\n")
  (insert " * Returns:   \n")
  (insert " *\n")
  (insert " * Throws:    (no abnormal error conditions)\n")
  (insert " *\n")
  (insert " * Xref:      \n")
  (insert " */\n"))

(global-set-key "\e\"" 'sre-insert-my-function-header)
```

## future alternatives

Periodically I look into whether we should adopt a more sophisticated
[documentation generator](https://en.wikipedia.org/wiki/Comparison_of_documentation_generators)
such as [Sphinx](http://www.sphinx-doc.org/en/master/) or
[Doxygen](http://www.doxygen.nl/). At least for the moment, I think
we're better off with a simpler system that we have control over.