# Automatic function documentation and autodoc

We use a specially formatted comment header on functions where we want
to have Markdown documentation automatically extracted from our .c
file. For example:

```
/* Function:  esl_json_Parse()
 * Synopsis:  Parse a complete JSON data object
 * Incept:    SRE, Sun 29 Jul 2018 [IB 6165 Madrid-Boston]
 *
 * Purpose:   Given an open input buffer <bf>, read the next
 *            complete JSON data object from it. Return the
 *            parse tree thru <*ret_pi>.
 *
 *            Upon successful return, the buffer <bf>'s point is
 *            sitting precisely on the next byte following the closing
 *            brace of the JSON object.
 *
 * Args:      bf     - open buffer for reading
 *            ret_pi - RETURN: JSON parse tree
 *
 * Returns:   <eslOK> on success, and <*ret_pi> points
 *            to the parse tree.
 *
 *            <eslEFORMAT> if the JSON data string is
 *            invalid. <bf->errbuf> is set to a user-friendly
 *            error message indicating why. <*ret_pi> is <NULL>.
 *
 * Throws:    <eslEMEM> on allocation failure.
 *
 *            On these exceptions, <*ret_pi> is returned <NULL>.
 */
int
esl_json_Parse(ESL_BUFFER *bf, ESL_JSON **ret_pi)
{
  ...
}

```

The `autodoc` script parses the .c file and extracts and formats
the documentation for each documented function in it.

```
   % ./devkit/autodoc.py esl_foo.c > esl_foo_funcs.md
```

The entire unit starting with `/* Function:` and ending with an
unindented closing brace followed by a blank line is called a **doc
block**. A doc block consists of a **comment header** (from `/*
Function` to the closing comment ` */`) and the **implementation**
(code for one or more C functions). The comment header consists of
**elements**, such as `Function:`, `Synopsis:`, and `Purpose:`, that
`autodoc` extracts and reformats.
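
To make the idea of **elements** concrete, here is a minimal Python sketch of how a comment header could be split into named elements. It is an illustration only, not code taken from `autodoc.py`: the helper name `split_elements()` and the exact stripping regex are assumptions, and the set of element names is simply the one used in this document.

```python
import re

# Element names as used in this document. This sketch only recognizes these.
ELEMENT_RE = re.compile(
    r"^(Function|Synopsis|Incept|Purpose|Args|Returns|Throws|Xref|Notes):\s*(.*)$"
)


def split_elements(header):
    """Split a comment header into {element name: text}.

    Hypothetical helper for illustration; the real autodoc may differ.
    """
    elements = {}
    current = None
    for line in header.splitlines():
        if line.strip() == "*/":          # closing line of the comment header
            break
        # Drop the comment leader ("/*" on the first line, " *" afterwards).
        text = re.sub(r"^\s*/?\*\s?", "", line).strip()
        m = ELEMENT_RE.match(text)
        if m:                             # a new element starts on this line
            current = m.group(1)
            elements[current] = [m.group(2)]
        elif current is not None:         # continuation of the current element
            elements[current].append(text)
    return {name: "\n".join(lines).strip() for name, lines in elements.items()}


EXAMPLE_HEADER = """\
/* Function:  esl_json_Parse()
 * Synopsis:  Parse a complete JSON data object
 *
 * Purpose:   Given an open input buffer <bf>, read the next
 *            complete JSON data object from it.
 */
"""

if __name__ == "__main__":
    for name, text in split_elements(EXAMPLE_HEADER).items():
        print(f"{name}: {text!r}")
```

Restricting the match to known element names keeps a continuation line that happens to contain a colon (for example an `Args:` row written as `bf: open buffer`) from being mistaken for a new element.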

Usually a doc block contains a single documented function, but in some
cases we use one formatted comment header to document more than one
function at once, which is why we talk about a "block" as a more
general case.

## tl;dr summary

Everything in the comment header is treated as Markdown format, after
stripping out the leading part of each line (comment `*`, whitespace,
element names), with the exception that the function name(s) on the
`Function:` line are treated as verbatim code.

The Markdown format is GFM (GitHub-flavored Markdown) with MathJax
enabled (LaTeX mathematics work, with $$ for inline equations), with
one major exception/addition: embedded code style is indicated by
angle brackets `<code>` instead of backquotes. (Sorry if this annoys
you; I just don't like the look of a bunch of backquotes in these
headers.) Backquotes work too, but anything that matches the regex
`<(\S|\S.*?\S)>` has the angle brackets replaced by backquotes.
(Note the lack of whitespace just inside the brackets, so
greater/less than signs don't get subbed.) The `autodoc` script has a
`process()` function that does the angle bracket substitutions.

The `process()` function also does the removal of the leading `*` and
whitespace on each line of the comment block. Because leading
`^\s*\*\s+` is removed, Markdown features that depend on having zero
leading whitespace work fine (such as tables) even though they're
indented into our comment block.

Short summary of the relevant elements of the comment header:

* **Function:** names the documented function(s). Extracted verbatim
  and treated as code (no Markdown).

* **Synopsis:** one-line short summary.

* **Purpose:** The main documentation extracted for the function(s).

* **Args:** Converted to a Markdown table with two columns, `arg` and
`description`.
Either a `:` or `-` is recognized as a separator; each
line (after processing the leading comment piece out) is recognized by
the regex `^(\S+)\s*[:-]\s*(.+?)\s*$` to split it into `arg` and
`description`.

* **Returns:** Brief description of what the function returns when it
  succeeds or fails normally.

* **Throws:** Brief description of what exceptions the function can
  throw, and what state this leaves the returned stuff in.


Comment headers can contain other elements that `autodoc` ignores,
such as:

* **Incept:** Who started writing the function and when -- and maybe
  where they were and what they were listening to at the time, just
  for fun.

* **Xref:** Cross-references in our code, or into someone's paper or
  electronic notes.

* **Notes:** Additional notes, such as plans for future improvements
  or issues that ought to be addressed (but don't rise to the level
  that someone calling the function needs to know about).


## syntactic details for a doc block

`autodoc` uses regular expressions to parse the .c file, not a
proper (context-free) C parser, so certain syntactic conventions need
to be obeyed to allow it to work.

The doc block is recognized by three pieces on four lines:

1. An opening line starting with `/* Function:`. No leading space.
   The regex fragment that matches this is `^/\*\s+Function:`.

2. A line ` */` that closes the comment block, with one leading space.
   The regex fragment for this is `^ \*/`.

3. An unindented closing brace followed by a blank line.
   The regex fragment for this is `^\}\s*$^\s*$`.

Everything from 1 to 2 is treated as a structured comment header.
Everything after 2 up to the closing brace in 3 is treated as the
implementation.
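
The three regex fragments above can be combined into one pattern. The sketch below is one way to do that in Python, assuming multiline and dot-matches-newline mode. The pattern name `DOCBLOCK_RE`, the helper `find_doc_blocks()`, and the sample C source are hypothetical, and the way `autodoc` itself assembles the fragments may differ.

```python
import re

# The three fragments from the list above, joined by lazy "anything" spans.
DOCBLOCK_RE = re.compile(
    r"(?P<header>^/\*\s+Function:.*?"   # 1. opening line, no leading space
    r"^ \*/)"                           # 2. closing line, one leading space
    r"(?P<impl>.*?^\}\s*$)^\s*$",       # 3. unindented brace + blank line
    re.MULTILINE | re.DOTALL,
)


def find_doc_blocks(source):
    """Return a list of (comment header, implementation) pairs."""
    return [
        (m.group("header"), m.group("impl").strip())
        for m in DOCBLOCK_RE.finditer(source)
    ]


# Two functions concatenated without a blank line share one doc block.
SAMPLE = """\
/* Function:  esl_foo_Func1(), esl_foo_Func2()
 * Synopsis:  Two related functions under one header
 */
int
esl_foo_Func1(void)
{
  if (1) {
    return 0;
  }
  return 1;
}
int
esl_foo_Func2(void)
{
  return 2;
}

static int undocumented(void) { return 3; }
"""

if __name__ == "__main__":
    for header, impl in find_doc_blocks(SAMPLE):
        print(header)
        print("----")
        print(impl)
```

The indented `}` that closes the `if` never matches `^\}`, and the unindented `}` that ends `esl_foo_Func1()` is followed directly by code rather than a blank line, so the block only ends after `esl_foo_Func2()`.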

The convention of a closing unindented brace + blank line is critical
for allowing `autodoc` to recognize the end of the block with a
regular expression. Only the outermost braces of a function are
unindented (in properly indented code), and if we want more than one
function under one doc comment we concatenate them without blank
lines. Relaxing this format (for example, to allow one-liner
implementations like `int myfunc(void) { foo(); }`) would require a
substantial change in the `autodoc` parsing strategy (such as using an
actual C syntax parser).


## elements of the structured comment header

### Function:

Names the documented function(s). **Mandatory**. Plaintext (formatted
as code).

The `autodoc` script looks for a function with this name in the C
implementation, and extracts its call syntax.

Examples:

```
/* Function: esl_json_Parse()

/* Function: esl_foo_Func1(), esl_foo_Func2()

/* Function: esl_foo_{DFILCWB}Func()
```

When the comment header documents a set of related functions instead
of just one, there are two ways to list the set. One is a
comma-separated list. The other (see `esl_vectorops` for an example)
gets used when we have related functions acting on different common
types. Easel naming conventions attach a one-letter signifier of the
type: D,F,I,L,C,W,B mean `double`, `float`, `int`, `int64_t`, `char`,
`int16_t` (word), and `int8_t` (byte), respectively. If the function
name contains a list `\{[DFILCWB]+\}`, the full set of function names
will be constructed from this list of characters before `autodoc`
searches for their syntax.

### Synopsis:

This needs to fit on one line. Optional. Markdown.

### Incept:

`autodoc` doesn't use this. Optional. Free text.

Sometimes useful, or at least historically interesting, to know who
first wrote the function and when.
Less usefully (but I find it mildly
amusing), I'll often add a note about where I am on the planet, and
what I'm listening to.

### Purpose:

This is the main body of the documentation for the function. Optional
(sometimes the one-line synopsis suffices). Markdown.

### Args:

Table of arguments, with `:` or `-` as a separator. Optional.
Formatted as a Markdown table.

### Returns:

Brief summary of the state of everything upon return, either
successful or on normal error. Optional. Markdown.

### Throws:

Brief summary of exceptions that can be thrown, and of the state of
everything if that happens. Optional. Markdown.

### Xref:

Cross-reference into our code, or someone's paper or electronic
notes. Optional. Free text. `autodoc` doesn't use this.

Something like `[SRE:H6/171]` is a cross-reference into my paper notes:
notebook Harvard 6, pg. 171. Something like `SRE:2019/1117-foo` is a
cross-reference into my electronic notes. Scans or copies available
upon (reasonable) request.

### Notes:

Internal notes to myself or other future developers.
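
The element-specific handling described in this section, together with the angle-bracket convention from the summary, can be sketched in a few lines of Python. The regexes are the ones quoted in this document; the function names (`expand_names()`, `angle_to_backquote()`, `args_to_table()`) and the exact table layout are assumptions for illustration, not a copy of what `autodoc.py` does.

```python
import re

CODE_RE = re.compile(r"<(\S|\S.*?\S)>")              # <code> spans
ARG_RE = re.compile(r"^(\S+)\s*[:-]\s*(.+?)\s*$")    # one Args: row
TYPES_RE = re.compile(r"\{([DFILCWB]+)\}")           # {DFILCWB} name list


def expand_names(function_line):
    """Expand the text of a Function: element into individual names."""
    names = []
    for name in re.split(r"\s*,\s*", function_line.strip()):
        name = re.sub(r"\(\)$", "", name)            # drop trailing "()"
        m = TYPES_RE.search(name)
        if m:                                        # one name per type letter
            names.extend(name[:m.start()] + c + name[m.end():]
                         for c in m.group(1))
        else:
            names.append(name)
    return names


def angle_to_backquote(text):
    """Replace <code> with `code`; '<' or '>' next to whitespace is kept."""
    return CODE_RE.sub(r"`\1`", text)


def args_to_table(lines):
    """Turn already-stripped Args: lines into a two-column Markdown table."""
    rows = ["| arg | description |", "|-----|-------------|"]
    for line in lines:
        m = ARG_RE.match(line)
        if m:
            rows.append(f"| {m.group(1)} | {angle_to_backquote(m.group(2))} |")
    return "\n".join(rows)


if __name__ == "__main__":
    print(expand_names("esl_foo_{DFILCWB}Func()"))
    print(angle_to_backquote("Return the parse tree thru <*ret_pi>."))
    print(args_to_table(["bf     - open buffer for reading",
                         "ret_pi - RETURN: JSON parse tree"]))
```

Because the separator class is `[:-]` and the argument name is matched with `\S+`, a row such as `ret_pi - RETURN: JSON parse tree` splits at the first separator, so the later colon stays inside the description.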


## emacs macro

I use an emacs macro, bound to `M-"`, to insert a structured comment
header:

```
(defun sre-get-name-and-time()
  "Insert my initials and then the date into the buffer"
  (interactive)
  (progn
    (insert "SRE, ")
    (insert (shell-command-to-string "echo -n $(date +'%a %d %b %Y')"))))

(defun sre-insert-my-function-header()
  "Insert my standard function documentation header in C mode"
  (interactive)
  (insert "/* Function: \n")
  (insert " * Synopsis: \n")
  (insert " * Incept: ")
  (sre-get-name-and-time)
  (insert "\n")
  (insert " *\n")
  (insert " * Purpose: \n")
  (insert " *\n")
  (insert " * Args: \n")
  (insert " *\n")
  (insert " * Returns: \n")
  (insert " *\n")
  (insert " * Throws: (no abnormal error conditions)\n")
  (insert " *\n")
  (insert " * Xref: \n")
  (insert " */\n"))

(global-set-key "\e\"" 'sre-insert-my-function-header)

```


## future alternatives

Periodically I look into whether we should adopt a more sophisticated
[documentation generator](https://en.wikipedia.org/wiki/Comparison_of_documentation_generators)
such as [Sphinx](http://www.sphinx-doc.org/en/master/) or
[Doxygen](http://www.doxygen.nl/). At least for the moment, I think
we're better off with a simpler system that we have control over.