.\"	$Id: mandoc.3,v 1.44 2018/12/30 00:49:55 schwarze Exp $
.\"
.\" Copyright (c) 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
.\" Copyright (c) 2010-2017 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: December 30 2018 $
.Dt MANDOC 3
.Os
.Sh NAME
.Nm mandoc ,
.Nm deroff ,
.Nm mparse_alloc ,
.Nm mparse_copy ,
.Nm mparse_free ,
.Nm mparse_open ,
.Nm mparse_readfd ,
.Nm mparse_reset ,
.Nm mparse_result
.Nd mandoc macro compiler library
.Sh SYNOPSIS
.In sys/types.h
.In stdio.h
.In mandoc.h
.Pp
.Fd "#define ASCII_NBRSP"
.Fd "#define ASCII_HYPH"
.Fd "#define ASCII_BREAK"
.Ft struct mparse *
.Fo mparse_alloc
.Fa "int options"
.Fa "enum mandoc_os os_e"
.Fa "char *os_s"
.Fc
.Ft void
.Fo mparse_free
.Fa "struct mparse *parse"
.Fc
.Ft void
.Fo mparse_copy
.Fa "const struct mparse *parse"
.Fc
.Ft int
.Fo mparse_open
.Fa "struct mparse *parse"
.Fa "const char *fname"
.Fc
.Ft void
.Fo mparse_readfd
.Fa "struct mparse *parse"
.Fa "int fd"
.Fa "const char *fname"
.Fc
.Ft void
.Fo mparse_reset
.Fa "struct mparse *parse"
.Fc
.Ft struct roff_meta *
.Fo mparse_result
.Fa "struct mparse *parse"
.Fc
.In roff.h
.Ft void
.Fo deroff
.Fa "char **dest"
.Fa "const struct roff_node *node"
.Fc
.In sys/types.h
.In mandoc.h
.In mdoc.h
.Vt extern const char * const * mdoc_argnames;
.Vt extern const char * const * mdoc_macronames;
.In sys/types.h
.In mandoc.h
.In man.h
.Vt extern const char * const * man_macronames;
.Sh DESCRIPTION
The
.Nm mandoc
library parses a
.Ux
manual into an abstract syntax tree (AST).
.Ux
manuals are composed of
.Xr mdoc 7
or
.Xr man 7 ,
and may be mixed with
.Xr roff 7 ,
.Xr tbl 7 ,
and
.Xr eqn 7
invocations.
.Pp
The following describes a general parse sequence:
.Bl -enum
.It
initiate a parsing sequence with
.Xr mchars_alloc 3
and
.Fn mparse_alloc ;
.It
open a file with
.Xr open 2
or
.Fn mparse_open ;
.It
parse it with
.Fn mparse_readfd ;
.It
close it with
.Xr close 2 ;
.It
retrieve the syntax tree with
.Fn mparse_result ;
.It
if information about the validity of the input is needed, fetch it with
.Fn mparse_updaterc ;
.It
iterate over parse nodes, starting from the
.Fa first
member of the returned
.Vt struct roff_meta ;
.It
free all allocated memory with
.Fn mparse_free
and
.Xr mchars_free 3 ,
or invoke
.Fn mparse_reset
and go back to step 2 to parse new files.
.El
.Sh REFERENCE
This section documents the functions, types, and variables available
via
.In mandoc.h ,
with the exception of those documented in
.Xr mandoc_escape 3
and
.Xr mchars_alloc 3 .
.Ss Types
.Bl -ohang
.It Vt "enum mandocerr"
An error or warning message during parsing.
.It Vt "enum mandoclevel"
A classification of an
.Vt "enum mandocerr"
as regards system operation.
See the DIAGNOSTICS section in
.Xr mandoc 1
regarding the meanings of the levels.
.It Vt "struct mparse"
An opaque pointer to a running parse sequence.
Created with
.Fn mparse_alloc
and freed with
.Fn mparse_free .
This may be used across parsed input if
.Fn mparse_reset
is called between parses.
.El
.Ss Functions
.Bl -ohang
.It Fn deroff
Obtain a text-only representation of a
.Vt struct roff_node ,
including text contained in its child nodes.
To be used on children of the
.Fa first
member of
.Vt struct roff_meta .
When it is no longer needed, the string that
.Fn deroff
stores in
.Fa *dest
can be passed to
.Xr free 3 .
.It Fn mparse_alloc
Allocate a parser.
The arguments have the following effect:
.Bl -tag -offset 5n -width inttype
.It Ar options
When the
.Dv MPARSE_MDOC
or
.Dv MPARSE_MAN
bit is set, only that parser is used.
Otherwise, the document type is automatically detected.
.Pp
When the
.Dv MPARSE_SO
bit is set,
.Xr roff 7
.Ic \&so
file inclusion requests are always honoured.
Otherwise, if the request is the only content in an input file,
only the file name is remembered, to be returned in the
.Fa sodest
field of
.Vt struct roff_meta .
.Pp
When the
.Dv MPARSE_QUICK
bit is set, parsing is aborted after the NAME section.
This is for example useful in
.Xr makewhatis 8
.Fl Q
to quickly build minimal databases.
.Pp
When the
.Dv MPARSE_VALIDATE
bit is set,
.Fn mparse_result
runs the validation functions before returning the syntax tree.
This is almost always required, except in certain debugging scenarios,
for example to dump unvalidated syntax trees.
.It Ar os_e
Operating system to check base system conventions for.
If
.Dv MANDOC_OS_OTHER ,
the system is automatically detected from
.Ic \&Os ,
.Fl Ios ,
or
.Xr uname 3 .
.It Ar os_s
A default string for the
.Xr mdoc 7
.Ic \&Os
macro, overriding the
.Dv OSNAME
preprocessor definition and the results of
.Xr uname 3 .
Passing
.Dv NULL
sets no default.
.El
.Pp
The same parser may be used for multiple files so long as
.Fn mparse_reset
is called between parses.
.Fn mparse_free
must be called to free the memory allocated by this function.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_free
Free all memory allocated by
.Fn mparse_alloc .
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_copy
Dump a copy of the input to the standard output; used for
.Fl man T Ns Cm man .
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_open
Open the file for reading.
If that fails and
.Fa fname
does not already end in
.Ql .gz ,
try again after appending
.Ql .gz .
Record whether the file is compressed.
Return a file descriptor open for reading or -1 on failure.
The descriptor can be passed to
.Fn mparse_readfd
or used directly.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_readfd
Parse a file descriptor opened with
.Xr open 2
or
.Fn mparse_open .
Pass the associated filename in
.Fa fname .
This function may be called multiple times with different parameters; however,
.Xr close 2
and
.Fn mparse_reset
should be invoked between parses.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_reset
Reset a parser so that
.Fn mparse_readfd
may be used again.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_result
Obtain the result of a parse.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.El
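.Pp
The
.Ql .gz
fallback described for
.Fn mparse_open
can be sketched as follows.
This is an illustration only and not the code in
.Pa read.c :
the helper names
.Fn has_gz_suffix
and
.Fn gz_retry_name
are hypothetical, and the sketch merely computes the file name to retry
with; it does not open or decompress anything.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if fname ends in ".gz", else 0. */
int
has_gz_suffix(const char *fname)
{
	size_t len = strlen(fname);

	return len >= 3 && strcmp(fname + len - 3, ".gz") == 0;
}

/*
 * Hypothetical helper: write the name to retry with into buf.
 * Returns 0 on success, or -1 if there is nothing to retry
 * (fname already ends in ".gz") or buf is too small.
 */
int
gz_retry_name(const char *fname, char *buf, size_t bufsz)
{
	int n;

	assert(fname != NULL && buf != NULL);
	if (has_gz_suffix(fname))
		return -1;
	n = snprintf(buf, bufsz, "%s.gz", fname);
	return (n < 0 || (size_t)n >= bufsz) ? -1 : 0;
}
```

A caller would first try
.Xr open 2
on the original name and, only if that fails, retry with the name
produced by the hypothetical
.Fn gz_retry_name .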
.Ss Variables
.Bl -ohang
.It Va man_macronames
The string representation of a
.Xr man 7
macro as indexed by
.Vt "enum mant" .
.It Va mdoc_argnames
The string representation of an
.Xr mdoc 7
macro argument as indexed by
.Vt "enum mdocargt" .
.It Va mdoc_macronames
The string representation of an
.Xr mdoc 7
macro as indexed by
.Vt "enum mdoct" .
.El
.Sh IMPLEMENTATION NOTES
This section consists of structural documentation for
.Xr mdoc 7
and
.Xr man 7
syntax trees and strings.
.Ss Man and Mdoc Strings
Strings may be extracted from mdoc and man meta-data, or from text
nodes (MDOC_TEXT and MAN_TEXT, respectively).
These strings have special non-printing formatting cues embedded in the
text itself, as well as
.Xr roff 7
escapes preserved from input.
Implementing systems will need to handle both situations to produce
human-readable text.
In general, strings may be assumed to consist of 7-bit ASCII characters.
.Pp
The following non-printing characters may be embedded in text strings:
.Bl -tag -width Ds
.It Dv ASCII_NBRSP
A non-breaking space character.
.It Dv ASCII_HYPH
A soft hyphen.
.It Dv ASCII_BREAK
A breakable zero-width space.
.El
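.Pp
A consumer that only needs plain ASCII output might map these cues as
in the following sketch.
The numeric values are assumptions made so that the example is
self-contained; real code should rely on the definitions in
.In mandoc.h .
The function name
.Fn cues_to_ascii
and the choice to render
.Dv ASCII_HYPH
as a plain hyphen are likewise illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed stand-in values; prefer the definitions from <mandoc.h>. */
#ifndef ASCII_NBRSP
#define ASCII_NBRSP	31	/* non-breaking space */
#define ASCII_HYPH	30	/* soft hyphen */
#define ASCII_BREAK	29	/* breakable zero-width space */
#endif

/*
 * Copy src into dst (capacity dstsz, including the terminating NUL),
 * replacing the non-printing cues by plain ASCII.
 * Output is truncated if dst is too small.
 * Returns the length of the resulting string.
 */
size_t
cues_to_ascii(const char *src, char *dst, size_t dstsz)
{
	size_t n = 0;

	assert(src != NULL && dst != NULL && dstsz > 0);
	for (; *src != '\0' && n + 1 < dstsz; src++) {
		switch (*src) {
		case ASCII_NBRSP:
			dst[n++] = ' ';
			break;
		case ASCII_HYPH:
			dst[n++] = '-';
			break;
		case ASCII_BREAK:
			break;	/* zero width: emit nothing */
		default:
			dst[n++] = *src;
			break;
		}
	}
	dst[n] = '\0';
	return n;
}
```

A formatter that performs line filling would instead treat
.Dv ASCII_HYPH
and
.Dv ASCII_BREAK
as permitted break points rather than dropping that information.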
.Pp
Escape characters are also passed verbatim into text strings.
An escape character is a sequence of characters beginning with the
backslash
.Pq Sq \e .
To construct human-readable text, these should be intercepted with
.Xr mandoc_escape 3
and converted with one of the functions described in
.Xr mchars_alloc 3 .
.Ss Man Abstract Syntax Tree
This AST is governed by the ontological rules dictated in
.Xr man 7
and derives its terminology accordingly.
.Pp
The AST is composed of
.Vt struct roff_node
nodes with block, head, body, element, root and text types as declared by the
.Va type
field.
Each node also provides its parse point (the
.Va line ,
.Va pos ,
and
.Va sec
fields), its position in the tree (the
.Va parent ,
.Va child ,
.Va next
and
.Va prev
fields) and some type-specific data.
.Pp
The tree itself is arranged according to the following normal form,
where capitalised non-terminals represent nodes.
.Pp
.Bl -tag -width "ELEMENTXX" -compact
.It ROOT
\(<- mnode+
.It mnode
\(<- ELEMENT | TEXT | BLOCK
.It BLOCK
\(<- HEAD BODY
.It HEAD
\(<- mnode*
.It BODY
\(<- mnode*
.It ELEMENT
\(<- ELEMENT | TEXT*
.It TEXT
\(<- [[:ascii:]]*
.El
.Pp
The only elements capable of nesting other elements are those with
next-line scope as documented in
.Xr man 7 .
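.Pp
Walking a tree of this shape needs only the
.Va child
and
.Va next
fields.
The following sketch uses a hypothetical, cut-down
.Vt struct node
in place of the real
.Vt struct roff_node
so that it stands alone; the type names, the enum constants and
.Fn count_text
are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the few node fields used here. */
enum node_type { NODE_ROOT, NODE_ELEM, NODE_TEXT };

struct node {
	enum node_type	 type;
	const char	*string;	/* text of a NODE_TEXT node, else NULL */
	struct node	*child;		/* first child, or NULL */
	struct node	*next;		/* next sibling, or NULL */
};

/*
 * Depth-first, pre-order walk over n and its following siblings.
 * Returns the number of text nodes encountered.
 */
size_t
count_text(const struct node *n)
{
	size_t total = 0;

	for (; n != NULL; n = n->next) {
		/* In the normal form, text nodes are leaves. */
		assert(n->type != NODE_TEXT || n->child == NULL);
		if (n->type == NODE_TEXT)
			total++;
		total += count_text(n->child);
	}
	return total;
}
```

The same loop shape, iterating over siblings and recursing into
children, serves for any per-node action such as printing or
collecting text.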
.Ss Mdoc Abstract Syntax Tree
This AST is governed by the ontological
rules dictated in
.Xr mdoc 7
and derives its terminology accordingly.
.Qq In-line
elements described in
.Xr mdoc 7
are described simply as
.Qq elements .
.Pp
The AST is composed of
.Vt struct roff_node
nodes with block, head, body, element, root and text types as declared
by the
.Va type
field.
Each node also provides its parse point (the
.Va line ,
.Va pos ,
and
.Va sec
fields), its position in the tree (the
.Va parent ,
.Va child ,
.Va last ,
.Va next
and
.Va prev
fields) and some type-specific data, in particular, for nodes generated
from macros, the generating macro in the
.Va tok
field.
.Pp
The tree itself is arranged according to the following normal form,
where capitalised non-terminals represent nodes.
.Pp
.Bl -tag -width "ELEMENTXX" -compact
.It ROOT
\(<- mnode+
.It mnode
\(<- BLOCK | ELEMENT | TEXT
.It BLOCK
\(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
.It ELEMENT
\(<- TEXT*
.It HEAD
\(<- mnode*
.It BODY
\(<- mnode* [ENDBODY mnode*]
.It TAIL
\(<- mnode*
.It TEXT
\(<- [[:ascii:]]*
.El
.Pp
Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
the BLOCK production: these refer to punctuation marks.
Furthermore, although a TEXT node will generally have a non-zero-length
string, in the specific case of
.Sq \&.Bd \-literal ,
an empty line will produce a zero-length string.
Multiple body parts are only found in invocations of
.Sq \&.Bl \-column ,
where a new body introduces a new phrase.
.Pp
The
.Xr mdoc 7
syntax tree accommodates broken block structures as well.
The ENDBODY node is available to end the formatting associated
with a given block before the physical end of that block.
It has a non-null
.Va end
field, is of the BODY
.Va type ,
has the same
.Va tok
as the BLOCK it is ending, and has a
.Va pending
field pointing to that BLOCK's BODY node.
It is an indirect child of that BODY node
and has no children of its own.
.Pp
An ENDBODY node is generated when a block ends while one of its child
blocks is still open, as in the following example:
.Bd -literal -offset indent
\&.Ao ao
\&.Bo bo ac
\&.Ac bc
\&.Bc end
.Ed
.Pp
This example results in the following block structure:
.Bd -literal -offset indent
BLOCK Ao
    HEAD Ao
    BODY Ao
        TEXT ao
        BLOCK Bo, pending -> Ao
            HEAD Bo
            BODY Bo
                TEXT bo
                TEXT ac
                ENDBODY Ao, pending -> Ao
                TEXT bc
TEXT end
.Ed
.Pp
Here, the formatting of the
.Ic \&Ao
block extends from TEXT ao to TEXT ac,
while the formatting of the
.Ic \&Bo
block extends from TEXT bo to TEXT bc.
It renders as follows in
.Fl T Ns Cm ascii
mode:
.Pp
.Dl <ao [bo ac> bc] end
.Pp
Support for badly-nested blocks is only provided for backward
compatibility with some older
.Xr mdoc 7
implementations.
Using badly-nested blocks is
.Em strongly discouraged ;
for example, the
.Fl T Ns Cm html
front-end to
.Xr mandoc 1
is unable to render them in any meaningful way.
Furthermore, behaviour when encountering badly-nested blocks is not
consistent across troff implementations, especially when using multiple
levels of badly-nested blocks.
.Sh SEE ALSO
.Xr mandoc 1 ,
.Xr man.cgi 3 ,
.Xr mandoc_escape 3 ,
.Xr mandoc_headers 3 ,
.Xr mandoc_malloc 3 ,
.Xr mansearch 3 ,
.Xr mchars_alloc 3 ,
.Xr tbl 3 ,
.Xr eqn 7 ,
.Xr man 7 ,
.Xr mandoc_char 7 ,
.Xr mdoc 7 ,
.Xr roff 7 ,
.Xr tbl 7
.Sh AUTHORS
.An -nosplit
The
.Nm
library was written by
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
and is maintained by
.An Ingo Schwarze Aq Mt schwarze@openbsd.org .
575