roff.c - OpenGrok history log for /openbsd/usr.bin/mandoc/roff.c

Revision	Date	Author	Comments
# f6697133	24-Oct-2023	schwarze <schwarze@openbsd.org>	Implement the man(7) .MR macro, a 2023 GNU extension. The syntax and semantics is almost identical to mdoc(7) .Xr. This will be needed for reading the groff manual pages once our port will be updated to 1.23, and the Linux Manual Pages Project is also determined to start using it sooner or later. I did not advocate for this new macro, but since we want to remain able to read all manual pages found in the wild, there is little choice but to support it. At least it is easy to do, they basically copied .Xr.
# 6d9b308d	23-Oct-2023	schwarze <schwarze@openbsd.org>	Support some escape sequences, in particular character escape sequences, inside \w arguments, and skip most other escape sequences when measuring the output length in this way because most escape sequences contribute little or nothing to text width: for example, consider font escapes in terminal output. This implementation is very rudimentary. In particular, it assumes that every character has the same width. No attempt is made to detect double-width or zero-width Unicode characters or to take dependencies on output devices or fonts into account. These limitations are hard to avoid because mandoc has to interpolate \w at the parsing stage when the output device is not yet known. I really do not want the content of the syntax tree to depend on the output device. Feature requested by Paul <Eggert at cs dot ucla dot edu>, who also submitted a patch, but i chose to commit this very different patch with almost the same functionality. His input was still very valuable because complete support for \w is out of the question, and consequently, the main task is identifying subsets of the feature that are needed for real-world manual pages and can be supported without uprooting the whole forest.
# 29079a11	22-Oct-2023	schwarze <schwarze@openbsd.org>	While doing delayed expansion of escape sequences in macro arguments, correctly check for failure of the in-place expansion function. If an argument not only does recursive delayed expansion but infinitely recursive delayed expansion, this bug could result in an ESCAPE_EXPAND assertion failure. Thanks to Eric van Gyzen <vangyzen at FreeBSD> for finding this bug by inspecting FreeBSD source code.
# 7bda13b1	21-Oct-2023	schwarze <schwarze@openbsd.org>	When parsing a macro argument results in delayed escape sequence expansion, re-check for all contained escape sequences whether they need delayed expansion, not just for the particular escape sequences that triggered delayed expansion in the first place. This is needed because delayed expansion can result in strings containing nested escape sequences recursively needing delayed expansion, too. This fixes an assertion failure in krb5_openlog(3), see: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=266882 Thanks to Wolfram Schneider <wosch at FreeBSD> for reporting the bug and to Baptiste Daroussin <bapt at FreeBSD> for forwarding the report.
# d9a51c35	26-Dec-2022	jmc <jmc@openbsd.org>	spelling fixes; from paul tagliamonte amendments to his diff are noted on tech
# 1c23c756	16-Aug-2022	schwarze <schwarze@openbsd.org>	Even though the constant ASCII_ESC is only used in the roff pre-parser roff.c, move it to the top level include file mandoc.h to reduce the risk of causing clashes when introducing new ASCII_* constants in the future.
# 140f9883	07-Jun-2022	schwarze <schwarze@openbsd.org>	Purge duplicate error reporting from the .tr request parser: the error was already reported earlier when roff_expand() called roff_escape().
# a72149c5	03-Jun-2022	schwarze <schwarze@openbsd.org>	During identifier parsing, handle undefined escape sequences in the same way as groff: * \\ is always reduced to \ * \. is always reduced to . * other undefined escape sequences are usually reduced to the escape name, for example \G to G, except during the expansion of expanding escape sequences having the standard argument form (in particular \* and \n), in which case the backslash is preserved literally. Yes, this is confusing indeed. For example, the following have the same meaning: * .ds \. and .ds . which is not the same as .ds \\. * \[\.] and \[.] which is not the same as \[\\.] .ds \G and .ds G which is not the same as .ds \\G * \[\G] and \[\\G] which is not the same as \*[G] <- sic! To feel less dirty, have a leaning toothpick, if you are so inclined. This patch also slightly improves the string shown by the "escaped character not allowed in a name" error message.
# 9784ce3e	02-Jun-2022	schwarze <schwarze@openbsd.org>	Avoid the layering violation of re-parsing for \E in roff_expand(). To that end, add another argument to roff_escape() returning the index of the escape name. This also makes the code in roff_escape() a bit more uniform in so far as it no longer needs the "char esc_name" local variable but now does everything with indices into buf[]. No functional change.
# 75a6bad9	31-May-2022	schwarze <schwarze@openbsd.org>	Rudimentary implementation of the \A escape sequence, following groff semantics (test identifier for syntactical validity), not at all following the completely unrelated Heirloom semantics (define hyperlink target position). The main motivation for providing this implementation is to get \A into the parsing class ESCAPE_EXPAND that corresponds to groff parsing behaviour, which is quite similar to the \B escape sequence (test numerical expression for syntactical validity). This is likely to improve parsing of nested escape sequences in the future. Validation isn't perfect yet. In particular, this implementation rejects \A arguments containing some escape sequences that groff allows to slip through. But that is unlikely to cause trouble even in documents using \A for non-trivial purposes. Rejecting the nested escapes in question might even improve robustness because the rejected names are unlikely to really be usable for practical purposes - no matter that groff dubiously considers them syntactically valid.
# 6f49ebc4	31-May-2022	schwarze <schwarze@openbsd.org>	Trivial patch to put the roff(7) \g (interpolate format of register) escape sequence into the correct parsing class, ESCAPE_EXPAND. Expansion of \g is supposed to work exactly like the expansion of the related escape sequence \n (interpolate register value), but since we ignore the .af (assign output format) request, we just interpolate an empty string to replace the \g sequence. Surprising as it may seem, this actually makes a formatting difference for deviate input like ".O\gNx" which used to raise bogus "escaped character not allowed in a name" and "skipping unknown macro" errors and printed nothing, whereas now it correctly prints "OpenBSD".
# 83a9dfe1	30-May-2022	schwarze <schwarze@openbsd.org>	Dummy implementation of the roff(7) \V (interpolate environment variable) escape sequence. This is needed to get \V into the correct parsing class, ESCAPE_EXPAND. It is intentional that mandoc(1) output is not influenced by environment variables, so interpolate the name of the variable with some decorating punctuation rather than interpolating its value.
# cd14d642	19-May-2022	schwarze <schwarze@openbsd.org>	Make roff_expand() parse left-to-right rather than right-to-left. Some escape sequences have side effects on global state, implying that the order of evaluation matters. For example, this fixes the long-standing bug that "\n+x\n+x\n+x" after ".nr x 0 1" used to print "321"; now it correctly prints "123". Right-to-left parsing was convenient because it implicitly handled nested escape sequences. With correct left-to-right parsing, nesting now requires an explicit implementation, here solved as follows: 1. Handle nested expanding escape sequences iteratively. When finding one, expand it, then retry parsing the enclosing escape sequence from the beginning, which will ultimately succeed as soon as it no longer contains any nested expanding escape sequences. 2. Handle nested non-expanding escape sequences recursively. When finding one, the escape sequence parser calls itself to find the end of the inner sequence, then continues parsing the outer sequence after that point. This requires the mandoc_escape() function to operate in two different modes. The roff(7) parser uses it in a mode where it generates diagnostics and may return an expansion request instead of a parse result. All other callers, in particular the formatters, use it in a simpler mode that never generates diagnostics and always returns a definite parsing result, but that requires all expanding escape sequences to already have been expanded earlier. The bulk of the code is the same for both modes. Since this required a major rewrite of the function anyway, move it into its own new file roff_escape.c and out of the file mandoc.c, which was misnamed in the first place and lacks a clear focus.
As a side benefit, this also fixes a number of assertion failures that tb@ found with afl(1), for example "\n\\\\0", "\v\-\\0", and "\w\-\\\\\$0*0". As another side benefit, it also resolves some code duplication between mandoc_escape() and roff_expand() and centralizes all handling of escape sequences (except for expansion) in roff_escape.c, hopefully easing maintenance and feature improvements in the future. While here, also move end-of-input handling out of the complicated function roff_expand() and into the simpler function roff_parse_comment(), making the logic easier to understand. Since this is a major reorganization of a central component of mandoc(1), stability of the program might slightly suffer for a few weeks, but i believe that's not a problem at this point of the release cycle. The new code already satisfies the regression suite, but more tweaking and regression testing to further improve the handling of various escape sequences will likely follow in the near future.
# 928431b4	01-May-2022	schwarze <schwarze@openbsd.org>	Split a new function roff_parse_comment() out of roff_expand() because this functionality is not needed when called from roff_getarg(). This makes the long and complicated function roff_expand() significantly shorter, and also simpler in so far as it no longer needs to return ROFF_APPEND. No functional change intended.
# f79258f3	30-Apr-2022	schwarze <schwarze@openbsd.org>	Provide a new function roff_req_or_macro() to parse and handle a request or macro, including context-dependent error handling inside tbl(7) code and inside .ce/.rj blocks. Use it both in the top level roff(7) parser and inside conditional blocks. This fixes an assertion failure triggered by ".if 1 .ce" inside tbl(7) code, found by tb@ using afl(1). As a side benefit for readability, only one place remains in the code that calls the main handler functions for the various roff(7) requests. This patch also improves column numbers in some error messages and various comments.
# ed59e75b	30-Apr-2022	schwarze <schwarze@openbsd.org>	Refactor the handler function roff_block_sub() for clarity and simplicity. 1. Do not needlessly access the function pointer table roffs[]. Instead, simply call the block closing function directly. 2. Sort code: handle both cases of block closing at the beginning of the function rather than one at the beginning and one at the end. 3. Trim excessive, partially repetitive and obvious comments, also making the comments considerably more precise. No functional change.
# edb0312f	28-Apr-2022	schwarze <schwarze@openbsd.org>	The syntax of the roff(7) .mc request is quite special and the roff_onearg() parsing function is too generic, so provide a dedicated parsing function instead. This fixes an assertion failure when an \o escape sequence is passed as the argument; the bug was found by tb@ using afl(1). It also makes mandoc output more similar to groff in various cases.
# a68c8a85	24-Apr-2022	schwarze <schwarze@openbsd.org>	When we open a new .while loop, let's not attempt to close out another enclosing .while loop at the same time. Instead, postpone the closing until the next iteration of ROFF_RERUN. This prevents one-line constructions like ".while 0 .while 0 something" and ".while rx .while rx .rr x" (which admittedly aren't particularly useful) from dying of abort(3), which was a bug tb@ found with afl(1).
# c1a68d52	24-Apr-2022	schwarze <schwarze@openbsd.org>	If a .shift request has a negative argument, do not use a negative array index but use 0 instead of the argument, just like groff. Warn about the invalid argument. While here, fix the column number in another warning message. Segfault reported by tb@, found with afl(1).
# e6cf71aa	13-Apr-2022	schwarze <schwarze@openbsd.org>	Surprisingly, groff supports multiple copy mode escapes at the beginning of an escape sequence: \, \E, \EE, \EEE, and so on all do the same outside copy mode, so let them do the same in mandoc(1), too. This fixes an assertion failure triggered by \EEX that tb@ found with afl(1). The first E was consumed by roff_expand(), but that function failed to recognize the escape sequence as the expansion of a user-defined string and handed it over to mandoc_escape(), which consumed the second E and then died on an assertion because it is not prepared to handle user-defined strings. Fix this by letting both functions handle arbitrary numbers of 'E's correctly.
# 8055da74	04-Oct-2021	schwarze <schwarze@openbsd.org>	store the operating system name obtained from uname(3) in the adequate struct together with similar state data rather than in a function-scope static variable, such that it can be free(3)d in roff_man_free(); no functional change
# b69f004a	04-Oct-2021	schwarze <schwarze@openbsd.org>	Do not leak 64 bytes of heap memory every time a manual page calls a user-defined macro. Calls of standard mdoc(7) and man(7) macros were unaffected, so the effect on OpenBSD manual pages was small, about 80 Kilobytes grand total for a full run of "makewhatis /usr/share/man". Argument expansion contexts for user-defined macros are stored on a stack that grows as needed if calls of user-defined macros are nested or recursive. Individual stack entries contain dynamically allocated arrays of pointers to arguments; these argument arrays also grow as needed if user-defined macros take more than eight arguments. The mistake was that argument arrays of already initialized expansion contexts were leaked rather than reused on subsequent macro calls. I found this issue in a systematic hunt for memory leaks after Michael <Stapelberg at Debian> reported memory exhaustion problems on the production server manpages.debian.org. This sub-Megabyte leak is not the cause of Michael's trouble, though, where Gigabytes of memory are being wasted. We are still investigating whether the original problem may be related to his supervisor process, which is written in Go, rather than to mandoc.
# 7d063611	10-Aug-2021	schwarze <schwarze@openbsd.org>	Support two-character font names (BI, CW, CR, CB, CI) in the tbl(7) layout font modifier. Get rid of the TBL_CELL_BOLD and TBL_CELL_ITALIC flags and use the usual ESCAPE_FONT* enum mandoc_esc members from mandoc.h instead, which simplifies and unifies some code. While here, also support CB and CI in roff(7) \f escape sequences and in roff(7) .ft requests for all output modes. Using those is certainly not recommended because portability is limited even with groff, but supporting them makes some existing third-party manual pages look better, in particular in HTML output mode. Bug-compatible with groff as far as i'm aware, except that i consider font names starting with the '\n' (ASCII 0x0a line feed) character so insane that i decided to not support them. Missing feature reported by nabijaczleweli dot xyz in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=992002. I used none of the code from the initial patch submitted by nabijaczleweli, but some of their ideas. Final patch tested by them, too.
# dd9cc97d	27-Jun-2021	schwarze <schwarze@openbsd.org>	add a style message about overlong text lines, trying very hard to avoid false positives, not at all trying to catch as many cases as possible; feature originally suggested by tb@, OK tb@ kn@ jmc@
# fbe6102b	27-Aug-2020	schwarze <schwarze@openbsd.org>	Avoid artifacts in the most common case of closing conditional blocks when no arguments follow the closing brace, \}. For example, the line "'br\}" contained in the pod2man(1) preamble would throw a bogus "escaped character not allowed in a name" error. This issue was originally reported by Chris Bennett on ports@, and afresh1@ noticed it came from the pod2man(1) preamble.
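
Note on revision cd14d642: the evaluation-order effect described in that entry ("\n+x\n+x\n+x" after ".nr x 0 1" printing "321" versus "123") can be reproduced with a small stand-alone model. The sketch below is not mandoc code. The names struct reg, expand() and demo() are hypothetical and exist only for this illustration. It assumes a single auto-incrementing register whose values stay below 10.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a roff number register with auto-increment. */
struct reg {
	int val;	/* current value */
	int step;	/* auto-increment, as set by ".nr x 0 1" */
};

/*
 * Expand n occurrences of "\n+x", visiting them in the given order,
 * and store each resulting digit at the position of its occurrence.
 * Assumption for brevity: all values fit into a single digit.
 */
static void
expand(struct reg *r, int n, int left_to_right, char *out)
{
	int i, pos;

	for (i = 0; i < n; i++) {
		pos = left_to_right ? i : n - 1 - i;
		r->val += r->step;	/* side effect on shared state */
		assert(r->val < 10);
		out[pos] = (char)('0' + r->val);
	}
	out[n] = '\0';
}

/* Start from ".nr x 0 1" and expand "\n+x" three times. */
static const char *
demo(int left_to_right)
{
	static char buf[2][4];
	struct reg x = { 0, 1 };
	char *out = buf[left_to_right != 0];

	expand(&x, 3, left_to_right, out);
	return out;
}
```

Calling demo(0) models the old right-to-left visiting order and demo(1) the left-to-right order. Because each expansion increments the register as a side effect, the order in which the occurrences are visited determines which digit ends up at which position.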