History log of /openbsd/usr.bin/mandoc/roff.c (Results 1 – 25 of 272)
Revision Date Author Comments
# f6697133 24-Oct-2023 schwarze <schwarze@openbsd.org>

Implement the man(7) .MR macro, a 2023 GNU extension.
The syntax and semantics is almost identical to mdoc(7) .Xr.

This will be needed for reading the groff manual pages once our port
will be update

Implement the man(7) .MR macro, a 2023 GNU extension.
The syntax and semantics is almost identical to mdoc(7) .Xr.

This will be needed for reading the groff manual pages once our port
will be updated to 1.23, and the Linux Manual Pages Project is also
determined to start using it sooner or later. I did not advocate for
this new macro, but since we want to remain able to read all manual
pages found in the wild, there is little choice but to support it.
At least it is easy to do, they basically copied .Xr.

show more ...


# 6d9b308d 23-Oct-2023 schwarze <schwarze@openbsd.org>

Support some escape sequences, in particular character escape sequences,
inside \w arguments, and skip most other escape sequences when measuring
the output length in this way because most escape seq

Support some escape sequences, in particular character escape sequences,
inside \w arguments, and skip most other escape sequences when measuring
the output length in this way because most escape sequences contribute
little or nothing to text width: for example, consider font escapes in
terminal output.

This implementation is very rudimentary. In particular, it assumes that
every character has the same width. No attempt is made to detect
double-width or zero-width Unicode characters or to take dependencies on
output devices or fonts into account. These limitations are hard to
avoid because mandoc has to interpolate \w at the parsing stage when the
output device is not yet known. I really do not want the content of the
syntax tree to depend on the output device.

Feature requested by Paul <Eggert at cs dot ucla dot edu>, who also
submitted a patch, but i chose to commit this very different patch
with almost the same functionality.
His input was still very valuable because complete support for \w is
out of the question, and consequently, the main task is identifying
subsets of the feature that are needed for real-world manual pages
and can be supported without uprooting the whole forest.

show more ...


# 29079a11 22-Oct-2023 schwarze <schwarze@openbsd.org>

While doing delayed expansion of escape sequences in macro arguments,
correctly check for failure of the in-place expansion function.
If an argument not only does recursive delayed expansion
but infi

While doing delayed expansion of escape sequences in macro arguments,
correctly check for failure of the in-place expansion function.
If an argument not only does recursive delayed expansion
but infinitely recursive delayed expansion, this bug could
result in an ESCAPE_EXPAND assertion failure.

Thanks to Eric van Gyzen <vangyzen at FreeBSD> for finding this bug
by inspecting FreeBSD source code.

show more ...


# 7bda13b1 21-Oct-2023 schwarze <schwarze@openbsd.org>

When parsing a macro argument results in delayed escape sequence
expansion, re-check for all contained escape sequences whether they
need delayed expansion, not just for the particular escape sequenc

When parsing a macro argument results in delayed escape sequence
expansion, re-check for all contained escape sequences whether they
need delayed expansion, not just for the particular escape sequences
that triggered delayed expansion in the first place. This is needed
because delayed expansion can result in strings containing nested
escape sequences recursively needing delayed expansion, too.

This fixes an assertion failure in krb5_openlog(3), see:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=266882

Thanks to Wolfram Schneider <wosch at FreeBSD> for reporting the bug
and to Baptiste Daroussin <bapt at FreeBSD> for forwarding the report.

show more ...


# d9a51c35 26-Dec-2022 jmc <jmc@openbsd.org>

spelling fixes; from paul tagliamonte
amendments to his diff are noted on tech


# 1c23c756 16-Aug-2022 schwarze <schwarze@openbsd.org>

Even though the constant ASCII_ESC is only used in the roff pre-parser roff.c,
move it to the top level include file mandoc.h to reduce the risk of causing
clashes when introducing new ASCII_* consta

Even though the constant ASCII_ESC is only used in the roff pre-parser roff.c,
move it to the top level include file mandoc.h to reduce the risk of causing
clashes when introducing new ASCII_* constants in the future.

show more ...


# 140f9883 07-Jun-2022 schwarze <schwarze@openbsd.org>

Purge duplicate error reporting from the .tr request parser:
the error was already reported earlier when roff_expand()
called roff_escape().


# a72149c5 03-Jun-2022 schwarze <schwarze@openbsd.org>

During identifier parsing, handle undefined escape sequences
in the same way as groff:
* \\ is always reduced to \
* \. is always reduced to .
* other undefined escape sequences are usually reduced t

During identifier parsing, handle undefined escape sequences
in the same way as groff:
* \\ is always reduced to \
* \. is always reduced to .
* other undefined escape sequences are usually reduced to the escape name,
for example \G to G, except during the expansion of expanding escape
sequences having the standard argument form (in particular \* and \n),
in which case the backslash is preserved literally.

Yes, this is confusing indeed.
For example, the following have the same meaning:
* .ds \. and .ds . which is not the same as .ds \\.
* \*[\.] and \*[.] which is not the same as \*[\\.]
* .ds \G and .ds G which is not the same as .ds \\G
* \*[\G] and \*[\\G] which is not the same as \*[G] <- sic!

To feel less dirty, have a leaning toothpick, if you are so inclined.

This patch also slightly improves the string shown by the "escaped
character not allowed in a name" error message.

show more ...


# 9784ce3e 02-Jun-2022 schwarze <schwarze@openbsd.org>

Avoid the layering violation of re-parsing for \E in roff_expand().
To that end, add another argument to roff_escape()
returning the index of the escape name.
This also makes the code in roff_escape(

Avoid the layering violation of re-parsing for \E in roff_expand().
To that end, add another argument to roff_escape()
returning the index of the escape name.
This also makes the code in roff_escape() a bit more uniform
in so far as it no longer needs the "char esc_name" local variable
but now does everything with indices into buf[].
No functional change.

show more ...


# 75a6bad9 31-May-2022 schwarze <schwarze@openbsd.org>

Rudimentary implementation of the \A escape sequence, following groff
semantics (test identifier for syntactical validity), not at all
following the completely unrelated Heirloom semantics (define
hy

Rudimentary implementation of the \A escape sequence, following groff
semantics (test identifier for syntactical validity), not at all
following the completely unrelated Heirloom semantics (define
hyperlink target position).

The main motivation for providing this implementation is to get \A
into the parsing class ESCAPE_EXPAND that corresponds to groff parsing
behaviour, which is quite similar to the \B escape sequence (test
numerical expression for syntactical validity). This is likely
to improve parsing of nested escape sequences in the future.

Validation isn't perfect yet. In particular, this implementation
rejects \A arguments containing some escape sequences that groff
allows to slip through. But that is unlikely to cause trouble even
in documents using \A for non-trivial purposes. Rejecting the nested
escapes in question might even improve robustnest because the rejected
names are unlikely to really be usable for practical purposes - no
matter that groff dubiously considers them syntactically valid.

show more ...


# 6f49ebc4 31-May-2022 schwarze <schwarze@openbsd.org>

Trivial patch to put the roff(7) \g (interpolate format of register)
escape sequence into the correct parsing class, ESCAPE_EXPAND.
Expansion of \g is supposed to work exactly like the expansion
of t

Trivial patch to put the roff(7) \g (interpolate format of register)
escape sequence into the correct parsing class, ESCAPE_EXPAND.
Expansion of \g is supposed to work exactly like the expansion
of the related escape sequence \n (interpolate register value),
but since we ignore the .af (assign output format) request,
we just interpolate an empty string to replace the \g sequence.

Surprising as it may seem, this actually makes a formatting difference
for deviate input like ".O\gNx" which used to raise bogus "escaped
character not allowed in a name" and "skipping unknown macro" errors
and printed nothing, whereas now it correctly prints "OpenBSD".

show more ...


# 83a9dfe1 30-May-2022 schwarze <schwarze@openbsd.org>

Dummy implementation of the roff(7) \V (interpolate environment variable)
escape sequence. This is needed to get \V into the correct parsing
class, ESCAPE_EXPAND.

It is intentional that mandoc(1) o

Dummy implementation of the roff(7) \V (interpolate environment variable)
escape sequence. This is needed to get \V into the correct parsing
class, ESCAPE_EXPAND.

It is intentional that mandoc(1) output is *not* influenced by environment
variables, so interpolate the name of the variable with some decorating
punctuation rather than interpolating its value.

show more ...


# cd14d642 19-May-2022 schwarze <schwarze@openbsd.org>

Make roff_expand() parse left-to-right rather than right-to-left.
Some escape sequences have side effects on global state, implying
that the order of evaluation matters. For example, this fixes the

Make roff_expand() parse left-to-right rather than right-to-left.
Some escape sequences have side effects on global state, implying
that the order of evaluation matters. For example, this fixes the
long-standing bug that "\n+x\n+x\n+x" after ".nr x 0 1" used to
print "321"; now it correctly prints "123".

Right-to-left parsing was convenient because it implicitly handled
nested escape sequences. With correct left-to-right parsing, nesting
now requires an explicit implementation, here solved as follows:
1. Handle nested expanding escape sequences iteratively.
When finding one, expand it, then retry parsing the enclosing escape
sequence from the beginning, which will ultimately succeed as soon
as it no longer contains any nested expanding escape sequences.
2. Handle nested non-expanding escape sequences recursively.
When finding one, the escape sequence parser calls itself to find
the end of the inner sequence, then continues parsing the outer
sequence after that point.

This requires the mandoc_escape() function to operate in two different
modes. The roff(7) parser uses it in a mode where it generates
diagnostics and may return an expansion request instead of a parse
result. All other callers, in particular the formatters, use it
in a simpler mode that never generates diagnostics and always returns
a definite parsing result, but that requires all expanding escape
sequences to already have been expanded earlier. The bulk of the
code is the same for both modes.
Since this required a major rewrite of the function anyway, move
it into its own new file roff_escape.c and out of the file mandoc.c,
which was misnamed in the first place and lacks a clear focus.

As a side benefit, this also fixes a number of assertion failures
that tb@ found with afl(1), for example "\n\\\\*0", "\v\-\\*0",
and "\w\-\\\\\$0*0".

As another side benefit, it also resolves some code duplication
between mandoc_escape() and roff_expand() and centralizes all
handling of escape sequences (except for expansion) in roff_escape.c,
hopefully easing maintenance and feature improvements in the future.

While here, also move end-of-input handling out of the complicated
function roff_expand() and into the simpler function roff_parse_comment(),
making the logic easier to understand.

Since this is a major reorganization of a central component of
mandoc(1), stability of the program might slightly suffer for a few
weeks, but i believe that's not a problem at this point of the
release cycle. The new code already satisfies the regression suite,
but more tweaking and regression testing to further improve the
handling of various escape sequences will likely follow in the near
future.

show more ...


# 928431b4 01-May-2022 schwarze <schwarze@openbsd.org>

Split a new function roff_parse_comment() out of roff_expand() because this
functionality is not needed when called from roff_getarg(). This makes the
long and complicated function roff_expand() sig

Split a new function roff_parse_comment() out of roff_expand() because this
functionality is not needed when called from roff_getarg(). This makes the
long and complicated function roff_expand() significantly shorter, and also
simpler in so far as it no longer needs to return ROFF_APPEND.
No functional change intended.

show more ...


# f79258f3 30-Apr-2022 schwarze <schwarze@openbsd.org>

Provide a new function roff_req_or_macro() to parse and handle a request
or macro, including context-dependent error handling inside tbl(7) code
and inside .ce/.rj blocks. Use it both in the top lev

Provide a new function roff_req_or_macro() to parse and handle a request
or macro, including context-dependent error handling inside tbl(7) code
and inside .ce/.rj blocks. Use it both in the top level roff(7) parser
and inside conditional blocks.

This fixes an assertion failure triggered by ".if 1 .ce" inside tbl(7)
code, found by tb@ using afl(1).

As a side benefit for readability, only one place remains in the
code that calls the main handler functions for the various roff(7)
requests. This patch also improves column numbers in some error
messages and various comments.

show more ...


# ed59e75b 30-Apr-2022 schwarze <schwarze@openbsd.org>

Refactor the handler function roff_block_sub() for clarity and simplicity.

1. Do not needlessly access the function pointer table roffs[].
Instead, simply call the block closing function directly.

Refactor the handler function roff_block_sub() for clarity and simplicity.

1. Do not needlessly access the function pointer table roffs[].
Instead, simply call the block closing function directly.

2. Sort code: handle both cases of block closing at the beginning
of the function rather than one at the beginning and one at the end.

3. Trim excessive, partially repetitive and obvious comments, also
making the comments considerably more precise.

No functional change.

show more ...


# edb0312f 28-Apr-2022 schwarze <schwarze@openbsd.org>

The syntax of the roff(7) .mc request is quite special
and the roff_onearg() parsing function is too generic,
so provide a dedicated parsing function instead.

This fixes an assertion failure when an

The syntax of the roff(7) .mc request is quite special
and the roff_onearg() parsing function is too generic,
so provide a dedicated parsing function instead.

This fixes an assertion failure when an \o escape sequence is
passed as the argument; the bug was found by tb@ using afl(1).
It also makes mandoc output more similar to groff in various cases.

show more ...


# a68c8a85 24-Apr-2022 schwarze <schwarze@openbsd.org>

When we open a new .while loop, let's not attempt to close out
another enclosing .while loop at the same time.
Instead, postpone the closing until the next iteration of ROFF_RERUN.

This prevents one

When we open a new .while loop, let's not attempt to close out
another enclosing .while loop at the same time.
Instead, postpone the closing until the next iteration of ROFF_RERUN.

This prevents one-line constructions like ".while 0 .while 0 something"
and ".while rx .while rx .rr x" (which admittedly aren't particularly
useful) from dying of abort(3), which was a bug tb@ found with afl(1).

show more ...


# c1a68d52 24-Apr-2022 schwarze <schwarze@openbsd.org>

If a .shift request has a negative argument, do not use a negative array
index but use 0 instead of the argument, just like groff.
Warn about the invalid argument.
While here, fix the column number i

If a .shift request has a negative argument, do not use a negative array
index but use 0 instead of the argument, just like groff.
Warn about the invalid argument.
While here, fix the column number in another warning message.

Segfault reported by tb@, found with afl(1).

show more ...


# e6cf71aa 13-Apr-2022 schwarze <schwarze@openbsd.org>

Surprisingly, groff supports multiple copy mode escapes at the
beginning of an escape sequence: \, \E, \EE, \EEE, and so on all do
the same outside copy mode, so let them do the same in mandoc(1), to

Surprisingly, groff supports multiple copy mode escapes at the
beginning of an escape sequence: \, \E, \EE, \EEE, and so on all do
the same outside copy mode, so let them do the same in mandoc(1), too.

This fixes an assertion failure triggered by \EE*X that tb@ found
with afl(1). The first E was consumed by roff_expand(), but that
function failed to recognize the escape sequence as the expansion
of a user-defined string and handed it over to mandoc_escape(),
which consumed the second E and then died on an assertion because
it is not prepared to handle user-defined strings. Fix this by
letting *both* functions handly arbitrary numbers of 'E's correctly.

show more ...


# 8055da74 04-Oct-2021 schwarze <schwarze@openbsd.org>

store the operating system name obtained from uname(3) in the adequate
struct together with similar state date rather than in a function-scope
static variable, such that it can be free(3)d in roff_ma

store the operating system name obtained from uname(3) in the adequate
struct together with similar state date rather than in a function-scope
static variable, such that it can be free(3)d in roff_man_free();
no functional change

show more ...


# b69f004a 04-Oct-2021 schwarze <schwarze@openbsd.org>

Do not leak 64 bytes of heap memory every time a manual page calls
a user-defined macro. Calls of standard mdoc(7) and man(7) macros
were unaffected, so the effect on OpenBSD manual pages was small,

Do not leak 64 bytes of heap memory every time a manual page calls
a user-defined macro. Calls of standard mdoc(7) and man(7) macros
were unaffected, so the effect on OpenBSD manual pages was small,
about 80 Kilobytes grand total for a full run of "makewhatis
/usr/share/man".

Argument expansion contexts for user-defined macros are stored on
a stack that grows as needed if calls of user-defined macros are
nested or recursive. Individual stack entries contain dynamically
allocated arrays of pointers to arguments; these argument arrays
also grow as needed if user-defined macros take more than eight
arguments. The mistake was that argument arrays of already
initialized expansion contexts were leaked rather than reused on
subsequent macro calls.

I found this issue in a systematic hunt for memory leaks after
Michael <Stapelberg at Debian> reported memory exhaustion problems
on the production server manpages.debian.org. This sub-Megabyte
leak is not the cause of Michael's trouble, though, where Gigabytes
of memory are being wasted. We are still investigating whether the
original problem may be related to his supervisor process, which is
written in Go, rather than to mandoc.

show more ...


# 7d063611 10-Aug-2021 schwarze <schwarze@openbsd.org>

Support two-character font names (BI, CW, CR, CB, CI)
in the tbl(7) layout font modifier.

Get rid of the TBL_CELL_BOLD and TBL_CELL_ITALIC flags and use
the usual ESCAPE_FONT* enum mandoc_esc member

Support two-character font names (BI, CW, CR, CB, CI)
in the tbl(7) layout font modifier.

Get rid of the TBL_CELL_BOLD and TBL_CELL_ITALIC flags and use
the usual ESCAPE_FONT* enum mandoc_esc members from mandoc.h instead,
which simplifies and unifies some code.

While here, also support CB and CI in roff(7) \f escape sequences
and in roff(7) .ft requests for all output modes. Using those is
certainly not recommended because portability is limited even with
groff, but supporting them makes some existing third-party manual
pages look better, in particular in HTML output mode.

Bug-compatible with groff as far as i'm aware, except that i consider
font names starting with the '\n' (ASCII 0x0a line feed) character
so insane that i decided to not support them.

Missing feature reported by nabijaczleweli dot xyz in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=992002.
I used none of the code from the initial patch submitted by
nabijaczleweli, but some of their ideas.
Final patch tested by them, too.

show more ...


# dd9cc97d 27-Jun-2021 schwarze <schwarze@openbsd.org>

add a style message about overlong text lines,
trying very hard to avoid false positives,
not at all trying to catch as many cases as possible;

feature originally suggested by tb@,
OK tb@ kn@ jmc@


# fbe6102b 27-Aug-2020 schwarze <schwarze@openbsd.org>

Avoid artifacts in the most common case of closing conditional blocks
when no arguments follow the closing brace, \}.
For example, the line "'br\}" contained in the pod2man(1) preamble
would throw a

Avoid artifacts in the most common case of closing conditional blocks
when no arguments follow the closing brace, \}.
For example, the line "'br\}" contained in the pod2man(1) preamble
would throw a bogus "escaped character not allowed in a name" error.
This issue was originally reported by Chris Bennett on ports@,
and afresh1@ noticed it came from the pod2man(1) preamble.

show more ...


1234567891011