History log of /openbsd/regress/usr.bin/mandoc/char/unicode/input.in (Results 1 – 6 of 6)
Revision Date Author Comments
# 631d5f39 16-May-2024 schwarze <schwarze@openbsd.org>

Check that lower-case variants of UTF-16 surrogate escape sequences
are rejected with the correct error message.


# c2eb3b8c 16-May-2024 schwarze <schwarze@openbsd.org>

Improve coverage of edge cases for 3-byte UTF-8 sequences.
Coverage for 2-byte and 4-byte sequences was already reasonable.


# 614c3e4f 02-Jun-2021 schwarze <schwarze@openbsd.org>

test private use areas some more as they have proven fragile


# 39f98da6 02-Jun-2021 schwarze <schwarze@openbsd.org>

Cleanup:
1. Move invalid two-byte sequences after valid ones
and make their descriptions easier to understand.
2. Replace the wrong and confusing expression "middle byte"
with the correct term "start byte".
3. Add test lines for U+EFFFF and U+F0000.
4. Replace the unhelpful word "strange" with more descriptive terms.
Arguably, nothing about this (or maybe everything?) is strange.


# 943fb9d8 04-Jul-2017 schwarze <schwarze@openbsd.org>

Messages of the -Wbase level now print STYLE:. Since this
causes horrible churn anyway, take the opportunity to stop
excessive testing, such that this is hopefully the last instance
of such churn. Consistently use OpenBSD RCS tags, blank .Os,
blank fourth .TH argument, and Mdocdate like everywhere else.
Use -Ios=OpenBSD for platform-independent predictable output.


# 52a7f466 19-Dec-2014 schwarze <schwarze@openbsd.org>

Rewrite the low-level UTF-8 parser from scratch.
It accepted invalid byte sequences like 0xc080-c1bf, 0xe08080-e09fbf,
0xeda080-edbfbf, and 0xf0808080-f08fbfbf, produced valid roff Unicode
escape sequences from them, and the algorithm contained strong
defenses against any attempt to fix it.

This cures an assertion failure in the terminal formatter caused
by sneaking in ASCII 0x08 (backspace) by "encoding" it as an (invalid)
multibyte UTF-8 sequence, found by jsg@ with afl.

As a bonus, the new algorithm also reduces the code in the function
by about 20%.
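
The byte ranges listed in this commit message fall into two classes: overlong encodings (0xc080-c1bf, 0xe08080-e09fbf, 0xf0808080-f08fbfbf) and encoded UTF-16 surrogates (0xeda080-edbfbf). The following is a minimal sketch of a strict decoder that rejects both classes. It is not mandoc's actual code; the function name utf8_decode_strict and its interface are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, NOT the function used in mandoc.
 * Decode one UTF-8 sequence starting at buf, with len bytes available.
 * On success, store the code point in *cp and return the number of
 * bytes consumed (1 to 4).  Return 0 for any invalid sequence.
 */
static size_t
utf8_decode_strict(const unsigned char *buf, size_t len, uint32_t *cp)
{
	uint32_t c, min;
	size_t i, n;

	assert(buf != NULL && cp != NULL);
	if (len == 0)
		return 0;

	c = buf[0];
	if (c < 0x80) {
		/* Plain ASCII, one byte. */
		*cp = c;
		return 1;
	} else if ((c & 0xe0) == 0xc0) {
		n = 2; c &= 0x1f; min = 0x80;
	} else if ((c & 0xf0) == 0xe0) {
		n = 3; c &= 0x0f; min = 0x800;
	} else if ((c & 0xf8) == 0xf0) {
		n = 4; c &= 0x07; min = 0x10000;
	} else {
		/* Stray continuation byte or invalid start byte. */
		return 0;
	}

	if (len < n)
		return 0;

	/* Every following byte must have the form 10xxxxxx. */
	for (i = 1; i < n; i++) {
		if ((buf[i] & 0xc0) != 0x80)
			return 0;
		c = (c << 6) | (buf[i] & 0x3f);
	}

	/* Overlong: the value would have fit into a shorter sequence. */
	if (c < min)
		return 0;
	/* UTF-16 surrogates are not valid Unicode scalar values. */
	if (c >= 0xd800 && c <= 0xdfff)
		return 0;
	/* Beyond the end of the Unicode code space. */
	if (c > 0x10ffff)
		return 0;

	*cp = c;
	return n;
}
```

With the minimum-value check in place, the two-byte sequence 0xc0 0x88 (an overlong "encoding" of ASCII backspace, the kind of input mentioned above) is rejected rather than decoded to 0x08.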
