input.in - OpenGrok history log for /openbsd/regress/usr.bin/mandoc/char/unicode/input.in

Revision	Date	Author	Comments
# 631d5f39	16-May-2024	schwarze <schwarze@openbsd.org>	Check that lower-case variants of UTF-16 surrogate escape sequences are rejected with the correct error message.
# c2eb3b8c	16-May-2024	schwarze <schwarze@openbsd.org>	Improve coverage of edge cases for 3-byte UTF-8 sequences. Coverage for 2-byte and 4-byte sequences was already reasonable.
# 614c3e4f	02-Jun-2021	schwarze <schwarze@openbsd.org>	test private use areas some more as they have proven fragile
# 39f98da6	02-Jun-2021	schwarze <schwarze@openbsd.org>	Cleanup: 1. Move invalid two-byte sequences after valid ones and make their descriptions easier to understand. 2. Replace the wrong and confusing expression "middle byte" with the correct term "start byte". 3. Add test lines for U+EFFFF and U+F0000. 4. Replace the unhelpful word "strange" with more descriptive terms. Arguably, nothing about this (or maybe everything?) is strange.
# 943fb9d8	04-Jul-2017	schwarze <schwarze@openbsd.org>	Messages of the -Wbase level now print STYLE:. Since this causes horrible churn anyway, profit of the opportunity to stop excessive testing, such that this is hopefully the last instance of such churn. Consistently use OpenBSD RCS tags, blank .Os, blank fourth .TH argument, and Mdocdate like everywhere else. Use -Ios=OpenBSD for platform-independent predictable output.
# 52a7f466	19-Dec-2014	schwarze <schwarze@openbsd.org>	Rewrite the low-level UTF-8 parser from scratch. It accepted invalid byte sequences like 0xc080-c1bf, 0xe08080-e09fbf, 0xeda080-edbfbf, and 0xf0808080-f08fbfbf, produced valid roff Unicode escape sequences from them, and the algorithm contained strong defenses against any attempt to fix it. This cures an assertion failure in the terminal formatter caused by sneaking in ASCII 0x08 (backspace) by "encoding" it as an (invalid) multibyte UTF-8 sequence, found by jsg@ with afl. As a bonus, the new algorithm also reduces the code in the function by about 20%.
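
The byte ranges named in revision 52a7f466 are overlong encodings (0xc080-c1bf, 0xe08080-e09fbf, 0xf0808080-f08fbfbf) and encoded UTF-16 surrogates (0xeda080-edbfbf). The following is a minimal sketch of how a strict decoder can reject them. It is not the code from mandoc; the function name utf8_decode_strict and its interface are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not mandoc's actual parser.
 * Decode one UTF-8 sequence from buf, of which len bytes are available.
 * On success, store the code point in *cp and return the number of
 * bytes consumed (1 to 4).  Return 0 for invalid or truncated input.
 */
static size_t
utf8_decode_strict(const unsigned char *buf, size_t len, uint32_t *cp)
{
	uint32_t c, min;
	size_t i, n;

	if (len == 0)
		return 0;

	if (buf[0] < 0x80) {			/* plain ASCII */
		*cp = buf[0];
		return 1;
	} else if ((buf[0] & 0xe0) == 0xc0) {	/* 2-byte start byte */
		n = 2; c = buf[0] & 0x1f; min = 0x80;
	} else if ((buf[0] & 0xf0) == 0xe0) {	/* 3-byte start byte */
		n = 3; c = buf[0] & 0x0f; min = 0x800;
	} else if ((buf[0] & 0xf8) == 0xf0) {	/* 4-byte start byte */
		n = 4; c = buf[0] & 0x07; min = 0x10000;
	} else
		return 0;	/* stray continuation byte or 0xf8-0xff */

	if (len < n)
		return 0;	/* truncated sequence */

	for (i = 1; i < n; i++) {
		if ((buf[i] & 0xc0) != 0x80)
			return 0;	/* not a continuation byte */
		c = (c << 6) | (buf[i] & 0x3f);
	}

	if (c < min)
		return 0;	/* overlong, e.g. 0xc0 0x88 for backspace */
	if (c >= 0xd800 && c <= 0xdfff)
		return 0;	/* UTF-16 surrogate, 0xeda080-edbfbf */
	if (c > 0x10ffff)
		return 0;	/* beyond the Unicode range */

	*cp = c;
	return n;
}
```

Under these assumptions, the two bytes 0xc0 0x88 (an overlong "encoding" of ASCII backspace) are rejected because the decoded value 0x08 is below the two-byte minimum of 0x80, so the value never reaches a formatter as if it were a valid character.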