History log of /dragonfly/tools/tools/locale/etc/manual-input.UTF-8 (Results 1 – 8 of 8)
Revision Date Author Comments
Revision tags: v6.2.1, v6.2.0, v6.3.0, v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2, v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3, v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3, v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1, v5.2.0, v5.3.0, v5.2.0rc, v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1, v4.8.0, v4.6.2, v4.9.0, v4.8.0rc, v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0, v4.4.3, v4.4.2, v4.4.1, v4.4.0, v4.5.0, v4.4.0rc
# 29d602c8 01-Sep-2015 John Marino <draco@marino.st>

UTF8: fix a couple of number ctype definitions

During testing of the new number ctype, I found a typo in one of the CJK number
definitions and two Roman Numeral characters that were set as numbers but
should not be (according to an equivalent Python check).
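The Python check itself is not part of the commit. A minimal sketch of such a check, assuming it relied on Python's built-in Unicode database (the helper name `numeric_code_points` is hypothetical), could list which code points of a block carry a numeric value:

```python
import unicodedata


def numeric_code_points(start, end):
    """Return the code points in [start, end] that Python treats as numeric."""
    return [cp for cp in range(start, end + 1) if chr(cp).isnumeric()]


# Number Forms block (U+2150-U+218F), which contains the Roman numerals.
for cp in numeric_code_points(0x2150, 0x218F):
    ch = chr(cp)
    name = unicodedata.name(ch, "<unnamed>")
    print(f"U+{cp:04X} {name} = {unicodedata.numeric(ch)}")
```

Any code point that the locale source marks as a number but that is absent from this listing would be a candidate for the kind of mismatch described above.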



# 7560f083 01-Sep-2015 John Marino <draco@marino.st>

UTF8 locales: Fully consider "CIRCLED_" set as alphabet

This means defining the "A"-"Z" and "a"-"z" circled versions of the
Enclosed Alphanumerics block (0x2460-24FF) as hexadecimal digits and
defining the to-upper and to-lower conversions between the upper case
and lower case circled alphabets.
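The case conversions described here can be sketched as two lookup tables. This is an illustration only, not the syntax of the locale source file; the function name `circled_case_maps` and the constants are hypothetical, and the code points are those of the circled Latin letters in the Enclosed Alphanumerics block:

```python
import unicodedata

CIRCLED_UPPER_A = 0x24B6  # CIRCLED LATIN CAPITAL LETTER A
CIRCLED_LOWER_A = 0x24D0  # CIRCLED LATIN SMALL LETTER A


def circled_case_maps():
    """Build to-lower and to-upper tables for the 26 circled Latin letters."""
    to_lower = {CIRCLED_UPPER_A + i: CIRCLED_LOWER_A + i for i in range(26)}
    to_upper = {lower: upper for upper, lower in to_lower.items()}
    return to_lower, to_upper


to_lower, to_upper = circled_case_maps()
for upper, lower in to_lower.items():
    print(
        f"U+{upper:04X} {unicodedata.name(chr(upper))} <-> "
        f"U+{lower:04X} {unicodedata.name(chr(lower))}"
    )
```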



# 7294feb8 01-Sep-2015 John Marino <draco@marino.st>

UTF-8: Multiple improvements (and detection of possible issue)

This commit started out intending to fix "digit" definition on unicode,
which it mostly does, but a lot more happened in the end, namely:

* digits apparently are not part of CLDR definition. I added a section
in the manual portion of UTF-8 source file that defines digit classes
for generated sections.
* Add numbers classification for entire UTF-8. Currently DragonFly and
all BSDs do not support "number" type. However, localedef understands
it (it's supported on Illumos), but currently the number flag value is
zero, so it's a no-op. A short term goal is to have DragonFly be the
first BSD with proper number ctype handling.
* Redefine "special" ctype once and for all. There is no definitive
agreement on what "special" characters are. According to wiki which
got it from unicode, it starts with 33 characters (0x20 - 0x2F, 0x3A -
0x40, 0x5B - 0x60, 0x7B - 0x7E). However, localedef objects to <space>
because it sets "graph" and "print" flags, and <space> can't be graph.
As a result, the <space> is not considered "special" here. Moreover,
the punctuation in Latin-1 supplement is "special". The division and
multiplication signs are ambiguous, so I set them to special (since
plus and minus signs are special). Finally, with the most doubt, the
punctuation of "general punctuation" block is also considered special
although I couldn't find convincing evidence either way. Given the
lack of definition, I don't think "special" classification is really
used, especially not in unicode.
* Fix NO-BREAK_SPACE classification (set as graph and space in the previous
commit)
* The MICRO character also triggered a warning because it was classified as
both lower (in the Greek section) and punctuation, so remove the punct class.
* When possible, don't define graph if digit is defined, and similarly
with graph and punct. Both digit and punct also set graph flag so
having both is redundant.
* add several new block definitions:
- Syloti Nagri
- Common Indic Number Forms
- Phags-pa
- Saurashra
- Kayah Li
- Rejang
- Javanese
- Cham
- Tal Viet
- Meetei Mayek & extension
* Detection of possible bug in localedef
The Tai Tham definitions are producing the wrong code but there's
nothing wrong with the definitions. The 6 unused characters between
the two digit definitions should not be graphable, but as soon as
one "digit" is defined after the first digit range is defined, all
the characters between are marked as graphable and digits. There
are similar "fill-ins" but so far only with Tai Tham. It was
detected while comparing the output of all "digit" types against a
Python program that does the same, which revealed this error. It
requires further investigation about exactly what is causing it (and
thus where the bug is) but right now it's either a bad definition
elsewhere that affects Tai Tham or localedef has a bug somewhere
(avl lookup?)
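The comparison program mentioned in the last item is not included in the commit. A rough sketch of the Python side, assuming it used the `unicodedata` module (the helper name `decimal_digit_ranges` is hypothetical), groups the decimal digits of a block into contiguous ranges so that gaps such as the one in Tai Tham stand out:

```python
import unicodedata


def decimal_digit_ranges(start, end):
    """Group code points of category Nd in [start, end] into contiguous ranges."""
    ranges = []
    for cp in range(start, end + 1):
        if unicodedata.category(chr(cp)) != "Nd":
            continue
        if ranges and ranges[-1][1] == cp - 1:
            # Extend the current range by one code point.
            ranges[-1] = (ranges[-1][0], cp)
        else:
            ranges.append((cp, cp))
    return ranges


# Tai Tham block: U+1A20-U+1AAF
for first, last in decimal_digit_ranges(0x1A20, 0x1AAF):
    print(f"U+{first:04X}-U+{last:04X}")
```

Two separate ranges in this output, with unassigned code points between them, are what the locale's "digit" class would be expected to match if the definitions are correct.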



# 2fd39989 30-Aug-2015 John Marino <draco@marino.st>

UTF8 locales: Refine Latin supplement more

The multiplication and division signs were missing, and the control
characters were not outlined. Also set superscript 1,2,3 as digits.
They are not showing up with the iswdigit() function, so that requires
further investigation (iswdigit does work for '0','1',...'9' however)
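For comparison, Python separates decimal digits from other digit characters. The sketch below (the helper name `digit_report` is hypothetical) shows how it classifies the ASCII digits and the three superscripts. ISO C specifies iswdigit() as matching only decimal-digit characters, which may be related to the behaviour observed above, but that connection is an assumption and not something the commit establishes:

```python
import unicodedata


def digit_report(chars):
    """Return (character, isdecimal, isdigit) for each character given."""
    return [(ch, ch.isdecimal(), ch.isdigit()) for ch in chars]


# ASCII digits followed by SUPERSCRIPT ONE, TWO and THREE.
for ch, is_decimal, is_digit in digit_report("0123456789\u00b9\u00b2\u00b3"):
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
        f"category={unicodedata.category(ch)} "
        f"isdecimal={is_decimal} isdigit={is_digit}"
    )
```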



# 51fe16e4 30-Aug-2015 John Marino <draco@marino.st>

UTF8 locales: Include inverted exclamation mark too

I was off by one character when I defined the first range on the previous
commit. It starts with an inverted exclamation mark, not the cent sign.


# 89aeb470 30-Aug-2015 John Marino <draco@marino.st>

UTF8 locales: Complete implementation of Latin-1 Supplement

The Latin-1 Supplement block of UTF-8 (U0080-U00FF) was not fully
implemented. Specifically it was missing U00A1 (inverted exclamation)
through U00BF (inverted question mark). Some popular characters this
affected were the cent sign, pound sign, yen sign, broken bar, copyright
symbol and superscripts. On international keyboards, AltGr + number
key wouldn't output correctly. This addition to the manual ctype input
definitions (and subsequent regenerations) will fix these issues.

Reported by: profmakx, ivadasz
Diagnostics: YRabbit
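
To see which characters fall in the range named above, the following sketch lists them with their Unicode names. It only uses Python's `unicodedata` module and says nothing about how the locale source defines them; the helper name `describe_range` is hypothetical:

```python
import unicodedata

FIRST, LAST = 0x00A1, 0x00BF  # range quoted in the commit message


def describe_range(first, last):
    """Return (code point, Unicode name) pairs for every code point in [first, last]."""
    return [(cp, unicodedata.name(chr(cp))) for cp in range(first, last + 1)]


for cp, name in describe_range(FIRST, LAST):
    print(f"U+{cp:04X} {name}")
```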



# e9e78086 16-Aug-2015 John Marino <draco@marino.st>

rollup UTF-8: Manually add NO-BREAK_SPACE

Move this definition from cldr2def to manual UTF8 definition.
It was omitted in the first draft accidentally.


# 775a693d 16-Aug-2015 John Marino <draco@marino.st>

Add locale tool to generate "rollup" UTF-8 src file

The first version of the "common" UTF-8 file was hand-assembled by me.
This is obviously prone to error and is very hard to maintain (the
previous incarnation was never maintained; not once after it was added).

To address these issues, create a new tool (using cldr2def as inspiration)
to create a composite UTF-8 source file using all available POSIX input
from CLDR. What can't be generated still comes from a manual fragment
that is added to the common source file at the end.

This allows periodic maintenance when CLDR issues new releases. We are
converging on using this composite (aka "rollup") file for all UTF-8
locales.
