Revision tags: v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3, v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3, v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1, v5.2.0, v5.3.0, v5.2.0rc, v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1, v4.8.0, v4.6.2, v4.9.0, v4.8.0rc, v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0, v4.4.3, v4.4.2, v4.4.1, v4.4.0, v4.5.0, v4.4.0rc, v4.2.4
6af9a77b
06-Aug-2015
John Marino <draco@marino.st>
libc/regex: Replace old regex library with modified TRE
The existing DragonFly REGEX library has several limitations, including the lack of wide character support and no collation ability, because it is locked to the POSIX/C locale. It is also slow and fails a number of tests in the AT&T Research Regex test suite:
  basic       : TEST testregex, 539 tests, 0 errors
  categorize  : TEST testregex, 20 tests, 0 errors
  nullsubexpr : TEST testregex, 84 tests, 31 errors
  leftassoc   : TEST testregex, 12 tests, 12 errors
  rightassoc  : TEST testregex, 24 tests, 0 errors
  forcedassoc : TEST testregex, 48 tests, 8 errors
  repetition  : TEST testregex, 129 tests, 37 errors
With the new library, it achieves these scores (the test counts are higher because of the new regnexec support):
  basic       : TEST testregex, 808 tests, 0 errors
  categorize  : TEST testregex, 26 tests, 0 errors
  nullsubexpr : TEST testregex, 172 tests, 0 errors
  leftassoc   : TEST testregex, 12 tests, 12 errors
  rightassoc  : TEST testregex, 36 tests, 0 errors
  forcedassoc : TEST testregex, 84 tests, 0 errors
  repetition  : TEST testregex, 241 tests, 0 errors
Here's proof that the regex library is now locale sensitive:
  > env LANG=C sed /abandonn[a-z]/d fwl-sort-C.txt
  a
  abandonnâmes
  abandonnât
  abandonnâtes
  abandonnèrent
  abandonné
  abandonnée
  abandonnées
  abandonnés
  abord
  abords
  absence

  > env LANG=fr_FR sed /abandonn[a-z]/d fwl-sort-C.txt
  a
  abord
  abords
  absence
  accepta
  acceptai
  acceptaient
  acceptais
  acceptait
  acceptant
  acceptas
  acceptasse
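The two runs differ because, in the C locale, the bracket expression [a-z] covers only the ASCII bytes a through z, so accented words such as "abandonnâmes" are not deleted; under fr_FR the range follows the locale's collation and, judging by the output above, also takes in the accented letters. The following is a minimal sketch of the same check from C, assuming only the standard POSIX setlocale(), regcomp() and regexec() (not the new DragonFly-specific variants). The helper name matches_in_locale is hypothetical, and results for locales other than "C" depend on which locales are installed and on the platform's collation rules.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the POSIX basic regex `pattern`
 * matches anywhere in `subject` when compiled and run under `locale`,
 * 0 if it does not match, and -1 if the locale is unavailable or the
 * pattern fails to compile. The previous LC_ALL setting is restored. */
int matches_in_locale(const char *locale, const char *pattern,
                      const char *subject)
{
    const char *current = setlocale(LC_ALL, NULL);
    char *saved = current != NULL ? strdup(current) : NULL;
    int result = -1;
    regex_t re;

    /* regcomp() interprets bracket expressions such as [a-z] according
     * to the current locale, so switch locales before compiling. */
    if (setlocale(LC_ALL, locale) != NULL &&
        regcomp(&re, pattern, REG_NOSUB) == 0) {
        result = (regexec(&re, subject, 0, NULL, 0) == 0) ? 1 : 0;
        regfree(&re);
    }

    /* Put the original locale back. */
    if (saved != NULL) {
        setlocale(LC_ALL, saved);
        free(saved);
    }
    return result;
}
```

For example, matches_in_locale("C", "abandonn[a-z]", "abandonnâmes") returns 0, mirroring the first sed run. Switching the process-wide locale like this is not thread-safe, which is one reason variants that take an explicit locale, such as regcomp_l, are useful.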
Several new functions have been added to libc:
variations of regcomp: regcomp_l, regncomp, regncomp_l, regwcomp, regwcomp_l, regnwcomp, regnwcomp_l
variations of regexec: regnexec, regwexec, regwnexec
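Judging by their names and by the TRE API they come from, the `_l` variants take an explicit locale_t, the `n` variants take an explicit length so the input need not be NUL-terminated, and the `w` variants operate on wchar_t strings. On systems that lack these functions, a rough portable approximation of the first two can be sketched with POSIX.1-2008 uselocale() and strndup(). The helpers compile_in_locale and match_bounded are hypothetical and do not reproduce the DragonFly signatures.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: compile `pattern` against `loc` without changing
 * the process-wide locale, roughly what regcomp_l() offers. Only the
 * calling thread's locale is switched, via uselocale(), and it is
 * restored before returning. `loc` must be a valid locale object. */
int compile_in_locale(regex_t *re, const char *pattern, int cflags,
                      locale_t loc)
{
    locale_t prev = uselocale(loc);
    int rc = regcomp(re, pattern, cflags);
    uselocale(prev);
    return rc;
}

/* Hypothetical helper: match against the first `len` bytes of `buf`,
 * which need not be NUL-terminated, by copying them into a temporary
 * NUL-terminated string before calling regexec(). */
int match_bounded(const regex_t *re, const char *buf, size_t len,
                  size_t nmatch, regmatch_t pmatch[], int eflags)
{
    char *copy = strndup(buf, len); /* copies at most len bytes */
    if (copy == NULL)
        return REG_ESPACE;
    int rc = regexec(re, copy, nmatch, pmatch, eflags);
    free(copy);
    return rc;
}
```

Unlike a genuinely length-bounded matcher such as regnexec, match_bounded pays for a copy and cannot see past an embedded NUL byte, since strndup stops there.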
The regex.3 and re_format.7 man pages have been updated and symlinked accordingly.