| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| Data/ | 11-May-2020 | - | 10,781 | 10,299 |
| benchmark/ | 26-Sep-2020 | - | 229 | 152 |
| test/ | 26-Sep-2020 | - | 247 | 181 |
| unicode-data/ | 26-Sep-2020 | - | 526 | 415 |
| .gitignore | 11-May-2020 | 294 B | 26 | 24 |
| .travis.yml | 26-Sep-2020 | 10 KiB | 246 | 202 |
| Changelog.md | 11-Oct-2020 | 698 B | 53 | 29 |
| LICENSE | 11-May-2020 | 1.5 KiB | 28 | 22 |
| MAINTAINING.md | 26-Sep-2020 | 210 B | 8 | 5 |
| NOTES.md | 11-May-2020 | 2.4 KiB | 55 | 41 |
| README.md | 26-Sep-2020 | 3.9 KiB | 78 | 66 |
| Setup.hs | 11-May-2020 | 46 B | 3 | 2 |
| appveyor.yml | 26-Sep-2020 | 4 KiB | 90 | 78 |
| stack-7.10.yaml | 26-Sep-2020 | 191 B | 9 | 8 |
| stack-8.0.yaml | 11-May-2020 | 61 B | 6 | 5 |
| stack.yaml | 26-Sep-2020 | 35 B | 4 | 3 |
| unicode-transforms.cabal | 03-May-2022 | 6.1 KiB | 223 | 210 |
README.md
# Unicode Transforms

[![Hackage](https://img.shields.io/hackage/v/unicode-transforms.svg?style=flat)](https://hackage.haskell.org/package/unicode-transforms)
[![Build Status](https://travis-ci.com/composewell/unicode-transforms.svg?branch=master)](https://travis-ci.com/composewell/unicode-transforms)
[![Windows Build status](https://ci.appveyor.com/api/projects/status/5wov8m1m0asvbv32?svg=true)](https://ci.appveyor.com/project/harendra-kumar/unicode-transforms)
[![Coverage Status](https://coveralls.io/repos/composewell/unicode-transforms/badge.svg?branch=master&service=github)](https://coveralls.io/github/composewell/unicode-transforms?branch=master)

Fast Unicode 13.0.0 normalization in Haskell (NFC, NFKC, NFD, NFKD).

## What is normalization?

Unicode characters with adornments (e.g. Á) can be represented in two different
forms: as a single composed character (U+00C1 = Á) or as a sequence of
decomposed characters (U+0041 (A) followed by U+0301 ( ́ ) = Á). The two forms
are different byte sequences, but to a human reader they look exactly the same.

A plain byte-by-byte comparison therefore reports two such strings as different
even though they are equivalent. To compare strings for equivalence, we first
need to convert both of them into a [normalized](http://unicode.org/reports/tr15/)
form using the [Unicode Character Database](http://www.unicode.org/Public/UCD/latest/).
For example (in GHCi, string literals must be `Text`, hence `OverloadedStrings`):
```
>> :set -XOverloadedStrings
>> import Data.Text.Normalize
>> normalize NFC "\193" == normalize NFC "\65\769"
True
```

## Performance

The table below compares the normalization performance of this package (v0.3.7)
with the [text-icu](http://hackage.haskell.org/package/text-icu) package, which
uses the [ICU C++ library](http://site.icu-project.org/download) (ICU4C 65.1),
measured on macOS. Each benchmark reports the time, in milliseconds, that each
package takes to normalize a file in a given language using a given
normalization form. `unicode-transforms` is faster in 19 of the 28 benchmarks,
and every benchmark in which ICU is faster uses a composing form (NFC or NFKC).
In the `% Diff` column, a positive value means `unicode-transforms` was faster
and a negative value means ICU was faster. Judging from the figures, the
magnitude is how much longer the slower package took, as a percentage of the
faster package's time.

```
Benchmark         unicode-transforms(ms)   ICU(ms)    % Diff
---------------   ----------------------   -------   -------
NFKD/Korean                         7.78     37.10   +376.87
NFD/Korean                          7.86     37.06   +371.50
NFKD/Vietnamese                     6.85     12.48    +82.20
NFKD/Deutsch                        2.17      3.55    +63.30
NFKD/English                        1.71      2.78    +62.30
NFKC/Korean                         4.77      7.65    +60.28
NFD/Deutsch                         2.24      3.53    +57.41
NFD/English                         1.76      2.77    +57.32
NFC/Vietnamese                     10.66     16.63    +56.00
NFKC/Vietnamese                    10.95     16.58    +51.43
NFD/Devanagari                      6.48      8.68    +34.10
NFC/Devanagari                      6.77      8.49    +25.48
NFD/AllChars                        6.18      7.41    +19.91
NFD/Japanese                        7.80      9.20    +17.99
NFKC/Devanagari                     7.33      8.48    +15.74
NFKD/Japanese                       8.71     10.05    +15.39
NFD/Vietnamese                      5.94      6.83    +14.99
NFKD/Devanagari                     7.59      8.68    +14.27
NFKD/AllChars                       9.80     10.66     +8.82
NFKC/Deutsch                        3.21      3.18     -0.72
NFC/Korean                          4.62      4.38     -5.35
NFKC/English                        2.21      2.06     -6.88
NFC/English                         2.19      2.04     -7.21
NFKC/AllChars                      14.67      9.75    -50.51
NFC/Deutsch                         3.02      1.95    -54.39
NFKC/Japanese                      12.46      5.42   -129.93
NFC/AllChars                        9.72      3.58   -171.63
NFC/Japanese                       11.90      3.04   -292.04
```

## Talks

* Functional Conf 2018: [Video](https://www.youtube.com/watch?v=aJvwORrBJ0o) | [Slides](https://www.slideshare.net/HarendraKumar10/high-performance-haskell)

## Contributing

Please use https://github.com/harendra-kumar/unicode-transforms to raise
issues or send pull requests.
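
## Example: comparing strings for equivalence

The `normalize` comparison from *What is normalization?* can be wrapped in small helpers. This is a minimal sketch that assumes the `text` and `unicode-transforms` packages. The names `codePoints`, `canonicallyEqual` and `compatiblyEqual` are hypothetical illustrations and are not part of the package API. Two strings are canonically equivalent when their NFD forms are identical. They are compatibility-equivalent when their NFKD forms are identical.

```haskell
import Data.Char (ord)
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Normalize (NormalizationMode (..), normalize)

-- | Hypothetical helper: the code points of a Text, useful for
-- inspecting what a normalization form produced.
codePoints :: Text -> [Int]
codePoints = map ord . T.unpack

-- | Hypothetical helper: canonical equivalence, checked by comparing
-- the canonical decompositions (NFD) of both strings.
canonicallyEqual :: Text -> Text -> Bool
canonicallyEqual a b = normalize NFD a == normalize NFD b

-- | Hypothetical helper: compatibility equivalence, checked by comparing
-- the compatibility decompositions (NFKD) of both strings.
compatiblyEqual :: Text -> Text -> Bool
compatiblyEqual a b = normalize NFKD a == normalize NFKD b
```

In GHCi, after loading this file and running `:set -XOverloadedStrings`, `canonicallyEqual "\193" "\65\769"` is `True` and `codePoints (normalize NFD "\193")` is `[65,769]`. The ligature `"\64257"` (U+FB01, ﬁ) and `"fi"` are compatibly equal but not canonically equal. This shows the difference between the canonical forms (NFD, NFC) and the compatibility forms (NFKD, NFKC). Comparing NFC forms instead of NFD forms would give the same canonical-equivalence answer.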