Name | Date | Size | #Lines | LOC
---|---|---|---|---
build-mac/ | 27-Oct-2020 | - | 758 | 741
doc/ | 03-May-2022 | - | 75 | 50
script/ | 03-May-2022 | - | 4,244 | 1,369
src/ | 03-May-2022 | - | 16,009 | 10,872
test/ | 03-May-2022 | - | 150 | 88
.gitignore | 27-Oct-2020 | 41 B | 5 | 4
AUTHORS | 27-Oct-2020 | 449 B | 17 | 11
COPYING | 27-Oct-2020 | 68.9 KiB | 1,317 | 1,097
INSTALL | 27-Oct-2020 | 849 B | 27 | 16
README.md | 27-Oct-2020 | 4.4 KiB | 209 | 185
uchardet.doap | 27-Oct-2020 | 2 KiB | 52 | 39
uchardet.pc.in | 27-Oct-2020 | 279 B | 11 | 9

README.md
# uchardet

Forked from [freedesktop/uchardet](https://github.com/freedesktop/uchardet)

[uchardet](https://www.freedesktop.org/wiki/Software/uchardet/) is an encoding detector library. It takes a sequence of bytes in an unknown character encoding, without any additional information, and attempts to determine the encoding of the text. Returned encoding names are [iconv](https://www.gnu.org/software/libiconv/)-compatible.

uchardet started as a C language binding of the original C++ implementation of the universal charset detection library by Mozilla. It can now detect more charsets, and more reliably, than the original implementation.

The original code of universalchardet is available at http://lxr.mozilla.org/seamonkey/source/extensions/universalchardet/

Techniques used by universalchardet are described at http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html

## Supported Languages/Encodings

* International (Unicode)
  * UTF-8
  * UTF-16BE / UTF-16LE
  * UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-3412 / X-ISO-10646-UCS-4-2143
* Arabic
  * ISO-8859-6
  * WINDOWS-1256
* Bulgarian
  * ISO-8859-5
  * WINDOWS-1251
* Chinese
  * ISO-2022-CN
  * BIG5
  * EUC-TW
  * GB18030
  * HZ-GB-2312
* Croatian
  * ISO-8859-2
  * ISO-8859-13
  * ISO-8859-16
  * Windows-1250
  * IBM852
  * MacCentralEurope
* Czech
  * Windows-1250
  * ISO-8859-2
  * IBM852
  * MacCentralEurope
* Danish
  * ISO-8859-1
  * ISO-8859-15
  * WINDOWS-1252
* English
  * ASCII
* Esperanto
  * ISO-8859-3
* Estonian
  * ISO-8859-4
  * ISO-8859-13
  * Windows-1252
  * Windows-1257
* Finnish
  * ISO-8859-1
  * ISO-8859-4
  * ISO-8859-9
  * ISO-8859-13
  * ISO-8859-15
  * WINDOWS-1252
* French
  * ISO-8859-1
  * ISO-8859-15
  * WINDOWS-1252
* German
  * ISO-8859-1
  * WINDOWS-1252
* Greek
  * ISO-8859-7
  * WINDOWS-1253
* Hebrew
  * ISO-8859-8
  * WINDOWS-1255
* Hungarian
  * ISO-8859-2
  * WINDOWS-1250
* Irish Gaelic
  * ISO-8859-1
  * ISO-8859-9
  * ISO-8859-15
  * WINDOWS-1252
* Italian
  * ISO-8859-1
  * ISO-8859-3
  * ISO-8859-9
  * ISO-8859-15
  * WINDOWS-1252
* Japanese
  * ISO-2022-JP
  * SHIFT_JIS
  * EUC-JP
* Korean
  * ISO-2022-KR
  * EUC-KR / UHC
* Lithuanian
  * ISO-8859-4
  * ISO-8859-10
  * ISO-8859-13
* Latvian
  * ISO-8859-4
  * ISO-8859-10
  * ISO-8859-13
* Maltese
  * ISO-8859-3
* Polish
  * ISO-8859-2
  * ISO-8859-13
  * ISO-8859-16
  * Windows-1250
  * IBM852
  * MacCentralEurope
* Portuguese
  * ISO-8859-1
  * ISO-8859-9
  * ISO-8859-15
  * WINDOWS-1252
* Romanian
  * ISO-8859-2
  * ISO-8859-16
  * Windows-1250
  * IBM852
* Russian
  * ISO-8859-5
  * KOI8-R
  * WINDOWS-1251
  * MAC-CYRILLIC
  * IBM866
  * IBM855
* Slovak
  * Windows-1250
  * ISO-8859-2
  * IBM852
  * MacCentralEurope
* Slovene
  * ISO-8859-2
  * ISO-8859-16
  * Windows-1250
  * IBM852
  * MacCentralEurope
* Spanish
  * ISO-8859-1
  * ISO-8859-15
  * WINDOWS-1252
* Swedish
  * ISO-8859-1
  * ISO-8859-4
  * ISO-8859-9
  * ISO-8859-15
  * WINDOWS-1252
* Thai
  * TIS-620
  * ISO-8859-11
* Turkish
  * ISO-8859-3
  * ISO-8859-9
* Vietnamese
  * VISCII
  * Windows-1258
* Others
  * WINDOWS-1252

## Installation

### Build from source

If you prefer a development version, clone the git repository:

    git clone https://github.com/PyYoshi/uchardet.git

The source can be browsed at: https://github.com/PyYoshi/uchardet

Then build and install with CMake:

    mkdir build/ && cd build/
    cmake ..
    make
    make install

## Usage

### Command Line

```
uchardet Command Line Tool
Version 0.0.6

Authors: BYVoid, Jehan
Bug Report: https://bugs.freedesktop.org/enter_bug.cgi?product=uchardet

Usage:
 uchardet [Options] [File]...

Options:
 -v, --version         Print version and build information.
 -h, --help            Print this help.
```

### Library

See [uchardet.h](https://github.com/PyYoshi/uchardet/blob/cchardet/src/uchardet.h) for the library API.

## Licenses

* [Mozilla Public License Version 1.1](http://www.mozilla.org/MPL/1.1/)
* [GNU General Public License, version 2.0](http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html) or later.
* [GNU Lesser General Public License, version 2.1](http://www.gnu.org/licenses/old-licenses/lgpl-2.1.en.html) or later.

See the file `COPYING` for the complete text of these three licenses.
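
## Example: byte order marks

The Unicode entries in the list of supported encodings can often be recognised from a byte order mark (BOM) at the start of the data. The snippet below is a minimal sketch of that single idea in Python. It is not uchardet's code and does not call the library; uchardet's detection of the other encodings is statistical and far more involved. The names `sniff_bom` and `_BOMS` are hypothetical, and mapping the two unusual UCS-4 byte orders to the charset names `X-ISO-10646-UCS-4-3412` and `X-ISO-10646-UCS-4-2143` is an assumption based on those names.

```python
import codecs

# BOM prefixes mapped to charset names (the mapping is an assumption of this
# sketch). Four-byte BOMs are listed first because the UTF-32LE BOM
# (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),                # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "UTF-32LE"),                # FF FE 00 00
    (b"\xfe\xff\x00\x00", "X-ISO-10646-UCS-4-3412"),  # unusual byte order
    (b"\x00\x00\xff\xfe", "X-ISO-10646-UCS-4-2143"),  # unusual byte order
    (codecs.BOM_UTF8, "UTF-8"),                       # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16BE"),                # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),                # FF FE
]


def sniff_bom(data: bytes):
    """Return a charset name if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    print(sniff_bom(codecs.BOM_UTF8 + b"hello"))  # UTF-8
    print(sniff_bom(b"plain ASCII"))              # None
```

A BOM check like this is ambiguous for UTF-16 text whose first character after the BOM is U+0000, since those bytes match a four-byte BOM. Data without a BOM returns `None` and needs a real detector.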