# uchardet

Forked from [freedesktop/uchardet](https://github.com/freedesktop/uchardet)

[uchardet](https://www.freedesktop.org/wiki/Software/uchardet/) is an encoding detector library. It takes a sequence of bytes in an unknown character encoding, with no additional information, and attempts to determine the encoding of the text. Returned encoding names are [iconv](https://www.gnu.org/software/libiconv/)-compatible.

uchardet started as a C language binding of the original C++ implementation of the universal charset detection library by Mozilla. It can now detect more charsets, and more reliably, than the original implementation.

The original code of universalchardet is available at http://lxr.mozilla.org/seamonkey/source/extensions/universalchardet/

Techniques used by universalchardet are described at http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html

## Supported Languages/Encodings

  * International (Unicode)
    * UTF-8
    * UTF-16BE / UTF-16LE
    * UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-3412 / X-ISO-10646-UCS-4-2143
  * Arabic
    * ISO-8859-6
    * WINDOWS-1256
  * Bulgarian
    * ISO-8859-5
    * WINDOWS-1251
  * Chinese
    * ISO-2022-CN
    * BIG5
    * EUC-TW
    * GB18030
    * HZ-GB-2312
  * Croatian
    * ISO-8859-2
    * ISO-8859-13
    * ISO-8859-16
    * Windows-1250
    * IBM852
    * MacCentralEurope
  * Czech
    * Windows-1250
    * ISO-8859-2
    * IBM852
    * MacCentralEurope
  * Danish
    * ISO-8859-1
    * ISO-8859-15
    * WINDOWS-1252
  * English
    * ASCII
  * Esperanto
    * ISO-8859-3
  * Estonian
    * ISO-8859-4
    * ISO-8859-13
    * Windows-1252
    * Windows-1257
  * Finnish
    * ISO-8859-1
    * ISO-8859-4
    * ISO-8859-9
    * ISO-8859-13
    * ISO-8859-15
    * WINDOWS-1252
  * French
    * ISO-8859-1
    * ISO-8859-15
    * WINDOWS-1252
  * German
    * ISO-8859-1
    * WINDOWS-1252
  * Greek
    * ISO-8859-7
    * WINDOWS-1253
  * Hebrew
    * ISO-8859-8
    * WINDOWS-1255
  * Hungarian
    * ISO-8859-2
    * WINDOWS-1250
  * Irish Gaelic
    * ISO-8859-1
    * ISO-8859-9
    * ISO-8859-15
    * WINDOWS-1252
  * Italian
    * ISO-8859-1
    * ISO-8859-3
    * ISO-8859-9
    * ISO-8859-15
    * WINDOWS-1252
  * Japanese
    * ISO-2022-JP
    * SHIFT_JIS
    * EUC-JP
  * Korean
    * ISO-2022-KR
    * EUC-KR / UHC
  * Lithuanian
    * ISO-8859-4
    * ISO-8859-10
    * ISO-8859-13
  * Latvian
    * ISO-8859-4
    * ISO-8859-10
    * ISO-8859-13
  * Maltese
    * ISO-8859-3
  * Polish
    * ISO-8859-2
    * ISO-8859-13
    * ISO-8859-16
    * Windows-1250
    * IBM852
    * MacCentralEurope
  * Portuguese
    * ISO-8859-1
    * ISO-8859-9
    * ISO-8859-15
    * WINDOWS-1252
  * Romanian
    * ISO-8859-2
    * ISO-8859-16
    * Windows-1250
    * IBM852
  * Russian
    * ISO-8859-5
    * KOI8-R
    * WINDOWS-1251
    * MAC-CYRILLIC
    * IBM866
    * IBM855
  * Slovak
    * Windows-1250
    * ISO-8859-2
    * IBM852
    * MacCentralEurope
  * Slovene
    * ISO-8859-2
    * ISO-8859-16
    * Windows-1250
    * IBM852
    * MacCentralEurope
  * Spanish
    * ISO-8859-1
    * ISO-8859-15
    * WINDOWS-1252
  * Swedish
    * ISO-8859-1
    * ISO-8859-4
    * ISO-8859-9
    * ISO-8859-15
    * WINDOWS-1252
  * Thai
    * TIS-620
    * ISO-8859-11
  * Turkish
    * ISO-8859-3
    * ISO-8859-9
  * Vietnamese
    * VISCII
    * Windows-1258
  * Others
    * WINDOWS-1252

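The Unicode entries at the top of this list differ mainly in byte width and byte order, and a leading byte-order mark (BOM), when present, is usually enough to tell them apart. The sketch below illustrates that idea only; it is not uchardet's implementation. `sniff_bom` is a hypothetical helper that is not part of the library's API, and the strings it returns are illustrative labels, with the two unusual UCS-4 byte orders written as 3412 and 2143.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of uchardet: returns an encoding label if
 * buf starts with a known byte-order mark, or NULL if no mark is found. */
const char *sniff_bom(const unsigned char *buf, size_t len)
{
    /* Four-byte marks come first because FF FE 00 00 (UTF-32LE) also
     * begins with the two-byte UTF-16LE mark FF FE. */
    static const struct {
        unsigned char mark[4];
        size_t size;
        const char *name;
    } table[] = {
        { {0x00, 0x00, 0xFE, 0xFF}, 4, "UTF-32BE" },
        { {0xFF, 0xFE, 0x00, 0x00}, 4, "UTF-32LE" },
        { {0xFE, 0xFF, 0x00, 0x00}, 4, "X-ISO-10646-UCS-4-3412" },
        { {0x00, 0x00, 0xFF, 0xFE}, 4, "X-ISO-10646-UCS-4-2143" },
        { {0xEF, 0xBB, 0xBF, 0x00}, 3, "UTF-8" },
        { {0xFE, 0xFF, 0x00, 0x00}, 2, "UTF-16BE" },
        { {0xFF, 0xFE, 0x00, 0x00}, 2, "UTF-16LE" },
    };

    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        /* Only the first `size` bytes of each mark are significant. */
        if (len >= table[i].size &&
            memcmp(buf, table[i].mark, table[i].size) == 0)
            return table[i].name;
    }
    return NULL; /* no BOM: the content itself has to be analysed */
}
```

A BOM is optional, and UTF-16 text whose first character is U+0000 is indistinguishable from a four-byte mark under this check, so a `NULL` or surprising result still calls for content-based detection.
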
## Installation

### Build from source

If you prefer a development version, clone the git repository:

    git clone https://github.com/PyYoshi/uchardet.git

The source can be browsed at: https://github.com/PyYoshi/uchardet

Then configure, build, and install with CMake from inside the cloned directory:

    mkdir build/ && cd build/
    cmake ..
    make
    make install

## Usage

### Command Line

```
uchardet Command Line Tool
Version 0.0.6

Authors: BYVoid, Jehan
Bug Report: https://bugs.freedesktop.org/enter_bug.cgi?product=uchardet

Usage:
 uchardet [Options] [File]...

Options:
 -v, --version         Print version and build information.
 -h, --help            Print this help.
```

### Library

See [uchardet.h](https://github.com/PyYoshi/uchardet/blob/cchardet/src/uchardet.h)

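Because the returned encoding names are iconv-compatible, a detected name can be passed directly to `iconv_open`. The following is a minimal sketch of that follow-up step, assuming a POSIX `iconv` is available. `to_utf8` is a hypothetical helper, not part of uchardet; in real use its `charset` argument would be the string reported by the detector.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, not part of uchardet: converts in[0..in_len) from
 * `charset` to UTF-8. Returns the number of bytes written to `out`, or
 * (size_t)-1 if the name is unknown or the conversion fails. */
size_t to_utf8(const char *charset, const char *in, size_t in_len,
               char *out, size_t out_cap)
{
    assert(charset != NULL && in != NULL && out != NULL);

    iconv_t cd = iconv_open("UTF-8", charset);
    if (cd == (iconv_t)-1)
        return (size_t)-1;      /* this iconv build does not know the name */

    char *src = (char *)in;     /* iconv() takes char **, input is only read */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &dst, &dst_left); /* flush any shift state */
    iconv_close(cd);

    return rc == (size_t)-1 ? (size_t)-1 : out_cap - dst_left;
}
```

The output is not NUL-terminated; callers that need a C string should reserve one extra byte and terminate it themselves.
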
## Licenses

* [Mozilla Public License Version 1.1](http://www.mozilla.org/MPL/1.1/)
* [GNU General Public License, version 2.0](http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html) or later.
* [GNU Lesser General Public License, version 2.1](http://www.gnu.org/licenses/old-licenses/lgpl-2.1.en.html) or later.

See the file `COPYING` for the complete text of these three licenses.