cChardet
========

cChardet is a high-speed universal character encoding detector, implemented as a binding to `uchardet`_.

.. image:: https://badge.fury.io/py/cchardet.svg
   :target: https://badge.fury.io/py/cchardet
   :alt: PyPI version

.. image:: https://github.com/PyYoshi/cChardet/workflows/Build%20for%20Linux/badge.svg?branch=master
   :target: https://github.com/PyYoshi/cChardet/actions?query=workflow%3A%22Build+for+Linux%22
   :alt: Build for Linux

.. image:: https://github.com/PyYoshi/cChardet/workflows/Build%20for%20macOS/badge.svg?branch=master
   :target: https://github.com/PyYoshi/cChardet/actions?query=workflow%3A%22Build+for+macOS%22
   :alt: Build for macOS

.. image:: https://github.com/PyYoshi/cChardet/workflows/Build%20for%20windows/badge.svg?branch=master
   :target: https://github.com/PyYoshi/cChardet/actions?query=workflow%3A%22Build+for+windows%22
   :alt: Build for Windows

Supported Languages/Encodings
-----------------------------

-  International (Unicode)

   -  UTF-8
   -  UTF-16BE / UTF-16LE
   -  UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 /
      X-ISO-10646-UCS-4-21431

-  Arabic

   -  ISO-8859-6
   -  WINDOWS-1256

-  Bulgarian

   -  ISO-8859-5
   -  WINDOWS-1251

-  Chinese

   -  ISO-2022-CN
   -  BIG5
   -  EUC-TW
   -  GB18030
   -  HZ-GB-2312

-  Croatian

   -  ISO-8859-2
   -  ISO-8859-13
   -  ISO-8859-16
   -  Windows-1250
   -  IBM852
   -  MAC-CENTRALEUROPE

-  Czech

   -  Windows-1250
   -  ISO-8859-2
   -  IBM852
   -  MAC-CENTRALEUROPE

-  Danish

   -  ISO-8859-1
   -  ISO-8859-15
   -  WINDOWS-1252

-  English

   -  ASCII

-  Esperanto

   -  ISO-8859-3

-  Estonian

   -  ISO-8859-4
   -  ISO-8859-13
   -  Windows-1252
   -  Windows-1257

-  Finnish

   -  ISO-8859-1
   -  ISO-8859-4
   -  ISO-8859-9
   -  ISO-8859-13
   -  ISO-8859-15
   -  WINDOWS-1252

-  French

   -  ISO-8859-1
   -  ISO-8859-15
   -  WINDOWS-1252

-  German

   -  ISO-8859-1
   -  WINDOWS-1252

-  Greek

   -  ISO-8859-7
   -  WINDOWS-1253

-  Hebrew

   -  ISO-8859-8
   -  WINDOWS-1255

-  Hungarian

   -  ISO-8859-2
   -  WINDOWS-1250

-  Irish Gaelic

   -  ISO-8859-1
   -  ISO-8859-9
   -  ISO-8859-15
   -  WINDOWS-1252

-  Italian

   -  ISO-8859-1
   -  ISO-8859-3
   -  ISO-8859-9
   -  ISO-8859-15
   -  WINDOWS-1252

-  Japanese

   -  ISO-2022-JP
   -  SHIFT\_JIS
   -  EUC-JP

-  Korean

   -  ISO-2022-KR
   -  EUC-KR / UHC

-  Lithuanian

   -  ISO-8859-4
   -  ISO-8859-10
   -  ISO-8859-13

-  Latvian

   -  ISO-8859-4
   -  ISO-8859-10
   -  ISO-8859-13

-  Maltese

   -  ISO-8859-3

-  Polish

   -  ISO-8859-2
   -  ISO-8859-13
   -  ISO-8859-16
   -  Windows-1250
   -  IBM852
   -  MAC-CENTRALEUROPE

-  Portuguese

   -  ISO-8859-1
   -  ISO-8859-9
   -  ISO-8859-15
   -  WINDOWS-1252

-  Romanian

   -  ISO-8859-2
   -  ISO-8859-16
   -  Windows-1250
   -  IBM852

-  Russian

   -  ISO-8859-5
   -  KOI8-R
   -  WINDOWS-1251
   -  MAC-CYRILLIC
   -  IBM866
   -  IBM855

-  Slovak

   -  Windows-1250
   -  ISO-8859-2
   -  IBM852
   -  MAC-CENTRALEUROPE

-  Slovene

   -  ISO-8859-2
   -  ISO-8859-16
   -  Windows-1250
   -  IBM852
   -  M

Example
-------

.. code-block:: python

    # -*- coding: utf-8 -*-
    import cchardet as chardet

    # Read the sample as raw bytes; detection works on undecoded data.
    with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
        msg = f.read()
        result = chardet.detect(msg)
        # result is a dict with "encoding" and "confidence" keys.
        print(result)

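The detection result is typically used to decode the bytes afterwards. The following is a minimal sketch of that step, not part of cChardet itself: the helper name ``decode_bytes``, its ``detect`` parameter, and the ``fallback`` default are hypothetical and chosen for illustration. It assumes only that ``detect`` returns a dict with an ``"encoding"`` key, which may be ``None`` when detection fails.

.. code-block:: python

    def decode_bytes(data, detect=None, fallback="utf-8"):
        """Decode ``data`` using the detected encoding, with a fallback.

        ``detect`` is any callable returning a dict with an "encoding" key.
        If it is omitted, cchardet.detect is used (imported lazily).
        """
        if detect is None:
            import cchardet  # hypothetical default; requires cchardet installed
            detect = cchardet.detect

        result = detect(data)
        # Detection can fail and report no encoding; use the fallback then.
        encoding = result.get("encoding") or fallback
        try:
            return data.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            # Unknown codec name or a wrong guess: decode leniently instead.
            return data.decode(fallback, errors="replace")

Calling ``decode_bytes(msg)`` on the bytes read in the example above would return a ``str``. Passing ``detect`` explicitly makes it easy to swap in another detector or a stub in tests.
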
Benchmark
---------

.. code-block:: bash

    $ cd src/
    $ pip install chardet
    $ python tests/bench.py

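For readers who want to see how a calls-per-second figure can be measured, here is a rough sketch. It is not the project's ``tests/bench.py``, whose contents are not shown here; the function name ``calls_per_second`` and the ``repeat`` parameter are hypothetical.

.. code-block:: python

    import time


    def calls_per_second(detect, data, repeat=100):
        """Return how many times per second ``detect(data)`` can be called."""
        start = time.perf_counter()
        for _ in range(repeat):
            detect(data)
        elapsed = time.perf_counter() - start
        # Guard against a zero reading on very coarse timers.
        return repeat / elapsed if elapsed > 0 else float("inf")

Passing ``cchardet.detect`` and ``chardet.detect`` in turn, with the same sample bytes, would give two figures comparable in spirit to the table below, although absolute numbers depend on the machine and the sample.
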
Results
~~~~~~~

CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz

RAM: DDR3 1600 MHz, 16 GB

Platform: Ubuntu 16.04 amd64

Python 3.6.1
^^^^^^^^^^^^

+-----------------+------------------+
|                 | Request (call/s) |
+=================+==================+
| chardet v3.0.2  |       0.35       |
+-----------------+------------------+
| cchardet v2.0.1 |     1467.77      |
+-----------------+------------------+


LICENSE
-------

See the **COPYING** file.

Contact
-------

- `Issues`_


.. _uchardet: https://github.com/PyYoshi/uchardet
.. _Issues: https://github.com/PyYoshi/cChardet/issues?page=1&state=open

Platform
--------

Supported
~~~~~~~~~

- Windows i686, x86_64
- Linux i686, x86_64
- macOS x86_64

Not Supported
~~~~~~~~~~~~~

- `Anaconda`_
- `pyenv`_

.. _Anaconda: https://www.anaconda.com/
.. _pyenv: https://github.com/pyenv/pyenv
