Name | Date | Size | #Lines | LOC
---|---|---|---|---
bin/ | 27-Oct-2020 | - | 45 | 35
src/ | 27-Oct-2020 | - | 28,097 | 18,598
CHANGES.rst | 27-Oct-2020 | 2.3 KiB | 121 | 77
COPYING | 27-Oct-2020 | 68.9 KiB | 1,317 | 1,097
MANIFEST.in | 27-Oct-2020 | 162 B | 6 | 5
PKG-INFO | 27-Oct-2020 | 10.7 KiB | 429 | 294
README.rst | 27-Oct-2020 | 4.3 KiB | 287 | 196
setup.cfg | 27-Oct-2020 | 77 B | 9 | 6
setup.py | 27-Oct-2020 | 5.9 KiB | 150 | 130
README.rst
cChardet
========

cChardet is a high-speed universal character encoding detector, a binding to `uchardet`_.

.. image:: https://badge.fury.io/py/cchardet.svg
   :target: https://badge.fury.io/py/cchardet
   :alt: PyPI version

.. image:: https://github.com/PyYoshi/cChardet/workflows/Build%20for%20Linux/badge.svg?branch=master
   :target: https://github.com/PyYoshi/cChardet/actions?query=workflow%3A%22Build+for+Linux%22
   :alt: Build for Linux

.. image:: https://github.com/PyYoshi/cChardet/workflows/Build%20for%20macOS/badge.svg?branch=master
   :target: https://github.com/PyYoshi/cChardet/actions?query=workflow%3A%22Build+for+macOS%22
   :alt: Build for macOS

.. image:: https://github.com/PyYoshi/cChardet/workflows/Build%20for%20windows/badge.svg?branch=master
   :target: https://github.com/PyYoshi/cChardet/actions?query=workflow%3A%22Build+for+windows%22
   :alt: Build for Windows

Supported Languages/Encodings
-----------------------------

- International (Unicode)

  - UTF-8
  - UTF-16BE / UTF-16LE
  - UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-3412 /
    X-ISO-10646-UCS-4-2143

- Arabic

  - ISO-8859-6
  - WINDOWS-1256

- Bulgarian

  - ISO-8859-5
  - WINDOWS-1251

- Chinese

  - ISO-2022-CN
  - BIG5
  - EUC-TW
  - GB18030
  - HZ-GB-2312

- Croatian

  - ISO-8859-2
  - ISO-8859-13
  - ISO-8859-16
  - Windows-1250
  - IBM852
  - MAC-CENTRALEUROPE

- Czech

  - Windows-1250
  - ISO-8859-2
  - IBM852
  - MAC-CENTRALEUROPE

- Danish

  - ISO-8859-1
  - ISO-8859-15
  - WINDOWS-1252

- English

  - ASCII

- Esperanto

  - ISO-8859-3

- Estonian

  - ISO-8859-4
  - ISO-8859-13
  - Windows-1252
  - Windows-1257

- Finnish

  - ISO-8859-1
  - ISO-8859-4
  - ISO-8859-9
  - ISO-8859-13
  - ISO-8859-15
  - WINDOWS-1252

- French

  - ISO-8859-1
  - ISO-8859-15
  - WINDOWS-1252

- German

  - ISO-8859-1
  - WINDOWS-1252

- Greek

  - ISO-8859-7
  - WINDOWS-1253

- Hebrew

  - ISO-8859-8
  - WINDOWS-1255

- Hungarian

  - ISO-8859-2
  - WINDOWS-1250

- Irish Gaelic

  - ISO-8859-1
  - ISO-8859-9
  - ISO-8859-15
  - WINDOWS-1252

- Italian

  - ISO-8859-1
  - ISO-8859-3
  - ISO-8859-9
  - ISO-8859-15
  - WINDOWS-1252

- Japanese

  - ISO-2022-JP
  - SHIFT\_JIS
  - EUC-JP

- Korean

  - ISO-2022-KR
  - EUC-KR / UHC

- Lithuanian

  - ISO-8859-4
  - ISO-8859-10
  - ISO-8859-13

- Latvian

  - ISO-8859-4
  - ISO-8859-10
  - ISO-8859-13

- Maltese

  - ISO-8859-3

- Polish

  - ISO-8859-2
  - ISO-8859-13
  - ISO-8859-16
  - Windows-1250
  - IBM852
  - MAC-CENTRALEUROPE

- Portuguese

  - ISO-8859-1
  - ISO-8859-9
  - ISO-8859-15
  - WINDOWS-1252

- Romanian

  - ISO-8859-2
  - ISO-8859-16
  - Windows-1250
  - IBM852

- Russian

  - ISO-8859-5
  - KOI8-R
  - WINDOWS-1251
  - MAC-CYRILLIC
  - IBM866
  - IBM855

- Slovak

  - Windows-1250
  - ISO-8859-2
  - IBM852
  - MAC-CENTRALEUROPE

- Slovene

  - ISO-8859-2
  - ISO-8859-16
  - Windows-1250
  - IBM852
  - M

Example
-------

.. code-block:: python

    # -*- coding: utf-8 -*-
    import cchardet as chardet

    # Read the sample as raw bytes; detect() works on bytes, not str.
    with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
        msg = f.read()

    # detect() returns a dict describing the guessed encoding.
    result = chardet.detect(msg)
    print(result)

Benchmark
---------

.. code-block:: bash

    $ cd src/
    $ pip install chardet
    $ python tests/bench.py


Results
~~~~~~~

CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz

RAM: DDR3 1600 MHz, 16 GB

Platform: Ubuntu 16.04 amd64

Python 3.6.1
^^^^^^^^^^^^

+-----------------+------------------+
|                 | Request (call/s) |
+=================+==================+
| chardet v3.0.2  | 0.35             |
+-----------------+------------------+
| cchardet v2.0.1 | 1467.77          |
+-----------------+------------------+


LICENSE
-------

See the **COPYING** file.

Contact
-------

- `Issues`_


.. _uchardet: https://github.com/PyYoshi/uchardet
.. _Issues: https://github.com/PyYoshi/cChardet/issues?page=1&state=open

Platform
--------

Supported
~~~~~~~~~

- Windows i686, x86_64
- Linux i686, x86_64
- macOS x86_64

Not Supported
~~~~~~~~~~~~~

- `Anaconda`_
- `pyenv`_

.. _Anaconda: https://www.anaconda.com/
.. _pyenv: https://github.com/pyenv/pyenv
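
The Example section stops at printing the detection result. As a follow-up, the sketch below shows one way the result might be used to decode the bytes into text. It assumes the detector returns a dict with an ``encoding`` key whose value may be ``None`` when nothing is detected. ``decode_bytes`` and its ``detect`` and ``fallback`` parameters are hypothetical names introduced for this illustration and are not part of cChardet.

```python
try:
    import cchardet  # optional: only needed for the default detector
except ImportError:
    cchardet = None


def decode_bytes(data, detect=None, fallback="utf-8"):
    """Return (text, encoding_used) for the bytes in ``data``.

    ``detect`` is any callable that returns a dict with an "encoding" key.
    This helper is a hypothetical sketch, not a cChardet API.
    """
    if detect is None:
        if cchardet is None:
            raise RuntimeError("cchardet is not installed; pass detect=...")
        detect = cchardet.detect

    result = detect(data)
    # Assumption: the reported encoding may be None (for example, empty input).
    encoding = result.get("encoding") or fallback
    try:
        return data.decode(encoding), encoding
    except (LookupError, UnicodeDecodeError):
        # Codec name unknown to Python, or the guess was wrong:
        # decode with the fallback and replace undecodable bytes.
        return data.decode(fallback, errors="replace"), fallback
```

With cChardet installed, ``text, encoding = decode_bytes(msg)`` would decode the sample read in the Example section. Passing ``detect`` explicitly keeps the helper usable with any other detector that returns the same dict shape.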