confusable_homoglyphs `[doc] <http://confusable-homoglyphs.readthedocs.io/en/latest/>`__
========================================================================================

.. image:: https://img.shields.io/travis/vhf/confusable_homoglyphs.svg
        :target: https://travis-ci.org/vhf/confusable_homoglyphs

.. image:: https://img.shields.io/pypi/v/confusable_homoglyphs.svg
        :target: https://pypi.python.org/pypi/confusable_homoglyphs

.. image:: https://readthedocs.org/projects/confusable_homoglyphs/badge/?version=latest
        :target: http://confusable-homoglyphs.readthedocs.io/en/latest/
        :alt: Documentation Status

*a homoglyph is one of two or more graphemes, characters, or glyphs with
shapes that appear identical or very similar*
`wikipedia:Homoglyph <https://en.wikipedia.org/wiki/Homoglyph>`__

Unicode homoglyphs can be a nuisance on the web. Your most popular
client, AlaskaJazz, might be upset to be impersonated by a trickster who
deliberately chose the username ΑlaskaJazz.

-  ``AlaskaJazz`` is single-script: only Latin characters.
-  ``ΑlaskaJazz`` is mixed-script: the first character is a Greek
   letter.
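
The distinction between the two usernames can be sketched with the standard
library alone. This is only a rough illustration, not this library's
implementation: it assumes that the first word of a character's Unicode name
(``LATIN``, ``GREEK``, ...) is a good enough stand-in for its script, and the
helper names ``rough_scripts`` and ``looks_mixed_script`` are hypothetical.

```python
import unicodedata


def rough_scripts(text):
    """Return the set of script-like labels used by the letters in text.

    Assumption: the first word of the Unicode character name
    (e.g. 'LATIN', 'GREEK') approximates the character's script.
    """
    labels = set()
    for ch in text:
        if ch.isalpha():
            labels.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return labels


def looks_mixed_script(text):
    """True when the letters in text carry more than one script-like label."""
    return len(rough_scripts(text)) > 1


latin_name = "AlaskaJazz"
spoofed_name = "\u0391laskaJazz"  # starts with GREEK CAPITAL LETTER ALPHA

print(looks_mixed_script(latin_name))    # False
print(looks_mixed_script(spoofed_name))  # True
```

The library itself works from the Unicode Consortium's script data rather than
from character names, so treat the heuristic above as a teaching aid only.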

You might also want to avoid people being tricked into entering their
password on ``www.microsоft.com`` or ``www.faϲebook.com`` instead of
``www.microsoft.com`` or ``www.facebook.com``. `Here is a
utility <http://unicode.org/cldr/utility/confusables.jsp>`__ to play
with these **confusable homoglyphs**.

Not all mixed-script strings have to be ruled out, though. You could
exclude only those mixed-script strings containing characters that might
be confused with a character from some Unicode blocks of your choosing.

-  ``Allo`` and ``ρττ`` are fine: single script.
-  ``AlloΓ`` is fine when our preferred script alias is 'latin': mixed script, but ``Γ`` is not confusable.
-  ``Alloρ`` is dangerous: mixed script and ``ρ`` could be confused with
   ``p``.
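
The three cases above can be reproduced with a small sketch. It is a sketch
under stated assumptions, not the library's code: ``LOOKALIKES`` is a
hypothetical, hand-written excerpt standing in for the real confusables data,
the script of a character is again approximated by the first word of its
Unicode name, and ``looks_dangerous`` is an illustrative name.

```python
import unicodedata

# Hypothetical two-entry excerpt; the real confusables table is far larger.
LOOKALIKES = {
    "\u03c1": "p",  # GREEK SMALL LETTER RHO resembles LATIN SMALL LETTER P
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA resembles LATIN CAPITAL LETTER A
}


def script_label(ch):
    """Approximate a script by the first word of the Unicode name (assumption)."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def looks_dangerous(text, preferred="LATIN"):
    """True for mixed-script text with a character that mimics the preferred script."""
    labels = {script_label(ch) for ch in text if ch.isalpha()}
    if len(labels) <= 1:
        return False  # single script: nothing to flag
    for ch in text:
        if ch.isalpha() and script_label(ch) != preferred:
            target = LOOKALIKES.get(ch)
            if target is not None and script_label(target) == preferred:
                return True
    return False


print(looks_dangerous("Allo"))                # False
print(looks_dangerous("\u03c1\u03c4\u03c4"))  # False
print(looks_dangerous("Allo\u0393"))          # False
print(looks_dangerous("Allo\u03c1"))          # True
```

Only the last string is flagged, because it is both mixed-script and contains
a character listed as resembling a letter of the preferred script.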

This library is compatible with Python 2 and Python 3.

`API documentation <http://confusable-homoglyphs.readthedocs.io/en/latest/apidocumentation.html>`__
---------------------------------------------------------------------------------------------------

Is the data up to date?
-----------------------

Yep.

The Unicode block aliases and names for each character are extracted
from `this file <http://www.unicode.org/Public/UNIDATA/Scripts.txt>`__
provided by the Unicode Consortium.
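
As a rough idea of what such an extraction might involve, the sketch below
parses a few lines written in the style of ``Scripts.txt`` (a code point or
range, a semicolon, a script name, then a comment). The sample text, the
regular expression, and the helpers ``parse_script_ranges`` and ``script_of``
are assumptions made for illustration; they are not taken from this library.

```python
import re

# Illustrative excerpt in the style of Scripts.txt, not the full file.
SAMPLE = """\
# code point or range ; script name # comment
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00AA          ; Latin # Lo       FEMININE ORDINAL INDICATOR
0391..03A1    ; Greek # L&  [17] GREEK CAPITAL LETTER ALPHA..GREEK CAPITAL LETTER RHO
"""

LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")


def parse_script_ranges(text):
    """Return (start, end, ALIAS) tuples for every data line in text."""
    ranges = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # comment or blank line
        start = int(match.group(1), 16)
        end = int(match.group(2) or match.group(1), 16)
        ranges.append((start, end, match.group(3).upper()))
    return ranges


def script_of(ch, ranges):
    """Look up the alias whose range contains ch, or None if no range matches."""
    code = ord(ch)
    for start, end, alias in ranges:
        if start <= code <= end:
            return alias
    return None


ranges = parse_script_ranges(SAMPLE)
print(script_of("A", ranges))       # LATIN
print(script_of("\u0391", ranges))  # GREEK
```

A linear scan is fine for a three-line sample; a full table would more likely
be sorted and searched with ``bisect``.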

The matrix of which characters can be confused with which other
characters is built using `this
file <http://www.unicode.org/Public/security/latest/confusables.txt>`__
provided by the Unicode Consortium.

This data is stored in two JSON files: ``categories.json`` and
``confusables.json``. If you delete them, both are recreated by
downloading and parsing the two files mentioned above, and the results
are stored as JSON again.
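
The "parse once, cache as JSON" idea can be sketched as follows. This is a
minimal illustration under assumptions, not the library's actual code or JSON
layout: the sample imitates the ``source ; target ; type # comment`` layout of
``confusables.txt``, the download step is left out, and ``parse_confusables``
and ``load_or_build`` are hypothetical helper names.

```python
import json

# Illustrative excerpt in the style of confusables.txt, not the full file.
SAMPLE = """\
# source ; target ; type # comment
03C1 ;\t0070 ;\tMA\t# GREEK SMALL LETTER RHO -> LATIN SMALL LETTER P
0441 ;\t0063 ;\tMA\t# CYRILLIC SMALL LETTER ES -> LATIN SMALL LETTER C
"""


def parse_confusables(text):
    """Map each source string to the list of strings it can be confused with."""
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        table.setdefault(source, []).append(target)
    return table


def load_or_build(path, raw_text):
    """Load the cached JSON at path, or rebuild it from raw_text if it is missing."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        table = parse_confusables(raw_text)
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(table, handle)
        return table
```

Calling ``load_or_build("confusables.json", SAMPLE)`` twice parses the text
only the first time; deleting the file triggers a rebuild on the next call.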