confusable_homoglyphs `[doc] <http://confusable-homoglyphs.readthedocs.io/en/latest/>`__
========================================================================================

.. image:: https://img.shields.io/travis/vhf/confusable_homoglyphs.svg
        :target: https://travis-ci.org/vhf/confusable_homoglyphs

.. image:: https://img.shields.io/pypi/v/confusable_homoglyphs.svg
        :target: https://pypi.python.org/pypi/confusable_homoglyphs

.. image:: https://readthedocs.org/projects/confusable_homoglyphs/badge/?version=latest
        :target: http://confusable-homoglyphs.readthedocs.io/en/latest/
        :alt: Documentation Status

*a homoglyph is one of two or more graphemes, characters, or glyphs with
shapes that appear identical or very similar*
`wikipedia:Homoglyph <https://en.wikipedia.org/wiki/Homoglyph>`__

Unicode homoglyphs can be a nuisance on the web. Your most popular
client, AlaskaJazz, might be upset to be impersonated by a trickster who
deliberately chose the username ΑlaskaJazz.

-  ``AlaskaJazz`` is single-script: only Latin characters.
-  ``ΑlaskaJazz`` is mixed-script: the first character is a Greek
   letter.

You might also want to avoid people being tricked into entering their
password on ``www.microsоft.com`` or ``www.faϲebook.com`` instead of
``www.microsoft.com`` or ``www.facebook.com``. `Here is a
utility <http://unicode.org/cldr/utility/confusables.jsp>`__ to play
with these **confusable homoglyphs**.

Not all mixed-script strings have to be ruled out, though. You could
exclude only the mixed-script strings containing characters that might
be confused with a character from some Unicode scripts of your choosing.

-  ``Allo`` and ``ρττ`` are fine: single script.
-  ``AlloΓ`` is fine when our preferred script alias is 'latin': mixed
   script, but ``Γ`` is not confusable.
-  ``Alloρ`` is dangerous: mixed script and ``ρ`` could be confused with
   ``p``.
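
The three cases above can be sketched with the standard library alone.
This is only an illustration, not this library's implementation: it
assumes a character's script can be approximated by the first word of
its Unicode name, and ``SAMPLE_LOOKALIKES`` is a hypothetical,
hand-picked table standing in for the real confusables data.

.. code-block:: python

    import unicodedata

    # Hypothetical sample; the real data comes from confusables.txt.
    SAMPLE_LOOKALIKES = {
        "\u03c1": "p",  # GREEK SMALL LETTER RHO
        "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
        "\u043e": "o",  # CYRILLIC SMALL LETTER O
        "\u03f2": "c",  # GREEK LUNATE SIGMA SYMBOL
    }


    def script_of(char):
        """Approximate a script by the first word of the Unicode name."""
        return unicodedata.name(char).split()[0]


    def scripts_in(text):
        """Return the approximate scripts used by the letters in text."""
        return {script_of(c) for c in text if c.isalpha()}


    def is_mixed_script(text):
        return len(scripts_in(text)) > 1


    def is_dangerous(text, preferred="LATIN"):
        """True for mixed-script text in which a letter from another
        script looks like a letter of the preferred script."""
        if not is_mixed_script(text):
            return False
        return any(
            script_of(c) != preferred
            and c in SAMPLE_LOOKALIKES
            and script_of(SAMPLE_LOOKALIKES[c]) == preferred
            for c in text
            if c.isalpha()
        )


    for word in ("Allo", "\u03c1\u03c4\u03c4", "Allo\u0393", "Allo\u03c1"):
        print(repr(word), is_mixed_script(word), is_dangerous(word))

With this sample table, only the last word is reported as dangerous. The
name-based script guess is a shortcut that works for the letters shown
here; it is not a substitute for the script data published by Unicode.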

This library is compatible with Python 2 and Python 3.

`API documentation <http://confusable-homoglyphs.readthedocs.io/en/latest/apidocumentation.html>`__
---------------------------------------------------------------------------------------------------

Is the data up to date?
-----------------------

Yep.

The Unicode script aliases and names for each character are extracted
from `this file <http://www.unicode.org/Public/UNIDATA/Scripts.txt>`__
provided by the Unicode Consortium.
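
As a rough illustration of that extraction step, the sketch below parses
a few lines written in the layout of ``Scripts.txt`` (a code point or
range, a script name, then the general category in the trailing
comment). The names ``parse_scripts`` and ``alias_of`` are hypothetical,
and upper-casing the script name is a choice made for this example, not
necessarily what the library does.

.. code-block:: python

    import re

    # Abridged sample in the layout of Scripts.txt, not the full file.
    SAMPLE_SCRIPTS = """\
    # sample data
    0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
    00AA          ; Latin # Lo       FEMININE ORDINAL INDICATOR
    03A3..03E1    ; Greek # L&  [63] GREEK CAPITAL LETTER SIGMA..GREEK SMALL LETTER SAMPI
    """

    LINE_RE = re.compile(
        r"^\s*([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)\s*#\s*(\S+)"
    )


    def parse_scripts(text):
        """Return (first, last, script alias, general category) tuples."""
        ranges = []
        for line in text.splitlines():
            match = LINE_RE.match(line)
            if match is None:  # comment or blank line
                continue
            start, end, alias, category = match.groups()
            first = int(start, 16)
            last = int(end, 16) if end else first
            ranges.append((first, last, alias.upper(), category))
        return ranges


    def alias_of(char, ranges):
        """Return the script alias covering char, or None."""
        code = ord(char)
        for first, last, alias, _category in ranges:
            if first <= code <= last:
                return alias
        return None


    ranges = parse_scripts(SAMPLE_SCRIPTS)
    print(alias_of("A", ranges), alias_of("\u03c1", ranges))

A linear scan is enough for a sample this small; for the full file, a
sorted list searched with ``bisect`` would be a more reasonable choice.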

The matrix recording which characters can be confused with which others
is built from `this
file <http://www.unicode.org/Public/security/latest/confusables.txt>`__
provided by the Unicode Consortium.
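
One way such a matrix could be built is sketched below. It assumes the
semicolon-separated layout of ``confusables.txt`` (source code point,
target code point sequence, type, then a comment) and records each pair
in both directions. The sample lines are abridged and
``parse_confusables`` is a hypothetical name; the structure the library
actually stores may differ.

.. code-block:: python

    from collections import defaultdict

    # Abridged sample in the layout of confusables.txt, not the full file.
    SAMPLE_CONFUSABLES = """\
    # sample data
    03C1 ; 0070 ; MA # GREEK SMALL LETTER RHO -> LATIN SMALL LETTER P
    043E ; 006F ; MA # CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O
    03F2 ; 0063 ; MA # GREEK LUNATE SIGMA SYMBOL -> LATIN SMALL LETTER C
    """


    def decode(field):
        """Turn space-separated hex code points into the string they encode."""
        return "".join(chr(int(code, 16)) for code in field.split())


    def parse_confusables(text):
        """Map each string to the set of strings it can be confused with."""
        matrix = defaultdict(set)
        for line in text.splitlines():
            data = line.split("#", 1)[0].strip()  # drop the trailing comment
            if not data:
                continue
            source, target = (decode(f) for f in data.split(";")[:2])
            # Store both directions so a lookup works from either side.
            matrix[source].add(target)
            matrix[target].add(source)
        return dict(matrix)


    matrix = parse_confusables(SAMPLE_CONFUSABLES)
    print(matrix["\u03c1"], matrix["p"])

Because targets are decoded as sequences, an entry whose target spans
several code points is handled the same way as a single-character one.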

This data is stored in two JSON files: ``categories.json`` and
``confusables.json``. If you delete them, both will be recreated by
downloading and parsing the two files mentioned above, then stored as
JSON again.
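
The delete-and-regenerate behaviour described above follows a common
cache-or-rebuild pattern. Here is a minimal sketch of that pattern,
assuming Python 3; ``load_or_rebuild`` and ``fake_rebuild`` are
hypothetical names, and the callback stands in for the download and
parse step rather than reproducing the library's own code.

.. code-block:: python

    import json
    import os
    import tempfile


    def load_or_rebuild(path, rebuild):
        """Return cached JSON data, recreating the file when it is missing."""
        if os.path.exists(path):
            with open(path, encoding="utf-8") as handle:
                return json.load(handle)
        data = rebuild()  # stand-in for downloading and parsing the source
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
        return data


    def fake_rebuild():
        # Hypothetical placeholder for the real download-and-parse step.
        return {"\u03c1": ["p"]}


    cache = os.path.join(tempfile.mkdtemp(), "confusables.json")
    first = load_or_rebuild(cache, fake_rebuild)   # file missing: rebuilt
    second = load_or_rebuild(cache, fake_rebuild)  # file present: loaded
    print(first == second)

Deleting the cached file simply forces the rebuild branch on the next
call, which mirrors how the two JSON files are described as behaving.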