email\_validator
================

A robust email address syntax and deliverability validation library for
Python 2.7/3.4+ by [Joshua Tauberer](https://razor.occams.info).

This library validates that a string is of the form `x@y.com`. This is
the sort of validation you would want for an email-based login form on
a website.

Key features:

* Good for validating email addresses used for logins/identity.
* Friendly error messages when validation fails (appropriate to show
  to end users).
* (optionally) Checks deliverability: Does the domain name resolve?
* Supports internationalized domain names and (optionally)
  internationalized local parts.
* Normalizes email addresses (super important for internationalized
  addresses! see below).

The library is NOT for validation of the To: line in an email message
(e.g. `My Name <my@address.com>`), which
[flanker](https://github.com/mailgun/flanker) is more appropriate for.
And this library does NOT permit obsolete forms of email addresses, so
if you need strict validation against the email specs exactly, use
[pyIsEmail](https://github.com/michaelherold/pyIsEmail).

This library was first published in 2015. The current version is 1.1.1
(posted May 19, 2020). **Starting in version 1.1.0, the type of the value returned
from `validate_email` has changed, but dict-style access to the validated
address information still works, so it is backwards compatible.**

Installation
------------

This package [is on PyPI](https://pypi.org/project/email-validator/), so:

```sh
pip install email_validator
```

`pip3` also works.

Usage
-----

If you're validating a user's email address before creating a user
account, you might do this:

```python
from email_validator import validate_email, EmailNotValidError

email = "my+address@mydomain.tld"

try:
  # Validate.
  valid = validate_email(email)

  # Update with the normalized form.
  email = valid.email
except EmailNotValidError as e:
  # email is not valid, exception message is human-readable
  print(str(e))
```

This validates the address and gives you its normalized form. You should
put the normalized form in your database and always normalize before
checking if an address is in your database.
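
As a minimal sketch of that advice, the snippet below keys a hypothetical
in-memory account store by normalized address. The names
`make_account_store`, `register`, `lookup`, and `demo_normalize` are
illustrative and not part of this library. `demo_normalize` is a crude
stand-in so the example runs without network access; in real code you
would presumably pass something like `lambda s: validate_email(s).email`
instead.

```python
def make_account_store(normalize):
    """Return (register, lookup) functions keyed by normalized address.

    `normalize` is any callable that returns a normalized address or
    raises ValueError (EmailNotValidError is a ValueError subclass).
    """
    accounts = {}

    def register(address, profile):
        # Normalize before storing.
        accounts[normalize(address)] = profile

    def lookup(address):
        # Normalize before querying, so equivalent spellings match.
        try:
            return accounts.get(normalize(address))
        except ValueError:
            return None

    return register, lookup


def demo_normalize(address):
    # Stand-in for demonstration only: lowercases the domain part.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("not an email address")
    return local + "@" + domain.lower()


register, lookup = make_account_store(demo_normalize)
register("me@Domain.com", {"name": "Me"})
print(lookup("me@DOMAIN.COM"))  # {'name': 'Me'}
```

Because both the write path and the read path go through the same
normalizer, a user who types their address with different casing at
login still finds their account.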

The validator will accept internationalized email addresses, but email
addresses with non-ASCII characters in the *local* part of the address
(before the @-sign) require the
[SMTPUTF8](https://tools.ietf.org/html/rfc6531) extension which may not
be supported by your mail submission library or your outbound mail
server. If you know ahead of time that SMTPUTF8 is not supported then
**add the keyword argument allow\_smtputf8=False to fail validation for
addresses that would require SMTPUTF8**:

```python
valid = validate_email(email, allow_smtputf8=False)
```

Overview
--------

The module provides a single function `validate_email(email_address)` which
takes an email address (either a `str` or ASCII `bytes`) and:

- Raises an `EmailNotValidError` with a helpful, human-readable error
  message explaining why the email address is not valid, or
- Returns an object with a normalized form of the email address and
  other information about it.

When an email address is not valid, `validate_email` raises either an
`EmailSyntaxError` if the form of the address is invalid or an
`EmailUndeliverableError` if the domain name does not resolve. Both
exception classes are subclasses of `EmailNotValidError`, which in turn
is a subclass of `ValueError`.

But when an email address is valid, an object is returned containing
a normalized form of the email address (which you should use!) and
other information.

The validator doesn't permit obsolete forms of email addresses that no
one uses anymore even though they are still valid and deliverable, since
they will probably give you grief if you're using email for login. (See
the Assumptions section later in this document.)

The validator checks that the domain name in the email address resolves.
There is nothing to be gained by trying to actually contact an SMTP
server, so that's not done here. For privacy, security, and practicality
reasons servers are good at not giving away whether an address is
deliverable or not: email addresses that appear to accept mail at first
can bounce mail after a delay, and bounced mail may indicate a temporary
failure of a good email address (sometimes an intentional failure, like
greylisting).

The function also accepts the following keyword arguments (defaults as
shown):

`allow_smtputf8=True`: Set to `False` to prohibit internationalized addresses that would
    require the
    [SMTPUTF8](https://tools.ietf.org/html/rfc6531) extension.

`check_deliverability=True`: Set to `False` to skip the domain name resolution check.

`allow_empty_local=False`: Set to `True` to allow an empty local part (i.e.
    `@example.com`), e.g. for validating Postfix aliases.

Internationalized email addresses
---------------------------------

The email protocol SMTP and the domain name system DNS have historically
only allowed ASCII characters in email addresses and domain names,
respectively. Each has adapted to internationalization in a separate
way, creating two separate aspects to email address
internationalization.

### Internationalized domain names (IDN)

The first is [internationalized domain names (RFC
5891)](https://tools.ietf.org/html/rfc5891), a.k.a. IDNA 2008. The DNS
system has not been updated with Unicode support. Instead, internationalized
domain names are converted into a special IDNA ASCII "[Punycode](https://www.rfc-editor.org/rfc/rfc3492.txt)"
form starting with `xn--`. When an email address has non-ASCII
characters in its domain part, the domain part is replaced with its IDNA
ASCII equivalent form in the process of mail transmission. Your mail
submission library probably does this for you transparently. Note that
most web browsers are currently in transition between IDNA 2003 (RFC
3490) and IDNA 2008 (RFC 5891) and [compliance around the web is not
very
good](http://archives.miloush.net/michkap/archive/2012/02/27/10273315.html)
in any case, so be aware that edge cases are handled differently by
different applications and libraries. This library conforms to IDNA 2008
using the [idna](https://github.com/kjd/idna) module by Kim Davies.
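
To see what the `xn--` form looks like, here is a small sketch using
Python 3's built-in `idna` codec. Note the assumption: that codec
implements the older IDNA 2003 rules, not the IDNA 2008 rules this
library follows via the `idna` module, so results can differ on edge
cases. For a simple label like the one below the two should agree.

```python
# Python 3's built-in "idna" codec (IDNA 2003 rules; this library itself
# uses the third-party `idna` module for IDNA 2008).
domain = "ツ.life"

# Unicode domain -> ASCII form used on the wire.
ascii_domain = domain.encode("idna").decode("ascii")
print(ascii_domain)  # xn--bdk.life

# ASCII form -> Unicode domain.
round_trip = ascii_domain.encode("ascii").decode("idna")
print(round_trip == domain)  # True
```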

### Internationalized local parts

The second sort of internationalization is internationalization in the
*local* part of the address (before the @-sign). These email addresses
require that your mail submission library and the mail servers along the
route to the destination, including your own outbound mail server, all
support the [SMTPUTF8 (RFC 6531)](https://tools.ietf.org/html/rfc6531)
extension. Support for SMTPUTF8 varies.

### If you know ahead of time that SMTPUTF8 is not supported by your mail submission stack

By default all internationalized forms are accepted by the validator.
But if you know ahead of time that SMTPUTF8 is not supported by your
mail submission stack, then you must filter out addresses that require
SMTPUTF8 using the `allow_smtputf8=False` keyword argument (see above).
This will cause the validation function to raise an `EmailSyntaxError` if
delivery would require SMTPUTF8. That's just in those cases where
non-ASCII characters appear before the @-sign. If you do not set
`allow_smtputf8=False`, you can also check the value of the `smtputf8`
field in the returned object.
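
The rule described above (SMTPUTF8 is needed only when non-ASCII
characters appear before the @-sign) can be illustrated with a few lines
of plain Python. This is a simplified stand-in, not the library's
implementation: `needs_smtputf8` is a hypothetical helper, and it
performs no validation at all.

```python
def needs_smtputf8(address):
    """Return True if the local part (before the last @) has non-ASCII characters."""
    local_part = address.rpartition("@")[0]
    return any(ord(ch) > 127 for ch in local_part)


print(needs_smtputf8("ツ-test@joshdata.me"))  # True
print(needs_smtputf8("example@ツ.life"))      # False
```

In real code, prefer the `smtputf8` field of the object returned by
`validate_email`, which is computed on the validated, normalized address.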

If your mail submission library doesn't support Unicode at all --- even
in the domain part of the address --- then immediately prior to mail
submission you must replace the email address with its ASCII-ized form.
This library gives you back the ASCII-ized form in the `ascii_email`
field in the returned object, which you can get like this:

```python
valid = validate_email(email, allow_smtputf8=False)
email = valid.ascii_email
```

The local part is left alone (if it has internationalized characters
`allow_smtputf8=False` will force validation to fail) and the domain
part is converted to [IDNA ASCII](https://tools.ietf.org/html/rfc5891).
(You probably should not do this at account creation time so you don't
change the user's login information without telling them.)

### UCS-4 support required for Python 2.7

Note that when using Python 2.7, it is required that it was built with
UCS-4 support (see
[here](https://stackoverflow.com/questions/29109944/python-returns-length-of-2-for-single-unicode-character-string));
otherwise emails with Unicode characters outside of the BMP (Basic
Multilingual Plane) will not validate correctly.

Normalization
-------------

The use of Unicode in email addresses introduced a normalization
problem. Different Unicode strings can look identical and have the same
semantic meaning to the user. The `email` field returned on successful
validation provides the correctly normalized form of the given email
address:

```python
valid = validate_email("me@Domain.com")
email = valid.email  # the normalized form described above
print(email)
# prints: me@domain.com
```

Because an end-user might type their email address in different (but
equivalent) un-normalized forms at different times, you ought to
replace what they enter with the normalized form immediately prior to
going into your database (during account creation), querying your database
(during login), or sending outbound mail. Normalization may also change
the length of an email address, and this may affect whether it is valid
and acceptable by your SMTP provider.

The normalizations include lowercasing the domain part of the email
address (domain names are case-insensitive), [Unicode "NFC"
normalization](https://en.wikipedia.org/wiki/Unicode_equivalence) of the
whole address (which turns characters plus [combining
characters](https://en.wikipedia.org/wiki/Combining_character) into
precomposed characters where possible and replaces certain Unicode
characters (such as angstrom and ohm) with other equivalent code points
(a-with-ring and omega, respectively)), replacement of [fullwidth and
halfwidth
characters](https://en.wikipedia.org/wiki/Halfwidth_and_fullwidth_forms)
in the domain part, and possibly other
[UTS46](http://unicode.org/reports/tr46) mappings on the domain part.

(See [RFC 6532 (internationalized email) section
3.1](https://tools.ietf.org/html/rfc6532#section-3.1) and [RFC 5895
(IDNA 2008) section 2](http://www.ietf.org/rfc/rfc5895.txt).)
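
The NFC step on its own can be tried with the standard-library
`unicodedata` module. This sketch only illustrates the Unicode
normalization described above; it does not reproduce the domain
lowercasing or UTS46 mappings, and the helper name `nfc` and the sample
address are made up for the example.

```python
import unicodedata


def nfc(text):
    """Apply Unicode NFC normalization to a string."""
    return unicodedata.normalize("NFC", text)


# "e" + COMBINING ACUTE ACCENT becomes the single precomposed character U+00E9.
print(nfc("e\u0301") == "\u00e9")  # True

# ANGSTROM SIGN (U+212B) becomes LATIN CAPITAL LETTER A WITH RING ABOVE (U+00C5).
print(nfc("\u212b") == "\u00c5")  # True

# OHM SIGN (U+2126) becomes GREEK CAPITAL LETTER OMEGA (U+03A9).
print(nfc("\u2126") == "\u03a9")  # True

# Normalization can change the length of an address.
decomposed = "jose\u0301@example.com"
print(len(decomposed), len(nfc(decomposed)))  # 17 16
```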

Examples
--------

For the email address `test@joshdata.me`, the returned object is:

```python
ValidatedEmail(
  email='test@joshdata.me',
  local_part='test',
  domain='joshdata.me',
  ascii_email='test@joshdata.me',
  ascii_local_part='test',
  ascii_domain='joshdata.me',
  smtputf8=False,
  mx=[(10, 'box.occams.info')],
  mx_fallback_type=None)
```

For the fictitious address `example@ツ.life`, which has an
internationalized domain but ASCII local part, the returned object is:

```python
ValidatedEmail(
  email='example@ツ.life',
  local_part='example',
  domain='ツ.life',
  ascii_email='example@xn--bdk.life',
  ascii_local_part='example',
  ascii_domain='xn--bdk.life',
  smtputf8=False)
```

Note that `smtputf8` is `False` even though the domain part is
internationalized because
[SMTPUTF8](https://tools.ietf.org/html/rfc6531) is only needed if the
local part of the address is internationalized (the domain part can be
converted to IDNA ASCII Punycode). Also note that the `email` and `domain`
fields provide a normalized form of the email address and domain name
(casefolding and Unicode normalization as required by IDNA 2008).

For the fictitious address `ツ-test@joshdata.me`, which has an
internationalized local part, the returned object is:

```python
ValidatedEmail(
  email='ツ-test@joshdata.me',
  local_part='ツ-test',
  domain='joshdata.me',
  ascii_email=None,
  ascii_local_part=None,
  ascii_domain='joshdata.me',
  smtputf8=True)
```

Now `smtputf8` is `True` and `ascii_email` is `None` because the local
part of the address is internationalized. The `local_part` and `email` fields
return the normalized form of the address: certain Unicode characters
(such as angstrom and ohm) may be replaced by other equivalent code
points (a-with-ring and omega).

Return value
------------

When an email address passes validation, the fields in the returned object
are:

`email`: The canonical form of the email address, mostly useful for
    display purposes. This merely combines the `local_part` and `domain`
    fields (see below).

`ascii_email`: If set, an ASCII-only form of the email address by replacing the
    domain part with [IDNA](https://tools.ietf.org/html/rfc5891)
    [Punycode](https://www.rfc-editor.org/rfc/rfc3492.txt).
    This field will be present when an ASCII-only form of the email
    address exists (including if the email address is already ASCII). If
    the local part of the email address contains internationalized
    characters, `ascii_email` will be `None`. If set, it merely combines
    `ascii_local_part` and `ascii_domain`.

`local_part`: The local part of the given email address (before the @-sign) with
    Unicode NFC normalization applied.

`ascii_local_part`: If set, the local part, which is composed of ASCII characters only.

`domain`: The canonical internationalized Unicode form of the domain part of the
    email address. If the returned string contains non-ASCII characters, either the
    [SMTPUTF8](https://tools.ietf.org/html/rfc6531) feature of your
    mail relay will be required to transmit the message or else the
    email address's domain part must be converted to IDNA ASCII first: Use
    the `ascii_domain` field instead.

`ascii_domain`: The [IDNA](https://tools.ietf.org/html/rfc5891)
    [Punycode](https://www.rfc-editor.org/rfc/rfc3492.txt)-encoded
    form of the domain part of the given email address, as
    it would be transmitted on the wire.

`smtputf8`: A boolean indicating that the
    [SMTPUTF8](https://tools.ietf.org/html/rfc6531) feature of your
    mail relay will be required to transmit messages to this address
    because the local part of the address has non-ASCII characters (the
    local part cannot be IDNA-encoded). If `allow_smtputf8=False` is
    passed as an argument, this flag will always be false because an
    exception is raised if it would have been true.

`mx`: A list of (priority, domain) tuples of MX records specified in the
    DNS for the domain (see [RFC 5321 section
    5](https://tools.ietf.org/html/rfc5321#section-5)). May be `None` if
    the deliverability check could not be completed because of a temporary
    issue like a timeout.

`mx_fallback_type`: `None` if an `MX` record is found. If no MX records are actually
    specified in DNS and instead are inferred, through an obsolete
    mechanism, from A or AAAA records, the value is the type of DNS
    record used instead (`A` or `AAAA`). May be `None` if the deliverability check
    could not be completed because of a temporary issue like a timeout.
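
As a sketch of how the `mx` field might be consumed: under RFC 5321 a
lower preference number means a more preferred mail exchanger, so sorting
the tuples puts the preferred host first. The record list below is made
up for illustration (only `box.occams.info` appears in the examples
above), and `preferred_mx_hosts` is a hypothetical helper that also
handles the case where `mx` is `None`.

```python
def preferred_mx_hosts(mx):
    """Return MX host names ordered from most to least preferred."""
    if not mx:
        # mx may be None if the deliverability check could not complete.
        return []
    return [host for _priority, host in sorted(mx)]


# Illustrative data only; the second record is invented for the example.
mx = [(20, "backup.example.net"), (10, "box.occams.info")]
print(preferred_mx_hosts(mx))  # ['box.occams.info', 'backup.example.net']
```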

Assumptions
-----------

By design, this validator does not pass all email addresses that
strictly conform to the standards. Many email address forms are obsolete
or likely to cause trouble:

* The validator assumes the email address is intended to be
  deliverable on the public Internet using DNS, and so the domain part
  of the email address must be a resolvable domain name.
* The "quoted string" form of the local part of the email address (RFC
  5321 4.1.2) is not permitted --- no one uses this anymore anyway.
  Quoted forms allow multiple @-signs, space characters, and other
  troublesome conditions.
* The "literal" form for the domain part of an email address (an
  IP address) is not accepted --- no one uses this anymore anyway.

Testing
-------

Tests can be run using:

```sh
pip install -r test_requirements.txt
make test
```

For Project Maintainers
-----------------------

The package is distributed as a universal wheel and as a source package.

To release:

* Update the version number.
* Follow the steps below to publish source and a universal wheel to PyPI.
* Make a release at https://github.com/JoshData/python-email-validator/releases/new.

```sh
pip3 install twine
rm -rf dist
python3 setup.py sdist
python3 setup.py bdist_wheel
twine upload dist/*
git tag v1.0.XXX # replace with version in setup.py
git push --tags
```

Note: The wheel is specified as universal in the file `setup.cfg` by the `universal = 1` key in the
`[bdist_wheel]` section.