1:mod:`email.parser`: Parsing email messages
2-------------------------------------------
3
4.. module:: email.parser
5   :synopsis: Parse flat text email messages to produce a message object structure.
6
7**Source code:** :source:`Lib/email/parser.py`
8
9--------------
10
11Message object structures can be created in one of two ways: they can be
12created from whole cloth by creating an :class:`~email.message.EmailMessage`
13object, adding headers using the dictionary interface, and adding payload(s)
14using :meth:`~email.message.EmailMessage.set_content` and related methods, or
15they can be created by parsing a serialized representation of the email
16message.
17
18The :mod:`email` package provides a standard parser that understands most email
19document structures, including MIME documents.  You can pass the parser a
bytes object, a string, or a file object, and the parser returns the root
21:class:`~email.message.EmailMessage` instance of the object structure.  For
22simple, non-MIME messages the payload of this root object will likely be a
23string containing the text of the message.  For MIME messages, the root object
24will return ``True`` from its :meth:`~email.message.EmailMessage.is_multipart`
25method, and the subparts can be accessed via the payload manipulation methods,
26such as :meth:`~email.message.EmailMessage.get_body`,
27:meth:`~email.message.EmailMessage.iter_parts`, and
28:meth:`~email.message.EmailMessage.walk`.
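
As a minimal sketch of the behavior described above, the following parses a small hand-written multipart message and inspects the resulting tree.  The message text, the addresses, and the ``BOUND`` boundary string are made up for illustration, and :data:`email.policy.default` is passed explicitly so that the root object is an :class:`~email.message.EmailMessage`.

```python
import email
from email import policy

# Hypothetical sample message; any RFC 5322 / MIME text would do.
raw = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: Parser demo\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="BOUND"\r\n'
    b"\r\n"
    b"--BOUND\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Hello in plain text.\r\n"
    b"--BOUND\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b"<p>Hello in HTML.</p>\r\n"
    b"--BOUND--\r\n"
)

# Parse the bytes and get back the root of the message tree.
msg = email.message_from_bytes(raw, policy=policy.default)

# The root of a MIME multipart message reports itself as multipart.
is_multipart = msg.is_multipart()

# iter_parts() visits the immediate subparts only.
part_types = [part.get_content_type() for part in msg.iter_parts()]

# get_body() picks the best candidate for the "body" of the message.
plain = msg.get_body(preferencelist=("plain",))
plain_text = plain.get_content()

# walk() visits the root and then every part below it, depth first.
walked = [part.get_content_type() for part in msg.walk()]
```

Here ``part_types`` lists the two alternatives, while ``walked`` also includes the ``multipart/alternative`` container itself.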
29
Two parser interfaces are available: the :class:`Parser`
31API and the incremental :class:`FeedParser` API.  The :class:`Parser` API is
32most useful if you have the entire text of the message in memory, or if the
33entire message lives in a file on the file system.  :class:`FeedParser` is more
34appropriate when you are reading the message from a stream which might block
35waiting for more input (such as reading an email message from a socket).  The
36:class:`FeedParser` can consume and parse the message incrementally, and only
37returns the root object when you close the parser.
38
Note that the parser can be extended in limited ways, and of course you can
implement your own parser completely from scratch.  All of the logic that
connects the :mod:`email` package's bundled parser and the
:class:`~email.message.EmailMessage` class is embodied in the
:class:`~email.policy.Policy` class, so a custom parser can create message
object trees any way it finds necessary by implementing custom versions of the
appropriate :class:`~email.policy.Policy` methods.
46
47
48FeedParser API
49^^^^^^^^^^^^^^
50
51The :class:`BytesFeedParser`, imported from the :mod:`email.feedparser` module,
52provides an API that is conducive to incremental parsing of email messages,
53such as would be necessary when reading the text of an email message from a
54source that can block (such as a socket).  The :class:`BytesFeedParser` can of
55course be used to parse an email message fully contained in a :term:`bytes-like
56object`, string, or file, but the :class:`BytesParser` API may be more
57convenient for such use cases.  The semantics and results of the two parser
58APIs are identical.
59
60The :class:`BytesFeedParser`'s API is simple; you create an instance, feed it a
61bunch of bytes until there's no more to feed it, then close the parser to
62retrieve the root message object.  The :class:`BytesFeedParser` is extremely
63accurate when parsing standards-compliant messages, and it does a very good job
64of parsing non-compliant messages, providing information about how a message
65was deemed broken.  It will populate a message object's
66:attr:`~email.message.EmailMessage.defects` attribute with a list of any
67problems it found in a message.  See the :mod:`email.errors` module for the
68list of defects that it can find.
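
The create, feed, close cycle described above might look like the following sketch.  The list of 16-byte ``chunks`` is only a stand-in for data arriving from a socket; the chunk size and the sample message are assumptions made for the example.

```python
from email import policy
from email.parser import BytesFeedParser

# Hypothetical message; in practice the bytes would arrive from a socket.
raw = (
    b"From: alice@example.com\r\n"
    b"Subject: Incremental parsing\r\n"
    b"\r\n"
    b"Line one\r\n"
    b"Line two\r\n"
)

# Simulate a blocking source by slicing the data into small pieces.
# The pieces deliberately ignore line boundaries.
chunks = [raw[i:i + 16] for i in range(0, len(raw), 16)]

parser = BytesFeedParser(policy=policy.default)
for chunk in chunks:
    parser.feed(chunk)

# close() finishes parsing and returns the root message object.
msg = parser.close()

subject = str(msg["Subject"])
body_lines = msg.get_content().splitlines()

# Problems found while parsing are recorded rather than raised.
defects = list(msg.defects)
```

For this well-formed input the ``defects`` list is empty; a malformed message would be parsed as far as possible and its problems recorded there.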
69
70Here is the API for the :class:`BytesFeedParser`:
71
72
73.. class:: BytesFeedParser(_factory=None, *, policy=policy.compat32)
74
75   Create a :class:`BytesFeedParser` instance.  Optional *_factory* is a
76   no-argument callable; if not specified use the
77   :attr:`~email.policy.Policy.message_factory` from the *policy*.  Call
78   *_factory* whenever a new message object is needed.
79
80   If *policy* is specified use the rules it specifies to update the
81   representation of the message.  If *policy* is not set, use the
82   :class:`compat32 <email.policy.Compat32>` policy, which maintains backward
83   compatibility with the Python 3.2 version of the email package and provides
84   :class:`~email.message.Message` as the default factory.  All other policies
85   provide :class:`~email.message.EmailMessage` as the default *_factory*. For
86   more information on what else *policy* controls, see the
87   :mod:`~email.policy` documentation.
88
   Note: **The policy keyword should always be specified**; the default will
   change to :data:`email.policy.default` in a future version of Python.
91
92   .. versionadded:: 3.2
93
94   .. versionchanged:: 3.3 Added the *policy* keyword.
95   .. versionchanged:: 3.6 *_factory* defaults to the policy ``message_factory``.
96
97
98   .. method:: feed(data)
99
100      Feed the parser some more data.  *data* should be a :term:`bytes-like
101      object` containing one or more lines.  The lines can be partial and the
102      parser will stitch such partial lines together properly.  The lines can
103      have any of the three common line endings: carriage return, newline, or
104      carriage return and newline (they can even be mixed).
105
106
107   .. method:: close()
108
109      Complete the parsing of all previously fed data and return the root
110      message object.  It is undefined what happens if :meth:`~feed` is called
111      after this method has been called.
112
113
114.. class:: FeedParser(_factory=None, *, policy=policy.compat32)
115
116   Works like :class:`BytesFeedParser` except that the input to the
117   :meth:`~BytesFeedParser.feed` method must be a string.  This is of limited
118   utility, since the only way for such a message to be valid is for it to
119   contain only ASCII text or, if :attr:`~email.policy.Policy.utf8` is
120   ``True``, no binary attachments.
121
122   .. versionchanged:: 3.3 Added the *policy* keyword.
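
The following sketch illustrates how :meth:`~BytesFeedParser.feed` stitches partial lines together and tolerates mixed line endings, here using the string-based :class:`FeedParser`.  The split points and the sample text are arbitrary choices made for the example.

```python
from email import policy
from email.parser import FeedParser

parser = FeedParser(policy=policy.default)

# Each call passes a str; the pieces split header lines in the middle
# and mix "\r\n" with "\n" line endings.
parser.feed("From: alice@exam")
parser.feed("ple.com\r\nSubject: Mixed ")
parser.feed("endings\nTo: bob@example.com\r\n")
parser.feed("\r\nBody text\n")

msg = parser.close()

sender = str(msg["From"])
subject = str(msg["Subject"])
recipient = str(msg["To"])
body = msg.get_content()
```

The header values come out whole even though no single ``feed()`` call contained a complete message.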
123
124
125Parser API
126^^^^^^^^^^
127
128The :class:`BytesParser` class, imported from the :mod:`email.parser` module,
129provides an API that can be used to parse a message when the complete contents
130of the message are available in a :term:`bytes-like object` or file.  The
131:mod:`email.parser` module also provides :class:`Parser` for parsing strings,
132and header-only parsers, :class:`BytesHeaderParser` and
133:class:`HeaderParser`, which can be used if you're only interested in the
134headers of the message.  :class:`BytesHeaderParser` and :class:`HeaderParser`
135can be much faster in these situations, since they do not attempt to parse the
136message body, instead setting the payload to the raw body.
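
To illustrate the difference, the sketch below parses the same bytes once with :class:`BytesParser` and once with :class:`BytesHeaderParser`.  The sample message and its ``BOUND`` boundary are invented for the example.

```python
from email import policy
from email.parser import BytesHeaderParser, BytesParser

# Hypothetical multipart message.
raw = (
    b"From: alice@example.com\r\n"
    b"Subject: Parser demo\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="BOUND"\r\n'
    b"\r\n"
    b"--BOUND\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"First part.\r\n"
    b"--BOUND--\r\n"
)

# Full parse: the body is split into sub-message objects.
full = BytesParser(policy=policy.default).parsebytes(raw)

# Header-only parse: the headers are parsed, the body is kept as raw text.
headers_only = BytesHeaderParser(policy=policy.default).parsebytes(raw)

subject = str(headers_only["Subject"])
raw_body = headers_only.get_payload()
```

After the header-only parse, ``raw_body`` is a single string that still contains the boundary lines, and ``headers_only.is_multipart()`` is ``False``.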
137
138
139.. class:: BytesParser(_class=None, *, policy=policy.compat32)
140
141   Create a :class:`BytesParser` instance.  The *_class* and *policy*
142   arguments have the same meaning and semantics as the *_factory*
143   and *policy* arguments of :class:`BytesFeedParser`.
144
   Note: **The policy keyword should always be specified**; the default will
   change to :data:`email.policy.default` in a future version of Python.
147
148   .. versionchanged:: 3.3
149      Removed the *strict* argument that was deprecated in 2.4.  Added the
150      *policy* keyword.
151   .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``.
152
153
154   .. method:: parse(fp, headersonly=False)
155
156      Read all the data from the binary file-like object *fp*, parse the
157      resulting bytes, and return the message object.  *fp* must support
158      both the :meth:`~io.IOBase.readline` and the :meth:`~io.IOBase.read`
159      methods.
160
161      The bytes contained in *fp* must be formatted as a block of :rfc:`5322`
162      (or, if :attr:`~email.policy.Policy.utf8` is ``True``, :rfc:`6532`)
163      style headers and header continuation lines, optionally preceded by an
164      envelope header.  The header block is terminated either by the end of the
165      data or by a blank line.  Following the header block is the body of the
166      message (which may contain MIME-encoded subparts, including subparts
167      with a :mailheader:`Content-Transfer-Encoding` of ``8bit``).
168
169      Optional *headersonly* is a flag specifying whether to stop parsing after
170      reading the headers or not.  The default is ``False``, meaning it parses
171      the entire contents of the file.
172
173
174   .. method:: parsebytes(bytes, headersonly=False)
175
176      Similar to the :meth:`parse` method, except it takes a :term:`bytes-like
177      object` instead of a file-like object.  Calling this method on a
178      :term:`bytes-like object` is equivalent to wrapping *bytes* in a
179      :class:`~io.BytesIO` instance first and calling :meth:`parse`.
180
181      Optional *headersonly* is as with the :meth:`parse` method.
182
183   .. versionadded:: 3.2
184
185
186.. class:: BytesHeaderParser(_class=None, *, policy=policy.compat32)
187
188   Exactly like :class:`BytesParser`, except that *headersonly*
189   defaults to ``True``.
190
191   .. versionadded:: 3.3
192
193
194.. class:: Parser(_class=None, *, policy=policy.compat32)
195
196   This class is parallel to :class:`BytesParser`, but handles string input.
197
198   .. versionchanged:: 3.3
199      Removed the *strict* argument.  Added the *policy* keyword.
200   .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``.
201
202
203   .. method:: parse(fp, headersonly=False)
204
205      Read all the data from the text-mode file-like object *fp*, parse the
206      resulting text, and return the root message object.  *fp* must support
      both the :meth:`~io.TextIOBase.readline` and the
      :meth:`~io.TextIOBase.read` methods.
209
210      Other than the text mode requirement, this method operates like
211      :meth:`BytesParser.parse`.
212
213
214   .. method:: parsestr(text, headersonly=False)
215
216      Similar to the :meth:`parse` method, except it takes a string object
217      instead of a file-like object.  Calling this method on a string is
218      equivalent to wrapping *text* in a :class:`~io.StringIO` instance first
219      and calling :meth:`parse`.
220
221      Optional *headersonly* is as with the :meth:`parse` method.
222
223
224.. class:: HeaderParser(_class=None, *, policy=policy.compat32)
225
226   Exactly like :class:`Parser`, except that *headersonly*
227   defaults to ``True``.
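
A short sketch of the string-based classes follows.  It checks that :meth:`~Parser.parsestr` and :meth:`~Parser.parse` on a :class:`~io.StringIO` wrapper produce the same message, and that :class:`HeaderParser` leaves the body as raw text.  The sample text is made up for the example.

```python
import io

from email import policy
from email.parser import HeaderParser, Parser

# Hypothetical message held in a str.
text = "From: alice@example.com\nSubject: Text input\n\nBody as str.\n"

parser = Parser(policy=policy.default)

# Parsing the string directly ...
from_str = parser.parsestr(text)

# ... and parsing a text-mode file-like object wrapping the same string.
from_file = parser.parse(io.StringIO(text))

# Both routes should serialize back to the same text.
same = from_str.as_string() == from_file.as_string()

# HeaderParser stops after the headers; the rest becomes the payload.
hdr_only = HeaderParser(policy=policy.default).parsestr(text)
subject = str(hdr_only["Subject"])
payload = hdr_only.get_payload()
```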
228
229
230Since creating a message object structure from a string or a file object is such
231a common task, four functions are provided as a convenience.  They are available
232in the top-level :mod:`email` package namespace.
233
234.. currentmodule:: email
235
236
237.. function:: message_from_bytes(s, _class=None, *, policy=policy.compat32)
238
239   Return a message object structure from a :term:`bytes-like object`.  This is
240   equivalent to ``BytesParser().parsebytes(s)``.  Optional *_class* and
241   *policy* are interpreted as with the :class:`~email.parser.BytesParser` class
242   constructor.
243
244   .. versionadded:: 3.2
245   .. versionchanged:: 3.3
246      Removed the *strict* argument.  Added the *policy* keyword.
247
248
249.. function:: message_from_binary_file(fp, _class=None, *, \
250                                       policy=policy.compat32)
251
252   Return a message object structure tree from an open binary :term:`file
253   object`.  This is equivalent to ``BytesParser().parse(fp)``.  *_class* and
254   *policy* are interpreted as with the :class:`~email.parser.BytesParser` class
255   constructor.
256
257   .. versionadded:: 3.2
258   .. versionchanged:: 3.3
259      Removed the *strict* argument.  Added the *policy* keyword.
260
261
262.. function:: message_from_string(s, _class=None, *, policy=policy.compat32)
263
264   Return a message object structure from a string.  This is equivalent to
265   ``Parser().parsestr(s)``.  *_class* and *policy* are interpreted as
266   with the :class:`~email.parser.Parser` class constructor.
267
268   .. versionchanged:: 3.3
269      Removed the *strict* argument.  Added the *policy* keyword.
270
271
272.. function:: message_from_file(fp, _class=None, *, policy=policy.compat32)
273
274   Return a message object structure tree from an open :term:`file object`.
275   This is equivalent to ``Parser().parse(fp)``.  *_class* and *policy* are
276   interpreted as with the :class:`~email.parser.Parser` class constructor.
277
278   .. versionchanged:: 3.3
279      Removed the *strict* argument.  Added the *policy* keyword.
280   .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``.
281
282
283Here's an example of how you might use :func:`message_from_bytes` at an
284interactive Python prompt::
285
   >>> import email
   >>> from email import policy
   >>> my_bytes = b"Subject: Hello\r\n\r\nBody text\r\n"
   >>> msg = email.message_from_bytes(my_bytes, policy=policy.default)
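
The effect of the *policy* argument on the class of the returned object can be seen in the sketch below, which assumes Python 3.6 or later (where the factory defaults to the policy's ``message_factory``).  The sample bytes are invented for the example.

```python
import io

import email
from email import policy
from email.message import EmailMessage, Message

# Hypothetical message.
raw = b"From: alice@example.com\r\nSubject: Hi\r\n\r\nHello.\r\n"

# No policy given: the compat32 policy applies and a Message is built.
legacy = email.message_from_bytes(raw)

# Explicit modern policy: an EmailMessage is built instead.
modern = email.message_from_bytes(raw, policy=policy.default)

# The file-based helper accepts any binary file object, here a BytesIO.
from_file = email.message_from_binary_file(
    io.BytesIO(raw), policy=policy.default
)

legacy_type = type(legacy)
modern_type = type(modern)
```

Passing the policy explicitly, as recommended above, keeps the result independent of any future change to the default.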
288
289
290Additional notes
291^^^^^^^^^^^^^^^^
292
293Here are some notes on the parsing semantics:
294
295* Most non-\ :mimetype:`multipart` type messages are parsed as a single message
296  object with a string payload.  These objects will return ``False`` for
297  :meth:`~email.message.EmailMessage.is_multipart`, and
  :meth:`~email.message.EmailMessage.iter_parts` will yield no subparts.
299
300* All :mimetype:`multipart` type messages will be parsed as a container message
301  object with a list of sub-message objects for their payload.  The outer
302  container message will return ``True`` for
303  :meth:`~email.message.EmailMessage.is_multipart`, and
  :meth:`~email.message.EmailMessage.iter_parts` will yield each subpart in turn.
305
306* Most messages with a content type of :mimetype:`message/\*` (such as
307  :mimetype:`message/delivery-status` and :mimetype:`message/rfc822`) will also
  be parsed as a container object containing a list payload of length 1.  Their
309  :meth:`~email.message.EmailMessage.is_multipart` method will return ``True``.
310  The single element yielded by :meth:`~email.message.EmailMessage.iter_parts`
311  will be a sub-message object.
312
313* Some non-standards-compliant messages may not be internally consistent about
314  their :mimetype:`multipart`\ -edness.  Such messages may have a
315  :mailheader:`Content-Type` header of type :mimetype:`multipart`, but their
316  :meth:`~email.message.EmailMessage.is_multipart` method may return ``False``.
317  If such messages were parsed with the :class:`~email.parser.FeedParser`,
318  they will have an instance of the
319  :class:`~email.errors.MultipartInvariantViolationDefect` class in their
320  *defects* attribute list.  See :mod:`email.errors` for details.
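
Some of the notes above can be checked with a short sketch.  The three sample messages are invented for the example: a plain text message, a :mimetype:`message/rfc822` wrapper, and a message that claims to be :mimetype:`multipart` but declares no boundary.  The single sub-message of the wrapper is fetched with ``get_payload(0)``.

```python
import email
from email import policy

# A simple non-multipart message.
plain = email.message_from_bytes(
    b"Subject: Plain\r\n\r\nJust text.\r\n",
    policy=policy.default,
)

# A message/rfc822 wrapper around another complete message.
wrapped = email.message_from_bytes(
    b"Subject: Forwarded\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: message/rfc822\r\n"
    b"\r\n"
    b"Subject: Inner message\r\n"
    b"\r\n"
    b"Inner body\r\n",
    policy=policy.default,
)

# A non-compliant message: multipart type, but no boundary parameter.
broken = email.message_from_bytes(
    b"Subject: Broken\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: multipart/mixed\r\n"
    b"\r\n"
    b"No boundary parameter was declared.\r\n",
    policy=policy.default,
)

plain_parts = list(plain.iter_parts())
inner = wrapped.get_payload(0)
inner_subject = str(inner["Subject"])

# Defects are recorded on the message rather than raised.
defect_names = [type(d).__name__ for d in broken.defects]
```

For the ``broken`` message, ``is_multipart()`` returns ``False`` and ``defect_names`` includes ``MultipartInvariantViolationDefect``.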
321