:mod:`email.parser`: Parsing email messages
-------------------------------------------

.. module:: email.parser
   :synopsis: Parse flat text email messages to produce a message object structure.


Message object structures can be created in one of two ways: they can be created
from whole cloth by instantiating :class:`~email.message.Message` objects and
stringing them together via :meth:`~email.message.Message.attach` and
:meth:`~email.message.Message.set_payload` calls, or they
can be created by parsing a flat text representation of the email message.

The :mod:`email` package provides a standard parser that understands most email
document structures, including MIME documents. You can pass the parser a string
or a file object, and the parser will return to you the root
:class:`~email.message.Message` instance of the object structure. For simple,
non-MIME messages the payload of this root object will likely be a string
containing the text of the message. For MIME messages, the root object will
return ``True`` from its :meth:`~email.message.Message.is_multipart` method, and
the subparts can be accessed via the :meth:`~email.message.Message.get_payload`
and :meth:`~email.message.Message.walk` methods.

There are actually two parser interfaces available for use, the classic
:class:`Parser` API and the incremental :class:`FeedParser` API. The classic
:class:`Parser` API is fine if you have the entire text of the message in memory
as a string, or if the entire message lives in a file on the file system.
:class:`FeedParser` is more appropriate for when you're reading the message from
a stream which might block waiting for more input (e.g. reading an email message
from a socket). The :class:`FeedParser` can consume and parse the message
incrementally, and only returns the root object when you close the parser [#]_.

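
The two interfaces can be compared with a short sketch. The sample message text
and the ten-character chunk size are assumptions made for illustration (the
chunks stand in for reads from a socket); only :class:`Parser` and
:class:`FeedParser` come from the :mod:`email` package::

   from email.feedparser import FeedParser
   from email.parser import Parser

   # Hypothetical sample message, used only for illustration.
   raw = "From: alice@example.com\nSubject: Hello\n\nThis is the body.\n"

   # Classic API: the whole text is already in memory.
   msg_classic = Parser().parsestr(raw)

   # Incremental API: feed arbitrary chunks, which may split lines anywhere,
   # then close the parser to obtain the root message object.
   feed_parser = FeedParser()
   for start in range(0, len(raw), 10):
       feed_parser.feed(raw[start:start + 10])
   msg_incremental = feed_parser.close()

   # Compare the headers and the payload produced by the two interfaces.
   same = (msg_classic.items() == msg_incremental.items()
           and msg_classic.get_payload() == msg_incremental.get_payload())

For a well-formed message such as this one, both parsers should produce the same
headers and the same string payload, so ``same`` is expected to be true.
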

Note that the parser can be extended in limited ways, and of course you can
implement your own parser completely from scratch. There is no magical
connection between the :mod:`email` package's bundled parser and the
:class:`~email.message.Message` class, so your custom parser can create message
object trees any way it finds necessary.


FeedParser API
^^^^^^^^^^^^^^

.. versionadded:: 2.4

The :class:`FeedParser`, imported from the :mod:`email.feedparser` module,
provides an API that is conducive to incremental parsing of email messages, such
as would be necessary when reading the text of an email message from a source
that can block (e.g. a socket). The :class:`FeedParser` can of course be used
to parse an email message fully contained in a string or a file, but the classic
:class:`Parser` API may be more convenient for such use cases. The semantics
and results of the two parser APIs are identical.

The :class:`FeedParser`'s API is simple: you create an instance, feed it text
until there's no more to feed it, then close the parser to retrieve the
root message object. The :class:`FeedParser` is extremely accurate when parsing
standards-compliant messages, and it does a very good job of parsing
non-compliant messages, recording how each such message was deemed
broken. It will populate a message object's *defects* attribute with a list of
any problems it found in a message. See the :mod:`email.errors` module for the
list of defects that it can find.

Here is the API for the :class:`FeedParser`:


.. class:: FeedParser([_factory])

   Create a :class:`FeedParser` instance. Optional *_factory* is a no-argument
   callable that will be called whenever a new message object is needed. It
   defaults to the :class:`email.message.Message` class.


   .. method:: feed(data)

      Feed the :class:`FeedParser` some more data.
      *data* should be a string
      containing one or more lines. The lines can be partial and the
      :class:`FeedParser` will stitch such partial lines together properly. The
      lines in the string can have any of the three common line endings:
      carriage return, newline, or carriage return followed by newline (they
      can even be mixed).


   .. method:: close()

      Closing a :class:`FeedParser` completes the parsing of all previously fed
      data, and returns the root message object. It is undefined what happens
      if you feed more data to a closed :class:`FeedParser`.


Parser class API
^^^^^^^^^^^^^^^^

The :class:`Parser` class, imported from the :mod:`email.parser` module,
provides an API that can be used to parse a message when the complete contents
of the message are available in a string or file. The :mod:`email.parser`
module also provides a second class, called :class:`HeaderParser`, which can be
used if you're only interested in the headers of the message.
:class:`HeaderParser` can be much faster in these situations, since it does not
attempt to parse the message body, instead setting the payload to the raw body
as a string. :class:`HeaderParser` has the same API as the :class:`Parser`
class.


.. class:: Parser([_class])

   The constructor for the :class:`Parser` class takes an optional argument
   *_class*. This must be a callable factory (such as a function or a class), and
   it is used whenever a sub-message object needs to be created. It defaults to
   :class:`~email.message.Message` (see :mod:`email.message`). The factory will
   be called without arguments.

   The optional *strict* flag is ignored.

   .. deprecated:: 2.4
      Because the :class:`Parser` class is a backward compatible API wrapper
      around the new-in-Python 2.4 :class:`FeedParser`, *all* parsing is
      effectively non-strict. You should simply stop passing a *strict* flag to
      the :class:`Parser` constructor.

   .. versionchanged:: 2.2.2
      The *strict* flag was added.

   .. versionchanged:: 2.4
      The *strict* flag was deprecated.

   The other public :class:`Parser` methods are:


   .. method:: parse(fp[, headersonly])

      Read all the data from the file-like object *fp*, parse the resulting
      text, and return the root message object. *fp* must support both the
      :meth:`~io.TextIOBase.readline` and the :meth:`~io.TextIOBase.read`
      methods on file-like objects.

      The text contained in *fp* must be formatted as a block of :rfc:`2822`
      style headers and header continuation lines, optionally preceded by an
      envelope header. The header block is terminated either by the end of the
      data or by a blank line. Following the header block is the body of the
      message (which may contain MIME-encoded subparts).

      Optional *headersonly* is a flag specifying whether to stop parsing after
      reading the headers or not. The default is ``False``, meaning it parses
      the entire contents of the file.

      .. versionchanged:: 2.2.2
         The *headersonly* flag was added.


   .. method:: parsestr(text[, headersonly])

      Similar to the :meth:`parse` method, except it takes a string object
      instead of a file-like object. Calling this method on a string is exactly
      equivalent to wrapping *text* in a :class:`~StringIO.StringIO` instance first and
      calling :meth:`parse`.

      Optional *headersonly* is as with the :meth:`parse` method.

      .. versionchanged:: 2.2.2
         The *headersonly* flag was added.

Since creating a message object structure from a string or a file object is such
a common task, two functions are provided as a convenience. They are available
in the top-level :mod:`email` package namespace.

.. currentmodule:: email

.. function:: message_from_string(s[, _class[, strict]])

   Return a message object structure from a string.
   This is exactly equivalent to
   ``Parser().parsestr(s)``. Optional *_class* and *strict* are interpreted as
   with the :class:`~email.parser.Parser` class constructor.

   .. versionchanged:: 2.2.2
      The *strict* flag was added.


.. function:: message_from_file(fp[, _class[, strict]])

   Return a message object structure tree from an open file object. This is
   exactly equivalent to ``Parser().parse(fp)``. Optional *_class* and *strict*
   are interpreted as with the :class:`~email.parser.Parser` class constructor.

   .. versionchanged:: 2.2.2
      The *strict* flag was added.

Here's an example of how you might use this at an interactive Python prompt::

   >>> import email
   >>> # myString holds the flat text of a message (sample value for illustration)
   >>> myString = 'Subject: Hello\n\nBody text\n'
   >>> msg = email.message_from_string(myString)


Additional notes
^^^^^^^^^^^^^^^^

Here are some notes on the parsing semantics:

* Most non-\ :mimetype:`multipart` type messages are parsed as a single message
  object with a string payload. These objects will return ``False`` for
  :meth:`~email.message.Message.is_multipart`. Their
  :meth:`~email.message.Message.get_payload` method will return a string object.

* All :mimetype:`multipart` type messages will be parsed as a container message
  object with a list of sub-message objects for their payload. The outer
  container message will return ``True`` for
  :meth:`~email.message.Message.is_multipart` and its
  :meth:`~email.message.Message.get_payload` method will return the list of
  :class:`~email.message.Message` subparts.

* Most messages with a content type of :mimetype:`message/\*` (e.g.
  :mimetype:`message/delivery-status` and :mimetype:`message/rfc822`) will also be
  parsed as a container object containing a list payload of length 1. Their
  :meth:`~email.message.Message.is_multipart` method will return ``True``.
  The single element in the list payload will be a sub-message object.


* Some non-standards-compliant messages may not be internally consistent about
  their :mimetype:`multipart`\ -edness. Such messages may have a
  :mailheader:`Content-Type` header of type :mimetype:`multipart`, but their
  :meth:`~email.message.Message.is_multipart` method may return ``False``.
  If such messages were parsed with the :class:`~email.parser.FeedParser`,
  they will have an instance of the
  :class:`~email.errors.MultipartInvariantViolationDefect` class in their
  *defects* attribute list. See :mod:`email.errors` for details.

.. rubric:: Footnotes

.. [#] As of email package version 3.0, introduced in Python 2.4, the classic
   :class:`~email.parser.Parser` was re-implemented in terms of the
   :class:`~email.parser.FeedParser`, so the semantics and results are
   identical between the two parsers.
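
The parsing semantics listed under "Additional notes" can be checked with a
small sketch. All four sample messages, including the ``XXX`` boundary string,
are hypothetical and chosen only for illustration; the last one declares a
:mimetype:`multipart` type without a boundary, which is one way of producing an
internally inconsistent message::

   import email
   from email.errors import MultipartInvariantViolationDefect

   # Hypothetical sample messages, used only for illustration.
   plain_text = "Subject: plain\n\nJust text.\n"
   multipart_text = (
       "MIME-Version: 1.0\n"
       'Content-Type: multipart/mixed; boundary="XXX"\n'
       "\n"
       "--XXX\n"
       "Content-Type: text/plain\n"
       "\n"
       "first part\n"
       "--XXX\n"
       "Content-Type: text/plain\n"
       "\n"
       "second part\n"
       "--XXX--\n"
   )
   wrapped_text = "Content-Type: message/rfc822\n\nSubject: inner\n\ninner body\n"
   broken_text = "Content-Type: multipart/mixed\n\nnot really multipart\n"

   plain = email.message_from_string(plain_text)
   multi = email.message_from_string(multipart_text)
   wrapped = email.message_from_string(wrapped_text)
   broken = email.message_from_string(broken_text)

   # Non-multipart message: string payload.
   plain_payload = plain.get_payload()
   # Multipart message: list of sub-message objects.
   subparts = multi.get_payload()
   # message/rfc822: list payload holding a single sub-message.
   inner = wrapped.get_payload(0)
   # Inconsistent message: look for the defect recorded by the parser.
   has_invariant_defect = any(
       isinstance(defect, MultipartInvariantViolationDefect)
       for defect in broken.defects
   )

With these inputs, ``plain.is_multipart()`` and ``broken.is_multipart()`` should
be false, ``multi`` and ``wrapped`` should report themselves as multipart, and
``has_invariant_defect`` should be true.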