:mod:`email.headerregistry`: Custom Header Objects
--------------------------------------------------

.. module:: email.headerregistry
   :synopsis: Automatic Parsing of headers based on the field name

.. moduleauthor:: R. David Murray <rdmurray@bitdance.com>
.. sectionauthor:: R. David Murray <rdmurray@bitdance.com>

**Source code:** :source:`Lib/email/headerregistry.py`

--------------

.. versionadded:: 3.6 [1]_

Headers are represented by customized subclasses of :class:`str`.  The
particular class used to represent a given header is determined by the
:attr:`~email.policy.EmailPolicy.header_factory` of the policy object in
effect when the headers are created.  This section documents the particular
``header_factory`` implemented by the email package for handling :RFC:`5322`
compliant email messages, which not only provides customized header objects for
various header types, but also provides an extension mechanism for applications
to add their own custom header types.

When using any of the policy objects derived from
:class:`~email.policy.EmailPolicy`, all headers are produced by
:class:`.HeaderRegistry` and have :class:`.BaseHeader` as their last base
class.  Each header class has an additional base class that is determined by
the type of the header.  For example, many headers have the class
:class:`.UnstructuredHeader` as their other base class.  The specialized second
class for a header is determined by the name of the header, using a lookup
table stored in the :class:`.HeaderRegistry`.  All of this is managed
transparently for the typical application program, but interfaces are provided
for modifying the default behavior for use by more complex applications.
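
The following minimal sketch shows this composition from the application side.
It assumes the default policy, :data:`email.policy.default`, and an arbitrary
subject string.  The name of the generated class is an implementation detail,
so the sketch inspects ``__bases__`` rather than relying on the class name.

```python
from email import policy
from email.headerregistry import BaseHeader, UnstructuredHeader
from email.message import EmailMessage

msg = EmailMessage(policy=policy.default)
msg['Subject'] = 'Quarterly report'

subject = msg['Subject']          # a header object produced by HeaderRegistry
bases = type(subject).__bases__   # (specialized class, BaseHeader)

# Header objects are str subclasses, so string methods work unchanged.
shouting = subject.upper()
```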

The sections below first document the header base classes and their attributes,
followed by the API for modifying the behavior of :class:`.HeaderRegistry`, and
finally the support classes used to represent the data parsed from structured
headers.


.. class:: BaseHeader(name, value)

   *name* and *value* are passed to ``BaseHeader`` from the
   :attr:`~email.policy.EmailPolicy.header_factory` call.  The string value of
   any header object is the *value* fully decoded to unicode.

   This base class defines the following read-only properties:


   .. attribute:: name

      The name of the header (the portion of the field before the ':').  This
      is exactly the value passed in the
      :attr:`~email.policy.EmailPolicy.header_factory` call for *name*; that
      is, case is preserved.


   .. attribute:: defects

      A tuple of :exc:`~email.errors.HeaderDefect` instances reporting any
      RFC compliance problems found during parsing.  The email package tries to
      be complete about detecting compliance issues.  See the :mod:`~email.errors`
      module for a discussion of the types of defects that may be reported.


   .. attribute:: max_count

      The maximum number of headers of this type that can have the same
      ``name``.  A value of ``None`` means unlimited.  The ``BaseHeader`` value
      for this attribute is ``None``; it is expected that specialized header
      classes will override this value as needed.

   ``BaseHeader`` also provides the following method, which is called by the
   email library code and should not in general be called by application
   programs:

   .. method:: fold(*, policy)

      Return a string containing :attr:`~email.policy.Policy.linesep`
      characters as required to correctly fold the header according to
      *policy*.  A :attr:`~email.policy.Policy.cte_type` of ``8bit`` will be
      treated as if it were ``7bit``, since headers may not contain arbitrary
      binary data.  If :attr:`~email.policy.EmailPolicy.utf8` is ``False``,
      non-ASCII data will be :rfc:`2047` encoded.


   ``BaseHeader`` by itself cannot be used to create a header object.  It
   defines a protocol that each specialized header cooperates with in order to
   produce the header object.  Specifically, ``BaseHeader`` requires that
   the specialized class provide a :func:`classmethod` named ``parse``.  This
   method is called as follows::

       parse(string, kwds)

   ``kwds`` is a dictionary containing one pre-initialized key, ``defects``.
   ``defects`` is an empty list.  The parse method should append any detected
   defects to this list.  On return, the ``kwds`` dictionary *must* contain
   values for at least the keys ``decoded`` and ``defects``.  ``decoded``
   should be the string value for the header (that is, the header value fully
   decoded to unicode).  The parse method should assume that *string* may
   contain content-transfer-encoded parts, but should correctly handle all valid
   unicode characters as well so that it can parse un-encoded header values.

   ``BaseHeader``'s ``__new__`` then creates the header instance and calls its
   ``init`` method.  The specialized class only needs to provide an ``init``
   method if it wishes to set additional attributes beyond those provided by
   ``BaseHeader`` itself.  Such an ``init`` method should look like this::

       def init(self, /, *args, **kw):
           self._myattr = kw.pop('myattr')
           super().init(*args, **kw)

   That is, anything extra that the specialized class puts into the ``kwds``
   dictionary should be removed and handled, and the remaining contents of
   ``kw`` (and ``args``) should be passed to the ``BaseHeader`` ``init`` method.
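
The protocol above can be sketched with a small custom header type.
``PriorityHeader`` and the ``X-Priority`` field are hypothetical, chosen only
for illustration.  Rather than building a parse tree by hand, the sketch
subclasses :class:`UnstructuredHeader` and delegates to its ``parse`` method so
that every key ``BaseHeader`` expects is filled in; it then adds one extra
``level`` key, which its ``init`` method removes before calling
``super().init``.  The registry is installed by cloning
:data:`email.policy.default` with a new ``header_factory``.

```python
from email import errors, policy
from email.headerregistry import HeaderRegistry, UnstructuredHeader
from email.message import EmailMessage


class PriorityHeader(UnstructuredHeader):
    """Hypothetical header type whose value should be an integer 1-5."""

    max_count = 1  # at most one X-Priority header per message

    @classmethod
    def parse(cls, value, kwds):
        # kwds arrives as {'defects': []}; the parent class fills in the
        # remaining keys BaseHeader needs (including 'decoded').
        super().parse(value, kwds)
        try:
            level = int(kwds['decoded'])
        except ValueError:
            level = None
        if level is None or not 1 <= level <= 5:
            level = None
            kwds['defects'].append(
                errors.HeaderDefect('priority must be an integer from 1 to 5'))
        kwds['level'] = level  # extra key, consumed by init() below

    def init(self, /, *args, **kw):
        self._level = kw.pop('level')
        super().init(*args, **kw)

    @property
    def level(self):
        return self._level


registry = HeaderRegistry()
registry.map_to_type('X-Priority', PriorityHeader)
my_policy = policy.default.clone(header_factory=registry)

msg = EmailMessage(policy=my_policy)
msg['X-Priority'] = '2'
good = msg['X-Priority']                   # parsed via PriorityHeader
bad = registry('X-Priority', 'urgent')     # records a defect; level is None
```

Delegating to an existing header class keeps folding and serialization working
without touching the parser internals; a header with a genuinely new syntax
would need its own parsing logic.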


.. class:: UnstructuredHeader

   An "unstructured" header is the default type of header in :rfc:`5322`.
   Any header that does not have a specified syntax is treated as
   unstructured.  The classic example of an unstructured header is the
   :mailheader:`Subject` header.

   In :rfc:`5322`, an unstructured header is a run of arbitrary text in the
   ASCII character set.  :rfc:`2047`, however, has an :rfc:`5322` compatible
   mechanism for encoding non-ASCII text as ASCII characters within a header
   value.  When a *value* containing encoded words is passed to the
   constructor, the ``UnstructuredHeader`` parser converts such encoded words
   into unicode, following the :rfc:`2047` rules for unstructured text.  The
   parser uses heuristics to attempt to decode certain non-compliant encoded
   words.  Defects are registered in such cases, as well as defects for issues
   such as invalid characters within the encoded words or the non-encoded text.

   This header type provides no additional attributes.
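
A short sketch of the decoding described above, using a standalone
:class:`HeaderRegistry` rather than a message; the subject text is arbitrary.
It also shows the reverse direction: under :data:`email.policy.default`, whose
``utf8`` setting is ``False``, non-ASCII text is encoded again when the message
is serialized.

```python
from email.headerregistry import HeaderRegistry, UnstructuredHeader
from email.message import EmailMessage

registry = HeaderRegistry()

# An RFC 2047 encoded word (UTF-8 charset, 'q' encoding) followed by plain text.
raw = '=?utf-8?q?caf=C3=A9?= menu'
subject = registry('Subject', raw)
decoded = str(subject)              # the encoded word is decoded to unicode

# The other direction: EmailMessage uses email.policy.default (utf8=False),
# so the generator re-encodes the non-ASCII text on output.
msg = EmailMessage()
msg['Subject'] = 'café menu'
wire = msg.as_string()
```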


.. class:: DateHeader

   :rfc:`5322` specifies a precise format for dates within email headers.
   The ``DateHeader`` parser recognizes that date format, as well as
   recognizing a number of variant forms that are sometimes found "in the
   wild".

   This header type provides the following additional attributes:

   .. attribute:: datetime

      If the header value can be recognized as a valid date of one form or
      another, this attribute will contain a :class:`~datetime.datetime`
      instance representing that date.  If the timezone of the input date is
      specified as ``-0000`` (indicating it is in UTC but contains no
      information about the source timezone), then :attr:`.datetime` will be a
      naive :class:`~datetime.datetime`.  If a specific timezone offset is
      found (including ``+0000``), then :attr:`.datetime` will contain an aware
      ``datetime`` that uses :class:`datetime.timezone` to record the timezone
      offset.

   The ``decoded`` value of the header is determined by formatting the
   ``datetime`` according to the :rfc:`5322` rules; that is, it is set to::

       email.utils.format_datetime(self.datetime)

   When creating a ``DateHeader``, *value* may be a
   :class:`~datetime.datetime` instance.  This means, for example, that
   the following code is valid and does what one would expect::

       from datetime import datetime
       msg['Date'] = datetime(2011, 7, 15, 21)

   Because this is a naive ``datetime``, it will be interpreted as a UTC
   timestamp, and the resulting value will have a timezone of ``-0000``.  It is
   usually more useful to use the :func:`~email.utils.localtime` function from
   the :mod:`~email.utils` module::

       from email import utils
       msg['Date'] = utils.localtime()

   This example sets the date header to the current time and date using
   the current timezone offset.
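
The timezone rules above can be checked with a few sketch values.  The dates
are arbitrary; the last case mirrors the naive ``datetime`` assignment shown
above, but passes the ``datetime`` directly to a :class:`HeaderRegistry`
instead of assigning it to a message.

```python
from datetime import datetime, timedelta
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

# A specific offset yields an aware datetime.
aware = registry('Date', 'Fri, 15 Jul 2011 21:00:00 +0200')

# '-0000' means "UTC, source zone unknown", so the datetime is naive.
naive = registry('Date', 'Fri, 15 Jul 2011 21:00:00 -0000')

# A naive datetime passed as the value is rendered with -0000.
from_dt = registry('Date', datetime(2011, 7, 15, 21))
```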


.. class:: AddressHeader

   Address headers are one of the most complex structured header types.
   The ``AddressHeader`` class provides a generic interface to any address
   header.

   This header type provides the following additional attributes:


   .. attribute:: groups

      A tuple of :class:`.Group` objects encoding the
      addresses and groups found in the header value.  Addresses that are
      not part of a group are represented in this tuple as single-address
      ``Group`` objects whose :attr:`~.Group.display_name` is ``None``.


   .. attribute:: addresses

      A tuple of :class:`.Address` objects encoding all
      of the individual addresses from the header value.  If the header value
      contains any groups, the individual addresses from the group are included
      in the tuple at the point where the group occurs in the value (that is,
      the list of addresses is "flattened" into a one-dimensional sequence).

   The ``decoded`` value of the header will have all encoded words decoded to
   unicode.  Domain names encoded with IDNA (see :mod:`encodings.idna`) are also
   decoded to unicode.  The ``decoded`` value is set by :meth:`~str.join`\ ing
   the :class:`str` value of the elements of the ``groups`` attribute with
   ``', '``.

   A list of :class:`.Address` and :class:`.Group` objects in any combination
   may be used to set the value of an address header.  ``Group`` objects whose
   ``display_name`` is ``None`` will be interpreted as single addresses, which
   allows an address list to be copied with groups intact by using the tuple
   obtained from the ``groups`` attribute of the source header.
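
As a sketch of the difference between ``groups`` and ``addresses``, the
following parses a ``To`` value (with placeholder addresses) that contains one
ungrouped mailbox followed by one group.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()
to = registry('To', 'Alice <alice@example.com>, '
                    'Team: bob@example.com, carol@example.com;')

# groups: a pseudo-group (display_name None) for Alice, then the 'Team' group.
group_names = [g.display_name for g in to.groups]

# addresses: every mailbox, flattened in order of appearance.
usernames = [a.username for a in to.addresses]
```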


.. class:: SingleAddressHeader

   A subclass of :class:`.AddressHeader` that adds one
   additional attribute:


   .. attribute:: address

      The single address encoded by the header value.  If the header value
      actually contains more than one address (which would be a violation of
      the RFC under the default :mod:`~email.policy`), accessing this attribute
      will result in a :exc:`ValueError`.
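
A sketch of the ``address`` attribute, including the error case (placeholder
addresses).  The over-full header is still created without error; the
:exc:`ValueError` is raised only when ``address`` is accessed.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()
sender = registry('Sender', 'Alice <alice@example.com>')
addr = sender.address              # the one Address object

# Two addresses in a single-address header parse, but .address refuses.
too_many = registry('Sender', 'alice@example.com, bob@example.com')
try:
    too_many.address
except ValueError:
    rejected = True
else:
    rejected = False
```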


Many of the above classes also have a ``Unique`` variant (for example,
``UniqueUnstructuredHeader``).  The only difference is that in the ``Unique``
variant, :attr:`~.BaseHeader.max_count` is set to 1.
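
The effect of ``max_count`` can be seen both on the classes the registry
produces and when setting headers on a message.  This sketch relies on
:class:`~email.message.EmailMessage` using :data:`email.policy.default`, which
consults ``max_count`` when a header is added.

```python
from email.headerregistry import HeaderRegistry
from email.message import EmailMessage

registry = HeaderRegistry()
subject_max = registry['Subject'].max_count          # UniqueUnstructuredHeader
resent_date_max = registry['Resent-Date'].max_count  # DateHeader: no limit

msg = EmailMessage()
msg['Subject'] = 'first'
try:
    msg['Subject'] = 'second'   # a second Subject exceeds max_count == 1
except ValueError:
    duplicate_rejected = True
else:
    duplicate_rejected = False
```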


.. class:: MIMEVersionHeader

   There is really only one valid value for the :mailheader:`MIME-Version`
   header, and that is ``1.0``.  For future-proofing, this header class
   supports other valid version numbers.  If a version number has a valid value
   per :rfc:`2045`, then the header object will have non-``None`` values for
   the following attributes:

   .. attribute:: version

      The version number as a string, with any whitespace and/or comments
      removed.

   .. attribute:: major

      The major version number as an integer.

   .. attribute:: minor

      The minor version number as an integer.
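
A sketch of these attributes for a typical value carrying a trailing comment
(the comment text is arbitrary).

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()
mv = registry('MIME-Version', '1.0 (generated by example code)')

# version drops the comment; major and minor are ints.
parts = (mv.version, mv.major, mv.minor)
```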


.. class:: ParameterizedMIMEHeader

    Apart from :mailheader:`MIME-Version`, the MIME headers all start with the
    prefix ``Content-``.  Each specific header has its own value syntax,
    described under the class for that header.  Some can also take a list of
    supplemental parameters, which have a common format.  This class serves as
    a base for all the MIME headers that take parameters.

    .. attribute:: params

       A dictionary mapping parameter names to parameter values.


.. class:: ContentTypeHeader

    A :class:`ParameterizedMIMEHeader` class that handles the
    :mailheader:`Content-Type` header.

    .. attribute:: content_type

       The content type string, in the form ``maintype/subtype``.

    .. attribute:: maintype

       The main type portion of :attr:`content_type` (for example, ``text``).

    .. attribute:: subtype

       The subtype portion of :attr:`content_type` (for example, ``plain``).
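
A sketch of the ``Content-Type`` attributes, including the ``params`` mapping
inherited from :class:`ParameterizedMIMEHeader`.  Quoting is removed from
parameter values, so ``charset`` comes back as plain ``utf-8``.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()
ct = registry('Content-Type', 'text/plain; charset="utf-8"; format=flowed')

kind = (ct.content_type, ct.maintype, ct.subtype)
params = dict(ct.params)   # copy the mapping into a plain dict
```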


.. class:: ContentDispositionHeader

    A :class:`ParameterizedMIMEHeader` class that handles the
    :mailheader:`Content-Disposition` header.

    .. attribute:: content_disposition

       The disposition type.  ``inline`` and ``attachment`` are the only valid
       values in common use.
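
The same pattern applies to ``Content-Disposition``; this sketch uses a
placeholder filename.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()
cd = registry('Content-Disposition', 'attachment; filename="report.pdf"')

disposition = cd.content_disposition
filename = cd.params['filename']   # quotes removed
```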


.. class:: ContentTransferEncodingHeader

   Handles the :mailheader:`Content-Transfer-Encoding` header.

   .. attribute:: cte

      Valid values are ``7bit``, ``8bit``, ``binary``, ``base64``, and
      ``quoted-printable``.  See :rfc:`2045` for more information.
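
A sketch of the ``cte`` attribute for the class the default mapping assigns to
this header, ``ContentTransferEncodingHeader``.  The value is written in mixed
case to show that, in current CPython, ``cte`` is lower-cased while the string
value of the header keeps the original spelling.

```python
from email.headerregistry import ContentTransferEncodingHeader, HeaderRegistry

registry = HeaderRegistry()
cte = registry('Content-Transfer-Encoding', 'Base64')

normalized = cte.cte      # lower-cased mechanism name
as_written = str(cte)     # the header's string value as given
```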



.. class:: HeaderRegistry(base_class=BaseHeader, \
                          default_class=UnstructuredHeader, \
                          use_default_map=True)

    This is the factory used by :class:`~email.policy.EmailPolicy` by default.
    ``HeaderRegistry`` builds the class used to create a header instance
    dynamically, using *base_class* and a specialized class retrieved from a
    registry that it holds.  When a given header name does not appear in the
    registry, the class specified by *default_class* is used as the specialized
    class.  When *use_default_map* is ``True`` (the default), the standard
    mapping of header names to classes is copied into the registry during
    initialization.  *base_class* is always the last class in the generated
    class's ``__bases__`` tuple.

    The default mappings are:

      :subject:                   UniqueUnstructuredHeader
      :date:                      UniqueDateHeader
      :resent-date:               DateHeader
      :orig-date:                 UniqueDateHeader
      :sender:                    UniqueSingleAddressHeader
      :resent-sender:             SingleAddressHeader
      :to:                        UniqueAddressHeader
      :resent-to:                 AddressHeader
      :cc:                        UniqueAddressHeader
      :resent-cc:                 AddressHeader
      :bcc:                       UniqueAddressHeader
      :resent-bcc:                AddressHeader
      :from:                      UniqueAddressHeader
      :resent-from:               AddressHeader
      :reply-to:                  UniqueAddressHeader
      :mime-version:              MIMEVersionHeader
      :content-type:              ContentTypeHeader
      :content-disposition:       ContentDispositionHeader
      :content-transfer-encoding: ContentTransferEncodingHeader
      :message-id:                MessageIDHeader

    ``HeaderRegistry`` has the following methods:


    .. method:: map_to_type(name, cls)

       *name* is the name of the header to be mapped.  It will be converted to
       lower case in the registry.  *cls* is the specialized class to be used,
       along with *base_class*, to create the class used to instantiate headers
       that match *name*.


    .. method:: __getitem__(name)

       Construct and return a class to handle creating a *name* header.


    .. method:: __call__(name, value)

       Retrieve the specialized header class associated with *name* from the
       registry (using *default_class* if *name* does not appear in the
       registry), compose it with *base_class* to produce a class, call the
       constructed class's constructor with the same argument list, and return
       the resulting class instance.
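
The composition performed by ``__getitem__`` and ``__call__`` can be inspected
directly.  In this sketch, ``X-Not-Registered`` and ``X-Mailer`` are arbitrary
example names.

```python
from email.headerregistry import (BaseHeader, HeaderRegistry,
                                  UniqueAddressHeader,
                                  UniqueUnstructuredHeader,
                                  UnstructuredHeader)

registry = HeaderRegistry()

# __getitem__ composes (specialized class, base_class) into a new class.
to_cls = registry['To']
unknown_cls = registry['X-Not-Registered']    # falls back to default_class

# map_to_type adds or overrides an entry; lookups are case-insensitive.
registry.map_to_type('X-Mailer', UniqueUnstructuredHeader)
mailer = registry('x-mailer', 'example-mailer 1.0')

# Without the default map, every name falls back to default_class.
bare = HeaderRegistry(use_default_map=False)
bare_date_cls = bare['Date']
```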


The following classes are used to represent data parsed from structured
headers and can, in general, be used by an application program to construct
structured values to assign to specific headers.


.. class:: Address(display_name='', username='', domain='', addr_spec=None)

   The class used to represent an email address.  The general form of an
   address is::

      [display_name] <username@domain>

   or::

      username@domain

   where each part must conform to specific syntax rules spelled out in
   :rfc:`5322`.

   As a convenience, *addr_spec* can be specified instead of *username* and
   *domain*, in which case *username* and *domain* will be parsed from the
   *addr_spec*.  An *addr_spec* must be a properly quoted RFC string; if it is
   not, ``Address`` will raise an error.  Unicode characters are allowed and
   will be properly encoded when serialized.  However, per the RFCs, unicode is
   *not* allowed in the username portion of the address.

   .. attribute:: display_name

      The display name portion of the address, if any, with all quoting
      removed.  If the address does not have a display name, this attribute
      will be an empty string.

   .. attribute:: username

      The ``username`` portion of the address, with all quoting removed.

   .. attribute:: domain

      The ``domain`` portion of the address.

   .. attribute:: addr_spec

      The ``username@domain`` portion of the address, correctly quoted
      for use as a bare address (the second form shown above).  This
      attribute is not mutable.

   .. method:: __str__()

      The ``str`` value of the object is the address quoted according to
      :rfc:`5322` rules, but with no Content Transfer Encoding of any non-ASCII
      characters.

   To support SMTP (:rfc:`5321`), ``Address`` handles one special case: if
   ``username`` and ``domain`` are both the empty string (or ``None``), then
   the string value of the ``Address`` is ``<>``.
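
A sketch of constructing addresses from parts and from an *addr_spec*.  The
names and the ``example.com`` domain are placeholders.

```python
from email.headerregistry import Address

alice = Address('Alice Example', 'alice', 'example.com')
bob = Address(addr_spec='bob@example.com')                # parts parsed out
smith = Address('Smith, John', 'jsmith', 'example.com')   # ',' forces quoting
null = Address()                                          # SMTP null path
```

The display name ``Smith, John`` contains a comma, one of the :rfc:`5322`
special characters, so ``__str__`` wraps it in double quotes.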


.. class:: Group(display_name=None, addresses=None)

   The class used to represent an address group.  The general form of an
   address group is::

     display_name: [address-list];

   As a convenience for processing lists of addresses that consist of a mixture
   of groups and single addresses, a ``Group`` may also be used to represent
   single addresses that are not part of a group by setting *display_name* to
   ``None`` and providing a list of the single address as *addresses*.

   .. attribute:: display_name

      The ``display_name`` of the group.  If it is ``None`` and there is
      exactly one ``Address`` in ``addresses``, then the ``Group`` represents a
      single address that is not in a group.

   .. attribute:: addresses

      A possibly empty tuple of :class:`.Address` objects representing the
      addresses in the group.

   .. method:: __str__()

      The ``str`` value of a ``Group`` is formatted according to :rfc:`5322`,
      but with no Content Transfer Encoding of any non-ASCII characters.  If
      ``display_name`` is ``None`` and there is a single ``Address`` in the
      ``addresses`` list, the ``str`` value will be the same as the ``str`` of
      that single ``Address``.
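
A sketch of building groups and using them, mixed with a single address, as
the value of an address header.  The group and mailbox names are placeholders;
copying the ``groups`` tuple into ``Cc`` is the technique described under
:class:`AddressHeader`.

```python
from email.headerregistry import Address, Group
from email.message import EmailMessage

alice = Group(None, [Address('Alice', 'alice', 'example.com')])  # lone address
team = Group('Team', [Address('Bob', 'bob', 'example.com'),
                      Address(addr_spec='carol@example.com')])
empty = Group('Undisclosed recipients')                         # no members

msg = EmailMessage()
msg['To'] = [alice, team]
# Copy the address list, groups intact, by reusing the parsed groups.
msg['Cc'] = msg['To'].groups
```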


.. rubric:: Footnotes

.. [1] Originally added in 3.3 as a :term:`provisional module <provisional
       package>`.
