:mod:`email.policy`: Policy Objects
-----------------------------------

.. module:: email.policy
   :synopsis: Controlling the parsing and generating of messages

.. moduleauthor:: R. David Murray <rdmurray@bitdance.com>
.. sectionauthor:: R. David Murray <rdmurray@bitdance.com>

.. versionadded:: 3.3

**Source code:** :source:`Lib/email/policy.py`

--------------

The :mod:`email` package's prime focus is the handling of email messages as
described by the various email and MIME RFCs.  However, the general format of
email messages (a block of header fields each consisting of a name followed by
a colon followed by a value, the whole block followed by a blank line and an
arbitrary 'body') is a format that has found utility outside of the realm of
email.  Some of these uses conform fairly closely to the main email RFCs, some
do not.  Even when working with email, there are times when it is desirable to
break strict compliance with the RFCs, such as generating emails that
interoperate with email servers that do not themselves follow the standards, or
that implement extensions you want to use in ways that violate the
standards.

Policy objects give the email package the flexibility to handle all these
disparate use cases.

A :class:`Policy` object encapsulates a set of attributes and methods that
control the behavior of various components of the email package during use.
:class:`Policy` instances can be passed to various classes and methods in the
email package to alter the default behavior.  The settable values and their
defaults are described below.

There is a default policy used by all classes in the email package.  For all of
the :mod:`~email.parser` classes and the related convenience functions, and for
the :class:`~email.message.Message` class, this is the :class:`Compat32`
policy, via its corresponding pre-defined instance :const:`compat32`.  This
policy provides for complete backward compatibility (in some cases, including
bug compatibility) with the pre-Python 3.3 version of the email package.

The default value for the *policy* keyword to
:class:`~email.message.EmailMessage` is the :class:`EmailPolicy` policy, via
its pre-defined instance :data:`~default`.

When a :class:`~email.message.Message` or :class:`~email.message.EmailMessage`
object is created, it acquires a policy.  If the message is created by a
:mod:`~email.parser`, a policy passed to the parser will be the policy used by
the message it creates.  If the message is created by the program, then the
policy can be specified when it is created.  When a message is passed to a
:mod:`~email.generator`, the generator uses the policy from the message by
default, but you can also pass a specific policy to the generator that will
override the one stored on the message object.
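
A minimal sketch of both halves of this rule, assuming Python 3.6 or later;
the message text is made up for illustration.  The parser stores its policy
on the message it builds, and a policy handed to
:class:`~email.generator.Generator` applies only to that one serialization:

.. code-block:: python

   import io
   from email import message_from_string, policy
   from email.generator import Generator

   source = "To: abc@xyz.com\nSubject: hi\n\nbody\n"

   # The parser passes its policy on to the message it creates.
   msg = message_from_string(source, policy=policy.default)
   assert msg.policy is policy.default

   # A policy given to the generator overrides msg.policy for this call only.
   buf = io.StringIO()
   Generator(buf, policy=msg.policy.clone(linesep="\r\n")).flatten(msg)
   crlf_text = buf.getvalue()

   # The message itself still carries the original policy.
   assert msg.policy.linesep == "\n"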

The default value for the *policy* keyword for the :mod:`email.parser` classes
and the parser convenience functions **will be changing** in a future version of
Python.  Therefore you should **always specify explicitly which policy you want
to use** when calling any of the classes and functions described in the
:mod:`~email.parser` module.
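
The difference is easy to observe.  In this sketch (the sample message is
illustrative) the same text is parsed once relying on the current parser
default and once with an explicit policy; constructing the message classes
directly shows their own defaults as well:

.. code-block:: python

   from email import message_from_string, policy
   from email.message import EmailMessage, Message

   source = "To: Ann <ann@example.com>\n\nbody\n"

   # Constructed directly, each class picks up its own default policy.
   assert Message().policy is policy.compat32
   assert EmailMessage().policy is policy.default

   # No policy given: today this means compat32, which may change later.
   implicit = message_from_string(source)
   # Explicit policy: the result does not depend on the parser default.
   explicit = message_from_string(source, policy=policy.default)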

The first part of this documentation covers the features of :class:`Policy`, an
:term:`abstract base class` that defines the features that are common to all
policy objects, including :const:`compat32`.  This includes certain hook
methods that are called internally by the email package, which a custom policy
could override to obtain different behavior.  The second part describes the
concrete classes :class:`EmailPolicy` and :class:`Compat32`, which implement
the hooks that provide the standard behavior and the backward compatible
behavior and features, respectively.

:class:`Policy` instances are immutable, but they can be cloned, accepting the
same keyword arguments as the class constructor and returning a new
:class:`Policy` instance that is a copy of the original but with the specified
attribute values changed.
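
For example, here is a sketch of both properties using the pre-defined
:data:`default` instance (described below).  Assigning to an attribute is
refused, while :meth:`~Policy.clone` returns a separate, modified instance:

.. code-block:: python

   from email import policy

   base = policy.default

   try:
       base.linesep = "\r\n"          # policy instances are read-only
   except AttributeError:
       mutated = False
   else:
       mutated = True

   crlf = base.clone(linesep="\r\n")  # new instance; base is unchanged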

As an example, the following code could be used to read an email message from a
file on disk and pass it to the system ``sendmail`` program on a Unix system:

.. testsetup::

   from unittest import mock
   mocker = mock.patch('subprocess.Popen')
   m = mocker.start()
   proc = mock.MagicMock()
   m.return_value = proc
   proc.stdin.close.return_value = None
   mymsg = open('mymsg.txt', 'w')
   mymsg.write('To: abc@xyz.com\n\n')
   mymsg.flush()

.. doctest::

   >>> from email import message_from_binary_file
   >>> from email.generator import BytesGenerator
   >>> from email import policy
   >>> from subprocess import Popen, PIPE
   >>> with open('mymsg.txt', 'rb') as f:
   ...     msg = message_from_binary_file(f, policy=policy.default)
   >>> p = Popen(['sendmail', msg['To'].addresses[0].addr_spec], stdin=PIPE)
   >>> g = BytesGenerator(p.stdin, policy=msg.policy.clone(linesep='\r\n'))
   >>> g.flatten(msg)
   >>> p.stdin.close()
   >>> rc = p.wait()

.. testcleanup::

   mymsg.close()
   mocker.stop()
   import os
   os.remove('mymsg.txt')

Here we are telling :class:`~email.generator.BytesGenerator` to use the
RFC-correct line separator characters when creating the binary string to feed
into the ``stdin`` of ``sendmail``, where the default policy would use ``\n``
line separators.

Some email package methods accept a *policy* keyword argument, allowing the
policy to be overridden for that method.  For example, the following code uses
the :meth:`~email.message.Message.as_bytes` method of the *msg* object from
the previous example and writes the message to a file using the native line
separators for the platform on which it is running::

   >>> import os
   >>> with open('converted.txt', 'wb') as f:
   ...     f.write(msg.as_bytes(policy=msg.policy.clone(linesep=os.linesep)))
   17

Policy objects can also be combined using the addition operator, producing a
policy object whose settings are a combination of the non-default values of the
summed objects::

   >>> compat_SMTP = policy.compat32.clone(linesep='\r\n')
   >>> compat_strict = policy.compat32.clone(raise_on_defect=True)
   >>> compat_strict_SMTP = compat_SMTP + compat_strict

This operation is not commutative; that is, the order in which the objects are
added matters.  To illustrate::

   >>> policy100 = policy.compat32.clone(max_line_length=100)
   >>> policy80 = policy.compat32.clone(max_line_length=80)
   >>> apolicy = policy100 + policy80
   >>> apolicy.max_line_length
   80
   >>> apolicy = policy80 + policy100
   >>> apolicy.max_line_length
   100


.. class:: Policy(**kw)

   This is the :term:`abstract base class` for all policy classes.  It provides
   default implementations for a couple of trivial methods, as well as the
   implementation of the immutability property, the :meth:`clone` method, and
   the constructor semantics.

   The constructor of a policy class can be passed various keyword arguments.
   The arguments that may be specified are any non-method properties on this
   class, plus any additional non-method properties on the concrete class.  A
   value specified in the constructor will override the default value for the
   corresponding attribute.
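
   Here is a short sketch of the constructor rules, using the classes
   documented below.  :attr:`~EmailPolicy.utf8` exists only on
   :class:`EmailPolicy`, so :class:`Compat32` rejects it with a
   :exc:`TypeError`:

   .. code-block:: python

      from email.policy import Compat32, EmailPolicy

      p = EmailPolicy(linesep="\r\n", utf8=True)  # both are EmailPolicy attributes

      error = None
      try:
          Compat32(utf8=True)                      # no such attribute on Compat32
      except TypeError as exc:
          error = str(exc)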

   This class defines the following properties, and thus values for the
   following may be passed in the constructor of any policy class:


   .. attribute:: max_line_length

      The maximum length of any line in the serialized output, not counting the
      end of line character(s).  Default is 78, per :rfc:`5322`.  A value of
      ``0`` or :const:`None` indicates that no line wrapping should be
      done at all.
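
      As an illustration (a sketch; the subject text is arbitrary), the same
      header is wrapped at 78 characters under the default policy but emitted
      on one line once ``max_line_length`` is ``None``:

      .. code-block:: python

         from email import policy
         from email.message import EmailMessage

         subject = " ".join(["word"] * 40)       # about 200 characters
         msg = EmailMessage(policy=policy.default)
         msg["Subject"] = subject

         wrapped = msg.as_string()
         unwrapped = msg.as_string(policy=policy.default.clone(max_line_length=None))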


   .. attribute:: linesep

      The string to be used to terminate lines in serialized output.  The
      default is ``\n`` because that's the internal end-of-line discipline used
      by Python, though ``\r\n`` is required by the RFCs.


   .. attribute:: cte_type

      Controls the type of Content Transfer Encodings that may be or are
      required to be used.  The default is ``8bit``.  The possible values are:

      .. tabularcolumns:: |l|L|

      ========  ===============================================================
      ``7bit``  all data must be "7 bit clean" (ASCII-only).  This means that
                where necessary data will be encoded using either
                quoted-printable or base64 encoding.

      ``8bit``  data is not constrained to be 7 bit clean.  Data in headers is
                still required to be ASCII-only and so will be encoded (see
                :meth:`fold_binary` and :attr:`~EmailPolicy.utf8` below for
                exceptions), but body parts may use the ``8bit`` CTE.
      ========  ===============================================================

      A ``cte_type`` value of ``8bit`` only works with ``BytesGenerator``, not
      ``Generator``, because strings cannot contain binary data.  If a
      ``Generator`` is operating under a policy that specifies
      ``cte_type=8bit``, it will act as if ``cte_type`` is ``7bit``.
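
      A sketch of that difference, assuming the default :class:`EmailPolicy`
      (whose ``cte_type`` is ``8bit``) and an arbitrary non-ASCII body.
      :meth:`~email.message.Message.as_bytes` uses ``BytesGenerator`` and
      keeps the ``8bit`` body, while :meth:`~email.message.Message.as_string`
      uses ``Generator`` and re-encodes it:

      .. code-block:: python

         from email import policy
         from email.message import EmailMessage

         msg = EmailMessage(policy=policy.default)
         msg.set_content("café\n")       # short non-ASCII line, so the 8bit CTE is chosen

         as_bytes = msg.as_bytes()       # BytesGenerator: body written as raw UTF-8
         as_text = msg.as_string()       # Generator: body re-encoded as if 7bit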


   .. attribute:: raise_on_defect

      If :const:`True`, any defects encountered will be raised as errors.  If
      :const:`False` (the default), defects will be passed to the
      :meth:`register_defect` method.
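
      The following sketch (the malformed message is made up) parses the
      same text twice.  With the default ``False`` the defect is recorded in
      the message's ``defects`` list; with ``True`` parsing stops with the
      defect raised as an exception:

      .. code-block:: python

         from email import errors, message_from_string, policy

         # The second line is neither a header nor the blank separator line.
         source = "Subject: hi\nthis line is not a header\n\nbody\n"

         lenient = message_from_string(source, policy=policy.default)
         recorded = [type(d).__name__ for d in lenient.defects]

         try:
             message_from_string(source, policy=policy.default.clone(raise_on_defect=True))
         except errors.MessageDefect as exc:
             raised = type(exc).__name__
         else:
             raised = None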


   .. attribute:: mangle_from_

      If :const:`True`, lines starting with *"From "* in the body are
      escaped by putting a ``>`` in front of them. This parameter is used when
      the message is being serialized by a generator.
      Default: :const:`False`.

      .. versionadded:: 3.5
         The *mangle_from_* parameter.
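
      A sketch comparing the two settings (the body text is illustrative).
      It calls :class:`~email.generator.Generator` directly because, in
      current CPython, :meth:`~email.message.Message.as_string` passes
      ``mangle_from_=False`` to its generator explicitly, which overrides the
      policy.  ``flatten`` is a helper defined only for this sketch:

      .. code-block:: python

         import io
         from email import message_from_string, policy
         from email.generator import Generator

         def flatten(msg, pol):
             buf = io.StringIO()
             # mangle_from_ is not passed, so the generator takes it from pol.
             Generator(buf, policy=pol).flatten(msg)
             return buf.getvalue()

         msg = message_from_string("Subject: x\n\nFrom the start\nok\n",
                                   policy=policy.default)
         plain = flatten(msg, policy.default)
         mangled = flatten(msg, policy.default.clone(mangle_from_=True))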


   .. attribute:: message_factory

      A factory function for constructing a new empty message object.  Used
      by the parser when building messages.  Defaults to ``None``, in
      which case :class:`~email.message.Message` is used.

      .. versionadded:: 3.6
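
      Here is a sketch of plugging in a message class.  ``TaggedMessage`` is a
      hypothetical subclass defined only for this illustration:

      .. code-block:: python

         from email import message_from_string, policy
         from email.message import EmailMessage, Message

         class TaggedMessage(EmailMessage):
             """Hypothetical subclass, used only to see which class the parser builds."""

         source = "Subject: x\n\nbody\n"
         custom = message_from_string(
             source, policy=policy.default.clone(message_factory=TaggedMessage))
         legacy = message_from_string(source, policy=policy.compat32)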

   The following :class:`Policy` method is intended to be called by code using
   the email library to create policy instances with custom settings:


   .. method:: clone(**kw)

      Return a new :class:`Policy` instance whose attributes have the same
      values as the current instance, except where those attributes are
      given new values by the keyword arguments.


   The remaining :class:`Policy` methods are called by the email package code,
   and are not intended to be called by an application using the email package.
   A custom policy must implement each of these methods that has no default
   implementation.


   .. method:: handle_defect(obj, defect)

      Handle a *defect* found on *obj*.  When the email package calls this
      method, *defect* will always be an instance of a subclass of
      :class:`~email.errors.MessageDefect`.

      The default implementation checks the :attr:`raise_on_defect` flag.  If
      it is ``True``, *defect* is raised as an exception.  If it is ``False``
      (the default), *obj* and *defect* are passed to :meth:`register_defect`.


   .. method:: register_defect(obj, defect)

      Register a *defect* on *obj*.  In the email package, *defect* will always
      be an instance of a subclass of :class:`~email.errors.MessageDefect`.

      The default implementation calls the ``append`` method of the ``defects``
      attribute of *obj*.  When the email package calls :meth:`handle_defect`,
      *obj* will normally have a ``defects`` attribute that has an ``append``
      method.  Custom object types used with the email package (for example,
      custom ``Message`` objects) should also provide such an attribute,
      otherwise defects in parsed messages will raise unexpected errors.
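
      As a sketch of overriding this hook, the hypothetical
      ``RecordingPolicy`` below also logs each defect's class name before
      delegating to the default implementation.  The log is a module-level
      list because policy instances are immutable:

      .. code-block:: python

         from email import message_from_string
         from email.policy import EmailPolicy

         seen = []   # module-level log; policy instances cannot be mutated

         class RecordingPolicy(EmailPolicy):
             """Hypothetical policy that records defect names, then registers them as usual."""

             def register_defect(self, obj, defect):
                 seen.append(type(defect).__name__)
                 super().register_defect(obj, defect)   # obj.defects.append(defect)

         msg = message_from_string("Subject: hi\nnot a header line\n\nbody\n",
                                   policy=RecordingPolicy())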


   .. method:: header_max_count(name)

      Return the maximum allowed number of headers named *name*.

      Called when a header is added to an :class:`~email.message.EmailMessage`
      or :class:`~email.message.Message` object.  If the returned value is not
      ``0`` or ``None``, and there are already a number of headers with the
      name *name* greater than or equal to the value returned, a
      :exc:`ValueError` is raised.

      Because the default behavior of ``Message.__setitem__`` is to append the
      value to the list of headers, it is easy to create duplicate headers
      without realizing it.  This method allows certain headers to be limited
      in the number of instances of that header that may be added to a
      ``Message`` programmatically.  (The limit is not observed by the parser,
      which will faithfully produce as many headers as exist in the message
      being parsed.)

      The default implementation returns ``None`` for all header names.
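
      The effect can be seen in this sketch with the stock policies (header
      values are illustrative).  :class:`EmailPolicy` allows one ``Subject``,
      :class:`Compat32` has no limit, and the parser ignores the limit:

      .. code-block:: python

         from email import message_from_string, policy
         from email.message import EmailMessage, Message

         limited = EmailMessage(policy=policy.default)
         limited["Subject"] = "first"
         try:
             limited["Subject"] = "second"    # Subject may appear only once here
         except ValueError:
             blocked = True
         else:
             blocked = False

         unlimited = Message(policy=policy.compat32)
         unlimited["Subject"] = "first"
         unlimited["Subject"] = "second"      # allowed: no limit under compat32

         parsed = message_from_string("Subject: a\nSubject: b\n\n", policy=policy.default)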


   .. method:: header_source_parse(sourcelines)

      The email package calls this method with a list of strings, each string
      ending with the line separation characters found in the source being
      parsed.  The first line includes the field header name and separator.
      All whitespace in the source is preserved.  The method should return the
      ``(name, value)`` tuple that is to be stored in the ``Message`` to
      represent the parsed header.

      If an implementation wishes to retain compatibility with the existing
      email package policies, *name* should be the case-preserved name (all
      characters up to the '``:``' separator), while *value* should be the
      unfolded value (all line separator characters removed, but whitespace
      kept intact), stripped of leading whitespace.

      *sourcelines* may contain surrogateescaped binary data.

      There is no default implementation.


   .. method:: header_store_parse(name, value)

      The email package calls this method with the name and value provided by
      the application program when the application program is modifying a
      ``Message`` programmatically (as opposed to a ``Message`` created by a
      parser).  The method should return the ``(name, value)`` tuple that is to
      be stored in the ``Message`` to represent the header.

      If an implementation wishes to retain compatibility with the existing
      email package policies, the *name* and *value* should be strings or
      string subclasses that do not change the content of the passed in
      arguments.

      There is no default implementation.


   .. method:: header_fetch_parse(name, value)

      The email package calls this method with the *name* and *value* currently
      stored in the ``Message`` when that header is requested by the
      application program, and whatever the method returns is what is passed
      back to the application as the value of the header being retrieved.
      Note that there may be more than one header with the same name stored in
      the ``Message``; the method is passed the specific name and value of the
      header destined to be returned to the application.

      *value* may contain surrogateescaped binary data.  There should be no
      surrogateescaped binary data in the value returned by the method.

      There is no default implementation.


   .. method:: fold(name, value)

      The email package calls this method with the *name* and *value* currently
      stored in the ``Message`` for a given header.  The method should return a
      string that represents that header "folded" correctly (according to the
      policy settings) by composing the *name* with the *value* and inserting
      :attr:`linesep` characters at the appropriate places.  See :rfc:`5322`
      for a discussion of the rules for folding email headers.

      *value* may contain surrogateescaped binary data.  There should be no
      surrogateescaped binary data in the string returned by the method.


   .. method:: fold_binary(name, value)

      The same as :meth:`fold`, except that the returned value should be a
      bytes object rather than a string.

      *value* may contain surrogateescaped binary data.  These could be
      converted back into binary data in the returned bytes object.
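
   To make the hook contract concrete, here is a minimal sketch of a custom
   policy.  ``LowerNamePolicy`` is hypothetical.  It implements only the five
   hooks that have no default, stores header names lower-cased, keeps values
   as plain strings, and never re-wraps lines.  It illustrates the interface
   and is not a standards-compliant policy:

   .. code-block:: python

      from email import message_from_string
      from email.policy import Policy

      class LowerNamePolicy(Policy):
          """Hypothetical policy: lower-cased names, plain str values, no re-wrapping."""

          def header_source_parse(self, sourcelines):
              # Same split as the stock policies, but lower-case the name.
              name, value = sourcelines[0].split(":", 1)
              value = value.lstrip(" \t") + "".join(sourcelines[1:])
              return (name.lower(), value.rstrip("\r\n"))

          def header_store_parse(self, name, value):
              return (name.lower(), value)

          def header_fetch_parse(self, name, value):
              return value

          def fold(self, name, value):
              # The value is written as stored; real policies wrap long values.
              return name + ": " + value + self.linesep

          def fold_binary(self, name, value):
              return self.fold(name, value).encode("ascii", "surrogateescape")

      try:
          Policy()                 # abstract: the hooks above are missing
      except TypeError:
          is_abstract = True
      else:
          is_abstract = False

      msg = message_from_string("Subject: Hi\n\nbody\n", policy=LowerNamePolicy())
      msg["X-Tag"] = "demo"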



.. class:: EmailPolicy(**kw)

   This concrete :class:`Policy` provides behavior that is intended to be fully
   compliant with the current email RFCs.  These include (but are not limited
   to) :rfc:`5322`, :rfc:`2047`, and the current MIME RFCs.

   This policy adds new header parsing and folding algorithms.  Instead of
   simple strings, headers are ``str`` subclasses with attributes that depend
   on the type of the field.  The parsing and folding algorithms fully implement
   :rfc:`2047` and :rfc:`5322`.

   The default value for the :attr:`~email.policy.Policy.message_factory`
   attribute is :class:`~email.message.EmailMessage`.

   In addition to the settable attributes listed above that apply to all
   policies, this policy adds the following additional attributes:

   .. versionadded:: 3.6 [1]_


   .. attribute:: utf8

      If ``False``, follow :rfc:`5322`, supporting non-ASCII characters in
      headers by encoding them as "encoded words".  If ``True``, follow
      :rfc:`6532` and use ``utf-8`` encoding for headers.  Messages
      formatted in this way may be passed to SMTP servers that support
      the ``SMTPUTF8`` extension (:rfc:`6531`).  The default is ``False``.
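
      A sketch of the two behaviors with an arbitrary non-ASCII subject.
      :data:`SMTPUTF8` (described below) is the pre-defined instance with
      ``utf8`` set to ``True``:

      .. code-block:: python

         from email import policy
         from email.message import EmailMessage

         msg = EmailMessage(policy=policy.default)
         msg["Subject"] = "café"

         ascii_bytes = msg.as_bytes()                         # utf8=False: encoded word
         utf8_bytes = msg.as_bytes(policy=policy.SMTPUTF8)    # utf8=True: raw UTF-8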


   .. attribute:: refold_source

      If the value for a header in the ``Message`` object originated from a
      :mod:`~email.parser` (as opposed to being set by a program), this
      attribute indicates whether or not a generator should refold that value
      when transforming the message back into serialized form.  The possible
      values are:

      ========  ===============================================================
      ``none``  all source values use original folding

      ``long``  source values that have any line that is longer than
                ``max_line_length`` will be refolded

      ``all``   all values are refolded.
      ========  ===============================================================

      The default is ``long``.
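
      A sketch with an overlong source header (the text is arbitrary).
      ``none`` reproduces the original single line, while the default
      ``long`` refolds it to fit :attr:`~Policy.max_line_length`:

      .. code-block:: python

         from email import message_from_string, policy

         long_line = "Subject: " + " ".join(["word"] * 30)   # 158 characters
         msg = message_from_string(long_line + "\n\nbody\n", policy=policy.default)

         as_source = msg.as_string(policy=policy.default.clone(refold_source="none"))
         refolded = msg.as_string()              # refold_source="long" by default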


   .. attribute:: header_factory

      A callable that takes two arguments, ``name`` and ``value``, where
      ``name`` is a header field name and ``value`` is an unfolded header field
      value, and returns a string subclass that represents that header.  A
      default ``header_factory`` (see :mod:`~email.headerregistry`) is provided
      that supports custom parsing for the various address and date :RFC:`5322`
      header field types, and the major MIME header field types.  Support for
      additional custom parsing will be added in the future.
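
      A sketch of what the default factory produces for a parsed message (the
      addresses and date are illustrative).  The results are ``str``
      subclasses with extra, type-specific attributes, and the factory can
      also be called directly with a name and an unfolded value:

      .. code-block:: python

         from email import message_from_string, policy

         source = (
             "To: Ann <ann@example.com>, bob@example.com\n"
             "Date: Fri, 09 Nov 2001 01:08:47 -0000\n"
             "\n"
         )
         msg = message_from_string(source, policy=policy.default)

         to = msg["To"]        # address header: has .addresses
         date = msg["Date"]    # date header: has .datetime

         subject = policy.default.header_factory("Subject", "hello")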


   .. attribute:: content_manager

      An object with at least two methods: ``get_content`` and
      ``set_content``.  When the :meth:`~email.message.EmailMessage.get_content`
      or :meth:`~email.message.EmailMessage.set_content` method of an
      :class:`~email.message.EmailMessage` object is called, it calls the
      corresponding method of this object, passing it the message object as its
      first argument, and any arguments or keywords that were passed to it as
      additional arguments.  By default ``content_manager`` is set to
      :data:`~email.contentmanager.raw_data_manager`.

      .. versionadded:: 3.4
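
      A sketch of the default manager at work (the contents are arbitrary).
      Text round-trips as ``str`` and non-text data as ``bytes``:

      .. code-block:: python

         from email import policy
         from email.contentmanager import raw_data_manager
         from email.message import EmailMessage

         text = EmailMessage(policy=policy.default)
         text.set_content("hello\n")                  # delegated to raw_data_manager

         blob = EmailMessage(policy=policy.default)
         blob.set_content(b"\x00\x01", maintype="application", subtype="octet-stream")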


   The class provides the following concrete implementations of the abstract
   methods of :class:`Policy`:


   .. method:: header_max_count(name)

      Returns the value of the
      :attr:`~email.headerregistry.BaseHeader.max_count` attribute of the
      specialized class used to represent the header with the given name.


   .. method:: header_source_parse(sourcelines)

      The name is parsed as everything up to the '``:``' and returned
      unmodified.  The value is determined by stripping leading whitespace off
      the remainder of the first line, joining all subsequent lines together,
      and stripping any trailing carriage return or linefeed characters.
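
      These hooks are normally invoked only by the parser, but calling one
      directly on a made-up folded header makes the rule concrete.  Note that
      the line ending between the source lines is kept and only the trailing
      one is stripped.  :class:`Compat32` (below) gives the same result:

      .. code-block:: python

         from email import policy

         # A "Subject" header as the parser sees it: first line plus one
         # continuation line, each with its original line ending.
         source_lines = ["Subject:   hello\r\n", " world\r\n"]

         name, value = policy.default.header_source_parse(source_lines)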


   .. method:: header_store_parse(name, value)

      The name is returned unchanged.  If the input value has a ``name``
      attribute and it matches *name* ignoring case, the value is returned
      unchanged.  Otherwise the *name* and *value* are passed to
      ``header_factory``, and the resulting header object is returned as
      the value.  In this case a ``ValueError`` is raised if the input value
      contains CR or LF characters.
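
      A sketch of these outcomes, with illustrative values.  A plain string
      becomes a header object, a matching header object passes through
      unchanged, and a value spanning several lines is rejected when it is
      assigned to an :class:`~email.message.EmailMessage`:

      .. code-block:: python

         from email import policy
         from email.message import EmailMessage

         name, value = policy.default.header_store_parse("Subject", "hello")
         # A header object whose name matches (ignoring case) is returned as-is.
         _, same = policy.default.header_store_parse("SUBJECT", value)

         msg = EmailMessage(policy=policy.default)
         try:
             msg["Subject"] = "line one\nline two"     # embedded line break
         except ValueError:
             rejected = True
         else:
             rejected = False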


   .. method:: header_fetch_parse(name, value)

      If the value has a ``name`` attribute, it is returned unmodified.
      Otherwise the *name*, and the *value* with any CR or LF characters
      removed, are passed to the ``header_factory``, and the resulting
      header object is returned.  Any surrogateescaped bytes that form valid
      UTF-8 are decoded as UTF-8; any others are turned into the unicode
      unknown-character glyph.


   .. method:: fold(name, value)

      Header folding is controlled by the :attr:`refold_source` policy setting.
      A value is considered to be a 'source value' if and only if it does not
      have a ``name`` attribute (having a ``name`` attribute means it is a
      header object of some sort).  If a source value needs to be refolded
      according to the policy, it is converted into a header object by
      passing the *name* and the *value* with any CR and LF characters removed
      to the ``header_factory``.  Folding of a header object is done by
      calling its ``fold`` method with the current policy.

      Source values are split into lines using :meth:`~str.splitlines`.  If
      the value is not to be refolded, the lines are rejoined using the
      ``linesep`` from the policy and returned.  The exception is lines
      containing non-ASCII binary data.  In that case the value is refolded
      regardless of the ``refold_source`` setting, which causes the binary data
      to be CTE encoded using the ``unknown-8bit`` charset.


   .. method:: fold_binary(name, value)

      The same as :meth:`fold` if :attr:`~Policy.cte_type` is ``7bit``, except
      that the returned value is bytes.

      If :attr:`~Policy.cte_type` is ``8bit``, non-ASCII binary data is
      converted back into bytes.  Headers with binary data are not refolded,
      regardless of the ``refold_source`` setting, since there is no way to
      know whether the binary data consists of single byte characters or
      multibyte characters.


The following instances of :class:`EmailPolicy` provide defaults suitable for
specific application domains.  Note that in the future the behavior of these
instances (in particular the ``HTTP`` instance) may be adjusted to conform even
more closely to the RFCs relevant to their domains.


.. data:: default

   An instance of ``EmailPolicy`` with all defaults unchanged.  This policy
   uses the standard Python ``\n`` line endings rather than the RFC-correct
   ``\r\n``.


.. data:: SMTP

   Suitable for serializing messages in conformance with the email RFCs.
   Like ``default``, but with ``linesep`` set to ``\r\n``, which is RFC
   compliant.


.. data:: SMTPUTF8

   The same as ``SMTP`` except that :attr:`~EmailPolicy.utf8` is ``True``.
   Useful for serializing messages to a message store without using encoded
   words in the headers.  Should only be used for SMTP transmission if the
   sender or recipient addresses have non-ASCII characters (the
   :meth:`smtplib.SMTP.send_message` method handles this automatically).


.. data:: HTTP

   Suitable for serializing headers for use in HTTP traffic.  Like
   ``SMTP`` except that ``max_line_length`` is set to ``None`` (unlimited).


.. data:: strict

   Convenience instance.  The same as ``default`` except that
   ``raise_on_defect`` is set to ``True``.  This allows any policy derived
   from :class:`EmailPolicy` to be made strict by writing::

        somepolicy + policy.strict
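
A quick sketch that checks the settings described above.  Adding
:data:`strict` to :const:`compat32` fails with :exc:`TypeError`, because
:data:`strict` carries EmailPolicy-only settings (such as ``header_factory``)
that :class:`Compat32` does not accept.  For that policy,
``policy.compat32.clone(raise_on_defect=True)`` serves the same purpose:

.. code-block:: python

   from email import policy

   smtp_strict = policy.SMTP + policy.strict    # CRLF line endings, defects raised

   try:
       policy.compat32 + policy.strict
   except TypeError:
       compat_combined = False
   else:
       compat_combined = True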


With all of these :class:`EmailPolicies <.EmailPolicy>`, the effective API of
the email package is changed from the Python 3.2 API in the following ways:

   * Setting a header on a :class:`~email.message.Message` results in that
     header being parsed and a header object created.

   * Fetching a header value from a :class:`~email.message.Message` results
     in that header being parsed and a header object created and
     returned.

   * Any header object, or any header that is refolded due to the
     policy settings, is folded using an algorithm that fully implements the
     RFC folding algorithms, including knowing where encoded words are required
     and allowed.

From the application view, this means that any header obtained through the
:class:`~email.message.EmailMessage` is a header object with extra
attributes, whose string value is the fully decoded unicode value of the
header.  Likewise, a header may be assigned a new value, or a new header
created, using a unicode string, and the policy will take care of converting
the unicode string into the correct RFC encoded form.

The header objects and their attributes are described in
:mod:`~email.headerregistry`.



.. class:: Compat32(**kw)

   This concrete :class:`Policy` is the backward compatibility policy.  It
   replicates the behavior of the email package in Python 3.2.  The
   :mod:`~email.policy` module also defines an instance of this class,
   :const:`compat32`, that is used as the default policy.  Thus the default
   behavior of the email package is to maintain compatibility with Python 3.2.

   The following attributes have values that are different from the
   :class:`Policy` default:


   .. attribute:: mangle_from_

      The default is ``True``.


   The class provides the following concrete implementations of the
   abstract methods of :class:`Policy`:


   .. method:: header_source_parse(sourcelines)

      The name is parsed as everything up to the '``:``' and returned
      unmodified.  The value is determined by stripping leading whitespace off
      the remainder of the first line, joining all subsequent lines together,
      and stripping any trailing carriage return or linefeed characters.


   .. method:: header_store_parse(name, value)

      The name and value are returned unmodified.


   .. method:: header_fetch_parse(name, value)

      If the value contains binary data, it is converted into a
      :class:`~email.header.Header` object using the ``unknown-8bit`` charset.
      Otherwise it is returned unmodified.
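
      Here is a sketch contrasting this with :class:`EmailPolicy`.  It uses a
      made-up message whose ``Subject`` contains raw UTF-8 bytes:

      .. code-block:: python

         from email import message_from_bytes, policy
         from email.header import Header

         raw = b"Subject: caf\xc3\xa9\nX-Plain: ascii only\n\nbody\n"

         legacy = message_from_bytes(raw, policy=policy.compat32)
         modern = message_from_bytes(raw, policy=policy.default)

         legacy_subject = legacy["Subject"]   # Header object (unknown-8bit charset)
         plain = legacy["X-Plain"]            # no binary data: returned as a str
         modern_subject = modern["Subject"]   # EmailPolicy header object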


   .. method:: fold(name, value)

      Headers are folded using the :class:`~email.header.Header` folding
      algorithm, which preserves existing line breaks in the value, and wraps
      each resulting line to the ``max_line_length``.  Non-ASCII binary data are
      CTE encoded using the ``unknown-8bit`` charset.


   .. method:: fold_binary(name, value)

      Headers are folded using the :class:`~email.header.Header` folding
      algorithm, which preserves existing line breaks in the value, and wraps
      each resulting line to the ``max_line_length``.  If ``cte_type`` is
      ``7bit``, non-ASCII binary data is CTE encoded using the ``unknown-8bit``
      charset.  Otherwise the original source header is used, with its existing
      line breaks and any (RFC invalid) binary data it may contain.


.. data:: compat32

   An instance of :class:`Compat32`, providing backward compatibility with the
   behavior of the email package in Python 3.2.


.. rubric:: Footnotes

.. [1] Originally added in 3.3 as a :term:`provisional feature <provisional
       package>`.
