1:mod:`xml.etree.ElementTree` --- The ElementTree XML API
2========================================================
3
4.. module:: xml.etree.ElementTree
5   :synopsis: Implementation of the ElementTree API.
6
7.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>
8
9**Source code:** :source:`Lib/xml/etree/ElementTree.py`
10
11--------------
12
13The :mod:`xml.etree.ElementTree` module implements a simple and efficient API
14for parsing and creating XML data.
15
16.. versionchanged:: 3.3
17   This module will use a fast implementation whenever available.
18   The :mod:`xml.etree.cElementTree` module is deprecated.
19
20
21.. warning::
22
   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data.  If you need to parse untrusted or
   unauthenticated data, see :ref:`xml-vulnerabilities`.
26
27Tutorial
28--------
29
30This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
31short).  The goal is to demonstrate some of the building blocks and basic
32concepts of the module.
33
34XML tree and elements
35^^^^^^^^^^^^^^^^^^^^^
36
XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree.  ``ET`` has two classes for this purpose:
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree.  Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level.  Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.
44
45.. _elementtree-parsing-xml:
46
47Parsing XML
48^^^^^^^^^^^
49
50We'll be using the following XML document as the sample data for this section:
51
52.. code-block:: xml
53
   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>
77
78We can import this data by reading from a file::
79
   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()
83
84Or directly from a string::
85
   root = ET.fromstring(country_data_as_string)
87
:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree.  Other parsing functions may
create an :class:`ElementTree` instead; :func:`parse`, for example, returns
one.  Check each function's documentation to be sure.
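
As a rough illustration of that difference, the sketch below parses the same
document both ways.  The sample string and the in-memory :class:`io.StringIO`
source are assumptions made to keep the example self-contained;
:func:`parse` equally accepts a filename.

.. code-block:: python

   import io
   import xml.etree.ElementTree as ET

   xml_text = '<data><country name="Panama"/></data>'  # hypothetical sample

   root = ET.fromstring(xml_text)          # the root Element itself
   tree = ET.parse(io.StringIO(xml_text))  # an ElementTree wrapping the root

   print(ET.iselement(root))                # True
   print(isinstance(tree, ET.ElementTree))  # True
   print(tree.getroot().tag)                # data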
91
92As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::
93
   >>> root.tag
   'data'
   >>> root.attrib
   {}
98
It also has child nodes over which we can iterate::

   >>> for child in root:
   ...     print(child.tag, child.attrib)
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}
107
108Children are nested, and we can access specific child nodes by index::
109
   >>> root[0][1].text
   '2008'
112
113
114.. note::
115
116   Not all elements of the XML input will end up as elements of the
117   parsed tree. Currently, this module skips over any XML comments,
118   processing instructions, and document type declarations in the
119   input. Nevertheless, trees built using this module's API rather
120   than parsing from XML text can have comments and processing
121   instructions in them; they will be included when generating XML
122   output. A document type declaration may be accessed by passing a
123   custom :class:`TreeBuilder` instance to the :class:`XMLParser`
124   constructor.
125
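The difference between parsed and hand-built trees can be seen in a short
sketch (the element names are made up for illustration):

.. code-block:: python

   import xml.etree.ElementTree as ET

   # A comment in parsed input is skipped by the default parser.
   parsed = ET.fromstring('<root><!-- dropped --><item/></root>')
   print(len(parsed))  # 1, only <item> is left

   # A comment added through the API is kept and serialized.
   built = ET.Element('root')
   built.append(ET.Comment('kept'))
   ET.SubElement(built, 'item')
   print(ET.tostring(built, encoding='unicode'))
   # <root><!--kept--><item /></root>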
126
127.. _elementtree-pull-parsing:
128
129Pull API for non-blocking parsing
130^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
131
132Most parsing functions provided by this module require the whole document
133to be read at once before returning any result.  It is possible to use an
134:class:`XMLParser` and feed data into it incrementally, but it is a push API that
135calls methods on a callback target, which is too low-level and inconvenient for
136most needs.  Sometimes what the user really wants is to be able to parse XML
137incrementally, without blocking operations, while enjoying the convenience of
138fully constructed :class:`Element` objects.
139
140The most powerful tool for doing this is :class:`XMLPullParser`.  It does not
141require a blocking read to obtain the XML data, and is instead fed with data
142incrementally with :meth:`XMLPullParser.feed` calls.  To get the parsed XML
143elements, call :meth:`XMLPullParser.read_events`.  Here is an example::
144
   >>> parser = ET.XMLPullParser(['start', 'end'])
   >>> parser.feed('<mytag>sometext')
   >>> list(parser.read_events())
   [('start', <Element 'mytag' at 0x7fa66db2be58>)]
   >>> parser.feed(' more text</mytag>')
   >>> for event, elem in parser.read_events():
   ...     print(event)
   ...     print(elem.tag, 'text=', elem.text)
   ...
   end
   mytag text= sometext more text
155
156The obvious use case is applications that operate in a non-blocking fashion
157where the XML data is being received from a socket or read incrementally from
158some storage device.  In such cases, blocking reads are unacceptable.
159
160Because it's so flexible, :class:`XMLPullParser` can be inconvenient to use for
161simpler use-cases.  If you don't mind your application blocking on reading XML
162data but would still like to have incremental parsing capabilities, take a look
163at :func:`iterparse`.  It can be useful when you're reading a large XML document
164and don't want to hold it wholly in memory.
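
A minimal sketch of that pattern follows.  The in-memory :class:`io.BytesIO`
source stands in for a large file, and clearing each processed element is one
common way to keep memory use down; it is a choice made in this example, not
something :func:`iterparse` does by itself.

.. code-block:: python

   import io
   import xml.etree.ElementTree as ET

   # Hypothetical stand-in for a large XML file on disk.
   source = io.BytesIO(
       b'<data>'
       b'<country name="Liechtenstein"><rank>1</rank></country>'
       b'<country name="Singapore"><rank>4</rank></country>'
       b'</data>'
   )

   names = []
   # With no *events* argument only "end" events are reported, so each
   # element is fully populated when it is yielded.
   for event, elem in ET.iterparse(source):
       if elem.tag == 'country':
           names.append(elem.get('name'))
           elem.clear()  # drop this element's children, text and attributes

   print(names)  # ['Liechtenstein', 'Singapore']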
165
166Finding interesting elements
167^^^^^^^^^^^^^^^^^^^^^^^^^^^^
168
169:class:`Element` has some useful methods that help iterate recursively over all
170the sub-tree below it (its children, their children, and so on).  For example,
171:meth:`Element.iter`::
172
   >>> for neighbor in root.iter('neighbor'):
   ...     print(neighbor.attrib)
   ...
   {'name': 'Austria', 'direction': 'E'}
   {'name': 'Switzerland', 'direction': 'W'}
   {'name': 'Malaysia', 'direction': 'N'}
   {'name': 'Costa Rica', 'direction': 'W'}
   {'name': 'Colombia', 'direction': 'E'}
181
182:meth:`Element.findall` finds only elements with a tag which are direct
183children of the current element.  :meth:`Element.find` finds the *first* child
184with a particular tag, and :attr:`Element.text` accesses the element's text
185content.  :meth:`Element.get` accesses the element's attributes::
186
   >>> for country in root.findall('country'):
   ...     rank = country.find('rank').text
   ...     name = country.get('name')
   ...     print(name, rank)
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68
195
196More sophisticated specification of which elements to look for is possible by
197using :ref:`XPath <elementtree-xpath>`.
198
199Modifying an XML File
200^^^^^^^^^^^^^^^^^^^^^
201
202:class:`ElementTree` provides a simple way to build XML documents and write them to files.
203The :meth:`ElementTree.write` method serves this purpose.
204
205Once created, an :class:`Element` object may be manipulated by directly changing
206its fields (such as :attr:`Element.text`), adding and modifying attributes
207(:meth:`Element.set` method), as well as adding new children (for example
208with :meth:`Element.append`).
209
210Let's say we want to add one to each country's rank, and add an ``updated``
211attribute to the rank element::
212
   >>> for rank in root.iter('rank'):
   ...     new_rank = int(rank.text) + 1
   ...     rank.text = str(new_rank)
   ...     rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')
219
220Our XML now looks like this:
221
222.. code-block:: xml
223
   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>
247
248We can remove elements using :meth:`Element.remove`.  Let's say we want to
249remove all countries with a rank higher than 50::
250
   >>> for country in root.findall('country'):
   ...     rank = int(country.find('rank').text)
   ...     if rank > 50:
   ...         root.remove(country)
   ...
   >>> tree.write('output.xml')
257
258Our XML now looks like this:
259
260.. code-block:: xml
261
   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>
278
279Building XML documents
280^^^^^^^^^^^^^^^^^^^^^^
281
282The :func:`SubElement` function also provides a convenient way to create new
283sub-elements for a given element::
284
   >>> a = ET.Element('a')
   >>> b = ET.SubElement(a, 'b')
   >>> c = ET.SubElement(a, 'c')
   >>> d = ET.SubElement(c, 'd')
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>
291
292Parsing XML with Namespaces
293^^^^^^^^^^^^^^^^^^^^^^^^^^^
294
295If the XML input has `namespaces
296<https://en.wikipedia.org/wiki/XML_namespace>`__, tags and attributes
297with prefixes in the form ``prefix:sometag`` get expanded to
298``{uri}sometag`` where the *prefix* is replaced by the full *URI*.
299Also, if there is a `default namespace
300<https://www.w3.org/TR/xml-names/#defaulting>`__,
301that full URI gets prepended to all of the non-prefixed tags.
302
303Here is an XML example that incorporates two namespaces, one with the
304prefix "fictional" and the other serving as the default namespace:
305
306.. code-block:: xml
307
    <?xml version="1.0"?>
    <actors xmlns:fictional="http://characters.example.com"
            xmlns="http://people.example.com">
        <actor>
            <name>John Cleese</name>
            <fictional:character>Lancelot</fictional:character>
            <fictional:character>Archie Leach</fictional:character>
        </actor>
        <actor>
            <name>Eric Idle</name>
            <fictional:character>Sir Robin</fictional:character>
            <fictional:character>Gunther</fictional:character>
            <fictional:character>Commander Clement</fictional:character>
        </actor>
    </actors>
323
324One way to search and explore this XML example is to manually add the
325URI to every tag or attribute in the xpath of a
326:meth:`~Element.find` or :meth:`~Element.findall`::
327
    import xml.etree.ElementTree as ET

    # xml_text holds the namespaced document shown above
    root = ET.fromstring(xml_text)
    for actor in root.findall('{http://people.example.com}actor'):
        name = actor.find('{http://people.example.com}name')
        print(name.text)
        for char in actor.findall('{http://characters.example.com}character'):
            print(' |-->', char.text)
334
335A better way to search the namespaced XML example is to create a
336dictionary with your own prefixes and use those in the search functions::
337
    ns = {'real_person': 'http://people.example.com',
          'role': 'http://characters.example.com'}

    for actor in root.findall('real_person:actor', ns):
        name = actor.find('real_person:name', ns)
        print(name.text)
        for char in actor.findall('role:character', ns):
            print(' |-->', char.text)
346
347These two approaches both output::
348
    John Cleese
     |--> Lancelot
     |--> Archie Leach
    Eric Idle
     |--> Sir Robin
     |--> Gunther
     |--> Commander Clement
356
357
358Additional resources
359^^^^^^^^^^^^^^^^^^^^
360
361See http://effbot.org/zone/element-index.htm for tutorials and links to other
362docs.
363
364
365.. _elementtree-xpath:
366
367XPath support
368-------------
369
370This module provides limited support for
371`XPath expressions <https://www.w3.org/TR/xpath>`_ for locating elements in a
372tree.  The goal is to support a small subset of the abbreviated syntax; a full
373XPath engine is outside the scope of the module.
374
375Example
376^^^^^^^
377
378Here's an example that demonstrates some of the XPath capabilities of the
379module.  We'll be using the ``countrydata`` XML document from the
380:ref:`Parsing XML <elementtree-parsing-xml>` section::
381
   import xml.etree.ElementTree as ET

   root = ET.fromstring(countrydata)

   # Top-level elements
   root.findall(".")

   # All 'neighbor' grand-children of 'country' children of the top-level
   # elements
   root.findall("./country/neighbor")

   # Nodes with name='Singapore' that have a 'year' child
   root.findall(".//year/..[@name='Singapore']")

   # 'year' nodes that are children of nodes with name='Singapore'
   root.findall(".//*[@name='Singapore']/year")

   # All 'neighbor' nodes that are the second child of their parent
   root.findall(".//neighbor[2]")
401
402Supported XPath syntax
403^^^^^^^^^^^^^^^^^^^^^^
404
405.. tabularcolumns:: |l|L|
406
407+-----------------------+------------------------------------------------------+
408| Syntax                | Meaning                                              |
409+=======================+======================================================+
410| ``tag``               | Selects all child elements with the given tag.       |
411|                       | For example, ``spam`` selects all child elements     |
412|                       | named ``spam``, and ``spam/egg`` selects all         |
413|                       | grandchildren named ``egg`` in all children named    |
414|                       | ``spam``.                                            |
415+-----------------------+------------------------------------------------------+
416| ``*``                 | Selects all child elements.  For example, ``*/egg``  |
417|                       | selects all grandchildren named ``egg``.             |
418+-----------------------+------------------------------------------------------+
419| ``.``                 | Selects the current node.  This is mostly useful     |
420|                       | at the beginning of the path, to indicate that it's  |
421|                       | a relative path.                                     |
422+-----------------------+------------------------------------------------------+
423| ``//``                | Selects all subelements, on all levels beneath the   |
424|                       | current  element.  For example, ``.//egg`` selects   |
425|                       | all ``egg`` elements in the entire tree.             |
426+-----------------------+------------------------------------------------------+
427| ``..``                | Selects the parent element.  Returns ``None`` if the |
428|                       | path attempts to reach the ancestors of the start    |
429|                       | element (the element ``find`` was called on).        |
430+-----------------------+------------------------------------------------------+
431| ``[@attrib]``         | Selects all elements that have the given attribute.  |
432+-----------------------+------------------------------------------------------+
433| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
434|                       | has the given value.  The value cannot contain       |
435|                       | quotes.                                              |
436+-----------------------+------------------------------------------------------+
437| ``[tag]``             | Selects all elements that have a child named         |
438|                       | ``tag``.  Only immediate children are supported.     |
439+-----------------------+------------------------------------------------------+
440| ``[.='text']``        | Selects all elements whose complete text content,    |
441|                       | including descendants, equals the given ``text``.    |
442|                       |                                                      |
443|                       | .. versionadded:: 3.7                                |
444+-----------------------+------------------------------------------------------+
445| ``[tag='text']``      | Selects all elements that have a child named         |
446|                       | ``tag`` whose complete text content, including       |
447|                       | descendants, equals the given ``text``.              |
448+-----------------------+------------------------------------------------------+
449| ``[position]``        | Selects all elements that are located at the given   |
450|                       | position.  The position can be either an integer     |
451|                       | (1 is the first position), the expression ``last()`` |
452|                       | (for the last position), or a position relative to   |
453|                       | the last position (e.g. ``last()-1``).               |
454+-----------------------+------------------------------------------------------+
455
456Predicates (expressions within square brackets) must be preceded by a tag
457name, an asterisk, or another predicate.  ``position`` predicates must be
458preceded by a tag name.
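
The predicate forms from the table can be tried on a small document.  The
sketch below uses a trimmed-down, hypothetical version of the country data so
that it is self-contained:

.. code-block:: python

   import xml.etree.ElementTree as ET

   root = ET.fromstring(
       '<data>'
       '<country name="Liechtenstein"><year>2008</year>'
       '<neighbor name="Austria"/><neighbor name="Switzerland"/></country>'
       '<country name="Singapore"><year>2011</year>'
       '<neighbor name="Malaysia"/></country>'
       '</data>'
   )

   # [tag='text']: countries whose <year> child has the text 2011
   by_year = [c.get('name') for c in root.findall("country[year='2011']")]
   print(by_year)  # ['Singapore']

   # [position]: the last <neighbor> of each country
   last = [n.get('name') for n in root.findall("country/neighbor[last()]")]
   print(last)     # ['Switzerland', 'Malaysia']

   # A predicate preceded by another predicate
   both = [c.get('name') for c in root.findall("country[@name][neighbor]")]
   print(both)     # ['Liechtenstein', 'Singapore']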
459
460Reference
461---------
462
463.. _elementtree-functions:
464
465Functions
466^^^^^^^^^
467
468
469.. function:: Comment(text=None)
470
471   Comment element factory.  This factory function creates a special element
472   that will be serialized as an XML comment by the standard serializer.  The
473   comment string can be either a bytestring or a Unicode string.  *text* is a
474   string containing the comment string.  Returns an element instance
475   representing a comment.
476
   Note that :class:`XMLParser` skips over comments in the input
   instead of creating comment objects for them. An :class:`ElementTree` will
   only contain comment nodes if they have been inserted into
   the tree using one of the :class:`Element` methods.
481
482.. function:: dump(elem)
483
   Writes an element tree or element structure to :data:`sys.stdout`.  This
   function should be used for debugging only.
486
487   The exact output format is implementation dependent.  In this version, it's
488   written as an ordinary XML file.
489
490   *elem* is an element tree or an individual element.
491
492
493.. function:: fromstring(text, parser=None)
494
495   Parses an XML section from a string constant.  Same as :func:`XML`.  *text*
496   is a string containing XML data.  *parser* is an optional parser instance.
497   If not given, the standard :class:`XMLParser` parser is used.
498   Returns an :class:`Element` instance.
499
500
501.. function:: fromstringlist(sequence, parser=None)
502
503   Parses an XML document from a sequence of string fragments.  *sequence* is a
504   list or other sequence containing XML data fragments.  *parser* is an
505   optional parser instance.  If not given, the standard :class:`XMLParser`
506   parser is used.  Returns an :class:`Element` instance.
507
508   .. versionadded:: 3.2
509
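   For instance, fragments that split the document at arbitrary points are
   joined during parsing (a small sketch with made-up data):

   .. code-block:: python

      import xml.etree.ElementTree as ET

      fragments = ['<data><rank>', '68', '</rank>', '</data>']
      root = ET.fromstringlist(fragments)
      print(root.find('rank').text)  # 68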
510
511.. function:: iselement(element)
512
513   Check if an object appears to be a valid element object.  *element* is an
514   element instance.  Return ``True`` if this is an element object.
515
516
517.. function:: iterparse(source, events=None, parser=None)
518
519   Parses an XML section into an element tree incrementally, and reports what's
520   going on to the user.  *source* is a filename or :term:`file object`
521   containing XML data.  *events* is a sequence of events to report back.  The
522   supported events are the strings ``"start"``, ``"end"``, ``"start-ns"`` and
523   ``"end-ns"`` (the "ns" events are used to get detailed namespace
524   information).  If *events* is omitted, only ``"end"`` events are reported.
525   *parser* is an optional parser instance.  If not given, the standard
526   :class:`XMLParser` parser is used.  *parser* must be a subclass of
527   :class:`XMLParser` and can only use the default :class:`TreeBuilder` as a
528   target.  Returns an :term:`iterator` providing ``(event, elem)`` pairs.
529
530   Note that while :func:`iterparse` builds the tree incrementally, it issues
531   blocking reads on *source* (or the file it names).  As such, it's unsuitable
532   for applications where blocking reads can't be made.  For fully non-blocking
533   parsing, see :class:`XMLPullParser`.
534
535   .. note::
536
537      :func:`iterparse` only guarantees that it has seen the ">" character of a
538      starting tag when it emits a "start" event, so the attributes are defined,
539      but the contents of the text and tail attributes are undefined at that
540      point.  The same applies to the element children; they may or may not be
541      present.
542
543      If you need a fully populated element, look for "end" events instead.
544
545   .. deprecated:: 3.4
546      The *parser* argument.
547
548.. function:: parse(source, parser=None)
549
550   Parses an XML section into an element tree.  *source* is a filename or file
551   object containing XML data.  *parser* is an optional parser instance.  If
552   not given, the standard :class:`XMLParser` parser is used.  Returns an
553   :class:`ElementTree` instance.
554
555
556.. function:: ProcessingInstruction(target, text=None)
557
558   PI element factory.  This factory function creates a special element that
559   will be serialized as an XML processing instruction.  *target* is a string
560   containing the PI target.  *text* is a string containing the PI contents, if
561   given.  Returns an element instance, representing a processing instruction.
562
   Note that :class:`XMLParser` skips over processing instructions
   in the input instead of creating processing instruction objects for them.
   An :class:`ElementTree` will only contain processing instruction nodes if
   they have been inserted into the tree using one of the
   :class:`Element` methods.
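
   Building such a node by hand can be sketched as follows; the ``render``
   target and its contents are hypothetical:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      root = ET.Element('doc')
      root.append(ET.ProcessingInstruction('render', 'mode="draft"'))
      print(ET.tostring(root, encoding='unicode'))
      # <doc><?render mode="draft"?></doc>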
568
569.. function:: register_namespace(prefix, uri)
570
571   Registers a namespace prefix.  The registry is global, and any existing
572   mapping for either the given prefix or the namespace URI will be removed.
573   *prefix* is a namespace prefix.  *uri* is a namespace uri.  Tags and
574   attributes in this namespace will be serialized with the given prefix, if at
575   all possible.
576
577   .. versionadded:: 3.2
578
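   A brief sketch of the effect on serialization.  The ``ex`` prefix and the
   ``http://example.com/ns`` URI are placeholders chosen for this example;
   because the registry is global, the mapping stays in place for the rest of
   the process.

   .. code-block:: python

      import xml.etree.ElementTree as ET

      # Hypothetical prefix and URI, used only for illustration.
      ET.register_namespace('ex', 'http://example.com/ns')

      elem = ET.Element('{http://example.com/ns}item')
      print(ET.tostring(elem, encoding='unicode'))
      # <ex:item xmlns:ex="http://example.com/ns" />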
579
580.. function:: SubElement(parent, tag, attrib={}, **extra)
581
582   Subelement factory.  This function creates an element instance, and appends
583   it to an existing element.
584
585   The element name, attribute names, and attribute values can be either
586   bytestrings or Unicode strings.  *parent* is the parent element.  *tag* is
587   the subelement name.  *attrib* is an optional dictionary, containing element
588   attributes.  *extra* contains additional attributes, given as keyword
589   arguments.  Returns an element instance.
590
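   A short sketch using both ways of passing attributes (the tag and
   attribute names are made up for illustration):

   .. code-block:: python

      import xml.etree.ElementTree as ET

      country = ET.Element('country', name='Singapore')
      rank = ET.SubElement(country, 'rank', {'updated': 'yes'})  # attrib dict
      rank.text = '5'
      ET.SubElement(country, 'neighbor', name='Malaysia')        # keyword form

      print(ET.tostring(country, encoding='unicode'))
      # <country name="Singapore"><rank updated="yes">5</rank><neighbor name="Malaysia" /></country>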
591
592.. function:: tostring(element, encoding="us-ascii", method="xml", *, \
593                       short_empty_elements=True)
594
595   Generates a string representation of an XML element, including all
596   subelements.  *element* is an :class:`Element` instance.  *encoding* [1]_ is
597   the output encoding (default is US-ASCII).  Use ``encoding="unicode"`` to
598   generate a Unicode string (otherwise, a bytestring is generated).  *method*
599   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
600   *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
601   Returns an (optionally) encoded string containing the XML data.
602
603   .. versionadded:: 3.4
604      The *short_empty_elements* parameter.
605
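   The effect of *encoding* and *method* can be seen in a small sketch (the
   element name and text are made up):

   .. code-block:: python

      import xml.etree.ElementTree as ET

      elem = ET.Element('greeting')
      elem.text = 'caf\u00e9'

      as_bytes = ET.tostring(elem)                    # default: US-ASCII bytes
      as_text = ET.tostring(elem, encoding='unicode')  # a str, not bytes
      only_text = ET.tostring(elem, encoding='unicode', method='text')

      print(as_bytes)   # b'<greeting>caf&#233;</greeting>'
      print(as_text)    # <greeting>café</greeting>
      print(only_text)  # café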
606
607.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \
608                           short_empty_elements=True)
609
610   Generates a string representation of an XML element, including all
611   subelements.  *element* is an :class:`Element` instance.  *encoding* [1]_ is
612   the output encoding (default is US-ASCII).  Use ``encoding="unicode"`` to
613   generate a Unicode string (otherwise, a bytestring is generated).  *method*
614   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
615   *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
616   Returns a list of (optionally) encoded strings containing the XML data.
617   It does not guarantee any specific sequence, except that
618   ``b"".join(tostringlist(element)) == tostring(element)``.
619
620   .. versionadded:: 3.2
621
622   .. versionadded:: 3.4
623      The *short_empty_elements* parameter.
624
625
626.. function:: XML(text, parser=None)
627
628   Parses an XML section from a string constant.  This function can be used to
629   embed "XML literals" in Python code.  *text* is a string containing XML
630   data.  *parser* is an optional parser instance.  If not given, the standard
631   :class:`XMLParser` parser is used.  Returns an :class:`Element` instance.
632
633
634.. function:: XMLID(text, parser=None)
635
   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element IDs (the values of ``id`` attributes) to elements.
   *text* is a string containing XML data.  *parser* is an optional parser
   instance.  If not given, the standard :class:`XMLParser` parser is used.
   Returns a tuple containing an :class:`Element` instance and a dictionary.
641
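   A minimal sketch with a made-up document:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      root, ids = ET.XMLID('<doc><item id="a">first</item><item id="b"/></doc>')
      print(sorted(ids))          # ['a', 'b']
      print(ids['a'].text)        # first
      print(ids['b'] is root[1])  # True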
642
643.. _elementtree-xinclude:
644
645XInclude support
646----------------
647
648This module provides limited support for
649`XInclude directives <https://www.w3.org/TR/xinclude/>`_, via the :mod:`xml.etree.ElementInclude` helper module.  This module can be used to insert subtrees and text strings into element trees, based on information in the tree.
650
651Example
652^^^^^^^
653
654Here's an example that demonstrates use of the XInclude module. To include an XML document in the current document, use the ``{http://www.w3.org/2001/XInclude}include`` element and set the **parse** attribute to ``"xml"``, and use the **href** attribute to specify the document to include.
655
656.. code-block:: xml
657
    <?xml version="1.0"?>
    <document xmlns:xi="http://www.w3.org/2001/XInclude">
      <xi:include href="source.xml" parse="xml" />
    </document>
662
663By default, the **href** attribute is treated as a file name. You can use custom loaders to override this behaviour. Also note that the standard helper does not support XPointer syntax.
664
To process this file, load it as usual, and pass the root element to the :mod:`xml.etree.ElementInclude` module:
666
667.. code-block:: python
668
   from xml.etree import ElementTree, ElementInclude

   tree = ElementTree.parse("document.xml")
   root = tree.getroot()

   ElementInclude.include(root)
675
676The ElementInclude module replaces the ``{http://www.w3.org/2001/XInclude}include`` element with the root element from the **source.xml** document. The result might look something like this:
677
678.. code-block:: xml
679
    <document xmlns:xi="http://www.w3.org/2001/XInclude">
      <para>This is a paragraph.</para>
    </document>
683
If the **parse** attribute is omitted, it defaults to ``"xml"``. The **href** attribute is required.
685
686To include a text document, use the ``{http://www.w3.org/2001/XInclude}include`` element, and set the **parse** attribute to "text":
687
688.. code-block:: xml
689
    <?xml version="1.0"?>
    <document xmlns:xi="http://www.w3.org/2001/XInclude">
      Copyright (c) <xi:include href="year.txt" parse="text" />.
    </document>
694
695The result might look something like:
696
697.. code-block:: xml
698
    <document xmlns:xi="http://www.w3.org/2001/XInclude">
      Copyright (c) 2003.
    </document>
702
703Reference
704---------
705
706.. _elementinclude-functions:
707
708Functions
709^^^^^^^^^
710
.. function:: xml.etree.ElementInclude.default_loader(href, parse, encoding=None)

   Default loader. This default loader reads an included resource from disk.  *href* is a URL.
   *parse* is the parse mode, either ``"xml"`` or ``"text"``.  *encoding*
   is an optional text encoding.  If not given, encoding is ``utf-8``.  Returns the
   expanded resource.  If the parse mode is ``"xml"``, this is the root
   :class:`~xml.etree.ElementTree.Element` of the parsed document.  If the
   parse mode is ``"text"``, this is a Unicode string.  If the
   loader fails, it can return ``None`` or raise an exception.
719
720
.. function:: xml.etree.ElementInclude.include(elem, loader=None)

   This function expands XInclude directives.  *elem* is the root element.  *loader* is
   an optional resource loader.  If omitted, it defaults to :func:`default_loader`.
   If given, it should be a callable that implements the same interface as
   :func:`default_loader`.  The directives are expanded in place: each include
   element in the tree rooted at *elem* is replaced by the resource the loader
   returns for it, so the result is read from *elem* itself.  If the loader
   fails, it can return ``None`` or raise an exception.
730
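   A custom loader might be sketched as follows.  The ``DOCUMENTS`` dictionary
   and the ``dict_loader`` name are hypothetical; the loader serves included
   resources from memory instead of from disk and returns the parsed root
   element for ``"xml"`` resources.

   .. code-block:: python

      from xml.etree import ElementTree, ElementInclude

      # Hypothetical in-memory "files", keyed by href.
      DOCUMENTS = {'source.xml': '<para>This is a paragraph.</para>'}

      def dict_loader(href, parse, encoding=None):
          """Same call interface as default_loader, reading from DOCUMENTS."""
          data = DOCUMENTS[href]
          if parse == 'xml':
              return ElementTree.fromstring(data)
          return data

      root = ElementTree.fromstring(
          '<document xmlns:xi="http://www.w3.org/2001/XInclude">'
          '<xi:include href="source.xml" parse="xml" />'
          '</document>'
      )
      ElementInclude.include(root, loader=dict_loader)
      print(root[0].tag, root[0].text)  # para This is a paragraph.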
731
732.. _elementtree-element-objects:
733
734Element Objects
735^^^^^^^^^^^^^^^
736
737.. class:: Element(tag, attrib={}, **extra)
738
739   Element class.  This class defines the Element interface, and provides a
740   reference implementation of this interface.
741
742   The element name, attribute names, and attribute values can be either
743   bytestrings or Unicode strings.  *tag* is the element name.  *attrib* is
744   an optional dictionary, containing element attributes.  *extra* contains
745   additional attributes, given as keyword arguments.
746
747
748   .. attribute:: tag
749
750      A string identifying what kind of data this element represents (the
751      element type, in other words).
752
753
754   .. attribute:: text
755                  tail
756
757      These attributes can be used to hold additional data associated with
758      the element.  Their values are usually strings but may be any
759      application-specific object.  If the element is created from
760      an XML file, the *text* attribute holds either the text between
761      the element's start tag and its first child or end tag, or ``None``, and
762      the *tail* attribute holds either the text between the element's
763      end tag and the next tag, or ``None``.  For the XML data
764
765      .. code-block:: xml
766
         <a><b>1<c>2<d/>3</c></b>4</a>
768
769      the *a* element has ``None`` for both *text* and *tail* attributes,
770      the *b* element has *text* ``"1"`` and *tail* ``"4"``,
771      the *c* element has *text* ``"2"`` and *tail* ``None``,
772      and the *d* element has *text* ``None`` and *tail* ``"3"``.
773
774      To collect the inner text of an element, see :meth:`itertext`, for
775      example ``"".join(element.itertext())``.
776
777      Applications may store arbitrary objects in these attributes.
778
779
780   .. attribute:: attrib
781
782      A dictionary containing the element's attributes.  Note that while the
783      *attrib* value is always a real mutable Python dictionary, an ElementTree
784      implementation may choose to use another internal representation, and
785      create the dictionary only if someone asks for it.  To take advantage of
786      such implementations, use the dictionary methods below whenever possible.
787
788   The following dictionary-like methods work on the element attributes.
789
790
791   .. method:: clear()
792
793      Resets an element.  This function removes all subelements, clears all
794      attributes, and sets the text and tail attributes to ``None``.
795
796
797   .. method:: get(key, default=None)
798
799      Gets the element attribute named *key*.
800
801      Returns the attribute value, or *default* if the attribute was not found.
802
803
804   .. method:: items()
805
806      Returns the element attributes as a sequence of (name, value) pairs.  The
807      attributes are returned in an arbitrary order.
808
809
810   .. method:: keys()
811
      Returns the element's attribute names as a list.  The names are returned
      in an arbitrary order.
814
815
816   .. method:: set(key, value)
817
818      Set the attribute *key* on the element to *value*.
819
820   The following methods work on the element's children (subelements).
821
822
823   .. method:: append(subelement)
824
825      Adds the element *subelement* to the end of this element's internal list
826      of subelements.  Raises :exc:`TypeError` if *subelement* is not an
827      :class:`Element`.
828
829
830   .. method:: extend(subelements)
831
832      Appends *subelements* from a sequence object with zero or more elements.
833      Raises :exc:`TypeError` if a subelement is not an :class:`Element`.
834
835      .. versionadded:: 3.2
836
837
838   .. method:: find(match, namespaces=None)
839
840      Finds the first subelement matching *match*.  *match* may be a tag name
841      or a :ref:`path <elementtree-xpath>`.  Returns an element instance
842      or ``None``.  *namespaces* is an optional mapping from namespace prefix
843      to full name.
844
845
846   .. method:: findall(match, namespaces=None)
847
848      Finds all matching subelements, by tag name or
849      :ref:`path <elementtree-xpath>`.  Returns a list containing all matching
850      elements in document order.  *namespaces* is an optional mapping from
851      namespace prefix to full name.
852
853
854   .. method:: findtext(match, default=None, namespaces=None)
855
856      Finds text for the first subelement matching *match*.  *match* may be
857      a tag name or a :ref:`path <elementtree-xpath>`.  Returns the text content
858      of the first matching element, or *default* if no element was found.
859      Note that if the matching element has no text content an empty string
860      is returned. *namespaces* is an optional mapping from namespace prefix
861      to full name.
862
863
864   .. method:: getchildren()
865
866      .. deprecated:: 3.2
867         Use ``list(elem)`` or iteration.
868
869
870   .. method:: getiterator(tag=None)
871
872      .. deprecated:: 3.2
873         Use method :meth:`Element.iter` instead.
874
875
876   .. method:: insert(index, subelement)
877
878      Inserts *subelement* at the given position in this element.  Raises
879      :exc:`TypeError` if *subelement* is not an :class:`Element`.
880
881
882   .. method:: iter(tag=None)
883
884      Creates a tree :term:`iterator` with the current element as the root.
885      The iterator iterates over this element and all elements below it, in
886      document (depth first) order.  If *tag* is not ``None`` or ``'*'``, only
887      elements whose tag equals *tag* are returned from the iterator.  If the
888      tree structure is modified during iteration, the result is undefined.
889
890      .. versionadded:: 3.2
891
892
893   .. method:: iterfind(match, namespaces=None)
894
895      Finds all matching subelements, by tag name or
896      :ref:`path <elementtree-xpath>`.  Returns an iterable yielding all
897      matching elements in document order. *namespaces* is an optional mapping
898      from namespace prefix to full name.
899
900
901      .. versionadded:: 3.2
902
903
904   .. method:: itertext()
905
906      Creates a text iterator.  The iterator loops over this element and all
907      subelements, in document order, and returns all inner text.
908
909      .. versionadded:: 3.2
910
911
912   .. method:: makeelement(tag, attrib)
913
      Creates a new element object of the same type as this element.  Do not
      call this method; use the :func:`SubElement` factory function instead.
916
917
918   .. method:: remove(subelement)
919
920      Removes *subelement* from the element.  Unlike the find\* methods this
921      method compares elements based on the instance identity, not on tag value
922      or contents.
923
924   :class:`Element` objects also support the following sequence type methods
925   for working with subelements: :meth:`~object.__delitem__`,
926   :meth:`~object.__getitem__`, :meth:`~object.__setitem__`,
927   :meth:`~object.__len__`.
928
   Caution: Elements with no subelements will test as ``False``.  This behavior
   will change in future versions.  Use an explicit ``len(elem)`` or ``elem is
   None`` test instead. ::
932
     element = root.find('foo')

     if not element:  # careful!
         print("element not found, or element has no subelements")

     if element is None:
         print("element not found")
940
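The following sketch exercises several of the :class:`Element` attributes and
methods described in this section, reusing the small ``<a>`` document from the
*text* and *tail* description.  The ``kind`` attribute and the variable names
are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b>1<c>2<d/>3</c></b>4</a>")
b = root.find("b")
c = b.find("c")
d = c.find("d")

# text precedes the first child; tail follows the element's own end tag.
text_and_tail = {e.tag: (e.text, e.tail) for e in (root, b, c, d)}

# itertext() yields text and tail values of the subtree in document order.
inner_text = "".join(root.itertext())

# Dictionary-like attribute access.
d.set("kind", "leaf")
kind = d.get("kind")
fallback = d.get("absent", "n/a")

# iter() yields the element itself and all descendants, depth first.
tags = [e.tag for e in root.iter()]

# remove() compares by identity, so pass the exact child object.
c.remove(d)
remaining_children = len(c)
```

Note that removing ``d`` also drops its *tail* (``"3"``), because the tail is
stored on the removed element rather than on its parent.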
941
942.. _elementtree-elementtree-objects:
943
944ElementTree Objects
945^^^^^^^^^^^^^^^^^^^
946
947
948.. class:: ElementTree(element=None, file=None)
949
950   ElementTree wrapper class.  This class represents an entire element
951   hierarchy, and adds some extra support for serialization to and from
952   standard XML.
953
954   *element* is the root element.  The tree is initialized with the contents
955   of the XML *file* if given.
956
957
958   .. method:: _setroot(element)
959
960      Replaces the root element for this tree.  This discards the current
961      contents of the tree, and replaces it with the given element.  Use with
962      care.  *element* is an element instance.
963
964
965   .. method:: find(match, namespaces=None)
966
967      Same as :meth:`Element.find`, starting at the root of the tree.
968
969
970   .. method:: findall(match, namespaces=None)
971
972      Same as :meth:`Element.findall`, starting at the root of the tree.
973
974
975   .. method:: findtext(match, default=None, namespaces=None)
976
977      Same as :meth:`Element.findtext`, starting at the root of the tree.
978
979
980   .. method:: getiterator(tag=None)
981
982      .. deprecated:: 3.2
983         Use method :meth:`ElementTree.iter` instead.
984
985
986   .. method:: getroot()
987
988      Returns the root element for this tree.
989
990
991   .. method:: iter(tag=None)
992
      Creates and returns a tree iterator for the root element.  The iterator
      loops over all elements in this tree, in document order.  *tag* is the tag
      to look for (default is to return all elements).
996
997
998   .. method:: iterfind(match, namespaces=None)
999
1000      Same as :meth:`Element.iterfind`, starting at the root of the tree.
1001
1002      .. versionadded:: 3.2
1003
1004
1005   .. method:: parse(source, parser=None)
1006
      Loads an external XML document into this element tree.  *source* is a file
      name or :term:`file object`.  *parser* is an optional parser instance.
      If not given, the standard :class:`XMLParser` parser is used.  Returns the
      document's root element.
1011
1012
1013   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
1014                     default_namespace=None, method="xml", *, \
1015                     short_empty_elements=True)
1016
1017      Writes the element tree to a file, as XML.  *file* is a file name, or a
1018      :term:`file object` opened for writing.  *encoding* [1]_ is the output
1019      encoding (default is US-ASCII).
      *xml_declaration* controls if an XML declaration should be added to the
      file.  Use ``False`` for never, ``True`` for always, or ``None`` (the
      default) to add one only if the encoding is not US-ASCII, UTF-8, or
      Unicode.
1023      *default_namespace* sets the default XML namespace (for "xmlns").
1024      *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
1025      ``"xml"``).
1026      The keyword-only *short_empty_elements* parameter controls the formatting
1027      of elements that contain no content.  If ``True`` (the default), they are
1028      emitted as a single self-closed tag, otherwise they are emitted as a pair
1029      of start/end tags.
1030
1031      The output is either a string (:class:`str`) or binary (:class:`bytes`).
1032      This is controlled by the *encoding* argument.  If *encoding* is
1033      ``"unicode"``, the output is a string; otherwise, it's binary.  Note that
1034      this may conflict with the type of *file* if it's an open
1035      :term:`file object`; make sure you do not try to write a string to a
1036      binary stream and vice versa.
1037
1038      .. versionadded:: 3.4
1039         The *short_empty_elements* parameter.
1040
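As an illustration of how the *encoding* argument of
:meth:`ElementTree.write` selects between binary and text output, the sketch
below writes the same tree to an in-memory binary stream and to an in-memory
text stream.  The sample document and variable names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.fromstring("<root><item/></root>"))

# The default encoding (US-ASCII) produces bytes, so the target is binary.
binary_out = io.BytesIO()
tree.write(binary_out)

# encoding="unicode" produces str, so the target must be a text stream.
# short_empty_elements=False emits <item></item> instead of <item />.
text_out = io.StringIO()
tree.write(text_out, encoding="unicode", short_empty_elements=False)

binary_result = binary_out.getvalue()
text_result = text_out.getvalue()
```

Swapping the two streams would fail, since a binary stream rejects ``str``
data and a text stream rejects ``bytes``.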
1041
1042This is the XML file that is going to be manipulated::
1043
    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>
1053
Example of changing the attribute "target" of every link in the first paragraph::
1055
    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    >>> tree.write("output.xhtml")
1069
1070.. _elementtree-qname-objects:
1071
1072QName Objects
1073^^^^^^^^^^^^^
1074
1075
1076.. class:: QName(text_or_uri, tag=None)
1077
   QName wrapper.  This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output.  *text_or_uri* is a string
   containing the QName value, in the form ``{uri}local``, or, if the *tag*
   argument is given, the URI part of a QName.  If *tag* is given, the first
   argument is interpreted as a URI, and this argument is interpreted as a
   local name.  :class:`QName` instances are opaque.
1084
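A short sketch of the two ways to construct a :class:`QName`, and of using one
as an attribute value so that the serializer emits a namespace prefix for it.
The namespace URI and the element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://example.com/ns"  # hypothetical namespace URI

# Both forms describe the same qualified name.
from_parts = ET.QName(NS, "item")
from_text = ET.QName("{http://example.com/ns}item")

elem = ET.Element(from_parts)
# Wrapping the attribute value makes the serializer treat it as a QName.
elem.set("ref", ET.QName(NS, "other"))
xml_text = ET.tostring(elem, encoding="unicode")
```

In the serialized output the attribute value is written with the same
generated prefix as the element tag, rather than in ``{uri}local`` form.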
1085
1086
1087.. _elementtree-treebuilder-objects:
1088
1089TreeBuilder Objects
1090^^^^^^^^^^^^^^^^^^^
1091
1092
1093.. class:: TreeBuilder(element_factory=None)
1094
1095   Generic element structure builder.  This builder converts a sequence of
1096   start, data, and end method calls to a well-formed element structure.  You
1097   can use this class to build an element structure using a custom XML parser,
1098   or a parser for some other XML-like format.  *element_factory*, when given,
1099   must be a callable accepting two positional arguments: a tag and
1100   a dict of attributes.  It is expected to return a new element instance.
1101
1102   .. method:: close()
1103
1104      Flushes the builder buffers, and returns the toplevel document
1105      element.  Returns an :class:`Element` instance.
1106
1107
1108   .. method:: data(data)
1109
      Adds text to the current element.  *data* is either a bytestring or a
      Unicode string.
1112
1113
1114   .. method:: end(tag)
1115
1116      Closes the current element.  *tag* is the element name.  Returns the
1117      closed element.
1118
1119
1120   .. method:: start(tag, attrs)
1121
1122      Opens a new element.  *tag* is the element name.  *attrs* is a dictionary
1123      containing element attributes.  Returns the opened element.
1124
1125
1126   In addition, a custom :class:`TreeBuilder` object can provide the
1127   following method:
1128
1129   .. method:: doctype(name, pubid, system)
1130
1131      Handles a doctype declaration.  *name* is the doctype name.  *pubid* is
1132      the public identifier.  *system* is the system identifier.  This method
1133      does not exist on the default :class:`TreeBuilder` class.
1134
1135      .. versionadded:: 3.2
1136
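To illustrate the start, data, and end call sequence described above, here is
a minimal sketch that drives a :class:`TreeBuilder` by hand instead of through
a parser.  The tag names and attribute values are hypothetical.

```python
import xml.etree.ElementTree as ET

builder = ET.TreeBuilder()
builder.start("greeting", {"lang": "en"})
builder.data("Hello, ")          # becomes the text of <greeting>
builder.start("name", {})
builder.data("world")            # becomes the text of <name>
builder.end("name")
builder.data("!")                # becomes the tail of <name>
builder.end("greeting")
root = builder.close()           # the toplevel element

xml_text = ET.tostring(root, encoding="unicode")
```

Data passed after an ``end()`` call is attached as the *tail* of the element
that was just closed.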
1137
1138.. _elementtree-xmlparser-objects:
1139
1140XMLParser Objects
1141^^^^^^^^^^^^^^^^^
1142
1143
1144.. class:: XMLParser(html=0, target=None, encoding=None)
1145
1146   This class is the low-level building block of the module.  It uses
1147   :mod:`xml.parsers.expat` for efficient, event-based parsing of XML.  It can
1148   be fed XML data incrementally with the :meth:`feed` method, and parsing
1149   events are translated to a push API - by invoking callbacks on the *target*
1150   object.  If *target* is omitted, the standard :class:`TreeBuilder` is used.
1151   The *html* argument was historically used for backwards compatibility and is
1152   now deprecated.  If *encoding* [1]_ is given, the value overrides the
1153   encoding specified in the XML file.
1154
1155   .. deprecated:: 3.4
1156      The *html* argument.  The remaining arguments should be passed via
1157      keyword to prepare for the removal of the *html* argument.
1158
1159   .. method:: close()
1160
1161      Finishes feeding data to the parser.  Returns the result of calling the
1162      ``close()`` method of the *target* passed during construction; by default,
1163      this is the toplevel document element.
1164
1165
1166   .. method:: doctype(name, pubid, system)
1167
1168      .. deprecated:: 3.2
1169         Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
1170         target.
1171
1172
1173   .. method:: feed(data)
1174
1175      Feeds data to the parser.  *data* is encoded data.
1176
   :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
   for each opening tag, its ``end(tag)`` method for each closing tag, and its
   ``data(data)`` method for character data.  :meth:`XMLParser.close` calls
   *target*\'s ``close()`` method.  :class:`XMLParser` is not limited to
   building a tree structure.  The following example computes the maximum depth
   of an XML file::
1183
    >>> from xml.etree.ElementTree import XMLParser
    >>> class MaxDepth:                     # The target object of the parser
    ...     maxDepth = 0
    ...     depth = 0
    ...     def start(self, tag, attrib):   # Called for each opening tag.
    ...         self.depth += 1
    ...         if self.depth > self.maxDepth:
    ...             self.maxDepth = self.depth
    ...     def end(self, tag):             # Called for each closing tag.
    ...         self.depth -= 1
    ...     def data(self, data):
    ...         pass            # We do not need to do anything with data.
    ...     def close(self):    # Called when all data has been parsed.
    ...         return self.maxDepth
    ...
    >>> target = MaxDepth()
    >>> parser = XMLParser(target=target)
    >>> exampleXml = """
    ... <a>
    ...   <b>
    ...   </b>
    ...   <b>
    ...     <c>
    ...       <d>
    ...       </d>
    ...     </c>
    ...   </b>
    ... </a>"""
    >>> parser.feed(exampleXml)
    >>> parser.close()
    4
1215
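When no *target* is given, the same incremental interface builds an ordinary
element tree.  The sketch below feeds a document in arbitrary pieces and
collects the result from :meth:`XMLParser.close`; the sample data and the
chunk boundaries are hypothetical.

```python
from xml.etree.ElementTree import XMLParser

parser = XMLParser()   # no target: the standard TreeBuilder is used

# Chunk boundaries may fall anywhere, even in the middle of text content.
chunks = [
    '<log><entry id="1">sta',
    'rted</entry><entry id="2">',
    'done</entry></log>',
]
for chunk in chunks:
    parser.feed(chunk)

root = parser.close()  # the toplevel element built by the TreeBuilder
entries = [(entry.get("id"), entry.text) for entry in root]
```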
1216
1217.. _elementtree-xmlpullparser-objects:
1218
1219XMLPullParser Objects
1220^^^^^^^^^^^^^^^^^^^^^
1221
1222.. class:: XMLPullParser(events=None)
1223
1224   A pull parser suitable for non-blocking applications.  Its input-side API is
1225   similar to that of :class:`XMLParser`, but instead of pushing calls to a
1226   callback target, :class:`XMLPullParser` collects an internal list of parsing
1227   events and lets the user read from it. *events* is a sequence of events to
1228   report back.  The supported events are the strings ``"start"``, ``"end"``,
1229   ``"start-ns"`` and ``"end-ns"`` (the "ns" events are used to get detailed
1230   namespace information).  If *events* is omitted, only ``"end"`` events are
1231   reported.
1232
1233   .. method:: feed(data)
1234
1235      Feed the given bytes data to the parser.
1236
1237   .. method:: close()
1238
1239      Signal the parser that the data stream is terminated. Unlike
1240      :meth:`XMLParser.close`, this method always returns :const:`None`.
1241      Any events not yet retrieved when the parser is closed can still be
1242      read with :meth:`read_events`.
1243
1244   .. method:: read_events()
1245
      Return an iterator over the events which have been encountered in the
      data fed to the parser.  The iterator yields ``(event, elem)`` pairs,
      where *event* is a string representing the type of event (e.g.
      ``"end"``) and *elem* is the encountered :class:`Element` object.
1251
1252      Events provided in a previous call to :meth:`read_events` will not be
1253      yielded again.  Events are consumed from the internal queue only when
1254      they are retrieved from the iterator, so multiple readers iterating in
1255      parallel over iterators obtained from :meth:`read_events` will have
1256      unpredictable results.
1257
1258   .. note::
1259
1260      :class:`XMLPullParser` only guarantees that it has seen the ">"
1261      character of a starting tag when it emits a "start" event, so the
1262      attributes are defined, but the contents of the text and tail attributes
1263      are undefined at that point.  The same applies to the element children;
1264      they may or may not be present.
1265
1266      If you need a fully populated element, look for "end" events instead.
1267
1268   .. versionadded:: 3.4
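
A minimal sketch of the feed and read cycle described above.  Events are
drained after every chunk and once more after :meth:`~XMLPullParser.close`,
because the point at which a given event becomes available is not assumed
here.  The sample data and chunk boundaries are hypothetical.

```python
from xml.etree.ElementTree import XMLPullParser

parser = XMLPullParser(events=("start", "end"))
seen = []

for chunk in ("<root><item>on", "e</item>", "<item>two</item></root>"):
    parser.feed(chunk)
    # Drain whatever events are available so far.
    for event, elem in parser.read_events():
        seen.append((event, elem.tag))

parser.close()
# Events still queued when the parser is closed can be read afterwards.
for event, elem in parser.read_events():
    seen.append((event, elem.tag))
```

Each event is delivered exactly once, so ``seen`` lists the events in document
order regardless of which ``read_events()`` call returned them.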
1269
1270Exceptions
1271^^^^^^^^^^
1272
1273.. class:: ParseError
1274
1275   XML parse error, raised by the various parsing methods in this module when
1276   parsing fails.  The string representation of an instance of this exception
1277   will contain a user-friendly error message.  In addition, it will have
1278   the following attributes available:
1279
1280   .. attribute:: code
1281
1282      A numeric error code from the expat parser. See the documentation of
1283      :mod:`xml.parsers.expat` for the list of error codes and their meanings.
1284
1285   .. attribute:: position
1286
1287      A tuple of *line*, *column* numbers, specifying where the error occurred.
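
As a brief sketch of inspecting these attributes, the example below parses a
deliberately malformed document and looks up the numeric code in
:mod:`xml.parsers.expat.errors`.  The malformed input is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.parsers.expat import errors

try:
    ET.fromstring("<a><b></a>")      # <b> is never closed
except ET.ParseError as err:
    code = err.code                  # numeric expat error code
    line, column = err.position      # where the error was detected
    message = str(err)               # user-friendly description
    # errors.messages maps numeric codes to expat's message strings.
    description = errors.messages[code]
```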
1288
1289.. rubric:: Footnotes
1290
1291.. [1] The encoding string included in XML output should conform to the
1292   appropriate standards.  For example, "UTF-8" is valid, but "UTF8" is
1293   not.  See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
1294   and https://www.iana.org/assignments/character-sets/character-sets.xhtml.
1295