1:mod:`xml.etree.ElementTree` --- The ElementTree XML API 2======================================================== 3 4.. module:: xml.etree.ElementTree 5 :synopsis: Implementation of the ElementTree API. 6 7.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com> 8 9**Source code:** :source:`Lib/xml/etree/ElementTree.py` 10 11-------------- 12 13The :mod:`xml.etree.ElementTree` module implements a simple and efficient API 14for parsing and creating XML data. 15 16.. versionchanged:: 3.3 17 This module will use a fast implementation whenever available. 18 The :mod:`xml.etree.cElementTree` module is deprecated. 19 20 21.. warning:: 22 23 The :mod:`xml.etree.ElementTree` module is not secure against 24 maliciously constructed data. If you need to parse untrusted or 25 unauthenticated data see :ref:`xml-vulnerabilities`. 26 27Tutorial 28-------- 29 30This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in 31short). The goal is to demonstrate some of the building blocks and basic 32concepts of the module. 33 34XML tree and elements 35^^^^^^^^^^^^^^^^^^^^^ 36 37XML is an inherently hierarchical data format, and the most natural way to 38represent it is with a tree. ``ET`` has two classes for this purpose - 39:class:`ElementTree` represents the whole XML document as a tree, and 40:class:`Element` represents a single node in this tree. Interactions with 41the whole document (reading and writing to/from files) are usually done 42on the :class:`ElementTree` level. Interactions with a single XML element 43and its sub-elements are done on the :class:`Element` level. 44 45.. _elementtree-parsing-xml: 46 47Parsing XML 48^^^^^^^^^^^ 49 50We'll be using the following XML document as the sample data for this section: 51 52.. code-block:: xml 53 54 <?xml version="1.0"?> 55 <data> 56 <country name="Liechtenstein"> 57 <rank>1</rank> 58 <year>2008</year> 59 <gdppc>141100</gdppc> 60 <neighbor name="Austria" direction="E"/> 61 <neighbor name="Switzerland" direction="W"/> 62 </country> 63 <country name="Singapore"> 64 <rank>4</rank> 65 <year>2011</year> 66 <gdppc>59900</gdppc> 67 <neighbor name="Malaysia" direction="N"/> 68 </country> 69 <country name="Panama"> 70 <rank>68</rank> 71 <year>2011</year> 72 <gdppc>13600</gdppc> 73 <neighbor name="Costa Rica" direction="W"/> 74 <neighbor name="Colombia" direction="E"/> 75 </country> 76 </data> 77 78We can import this data by reading from a file:: 79 80 import xml.etree.ElementTree as ET 81 tree = ET.parse('country_data.xml') 82 root = tree.getroot() 83 84Or directly from a string:: 85 86 root = ET.fromstring(country_data_as_string) 87 88:func:`fromstring` parses XML from a string directly into an :class:`Element`, 89which is the root element of the parsed tree. Other parsing functions may 90create an :class:`ElementTree`. Check the documentation to be sure. 91 92As an :class:`Element`, ``root`` has a tag and a dictionary of attributes:: 93 94 >>> root.tag 95 'data' 96 >>> root.attrib 97 {} 98 99It also has children nodes over which we can iterate:: 100 101 >>> for child in root: 102 ... print(child.tag, child.attrib) 103 ... 104 country {'name': 'Liechtenstein'} 105 country {'name': 'Singapore'} 106 country {'name': 'Panama'} 107 108Children are nested, and we can access specific child nodes by index:: 109 110 >>> root[0][1].text 111 '2008' 112 113 114.. note:: 115 116 Not all elements of the XML input will end up as elements of the 117 parsed tree. Currently, this module skips over any XML comments, 118 processing instructions, and document type declarations in the 119 input. Nevertheless, trees built using this module's API rather 120 than parsing from XML text can have comments and processing 121 instructions in them; they will be included when generating XML 122 output. A document type declaration may be accessed by passing a 123 custom :class:`TreeBuilder` instance to the :class:`XMLParser` 124 constructor. 125 126 127.. _elementtree-pull-parsing: 128 129Pull API for non-blocking parsing 130^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 131 132Most parsing functions provided by this module require the whole document 133to be read at once before returning any result. It is possible to use an 134:class:`XMLParser` and feed data into it incrementally, but it is a push API that 135calls methods on a callback target, which is too low-level and inconvenient for 136most needs. Sometimes what the user really wants is to be able to parse XML 137incrementally, without blocking operations, while enjoying the convenience of 138fully constructed :class:`Element` objects. 139 140The most powerful tool for doing this is :class:`XMLPullParser`. It does not 141require a blocking read to obtain the XML data, and is instead fed with data 142incrementally with :meth:`XMLPullParser.feed` calls. To get the parsed XML 143elements, call :meth:`XMLPullParser.read_events`. Here is an example:: 144 145 >>> parser = ET.XMLPullParser(['start', 'end']) 146 >>> parser.feed('<mytag>sometext') 147 >>> list(parser.read_events()) 148 [('start', <Element 'mytag' at 0x7fa66db2be58>)] 149 >>> parser.feed(' more text</mytag>') 150 >>> for event, elem in parser.read_events(): 151 ... print(event) 152 ... print(elem.tag, 'text=', elem.text) 153 ... 154 end 155 156The obvious use case is applications that operate in a non-blocking fashion 157where the XML data is being received from a socket or read incrementally from 158some storage device. In such cases, blocking reads are unacceptable. 159 160Because it's so flexible, :class:`XMLPullParser` can be inconvenient to use for 161simpler use-cases. If you don't mind your application blocking on reading XML 162data but would still like to have incremental parsing capabilities, take a look 163at :func:`iterparse`. It can be useful when you're reading a large XML document 164and don't want to hold it wholly in memory. 165 166Finding interesting elements 167^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 168 169:class:`Element` has some useful methods that help iterate recursively over all 170the sub-tree below it (its children, their children, and so on). For example, 171:meth:`Element.iter`:: 172 173 >>> for neighbor in root.iter('neighbor'): 174 ... print(neighbor.attrib) 175 ... 176 {'name': 'Austria', 'direction': 'E'} 177 {'name': 'Switzerland', 'direction': 'W'} 178 {'name': 'Malaysia', 'direction': 'N'} 179 {'name': 'Costa Rica', 'direction': 'W'} 180 {'name': 'Colombia', 'direction': 'E'} 181 182:meth:`Element.findall` finds only elements with a tag which are direct 183children of the current element. :meth:`Element.find` finds the *first* child 184with a particular tag, and :attr:`Element.text` accesses the element's text 185content. :meth:`Element.get` accesses the element's attributes:: 186 187 >>> for country in root.findall('country'): 188 ... rank = country.find('rank').text 189 ... name = country.get('name') 190 ... print(name, rank) 191 ... 192 Liechtenstein 1 193 Singapore 4 194 Panama 68 195 196More sophisticated specification of which elements to look for is possible by 197using :ref:`XPath <elementtree-xpath>`. 198 199Modifying an XML File 200^^^^^^^^^^^^^^^^^^^^^ 201 202:class:`ElementTree` provides a simple way to build XML documents and write them to files. 203The :meth:`ElementTree.write` method serves this purpose. 204 205Once created, an :class:`Element` object may be manipulated by directly changing 206its fields (such as :attr:`Element.text`), adding and modifying attributes 207(:meth:`Element.set` method), as well as adding new children (for example 208with :meth:`Element.append`). 209 210Let's say we want to add one to each country's rank, and add an ``updated`` 211attribute to the rank element:: 212 213 >>> for rank in root.iter('rank'): 214 ... new_rank = int(rank.text) + 1 215 ... rank.text = str(new_rank) 216 ... rank.set('updated', 'yes') 217 ... 218 >>> tree.write('output.xml') 219 220Our XML now looks like this: 221 222.. code-block:: xml 223 224 <?xml version="1.0"?> 225 <data> 226 <country name="Liechtenstein"> 227 <rank updated="yes">2</rank> 228 <year>2008</year> 229 <gdppc>141100</gdppc> 230 <neighbor name="Austria" direction="E"/> 231 <neighbor name="Switzerland" direction="W"/> 232 </country> 233 <country name="Singapore"> 234 <rank updated="yes">5</rank> 235 <year>2011</year> 236 <gdppc>59900</gdppc> 237 <neighbor name="Malaysia" direction="N"/> 238 </country> 239 <country name="Panama"> 240 <rank updated="yes">69</rank> 241 <year>2011</year> 242 <gdppc>13600</gdppc> 243 <neighbor name="Costa Rica" direction="W"/> 244 <neighbor name="Colombia" direction="E"/> 245 </country> 246 </data> 247 248We can remove elements using :meth:`Element.remove`. Let's say we want to 249remove all countries with a rank higher than 50:: 250 251 >>> for country in root.findall('country'): 252 ... rank = int(country.find('rank').text) 253 ... if rank > 50: 254 ... root.remove(country) 255 ... 256 >>> tree.write('output.xml') 257 258Our XML now looks like this: 259 260.. code-block:: xml 261 262 <?xml version="1.0"?> 263 <data> 264 <country name="Liechtenstein"> 265 <rank updated="yes">2</rank> 266 <year>2008</year> 267 <gdppc>141100</gdppc> 268 <neighbor name="Austria" direction="E"/> 269 <neighbor name="Switzerland" direction="W"/> 270 </country> 271 <country name="Singapore"> 272 <rank updated="yes">5</rank> 273 <year>2011</year> 274 <gdppc>59900</gdppc> 275 <neighbor name="Malaysia" direction="N"/> 276 </country> 277 </data> 278 279Building XML documents 280^^^^^^^^^^^^^^^^^^^^^^ 281 282The :func:`SubElement` function also provides a convenient way to create new 283sub-elements for a given element:: 284 285 >>> a = ET.Element('a') 286 >>> b = ET.SubElement(a, 'b') 287 >>> c = ET.SubElement(a, 'c') 288 >>> d = ET.SubElement(c, 'd') 289 >>> ET.dump(a) 290 <a><b /><c><d /></c></a> 291 292Parsing XML with Namespaces 293^^^^^^^^^^^^^^^^^^^^^^^^^^^ 294 295If the XML input has `namespaces 296<https://en.wikipedia.org/wiki/XML_namespace>`__, tags and attributes 297with prefixes in the form ``prefix:sometag`` get expanded to 298``{uri}sometag`` where the *prefix* is replaced by the full *URI*. 299Also, if there is a `default namespace 300<https://www.w3.org/TR/xml-names/#defaulting>`__, 301that full URI gets prepended to all of the non-prefixed tags. 302 303Here is an XML example that incorporates two namespaces, one with the 304prefix "fictional" and the other serving as the default namespace: 305 306.. code-block:: xml 307 308 <?xml version="1.0"?> 309 <actors xmlns:fictional="http://characters.example.com" 310 xmlns="http://people.example.com"> 311 <actor> 312 <name>John Cleese</name> 313 <fictional:character>Lancelot</fictional:character> 314 <fictional:character>Archie Leach</fictional:character> 315 </actor> 316 <actor> 317 <name>Eric Idle</name> 318 <fictional:character>Sir Robin</fictional:character> 319 <fictional:character>Gunther</fictional:character> 320 <fictional:character>Commander Clement</fictional:character> 321 </actor> 322 </actors> 323 324One way to search and explore this XML example is to manually add the 325URI to every tag or attribute in the xpath of a 326:meth:`~Element.find` or :meth:`~Element.findall`:: 327 328 root = fromstring(xml_text) 329 for actor in root.findall('{http://people.example.com}actor'): 330 name = actor.find('{http://people.example.com}name') 331 print(name.text) 332 for char in actor.findall('{http://characters.example.com}character'): 333 print(' |-->', char.text) 334 335A better way to search the namespaced XML example is to create a 336dictionary with your own prefixes and use those in the search functions:: 337 338 ns = {'real_person': 'http://people.example.com', 339 'role': 'http://characters.example.com'} 340 341 for actor in root.findall('real_person:actor', ns): 342 name = actor.find('real_person:name', ns) 343 print(name.text) 344 for char in actor.findall('role:character', ns): 345 print(' |-->', char.text) 346 347These two approaches both output:: 348 349 John Cleese 350 |--> Lancelot 351 |--> Archie Leach 352 Eric Idle 353 |--> Sir Robin 354 |--> Gunther 355 |--> Commander Clement 356 357 358Additional resources 359^^^^^^^^^^^^^^^^^^^^ 360 361See http://effbot.org/zone/element-index.htm for tutorials and links to other 362docs. 363 364 365.. _elementtree-xpath: 366 367XPath support 368------------- 369 370This module provides limited support for 371`XPath expressions <https://www.w3.org/TR/xpath>`_ for locating elements in a 372tree. The goal is to support a small subset of the abbreviated syntax; a full 373XPath engine is outside the scope of the module. 374 375Example 376^^^^^^^ 377 378Here's an example that demonstrates some of the XPath capabilities of the 379module. We'll be using the ``countrydata`` XML document from the 380:ref:`Parsing XML <elementtree-parsing-xml>` section:: 381 382 import xml.etree.ElementTree as ET 383 384 root = ET.fromstring(countrydata) 385 386 # Top-level elements 387 root.findall(".") 388 389 # All 'neighbor' grand-children of 'country' children of the top-level 390 # elements 391 root.findall("./country/neighbor") 392 393 # Nodes with name='Singapore' that have a 'year' child 394 root.findall(".//year/..[@name='Singapore']") 395 396 # 'year' nodes that are children of nodes with name='Singapore' 397 root.findall(".//*[@name='Singapore']/year") 398 399 # All 'neighbor' nodes that are the second child of their parent 400 root.findall(".//neighbor[2]") 401 402Supported XPath syntax 403^^^^^^^^^^^^^^^^^^^^^^ 404 405.. tabularcolumns:: |l|L| 406 407+-----------------------+------------------------------------------------------+ 408| Syntax | Meaning | 409+=======================+======================================================+ 410| ``tag`` | Selects all child elements with the given tag. | 411| | For example, ``spam`` selects all child elements | 412| | named ``spam``, and ``spam/egg`` selects all | 413| | grandchildren named ``egg`` in all children named | 414| | ``spam``. | 415+-----------------------+------------------------------------------------------+ 416| ``*`` | Selects all child elements. For example, ``*/egg`` | 417| | selects all grandchildren named ``egg``. | 418+-----------------------+------------------------------------------------------+ 419| ``.`` | Selects the current node. This is mostly useful | 420| | at the beginning of the path, to indicate that it's | 421| | a relative path. | 422+-----------------------+------------------------------------------------------+ 423| ``//`` | Selects all subelements, on all levels beneath the | 424| | current element. For example, ``.//egg`` selects | 425| | all ``egg`` elements in the entire tree. | 426+-----------------------+------------------------------------------------------+ 427| ``..`` | Selects the parent element. Returns ``None`` if the | 428| | path attempts to reach the ancestors of the start | 429| | element (the element ``find`` was called on). | 430+-----------------------+------------------------------------------------------+ 431| ``[@attrib]`` | Selects all elements that have the given attribute. | 432+-----------------------+------------------------------------------------------+ 433| ``[@attrib='value']`` | Selects all elements for which the given attribute | 434| | has the given value. The value cannot contain | 435| | quotes. | 436+-----------------------+------------------------------------------------------+ 437| ``[tag]`` | Selects all elements that have a child named | 438| | ``tag``. Only immediate children are supported. | 439+-----------------------+------------------------------------------------------+ 440| ``[.='text']`` | Selects all elements whose complete text content, | 441| | including descendants, equals the given ``text``. | 442| | | 443| | .. versionadded:: 3.7 | 444+-----------------------+------------------------------------------------------+ 445| ``[tag='text']`` | Selects all elements that have a child named | 446| | ``tag`` whose complete text content, including | 447| | descendants, equals the given ``text``. | 448+-----------------------+------------------------------------------------------+ 449| ``[position]`` | Selects all elements that are located at the given | 450| | position. The position can be either an integer | 451| | (1 is the first position), the expression ``last()`` | 452| | (for the last position), or a position relative to | 453| | the last position (e.g. ``last()-1``). | 454+-----------------------+------------------------------------------------------+ 455 456Predicates (expressions within square brackets) must be preceded by a tag 457name, an asterisk, or another predicate. ``position`` predicates must be 458preceded by a tag name. 459 460Reference 461--------- 462 463.. _elementtree-functions: 464 465Functions 466^^^^^^^^^ 467 468 469.. function:: Comment(text=None) 470 471 Comment element factory. This factory function creates a special element 472 that will be serialized as an XML comment by the standard serializer. The 473 comment string can be either a bytestring or a Unicode string. *text* is a 474 string containing the comment string. Returns an element instance 475 representing a comment. 476 477 Note that :class:`XMLParser` skips over comments in the input 478 instead of creating comment objects for them. An :class:`ElementTree` will 479 only contain comment nodes if they have been inserted into to 480 the tree using one of the :class:`Element` methods. 481 482.. function:: dump(elem) 483 484 Writes an element tree or element structure to sys.stdout. This function 485 should be used for debugging only. 486 487 The exact output format is implementation dependent. In this version, it's 488 written as an ordinary XML file. 489 490 *elem* is an element tree or an individual element. 491 492 493.. function:: fromstring(text, parser=None) 494 495 Parses an XML section from a string constant. Same as :func:`XML`. *text* 496 is a string containing XML data. *parser* is an optional parser instance. 497 If not given, the standard :class:`XMLParser` parser is used. 498 Returns an :class:`Element` instance. 499 500 501.. function:: fromstringlist(sequence, parser=None) 502 503 Parses an XML document from a sequence of string fragments. *sequence* is a 504 list or other sequence containing XML data fragments. *parser* is an 505 optional parser instance. If not given, the standard :class:`XMLParser` 506 parser is used. Returns an :class:`Element` instance. 507 508 .. versionadded:: 3.2 509 510 511.. function:: iselement(element) 512 513 Check if an object appears to be a valid element object. *element* is an 514 element instance. Return ``True`` if this is an element object. 515 516 517.. function:: iterparse(source, events=None, parser=None) 518 519 Parses an XML section into an element tree incrementally, and reports what's 520 going on to the user. *source* is a filename or :term:`file object` 521 containing XML data. *events* is a sequence of events to report back. The 522 supported events are the strings ``"start"``, ``"end"``, ``"start-ns"`` and 523 ``"end-ns"`` (the "ns" events are used to get detailed namespace 524 information). If *events* is omitted, only ``"end"`` events are reported. 525 *parser* is an optional parser instance. If not given, the standard 526 :class:`XMLParser` parser is used. *parser* must be a subclass of 527 :class:`XMLParser` and can only use the default :class:`TreeBuilder` as a 528 target. Returns an :term:`iterator` providing ``(event, elem)`` pairs. 529 530 Note that while :func:`iterparse` builds the tree incrementally, it issues 531 blocking reads on *source* (or the file it names). As such, it's unsuitable 532 for applications where blocking reads can't be made. For fully non-blocking 533 parsing, see :class:`XMLPullParser`. 534 535 .. note:: 536 537 :func:`iterparse` only guarantees that it has seen the ">" character of a 538 starting tag when it emits a "start" event, so the attributes are defined, 539 but the contents of the text and tail attributes are undefined at that 540 point. The same applies to the element children; they may or may not be 541 present. 542 543 If you need a fully populated element, look for "end" events instead. 544 545 .. deprecated:: 3.4 546 The *parser* argument. 547 548.. function:: parse(source, parser=None) 549 550 Parses an XML section into an element tree. *source* is a filename or file 551 object containing XML data. *parser* is an optional parser instance. If 552 not given, the standard :class:`XMLParser` parser is used. Returns an 553 :class:`ElementTree` instance. 554 555 556.. function:: ProcessingInstruction(target, text=None) 557 558 PI element factory. This factory function creates a special element that 559 will be serialized as an XML processing instruction. *target* is a string 560 containing the PI target. *text* is a string containing the PI contents, if 561 given. Returns an element instance, representing a processing instruction. 562 563 Note that :class:`XMLParser` skips over processing instructions 564 in the input instead of creating comment objects for them. An 565 :class:`ElementTree` will only contain processing instruction nodes if 566 they have been inserted into to the tree using one of the 567 :class:`Element` methods. 568 569.. function:: register_namespace(prefix, uri) 570 571 Registers a namespace prefix. The registry is global, and any existing 572 mapping for either the given prefix or the namespace URI will be removed. 573 *prefix* is a namespace prefix. *uri* is a namespace uri. Tags and 574 attributes in this namespace will be serialized with the given prefix, if at 575 all possible. 576 577 .. versionadded:: 3.2 578 579 580.. function:: SubElement(parent, tag, attrib={}, **extra) 581 582 Subelement factory. This function creates an element instance, and appends 583 it to an existing element. 584 585 The element name, attribute names, and attribute values can be either 586 bytestrings or Unicode strings. *parent* is the parent element. *tag* is 587 the subelement name. *attrib* is an optional dictionary, containing element 588 attributes. *extra* contains additional attributes, given as keyword 589 arguments. Returns an element instance. 590 591 592.. function:: tostring(element, encoding="us-ascii", method="xml", *, \ 593 short_empty_elements=True) 594 595 Generates a string representation of an XML element, including all 596 subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is 597 the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to 598 generate a Unicode string (otherwise, a bytestring is generated). *method* 599 is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``). 600 *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`. 601 Returns an (optionally) encoded string containing the XML data. 602 603 .. versionadded:: 3.4 604 The *short_empty_elements* parameter. 605 606 607.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \ 608 short_empty_elements=True) 609 610 Generates a string representation of an XML element, including all 611 subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is 612 the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to 613 generate a Unicode string (otherwise, a bytestring is generated). *method* 614 is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``). 615 *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`. 616 Returns a list of (optionally) encoded strings containing the XML data. 617 It does not guarantee any specific sequence, except that 618 ``b"".join(tostringlist(element)) == tostring(element)``. 619 620 .. versionadded:: 3.2 621 622 .. versionadded:: 3.4 623 The *short_empty_elements* parameter. 624 625 626.. function:: XML(text, parser=None) 627 628 Parses an XML section from a string constant. This function can be used to 629 embed "XML literals" in Python code. *text* is a string containing XML 630 data. *parser* is an optional parser instance. If not given, the standard 631 :class:`XMLParser` parser is used. Returns an :class:`Element` instance. 632 633 634.. function:: XMLID(text, parser=None) 635 636 Parses an XML section from a string constant, and also returns a dictionary 637 which maps from element id:s to elements. *text* is a string containing XML 638 data. *parser* is an optional parser instance. If not given, the standard 639 :class:`XMLParser` parser is used. Returns a tuple containing an 640 :class:`Element` instance and a dictionary. 641 642 643.. _elementtree-xinclude: 644 645XInclude support 646---------------- 647 648This module provides limited support for 649`XInclude directives <https://www.w3.org/TR/xinclude/>`_, via the :mod:`xml.etree.ElementInclude` helper module. This module can be used to insert subtrees and text strings into element trees, based on information in the tree. 650 651Example 652^^^^^^^ 653 654Here's an example that demonstrates use of the XInclude module. To include an XML document in the current document, use the ``{http://www.w3.org/2001/XInclude}include`` element and set the **parse** attribute to ``"xml"``, and use the **href** attribute to specify the document to include. 655 656.. code-block:: xml 657 658 <?xml version="1.0"?> 659 <document xmlns:xi="http://www.w3.org/2001/XInclude"> 660 <xi:include href="source.xml" parse="xml" /> 661 </document> 662 663By default, the **href** attribute is treated as a file name. You can use custom loaders to override this behaviour. Also note that the standard helper does not support XPointer syntax. 664 665To process this file, load it as usual, and pass the root element to the :mod:`xml.etree.ElementTree` module: 666 667.. code-block:: python 668 669 from xml.etree import ElementTree, ElementInclude 670 671 tree = ElementTree.parse("document.xml") 672 root = tree.getroot() 673 674 ElementInclude.include(root) 675 676The ElementInclude module replaces the ``{http://www.w3.org/2001/XInclude}include`` element with the root element from the **source.xml** document. The result might look something like this: 677 678.. code-block:: xml 679 680 <document xmlns:xi="http://www.w3.org/2001/XInclude"> 681 <para>This is a paragraph.</para> 682 </document> 683 684If the **parse** attribute is omitted, it defaults to "xml". The href attribute is required. 685 686To include a text document, use the ``{http://www.w3.org/2001/XInclude}include`` element, and set the **parse** attribute to "text": 687 688.. code-block:: xml 689 690 <?xml version="1.0"?> 691 <document xmlns:xi="http://www.w3.org/2001/XInclude"> 692 Copyright (c) <xi:include href="year.txt" parse="text" />. 693 </document> 694 695The result might look something like: 696 697.. code-block:: xml 698 699 <document xmlns:xi="http://www.w3.org/2001/XInclude"> 700 Copyright (c) 2003. 701 </document> 702 703Reference 704--------- 705 706.. _elementinclude-functions: 707 708Functions 709^^^^^^^^^ 710 711.. function:: xml.etree.ElementInclude.default_loader( href, parse, encoding=None) 712 713 Default loader. This default loader reads an included resource from disk. *href* is a URL. 714 *parse* is for parse mode either "xml" or "text". *encoding* 715 is an optional text encoding. If not given, encoding is ``utf-8``. Returns the 716 expanded resource. If the parse mode is ``"xml"``, this is an ElementTree 717 instance. If the parse mode is "text", this is a Unicode string. If the 718 loader fails, it can return None or raise an exception. 719 720 721.. function:: xml.etree.ElementInclude.include( elem, loader=None) 722 723 This function expands XInclude directives. *elem* is the root element. *loader* is 724 an optional resource loader. If omitted, it defaults to :func:`default_loader`. 725 If given, it should be a callable that implements the same interface as 726 :func:`default_loader`. Returns the expanded resource. If the parse mode is 727 ``"xml"``, this is an ElementTree instance. If the parse mode is "text", 728 this is a Unicode string. If the loader fails, it can return None or 729 raise an exception. 730 731 732.. _elementtree-element-objects: 733 734Element Objects 735^^^^^^^^^^^^^^^ 736 737.. class:: Element(tag, attrib={}, **extra) 738 739 Element class. This class defines the Element interface, and provides a 740 reference implementation of this interface. 741 742 The element name, attribute names, and attribute values can be either 743 bytestrings or Unicode strings. *tag* is the element name. *attrib* is 744 an optional dictionary, containing element attributes. *extra* contains 745 additional attributes, given as keyword arguments. 746 747 748 .. attribute:: tag 749 750 A string identifying what kind of data this element represents (the 751 element type, in other words). 752 753 754 .. attribute:: text 755 tail 756 757 These attributes can be used to hold additional data associated with 758 the element. Their values are usually strings but may be any 759 application-specific object. If the element is created from 760 an XML file, the *text* attribute holds either the text between 761 the element's start tag and its first child or end tag, or ``None``, and 762 the *tail* attribute holds either the text between the element's 763 end tag and the next tag, or ``None``. For the XML data 764 765 .. code-block:: xml 766 767 <a><b>1<c>2<d/>3</c></b>4</a> 768 769 the *a* element has ``None`` for both *text* and *tail* attributes, 770 the *b* element has *text* ``"1"`` and *tail* ``"4"``, 771 the *c* element has *text* ``"2"`` and *tail* ``None``, 772 and the *d* element has *text* ``None`` and *tail* ``"3"``. 773 774 To collect the inner text of an element, see :meth:`itertext`, for 775 example ``"".join(element.itertext())``. 776 777 Applications may store arbitrary objects in these attributes. 778 779 780 .. attribute:: attrib 781 782 A dictionary containing the element's attributes. Note that while the 783 *attrib* value is always a real mutable Python dictionary, an ElementTree 784 implementation may choose to use another internal representation, and 785 create the dictionary only if someone asks for it. To take advantage of 786 such implementations, use the dictionary methods below whenever possible. 787 788 The following dictionary-like methods work on the element attributes. 789 790 791 .. method:: clear() 792 793 Resets an element. This function removes all subelements, clears all 794 attributes, and sets the text and tail attributes to ``None``. 795 796 797 .. method:: get(key, default=None) 798 799 Gets the element attribute named *key*. 800 801 Returns the attribute value, or *default* if the attribute was not found. 802 803 804 .. method:: items() 805 806 Returns the element attributes as a sequence of (name, value) pairs. The 807 attributes are returned in an arbitrary order. 808 809 810 .. method:: keys() 811 812 Returns the elements attribute names as a list. The names are returned 813 in an arbitrary order. 814 815 816 .. method:: set(key, value) 817 818 Set the attribute *key* on the element to *value*. 819 820 The following methods work on the element's children (subelements). 821 822 823 .. method:: append(subelement) 824 825 Adds the element *subelement* to the end of this element's internal list 826 of subelements. Raises :exc:`TypeError` if *subelement* is not an 827 :class:`Element`. 828 829 830 .. method:: extend(subelements) 831 832 Appends *subelements* from a sequence object with zero or more elements. 833 Raises :exc:`TypeError` if a subelement is not an :class:`Element`. 834 835 .. versionadded:: 3.2 836 837 838 .. method:: find(match, namespaces=None) 839 840 Finds the first subelement matching *match*. *match* may be a tag name 841 or a :ref:`path <elementtree-xpath>`. Returns an element instance 842 or ``None``. *namespaces* is an optional mapping from namespace prefix 843 to full name. 844 845 846 .. method:: findall(match, namespaces=None) 847 848 Finds all matching subelements, by tag name or 849 :ref:`path <elementtree-xpath>`. Returns a list containing all matching 850 elements in document order. *namespaces* is an optional mapping from 851 namespace prefix to full name. 852 853 854 .. method:: findtext(match, default=None, namespaces=None) 855 856 Finds text for the first subelement matching *match*. *match* may be 857 a tag name or a :ref:`path <elementtree-xpath>`. Returns the text content 858 of the first matching element, or *default* if no element was found. 859 Note that if the matching element has no text content an empty string 860 is returned. *namespaces* is an optional mapping from namespace prefix 861 to full name. 862 863 864 .. method:: getchildren() 865 866 .. deprecated:: 3.2 867 Use ``list(elem)`` or iteration. 868 869 870 .. method:: getiterator(tag=None) 871 872 .. deprecated:: 3.2 873 Use method :meth:`Element.iter` instead. 874 875 876 .. method:: insert(index, subelement) 877 878 Inserts *subelement* at the given position in this element. Raises 879 :exc:`TypeError` if *subelement* is not an :class:`Element`. 880 881 882 .. method:: iter(tag=None) 883 884 Creates a tree :term:`iterator` with the current element as the root. 885 The iterator iterates over this element and all elements below it, in 886 document (depth first) order. If *tag* is not ``None`` or ``'*'``, only 887 elements whose tag equals *tag* are returned from the iterator. If the 888 tree structure is modified during iteration, the result is undefined. 889 890 .. versionadded:: 3.2 891 892 893 .. method:: iterfind(match, namespaces=None) 894 895 Finds all matching subelements, by tag name or 896 :ref:`path <elementtree-xpath>`. Returns an iterable yielding all 897 matching elements in document order. *namespaces* is an optional mapping 898 from namespace prefix to full name. 899 900 901 .. versionadded:: 3.2 902 903 904 .. method:: itertext() 905 906 Creates a text iterator. The iterator loops over this element and all 907 subelements, in document order, and returns all inner text. 908 909 .. versionadded:: 3.2 910 911 912 .. method:: makeelement(tag, attrib) 913 914 Creates a new element object of the same type as this element. Do not 915 call this method, use the :func:`SubElement` factory function instead. 916 917 918 .. method:: remove(subelement) 919 920 Removes *subelement* from the element. Unlike the find\* methods this 921 method compares elements based on the instance identity, not on tag value 922 or contents. 923 924 :class:`Element` objects also support the following sequence type methods 925 for working with subelements: :meth:`~object.__delitem__`, 926 :meth:`~object.__getitem__`, :meth:`~object.__setitem__`, 927 :meth:`~object.__len__`. 928 929 Caution: Elements with no subelements will test as ``False``. This behavior 930 will change in future versions. Use specific ``len(elem)`` or ``elem is 931 None`` test instead. :: 932 933 element = root.find('foo') 934 935 if not element: # careful! 936 print("element not found, or element has no subelements") 937 938 if element is None: 939 print("element not found") 940 941 942.. _elementtree-elementtree-objects: 943 944ElementTree Objects 945^^^^^^^^^^^^^^^^^^^ 946 947 948.. class:: ElementTree(element=None, file=None) 949 950 ElementTree wrapper class. This class represents an entire element 951 hierarchy, and adds some extra support for serialization to and from 952 standard XML. 953 954 *element* is the root element. The tree is initialized with the contents 955 of the XML *file* if given. 956 957 958 .. method:: _setroot(element) 959 960 Replaces the root element for this tree. This discards the current 961 contents of the tree, and replaces it with the given element. Use with 962 care. *element* is an element instance. 963 964 965 .. method:: find(match, namespaces=None) 966 967 Same as :meth:`Element.find`, starting at the root of the tree. 968 969 970 .. method:: findall(match, namespaces=None) 971 972 Same as :meth:`Element.findall`, starting at the root of the tree. 973 974 975 .. method:: findtext(match, default=None, namespaces=None) 976 977 Same as :meth:`Element.findtext`, starting at the root of the tree. 978 979 980 .. method:: getiterator(tag=None) 981 982 .. deprecated:: 3.2 983 Use method :meth:`ElementTree.iter` instead. 984 985 986 .. method:: getroot() 987 988 Returns the root element for this tree. 989 990 991 .. method:: iter(tag=None) 992 993 Creates and returns a tree iterator for the root element. The iterator 994 loops over all elements in this tree, in section order. *tag* is the tag 995 to look for (default is to return all elements). 996 997 998 .. method:: iterfind(match, namespaces=None) 999 1000 Same as :meth:`Element.iterfind`, starting at the root of the tree. 1001 1002 .. versionadded:: 3.2 1003 1004 1005 .. method:: parse(source, parser=None) 1006 1007 Loads an external XML section into this element tree. *source* is a file 1008 name or :term:`file object`. *parser* is an optional parser instance. 1009 If not given, the standard :class:`XMLParser` parser is used. Returns the 1010 section root element. 1011 1012 1013 .. method:: write(file, encoding="us-ascii", xml_declaration=None, \ 1014 default_namespace=None, method="xml", *, \ 1015 short_empty_elements=True) 1016 1017 Writes the element tree to a file, as XML. *file* is a file name, or a 1018 :term:`file object` opened for writing. *encoding* [1]_ is the output 1019 encoding (default is US-ASCII). 1020 *xml_declaration* controls if an XML declaration should be added to the 1021 file. Use ``False`` for never, ``True`` for always, ``None`` 1022 for only if not US-ASCII or UTF-8 or Unicode (default is ``None``). 1023 *default_namespace* sets the default XML namespace (for "xmlns"). 1024 *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is 1025 ``"xml"``). 1026 The keyword-only *short_empty_elements* parameter controls the formatting 1027 of elements that contain no content. If ``True`` (the default), they are 1028 emitted as a single self-closed tag, otherwise they are emitted as a pair 1029 of start/end tags. 1030 1031 The output is either a string (:class:`str`) or binary (:class:`bytes`). 1032 This is controlled by the *encoding* argument. If *encoding* is 1033 ``"unicode"``, the output is a string; otherwise, it's binary. Note that 1034 this may conflict with the type of *file* if it's an open 1035 :term:`file object`; make sure you do not try to write a string to a 1036 binary stream and vice versa. 1037 1038 .. versionadded:: 3.4 1039 The *short_empty_elements* parameter. 1040 1041 1042This is the XML file that is going to be manipulated:: 1043 1044 <html> 1045 <head> 1046 <title>Example page</title> 1047 </head> 1048 <body> 1049 <p>Moved to <a href="http://example.org/">example.org</a> 1050 or <a href="http://example.com/">example.com</a>.</p> 1051 </body> 1052 </html> 1053 1054Example of changing the attribute "target" of every link in first paragraph:: 1055 1056 >>> from xml.etree.ElementTree import ElementTree 1057 >>> tree = ElementTree() 1058 >>> tree.parse("index.xhtml") 1059 <Element 'html' at 0xb77e6fac> 1060 >>> p = tree.find("body/p") # Finds first occurrence of tag p in body 1061 >>> p 1062 <Element 'p' at 0xb77ec26c> 1063 >>> links = list(p.iter("a")) # Returns list of all links 1064 >>> links 1065 [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>] 1066 >>> for i in links: # Iterates through all found links 1067 ... i.attrib["target"] = "blank" 1068 >>> tree.write("output.xhtml") 1069 1070.. _elementtree-qname-objects: 1071 1072QName Objects 1073^^^^^^^^^^^^^ 1074 1075 1076.. class:: QName(text_or_uri, tag=None) 1077 1078 QName wrapper. This can be used to wrap a QName attribute value, in order 1079 to get proper namespace handling on output. *text_or_uri* is a string 1080 containing the QName value, in the form {uri}local, or, if the tag argument 1081 is given, the URI part of a QName. If *tag* is given, the first argument is 1082 interpreted as a URI, and this argument is interpreted as a local name. 1083 :class:`QName` instances are opaque. 1084 1085 1086 1087.. _elementtree-treebuilder-objects: 1088 1089TreeBuilder Objects 1090^^^^^^^^^^^^^^^^^^^ 1091 1092 1093.. class:: TreeBuilder(element_factory=None) 1094 1095 Generic element structure builder. This builder converts a sequence of 1096 start, data, and end method calls to a well-formed element structure. You 1097 can use this class to build an element structure using a custom XML parser, 1098 or a parser for some other XML-like format. *element_factory*, when given, 1099 must be a callable accepting two positional arguments: a tag and 1100 a dict of attributes. It is expected to return a new element instance. 1101 1102 .. method:: close() 1103 1104 Flushes the builder buffers, and returns the toplevel document 1105 element. Returns an :class:`Element` instance. 1106 1107 1108 .. method:: data(data) 1109 1110 Adds text to the current element. *data* is a string. This should be 1111 either a bytestring, or a Unicode string. 1112 1113 1114 .. method:: end(tag) 1115 1116 Closes the current element. *tag* is the element name. Returns the 1117 closed element. 1118 1119 1120 .. method:: start(tag, attrs) 1121 1122 Opens a new element. *tag* is the element name. *attrs* is a dictionary 1123 containing element attributes. Returns the opened element. 1124 1125 1126 In addition, a custom :class:`TreeBuilder` object can provide the 1127 following method: 1128 1129 .. method:: doctype(name, pubid, system) 1130 1131 Handles a doctype declaration. *name* is the doctype name. *pubid* is 1132 the public identifier. *system* is the system identifier. This method 1133 does not exist on the default :class:`TreeBuilder` class. 1134 1135 .. versionadded:: 3.2 1136 1137 1138.. _elementtree-xmlparser-objects: 1139 1140XMLParser Objects 1141^^^^^^^^^^^^^^^^^ 1142 1143 1144.. class:: XMLParser(html=0, target=None, encoding=None) 1145 1146 This class is the low-level building block of the module. It uses 1147 :mod:`xml.parsers.expat` for efficient, event-based parsing of XML. It can 1148 be fed XML data incrementally with the :meth:`feed` method, and parsing 1149 events are translated to a push API - by invoking callbacks on the *target* 1150 object. If *target* is omitted, the standard :class:`TreeBuilder` is used. 1151 The *html* argument was historically used for backwards compatibility and is 1152 now deprecated. If *encoding* [1]_ is given, the value overrides the 1153 encoding specified in the XML file. 1154 1155 .. deprecated:: 3.4 1156 The *html* argument. The remaining arguments should be passed via 1157 keyword to prepare for the removal of the *html* argument. 1158 1159 .. method:: close() 1160 1161 Finishes feeding data to the parser. Returns the result of calling the 1162 ``close()`` method of the *target* passed during construction; by default, 1163 this is the toplevel document element. 1164 1165 1166 .. method:: doctype(name, pubid, system) 1167 1168 .. deprecated:: 3.2 1169 Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder 1170 target. 1171 1172 1173 .. method:: feed(data) 1174 1175 Feeds data to the parser. *data* is encoded data. 1176 1177 :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method 1178 for each opening tag, its ``end(tag)`` method for each closing tag, and data 1179 is processed by method ``data(data)``. :meth:`XMLParser.close` calls 1180 *target*\'s method ``close()``. :class:`XMLParser` can be used not only for 1181 building a tree structure. This is an example of counting the maximum depth 1182 of an XML file:: 1183 1184 >>> from xml.etree.ElementTree import XMLParser 1185 >>> class MaxDepth: # The target object of the parser 1186 ... maxDepth = 0 1187 ... depth = 0 1188 ... def start(self, tag, attrib): # Called for each opening tag. 1189 ... self.depth += 1 1190 ... if self.depth > self.maxDepth: 1191 ... self.maxDepth = self.depth 1192 ... def end(self, tag): # Called for each closing tag. 1193 ... self.depth -= 1 1194 ... def data(self, data): 1195 ... pass # We do not need to do anything with data. 1196 ... def close(self): # Called when all data has been parsed. 1197 ... return self.maxDepth 1198 ... 1199 >>> target = MaxDepth() 1200 >>> parser = XMLParser(target=target) 1201 >>> exampleXml = """ 1202 ... <a> 1203 ... <b> 1204 ... </b> 1205 ... <b> 1206 ... <c> 1207 ... <d> 1208 ... </d> 1209 ... </c> 1210 ... </b> 1211 ... </a>""" 1212 >>> parser.feed(exampleXml) 1213 >>> parser.close() 1214 4 1215 1216 1217.. _elementtree-xmlpullparser-objects: 1218 1219XMLPullParser Objects 1220^^^^^^^^^^^^^^^^^^^^^ 1221 1222.. class:: XMLPullParser(events=None) 1223 1224 A pull parser suitable for non-blocking applications. Its input-side API is 1225 similar to that of :class:`XMLParser`, but instead of pushing calls to a 1226 callback target, :class:`XMLPullParser` collects an internal list of parsing 1227 events and lets the user read from it. *events* is a sequence of events to 1228 report back. The supported events are the strings ``"start"``, ``"end"``, 1229 ``"start-ns"`` and ``"end-ns"`` (the "ns" events are used to get detailed 1230 namespace information). If *events* is omitted, only ``"end"`` events are 1231 reported. 1232 1233 .. method:: feed(data) 1234 1235 Feed the given bytes data to the parser. 1236 1237 .. method:: close() 1238 1239 Signal the parser that the data stream is terminated. Unlike 1240 :meth:`XMLParser.close`, this method always returns :const:`None`. 1241 Any events not yet retrieved when the parser is closed can still be 1242 read with :meth:`read_events`. 1243 1244 .. method:: read_events() 1245 1246 Return an iterator over the events which have been encountered in the 1247 data fed to the 1248 parser. The iterator yields ``(event, elem)`` pairs, where *event* is a 1249 string representing the type of event (e.g. ``"end"``) and *elem* is the 1250 encountered :class:`Element` object. 1251 1252 Events provided in a previous call to :meth:`read_events` will not be 1253 yielded again. Events are consumed from the internal queue only when 1254 they are retrieved from the iterator, so multiple readers iterating in 1255 parallel over iterators obtained from :meth:`read_events` will have 1256 unpredictable results. 1257 1258 .. note:: 1259 1260 :class:`XMLPullParser` only guarantees that it has seen the ">" 1261 character of a starting tag when it emits a "start" event, so the 1262 attributes are defined, but the contents of the text and tail attributes 1263 are undefined at that point. The same applies to the element children; 1264 they may or may not be present. 1265 1266 If you need a fully populated element, look for "end" events instead. 1267 1268 .. versionadded:: 3.4 1269 1270Exceptions 1271^^^^^^^^^^ 1272 1273.. class:: ParseError 1274 1275 XML parse error, raised by the various parsing methods in this module when 1276 parsing fails. The string representation of an instance of this exception 1277 will contain a user-friendly error message. In addition, it will have 1278 the following attributes available: 1279 1280 .. attribute:: code 1281 1282 A numeric error code from the expat parser. See the documentation of 1283 :mod:`xml.parsers.expat` for the list of error codes and their meanings. 1284 1285 .. attribute:: position 1286 1287 A tuple of *line*, *column* numbers, specifying where the error occurred. 1288 1289.. rubric:: Footnotes 1290 1291.. [1] The encoding string included in XML output should conform to the 1292 appropriate standards. For example, "UTF-8" is valid, but "UTF8" is 1293 not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl 1294 and https://www.iana.org/assignments/character-sets/character-sets.xhtml. 1295