:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.

.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>

**Source code:** :source:`Lib/xml/etree/ElementTree.py`

--------------

The :mod:`xml.etree.ElementTree` module implements a simple and efficient API
for parsing and creating XML data.

.. versionchanged:: 3.3
   This module will use a fast implementation whenever available.
   The :mod:`xml.etree.cElementTree` module is deprecated.


.. warning::

   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data.  If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short).  The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree.  ``ET`` has two classes for this purpose -
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree.  Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level.  Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Parsing XML
^^^^^^^^^^^

We'll be using the following XML document as the sample data for this section:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can import this data by reading from a file::

   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()

Or directly from a string::

   root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree.  Other parsing functions may
create an :class:`ElementTree`.  Check the documentation to be sure.

As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

   >>> root.tag
   'data'
   >>> root.attrib
   {}

It also has children nodes over which we can iterate::

   >>> for child in root:
   ...     print(child.tag, child.attrib)
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

   >>> root[0][1].text
   '2008'


.. note::

   Not all elements of the XML input will end up as elements of the
   parsed tree. Currently, this module skips over any XML comments,
   processing instructions, and document type declarations in the
   input.
   Nevertheless, trees built using this module's API rather
   than parsing from XML text can have comments and processing
   instructions in them; they will be included when generating XML
   output. A document type declaration may be accessed by passing a
   custom :class:`TreeBuilder` instance to the :class:`XMLParser`
   constructor.


.. _elementtree-pull-parsing:

Pull API for non-blocking parsing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Most parsing functions provided by this module require the whole document
to be read at once before returning any result.  It is possible to use an
:class:`XMLParser` and feed data into it incrementally, but it is a push API that
calls methods on a callback target, which is too low-level and inconvenient for
most needs.  Sometimes what the user really wants is to be able to parse XML
incrementally, without blocking operations, while enjoying the convenience of
fully constructed :class:`Element` objects.

The most powerful tool for doing this is :class:`XMLPullParser`.  It does not
require a blocking read to obtain the XML data, and is instead fed with data
incrementally with :meth:`XMLPullParser.feed` calls.  To get the parsed XML
elements, call :meth:`XMLPullParser.read_events`.  Here is an example::

   >>> parser = ET.XMLPullParser(['start', 'end'])
   >>> parser.feed('<mytag>sometext')
   >>> list(parser.read_events())
   [('start', <Element 'mytag' at 0x7fa66db2be58>)]
   >>> parser.feed(' more text</mytag>')
   >>> for event, elem in parser.read_events():
   ...     print(event)
   ...     print(elem.tag, 'text=', elem.text)
   ...
   end
   mytag text= sometext more text

The obvious use case is applications that operate in a non-blocking fashion
where the XML data is being received from a socket or read incrementally from
some storage device.  In such cases, blocking reads are unacceptable.

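
The incremental feeding described above can be sketched roughly as follows.
This is only an illustration under stated assumptions: the ``chunks`` list
stands in for data arriving from a socket, and the helper name
``collect_end_tags`` is hypothetical rather than part of the module.

.. code-block:: python

   import xml.etree.ElementTree as ET


   def collect_end_tags(chunks):
       """Feed XML fragments one at a time and record elements as they close."""
       parser = ET.XMLPullParser(['end'])
       seen = []
       for chunk in chunks:
           parser.feed(chunk)
           # read_events() yields only what has been parsed so far; it never
           # waits for more input.
           for event, elem in parser.read_events():
               seen.append((elem.tag, elem.text))
       parser.close()
       # Drain anything the underlying parser reported only on close().
       for event, elem in parser.read_events():
           seen.append((elem.tag, elem.text))
       return seen


   # Made-up fragments, split at arbitrary points as a network might deliver them.
   chunks = ['<log><entry>fir', 'st</entry><ent', 'ry>second</entry></log>']
   closed = collect_end_tags(chunks)
   print(closed)

Draining the event queue once more after ``close()`` keeps the result
independent of how eagerly the underlying parser reports events for small
fragments.
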

Because it's so flexible, :class:`XMLPullParser` can be inconvenient to use for
simpler use-cases.  If you don't mind your application blocking on reading XML
data but would still like to have incremental parsing capabilities, take a look
at :func:`iterparse`.  It can be useful when you're reading a large XML document
and don't want to hold it wholly in memory.

Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Element` has some useful methods that help iterate recursively over all
the sub-tree below it (its children, their children, and so on).  For example,
:meth:`Element.iter`::

   >>> for neighbor in root.iter('neighbor'):
   ...     print(neighbor.attrib)
   ...
   {'name': 'Austria', 'direction': 'E'}
   {'name': 'Switzerland', 'direction': 'W'}
   {'name': 'Malaysia', 'direction': 'N'}
   {'name': 'Costa Rica', 'direction': 'W'}
   {'name': 'Colombia', 'direction': 'E'}

:meth:`Element.findall` finds only elements with a tag which are direct
children of the current element.  :meth:`Element.find` finds the *first* child
with a particular tag, and :attr:`Element.text` accesses the element's text
content.  :meth:`Element.get` accesses the element's attributes::

   >>> for country in root.findall('country'):
   ...     rank = country.find('rank').text
   ...     name = country.get('name')
   ...     print(name, rank)
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68

More sophisticated specification of which elements to look for is possible by
using :ref:`XPath <elementtree-xpath>`.

Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write them to files.
The :meth:`ElementTree.write` method serves this purpose.

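
As a minimal sketch of the build-then-write workflow just described, the
following assumes a small made-up document and writes it to an in-memory
:class:`io.BytesIO` buffer instead of a real file, so that nothing is left on
disk.  A file name could be passed to :meth:`ElementTree.write` in the same
position.

.. code-block:: python

   import io
   import xml.etree.ElementTree as ET

   # Build a tiny, made-up document in memory.
   root = ET.Element('data')
   country = ET.SubElement(root, 'country', name='Liechtenstein')
   ET.SubElement(country, 'rank').text = '1'

   tree = ET.ElementTree(root)

   # write() accepts a file name or a file object opened for writing.
   buffer = io.BytesIO()
   tree.write(buffer, encoding='utf-8', xml_declaration=True)
   serialized = buffer.getvalue()
   print(serialized.decode('utf-8'))
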

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).

Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

   >>> for rank in root.iter('rank'):
   ...     new_rank = int(rank.text) + 1
   ...     rank.text = str(new_rank)
   ...     rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can remove elements using :meth:`Element.remove`.  Let's say we want to
remove all countries with a rank higher than 50::

   >>> for country in root.findall('country'):
   ...     # using root.findall() to avoid removal during traversal
   ...     rank = int(country.find('rank').text)
   ...     if rank > 50:
   ...         root.remove(country)
   ...
   >>> tree.write('output.xml')

Note that concurrent modification while iterating can lead to problems,
just like when iterating and modifying Python lists or dicts.
Therefore, the example first collects all matching elements with
``root.findall()``, and only then iterates over the list of matches.


Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>

Building XML documents
^^^^^^^^^^^^^^^^^^^^^^

The :func:`SubElement` function also provides a convenient way to create new
sub-elements for a given element::

   >>> a = ET.Element('a')
   >>> b = ET.SubElement(a, 'b')
   >>> c = ET.SubElement(a, 'c')
   >>> d = ET.SubElement(c, 'd')
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>

Parsing XML with Namespaces
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the XML input has `namespaces
<https://en.wikipedia.org/wiki/XML_namespace>`__, tags and attributes
with prefixes in the form ``prefix:sometag`` get expanded to
``{uri}sometag`` where the *prefix* is replaced by the full *URI*.
Also, if there is a `default namespace
<https://www.w3.org/TR/xml-names/#defaulting>`__,
that full URI gets prepended to all of the non-prefixed tags.

Here is an XML example that incorporates two namespaces, one with the
prefix "fictional" and the other serving as the default namespace:

.. code-block:: xml

   <?xml version="1.0"?>
   <actors xmlns:fictional="http://characters.example.com"
           xmlns="http://people.example.com">
       <actor>
           <name>John Cleese</name>
           <fictional:character>Lancelot</fictional:character>
           <fictional:character>Archie Leach</fictional:character>
       </actor>
       <actor>
           <name>Eric Idle</name>
           <fictional:character>Sir Robin</fictional:character>
           <fictional:character>Gunther</fictional:character>
           <fictional:character>Commander Clement</fictional:character>
       </actor>
   </actors>

One way to search and explore this XML example is to manually add the
URI to every tag or attribute in the xpath of a
:meth:`~Element.find` or :meth:`~Element.findall`::

   root = ET.fromstring(xml_text)
   for actor in root.findall('{http://people.example.com}actor'):
       name = actor.find('{http://people.example.com}name')
       print(name.text)
       for char in actor.findall('{http://characters.example.com}character'):
           print(' |-->', char.text)

A better way to search the namespaced XML example is to create a
dictionary with your own prefixes and use those in the search functions::

   ns = {'real_person': 'http://people.example.com',
         'role': 'http://characters.example.com'}

   for actor in root.findall('real_person:actor', ns):
       name = actor.find('real_person:name', ns)
       print(name.text)
       for char in actor.findall('role:character', ns):
           print(' |-->', char.text)

These two approaches both output::

   John Cleese
    |--> Lancelot
    |--> Archie Leach
   Eric Idle
    |--> Sir Robin
    |--> Gunther
    |--> Commander Clement


Additional resources
^^^^^^^^^^^^^^^^^^^^

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs.


.. _elementtree-xpath:

XPath support
-------------

This module provides limited support for
`XPath expressions <https://www.w3.org/TR/xpath>`_ for locating elements in a
tree.  The goal is to support a small subset of the abbreviated syntax; a full
XPath engine is outside the scope of the module.

Example
^^^^^^^

Here's an example that demonstrates some of the XPath capabilities of the
module.  We'll be using the ``countrydata`` XML document from the
:ref:`Parsing XML <elementtree-parsing-xml>` section::

   import xml.etree.ElementTree as ET

   root = ET.fromstring(countrydata)

   # Top-level elements
   root.findall(".")

   # All 'neighbor' grand-children of 'country' children of the top-level
   # elements
   root.findall("./country/neighbor")

   # Nodes with name='Singapore' that have a 'year' child
   root.findall(".//year/..[@name='Singapore']")

   # 'year' nodes that are children of nodes with name='Singapore'
   root.findall(".//*[@name='Singapore']/year")

   # All 'neighbor' nodes that are the second child of their parent
   root.findall(".//neighbor[2]")

For XML with namespaces, use the usual qualified ``{namespace}tag`` notation::

   # All dublin-core "title" tags in the document
   root.findall(".//{http://purl.org/dc/elements/1.1/}title")


Supported XPath syntax
^^^^^^^^^^^^^^^^^^^^^^

.. tabularcolumns:: |l|L|

+-----------------------+------------------------------------------------------+
| Syntax                | Meaning                                              |
+=======================+======================================================+
| ``tag``               | Selects all child elements with the given tag.       |
|                       | For example, ``spam`` selects all child elements     |
|                       | named ``spam``, and ``spam/egg`` selects all         |
|                       | grandchildren named ``egg`` in all children named    |
|                       | ``spam``.  ``{namespace}*`` selects all tags in the  |
|                       | given namespace, ``{*}spam`` selects tags named      |
|                       | ``spam`` in any (or no) namespace, and ``{}*``       |
|                       | only selects tags that are not in a namespace.       |
|                       |                                                      |
|                       | .. versionchanged:: 3.8                              |
|                       |    Support for star-wildcards was added.             |
+-----------------------+------------------------------------------------------+
| ``*``                 | Selects all child elements, including comments and   |
|                       | processing instructions.  For example, ``*/egg``     |
|                       | selects all grandchildren named ``egg``.             |
+-----------------------+------------------------------------------------------+
| ``.``                 | Selects the current node.  This is mostly useful     |
|                       | at the beginning of the path, to indicate that it's  |
|                       | a relative path.                                     |
+-----------------------+------------------------------------------------------+
| ``//``                | Selects all subelements, on all levels beneath the   |
|                       | current element.  For example, ``.//egg`` selects    |
|                       | all ``egg`` elements in the entire tree.             |
+-----------------------+------------------------------------------------------+
| ``..``                | Selects the parent element.  Returns ``None`` if the |
|                       | path attempts to reach the ancestors of the start    |
|                       | element (the element ``find`` was called on).        |
+-----------------------+------------------------------------------------------+
| ``[@attrib]``         | Selects all elements that have the given attribute.  |
+-----------------------+------------------------------------------------------+
| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
|                       | has the given value.  The value cannot contain       |
|                       | quotes.                                              |
+-----------------------+------------------------------------------------------+
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``.  Only immediate children are supported.     |
+-----------------------+------------------------------------------------------+
| ``[.='text']``        | Selects all elements whose complete text content,    |
|                       | including descendants, equals the given ``text``.    |
|                       |                                                      |
|                       | .. versionadded:: 3.7                                |
+-----------------------+------------------------------------------------------+
| ``[tag='text']``      | Selects all elements that have a child named         |
|                       | ``tag`` whose complete text content, including       |
|                       | descendants, equals the given ``text``.              |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position.  The position can be either an integer     |
|                       | (1 is the first position), the expression ``last()`` |
|                       | (for the last position), or a position relative to   |
|                       | the last position (e.g. ``last()-1``).               |
+-----------------------+------------------------------------------------------+

Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate.  ``position`` predicates must be
preceded by a tag name.

Reference
---------

.. _elementtree-functions:

Functions
^^^^^^^^^

.. function:: canonicalize(xml_data=None, *, out=None, from_file=None, **options)

   `C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ transformation function.

   Canonicalization is a way to normalise XML output in a way that allows
   byte-by-byte comparisons and digital signatures.  It reduces the freedom
   that XML serializers have and instead generates a more constrained XML
   representation.  The main restrictions regard the placement of namespace
   declarations, the ordering of attributes, and ignorable whitespace.

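
   To make the effect of these restrictions concrete, here is a small sketch
   (assuming Python 3.8 or later, where :func:`canonicalize` is available).
   The two input strings are made up for illustration; they spell the same
   element with different attribute order, quoting, and empty-element syntax.

   .. code-block:: python

      from xml.etree.ElementTree import canonicalize

      # Two spellings of the same made-up document.
      first = '<root b="2" a="1"/>'
      second = "<root a='1' b='2'></root>"

      canonical_first = canonicalize(first)
      canonical_second = canonicalize(second)

      # After canonicalization the two serializations compare equal.
      print(canonical_first == canonical_second)
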

   This function takes an XML data string (*xml_data*) or a file path or
   file-like object (*from_file*) as input, converts it to the canonical
   form, and writes it out using the *out* file(-like) object, if provided,
   or returns it as a text string if not.  The output file receives text,
   not bytes.  It should therefore be opened in text mode with ``utf-8``
   encoding.

   Typical uses::

      xml_data = "<root>...</root>"
      print(canonicalize(xml_data))

      with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
          canonicalize(xml_data, out=out_file)

      with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
          canonicalize(from_file="inputfile.xml", out=out_file)

   The configuration *options* are as follows:

   - *with_comments*: set to true to include comments (default: false)
   - *strip_text*: set to true to strip whitespace before and after text content
     (default: false)
   - *rewrite_prefixes*: set to true to replace namespace prefixes by "n{number}"
     (default: false)
   - *qname_aware_tags*: a set of qname aware tag names in which prefixes
     should be replaced in text content (default: empty)
   - *qname_aware_attrs*: a set of qname aware attribute names in which prefixes
     should be replaced in text content (default: empty)
   - *exclude_attrs*: a set of attribute names that should not be serialised
   - *exclude_tags*: a set of tag names that should not be serialised

   In the option list above, "a set" refers to any collection or iterable of
   strings, no ordering is expected.

   .. versionadded:: 3.8


.. function:: Comment(text=None)

   Comment element factory.  This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer.  The
   comment string can be either a bytestring or a Unicode string.  *text* is a
   string containing the comment string.
   Returns an element instance
   representing a comment.

   Note that :class:`XMLParser` skips over comments in the input
   instead of creating comment objects for them. An :class:`ElementTree` will
   only contain comment nodes if they have been inserted into
   the tree using one of the :class:`Element` methods.

.. function:: dump(elem)

   Writes an element tree or element structure to sys.stdout.  This function
   should be used for debugging only.

   The exact output format is implementation dependent.  In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.

   .. versionchanged:: 3.8
      The :func:`dump` function now preserves the attribute order specified
      by the user.


.. function:: fromstring(text, parser=None)

   Parses an XML section from a string constant.  Same as :func:`XML`.  *text*
   is a string containing XML data.  *parser* is an optional parser instance.
   If not given, the standard :class:`XMLParser` parser is used.
   Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments.  *sequence* is a
   list or other sequence containing XML data fragments.  *parser* is an
   optional parser instance.  If not given, the standard :class:`XMLParser`
   parser is used.  Returns an :class:`Element` instance.

   .. versionadded:: 3.2


.. function:: iselement(element)

   Check if an object appears to be a valid element object.  *element* is an
   element instance.  Return ``True`` if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user.  *source* is a filename or :term:`file object`
   containing XML data.  *events* is a sequence of events to report back.
   The
   supported events are the strings ``"start"``, ``"end"``, ``"comment"``,
   ``"pi"``, ``"start-ns"`` and ``"end-ns"``
   (the "ns" events are used to get detailed namespace
   information).  If *events* is omitted, only ``"end"`` events are reported.
   *parser* is an optional parser instance.  If not given, the standard
   :class:`XMLParser` parser is used.  *parser* must be a subclass of
   :class:`XMLParser` and can only use the default :class:`TreeBuilder` as a
   target.  Returns an :term:`iterator` providing ``(event, elem)`` pairs.

   Note that while :func:`iterparse` builds the tree incrementally, it issues
   blocking reads on *source* (or the file it names).  As such, it's unsuitable
   for applications where blocking reads can't be made.  For fully non-blocking
   parsing, see :class:`XMLPullParser`.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">" character of a
      starting tag when it emits a "start" event, so the attributes are defined,
      but the contents of the text and tail attributes are undefined at that
      point.  The same applies to the element children; they may or may not be
      present.

      If you need a fully populated element, look for "end" events instead.

   .. deprecated:: 3.4
      The *parser* argument.

   .. versionchanged:: 3.8
      The ``comment`` and ``pi`` events were added.


.. function:: parse(source, parser=None)

   Parses an XML section into an element tree.  *source* is a filename or file
   object containing XML data.  *parser* is an optional parser instance.  If
   not given, the standard :class:`XMLParser` parser is used.  Returns an
   :class:`ElementTree` instance.


.. function:: ProcessingInstruction(target, text=None)

   PI element factory.  This factory function creates a special element that
   will be serialized as an XML processing instruction.  *target* is a string
   containing the PI target.
   *text* is a string containing the PI contents, if
   given.  Returns an element instance, representing a processing instruction.

   Note that :class:`XMLParser` skips over processing instructions
   in the input instead of creating PI objects for them. An
   :class:`ElementTree` will only contain processing instruction nodes if
   they have been inserted into the tree using one of the
   :class:`Element` methods.

.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix.  The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix.  *uri* is a namespace URI.  Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.

   .. versionadded:: 3.2


.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory.  This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings.  *parent* is the parent element.  *tag* is
   the subelement name.  *attrib* is an optional dictionary, containing element
   attributes.  *extra* contains additional attributes, given as keyword
   arguments.  Returns an element instance.


.. function:: tostring(element, encoding="us-ascii", method="xml", *, \
                       xml_declaration=None, default_namespace=None, \
                       short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements.  *element* is an :class:`Element` instance.  *encoding* [1]_ is
   the output encoding (default is US-ASCII).  Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated).  *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *xml_declaration*, *default_namespace* and *short_empty_elements* have the same
   meaning as in :meth:`ElementTree.write`. Returns an (optionally) encoded string
   containing the XML data.

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.

   .. versionadded:: 3.8
      The *xml_declaration* and *default_namespace* parameters.

   .. versionchanged:: 3.8
      The :func:`tostring` function now preserves the attribute order
      specified by the user.


.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \
                           xml_declaration=None, default_namespace=None, \
                           short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements.  *element* is an :class:`Element` instance.  *encoding* [1]_ is
   the output encoding (default is US-ASCII).  Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated).  *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *xml_declaration*, *default_namespace* and *short_empty_elements* have the same
   meaning as in :meth:`ElementTree.write`. Returns a list of (optionally) encoded
   strings containing the XML data. It does not guarantee any specific sequence,
   except that ``b"".join(tostringlist(element)) == tostring(element)``.

   .. versionadded:: 3.2

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.

   .. versionadded:: 3.8
      The *xml_declaration* and *default_namespace* parameters.

   .. versionchanged:: 3.8
      The :func:`tostringlist` function now preserves the attribute order
      specified by the user.


.. function:: XML(text, parser=None)

   Parses an XML section from a string constant.  This function can be used to
   embed "XML literals" in Python code.  *text* is a string containing XML
   data.  *parser* is an optional parser instance.
   If not given, the standard
   :class:`XMLParser` parser is used.  Returns an :class:`Element` instance.


.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element IDs to elements.  *text* is a string containing XML
   data.  *parser* is an optional parser instance.  If not given, the standard
   :class:`XMLParser` parser is used.  Returns a tuple containing an
   :class:`Element` instance and a dictionary.


.. _elementtree-xinclude:

XInclude support
----------------

This module provides limited support for
`XInclude directives <https://www.w3.org/TR/xinclude/>`_, via the
:mod:`xml.etree.ElementInclude` helper module.  This module can be used to
insert subtrees and text strings into element trees, based on information in
the tree.

Example
^^^^^^^

Here's an example that demonstrates use of the XInclude module. To include an
XML document in the current document, use the
``{http://www.w3.org/2001/XInclude}include`` element and set the **parse**
attribute to ``"xml"``, and use the **href** attribute to specify the document
to include.

.. code-block:: xml

    <?xml version="1.0"?>
    <document xmlns:xi="http://www.w3.org/2001/XInclude">
      <xi:include href="source.xml" parse="xml" />
    </document>

By default, the **href** attribute is treated as a file name. You can use
custom loaders to override this behaviour. Also note that the standard helper
does not support XPointer syntax.

To process this file, load it as usual, and pass the root element to the
:mod:`xml.etree.ElementInclude` module:

.. code-block:: python

   from xml.etree import ElementTree, ElementInclude

   tree = ElementTree.parse("document.xml")
   root = tree.getroot()

   ElementInclude.include(root)

The ElementInclude module replaces the
``{http://www.w3.org/2001/XInclude}include`` element with the root element
from the **source.xml** document. The result might look something like this:

.. code-block:: xml

    <document xmlns:xi="http://www.w3.org/2001/XInclude">
      <para>This is a paragraph.</para>
    </document>

If the **parse** attribute is omitted, it defaults to "xml". The href
attribute is required.

To include a text document, use the
``{http://www.w3.org/2001/XInclude}include`` element, and set the **parse**
attribute to "text":

.. code-block:: xml

    <?xml version="1.0"?>
    <document xmlns:xi="http://www.w3.org/2001/XInclude">
      Copyright (c) <xi:include href="year.txt" parse="text" />.
    </document>

The result might look something like:

.. code-block:: xml

    <document xmlns:xi="http://www.w3.org/2001/XInclude">
      Copyright (c) 2003.
    </document>

Reference
---------

.. _elementinclude-functions:

Functions
^^^^^^^^^

.. function:: xml.etree.ElementInclude.default_loader( href, parse, encoding=None)

   Default loader. This default loader reads an included resource from disk.  *href* is a URL.
   *parse* is for parse mode either "xml" or "text".  *encoding*
   is an optional text encoding.  If not given, encoding is ``utf-8``.  Returns the
   expanded resource.  If the parse mode is ``"xml"``, this is an ElementTree
   instance.  If the parse mode is "text", this is a Unicode string.  If the
   loader fails, it can return None or raise an exception.


.. function:: xml.etree.ElementInclude.include( elem, loader=None)

   This function expands XInclude directives.  *elem* is the root element.  *loader* is
   an optional resource loader.
   If omitted, it defaults to :func:`default_loader`.
   If given, it should be a callable that implements the same interface as
   :func:`default_loader`.  Returns the expanded resource.  If the parse mode is
   ``"xml"``, this is an ElementTree instance.  If the parse mode is "text",
   this is a Unicode string.  If the loader fails, it can return None or
   raise an exception.


.. _elementtree-element-objects:

Element Objects
^^^^^^^^^^^^^^^

.. class:: Element(tag, attrib={}, **extra)

   Element class.  This class defines the Element interface, and provides a
   reference implementation of this interface.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings.  *tag* is the element name.  *attrib* is
   an optional dictionary, containing element attributes.  *extra* contains
   additional attributes, given as keyword arguments.


   .. attribute:: tag

      A string identifying what kind of data this element represents (the
      element type, in other words).


   .. attribute:: text
                  tail

      These attributes can be used to hold additional data associated with
      the element.  Their values are usually strings but may be any
      application-specific object.  If the element is created from
      an XML file, the *text* attribute holds either the text between
      the element's start tag and its first child or end tag, or ``None``, and
      the *tail* attribute holds either the text between the element's
      end tag and the next tag, or ``None``.  For the XML data

      .. code-block:: xml

         <a><b>1<c>2<d/>3</c></b>4</a>

      the *a* element has ``None`` for both *text* and *tail* attributes,
      the *b* element has *text* ``"1"`` and *tail* ``"4"``,
      the *c* element has *text* ``"2"`` and *tail* ``None``,
      and the *d* element has *text* ``None`` and *tail* ``"3"``.
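
      The layout described above can be checked with a short sketch.  The
      variable names (``layout`` and the single-letter element names) are
      chosen here for illustration only.

      .. code-block:: python

         import xml.etree.ElementTree as ET

         a = ET.fromstring('<a><b>1<c>2<d/>3</c></b>4</a>')
         b = a[0]
         c = b[0]
         d = c[0]

         # Collect (text, tail) for each element of the sample document.
         layout = {elem.tag: (elem.text, elem.tail) for elem in (a, b, c, d)}
         for tag, (text, tail) in layout.items():
             print(tag, repr(text), repr(tail))

      Note how the ``"4"`` that follows ``</b>`` is stored on *b* as its
      *tail*, not on *a*, even though it sits inside *a* in the markup.
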

      To collect the inner text of an element, see :meth:`itertext`, for
      example ``"".join(element.itertext())``.

      Applications may store arbitrary objects in these attributes.


   .. attribute:: attrib

      A dictionary containing the element's attributes. Note that while the
      *attrib* value is always a real mutable Python dictionary, an ElementTree
      implementation may choose to use another internal representation, and
      create the dictionary only if someone asks for it. To take advantage of
      such implementations, use the dictionary methods below whenever possible.

   The following dictionary-like methods work on the element attributes.


   .. method:: clear()

      Resets an element. This function removes all subelements, clears all
      attributes, and sets the text and tail attributes to ``None``.


   .. method:: get(key, default=None)

      Gets the element attribute named *key*.

      Returns the attribute value, or *default* if the attribute was not found.


   .. method:: items()

      Returns the element attributes as a sequence of (name, value) pairs. The
      attributes are returned in an arbitrary order.


   .. method:: keys()

      Returns the element's attribute names as a list. The names are returned
      in an arbitrary order.


   .. method:: set(key, value)

      Set the attribute *key* on the element to *value*.

   The following methods work on the element's children (subelements).


   .. method:: append(subelement)

      Adds the element *subelement* to the end of this element's internal list
      of subelements. Raises :exc:`TypeError` if *subelement* is not an
      :class:`Element`.


   .. method:: extend(subelements)

      Appends *subelements* from a sequence object with zero or more elements.
      Raises :exc:`TypeError` if a subelement is not an :class:`Element`.

      .. versionadded:: 3.2


   .. method:: find(match, namespaces=None)

      Finds the first subelement matching *match*. *match* may be a tag name
      or a :ref:`path <elementtree-xpath>`. Returns an element instance
      or ``None``. *namespaces* is an optional mapping from namespace prefix
      to full name. Pass ``''`` as prefix to move all unprefixed tag names
      in the expression into the given namespace.


   .. method:: findall(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`. Returns a list containing all matching
      elements in document order. *namespaces* is an optional mapping from
      namespace prefix to full name. Pass ``''`` as prefix to move all
      unprefixed tag names in the expression into the given namespace.


   .. method:: findtext(match, default=None, namespaces=None)

      Finds text for the first subelement matching *match*. *match* may be
      a tag name or a :ref:`path <elementtree-xpath>`. Returns the text content
      of the first matching element, or *default* if no element was found.
      Note that if the matching element has no text content an empty string
      is returned. *namespaces* is an optional mapping from namespace prefix
      to full name. Pass ``''`` as prefix to move all unprefixed tag names
      in the expression into the given namespace.


   .. method:: getchildren()

      .. deprecated-removed:: 3.2 3.9
         Use ``list(elem)`` or iteration.


   .. method:: getiterator(tag=None)

      .. deprecated-removed:: 3.2 3.9
         Use method :meth:`Element.iter` instead.


   .. method:: insert(index, subelement)

      Inserts *subelement* at the given position in this element. Raises
      :exc:`TypeError` if *subelement* is not an :class:`Element`.


   .. method:: iter(tag=None)

      Creates a tree :term:`iterator` with the current element as the root.
      The iterator iterates over this element and all elements below it, in
      document (depth first) order. If *tag* is not ``None`` or ``'*'``, only
      elements whose tag equals *tag* are returned from the iterator. If the
      tree structure is modified during iteration, the result is undefined.

      .. versionadded:: 3.2


   .. method:: iterfind(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`. Returns an iterable yielding all
      matching elements in document order. *namespaces* is an optional mapping
      from namespace prefix to full name.


      .. versionadded:: 3.2


   .. method:: itertext()

      Creates a text iterator. The iterator loops over this element and all
      subelements, in document order, and returns all inner text.

      .. versionadded:: 3.2


   .. method:: makeelement(tag, attrib)

      Creates a new element object of the same type as this element. Do not
      call this method, use the :func:`SubElement` factory function instead.


   .. method:: remove(subelement)

      Removes *subelement* from the element. Unlike the find\* methods this
      method compares elements based on the instance identity, not on tag value
      or contents.

   :class:`Element` objects also support the following sequence type methods
   for working with subelements: :meth:`~object.__delitem__`,
   :meth:`~object.__getitem__`, :meth:`~object.__setitem__`,
   :meth:`~object.__len__`.

   Caution: Elements with no subelements will test as ``False``. This behavior
   will change in future versions. Use specific ``len(elem)`` or ``elem is
   None`` test instead. ::

     element = root.find('foo')

     if not element:  # careful!
         print("element not found, or element has no subelements")

     if element is None:
         print("element not found")

   Prior to Python 3.8, the serialisation order of the XML attributes of
   elements was artificially made predictable by sorting the attributes by
   their name. Based on the now guaranteed ordering of dicts, this arbitrary
   reordering was removed in Python 3.8 to preserve the order in which
   attributes were originally parsed or created by user code.

   In general, user code should try not to depend on a specific ordering of
   attributes, given that the `XML Information Set
   <https://www.w3.org/TR/xml-infoset/>`_ explicitly excludes the attribute
   order from conveying information. Code should be prepared to deal with
   any ordering on input. In cases where deterministic XML output is required,
   e.g. for cryptographic signing or test data sets, canonical serialisation
   is available with the :func:`canonicalize` function.

   In cases where canonical output is not applicable but a specific attribute
   order is still desirable on output, code should aim for creating the
   attributes directly in the desired order, to avoid perceptual mismatches
   for readers of the code. In cases where this is difficult to achieve, a
   recipe like the following can be applied prior to serialisation to enforce
   an order independently from the Element creation::

     def reorder_attributes(root):
         for el in root.iter():
             attrib = el.attrib
             if len(attrib) > 1:
                 # adjust attribute order, e.g. by sorting
                 attribs = sorted(attrib.items())
                 attrib.clear()
                 attrib.update(attribs)


.. _elementtree-elementtree-objects:

ElementTree Objects
^^^^^^^^^^^^^^^^^^^


.. class:: ElementTree(element=None, file=None)

   ElementTree wrapper class.
   This class represents an entire element
   hierarchy, and adds some extra support for serialization to and from
   standard XML.

   *element* is the root element. The tree is initialized with the contents
   of the XML *file* if given.


   .. method:: _setroot(element)

      Replaces the root element for this tree. This discards the current
      contents of the tree, and replaces it with the given element. Use with
      care. *element* is an element instance.


   .. method:: find(match, namespaces=None)

      Same as :meth:`Element.find`, starting at the root of the tree.


   .. method:: findall(match, namespaces=None)

      Same as :meth:`Element.findall`, starting at the root of the tree.


   .. method:: findtext(match, default=None, namespaces=None)

      Same as :meth:`Element.findtext`, starting at the root of the tree.


   .. method:: getiterator(tag=None)

      .. deprecated-removed:: 3.2 3.9
         Use method :meth:`ElementTree.iter` instead.


   .. method:: getroot()

      Returns the root element for this tree.


   .. method:: iter(tag=None)

      Creates and returns a tree iterator for the root element. The iterator
      loops over all elements in this tree, in document order. *tag* is the tag
      to look for (default is to return all elements).


   .. method:: iterfind(match, namespaces=None)

      Same as :meth:`Element.iterfind`, starting at the root of the tree.

      .. versionadded:: 3.2


   .. method:: parse(source, parser=None)

      Loads an external XML section into this element tree. *source* is a file
      name or :term:`file object`. *parser* is an optional parser instance.
      If not given, the standard :class:`XMLParser` parser is used. Returns the
      section root element.


   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
                     default_namespace=None, method="xml", *, \
                     short_empty_elements=True)

      Writes the element tree to a file, as XML. *file* is a file name, or a
      :term:`file object` opened for writing. *encoding* [1]_ is the output
      encoding (default is US-ASCII).
      *xml_declaration* controls if an XML declaration should be added to the
      file. Use ``False`` for never, ``True`` for always, ``None``
      for only if not US-ASCII or UTF-8 or Unicode (default is ``None``).
      *default_namespace* sets the default XML namespace (for "xmlns").
      *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
      ``"xml"``).
      The keyword-only *short_empty_elements* parameter controls the formatting
      of elements that contain no content. If ``True`` (the default), they are
      emitted as a single self-closed tag, otherwise they are emitted as a pair
      of start/end tags.

      The output is either a string (:class:`str`) or binary (:class:`bytes`).
      This is controlled by the *encoding* argument. If *encoding* is
      ``"unicode"``, the output is a string; otherwise, it's binary. Note that
      this may conflict with the type of *file* if it's an open
      :term:`file object`; make sure you do not try to write a string to a
      binary stream and vice versa.

      .. versionadded:: 3.4
         The *short_empty_elements* parameter.

      .. versionchanged:: 3.8
         The :meth:`write` method now preserves the attribute order specified
         by the user.
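
As a rough illustration of how the *encoding* argument of :meth:`ElementTree.write` selects the output type, the sketch below writes one small tree to a text stream and to a binary stream. The element names (``root``, ``empty``, ``item``) and the in-memory streams are assumptions made for this example only; they are not part of the module:

```python
import io
import xml.etree.ElementTree as ET

# Build a tiny tree with one empty element and one element holding text.
# The tag names are hypothetical and chosen only for illustration.
root = ET.Element("root")
ET.SubElement(root, "empty")
ET.SubElement(root, "item").text = "text"
tree = ET.ElementTree(root)

# encoding="unicode" produces str, so the target must be a text stream.
text_out = io.StringIO()
tree.write(text_out, encoding="unicode")

# Any other encoding produces bytes, so the target must be a binary stream.
# xml_declaration=True forces the XML declaration, and
# short_empty_elements=False emits <empty></empty> instead of <empty />.
bytes_out = io.BytesIO()
tree.write(bytes_out, encoding="utf-8", xml_declaration=True,
           short_empty_elements=False)

print(text_out.getvalue())
print(bytes_out.getvalue())
```

The text stream is paired with ``encoding="unicode"`` and the binary stream with a real encoding name, which is the pairing the description of *encoding* asks for.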


This is the XML file that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    >>> tree.write("output.xhtml")

.. _elementtree-qname-objects:

QName Objects
^^^^^^^^^^^^^


.. class:: QName(text_or_uri, tag=None)

   QName wrapper. This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output. *text_or_uri* is a string
   containing the QName value, in the form {uri}local, or, if the tag argument
   is given, the URI part of a QName. If *tag* is given, the first argument is
   interpreted as a URI, and this argument is interpreted as a local name.
   :class:`QName` instances are opaque.



.. _elementtree-treebuilder-objects:

TreeBuilder Objects
^^^^^^^^^^^^^^^^^^^


.. class:: TreeBuilder(element_factory=None, *, comment_factory=None, \
                       pi_factory=None, insert_comments=False, insert_pis=False)

   Generic element structure builder. This builder converts a sequence of
   start, data, end, comment and pi method calls to a well-formed element
   structure.
   You can use this class to build an element structure using
   a custom XML parser, or a parser for some other XML-like format.

   *element_factory*, when given, must be a callable accepting two positional
   arguments: a tag and a dict of attributes. It is expected to return a new
   element instance.

   The *comment_factory* and *pi_factory* functions, when given, should behave
   like the :func:`Comment` and :func:`ProcessingInstruction` functions to
   create comments and processing instructions. When not given, the default
   factories will be used. When *insert_comments* and/or *insert_pis* is true,
   comments/pis will be inserted into the tree if they appear within the root
   element (but not outside of it).

   .. method:: close()

      Flushes the builder buffers, and returns the toplevel document
      element. Returns an :class:`Element` instance.


   .. method:: data(data)

      Adds text to the current element. *data* is a string. This should be
      either a bytestring, or a Unicode string.


   .. method:: end(tag)

      Closes the current element. *tag* is the element name. Returns the
      closed element.


   .. method:: start(tag, attrs)

      Opens a new element. *tag* is the element name. *attrs* is a dictionary
      containing element attributes. Returns the opened element.


   .. method:: comment(text)

      Creates a comment with the given *text*. If ``insert_comments`` is true,
      this will also add it to the tree.

      .. versionadded:: 3.8


   .. method:: pi(target, text)

      Creates a processing instruction with the given *target* name and *text*.
      If ``insert_pis`` is true, this will also add it to the tree.

      .. versionadded:: 3.8


   In addition, a custom :class:`TreeBuilder` object can provide the
   following methods:

   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration. *name* is the doctype name. *pubid* is
      the public identifier. *system* is the system identifier. This method
      does not exist on the default :class:`TreeBuilder` class.

      .. versionadded:: 3.2

   .. method:: start_ns(prefix, uri)

      Is called whenever the parser encounters a new namespace declaration,
      before the ``start()`` callback for the opening element that defines it.
      *prefix* is ``''`` for the default namespace and the declared
      namespace prefix name otherwise. *uri* is the namespace URI.

      .. versionadded:: 3.8

   .. method:: end_ns(prefix)

      Is called after the ``end()`` callback of an element that declared
      a namespace prefix mapping, with the name of the *prefix* that went
      out of scope.

      .. versionadded:: 3.8


.. class:: C14NWriterTarget(write, *, \
             with_comments=False, strip_text=False, rewrite_prefixes=False, \
             qname_aware_tags=None, qname_aware_attrs=None, \
             exclude_attrs=None, exclude_tags=None)

   A `C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ writer. Arguments are the
   same as for the :func:`canonicalize` function. This class does not build a
   tree but translates the callback events directly into a serialised form
   using the *write* function.

   .. versionadded:: 3.8


.. _elementtree-xmlparser-objects:

XMLParser Objects
^^^^^^^^^^^^^^^^^


.. class:: XMLParser(*, target=None, encoding=None)

   This class is the low-level building block of the module. It uses
   :mod:`xml.parsers.expat` for efficient, event-based parsing of XML. It can
   be fed XML data incrementally with the :meth:`feed` method, and parsing
   events are translated to a push API - by invoking callbacks on the *target*
   object. If *target* is omitted, the standard :class:`TreeBuilder` is used.
   If *encoding* [1]_ is given, the value overrides the
   encoding specified in the XML file.

   .. versionchanged:: 3.8
      Parameters are now :ref:`keyword-only <keyword-only_parameter>`.
      The *html* argument is no longer supported.


   .. method:: close()

      Finishes feeding data to the parser. Returns the result of calling the
      ``close()`` method of the *target* passed during construction; by default,
      this is the toplevel document element.


   .. method:: feed(data)

      Feeds data to the parser. *data* is encoded data.

   :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
   for each opening tag, its ``end(tag)`` method for each closing tag, and data
   is processed by method ``data(data)``. For further supported callback
   methods, see the :class:`TreeBuilder` class. :meth:`XMLParser.close` calls
   *target*\'s method ``close()``. :class:`XMLParser` can be used not only for
   building a tree structure. This is an example of counting the maximum depth
   of an XML file::

      >>> from xml.etree.ElementTree import XMLParser
      >>> class MaxDepth:                     # The target object of the parser
      ...     maxDepth = 0
      ...     depth = 0
      ...     def start(self, tag, attrib):   # Called for each opening tag.
      ...         self.depth += 1
      ...         if self.depth > self.maxDepth:
      ...             self.maxDepth = self.depth
      ...     def end(self, tag):             # Called for each closing tag.
      ...         self.depth -= 1
      ...     def data(self, data):
      ...         pass            # We do not need to do anything with data.
      ...     def close(self):    # Called when all data has been parsed.
      ...         return self.maxDepth
      ...
      >>> target = MaxDepth()
      >>> parser = XMLParser(target=target)
      >>> exampleXml = """
      ... <a>
      ...   <b>
      ...   </b>
      ...   <b>
      ...     <c>
      ...       <d>
      ...       </d>
      ...     </c>
      ...   </b>
      ... </a>"""
      >>> parser.feed(exampleXml)
      >>> parser.close()
      4


.. _elementtree-xmlpullparser-objects:

XMLPullParser Objects
^^^^^^^^^^^^^^^^^^^^^

.. class:: XMLPullParser(events=None)

   A pull parser suitable for non-blocking applications. Its input-side API is
   similar to that of :class:`XMLParser`, but instead of pushing calls to a
   callback target, :class:`XMLPullParser` collects an internal list of parsing
   events and lets the user read from it. *events* is a sequence of events to
   report back. The supported events are the strings ``"start"``, ``"end"``,
   ``"comment"``, ``"pi"``, ``"start-ns"`` and ``"end-ns"`` (the "ns" events
   are used to get detailed namespace information). If *events* is omitted,
   only ``"end"`` events are reported.

   .. method:: feed(data)

      Feed the given bytes data to the parser.

   .. method:: close()

      Signal the parser that the data stream is terminated. Unlike
      :meth:`XMLParser.close`, this method always returns :const:`None`.
      Any events not yet retrieved when the parser is closed can still be
      read with :meth:`read_events`.

   .. method:: read_events()

      Return an iterator over the events which have been encountered in the
      data fed to the
      parser. The iterator yields ``(event, elem)`` pairs, where *event* is a
      string representing the type of event (e.g. ``"end"``) and *elem* is the
      encountered :class:`Element` object, or other context value as follows.

      * ``start``, ``end``: the current Element.
      * ``comment``, ``pi``: the current comment / processing instruction
      * ``start-ns``: a tuple ``(prefix, uri)`` naming the declared namespace
        mapping.
      * ``end-ns``: :const:`None` (this may change in a future version)

      Events provided in a previous call to :meth:`read_events` will not be
      yielded again.
      Events are consumed from the internal queue only when
      they are retrieved from the iterator, so multiple readers iterating in
      parallel over iterators obtained from :meth:`read_events` will have
      unpredictable results.

   .. note::

      :class:`XMLPullParser` only guarantees that it has seen the ">"
      character of a starting tag when it emits a "start" event, so the
      attributes are defined, but the contents of the text and tail attributes
      are undefined at that point. The same applies to the element children;
      they may or may not be present.

      If you need a fully populated element, look for "end" events instead.

   .. versionadded:: 3.4

   .. versionchanged:: 3.8
      The ``comment`` and ``pi`` events were added.


Exceptions
^^^^^^^^^^

.. class:: ParseError

   XML parse error, raised by the various parsing methods in this module when
   parsing fails. The string representation of an instance of this exception
   will contain a user-friendly error message. In addition, it will have
   the following attributes available:

   .. attribute:: code

      A numeric error code from the expat parser. See the documentation of
      :mod:`xml.parsers.expat` for the list of error codes and their meanings.

   .. attribute:: position

      A tuple of *line*, *column* numbers, specifying where the error occurred.

.. rubric:: Footnotes

.. [1] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and https://www.iana.org/assignments/character-sets/character-sets.xhtml.