:mod:`xml.dom.minidom` --- Minimal DOM implementation
=====================================================

.. module:: xml.dom.minidom
   :synopsis: Minimal Document Object Model (DOM) implementation.

.. moduleauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>

**Source code:** :source:`Lib/xml/dom/minidom.py`

--------------

:mod:`xml.dom.minidom` is a minimal implementation of the Document Object
Model interface, with an API similar to that in other languages. It is intended
to be simpler than the full DOM and also significantly smaller. Users who are
not already proficient with the DOM should consider using the
:mod:`xml.etree.ElementTree` module for their XML processing instead.


.. warning::

   The :mod:`xml.dom.minidom` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data, see :ref:`xml-vulnerabilities`.


DOM applications typically start by parsing some XML into a DOM. With
:mod:`xml.dom.minidom`, this is done through the parse functions::

   from xml.dom.minidom import parse, parseString

   dom1 = parse('c:\\temp\\mydata.xml')  # parse an XML file by name

   datasource = open('c:\\temp\\mydata.xml')
   dom2 = parse(datasource)  # parse an open file

   dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')

The :func:`parse` function can take either a filename or an open file object.


.. function:: parse(filename_or_file, parser=None, bufsize=None)

   Return a :class:`Document` from the given input. *filename_or_file* may be
   either a file name, or a file-like object. *parser*, if given, must be a SAX2
   parser object. This function will change the document handler of the parser and
   activate namespace support; other parser configuration (like setting an entity
   resolver) must have been done in advance.

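
As a minimal sketch of the file-like-object case, the following parses XML from
an in-memory byte stream rather than from a file on disk. The sample document
and the variable names are illustrative assumptions, not part of the module::

   import io
   from xml.dom.minidom import parse

   # Illustrative sample data; any well-formed XML would do.
   xml_bytes = b"<catalog><item id='1'>Widget</item><item id='2'>Gadget</item></catalog>"

   # parse() accepts a file-like object as well as a file name.
   # Using the document as a context manager unlinks it on exit.
   with parse(io.BytesIO(xml_bytes)) as dom:
       items = dom.getElementsByTagName("item")
       ids = [item.getAttribute("id") for item in items]
       names = [item.firstChild.data for item in items]

   print(ids)
   print(names)

The values are copied into plain lists inside the ``with`` block because the
nodes should not be relied upon once the document has been unlinked.
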

If you have XML in a string, you can use the :func:`parseString` function
instead:


.. function:: parseString(string, parser=None)

   Return a :class:`Document` that represents the *string*. This function creates an
   :class:`io.StringIO` object for the string and passes that on to :func:`parse`.

Both functions return a :class:`Document` object representing the content of the
document.

What the :func:`parse` and :func:`parseString` functions do is connect an XML
parser with a "DOM builder" that can accept parse events from any SAX parser and
convert them into a DOM tree. The names of the functions are perhaps misleading,
but are easy to grasp when learning the interfaces. The parsing of the document
will be completed before these functions return; it's simply that these
functions do not provide a parser implementation themselves.

You can also create a :class:`Document` by calling a method on a "DOM
Implementation" object. You can get this object by calling the
:func:`getDOMImplementation` function in either the :mod:`xml.dom` package or the
:mod:`xml.dom.minidom` module. Once you have a :class:`Document`, you
can add child nodes to it to populate the DOM::

   from xml.dom.minidom import getDOMImplementation

   impl = getDOMImplementation()

   newdoc = impl.createDocument(None, "some_tag", None)
   top_element = newdoc.documentElement
   text = newdoc.createTextNode('Some textual content.')
   top_element.appendChild(text)

Once you have a DOM document object, you can access the parts of your XML
document through its properties and methods. These properties are defined in
the DOM specification. The main property of the document object is the
:attr:`documentElement` property. It gives you the main element in the XML
document: the one that holds all others.
Here is an example program::

   from xml.dom.minidom import parseString

   dom3 = parseString("<myxml>Some data</myxml>")
   assert dom3.documentElement.tagName == "myxml"

When you are finished with a DOM tree, you may optionally call the
:meth:`unlink` method to encourage early cleanup of the now-unneeded
objects. :meth:`unlink` is an :mod:`xml.dom.minidom`\ -specific
extension to the DOM API that renders the node and its descendants
essentially useless. Otherwise, Python's garbage collector will
eventually take care of the objects in the tree.

.. seealso::

   `Document Object Model (DOM) Level 1 Specification <https://www.w3.org/TR/REC-DOM-Level-1/>`_
      The W3C recommendation for the DOM supported by :mod:`xml.dom.minidom`.


.. _minidom-objects:

DOM Objects
-----------

The definition of the DOM API for Python is given as part of the :mod:`xml.dom`
module documentation. This section lists the differences between the API and
:mod:`xml.dom.minidom`.


.. method:: Node.unlink()

   Break internal references within the DOM so that it will be garbage collected on
   versions of Python without cyclic GC. Even when cyclic GC is available, using
   this can make large amounts of memory available sooner, so calling this on DOM
   objects as soon as they are no longer needed is good practice. This only needs
   to be called on the :class:`Document` object, but may be called on child nodes
   to discard children of that node.

   You can avoid calling this method explicitly by using the :keyword:`with`
   statement. The following code will automatically unlink *dom* when the
   :keyword:`!with` block is exited::

      with xml.dom.minidom.parse(datasource) as dom:
          ... # Work with dom.


.. method:: Node.writexml(writer, indent="", addindent="", newl="")

   Write XML to the writer object. The writer receives text, not bytes, as input;
   it should have a :meth:`write` method which matches that of the file object
   interface. The *indent* parameter is the indentation of the current node.
   The *addindent* parameter is the incremental indentation to use for subnodes
   of the current one. The *newl* parameter specifies the string to use to
   terminate lines.

   For the :class:`Document` node, an additional keyword argument *encoding* can
   be used to specify the encoding field of the XML header.

   .. versionchanged:: 3.8
      The :meth:`writexml` method now preserves the attribute order specified
      by the user.

.. method:: Node.toxml(encoding=None)

   Return a string or byte string containing the XML represented by
   the DOM node.

   With an explicit *encoding* [1]_ argument, the result is a byte
   string in the specified encoding.
   With no *encoding* argument, the result is a Unicode string, and the
   XML declaration in the resulting string does not specify an
   encoding. Encoding this string in an encoding other than UTF-8 is
   likely incorrect, since UTF-8 is the default encoding of XML.

   .. versionchanged:: 3.8
      The :meth:`toxml` method now preserves the attribute order specified
      by the user.

.. method:: Node.toprettyxml(indent="\\t", newl="\\n", encoding=None)

   Return a pretty-printed version of the document. *indent* specifies the
   indentation string and defaults to a tab character; *newl* specifies the string
   emitted at the end of each line and defaults to ``\n``.

   The *encoding* argument behaves like the corresponding argument of
   :meth:`toxml`.

   .. versionchanged:: 3.8
      The :meth:`toprettyxml` method now preserves the attribute order specified
      by the user.


.. _dom-example:

DOM Example
-----------

This example program is a fairly realistic example of a simple program.
In this particular case, we do not take much advantage of the flexibility
of the DOM.

.. literalinclude:: ../includes/minidom-example.py


.. _minidom-and-dom:

minidom and the DOM standard
----------------------------

The :mod:`xml.dom.minidom` module is essentially a DOM 1.0-compatible DOM with
some DOM 2 features (primarily namespace features).

Usage of the DOM interface in Python is straightforward. The following mapping
rules apply:

* Interfaces are accessed through instance objects. Applications should not
  instantiate the classes themselves; they should use the creator functions
  available on the :class:`Document` object. Derived interfaces support all
  operations (and attributes) from the base interfaces, plus any new operations.

* Operations are used as methods. Since the DOM uses only :keyword:`in`
  parameters, the arguments are passed in normal order (from left to right).
  There are no optional arguments. ``void`` operations return ``None``.

* IDL attributes map to instance attributes. For compatibility with the OMG IDL
  language mapping for Python, an attribute ``foo`` can also be accessed through
  accessor methods :meth:`_get_foo` and :meth:`_set_foo`. ``readonly``
  attributes must not be changed; this is not enforced at runtime.

* The types ``short int``, ``unsigned int``, ``unsigned long long``, and
  ``boolean`` all map to Python integer objects.

* The type ``DOMString`` maps to Python strings. :mod:`xml.dom.minidom` supports
  either bytes or strings, but will normally produce strings.
  Values of type ``DOMString`` may also be ``None`` where allowed to have the IDL
  ``null`` value by the DOM specification from the W3C.

* ``const`` declarations map to variables in their respective scope (e.g.
  ``xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE``); they must not be changed.


* ``DOMException`` is currently not supported in :mod:`xml.dom.minidom`.
  Instead, :mod:`xml.dom.minidom` uses standard Python exceptions such as
  :exc:`TypeError` and :exc:`AttributeError`.

* :class:`NodeList` objects are implemented using Python's built-in list type.
  These objects provide the interface defined in the DOM specification, but with
  earlier versions of Python they do not support the official API. They are,
  however, much more "Pythonic" than the interface defined in the W3C
  recommendations.

The following interfaces have no implementation in :mod:`xml.dom.minidom`:

* :class:`DOMTimeStamp`

* :class:`EntityReference`

Most of these reflect information in the XML document that is not of general
utility to most DOM users.

.. rubric:: Footnotes

.. [1] The encoding name included in the XML output should conform to
   the appropriate standards. For example, "UTF-8" is valid, but
   "UTF8" is not valid in an XML document's declaration, even though
   Python accepts it as an encoding name.
   See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and https://www.iana.org/assignments/character-sets/character-sets.xhtml.
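
To make the note on encoding names concrete, here is a small sketch comparing
:meth:`toxml` with and without an *encoding* argument. The sample document and
the variable names are illustrative assumptions::

   from xml.dom.minidom import parseString

   # Illustrative sample data containing a non-ASCII character.
   dom = parseString("<greeting lang='fr'>caf\u00e9</greeting>")

   # Without an encoding, the result is a str and the XML declaration
   # does not name an encoding.
   as_text = dom.toxml()

   # With an encoding, the result is bytes and the declaration names it.
   # "UTF-8" is the spelling that is valid in an XML declaration.
   as_bytes = dom.toxml(encoding="UTF-8")

   print(type(as_text).__name__, type(as_bytes).__name__)
   dom.unlink()
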