:mod:`xml.dom.pulldom` --- Support for building partial DOM trees
=================================================================

.. module:: xml.dom.pulldom
   :synopsis: Support for building partial DOM trees from SAX events.

.. moduleauthor:: Paul Prescod <paul@prescod.net>

**Source code:** :source:`Lib/xml/dom/pulldom.py`

--------------

The :mod:`xml.dom.pulldom` module provides a "pull parser" which can also be
asked to produce DOM-accessible fragments of the document where necessary. The
basic concept involves pulling "events" from a stream of incoming XML and
processing them. In contrast to SAX, which also employs an event-driven
processing model together with callbacks, the user of a pull parser is
responsible for explicitly pulling events from the stream, looping over those
events until either processing is finished or an error condition occurs.


.. warning::

   The :mod:`xml.dom.pulldom` module is not secure against
   maliciously constructed data.  If you need to parse untrusted or
   unauthenticated data, see :ref:`xml-vulnerabilities`.

.. versionchanged:: 3.7.1

   The SAX parser no longer processes general external entities by default,
   which improves security. To enable processing of external entities,
   pass in a custom parser instance::

      from xml.dom.pulldom import parse
      from xml.sax import make_parser
      from xml.sax.handler import feature_external_ges

      parser = make_parser()
      parser.setFeature(feature_external_ges, True)
      parse(filename, parser=parser)


Example::

   from xml.dom import pulldom

   doc = pulldom.parse('sales_items.xml')
   for event, node in doc:
       if event == pulldom.START_ELEMENT and node.tagName == 'item':
           if int(node.getAttribute('price')) > 50:
               doc.expandNode(node)
               print(node.toxml())

``event`` is a constant and can be one of:

* :data:`START_ELEMENT`
* :data:`END_ELEMENT`
* :data:`COMMENT`
* :data:`START_DOCUMENT`
* :data:`END_DOCUMENT`
* :data:`CHARACTERS`
* :data:`PROCESSING_INSTRUCTION`
* :data:`IGNORABLE_WHITESPACE`

``node`` is an object of type :class:`xml.dom.minidom.Document`,
:class:`xml.dom.minidom.Element` or :class:`xml.dom.minidom.Text`.
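
As a rough illustration of how event constants pair up with node types, the
following sketch records every event produced for a small document. The sample
XML and the variable names are assumptions made for this example::

   from xml.dom import pulldom

   # Hypothetical sample document.
   sample = '<root><item>text</item></root>'

   seen = []
   for event, node in pulldom.parseString(sample):
       # Record each event constant with the class name of its node.
       seen.append((event, type(node).__name__))

   for event, type_name in seen:
       print(event, type_name)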

Since the document is treated as a "flat" stream of events, the document "tree"
is implicitly traversed and the desired elements are found regardless of their
depth in the tree. In other words, one does not need to consider hierarchical
issues such as recursive searching of the document nodes. If the context of
elements is important, however, one needs either to maintain some
context-related state (i.e. remembering where one is in the document at any
given point) or to make use of the :meth:`DOMEventStream.expandNode` method
and switch to DOM-related processing.
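
Context-related state can be as simple as a stack of the currently open element
names. The sketch below is one possible approach, not something the module
provides; the sample document and the ``path`` list are assumptions made for
illustration. It collects only those ``name`` elements that sit directly inside
an ``author`` element::

   from xml.dom import pulldom

   # Hypothetical sample: <name> appears in two different contexts.
   sample = ('<book><name>Title</name>'
             '<author><name>Ann</name></author></book>')

   path = []           # names of the currently open elements
   author_names = []
   doc = pulldom.parseString(sample)
   for event, node in doc:
       if event == pulldom.START_ELEMENT:
           if node.tagName == 'name' and path and path[-1] == 'author':
               # Context matches: switch to DOM processing for this subtree.
               # expandNode() reads up to the end of this element, so the
               # name is deliberately not pushed onto the stack.
               doc.expandNode(node)
               author_names.append(
                   ''.join(child.data for child in node.childNodes))
           else:
               path.append(node.tagName)
       elif event == pulldom.END_ELEMENT:
           path.pop()

   print(author_names)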


.. class:: PullDOM(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.


.. class:: SAX2DOM(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.


.. function:: parse(stream_or_string, parser=None, bufsize=None)

   Return a :class:`DOMEventStream` from the given input. *stream_or_string* may be
   either a file name or a file-like object. *parser*, if given, must be an
   :class:`~xml.sax.xmlreader.XMLReader` object. This function will change the
   document handler of the parser and activate namespace support; other parser
   configuration (like setting an entity resolver) must have been done in
   advance.
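
   A minimal sketch of passing a file-like object instead of a file name. The
   in-memory :class:`io.BytesIO` stream and the sample data stand in for a real
   file and are assumptions made for this example::

      import io
      from xml.dom import pulldom
      from xml.sax import make_parser

      # Hypothetical in-memory document standing in for an open file.
      stream = io.BytesIO(
          b'<items><item price="75"/><item price="20"/></items>')

      # Any extra parser configuration has to happen before parse() is called.
      parser = make_parser()

      prices = []
      for event, node in pulldom.parse(stream, parser=parser):
          if event == pulldom.START_ELEMENT and node.tagName == 'item':
              prices.append(int(node.getAttribute('price')))

      print(prices)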

If you have XML in a string, you can use the :func:`parseString` function instead:

.. function:: parseString(string, parser=None)

   Return a :class:`DOMEventStream` that represents the (Unicode) *string*.
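
   For instance, the following sketch collects the text of every ``title``
   element from an in-memory string. The sample data is made up for this
   example::

      from xml.dom import pulldom

      # Hypothetical sample data.
      xml_text = '<shelf><title>Dune</title><title>Emma</title></shelf>'

      titles = []
      doc = pulldom.parseString(xml_text)
      for event, node in doc:
          if event == pulldom.START_ELEMENT and node.tagName == 'title':
              doc.expandNode(node)
              # Join all text children in case the text arrives in pieces.
              titles.append(''.join(child.data for child in node.childNodes))

      print(titles)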

.. data:: default_bufsize

   Default value for the *bufsize* parameter to :func:`parse`.

   The value of this variable can be changed before calling :func:`parse` and
   the new value will take effect.
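
   A sketch of overriding the default for a single call and restoring it
   afterwards. The value ``1024`` and the in-memory stream are arbitrary
   choices for this example::

      import io
      from xml.dom import pulldom

      saved = pulldom.default_bufsize
      pulldom.default_bufsize = 1024   # arbitrary value for illustration
      try:
          # parse() is called without *bufsize*, so the new default applies.
          doc = pulldom.parse(io.BytesIO(b'<a><b/><b/></a>'))
          tags = [node.tagName for event, node in doc
                  if event == pulldom.START_ELEMENT]
      finally:
          # Restore the previous default for any later callers.
          pulldom.default_bufsize = saved

      print(tags)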

.. _domeventstream-objects:

DOMEventStream Objects
----------------------

.. class:: DOMEventStream(stream, parser, bufsize)

   .. deprecated:: 3.8
      Support for the :meth:`sequence protocol <__getitem__>` is deprecated.

   .. method:: getEvent()

      Return a tuple containing *event* and the current *node*. The node is a
      :class:`xml.dom.minidom.Document` if *event* equals :data:`START_DOCUMENT`,
      a :class:`xml.dom.minidom.Element` if *event* equals :data:`START_ELEMENT`
      or :data:`END_ELEMENT`, or a :class:`xml.dom.minidom.Text` if *event*
      equals :data:`CHARACTERS`.
      The current node does not contain information about its children, unless
      :meth:`expandNode` is called.
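
      Events can also be pulled one at a time rather than through iteration.
      The sketch below assumes that :meth:`getEvent` returns ``None`` once the
      stream is exhausted, which is how the CPython implementation behaves;
      the sample document is made up::

          from xml.dom import pulldom

          doc = pulldom.parseString('<greeting>hello</greeting>')

          text_parts = []
          event_pair = doc.getEvent()
          while event_pair is not None:
              event, node = event_pair
              if event == pulldom.CHARACTERS:
                  # Text may arrive in more than one CHARACTERS event.
                  text_parts.append(node.data)
              event_pair = doc.getEvent()

          print(''.join(text_parts))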

   .. method:: expandNode(node)

      Expands all children of *node* into *node*. Example::

          from xml.dom import pulldom

          xml = '<html><title>Foo</title> <p>Some text <div>and more</div></p> </html>'
          doc = pulldom.parseString(xml)
          for event, node in doc:
              if event == pulldom.START_ELEMENT and node.tagName == 'p':
                  # The following statement prints only '<p/>'
                  print(node.toxml())
                  doc.expandNode(node)
                  # The following statement prints the node with all its
                  # children: '<p>Some text <div>and more</div></p>'
                  print(node.toxml())
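
      Because the children are read from the same stream, the events for the
      expanded subtree are not delivered to the loop afterwards, at least in
      the CPython implementation. A small sketch that makes this visible,
      using the same document as the example above (the ``started`` list is
      introduced here for illustration)::

          from xml.dom import pulldom

          xml = '<html><title>Foo</title> <p>Some text <div>and more</div></p> </html>'
          doc = pulldom.parseString(xml)

          started = []
          expanded = None
          for event, node in doc:
              if event == pulldom.START_ELEMENT:
                  started.append(node.tagName)
                  if node.tagName == 'p':
                      doc.expandNode(node)
                      expanded = node.toxml()

          # 'div' was consumed by expandNode(), so the loop never saw it.
          print(started)
          print(expanded)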

   .. method:: reset()