1<?xml version="1.0" encoding="utf-8" ?>
2<!DOCTYPE erlref SYSTEM "erlref.dtd">
3
4<erlref>
5  <header>
6    <copyright>
7      <year>2008</year>
8      <year>2016</year>
9      <holder>Ericsson AB, All Rights Reserved</holder>
10    </copyright>
11    <legalnotice>
12  Licensed under the Apache License, Version 2.0 (the "License");
13  you may not use this file except in compliance with the License.
14  You may obtain a copy of the License at
15
16      http://www.apache.org/licenses/LICENSE-2.0
17
18  Unless required by applicable law or agreed to in writing, software
19  distributed under the License is distributed on an "AS IS" BASIS,
20  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
21  See the License for the specific language governing permissions and
22  limitations under the License.
23
24  The Initial Developer of the Original Code is Ericsson AB.
25    </legalnotice>
26
27    <title>xmerl_sax_parser</title>
28    <prepared></prepared>
29    <docno></docno>
30    <date></date>
31    <rev></rev>
32  </header>
33
34  <module since="">xmerl_sax_parser</module>
35  <modulesummary>XML SAX parser API</modulesummary>
36
37  <description>
38    <p>
      A SAX parser for XML that delivers parse events through a callback interface.
      SAX is the <em>Simple API for XML</em>, originally a Java-only API. SAX was the first widely adopted API for
      XML in Java, and is a <em>de facto</em> standard with implementations for several programming language
      environments other than Java.
43    </p>
44  </description>
45
46  <section>
47    <title>DATA TYPES</title>
48
49    <taglist>
50      <tag><c>option()</c></tag>
51       <item>
52       <p>
53         Options used to customize the behaviour of the parser.
54         Possible options are:
55       </p><p></p>
56       <taglist>
57         <tag><c>{continuation_fun, ContinuationFun}</c></tag>
58         <item>
          <seemfa marker="#Module:ContinuationFun/1">ContinuationFun</seemfa> is a callback function that decides what to do if
          the parser runs into EOF before the document is complete.
61         </item>
62         <tag><c>{continuation_state, term()}</c></tag>
63         <item>
          State that is accessible in the continuation callback function.
65         </item>
66         <tag><c>{event_fun, EventFun}</c></tag>
67         <item>
          <seemfa marker="#Module:EventFun/3">EventFun</seemfa> is the callback function for parser events.
69         </item>
70         <tag><c>{event_state, term()}</c></tag>
71         <item>
          State that is accessible in the event callback function.
73         </item>
74         <tag><c>{file_type, FileType}</c></tag>
75         <item>
          Flag that tells the parser whether it is parsing a DTD or a normal XML file (default <c>normal</c>).
77           <list>
78             <item><c>FileType = normal | dtd</c></item>
79           </list>
80         </item>
81         <tag><c>{encoding, Encoding}</c></tag>
82         <item>
          Sets the default character encoding (default UTF-8). This encoding is used only if the XML document
          does not specify one explicitly.
85           <list>
86             <item><c>Encoding = utf8 | {utf16,big} | {utf16,little} | latin1 | list</c></item>
87           </list>
88         </item>
89         <tag><c>skip_external_dtd</c></tag>
90         <item>
91           Skips the external DTD during parsing.
92         </item>
93       </taglist>
94       </item>
95      <tag></tag>
96 <item>
97<p></p>
98       </item>
99      <tag><c>event()</c></tag>
100       <item>
101       <p>
102         The SAX events that are sent to the user via the callback.
103       </p><p></p>
104       <taglist>
105
106         <tag><c>startDocument</c></tag>
107         <item>
108           Receive notification of the beginning of a document. The SAX parser will send this event only once
109           before any other event callbacks.
110         </item>
111
112         <tag><c>endDocument</c></tag>
113         <item>
114            Receive notification of the end of a document. The SAX parser will send this event only once, and it will
115            be the last event during the parse.
116         </item>
117
118         <tag><c>{startPrefixMapping, Prefix, Uri}</c></tag>
119         <item>
120           Begin the scope of a prefix-URI Namespace mapping.
          Note that start/endPrefixMapping events are not guaranteed to be properly nested relative to each other:
          all startPrefixMapping events will occur immediately before the corresponding startElement event, and all
          endPrefixMapping events will occur immediately after the corresponding endElement event, but their
          order is not otherwise guaranteed.
          There will not be start/endPrefixMapping events for the "xml" prefix, since it is predeclared and immutable.
126           <list>
127             <item><c>Prefix = string()</c></item>
128             <item><c>Uri = string()</c></item>
129           </list>
130         </item>
131
132         <tag><c>{endPrefixMapping, Prefix}</c></tag>
133         <item>
134           End the scope of a prefix-URI mapping.
135           <list>
136             <item><c>Prefix = string()</c></item>
137           </list>
138         </item>
139
140         <tag><c>{startElement, Uri, LocalName, QualifiedName, Attributes}</c></tag>
141         <item>
142          Receive notification of the beginning of an element.
143
         The SAX parser will send this event at the beginning of every element in the XML document;
145          there will be a corresponding endElement event for every startElement event (even when the element is empty).
146          All of the element's content will be reported, in order, before the corresponding endElement event.
147            <list>
148             <item><c>Uri = string()</c></item>
149             <item><c>LocalName = string()</c></item>
150             <item><c>QualifiedName = {Prefix, LocalName}</c></item>
151             <item><c>Prefix = string()</c></item>
152             <item><c>Attributes = [{Uri, Prefix, AttributeName, Value}]</c></item>
153             <item><c>AttributeName = string()</c></item>
154             <item><c>Value = string()</c></item>
155           </list>
156        </item>
157
158         <tag><c>{endElement, Uri, LocalName, QualifiedName}</c></tag>
159         <item>
160          Receive notification of the end of an element.
161
162          The SAX parser will send this event at the end of every element in the XML document;
163          there will be a corresponding startElement event for every endElement event (even when the element is empty).
164            <list>
165             <item><c>Uri = string()</c></item>
166             <item><c>LocalName = string()</c></item>
167             <item><c>QualifiedName = {Prefix, LocalName}</c></item>
168             <item><c>Prefix = string()</c></item>
169            </list>
170         </item>
171
172         <tag><c>{characters, string()}</c></tag>
173         <item>
174          Receive notification of character data.
175         </item>
176
177         <tag><c>{ignorableWhitespace, string()}</c></tag>
178         <item>
179           Receive notification of ignorable whitespace in element content.
180         </item>
181
182         <tag><c>{processingInstruction, Target, Data}</c></tag>
183         <item>
184           Receive notification of a processing instruction.
185
          The SAX parser will send this event once for each processing instruction found.
          Note that processing instructions may occur before or after the main document element.
188            <list>
189             <item><c>Target = string()</c></item>
190             <item><c>Data = string()</c></item>
191            </list>
192         </item>
193
194         <tag><c>{comment, string()}</c></tag>
195         <item>
196           Report an XML comment anywhere in the document (both inside and outside of the document element).
197         </item>
198
199         <tag><c>startCDATA</c></tag>
200         <item>
201           Report the start of a CDATA section. The contents of the CDATA section will be reported
202           through the regular characters event.
203         </item>
204
205         <tag><c>endCDATA</c></tag>
206         <item>
207           Report the end of a CDATA section.
208         </item>
209
210         <tag><c>{startDTD, Name, PublicId, SystemId}</c></tag>
211         <item>
          Report the start of the DTD declarations, that is, the start of the DOCTYPE declaration.
          If the document has no DOCTYPE declaration, this event will not be sent.
214            <list>
215             <item><c>Name = string()</c></item>
216             <item><c>PublicId = string()</c></item>
217             <item><c>SystemId = string()</c></item>
218            </list>
219         </item>
220
221         <tag><c>endDTD</c></tag>
222         <item>
         Report the end of the DTD declarations, that is, the end of the DOCTYPE declaration.
224         </item>
225
226         <tag><c>{startEntity, SysId}</c></tag>
227         <item>
          Report the beginning of some internal and external XML entities.
229         </item>
230
231         <tag><c>{endEntity, SysId}</c></tag>
232         <item>
          Report the end of an entity.
234         </item>
235
236         <tag><c>{elementDecl, Name, Model}</c></tag>
237         <item>
238           Report an element type declaration.
          The content model will consist of the string "EMPTY", the string "ANY", or a parenthesised group,
          optionally followed by an occurrence indicator. The model will be normalized so that all parameter
          entities are fully resolved and all whitespace is removed, and will include the enclosing parentheses.
          Other normalization (such as removing redundant parentheses or simplifying occurrence indicators)
          is at the discretion of the parser.
244            <list>
245             <item><c>Name = string()</c></item>
246             <item><c>Model = string()</c></item>
247            </list>
248         </item>
249
250         <tag><c>{attributeDecl, ElementName, AttributeName, Type, Mode, Value}</c></tag>
251         <item>
252           Report an attribute type declaration.
253            <list>
254             <item><c>ElementName = string()</c></item>
255             <item><c>AttributeName = string()</c></item>
256             <item><c>Type = string()</c></item>
257             <item><c>Mode = string()</c></item>
258             <item><c>Value = string()</c></item>
259            </list>
260         </item>
261
262         <tag><c>{internalEntityDecl, Name, Value}</c></tag>
263         <item>
264          Report an internal entity declaration.
265            <list>
266             <item><c>Name = string()</c></item>
267             <item><c>Value = string()</c></item>
268            </list>
269         </item>
270
271         <tag><c>{externalEntityDecl, Name, PublicId, SystemId}</c></tag>
272         <item>
273          Report a parsed external entity declaration.
274            <list>
275             <item><c>Name = string()</c></item>
276             <item><c>PublicId = string()</c></item>
277             <item><c>SystemId = string()</c></item>
278            </list>
279         </item>
280
281         <tag><c>{unparsedEntityDecl, Name, PublicId, SystemId, Ndata}</c></tag>
282         <item>
283           Receive notification of an unparsed entity declaration event.
284            <list>
285             <item><c>Name = string()</c></item>
286             <item><c>PublicId = string()</c></item>
287             <item><c>SystemId = string()</c></item>
288             <item><c>Ndata = string()</c></item>
289            </list>
290         </item>
291
292         <tag><c>{notationDecl, Name, PublicId, SystemId}</c></tag>
293         <item>
294           Receive notification of a notation declaration event.
295            <list>
296             <item><c>Name = string()</c></item>
297             <item><c>PublicId = string()</c></item>
298             <item><c>SystemId = string()</c></item>
299            </list>
300         </item>
301
302       </taglist>
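       <p>
         As an illustration of how these events can be consumed, the following sketch collects the
         values of all <c>href</c> attributes by matching on <c>startElement</c> events. The module name
         <c>sax_attrs</c>, the function <c>hrefs/1</c>, and the choice of attribute name are hypothetical
         and exist only for this example. The sketch assumes that parsing succeeds and fails with
         a <c>badmatch</c> otherwise.
       </p>
       <code type="erl"><![CDATA[
-module(sax_attrs).
-export([hrefs/1]).

%% Returns the values of all "href" attributes in document order.
%% Hypothetical helper, not part of xmerl.
hrefs(Xml) ->
    EventFun =
        fun({startElement, _Uri, _LocalName, _QName, Attributes}, _Location, Acc) ->
                %% Each attribute is a tuple {Uri, Prefix, AttributeName, Value}.
                [Value || {_AttrUri, _Prefix, "href", Value} <- Attributes] ++ Acc;
           (_OtherEvent, _Location, Acc) ->
                %% All other events leave the accumulator unchanged.
                Acc
        end,
    {ok, Acc, _Rest} =
        xmerl_sax_parser:stream(Xml, [{event_fun, EventFun}, {event_state, []}]),
    lists:reverse(Acc).
]]></code>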
303       </item>
304
305       <tag><c>unicode_char()</c></tag>
306       <item>
        Integer representing a valid Unicode code point.
308       </item>
309
310       <tag><c>unicode_binary()</c></tag>
311       <item>
312         Binary with characters encoded in UTF-8 or UTF-16.
313       </item>
314
315       <tag><c>latin1_binary()</c></tag>
316       <item>
        Binary with characters encoded in ISO Latin-1.
318       </item>
319
320    </taglist>
321
322  </section>
323
324
325  <funcs>
326
327    <func>
328      <name since="">file(Filename, Options) -> Result</name>
329      <fsummary>Parse file containing an XML document.</fsummary>
330      <type>
331        <v>Filename = string()</v>
332        <v>Options = [option()]</v>
333        <v>Result = {ok, EventState, Rest} |</v>
334        <v>&nbsp;&nbsp;&nbsp;{Tag, Location, Reason, EndTags, EventState}</v>
335        <v>Rest = unicode_binary() | latin1_binary()</v>
       <v>Tag = atom() (fatal_error or user-defined tag)</v>
337        <v>Location = {CurrentLocation, EntityName, LineNo}</v>
338        <v>CurrentLocation = string()</v>
339        <v>EntityName = string()</v>
340        <v>LineNo = integer()</v>
341        <v>EventState = term()</v>
342        <v>Reason = term()</v>
343      </type>
344      <desc>
       <p>Parse a file containing an XML document. This function uses a default continuation function to read the file in blocks.</p>
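       <p>
         A minimal sketch of calling <c>file/2</c> and handling both documented result shapes is
         given here. The module <c>sax_file</c> and the function <c>element_names/1</c> are hypothetical
         names used only for illustration. The final catch-all clause is an assumption: it passes on
         any return value not listed in the types above, such as a failure to open the file.
       </p>
       <code type="erl"><![CDATA[
-module(sax_file).
-export([element_names/1]).

%% Returns {ok, Names} with the local names of all elements in the file,
%% in document order, or {error, Term} if parsing did not succeed.
element_names(Filename) ->
    EventFun =
        fun({startElement, _Uri, LocalName, _QName, _Attrs}, _Location, Names) ->
                [LocalName | Names];
           (_OtherEvent, _Location, Names) ->
                Names
        end,
    Options = [{event_fun, EventFun}, {event_state, []}],
    case xmerl_sax_parser:file(Filename, Options) of
        {ok, Names, _Rest} ->
            {ok, lists:reverse(Names)};
        {Tag, Location, Reason, _EndTags, _EventState} ->
            {error, {Tag, Location, Reason}};
        Other ->
            %% Assumption: any other return value is passed on unchanged.
            {error, Other}
    end.
]]></code>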
346      </desc>
347    </func>
348
349    <func>
350      <name since="">stream(Xml, Options) -> Result</name>
351      <fsummary>Parse a stream containing an XML document.</fsummary>
352      <type>
353        <v>Xml = unicode_binary() | latin1_binary() | [unicode_char()]</v>
354        <v>Options = [option()]</v>
355        <v>Result = {ok, EventState, Rest} |</v>
356        <v>&nbsp;&nbsp;&nbsp;{Tag, Location, Reason, EndTags, EventState}</v>
357        <v>Rest =  unicode_binary() | latin1_binary() | [unicode_char()]</v>
       <v>Tag = atom() (fatal_error or user-defined tag)</v>
359        <v>Location = {CurrentLocation, EntityName, LineNo}</v>
360        <v>CurrentLocation = string()</v>
361        <v>EntityName = string()</v>
362        <v>LineNo = integer()</v>
363        <v>EventState = term()</v>
364        <v>Reason = term()</v>
365      </type>
366      <desc>
367        <p>Parse a stream containing an XML document.</p>
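       <p>
         The following sketch shows one possible use of <c>stream/2</c> on input that is already in
         memory: it counts the elements of a document by threading a counter through the event state.
         The module <c>sax_count</c> and the function <c>count_elements/1</c> are hypothetical, and the
         sketch assumes that parsing succeeds.
       </p>
       <code type="erl"><![CDATA[
-module(sax_count).
-export([count_elements/1]).

%% Returns the number of elements in the document.
%% Hypothetical helper, not part of xmerl.
count_elements(Xml) ->
    EventFun =
        fun({startElement, _Uri, _LocalName, _QName, _Attrs}, _Location, Count) ->
                Count + 1;
           (_OtherEvent, _Location, Count) ->
                Count
        end,
    %% The initial event state is 0; the final state is returned in the result.
    {ok, Count, _Rest} =
        xmerl_sax_parser:stream(Xml, [{event_fun, EventFun}, {event_state, 0}]),
    Count.
]]></code>
       <p>
         With these definitions, <c>sax_count:count_elements(&lt;&lt;"&lt;a&gt;&lt;b/&gt;&lt;b/&gt;&lt;/a&gt;"&gt;&gt;)</c>
         is expected to return <c>3</c>.
       </p>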
368      </desc>
369    </func>
370
371   </funcs>
372
373
374
375  <funcs>
376    <fsdescription>
377      <title>CALLBACK FUNCTIONS</title>
378      <p>
       The callback interface is based on the user passing funs with the
       correct signatures to the parser.
381     </p>
382    </fsdescription>
383
384    <func>
385      <name since="">Module:ContinuationFun(State) -> {NewBytes, NewState}</name>
     <fsummary>Continuation callback function.</fsummary>
387      <type>
388        <v>State = NewState = term()</v>
       <v>NewBytes = binary() | list() (should be the same type as the initial input to stream/2)</v>
390      </type>
391      <desc>
392        <p>
         This function is called whenever the parser runs out of input data.
         If the function cannot get hold of more input, it returns an empty list or an empty binary
         (matching the type of the initial input to <c>stream/2</c>).

         Other types of errors are handled through exceptions. If the continuation function encounters
         a fatal error, use <c>throw/1</c> to send the tuple <c>{Tag = atom(), Reason = string()}</c>.
         <c>Tag</c> is an atom that identifies the functional entity that sends the exception
         and <c>Reason</c> is a string that describes the problem.
401        </p>
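       <p>
         As a sketch of this contract, the continuation function below hands out input from a list of
         binaries kept in the continuation state and returns an empty binary once the list is exhausted.
         The module <c>sax_chunks</c> and the function <c>count_elements/1</c> are hypothetical, and the
         input is assumed to be a non-empty list of binaries that together form one document.
       </p>
       <code type="erl"><![CDATA[
-module(sax_chunks).
-export([count_elements/1]).

%% Counts the elements of a document supplied as a non-empty list of binaries.
%% Hypothetical helper, not part of xmerl.
count_elements([FirstChunk | MoreChunks]) ->
    ContinuationFun =
        fun([Chunk | Rest]) ->
                %% Hand the next chunk to the parser and keep the remainder as state.
                {Chunk, Rest};
           ([]) ->
                %% No more input: return an empty binary, matching the input type.
                {<<>>, []}
        end,
    EventFun =
        fun({startElement, _Uri, _LocalName, _QName, _Attrs}, _Location, Count) ->
                Count + 1;
           (_OtherEvent, _Location, Count) ->
                Count
        end,
    Options = [{continuation_fun, ContinuationFun},
               {continuation_state, MoreChunks},
               {event_fun, EventFun},
               {event_state, 0}],
    {ok, Count, _Rest} = xmerl_sax_parser:stream(FirstChunk, Options),
    Count.
]]></code>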
402      </desc>
403    </func>
404
405    <func>
406      <name since="">Module:EventFun(Event, Location, State) -> NewState</name>
     <fsummary>Event callback function.</fsummary>
408      <type>
409        <v>Event = event()</v>
       <v>Location = {CurrentLocation, EntityName, LineNo}</v>
       <v>CurrentLocation = string()</v>
       <v>EntityName = string()</v>
413        <v>LineNo = integer()</v>
414        <v>State = NewState = term()</v>
415      </type>
416      <desc>
417        <p>
         This function is called for every event sent by the parser.

         Error handling is done through exceptions. If the application encounters a fatal error,
         use <c>throw/1</c> to send the tuple <c>{Tag = atom(), Reason = string()}</c>.
         <c>Tag</c> is an atom that identifies the functional entity that sends the exception
         and <c>Reason</c> is a string that describes the problem.
424        </p>
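       <p>
         The error convention described above might be used as in the following sketch, where the event
         function aborts parsing on a particular element and the caller matches on the user-defined tag
         in the result. The module <c>sax_abort</c>, the function <c>check/1</c>, and the element name
         <c>forbidden</c> are hypothetical and chosen only for this example.
       </p>
       <code type="erl"><![CDATA[
-module(sax_abort).
-export([check/1]).

%% Returns ok if the document contains no <forbidden> element,
%% otherwise {error, Term}. Hypothetical helper, not part of xmerl.
check(Xml) ->
    EventFun =
        fun({startElement, _Uri, "forbidden", _QName, _Attrs}, _Location, _State) ->
                %% Abort parsing with a user-defined tag and a descriptive reason.
                throw({sax_abort, "forbidden element found"});
           (_OtherEvent, _Location, State) ->
                State
        end,
    case xmerl_sax_parser:stream(Xml, [{event_fun, EventFun}]) of
        {ok, _State, _Rest} ->
            ok;
        {sax_abort, _Location, Reason, _EndTags, _State} ->
            %% The tag and reason thrown by the event function.
            {error, Reason};
        {Tag, _Location, Reason, _EndTags, _State} ->
            %% Any other error, for example fatal_error from the parser itself.
            {error, {Tag, Reason}}
    end.
]]></code>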
425      </desc>
426    </func>
427
428  </funcs>
429
430
431
432</erlref>