1<?xml version="1.0" encoding="iso-8859-1"?>
2<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
3<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
4
5<head>
6  <title>XML Parsing and Serialization in C++ with libstudxml</title>
7
8  <meta name="copyright" content="&#169; 2013-2020 Code Synthesis Tools CC"/>
9  <meta name="keywords" content="xml,c++,parsing,serialization,api,streaming,persistence"/>
10  <meta name="description" content="XML Parsing and Serialization in C++ with libstudxml"/>
11  <meta name="revision" content="1.0"/>
12  <meta name="version" content="1.1.0"/>
13
14  <link rel="stylesheet" type="text/css" href="default.css" />
15
16<style type="text/css">
17  pre {
18    padding    : 0 0 0 0em;
19    margin     : 0em 0em 0em 0;
20
21    font-size  : 102%
22  }
23
24  body {
25    min-width: 48em;
26  }
27
28  h1 {
29    font-weight: bold;
30    font-size: 200%;
31    line-height: 1.2em;
32  }
33
34  h2 {
35    font-weight : bold;
36    font-size   : 150%;
37
38    padding-top : 0.8em;
39  }
40
41  h3 {
42    font-size   : 140%;
43    padding-top : 0.8em;
44  }
45
46  /* Force page break for both PDF and HTML (when printing). */
47  hr.page-break {
48    height: 0;
49    width: 0;
50    border: 0;
51    visibility: hidden;
52
53    page-break-after: always;
54  }
55
56  /* Adjust indentation for three levels. */
57  #container {
58    max-width: 48em;
59  }
60
61  #content {
62    padding: 0 0.1em 0 4em;
63    /*background-color: red;*/
64  }
65
66  #content h1 {
67    margin-left: -2.06em;
68  }
69
70  #content h2 {
71    margin-left: -1.33em;
72  }
73
74  /* Title page */
75
76  #titlepage {
77    padding: 2em 0 1em 0;
78    border-bottom: 1px solid black;
79  }
80
81  #titlepage .title {
82    font-weight: bold;
83    font-size: 200%;
84    text-align: center;
85    padding: 1em 0 2em 0;
86  }
87
88  #titlepage #first-title {
89    padding: 1em 0 0.4em 0;
90  }
91
92  #titlepage #second-title {
93    padding: 0.4em 0 2em 0;
94  }
95
96  #titlepage p {
97    padding-bottom: 1em;
98  }
99
100  #titlepage #revision {
101    padding-bottom: 0em;
102  }
103
104  /* Lists */
105  ul.list li, ol.list li {
106    padding-top      : 0.3em;
107    padding-bottom   : 0.3em;
108  }
109
110  div.img {
111    text-align: center;
112    padding: 2em 0 2em 0;
113  }
114
115  /*  */
116  dl dt {
117    padding   : 0.8em 0 0 0;
118  }
119
120  /* TOC */
121  table.toc {
122    border-style      : none;
123    border-collapse   : separate;
124    border-spacing    : 0;
125
126    margin            : 0.2em 0 0.2em 0;
127    padding           : 0 0 0 0;
128  }
129
130  table.toc tr {
131    padding           : 0 0 0 0;
132    margin            : 0 0 0 0;
133  }
134
135  table.toc * td, table.toc * th {
136    border-style      : none;
137    margin            : 0 0 0 0;
138    vertical-align    : top;
139  }
140
141  table.toc * th {
142    font-weight       : normal;
143    padding           : 0em 0.1em 0em 0;
144    text-align        : left;
145    white-space       : nowrap;
146  }
147
148  table.toc * table.toc th {
149    padding-left      : 1em;
150  }
151
152  table.toc * td {
153    padding           : 0em 0 0em 0.7em;
154    text-align        : left;
155  }
156
157</style>
158
159
160</head>
161
162<body>
163<div id="container">
164  <div id="content">
165
166  <div class="noprint">
167
168  <div id="titlepage">
169    <div class="title" id="first-title">XML Parsing and Serialization in C++</div>
170    <div class="title" id="second-title">With <code>libstudxml</code></div>
171
172    <p>Copyright &#169; 2013-2020 Code Synthesis Tools CC. Permission is
173       granted to copy, distribute and/or modify this document under the
174       terms of the MIT license.</p>
175
176    <!-- REMEMBER TO CHANGE VERSIONS IN THE META TAGS ABOVE! -->
177    <p id="revision">Revision 1.0, May 2017</p>
178    <p>This revision of the document describes <code>libstudxml</code> 1.1.0.</p>
179  </div>
180
181  <hr class="page-break"/>
182  <h1>Table of Contents</h1>
183
184  <table class="toc">
185    <tr>
186      <th></th><td><a href="#0">About This Document</a></td>
187    </tr>
188    <tr>
189      <th>1</th><td><a href="#1">Terminology</a></td>
190    </tr>
191    <tr>
192      <th>2</th><td><a href="#2">Low-Level API</a></td>
193    </tr>
194    <tr>
195      <th>3</th><td><a href="#3">High-Level API</a></td>
196    </tr>
197    <tr>
198      <th>4</th><td><a href="#4">Object Persistence</a></td>
199    </tr>
200    <tr>
201      <th>5</th><td><a href="#5">Inheritance</a></td>
202    </tr>
203    <tr>
204      <th>6</th><td><a href="#6">Implementation Notes</a></td>
205    </tr>
206  </table>
207  </div>
208
209  <hr class="page-break"/>
210  <h1><a name="0">About This Document</a></h1>
211
212  <p>This document is based on the presentation given by Boris Kolpackov at
213     the C++Now 2014 conference where <code>libstudxml</code> was
214     first made publicly available. Its goal is to introduce a new,
215     modern C++ API for XML by showing how to handle the most common
216     use cases. Compared to the talk, this introduction omits some of
217     the discussion relevant to XML in general and its handling
218     in C++. It also provides more complete code examples that would not
219     fit onto slides during the presentation. If, however, you would
220     like to get a more complete picture of the "state of XML in C++", then
221     you may prefer to first
222     <a href="http://youtu.be/AuamDUrG5ZU?list=UU5e__RG9K3cHrPotPABnrwg">watch
223     the video</a> of the talk.</p>
224
225  <p>While this document uses some C++11 features in the examples, the
226     library itself can be used in C++98 applications as well.</p>
227
228  <h1><a name="1">Terminology</a></h1>
229
230  <p>Before we begin, let's define a few terms to make sure we are on
231     the same page.</p>
232
233  <p>When we say "XML format" that is a bit loose. XML is actually
234     a meta-format that we specialize for our needs. That is, we decide
235     what element and attribute names we will use, which elements will
236     be valid where, what they will mean, and so on. This specialization
237     of XML to a specific format is called an <em>XML Vocabulary</em>.</p>
238
239  <p>Often, but not always, when we parse XML, we store extracted data
240     in the application's memory. Usually, we would create classes
241     specific to our XML vocabulary. For example, if we have an element
242     called <code>person</code> then we may create a C++ class also
     called <code>person</code>. We will call such classes an
     <em>Object Model</em>.</p>
245
  <p>The content of an element in XML can be empty, text, nested
     elements, or a mixture of text and nested elements:</p>
248
249  <pre class="xml">
250&lt;empty name="a" id="1"/>
251
&lt;simple name="b" id="2">text&lt;/simple>
253
254&lt;complex name="c" id="3">
255  &lt;nested>...&lt;/nested>
256  &lt;nested>...&lt;/nested>
&lt;/complex>
258
259&lt;mixed name="d" id="4">
260  te&lt;nested>...&lt;/nested>
261  x
262  &lt;nested>...&lt;/nested>t
&lt;/mixed>
264  </pre>
265
266  <p>These are called the <em>empty</em>, <em>simple</em>,
267     <em>complex</em>, and <em>mixed</em> content models,
268     respectively.</p>
269
270  <h1><a name="2">Low-Level API</a></h1>
271
  <p><code>libstudxml</code> provides a streaming XML pull parser and a
     streaming XML serializer. The parser is a conforming, non-validating
274     XML 1.0 implementation (see <a href="#6">Implementation Notes</a>
275     for details). The application character encoding (that is, the
276     encoding used in the application's memory) for both parser and
277     serializer is UTF-8. The output encoding of the serializer is
278     UTF-8 as well. The parser supports UTF-8, UTF-16, ISO-8859-1,
279     and US-ASCII input encodings.</p>
280
281  <pre class="c++">
282#include &lt;xml/parser>
283
284namespace xml
285{
286  class parser;
287}
288  </pre>
289
290  <pre class="c++">
291#include &lt;xml/serializer>
292
293namespace xml
294{
295  class serializer;
296}
297  </pre>
298
299  <p>C++ is often used to implement XML converters and filters, especially
300     where speed is a concern. Such applications require the lowest-level
301     API with minimum overhead. So we will start there (see the
302     <code>roundtrip</code> example in the <code>libstudxml</code>
303     distribution).</p>
304
305  <pre class="c++">
306class parser
307{
308  typedef unsigned short feature_type;
309
310  static const feature_type receive_elements;
311  static const feature_type receive_characters;
312  static const feature_type receive_attributes;
313  static const feature_type receive_namespace_decls;
314
315  static const feature_type receive_default =
316    receive_elements |
317    receive_characters |
318    receive_attributes;
319
320  parser (std::istream&amp;,
321          const std::string&amp; input_name,
322          feature_type = receive_default);
323  ...
324};
325  </pre>
326
327  <p>The parser constructor takes three arguments: the stream to parse,
328     input name that is used in diagnostics to identify the document
329     being parsed, and the list of events we want the parser to report.</p>
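
  <p>The event list is a set of bit flags combined with bitwise OR. The
     following standalone sketch illustrates the idea. Note that the flag
     values and the <code>wants()</code> helper are hypothetical and were
     chosen only for illustration; the actual values are an implementation
     detail of the library.</p>

  <pre class="c++">
// Standalone illustration of bit-flag features; not libstudxml code.
typedef unsigned short feature_type;

// Hypothetical values: each feature occupies its own bit.
const feature_type receive_elements        = 0x0001;
const feature_type receive_characters      = 0x0002;
const feature_type receive_attributes      = 0x0004;
const feature_type receive_namespace_decls = 0x0008;

const feature_type receive_default =
  receive_elements |
  receive_characters |
  receive_attributes;

// Return true if every bit of feature f is set in the feature set s.
inline bool wants (feature_type s, feature_type f)
{
  return (s &amp; f) == f;
}
  </pre>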
330
331  <p>As an example of an XML filter, let's write one that removes a
332     specific attribute from the document, say <code>id</code>. The
333     first step in our filter would then be to create the parser
334     instance:</p>
335
336  <pre class="c++">
337int main (int argc, char* argv[])
338{
339  ...
340
341  try
342  {
343    using namespace xml;
344
345    ifstream ifs (argv[1]);
346    parser p (ifs, argv[1]);
347
348    ...
349  }
350  catch (const xml::parsing&amp; e)
351  {
352    cerr &lt;&lt; e.what () &lt;&lt; endl;
353    return 1;
354  }
355}
356  </pre>
357
358  <p>Here we also see how to handle parsing errors. So far so good.
359     Let's see the next piece of the API.</p>
360
361  <pre class="c++">
362class parser
363{
364  enum event_type
365  {
366    start_element,
367    end_element,
368    start_attribute,
369    end_attribute,
370    characters,
371    start_namespace_decl,
372    end_namespace_decl,
373    eof
374  };
375
376  event_type next ();
377};
378  </pre>
379
380  <p>We call the <code>next()</code> function when we are ready to handle
381     the next piece of XML. And now we can implement our filter a bit
382     further:</p>
383
384  <pre class="c++">
385parser p (ifs, argv[1]);
386
387for (parser::event_type e (p.next ());
388     e != parser::eof;
389     e = p.next ())
390{
391  switch (e)
392  {
393  case parser::start_element:
394    ...
395  case parser::end_element:
396    ...
397  case parser::start_attribute:
398    ...
399  case parser::end_attribute:
400    ...
401  case parser::characters:
402    ...
403  }
404}
405  </pre>
406
407  <p>In C++11 we can use the range-based <code>for</code> loop to tidy
408     things up a bit:</p>
409
410  <pre class="c++">
411parser p (ifs, argv[1]);
412
413for (parser::event_type e: p)
414{
415  switch (e)
416  {
417    ...
418  }
419}
420  </pre>
421
422  <p>The next piece of the API puzzle:</p>
423
424  <pre class="c++">
425class parser
426{
427  const std::string&amp; name () const;
428  const std::string&amp; value () const;
429
430  unsigned long long line () const;
431  unsigned long long column () const;
432};
433  </pre>
434
435  <p>The <code>name()</code> accessor returns the name of the current element
436     or attribute. The <code>value()</code> function returns the text of the
437     characters event for an element or attribute. The <code>line()</code> and
438     <code>column()</code> accessors return the current position in the document.
439     Here is how we could print all the element positions for debugging:</p>
440
441  <pre class="c++">
442switch (e)
443{
444case parser::start_element:
445  cerr &lt;&lt; p.line () &lt;&lt; ':' &lt;&lt; p.column () &lt;&lt; ": start "
446       &lt;&lt; p.name () &lt;&lt; endl;
447  break;
448case parser::end_element:
449  cerr &lt;&lt; p.line () &lt;&lt; ':' &lt;&lt; p.column () &lt;&lt; ": end "
450       &lt;&lt; p.name () &lt;&lt; endl;
451  break;
452}
453  </pre>
454
455  <p>We have now seen enough of the parsing side to complete our filter.
456     What's missing is the serialization. So let's switch to that for a
457     moment:</p>
458
459  <pre class="c++">
460class serializer
461{
462  serializer (std::ostream&amp;,
463              const std::string&amp; output_name,
464              unsigned short indentation = 2);
465
466  ...
467};
468  </pre>
469
470  <p>The constructor is pretty similar to the <code>parser</code>'s. The
471     <code>indentation</code> argument specifies the number of indentation
472     spaces that should be used for pretty-printing. We can disable it by
473     passing <code>0</code>.</p>
474
475  <p>Now we can create the serializer instance for our filter:</p>
476
477  <pre class="c++">
478int main (int argc, char* argv[])
479{
480  ...
481
482  try
483  {
484    using namespace xml;
485
486    ifstream ifs (argv[1]);
487    parser p (ifs, argv[1]);
488    serializer s (cout, "output", 0);
489
490    ...
491  }
492  catch (const xml::parsing&amp; e)
493  {
494    cerr &lt;&lt; e.what () &lt;&lt; endl;
495    return 1;
496  }
497  catch (const xml::serialization&amp; e)
498  {
499    cerr &lt;&lt; e.what () &lt;&lt; endl;
500    return 1;
501  }
502}
503  </pre>
504
505  <p>Notice that we have also added an exception handler for the
506     <code>serialization</code> exception. Instead of handling
507     the <code>parsing</code> and <code>serialization</code>
508     exceptions separately, we can catch just
509     <code>xml::exception</code>, which is a common base for the
510     other two:</p>
511
512  <pre class="c++">
513int main (int argc, char* argv[])
514{
515  try
516  {
517    ...
518  }
519  catch (const xml::exception&amp; e)
520  {
521    cerr &lt;&lt; e.what () &lt;&lt; endl;
522    return 1;
523  }
524}
525  </pre>
526
527  <p>The next chunk of the serializer API:</p>
528
529  <pre class="c++">
530class serializer
531{
532  void start_element (const std::string&amp; name);
533  void end_element ();
534
535  void start_attribute (const std::string&amp; name);
536  void end_attribute ();
537
538  void characters (const std::string&amp; value);
539};
540  </pre>
541
542  <p>Everything should be pretty self-explanatory here. And we have
543     now seen enough to finish our filter:</p>
544
545  <pre class="c++">
546parser p (ifs, argv[1]);
547serializer s (cout, "output", 0);
548
549bool skip (false);
550
551for (parser::event_type e: p)
552{
553  switch (e)
554  {
555  case parser::start_element:
556    {
557      s.start_element (p.name ());
558      break;
559    }
560  case parser::end_element:
561    {
562      s.end_element ();
563      break;
564    }
565  case parser::start_attribute:
566    {
567      if (p.name () == "id")
568        skip = true;
569      else
570        s.start_attribute (p.name ());
571      break;
572    }
573  case parser::end_attribute:
574    {
575      if (skip)
576        skip = false;
577      else
578        s.end_attribute ();
579      break;
580    }
581  case parser::characters:
582    {
583      if (!skip)
584        s.characters (p.value ());
585      break;
586    }
587  }
588}
589  </pre>
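
  <p>The skip logic does not depend on the XML machinery and can be
     exercised on its own. Below is a minimal standalone sketch that applies
     the same state machine to an in-memory list of events. The
     <code>event</code> and <code>token</code> types as well as the
     <code>drop_attribute()</code> function are hypothetical stand-ins for
     the parser and serializer and are not part of
     <code>libstudxml</code>.</p>

  <pre class="c++">
#include &lt;string>
#include &lt;vector>

// Hypothetical stand-in for the parser events; not the libstudxml types.
enum class event
{
  start_element,
  end_element,
  start_attribute,
  end_attribute,
  characters
};

struct token
{
  event type;
  std::string text; // Name for start events, text for characters.
};

// Copy the input to the output, dropping the attribute called name
// together with its value.
std::vector&lt;token>
drop_attribute (const std::vector&lt;token>&amp; in, const std::string&amp; name)
{
  std::vector&lt;token> out;
  bool skip (false);

  for (const token&amp; t: in)
  {
    switch (t.type)
    {
    case event::start_attribute:
      {
        if (t.text == name)
          skip = true;
        else
          out.push_back (t);
        break;
      }
    case event::end_attribute:
      {
        if (skip)
          skip = false;
        else
          out.push_back (t);
        break;
      }
    case event::characters:
      {
        if (!skip)
          out.push_back (t);
        break;
      }
    default: // start_element, end_element
      {
        out.push_back (t);
        break;
      }
    }
  }

  return out;
}
  </pre>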
590
591  <p>Do you see any problems with our filter? Well, one problem is
592     that this implementation doesn't handle XML namespaces. Let's
593     see how we can fix this. The first issue is with the element
594     and attribute names. When namespaces are used, those may be
595     qualified. <code>libstudxml</code> uses the <code>qname</code>
596     class to represent such names:</p>
597
598  <pre class="c++">
599#include &lt;xml/qname>
600
601namespace xml
602{
603  class qname
604  {
605  public:
606    qname ();
607    qname (const std::string&amp; name);
608    qname (const std::string&amp; namespace_,
609           const std::string&amp; name);
610
611    const std::string&amp; namespace_ () const;
612    const std::string&amp; name () const;
613  };
614}
615  </pre>
616
  <p>The parser, in addition to the <code>name()</code> accessor, also
     has <code>qname()</code>, which returns the potentially qualified
     name. Similarly, the <code>start_element()</code> and
     <code>start_attribute()</code> functions in the serializer are
     overloaded to accept <code>qname</code>:</p>
622
623  <pre class="c++">
624class parser
625{
626  const qname&amp; qname () const;
627};
628
629class serializer
630{
631  void start_element (const qname&amp;);
632  void start_attribute (const qname&amp;);
633};
634  </pre>
635
636  <p>The first thing we need to do to make our filter namespace-aware
637     is to use qualified names instead of the local ones. This one is
638     easy:</p>
639
640  <pre class="c++">
641switch (e)
642{
643case parser::start_element:
644  {
645    s.start_element (p.qname ());
646    break;
647  }
648case parser::start_attribute:
649  {
650    if (p.qname () == "id") // Unqualified name.
651      skip = true;
652    else
653      s.start_attribute (p.qname ());
654    break;
655  }
656}
657  </pre>
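
  <p>To see why comparing a qualified name to a plain string matches only
     unqualified names, consider the following standalone sketch. The
     <code>qualified_name</code> type here is a hypothetical stand-in written
     for illustration; it is not the <code>xml::qname</code> implementation,
     and we only assume that the real class compares both the namespace and
     the local name in a similar way.</p>

  <pre class="c++">
#include &lt;string>

// Hypothetical stand-in for xml::qname, for illustration only.
struct qualified_name
{
  std::string ns;   // Empty if the name is unqualified.
  std::string name;

  qualified_name (const char* n): name (n) {}
  qualified_name (const std::string&amp; ns_, const std::string&amp; n)
    : ns (ns_), name (n) {}
};

// Two names are equal only if both the namespace and the local name match.
inline bool
operator== (const qualified_name&amp; x, const qualified_name&amp; y)
{
  return x.ns == y.ns &amp;&amp; x.name == y.name;
}
  </pre>

  <p>With this definition, an <code>id</code> attribute that belongs to some
     namespace does not compare equal to <code>"id"</code> and is therefore
     left alone by such a filter.</p>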
658
659
660  <p>There is, however, another thing that we have to do. Right now our
661     code does not propagate the namespace-prefix mappings from the input
662     document to the output. At the moment, where the input XML might have
663     meaningful prefixes assigned to namespaces, the output will have
664     automatically generated ones like <code>g1</code>, <code>g2</code>,
665     and so on.</p>
666
667  <p>To fix this, first we need to tell the parser to report to us
668     namespace-prefix mappings, called namespace declarations in XML:</p>
669
670  <pre class="c++">
671parser p (ifs,
          argv[1],
673          parser::receive_default |
674          parser::receive_namespace_decls);
675  </pre>
676
677  <p>We then also need to propagate this information to the serializer by
678     handling the <code>start_namespace_decl</code> event:</p>
679
680  <pre class="c++">
681for (...)
682{
683  switch (e)
684  {
685    ...
686
687  case parser::start_namespace_decl:
688    s.namespace_decl (p.namespace_ (), p.prefix ());
689    break;
690
691    ...
692  }
693}
694  </pre>
695
696  <p>Well, that wasn't too bad.</p>
697
698  <h1><a name="3">High-Level API</a></h1>
699
  <p>So that was pretty low-level XML work where we didn't care about
     the semantics of the stored data or, in fact, about the XML vocabulary
     that we dealt with.</p>
703
704  <p>However, this API will quickly become tedious once we try to handle
705     a specific XML vocabulary and do something useful with the stored
706     data. Why is that? There are several areas where we could use some
707     help:</p>
708
709  <ul>
710    <li>Validation and error handling</li>
711    <li>Attribute access</li>
712    <li>Data extraction</li>
713    <li>Content model processing</li>
714    <li>Control flow</li>
715  </ul>
716
717  <p>Let's examine each area using our object position vocabulary as a
718     test case (see the <code>processing</code> example in the
719     <code>libstudxml</code> distribution).</p>
720
721  <pre class="xml">
722&lt;object id="123">
723  &lt;name>Lion's Head&lt;/name>
724  &lt;type>mountain&lt;/type>
725
726  &lt;position lat="-33.8569" lon="18.5083"/>
727  &lt;position lat="-33.8568" lon="18.5083"/>
728  &lt;position lat="-33.8568" lon="18.5082"/>
729&lt;/object>
730  </pre>
731
  <p>If we cannot assume that the XML we are parsing is valid, and we
     generally shouldn't, then we will quickly realize that the biggest
     pain in dealing with XML is making sure that what we got is actually
     valid.</p>
736
737  <p>This stuff is pervasive. What if the root element is spelled
738     wrong? Maybe the <code>id</code> attribute is missing? Or there
739     is some stray text before the <code>name</code> element? Things
740     can be broken in an infinite number of ways.</p>
741
742  <p>To illustrate this point, here is the parsing code of just the
743     root element with proper error handling:</p>
744
745  <pre class="c++">
746parser p (ifs, argv[1]);
747
748if (p.next () != parser::start_element ||
749    p.qname () != "object")
750{
751  // error
752}
753
754...
755
756if (p.next () != parser::end_element) // object
757{
758  // error
759}
760  </pre>
761
762  <p>Not very pretty. To help with this, the parser API provides the
763     <code>next_expect()</code> function:</p>
764
765  <pre class="c++">
766class parser
767{
768  void next_expect (event_type);
769  void next_expect (event_type, const std::string&amp; name);
770};
771  </pre>
772
773  <p>This function gets the next event and makes sure it is what's
774     expected. If not, it throws an appropriate parsing exception.
775     This simplifies our root element parsing quite a bit:</p>
776
777  <pre class="c++">
778parser p (ifs, argv[1]);
779
780p.next_expect (parser::start_element, "object");
781...
782p.next_expect (parser::end_element); // object
783  </pre>
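
  <p>To make the "get the next event and check it" idea concrete, here is a
     minimal standalone sketch over an in-memory event list. The
     <code>cursor</code> class, its <code>token</code> input, and the use of
     <code>std::runtime_error</code> are all assumptions made for
     illustration; the real parser throws <code>xml::parsing</code> and
     reads from a stream.</p>

  <pre class="c++">
#include &lt;cstddef>
#include &lt;stdexcept>
#include &lt;string>
#include &lt;utility>
#include &lt;vector>

// Hypothetical in-memory event source; not the libstudxml parser.
enum class event {start_element, end_element, characters, eof};

struct token
{
  event type;
  std::string name; // Element name; empty for other events.
};

class cursor
{
public:
  explicit cursor (std::vector&lt;token> ts)
    : ts_ (std::move (ts)), pos_ (0) {}

  // Advance and return the next event, or eof once the input is exhausted.
  event next ()
  {
    if (pos_ == ts_.size ())
      return event::eof;

    name_ = ts_[pos_].name;
    return ts_[pos_++].type;
  }

  // Advance and throw unless the next event has the expected type.
  void next_expect (event e)
  {
    if (next () != e)
      throw std::runtime_error ("unexpected event");
  }

  // As above but also check the name.
  void next_expect (event e, const std::string&amp; n)
  {
    if (next () != e || name_ != n)
      throw std::runtime_error ("expected '" + n + "'");
  }

private:
  std::vector&lt;token> ts_;
  std::size_t pos_;
  std::string name_;
};
  </pre>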
784
785  <p>Let's now take the next step and try to handle the <code>id</code>
786     attribute. According to what we have seen so far, it will look
787     something along these lines:</p>
788
789  <pre class="c++">
790p.next_expect (parser::start_element, "object");
791
792p.next_expect (parser::start_attribute, "id");
793p.next_expect (parser::characters);
794cout &lt;&lt; "id: " &lt;&lt; p.value () &lt;&lt; endl;
795p.next_expect (parser::end_attribute);
796
797...
798
799p.next_expect (parser::end_element); // object
800  </pre>
801
802  <p>Not too bad but there is a bit of a problem. What if our <code>object</code>
803     element had several attributes? The order of attributes in XML
804     is arbitrary so we should be prepared to get them in any order.
805     This fact complicates our attribute parsing code quite a bit:</p>
806
807  <pre class="c++">
808while (p.next () == parser::start_attribute)
809{
810  if (p.qname () == "id")
811  {
812    p.next_expect (parser::characters);
813    cout &lt;&lt; "id: " &lt;&lt; p.value () &lt;&lt; endl;
814  }
815  else if (...)
816  {
817  }
818  else
819  {
820    // error: unknown attribute
821  }
822
823  p.next_expect (parser::end_attribute);
824}
825  </pre>
826
827  <p>There is also a bug in this version. Can you see it? We now
828     don't make sure that the <code>id</code> attribute was actually
829     specified.</p>
830
831  <p>If you think about it, at this level, it is actually not that
832     convenient to receive attributes as events. In fact, a map of
833     attributes would be much more usable.</p>
834
835  <p>Remember we talked about the parser features that specify which
836     events we want to see:</p>
837
838  <pre class="c++">
839class parser
840{
841  static const feature_type receive_elements;
842  static const feature_type receive_characters;
843  static const feature_type receive_attributes;
844
845  ...
846};
847  </pre>
848
  <p>Well, in reality, there is no <code>receive_attributes</code>. Rather,
     there are these two options:</p>
851
852  <pre class="c++">
853class parser
854{
855  static const feature_type receive_attributes_map;
856  static const feature_type receive_attributes_event;
857
858  ...
859};
860  </pre>
861
862  <p>That is, we can ask the parser to send us attributes as events or
863     as a map. And the default is to send them as a map.</p>
864
865  <p>In case of a map, we have the following attribute access API to work
866     with:</p>
867
868  <pre class="c++">
869class parser
870{
871  const std::string&amp; attribute (const std::string&amp; name) const;
872
873  std::string attribute (const std::string&amp; name,
874                         const std::string&amp; default_value) const;
875
876  bool attribute_present (const std::string&amp; name) const;
877};
878  </pre>
879
880  <p>If the attribute is not found, then the version without the default
881     value throws an appropriate parsing exception while the version with
882     the default value returns that value. There are also the
883     <code>qname</code> versions of these functions.</p>
884
885  <p>Let's see how this simplifies our code:</p>
886
887  <pre class="c++">
888p.next_expect (parser::start_element, "object");
889
890cout &lt;&lt; "id: " &lt;&lt; p.attribute ("id") &lt;&lt; endl;
891
892...
893
894p.next_expect (parser::end_element); // object
895  </pre>
896
897  <p>Much better.</p>
898
899  <p>If the <code>id</code> attribute is not present, then we get an
900     exception. But what happens if we have a stray attribute in our
901     document? The attribute map is magical in this sense. After
902     the <code>end_element</code> event for the <code>object</code>
903     element the parser will examine the attribute map. If there is
904     an attribute that hasn't been retrieved with one of the attribute
905     access functions, then the parser will throw the unexpected
906     attribute exception.</p>
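
  <p>One way such "magic" could work is for the map to remember which
     entries have been retrieved. The following standalone sketch shows the
     idea. The <code>attribute_map</code> class and its function names are
     hypothetical and are not how <code>libstudxml</code> is necessarily
     implemented.</p>

  <pre class="c++">
#include &lt;map>
#include &lt;stdexcept>
#include &lt;string>

// Hypothetical attribute map that remembers which entries were retrieved.
class attribute_map
{
public:
  void insert (const std::string&amp; name, const std::string&amp; value)
  {
    map_[name] = entry {value, false};
  }

  // Return the value and mark the attribute as handled. Throw if absent.
  const std::string&amp; get (const std::string&amp; name)
  {
    auto i (map_.find (name));

    if (i == map_.end ())
      throw std::runtime_error ("attribute '" + name + "' expected");

    i->second.handled = true;
    return i->second.value;
  }

  // Called at the end of the element: throw for the first attribute
  // that was never retrieved.
  void check_unhandled () const
  {
    for (const auto&amp; p: map_)
      if (!p.second.handled)
        throw std::runtime_error ("unexpected attribute '" + p.first + "'");
  }

private:
  struct entry
  {
    std::string value;
    bool handled;
  };

  std::map&lt;std::string, entry> map_;
};
  </pre>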
907
  <p>With error handling out of the way, the next thing that will annoy us is
     data extraction. In XML everything is text. While our <code>id</code> value
910     is an integer, XML stores it as text and the low-level API returns it to
911     us as text. To help with this the parser provides the following data
912     extraction functions:</p>
913
914  <pre class="c++">
915class parser
916{
917  template &lt;typename T>
918  T value () const;
919
920  template &lt;typename T>
921  T attribute (const std::string&amp; name) const;
922
923  template &lt;typename T>
924  T attribute (const std::string&amp; name,
925               const T&amp; default_value) const;
926};
927  </pre>
928
929  <p>Now we can get the <code>id</code> as an integer without much fuss:</p>
930
931  <pre class="c++">
932p.next_expect (parser::start_element, "object");
933
934unsigned int id = p.attribute&lt;unsigned int> ("id");
935
936...
937
938p.next_expect (parser::end_element); // object
939  </pre>
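
  <p>Conceptually, such an extraction function converts the text to the
     requested type and complains if the text is not a valid representation.
     Here is a minimal standalone sketch based on the <code>iostream</code>
     extraction operator. The <code>from_text()</code> helper is hypothetical
     and reports errors with <code>std::invalid_argument</code> rather than
     with a parsing exception. Note also that this simple version rejects
     trailing whitespace.</p>

  <pre class="c++">
#include &lt;sstream>
#include &lt;stdexcept>
#include &lt;string>

// Hypothetical helper: convert text to T using the iostream extraction
// operator and reject input that is not consumed completely.
template &lt;typename T>
T from_text (const std::string&amp; s)
{
  std::istringstream is (s);
  T v;

  if (!(is >> v) || !is.eof ())
    throw std::invalid_argument ("invalid value '" + s + "'");

  return v;
}
  </pre>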
940
941  <p>Ok, let's try to parse our vocabulary a bit further:</p>
942
943  <pre class="c++">
944p.next_expect (parser::start_element, "object");
945unsigned int id = p.attribute&lt;unsigned int> ("id");
946
947p.next_expect (parser::start_element, "name");
948
949...
950
951p.next_expect (parser::end_element); // name
952
953p.next_expect (parser::end_element); // object
954  </pre>
955
956  <p>Here is the part of the document that we are parsing:</p>
957
958  <pre class="xml">
959&lt;object id="123">
960  &lt;name>Lion's Head&lt;/name>
961  </pre>
962
963  <p>What do you think, is everything alright with our code? When we
964     try to parse our document, we will get an exception here:</p>
965
966  <pre class="c++">
967p.next_expect (parser::start_element, "name");
968  </pre>
969
970  <p>Any idea why? Let's try to print the event that we get:</p>
971
972  <pre class="c++">
973// p.next_expect (parser::start_element, "name");
974cerr &lt;&lt; p.next () &lt;&lt; endl;
975  </pre>
976
977  <p>We expect <code>start_element</code> but get <code>characters</code>!
978     Wait a minute, but there are characters after <code>object</code> and
979     before <code>name</code>. There is a newline and two spaces that are
980     replaced with hashes for illustration here:</p>
981
982  <pre class="xml">
983&lt;object id="123">#
984##&lt;name>Lion's Head&lt;/name>
985  </pre>
986
987  <p>If you go to a forum or a mailing list for any XML parser, this will
988     be the most common question. Why do I get text when I should clearly
989     get an element!?</p>
990
  <p>We get this whitespace text because the parser has no idea whether it
     is significant or not. The significance of whitespaces is determined by
     the XML content model that we talked about earlier. Here is the
     table:</p>
995
996  <pre class="c++">
997#include &lt;xml/content>
998
999namespace xml
1000{
1001  enum class content
1002  {          //  element   characters  whitespaces
1003    empty,   //    no          no        ignored
1004    simple,  //    no          yes       preserved
1005    complex, //    yes         no        ignored
1006    mixed    //    yes         yes       preserved
1007  };
1008}
1009  </pre>
1010
  <p>Empty content allows neither nested elements nor characters, and
     whitespaces are ignored. Simple content allows characters but no nested
     elements, and whitespaces are preserved. Complex content allows only
     nested elements, and whitespaces between them are ignored. Finally, mixed
     content allows elements and characters in any order, with everything
     preserved.</p>
1016
1017  <p>If we specify the content model for an element, then the parser
1018     will do automatic whitespace processing for us:</p>
1019
1020  <pre class="c++">
1021class parser
1022{
1023  void content (content);
1024};
1025  </pre>
1026
1027  <p>That is, in empty and complex content, whitespaces will be silently
1028     ignored. By knowing the content model, the parser also has a chance to do
1029     more error handling for us. It will automatically throw appropriate
1030     exceptions if there are nested elements in empty or simple content or
1031     non-whitespace characters in complex content.</p>
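
  <p>The character handling rules from the table can be written down as a
     small decision function. The following standalone sketch does this.
     The <code>whitespace()</code> and <code>report_characters()</code>
     functions are hypothetical, and treating non-whitespace characters in
     empty content as an error is our reading of the table rather than a
     statement about the library's exact behavior.</p>

  <pre class="c++">
#include &lt;stdexcept>
#include &lt;string>

enum class content {empty, simple, complex, mixed};

// Return true if s consists only of XML whitespace (space, tab, CR, LF).
inline bool whitespace (const std::string&amp; s)
{
  return s.find_first_not_of (" \t\r\n") == std::string::npos;
}

// Decide what to do with character data s seen in content model c:
// return true to report it to the application and false to ignore it.
// Throw if the content model does not allow it.
inline bool report_characters (content c, const std::string&amp; s)
{
  switch (c)
  {
  case content::empty:
  case content::complex:
    {
      if (!whitespace (s))
        throw std::runtime_error ("characters in empty or complex content");

      return false; // Whitespace is ignored.
    }
  case content::simple:
  case content::mixed:
    return true; // Everything is preserved.
  }

  return true; // Not reached for valid enumerators.
}
  </pre>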
1032
1033  <p>Ok, let's now see how we can take advantage of this feature in
1034     our code:</p>
1035
1036  <pre class="c++">
1037p.next_expect (parser::start_element, "object");
1038p.content (content::complex);
1039
1040unsigned int id = p.attribute&lt;unsigned int> ("id");
1041
1042p.next_expect (parser::start_element, "name"); // Ok.
1043
1044...
1045
1046p.next_expect (parser::end_element); // name
1047
1048p.next_expect (parser::end_element); // object
1049  </pre>
1050
1051  <p>Now whitespaces are ignored and everything works as we expected.
1052     Here is how we can parse the content of the <code>name</code>
1053     element:</p>
1054
1055  <pre class="c++">
1056p.next_expect (parser::start_element, "name");
1057p.content (content::simple);
1058
1059p.next_expect (parser::characters);
1060string name = p.value ();
1061
1062p.next_expect (parser::end_element); // name
1063  </pre>
1064
1065  <p>As you can see, parsing a simple content element is quite a bit more
1066     involved compared to getting a value of an attribute. Element markup also
1067     has a higher overhead in the resulting XML. That's why in our case it would
1068     have been wiser to make <code>name</code> and <code>type</code>
1069     attributes.</p>
1070
1071  <p>But if we are stuck with a lot of simple content elements, then
1072     the parser provides the following helper functions:</p>
1073
1074  <pre class="c++">
1075class parser
1076{
1077  std::string element ();
1078
1079  template &lt;typename T>
1080  T element ();
1081
1082  std::string element (const std::string&amp; name);
1083
1084  template &lt;typename T>
1085  T element (const std::string&amp; name);
1086
1087  std::string element (const std::string&amp; name,
1088                       const std::string&amp; default_value);
1089
1090  template &lt;typename T>
1091  T element (const std::string&amp; name,
1092             const T&amp; default_value);
1093};
1094  </pre>
1095
1096  <p>The first two assume that you have already handled the
1097     <code>start_element</code> event. They should be used if the element also
1098     has attributes. The other four parse the complete element. Overloaded
1099     <code>qname</code> versions are also provided.</p>
1100
1101  <p>Here is how we can simplify our parsing code thanks to these
1102     functions:</p>
1103
1104  <pre class="c++">
1105p.next_expect (parser::start_element, "object");
1106p.content (content::complex);
1107
1108unsigned int id = p.attribute&lt;unsigned int> ("id");
1109string name = p.element ("name");
1110
1111p.next_expect (parser::end_element); // object
1112  </pre>
1113
1114  <p>For the <code>type</code> element we would like to use this <code>enum
1115     class</code>:</p>
1116
1117  <pre class="c++">
1118enum class object_type
1119{
1120  building,
1121  mountain,
1122  ...
1123};
1124  </pre>
1125
1126  <p>The parsing code is similar to the <code>name</code> element. Now
1127     we use the data extracting version of the <code>element()</code>
1128     function:</p>
1129
1130  <pre class="c++">
1131object_type type = p.element&lt;object_type> ("type");
1132  </pre>
1133
  <p>Except that this won't compile. The parser doesn't know how to
     convert the text representation to our <code>enum</code>. By
     default the parser will try to use the <code>iostream</code>
     extraction operator, but we haven't provided one.</p>
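  <p>To make that default more concrete, here is a minimal sketch of a
     stream-based conversion. The <code>parse_value()</code> function is a
     hypothetical stand-in written for this illustration, not the library's
     actual code. It only assumes that the conversion goes through the
     extraction operator and that the whole string must be consumed:</p>

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of libstudxml): convert text to T using
// the iostream extraction operator, rejecting trailing garbage.
template <typename T>
T parse_value (const std::string& text)
{
  std::istringstream is (text);
  T value {};

  // The extraction must succeed and must consume the whole string.
  if (!(is >> value && is.eof ()))
    throw std::invalid_argument ("invalid value '" + text + "'");

  return value;
}
```

  <p>With this sketch, <code>parse_value&lt;unsigned int> ("123")</code>
     works because the standard library provides the extraction operator for
     <code>unsigned int</code>. Instantiating it for our <code>enum
     class</code> fails to compile for the same reason as above: there is no
     extraction operator for that type.</p>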
1138
1139  <p>We can provide conversion code specifically for XML by specializing
1140     the <code>value_traits</code> class template:</p>
1141
1142  <pre class="c++">
1143namespace xml
1144{
1145  template &lt;>
1146  struct value_traits&lt;object_type>
1147  {
1148    static object_type
1149    parse (std::string, const parser&amp;)
1150    {
1151      ...
1152    }
1153
1154    static std::string
1155    serialize (object_type, const serializer&amp;)
1156    {
1157      ...
1158    }
1159  };
1160}
1161  </pre>
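  <p>The bodies elided above could look something like the following sketch.
     It is written as two free functions so that it compiles on its own; the
     function names, the choice of exception, and the text forms of the
     enumerators are assumptions made for this illustration. The enumerator
     list is also an assumption: <code>building</code> and
     <code>mountain</code> come from the text, and <code>other</code> is the
     value used as a default later on.</p>

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Assumed enumerator list; the original definition is elided.
enum class object_type
{
  building,
  mountain,
  other
};

// Hypothetical body for value_traits<object_type>::parse().
inline object_type parse_object_type (const std::string& text)
{
  if (text == "building") return object_type::building;
  if (text == "mountain") return object_type::mountain;
  if (text == "other")    return object_type::other;

  throw std::invalid_argument ("invalid object type '" + text + "'");
}

// Hypothetical body for value_traits<object_type>::serialize().
inline std::string serialize_object_type (object_type type)
{
  switch (type)
  {
  case object_type::building: return "building";
  case object_type::mountain: return "mountain";
  case object_type::other:    return "other";
  }

  // Only reachable if an out-of-range value was cast to object_type.
  throw std::invalid_argument ("invalid object_type value");
}
```

  <p>Inside the specialization, <code>parse()</code> and
     <code>serialize()</code> would simply forward to code like this. The
     parser and serializer arguments are there in case the conversion needs
     extra context, for example, to report an error.</p>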
1162
1163  <p>The last bit that we need to handle is the <code>position</code>
1164     elements. The interesting part here is how to stop without going
1165     too far since there can be several of them. To help with this task
1166     the parser allows us to peek into the next event:</p>
1167
1168  <pre class="c++">
1169p.next_expect (parser::start_element, "object");
1170p.content (content::complex);
1171...
1172
1173do
1174{
1175  p.next_expect (parser::start_element, "position");
1176  p.content (content::empty);
1177
1178  float lat = p.attribute&lt;float> ("lat");
1179  float lon = p.attribute&lt;float> ("lon");
1180
1181  p.next_expect (parser::end_element);
1182
1183} while (p.peek () == parser::start_element);
1184
1185p.next_expect (parser::end_element); // object
1186  </pre>
1187
1188  <p>Do you see anything else that we can improve? Actually, there is
1189     one thing. Look at the <code>next_expect()</code> calls in the
1190     above code. They are both immediately followed by the setting
1191     of the content model. We can tidy this up a bit by passing the
1192     content model as a third argument to <code>next_expect()</code>.
1193     This even reads like prose: "Next we expect the start of an
1194     element called <code>position</code> that shall have empty
1195     content."</p>
1196
1197  <p>Here is the complete, production-quality parsing code for our XML
1198     vocabulary. 13 lines. With validation and everything:</p>
1199
  <pre class="c++">
parser p (ifs, argv[1]);

p.next_expect (parser::start_element, "object", content::complex);

unsigned int id = p.attribute&lt;unsigned int> ("id");
string name = p.element ("name");
object_type type = p.element&lt;object_type> ("type");

do
{
  p.next_expect (parser::start_element, "position", content::empty);

  float lat = p.attribute&lt;float> ("lat");
  float lon = p.attribute&lt;float> ("lon");

  p.next_expect (parser::end_element); // position
} while (p.peek () == parser::start_element);

p.next_expect (parser::end_element); // object
  </pre>
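  <p>The loop over the <code>position</code> elements relies on one-event
     lookahead. To show the idea in isolation, here is a self-contained
     sketch that uses a hypothetical <code>event_stream</code> class over a
     pre-built list of events instead of the real parser. All the names in
     it are assumptions made for this illustration:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class event { start_element, end_element, eof };

struct item
{
  event type;
  std::string name; // Empty for end_element.
};

// Hypothetical stand-in for a pull parser with one-event lookahead.
class event_stream
{
public:
  explicit event_stream (std::vector<item> items)
    : items_ (std::move (items)) {}

  // Return the next event without consuming it.
  event peek () const
  {
    return pos_ < items_.size () ? items_[pos_].type : event::eof;
  }

  // Consume the next event and check that it is what we expect.
  void next_expect (event expected)
  {
    if (pos_ == items_.size () || items_[pos_].type != expected)
      throw std::runtime_error ("unexpected event");

    ++pos_;
  }

private:
  std::vector<item> items_;
  std::size_t pos_ = 0;
};

// Consume one or more consecutive elements and stop right before the
// parent's end_element event, which is left for the caller.
inline std::size_t count_positions (event_stream& s)
{
  std::size_t n (0);

  do
  {
    s.next_expect (event::start_element);
    s.next_expect (event::end_element);
    ++n;
  } while (s.peek () == event::start_element);

  return n;
}
```

  <p>Because <code>peek()</code> does not consume anything, the event that
     terminates the loop is still there for the enclosing code to handle.</p>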
1221
1222  <p>So that was the high-level parsing API. Let's now catch up with the
1223     corresponding additions to the serializer.</p>
1224
1225  <p>Similar to parsing, calling <code>start_attribute()</code>,
1226     <code>characters()</code>, and then <code>end_attribute()</code>
1227     might not be convenient. Instead we can add an attribute with
1228     a single call:</p>
1229
1230  <pre class="c++">
1231class serializer
1232{
1233  void attribute (const std::string&amp; name,
1234                  const std::string&amp; value);
1235
1236  void element (const std::string&amp; value);
1237
1238  void element (const std::string&amp; name,
1239                const std::string&amp; value);
1240};
1241  </pre>
1242
1243  <p>The same works for elements with simple content. The first version finishes
1244     the element that we have started, while the second writes the complete
1245     element. There are also the <code>qname</code> versions of these
1246     functions that are not shown.</p>
1247
1248  <p>Instead of strings we can also serialize value types. This uses the
1249     same <code>value_traits</code> specialization mechanism that we have
1250     used for parsing:</p>
1251
1252  <pre class="c++">
1253class serializer
1254{
1255  template &lt;typename T>
1256  void attribute (const std::string&amp; name,
1257                  const T&amp; value);
1258
1259  template &lt;typename T>
1260  void element (const T&amp; value);
1261
1262  template &lt;typename T>
1263  void element (const std::string&amp; name,
1264                const T&amp; value);
1265
1266  template &lt;typename T>
1267  void characters (const T&amp; value);
1268};
1269  </pre>
1270
  <p>Let's now see how we can serialize a complete sample document for
     our object position vocabulary using this high-level API:</p>
1273
1274  <pre class="c++">
1275serializer s (cout, "output");
1276
1277s.start_element ("object");
1278
1279s.attribute ("id", 123);
1280s.element ("name", "Lion's Head");
1281s.element ("type", object_type::mountain);
1282
1283for (...)
1284{
1285  s.start_element ("position");
1286
1287  float lat (...), lon (...);
1288
1289  s.attribute ("lat", lat);
1290  s.attribute ("lon", lon);
1291
1292  s.end_element (); // position
1293}
1294
1295s.end_element (); // object
1296  </pre>
1297
1298  <p>Pretty straightforward stuff.</p>
1299
1300  <h1><a name="4">Object Persistence</a></h1>
1301
1302  <p>So far we have used our API to first implement a filter that doesn't
1303     really care about the data and then an application that processes the
1304     data without creating any kind of object model. Let's now try to handle
1305     the other end of the spectrum: objects that know how to persist
1306     themselves into XML (see the <code>persistence</code> example in
1307     the <code>libstudxml</code> distribution).</p>
1308
  <p>But before we continue, let's fix our XML to be slightly more idiomatic.
     That is, we make <code>name</code> and <code>type</code> attributes
     rather than elements:</p>
1312
1313  <pre class="xml">
1314&lt;object name="Lion's Head" type="mountain" id="123">
1315  &lt;position lat="-33.8569" lon="18.5083"/>
1316  &lt;position lat="-33.8568" lon="18.5083"/>
1317  &lt;position lat="-33.8568" lon="18.5082"/>
1318&lt;/object>
1319  </pre>
1320
1321  <p>Generally, the API works best with idiomatic XML and will nudge you
1322     gently in that direction with minor inconveniences.</p>
1323
1324  <p>For this vocabulary, the object model might look like this:</p>
1325
1326  <pre class="c++">
1327enum class object_type {...};
1328
1329class position
1330{
1331  ...
1332
1333  float lat_;
1334  float lon_;
1335};
1336
1337class object
1338{
1339  ...
1340
1341  std::string name_;
1342  object_type type_;
1343  unsigned int id_;
1344  std::vector&lt;position> positions_;
1345};
1346  </pre>
1347
1348  <p>Here I omit sensible constructors, accessors and modifiers that our
1349     classes would probably have.</p>
1350
1351  <p>Let me also mention that what I am going to show next is what I
1352     believe is the sensible structure for XML persistence using this
1353     API. But that doesn't mean it is the only way. For example, we
1354     are going to do parsing in a constructor:</p>
1355
1356  <pre class="c++">
1357class position
1358{
1359  position (xml::parser&amp;);
1360
1361  void
1362  serialize (xml::serializer&amp;) const;
1363
1364  ...
1365};
1366
1367class object
1368{
1369  object (xml::parser&amp;);
1370
1371  void
1372  serialize (xml::serializer&amp;) const;
1373
1374  ...
1375};
1376  </pre>
1377
1378  <p>But you may prefer to first create an instance, say with the default
1379     constructor, and then have a separate function do the parsing.
1380     There is nothing wrong with this approach.</p>
1381
  <p>Let's start with the <code>position</code> constructor. Here, we are
     immediately confronted with this choice: do we parse the start and end
     element events in <code>position</code> or expect our caller to handle
     them?</p>
1385
1386  <p>I suggest that we let our caller do this. We may have different elements
1387     in our vocabulary that use the same <code>position</code> type. If we
1388     assume the element name in the constructor, then we won't be able to use
1389     the same class for all these elements. We will see the second advantage
1390     of this arrangement in a moment, when we deal with inheritance. But, if
1391     you have a simple model with one-to-one mapping between types and
1392     elements and no inheritance, then there is nothing wrong with going the
1393     other route.</p>
1394
1395  <pre class="c++">
1396position::
1397position (parser&amp; p)
1398  : lat_ (p.attribute&lt;float> ("lat")),
1399    lon_ (p.attribute&lt;float> ("lon"))
1400{
1401  p.content (content::empty);
1402}
1403  </pre>
1404
1405  <p>Ok, nice and clean so far. Let's look at the <code>object</code>
1406     constructor:</p>
1407
1408  <pre class="c++">
1409object::
1410object (parser&amp; p)
1411  : name_ (p.attribute ("name")),
1412    type_ (p.attribute&lt;object_type> ("type")),
1413    id_ (p.attribute&lt;unsigned int> ("id"))
1414{
1415  p.content (content::complex);
1416
1417  do
1418  {
1419    p.next_expect (parser::start_element, "position");
1420    positions_.push_back (position (p));
1421    p.next_expect (parser::end_element);
1422
1423  } while (p.peek () == parser::start_element);
1424}
1425  </pre>
1426
1427  <p>The only mildly interesting line here is where we call the position
1428     constructor to parse the content of the nested elements.</p>
1429
1430  <p>Before we look into serialization, let me also mention one other
1431     thing. In our vocabulary all the attributes are required but it is
1432     quite common to have optional attributes. The API functions with
1433     default values make it really convenient to handle such attributes
1434     in the initializer lists.</p>
1435
1436  <p>Let's say the <code>type</code> attribute is optional. Then we
1437     could do this:</p>
1438
1439  <pre class="c++">
1440object::
1441object (parser&amp; p)
1442  : ...
1443    type_ (p.attribute ("type", object_type::other))
1444    ...
1445  </pre>
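  <p>To see how required and optional attributes can sit side by side in an
     initializer list, here is a self-contained sketch. The
     <code>attribute_source</code> class is a hypothetical stand-in for the
     attribute-related part of the parser, and the optional <code>alt</code>
     attribute is invented for this illustration; neither is part of our
     vocabulary or of the library:</p>

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the attribute-related part of a parser.
class attribute_source
{
public:
  void set (const std::string& name, const std::string& value)
  {
    map_[name] = value;
  }

  // Required attribute: throw if it is absent.
  template <typename T>
  T attribute (const std::string& name) const
  {
    auto i (map_.find (name));
    if (i == map_.end ())
      throw std::runtime_error ("attribute '" + name + "' expected");
    return convert<T> (i->second);
  }

  // Optional attribute: fall back to the default value if it is absent.
  template <typename T>
  T attribute (const std::string& name, const T& default_value) const
  {
    auto i (map_.find (name));
    return i != map_.end () ? convert<T> (i->second) : default_value;
  }

private:
  // Convert via the extraction operator; the whole string must be used.
  template <typename T>
  static T convert (const std::string& text)
  {
    std::istringstream is (text);
    T value {};
    if (!(is >> value && is.eof ()))
      throw std::runtime_error ("invalid value '" + text + "'");
    return value;
  }

  std::map<std::string, std::string> map_;
};

struct position
{
  explicit position (const attribute_source& a)
    : lat (a.attribute<float> ("lat")),
      lon (a.attribute<float> ("lon")),
      alt (a.attribute ("alt", 0.0f)) // Hypothetical optional attribute.
  {
  }

  float lat;
  float lon;
  float alt;
};
```

  <p>Note that in the default value version the type is deduced from the
     default, so there is no need to spell out the template argument.</p>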
1446
1447  <p>We use the same arrangement for serialization, that is, the
1448    containing object starts and ends the element allowing us to
1449    reuse the same type for different elements:</p>
1450
1451  <pre class="c++">
1452void position::serialize (serializer&amp; s) const
1453{
1454  s.attribute ("lat", lat_);
1455  s.attribute ("lon", lon_);
1456}
1457
1458void object::serialize (serializer&amp; s) const
1459{
1460  s.attribute ("name", name_);
1461  s.attribute ("type", type_);
1462  s.attribute ("id", id_);
1463
1464  for (const auto&amp; p: positions_)
1465  {
1466    s.start_element ("position");
1467    p.serialize (s);
1468    s.end_element ();
1469  }
1470}
1471  </pre>
1472
1473  <p>Ok, also nice and tidy.</p>
1474
  <p>There is one thing, however, that is not so nice: the start of
     the parser or serializer. Here is the code:</p>
1477
1478  <pre class="c++">
1479parser p (ifs, argv[1]);
1480p.next_expect (parser::start_element, "object");
1481object o (p);
1482p.next_expect (parser::end_element);
1483
1484serializer s (cout, "output");
1485s.start_element ("object");
1486o.serialize (s);
1487s.end_element ();
1488  </pre>
1489
1490  <p>Remember, we made the caller responsible for handling the start and
1491    end of the element. This works beautifully inside the object model but
1492    not so much in the client code. What we would like to see instead
1493    is this:</p>
1494
1495  <pre class="c++">
1496parser p (ifs, argv[1]);
1497object o (p);
1498
1499serializer s (cout, "output");
1500o.serialize (s);
1501  </pre>
1502
1503  <p>The main reason for choosing this structure was the ability to reuse the
1504     same type for different elements. The other reason was inheritance which
1505     we haven't gotten to yet. If we think about it, it is very unlikely for a
1506     class corresponding to the root of our vocabulary to also be used inside
1507     as a local element. I can't remember ever seeing a vocabulary like
1508     this.</p>
1509
1510  <p>So what we can do here is make an exception: the root type of our
1511     object model handles the top-level element. Here is the parser:</p>
1512
1513  <pre class="c++">
1514object::
1515object (parser&amp; p)
1516{
1517  p.next_expect (
1518    parser::start_element, "object", content::complex);
1519
1520  name_ = p.attribute ("name");
1521  type_ = p.attribute&lt;object_type> ("type");
1522  id_ = p.attribute&lt;unsigned int> ("id");
1523
1524  ...
1525
1526  p.next_expect (parser::end_element);
1527}
1528  </pre>
1529
1530  <p>And here is the serializer:</p>
1531
1532  <pre class="c++">
1533void object::
1534serialize (serializer&amp; s) const
1535{
1536  s.start_element ("object");
1537
1538  ...
1539
1540  s.end_element ();
1541}
1542  </pre>
1543
1544  <p>The only minor drawback of going this route is that we can no longer
1545     parse attributes in the initializer list for the root object.</p>
1546
1547  <h1><a name="5">Inheritance</a></h1>
1548
  <p>So far we have had smooth sailing with the streaming approach, but
     things get a bit bumpy once we start dealing with inheritance. This is
     normally where the in-memory approach has its day.</p>
1552
1553  <p>Say we have <code>elevated-object</code> which adds the
1554     <code>units</code> attribute and the <code>elevation</code> elements.
1555     Here is the XML:</p>
1556
1557  <pre class="xml">
1558&lt;elevated-object name="Lion's Head" type="mountain"
1559                 units="m" id="123">
1560  &lt;position lat="-33.8569" lon="18.5083"/>
1561  &lt;position lat="-33.8568" lon="18.5083"/>
1562  &lt;position lat="-33.8568" lon="18.5082"/>
1563
1564  &lt;elevation val="668.9"/>
1565  &lt;elevation val="669"/>
1566  &lt;elevation val="669.1"/>
1567&lt;/elevated-object>
1568  </pre>
1569
1570  <p>And here is the object model:</p>
1571
1572  <pre class="c++">
1573enum class units {...};
1574
1575class elevation {...};
1576
1577class elevated_object: public object
1578{
1579  ...
1580
1581  units units_;
1582  std::vector&lt;elevation> elevations_;
1583};
1584  </pre>
1585
1586  <p>Streaming assumes linearity. We start an element, add some attributes,
1587     add some nested elements, and end the element.  In contrast, with an
1588     in-memory approach we can add some attributes, then add some nested
1589     elements, then go back and add more attributes. This kind of back and
     forth is exactly what inheritance often requires. So this is a bit of a
     problem for us.</p>
1592
1593  <p>Consider the <code>elevated_object</code> constructor:</p>
1594
  <pre class="c++">
elevated_object::
elevated_object (parser&amp; p)
  : object (p),
    units_ (p.attribute&lt;units> ("units"))
{
  do
  {
    p.next_expect (parser::start_element, "elevation");
    elevations_.push_back (elevation (p));
    p.next_expect (parser::end_element);

  } while (p.peek () == parser::start_element &amp;&amp;
           p.name () == "elevation");
}
  </pre>
1611
1612  <p>Note that here I assume we went back to our original architecture
1613     where the caller handles the start and end of the element (this is
1614     the other advantage of this architecture: it allows us to reuse
1615     base parsing and serialization code in derived classes).</p>
1616
  <p>We would like to reuse the parsing code from <code>object</code>,
     so we call the base constructor first.</p>
1619
1620  <p>Then we parse the derived attribute and elements. Do you see
1621     the problem? The <code>object</code> constructor will parse its
1622     attributes and then move on to nested elements. When this constructor
1623     returns, we need to go back to parsing attributes! This is not
1624     something that a streaming approach would normally allow.</p>
1625
1626  <p>To resolve this, the lifetime of the attribute map was extended until
1627     after the <code>end_element</code> event. That is, we can access
1628     attributes any time we are at the element's level. As a result,
1629     the above code just works.</p>
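  <p>The following self-contained sketch illustrates the idea of keeping an
     element's attributes accessible after its nested elements have been
     consumed. The <code>mini_parser</code> class is a hypothetical stand-in
     that works over a pre-built list of events; it is not how the library
     is implemented, and the object model is cut down to the bare
     minimum:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using attributes = std::map<std::string, std::string>;

struct event
{
  enum kind { start_element, end_element } type;
  std::string name; // Empty for end_element.
  attributes attrs; // Empty for end_element.
};

// Hypothetical stand-in for a pull parser that keeps the attributes of
// every open element available until that element ends.
class mini_parser
{
public:
  explicit mini_parser (std::vector<event> es): events_ (std::move (es)) {}

  void next_expect (event::kind k, const std::string& name = "")
  {
    if (pos_ == events_.size ())
      throw std::runtime_error ("unexpected end of input");

    const event& e (events_[pos_]);
    if (e.type != k || (k == event::start_element && e.name != name))
      throw std::runtime_error ("unexpected event");

    if (k == event::start_element)
      open_.push_back (pos_);  // Attributes of this element become current.
    else if (!open_.empty ())
      open_.pop_back ();       // Back to the attributes of the parent.
    else
      throw std::runtime_error ("unbalanced end_element");

    ++pos_;
  }

  bool peek_start (const std::string& name) const
  {
    return pos_ != events_.size () &&
           events_[pos_].type == event::start_element &&
           events_[pos_].name == name;
  }

  // Look up an attribute of the innermost element that is still open.
  const std::string& attribute (const std::string& name) const
  {
    if (open_.empty ())
      throw std::runtime_error ("no open element");

    const attributes& a (events_[open_.back ()].attrs);
    auto i (a.find (name));
    if (i == a.end ())
      throw std::runtime_error ("attribute '" + name + "' expected");
    return i->second;
  }

private:
  std::vector<event> events_;
  std::vector<std::size_t> open_; // Indexes of the open start_element events.
  std::size_t pos_ = 0;
};

struct object
{
  explicit object (mini_parser& p): name (p.attribute ("name"))
  {
    do
    {
      p.next_expect (event::start_element, "position");
      ++positions;
      p.next_expect (event::end_element);
    } while (p.peek_start ("position"));
  }

  std::string name;
  std::size_t positions = 0;
};

struct elevated_object: object
{
  explicit elevated_object (mini_parser& p)
    : object (p),                   // Consumes the nested elements.
      units (p.attribute ("units")) // Parent attributes still accessible.
  {
  }

  std::string units;
};
```

  <p>In this sketch the derived constructor reads <code>units</code> only
     after the base constructor has gone through all the
     <code>position</code> elements, which is the same order of events as
     in the code above.</p>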
1630
1631  <p>We have the same problem in serialization. Let's say we write
1632     the straightforward code like this:</p>
1633
1634  <pre class="c++">
1635void elevated_object::
1636serialize (serializer&amp; s) const
1637{
1638  object::serialize (s);
1639
1640  s.attribute ("units", units_);
1641
1642  for (const auto&amp; e: elevations_)
1643  {
1644    s.start_element ("elevation");
1645    e.serialize (s);
1646    s.end_element ();
1647  }
1648}
1649  </pre>
1650
1651  <p>This is not going to work since we will try to add the <code>units</code>
1652     attribute after the nested <code>position</code> elements have already
1653     been written.</p>
1654
1655  <p>To handle inheritance in serialization we have to split the
1656     <code>serialize()</code> function into two. One serializes
1657     the attributes while the other &mdash; content:</p>
1658
1659  <pre class="c++">
1660void object::
1661serialize_attributes (serializer&amp; s) const
1662{
1663  s.attribute ("name", name_);
1664  s.attribute ("type", type_);
1665  s.attribute ("id", id_);
1666}
1667
1668void object::
1669serialize_content (serializer&amp; s) const
1670{
1671  for (const auto&amp; p: positions_)
1672  {
1673    s.start_element ("position");
1674    p.serialize (s);
1675    s.end_element ();
1676  }
1677}
1678  </pre>
1679
1680  <p>The <code>serialize()</code> function then simply calls these two
1681     in the correct order.</p>
1682
1683  <pre class="c++">
1684void object::
1685serialize (serializer&amp; s) const
1686{
1687  serialize_attributes (s);
1688  serialize_content (s);
1689}
1690  </pre>
1691
1692  <p>I bet you can guess what the <code>elevated_object</code>'s
1693     implementation looks like:</p>
1694
1695  <pre class="c++">
1696void elevated_object::
1697serialize_attributes (serializer&amp; s) const
1698{
1699  object::serialize_attributes (s);
1700  s.attribute ("units", units_);
1701}
1702
1703void elevated_object::
1704serialize_content (serializer&amp; s) const
1705{
1706  object::serialize_content (s);
1707
1708  for (const auto&amp; e: elevations_)
1709  {
1710    s.start_element ("elevation");
1711    e.serialize (s);
1712    s.end_element ();
1713  }
1714}
1715  </pre>
1716
1717  <p>The <code>serialize()</code> function for <code>elevated_object</code>
1718     is exactly the same:</p>
1719
1720  <pre class="c++">
1721void elevated_object::
1722serialize (serializer&amp; s) const
1723{
1724  serialize_attributes (s);
1725  serialize_content (s);
1726}
1727  </pre>
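  <p>To see the difference between the two orders in action, here is a
     self-contained sketch. The <code>mini_serializer</code> class is a
     hypothetical stand-in that, like a streaming serializer, refuses to add
     attributes once the start tag has been closed. It does not escape
     attribute values, and the object model is reduced to one attribute and
     one nested element per class; all of this is an assumption made for
     the illustration:</p>

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a streaming serializer.
class mini_serializer
{
public:
  void start_element (const std::string& name)
  {
    close_start_tag (); // The parent can no longer receive attributes.
    out_ += "<" + name;
    stack_.push_back (name);
    open_ = true;
  }

  void attribute (const std::string& name, const std::string& value)
  {
    if (!open_)
      throw std::logic_error ("attribute after content");
    out_ += " " + name + "=\"" + value + "\"";
  }

  void end_element ()
  {
    if (stack_.empty ())
      throw std::logic_error ("no element to end");

    if (open_) // No content was written: use the empty element form.
    {
      out_ += "/>";
      open_ = false;
    }
    else
      out_ += "</" + stack_.back () + ">";

    stack_.pop_back ();
  }

  const std::string& str () const { return out_; }

private:
  void close_start_tag ()
  {
    if (open_)
    {
      out_ += ">";
      open_ = false;
    }
  }

  std::string out_;
  std::vector<std::string> stack_;
  bool open_ = false; // True while the current start tag accepts attributes.
};

struct object
{
  std::string name;

  void serialize_attributes (mini_serializer& s) const
  {
    s.attribute ("name", name);
  }

  void serialize_content (mini_serializer& s) const
  {
    s.start_element ("position");
    s.end_element ();
  }
};

struct elevated_object: object
{
  std::string units;

  void serialize_attributes (mini_serializer& s) const
  {
    object::serialize_attributes (s);
    s.attribute ("units", units);
  }

  void serialize_content (mini_serializer& s) const
  {
    object::serialize_content (s);
    s.start_element ("elevation");
    s.end_element ();
  }

  // All the attributes (base and derived) first, then all the content.
  void serialize (mini_serializer& s) const
  {
    serialize_attributes (s);
    serialize_content (s);
  }

  // The naive order: everything from the base, then the derived attribute.
  void serialize_naive (mini_serializer& s) const
  {
    object::serialize_attributes (s);
    object::serialize_content (s);
    s.attribute ("units", units); // Throws: the start tag is already closed.
  }
};
```

  <p>With this stand-in, <code>serialize()</code> produces a well-formed
     element with both attributes, while <code>serialize_naive()</code>
     throws as soon as it tries to add <code>units</code> after the nested
     <code>position</code> element.</p>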
1728
1729  <h1><a name="6">Implementation Notes</a></h1>
1730
  <p><code>libstudxml</code> is an open source (MIT license), portable
     (autotools and VC++ projects provided), and external dependency-free
     implementation.</p>
1734
1735  <p>It provides a conforming, non-validating XML 1.0 parser by using
1736     the mature and tested Expat XML parser. <code>libstudxml</code>
1737     includes the Expat source code (also distributed under the MIT
1738     license) as an implementation detail. However, you can link to
1739     an external Expat library if you prefer.</p>
1740
1741  <p>If you are familiar with Expat, you are probably wondering how
1742     the push interface provided by Expat was adapted to the pull
1743     API shown earlier. Expat allows us to suspend and resume parsing
1744     after every event and that's exactly what this implementation
1745     does. The performance cost of this constant suspension and
1746     resumption is about 35% of Expat's performance, which is not
1747     negligible but not the end of the world either.</p>
1748
1749  <p>All in, with all the name splitting and string constructions,
1750     parsing throughput on a 2010 Intel Core i7 laptop is about
1751     37 MByte/sec, which should be sufficient for most applications.</p>
1752
1753  <p>While it is much easier to implement a conforming serializer
1754     from scratch, <code>libstudxml</code> reuses an existing and
1755     tested implementation in this case as well. It includes source
1756     code of a small C library for XML serialization called Genx
1757     (also MIT licensed) that was initially created by Tim Bray
1758     and significantly improved and extended over the past years
1759     as part of the XSD/e project.</p>
1760
1761  </div>
1762</div>
1763
1764</body>
1765</html>
1766