<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">

<head>
  <title>XML Parsing and Serialization in C++ with libstudxml</title>

  <meta name="copyright" content="© 2013-2020 Code Synthesis Tools CC"/>
  <meta name="keywords" content="xml,c++,parsing,serialization,api,streaming,persistence"/>
  <meta name="description" content="XML Parsing and Serialization in C++ with libstudxml"/>
  <meta name="revision" content="1.0"/>
  <meta name="version" content="1.1.0"/>

  <link rel="stylesheet" type="text/css" href="default.css" />

<style type="text/css">
  pre {
    padding : 0 0 0 0em;
    margin : 0em 0em 0em 0;

    font-size : 102%
  }

  body {
    min-width: 48em;
  }

  h1 {
    font-weight: bold;
    font-size: 200%;
    line-height: 1.2em;
  }

  h2 {
    font-weight : bold;
    font-size : 150%;

    padding-top : 0.8em;
  }

  h3 {
    font-size : 140%;
    padding-top : 0.8em;
  }

  /* Force page break for both PDF and HTML (when printing). */
  hr.page-break {
    height: 0;
    width: 0;
    border: 0;
    visibility: hidden;

    page-break-after: always;
  }

  /* Adjust indentation for three levels.
*/
  #container {
    max-width: 48em;
  }

  #content {
    padding: 0 0.1em 0 4em;
    /*background-color: red;*/
  }

  #content h1 {
    margin-left: -2.06em;
  }

  #content h2 {
    margin-left: -1.33em;
  }

  /* Title page */

  #titlepage {
    padding: 2em 0 1em 0;
    border-bottom: 1px solid black;
  }

  #titlepage .title {
    font-weight: bold;
    font-size: 200%;
    text-align: center;
    padding: 1em 0 2em 0;
  }

  #titlepage #first-title {
    padding: 1em 0 0.4em 0;
  }

  #titlepage #second-title {
    padding: 0.4em 0 2em 0;
  }

  #titlepage p {
    padding-bottom: 1em;
  }

  #titlepage #revision {
    padding-bottom: 0em;
  }

  /* Lists */
  ul.list li, ol.list li {
    padding-top : 0.3em;
    padding-bottom : 0.3em;
  }

  div.img {
    text-align: center;
    padding: 2em 0 2em 0;
  }

  /* */
  dl dt {
    padding : 0.8em 0 0 0;
  }

  /* TOC */
  table.toc {
    border-style : none;
    border-collapse : separate;
    border-spacing : 0;

    margin : 0.2em 0 0.2em 0;
    padding : 0 0 0 0;
  }

  table.toc tr {
    padding : 0 0 0 0;
    margin : 0 0 0 0;
  }

  table.toc * td, table.toc * th {
    border-style : none;
    margin : 0 0 0 0;
    vertical-align : top;
  }

  table.toc * th {
    font-weight : normal;
    padding : 0em 0.1em 0em 0;
    text-align : left;
    white-space : nowrap;
  }

  table.toc * table.toc th {
    padding-left : 1em;
  }

  table.toc * td {
    padding : 0em 0 0em 0.7em;
    text-align : left;
  }

</style>


</head>

<body>
<div id="container">
  <div id="content">

  <div class="noprint">

  <div id="titlepage">
    <div class="title" id="first-title">XML Parsing and Serialization in C++</div>
    <div class="title" id="second-title">With <code>libstudxml</code></div>

    <p>Copyright © 2013-2020 Code Synthesis Tools CC.
    Permission is
    granted to copy, distribute and/or modify this document under the
    terms of the MIT license.</p>

    <!-- REMEMBER TO CHANGE VERSIONS IN THE META TAGS ABOVE! -->
    <p id="revision">Revision 1.0, May 2017</p>
    <p>This revision of the document describes <code>libstudxml</code> 1.1.0.</p>
  </div>

  <hr class="page-break"/>
  <h1>Table of Contents</h1>

  <table class="toc">
    <tr>
      <th></th><td><a href="#0">About This Document</a></td>
    </tr>
    <tr>
      <th>1</th><td><a href="#1">Terminology</a></td>
    </tr>
    <tr>
      <th>2</th><td><a href="#2">Low-Level API</a></td>
    </tr>
    <tr>
      <th>3</th><td><a href="#3">High-Level API</a></td>
    </tr>
    <tr>
      <th>4</th><td><a href="#4">Object Persistence</a></td>
    </tr>
    <tr>
      <th>5</th><td><a href="#5">Inheritance</a></td>
    </tr>
    <tr>
      <th>6</th><td><a href="#6">Implementation Notes</a></td>
    </tr>
  </table>
  </div>

  <hr class="page-break"/>
  <h1><a name="0">About This Document</a></h1>

  <p>This document is based on the presentation given by Boris Kolpackov at
  the C++Now 2014 conference where <code>libstudxml</code> was
  first made publicly available. Its goal is to introduce a new,
  modern C++ API for XML by showing how to handle the most common
  use cases. Compared to the talk, this introduction omits some of
  the discussion relevant to XML in general and its handling
  in C++. It also provides more complete code examples that would not
  fit onto slides during the presentation.
  If, however, you would
  like to get a more complete picture of the "state of XML in C++", then
  you may prefer to first
  <a href="http://youtu.be/AuamDUrG5ZU?list=UU5e__RG9K3cHrPotPABnrwg">watch
  the video</a> of the talk.</p>

  <p>While this document uses some C++11 features in the examples, the
  library itself can be used in C++98 applications as well.</p>

  <h1><a name="1">Terminology</a></h1>

  <p>Before we begin, let's define a few terms to make sure we are on
  the same page.</p>

  <p>When we say "XML format", we are being a bit loose. XML is actually
  a meta-format that we specialize for our needs. That is, we decide
  what element and attribute names we will use, which elements will
  be valid where, what they will mean, and so on. This specialization
  of XML to a specific format is called an <em>XML Vocabulary</em>.</p>

  <p>Often, but not always, when we parse XML, we store extracted data
  in the application's memory. Usually, we would create classes
  specific to our XML vocabulary. For example, if we have an element
  called <code>person</code> then we may create a C++ class also
  called <code>person</code>. We will call such classes an
  <em>Object Model</em>.</p>

  <p>The content of an element in XML can be empty, text, nested
  elements, or a mixture of text and nested elements:</p>

  <pre class="xml">
&lt;empty name="a" id="1"/>

&lt;simple name="b" id="2">text&lt;/simple>

&lt;complex name="c" id="3">
  &lt;nested>...&lt;/nested>
  &lt;nested>...&lt;/nested>
&lt;/complex>

&lt;mixed name="d" id="4">
  te&lt;nested>...&lt;/nested>
  x
  &lt;nested>...&lt;/nested>t
&lt;/mixed>
  </pre>

  <p>These are called the <em>empty</em>, <em>simple</em>,
  <em>complex</em>, and <em>mixed</em> content models,
  respectively.</p>

  <h1><a name="2">Low-Level API</a></h1>

  <p><code>libstudxml</code> provides a streaming XML pull parser and
  a streaming XML serializer.
  The parser is a conforming, non-validating
  XML 1.0 implementation (see <a href="#6">Implementation Notes</a>
  for details). The application character encoding (that is, the
  encoding used in the application's memory) for both parser and
  serializer is UTF-8. The output encoding of the serializer is
  UTF-8 as well. The parser supports UTF-8, UTF-16, ISO-8859-1,
  and US-ASCII input encodings.</p>

  <pre class="c++">
#include &lt;xml/parser>

namespace xml
{
  class parser;
}
  </pre>

  <pre class="c++">
#include &lt;xml/serializer>

namespace xml
{
  class serializer;
}
  </pre>

  <p>C++ is often used to implement XML converters and filters, especially
  where speed is a concern. Such applications require the lowest-level
  API with minimum overhead. So we will start there (see the
  <code>roundtrip</code> example in the <code>libstudxml</code>
  distribution).</p>

  <pre class="c++">
class parser
{
  typedef unsigned short feature_type;

  static const feature_type receive_elements;
  static const feature_type receive_characters;
  static const feature_type receive_attributes;
  static const feature_type receive_namespace_decls;

  static const feature_type receive_default =
    receive_elements |
    receive_characters |
    receive_attributes;

  parser (std::istream&amp;,
          const std::string&amp; input_name,
          feature_type = receive_default);
  ...
};
  </pre>

  <p>The parser constructor takes three arguments: the stream to parse,
  the input name that is used in diagnostics to identify the document
  being parsed, and the list of events we want the parser to report.</p>

  <p>As an example of an XML filter, let's write one that removes a
  specific attribute from the document, say <code>id</code>.
  The
  first step in our filter would then be to create the parser
  instance:</p>

  <pre class="c++">
int main (int argc, char* argv[])
{
  ...

  try
  {
    using namespace xml;

    ifstream ifs (argv[1]);
    parser p (ifs, argv[1]);

    ...
  }
  catch (const xml::parsing&amp; e)
  {
    cerr &lt;&lt; e.what () &lt;&lt; endl;
    return 1;
  }
}
  </pre>

  <p>Here we also see how to handle parsing errors. So far so good.
  Let's see the next piece of the API.</p>

  <pre class="c++">
class parser
{
  enum event_type
  {
    start_element,
    end_element,
    start_attribute,
    end_attribute,
    characters,
    start_namespace_decl,
    end_namespace_decl,
    eof
  };

  event_type next ();
};
  </pre>

  <p>We call the <code>next()</code> function when we are ready to handle
  the next piece of XML. And now we can implement our filter a bit
  further:</p>

  <pre class="c++">
parser p (ifs, argv[1]);

for (parser::event_type e (p.next ());
     e != parser::eof;
     e = p.next ())
{
  switch (e)
  {
  case parser::start_element:
    ...
  case parser::end_element:
    ...
  case parser::start_attribute:
    ...
  case parser::end_attribute:
    ...
  case parser::characters:
    ...
  }
}
  </pre>

  <p>In C++11 we can use the range-based <code>for</code> loop to tidy
  things up a bit:</p>

  <pre class="c++">
parser p (ifs, argv[1]);

for (parser::event_type e: p)
{
  switch (e)
  {
    ...
  }
}
  </pre>

  <p>The next piece of the API puzzle:</p>

  <pre class="c++">
class parser
{
  const std::string&amp; name () const;
  const std::string&amp; value () const;

  unsigned long long line () const;
  unsigned long long column () const;
};
  </pre>

  <p>The <code>name()</code> accessor returns the name of the current element
  or attribute.
  The <code>value()</code> function returns the text of the
  characters event for an element or attribute. The <code>line()</code> and
  <code>column()</code> accessors return the current position in the document.
  Here is how we could print all the element positions for debugging:</p>

  <pre class="c++">
switch (e)
{
case parser::start_element:
  cerr &lt;&lt; p.line () &lt;&lt; ':' &lt;&lt; p.column () &lt;&lt; ": start "
       &lt;&lt; p.name () &lt;&lt; endl;
  break;
case parser::end_element:
  cerr &lt;&lt; p.line () &lt;&lt; ':' &lt;&lt; p.column () &lt;&lt; ": end "
       &lt;&lt; p.name () &lt;&lt; endl;
  break;
}
  </pre>

  <p>We have now seen enough of the parsing side to complete our filter.
  What's missing is the serialization. So let's switch to that for a
  moment:</p>

  <pre class="c++">
class serializer
{
  serializer (std::ostream&amp;,
              const std::string&amp; output_name,
              unsigned short indentation = 2);

  ...
};
  </pre>

  <p>The constructor is pretty similar to the <code>parser</code>'s. The
  <code>indentation</code> argument specifies the number of indentation
  spaces that should be used for pretty-printing. We can disable it by
  passing <code>0</code>.</p>

  <p>Now we can create the serializer instance for our filter:</p>

  <pre class="c++">
int main (int argc, char* argv[])
{
  ...

  try
  {
    using namespace xml;

    ifstream ifs (argv[1]);
    parser p (ifs, argv[1]);
    serializer s (cout, "output", 0);

    ...
  }
  catch (const xml::parsing&amp; e)
  {
    cerr &lt;&lt; e.what () &lt;&lt; endl;
    return 1;
  }
  catch (const xml::serialization&amp; e)
  {
    cerr &lt;&lt; e.what () &lt;&lt; endl;
    return 1;
  }
}
  </pre>

  <p>Notice that we have also added an exception handler for the
  <code>serialization</code> exception.
  Instead of handling
  the <code>parsing</code> and <code>serialization</code>
  exceptions separately, we can catch just
  <code>xml::exception</code>, which is a common base for the
  other two:</p>

  <pre class="c++">
int main (int argc, char* argv[])
{
  try
  {
    ...
  }
  catch (const xml::exception&amp; e)
  {
    cerr &lt;&lt; e.what () &lt;&lt; endl;
    return 1;
  }
}
  </pre>

  <p>The next chunk of the serializer API:</p>

  <pre class="c++">
class serializer
{
  void start_element (const std::string&amp; name);
  void end_element ();

  void start_attribute (const std::string&amp; name);
  void end_attribute ();

  void characters (const std::string&amp; value);
};
  </pre>

  <p>Everything should be pretty self-explanatory here. And we have
  now seen enough to finish our filter:</p>

  <pre class="c++">
parser p (ifs, argv[1]);
serializer s (cout, "output", 0);

bool skip (false);

for (parser::event_type e: p)
{
  switch (e)
  {
  case parser::start_element:
    {
      s.start_element (p.name ());
      break;
    }
  case parser::end_element:
    {
      s.end_element ();
      break;
    }
  case parser::start_attribute:
    {
      if (p.name () == "id")
        skip = true;
      else
        s.start_attribute (p.name ());
      break;
    }
  case parser::end_attribute:
    {
      if (skip)
        skip = false;
      else
        s.end_attribute ();
      break;
    }
  case parser::characters:
    {
      if (!skip)
        s.characters (p.value ());
      break;
    }
  }
}
  </pre>

  <p>Do you see any problems with our filter? Well, one problem is
  that this implementation doesn't handle XML namespaces. Let's
  see how we can fix this. The first issue is with the element
  and attribute names. When namespaces are used, those may be
  qualified.
  <code>libstudxml</code> uses the <code>qname</code>
  class to represent such names:</p>

  <pre class="c++">
#include &lt;xml/qname>

namespace xml
{
  class qname
  {
  public:
    qname ();
    qname (const std::string&amp; name);
    qname (const std::string&amp; namespace_,
           const std::string&amp; name);

    const std::string&amp; namespace_ () const;
    const std::string&amp; name () const;
  };
}
  </pre>

  <p>The parser, in addition to the <code>name()</code> accessor, also
  has <code>qname()</code>, which returns the potentially qualified
  name. Similarly, the <code>start_element()</code> and
  <code>start_attribute()</code> functions in the serializer are
  overloaded to accept <code>qname</code>:</p>

  <pre class="c++">
class parser
{
  const qname&amp; qname () const;
};

class serializer
{
  void start_element (const qname&amp;);
  void start_attribute (const qname&amp;);
};
  </pre>

  <p>The first thing we need to do to make our filter namespace-aware
  is to use qualified names instead of the local ones. This one is
  easy:</p>

  <pre class="c++">
switch (e)
{
case parser::start_element:
  {
    s.start_element (p.qname ());
    break;
  }
case parser::start_attribute:
  {
    if (p.qname () == "id") // Unqualified name.
      skip = true;
    else
      s.start_attribute (p.qname ());
    break;
  }
}
  </pre>


  <p>There is, however, another thing that we have to do. Right now our
  code does not propagate the namespace-prefix mappings from the input
  document to the output.
  At the moment, where the input XML might have
  meaningful prefixes assigned to namespaces, the output will have
  automatically generated ones like <code>g1</code>, <code>g2</code>,
  and so on.</p>

  <p>To fix this, first we need to tell the parser to report
  namespace-prefix mappings to us, called namespace declarations in XML:</p>

  <pre class="c++">
parser p (ifs,
          argv[1],
          parser::receive_default |
          parser::receive_namespace_decls);
  </pre>

  <p>We then also need to propagate this information to the serializer by
  handling the <code>start_namespace_decl</code> event:</p>

  <pre class="c++">
for (...)
{
  switch (e)
  {
    ...

  case parser::start_namespace_decl:
    s.namespace_decl (p.namespace_ (), p.prefix ());
    break;

    ...
  }
}
  </pre>

  <p>Well, that wasn't too bad.</p>

  <h1><a name="3">High-Level API</a></h1>

  <p>So that was pretty low-level XML work where we didn't care about
  the semantics of the stored data, or, in fact, the XML vocabulary that
  we dealt with.</p>

  <p>However, this API will quickly become tedious once we try to handle
  a specific XML vocabulary and do something useful with the stored
  data. Why is that?
  There are several areas where we could use some
  help:</p>

  <ul>
    <li>Validation and error handling</li>
    <li>Attribute access</li>
    <li>Data extraction</li>
    <li>Content model processing</li>
    <li>Control flow</li>
  </ul>

  <p>Let's examine each area using our object position vocabulary as a
  test case (see the <code>processing</code> example in the
  <code>libstudxml</code> distribution).</p>

  <pre class="xml">
&lt;object id="123">
  &lt;name>Lion's Head&lt;/name>
  &lt;type>mountain&lt;/type>

  &lt;position lat="-33.8569" lon="18.5083"/>
  &lt;position lat="-33.8568" lon="18.5083"/>
  &lt;position lat="-33.8568" lon="18.5082"/>
&lt;/object>
  </pre>

  <p>If you cannot assume the XML you are parsing is valid, and you
  generally shouldn't, then you will quickly realize that the biggest
  pain in dealing with XML is making sure that what we got is actually
  valid.</p>

  <p>This stuff is pervasive. What if the root element is spelled
  wrong? Maybe the <code>id</code> attribute is missing? Or there
  is some stray text before the <code>name</code> element? Things
  can be broken in an infinite number of ways.</p>

  <p>To illustrate this point, here is the parsing code of just the
  root element with proper error handling:</p>

  <pre class="c++">
parser p (ifs, argv[1]);

if (p.next () != parser::start_element ||
    p.qname () != "object")
{
  // error
}

...

if (p.next () != parser::end_element) // object
{
  // error
}
  </pre>

  <p>Not very pretty. To help with this, the parser API provides the
  <code>next_expect()</code> function:</p>

  <pre class="c++">
class parser
{
  void next_expect (event_type);
  void next_expect (event_type, const std::string&amp; name);
};
  </pre>

  <p>This function gets the next event and makes sure it is what's
  expected. If not, it throws an appropriate parsing exception.
  This simplifies our root element parsing quite a bit:</p>

  <pre class="c++">
parser p (ifs, argv[1]);

p.next_expect (parser::start_element, "object");
...
p.next_expect (parser::end_element); // object
  </pre>

  <p>Let's now take the next step and try to handle the <code>id</code>
  attribute. According to what we have seen so far, it will look
  something along these lines:</p>

  <pre class="c++">
p.next_expect (parser::start_element, "object");

p.next_expect (parser::start_attribute, "id");
p.next_expect (parser::characters);
cout &lt;&lt; "id: " &lt;&lt; p.value () &lt;&lt; endl;
p.next_expect (parser::end_attribute);

...

p.next_expect (parser::end_element); // object
  </pre>

  <p>Not too bad but there is a bit of a problem. What if our <code>object</code>
  element had several attributes? The order of attributes in XML
  is arbitrary so we should be prepared to get them in any order.
  This fact complicates our attribute parsing code quite a bit:</p>

  <pre class="c++">
while (p.next () == parser::start_attribute)
{
  if (p.qname () == "id")
  {
    p.next_expect (parser::characters);
    cout &lt;&lt; "id: " &lt;&lt; p.value () &lt;&lt; endl;
  }
  else if (...)
  {
  }
  else
  {
    // error: unknown attribute
  }

  p.next_expect (parser::end_attribute);
}
  </pre>

  <p>There is also a bug in this version. Can you see it? We now
  don't make sure that the <code>id</code> attribute was actually
  specified.</p>

  <p>If you think about it, at this level, it is actually not that
  convenient to receive attributes as events.
  In fact, a map of
  attributes would be much more usable.</p>

  <p>Remember we talked about the parser features that specify which
  events we want to see:</p>

  <pre class="c++">
class parser
{
  static const feature_type receive_elements;
  static const feature_type receive_characters;
  static const feature_type receive_attributes;

  ...
};
  </pre>

  <p>Well, in reality, there is no <code>receive_attributes</code>. Rather,
  there are these two options:</p>

  <pre class="c++">
class parser
{
  static const feature_type receive_attributes_map;
  static const feature_type receive_attributes_event;

  ...
};
  </pre>

  <p>That is, we can ask the parser to send us attributes as events or
  as a map. And the default is to send them as a map.</p>

  <p>In case of a map, we have the following attribute access API to work
  with:</p>

  <pre class="c++">
class parser
{
  const std::string&amp; attribute (const std::string&amp; name) const;

  std::string attribute (const std::string&amp; name,
                         const std::string&amp; default_value) const;

  bool attribute_present (const std::string&amp; name) const;
};
  </pre>

  <p>If the attribute is not found, then the version without the default
  value throws an appropriate parsing exception while the version with
  the default value returns that value. There are also the
  <code>qname</code> versions of these functions.</p>

  <p>Let's see how this simplifies our code:</p>

  <pre class="c++">
p.next_expect (parser::start_element, "object");

cout &lt;&lt; "id: " &lt;&lt; p.attribute ("id") &lt;&lt; endl;

...

p.next_expect (parser::end_element); // object
  </pre>

  <p>Much better.</p>

  <p>If the <code>id</code> attribute is not present, then we get an
  exception. But what happens if we have a stray attribute in our
  document? The attribute map is magical in this sense.
  After
  the <code>end_element</code> event for the <code>object</code>
  element the parser will examine the attribute map. If there is
  an attribute that hasn't been retrieved with one of the attribute
  access functions, then the parser will throw the unexpected
  attribute exception.</p>

  <p>With error handling out of the way, the next thing that will annoy us
  is data extraction. In XML everything is text. While our <code>id</code>
  value is an integer, XML stores it as text and the low-level API returns
  it to us as text. To help with this, the parser provides the following
  data extraction functions:</p>

  <pre class="c++">
class parser
{
  template &lt;typename T>
  T value () const;

  template &lt;typename T>
  T attribute (const std::string&amp; name) const;

  template &lt;typename T>
  T attribute (const std::string&amp; name,
               const T&amp; default_value) const;
};
  </pre>

  <p>Now we can get the <code>id</code> as an integer without much fuss:</p>

  <pre class="c++">
p.next_expect (parser::start_element, "object");

unsigned int id = p.attribute&lt;unsigned int> ("id");

...

p.next_expect (parser::end_element); // object
  </pre>

  <p>Ok, let's try to parse our vocabulary a bit further:</p>

  <pre class="c++">
p.next_expect (parser::start_element, "object");
unsigned int id = p.attribute&lt;unsigned int> ("id");

p.next_expect (parser::start_element, "name");

...

p.next_expect (parser::end_element); // name

p.next_expect (parser::end_element); // object
  </pre>

  <p>Here is the part of the document that we are parsing:</p>

  <pre class="xml">
&lt;object id="123">
  &lt;name>Lion's Head&lt;/name>
  </pre>

  <p>What do you think, is everything alright with our code?
  When we
  try to parse our document, we will get an exception here:</p>

  <pre class="c++">
p.next_expect (parser::start_element, "name");
  </pre>

  <p>Any idea why? Let's try to print the event that we get:</p>

  <pre class="c++">
// p.next_expect (parser::start_element, "name");
cerr &lt;&lt; p.next () &lt;&lt; endl;
  </pre>

  <p>We expect <code>start_element</code> but get <code>characters</code>!
  Wait a minute, but there are characters after <code>object</code> and
  before <code>name</code>. There is a newline and two spaces that are
  replaced with hashes for illustration here:</p>

  <pre class="xml">
&lt;object id="123">#
##&lt;name>Lion's Head&lt;/name>
  </pre>

  <p>If you go to a forum or a mailing list for any XML parser, this will
  be the most common question. Why do I get text when I should clearly
  get an element!?</p>

  <p>We get this whitespace text because the parser has no
  idea whether it is significant or not. The significance of whitespaces is
  determined by the XML content model that we talked about earlier. Here is
  the table:</p>

  <pre class="c++">
#include &lt;xml/content>

namespace xml
{
  enum class content
  {          //  element   characters  whitespaces
    empty,   //    no          no        ignored
    simple,  //    no          yes       preserved
    complex, //    yes         no        ignored
    mixed    //    yes         yes       preserved
  };
}
  </pre>

  <p>Empty content allows neither nested elements nor characters, and
  whitespaces are ignored. Simple content allows characters but no nested
  elements, and whitespaces are preserved. Complex content allows only
  nested elements, and whitespaces are ignored.
  Finally, the mixed content allows anything
  in any order with everything preserved.</p>

  <p>If we specify the content model for an element, then the parser
  will do automatic whitespace processing for us:</p>

  <pre class="c++">
class parser
{
  void content (content);
};
  </pre>

  <p>That is, in empty and complex content, whitespaces will be silently
  ignored. By knowing the content model, the parser also has a chance to do
  more error handling for us. It will automatically throw appropriate
  exceptions if there are nested elements in empty or simple content or
  non-whitespace characters in complex content.</p>

  <p>Ok, let's now see how we can take advantage of this feature in
  our code:</p>

  <pre class="c++">
p.next_expect (parser::start_element, "object");
p.content (content::complex);

unsigned int id = p.attribute&lt;unsigned int> ("id");

p.next_expect (parser::start_element, "name"); // Ok.

...

p.next_expect (parser::end_element); // name

p.next_expect (parser::end_element); // object
  </pre>

  <p>Now whitespaces are ignored and everything works as we expected.
  Here is how we can parse the content of the <code>name</code>
  element:</p>

  <pre class="c++">
p.next_expect (parser::start_element, "name");
p.content (content::simple);

p.next_expect (parser::characters);
string name = p.value ();

p.next_expect (parser::end_element); // name
  </pre>

  <p>As you can see, parsing a simple content element is quite a bit more
  involved compared to getting a value of an attribute. Element markup also
  has a higher overhead in the resulting XML.
  That's why in our case it would
  have been wiser to make <code>name</code> and <code>type</code>
  attributes.</p>

  <p>But if we are stuck with a lot of simple content elements, then
  the parser provides the following helper functions:</p>

  <pre class="c++">
class parser
{
  std::string element ();

  template &lt;typename T>
  T element ();

  std::string element (const std::string&amp; name);

  template &lt;typename T>
  T element (const std::string&amp; name);

  std::string element (const std::string&amp; name,
                       const std::string&amp; default_value);

  template &lt;typename T>
  T element (const std::string&amp; name,
             const T&amp; default_value);
};
  </pre>

  <p>The first two assume that you have already handled the
  <code>start_element</code> event. They should be used if the element also
  has attributes. The other four parse the complete element. Overloaded
  <code>qname</code> versions are also provided.</p>

  <p>Here is how we can simplify our parsing code thanks to these
  functions:</p>

  <pre class="c++">
p.next_expect (parser::start_element, "object");
p.content (content::complex);

unsigned int id = p.attribute&lt;unsigned int> ("id");
string name = p.element ("name");

p.next_expect (parser::end_element); // object
  </pre>

  <p>For the <code>type</code> element we would like to use this <code>enum
  class</code>:</p>

  <pre class="c++">
enum class object_type
{
  building,
  mountain,
  ...
};
  </pre>

  <p>The parsing code is similar to the <code>name</code> element. Now
  we use the data extracting version of the <code>element()</code>
  function:</p>

  <pre class="c++">
object_type type = p.element&lt;object_type> ("type");
  </pre>

  <p>Except that this won't compile.
  The parser doesn't know how to
  convert the text representation to our <code>enum</code>. By
  default the parser will try to use the <code>iostream</code>
  extraction operator but we haven't provided one.</p>

  <p>We can provide conversion code specifically for XML by specializing
  the <code>value_traits</code> class template:</p>

  <pre class="c++">
namespace xml
{
  template &lt;>
  struct value_traits&lt;object_type>
  {
    static object_type
    parse (std::string, const parser&amp;)
    {
      ...
    }

    static std::string
    serialize (object_type, const serializer&amp;)
    {
      ...
    }
  };
}
  </pre>

  <p>The last bit that we need to handle is the <code>position</code>
  elements. The interesting part here is how to stop without going
  too far since there can be several of them. To help with this task
  the parser allows us to peek into the next event:</p>

  <pre class="c++">
p.next_expect (parser::start_element, "object");
p.content (content::complex);
...

do
{
  p.next_expect (parser::start_element, "position");
  p.content (content::empty);

  float lat = p.attribute&lt;float> ("lat");
  float lon = p.attribute&lt;float> ("lon");

  p.next_expect (parser::end_element);

} while (p.peek () == parser::start_element);

p.next_expect (parser::end_element); // object
  </pre>

  <p>Do you see anything else that we can improve? Actually, there is
  one thing. Look at the <code>next_expect()</code> calls for
  <code>start_element</code> in the above code. Both are immediately
  followed by a call that sets the content model. We can tidy this up
  a bit by passing the content model as a third argument to
  <code>next_expect()</code>.
  This even reads like prose: "Next we expect the start of an
  element called <code>position</code> that shall have empty
  content."</p>

  <p>Here is the complete, production-quality parsing code for our XML
  vocabulary. 13 lines. With validation and everything:</p>

  <pre class="c++">
parser p (ifs, argv[1]);

p.next_expect (parser::start_element, "object", content::complex);

unsigned int id = p.attribute<unsigned int> ("id");
string name = p.element ("name");
object_type type = p.element<object_type> ("type");

do
{
  p.next_expect (parser::start_element, "position", content::empty);

  float lat = p.attribute<float> ("lat");
  float lon = p.attribute<float> ("lon");

  p.next_expect (parser::end_element); // position
} while (p.peek () == parser::start_element);

p.next_expect (parser::end_element); // object
  </pre>

  <p>So that was the high-level parsing API. Let's now catch up with the
  corresponding additions to the serializer.</p>

  <p>Similar to parsing, calling <code>start_attribute()</code>,
  <code>characters()</code>, and then <code>end_attribute()</code>
  might not be convenient. Instead we can add an attribute with
  a single call:</p>

  <pre class="c++">
class serializer
{
  void attribute (const std::string& name,
                  const std::string& value);

  void element (const std::string& value);

  void element (const std::string& name,
                const std::string& value);
};
  </pre>

  <p>The same works for elements with simple content. The first
  <code>element()</code> version finishes the element that we have
  started, while the second writes the complete element. There are
  also <code>qname</code> versions of these functions that are not
  shown.</p>

  <p>Instead of strings we can also serialize value types.
This uses the
  same <code>value_traits</code> specialization mechanism that we have
  used for parsing:</p>

  <pre class="c++">
class serializer
{
  template <typename T>
  void attribute (const std::string& name,
                  const T& value);

  template <typename T>
  void element (const T& value);

  template <typename T>
  void element (const std::string& name,
                const T& value);

  template <typename T>
  void characters (const T& value);
};
  </pre>

  <p>Let's now see how we can serialize a complete sample document for
  our object position vocabulary using this high-level API:</p>

  <pre class="c++">
serializer s (cout, "output");

s.start_element ("object");

s.attribute ("id", 123);
s.element ("name", "Lion's Head");
s.element ("type", object_type::mountain);

for (...)
{
  s.start_element ("position");

  float lat (...), lon (...);

  s.attribute ("lat", lat);
  s.attribute ("lon", lon);

  s.end_element (); // position
}

s.end_element (); // object
  </pre>

  <p>Pretty straightforward stuff.</p>

  <h1><a name="4">Object Persistence</a></h1>

  <p>So far we have used our API to first implement a filter that doesn't
  really care about the data and then an application that processes the
  data without creating any kind of object model. Let's now try to handle
  the other end of the spectrum: objects that know how to persist
  themselves into XML (see the <code>persistence</code> example in
  the <code>libstudxml</code> distribution).</p>

  <p>But before we continue, let's fix our XML to be slightly more idiomatic.
  That is, we make <code>name</code> and <code>type</code> attributes
  rather than elements:</p>

  <pre class="xml">
<object name="Lion's Head" type="mountain" id="123">
  <position lat="-33.8569" lon="18.5083"/>
  <position lat="-33.8568" lon="18.5083"/>
  <position lat="-33.8568" lon="18.5082"/>
</object>
  </pre>

  <p>Generally, the API works best with idiomatic XML and will nudge you
  gently in that direction with minor inconveniences.</p>

  <p>For this vocabulary, the object model might look like this:</p>

  <pre class="c++">
enum class object_type {...};

class position
{
  ...

  float lat_;
  float lon_;
};

class object
{
  ...

  std::string name_;
  object_type type_;
  unsigned int id_;
  std::vector<position> positions_;
};
  </pre>

  <p>Here I omit sensible constructors, accessors and modifiers that our
  classes would probably have.</p>

  <p>Let me also mention that what I am going to show next is what I
  believe is a sensible structure for XML persistence using this
  API. But that doesn't mean it is the only way. For example, we
  are going to do parsing in a constructor:</p>

  <pre class="c++">
class position
{
public:
  position (xml::parser&);

  void
  serialize (xml::serializer&) const;

  ...
};

class object
{
public:
  object (xml::parser&);

  void
  serialize (xml::serializer&) const;

  ...
};
  </pre>

  <p>But you may prefer to first create an instance, say with the default
  constructor, and then have a separate function do the parsing.
  There is nothing wrong with this approach.</p>

  <p>Let's start with the <code>position</code> constructor.
Here, we are
  immediately confronted with this choice: do we parse the start and end
  element events in <code>position</code> or expect our caller to handle
  them?</p>

  <p>I suggest that we let our caller do this. We may have different elements
  in our vocabulary that use the same <code>position</code> type. If we
  assume the element name in the constructor, then we won't be able to use
  the same class for all these elements. We will see the second advantage
  of this arrangement in a moment, when we deal with inheritance. But, if
  you have a simple model with one-to-one mapping between types and
  elements and no inheritance, then there is nothing wrong with going the
  other route.</p>

  <pre class="c++">
position::
position (parser& p)
  : lat_ (p.attribute<float> ("lat")),
    lon_ (p.attribute<float> ("lon"))
{
  p.content (content::empty);
}
  </pre>

  <p>Ok, nice and clean so far. Let's look at the <code>object</code>
  constructor:</p>

  <pre class="c++">
object::
object (parser& p)
  : name_ (p.attribute ("name")),
    type_ (p.attribute<object_type> ("type")),
    id_ (p.attribute<unsigned int> ("id"))
{
  p.content (content::complex);

  do
  {
    p.next_expect (parser::start_element, "position");
    positions_.push_back (position (p));
    p.next_expect (parser::end_element);

  } while (p.peek () == parser::start_element);
}
  </pre>

  <p>The only mildly interesting line here is where we call the
  <code>position</code> constructor to parse the content of the nested
  elements.</p>

  <p>Before we look into serialization, let me also mention one other
  thing. In our vocabulary all the attributes are required but it is
  quite common to have optional attributes.
The API functions with
  default values make it really convenient to handle such attributes
  in initializer lists.</p>

  <p>Let's say the <code>type</code> attribute is optional. Then we
  could do this:</p>

  <pre class="c++">
object::
object (parser& p)
  : ...
    type_ (p.attribute ("type", object_type::other))
    ...
  </pre>

  <p>We use the same arrangement for serialization, that is, the
  containing object starts and ends the element, allowing us to
  reuse the same type for different elements:</p>

  <pre class="c++">
void position::serialize (serializer& s) const
{
  s.attribute ("lat", lat_);
  s.attribute ("lon", lon_);
}

void object::serialize (serializer& s) const
{
  s.attribute ("name", name_);
  s.attribute ("type", type_);
  s.attribute ("id", id_);

  for (const auto& p: positions_)
  {
    s.start_element ("position");
    p.serialize (s);
    s.end_element ();
  }
}
  </pre>

  <p>Ok, also nice and tidy.</p>

  <p>There is one thing, however, that is not so nice: the client code
  that starts parsing or serialization. Here is the code:</p>

  <pre class="c++">
parser p (ifs, argv[1]);
p.next_expect (parser::start_element, "object");
object o (p);
p.next_expect (parser::end_element);

serializer s (cout, "output");
s.start_element ("object");
o.serialize (s);
s.end_element ();
  </pre>

  <p>Remember, we made the caller responsible for handling the start and
  end of the element. This works beautifully inside the object model but
  not so much in the client code.
What we would like to see instead
  is this:</p>

  <pre class="c++">
parser p (ifs, argv[1]);
object o (p);

serializer s (cout, "output");
o.serialize (s);
  </pre>

  <p>The main reason for choosing this structure was the ability to reuse the
  same type for different elements. The other reason was inheritance, which
  we haven't gotten to yet. If we think about it, it is very unlikely for a
  class corresponding to the root of our vocabulary to also be used inside
  it as a local element. I can't remember ever seeing a vocabulary like
  this.</p>

  <p>So what we can do here is make an exception: the root type of our
  object model handles the top-level element. Here is the parsing
  constructor:</p>

  <pre class="c++">
object::
object (parser& p)
{
  p.next_expect (
    parser::start_element, "object", content::complex);

  name_ = p.attribute ("name");
  type_ = p.attribute<object_type> ("type");
  id_ = p.attribute<unsigned int> ("id");

  ...

  p.next_expect (parser::end_element);
}
  </pre>

  <p>And here is the serialization function:</p>

  <pre class="c++">
void object::
serialize (serializer& s) const
{
  s.start_element ("object");

  ...

  s.end_element ();
}
  </pre>

  <p>The only minor drawback of going this route is that we can no longer
  parse attributes in the initializer list for the root object.</p>

  <h1><a name="5">Inheritance</a></h1>

  <p>So far we have had smooth sailing with the streaming approach but things get
  a bit bumpy once we start dealing with inheritance. This is normally
  where the in-memory approach has its day.</p>

  <p>Say we have <code>elevated-object</code> which adds the
  <code>units</code> attribute and the <code>elevation</code> elements.
  Here is the XML:</p>

  <pre class="xml">
<elevated-object name="Lion's Head" type="mountain"
                 units="m" id="123">
  <position lat="-33.8569" lon="18.5083"/>
  <position lat="-33.8568" lon="18.5083"/>
  <position lat="-33.8568" lon="18.5082"/>

  <elevation val="668.9"/>
  <elevation val="669"/>
  <elevation val="669.1"/>
</elevated-object>
  </pre>

  <p>And here is the object model:</p>

  <pre class="c++">
enum class units {...};

class elevation {...};

class elevated_object: public object
{
  ...

  units units_;
  std::vector<elevation> elevations_;
};
  </pre>

  <p>Streaming assumes linearity. We start an element, add some attributes,
  add some nested elements, and end the element. In contrast, with an
  in-memory approach we can add some attributes, then add some nested
  elements, then go back and add more attributes. This kind of back and
  forth is exactly what inheritance often requires.
So this is a bit of a
  problem for us.</p>

  <p>Consider the <code>elevated_object</code> constructor:</p>

  <pre class="c++">
elevated_object::
elevated_object (parser& p)
  : object (p),
    units_ (p.attribute<units> ("units"))
{
  do
  {
    p.next_expect (parser::start_element, "elevation");
    elevations_.push_back (elevation (p));
    p.next_expect (parser::end_element);

  } while (p.peek () == parser::start_element &&
           p.name () == "elevation");
}
  </pre>

  <p>Note that here I assume we went back to our original architecture
  where the caller handles the start and end of the element (this is
  the other advantage of this architecture: it allows us to reuse
  base parsing and serialization code in derived classes).</p>

  <p>We would like to reuse the parsing code from <code>object</code>,
  so we call the base constructor first.</p>

  <p>Then we parse the derived attribute and elements. Do you see
  the problem? The <code>object</code> constructor will parse its
  attributes and then move on to nested elements. When this constructor
  returns, we need to go back to parsing attributes! This is not
  something that a streaming approach would normally allow.</p>

  <p>To resolve this, the lifetime of the attribute map was extended until
  after the <code>end_element</code> event. That is, we can access
  attributes any time we are at the element's level. As a result,
  the above code just works.</p>

  <p>We have the same problem in serialization.
Let's say we write
  straightforward code like this:</p>

  <pre class="c++">
void elevated_object::
serialize (serializer& s) const
{
  object::serialize (s);

  s.attribute ("units", units_);

  for (const auto& e: elevations_)
  {
    s.start_element ("elevation");
    e.serialize (s);
    s.end_element ();
  }
}
  </pre>

  <p>This is not going to work since we will try to add the <code>units</code>
  attribute after the nested <code>position</code> elements have already
  been written.</p>

  <p>To handle inheritance in serialization we have to split the
  <code>serialize()</code> function into two. One serializes
  the attributes and the other serializes the content:</p>

  <pre class="c++">
void object::
serialize_attributes (serializer& s) const
{
  s.attribute ("name", name_);
  s.attribute ("type", type_);
  s.attribute ("id", id_);
}

void object::
serialize_content (serializer& s) const
{
  for (const auto& p: positions_)
  {
    s.start_element ("position");
    p.serialize (s);
    s.end_element ();
  }
}
  </pre>

  <p>The <code>serialize()</code> function then simply calls these two
  in the correct order.</p>

  <pre class="c++">
void object::
serialize (serializer& s) const
{
  serialize_attributes (s);
  serialize_content (s);
}
  </pre>

  <p>I bet you can guess what the <code>elevated_object</code>'s
  implementation looks like:</p>

  <pre class="c++">
void elevated_object::
serialize_attributes (serializer& s) const
{
  object::serialize_attributes (s);
  s.attribute ("units", units_);
}

void elevated_object::
serialize_content (serializer& s) const
{
  object::serialize_content (s);

  for (const auto& e: elevations_)
  {
    s.start_element ("elevation");
    e.serialize (s);
    s.end_element ();
  }
}
  </pre>

  <p>The <code>serialize()</code> function for <code>elevated_object</code>
  is exactly the same:</p>

  <pre class="c++">
void elevated_object::
serialize (serializer& s) const
{
  serialize_attributes (s);
  serialize_content (s);
}
  </pre>

  <h1><a name="6">Implementation Notes</a></h1>

  <p><code>libstudxml</code> is an open source (MIT license), portable
  (autotools and VC++ projects provided), and external dependency-free
  implementation.</p>

  <p>It provides a conforming, non-validating XML 1.0 parser by using
  the mature and tested Expat XML parser. <code>libstudxml</code>
  includes the Expat source code (also distributed under the MIT
  license) as an implementation detail. However, you can link to
  an external Expat library if you prefer.</p>

  <p>If you are familiar with Expat, you are probably wondering how
  the push interface provided by Expat was adapted to the pull
  API shown earlier. Expat allows us to suspend and resume parsing
  after every event and that's exactly what this implementation
  does. The performance cost of this constant suspension and
  resumption is about 35% of Expat's performance, which is not
  negligible but not the end of the world either.</p>

  <p>All in, with all the name splitting and string constructions,
  parsing throughput on a 2010 Intel Core i7 laptop is about
  37 MByte/sec, which should be sufficient for most applications.</p>

  <p>While it is much easier to implement a conforming serializer
  from scratch, <code>libstudxml</code> reuses an existing and
  tested implementation in this case as well.
It includes the source
  code of a small C library for XML serialization called Genx
  (also MIT licensed) that was initially created by Tim Bray
  and significantly improved and extended over the past years
  as part of the XSD/e project.</p>

  </div>
</div>

</body>
</html>