=head1 NAME

XML::LibXML::Parser - Parsing XML Data with XML::LibXML

=head1 SYNOPSIS

  use XML::LibXML '1.70';

  # Parser constructor

  $parser = XML::LibXML->new();
  $parser = XML::LibXML->new(option=>value, ...);
  $parser = XML::LibXML->new({option=>value, ...});

  # Parsing XML

  $dom = XML::LibXML->load_xml(
      location => $file_or_url
      # parser options ...
    );
  $dom = XML::LibXML->load_xml(
      string => $xml_string
      # parser options ...
    );
  $dom = XML::LibXML->load_xml(
      string => (\$xml_string)
      # parser options ...
    );
  $dom = XML::LibXML->load_xml({
      IO => $perl_file_handle
      # parser options ...
    });
  $dom = $parser->load_xml(...);

  # Parsing HTML

  $dom = XML::LibXML->load_html(...);
  $dom = $parser->load_html(...);

  # Parsing well-balanced XML chunks

  $fragment = $parser->parse_balanced_chunk( $wbxmlstring, $encoding );

  # Processing XInclude

  $parser->process_xincludes( $doc );
  $parser->processXIncludes( $doc );

  # Old-style parser interfaces

  $doc = $parser->parse_file( $xmlfilename );
  $doc = $parser->parse_fh( $io_fh );
  $doc = $parser->parse_string( $xmlstring );
  $doc = $parser->parse_html_file( $htmlfile, \%opts );
  $doc = $parser->parse_html_fh( $io_fh, \%opts );
  $doc = $parser->parse_html_string( $htmlstring, \%opts );

  # Push parser

  $parser->parse_chunk($string, $terminate);
  $parser->init_push();
  $parser->push(@data);
  $doc = $parser->finish_push( $recover );

  # Set/query parser options

  $parser->option_exists($name);
  $parser->get_option($name);
  $parser->set_option($name,$value);
  $parser->set_options({$name=>$value,...});

  # XML catalogs

  $parser->load_catalog( $catalog_file );

=head1 PARSING

An XML document is read into a data structure such as a DOM tree by a piece of
software called a parser.
XML::LibXML currently provides four different parser
interfaces:


=over 4

=item *

A DOM Pull-Parser



=item *

A DOM Push-Parser



=item *

A SAX Parser



=item *

A DOM based SAX Parser.



=back


=head2 Creating a Parser Instance

XML::LibXML provides an OO interface to the libxml2 parser functions. Thus you
have to create a parser instance before you can parse any XML data.

=over 4

=item new


  $parser = XML::LibXML->new();
  $parser = XML::LibXML->new(option=>value, ...);
  $parser = XML::LibXML->new({option=>value, ...});

Create a new XML and HTML parser instance. Each parser instance holds default
values for various parser options. Optionally, one can pass a hash reference or
a list of option => value pairs to set a different default set of options.
Unless specified otherwise, the options C<<<<<< load_ext_dtd >>>>>> and C<<<<<< expand_entities >>>>>> are set to 1. See L<<<<<< Parser Options >>>>>> for a list of the libxml2 parser's options.



=back


=head2 DOM Parser

One of the common parser interfaces of XML::LibXML is the DOM parser. This
parser reads XML data into a DOM-like data structure, so that each element can
be accessed and transformed.

XML::LibXML's DOM parser is capable of parsing not only XML data but also
(strict) HTML files. There are three ways to parse documents: as a string, as
a Perl filehandle, or as a filename/URL. The return value from each is a L<<<<<< XML::LibXML::Document >>>>>> object, which is a DOM object.

All of the functions listed below will throw an exception if the document is
invalid. To prevent this from terminating your program, wrap the call in an
eval{} block.

=over 4

=item load_xml


  $dom = XML::LibXML->load_xml(
      location => $file_or_url
      # parser options ...
    );
  $dom = XML::LibXML->load_xml(
      string => $xml_string
      # parser options ...
    );
  $dom = XML::LibXML->load_xml(
      string => (\$xml_string)
      # parser options ...
    );
  $dom = XML::LibXML->load_xml({
      IO => $perl_file_handle
      # parser options ...
    });
  $dom = $parser->load_xml(...);


This function is available since XML::LibXML 1.70. It provides an easy-to-use
interface to the XML parser that parses a given file (or non-HTTPS URL), string,
or input stream into a DOM tree. The arguments can be passed in a HASH reference
or as name => value pairs. The function can be called as a class method or an
object method. In both cases it internally creates a new parser instance,
passing the specified parser options; if called as an object method, it clones
the original parser (preserving its settings) and additionally applies the
specified options to the new parser. See the constructor C<<<<<< new >>>>>> and L<<<<<< Parser Options >>>>>> for more information.

Note that, due to a limitation in the underlying libxml2 library, this call
does not recognize HTTPS-based URLs. (It will treat an HTTPS URL as a filename,
likely throwing a "No such file or directory" exception.)


=item load_html


  $dom = XML::LibXML->load_html(...);
  $dom = $parser->load_html(...);


This function is available since XML::LibXML 1.70. It has the same usage as C<<<<<< load_xml >>>>>>, providing an interface to the HTML parser. See C<<<<<< load_xml >>>>>> for more information.



=back

Parsing HTML may cause problems, especially if the ampersand ('&') is used.
This is a common problem if HTML code is parsed that contains links to
CGI scripts. Such links cause the parser to throw errors. In such cases libxml2
still parses the entire document as if there were no error, but the error causes
XML::LibXML to stop the parsing process. However, the document is not lost.
Such HTML documents should be parsed using the I<<<<<< recover >>>>>> flag. By default recovery is deactivated.

The functions described above are implemented to parse well-formed documents.
In some cases a program gets well-balanced XML instead of well-formed documents
(e.g. an XML fragment from a database). With XML::LibXML it is not required to
wrap such fragments in the code, because XML::LibXML can also parse
well-balanced XML fragments.

=over 4

=item parse_balanced_chunk

  $fragment = $parser->parse_balanced_chunk( $wbxmlstring, $encoding );

This function parses a well-balanced XML string into a L<<<<<< XML::LibXML::DocumentFragment >>>>>>. The first argument contains the input string; the optional second argument
can be used to specify the character encoding of the input (UTF-8 is assumed by
default).


=item parse_xml_chunk

This is the old name of parse_balanced_chunk(). Because it may cause confusion
with the push parser interface, this function should not be used anymore.



=back

By default XML::LibXML does not process XInclude tags within an XML document
(see the options section below). XML::LibXML allows one to post-process a
document to expand XInclude tags.

=over 4

=item process_xincludes

  $parser->process_xincludes( $doc );

After a document is parsed into a DOM structure, you may want to expand the
document's XInclude tags. This function processes the given document structure
and expands all XInclude tags (or throws an error) by using the flags and
callbacks of the given parser instance.

Note that the resulting tree contains some extra nodes (of type
XML_XINCLUDE_START and XML_XINCLUDE_END) after successfully processing the
document. These nodes indicate where data was included into the original tree.
If the document is serialized, these extra nodes will not show up.

Remember: A document with processed XIncludes differs from the original
document after serialization, because the original XInclude tags will not be
restored!

If the parser flag "expand_xinclude" is set to 1, you do not need to
post-process the parsed document.


=item processXIncludes

  $parser->processXIncludes( $doc );

This is an alias for process_xincludes, with a Java-like function name.


=item parse_file

  $doc = $parser->parse_file( $xmlfilename );

This function parses an XML document from a file or network; $xmlfilename can
be either a filename or a (non-HTTPS) URL. Note that for parsing files, this
function is the fastest choice, about 6-8 times faster than parse_fh().


=item parse_fh

  $doc = $parser->parse_fh( $io_fh );

parse_fh() parses an IOREF or a subclass of IO::Handle.

Because the data comes from an open handle, libxml2's parser does not know
about the base URI of the document. To set the base URI one should use
parse_fh() as follows:



  my $doc = $parser->parse_fh( $io_fh, $baseuri );


=item parse_string

  $doc = $parser->parse_string( $xmlstring );

This function is similar to parse_fh(), but it parses an XML document that is
available as a single string in memory, or alternatively as a reference to a
scalar containing a string. Again, you can pass an optional base URI to the
function.



  my $doc = $parser->parse_string( $xmlstring, $baseuri );
  my $doc = $parser->parse_string(\$xmlstring, $baseuri);


=item parse_html_file

  $doc = $parser->parse_html_file( $htmlfile, \%opts );

Similar to parse_file() but parses HTML (strict) documents; $htmlfile can be
a filename or a (non-HTTPS) URL.

An optional second argument can be used to pass some options to the HTML parser
as a HASH reference.
See options labeled with HTML in L<<<<<< Parser Options >>>>>>.


=item parse_html_fh

  $doc = $parser->parse_html_fh( $io_fh, \%opts );

Similar to parse_fh() but parses HTML (strict) streams.

An optional second argument can be used to pass some options to the HTML parser
as a HASH reference. See options labeled with HTML in L<<<<<< Parser Options >>>>>>.

Note: the encoding option may not work correctly with this function in libxml2 <
2.6.27 if the HTML file declares its charset using a META tag.


=item parse_html_string

  $doc = $parser->parse_html_string( $htmlstring, \%opts );

Similar to parse_string() but parses HTML (strict) strings.

An optional second argument can be used to pass some options to the HTML parser
as a HASH reference. See options labeled with HTML in L<<<<<< Parser Options >>>>>>.



=back


=head2 Push Parser

XML::LibXML provides a push parser interface. Rather than pulling the data from
a given source, the push parser waits for the data to be pushed into it.

This allows one to parse large documents without waiting for the parser to
finish. The interface is especially useful if a program needs to pre-process
the incoming pieces of XML (e.g. to detect document boundaries).

While the XML::LibXML parse_*() functions require the data to be well-formed
XML, the push parser will take any arbitrary string that contains some XML data.
The only requirement is that all the pushed strings together form a well-formed
document. With the push parser interface a program can interrupt the parsing
process as required, where the parse_*() functions do not give enough
flexibility.

Unlike the pull parser implemented in parse_fh() or parse_file(), the
push parser is not able to detect the end of the document itself. Thus the
calling program needs to indicate explicitly when the parsing is done.

In XML::LibXML this is done by a single function:

=over 4

=item parse_chunk

  $parser->parse_chunk($string, $terminate);

parse_chunk() tries to parse a given chunk of data, which isn't necessarily
well-balanced data. The function takes two parameters: the chunk of data as a
string and, optionally, a termination flag. If the termination flag is set to a
true value (e.g. 1), the parsing will be stopped and the resulting document
will be returned, as the following example shows:



  my $parser = XML::LibXML->new;
  for my $string ( "<", "foo", ' bar="hello world"', "/>") {
       $parser->parse_chunk( $string );
  }
  my $doc = $parser->parse_chunk("", 1); # terminate the parsing



=back

Internally XML::LibXML provides three functions that control the push parser
process:

=over 4

=item init_push

  $parser->init_push();

Initializes the push parser.


=item push

  $parser->push(@data);

This function pushes the data stored inside the array to libxml2's parser. Each
entry in @data must be a normal scalar! This method can be called repeatedly.


=item finish_push

  $doc = $parser->finish_push( $recover );

This function returns the result of the parsing process. If this function is
called without a parameter it will complain about non-well-formed documents. If
$recover is 1, the push parser can be used to restore broken or non-well-formed
(XML) documents. Without $recover, broken input is reported as an error, as the
following example shows:



  eval {
      $parser->push( "<foo>", "bar" );
      $doc = $parser->finish_push();    # will report broken XML
  };
  if ( $@ ) {
     # ...
  }

This can be annoying if the closing tag is missed by accident.
The following
code will restore the document:



  eval {
      $parser->push( "<foo>", "bar" );
      $doc = $parser->finish_push(1);   # will return the data parsed
                                        # unless an error happened
  };

  print $doc->toString(); # returns "<foo>bar</foo>"

Of course finish_push() will return nothing if no data was pushed to the
parser before.



=back


=head2 Pull Parser (Reader)

XML::LibXML also provides a pull-parser interface similar to the XmlReader
interface in .NET. This interface is almost streaming, and is usually faster
and simpler to use than SAX. See L<<<<<< XML::LibXML::Reader >>>>>>.


=head2 Direct SAX Parser

XML::LibXML provides a direct SAX parser in the L<<<<<< XML::LibXML::SAX >>>>>> module.


=head2 DOM based SAX Parser

XML::LibXML also provides a DOM based SAX parser. The SAX parser is defined in
the module XML::LibXML::SAX::Parser. As it is not a stream based parser, it
parses documents into a DOM and traverses the DOM tree instead.

The API of this parser is exactly the same as any other Perl SAX2 parser. See
XML::SAX::Intro for details.

Aside from the regular parsing methods, you can access the DOM tree traverser
directly, using the generate() method:



  my $doc = build_yourself_a_document();
  my $saxparser = XML::LibXML::SAX::Parser->new( ... );
  $saxparser->generate( $doc );

This is useful for serializing DOM trees, for example trees that you have
already processed or that you have as a result of XSLT processing.

I<<<<<< WARNING >>>>>>

This is NOT a streaming SAX parser. As I said above, this parser reads the
entire document into a DOM and serialises it. Some people couldn't read that in
the paragraph above so I've added this warning.
If you want a streaming SAX
parser look at the L<<<<<< XML::LibXML::SAX >>>>>> man page.


=head1 SERIALIZATION

XML::LibXML provides some functions to serialize nodes and documents. The
serialization functions are described on the L<<<<<< XML::LibXML::Node >>>>>> manpage or the L<<<<<< XML::LibXML::Document >>>>>> manpage. XML::LibXML checks three global flags that alter the serialization
process:


=over 4

=item *

skipXMLDeclaration



=item *

skipDTD



=item *

setTagCompression



=back

Of these three flags, only setTagCompression is available for all
serialization functions.

Because XML::LibXML does not set these flags itself, one has to define them
locally as the following example shows:



  local $XML::LibXML::skipXMLDeclaration = 1;
  local $XML::LibXML::skipDTD = 1;
  local $XML::LibXML::setTagCompression = 1;

If skipXMLDeclaration is defined and not '0', the XML declaration is omitted
during serialization.

If skipDTD is defined and not '0', an existing DTD will not be serialized with
the document.

If setTagCompression is defined and not '0', empty tags are displayed as open
and closing tags rather than the shortcut. For example the empty tag I<<<<<< foo >>>>>> will be rendered as I<<<<<< E<lt>fooE<gt>E<lt>/fooE<gt> >>>>>> rather than I<<<<<< E<lt>foo/E<gt> >>>>>>.


=head1 PARSER OPTIONS

Handling of libxml2 parser options has been unified and improved in XML::LibXML
1.70. You can now set default options for a particular parser instance by
passing them to the constructor as C<<<<<< XML::LibXML-E<gt>new({name=E<gt>value, ...}) >>>>>> or C<<<<<< XML::LibXML-E<gt>new(name=E<gt>value,...) >>>>>>.
The options can be queried and changed using the following methods (pre-1.70
interfaces such as C<<<<<< $parser-E<gt>load_ext_dtd(0) >>>>>> also exist, see below):

=over 4

=item option_exists

  $parser->option_exists($name);

Returns 1 if the current XML::LibXML version supports the option C<<<<<< $name >>>>>>, otherwise returns 0 (note that this does not necessarily mean that the option
is supported by the underlying libxml2 library).


=item get_option

  $parser->get_option($name);

Returns the current value of the parser option C<<<<<< $name >>>>>>.


=item set_option

  $parser->set_option($name,$value);

Sets option C<<<<<< $name >>>>>> to value C<<<<<< $value >>>>>>.


=item set_options

  $parser->set_options({$name=>$value,...});

Sets multiple parsing options at once.



=back

IMPORTANT NOTE: This documentation reflects the parser flags available in
libxml2 2.7.3. Some options have no effect if an older version of libxml2 is
used.

Each of the flags listed below is labeled

=over 4

=item /parser/

if it can be used with a C<<<<<< XML::LibXML >>>>>> parser object (i.e. passed to C<<<<<< XML::LibXML-E<gt>new >>>>>>, C<<<<<< XML::LibXML-E<gt>set_option >>>>>>, etc.)


=item /html/

if it can be passed to the C<<<<<< parse_html_* >>>>>> methods


=item /reader/

if it can be used with the C<<<<<< XML::LibXML::Reader >>>>>>.



=back

Unless specified otherwise, the default for boolean valued options is 0
(false).

The available options are:

=over 4

=item URI

/parser, html, reader/

When parsing strings or file handles, XML::LibXML doesn't know the
base URI of the document. To make relative references such as XIncludes work,
one has to set a base URI, which is then used for the parsed document.

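
As a minimal sketch of the C<<<<<< URI >>>>>> option, the snippet below parses a document from a string, supplies a base URI, and reads it back through the document's C<<<<<< URI >>>>>> method. The base URI and the document content are made up for illustration; nothing is fetched from that address.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical base URI (an assumption for this example only).
my $base = 'http://example.com/docs/main.xml';

# Parse from a string and tell the parser where the data "lives",
# so that relative references can be resolved against it.
my $doc = XML::LibXML->load_xml(
    string => '<doc><child/></doc>',
    URI    => $base,
);

# The document now reports the base URI that was supplied.
print $doc->URI, "\n";
```

The same value could presumably be passed as the second argument of parse_string() or parse_fh(), as described earlier in this document.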


=item line_numbers

/parser, html, reader/

If this option is activated, libxml2 will store the line number of each element
node in the parsed document. The line number can be obtained using the C<<<<<< line_number() >>>>>> method of the C<<<<<< XML::LibXML::Node >>>>>> class (for non-element nodes this may report the line number of the containing
element). The line numbers are also used for reporting positions of validation
errors.

IMPORTANT: Due to limitations in the libxml2 library, line numbers greater than
65535 will be returned as 65535. Unfortunately, this is a long and sad story;
please see L<<<<<< http://bugzilla.gnome.org/show_bug.cgi?id=325533 >>>>>> for more details.


=item encoding

/html/

character encoding of the input


=item recover

/parser, html, reader/

recover from errors; possible values are 0, 1, and 2

A true value turns on recovery mode, which allows one to parse broken XML or
HTML data. The recovery mode allows the parser to return the successfully
parsed portion of the input document. This is useful for almost well-formed
documents, where for example a closing tag is missing somewhere. Still,
XML::LibXML will only parse until the first fatal (non-recoverable) error
occurs, reporting recoverable parsing errors as warnings. To suppress even
these warnings, use recover=>2.

Note that validation is switched off automatically in recovery mode.

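
The difference between strict parsing and recovery mode can be sketched as follows. The input string is an invented example with an unclosed element; the exact shape of the recovered tree depends on libxml2, so the snippet only inspects the root element.

```perl
use strict;
use warnings;
use XML::LibXML;

# Illustrative input: the second <item> is never closed.
my $broken = '<root><item>one</item><item>two</root>';

# Strict parsing (the default) throws an exception, caught here by eval.
my $strict        = eval { XML::LibXML->load_xml( string => $broken ) };
my $strict_failed = defined $strict ? 0 : 1;

# recover => 2 returns whatever could be parsed and also
# suppresses the warnings that recover => 1 would print.
my $doc = XML::LibXML->load_xml( string => $broken, recover => 2 );

print "strict parse failed\n" if $strict_failed;
print "recovered root element: ", $doc->documentElement->nodeName, "\n";
```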


=item expand_entities

/parser, reader/

substitute entities; possible values are 0 and 1; default is 1

Note that although setting this flag to 0 disables entity substitution, it does
not prevent the parser from loading external entities; when substitution of an
external entity is disabled, the entity will be represented in the document tree
by an XML_ENTITY_REF_NODE node whose subtree will be the content obtained by
parsing the external resource. Although this nesting is visible from the DOM, it
is transparent to the XPath data model, so it is possible to match nodes in an
unexpanded entity by the same XPath expression as if the entity were expanded.
See also ext_ent_handler.


=item ext_ent_handler

/parser/

Provide a custom external entity handler to be used when expand_entities is set
to 1. The value must be a subroutine reference.

This feature does not work properly in libxml2 < 2.6.27!

The subroutine provided is called whenever the parser needs to retrieve the
content of an external entity. It is called with two arguments: the system ID
(URI) and the public ID. The value returned by the subroutine is parsed as the
content of the entity.

This method can be used to completely disable entity loading, e.g. to prevent
exploits of the type described at (L<<<<<< http://searchsecuritychannel.techtarget.com/generic/0,295582,sid97_gci1304703,00.html >>>>>>), where a service is tricked into exposing its private data by letting it parse a
remote file (RSS feed) that contains an entity reference to a local file (e.g. C<<<<<< /etc/fstab >>>>>>).

A more granular solution to this problem, however, is provided by custom URL
resolvers, as in

  my $c = XML::LibXML::InputCallback->new();
  sub match {   # accept file:/ URIs except for XML catalogs in /etc/xml/
    my ($uri) = @_;
    return ($uri=~m{^file:/}
            and $uri !~ m{^file:///etc/xml/})
           ?
             1 : 0;
  }
  $c->register_callbacks([ \&match, sub{}, sub{}, sub{} ]);
  $parser->input_callbacks($c);




=item load_ext_dtd

/parser, reader/

load the external DTD subset while parsing; possible values are 0 and 1. Unless
specified, XML::LibXML sets this option to 1.

This flag is also required for DTD validation, to complete attributes,
and to expand entities, regardless of whether the document has an internal
subset. Thus switching off external DTD loading will disable entity expansion,
validation, and attribute completion on internal subsets as well.


=item complete_attributes

/parser, reader/

create default DTD attributes; possible values are 0 and 1


=item validation

/parser, reader/

validate with the DTD; possible values are 0 and 1


=item suppress_errors

/parser, html, reader/

suppress error reports; possible values are 0 and 1


=item suppress_warnings

/parser, html, reader/

suppress warning reports; possible values are 0 and 1


=item pedantic_parser

/parser, html, reader/

pedantic error reporting; possible values are 0 and 1


=item no_blanks

/parser, html, reader/

remove blank nodes; possible values are 0 and 1


=item no_defdtd

/html/

do not add a default DOCTYPE; possible values are 0 and 1

the default (0) is to add a DTD when the input HTML lacks one


=item expand_xinclude or xinclude

/parser, reader/

Implement XInclude substitution; possible values are 0 and 1

Expands XInclude tags immediately while parsing the document. Note that the
parser will use the URI resolvers installed via C<<<<<< XML::LibXML::InputCallback >>>>>> to parse the included document (if any).

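
A small, self-contained sketch of XInclude expansion at parse time is shown below. It writes two throwaway files into a temporary directory (the file names and contents are invented for the example) and then parses the main file with C<<<<<< expand_xinclude >>>>>> switched on.

```perl
use strict;
use warnings;
use XML::LibXML;
use File::Temp qw(tempdir);
use File::Spec;

# Scratch directory that is removed when the program exits.
my $dir = tempdir( CLEANUP => 1 );

# Helper (hypothetical, for this example only): write a string to a file.
sub write_file {
    my ( $path, $content ) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
}

my $part = File::Spec->catfile( $dir, 'part.xml' );
my $main = File::Spec->catfile( $dir, 'main.xml' );

write_file( $part, '<part>included</part>' );
write_file( $main,
      '<doc xmlns:xi="http://www.w3.org/2001/XInclude">'
    . '<xi:include href="part.xml"/>'
    . '</doc>' );

# XInclude tags are expanded while the document is being parsed.
my $doc   = XML::LibXML->load_xml( location => $main, expand_xinclude => 1 );
my @parts = $doc->findnodes('//part');

print scalar(@parts), " included element(s) found\n";
```

Without the option, the same effect should be obtainable after parsing by calling process_xincludes() on the document, as described earlier.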


=item no_xinclude_nodes

/parser, reader/

do not generate XINCLUDE START/END nodes; possible values are 0 and 1


=item no_network

/parser, html, reader/

Forbid network access; possible values are 0 and 1

If set to true, all attempts to fetch non-local resources (such as DTDs or
external entities) will fail (unless custom callbacks are defined).

It may be necessary to use the flag C<<<<<< recover >>>>>> for processing documents requiring such resources while networking is off.


=item clean_namespaces

/parser, reader/

remove redundant namespace declarations during parsing; possible values are 0
and 1.


=item no_cdata

/parser, html, reader/

merge CDATA as text nodes; possible values are 0 and 1


=item no_basefix

/parser, reader/

do not fix up XInclude xml:base URIs; possible values are 0 and 1


=item huge

/parser, html, reader/

relax any hardcoded limit from the parser; possible values are 0 and 1. Unless
specified, XML::LibXML sets this option to 0.

Note: the default value for this option was changed to protect against denial
of service through entity expansion attacks. Before enabling the option, ensure
you have taken alternative measures to protect your application against this
type of attack.


=item gdome

/parser/

THIS OPTION IS EXPERIMENTAL!

Although quite powerful, XML::LibXML's DOM implementation is incomplete with
respect to the DOM level 2 or level 3 specifications. XML::GDOME is based on
libxml2 as well, and provides a rather complete DOM implementation by wrapping
libgdome. This flag allows you to make use of XML::LibXML's full parser options
and XML::GDOME's DOM implementation at the same time.

To make use of this function, one has to install libgdome and configure
XML::LibXML to use this library.
For this you need to rebuild XML::LibXML!

Note: this feature was not seriously tested in recent XML::LibXML releases.



=back

For compatibility with XML::LibXML versions prior to 1.70, the following
methods are also supported for querying and setting the corresponding parser
options (if called without arguments, the methods return the current value of
the corresponding parser option; with an argument, they set the option to the
given value):



  $parser->validation();
  $parser->recover();
  $parser->pedantic_parser();
  $parser->line_numbers();
  $parser->load_ext_dtd();
  $parser->complete_attributes();
  $parser->expand_xinclude();
  $parser->gdome_dom();
  $parser->clean_namespaces();
  $parser->no_network();

The following obsolete methods trigger parser options in some special way:

=over 4

=item recover_silently



  $parser->recover_silently(1);

If called without an argument, returns true if the current value of the C<<<<<< recover >>>>>> parser option is 2 and returns false otherwise. With a true argument sets the C<<<<<< recover >>>>>> parser option to 2; with a false argument sets the C<<<<<< recover >>>>>> parser option to 0.


=item expand_entities



  $parser->expand_entities(0);

Get/set the C<<<<<< expand_entities >>>>>> option. If called with a true argument, also sets the C<<<<<< load_ext_dtd >>>>>> option to 1.


=item keep_blanks



  $parser->keep_blanks(0);

This is actually the opposite of the C<<<<<< no_blanks >>>>>> parser option. If used without an argument, retrieves the negated value of C<<<<<< no_blanks >>>>>>. If used with an argument, sets C<<<<<< no_blanks >>>>>> to the opposite value.


=item base_uri



  $parser->base_uri( $your_base_uri );

Get/set the C<<<<<< URI >>>>>> option.

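
To illustrate how the compatibility methods in this section relate to the unified option interface, here is a short sketch using C<<<<<< keep_blanks >>>>>> and C<<<<<< no_blanks >>>>>>. It relies only on the relationship stated above (one is the negation of the other); the variable names are arbitrary.

```perl
use strict;
use warnings;
use XML::LibXML;

# Legacy accessor: keep_blanks(0) should switch the no_blanks option on.
my $parser = XML::LibXML->new();
$parser->keep_blanks(0);
my $no_blanks = $parser->get_option('no_blanks');

# Unified interface: set no_blanks directly, then read it back
# through the legacy accessor, which reports the negated value.
my $other = XML::LibXML->new( no_blanks => 1 );
my $keep  = $other->keep_blanks();

print "no_blanks is on\n"    if $no_blanks;
print "keep_blanks is off\n" unless $keep;
```

New code would normally use the option names directly; the legacy accessors mainly matter when maintaining scripts written for versions before 1.70.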



=back


=head1 XML CATALOGS

C<<<<<< libxml2 >>>>>> supports XML catalogs. Catalogs are used to map remote resources to their local
copies. Using catalogs can speed up parsing if many external
resources from remote addresses are loaded into the parsed documents (such as
DTDs or XIncludes).

Note that libxml2 has a global pool of loaded catalogs, so if you apply the
method C<<<<<< load_catalog >>>>>> to one parser instance, all parser instances will start using the catalog (in
addition to other previously loaded catalogs).

Note also that catalogs are not used when a custom external entity handler is
specified. Currently it is not possible to make use of both types of
resolving systems at the same time.

=over 4

=item load_catalog

  $parser->load_catalog( $catalog_file );

Loads the XML catalog file $catalog_file.



  # Global external entity loader (similar to ext_ent_handler option
  # but this works really globally, also in XML::LibXSLT include etc..)

  XML::LibXML::externalEntityLoader(\&my_loader);



=back


=head1 ERROR REPORTING

XML::LibXML throws exceptions during parsing, validation or XPath processing
(and on some other occasions). These errors can be caught by using I<<<<<< eval >>>>>> blocks. The error is stored in I<<<<<< $@ >>>>>>. There are two implementations: in the old one, $@ is just a message
string; in the new one, $@ is an object of the class XML::LibXML::Error. This
class overloads the operator "" so that when printed, the object flattens to
the usual error message.

XML::LibXML throws errors as they occur; overlooking this is a very common
mistake when using XML::LibXML. If the eval is omitted, XML::LibXML will always
halt your script by "croaking" (see the Carp man page for details).

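
The following is a minimal sketch of catching a parse error with eval, covering both forms of $@ mentioned above. The malformed input is invented for the example; the C<<<<<< message >>>>>> and C<<<<<< line >>>>>> accessors are those of XML::LibXML::Error.

```perl
use strict;
use warnings;
use XML::LibXML;
use Scalar::Util qw(blessed);

# Illustrative malformed input: <b> is never closed.
my $doc = eval { XML::LibXML->load_xml( string => '<a><b></a>' ) };
my $err = $@;

my $message = '';
if ( blessed($err) && $err->isa('XML::LibXML::Error') ) {
    # Structured error object: individual fields can be inspected.
    $message = $err->message;
    printf "parse error at line %d: %s\n", $err->line, $message;
}
elsif ($err) {
    # Plain string error, as thrown by the older implementation.
    $message = "$err";
    print "parse error: $message\n";
}
```

Copying $@ into a lexical right after the eval avoids losing it if a later statement resets it.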

Also note that an increasing number of functions throw errors if bad data is
passed as arguments. If you cannot ensure that valid data is passed to
XML::LibXML, you should wrap these calls in eval.

Note: since version 1.59, get_last_error() is no longer available in
XML::LibXML for thread-safety reasons.

=head1 AUTHORS

Matt Sergeant,
Christian Glahn,
Petr Pajas


=head1 VERSION

2.0207

=head1 COPYRIGHT

2001-2007, AxKit.com Ltd.

2002-2006, Christian Glahn.

2006-2009, Petr Pajas.

=cut


=head1 LICENSE

This program is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.
