1=head1 NAME
2
3XML::LibXML::Parser - Parsing XML Data with XML::LibXML
4
5=head1 SYNOPSIS
6
7
8
9  use XML::LibXML '1.70';
10
11  # Parser constructor
12
13  $parser = XML::LibXML->new();
14  $parser = XML::LibXML->new(option=>value, ...);
15  $parser = XML::LibXML->new({option=>value, ...});
16
17  # Parsing XML
18
19  $dom = XML::LibXML->load_xml(
20      location => $file_or_url
21      # parser options ...
22    );
23  $dom = XML::LibXML->load_xml(
24      string => $xml_string
25      # parser options ...
26    );
27  $dom = XML::LibXML->load_xml(
28      string => (\$xml_string)
29      # parser options ...
30    );
  $dom = XML::LibXML->load_xml({
      IO => $perl_file_handle
      # parser options ...
    });
35  $dom = $parser->load_xml(...);
36
37  # Parsing HTML
38
39  $dom = XML::LibXML->load_html(...);
40  $dom = $parser->load_html(...);
41
42  # Parsing well-balanced XML chunks
43
44  $fragment = $parser->parse_balanced_chunk( $wbxmlstring, $encoding );
45
46  # Processing XInclude
47
48  $parser->process_xincludes( $doc );
49  $parser->processXIncludes( $doc );
50
51  # Old-style parser interfaces
52
53  $doc = $parser->parse_file( $xmlfilename );
54  $doc = $parser->parse_fh( $io_fh );
55  $doc = $parser->parse_string( $xmlstring);
56  $doc = $parser->parse_html_file( $htmlfile, \%opts );
57  $doc = $parser->parse_html_fh( $io_fh, \%opts );
58  $doc = $parser->parse_html_string( $htmlstring, \%opts );
59
60  # Push parser
61
62  $parser->parse_chunk($string, $terminate);
63  $parser->init_push();
64  $parser->push(@data);
65  $doc = $parser->finish_push( $recover );
66
67  # Set/query parser options
68
69  $parser->option_exists($name);
70  $parser->get_option($name);
71  $parser->set_option($name,$value);
72  $parser->set_options({$name=>$value,...});
73
74  # XML catalogs
75
76  $parser->load_catalog( $catalog_file );
77
78=head1 PARSING
79
80An XML document is read into a data structure such as a DOM tree by a piece of
81software, called a parser. XML::LibXML currently provides four different parser
82interfaces:
83
84
85=over 4
86
87=item *
88
89A DOM Pull-Parser
90
91
92
93=item *
94
95A DOM Push-Parser
96
97
98
99=item *
100
101A SAX Parser
102
103
104
105=item *
106
107A DOM based SAX Parser.
108
109
110
111=back
112
113
114=head2 Creating a Parser Instance
115
116XML::LibXML provides an OO interface to the libxml2 parser functions. Thus you
117have to create a parser instance before you can parse any XML data.
118
119=over 4
120
121=item new
122
123
124  $parser = XML::LibXML->new();
125  $parser = XML::LibXML->new(option=>value, ...);
126  $parser = XML::LibXML->new({option=>value, ...});
127
Create a new XML and HTML parser instance. Each parser instance holds default
values for various parser options. Optionally, one can pass a hash reference or
a list of option => value pairs to set a different default set of options.
Unless specified otherwise, the options C<<<<<< load_ext_dtd >>>>>> and C<<<<<< expand_entities >>>>>> are set to 1. See L<<<<<< Parser Options >>>>>> for a list of libxml2 parser options.
132
133
134
135=back
136
137
138=head2 DOM Parser
139
One of the common parser interfaces of XML::LibXML is the DOM parser. This
parser reads XML data into a DOM-like data structure, so that each tag can be
accessed and transformed.

XML::LibXML's DOM parser can parse not only XML data but also (strict) HTML
files. There are three ways to parse documents: from a string, from a Perl
filehandle, or from a filename/URL. The return value from each is a L<<<<<< XML::LibXML::Document >>>>>> object, which is a DOM object.
147
All of the functions listed below throw an exception if the document is
invalid. To prevent this from terminating your program, wrap the call in an
eval{} block.
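
As a minimal sketch of this pattern (the input string is a made-up example
whose tags do not match, so the parse is expected to fail):

  use strict;
  use warnings;
  use XML::LibXML;

  # Hypothetical input: <item> is never closed, so parsing fails.
  my $xml = '<root><item>text</root>';

  # eval returns undef if load_xml() throws; the error is left in $@.
  my $dom = eval { XML::LibXML->load_xml( string => $xml ) };

  if ( !defined $dom ) {
      warn "Parsing failed: $@";
  }

The program keeps running after the failed parse and can decide how to react,
for example by skipping the input or retrying in recovery mode.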
151
152=over 4
153
154=item load_xml
155
156
157  $dom = XML::LibXML->load_xml(
158      location => $file_or_url
159      # parser options ...
160    );
161  $dom = XML::LibXML->load_xml(
162      string => $xml_string
163      # parser options ...
164    );
165  $dom = XML::LibXML->load_xml(
166      string => (\$xml_string)
167      # parser options ...
168    );
  $dom = XML::LibXML->load_xml({
      IO => $perl_file_handle
      # parser options ...
    });
173  $dom = $parser->load_xml(...);
174
175
This function is available since XML::LibXML 1.70. It provides an easy-to-use
interface to the XML parser that parses a given file (or non-HTTPS URL), string,
or input stream into a DOM tree. The arguments can be passed in a HASH reference
or as name => value pairs. The function can be called as a class method or an
object method. In both cases it internally creates a new parser instance
passing the specified parser options; if called as an object method, it clones
the original parser (preserving its settings) and additionally applies the
specified options to the new parser. See the constructor C<<<<<< new >>>>>> and L<<<<<< Parser Options >>>>>> for more information.
184
185Note that, due to a limitation in the underlying libxml2 library, this call
186does not recognize HTTPS-based URLs. (It will treat an HTTPS URL as a filename,
187likely throwing a "No such file or directory" exception.)
188
189
190=item load_html
191
192
193  $dom = XML::LibXML->load_html(...);
194  $dom = $parser->load_html(...);
195
196
This function is available since XML::LibXML 1.70. It has the same usage as C<<<<<< load_xml >>>>>>, providing an interface to the HTML parser. See C<<<<<< load_xml >>>>>> for more information.
198
199
200
201=back
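
The difference between calling C<load_xml> as a class method and as an object
method can be sketched as follows. The sample document and variable names are
illustrative only; the sketch assumes the C<no_blanks> option described under
Parser Options.

  use strict;
  use warnings;
  use XML::LibXML;

  # A parser whose defaults drop whitespace-only text nodes.
  my $parser = XML::LibXML->new( no_blanks => 1 );

  my $xml = "<root>\n  <item/>\n</root>";

  # Object method: a clone of $parser (with no_blanks => 1) does the work.
  my $compact = $parser->load_xml( string => $xml );

  # Class method: a fresh parser with the default options is created.
  my $plain = XML::LibXML->load_xml( string => $xml );

  my @compact_kids = $compact->documentElement->childNodes;
  my @plain_kids   = $plain->documentElement->childNodes;

  printf "object method: %d child node(s)\n", scalar @compact_kids;
  printf "class method:  %d child node(s)\n", scalar @plain_kids;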
202
Parsing HTML may cause problems, especially if the ampersand ('&') is used.
This is a common problem when the parsed HTML contains links to CGI scripts.
Such links cause the parser to throw errors. In such cases libxml2 still parses
the entire document as if there were no error, but the error causes
XML::LibXML to stop the parsing process. However, the document is not lost.
Such HTML documents should be parsed using the I<<<<<< recover >>>>>> flag. By default, recovery is deactivated.
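
A minimal sketch of parsing such a document in recovery mode follows. The HTML
snippet and its URL are invented for illustration; C<< recover => 2 >> is used
here so that recoverable errors are not reported as warnings.

  use strict;
  use warnings;
  use XML::LibXML;

  # Hypothetical HTML with an unescaped '&' in a CGI link.
  my $html = '<html><body>'
           . '<a href="script.cgi?a=1&b=2">link</a>'
           . '</body></html>';

  # Continue past recoverable errors without reporting them.
  my $dom = XML::LibXML->load_html( string => $html, recover => 2 );

  my ($link) = $dom->findnodes('//a');
  print $link->getAttribute('href'), "\n";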
209
The functions described above are implemented to parse well-formed documents.
In some cases a program gets well-balanced XML instead of well-formed documents
(e.g. an XML fragment from a database). With XML::LibXML it is not necessary to
wrap such fragments in the code, because XML::LibXML can parse well-balanced
XML fragments directly.
215
216=over 4
217
218=item parse_balanced_chunk
219
220  $fragment = $parser->parse_balanced_chunk( $wbxmlstring, $encoding );
221
This function parses a well balanced XML string into a L<<<<<< XML::LibXML::DocumentFragment >>>>>>. The first argument contains the input string; the optional second argument
can be used to specify the character encoding of the input (UTF-8 is assumed by
default).
225
226
227=item parse_xml_chunk
228
This is the old name of parse_balanced_chunk(). Because it may cause confusion
with the push parser interface, this function should no longer be used.
231
232
233
234=back
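
The following sketch shows one possible use of parse_balanced_chunk(): parsing
a fragment with two sibling elements and attaching it to an existing document.
The element names are invented for the example.

  use strict;
  use warnings;
  use XML::LibXML;

  my $parser = XML::LibXML->new();

  # Two siblings without a common root: well balanced, not well formed.
  my $chunk    = '<item id="1"/><item id="2">text</item>';
  my $fragment = $parser->parse_balanced_chunk($chunk);

  # Appending a fragment appends its children to the target element.
  my $doc = XML::LibXML->load_xml( string => '<list/>' );
  $doc->documentElement->appendChild($fragment);

  my @items = $doc->findnodes('/list/item');
  printf "%d item(s) attached\n", scalar @items;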
235
236By default XML::LibXML does not process XInclude tags within an XML Document
237(see options section below). XML::LibXML allows one to post-process a document
238to expand XInclude tags.
239
240=over 4
241
242=item process_xincludes
243
244  $parser->process_xincludes( $doc );
245
After a document is parsed into a DOM structure, you may want to expand the
document's XInclude tags. This function processes the given document structure
and expands all XInclude tags (or throws an error) by using the flags and
callbacks of the given parser instance.

Note that the resulting tree contains some extra nodes (of type
XML_XINCLUDE_START and XML_XINCLUDE_END) after successfully processing the
document. These nodes indicate where data was included into the original tree.
If the document is serialized, these extra nodes will not show up.

Remember: A document with processed XIncludes differs from the original
document after serialization, because the original XInclude tags will not get
restored!

If the parser flag "expand_xinclude" is set to 1, you do not need to
post-process the parsed document.
262
263
264=item processXIncludes
265
266  $parser->processXIncludes( $doc );
267
This is an alias to process_xincludes, but through a Java-like function name.
269
270
271=item parse_file
272
273  $doc = $parser->parse_file( $xmlfilename );
274
This function parses an XML document from a file or network; $xmlfilename can
be either a filename or a (non-HTTPS) URL. Note that for parsing files, this
function is the fastest choice, about 6-8 times faster than parse_fh().
278
279
280=item parse_fh
281
282  $doc = $parser->parse_fh( $io_fh );
283
parse_fh() parses an IOREF or a subclass of IO::Handle.
285
286Because the data comes from an open handle, libxml2's parser does not know
287about the base URI of the document. To set the base URI one should use
288parse_fh() as follows:
289
290
291
292  my $doc = $parser->parse_fh( $io_fh, $baseuri );
293
294
295=item parse_string
296
297  $doc = $parser->parse_string( $xmlstring);
298
299This function is similar to parse_fh(), but it parses an XML document that is
300available as a single string in memory, or alternatively as a reference to a
301scalar containing a string. Again, you can pass an optional base URI to the
302function.
303
304
305
306  my $doc = $parser->parse_string( $xmlstring, $baseuri );
307  my $doc = $parser->parse_string(\$xmlstring, $baseuri);
308
309
310=item parse_html_file
311
312  $doc = $parser->parse_html_file( $htmlfile, \%opts );
313
314Similar to parse_file() but parses HTML (strict) documents; $htmlfile can be
315filename or (non-HTTPS) URL.
316
317An optional second argument can be used to pass some options to the HTML parser
318as a HASH reference. See options labeled with HTML in L<<<<<< Parser Options >>>>>>.
319
320
321=item parse_html_fh
322
323  $doc = $parser->parse_html_fh( $io_fh, \%opts );
324
325Similar to parse_fh() but parses HTML (strict) streams.
326
327An optional second argument can be used to pass some options to the HTML parser
328as a HASH reference. See options labeled with HTML in L<<<<<< Parser Options >>>>>>.
329
330Note: encoding option may not work correctly with this function in libxml2 <
3312.6.27 if the HTML file declares charset using a META tag.
332
333
334=item parse_html_string
335
336  $doc = $parser->parse_html_string( $htmlstring, \%opts );
337
338Similar to parse_string() but parses HTML (strict) strings.
339
340An optional second argument can be used to pass some options to the HTML parser
341as a HASH reference. See options labeled with HTML in L<<<<<< Parser Options >>>>>>.
342
343
344
345=back
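
As a small illustration of the optional base URI argument, the sketch below
parses a string and then reads the URI back from the resulting document. The
URI and the document content are hypothetical.

  use strict;
  use warnings;
  use XML::LibXML;

  my $parser = XML::LibXML->new();

  my $xml     = '<doc><ref href="chapter1.xml"/></doc>';
  my $baseuri = 'http://example.com/docs/index.xml';    # hypothetical

  # Without the second argument the parser has no base URI for $xml.
  my $doc = $parser->parse_string( $xml, $baseuri );

  print $doc->URI, "\n";

Relative references inside the document, such as the C<href> above, can then
be resolved against this base URI.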
346
347
348=head2 Push Parser
349
350XML::LibXML provides a push parser interface. Rather than pulling the data from
351a given source the push parser waits for the data to be pushed into it.
352
353This allows one to parse large documents without waiting for the parser to
354finish. The interface is especially useful if a program needs to pre-process
355the incoming pieces of XML (e.g. to detect document boundaries).
356
While the XML::LibXML parse_*() functions require the data to be well-formed
XML, the push parser will take any arbitrary string that contains some XML
data. The only requirement is that all the pushed strings together form a
well-formed document. With the push parser interface a program can interrupt
the parsing process as required, where the parse_*() functions do not give
enough flexibility.

Unlike the pull parser implemented in parse_fh() or parse_file(), the push
parser is not able to find out about the document's end itself. Thus the
calling program needs to indicate explicitly when the parsing is done.
366
367In XML::LibXML this is done by a single function:
368
369=over 4
370
371=item parse_chunk
372
373  $parser->parse_chunk($string, $terminate);
374
parse_chunk() tries to parse a given chunk of data, which isn't necessarily
well balanced data. The function takes two parameters: the chunk of data as a
string and, optionally, a termination flag. If the termination flag is set to a
true value (e.g. 1), the parsing will be stopped and the resulting document
will be returned, as the following example shows:
380
381
382
383  my $parser = XML::LibXML->new;
384  for my $string ( "<", "foo", ' bar="hello world"', "/>") {
385       $parser->parse_chunk( $string );
386  }
387  my $doc = $parser->parse_chunk("", 1); # terminate the parsing
388
389
390
391=back
392
393Internally XML::LibXML provides three functions that control the push parser
394process:
395
396=over 4
397
398=item init_push
399
400  $parser->init_push();
401
402Initializes the push parser.
403
404
405=item push
406
407  $parser->push(@data);
408
409This function pushes the data stored inside the array to libxml2's parser. Each
410entry in @data must be a normal scalar! This method can be called repeatedly.
411
412
413=item finish_push
414
415  $doc = $parser->finish_push( $recover );
416
This function returns the result of the parsing process. If this function is
called without a parameter it will complain about non-well-formed documents. If
$recover is 1, the push parser can be used to restore broken or non-well-formed
(XML) documents, as the following example shows:
421
422
423
424  eval {
425      $parser->push( "<foo>", "bar" );
426      $doc = $parser->finish_push();    # will report broken XML
427  };
428  if ( $@ ) {
429     # ...
430  }
431
432This can be annoying if the closing tag is missed by accident. The following
433code will restore the document:
434
435
436
437  eval {
438      $parser->push( "<foo>", "bar" );
439      $doc = $parser->finish_push(1);   # will return the data parsed
440                                        # unless an error happened
441  };
442
443  print $doc->toString(); # returns "<foo>bar</foo>"
444
445Of course finish_push() will return nothing if there was no data pushed to the
446parser before.
447
448
449
450=back
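
The push interface can also be fed from a stream in fixed-size blocks. The
sketch below assumes an in-memory filehandle as a stand-in for a socket or a
large file; the block size of 16 bytes is arbitrary and chosen only to force
several calls.

  use strict;
  use warnings;
  use XML::LibXML;

  my $xml = '<log><entry n="1"/><entry n="2"/><entry n="3"/></log>';

  # An in-memory filehandle stands in for a real input stream.
  open my $fh, '<', \$xml or die "cannot open in-memory handle: $!";

  my $parser = XML::LibXML->new();

  # read() returns 0 at end of input, which ends the loop.
  while ( read( $fh, my $buffer, 16 ) ) {
      $parser->parse_chunk($buffer);
  }
  close $fh;

  # Terminate the parsing and fetch the resulting document.
  my $doc = $parser->parse_chunk( '', 1 );

  my @entries = $doc->findnodes('/log/entry');
  printf "%d entries parsed\n", scalar @entries;

Block boundaries may fall in the middle of a tag, as they do here; the parser
only requires that the blocks together form a well-formed document.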
451
452
453=head2 Pull Parser (Reader)
454
455XML::LibXML also provides a pull-parser interface similar to the XmlReader
456interface in .NET. This interface is almost streaming, and is usually faster
457and simpler to use than SAX. See L<<<<<< XML::LibXML::Reader >>>>>>.
458
459
460=head2 Direct SAX Parser
461
462XML::LibXML provides a direct SAX parser in the L<<<<<< XML::LibXML::SAX >>>>>> module.
463
464
465=head2 DOM based SAX Parser
466
467XML::LibXML also provides a DOM based SAX parser. The SAX parser is defined in
468the module XML::LibXML::SAX::Parser. As it is not a stream based parser, it
469parses documents into a DOM and traverses the DOM tree instead.
470
471The API of this parser is exactly the same as any other Perl SAX2 parser. See
472XML::SAX::Intro for details.
473
474Aside from the regular parsing methods, you can access the DOM tree traverser
475directly, using the generate() method:
476
477
478
  my $doc = build_yourself_a_document();
  my $saxparser = XML::LibXML::SAX::Parser->new( ... );
  $saxparser->generate( $doc );
482
This is useful for serializing DOM trees, for example ones that you have
already processed or that result from XSLT processing.
485
486I<<<<<< WARNING >>>>>>
487
This is NOT a streaming SAX parser. As I said above, this parser reads the
entire document into a DOM and serialises it. Some people couldn't read that in
the paragraph above so I've added this warning. If you want a streaming SAX
parser, look at the L<<<<<< XML::LibXML::SAX >>>>>> man page.
492
493
494=head1 SERIALIZATION
495
496XML::LibXML provides some functions to serialize nodes and documents. The
497serialization functions are described on the L<<<<<< XML::LibXML::Node >>>>>> manpage or the L<<<<<< XML::LibXML::Document >>>>>> manpage. XML::LibXML checks three global flags that alter the serialization
498process:
499
500
501=over 4
502
503=item *
504
505skipXMLDeclaration
506
507
508
509=item *
510
511skipDTD
512
513
514
515=item *
516
517setTagCompression
518
519
520
521=back
522
Of these three flags, only setTagCompression is available for all
serialization functions.

Because XML::LibXML does not set these flags itself, one has to define them
locally, as the following example shows:
528
529
530
531  local $XML::LibXML::skipXMLDeclaration = 1;
532  local $XML::LibXML::skipDTD = 1;
533  local $XML::LibXML::setTagCompression = 1;
534
535If skipXMLDeclaration is defined and not '0', the XML declaration is omitted
536during serialization.
537
538If skipDTD is defined and not '0', an existing DTD would not be serialized with
539the document.
540
541If setTagCompression is defined and not '0' empty tags are displayed as open
542and closing tags rather than the shortcut. For example the empty tag I<<<<<< foo >>>>>> will be rendered as I<<<<<< E<lt>fooE<gt>E<lt>/fooE<gt> >>>>>> rather than I<<<<<< E<lt>foo/E<gt> >>>>>>.
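
A short sketch of two of these flags in use follows. The sample document is
made up; the flags are set with C<local> inside a block so that they only
affect the one toString() call.

  use strict;
  use warnings;
  use XML::LibXML;

  my $doc = XML::LibXML->load_xml( string => '<root><foo/></root>' );

  my $out;
  {
      # Both settings are undone automatically when the block is left.
      local $XML::LibXML::skipXMLDeclaration = 1;
      local $XML::LibXML::setTagCompression  = 1;
      $out = $doc->toString();
  }

  print $out;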
543
544
545=head1 PARSER OPTIONS
546
547Handling of libxml2 parser options has been unified and improved in XML::LibXML
5481.70. You can now set default options for a particular parser instance by
549passing them to the constructor as C<<<<<< XML::LibXML-E<gt>new({name=E<gt>value, ...}) >>>>>> or C<<<<<< XML::LibXML-E<gt>new(name=E<gt>value,...) >>>>>>. The options can be queried and changed using the following methods (pre-1.70
550interfaces such as C<<<<<< $parser-E<gt>load_ext_dtd(0) >>>>>> also exist, see below):
551
552=over 4
553
554=item option_exists
555
556  $parser->option_exists($name);
557
558Returns 1 if the current XML::LibXML version supports the option C<<<<<< $name >>>>>>, otherwise returns 0 (note that this does not necessarily mean that the option
559is supported by the underlying libxml2 library).
560
561
562=item get_option
563
564  $parser->get_option($name);
565
566Returns the current value of the parser option C<<<<<< $name >>>>>>.
567
568
569=item set_option
570
571  $parser->set_option($name,$value);
572
573Sets option C<<<<<< $name >>>>>> to value C<<<<<< $value >>>>>>.
574
575
576=item set_options
577
578  $parser->set_options({$name=>$value,...});
579
580Sets multiple parsing options at once.
581
582
583
584=back
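
The four methods above can be combined as in the following sketch. The option
names are taken from the list below; which options are chosen here is
arbitrary.

  use strict;
  use warnings;
  use XML::LibXML;

  my $parser = XML::LibXML->new( no_blanks => 1 );

  # Query a few options before changing anything.
  for my $name (qw( no_blanks no_network )) {
      printf "%-10s exists=%d value=%d\n",
          $name,
          $parser->option_exists($name) ? 1 : 0,
          $parser->get_option($name)    ? 1 : 0;
  }

  # Change a single option, then several at once.
  $parser->set_option( no_network => 1 );
  $parser->set_options( { no_blanks => 0, line_numbers => 1 } );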
585
586IMPORTANT NOTE: This documentation reflects the parser flags available in
587libxml2 2.7.3. Some options have no effect if an older version of libxml2 is
588used.
589
590Each of the flags listed below is labeled
591
592=over 4
593
594=item /parser/
595
596if it can be used with a C<<<<<< XML::LibXML >>>>>> parser object (i.e. passed to C<<<<<< XML::LibXML-E<gt>new >>>>>>, C<<<<<< XML::LibXML-E<gt>set_option >>>>>>, etc.)
597
598
599=item /html/
600
if it can be passed to the C<<<<<< parse_html_* >>>>>> methods
602
603
604=item /reader/
605
606if it can be used with the C<<<<<< XML::LibXML::Reader >>>>>>.
607
608
609
610=back
611
612Unless specified otherwise, the default for boolean valued options is 0
613(false).
614
615The available options are:
616
617=over 4
618
619=item URI
620
621/parser, html, reader/
622
When parsing strings or file handles, XML::LibXML doesn't know about the
base URI of the document. To make relative references such as XIncludes work,
one has to set a base URI, which is then used for the parsed document.
626
627
628=item line_numbers
629
630/parser, html, reader/
631
632If this option is activated, libxml2 will store the line number of each element
633node in the parsed document. The line number can be obtained using the C<<<<<< line_number() >>>>>> method of the C<<<<<< XML::LibXML::Node >>>>>> class (for non-element nodes this may report the line number of the containing
634element). The line numbers are also used for reporting positions of validation
635errors.
636
637IMPORTANT: Due to limitations in the libxml2 library line numbers greater than
63865535 will be returned as 65535. Unfortunately, this is a long and sad story,
639please see L<<<<<< http://bugzilla.gnome.org/show_bug.cgi?id=325533 >>>>>> for more details.
640
641
642=item encoding
643
644/html/
645
646character encoding of the input
647
648
649=item recover
650
651/parser, html, reader/
652
653recover from errors; possible values are 0, 1, and 2
654
655A true value turns on recovery mode which allows one to parse broken XML or
656HTML data. The recovery mode allows the parser to return the successfully
657parsed portion of the input document. This is useful for almost well-formed
658documents, where for example a closing tag is missing somewhere. Still,
659XML::LibXML will only parse until the first fatal (non-recoverable) error
660occurs, reporting recoverable parsing errors as warnings. To suppress even
661these warnings, use recover=>2.
662
663Note that validation is switched off automatically in recovery mode.
664
665
666=item expand_entities
667
668/parser, reader/
669
670substitute entities; possible values are 0 and 1; default is 1
671
672Note that although this flag disables entity substitution, it does not prevent
673the parser from loading external entities; when substitution of an external
674entity is disabled, the entity will be represented in the document tree by an
675XML_ENTITY_REF_NODE node whose subtree will be the content obtained by parsing
676the external resource; Although this nesting is visible from the DOM it is
677transparent to XPath data model, so it is possible to match nodes in an
678unexpanded entity by the same XPath expression as if the entity were expanded.
679See also ext_ent_handler.
680
681
682=item ext_ent_handler
683
684/parser/
685
686Provide a custom external entity handler to be used when expand_entities is set
687to 1. Possible value is a subroutine reference.
688
689This feature does not work properly in libxml2 < 2.6.27!
690
691The subroutine provided is called whenever the parser needs to retrieve the
692content of an external entity. It is called with two arguments: the system ID
693(URI) and the public ID. The value returned by the subroutine is parsed as the
694content of the entity.
695
696This method can be used to completely disable entity loading, e.g. to prevent
697exploits of the type described at  (L<<<<<< http://searchsecuritychannel.techtarget.com/generic/0,295582,sid97_gci1304703,00.html >>>>>>), where a service is tricked to expose its private data by letting it parse a
698remote file (RSS feed) that contains an entity reference to a local file (e.g. C<<<<<< /etc/fstab >>>>>>).
699
700A more granular solution to this problem, however, is provided by custom URL
701resolvers, as in
702
703  my $c = XML::LibXML::InputCallback->new();
704  sub match {   # accept file:/ URIs except for XML catalogs in /etc/xml/
705    my ($uri) = @_;
706    return ($uri=~m{^file:/}
707            and $uri !~ m{^file:///etc/xml/})
708           ? 1 : 0;
709  }
710  $c->register_callbacks([ \&match, sub{}, sub{}, sub{} ]);
711  $parser->input_callbacks($c);
712
713
714
715
716=item load_ext_dtd
717
718/parser, reader/
719
720load the external DTD subset while parsing; possible values are 0 and 1. Unless
721specified, XML::LibXML sets this option to 1.
722
This flag is also required for DTD validation, to provide complete attributes,
and to expand entities, regardless of whether the document has an internal
subset. Thus switching off external DTD loading will disable entity expansion,
validation, and complete attributes on internal subsets as well.
727
728
729=item complete_attributes
730
731/parser, reader/
732
733create default DTD attributes; possible values are 0 and 1
734
735
736=item validation
737
738/parser, reader/
739
740validate with the DTD; possible values are 0 and 1
741
742
743=item suppress_errors
744
745/parser, html, reader/
746
747suppress error reports; possible values are 0 and 1
748
749
750=item suppress_warnings
751
752/parser, html, reader/
753
754suppress warning reports; possible values are 0 and 1
755
756
757=item pedantic_parser
758
759/parser, html, reader/
760
761pedantic error reporting; possible values are 0 and 1
762
763
764=item no_blanks
765
766/parser, html, reader/
767
768remove blank nodes; possible values are 0 and 1
769
770
771=item no_defdtd
772
773/html/
774
775do not add a default DOCTYPE; possible values are 0 and 1
776
777the default is (0) to add a DTD when the input html lacks one
778
779
780=item expand_xinclude or xinclude
781
782/parser, reader/
783
784Implement XInclude substitution; possible values are 0 and 1
785
786Expands XInclude tags immediately while parsing the document. Note that the
787parser will use the URI resolvers installed via C<<<<<< XML::LibXML::InputCallback >>>>>> to parse the included document (if any).
788
789
790=item no_xinclude_nodes
791
792/parser, reader/
793
794do not generate XINCLUDE START/END nodes; possible values are 0 and 1
795
796
797=item no_network
798
799/parser, html, reader/
800
801Forbid network access; possible values are 0 and 1
802
803If set to true, all attempts to fetch non-local resources (such as DTD or
804external entities) will fail (unless custom callbacks are defined).
805
806It may be necessary to use the flag C<<<<<< recover >>>>>> for processing documents requiring such resources while networking is off.
807
808
809=item clean_namespaces
810
811/parser, reader/
812
813remove redundant namespaces declarations during parsing; possible values are 0
814and 1.
815
816
817=item no_cdata
818
819/parser, html, reader/
820
821merge CDATA as text nodes; possible values are 0 and 1
822
823
824=item no_basefix
825
826/parser, reader/
827
do not fix up XInclude xml:base URIs; possible values are 0 and 1
829
830
831=item huge
832
833/parser, html, reader/
834
835relax any hardcoded limit from the parser; possible values are 0 and 1. Unless
836specified, XML::LibXML sets this option to 0.
837
838Note: the default value for this option was changed to protect against denial
839of service through entity expansion attacks. Before enabling the option ensure
840you have taken alternative measures to protect your application against this
841type of attack.
842
843
844=item gdome
845
846/parser/
847
848THIS OPTION IS EXPERIMENTAL!
849
850Although quite powerful, XML::LibXML's DOM implementation is incomplete with
851respect to the DOM level 2 or level 3 specifications. XML::GDOME is based on
852libxml2 as well, and provides a rather complete DOM implementation by wrapping
853libgdome. This flag allows you to make use of XML::LibXML's full parser options
854and XML::GDOME's DOM implementation at the same time.
855
856To make use of this function, one has to install libgdome and configure
857XML::LibXML to use this library. For this you need to rebuild XML::LibXML!
858
859Note: this feature was not seriously tested in recent XML::LibXML releases.
860
861
862
863=back
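
As an illustration of passing one of these options at parse time, the sketch
below enables C<line_numbers> and reads the line of an element back. The
sample document is invented for the example.

  use strict;
  use warnings;
  use XML::LibXML;

  # <item/> sits on the second line of the input.
  my $xml = "<root>\n  <item/>\n</root>\n";

  my $dom = XML::LibXML->load_xml(
      string       => $xml,
      line_numbers => 1,
  );

  my ($item) = $dom->findnodes('//item');
  printf "%s is on line %d\n", $item->nodeName, $item->line_number;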
864
865For compatibility with XML::LibXML versions prior to 1.70, the following
866methods are also supported for querying and setting the corresponding parser
867options (if called without arguments, the methods return the current value of
868the corresponding parser options; with an argument sets the option to a given
869value):
870
871
872
873  $parser->validation();
874  $parser->recover();
875  $parser->pedantic_parser();
876  $parser->line_numbers();
877  $parser->load_ext_dtd();
878  $parser->complete_attributes();
879  $parser->expand_xinclude();
880  $parser->gdome_dom();
881  $parser->clean_namespaces();
882  $parser->no_network();
883
884The following obsolete methods trigger parser options in some special way:
885
886=over 4
887
888=item recover_silently
889
890
891
892  $parser->recover_silently(1);
893
894If called without an argument, returns true if the current value of the C<<<<<< recover >>>>>> parser option is 2 and returns false otherwise. With a true argument sets the C<<<<<< recover >>>>>> parser option to 2; with a false argument sets the C<<<<<< recover >>>>>> parser option to 0.
895
896
897=item expand_entities
898
899
900
901  $parser->expand_entities(0);
902
903Get/set the C<<<<<< expand_entities >>>>>> option. If called with a true argument, also turns the C<<<<<< load_ext_dtd >>>>>> option to 1.
904
905
906=item keep_blanks
907
908
909
910  $parser->keep_blanks(0);
911
912This is actually the opposite of the C<<<<<< no_blanks >>>>>> parser option. If used without an argument retrieves negated value of C<<<<<< no_blanks >>>>>>. If used with an argument sets C<<<<<< no_blanks >>>>>> to the opposite value.
913
914
915=item base_uri
916
917
918
919  $parser->base_uri( $your_base_uri );
920
921Get/set the C<<<<<< URI >>>>>> option.
922
923
924
925=back
926
927
928=head1 XML CATALOGS
929
930C<<<<<< libxml2 >>>>>> supports XML catalogs. Catalogs are used to map remote resources to their local
931copies. Using catalogs can speed up parsing processes if many external
932resources from remote addresses are loaded into the parsed documents (such as
933DTDs or XIncludes).
934
935Note that libxml2 has a global pool of loaded catalogs, so if you apply the
936method C<<<<<< load_catalog >>>>>> to one parser instance, all parser instances will start using the catalog (in
937addition to other previously loaded catalogs).
938
939Note also that catalogs are not used when a custom external entity handler is
940specified. At the current state it is not possible to make use of both types of
941resolving systems at the same time.
942
943=over 4
944
945=item load_catalog
946
947  $parser->load_catalog( $catalog_file );
948
949Loads the XML catalog file $catalog_file.
950
951
952
953  # Global external entity loader (similar to ext_ent_handler option
954  # but this works really globally, also in XML::LibXSLT include etc..)
955
956  XML::LibXML::externalEntityLoader(\&my_loader);
957
958
959
960=back
961
962
963=head1 ERROR REPORTING
964
965XML::LibXML throws exceptions during parsing, validation or XPath processing
966(and some other occasions). These errors can be caught by using I<<<<<< eval >>>>>> blocks. The error is stored in I<<<<<< $@ >>>>>>. There are two implementations: the old one throws $@ which is just a message
967string, in the new one $@ is an object from the class XML::LibXML::Error; this
968class overrides the operator "" so that when printed, the object flattens to
969the usual error message.
970
971XML::LibXML throws errors as they occur. This is a very common misunderstanding
972in the use of XML::LibXML. If the eval is omitted, XML::LibXML will always halt
973your script by "croaking" (see Carp man page for details).
974
Also note that an increasing number of functions throw errors if bad data is
passed as arguments. If you cannot ensure that valid data is passed to
XML::LibXML, you should wrap these calls in eval.
978
979Note: since version 1.59, get_last_error() is no longer available in
980XML::LibXML for thread-safety reasons.
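
A hedged sketch of inspecting the error after a failed parse follows. It
assumes the newer implementation, where $@ is an XML::LibXML::Error object
with C<line> and C<message> accessors, and falls back to treating $@ as a
plain string otherwise. The broken input is made up.

  use strict;
  use warnings;
  use Scalar::Util qw(blessed);
  use XML::LibXML;

  # Hypothetical broken input: <open> is never closed.
  my $doc = eval { XML::LibXML->load_xml( string => '<root><open></root>' ) };
  my $err = $@;

  if ( blessed($err) && $err->isa('XML::LibXML::Error') ) {
      # Structured error: individual fields are available.
      printf "line %d: %s\n", $err->line, $err->message;
  }
  elsif ($err) {
      # Old-style error: just a message string.
      print "error: $err\n";
  }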
981
982=head1 AUTHORS
983
984Matt Sergeant,
985Christian Glahn,
986Petr Pajas
987
988
989=head1 VERSION
990
9912.0207
992
993=head1 COPYRIGHT
994
9952001-2007, AxKit.com Ltd.
996
9972002-2006, Christian Glahn.
998
9992006-2009, Petr Pajas.
1000
1001=cut
1002
1003
1004=head1 LICENSE
1005
1006This program is free software; you can redistribute it and/or modify it under
1007the same terms as Perl itself.
1008
1009