Name           Date         Size       #Lines  LOC
..             03-May-2022  -
lib/XML/       29-Jun-2017  -          244     40
t/             29-Jun-2017  -          3,442   3,335
Changes        29-Jun-2017  1.2 KiB    55      38
Fast.xs        29-Jun-2017  34.5 KiB   1,418   1,022
LICENSE        24-Jun-2017  170        7       4
MANIFEST       29-Jun-2017  489        28      27
MANIFEST.SKIP  24-Jun-2017  532        38      32
META.json      29-Jun-2017  938        43      42
META.yml       29-Jun-2017  559        24      23
Makefile.PL    29-Jun-2017  938        30      27
README         29-Jun-2017  4.7 KiB    166     110
entities.h     24-Jun-2017  1.1 KiB    22      19
ppport.h       24-Jun-2017  170.8 KiB  7,064   3,086
xmlfast.c      24-Jun-2017  16.2 KiB   644     504
xmlfast.h      24-Jun-2017  2.4 KiB    108     77

README

NAME
    XML::Fast - Simple and very fast XML - hash conversion

SYNOPSIS
      use XML::Fast;

      my $hash = xml2hash $xml;
      my $hash2 = xml2hash $xml, attr => '.', text => '~';
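
    As a slightly fuller sketch of the calls above, the script below parses
    one small document twice and reads the attribute back. The input and the
    resulting keys are taken from the "attr" option described under OPTIONS;
    the variable names are only illustrative.

```perl
use strict;
use warnings;
use XML::Fast;

# Input borrowed from the "attr" option example in this README.
my $xml = '<node attr="test" />';

# Default options: attribute keys get the '-' prefix.
my $hash = xml2hash $xml;
print "default prefix: ", $hash->{node}{'-attr'}, "\n";

# Custom prefixes: attribute keys get '.', text would be stored under '~'.
my $hash2 = xml2hash $xml, attr => '.', text => '~';
print "custom prefix:  ", $hash2->{node}{'.attr'}, "\n";
```

    Quoting the keys ('-attr', '.attr') is not required for '-attr', but it
    keeps both lookups visually the same.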

DESCRIPTION
    This module implements a simple, state-machine-based XML parser written
    in C.

    It can parse, and recover from, some kinds of broken XML. If you need
    an XML validator, use XML::LibXML.
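
    Because this parser tolerates some broken input, a strict check has to
    come from elsewhere. A minimal sketch with XML::LibXML follows; it only
    tests well-formedness (DTD or schema validation would be a further
    step), and the helper name is_well_formed is hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: true when XML::LibXML parses the string without
# raising a parse error. load_xml dies on malformed input, so the eval
# turns that into a boolean.
sub is_well_formed {
    my ($xml) = @_;
    return eval { XML::LibXML->load_xml(string => $xml); 1 } ? 1 : 0;
}

print is_well_formed('<a><b/></a>') ? "ok\n" : "broken\n";
print is_well_formed('<a><b></a>')  ? "ok\n" : "broken\n";
```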

RATIONALE
    Another similar module is XML::Bare. I used it for some time, but it
    has some shortcomings:

    *   If your XML has a node with a TextNode, then a CDATANode, then
        another TextNode, you'll get a broken value

    *   It doesn't support charsets

    *   It doesn't support any kind of entities.

    So, after a number of attempts to fix XML::Bare, I decided to write a
    parser from scratch.

    Here are some of its features and principles:

    *   It uses a minimal number of memory allocations.

    *   All XML is parsed in a single scan.

    *   All values are copied from the source XML only once (to the
        destination keys/values)

    *   If some types of nodes (for example, comments) are ignored, no
        memory is allocated or copied for them.
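
    The single-scan idea can be pictured with a toy scanner. This is not
    the code in xmlfast.c, only a sketch in Perl under simplifying
    assumptions (no attributes, entities or CDATA); the name scan_text is
    made up for the illustration. It walks the input once, left to right,
    and copies a substring only for node types that are kept.

```perl
use strict;
use warnings;

# Toy single-pass scanner (hypothetical, not the module's C parser).
# \G anchors each match at the current position and /c keeps that
# position when a match fails, so the input is never rescanned.
sub scan_text {
    my ($xml) = @_;
    my @text;
    pos($xml) = 0;
    while (pos($xml) < length $xml) {
        if    ($xml =~ /\G<!--.*?-->/gcs) { next }            # comment: skipped, no capture
        elsif ($xml =~ /\G<[^>]*>/gc)     { next }            # tag: structure ignored in this toy
        elsif ($xml =~ /\G([^<]+)/gc)     { push @text, $1 }  # text: captured once
        else                              { last }            # stray '<': stop instead of looping
    }
    return \@text;
}

my $chunks = scan_text('<a>one<!-- skip -->two<b/>three</a>');
print join('|', @$chunks), "\n";   # prints "one|two|three"
```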

    I've removed the benchmark results, since they vary widely between
    different XML documents. Sometimes XML::Bare is faster, sometimes not.
    So XML::Fast should mainly be considered not "faster-than-bare", but
    "format-other-than-bare".

EXPORT
  xml2hash $xml, [ %options ]
  hash2xml $hash, [ %options ]

OPTIONS
    order [ = 0 ]
        Not implemented yet. Strictly keep the output order. When enabled,
        structures become more complex, but the XML can be completely
        restored.

    attr [ = '-' ]
        Attribute prefix

            <node attr="test" />  =>  { node => { -attr => "test" } }

    text [ = '#text' ]
        Key name for storing text

        When undef, text nodes will be ignored

            <node>text<sub /></node>  =>  { node => { sub => '', '#text' => "text" } }

    join [ = '' ]
        Join separator for text nodes split by subnodes

        Ignored when "order" is in effect

            # default:
            xml2hash( '<item>Test1<sub />Test2</item>' )
            : { item => { sub => '', '#text' => 'Test1Test2' } };

            xml2hash( '<item>Test1<sub />Test2</item>', join => '+' )
            : { item => { sub => '', '#text' => 'Test1+Test2' } };

    trim [ = 1 ]
        Trim leading and trailing whitespace from text nodes

    cdata [ = undef ]
        When defined, CDATA sections will be stored under this key

            # cdata = undef
            <node><![CDATA[ test ]]></node>  =>  { node => 'test' }

            # cdata = '#'
            <node><![CDATA[ test ]]></node>  =>  { node => { '#' => 'test' } }

    comm [ = undef ]
        When defined, comments will be stored under this key

        When undef, comments will be ignored

            # comm = undef
            <node><!-- comm --><sub/></node>  =>  { node => { sub => '' } }

            # comm = '/'
            <node><!-- comm --><sub/></node>  =>  { node => { sub => '', '/' => 'comm' } }

    array => 1
        Force all nodes to be kept as arrays.

            # no array
            <node><sub/></node>  =>  { node => { sub => '' } }

            # array = 1
            <node><sub/></node>  =>  { node => [ { sub => [ '' ] } ] }

    array => [ 'node', 'names']
        Force nodes with the listed names to be stored as arrays

            # no array
            <node><sub/></node>  =>  { node => { sub => '' } }

            # array => ['sub']
            <node><sub/></node>  =>  { node => { sub => [ '' ] } }
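
    One reason to force arrays is that calling code can then loop over a
    node without first checking what kind of value it got. A short sketch,
    reusing the input and result from the example above (the loop and
    variable names are illustrative):

```perl
use strict;
use warnings;
use XML::Fast;

# Force <sub> to be an array even though it occurs only once.
my $hash = xml2hash '<node><sub/></node>', array => ['sub'];

# Per the example above, $hash is { node => { sub => [ '' ] } },
# so the value can be dereferenced as a list directly.
my @subs = @{ $hash->{node}{sub} };
printf "%d sub node(s)\n", scalar @subs;
```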

    utf8decode => 1
        Force decoding of utf8 sequences, instead of just upgrading them
        (may be useful for broken xml)

SEE ALSO
    *   XML::Bare

        Another fast parser

    *   XML::LibXML

        The most powerful XML parser for Perl, as long as you don't need to
        parse gigabytes of XML ;)

    *   XML::Hash::LX

        An XML parser that uses XML::LibXML for parsing and then constructs
        a hash structure identical to the one generated by this module (at
        least, it should ;)). But of course it is much slower than
        XML::Fast.

LIMITATIONS
    *   Does not support wide charsets (UTF-16/32) (see RT71534
        <https://rt.cpan.org/Ticket/Display.html?id=71534>)

TODO
    *   Ordered mode (as implemented in XML::Hash::LX)

    *   Create hash2xml, identical to the one in XML::Hash::LX

    *   Partial content event-based parsing (I need this for reading XML
        streams)

    Patches, suggestions and bug reports are welcome ;)

AUTHOR
    Mons Anderson, <mons@cpan.org>

COPYRIGHT AND LICENSE
    Copyright (C) 2010 Mons Anderson

    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.