README
NAME
    XML::Handler::YAWriter - Yet another Perl SAX XML Writer

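SYNOPSIS
        use XML::Handler::YAWriter;
        use XML::Parser::PerlSAX;

        my $ya      = XML::Handler::YAWriter->new( %options );
        my $perlsax = XML::Parser::PerlSAX->new( 'Handler' => $ya );

        # 'example.xml' is only a placeholder input file name
        $perlsax->parse( Source => { SystemId => 'example.xml' } );
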
DESCRIPTION
    YAWriter implements Yet Another XML::Handler::Writer. I wrote it
    because I needed a flexible escaping technique and wanted some kind
    of pretty printing. If an instance of YAWriter is created without any
    options, the default behavior is to produce an array of strings
    containing the XML in:

        @{$ya->{Strings}}

  Options
    Options are given in the usual 'key' => 'value' idiom.

    Output IO::File
        This option tells YAWriter to use an already open file for
        output, instead of storing the array of strings in
        $ya->{Strings}. The only method the Output object needs to
        implement is print, so any object with a print method can
        receive the stream of strings from YAWriter.

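        Because only print is required, a self-written Output object can
        be very small. The following is a minimal sketch in plain Perl;
        the class name My::StringSink and its as_string method are
        hypothetical and not part of YAWriter. It simply collects every
        string it is handed, much like the default $ya->{Strings} array.

```perl
use strict;
use warnings;

# Hypothetical Output object: YAWriter only needs a print method,
# so this class just collects every string it receives.
package My::StringSink;

sub new {
    my ($class) = @_;
    return bless { chunks => [] }, $class;
}

# Called like a filehandle's print: accepts one or more strings.
sub print {
    my ($self, @strings) = @_;
    push @{ $self->{chunks} }, @strings;
    return 1;    # mimic print's true return value on success
}

# Convenience accessor returning everything received so far.
sub as_string {
    my ($self) = @_;
    return join '', @{ $self->{chunks} };
}

package main;

my $sink = My::StringSink->new;
$sink->print('<doc>', 'text');
$sink->print('</doc>');
my $xml = $sink->as_string;
print "$xml\n";    # prints <doc>text</doc>
```

        Passing 'Output' => My::StringSink->new to the constructor should
        then route the output into this object, assuming YAWriter calls
        print on it as an ordinary method.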
    AsFile string
        This option will cause start_document to open the named file and
        end_document to close it. Use the literal dash "-" if you want to
        print to standard output.

    AsArray boolean
        This option will force storage of the XML in $ya->{Strings}, even
        if the Output option is given.

    AsString boolean
        This option will cause end_document to return the complete XML
        document as a single string. Most SAX drivers return the value of
        end_document as the result of their parse method. As this may not
        work with some combinations of SAX drivers and filters, joining
        $ya->{Strings} in the controlling method is preferred.

    Encoding string
        This changes the default encoding from UTF-8 to anything you
        like. You should ensure that the given data are already in this
        encoding, or provide an Escape hash to tell YAWriter about the
        recoding.

    Escape hash
        The Escape hash defines substitutions that have to be applied to
        every string, with the exception of the processing_instruction
        and doctype_decl methods, where I think that escaping the target
        and data would cause more trouble than it is worth.

        The default value for Escape is

            $XML::Handler::YAWriter::escape = {
                '&'  => '&amp;',
                '<'  => '&lt;',
                '>'  => '&gt;',
                '"'  => '&quot;',
                '--' => '&#45;&#45;'
            };

        YAWriter uses an evaluated sub to make the recoding described by
        a given Escape hash reasonably fast. Future versions may use XS
        to improve this performance bottleneck.

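        The idea behind an evaluated sub can be sketched in plain Perl.
        This is a hypothetical helper, make_encoder, and not YAWriter's
        own encode method: it joins the Escape keys into one regex
        alternation, compiles a substitution sub with a string eval, and
        replaces every match in a single pass, so an "&" produced by one
        replacement is never escaped a second time.

```perl
use strict;
use warnings;

# Hypothetical helper (not YAWriter's code): compile an escape hash into
# one substitution sub via string eval, so the pattern is built once.
sub make_encoder {
    my ($escape) = @_;

    # Longest keys first, so a multi-character key such as '--' is tried
    # before any shorter key that could match at the same position.
    my $alternation = join '|',
        map { quotemeta($_) }
        sort { length($b) <=> length($a) || $a cmp $b } keys %$escape;

    # The evaluated sub closes over the lexical $escape hash reference.
    my $code = 'sub { my ($s) = @_; $s =~ s/(' . $alternation
             . ')/$escape->{$1}/g; return $s }';
    my $encoder = eval $code or die "could not compile encoder: $@";
    return $encoder;
}

my $encode = make_encoder({
    '&'  => '&amp;',
    '<'  => '&lt;',
    '>'  => '&gt;',
    '"'  => '&quot;',
    '--' => '&#45;&#45;',
});
my $out = $encode->('a < b && "c" -- d');
print "$out\n";    # prints a &lt; b &amp;&amp; &quot;c&quot; &#45;&#45; d
```

        Sorting the keys longest first does not matter for the default
        hash, but it keeps a key like '--' from being shadowed if a
        shorter key with the same first character is ever added.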
    Pretty hash
        A hash of string => boolean pairs that defines the kind of pretty
        printing. Defaults to undef. Possible keys:

        AddHiddenNewline boolean
            Add a hidden newline before ">".

        AddHiddenAttrTab boolean
            Add hidden tabulation for attributes.

        CatchEmptyElement boolean
            Catch empty elements and apply "/>" compression.

        CatchWhiteSpace boolean
            Catch whitespace with comments.

        IsSGML boolean
            This option will cause start_document,
            processing_instruction and doctype_decl to appear as SGML.
            The SGML is still well-formed, of course, if your SAX events
            are well-formed.

        NoComments boolean
            Suppress comments.

        NoDTD boolean
            Suppress the DTD.

        NoPI boolean
            Suppress processing instructions.

        NoProlog boolean
            Suppress the <?xml ... ?> prolog.

        NoWhiteSpace boolean
            Suppress whitespace, to clean documents of prior pretty
            printing.

        PrettyWhiteIndent boolean
            Add a visible indent before every event string.

        PrettyWhiteNewline boolean
            Add a visible newline before every event string.

        SAX1 boolean (not yet implemented)
            Output only SAX1-compliant event strings.

  Notes:
    Correct handling of start_document and end_document is required!

    The YAWriter object initialises its structures during start_document
    and does its cleanup during end_document. If you forget to call
    start_document, any other method will break during the run; the most
    likely place is the encode method, which will try to eval undef as a
    subroutine. If you forget to call end_document, you should not use a
    single instance of YAWriter more than once.

    For small documents, AsArray may be the fastest and AsString the
    easiest way to receive the output of YAWriter. But AsString and
    AsArray keep the whole document in memory, so they may run out of
    memory with infinite SAX streams. The only method YAWriter calls on a
    given Output object is print, so it's easy to write your own Output
    object to improve streaming.

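    For streaming, such an Output object should hand each string on
    instead of accumulating it. The sketch below is hypothetical (the
    class My::CallbackSink and its chars_seen method are not part of
    YAWriter): it forwards every string to a caller-supplied callback and
    keeps only a running character count, so its memory use stays bounded
    however long the SAX stream runs.

```perl
use strict;
use warnings;

# Hypothetical streaming Output object: forwards each string to a callback
# as soon as it arrives and keeps only a running character count.
package My::CallbackSink;

sub new {
    my ($class, $callback) = @_;
    die "callback must be a code reference\n"
        unless ref($callback) eq 'CODE';
    return bless { callback => $callback, chars => 0 }, $class;
}

# The only method an Output object needs: print.
sub print {
    my ($self, @strings) = @_;
    for my $s (@strings) {
        $self->{chars} += length $s;    # characters, not encoded bytes
        $self->{callback}->($s);
    }
    return 1;
}

sub chars_seen { return $_[0]{chars} }

package main;

# Example: pass each string straight through to standard output.
my $sink = My::CallbackSink->new( sub { print STDOUT $_[0] } );
$sink->print('<doc>', '</doc>');
print STDOUT "\n";
```

    The callback could just as well write to a socket or a pipe; nothing
    is retained between calls except the counter.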
    A single instance of XML::Handler::YAWriter can produce more than one
    file in a single run. Be sure to provide a fresh IO::File as Output
    before you call start_document, and close that file after calling
    end_document. Alternatively, provide a file name in AsFile, so that
    start_document and end_document can open and close their own
    filehandle.

    Automatic recoding between 8-bit and 16-bit does not yet work
    correctly!

    I have Perl-5.00563 at home, where I can specify "use utf8;" in the
    right places to make recoding work. But I dislike saying
    "use 5.00555;" because many systems run 5.00503.

    If you use an 8-bit character set internally and want to use national
    characters, either declare your character set (for example
    ISO-8859-1) as the Encoding, or provide an Escape hash similar to the
    following:

        $ya->{'Escape'} = {
            '&'  => '&amp;',
            '<'  => '&lt;',
            '>'  => '&gt;',
            '"'  => '&quot;',
            '--' => '&#45;&#45;',
            'ö'  => '&#246;',
            'ä'  => '&#228;',
            'ü'  => '&#252;',
            'Ö'  => '&#214;',
            'Ä'  => '&#196;',
            'Ü'  => '&#220;',
            'ß'  => '&#223;',
        };

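    Rather than listing each national character by hand, such a hash can
    also be built programmatically. This is a sketch under an assumption,
    not something YAWriter does itself: it assumes your data are
    ISO-8859-1 characters and maps the whole range 0xA0 to 0xFF to
    numeric character references, which an XML parser understands
    without any DTD.

```perl
use strict;
use warnings;

# Start from the default XML escapes.
my %escape = (
    '&'  => '&amp;',
    '<'  => '&lt;',
    '>'  => '&gt;',
    '"'  => '&quot;',
    '--' => '&#45;&#45;',
);

# Assumption: data are ISO-8859-1. Map every character from 0xA0 to 0xFF
# to a numeric character reference such as &#246;.
$escape{ chr $_ } = '&#' . $_ . ';' for 0xA0 .. 0xFF;

# The hash would then be handed to YAWriter as $ya->{'Escape'} = \%escape;
printf "%d entries, o-umlaut => %s\n", scalar(keys %escape), $escape{ chr 0xF6 };
# prints 101 entries, o-umlaut => &#246;
```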
    You may abuse YAWriter to clean whitespace from XML documents. Take a
    look at test.pl, which does just that with an XML::Edifact message,
    without querying the DTD. This may work in 99% of the cases where you
    want to get rid of ignorable whitespace caused by the various forms
    of pretty printing.

        use IO::File;
        use XML::Handler::YAWriter;

        my $ya = XML::Handler::YAWriter->new(
            'Output' => IO::File->new( ">-" ),    # standard output
            'Pretty' => {
                'NoWhiteSpace'     => 1,
                'NoComments'       => 1,
                'AddHiddenNewline' => 1,
                'AddHiddenAttrTab' => 1,
            } );

    XML::Handler::YAWriter implements every method XML::Parser::PerlSAX
    calls, which goes beyond the Java SAX 1.0 specification. I have in
    mind using Pretty => { SAX1 => 1 } to disable these extra methods
    when abusing YAWriter as a SAX proxy.

AUTHOR
    Michael Koehne, Kraehe@Copyleft.De

Thanks
    "Derksen, Eduard (Enno), CSCIO" <enno@att.com> helped me with the
    Escape hash and gave quite a lot of useful comments.

SEE ALSO
    the perl manpage and the XML::Parser::PerlSAX manpage