<!DOCTYPE refentry [
  <!-- Fill in your name for FIRSTNAME and SURNAME. -->
  <!ENTITY dhfirstname "<firstname>Scott</firstname>">
  <!ENTITY dhsurname   "<surname>Bronson</surname>">
  <!-- Please adjust the date whenever revising the manpage. -->
  <!ENTITY dhdate      "<date>March 11, 2016</date>">
  <!-- SECTION should be 1-8, maybe w/ subsection other parameters are
       allowed: see man(7), man(1). -->
  <!ENTITY dhsection   "<manvolnum>1</manvolnum>">
  <!ENTITY dhemail     "<email>bronson@rinspin.com</email>">
  <!ENTITY dhusername  "Scott Bronson">
  <!ENTITY dhucpackage "<refentrytitle>XMLWF</refentrytitle>">
  <!ENTITY dhpackage   "xmlwf">

  <!ENTITY debian      "<productname>Debian GNU/Linux</productname>">
  <!ENTITY gnu         "<acronym>GNU</acronym>">
]>

<refentry>
  <refentryinfo>
    <address>
      &dhemail;
    </address>
    <author>
      &dhfirstname;
      &dhsurname;
    </author>
    <copyright>
      <year>2001</year>
      <holder>&dhusername;</holder>
    </copyright>
    &dhdate;
  </refentryinfo>
  <refmeta>
    &dhucpackage;

    &dhsection;
  </refmeta>
  <refnamediv>
    <refname>&dhpackage;</refname>

    <refpurpose>Determines if an XML document is well-formed</refpurpose>
  </refnamediv>
  <refsynopsisdiv>
    <cmdsynopsis>
      <command>&dhpackage;</command>
      <arg><option>-s</option></arg>
      <arg><option>-n</option></arg>
      <arg><option>-p</option></arg>
      <arg><option>-x</option></arg>

      <arg><option>-e <replaceable>encoding</replaceable></option></arg>
      <arg><option>-w</option></arg>

      <arg><option>-d <replaceable>output-dir</replaceable></option></arg>
      <arg><option>-c</option></arg>
      <arg><option>-m</option></arg>

      <arg><option>-r</option></arg>
      <arg><option>-t</option></arg>
      <arg><option>-N</option></arg>

      <arg><option>-v</option></arg>

      <arg>file ...</arg>
    </cmdsynopsis>
  </refsynopsisdiv>

  <refsect1>
    <title>DESCRIPTION</title>

    <para>
      <command>&dhpackage;</command> uses the Expat library to
      determine if an XML document is well-formed.  It is
      non-validating.
    </para>

    <para>
      If you do not specify any files on the command-line, and you
      have a recent version of <command>&dhpackage;</command>, the
      input file will be read from standard input.
    </para>

  </refsect1>

  <refsect1>
    <title>WELL-FORMED DOCUMENTS</title>

    <para>
      A well-formed document must adhere to the
      following rules:
    </para>

    <itemizedlist>
      <listitem><para>
        The file begins with an XML declaration.  For instance,
        <literal>&lt;?xml version="1.0" standalone="yes"?&gt;</literal>.
        <emphasis>NOTE:</emphasis>
        <command>&dhpackage;</command> does not currently
        check for a valid XML declaration.
      </para></listitem>
      <listitem><para>
        Every start tag is either empty (&lt;tag/&gt;)
        or has a corresponding end tag.
      </para></listitem>
      <listitem><para>
        There is exactly one root element.  This element must contain
        all other elements in the document.  Only comments, white
        space, and processing instructions may come after the close
        of the root element.
      </para></listitem>
      <listitem><para>
        All elements nest properly.
      </para></listitem>
      <listitem><para>
        All attribute values are enclosed in quotes (either single
        or double).
      </para></listitem>
    </itemizedlist>
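
    <para>
      As an illustration of these rules, the following sketch shows a
      small document that satisfies all of them.  The element and
      attribute names are invented for this example and carry no
      special meaning:
    </para>
<literallayout>
&lt;?xml version="1.0" standalone="yes"?&gt;
&lt;inventory&gt;
  &lt;item id="1"&gt;widget&lt;/item&gt;
  &lt;item id='2'/&gt;
&lt;/inventory&gt;
&lt;!-- comments may follow the root element --&gt;
</literallayout>
    <para>
      By contrast, this equally hypothetical fragment breaks two of
      the rules: the <literal>b</literal> and <literal>i</literal>
      elements overlap instead of nesting, and the attribute value is
      not quoted:
    </para>
<literallayout>
&lt;note&gt;&lt;b&gt;&lt;i&gt;text&lt;/b&gt;&lt;/i&gt;&lt;item id=1/&gt;&lt;/note&gt;
</literallayout>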

    <para>
      If the document has a DTD, and it strictly complies with that
      DTD, then the document is also considered <emphasis>valid</emphasis>.
      <command>&dhpackage;</command> is a non-validating parser --
      it does not check the DTD.  However, it does support
      external entities (see the <option>-x</option> option).
    </para>
  </refsect1>

  <refsect1>
    <title>OPTIONS</title>

<para>
When an option includes an argument, you may specify the argument either
separately ("<option>-d</option> output") or concatenated with the
option ("<option>-d</option>output").  <command>&dhpackage;</command>
supports both.
</para>

    <variablelist>

      <varlistentry>
        <term><option>-c</option></term>
        <listitem>
          <para>
  If the input file is well-formed and <command>&dhpackage;</command>
  doesn't encounter any errors, the input file is simply copied to
  the output directory unchanged.
  This implies no namespaces (turns off <option>-n</option>) and
  requires <option>-d</option> to specify an output directory.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><option>-d output-dir</option></term>
        <listitem>
          <para>
  Specifies a directory to contain transformed
  representations of the input files.
  By default, <option>-d</option> outputs a canonical representation
  (described below).
  You can select different output formats using <option>-c</option>,
  <option>-m</option> and <option>-N</option>.
          </para>
          <para>
  The output filenames will
  be exactly the same as the input filenames or "STDIN" if the input is
  coming from standard input.  Therefore, you must be careful that the
  output file does not go into the same directory as the input
  file.  Otherwise, <command>&dhpackage;</command> will delete the
  input file before it generates the output file (just like running
  <literal>cat &lt; file &gt; file</literal> in most shells).
          </para>
          <para>
  Two structurally equivalent XML documents have a byte-for-byte
  identical canonical XML representation.
  Note that ignorable white space is considered significant and
  is treated equivalently to data.
  More on canonical XML can be found at
  http://www.jclark.com/xml/canonxml.html .
          </para>
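          <para>
  As a small illustration of what canonicalization does (the element
  and attribute names are invented for this sketch), consider an input
  document such as:
          </para>
<literallayout>
&lt;?xml version="1.0"?&gt;
&lt;doc b='2' a="1"&gt;&lt;empty/&gt;&lt;/doc&gt;
</literallayout>
          <para>
  Following the canonical XML rules referenced above, the output would
  be expected to drop the XML declaration, sort the attributes by name,
  quote attribute values with double quotes, and write the
  empty-element tag as a start tag followed by an end tag:
          </para>
<literallayout>
&lt;doc a="1" b="2"&gt;&lt;empty&gt;&lt;/empty&gt;&lt;/doc&gt;
</literallayout>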
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><option>-e encoding</option></term>
        <listitem>
          <para>
   Specifies the character encoding for the document, overriding
   any document encoding declaration.  <command>&dhpackage;</command>
   supports four built-in encodings:
        <literal>US-ASCII</literal>,
        <literal>UTF-8</literal>,
        <literal>UTF-16</literal>, and
        <literal>ISO-8859-1</literal>.
   Also see the <option>-w</option> option.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><option>-m</option></term>
        <listitem>
          <para>
  Outputs a meta-representation: an XML file that describes the
  input file in detail, including character positions.
  Requires <option>-d</option> to specify an output directory.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><option>-n</option></term>
        <listitem>
          <para>
  Turns on namespace processing, so that element and attribute names
  are interpreted according to the XML Namespaces recommendation.
  <option>-c</option> disables namespaces.
          </para>
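          <para>
  As a rough illustration (the element names and the
  <literal>ex</literal> prefix are made up), the following document
  uses a prefix that is never declared.  Without <option>-n</option>
  the colon is simply part of the element name; with
  <option>-n</option> the undeclared prefix would be expected to be
  reported as an error:
          </para>
<literallayout>
&lt;?xml version="1.0"?&gt;
&lt;doc&gt;
  &lt;ex:item/&gt;
&lt;/doc&gt;
</literallayout>
          <para>
  Declaring the prefix on the root element, for example with
  <literal>xmlns:ex="http://example.org/ns"</literal> (a placeholder
  URI), should make the document acceptable in both modes.
          </para>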
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><option>-N</option></term>
        <listitem>
          <para>
  Adds a doctype and notation declarations to canonical XML output.
  This matches the example output used by the formal XML test cases.
  Requires <option>-d</option> to specify an output directory.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><option>-p</option></term>
        <listitem>
          <para>
    Tells <command>&dhpackage;</command> to process external DTDs
    and parameter entities.
          </para>
          <para>
   Normally <command>&dhpackage;</command> never parses parameter
   entities.  <option>-p</option> tells it to always parse them.
   <option>-p</option> implies <option>-x</option>.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><option>-r</option></term>
        <listitem>
          <para>
   Normally <command>&dhpackage;</command> memory-maps the XML file
   before parsing; this can result in faster parsing on many
   platforms.
   <option>-r</option> turns off memory-mapping and uses normal file
   IO calls instead.
   Of course, memory-mapping is automatically turned off
   when reading from standard input.
          </para>
          <para>
   Use of memory-mapping can cause some platforms to report
   substantially higher memory usage for
   <command>&dhpackage;</command>, but this appears to be a matter of
   the operating system reporting memory in a strange way; there is
   not a leak in <command>&dhpackage;</command>.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><option>-s</option></term>
        <listitem>
          <para>
  Prints an error if the document is not standalone.
  A document is standalone if it has no external subset and no
  references to parameter entities.
          </para>
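          <para>
  To make the definition concrete, here is a sketch of a document that
  is not standalone in this sense, because its document type
  declaration points at an external subset.  The file name
  <filename>doc.dtd</filename> is hypothetical:
          </para>
<literallayout>
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE doc SYSTEM "doc.dtd"&gt;
&lt;doc/&gt;
</literallayout>
          <para>
  Under the definition above, <option>-s</option> would be expected to
  report an error for this input, whereas a document with no document
  type declaration, or with only an internal subset that contains no
  parameter entity references, would pass.
          </para>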
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><option>-t</option></term>
        <listitem>
          <para>
  Turns on timings.  This tells Expat to parse the entire file,
  but not perform any processing.
  This gives a fairly accurate idea of the raw speed of Expat itself
  without client overhead.
  <option>-t</option> turns off most of the output options
  (<option>-d</option>, <option>-m</option>, <option>-c</option>, ...).
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><option>-v</option></term>
        <listitem>
          <para>
  Prints the version of the Expat library being used, including some
  information on the compile-time configuration of the library, and
  then exits.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><option>-w</option></term>
        <listitem>
          <para>
  Enables support for Windows code pages.
  Normally, <command>&dhpackage;</command> will report an error if it
  encounters an encoding that it is not equipped to handle itself.  With
  <option>-w</option>, <command>&dhpackage;</command> will try to use
  a Windows code page.  See also <option>-e</option>.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><option>-x</option></term>
        <listitem>
          <para>
  Turns on parsing external entities.
  </para>
<para>
  Non-validating parsers are not required to resolve external
  entities, or even expand entities at all.
  Expat always expands internal entities (?),
  but external entity parsing must be enabled explicitly.
  </para>
  <para>
  External entities are simply entities that obtain their
  data from outside the XML file currently being parsed.
  </para>
  <para>
  This is an example of an internal entity:
<literallayout>
&lt;!ENTITY vers '1.0.2'&gt;
</literallayout>
  </para>
  <para>
  And here are some examples of external entities:

<literallayout>
&lt;!ENTITY header SYSTEM "header-&amp;vers;.xml"&gt;  (parsed)
&lt;!ENTITY logo SYSTEM "logo.png" NDATA PNG&gt;   (unparsed)
</literallayout>

          </para>
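          <para>
  For context, a complete document that declares and then references a
  parsed external entity might look like the following sketch.  The
  file name <filename>header.xml</filename> and the element names are
  hypothetical:
          </para>
<literallayout>
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE doc [
  &lt;!ENTITY header SYSTEM "header.xml"&gt;
]&gt;
&lt;doc&gt;&amp;header;&lt;/doc&gt;
</literallayout>
          <para>
  With <option>-x</option>, <command>&dhpackage;</command> would be
  expected to read <filename>header.xml</filename> and check its
  content as part of the document; without <option>-x</option> the
  reference is not followed, so problems inside that file go
  unnoticed.
          </para>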
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><option>--</option></term>
        <listitem>
          <para>
    (Two hyphens.)
    Terminates the list of options.  This is only needed if a filename
    starts with a hyphen.  For example:
          </para>
<literallayout>
&dhpackage; -- -myfile.xml
</literallayout>
          <para>
    will run <command>&dhpackage;</command> on the file
    <filename>-myfile.xml</filename>.
          </para>
        </listitem>
      </varlistentry>
    </variablelist>

    <para>
    Older versions of <command>&dhpackage;</command> do not support
    reading from standard input.
    </para>
  </refsect1>

  <refsect1>
  <title>OUTPUT</title>
    <para>
      If an input file is not well-formed,
      <command>&dhpackage;</command> prints a single line describing
      the problem to standard output.  If a file is well-formed,
      <command>&dhpackage;</command> outputs nothing.
      Note that the result code is <emphasis>not</emphasis> set.
    </para>
  </refsect1>

  <refsect1>
    <title>BUGS</title>
    <para>
      <command>&dhpackage;</command> returns a result code of 0
      (no error) even if the file is not well-formed.  There is no
      good way for a program to use <command>&dhpackage;</command>
      to quickly check a file -- it must parse
      <command>&dhpackage;</command>'s standard output.
    </para>
    <para>
      The errors should go to standard error, not standard output.
    </para>
    <para>
      There should be a way to get <option>-d</option> to send its
      output to standard output rather than forcing the user to send
      it to a file.
    </para>
    <para>
      I have no idea why anyone would want to use the
      <option>-d</option>, <option>-c</option>, and
      <option>-m</option> options.  If someone could explain it to
      me, I'd like to add this information to this manpage.
    </para>
  </refsect1>

  <refsect1>
    <title>ALTERNATIVES</title>
    <para>
      Here are some XML validators on the web:

<literallayout>
http://www.hcrc.ed.ac.uk/~richard/xml-check.html
http://www.stg.brown.edu/service/xmlvalid/
http://www.scripting.com/frontier5/xml/code/xmlValidator.html
http://www.xml.com/pub/a/tools/ruwf/check.html
</literallayout>

    </para>
  </refsect1>

  <refsect1>
    <title>SEE ALSO</title>
    <para>

<literallayout>
The Expat home page:        http://www.libexpat.org/
The W3 XML specification:   http://www.w3.org/TR/REC-xml
</literallayout>

    </para>
  </refsect1>

  <refsect1>
    <title>AUTHOR</title>
    <para>
      This manual page was written by &dhusername; &dhemail; for
      the &debian; system (but may be used by others).  Permission is
      granted to copy, distribute and/or modify this document under
      the terms of the <acronym>GNU</acronym> Free Documentation
      License, Version 1.1.
    </para>
  </refsect1>
</refentry>