<Chapter Label="ch:intro"><Heading>Introduction and Example</Heading>

The main purpose of the &GAPDoc; package is to define a file format for
documentation of &GAP;-programs and -packages (see <Cite Key="GAP4" />). The
difficulty is that such documentation should be readable in several output
formats. For example, it should be possible to read the documentation inside
the terminal in which &GAP; is running (a text mode), and there should be a
printable version in high typesetting quality (produced by some version of
&TeX;). It is also popular to view &GAP;'s online help with a Web-browser
via an HTML-version of the documentation. Nowadays one can use &LaTeX; and
standard viewer programs to produce and view on the screen <C>dvi</C>- or
<C>pdf</C>-files with full support of internal and external hyperlinks.
Certainly there will be other interesting document formats and tools in this
direction in the future. <P/>

Our aim is to find a <Emph>format for writing</Emph> the documentation which
allows a relatively easy translation into the output formats just mentioned
and which hopefully makes it easy to translate to future output formats as
well. <P/>

To make documentation written in the &GAPDoc; format directly usable, we
also provide a set of programs, called converters, which produce text-,
hyperlinked &LaTeX;- and HTML-output versions of a &GAPDoc; document. These
programs are developed by the first named author. They run completely inside
&GAP;, i.e., no external programs are needed for the conversion itself. Only
<C>latex</C> and <C>pdflatex</C> are needed to process the &LaTeX; output.
The converters are described in Chapter&nbsp;<Ref Chap="ch:conv"/>.

<Section Label="sec:XML"><Heading>XML</Heading>
<Index >XML</Index>

The definition of the &GAPDoc; format uses XML, the <Q>eXtensible Markup
Language</Q>. This is a standard (defined by the W3C, see
<URL>http://www.w3c.org</URL>) which lays down a syntax for adding markup to
a document or to some data. It allows one to define document structures by
introducing markup <E>elements</E> and certain relations between them. This
is done in a <E>document type definition</E>. The file <F>gapdoc.dtd</F>
contains such a document type definition and is the central part of the
&GAPDoc; package. <P/>

The easiest way to get a good idea of this is probably to look at an
example. Appendix&nbsp;<Ref Appendix="app:3k+1" /> contains a short but
complete &GAPDoc; document for a fictitious share package. In the next
section we will go through this document, explain basic facts about XML and
the &GAPDoc; document type, and give pointers to more details in later parts
of this documentation. <P/>

In Section&nbsp;<Ref Sect="sec:faq" />, the last one of this introductory
chapter, we try to answer some general questions about the decisions which
led to the &GAPDoc; package.

</Section>

<Section Label="sec:3k+1expl"><Heading>A complete example</Heading>

In this section we recall the lines from the example document in
Appendix&nbsp;<Ref Appendix="app:3k+1" /> and give some explanations.

<Listing Type="from 3k+1.xml">
<![CDATA[<?xml version="1.0" encoding="UTF-8"?> ]]>
</Listing>

This line just tells a human reader and computer programs that the file
is a document with XML markup and that the text is encoded in the UTF-8
character set (other common encodings are ASCII or ISO-8859-X encodings).

<Listing Type="from 3k+1.xml">
<![CDATA[<!--   A complete "fake package" documentation
-->
]]></Listing>

Everything in an XML file between <Q><C>&lt;!--</C></Q> and
<Q><C>--></C></Q> is a comment and not part of the document content.

<Listing Type="from 3k+1.xml">
<![CDATA[<!DOCTYPE Book SYSTEM "gapdoc.dtd">
]]></Listing>

This line says that the document contains markup which is defined in
the system file <F>gapdoc.dtd</F> and that the markup obeys certain
rules defined in that file (the ending <F>dtd</F> means <Q>document type
definition</Q>). It further says that the actual content of the document
consists of an element with name <Q>Book</Q>. And indeed, the
remaining part of the file is enclosed as follows:

<Listing Type="from 3k+1.xml">
<![CDATA[<Book Name="3k+1">
  [...] (content omitted)
</Book>
]]></Listing>

This demonstrates the basics of the markup in XML. This part of the document
is an <Q>element</Q>. It consists of the <Q>start tag</Q>
<C><![CDATA[<Book Name="3k+1">]]></C>, the <Q>element content</Q> and the
<Q>end tag</Q> <C><![CDATA[</Book>]]></C> (end tags always start with
<C>&lt;/</C>). This element also has an <Q>attribute</Q> <C>Name</C> whose
<Q>value</Q> is <C>3k+1</C>.
<P/>

If you know HTML, this will look familiar to you. But there are some
important differences: The element name <C>Book</C> and attribute name
<C>Name</C> are <E>case sensitive</E>. The value of an attribute must
<E>always</E> be enclosed in quotes. In XML <E>every</E> element has a start
and end tag (which can be combined for elements defined as <Q>empty</Q>, see
for example <C>&lt;TableOfContents/&gt;</C> below).
<P/>
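
As a rough illustration of these differences, compare the following variants.
This is a sketch that is not quoted from <F>3k+1.xml</F>; only the element
and attribute names are taken from the example document.

<Listing Type="Sketch: XML versus HTML habits">
<![CDATA[<!-- accepted: exact spelling of the names, attribute value in quotes -->
<Book Name="3k+1">
  [...] (content omitted)
</Book>

<!-- not accepted in XML: <book name=3k+1> (wrong case, value not quoted) -->

<!-- for an empty element the following two forms mean the same in XML -->
<TableOfContents/>
<TableOfContents></TableOfContents>
]]></Listing>
<P/>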

If you know &LaTeX;, you are familiar with quite different
types of markup, for example: The equivalent of the <C>Book</C>
element in &LaTeX; is <C>\begin{document} ... \end{document}</C>.
The sectioning in &LaTeX; is not
done by explicit start and end markup, but implicitly via heading
commands like <C>\section</C>. Other markup is done by using
braces <C>{}</C> and putting some commands inside. And for
mathematical formulae one can use the <C>$</C> for the start
<E>and</E> the end of the markup. In XML <E>all</E> markup looks similar to
that of the <C>Book</C> element. <P/>

The content of the book starts with a title page.

<Listing Type="from 3k+1.xml">
<![CDATA[<TitlePage>
  <Title>The <Package>ThreeKPlusOne</Package> Package</Title>
  <Version>Version 42</Version>
  <Author>Dummy Authör
    <Email>3kplusone@dev.null</Email>
  </Author>

  <Copyright>&copyright; 2000 The Author. <P/>
    You can do with this package what you want.<P/> Really.
  </Copyright>
</TitlePage>
]]></Listing>

The content of the <C>TitlePage</C> element consists again of elements. In
Chapter&nbsp;<Ref Chap="DTD" /> we describe which elements are allowed
within a <C>TitlePage</C> and that their ordering is prescribed in this
case. In the (stupid) name of the author you see that a German umlaut is
used directly (in the UTF-8 encoding declared in the first line of the file).
<P/>

Contrary to &LaTeX;- or HTML-files, this markup does not say anything about
the actual layout of the title page in any output version of the document.
It just adds information about the <E>meaning</E> of pieces of text. <P/>

Within the <C>Copyright</C> element there are two more things to learn about
XML markup. The <C>&lt;P/></C> is a complete element. It is a combined
start and end tag. This shortcut is allowed for elements which are defined
to be always <Q>empty</Q>, i.e., to have no content. You may have already
guessed that <C>&lt;P/></C> is used as a paragraph separator. Note that
empty lines do not separate paragraphs (contrary to &LaTeX;). <P/>
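
A minimal sketch of this behaviour follows. The sentences are made up for
illustration and do not occur in <F>3k+1.xml</F>.

<Listing Type="Sketch: paragraph separation">
<![CDATA[This sentence and

the next one belong to the same paragraph, despite the empty line. <P/>

This sentence starts a new paragraph.
]]></Listing>
<P/>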

The other construct we see here is <C>&amp;copyright;</C>. This is an
example of an <Q>entity</Q> in XML and is a macro for some substitution
text. Here we use an entity as a shortcut for a complicated expression which
ensures that the term <E>copyright</E> is printed as some text
like <C>(C)</C> in text terminal output and as a copyright character in
other output formats. In &GAPDoc; we predefine some entities.
Certain <Q>special characters</Q> must be typed via entities, for example
<Q>&lt;</Q>, <Q>></Q> and <Q>&amp;</Q>, to avoid a misinterpretation as
XML markup. It is possible to define
additional entities for your document inside the <C>&lt;!DOCTYPE ...&gt;</C>
declaration, see&nbsp;<Ref Subsect="GDent" />. <P/>
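
A minimal sketch of such a definition follows. The entity name
<C>ThreeKPlusOne</C> and its replacement text are assumptions chosen for
illustration and are not part of <F>3k+1.xml</F>; the syntax is the standard
XML way of declaring an entity inside the <C>&lt;!DOCTYPE ...&gt;</C>
declaration.

<Listing Type="Sketch: defining and using an entity">
<![CDATA[<!DOCTYPE Book SYSTEM "gapdoc.dtd" [
  <!-- hypothetical shortcut for the marked-up package name -->
  <!ENTITY ThreeKPlusOne "<Package>ThreeKPlusOne</Package>">
]>

<!-- later, inside the text of the document -->
In the &ThreeKPlusOne; documentation, k &lt; n and a &amp; b are typed
like this.
]]></Listing>

With such a declaration each <C>&amp;ThreeKPlusOne;</C> in the text stands
for the marked-up package name, while <C>&amp;lt;</C> and <C>&amp;amp;</C>
yield the literal characters.
<P/>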

Note that elements in XML must always be properly nested, as in this
example. A construct like <C><![CDATA[<a><b>...</a></b>]]></C> is <E>not</E>
allowed.
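For instance, with the elements from the title page above (the second
variant is a deliberately broken sketch and not part of <F>3k+1.xml</F>):

<Listing Type="Sketch: proper and improper nesting">
<![CDATA[<!-- properly nested: Package is closed before Title -->
<Title>The <Package>ThreeKPlusOne</Package> Package</Title>

<!-- not allowed: Title is closed while Package is still open -->
<Title>The <Package>ThreeKPlusOne</Title></Package>
]]></Listing>

The walk through the example document continues with the next line of the
file.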

<Listing Type="from 3k+1.xml">
<![CDATA[<TableOfContents/>
]]></Listing>

This is another example of an <Q>empty element</Q>. It just means that a
table of contents for the whole document should be included in any output
version of the document.
<P/>
After this the main text of the document follows inside certain sectioning
elements:

<Listing Type="from 3k+1.xml">
<![CDATA[<Body>
  <Chapter> <Heading>The <M>3k+1</M> Problem</Heading>
    <Section Label="sec:theory"> <Heading>Theory</Heading>
      [...] (content omitted)
    </Section>
    <Section> <Heading>Program</Heading>
      [...] (content omitted)
    </Section>
  </Chapter>
</Body>
]]></Listing>

These elements are used similarly to <Q>\chapter</Q> and
<Q>\section</Q> in &LaTeX;. But note that the explicit end tags are
necessary here.
<P/>
The sectioning elements accept an optional attribute <Q>Label</Q>.
This can be used for referring to a section inside the document.
<P/>
The text of the first section starts as follows. The whitespace in the text
is unimportant and the indenting is not necessary.

<Listing Type="from 3k+1.xml">

<![CDATA[      Let  <M>k \in  &NN;</M> be  a  natural number.  We consider  the
      sequence <M>n(i, k), i \in &NN;,</M> with <M>n(1, k) = k</M> and
      else
]]></Listing>

Here we come to the interesting question of how to type mathematical formulae
in a &GAPDoc; document. We did not find any alternative to writing formulae
in &TeX; syntax. (There is MathML, but even simple formulae contain a lot
of markup, become quite unreadable and are cumbersome to type.
Furthermore, there seem to be no tools available which translate such
formulae in a nice way into &TeX; and text.) So, formulae are essentially
typed as in &LaTeX;. (Actually, it is also possible to type Unicode
characters of some mathematical symbols directly, or via an entity like the
<C>&amp;NN;</C> above.) There are three types of elements containing
formulae: <Q>M</Q>, <Q>Math</Q> and <Q>Display</Q>. The first two are for
in-text formulae and the third is for displayed formulae. Here <Q>M</Q> and
<Q>Math</Q> are equivalent when translating a &GAPDoc; document into
&LaTeX;. But they are handled differently for terminal text (and HTML)
output. For the content of an <Q>M</Q>-element there are defined rules for a
translation into readable terminal text. More complicated formulae are
in <Q>Math</Q> or <Q>Display</Q> elements and they are just printed as they
are typed in text output. So, to make a section readable inside a
terminal window you should try to put as many formulae as possible into
<Q>M</Q>-elements. In our example text we used the notation <C>n(i, k)</C>
instead of <C>n_i(k)</C> because it is easier to read in text mode. See
Sections&nbsp;<Ref Sect="GDformulae"/> and&nbsp;<Ref Sect="sec:misc" /> for
more details. <P/>
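
The three kinds of formula elements can be sketched as follows. Apart from
<C>n(1, k) = k</C>, the formulae are chosen for illustration only and are not
quoted from <F>3k+1.xml</F>.

<Listing Type="Sketch: formula elements">
<![CDATA[<!-- in-text formula that translates well to terminal text -->
The sequence starts with <M>n(1, k) = k</M>.

<!-- in-text formula that is printed as typed in text output -->
The sum of the first m natural numbers is <Math>\frac{m (m+1)}{2}</Math>.

<!-- displayed formula, also printed as typed in text output -->
<Display>\sum_{i=1}^{m} i = \frac{m (m+1)}{2}</Display>
]]></Listing>
<P/>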

A few lines further on we find two external references.

<Listing Type="from 3k+1.xml">
<![CDATA[      problem, see <Cite Key="Wi98"/> or
      <URL>http://mathsrv.ku-eichstaett.de/MGF/homes/wirsching/</URL>
]]></Listing>

The first, inside the <Q>Cite</Q>-element, is the citation of a book. In
&GAPDoc; we use the widely used &BibTeX; database format for reference
lists. This does not use XML but has a well documented structure which is
easy to parse. And many people have collections of references readily
available in this format. The reference list in an output version of the
document is produced with the empty element

<Listing Type="from 3k+1.xml">
<![CDATA[<Bibliography Databases="3k+1" />
]]></Listing>

close to the end of our example file. The attribute <Q>Databases</Q>
gives the name(s) of the database (<F>.bib</F>) files which contain the
references.
<P/>

Putting a Web-address into a <Q>URL</Q>-element allows one to create a
hyperlink in output formats which allow this.
<P/>

The second section of our example contains a special kind of subsection
defined in &GAPDoc;.

<Listing Type="from 3k+1.xml">
<![CDATA[      <ManSection>
        <Func Name="ThreeKPlusOneSequence" Arg="k[, max]"/>
        <Description>
          This  function computes  for a  natural number  <A>k</A> the
          beginning of the sequence  <M>n(i, k)</M> defined in section
          <Ref Sect="sec:theory"/>.  The sequence  stops at  the first
          <M>1</M>  or at  <M>n(<A>max</A>, k)</M>,  if <A>max</A>  is
          given.
<Example>
gap> ThreeKPlusOneSequence(101);
"Sorry, not yet implemented. Wait for Version 84 of the package"
</Example>
        </Description>
      </ManSection>
]]></Listing>

A <Q>ManSection</Q> contains the description of some function, operation,
method, filter and so on. The <Q>Func</Q>-element describes the name of a
<E>function</E> (there are also similar elements <Q>Oper</Q>, <Q>Meth</Q>,
<Q>Filt</Q> and so on) and names for its arguments, with optional arguments
enclosed in square brackets. See Section&nbsp;<Ref Sect="sec:mansect" /> for
more details. <P/>

In the <Q>Description</Q> we write the argument names as <Q>A</Q>-elements.
A good description of a function should usually contain an example of its
use. For this there are some verbatim-like elements in &GAPDoc;, like
<Q>Example</Q> above (here, clearly, whitespace matters, which causes the
slightly strange indentation). <P/>

The text contains an internal reference to the first section via the
explicitly defined label <C>sec:theory</C>.
<P/>

The first section also contains a <Q>Ref</Q>-element which refers to the
function described here. Note that there is no explicit label for such a
reference. The pair
<C><![CDATA[<Func Name="ThreeKPlusOneSequence" Arg="k[, max]"/>]]></C> and
<C><![CDATA[<Ref Func="ThreeKPlusOneSequence"/>]]></C>
does the cross referencing (and hyperlinking if possible) implicitly via the
name of the function.
<P/>

Here is one further element from our example document which we want to
explain.

<Listing Type="from 3k+1.xml">
<![CDATA[<TheIndex/>
]]></Listing>

This is again an empty element which just says that an output version of the
document should contain an index. Many entries for the index are generated
automatically because the <Q>Func</Q> and similar elements implicitly
produce such entries. It is also possible to include explicit additional
entries in the index.
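<P/>
An explicit entry is written with an <C>Index</C> element placed at the
position in the text to which the entry should point. The following sketch
shows the idea; the entry text is made up for illustration and is not quoted
from <F>3k+1.xml</F>.

<Listing Type="Sketch: explicit index entry">
<![CDATA[<Section Label="sec:theory"> <Heading>Theory</Heading>
<Index>3k+1 problem</Index>
      [...] (content omitted)
</Section>
]]></Listing>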

</Section>


<Section Label="sec:faq"><Heading>Some questions</Heading>

<List>
 <Mark>Are those XML files too ugly to read and edit?</Mark>
 <Item>
  Just have a look and decide for yourself. The markup needs more characters
  than most &TeX; or &LaTeX; markup. But the structure of the document is
  easier to see. If you configure your favorite editor well, you do not need
  more key strokes for typing the markup than in &LaTeX;.
 </Item>

 <Mark>Why do we not use &LaTeX; alone?</Mark>
 <Item>
  &LaTeX; is good for writing books. But &LaTeX; files are generally
  difficult to parse and to process to other output formats like text
  for browsing in a terminal window or HTML (or new formats which may
  become popular in the future). &GAPDoc; markup is one step more
  abstract than &LaTeX; insofar as it describes meaning instead of
  appearance of text. The inner workings of &LaTeX; are too complicated
  to learn without pain, which makes it difficult to overcome problems
  that occur occasionally.
 </Item>

 <Mark>Why XML and not a newly defined markup language?</Mark>
 <Item>
  XML is a well defined standard that is more and more widely used. Lots
  of people have thought about it. Years of experience with SGML went into the
  design. It is easy to explain and easy to parse, and lots of tools are
  available, with more to come in the future.
 </Item>
</List>


</Section>

</Chapter>