==========================
 Docutils_ Hacker's Guide
==========================

:Author: Lea Wiemann
:Contact: docutils-develop@lists.sourceforge.net
:Revision: $Revision: 7302 $
:Date: $Date: 2012-01-03 20:23:53 +0100 (Di, 03. Jän 2012) $
:Copyright: This document has been placed in the public domain.

:Abstract: This is the introduction to Docutils for everyone who wants
    to extend Docutils in some way.
:Prerequisites: You have used reStructuredText_ and played around with
    the `Docutils front-end tools`_ before.  Some (basic) Python
    knowledge is certainly helpful (though not necessary, strictly
    speaking).

.. _Docutils: http://docutils.sourceforge.net/
.. _reStructuredText: http://docutils.sourceforge.net/rst.html
.. _Docutils front-end tools: ../user/tools.html

.. contents::


Overview of the Docutils Architecture
=====================================

To give you an understanding of the Docutils architecture, we'll dive
right into the internals using a practical example.

Consider the following reStructuredText file::

    My *favorite* language is Python_.

    .. _Python: http://www.python.org/

Using the ``rst2html.py`` front-end tool, you would get HTML output
that looks like this::

    [uninteresting HTML code removed]
    <body>
    <div class="document">
    <p>My <em>favorite</em> language is <a class="reference" href="http://www.python.org/">Python</a>.</p>
    </div>
    </body>
    </html>

While this looks very simple, it's enough to illustrate all internal
processing stages of Docutils.  Let's see how this document is
processed from the reStructuredText source to the final HTML output:


Reading the Document
--------------------

The **Reader** reads the document from the source file and passes it
to the parser (see below).  The default reader is the standalone
reader (``docutils/readers/standalone.py``), which just reads the
input data from a single text file.  Unless you want to do really
fancy things, there is no need to change that.

Since you probably won't need to touch readers, we will just move on
to the next stage:


Parsing the Document
--------------------

The **Parser** analyzes the input document and creates a **node
tree** representation.  In this case we are using the
**reStructuredText parser** (``docutils/parsers/rst/__init__.py``).
To see what that node tree looks like, we call ``quicktest.py`` (which
can be found in the ``tools/`` directory of the Docutils distribution)
with our example file (``test.txt``) as the first parameter (Windows
users might need to type ``python quicktest.py test.txt``)::

    $ quicktest.py test.txt
    <document source="test.txt">
        <paragraph>
            My
            <emphasis>
                favorite
             language is
            <reference name="Python" refname="python">
                Python
            .
        <target ids="python" names="python" refuri="http://www.python.org/">

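If you would rather stay inside Python, a roughly equivalent parse-only tree can be built by calling the reStructuredText parser class directly.  This is a minimal sketch, not necessarily how ``quicktest.py`` does it: ``default_settings`` and ``parse_only`` are hypothetical helper names, and the settings code assumes a Docutils release that provides either ``frontend.get_default_settings`` (newer) or ``frontend.OptionParser`` (older).

```python
from docutils import frontend, utils
from docutils.parsers.rst import Parser

SOURCE = """\
My *favorite* language is Python_.

.. _Python: http://www.python.org/
"""


def default_settings():
    """Hypothetical helper: default settings including the rST parser's options."""
    try:
        # Available in newer Docutils releases.
        return frontend.get_default_settings(Parser)
    except AttributeError:
        # Older releases: build the settings through OptionParser instead.
        return frontend.OptionParser(components=(Parser,)).get_default_values()


def parse_only(text, source_path="test.txt"):
    """Hypothetical helper: parse rST into a node tree WITHOUT applying transforms."""
    document = utils.new_document(source_path, default_settings())
    Parser().parse(text, document)
    return document


tree = parse_only(SOURCE)
print(tree.pformat())  # pseudo-XML, like the quicktest.py output above
```

Because no transforms run here, the ``reference`` node should still carry ``refname`` rather than ``refuri``.
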
Let us now examine the node tree:

The top-level node is ``document``.  It has a ``source`` attribute
whose value is ``test.txt``.  There are two children: a ``paragraph``
node and a ``target`` node.  The ``paragraph`` in turn has five
children: a ``Text`` node ("My "), an ``emphasis`` node, a ``Text``
node (" language is "), a ``reference`` node, and again a ``Text``
node (".").

These node types (``document``, ``paragraph``, ``emphasis``, etc.) are
all defined in ``docutils/nodes.py``.  The node types are internally
arranged as a class hierarchy (for example, both ``emphasis`` and
``reference`` have the common superclass ``Inline``).  To get an
overview of the node class hierarchy, use epydoc (type ``epydoc
nodes.py``) and look at the class hierarchy tree.


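If epydoc is not at hand, Python's own introspection gives a quick look at the same hierarchy.  The sketch below only assumes the node class names mentioned above; ``base_class_names`` is a hypothetical helper.

```python
from docutils import nodes


def base_class_names(node_class):
    """Hypothetical helper: names of node_class and its bases, in method resolution order."""
    return [cls.__name__ for cls in node_class.__mro__]


for node_class in (nodes.emphasis, nodes.reference):
    # Each line lists the class followed by all of its superclasses.
    print(node_class.__name__, "->", base_class_names(node_class))

# Both inline node types share the Inline superclass.
print(issubclass(nodes.emphasis, nodes.Inline))   # True
print(issubclass(nodes.reference, nodes.Inline))  # True
```
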
Transforming the Document
-------------------------

In the node tree above, the ``reference`` node does not contain the
target URI (``http://www.python.org/``) yet.

Assigning the target URI (from the ``target`` node) to the
``reference`` node is *not* done by the parser (the parser only
translates the input document into a node tree).

Instead, it's done by a **Transform**.  In this case (resolving a
reference), it's done by the ``ExternalTargets`` transform in
``docutils/transforms/references.py``.

In fact, there are quite a lot of Transforms, which do various useful
things like creating the table of contents, applying substitution
references or resolving auto-numbered footnotes.

The Transforms are applied after parsing.  To see how the node tree
has changed after applying the Transforms, we use the
``rst2pseudoxml.py`` tool:

.. parsed-literal::

    $ rst2pseudoxml.py test.txt
    <document source="test.txt">
        <paragraph>
            My
            <emphasis>
                favorite
             language is
            <reference name="Python" **refuri="http://www.python.org/"**>
                Python
            .
        <target ids="python" names="python" ``refuri="http://www.python.org/"``>

For our small test document, the only change is that the ``refname``
attribute of the reference has been replaced by a ``refuri``
attribute |---| the reference has been resolved.

While this does not look very exciting, transforms are a powerful tool
for applying any kind of transformation to the node tree.

By the way, you can also get a "real" XML representation of the node
tree by using ``rst2xml.py`` instead of ``rst2pseudoxml.py``.


Writing the Document
--------------------

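The same before-and-after comparison can be made from Python.  ``docutils.core.publish_doctree`` runs the reader, the parser and the transforms, so the ``reference`` node it returns should already carry ``refuri``.  This is a minimal sketch; ``find_nodes`` is a hypothetical helper that only papers over the rename of ``Node.traverse`` to ``Node.findall`` in newer Docutils releases.

```python
from docutils import nodes
from docutils.core import publish_doctree

SOURCE = """\
My *favorite* language is Python_.

.. _Python: http://www.python.org/
"""


def find_nodes(tree, node_class):
    """Hypothetical helper: all nodes of node_class (findall() if present, else traverse())."""
    finder = getattr(tree, "findall", None) or tree.traverse
    return list(finder(node_class))


doctree = publish_doctree(SOURCE)  # parses AND applies the transforms
reference = find_nodes(doctree, nodes.reference)[0]

print(reference["refuri"])                # http://www.python.org/
print("refname" in reference.attributes)  # False: ExternalTargets removed it
print(doctree.pformat())                  # compare with the rst2pseudoxml.py output
```
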
To get an HTML document out of the node tree, we use a **Writer**, the
HTML writer in this case (``docutils/writers/html4css1/__init__.py``).

The writer receives the node tree and returns the output document.
For HTML output, we can test this using the ``rst2html.py`` tool::

    $ rst2html.py --link-stylesheet test.txt
    <?xml version="1.0" encoding="utf-8" ?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta name="generator" content="Docutils 0.3.10: http://docutils.sourceforge.net/" />
    <title></title>
    <link rel="stylesheet" href="../docutils/writers/html4css1/html4css1.css" type="text/css" />
    </head>
    <body>
    <div class="document">
    <p>My <em>favorite</em> language is <a class="reference" href="http://www.python.org/">Python</a>.</p>
    </div>
    </body>
    </html>

So here we finally have our HTML output.  The actual document contents
are in the fourth-last line.  Note, by the way, that the HTML writer
did not render the (invisible) ``target`` node |---| only the
``paragraph`` node and its children appear in the HTML output.


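From Python, the writer stage is usually reached through the ``docutils.core`` publisher functions.  As a sketch, ``publish_parts`` with the HTML writer returns the generated document split into named pieces.  The ``"body"`` part should correspond to the ``<p>`` line shown above.  Exact markup, such as the ``class`` value on the link, varies between Docutils versions, so treat the output as indicative only.

```python
from docutils.core import publish_parts

SOURCE = """\
My *favorite* language is Python_.

.. _Python: http://www.python.org/
"""

# "html" selects Docutils' default HTML writer.
parts = publish_parts(SOURCE, writer_name="html")

# "body" holds only the converted document content (no <head>, no <body> tag).
body = parts["body"]
print(body)
```
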
Extending Docutils
==================

Now you may ask, "How do I actually extend Docutils?"

First of all, once you are clear about *what* you want to achieve, you
have to decide *where* to implement it |---| in the Parser (e.g. by
adding a directive or role to the reStructuredText parser), as a
Transform, or in the Writer.  There is often one obvious choice among
those three (Parser, Transform, Writer).  If you are unsure, ask on
the Docutils-develop_ mailing list.

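To give an idea of the Transform route, here is a minimal sketch of a hypothetical transform, ``HighlightEmphasis``, that adds a ``highlight`` class to every ``emphasis`` node.  It is hooked in by subclassing the standalone reader and extending its ``get_transforms()`` list, which is one way of registering extra transforms.  The class names and the priority value are assumptions chosen for illustration.

```python
from docutils import nodes
from docutils.core import publish_doctree
from docutils.readers import standalone
from docutils.transforms import Transform


class HighlightEmphasis(Transform):
    """Hypothetical transform: tag every emphasis node with the class "highlight"."""

    default_priority = 900  # arbitrary; transforms are applied in priority order

    def apply(self):
        # findall() exists in newer Docutils, traverse() in older releases.
        finder = getattr(self.document, "findall", None) or self.document.traverse
        for node in list(finder(nodes.emphasis)):
            node["classes"].append("highlight")


class HighlightReader(standalone.Reader):
    """Hypothetical reader: the standalone reader plus HighlightEmphasis."""

    def get_transforms(self):
        return super().get_transforms() + [HighlightEmphasis]


doctree = publish_doctree("My *favorite* language.", reader=HighlightReader())
print(doctree.pformat())
```
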
To find out how to start, it is often helpful to look at similar
features that are already implemented.  For example, if you want to
add a new directive to the reStructuredText parser, look at the
implementation of a similar directive in
``docutils/parsers/rst/directives/``.


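For orientation, a new directive can be quite small.  The sketch below defines a hypothetical ``note-to-self`` directive that parses its body as ordinary reStructuredText and wraps the result in a ``container`` node with an extra class.  It relies on the ``Directive`` base class and ``directives.register_directive()``.  The directive name, the class name and the CSS class are all made up for this example.

```python
from docutils import nodes
from docutils.core import publish_doctree
from docutils.parsers.rst import Directive, directives


class NoteToSelf(Directive):
    """Hypothetical directive: wrap the parsed body in a classed container."""

    has_content = True  # the directive accepts an indented body

    def run(self):
        container = nodes.container(classes=["note-to-self"])
        # Parse the directive body as normal reStructuredText into the container.
        self.state.nested_parse(self.content, self.content_offset, container)
        return [container]


# Make the (hypothetical) directive known to the reStructuredText parser.
directives.register_directive("note-to-self", NoteToSelf)

SOURCE = """\
.. note-to-self::

   Remember *this*.
"""

doctree = publish_doctree(SOURCE)
print(doctree.pformat())
```
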
Modifying the Document Tree Before It Is Written
------------------------------------------------

You can modify the document tree right before the writer is called.
One possibility is to use the publish_doctree_ and
publish_from_doctree_ functions.

To retrieve the document tree, call::

    document = docutils.core.publish_doctree(...)

Please see the docstring of ``publish_doctree`` for a list of
parameters.

.. XXX Need to write a well-readable list of (commonly used) options
   of the publish_* functions.  Probably in api/publisher.txt.

``document`` is the root node of the document tree.  You can now
change the document by accessing the ``document`` node and its
children |---| see `The Node Interface`_ below.

When you're done modifying the document tree, you can write it out
by calling::

    output = docutils.core.publish_from_doctree(document, ...)

.. _publish_doctree: ../api/publisher.html#publish_doctree
.. _publish_from_doctree: ../api/publisher.html#publish_from_doctree


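Putting the two calls together, a minimal round trip might look like the sketch below.  The modification itself, rewriting ``http://`` link targets to ``https://``, is only an illustrative assumption.  By default ``publish_from_doctree`` returns encoded bytes (UTF-8), so the sketch decodes them when necessary.

```python
from docutils import nodes
from docutils.core import publish_doctree, publish_from_doctree

SOURCE = """\
My *favorite* language is Python_.

.. _Python: http://www.python.org/
"""

document = publish_doctree(SOURCE)

# Walk the tree: findall() in newer Docutils, traverse() in older releases.
finder = getattr(document, "findall", None) or document.traverse
for ref in list(finder(nodes.reference)):
    uri = ref.get("refuri", "")
    if uri.startswith("http://"):
        # Illustrative change only: point the link at the https:// URL.
        ref["refuri"] = "https://" + uri[len("http://"):]

output = publish_from_doctree(document, writer_name="html")
if isinstance(output, bytes):  # default output encoding is UTF-8
    output = output.decode("utf-8")
print(output)
```
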
The Node Interface
------------------

As described in the overview above, Docutils' internal representation
of a document is a tree of nodes.  We'll now have a look at the
interface of these nodes.

(To be completed.)


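As a starting point, the sketch below shows the node operations used most often: building a small tree by hand, appending children, reading and setting attributes, navigating to the parent, and extracting plain text.  It only uses classes from ``docutils.nodes`` that appear in the example document above.  The content of the paragraph is made up.

```python
from docutils import nodes

# Build <paragraph> "My " <emphasis>"favorite" " language is Python." by hand.
para = nodes.paragraph()
para += nodes.Text("My ")                   # += appends a child node
emphasis = nodes.emphasis(text="favorite")  # TextElement wraps the text in a Text node
para += emphasis
para.append(nodes.Text(" language is Python."))

# Attributes behave like a dictionary; "classes" starts out as an empty list.
para["classes"].append("example")

print(len(para.children))       # 3
print(emphasis.parent is para)  # True: appending sets the child's parent
print(para.astext())            # My favorite language is Python.
print(para.pformat())           # pseudo-XML view of this subtree
```
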
What Now?
=========

This document is not complete.  Many topics could (and should) be
covered here.  To find out which topics we should write about first,
we are awaiting *your* feedback.  So please ask your questions on the
Docutils-develop_ mailing list.


.. _Docutils-develop: ../user/mailing-lists.html#docutils-develop


.. |---| unicode:: 8212 .. em-dash
   :trim:


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End:
