==========================
 Docutils_ Hacker's Guide
==========================

:Author: Lea Wiemann
:Contact: docutils-develop@lists.sourceforge.net
:Revision: $Revision: 7302 $
:Date: $Date: 2012-01-03 20:23:53 +0100 (Di, 03. Jän 2012) $
:Copyright: This document has been placed in the public domain.

:Abstract: This is the introduction to Docutils for all persons who
    want to extend Docutils in some way.
:Prerequisites: You have used reStructuredText_ and played around with
    the `Docutils front-end tools`_ before. Some (basic) Python
    knowledge is certainly helpful (though not necessary, strictly
    speaking).

.. _Docutils: http://docutils.sourceforge.net/
.. _reStructuredText: http://docutils.sourceforge.net/rst.html
.. _Docutils front-end tools: ../user/tools.html

.. contents::


Overview of the Docutils Architecture
=====================================

To give you an understanding of the Docutils architecture, we'll dive
right into the internals using a practical example.

Consider the following reStructuredText file::

    My *favorite* language is Python_.

    .. _Python: http://www.python.org/

Using the ``rst2html.py`` front-end tool, you would get an HTML output
which looks like this::

    [uninteresting HTML code removed]
    <body>
    <div class="document">
    <p>My <em>favorite</em> language is <a class="reference" href="http://www.python.org/">Python</a>.</p>
    </div>
    </body>
    </html>

While this looks very simple, it's enough to illustrate all internal
processing stages of Docutils. Let's see how this document is
processed from the reStructuredText source to the final HTML output:


Reading the Document
--------------------

The **Reader** reads the document from the source file and passes it
to the parser (see below).
The default reader is the standalone
reader (``docutils/readers/standalone.py``) which just reads the input
data from a single text file. Unless you want to do really fancy
things, there is no need to change that.

Since you probably won't need to touch readers, we will just move on
to the next stage:


Parsing the Document
--------------------

The **Parser** analyzes the input document and creates a **node
tree** representation. In this case we are using the
**reStructuredText parser** (``docutils/parsers/rst/__init__.py``).
To see what that node tree looks like, we call ``quicktest.py`` (which
can be found in the ``tools/`` directory of the Docutils distribution)
with our example file (``test.txt``) as first parameter (Windows users
might need to type ``python quicktest.py test.txt``)::

    $ quicktest.py test.txt
    <document source="test.txt">
        <paragraph>
            My
            <emphasis>
                favorite
             language is
            <reference name="Python" refname="python">
                Python
            .
        <target ids="python" names="python" refuri="http://www.python.org/">

Let us now examine the node tree:

The top-level node is ``document``. It has a ``source`` attribute
whose value is ``test.txt``. There are two children: a ``paragraph``
node and a ``target`` node. The ``paragraph`` in turn has five
children: a text node ("My "), an ``emphasis`` node, a text node
(" language is "), a ``reference`` node, and another text node (".").

These node types (``document``, ``paragraph``, ``emphasis``, etc.) are
all defined in ``docutils/nodes.py``. The node types are internally
arranged as a class hierarchy (for example, both ``emphasis`` and
``reference`` have the common superclass ``Inline``). To get an
overview of the node class hierarchy, use epydoc (type ``epydoc
nodes.py``) and look at the class hierarchy tree.
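
The class relationships mentioned above can also be checked directly
from Python. The following is only a small sketch, assuming that
Docutils is importable; the variable names are made up for
illustration and are not part of Docutils::

    from docutils import nodes

    # Both inline node types named above derive from nodes.Inline.
    emphasis_is_inline = issubclass(nodes.emphasis, nodes.Inline)
    reference_is_inline = issubclass(nodes.reference, nodes.Inline)
    print(emphasis_is_inline, reference_is_inline)

    # List every base class of the emphasis node type, most specific first.
    for cls in nodes.emphasis.__mro__:
        print(cls.__name__)

This only inspects the classes; it does not parse any document.
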


Transforming the Document
-------------------------

In the node tree above, the ``reference`` node does not contain the
target URI (``http://www.python.org/``) yet.

Assigning the target URI (from the ``target`` node) to the
``reference`` node is *not* done by the parser (the parser only
translates the input document into a node tree).

Instead, it's done by a **Transform**. In this case (resolving a
reference), it's done by the ``ExternalTargets`` transform in
``docutils/transforms/references.py``.

In fact, there are quite a lot of Transforms, which do various useful
things like creating the table of contents, applying substitution
references or resolving auto-numbered footnotes.

The Transforms are applied after parsing. To see how the node tree
has changed after applying the Transforms, we use the
``rst2pseudoxml.py`` tool:

.. parsed-literal::

    $ rst2pseudoxml.py test.txt
    <document source="test.txt">
        <paragraph>
            My
            <emphasis>
                favorite
             language is
            <reference name="Python" **refuri="http://www.python.org/"**>
                Python
            .
        <target ids="python" names="python" ``refuri="http://www.python.org/"``>

For our small test document, the only change is that the ``refname``
attribute of the reference has been replaced by a ``refuri``
attribute |---| the reference has been resolved.

While this does not look very exciting, Transforms are a powerful tool
for applying any kind of transformation to the node tree.

By the way, you can also get a "real" XML representation of the node
tree by using ``rst2xml.py`` instead of ``rst2pseudoxml.py``.


Writing the Document
--------------------

To get an HTML document out of the node tree, we use a **Writer**, the
HTML writer in this case (``docutils/writers/html4css1.py``).

The writer receives the node tree and returns the output document.
For HTML output, we can test this using the ``rst2html.py`` tool::

    $ rst2html.py --link-stylesheet test.txt
    <?xml version="1.0" encoding="utf-8" ?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta name="generator" content="Docutils 0.3.10: http://docutils.sourceforge.net/" />
    <title></title>
    <link rel="stylesheet" href="../docutils/writers/html4css1/html4css1.css" type="text/css" />
    </head>
    <body>
    <div class="document">
    <p>My <em>favorite</em> language is <a class="reference" href="http://www.python.org/">Python</a>.</p>
    </div>
    </body>
    </html>

So here we finally have our HTML output. The actual document contents
are in the fourth-last line. Note, by the way, that the HTML writer
did not render the (invisible) ``target`` node |---| only the
``paragraph`` node and its children appear in the HTML output.


Extending Docutils
==================

You may now ask, "How do I actually extend Docutils?"

First of all, once you are clear about *what* you want to achieve, you
have to decide *where* to implement it |---| in the Parser (e.g. by
adding a directive or role to the reStructuredText parser), as a
Transform, or in the Writer. There is often one obvious choice among
those three (Parser, Transform, Writer). If you are unsure, ask on
the Docutils-develop_ mailing list.

In order to find out how to start, it is often helpful to look at
similar features which are already implemented. For example, if you
want to add a new directive to the reStructuredText parser, look at
the implementation of a similar directive in
``docutils/parsers/rst/directives/``.
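
To give a rough idea of what the directive case can look like, here is
a minimal sketch. It assumes a Docutils version that provides the
``Directive`` base class in ``docutils.parsers.rst``. The directive
name ``shout`` and the class ``Shout`` are hypothetical and exist only
for this illustration; they are not part of Docutils::

    from docutils import nodes
    from docutils.core import publish_doctree
    from docutils.parsers.rst import Directive, directives


    class Shout(Directive):
        """Hypothetical directive: emit its content as an upper-case paragraph."""

        has_content = True

        def run(self):
            # self.content holds the lines of the directive body.
            text = " ".join(self.content).upper()
            # A directive returns a list of nodes to insert into the tree.
            return [nodes.paragraph(text, text)]


    # Make the directive available under the (made-up) name "shout".
    directives.register_directive("shout", Shout)

    source = ".. shout::\n\n   hello docutils\n"
    doctree = publish_doctree(source)
    print(doctree.pformat())

Note that ``register_directive`` changes global state: the directive
stays registered for every document parsed later in the same process.
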


Modifying the Document Tree Before It Is Written
------------------------------------------------

You can modify the document tree right before the writer is called.
One possibility is to use the publish_doctree_ and
publish_from_doctree_ functions.

To retrieve the document tree, call::

    document = docutils.core.publish_doctree(...)

Please see the docstring of ``publish_doctree`` for a list of
parameters.

.. XXX Need to write a well-readable list of (commonly used) options
   of the publish_* functions. Probably in api/publisher.txt.

``document`` is the root node of the document tree. You can now
change the document by accessing the ``document`` node and its
children |---| see `The Node Interface`_ below.

When you're done modifying the document tree, you can write it
out by calling::

    output = docutils.core.publish_from_doctree(document, ...)

.. _publish_doctree: ../api/publisher.html#publish_doctree
.. _publish_from_doctree: ../api/publisher.html#publish_from_doctree


The Node Interface
------------------

As described in the overview above, Docutils' internal representation
of a document is a tree of nodes. We'll now have a look at the
interface of these nodes.

(To be completed.)


What Now?
=========

This document is not complete. Many topics could (and should) be
covered here. To find out which topics we should write about
first, we are awaiting *your* feedback. So please ask your questions
on the Docutils-develop_ mailing list.


.. _Docutils-develop: ../user/mailing-lists.html#docutils-develop


.. |---| unicode:: 8212 .. em-dash
   :trim:


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: