1==============================
2 Problems With StructuredText
3==============================
4:Author: David Goodger
5:Contact: docutils-develop@lists.sourceforge.net
6:Revision: $Revision: 7302 $
7:Date: $Date: 2012-01-03 20:23:53 +0100 (Di, 03. Jän 2012) $
8:Copyright: This document has been placed in the public domain.
9
10There are several problems, unresolved issues, and areas of
11controversy within StructuredText_ (Classic and Next Generation).  In
12order to resolve all these issues, this analysis brings all of the
13issues out into the open, enumerates all the alternatives, and
14proposes solutions to be incorporated into the reStructuredText_
15specification.
16
17
18.. contents::
19
20
21Formal Specification
22====================
23
24The description in the original StructuredText.py has been criticized
25for being vague.  For practical purposes, "the code *is* the spec."
26Tony Ibbs has been working on deducing a `detailed description`_ from
27the documentation and code of StructuredTextNG_.  Edward Loper's
28STMinus_ is another attempt to formalize a spec.
29
30For this kind of a project, the specification should always precede
31the code.  Otherwise, the markup is a moving target which can never be
32adopted as a standard.  Of course, a specification may be revised
33during lifetime of the code, but without a spec there is no visible
34control and thus no confidence.
35
36
37Understanding and Extending the Code
38====================================
39
40The original StructuredText_ is a dense mass of sparsely commented
41code and inscrutable regular expressions.  It was not designed to be
42extended and is very difficult to understand.  StructuredTextNG_ has
43been designed to allow input (syntax) and output extensions, but its
44documentation (both internal [comments & docstrings], and external) is
45inadequate for the complexity of the code itself.
46
47For reStructuredText to become truly useful, perhaps even part of
48Python's standard library, it must have clear, understandable
49documentation and implementation code.  For the implementation of
50reStructuredText to be taken seriously, it must be a sterling example
51of the potential of docstrings; the implementation must practice what
52the specification preaches.
53
54
55Section Structure via Indentation
56=================================
57
58Setext_ required that body text be indented by 2 spaces.  The original
59StructuredText_ and StructuredTextNG_ require that section structure
60be indicated through indentation, as "inspired by Python".  For
61certain structures with a very limited, local extent (such as lists,
62block quotes, and literal blocks), indentation naturally indicates
63structure or hierarchy.  For sections (which may have a very large
64extent), structure via indentation is unnecessary, unnatural and
65ambiguous.  Rather, the syntax of the section title *itself* should
66indicate that it is a section title.
67
68The original StructuredText states that "A single-line paragraph whose
69immediately succeeding paragraphs are lower level is treated as a
70header." Requiring indentation in this way is:
71
72- Unnecessary.  The vast majority of docstrings and standalone
73  documents will have no more than one level of section structure.
74  Requiring indentation for such docstrings is unnecessary and
75  irritating.
76
77- Unnatural.  Most published works use title style (type size, face,
78  weight, and position) and/or section/subsection numbering rather
79  than indentation to indicate hierarchy.  This is a tradition with a
80  very long history.
81
82- Ambiguous.  A StructuredText header is indistinguishable from a
83  one-line paragraph followed by a block quote (precluding the use of
84  block quotes).  Enumerated section titles are ambiguous (is it a
85  header? is it a list item?).  Some additional adornment must be
86  required to confirm the line's role as a title, both to a parser and
87  to the human reader of the source text.
88
89Python's use of significant whitespace is a wonderful (if not
90original) innovation, however requiring indentation in ordinary
91written text is hypergeneralization.
92
93reStructuredText_ indicates section structure through title adornment
94style (as exemplified by this document).  This is far more natural.
95In fact, it is already in widespread use in plain text documents,
96including in Python's standard distribution (such as the toplevel
97README_ file).
98
99
100Character Escaping Mechanism
101============================
102
103No matter what characters are chosen for markup, some day someone will
104want to write documentation *about* that markup or using markup
105characters in a non-markup context.  Therefore, any complete markup
106language must have an escaping or encoding mechanism.  For a
107lightweight markup system, encoding mechanisms like SGML/XML's '*'
108are out.  So an escaping mechanism is in.  However, with carefully
109chosen markup, it should be necessary to use the escaping mechanism
110only infrequently.
111
112reStructuredText_ needs an escaping mechanism: a way to treat
113markup-significant characters as the characters themselves.  Currently
114there is no such mechanism (although ZWiki uses '!').  What are the
115candidates?
116
1171. ``!``
118   (http://www.zope.org/DevHome/Members/jim/StructuredTextWiki/NGEscaping)
1192. ``\``
1203. ``~``
1214. doubling of characters
122
123The best choice for this is the backslash (``\``).  It's "the single
124most popular escaping character in the world!", therefore familiar and
125unsurprising.  Since characters only need to be escaped under special
126circumstances, which are typically those explaining technical
127programming issues, the use of the backslash is natural and
128understandable.  Python docstrings can be raw (prefixed with an 'r',
129as in 'r""'), which would obviate the need for gratuitous doubling-up
130of backslashes.
131
132(On 2001-03-29 on the Doc-SIG mailing list, GvR endorsed backslash
133escapes, saying, "'nuff said.  Backslash it is." Although neither
134legally binding nor irrevocable nor any kind of guarantee of anything,
135it is a good sign.)
136
137The rule would be: An unescaped backslash followed by any markup
138character escapes the character.  The escaped character represents the
139character itself, and is prevented from playing a role in any markup
140interpretation.  The backslash is removed from the output.  A literal
141backslash is represented by an "escaped backslash," two backslashes in
142a row.
143
144A carefully constructed set of recognition rules for inline markup
145will obviate the need for backslash-escapes in almost all cases; see
146`Delimitation of Inline Markup`_ below.
147
148When an expression (requiring backslashes and other characters used
149for markup) becomes too complicated and therefore unreadable, a
150literal block may be used instead.  Inside literal blocks, no markup
151is recognized, therefore backslashes (for the purpose of escaping
152markup) become unnecessary.
153
154We could allow backslashes preceding non-markup characters to remain
155in the output.  This would make describing regular expressions and
156other uses of backslashes easier.  However, this would complicate the
157markup rules and would be confusing.
158
159
160Blank Lines in Lists
161====================
162
163Oft-requested in Doc-SIG (the earliest reference is dated 1996-08-13)
164is the ability to write lists without requiring blank lines between
165items.  In docstrings, space is at a premium.  Authors want to convey
166their API or usage information in as compact a form as possible.
167StructuredText_ requires blank lines between all body elements,
168including list items, even when boundaries are obvious from the markup
169itself.
170
171In reStructuredText, blank lines are optional between list items.
172However, in order to eliminate ambiguity, a blank line is required
173before the first list item and after the last.  Nested lists also
174require blank lines before the list start and after the list end.
175
176
177Bullet List Markup
178==================
179
180StructuredText_ includes 'o' as a bullet character.  This is dangerous
181and counter to the language-independent nature of the markup.  There
182are many languages in which 'o' is a word.  For example, in Spanish::
183
184    Llamame a la casa
185    o al trabajo.
186
187    (Call me at home or at work.)
188
189And in Japanese (when romanized)::
190
191    Senshuu no doyoubi ni tegami
192    o kakimashita.
193
194    ([I] wrote a letter on Saturday last week.)
195
196If a paragraph containing an 'o' word wraps such that the 'o' is the
197first text on a line, or if a paragraph begins with such a word, it
198could be misinterpreted as a bullet list.
199
200In reStructuredText_, 'o' is not used as a bullet character.  '-',
201'*', and '+' are the possible bullet characters.
202
203
204Enumerated List Markup
205======================
206
207StructuredText enumerated lists are allowed to begin with numbers and
208letters followed by a period or right-parenthesis, then whitespace.
209This has surprising consequences for writing styles.  For example,
210this is recognized as an enumerated list item by StructuredText::
211
212    Mr. Creosote.
213
214People will write enumerated lists in all different ways.  It is folly
215to try to come up with the "perfect" format for an enumerated list,
216and limit the docstring parser's recognition to that one format only.
217
218Rather, the parser should recognize a variety of enumerator styles.
219It is also recommended that the enumerator of the first list item be
220ordinal-1 ('1', 'A', 'a', 'I', or 'i'), as output formats may not be
221able to begin a list at an arbitrary enumeration.
222
223An initial idea was to require two or more consistent enumerated list
224items in a row.  This idea proved impractical and was dropped.  In
225practice, the presence of a proper enumerator is enough to reliably
226recognize an enumerated list item; any ambiguities are reported by the
227parser.  Here's the original idea for posterity:
228
229    The parser should recognize a variety of enumerator styles, mark
230    each block as a potential enumerated list item (PELI), and
231    interpret the enumerators of adjacent PELIs to decide whether they
232    make up a consistent enumerated list.
233
234    If a PELI is labeled with a "1.", and is immediately followed by a
235    PELI labeled with a "2.", we've got an enumerated list.  Or "(A)"
236    followed by "(B)".  Or "i)" followed by "ii)", etc.  The chances
237    of accidentally recognizing two adjacent and consistently labeled
238    PELIs, are acceptably small.
239
240    For an enumerated list to be recognized, the following must be
241    true:
242
243    - the list must consist of multiple adjacent list items (2 or
244      more)
245    - the enumerators must all have the same format
246    - the enumerators must be sequential
247
248
249Definition List Markup
250======================
251
252StructuredText uses ' -- ' (whitespace, two hyphens, whitespace) on
253the first line of a paragraph to indicate a definition list item.  The
254' -- ' serves to separate the term (on the left) from the definition
255(on the right).
256
257Many people use ' -- ' as an em-dash in their text, conflicting with
258the StructuredText usage.  Although the Chicago Manual of Style says
259that spaces should not be used around an em-dash, Peter Funk pointed
260out that this is standard usage in German (according to the Duden, the
261official German reference), and possibly in other languages as well.
262The widespread use of ' -- ' precludes its use for definition lists;
263it would violate the "unsurprising" criterion.
264
265A simpler, and at least equally visually distinctive construct
266(proposed by Guido van Rossum, who incidentally is a frequent user of
267' -- ') would do just as well::
268
269    term 1
270        Definition.
271
272    term 2
273        Definition 2, paragraph 1.
274
275        Definition 2, paragraph 2.
276
277A reStructuredText definition list item consists of a term and a
278definition.  A term is a simple one-line paragraph.  A definition is a
279block indented relative to the term, and may contain multiple
280paragraphs and other body elements.  No blank line precedes a
281definition (this distinguishes definition lists from block quotes).
282
283
284Literal Blocks
285==============
286
287The StructuredText_ specification has literal blocks indicated by
288'example', 'examples', or '::' ending the preceding paragraph.  STNG
289only recognizes '::'; 'example'/'examples' are not implemented.  This
290is good; it fixes an unnecessary language dependency.  The problem is
291what to do with the sometimes- unwanted '::'.
292
293In reStructuredText_ '::' at the end of a paragraph indicates that
294subsequent *indented* blocks are treated as literal text.  No further
295markup interpretation is done within literal blocks (not even
296backslash-escapes).  If the '::' is preceded by whitespace, '::' is
297omitted from the output; if '::' was the sole content of a paragraph,
298the entire paragraph is removed (no 'empty' paragraph remains).  If
299'::' is preceded by a non-whitespace character, '::' is replaced by
300':' (i.e., the extra colon is removed).
301
302Thus, a section could begin with a literal block as follows::
303
304    Section Title
305    -------------
306
307    ::
308
309        print "this is example literal"
310
311
312Tables
313======
314
315The table markup scheme in classic StructuredText was horrible.  Its
316omission from StructuredTextNG is welcome, and its markup will not be
317repeated here.  However, tables themselves are useful in
318documentation.  Alternatives:
319
3201. This format is the most natural and obvious.  It was independently
321   invented (no great feat of creation!), and later found to be the
322   format supported by the `Emacs table mode`_::
323
324       +------------+------------+------------+--------------+
325       |  Header 1  |  Header 2  |  Header 3  |  Header 4    |
326       +============+============+============+==============+
327       |  Column 1  |  Column 2  | Column 3 & 4 span (Row 1) |
328       +------------+------------+------------+--------------+
329       |    Column 1 & 2 span    |  Column 3  | - Column 4   |
330       +------------+------------+------------+ - Row 2 & 3  |
331       |      1     |      2     |      3     | - span       |
332       +------------+------------+------------+--------------+
333
334   Tables are described with a visual outline made up of the
335   characters '-', '=', '|', and '+':
336
337   - The hyphen ('-') is used for horizontal lines (row separators).
338   - The equals sign ('=') is optionally used as a header separator
339     (as of version 1.5.24, this is not supported by the Emacs table
340     mode).
341   - The vertical bar ('|') is used for for vertical lines (column
342     separators).
343   - The plus sign ('+') is used for intersections of horizontal and
344     vertical lines.
345
346   Row and column spans are possible simply by omitting the column or
347   row separators, respectively.  The header row separator must be
348   complete; in other words, a header cell may not span into the table
349   body.  Each cell contains body elements, and may have multiple
350   paragraphs, lists, etc.  Initial spaces for a left margin are
351   allowed; the first line of text in a cell determines its left
352   margin.
353
3542. Below is a simpler table structure.  It may be better suited to
355   manual input than alternative #1, but there is no Emacs editing
356   mode available.  One disadvantage is that it resembles section
357   titles; a one-column table would look exactly like section &
358   subsection titles. ::
359
360       ============ ============ ============ ==============
361         Header 1     Header 2     Header 3     Header 4
362       ============ ============ ============ ==============
363         Column 1     Column 2    Column 3 & 4 span (Row 1)
364       ------------ ------------ ---------------------------
365           Column 1 & 2 span       Column 3    - Column 4
366       ------------------------- ------------  - Row 2 & 3
367             1            2            3       - span
368       ============ ============ ============ ==============
369
370   The table begins with a top border of equals signs with a space at
371   each column boundary (regardless of spans).  Each row is
372   underlined.  Internal row separators are underlines of '-', with
373   spaces at column boundaries.  The last of the optional head rows is
374   underlined with '=', again with spaces at column boundaries.
375   Column spans have no spaces in their underline.  Row spans simply
376   lack an underline at the row boundary.  The bottom boundary of the
377   table consists of '=' underlines.  A blank line is required
378   following a table.
379
3803. A minimalist alternative is as follows::
381
382       ====  =====  ========  ========  =======  ====  =====  =====
383       Old State    Input     Action             New State    Notes
384       -----------  --------  -----------------  -----------
385       ids   types  new type  sys.msg.  dupname  ids   types
386       ====  =====  ========  ========  =======  ====  =====  =====
387       --    --     explicit  --        --       new   True
388       --    --     implicit  --        --       new   False
389       None  False  explicit  --        --       new   True
390       old   False  explicit  implicit  old      new   True
391       None  True   explicit  explicit  new      None  True
392       old   True   explicit  explicit  new,old  None  True   [1]
393       None  False  implicit  implicit  new      None  False
394       old   False  implicit  implicit  new,old  None  False
395       None  True   implicit  implicit  new      None  True
396       old   True   implicit  implicit  new      old   True
397       ====  =====  ========  ========  =======  ====  =====  =====
398
399   The table begins with a top border of equals signs with one or more
400   spaces at each column boundary (regardless of spans).  There must
401   be at least two columns in the table (to differentiate it from
402   section headers).  Each line starts a new row.  The rightmost
403   column is unbounded; text may continue past the edge of the table.
404   Each row/line must contain spaces at column boundaries, except for
405   explicit column spans.  Underlines of '-' can be used to indicate
406   column spans, but should be used sparingly if at all.  Lines
407   containing column span underlines may not contain any other text.
408   The last of the optional head rows is underlined with '=', again
409   with spaces at column boundaries.  The bottom boundary of the table
410   consists of '=' underlines.  A blank line is required following a
411   table.
412
413   This table sums up the features.  Using all the features in such a
414   small space is not pretty though::
415
416       ========  ========  ========
417                 Header 2 & 3 Span
418                 ------------------
419       Header 1  Header 2  Header 3
420       ========  ========  ========
421       Each      line is   a new row.
422       Each row  consists  of one line only.
423       Row       spans     are not possible.
424       The last  column    may spill over to the right.
425       Column spans are possible with an underline joining columns.
426       ----------------------------
427       The span  is        limited to the row above the underline.
428       ========  ========  ========
429
4304. As a variation of alternative 3, bullet list syntax in the first
431   column could be used to indicate row starts.  Multi-line rows are
432   possible, but row spans are not.  For example::
433
434       ===== =====
435       col 1 col 2
436       ===== =====
437       - 1   Second column of row 1.
438       - 2   Second column of row 2.
439             Second line of paragraph.
440       - 3   Second column of row 3.
441
442             Second paragraph of row 3,
443             column 2
444       ===== =====
445
446   Column spans would be indicated on the line after the last line of
447   the row.  To indicate a real bullet list within a first-column
448   cell, simply nest the bullets.
449
4505. In a further variation, we could simply assume that whitespace in
451   the first column implies a multi-line row; the text in other
452   columns is continuation text.  For example::
453
454       ===== =====
455       col 1 col 2
456       ===== =====
457       1     Second column of row 1.
458       2     Second column of row 2.
459             Second line of paragraph.
460       3     Second column of row 3.
461
462             Second paragraph of row 3,
463             column 2
464       ===== =====
465
466   Limitations of this approach:
467
468   - Cells in the first column are limited to one line of text.
469
470   - Cells in the first column *must* contain some text; blank cells
471     would lead to a misinterpretation.  An empty comment ("..") is
472     sufficient.
473
4746. Combining alternative 3 and 4, a bullet list in the first column
475   could mean multi-line rows, and no bullet list means single-line
476   rows only.
477
478Alternatives 1 and 5 has been adopted by reStructuredText.
479
480
481Delimitation of Inline Markup
482=============================
483
484StructuredText specifies that inline markup must begin with
485whitespace, precluding such constructs as parenthesized or quoted
486emphatic text::
487
488    "**What?**" she cried.  (*exit stage left*)
489
490The `reStructuredText markup specification`_ allows for such
491constructs and disambiguates inline markup through a set of
492recognition rules.  These recognition rules define the context of
493markup start-strings and end-strings, allowing markup characters to be
494used in most non-markup contexts without a problem (or a backslash).
495So we can say, "Use asterisks (*) around words or phrases to
496*emphasisze* them." The '(*)' will not be recognized as markup.  This
497reduces the need for markup escaping to the point where an escape
498character is *almost* (but not quite!) unnecessary.
499
500
501Underlining
502===========
503
504StructuredText uses '_text_' to indicate underlining.  To quote David
505Ascher in his 2000-01-21 Doc-SIG mailing list post, "Docstring
506grammar: a very revised proposal":
507
508    The tagging of underlined text with _'s is suboptimal.  Underlines
509    shouldn't be used from a typographic perspective (underlines were
510    designed to be used in manuscripts to communicate to the
511    typesetter that the text should be italicized -- no well-typeset
512    book ever uses underlines), and conflict with double-underscored
513    Python variable names (__init__ and the like), which would get
514    truncated and underlined when that effect is not desired.  Note
515    that while *complete* markup would prevent that truncation
516    ('__init__'), I think of docstring markups much like I think of
517    type annotations -- they should be optional and above all do no
518    harm.  In this case the underline markup does harm.
519
520Underlining is not part of the reStructuredText specification.
521
522
523Inline Literals
524===============
525
526StructuredText's markup for inline literals (text left as-is,
527verbatim, usually in a monospaced font; as in HTML <TT>) is single
528quotes ('literals').  The problem with single quotes is that they are
529too often used for other purposes:
530
531- Apostrophes: "Don't blame me, 'cause it ain't mine, it's Chris'.";
532
533- Quoting text:
534
535      First Bruce: "Well Bruce, I heard the prime minister use it.
536      'S'hot enough to boil a monkey's bum in 'ere your Majesty,' he
537      said, and she smiled quietly to herself."
538
539  In the UK, single quotes are used for dialogue in published works.
540
541- String literals: s = ''
542
543Alternatives::
544
545    'text'    \'text\'    ''text''    "text"    \"text\"    ""text""
546    #text#     @text@      `text`     ^text^    ``text''    ``text``
547
548The examples below contain inline literals, quoted text, and
549apostrophes.  Each example should evaluate to the following HTML::
550
551    Some <TT>code</TT>, with a 'quote', "double", ain't it grand?
552    Does <TT>a[b] = 'c' + "d" + `2^3`</TT> work?
553
554    0. Some code, with a quote, double, ain't it grand?
555       Does a[b] = 'c' + "d" + `2^3` work?
556    1. Some 'code', with a \'quote\', "double", ain\'t it grand?
557       Does 'a[b] = \'c\' + "d" + `2^3`' work?
558    2. Some \'code\', with a 'quote', "double", ain't it grand?
559       Does \'a[b] = 'c' + "d" + `2^3`\' work?
560    3. Some ''code'', with a 'quote', "double", ain't it grand?
561       Does ''a[b] = 'c' + "d" + `2^3`'' work?
562    4. Some "code", with a 'quote', \"double\", ain't it grand?
563       Does "a[b] = 'c' + "d" + `2^3`" work?
564    5. Some \"code\", with a 'quote', "double", ain't it grand?
565       Does \"a[b] = 'c' + "d" + `2^3`\" work?
566    6. Some ""code"", with a 'quote', "double", ain't it grand?
567       Does ""a[b] = 'c' + "d" + `2^3`"" work?
568    7. Some #code#, with a 'quote', "double", ain't it grand?
569       Does #a[b] = 'c' + "d" + `2^3`# work?
570    8. Some @code@, with a 'quote', "double", ain't it grand?
571       Does @a[b] = 'c' + "d" + `2^3`@ work?
572    9. Some `code`, with a 'quote', "double", ain't it grand?
573       Does `a[b] = 'c' + "d" + \`2^3\`` work?
574    10. Some ^code^, with a 'quote', "double", ain't it grand?
575        Does ^a[b] = 'c' + "d" + `2\^3`^ work?
576    11. Some ``code'', with a 'quote', "double", ain't it grand?
577        Does ``a[b] = 'c' + "d" + `2^3`'' work?
578    12. Some ``code``, with a 'quote', "double", ain't it grand?
579        Does ``a[b] = 'c' + "d" + `2^3\``` work?
580
581Backquotes (#9 & #12) are the best choice.  They are unobtrusive and
582relatviely rarely used (more rarely than ' or ", anyhow).  Backquotes
583have the connotation of 'quotes', which other options (like carets,
584#10) don't.
585
586Analogously with ``*emph*`` & ``**strong**``, double-backquotes (#12)
587could be used for inline literals.  If single-backquotes are used for
588'interpreted text' (context-sensitive domain-specific descriptive
589markup) such as function name hyperlinks in Python docstrings, then
590double-backquotes could be used for absolute-literals, wherein no
591processing whatsoever takes place.  An advantage of double-backquotes
592would be that backslash-escaping would no longer be necessary for
593embedded single-backquotes; however, embedded double-backquotes (in an
594end-string context) would be illegal.  See `Backquotes in
595Phrase-Links`__ in `Record of reStructuredText Syntax Alternatives`__.
596
597__ alternatives.html#backquotes-in-phrase-links
598__ alternatives.html
599
600Alternative choices are carets (#10) and TeX-style quotes (#11).  For
601examples of TeX-style quoting, see
602http://www.zope.org/Members/jim/StructuredTextWiki/CustomizingTheDocumentProcessor.
603
604Some existing uses of backquotes:
605
6061. As a synonym for repr() in Python.
6072. For command-interpolation in shell scripts.
6083. Used as open-quotes in TeX code (and carried over into plaintext
609   by TeXies).
610
611The inline markup start-string and end-string recognition rules
612defined by the `reStructuredText markup specification`_ would allow
613all of these cases inside inline literals, with very few exceptions.
614As a fallback, literal blocks could handle all cases.
615
616Outside of inline literals, the above uses of backquotes would require
617backslash-escaping.  However, these are all prime examples of text
618that should be marked up with inline literals.
619
620If either backquotes or straight single-quotes are used as markup,
621TeX-quotes are too troublesome to support, so no special-casing of
622TeX-quotes should be done (at least at first).  If TeX-quotes have to
623be used outside of literals, a single backslash-escaped would suffice:
624\``TeX quote''.  Ugly, true, but very infrequently used.
625
626Using literal blocks is a fallback option which removes the need for
627backslash-escaping::
628
629    like this::
630
631        Here, we can do ``absolutely'' anything `'`'\|/|\ we like!
632
633No mechanism for inline literals is perfect, just as no escaping
634mechanism is perfect.  No matter what we use, complicated inline
635expressions involving the inline literal quote and/or the backslash
636will end up looking ugly.  We can only choose the least often ugly
637option.
638
639reStructuredText will use double backquotes for inline literals, and
640single backqoutes for interpreted text.
641
642
643Hyperlinks
644==========
645
646There are three forms of hyperlink currently in StructuredText_:
647
6481. (Absolute & relative URIs.)  Text enclosed by double quotes
649   followed by a colon, a URI, and concluded by punctuation plus white
650   space, or just white space, is treated as a hyperlink::
651
652       "Python":http://www.python.org/
653
6542. (Absolute URIs only.)  Text enclosed by double quotes followed by a
655   comma, one or more spaces, an absolute URI and concluded by
656   punctuation plus white space, or just white space, is treated as a
657   hyperlink::
658
659       "mail me", mailto:me@mail.com
660
6613. (Endnotes.)  Text enclosed by brackets link to an endnote at the
662   end of the document: at the beginning of the line, two dots, a
663   space, and the same text in brackets, followed by the end note
664   itself::
665
666       Please refer to the fine manual [GVR2001].
667
668       .. [GVR2001] Python Documentation, Release 2.1, van Rossum,
669          Drake, et al., http://www.python.org/doc/
670
671The problem with forms 1 and 2 is that they are neither intuitive nor
672unobtrusive (they break design goals 5 & 2).  They overload
673double-quotes, which are too often used in ordinary text (potentially
674breaking design goal 4).  The brackets in form 3 are also too common
675in ordinary text (such as [nested] asides and Python lists like [12]).
676
677Alternatives:
678
6791. Have no special markup for hyperlinks.
680
6812. A. Interpret and mark up hyperlinks as any contiguous text
682      containing '://' or ':...@' (absolute URI) or '@' (email
683      address) after an alphanumeric word.  To de-emphasize the URI,
684      simply enclose it in parentheses:
685
686          Python (http://www.python.org/)
687
688   B. Leave special hyperlink markup as a domain-specific extension.
689      Hyperlinks in ordinary reStructuredText documents would be
690      required to be standalone (i.e. the URI text inline in the
691      document text).  Processed hyperlinks (where the URI text is
692      hidden behind the link) are important enough to warrant syntax.
693
6943. The original Setext_ introduced a mechanism of indirect hyperlinks.
695   A source link word ('hot word') in the text was given a trailing
696   underscore::
697
698       Here is some text with a hyperlink_ built in.
699
700   The hyperlink itself appeared at the end of the document on a line
701   by itself, beginning with two dots, a space, the link word with a
702   leading underscore, whitespace, and the URI itself::
703
704       .. _hyperlink http://www.123.xyz
705
706   Setext used ``underscores_instead_of_spaces_`` for phrase links.
707
708With some modification, alternative 3 best satisfies the design goals.
709It has the advantage of being readable and relatively unobtrusive.
710Since each source link must match up to a target, the odd variable
711ending in an underscore can be spared being marked up (although it
712should generate a "no such link target" warning).  The only
713disadvantage is that phrase-links aren't possible without some
714obtrusive syntax.
715
716We could achieve phrase-links if we enclose the link text:
717
7181. in double quotes::
719
720       "like this"_
721
7222. in brackets::
723
724       [like this]_
725
7263. or in backquotes::
727
728       `like this`_
729
730Each gives us somewhat obtrusive markup, but that is unavoidable.  The
731bracketed syntax (#2) is reminiscent of links on many web pages
732(intuitive), although it is somewhat obtrusive.  Alternative #3 is
733much less obtrusive, and is consistent with interpreted text: the
734trailing underscore indicates the interpretation of the phrase, as a
735hyperlink.  #3 also disambiguates hyperlinks from footnote references.
736Alternative #3 wins.
737
738The same trailing underscore markup can also be used for footnote and
739citation references, removing the problem with ordinary bracketed text
740and Python lists::
741
742    Please refer to the fine manual [GVR2000]_.
743
744    .. [GVR2000] Python Documentation, van Rossum, Drake, et al.,
745       http://www.python.org/doc/
746
747The two-dots-and-a-space syntax was generalized by Setext for
748comments, which are removed from the (visible) processed output.
749reStructuredText uses this syntax for comments, footnotes, and link
750target, collectively termed "explicit markup".  For link targets, in
751order to eliminate ambiguity with comments and footnotes,
752reStructuredText specifies that a colon always follow the link target
753word/phrase.  The colon denotes 'maps to'.  There is no reason to
754restrict target links to the end of the document; they could just as
755easily be interspersed.
756
757Internal hyperlinks (links from one point to another within a single
758document) can be expressed by a source link as before, and a target
759link with a colon but no URI.  In effect, these targets 'map to' the
760element immediately following.
761
762As an added bonus, we now have a perfect candidate for
763reStructuredText directives, a simple extension mechanism: explicit
764markup containing a single word followed by two colons and whitespace.
765The interpretation of subsequent data on the directive line or
766following is directive-dependent.
767
768To summarize::
769
770    .. This is a comment.
771
772    .. The line below is an example of a directive.
773    .. version:: 1
774
775    This is a footnote [1]_.
776
777    This internal hyperlink will take us to the footnotes_ area below.
778
779    Here is a one-word_ external hyperlink.
780
781    Here is `a hyperlink phrase`_.
782
783    .. _footnotes:
784    .. [1] Footnote text goes here.
785
786    .. external hyperlink target mappings:
787    .. _one-word: http://www.123.xyz
788    .. _a hyperlink phrase: http://www.123.xyz
789
790The presence or absence of a colon after the target link
791differentiates an indirect hyperlink from a footnote, respectively.  A
792footnote requires brackets.  Backquotes around a target link word or
793phrase are required if the phrase contains a colon, optional
794otherwise.
795
796Below are examples using no markup, the two StructuredText hypertext
797styles, and the reStructuredText hypertext style.  Each example
798contains an indirect link, a direct link, a footnote/endnote, and
799bracketed text.  In HTML, each example should evaluate to::
800
801    <P>A <A HREF="http://spam.org">URI</A>, see <A HREF="#eggs2000">
802    [eggs2000]</A> (in Bacon [Publisher]).  Also see
803    <A HREF="http://eggs.org">http://eggs.org</A>.</P>
804
805    <P><A NAME="eggs2000">[eggs2000]</A> "Spam, Spam, Spam, Eggs,
806    Bacon, and Spam"</P>
807
8081. No markup::
809
810       A URI http://spam.org, see eggs2000 (in Bacon [Publisher]).
811       Also see http://eggs.org.
812
813       eggs2000 "Spam, Spam, Spam, Eggs, Bacon, and Spam"
814
8152. StructuredText absolute/relative URI syntax
816   ("text":http://www.url.org)::
817
818       A "URI":http://spam.org, see [eggs2000] (in Bacon [Publisher]).
819       Also see "http://eggs.org":http://eggs.org.
820
821       .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam"
822
823   Note that StructuredText does not recognize standalone URIs,
824   forcing doubling up as shown in the second line of the example
825   above.
826
8273. StructuredText absolute-only URI syntax
828   ("text", mailto:you@your.com)::
829
830       A "URI", http://spam.org, see [eggs2000] (in Bacon
831       [Publisher]).  Also see "http://eggs.org", http://eggs.org.
832
833       .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam"
834
8354. reStructuredText syntax::
836
837    4. A URI_, see [eggs2000]_ (in Bacon [Publisher]).
838       Also see http://eggs.org.
839
840       .. _URI: http:/spam.org
841       .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam"
842
843The bracketed text '[Publisher]' may be problematic with
844StructuredText (syntax 2 & 3).
845
846reStructuredText's syntax (#4) is definitely the most readable.  The
847text is separated from the link URI and the footnote, resulting in
848cleanly readable text.
849
850.. _StructuredText:
851   http://www.zope.org/DevHome/Members/jim/StructuredTextWiki/FrontPage
852.. _Setext: http://docutils.sourceforge.net/mirror/setext.html
853.. _reStructuredText: http://docutils.sourceforge.net/rst.html
854.. _detailed description:
855   http://homepage.ntlworld.com/tibsnjoan/docutils/STNG-format.html
856.. _STMinus: http://www.cis.upenn.edu/~edloper/pydoc/stminus.html
857.. _StructuredTextNG:
858   http://www.zope.org/DevHome/Members/jim/StructuredTextWiki/StructuredTextNG
859.. _README: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/
860   python/python/dist/src/README
861.. _Emacs table mode: http://table.sourceforge.net/
862.. _reStructuredText Markup Specification:
863   ../../ref/rst/restructuredtext.html
864
865
866..
867   Local Variables:
868   mode: indented-text
869   indent-tabs-mode: nil
870   sentence-end-double-space: t
871   fill-column: 70
872   End:
873