==============================
 Problems With StructuredText
==============================
:Author: David Goodger
:Contact: docutils-develop@lists.sourceforge.net
:Revision: $Revision: 7302 $
:Date: $Date: 2012-01-03 20:23:53 +0100 (Di, 03. Jän 2012) $
:Copyright: This document has been placed in the public domain.

There are several problems, unresolved issues, and areas of
controversy within StructuredText_ (Classic and Next Generation). In
order to resolve all these issues, this analysis brings all of the
issues out into the open, enumerates all the alternatives, and
proposes solutions to be incorporated into the reStructuredText_
specification.


.. contents::


Formal Specification
====================

The description in the original StructuredText.py has been criticized
for being vague. For practical purposes, "the code *is* the spec."
Tony Ibbs has been working on deducing a `detailed description`_ from
the documentation and code of StructuredTextNG_. Edward Loper's
STMinus_ is another attempt to formalize a spec.

For this kind of a project, the specification should always precede
the code. Otherwise, the markup is a moving target which can never be
adopted as a standard. Of course, a specification may be revised
during the lifetime of the code, but without a spec there is no
visible control and thus no confidence.


Understanding and Extending the Code
====================================

The original StructuredText_ is a dense mass of sparsely commented
code and inscrutable regular expressions. It was not designed to be
extended and is very difficult to understand. StructuredTextNG_ has
been designed to allow input (syntax) and output extensions, but its
documentation (both internal [comments & docstrings], and external) is
inadequate for the complexity of the code itself.

For reStructuredText to become truly useful, perhaps even part of
Python's standard library, it must have clear, understandable
documentation and implementation code. For the implementation of
reStructuredText to be taken seriously, it must be a sterling example
of the potential of docstrings; the implementation must practice what
the specification preaches.


Section Structure via Indentation
=================================

Setext_ required that body text be indented by 2 spaces. The original
StructuredText_ and StructuredTextNG_ require that section structure
be indicated through indentation, as "inspired by Python". For
certain structures with a very limited, local extent (such as lists,
block quotes, and literal blocks), indentation naturally indicates
structure or hierarchy. For sections (which may have a very large
extent), structure via indentation is unnecessary, unnatural and
ambiguous. Rather, the syntax of the section title *itself* should
indicate that it is a section title.

The original StructuredText states that "A single-line paragraph whose
immediately succeeding paragraphs are lower level is treated as a
header." Requiring indentation in this way is:

- Unnecessary. The vast majority of docstrings and standalone
  documents will have no more than one level of section structure.
  Requiring indentation for such docstrings is unnecessary and
  irritating.

- Unnatural. Most published works use title style (type size, face,
  weight, and position) and/or section/subsection numbering rather
  than indentation to indicate hierarchy. This is a tradition with a
  very long history.

- Ambiguous. A StructuredText header is indistinguishable from a
  one-line paragraph followed by a block quote (precluding the use of
  block quotes). Enumerated section titles are ambiguous (is it a
  header? is it a list item?).
  Some additional adornment must be required to confirm the line's
  role as a title, both to a parser and to the human reader of the
  source text.

Python's use of significant whitespace is a wonderful (if not
original) innovation; however, requiring indentation in ordinary
written text is hypergeneralization.

reStructuredText_ indicates section structure through title adornment
style (as exemplified by this document). This is far more natural.
In fact, it is already in widespread use in plain text documents,
including in Python's standard distribution (such as the toplevel
README_ file).


Character Escaping Mechanism
============================

No matter what characters are chosen for markup, some day someone will
want to write documentation *about* that markup or using markup
characters in a non-markup context. Therefore, any complete markup
language must have an escaping or encoding mechanism. For a
lightweight markup system, encoding mechanisms like SGML/XML's '&ast;'
are out. So an escaping mechanism is in. However, with carefully
chosen markup, it should be necessary to use the escaping mechanism
only infrequently.

reStructuredText_ needs an escaping mechanism: a way to treat
markup-significant characters as the characters themselves. Currently
there is no such mechanism (although ZWiki uses '!'). What are the
candidates?

1. ``!``
   (http://www.zope.org/DevHome/Members/jim/StructuredTextWiki/NGEscaping)
2. ``\``
3. ``~``
4. doubling of characters

The best choice for this is the backslash (``\``). It's "the single
most popular escaping character in the world!", therefore familiar and
unsurprising. Since characters only need to be escaped under special
circumstances, which are typically those explaining technical
programming issues, the use of the backslash is natural and
understandable.
Python docstrings can be raw (prefixed with an 'r',
as in 'r""'), which would obviate the need for gratuitous doubling-up
of backslashes.

(On 2001-03-29 on the Doc-SIG mailing list, GvR endorsed backslash
escapes, saying, "'nuff said. Backslash it is." Although neither
legally binding nor irrevocable nor any kind of guarantee of anything,
it is a good sign.)

The rule would be: An unescaped backslash followed by any markup
character escapes the character. The escaped character represents the
character itself, and is prevented from playing a role in any markup
interpretation. The backslash is removed from the output. A literal
backslash is represented by an "escaped backslash," two backslashes in
a row.

A carefully constructed set of recognition rules for inline markup
will obviate the need for backslash-escapes in almost all cases; see
`Delimitation of Inline Markup`_ below.

When an expression (requiring backslashes and other characters used
for markup) becomes too complicated and therefore unreadable, a
literal block may be used instead. Inside literal blocks, no markup
is recognized, therefore backslashes (for the purpose of escaping
markup) become unnecessary.

We could allow backslashes preceding non-markup characters to remain
in the output. This would make describing regular expressions and
other uses of backslashes easier. However, this would complicate the
markup rules and would be confusing.


Blank Lines in Lists
====================

Oft-requested in Doc-SIG (the earliest reference is dated 1996-08-13)
is the ability to write lists without requiring blank lines between
items. In docstrings, space is at a premium. Authors want to convey
their API or usage information in as compact a form as possible.
StructuredText_ requires blank lines between all body elements,
including list items, even when boundaries are obvious from the markup
itself.

In reStructuredText, blank lines are optional between list items.
However, in order to eliminate ambiguity, a blank line is required
before the first list item and after the last. Nested lists also
require blank lines before the list start and after the list end.


Bullet List Markup
==================

StructuredText_ includes 'o' as a bullet character. This is dangerous
and counter to the language-independent nature of the markup. There
are many languages in which 'o' is a word. For example, in Spanish::

    Llamame a la casa
    o al trabajo.

    (Call me at home or at work.)

And in Japanese (when romanized)::

    Senshuu no doyoubi ni tegami
    o kakimashita.

    ([I] wrote a letter on Saturday last week.)

If a paragraph containing an 'o' word wraps such that the 'o' is the
first text on a line, or if a paragraph begins with such a word, it
could be misinterpreted as a bullet list.

In reStructuredText_, 'o' is not used as a bullet character. '-',
'*', and '+' are the possible bullet characters.


Enumerated List Markup
======================

StructuredText enumerated lists are allowed to begin with numbers and
letters followed by a period or right-parenthesis, then whitespace.
This has surprising consequences for writing styles. For example,
this is recognized as an enumerated list item by StructuredText::

    Mr. Creosote.

People will write enumerated lists in all different ways. It is folly
to try to come up with the "perfect" format for an enumerated list,
and limit the docstring parser's recognition to that one format only.

Rather, the parser should recognize a variety of enumerator styles.
It is also recommended that the enumerator of the first list item be
ordinal-1 ('1', 'A', 'a', 'I', or 'i'), as output formats may not be
able to begin a list at an arbitrary enumeration.

An initial idea was to require two or more consistent enumerated list
items in a row. This idea proved impractical and was dropped. In
practice, the presence of a proper enumerator is enough to reliably
recognize an enumerated list item; any ambiguities are reported by the
parser. Here's the original idea for posterity:

    The parser should recognize a variety of enumerator styles, mark
    each block as a potential enumerated list item (PELI), and
    interpret the enumerators of adjacent PELIs to decide whether they
    make up a consistent enumerated list.

    If a PELI is labeled with a "1.", and is immediately followed by a
    PELI labeled with a "2.", we've got an enumerated list. Or "(A)"
    followed by "(B)". Or "i)" followed by "ii)", etc. The chances
    of accidentally recognizing two adjacent and consistently labeled
    PELIs are acceptably small.

    For an enumerated list to be recognized, the following must be
    true:

    - the list must consist of multiple adjacent list items (2 or
      more)
    - the enumerators must all have the same format
    - the enumerators must be sequential


Definition List Markup
======================

StructuredText uses ' -- ' (whitespace, two hyphens, whitespace) on
the first line of a paragraph to indicate a definition list item. The
' -- ' serves to separate the term (on the left) from the definition
(on the right).

Many people use ' -- ' as an em-dash in their text, conflicting with
the StructuredText usage.
Although the Chicago Manual of Style says
that spaces should not be used around an em-dash, Peter Funk pointed
out that this is standard usage in German (according to the Duden, the
official German reference), and possibly in other languages as well.
The widespread use of ' -- ' precludes its use for definition lists;
it would violate the "unsurprising" criterion.

A simpler, and at least equally visually distinctive construct
(proposed by Guido van Rossum, who incidentally is a frequent user of
' -- ') would do just as well::

    term 1
        Definition.

    term 2
        Definition 2, paragraph 1.

        Definition 2, paragraph 2.

A reStructuredText definition list item consists of a term and a
definition. A term is a simple one-line paragraph. A definition is a
block indented relative to the term, and may contain multiple
paragraphs and other body elements. No blank line precedes a
definition (this distinguishes definition lists from block quotes).


Literal Blocks
==============

The StructuredText_ specification has literal blocks indicated by
'example', 'examples', or '::' ending the preceding paragraph. STNG
only recognizes '::'; 'example'/'examples' are not implemented. This
is good; it fixes an unnecessary language dependency. The problem is
what to do with the sometimes-unwanted '::'.

In reStructuredText_ '::' at the end of a paragraph indicates that
subsequent *indented* blocks are treated as literal text. No further
markup interpretation is done within literal blocks (not even
backslash-escapes). If the '::' is preceded by whitespace, '::' is
omitted from the output; if '::' was the sole content of a paragraph,
the entire paragraph is removed (no 'empty' paragraph remains). If
'::' is preceded by a non-whitespace character, '::' is replaced by
':' (i.e., the extra colon is removed).
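
The '::' handling described above can be sketched in Python. This is
only an illustrative sketch, not the actual parser; the function name
``apply_literal_marker`` and its return convention are assumptions
made for this example::

    def apply_literal_marker(paragraph):
        """Return (output_text, literal_follows) for one paragraph.

        output_text is None when the paragraph is removed entirely.
        (Hypothetical helper; the name is not taken from the spec.)
        """
        text = paragraph.rstrip()
        if not text.endswith("::"):
            # No marker: the paragraph passes through unchanged.
            return paragraph, False
        body = text[:-2]
        if not body.strip():
            # '::' was the sole content: drop the whole paragraph.
            return None, True
        if body[-1].isspace():
            # '::' preceded by whitespace: omit the marker.
            return body.rstrip(), True
        # '::' preceded by a non-whitespace character: keep one colon.
        return body + ":", True

Under these assumptions, 'Example::' yields 'Example:', 'Example ::'
yields 'Example', and a paragraph consisting only of '::' is dropped.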

Thus, a section could begin with a literal block as follows::

    Section Title
    -------------

    ::

        print "this is example literal"


Tables
======

The table markup scheme in classic StructuredText was horrible. Its
omission from StructuredTextNG is welcome, and its markup will not be
repeated here. However, tables themselves are useful in
documentation. Alternatives:

1. This format is the most natural and obvious. It was independently
   invented (no great feat of creation!), and later found to be the
   format supported by the `Emacs table mode`_::

       +------------+------------+------------+--------------+
       |  Header 1  |  Header 2  |  Header 3  |  Header 4    |
       +============+============+============+==============+
       |  Column 1  |  Column 2  | Column 3 & 4 span (Row 1) |
       +------------+------------+------------+--------------+
       |    Column 1 & 2 span    |  Column 3  | - Column 4   |
       +------------+------------+------------+ - Row 2 & 3  |
       |      1     |      2     |      3     | - span       |
       +------------+------------+------------+--------------+

   Tables are described with a visual outline made up of the
   characters '-', '=', '|', and '+':

   - The hyphen ('-') is used for horizontal lines (row separators).
   - The equals sign ('=') is optionally used as a header separator
     (as of version 1.5.24, this is not supported by the Emacs table
     mode).
   - The vertical bar ('|') is used for vertical lines (column
     separators).
   - The plus sign ('+') is used for intersections of horizontal and
     vertical lines.

   Row and column spans are possible simply by omitting the column or
   row separators, respectively. The header row separator must be
   complete; in other words, a header cell may not span into the table
   body. Each cell contains body elements, and may have multiple
   paragraphs, lists, etc.
   Initial spaces for a left margin are allowed; the first line of
   text in a cell determines its left margin.

2. Below is a simpler table structure. It may be better suited to
   manual input than alternative #1, but there is no Emacs editing
   mode available. One disadvantage is that it resembles section
   titles; a one-column table would look exactly like section &
   subsection titles. ::

       ============ ============ ============ ==============
         Header 1     Header 2     Header 3     Header 4
       ============ ============ ============ ==============
         Column 1     Column 2    Column 3 & 4 span (Row 1)
       ------------ ------------ ---------------------------
           Column 1 & 2 span       Column 3    - Column 4
       ------------------------- ------------  - Row 2 & 3
             1            2            3       - span
       ============ ============ ============ ==============

   The table begins with a top border of equals signs with a space at
   each column boundary (regardless of spans). Each row is
   underlined. Internal row separators are underlines of '-', with
   spaces at column boundaries. The last of the optional head rows is
   underlined with '=', again with spaces at column boundaries.
   Column spans have no spaces in their underline. Row spans simply
   lack an underline at the row boundary. The bottom boundary of the
   table consists of '=' underlines. A blank line is required
   following a table.

3. A minimalist alternative is as follows::

       ====  =====  ========  ========  =======  ====  =====  =====
       Old State    Input     Action             New State    Notes
       -----------  --------  -----------------  -----------
       ids   types  new type  sys.msg.  dupname  ids   types
       ====  =====  ========  ========  =======  ====  =====  =====
       --    --     explicit  --        --       new   True
       --    --     implicit  --        --       new   False
       None  False  explicit  --        --       new   True
       old   False  explicit  implicit  old      new   True
       None  True   explicit  explicit  new      None  True
       old   True   explicit  explicit  new,old  None  True   [1]
       None  False  implicit  implicit  new      None  False
       old   False  implicit  implicit  new,old  None  False
       None  True   implicit  implicit  new      None  True
       old   True   implicit  implicit  new      old   True
       ====  =====  ========  ========  =======  ====  =====  =====

   The table begins with a top border of equals signs with one or more
   spaces at each column boundary (regardless of spans). There must
   be at least two columns in the table (to differentiate it from
   section headers). Each line starts a new row. The rightmost
   column is unbounded; text may continue past the edge of the table.
   Each row/line must contain spaces at column boundaries, except for
   explicit column spans. Underlines of '-' can be used to indicate
   column spans, but should be used sparingly if at all. Lines
   containing column span underlines may not contain any other text.
   The last of the optional head rows is underlined with '=', again
   with spaces at column boundaries. The bottom boundary of the table
   consists of '=' underlines. A blank line is required following a
   table.

   This table sums up the features. Using all the features in such a
   small space is not pretty though::

       ======== ======== ========
                Header 2 & 3 Span
                ------------------
       Header 1 Header 2 Header 3
       ======== ======== ========
       Each     line is  a new row.
       Each row consists of one line only.
       Row      spans    are not possible.
       The last column   may spill over to the right.
       Column spans are possible with an underline joining columns.
       ----------------------------
       The span is limited to the row above the underline.
       ======== ======== ========

4. As a variation of alternative 3, bullet list syntax in the first
   column could be used to indicate row starts. Multi-line rows are
   possible, but row spans are not. For example::

       ===== =====
       col 1 col 2
       ===== =====
       - 1   Second column of row 1.
       - 2   Second column of row 2.
             Second line of paragraph.
       - 3   Second column of row 3.

             Second paragraph of row 3,
             column 2
       ===== =====

   Column spans would be indicated on the line after the last line of
   the row. To indicate a real bullet list within a first-column
   cell, simply nest the bullets.

5. In a further variation, we could simply assume that whitespace in
   the first column implies a multi-line row; the text in other
   columns is continuation text. For example::

       ===== =====
       col 1 col 2
       ===== =====
       1     Second column of row 1.
       2     Second column of row 2.
             Second line of paragraph.
       3     Second column of row 3.

             Second paragraph of row 3,
             column 2
       ===== =====

   Limitations of this approach:

   - Cells in the first column are limited to one line of text.

   - Cells in the first column *must* contain some text; blank cells
     would lead to a misinterpretation. An empty comment ("..") is
     sufficient.

6. Combining alternatives 3 and 4, a bullet list in the first column
   could mean multi-line rows, and no bullet list means single-line
   rows only.

Alternatives 1 and 5 have been adopted by reStructuredText.


Delimitation of Inline Markup
=============================

StructuredText specifies that inline markup must begin with
whitespace, precluding such constructs as parenthesized or quoted
emphatic text::

    "**What?**" she cried. (*exit stage left*)

The `reStructuredText markup specification`_ allows for such
constructs and disambiguates inline markup through a set of
recognition rules.
These recognition rules define the context of
markup start-strings and end-strings, allowing markup characters to be
used in most non-markup contexts without a problem (or a backslash).
So we can say, "Use asterisks (*) around words or phrases to
*emphasize* them." The '(*)' will not be recognized as markup. This
reduces the need for markup escaping to the point where an escape
character is *almost* (but not quite!) unnecessary.


Underlining
===========

StructuredText uses '_text_' to indicate underlining. To quote David
Ascher in his 2000-01-21 Doc-SIG mailing list post, "Docstring
grammar: a very revised proposal":

    The tagging of underlined text with _'s is suboptimal. Underlines
    shouldn't be used from a typographic perspective (underlines were
    designed to be used in manuscripts to communicate to the
    typesetter that the text should be italicized -- no well-typeset
    book ever uses underlines), and conflict with double-underscored
    Python variable names (__init__ and the like), which would get
    truncated and underlined when that effect is not desired. Note
    that while *complete* markup would prevent that truncation
    ('__init__'), I think of docstring markups much like I think of
    type annotations -- they should be optional and above all do no
    harm. In this case the underline markup does harm.

Underlining is not part of the reStructuredText specification.


Inline Literals
===============

StructuredText's markup for inline literals (text left as-is,
verbatim, usually in a monospaced font; as in HTML <TT>) is single
quotes ('literals'). The problem with single quotes is that they are
too often used for other purposes:

- Apostrophes: "Don't blame me, 'cause it ain't mine, it's Chris'.";

- Quoting text:

    First Bruce: "Well Bruce, I heard the prime minister use it.
    'S'hot enough to boil a monkey's bum in 'ere your Majesty,' he
    said, and she smiled quietly to herself."

  In the UK, single quotes are used for dialogue in published works.

- String literals: s = ''

Alternatives::

    'text'    \'text\'    ''text''    "text"    \"text\"    ""text""
    #text#     @text@      `text`     ^text^    ``text''   ``text``

The examples below contain inline literals, quoted text, and
apostrophes. Each example should evaluate to the following HTML::

    Some <TT>code</TT>, with a 'quote', "double", ain't it grand?
    Does <TT>a[b] = 'c' + "d" + `2^3`</TT> work?

     0. Some code, with a quote, double, ain't it grand?
        Does a[b] = 'c' + "d" + `2^3` work?
     1. Some 'code', with a \'quote\', "double", ain\'t it grand?
        Does 'a[b] = \'c\' + "d" + `2^3`' work?
     2. Some \'code\', with a 'quote', "double", ain't it grand?
        Does \'a[b] = 'c' + "d" + `2^3`\' work?
     3. Some ''code'', with a 'quote', "double", ain't it grand?
        Does ''a[b] = 'c' + "d" + `2^3`'' work?
     4. Some "code", with a 'quote', \"double\", ain't it grand?
        Does "a[b] = 'c' + "d" + `2^3`" work?
     5. Some \"code\", with a 'quote', "double", ain't it grand?
        Does \"a[b] = 'c' + "d" + `2^3`\" work?
     6. Some ""code"", with a 'quote', "double", ain't it grand?
        Does ""a[b] = 'c' + "d" + `2^3`"" work?
     7. Some #code#, with a 'quote', "double", ain't it grand?
        Does #a[b] = 'c' + "d" + `2^3`# work?
     8. Some @code@, with a 'quote', "double", ain't it grand?
        Does @a[b] = 'c' + "d" + `2^3`@ work?
     9. Some `code`, with a 'quote', "double", ain't it grand?
        Does `a[b] = 'c' + "d" + \`2^3\`` work?
    10. Some ^code^, with a 'quote', "double", ain't it grand?
        Does ^a[b] = 'c' + "d" + `2\^3`^ work?
    11. Some ``code'', with a 'quote', "double", ain't it grand?
        Does ``a[b] = 'c' + "d" + `2^3`'' work?
    12. Some ``code``, with a 'quote', "double", ain't it grand?
        Does ``a[b] = 'c' + "d" + `2^3\``` work?

Backquotes (#9 & #12) are the best choice. They are unobtrusive and
relatively rarely used (more rarely than ' or ", anyhow). Backquotes
have the connotation of 'quotes', which other options (like carets,
#10) don't.

Analogously with ``*emph*`` & ``**strong**``, double-backquotes (#12)
could be used for inline literals. If single-backquotes are used for
'interpreted text' (context-sensitive domain-specific descriptive
markup) such as function name hyperlinks in Python docstrings, then
double-backquotes could be used for absolute-literals, wherein no
processing whatsoever takes place. An advantage of double-backquotes
would be that backslash-escaping would no longer be necessary for
embedded single-backquotes; however, embedded double-backquotes (in an
end-string context) would be illegal. See `Backquotes in
Phrase-Links`__ in `Record of reStructuredText Syntax Alternatives`__.

__ alternatives.html#backquotes-in-phrase-links
__ alternatives.html

Alternative choices are carets (#10) and TeX-style quotes (#11). For
examples of TeX-style quoting, see
http://www.zope.org/Members/jim/StructuredTextWiki/CustomizingTheDocumentProcessor.

Some existing uses of backquotes:

1. As a synonym for repr() in Python.
2. For command-interpolation in shell scripts.
3. Used as open-quotes in TeX code (and carried over into plaintext
   by TeXies).

The inline markup start-string and end-string recognition rules
defined by the `reStructuredText markup specification`_ would allow
all of these cases inside inline literals, with very few exceptions.
As a fallback, literal blocks could handle all cases.

Outside of inline literals, the above uses of backquotes would require
backslash-escaping. However, these are all prime examples of text
that should be marked up with inline literals.
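
As a rough illustration of the double-backquote option (#12), the
Python sketch below pulls inline literals out of a line of text. It
is a deliberate simplification: the function name is hypothetical,
and the start-string and end-string recognition rules of the
specification are not modeled::

    import re

    # Shortest run of text between two pairs of backquotes.
    INLINE_LITERAL = re.compile(r"``(.+?)``")

    def find_inline_literals(text):
        """Return the contents of double-backquoted spans, verbatim."""
        return INLINE_LITERAL.findall(text)

With this pattern, embedded single backquotes need no escaping; the
text inside the literal is returned exactly as written.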

If either backquotes or straight single-quotes are used as markup,
TeX-quotes are too troublesome to support, so no special-casing of
TeX-quotes should be done (at least at first). If TeX-quotes have to
be used outside of literals, a single backslash-escape would suffice:
\``TeX quote''. Ugly, true, but very infrequently used.

Using literal blocks is a fallback option which removes the need for
backslash-escaping::

    like this::

        Here, we can do ``absolutely'' anything `'`'\|/|\ we like!

No mechanism for inline literals is perfect, just as no escaping
mechanism is perfect. No matter what we use, complicated inline
expressions involving the inline literal quote and/or the backslash
will end up looking ugly. We can only choose the least often ugly
option.

reStructuredText will use double backquotes for inline literals, and
single backquotes for interpreted text.


Hyperlinks
==========

There are three forms of hyperlink currently in StructuredText_:

1. (Absolute & relative URIs.) Text enclosed by double quotes
   followed by a colon, a URI, and concluded by punctuation plus white
   space, or just white space, is treated as a hyperlink::

       "Python":http://www.python.org/

2. (Absolute URIs only.) Text enclosed by double quotes followed by a
   comma, one or more spaces, an absolute URI and concluded by
   punctuation plus white space, or just white space, is treated as a
   hyperlink::

       "mail me", mailto:me@mail.com

3. (Endnotes.) Text enclosed by brackets links to an endnote at the
   end of the document: at the beginning of the line, two dots, a
   space, and the same text in brackets, followed by the end note
   itself::

       Please refer to the fine manual [GVR2001].

       .. [GVR2001] Python Documentation, Release 2.1, van Rossum,
          Drake, et al., http://www.python.org/doc/

The problem with forms 1 and 2 is that they are neither intuitive nor
unobtrusive (they break design goals 5 & 2). They overload
double-quotes, which are too often used in ordinary text (potentially
breaking design goal 4). The brackets in form 3 are also too common
in ordinary text (such as [nested] asides and Python lists like [12]).

Alternatives:

1. Have no special markup for hyperlinks.

2. A. Interpret and mark up hyperlinks as any contiguous text
      containing '://' or ':...@' (absolute URI) or '@' (email
      address) after an alphanumeric word. To de-emphasize the URI,
      simply enclose it in parentheses:

          Python (http://www.python.org/)

   B. Leave special hyperlink markup as a domain-specific extension.
      Hyperlinks in ordinary reStructuredText documents would be
      required to be standalone (i.e. the URI text inline in the
      document text). Processed hyperlinks (where the URI text is
      hidden behind the link) are important enough to warrant syntax.

3. The original Setext_ introduced a mechanism of indirect hyperlinks.
   A source link word ('hot word') in the text was given a trailing
   underscore::

       Here is some text with a hyperlink_ built in.

   The hyperlink itself appeared at the end of the document on a line
   by itself, beginning with two dots, a space, the link word with a
   leading underscore, whitespace, and the URI itself::

       .. _hyperlink http://www.123.xyz

   Setext used ``underscores_instead_of_spaces_`` for phrase links.

With some modification, alternative 3 best satisfies the design goals.
It has the advantage of being readable and relatively unobtrusive.
Since each source link must match up to a target, the odd variable
ending in an underscore can be spared being marked up (although it
should generate a "no such link target" warning). The only
disadvantage is that phrase-links aren't possible without some
obtrusive syntax.

We could achieve phrase-links if we enclose the link text:

1. in double quotes::

       "like this"_

2. in brackets::

       [like this]_

3. or in backquotes::

       `like this`_

Each gives us somewhat obtrusive markup, but that is unavoidable. The
bracketed syntax (#2) is reminiscent of links on many web pages
(intuitive), although it is somewhat obtrusive. Alternative #3 is
much less obtrusive, and is consistent with interpreted text: the
trailing underscore indicates the interpretation of the phrase, as a
hyperlink. #3 also disambiguates hyperlinks from footnote references.
Alternative #3 wins.

The same trailing underscore markup can also be used for footnote and
citation references, removing the problem with ordinary bracketed text
and Python lists::

    Please refer to the fine manual [GVR2000]_.

    .. [GVR2000] Python Documentation, van Rossum, Drake, et al.,
       http://www.python.org/doc/

The two-dots-and-a-space syntax was generalized by Setext for
comments, which are removed from the (visible) processed output.
reStructuredText uses this syntax for comments, footnotes, and link
targets, collectively termed "explicit markup". For link targets, in
order to eliminate ambiguity with comments and footnotes,
reStructuredText specifies that a colon always follow the link target
word/phrase. The colon denotes 'maps to'. There is no reason to
restrict target links to the end of the document; they could just as
easily be interspersed.
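
The distinction between the kinds of explicit markup described above
can be sketched with a few regular expressions. This is a simplified
Python illustration, not the reStructuredText parser; the pattern and
function names are hypothetical, and multi-line constructs are
ignored::

    import re

    # Two dots, a space, a bracketed label: a footnote.
    FOOTNOTE = re.compile(r"^\.\. \[(?P<label>[^\]]+)\]\s+(?P<text>.*)$")
    # Two dots, a space, "_name:" and a URI: a link target.
    TARGET = re.compile(r"^\.\. _(?P<name>[^:]+):\s*(?P<uri>.*)$")
    # Any other line starting with two dots and a space: a comment.
    COMMENT = re.compile(r"^\.\. (?P<text>.*)$")

    def classify(line):
        """Classify one line as footnote, target, comment, or text."""
        match = FOOTNOTE.match(line)
        if match:
            return ("footnote", match.group("label"), match.group("text"))
        match = TARGET.match(line)
        if match:
            return ("target", match.group("name"), match.group("uri"))
        match = COMMENT.match(line)
        if match:
            return ("comment", match.group("text"))
        return ("text", line)

The order of the tests matters in this sketch: the comment pattern
would also match footnotes and targets, so it is tried last.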

Internal hyperlinks (links from one point to another within a single
document) can be expressed by a source link as before, and a target
link with a colon but no URI.  In effect, these targets 'map to' the
element immediately following.

As an added bonus, we now have a perfect candidate for
reStructuredText directives, a simple extension mechanism: explicit
markup containing a single word followed by two colons and whitespace.
The interpretation of subsequent data, on the directive line or on
the lines following it, is directive-dependent.

To summarize::

    .. This is a comment.

    .. The line below is an example of a directive.
    .. version:: 1

    This is a footnote [1]_.

    This internal hyperlink will take us to the footnotes_ area below.

    Here is a one-word_ external hyperlink.

    Here is `a hyperlink phrase`_.

    .. _footnotes:
    .. [1] Footnote text goes here.

    .. external hyperlink target mappings:
    .. _one-word: http://www.123.xyz
    .. _a hyperlink phrase: http://www.123.xyz

The presence or absence of a colon after the target link
differentiates an indirect hyperlink from a footnote, respectively.  A
footnote requires brackets.  Backquotes around a target link word or
phrase are required if the phrase contains a colon, optional
otherwise.

Below are examples using no markup, the two StructuredText hypertext
styles, and the reStructuredText hypertext style.  Each example
contains an indirect link, a direct link, a footnote/endnote, and
bracketed text.  In HTML, each example should evaluate to::

    <P>A <A HREF="http://spam.org">URI</A>, see <A HREF="#eggs2000">
    [eggs2000]</A> (in Bacon [Publisher]). Also see
    <A HREF="http://eggs.org">http://eggs.org</A>.</P>

    <P><A NAME="eggs2000">[eggs2000]</A> "Spam, Spam, Spam, Eggs,
    Bacon, and Spam"</P>

1. No markup::

       A URI http://spam.org, see eggs2000 (in Bacon [Publisher]).
       Also see http://eggs.org.

       eggs2000 "Spam, Spam, Spam, Eggs, Bacon, and Spam"

2. StructuredText absolute/relative URI syntax
   ("text":http://www.url.org)::

       A "URI":http://spam.org, see [eggs2000] (in Bacon [Publisher]).
       Also see "http://eggs.org":http://eggs.org.

       .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam"

   Note that StructuredText does not recognize standalone URIs,
   forcing doubling up as shown in the second line of the example
   above.

3. StructuredText absolute-only URI syntax
   ("text", mailto:you@your.com)::

       A "URI", http://spam.org, see [eggs2000] (in Bacon
       [Publisher]). Also see "http://eggs.org", http://eggs.org.

       .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam"

4. reStructuredText syntax::

       A URI_, see [eggs2000]_ (in Bacon [Publisher]).
       Also see http://eggs.org.

       .. _URI: http://spam.org
       .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam"

The bracketed text '[Publisher]' may be problematic with
StructuredText (syntax 2 & 3).

reStructuredText's syntax (#4) is definitely the most readable.  The
text is separated from the link URI and the footnote, resulting in
cleanly readable text.

.. _StructuredText:
   http://www.zope.org/DevHome/Members/jim/StructuredTextWiki/FrontPage
.. _Setext: http://docutils.sourceforge.net/mirror/setext.html
.. _reStructuredText: http://docutils.sourceforge.net/rst.html
.. _detailed description:
   http://homepage.ntlworld.com/tibsnjoan/docutils/STNG-format.html
.. _STMinus: http://www.cis.upenn.edu/~edloper/pydoc/stminus.html
.. _StructuredTextNG:
   http://www.zope.org/DevHome/Members/jim/StructuredTextWiki/StructuredTextNG
.. _README: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/
   python/python/dist/src/README
.. _Emacs table mode: http://table.sourceforge.net/
.. _reStructuredText Markup Specification:
   ../../ref/rst/restructuredtext.html


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: