=encoding utf8

=head1 NAME

perlpodspec - Plain Old Documentation: format specification and notes

=head1 DESCRIPTION

This document is detailed notes on the Pod markup language. Most
people will only have to read L<perlpod|perlpod> to know how to write
in Pod, but this document may answer some incidental questions to do
with parsing and rendering Pod.

In this document, "must" / "must not", "should" /
"should not", and "may" have their conventional (cf. RFC 2119)
meanings: "X must do Y" means that if X doesn't do Y, it's against
this specification, and should really be fixed. "X should do Y"
means that it's recommended, but X may fail to do Y, if there's a
good reason. "X may do Y" is merely a note that X can do Y at
will (although it is up to the reader to detect any connotation of
"and I think it would be I<nice> if X did Y" versus "it wouldn't
really I<bother> me if X did Y").

Notably, when I say "the parser should do Y", the
parser may fail to do Y, if the calling application explicitly
requests that the parser I<not> do Y. I often phrase this as
"the parser should, by default, do Y." This doesn't I<require>
the parser to provide an option for turning off whatever
feature Y is (like expanding tabs in verbatim paragraphs), although
it implies that such an option I<may> be provided.

=head1 Pod Definitions

Pod is embedded in files, typically Perl source files, although you
can write a file that's nothing but Pod.

A B<line> in a file consists of zero or more non-newline characters,
terminated by either a newline or the end of the file.

A B<newline sequence> is usually a platform-dependent concept, but
Pod parsers should understand it to mean any of CR (ASCII 13), LF
(ASCII 10), or a CRLF (ASCII 13 followed immediately by ASCII 10), in
addition to any other system-specific meaning.
The first CR/CRLF/LF sequence in the file may be used as the basis
for identifying the newline sequence for parsing the rest of the file.

A B<blank line> is a line consisting entirely of zero or more spaces
(ASCII 32) or tabs (ASCII 9), and terminated by a newline or end-of-file.
A B<non-blank line> is a line containing one or more characters other
than space or tab (and terminated by a newline or end-of-file).

(I<Note:> Many older Pod parsers did not accept a line consisting of
spaces/tabs and then a newline as a blank line. The only lines they
considered blank were lines consisting of I<no characters at all>,
terminated by a newline.)

B<Whitespace> is used in this document as a blanket term for spaces,
tabs, and newline sequences. (By itself, this term usually refers
to literal whitespace. That is, sequences of whitespace characters
in Pod source, as opposed to "EE<lt>32>", which is a formatting
code that I<denotes> a whitespace character.)

A B<Pod parser> is a module meant for parsing Pod (regardless of
whether this involves calling callbacks or building a parse tree or
directly formatting it). A B<Pod formatter> (or B<Pod translator>)
is a module or program that converts Pod to some other format (HTML,
plaintext, TeX, PostScript, RTF). A B<Pod processor> might be a
formatter or translator, or might be a program that does something
else with the Pod (like counting words, scanning for index points,
etc.).

Pod content is contained in B<Pod blocks>. A Pod block starts with a
line that matches C<m/\A=[a-zA-Z]/>, and continues up to the next line
that matches C<m/\A=cut/> or up to the end of the file if there is
no C<m/\A=cut/> line.
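
The definitions above (lines, newline sequences, blank lines, Pod blocks) can be sketched in a few lines of Perl. This is only an illustration under stated assumptions, not how any particular parser works: the names C<split_lines>, C<is_blank>, and C<pod_blocks> are invented for this sketch, the whole file is assumed to be in memory as one string, and a stray "=cut" outside a block is silently skipped rather than reported.

```perl
use strict;
use warnings;

# Split raw text into lines, accepting CRLF, CR, or LF as the newline
# sequence. (Trailing empty lines are dropped by split; fine for a sketch.)
sub split_lines {
    my ($text) = @_;
    return split /\r\n|\r|\n/, $text;
}

# A blank line holds only spaces and/or tabs (possibly nothing at all).
sub is_blank {
    my ($line) = @_;
    return $line =~ /\A[ \t]*\z/ ? 1 : 0;
}

# Return a list of Pod blocks, each an array ref of that block's lines.
sub pod_blocks {
    my ($text) = @_;
    my (@blocks, $current);
    for my $line (split_lines($text)) {
        if (!$current) {
            # A stray "=cut" outside a block is an error per the text;
            # this sketch merely skips it.
            next if $line =~ /\A=cut/;
            # Any other line like "=head1" or "=pod" opens a block.
            $current = [$line] if $line =~ /\A=[a-zA-Z]/;
        }
        else {
            push @$current, $line;
            if ($line =~ /\A=cut/) {    # the "=cut" line closes the block
                push @blocks, $current;
                undef $current;
            }
        }
    }
    push @blocks, $current if $current;    # block ran to end of file
    return @blocks;
}
```

Given a file that mixes code and Pod, C<pod_blocks> returns one array ref per block, with the opening command line first and the closing "=cut" line (if any) last.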

=for comment
 The current perlsyn says:
 [beginquote]
   Note that pod translators should look at only paragraphs beginning
   with a pod directive (it makes parsing easier), whereas the compiler
   actually knows to look for pod escapes even in the middle of a
   paragraph. This means that the following secret stuff will be ignored
   by both the compiler and the translators.
      $a=3;
      =secret stuff
       warn "Neither POD nor CODE!?"
      =cut back
      print "got $a\n";
   You probably shouldn't rely upon the warn() being podded out forever.
   Not all pod translators are well-behaved in this regard, and perhaps
   the compiler will become pickier.
 [endquote]
 I think that those paragraphs should just be removed; paragraph-based
 parsing seems to have been largely abandoned, because of the hassle
 with non-empty blank lines messing up what people meant by "paragraph".
 Even if the "it makes parsing easier" bit were especially true,
 it wouldn't be worth the confusion of having perl and pod2whatever
 actually disagree on what can constitute a Pod block.

Within a Pod block, there are B<Pod paragraphs>. A Pod paragraph
consists of non-blank lines of text, separated by one or more blank
lines.

For purposes of Pod processing, there are four types of paragraphs in
a Pod block:

=over

=item *

A command paragraph (also called a "directive"). The first line of
this paragraph must match C<m/\A=[a-zA-Z]/>. Command paragraphs are
typically one line, as in:

  =head1 NOTES

  =item *

But they may span several (non-blank) lines:

  =for comment
  Hm, I wonder what it would look like if
  you tried to write a BNF for Pod from this.

  =head3 Dr.
  Strangelove, or: How I Learned to
  Stop Worrying and Love the Bomb

I<Some> command paragraphs allow formatting codes in their content
(i.e., after the part that matches C<m/\A=[a-zA-Z]\S*\s*/>), as in:

  =head1 Did You Remember to C<use strict;>?

In other words, the Pod processing handler for "head1" will apply the
same processing to "Did You Remember to CE<lt>use strict;>?" that it
would to an ordinary paragraph (i.e., formatting codes like
"CE<lt>...>" are parsed and presumably formatted appropriately, and
whitespace in the form of literal spaces and/or tabs is not
significant).

=item *

A B<verbatim paragraph>. The first line of this paragraph must begin
with a literal space or tab, and this paragraph must not be inside a "=begin
I<identifier>", ... "=end I<identifier>" sequence unless
"I<identifier>" begins with a colon (":"). That is, if a paragraph
starts with a literal space or tab, but I<is> inside a
"=begin I<identifier>", ... "=end I<identifier>" region, then it's
a data paragraph, unless "I<identifier>" begins with a colon.

Whitespace I<is> significant in verbatim paragraphs (although, in
processing, tabs are probably expanded).

=item *

An B<ordinary paragraph>. A paragraph is an ordinary paragraph
if its first line matches neither C<m/\A=[a-zA-Z]/> nor
C<m/\A[ \t]/>, I<and> if it's not inside a "=begin I<identifier>",
... "=end I<identifier>" sequence unless "I<identifier>" begins with
a colon (":").

=item *

A B<data paragraph>. This is a paragraph that I<is> inside a "=begin
I<identifier>" ... "=end I<identifier>" sequence where
"I<identifier>" does I<not> begin with a literal colon (":").
In some sense, a data paragraph is not part of Pod at all (i.e.,
effectively it's "out-of-band"), since it's not subject to most kinds
of Pod parsing; but it is specified here, since Pod
parsers need to be able to call an event for it, or store it in some
form in a parse tree, or at least just parse I<around> it.

=back

For example, consider the following paragraphs:

  # <- that's the 0th column

  =head1 Foo

  Stuff

    $foo->bar

  =cut

Here, "=head1 Foo" and "=cut" are command paragraphs because the first
line of each matches C<m/\A=[a-zA-Z]/>. "I<[space][space]>$foo->bar"
is a verbatim paragraph, because its first line starts with a literal
whitespace character (and there's no "=begin"..."=end" region around it).

The "=begin I<identifier>" ... "=end I<identifier>" commands stop
paragraphs that they surround from being parsed as ordinary or verbatim
paragraphs, if I<identifier> doesn't begin with a colon. This
is discussed in detail in the section
L</About Data Paragraphs and "=beginE<sol>=end" Regions>.

=head1 Pod Commands

This section is intended to supplement and clarify the discussion in
L<perlpod/"Command Paragraph">. These are the currently recognized
Pod commands:

=over

=item "=head1", "=head2", "=head3", "=head4"

This command indicates that the text in the remainder of the paragraph
is a heading. That text may contain formatting codes. Examples:

  =head1 Object Attributes

  =head3 What B<Not> to Do!

=item "=pod"

This command indicates that this paragraph begins a Pod block. (If we
are already in the middle of a Pod block, this command has no effect at
all.) If there is any text in this command paragraph after "=pod",
it must be ignored. Examples:

  =pod

  This is a plain Pod paragraph.

  =pod This text is ignored.

=item "=cut"

This command indicates that this line is the end of this previously
started Pod block. If there is any text after "=cut" on the line, it must be
ignored. Examples:

  =cut

  =cut The documentation ends here.

  =cut
  # This is the first line of program text.
  sub foo { # This is the second.

It is an error to try to I<start> a Pod block with a "=cut" command. In
that case, the Pod processor must halt parsing of the input file, and
must by default emit a warning.

=item "=over"

This command indicates that this is the start of a list/indent
region. If there is any text following the "=over", it must consist
of only a nonzero positive numeral. The semantics of this numeral is
explained in the L</"About =over...=back Regions"> section, further
below. Formatting codes are not expanded. Examples:

  =over 3

  =over 3.5

  =over

=item "=item"

This command indicates that an item in a list begins here. Formatting
codes are processed. The semantics of the (optional) text in the
remainder of this paragraph are
explained in the L</"About =over...=back Regions"> section, further
below. Examples:

  =item

  =item *

  =item      *

  =item 14

  =item 3.

  =item C<< $thing->stuff(I<dodad>) >>

  =item For transporting us beyond seas to be tried for pretended
  offenses

  =item He is at this time transporting large armies of foreign
  mercenaries to complete the works of death, desolation and
  tyranny, already begun with circumstances of cruelty and perfidy
  scarcely paralleled in the most barbarous ages, and totally
  unworthy the head of a civilized nation.

=item "=back"

This command indicates that this is the end of the region begun
by the most recent "=over" command. It permits no text after the
"=back" command.

=item "=begin formatname"

=item "=begin formatname parameter"

This marks the following paragraphs (until the matching "=end
formatname") as being for some special kind of processing. Unless
"formatname" begins with a colon, the contained non-command
paragraphs are data paragraphs. But if "formatname" I<does> begin
with a colon, then non-command paragraphs are ordinary paragraphs
or verbatim paragraphs. This is discussed in detail in the section
L</About Data Paragraphs and "=beginE<sol>=end" Regions>.

It is advised that formatnames match the regexp
C<m/\A:?[-a-zA-Z0-9_]+\z/>. Everything following whitespace after the
formatname is a parameter that may be used by the formatter when dealing
with this region. This parameter must not be repeated in the "=end"
paragraph. Implementors should anticipate future expansion in the
semantics and syntax of the first parameter to "=begin"/"=end"/"=for".

=item "=end formatname"

This marks the end of the region opened by the matching
"=begin formatname" command. If "formatname" is not the formatname
of the most recent open "=begin formatname" region, then this
is an error, and must generate an error message. This
is discussed in detail in the section
L</About Data Paragraphs and "=beginE<sol>=end" Regions>.

=item "=for formatname text..."

This is synonymous with:

  =begin formatname

  text...

  =end formatname

That is, it creates a region consisting of a single paragraph; that
paragraph is to be treated as an ordinary paragraph if "formatname"
begins with a ":"; if "formatname" I<doesn't> begin with a colon,
then "text..." will constitute a data paragraph. There is no way
to use "=for formatname text..." to express "text..." as a verbatim
paragraph.

=item "=encoding encodingname"

This command, which should occur early in the document (at least
before any non-US-ASCII data!), declares that this document is
encoded in the encoding I<encodingname>, which must be
an encoding name that L<Encode> recognizes. (Encode's list
of supported encodings, in L<Encode::Supported>, is useful here.)
If the Pod parser cannot decode the declared encoding, it
should emit a warning and may abort parsing the document
altogether.

A document having more than one "=encoding" line should be
considered an error. Pod processors may silently tolerate this if
the not-first "=encoding" lines are just duplicates of the
first one (e.g., if there's a "=encoding utf8" line, and later on
another "=encoding utf8" line). But Pod processors should complain if
there are contradictory "=encoding" lines in the same document
(e.g., if there is a "=encoding utf8" early in the document and
"=encoding big5" later). Pod processors that recognize BOMs
may also complain if they see an "=encoding" line
that contradicts the BOM (e.g., if a document with a UTF-16LE
BOM has an "=encoding shiftjis" line).

=back

If a Pod processor sees any command other than the ones listed
above (like "=head", or "=haed1", or "=stuff", or "=cuttlefish",
or "=w123"), that processor must by default treat this as an
error. It must not process the paragraph beginning with that
command, must by default warn of this as an error, and may
abort the parse. A Pod parser may allow a way for particular
applications to add to the above list of known commands, and to
stipulate, for each additional command, whether formatting
codes should be processed.

Future versions of this specification may add additional
commands.
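
As an illustration of the rules in this section, a processor might check the first line of a command paragraph roughly as follows. This is a minimal sketch, not a reference implementation: C<parse_command>, C<check_command>, and the returned message strings are invented for the example, and the table contains only the commands listed above.

```perl
use strict;
use warnings;

# The commands recognized by this section of the specification.
my %KNOWN = map { $_ => 1 } qw(
    head1 head2 head3 head4 pod cut over item back begin end for encoding
);

# Split the first line of a command paragraph into (command, rest).
# Returns an empty list if the line does not start a command paragraph.
sub parse_command {
    my ($line) = @_;
    return unless $line =~ /\A=([a-zA-Z]\S*)\s*(.*)\z/s;
    return ($1, $2);
}

# Return "ok" or a short description of the problem (hypothetical wording).
sub check_command {
    my ($line) = @_;
    my ($cmd, $rest) = parse_command($line)
        or return "not a command paragraph";
    return "unknown command =$cmd" unless $KNOWN{$cmd};
    if ($cmd eq 'over' && length $rest) {
        # Text after "=over" must be a nonzero positive numeral.
        return "bad =over argument '$rest'"
            unless $rest =~ /\A\d*\.?\d+\z/ && $rest > 0;
    }
    return "ok";
}
```

A real parser would warn (by default) on anything other than "ok" and skip the offending paragraph; an application-extensible parser could let callers add entries to the table of known commands.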



=head1 Pod Formatting Codes

(Note that in previous drafts of this document and of perlpod,
formatting codes were referred to as "interior sequences", and
this term may still be found in the documentation for Pod parsers,
and in error messages from Pod processors.)

There are two syntaxes for formatting codes:

=over

=item *

A formatting code starts with a capital letter (just US-ASCII [A-Z])
followed by a "<", any number of characters, and ending with the first
matching ">". Examples:

  That's what I<you> think!

  What's C<dump()> for?

  X<C<chmod> and C<unlink()> Under Different Operating Systems>

=item *

A formatting code starts with a capital letter (just US-ASCII [A-Z])
followed by two or more "<"'s, one or more whitespace characters,
any number of characters, one or more whitespace characters,
and ending with the first matching sequence of two or more ">"'s, where
the number of ">"'s equals the number of "<"'s in the opening of this
formatting code. Examples:

  That's what I<< you >> think!

  C<<< open(X, ">>thing.dat") || die $! >>>

  B<< $foo->bar(); >>

With this syntax, the whitespace character(s) after the "CE<lt><<"
and before the ">>>" (or whatever letter) are I<not> renderable. They
do not signify whitespace, but are merely part of the formatting codes
themselves. That is, these are all synonymous:

  C<thing>
  C<< thing >>
  C<<           thing     >>
  C<<<   thing >>>
  C<<<<
  thing
         >>>>

and so on.

Finally, the multiple-angle-bracket form does I<not> alter the interpretation
of nested formatting codes, meaning that the following four example lines are
identical in meaning:

  B<example: C<$a E<lt>=E<gt> $b>>

  B<example: C<< $a <=> $b >>>

  B<example: C<< $a E<lt>=E<gt> $b >>>

  B<<< example: C<< $a E<lt>=E<gt> $b >> >>>

=back

In parsing Pod, a notably tricky part is the correct parsing of
(potentially nested!) formatting codes. Implementors should
consult the code in the C<parse_text> routine in Pod::Parser as an
example of a correct implementation.

=over

=item C<IE<lt>textE<gt>> -- italic text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<BE<lt>textE<gt>> -- bold text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<CE<lt>codeE<gt>> -- code text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<FE<lt>filenameE<gt>> -- style for filenames

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<XE<lt>topic nameE<gt>> -- an index entry

See the brief discussion in L<perlpod/"Formatting Codes">.

This code is unusual in that most formatters completely discard
this code and its content. Other formatters will render it with
invisible codes that can be used in building an index of
the current document.

=item C<ZE<lt>E<gt>> -- a null (zero-effect) formatting code

Discussed briefly in L<perlpod/"Formatting Codes">.

This code is unusual in that it should have no content. That is,
a processor may complain if it sees C<ZE<lt>potatoesE<gt>>. Whether
or not it complains, the I<potatoes> text should be ignored.

=item C<LE<lt>nameE<gt>> -- a hyperlink

The complicated syntaxes of this code are discussed at length in
L<perlpod/"Formatting Codes">, and implementation details are
discussed below, in L</"About LE<lt>...E<gt> Codes">. Parsing the
contents of LE<lt>content> is tricky. Notably, the content has to be
checked for whether it looks like a URL, or whether it has to be split
on literal "|" and/or "/" (in the right order!), and so on,
I<before> EE<lt>...> codes are resolved.

=item C<EE<lt>escapeE<gt>> -- a character escape

See L<perlpod/"Formatting Codes">, and several points in
L</Notes on Implementing Pod Processors>.

=item C<SE<lt>textE<gt>> -- text contains non-breaking spaces

This formatting code is syntactically simple, but semantically
complex. What it means is that each space in the printable
content of this code signifies a non-breaking space.

Consider:

  C<$x ? $y : $z>

  S<C<$x ? $y : $z>>

Both signify the monospace (c[ode] style) text consisting of
"$x", one space, "?", one space, "$y", one space, ":", one space,
"$z". The difference is that in the latter, with the S code, those
spaces are not "normal" spaces, but instead are non-breaking spaces.

=back


If a Pod processor sees any formatting code other than the ones
listed above (as in "NE<lt>...>", or "QE<lt>...>", etc.), that
processor must by default treat this as an error.
A Pod parser may allow a way for particular
applications to add to the above list of known formatting codes;
a Pod parser might even allow a way to stipulate, for each additional
formatting code, whether it requires some form of special processing, as
LE<lt>...> does.

Future versions of this specification may add additional
formatting codes.

Historical note: A few older Pod processors would not see a ">" as
closing a "CE<lt>" code, if the ">" was immediately preceded by
a "-".
This was so that this:

  C<$foo->bar>

would parse as equivalent to this:

  C<$foo-E<gt>bar>

instead of as equivalent to a "C" formatting code containing
only "$foo-", and then a "bar>" outside the "C" formatting code. This
problem has since been solved by the addition of syntaxes like this:

  C<< $foo->bar >>

Compliant parsers must not treat "->" as special.

Formatting codes absolutely cannot span paragraphs. If a code is
opened in one paragraph, and no closing code is found by the end of
that paragraph, the Pod parser must close that formatting code,
and should complain (as in "Unterminated I code in the paragraph
starting at line 123: 'Time objects are not...'"). So these
two paragraphs:

  I<I told you not to do this!

  Don't make me say it again!>

...must I<not> be parsed as two paragraphs in italics (with the I
code starting in one paragraph and ending in another). Instead,
the first paragraph should generate a warning, but that aside, the
above code must parse as if it were:

  I<I told you not to do this!>

  Don't make me say it again!E<gt>

(In SGMLish jargon, all Pod commands are like block-level
elements, whereas all Pod formatting codes are like inline-level
elements.)



=head1 Notes on Implementing Pod Processors

The following is a long section of miscellaneous requirements
and suggestions to do with Pod processing.

=over

=item *

Pod formatters should tolerate lines in verbatim blocks that are of
any length, even if that means having to break them (possibly several
times, for very long lines) to avoid text running off the side of the
page. Pod formatters may warn of such line-breaking. Such warnings
are particularly appropriate for lines that are over 100 characters
long, which are usually not intentional.

=item *

Pod parsers must recognize I<all> of the three well-known newline
formats: CR, LF, and CRLF. See L<perlport|perlport>.

=item *

Pod parsers should accept input lines that are of any length.

=item *

Since Perl recognizes a Unicode Byte Order Mark at the start of files
as signaling that the file is Unicode encoded as in UTF-16 (whether
big-endian or little-endian) or UTF-8, Pod parsers should do the
same. Otherwise, the character encoding should be understood as
being UTF-8 if the first highbit byte sequence in the file seems
valid as a UTF-8 sequence, or otherwise as CP-1252 (earlier versions of
this specification used Latin-1 instead of CP-1252).

Future versions of this specification may specify
how Pod can accept other encodings. Presumably treatment of other
encodings in Pod parsing would be as in XML parsing: whatever the
encoding declared by a particular Pod file, content is to be
stored in memory as Unicode characters.

=item *

The well known Unicode Byte Order Marks are as follows: if the
file begins with the two literal byte values 0xFE 0xFF, this is
the BOM for big-endian UTF-16. If the file begins with the two
literal byte values 0xFF 0xFE, this is the BOM for little-endian
UTF-16. On an ASCII platform, if the file begins with the three literal
byte values
0xEF 0xBB 0xBF, this is the BOM for UTF-8.
A mechanism portable to EBCDIC platforms is to:

  my $utf8_bom = "\x{FEFF}";
  utf8::encode($utf8_bom);

=for comment
 use bytes; print map sprintf(" 0x%02X", ord $_), split '', "\x{feff}";
 0xEF 0xBB 0xBF

=for comment
 If toke.c is modified to support UTF-32, add mention of those here.
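
The BOM rules above might be applied along the following lines. This is a sketch only: the function name C<bom_encoding> and its return values are invented for illustration, the input is assumed to be the undecoded bytes from the start of the file, and the two UTF-16 checks assume an ASCII platform.

```perl
use strict;
use warnings;

# Inspect the leading bytes of a file (as a byte string) and return the
# encoding its BOM announces, or nothing if there is no recognized BOM.
sub bom_encoding {
    my ($bytes) = @_;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;

    # Build the UTF-8 BOM portably, as suggested in the text, instead of
    # hard-coding the bytes EF BB BF.
    my $utf8_bom = "\x{FEFF}";
    utf8::encode($utf8_bom);
    return 'UTF-8' if index($bytes, $utf8_bom) == 0;

    return;    # no BOM: fall back to the highbit heuristic
}
```

When no BOM is found, the parser would go on to examine the first highbit byte sequence, as described in the items around this one.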

=item *

A naive, but often sufficient heuristic on ASCII platforms, for testing
the first highbit
byte-sequence in a BOM-less file (whether in code or in Pod!), to see
whether that sequence is valid as UTF-8 (RFC 2279) is to check whether
the first byte in the sequence is in the range 0xC2 - 0xFD
I<and> whether the next byte is in the range
0x80 - 0xBF. If so, the parser may conclude that this file is in
UTF-8, and all highbit sequences in the file should be assumed to
be UTF-8. Otherwise the parser should treat the file as being
in CP-1252. (A better check, and which works on EBCDIC platforms as
well, is to pass a copy of the sequence to
L<utf8::decode()|utf8> which performs a full validity check on the
sequence and returns TRUE if it is valid UTF-8, FALSE otherwise. This
function is always pre-loaded, is fast because it is written in C, and
will only get called at most once, so you don't need to avoid it out of
performance concerns.)
In the unlikely circumstance that the first highbit
sequence in a truly non-UTF-8 file happens to appear to be UTF-8, one
can cater to our heuristic (as well as any more intelligent heuristic)
by prefacing that line with a comment line containing a highbit
sequence that is clearly I<not> valid as UTF-8. A line consisting
of simply "#", an e-acute, and any non-highbit byte,
is sufficient to establish this file's encoding.

=for comment
 If/WHEN some brave soul makes these heuristics into a generic
 text-file class (or PerlIO layer?), we can presumably delete
 mention of these icky details from this file, and can instead
 tell people to just use appropriate class/layer.
 Auto-recognition of newline sequences would be another desirable
 feature of such a class/layer.
 HINT HINT HINT.
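
Both checks described above can be sketched briefly. The function names C<naive_guess> and C<guess_encoding> are invented for this illustration, and returning 'US-ASCII' for input with no highbit bytes is this sketch's own choice (the text does not say what to report in that case). The input is assumed to be undecoded bytes with no BOM.

```perl
use strict;
use warnings;

# The naive two-byte check: skip the leading US-ASCII bytes, then test
# whether the first highbit byte is in 0xC2-0xFD and the next byte is in
# 0x80-0xBF. Anything else (including pure ASCII) is reported as CP-1252.
sub naive_guess {
    my ($bytes) = @_;
    return 'UTF-8' if $bytes =~ /\A[\x00-\x7F]*[\xC2-\xFD][\x80-\xBF]/;
    return 'CP-1252';
}

# The better check: hand a copy of the first highbit byte sequence to
# utf8::decode(), which validates it fully and returns false if it is
# not well-formed UTF-8.
sub guess_encoding {
    my ($bytes) = @_;
    return 'US-ASCII' unless $bytes =~ /([\x80-\xFF]+)/;
    my $seq = $1;    # a copy, since utf8::decode() works in place
    return utf8::decode($seq) ? 'UTF-8' : 'CP-1252';
}
```

A line such as "#", an e-acute byte (0xE9 in CP-1252), and a space makes either function report CP-1252, which is the trick the paragraph above recommends for files that would otherwise be misdetected.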

=for comment
 "The probability that a string of characters
 in any other encoding appears as valid UTF-8 is low" - RFC2279

=item *

Pod processors must treat a "=for [label] [content...]" paragraph as
meaning the same thing as a "=begin [label]" paragraph, content, and
an "=end [label]" paragraph. (The parser may conflate these two
constructs, or may leave them distinct, in the expectation that the
formatter will nevertheless treat them the same.)

=item *

When rendering Pod to a format that allows comments (i.e., to nearly
any format other than plaintext), a Pod formatter must insert comment
text identifying its name and version number, and the name and
version numbers of any modules it might be using to process the Pod.
Minimal examples:

  %% POD::Pod2PS v3.14159, using POD::Parser v1.92

  <!-- Pod::HTML v3.14159, using POD::Parser v1.92 -->

  {\doccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08}

  .\" Pod::Man version 3.14159, using POD::Parser version 1.92

Formatters may also insert additional comments, including: the
release date of the Pod formatter program, the contact address for
the author(s) of the formatter, the current time, the name of the input
file, the formatting options in effect, version of Perl used, etc.

Formatters may also choose to note errors/warnings as comments,
besides or instead of emitting them otherwise (as in messages to
STDERR, or C<die>ing).

=item *

Pod parsers I<may> emit warnings or error messages ("Unknown E code
EE<lt>zslig>!") to STDERR (whether through printing to STDERR, or
C<warn>ing/C<carp>ing, or C<die>ing/C<croak>ing), but I<must> allow
suppressing all such STDERR output, and instead allow an option for
reporting errors/warnings
in some other way, whether by triggering a callback, or noting errors
in some attribute of the document object, or some similarly unobtrusive
mechanism -- or even by appending a "Pod Errors" section to the end of
the parsed form of the document.

=item *

In cases of exceptionally aberrant documents, Pod parsers may abort the
parse. Even then, using C<die>ing/C<croak>ing is to be avoided; where
possible, the parser library may simply close the input file
and add text like "*** Formatting Aborted ***" to the end of the
(partial) in-memory document.

=item *

In paragraphs where formatting codes (like EE<lt>...>, BE<lt>...>)
are understood (i.e., I<not> verbatim paragraphs, but I<including>
ordinary paragraphs, and command paragraphs that produce renderable
text, like "=head1"), literal whitespace should generally be considered
"insignificant", in that one literal space has the same meaning as any
(nonzero) number of literal spaces, literal newlines, and literal tabs
(as long as this produces no blank lines, since those would terminate
the paragraph). Pod parsers should compact literal whitespace in each
processed paragraph, but may provide an option for overriding this
(since some processing tasks do not require it), or may follow
additional special rules (for example, specially treating
period-space-space or period-newline sequences).
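
Whitespace compaction as just described could look like the following minimal sketch. The function name C<normalize_paragraph> and the paragraph-kind labels ('ordinary', 'verbatim') are invented for this example; a real parser would carry its own notion of paragraph type and would also apply this to command paragraphs such as "=head1".

```perl
use strict;
use warnings;

# Collapse each run of literal spaces, tabs, and newlines to one space.
# Verbatim paragraphs are returned untouched, since whitespace is
# significant there.
sub normalize_paragraph {
    my ($kind, $text) = @_;
    return $text if $kind eq 'verbatim';
    (my $out = $text) =~ s/[ \t\r\n]+/ /g;
    return $out;
}
```

Note that this sketch does not implement any of the optional special rules (such as treating period-space-space specially).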

=item *

Pod parsers should not, by default, try to coerce apostrophe (') and
quote (") into smart quotes (little 9's, 66's, 99's, etc), nor try to
turn backtick (`) into anything else but a single backtick character
(distinct from an open quote character!), nor "--" into anything but
two minus signs. They I<must never> do any of those things to text
in CE<lt>...> formatting codes, and never I<ever> to text in verbatim
paragraphs.

=item *

When rendering Pod to a format that has two kinds of hyphens (-), one
that's a non-breaking hyphen, and another that's a breakable hyphen
(as in "object-oriented", which can be split across lines as
"object-", newline, "oriented"), formatters are encouraged to
generally translate "-" to non-breaking hyphen, but may apply
heuristics to convert some of these to breaking hyphens.

=item *

Pod formatters should make reasonable efforts to keep words of Perl
code from being broken across lines. For example, "Foo::Bar" in some
formatting systems is seen as eligible for being broken across lines
as "Foo::" newline "Bar" or even "Foo::-" newline "Bar". This should
be avoided where possible, either by disabling all line-breaking in
mid-word, or by wrapping particular words with internal punctuation
in "don't break this across lines" codes (which in some formats may
not be a single code, but might be a matter of inserting non-breaking
zero-width spaces between every pair of characters in a word.)

=item *

Pod parsers should, by default, expand tabs in verbatim paragraphs as
they are processed, before passing them to the formatter or other
processor. Parsers may also allow an option for overriding this.

=item *

Pod parsers should, by default, remove newlines from the end of
ordinary and verbatim paragraphs before passing them to the
formatter.
For example, while the paragraph you're reading now
could be considered, in Pod source, to end with (and contain)
the newline(s) that end it, it should be processed as ending with
(and containing) the period character that ends this sentence.

=item *

Pod parsers, when reporting errors, should make some effort to report
an approximate line number ("Nested EE<lt>>'s in Paragraph #52, near
line 633 of Thing/Foo.pm!"), instead of merely noting the paragraph
number ("Nested EE<lt>>'s in Paragraph #52 of Thing/Foo.pm!"). Where
this is problematic, the paragraph number should at least be
accompanied by an excerpt from the paragraph ("Nested EE<lt>>'s in
Paragraph #52 of Thing/Foo.pm, which begins 'Read/write accessor for
the CE<lt>interest rate> attribute...'").

=item *

Pod parsers, when processing a series of verbatim paragraphs one
after another, should consider them to be one large verbatim
paragraph that happens to contain blank lines. I.e., these two
lines, which have a blank line between them:

	use Foo;

	print Foo->VERSION

should be unified into one paragraph ("\tuse Foo;\n\n\tprint
Foo->VERSION") before being passed to the formatter or other
processor. Parsers may also allow an option for overriding this.

While this might be too cumbersome to implement in event-based Pod
parsers, it is straightforward for parsers that return parse trees.

=item *

Pod formatters, where feasible, are advised to avoid splitting short
verbatim paragraphs (under twelve lines, say) across pages.

=item *

Pod parsers must treat a line with only spaces and/or tabs on it as a
"blank line" such as separates paragraphs. (Some older parsers
recognized only two adjacent newlines as a "blank line" but would not
recognize a newline, a space, and a newline, as a blank line. This
is noncompliant behavior.)
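
The paragraph-splitting and verbatim-merging rules above can be sketched as follows. The function names C<paragraphs> and C<merge_verbatim> are invented for this illustration. The sketch assumes the text is the inside of a single Pod block held in one string, ignores "=begin"/"=end" regions entirely, and joins merged verbatim paragraphs with an empty line regardless of what whitespace the separating line originally held.

```perl
use strict;
use warnings;

# Split text into paragraphs. A separator is any line holding only
# spaces and/or tabs (or nothing at all), not just an empty line.
sub paragraphs {
    my ($text) = @_;
    my (@paras, @cur);
    for my $line (split /\r\n|\r|\n/, $text) {
        if ($line =~ /\A[ \t]*\z/) {
            push @paras, join("\n", @cur) if @cur;
            @cur = ();
        }
        else {
            push @cur, $line;
        }
    }
    push @paras, join("\n", @cur) if @cur;
    return @paras;
}

# Merge each run of adjacent verbatim paragraphs (those whose first
# character is a space or tab) into one paragraph with blank lines inside.
sub merge_verbatim {
    my @out;
    for my $para (@_) {
        if (@out && $para =~ /\A[ \t]/ && $out[-1] =~ /\A[ \t]/) {
            $out[-1] .= "\n\n$para";
        }
        else {
            push @out, $para;
        }
    }
    return @out;
}
```

Joining with newlines and never appending a trailing one also matches the earlier advice that paragraphs be passed on without their final newline(s).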

=item *

Authors of Pod formatters/processors should make every effort to
avoid writing their own Pod parser. There are already several in
CPAN, with a wide range of interface styles -- and one of them,
Pod::Simple, comes with modern versions of Perl.

=item *

Characters in Pod documents may be conveyed either as literals, or by
number in EE<lt>n> codes, or by an equivalent mnemonic, as in
EE<lt>eacute> which is exactly equivalent to EE<lt>233>. The numbers
are the Latin1/Unicode values, even on EBCDIC platforms.

When referring to characters by using an EE<lt>n> numeric code, numbers
in the range 32-126 refer to those well known US-ASCII characters (also
defined there by Unicode, with the same meaning), which all Pod
formatters must render faithfully. Characters whose EE<lt>E<gt> numbers
are in the ranges 0-31 and 127-159 should not be used (neither as
literals, nor as EE<lt>number> codes), except for the literal
byte-sequences for newline (ASCII 13, ASCII 13 10, or ASCII 10), and
tab (ASCII 9).

Numbers in the range 160-255 refer to Latin-1 characters (also
defined there by Unicode, with the same meaning). Numbers above
255 should be understood to refer to Unicode characters.

=item *

Be warned that some formatters cannot reliably render characters
outside 32-126; and many are able to handle 32-126 and 160-255, but
nothing above 255.

=item *

Besides the well-known "EE<lt>lt>" and "EE<lt>gt>" codes for
less-than and greater-than, Pod parsers must understand "EE<lt>sol>"
for "/" (solidus, slash), and "EE<lt>verbar>" for "|" (vertical bar,
pipe). Pod parsers should also understand "EE<lt>lchevron>" and
"EE<lt>rchevron>" as legacy codes for characters 171 and 187, i.e.,
"left-pointing double angle quotation mark" = "left pointing
guillemet" and "right-pointing double angle quotation mark" = "right
pointing guillemet".
(These look like little "<<" and ">>", and they
are now preferably expressed with the HTML/XHTML codes "EE<lt>laquo>"
and "EE<lt>raquo>".)

=item *

Pod parsers should understand all "EE<lt>html>" codes as defined
in the entity declarations in the most recent XHTML specification at
C<www.W3.org>. Pod parsers must understand at least the entities
that define characters in the range 160-255 (Latin-1). Pod parsers,
when faced with some unknown "EE<lt>I<identifier>>" code,
shouldn't simply replace it with nullstring (by default, at least),
but may pass it through as a string consisting of the literal characters
E, less-than, I<identifier>, greater-than. Or Pod parsers may offer the
alternative option of processing such unknown
"EE<lt>I<identifier>>" codes by firing an event especially
for such codes, or by adding a special node-type to the in-memory
document tree. Such "EE<lt>I<identifier>>" may have special meaning
to some processors, or some processors may choose to add them to
a special error report.

=item *

Pod parsers must also support the XHTML codes "EE<lt>quot>" for
character 34 (doublequote, "), "EE<lt>amp>" for character 38
(ampersand, &), and "EE<lt>apos>" for character 39 (apostrophe, ').

=item *

Note that in all cases of "EE<lt>whateverE<gt>", I<whatever> (whether
an htmlname, or a number in any base) must consist only of
alphanumeric characters -- that is, I<whatever> must match
C<m/\A\w+\z/>. So S<"EE<lt> 0 1 2 3 E<gt>"> is invalid, because
it contains spaces, which aren't alphanumeric characters. This
presumably does not I<need> special treatment by a Pod processor;
S<" 0 1 2 3 "> doesn't look like a number in any base, so it would
presumably be looked up in the table of HTML-like names. Since
there isn't (and cannot be) an HTML-like entity called S<" 0 1 2 3 ">,
this will be treated as an error.
However, Pod processors may
treat S<"EE<lt> 0 1 2 3 E<gt>"> or "EE<lt>e-acute>" as I<syntactically>
invalid, potentially earning a different error message than the
error message (or warning, or event) generated by a merely unknown
(but theoretically valid) htmlname, as in "EE<lt>qacute>"
[sic]. However, Pod parsers are not required to make this
distinction.

=item *

Note that EE<lt>number> I<must not> be interpreted as simply
"codepoint I<number> in the current/native character set". It always
means only "the character represented by codepoint I<number> in
Unicode." (This is identical to the semantics of &#I<number>; in XML.)

This will likely require many formatters to have tables mapping from
treatable Unicode codepoints (such as the "\xE9" for the e-acute
character) to the escape sequences or codes necessary for conveying
such sequences in the target output format. A converter to *roff
would, for example, know that "\xE9" (whether conveyed literally, or via
an EE<lt>...> sequence) is to be conveyed as "e\\*'".
Similarly, a program rendering Pod in a Mac OS application window would
presumably need to know that "\xE9" maps to codepoint 142 in the MacRoman
encoding that (at time of writing) is native for Mac OS. Such
Unicode2whatever mappings are presumably already widely available for
common output formats. (Such mappings may be incomplete! Implementers
are not expected to bend over backwards in an attempt to render
Cherokee syllabics, Etruscan runes, Byzantine musical symbols, or any
of the other weird things that Unicode can encode.) And
if a Pod document uses a character not found in such a mapping, the
formatter should consider it an unrenderable character.
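
As a rough sketch of the EE<lt>...> rules above, a resolver might look
like the following. The function name C<resolve_e_content> and the
deliberately tiny name table are hypothetical; a real parser would
carry the full set of entity names. The sketch assumes that hexadecimal
content is written with a "0x" prefix and octal content with a leading
"0" (as described in perlpod), and that it runs on an ASCII/Unicode
platform where C<chr> takes Unicode codepoints; an EBCDIC port would
need an extra translation step.

```perl
use strict;
use warnings;

# A deliberately tiny name table (an assumption for illustration only).
# Values are Unicode codepoints, never native-charset codepoints.
my %CODEPOINT_FOR = (
    lt   => 60,  gt  => 62,  sol  => 47,  verbar => 124,
    quot => 34,  amp => 38,  apos => 39,
    nbsp => 160, shy => 173, eacute => 233,
    lchevron => 171, laquo => 171, rchevron => 187, raquo => 187,
);

# Hypothetical helper: return the character denoted by the content of an
# E<...> code, or undef if the content is invalid or simply unknown.
sub resolve_e_content {
    my ($content) = @_;
    return undef unless $content =~ /\A\w+\z/;      # syntactically invalid
    return chr(hex($1))  if $content =~ /\A0[xX]([0-9A-Fa-f]+)\z/;
    return chr(oct($1))  if $content =~ /\A0([0-7]+)\z/;
    return chr($content) if $content =~ /\A[0-9]+\z/;
    return exists $CODEPOINT_FOR{$content}
        ? chr($CODEPOINT_FOR{$content})
        : undef;                                    # unknown htmlname
}
```

This sketch does not distinguish syntactically invalid content from a
merely unknown name; both yield C<undef>, which the text above permits.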

=item *

If, surprisingly, the implementor of a Pod formatter can't find a
satisfactory pre-existing table mapping from Unicode characters to
escapes in the target format (e.g., a decent table of Unicode
characters to *roff escapes), it will be necessary to build such a
table. If you are in this circumstance, you should begin with the
characters in the range 0x00A0 - 0x00FF, which are mostly the heavily
used accented characters. Then proceed (as patience permits and
fastidiousness compels) through the characters that the (X)HTML
standards groups judged important enough to merit mnemonics
for. These are declared in the (X)HTML specifications at the
www.W3.org site. At time of writing (September 2001), the most recent
entity declaration files are:

    http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
    http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
    http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent

Then you can progress through any remaining notable Unicode characters
in the range 0x2000-0x204D (consult the character tables at
www.unicode.org), and whatever else strikes your fancy. For example,
in F<xhtml-symbol.ent>, there is the entry:

    <!ENTITY infin    "&#8734;"> <!-- infinity, U+221E ISOtech -->

While the mapping "infin" to the character "\x{221E}" will (hopefully)
have been already handled by the Pod parser, the presence of the
character in this file means that it's reasonably important enough to
include in a formatter's table that maps from notable Unicode characters
to the codes necessary for rendering them. So for a Unicode-to-*roff
mapping, for example, this would merit the entry:

    "\x{221E}" => '\(in',

It is eagerly hoped that in the future, increasing numbers of formats
(and formatters) will support Unicode characters directly (as (X)HTML
does with C<&#8734;>, C<&#x221E;>, or C<&infin;>), reducing the need
for idiosyncratic mappings of Unicode-to-I<my_escapes>.
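
To illustrate, a formatter's mapping table and its use might be
sketched like this. The two table entries are the ones mentioned in the
text; the names C<%ROFF_ESCAPE_FOR> and C<text_to_roff> are
hypothetical, and the choice of "?" for characters missing from the
table is just one possible fallback. A real *roff formatter would also
have to escape characters that are special to *roff itself, which this
sketch ignores.

```perl
use strict;
use warnings;

# Illustrative and far from complete: Unicode character => *roff escape.
my %ROFF_ESCAPE_FOR = (
    "\xE9"     => "e\\*'",   # e-acute
    "\x{221E}" => '\(in',    # infinity
);

# Hypothetical helper: replace every non-ASCII character with its *roff
# escape, or with "?" if the table has no entry for it.
sub text_to_roff {
    my ($text) = @_;
    $text =~ s{([^\x00-\x7F])}
              { exists $ROFF_ESCAPE_FOR{$1} ? $ROFF_ESCAPE_FOR{$1} : '?' }ge;
    return $text;
}
```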

=item *

It is up to individual Pod formatters to display good judgement when
confronted with an unrenderable character (which is distinct from an
unknown EE<lt>thing> sequence that the parser couldn't resolve to
anything, renderable or not). It is good practice to map Latin letters
with diacritics (like "EE<lt>eacute>"/"EE<lt>233>") to the corresponding
unaccented US-ASCII letters (like a simple character 101, "e"), but
clearly this is often not feasible, and an unrenderable character may
be represented as "?", or the like. In attempting a sane fallback
(as from EE<lt>233> to "e"), Pod formatters may use the
%Latin1Code_to_fallback table in L<Pod::Escapes|Pod::Escapes>, or
L<Text::Unidecode|Text::Unidecode>, if available.

For example, this Pod text:

    magic is enabled if you set C<$Currency> to 'E<euro>'.

may be rendered as:
"magic is enabled if you set C<$Currency> to 'I<?>'" or as
"magic is enabled if you set C<$Currency> to 'B<[euro]>'", or as
"magic is enabled if you set C<$Currency> to '[x20AC]'", etc.

A Pod formatter may also note, in a comment or warning, a list of what
unrenderable characters were encountered.

=item *

EE<lt>...> may freely appear in any formatting code (other than
in another EE<lt>...> or in a ZE<lt>>). That is, "XE<lt>The
EE<lt>euro>1,000,000 Solution>" is valid, as is "LE<lt>The
EE<lt>euro>1,000,000 Solution|Million::Euros>".

=item *

Some Pod formatters output to formats that implement non-breaking
spaces as an individual character (which I'll call "NBSP"), and
others output to formats that implement non-breaking spaces just as
spaces wrapped in a "don't break this across lines" code.
Note that
at the level of Pod, both sorts of codes can occur: Pod can contain a
NBSP character (whether as a literal, or as a "EE<lt>160>" or
"EE<lt>nbsp>" code); and Pod can contain "SE<lt>foo
IE<lt>barE<gt> baz>" codes, where "mere spaces" (character 32) in
such codes are taken to represent non-breaking spaces. Pod
parsers should consider supporting the optional parsing of "SE<lt>foo
IE<lt>barE<gt> baz>" as if it were
"fooI<NBSP>IE<lt>barE<gt>I<NBSP>baz", and, going the other way, the
optional parsing of groups of words joined by NBSP's as if each group
were in a SE<lt>...> code, so that formatters may use the
representation that maps best to what the output format demands.

=item *

Some processors may find that the C<SE<lt>...E<gt>> code is easiest to
implement by replacing each space in the parse tree under the content
of the S, with an NBSP. But note: the replacement should apply I<not> to
spaces in I<all> text, but I<only> to spaces in I<printable> text. (This
distinction may or may not be evident in the particular tree/event
model implemented by the Pod parser.) For example, consider this
unusual case:

    S<L</Autoloaded Functions>>

This means that the space in the middle of the visible link text must
not be broken across lines. In other words, it's the same as this:

    L<"AutoloadedE<160>Functions"|/Autoloaded Functions>

However, a misapplied space-to-NBSP replacement could (wrongly)
produce something equivalent to this:

    L<"AutoloadedE<160>Functions"|/AutoloadedE<160>Functions>

...which is almost definitely not going to work as a hyperlink (assuming
this formatter outputs a format supporting hypertext).

Formatters may choose to just not support the S format code,
especially in cases where the output format simply has no NBSP
character/code and no code for "don't break this stuff across lines".
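
The printable-text distinction can be sketched with a small tree walk.
The tree model here is purely an assumption made for illustration: a
node is either a plain string of printable text, or a hash with a
C<code> letter, a C<children> list holding the printable content, and
(for link nodes) C<name> and C<section> attributes that are not
printable text. The function name C<apply_nbsp> is hypothetical as
well.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the content under an S code with
# each space in printable text replaced by NBSP (character 160).  Link
# attributes such as 'section' are copied unchanged, so link targets
# keep their ordinary spaces.
sub apply_nbsp {
    my ($node) = @_;
    if (!ref $node) {
        (my $text = $node) =~ tr/ /\xA0/;   # printable text only
        return $text;
    }
    my %copy = %$node;                      # attributes copied untouched
    $copy{children} = [ map { apply_nbsp($_) } @{ $node->{children} || [] } ];
    return \%copy;
}
```

With a link node whose section is "Autoloaded Functions", only the
visible link text receives the NBSP; the section used for resolving the
link keeps its ordinary space.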

=item *

Besides the NBSP character discussed above, implementors are reminded
of the existence of the other "special" character in Latin-1, the
"soft hyphen" character, also known as "discretionary hyphen"
(i.e., C<EE<lt>173E<gt>> = C<EE<lt>0xADE<gt>> =
C<EE<lt>shyE<gt>>). This character expresses an optional hyphenation
point. That is, it normally renders as nothing, but may render as a
"-" if a formatter breaks the word at that point. Pod formatters
should, as appropriate, do one of the following: 1) render this with
a code with the same meaning (e.g., "\-" in RTF), 2) pass it through
in the expectation that the formatter understands this character as
such, or 3) delete it.

For example:

    sigE<shy>action
    manuE<shy>script
    JarkE<shy>ko HieE<shy>taE<shy>nieE<shy>mi

These signal to a formatter that if it is to hyphenate "sigaction"
or "manuscript", then it should be done as
"sig-I<[linebreak]>action" or "manu-I<[linebreak]>script"
(and if it doesn't hyphenate it, then the C<EE<lt>shyE<gt>> doesn't
show up at all). And if it is
to hyphenate "Jarkko" and/or "Hietaniemi", it can do
so only at the points where there is a C<EE<lt>shyE<gt>> code.

In practice, it is anticipated that this character will not be used
often, but formatters should either support it, or delete it.

=item *

If you think that you want to add a new command to Pod (like, say, a
"=biblio" command), consider whether you could get the same
effect with a for or begin/end sequence: "=for biblio ..." or "=begin
biblio" ... "=end biblio". Pod processors that don't understand
"=for biblio", etc, will simply ignore it, whereas they may complain
loudly if they see "=biblio".

=item *

Throughout this document, "Pod" has been the preferred spelling for
the name of the documentation format. One may also use "POD" or
"pod".
For the documentation that is (typically) in the Pod
format, you may use "pod", or "Pod", or "POD". Understanding these
distinctions is useful; but obsessing over how to spell them usually
is not.

=back

=head1 About LE<lt>...E<gt> Codes

As you can tell from a glance at L<perlpod|perlpod>, the LE<lt>...>
code is the most complex of the Pod formatting codes. The points below
will hopefully clarify what it means and how processors should deal
with it.

=over

=item *

In parsing an LE<lt>...> code, Pod parsers must distinguish at least
four attributes:

=over

=item First:

The link-text. If there is none, this must be C<undef>. (E.g., in
"LE<lt>Perl Functions|perlfunc>", the link-text is "Perl Functions".
In "LE<lt>Time::HiRes>" and even "LE<lt>|Time::HiRes>", there is no
link text. Note that link text may contain formatting.)

=item Second:

The possibly inferred link-text; i.e., if there was no real link
text, then this is the text that we'll infer in its place. (E.g., for
"LE<lt>Getopt::Std>", the inferred link text is "Getopt::Std".)

=item Third:

The name or URL, or C<undef> if none. (E.g., in "LE<lt>Perl
Functions|perlfunc>", the name (also sometimes called the page)
is "perlfunc". In "LE<lt>/CAVEATS>", the name is C<undef>.)

=item Fourth:

The section (AKA "item" in older perlpods), or C<undef> if none. E.g.,
in "LE<lt>Getopt::Std/DESCRIPTIONE<gt>", "DESCRIPTION" is the section. (Note
that this is not the same as a manpage section like the "5" in "man 5
crontab". "Section Foo" in the Pod sense means the part of the text
that's introduced by the heading or item whose text is "Foo".)

=back

Pod parsers may also note additional attributes including:

=over

=item Fifth:

A flag for whether item 3 (if present) is a URL (like
"http://lists.perl.org" is), in which case there should be no section
attribute; a Pod name (like "perldoc" and "Getopt::Std" are); or
possibly a man page name (like "crontab(5)" is).

=item Sixth:

The raw original LE<lt>...> content, before text is split on
"|", "/", etc, and before EE<lt>...> codes are expanded.

=back

(The above were numbered only for concise reference below. It is not
a requirement that these be passed as an actual list or array.)

For example:

    L<Foo::Bar>
      =>  undef,                          # link text
          "Foo::Bar",                     # possibly inferred link text
          "Foo::Bar",                     # name
          undef,                          # section
          'pod',                          # what sort of link
          "Foo::Bar"                      # original content

    L<Perlport's section on NL's|perlport/Newlines>
      =>  "Perlport's section on NL's",   # link text
          "Perlport's section on NL's",   # possibly inferred link text
          "perlport",                     # name
          "Newlines",                     # section
          'pod',                          # what sort of link
          "Perlport's section on NL's|perlport/Newlines"
                                          # original content

    L<perlport/Newlines>
      =>  undef,                          # link text
          '"Newlines" in perlport',       # possibly inferred link text
          "perlport",                     # name
          "Newlines",                     # section
          'pod',                          # what sort of link
          "perlport/Newlines"             # original content

    L<crontab(5)/"DESCRIPTION">
      =>  undef,                          # link text
          '"DESCRIPTION" in crontab(5)',  # possibly inferred link text
          "crontab(5)",                   # name
          "DESCRIPTION",                  # section
          'man',                          # what sort of link
          'crontab(5)/"DESCRIPTION"'      # original content

    L</Object Attributes>
      =>  undef,                          # link text
          '"Object Attributes"',          # possibly inferred link text
          undef,                          # name
          "Object Attributes",            # section
          'pod',                          # what sort of link
          "/Object Attributes"            # original content

    L<http://www.perl.org/>
      =>  undef,                          # link text
          "http://www.perl.org/",         # possibly inferred link text
          "http://www.perl.org/",         # name
          undef,                          # section
          'url',                          # what sort of link
          "http://www.perl.org/"          # original content

    L<Perl.org|http://www.perl.org/>
      =>  "Perl.org",                     # link text
          "http://www.perl.org/",         # possibly inferred link text
          "http://www.perl.org/",         # name
          undef,                          # section
          'url',                          # what sort of link
          "Perl.org|http://www.perl.org/" # original content

Note that you can distinguish URL-links from anything else by the
fact that they match C<m/\A\w+:[^:\s]\S*\z/>. So
C<LE<lt>http://www.perl.comE<gt>> is a URL, but
C<LE<lt>HTTP::ResponseE<gt>> isn't.

=item *

In case of LE<lt>...> codes with no "text|" part in them,
older formatters have exhibited great variation in actually displaying
the link or cross reference. For example, LE<lt>crontab(5)> would render
as "the C<crontab(5)> manpage", or "in the C<crontab(5)> manpage"
or just "C<crontab(5)>".

Pod processors must now treat "text|"-less links as follows:

    L<name>         =>  L<name|name>
    L</section>     =>  L<"section"|/section>
    L<name/section> =>  L<"section" in name|name/section>

=item *

Note that section names might contain markup. I.e., if a section
starts with:

    =head2 About the C<-M> Operator

or with:

    =item About the C<-M> Operator

then a link to it would look like this:

    L<somedoc/About the C<-M> Operator>

Formatters may choose to ignore the markup for purposes of resolving
the link and use only the renderable characters in the section name,
as in:

    <h1><a name="About_the_-M_Operator">About the <code>-M</code>
    Operator</a></h1>

    ...

    <a href="somedoc#About_the_-M_Operator">"About the <code>-M</code>
    Operator" in somedoc</a>

=item *

Previous versions of perlpod distinguished C<LE<lt>name/"section"E<gt>>
links from C<LE<lt>name/itemE<gt>> links (and their targets). These
have been merged syntactically and semantically in the current
specification, and I<section> can refer either to a "=headI<n> Heading
Content" command or to a "=item Item Content" command. This
specification does not specify what the behavior should be in the case
of a given document having several things all seeming to produce the
same I<section> identifier (e.g., in HTML, several things all producing
the same I<anchorname> in <a name="I<anchorname>">...</a>
elements). Where Pod processors can control this behavior, they should
use the first such anchor. That is, C<LE<lt>Foo/BarE<gt>> refers to the
I<first> "Bar" section in Foo.

But for some processors/formats this cannot be easily controlled; as
with the HTML example, the behavior of multiple ambiguous
<a name="I<anchorname>">...</a> is most easily just left up to
browsers to decide.

=item *

In a C<LE<lt>text|...E<gt>> code, text may contain formatting codes
for formatting or for EE<lt>...> escapes, as in:

    L<B<ummE<234>stuff>|...>

For C<LE<lt>...E<gt>> codes without a "text|" part, only
C<EE<lt>...E<gt>> and C<ZE<lt>E<gt>> codes may occur. That is,
authors should not use "C<LE<lt>BE<lt>Foo::BarE<gt>E<gt>>".

Note, however, that formatting codes and ZE<lt>>'s can occur in any
and all parts of an LE<lt>...> (i.e., in I<name>, I<section>, I<text>,
and I<url>).

Authors must not nest LE<lt>...> codes. For example, "LE<lt>The
LE<lt>Foo::Bar> man page>" should be treated as an error.
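
To tie the LE<lt>...> points above together, here is a rough sketch of
splitting raw link content into the six attributes and applying the
"text|"-less rewriting rules. The function name C<parse_link_content>
and the hash layout are hypothetical. As a simplification, it splits on
the first literal "|" and "/", so it does not cope with those
characters appearing inside nested formatting codes, and it does not
expand EE<lt>...> codes. When explicit link text is present, it is
reused as the inferred text.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper: split the raw content of an L<...> code into the
# attributes discussed above.
sub parse_link_content {
    my ($raw) = @_;
    my ($text, $target) =
        $raw =~ /\A([^|]*)\|(.*)\z/s ? ($1, $2) : (undef, $raw);
    $text = undef if defined $text && $text eq '';   # L<|name>: no text

    my ($name, $section, $type);
    if ($target =~ m/\A\w+:[^:\s]\S*\z/) {           # the URL test above
        ($name, $type) = ($target, 'url');
    }
    else {
        ($name, $section) = split m{/}, $target, 2;
        $name = undef if defined $name && $name eq '';
        $section =~ s/\A"(.*)"\z/$1/s if defined $section;
        # A parenthesized suffix, as in "crontab(5)", signals a man page.
        $type = defined $name && $name =~ /\(\S*\)\z/ ? 'man' : 'pod';
    }

    my $inferred = $text
        // ( !defined $section ? $name
           : defined $name     ? qq{"$section" in $name}
           :                     qq{"$section"} );

    return {
        text    => $text,    inferred => $inferred, name => $name,
        section => $section, type     => $type,     raw  => $raw,
    };
}
```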

=item *

Note that Pod authors may use formatting codes inside the "text"
part of "LE<lt>text|name>" (and so on for LE<lt>text|/"sec">).

In other words, this is valid:

    Go read L<the docs on C<$.>|perlvar/"$.">

Some output formats that do allow rendering "LE<lt>...>" codes as
hypertext, might not allow the link-text to be formatted; in
that case, formatters will have to just ignore that formatting.

=item *

At time of writing, C<LE<lt>nameE<gt>> values are of two types:
either the name of a Pod page like C<LE<lt>Foo::BarE<gt>> (which
might be a real Perl module or program in an @INC / PATH
directory, or a .pod file in those places); or the name of a Unix
man page, like C<LE<lt>crontab(5)E<gt>>. In theory, C<LE<lt>chmodE<gt>>
is ambiguous between a Pod page called "chmod", or the Unix man page
"chmod" (in whatever man-section). However, the presence of a string
in parens, as in "crontab(5)", is sufficient to signal that what
is being discussed is not a Pod page, and so is presumably a
Unix man page. The distinction is of no importance to many
Pod processors, but some processors that render to hypertext formats
may need to distinguish them in order to know how to render a
given C<LE<lt>fooE<gt>> code.

=item *

Previous versions of perlpod allowed for a C<LE<lt>sectionE<gt>> syntax (as in
C<LE<lt>Object AttributesE<gt>>), which was not easily distinguishable from
C<LE<lt>nameE<gt>> syntax and for C<LE<lt>"section"E<gt>> which was only
slightly less ambiguous. This syntax is no longer in the specification, and
has been replaced by the C<LE<lt>/sectionE<gt>> syntax (where the slash was
formerly optional). Pod parsers should tolerate the C<LE<lt>"section"E<gt>>
syntax, for a while at least.
The suggested heuristic for distinguishing
C<LE<lt>sectionE<gt>> from C<LE<lt>nameE<gt>> is that if it contains any
whitespace, it's a I<section>. Pod processors should warn about this being
deprecated syntax.

=back

=head1 About =over...=back Regions

"=over"..."=back" regions are used for various kinds of list-like
structures. (I use the term "region" here simply as a collective
term for everything from the "=over" to the matching "=back".)

=over

=item *

The non-zero numeric I<indentlevel> in "=over I<indentlevel>" ...
"=back" is used for giving the formatter a clue as to how many
"spaces" (ems, or roughly equivalent units) it should tab over,
although many formatters will have to convert this to an absolute
measurement that may not exactly match with the size of spaces (or M's)
in the document's base font. Other formatters may have to completely
ignore the number. The lack of any explicit I<indentlevel> parameter is
equivalent to an I<indentlevel> value of 4. Pod processors may
complain if I<indentlevel> is present but is not a positive number
matching C<m/\A(\d*\.)?\d+\z/>.

=item *

Authors of Pod formatters are reminded that "=over" ... "=back" may
map to several different constructs in your output format. For
example, in converting Pod to (X)HTML, it can map to any of
<ul>...</ul>, <ol>...</ol>, <dl>...</dl>, or
<blockquote>...</blockquote>. Similarly, "=item" can map to <li> or
<dt>.

=item *

Each "=over" ... "=back" region should be one of the following:

=over

=item *

An "=over" ... "=back" region containing only "=item *" commands,
each followed by some number of ordinary/verbatim paragraphs, other
nested "=over" ... "=back" regions, "=for..." paragraphs, and
"=begin"..."=end" regions.

(Pod processors must tolerate a bare "=item" as if it were "=item
*".) Whether "*" is rendered as a literal asterisk, an "o", or as
some kind of real bullet character, is left up to the Pod formatter,
and may depend on the level of nesting.

=item *

An "=over" ... "=back" region containing only
C<m/\A=item\s+\d+\.?\s*\z/> paragraphs, each one (or each group of them)
followed by some number of ordinary/verbatim paragraphs, other nested
"=over" ... "=back" regions, "=for..." paragraphs, and/or
"=begin"..."=end" regions. Note that the numbers must start at 1
in each section, and must proceed in order and without skipping
numbers.

(Pod processors must tolerate lines like "=item 1" as if they were
"=item 1.", with the period.)

=item *

An "=over" ... "=back" region containing only "=item [text]"
commands, each one (or each group of them) followed by some number of
ordinary/verbatim paragraphs, other nested "=over" ... "=back"
regions, or "=for..." paragraphs, and "=begin"..."=end" regions.

The "=item [text]" paragraph should not match
C<m/\A=item\s+\d+\.?\s*\z/> or C<m/\A=item\s+\*\s*\z/>, nor should it
match just C<m/\A=item\s*\z/>.

=item *

An "=over" ... "=back" region containing no "=item" paragraphs at
all, and containing only some number of
ordinary/verbatim paragraphs, and possibly also some nested "=over"
... "=back" regions, "=for..." paragraphs, and "=begin"..."=end"
regions. Such an itemless "=over" ... "=back" region in Pod is
equivalent in meaning to a "<blockquote>...</blockquote>" element in
HTML.

=back

Note that with all the above cases, you can determine which type of
"=over" ... "=back" you have, by examining the first (non-"=cut",
non-"=pod") Pod paragraph after the "=over" command.
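
That classification can be sketched as follows, reusing the patterns
quoted above. The function name C<over_region_type> and the four labels
it returns are hypothetical. The caller is assumed to pass the first
Pod paragraph after the "=over" command, having already skipped any
"=cut" and "=pod" paragraphs.

```perl
use strict;
use warnings;

# Hypothetical helper: classify an "=over" ... "=back" region from the
# first paragraph that follows the "=over" command.
sub over_region_type {
    my ($first_para) = @_;
    return 'block'                      # no "=item": blockquote-like region
        unless defined $first_para && $first_para =~ /\A=item\b/;
    return 'bullet'                     # "=item *", or a tolerated bare "=item"
        if $first_para =~ /\A=item\s+\*\s*\z/
        || $first_para =~ /\A=item\s*\z/;
    return 'number'                     # "=item 1." or tolerated "=item 1"
        if $first_para =~ /\A=item\s+\d+\.?\s*\z/;
    return 'text';                      # "=item [text]"
}
```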

=item *

Pod formatters I<must> tolerate arbitrarily large amounts of text
in the "=item I<text...>" paragraph. In practice, most such
paragraphs are short, as in:

    =item For cutting off our trade with all parts of the world

But they may be arbitrarily long:

    =item For transporting us beyond seas to be tried for pretended
    offenses

    =item He is at this time transporting large armies of foreign
    mercenaries to complete the works of death, desolation and
    tyranny, already begun with circumstances of cruelty and perfidy
    scarcely paralleled in the most barbarous ages, and totally
    unworthy the head of a civilized nation.

=item *

Pod processors should tolerate "=item *" / "=item I<number>" commands
with no accompanying paragraph. The middle item is an example:

    =over

    =item 1

    Pick up dry cleaning.

    =item 2

    =item 3

    Stop by the store. Get Abba Zabas, Stoli, and cheap lawn chairs.

    =back

=item *

No "=over" ... "=back" region can contain headings. Processors may
treat such a heading as an error.

=item *

Note that an "=over" ... "=back" region should have some
content. That is, authors should not have an empty region like this:

    =over

    =back

Pod processors seeing such a contentless "=over" ... "=back" region,
may ignore it, or may report it as an error.

=item *

Processors must tolerate an "=over" list that goes off the end of the
document (i.e., which has no matching "=back"), but they may warn
about such a list.
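
A tolerant check along these lines might be sketched as below. The
function name C<check_over_regions> and the wording of its messages are
hypothetical. As a simplification, the sketch treats every line that
starts with "=over" or "=back" as a command, without checking that the
line begins a paragraph or that it lies inside a Pod block. It collects
warnings rather than dying, since an unterminated region must be
tolerated.

```perl
use strict;
use warnings;

# Hypothetical helper: scan Pod text and return a list of warnings about
# suspicious indentlevels and unbalanced "=over" / "=back" commands.
sub check_over_regions {
    my ($pod) = @_;
    my (@open_at, @warnings);
    my $line_number = 0;
    for my $line (split /\n/, $pod) {
        $line_number++;
        if ($line =~ /\A=over(?:\s+(\S.*?))?\s*\z/) {
            my $indent = defined $1 ? $1 : 4;       # missing means 4
            push @warnings,
                "line $line_number: indentlevel '$indent' is not a positive number"
                unless $indent =~ /\A(\d*\.)?\d+\z/ && $indent > 0;
            push @open_at, $line_number;
        }
        elsif ($line =~ /\A=back\s*\z/) {
            if (@open_at) { pop @open_at }
            else { push @warnings, "line $line_number: =back without =over" }
        }
    }
    push @warnings, "line $_: =over has no matching =back" for @open_at;
    return @warnings;
}
```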

=item *

Authors of Pod formatters should note that this construct:

    =item Neque

    =item Porro

    =item Quisquam Est

    Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
    velit, sed quia non numquam eius modi tempora incidunt ut
    labore et dolore magnam aliquam quaerat voluptatem.

    =item Ut Enim

is semantically ambiguous, in a way that makes formatting decisions
a bit difficult. On the one hand, it could be mention of an item
"Neque", mention of another item "Porro", and mention of another
item "Quisquam Est", with just the last one requiring the explanatory
paragraph "Qui dolorem ipsum quia dolor..."; and then an item
"Ut Enim". In that case, you'd want to format it like so:

    Neque

    Porro

    Quisquam Est
      Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
      velit, sed quia non numquam eius modi tempora incidunt ut
      labore et dolore magnam aliquam quaerat voluptatem.

    Ut Enim

But it could equally well be a discussion of three (related or equivalent)
items, "Neque", "Porro", and "Quisquam Est", followed by a paragraph
explaining them all, and then a new item "Ut Enim". In that case, you'd
probably want to format it like so:

    Neque
    Porro
    Quisquam Est
      Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
      velit, sed quia non numquam eius modi tempora incidunt ut
      labore et dolore magnam aliquam quaerat voluptatem.

    Ut Enim

But (for the foreseeable future), Pod does not provide any way for Pod
authors to distinguish which grouping is meant by the above
"=item"-cluster structure.
So formatters should format it like so:

    Neque

    Porro

    Quisquam Est

      Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
      velit, sed quia non numquam eius modi tempora incidunt ut
      labore et dolore magnam aliquam quaerat voluptatem.

    Ut Enim

That is, there should be (at least roughly) equal spacing between
items as between paragraphs (although that spacing may well be less
than the full height of a line of text). This leaves it to the reader
to use (con)textual cues to figure out whether the "Qui dolorem
ipsum..." paragraph applies to the "Quisquam Est" item or to all three
items "Neque", "Porro", and "Quisquam Est". While not an ideal
situation, this is preferable to providing formatting cues that may
be actually contrary to the author's intent.

=back

=head1 About Data Paragraphs and "=begin/=end" Regions

Data paragraphs are typically used for inlining non-Pod data that is
to be used (typically passed through) when rendering the document to
a specific format:

    =begin rtf

    \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}

    =end rtf

The exact same effect could, incidentally, be achieved with a single
"=for" paragraph:

    =for rtf \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}

(Although that is not formally a data paragraph, it has the same
meaning as one, and Pod parsers may parse it as one.)

Another example of a data paragraph:

    =begin html

    I like <em>PIE</em>!

    <hr>Especially pecan pie!

    =end html

If these were ordinary paragraphs, the Pod parser would try to
expand the "EE<lt>/em>" (in the first paragraph) as a formatting
code, just like "EE<lt>lt>" or "EE<lt>eacute>".
But since this
is in a "=begin I<identifier>"..."=end I<identifier>" region I<and>
the identifier "html" doesn't have a ":" prefix, the contents
of this region are stored as data paragraphs, instead of being
processed as ordinary paragraphs (or if they began with spaces
and/or tabs, as verbatim paragraphs).

As a further example: At time of writing, no "biblio" identifier is
supported, but suppose some processor were written to recognize it as
a way of (say) denoting a bibliographic reference (necessarily
containing formatting codes in ordinary paragraphs). The fact that
"biblio" paragraphs were meant for ordinary processing would be
indicated by prefacing each "biblio" identifier with a colon:

  =begin :biblio

  Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
  Programs.> Prentice-Hall, Englewood Cliffs, NJ.

  =end :biblio

This would signal to the parser that paragraphs in this begin...end
region are subject to normal handling as ordinary/verbatim paragraphs
(while still tagged as meant only for processors that understand the
"biblio" identifier). The same effect could be had with:

  =for :biblio
  Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
  Programs.> Prentice-Hall, Englewood Cliffs, NJ.

The ":" on these identifiers means simply "process this stuff
normally, even though the result will be for some special target".
I suggest that parser APIs report "biblio" as the target identifier,
but also report that it had a ":" prefix. (And similarly, with the
above "html", report "html" as the target identifier, and note the
I<lack> of a ":" prefix.)

Note that a "=begin I<identifier>"..."=end I<identifier>" region where
I<identifier> begins with a colon, I<can> contain commands.
For example:

  =begin :biblio

  Wirth's classic is available in several editions, including:

  =for comment
   hm, check abebooks.com for how much used copies cost.

  =over

  =item

  Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
  Teubner, Stuttgart. [Yes, it's in German.]

  =item

  Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
  Programs.> Prentice-Hall, Englewood Cliffs, NJ.

  =back

  =end :biblio

Note, however, that a "=begin I<identifier>"..."=end I<identifier>"
region where I<identifier> does I<not> begin with a colon should not
directly contain "=head1" ... "=head4" commands, nor "=over", nor "=back",
nor "=item". For example, this may be considered invalid:

  =begin somedata

  This is a data paragraph.

  =head1 Don't do this!

  This is a data paragraph too.

  =end somedata

A Pod processor may signal that the above (specifically the "=head1"
paragraph) is an error. Note, however, that the following should
I<not> be treated as an error:

  =begin somedata

  This is a data paragraph.

  =cut

  # Yup, this isn't Pod anymore.
  sub excl { (rand() > .5) ? "hoo!" : "hah!" }

  =pod

  This is a data paragraph too.

  =end somedata

And this too is valid:

  =begin someformat

  This is a data paragraph.

  And this is a data paragraph.

  =begin someotherformat

  This is a data paragraph too.

  And this is a data paragraph too.

  =begin :yetanotherformat

  =head2 This is a command paragraph!

  This is an ordinary paragraph!

    And this is a verbatim paragraph!

  =end :yetanotherformat

  =end someotherformat

  Another data paragraph!

  =end someformat

The contents of the above "=begin :yetanotherformat" ...
"=end :yetanotherformat" region I<aren't> data paragraphs, because
the immediately containing region's identifier (":yetanotherformat")
begins with a colon. In practice, most regions that contain
data paragraphs will contain I<only> data paragraphs; however,
the above nesting is syntactically valid as Pod, even if it is
rare. However, the handlers for some formats, like "html",
will accept only data paragraphs, not nested regions; and they may
complain if they see (targeted for them) nested regions, or commands,
other than "=end", "=pod", and "=cut".

Also consider this valid structure:

  =begin :biblio

  Wirth's classic is available in several editions, including:

  =over

  =item

  Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
  Teubner, Stuttgart. [Yes, it's in German.]

  =item

  Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
  Programs.> Prentice-Hall, Englewood Cliffs, NJ.

  =back

  Buy buy buy!

  =begin html

  <img src='wirth_spokesmodeling_book.png'>

  <hr>

  =end html

  Now now now!

  =end :biblio

There, the "=begin html"..."=end html" region is nested inside
the larger "=begin :biblio"..."=end :biblio" region. Note that the
content of the "=begin html"..."=end html" region is data
paragraph(s), because the immediately containing region's identifier
("html") I<doesn't> begin with a colon.

Pod parsers, when processing a series of data paragraphs one
after another (within a single region), should consider them to
be one large data paragraph that happens to contain blank lines.
So the content of the above "=begin html"..."=end html" I<may> be stored
as two data paragraphs (one consisting of
"<img src='wirth_spokesmodeling_book.png'>\n"
and another consisting of "<hr>\n"), but I<should> be stored as
a single data paragraph (consisting of
"<img src='wirth_spokesmodeling_book.png'>\n\n<hr>\n").

Pod processors should tolerate empty
"=begin I<something>"..."=end I<something>" regions,
empty "=begin :I<something>"..."=end :I<something>" regions, and
contentless "=for I<something>" and "=for :I<something>"
paragraphs. I.e., these should be tolerated:

  =for html

  =begin html

  =end html

  =begin :biblio

  =end :biblio

Incidentally, note that there's no easy way to express a data
paragraph starting with something that looks like a command. Consider:

  =begin stuff

  =shazbot

  =end stuff

There, "=shazbot" will be parsed as a Pod command "shazbot", not as a data
paragraph "=shazbot\n". However, you can express a data paragraph consisting
of "=shazbot\n" using this code:

  =for stuff =shazbot

Situations where this is necessary are presumably quite rare.

Note that =end commands must match the currently open =begin command. That
is, they must properly nest. For example, this is valid:

  =begin outer

  X

  =begin inner

  Y

  =end inner

  Z

  =end outer

while this is invalid:

  =begin outer

  X

  =begin inner

  Y

  =end outer

  Z

  =end inner

This latter is improper because when the "=end outer" command is seen, the
currently open region has the formatname "inner", not "outer". (It just
happens that "outer" is the format name of a higher-up region.) This is
an error.
Processors must by default report this as an error, and may halt
processing the document containing that error. A corollary of this is that
regions cannot "overlap". That is, the latter block above does not represent
a region called "outer" which contains X and Y, overlapping a region called
"inner" which contains Y and Z. But because it is invalid (as all
apparently overlapping regions would be), it doesn't represent that, or
anything at all.

Similarly, this is invalid:

  =begin thing

  =end hting

This is an error because the region is opened by "thing", and the "=end"
tries to close "hting" [sic].

This is also invalid:

  =begin thing

  =end

This is invalid because every "=end" command must have a formatname
parameter.

=head1 SEE ALSO

L<perlpod>, L<perlsyn/"PODs: Embedded Documentation">,
L<podchecker>

=head1 AUTHOR

Sean M. Burke

=cut