=encoding utf8

=head1 NAME

perlpodspec - Plain Old Documentation: format specification and notes

=head1 DESCRIPTION

This document contains detailed notes on the Pod markup language. Most
people will only have to read L<perlpod|perlpod> to know how to write
in Pod, but this document may answer some incidental questions to do
with parsing and rendering Pod.

In this document, "must" / "must not", "should" /
"should not", and "may" have their conventional (cf. RFC 2119)
meanings: "X must do Y" means that if X doesn't do Y, it's against
this specification, and should really be fixed. "X should do Y"
means that it's recommended, but X may fail to do Y, if there's a
good reason. "X may do Y" is merely a note that X can do Y at
will (although it is up to the reader to detect any connotation of
"and I think it would be I<nice> if X did Y" versus "it wouldn't
really I<bother> me if X did Y").

Notably, when I say "the parser should do Y", the
parser may fail to do Y, if the calling application explicitly
requests that the parser I<not> do Y. I often phrase this as
"the parser should, by default, do Y." This doesn't I<require>
the parser to provide an option for turning off whatever
feature Y is (like expanding tabs in verbatim paragraphs), although
it implies that such an option I<may> be provided.

=head1 Pod Definitions

Pod is embedded in files, typically Perl source files, although you
can write a file that's nothing but Pod.

A B<line> in a file consists of zero or more non-newline characters,
terminated by either a newline or the end of the file.

A B<newline sequence> is usually a platform-dependent concept, but
Pod parsers should understand it to mean any of CR (ASCII 13), LF
(ASCII 10), or a CRLF (ASCII 13 followed immediately by ASCII 10), in
addition to any other system-specific meaning.
The first CR/CRLF/LF
sequence in the file may be used as the basis for identifying the
newline sequence for parsing the rest of the file.

A B<blank line> is a line consisting entirely of zero or more spaces
(ASCII 32) or tabs (ASCII 9), and terminated by a newline or end-of-file.
A B<non-blank line> is a line containing one or more characters other
than space or tab (and terminated by a newline or end-of-file).

(I<Note:> Many older Pod parsers did not accept a line consisting of
spaces/tabs and then a newline as a blank line. The only lines they
considered blank were lines consisting of I<no characters at all>,
terminated by a newline.)

B<Whitespace> is used in this document as a blanket term for spaces,
tabs, and newline sequences. (By itself, this term usually refers
to literal whitespace: sequences of whitespace characters
in Pod source, as opposed to "EE<lt>32>", which is a formatting
code that I<denotes> a whitespace character.)

A B<Pod parser> is a module meant for parsing Pod (regardless of
whether this involves calling callbacks or building a parse tree or
directly formatting it). A B<Pod formatter> (or B<Pod translator>)
is a module or program that converts Pod to some other format (HTML,
plaintext, TeX, PostScript, RTF). A B<Pod processor> might be a
formatter or translator, or might be a program that does something
else with the Pod (like counting words, scanning for index points,
etc.).

Pod content is contained in B<Pod blocks>. A Pod block starts with a
line that matches C<m/\A=[a-zA-Z]/>, and continues up to the next line
that matches C<m/\A=cut/> or up to the end of the file if there is
no C<m/\A=cut/> line.
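
To make the definitions above concrete, here is a minimal sketch, in
Perl, of a processor that treats CR, LF, and CRLF alike and then
collects the Pod blocks in a file. It assumes the whole file is
already in memory as a string; C<pod_blocks> is a hypothetical name,
not part of any existing Pod module.

  use strict;
  use warnings;

  # Hypothetical helper: return the Pod blocks found in $text.
  sub pod_blocks {
      my ($text) = @_;
      $text =~ s/\r\n?/\n/g;              # CRLF or a lone CR becomes LF
      my (@blocks, $current);
      for my $line (split /\n/, $text) {
          if (defined $current) {
              if ($line =~ m/\A=cut/) {   # the block ends at "=cut"
                  push @blocks, $current;
                  undef $current;
              } else {
                  $current .= "$line\n";
              }
          } elsif ($line =~ m/\A=cut/) {  # "=cut" cannot start a block
              warn "=cut found outside a Pod block; stopping\n";
              last;
          } elsif ($line =~ m/\A=[a-zA-Z]/) {
              $current = "$line\n";       # a new Pod block starts here
          }
      }
      push @blocks, $current if defined $current;   # no "=cut" before EOF
      return @blocks;
  }

Like the definition it illustrates, this sketch makes no attempt to
notice Pod-like lines inside quoted strings or here documents.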

=for comment
 The current perlsyn says:
 [beginquote]
   Note that pod translators should look at only paragraphs beginning
   with a pod directive (it makes parsing easier), whereas the compiler
   actually knows to look for pod escapes even in the middle of a
   paragraph. This means that the following secret stuff will be ignored
   by both the compiler and the translators.
      $a=3;
      =secret stuff
       warn "Neither POD nor CODE!?"
      =cut back
      print "got $a\n";
   You probably shouldn't rely upon the warn() being podded out forever.
   Not all pod translators are well-behaved in this regard, and perhaps
   the compiler will become pickier.
 [endquote]
 I think that those paragraphs should just be removed; paragraph-based
 parsing seems to have been largely abandoned, because of the hassle
 with non-empty blank lines messing up what people meant by "paragraph".
 Even if the "it makes parsing easier" bit were especially true,
 it wouldn't be worth the confusion of having perl and pod2whatever
 actually disagree on what can constitute a Pod block.

Note that a parser is not expected to distinguish between something that
looks like pod, but is in a quoted string, such as a here document.

Within a Pod block, there are B<Pod paragraphs>. A Pod paragraph
consists of non-blank lines of text, separated by one or more blank
lines.

For purposes of Pod processing, there are four types of paragraphs in
a Pod block:

=over

=item *

A command paragraph (also called a "directive"). The first line of
this paragraph must match C<m/\A=[a-zA-Z]/>. Command paragraphs are
typically one line, as in:

  =head1 NOTES

  =item *

But they may span several (non-blank) lines:

  =for comment
  Hm, I wonder what it would look like if
  you tried to write a BNF for Pod from this.

  =head3 Dr. Strangelove, or: How I Learned to
  Stop Worrying and Love the Bomb

I<Some> command paragraphs allow formatting codes in their content
(i.e., after the part that matches C<m/\A=[a-zA-Z]\S*\s*/>), as in:

  =head1 Did You Remember to C<use strict;>?

In other words, the Pod processing handler for "head1" will apply the
same processing to "Did You Remember to CE<lt>use strict;>?" that it
would to an ordinary paragraph: formatting codes like
"CE<lt>...>" are parsed and presumably formatted appropriately, and
whitespace in the form of literal spaces and/or tabs is not
significant.

=item *

A B<verbatim paragraph>. The first line of this paragraph must start
with a literal space or tab, and this paragraph must not be inside a
"=begin I<identifier>", ... "=end I<identifier>" sequence unless
"I<identifier>" begins with a colon (":"). That is, if a paragraph
starts with a literal space or tab, but I<is> inside a
"=begin I<identifier>", ... "=end I<identifier>" region, then it's
a data paragraph, unless "I<identifier>" begins with a colon.

Whitespace I<is> significant in verbatim paragraphs (although, in
processing, tabs are probably expanded).

=item *

An B<ordinary paragraph>. A paragraph is an ordinary paragraph
if its first line matches neither C<m/\A=[a-zA-Z]/> nor
C<m/\A[ \t]/>, I<and> if it's not inside a "=begin I<identifier>",
... "=end I<identifier>" sequence unless "I<identifier>" begins with
a colon (":").

=item *

A B<data paragraph>. This is a paragraph that I<is> inside a "=begin
I<identifier>" ... "=end I<identifier>" sequence where
"I<identifier>" does I<not> begin with a literal colon (":"). In
In 170some sense, a data paragraph is not part of Pod at all (i.e., 171effectively it's "out-of-band"), since it's not subject to most kinds 172of Pod parsing; but it is specified here, since Pod 173parsers need to be able to call an event for it, or store it in some 174form in a parse tree, or at least just parse I<around> it. 175 176=back 177 178For example: consider the following paragraphs: 179 180 # <- that's the 0th column 181 182 =head1 Foo 183 184 Stuff 185 186 $foo->bar 187 188 =cut 189 190Here, "=head1 Foo" and "=cut" are command paragraphs because the first 191line of each matches C<m/\A=[a-zA-Z]/>. "I<[space][space]>$foo->bar" 192is a verbatim paragraph, because its first line starts with a literal 193whitespace character (and there's no "=begin"..."=end" region around). 194 195The "=begin I<identifier>" ... "=end I<identifier>" commands stop 196paragraphs that they surround from being parsed as ordinary or verbatim 197paragraphs, if I<identifier> doesn't begin with a colon. This 198is discussed in detail in the section 199L</About Data Paragraphs and "=beginE<sol>=end" Regions>. 200 201=head1 Pod Commands 202 203This section is intended to supplement and clarify the discussion in 204L<perlpod/"Command Paragraph">. These are the currently recognized 205Pod commands: 206 207=over 208 209=item "=head1", "=head2", "=head3", "=head4" 210 211This command indicates that the text in the remainder of the paragraph 212is a heading. That text may contain formatting codes. Examples: 213 214 =head1 Object Attributes 215 216 =head3 What B<Not> to Do! 217 218=item "=pod" 219 220This command indicates that this paragraph begins a Pod block. (If we 221are already in the middle of a Pod block, this command has no effect at 222all.) If there is any text in this command paragraph after "=pod", 223it must be ignored. Examples: 224 225 =pod 226 227 This is a plain Pod paragraph. 228 229 =pod This text is ignored. 

=item "=cut"

This command indicates that this line is the end of the previously
started Pod block. If there is any text after "=cut" on the line, it must be
ignored. Examples:

  =cut

  =cut The documentation ends here.

  =cut
  # This is the first line of program text.
  sub foo { # This is the second.

It is an error to try to I<start> a Pod block with a "=cut" command. In
that case, the Pod processor must halt parsing of the input file, and
must by default emit a warning.

=item "=over"

This command indicates that this is the start of a list/indent
region. If there is any text following the "=over", it must consist
of only a positive numeral. The semantics of this numeral are
explained in the L</"About =over...=back Regions"> section, further
below. Formatting codes are not expanded. Examples:

  =over 3

  =over 3.5

  =over

=item "=item"

This command indicates that an item in a list begins here. Formatting
codes are processed. The semantics of the (optional) text in the
remainder of this paragraph are
explained in the L</"About =over...=back Regions"> section, further
below. Examples:

  =item

  =item *

  =item      *

  =item 14

  =item   3.

  =item C<< $thing->stuff(I<dodad>) >>

  =item For transporting us beyond seas to be tried for pretended
  offenses

  =item He is at this time transporting large armies of foreign
  mercenaries to complete the works of death, desolation and
  tyranny, already begun with circumstances of cruelty and perfidy
  scarcely paralleled in the most barbarous ages, and totally
  unworthy the head of a civilized nation.

=item "=back"

This command indicates that this is the end of the region begun
by the most recent "=over" command. It permits no text after the
"=back" command.
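
The "=over" and "=back" rules above are easy to check mechanically. A
minimal sketch, assuming each command paragraph has already been split
into a name and its content (for example, C<['over', '3.5']>);
C<check_over_back> is a hypothetical name. It covers only the rules
stated in this section, not the fuller list semantics described later.

  use strict;
  use warnings;

  # Hypothetical checker: takes [name, content] pairs, returns errors.
  sub check_over_back {
      my @errors;
      my $depth = 0;                  # number of open "=over" regions
      for my $cmd (@_) {
          my ($name, $content) = @$cmd;
          if ($name eq 'over') {
              # Optional text must be a positive numeral, e.g. 3 or 3.5.
              push @errors, "=over argument '$content' is not a positive numeral"
                  if length $content
                  and not ($content =~ /\A[0-9]*\.?[0-9]+\z/ and $content > 0);
              $depth++;
          } elsif ($name eq 'back') {
              push @errors, "=back takes no text" if length $content;
              if ($depth) { $depth-- }
              else        { push @errors, "=back without a matching =over" }
          }
      }
      return @errors;
  }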

=item "=begin formatname"

=item "=begin formatname parameter"

This marks the following paragraphs (until the matching "=end
formatname") as being for some special kind of processing. Unless
"formatname" begins with a colon, the contained non-command
paragraphs are data paragraphs. But if "formatname" I<does> begin
with a colon, then non-command paragraphs are ordinary paragraphs
or verbatim paragraphs. This is discussed in detail in the section
L</About Data Paragraphs and "=beginE<sol>=end" Regions>.

It is advised that formatnames match the regexp
C<m/\A:?[-a-zA-Z0-9_]+\z/>. Everything following whitespace after the
formatname is a parameter that may be used by the formatter when dealing
with this region. This parameter must not be repeated in the "=end"
paragraph. Implementors should anticipate future expansion in the
semantics and syntax of the first parameter to "=begin"/"=end"/"=for".

=item "=end formatname"

This marks the end of the region opened by the matching
"=begin formatname" command. If "formatname" is not the formatname
of the most recent open "=begin formatname" region, then this
is an error, and must generate an error message. This
is discussed in detail in the section
L</About Data Paragraphs and "=beginE<sol>=end" Regions>.

=item "=for formatname text..."

This is synonymous with:

  =begin formatname

  text...

  =end formatname

That is, it creates a region consisting of a single paragraph; that
paragraph is to be treated as an ordinary paragraph if "formatname"
begins with a ":"; if "formatname" I<doesn't> begin with a colon,
then "text..." will constitute a data paragraph. There is no way
to use "=for formatname text..." to express "text..." as a verbatim
paragraph.
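
The "=begin", "=end", and "=for" rules above, together with the four
paragraph types defined earlier, can be combined into a small
classifier. This is a minimal sketch, assuming the input is already
split into paragraphs; C<classify_paragraphs> is a hypothetical name,
and letting the innermost open region decide is an assumption of this
sketch.

  use strict;
  use warnings;

  # Hypothetical classifier: returns [type, text] pairs, where type is
  # 'command', 'verbatim', 'ordinary', or 'data'.
  sub classify_paragraphs {
      my @open;                       # formatnames of open "=begin" regions
      my @out;
      for my $para (@_) {
          if ($para =~ m/\A=([a-zA-Z]\S*)\s*(\S*)/) {
              my ($cmd, $format) = ($1, $2);
              if ($cmd eq 'for') {
                  # "=for name text..." is a one-paragraph region; its text
                  # is ordinary or data, never verbatim.
                  (my $text = $para) =~ s/\A=for\s+\S+\s*//;
                  push @out, [($format =~ /\A:/ ? 'ordinary' : 'data'), $text];
                  next;
              }
              if ($cmd eq 'begin') {
                  push @open, $format;
              } elsif ($cmd eq 'end') {
                  if (@open and $open[-1] eq $format) { pop @open }
                  else { warn "=end $format does not match the open region\n" }
              }
              push @out, ['command', $para];
          } elsif (@open and $open[-1] !~ /\A:/) {
              push @out, ['data', $para];       # inside a non-colon region
          } elsif ($para =~ m/\A[ \t]/) {
              push @out, ['verbatim', $para];
          } else {
              push @out, ['ordinary', $para];
          }
      }
      return @out;
  }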

=item "=encoding encodingname"

This command, which should occur early in the document (at least
before any non-US-ASCII data!), declares that this document is
encoded in the encoding I<encodingname>, which must be
an encoding name that L<Encode> recognizes. (Encode's list
of supported encodings, in L<Encode::Supported>, is useful here.)
If the Pod parser cannot decode the declared encoding, it
should emit a warning and may abort parsing the document
altogether.

A document having more than one "=encoding" line should be
considered an error. Pod processors may silently tolerate this if
the later "=encoding" lines are just duplicates of the
first one (e.g., if there's a "=encoding utf8" line, and later on
another "=encoding utf8" line). But Pod processors should complain if
there are contradictory "=encoding" lines in the same document
(e.g., if there is a "=encoding utf8" early in the document and
"=encoding big5" later). Pod processors that recognize BOMs
may also complain if they see an "=encoding" line
that contradicts the BOM (e.g., if a document with a UTF-16LE
BOM has an "=encoding shiftjis" line).

=back

If a Pod processor sees any command other than the ones listed
above (like "=head", or "=haed1", or "=stuff", or "=cuttlefish",
or "=w123"), that processor must by default treat this as an
error. It must not process the paragraph beginning with that
command, must by default warn of this as an error, and may
abort the parse. A Pod parser may allow a way for particular
applications to add to the above list of known commands, and to
stipulate, for each additional command, whether formatting
codes should be processed.

Future versions of this specification may add additional
commands.
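
A minimal sketch of the checks described above for unknown commands
and for repeated or contradictory "=encoding" lines. It assumes
command paragraphs have already been split into a name and its content;
C<check_commands> is a hypothetical name, while C<Encode::find_encoding>
is the L<Encode> function that returns an encoding object, or C<undef>
if the name is not recognized.

  use strict;
  use warnings;
  use Encode ();

  my %KNOWN = map { $_ => 1 } qw(head1 head2 head3 head4 pod cut
                                 over item back begin end for encoding);

  # Hypothetical checker: takes [name, content] pairs, returns errors.
  sub check_commands {
      my (@errors, $encoding);
      for my $cmd (@_) {
          my ($name, $content) = @$cmd;
          if (not $KNOWN{$name}) {
              push @errors, "Unknown command =$name";
          } elsif ($name eq 'encoding') {
              if (defined $encoding) {
                  # Exact duplicates are tolerated; contradictions are not.
                  push @errors, "Contradictory =encoding $content"
                      if lc($content) ne lc($encoding);
              } else {
                  $encoding = $content;
                  push @errors, "Unrecognized encoding '$content'"
                      unless Encode::find_encoding($content);
              }
          }
      }
      return @errors;
  }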

=head1 Pod Formatting Codes

(Note that in previous drafts of this document and of perlpod,
formatting codes were referred to as "interior sequences", and
this term may still be found in the documentation for Pod parsers,
and in error messages from Pod processors.)

There are two syntaxes for formatting codes:

=over

=item *

A formatting code starts with a capital letter (just US-ASCII [A-Z])
followed by a "<", any number of characters, and ending with the first
matching ">". Examples:

  That's what I<you> think!

  What's C<CORE::dump()> for?

  X<C<chmod> and C<unlink()> Under Different Operating Systems>

=item *

A formatting code starts with a capital letter (just US-ASCII [A-Z])
followed by two or more "<"'s, one or more whitespace characters,
any number of characters, one or more whitespace characters,
and ending with the first matching sequence of two or more ">"'s, where
the number of ">"'s equals the number of "<"'s in the opening of this
formatting code. Examples:

  That's what I<< you >> think!

  C<<< open(X, ">>thing.dat") || die $! >>>

  B<< $foo->bar(); >>

With this syntax, the whitespace character(s) after the "CE<lt><<"
and before the ">>>" (or whatever letter) are I<not> renderable. They
do not signify whitespace; they are merely part of the formatting codes
themselves. That is, these are all synonymous:

  C<thing>
  C<< thing >>
  C<<           thing >>
  C<<<   thing >>>
  C<<<<
  thing
               >>>>

and so on.
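
Both syntaxes can be handled by one small parser that remembers, for
each open code, how many ">"s close it. The following is a minimal
sketch, not the routine of any existing module: C<parse_codes> and
C<codes_to_string> are hypothetical names, each code becomes an array
reference C<[letter, children...]>, and a code still open at the end
of the paragraph is closed there with a warning, as this document
requires further below.

  use strict;
  use warnings;

  # Hypothetical parser: returns a tree for the formatting codes in one
  # paragraph. Plain text is kept as strings; codes are [letter, @kids].
  sub parse_codes {
      my ($text) = @_;
      my @stack = ([[''], 0]);          # [node, number of ">" closing it]
      my $add_text = sub {              # append text, merging adjacent runs
          my $node = $stack[-1][0];
          if (@$node > 1 and not ref $node->[-1]) { $node->[-1] .= $_[0] }
          else                                    { push @$node, $_[0] }
      };
      pos($text) = 0;
      while (pos($text) < length $text) {
          my $need = $stack[-1][1];
          if ($text =~ /\G([A-Z])(<{2,})\s+/gc) {       # X<< ... >> form
              my $node = [$1];
              push @{ $stack[-1][0] }, $node;
              push @stack, [$node, length $2];
          } elsif ($text =~ /\G([A-Z])</gc) {           # X<...> form
              my $node = [$1];
              push @{ $stack[-1][0] }, $node;
              push @stack, [$node, 1];
          } elsif ($need > 1 and $text =~ /\G\s+>{$need}/gc) {
              pop @stack;                              # whitespace, then ">>..."
          } elsif ($need == 1 and $text =~ /\G>/gc) {
              pop @stack;                              # a single ">"
          } else {
              $text =~ /\G(.)/sgc;                     # ordinary character
              $add_text->($1);
          }
      }
      while (@stack > 1) {                             # unterminated codes
          my $node = (pop @stack)->[0];
          warn "Unterminated $node->[0]<...> code\n";
      }
      return $stack[0][0];
  }

  # Hypothetical helper: print a tree back in single-bracket form, so
  # that equivalent inputs can be compared.
  sub codes_to_string {
      my ($node) = @_;
      return $node unless ref $node;
      my ($letter, @kids) = @$node;
      my $inner = join '', map { codes_to_string($_) } @kids;
      return $letter eq '' ? $inner : "$letter<$inner>";
  }

With this sketch, C<codes_to_string(parse_codes($_))> returns
C<< C<thing> >> for each of the synonymous spellings shown above.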

Finally, the multiple-angle-bracket form does I<not> alter the interpretation
of nested formatting codes, meaning that the following four example lines are
identical in meaning:

  B<example: C<$a E<lt>=E<gt> $b>>

  B<example: C<< $a <=> $b >>>

  B<example: C<< $a E<lt>=E<gt> $b >>>

  B<<< example: C<< $a E<lt>=E<gt> $b >> >>>

=back

In parsing Pod, a notably tricky part is the correct parsing of
(potentially nested!) formatting codes. Implementors should
consult the code in the C<parse_text> routine in Pod::Parser as an
example of a correct implementation.

=over

=item C<IE<lt>textE<gt>> -- italic text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<BE<lt>textE<gt>> -- bold text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<CE<lt>codeE<gt>> -- code text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<FE<lt>filenameE<gt>> -- style for filenames

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<XE<lt>topic nameE<gt>> -- an index entry

See the brief discussion in L<perlpod/"Formatting Codes">.

This code is unusual in that most formatters completely discard
this code and its content. Other formatters will render it with
invisible codes that can be used in building an index of
the current document.

=item C<ZE<lt>E<gt>> -- a null (zero-effect) formatting code

Discussed briefly in L<perlpod/"Formatting Codes">.

This code is unusual in that it should have no content. That is,
a processor may complain if it sees C<ZE<lt>potatoesE<gt>>. Whether
or not it complains, the I<potatoes> text should be ignored.
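
The X and Z special cases above can be sketched as a tiny plain-text
renderer. This minimal sketch assumes the formatting codes have
already been parsed into a tree in which each code is an array
reference C<[letter, children...]> and plain text is a string;
C<to_plain_text> is a hypothetical name. Only I, B, C, F, S, X, and Z
are handled (E and L need the extra processing described elsewhere in
this document), and the non-breaking meaning of S is not modeled.

  use strict;
  use warnings;

  my %KEEP_CONTENT = map { $_ => 1 } ('', qw(I B C F S));

  # Hypothetical renderer: flatten a code tree to plain text.
  sub to_plain_text {
      my ($node) = @_;
      return $node unless ref $node;          # plain text
      my ($code, @kids) = @$node;
      return '' if $code eq 'X';              # index entry: discarded
      if ($code eq 'Z') {
          warn "Z<> should have no content\n" if @kids;
          return '';                          # content ignored either way
      }
      warn "Unknown formatting code $code<...>\n" unless $KEEP_CONTENT{$code};
      return join '', map { to_plain_text($_) } @kids;
  }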

=item C<LE<lt>nameE<gt>> -- a hyperlink

The complicated syntaxes of this code are discussed at length in
L<perlpod/"Formatting Codes">, and implementation details are
discussed below, in L</"About LE<lt>...E<gt> Codes">. Parsing the
contents of LE<lt>content> is tricky. Notably, the content has to be
checked for whether it looks like a URL, or whether it has to be split
on literal "|" and/or "/" (in the right order!), and so on,
I<before> EE<lt>...> codes are resolved.

=item C<EE<lt>escapeE<gt>> -- a character escape

See L<perlpod/"Formatting Codes">, and several points in
L</Notes on Implementing Pod Processors>.

=item C<SE<lt>textE<gt>> -- text contains non-breaking spaces

This formatting code is syntactically simple, but semantically
complex. What it means is that each space in the printable
content of this code signifies a non-breaking space.

Consider:

  C<$x ? $y    :  $z>

  S<C<$x ? $y     :  $z>>

Both signify the monospace (c[ode] style) text consisting of
"$x", one space, "?", one space, "$y", one space, ":", one space,
"$z". The difference is that in the latter, with the S code, those
spaces are not "normal" spaces, but instead are non-breaking spaces.

=back

If a Pod processor sees any formatting code other than the ones
listed above (as in "NE<lt>...>", or "QE<lt>...>", etc.), that
processor must by default treat this as an error.
A Pod parser may allow a way for particular
applications to add to the above list of known formatting codes;
a Pod parser might even allow a way to stipulate, for each additional
formatting code, whether it requires some form of special processing,
as LE<lt>...> does.

Future versions of this specification may add additional
formatting codes.

Historical note: A few older Pod processors would not see a ">" as
closing a "CE<lt>" code, if the ">" was immediately preceded by
a "-".
This was so that this:

  C<$foo->bar>

would parse as equivalent to this:

  C<$foo-E<gt>bar>

instead of as equivalent to a "C" formatting code containing
only "$foo-", and then a "bar>" outside the "C" formatting code. This
problem has since been solved by the addition of syntaxes like this:

  C<< $foo->bar >>

Compliant parsers must not treat "->" as special.

Formatting codes absolutely cannot span paragraphs. If a code is
opened in one paragraph, and no closing code is found by the end of
that paragraph, the Pod parser must close that formatting code,
and should complain (as in "Unterminated I code in the paragraph
starting at line 123: 'Time objects are not...'"). So these
two paragraphs:

  I<I told you not to do this!

  Don't make me say it again!>

...must I<not> be parsed as two paragraphs in italics (with the I
code starting in one paragraph and ending in another). Instead,
the first paragraph should generate a warning, but that aside, the
above code must parse as if it were:

  I<I told you not to do this!>

  Don't make me say it again!E<gt>

(In SGMLish jargon, all Pod commands are like block-level
elements, whereas all Pod formatting codes are like inline-level
elements.)

=head1 Notes on Implementing Pod Processors

The following is a long section of miscellaneous requirements
and suggestions to do with Pod processing.

=over

=item *

Pod formatters should tolerate lines in verbatim blocks that are of
any length, even if that means having to break them (possibly several
times, for very long lines) to avoid text running off the side of the
page. Pod formatters may warn of such line-breaking. Such warnings
are particularly appropriate for lines that are over 100 characters
long, which are usually not intentional.
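
A minimal sketch of the line-breaking described in this item, assuming
the formatter handles one verbatim line at a time and simply cuts it
every C<$width> characters; C<break_verbatim_line> is a hypothetical
name, and the 100-character warning threshold is the one suggested
above.

  use strict;
  use warnings;

  # Hypothetical helper: hard-wrap one verbatim line into pieces of at
  # most $width characters, warning about lines over 100 characters.
  sub break_verbatim_line {
      my ($line, $width) = @_;
      warn "Verbatim line is " . length($line) . " characters long\n"
          if length $line > 100;
      my @pieces;
      while (length $line > $width) {
          push @pieces, substr($line, 0, $width, '');   # remove and keep
      }
      push @pieces, $line;
      return @pieces;
  }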

=item *

Pod parsers must recognize I<all> of the three well-known newline
formats: CR, LF, and CRLF. See L<perlport|perlport>.

=item *

Pod parsers should accept input lines that are of any length.

=item *

Since Perl recognizes a Unicode Byte Order Mark at the start of files
as signaling that the file is Unicode encoded as in UTF-16 (whether
big-endian or little-endian) or UTF-8, Pod parsers should do the
same. Otherwise, the character encoding should be understood as
being UTF-8 if the first highbit byte sequence in the file seems
valid as a UTF-8 sequence, or otherwise as CP-1252 (earlier versions of
this specification used Latin-1 instead of CP-1252).

Future versions of this specification may specify
how Pod can accept other encodings. Presumably treatment of other
encodings in Pod parsing would be as in XML parsing: whatever the
encoding declared by a particular Pod file, content is to be
stored in memory as Unicode characters.

=item *

The well-known Unicode Byte Order Marks are as follows: if the
file begins with the two literal byte values 0xFE 0xFF, this is
the BOM for big-endian UTF-16. If the file begins with the two
literal byte values 0xFF 0xFE, this is the BOM for little-endian
UTF-16. On an ASCII platform, if the file begins with the three literal
byte values 0xEF 0xBB 0xBF, this is the BOM for UTF-8.
A way to obtain the UTF-8 BOM that is portable to EBCDIC platforms is:

  my $utf8_bom = "\x{FEFF}";
  utf8::encode($utf8_bom);

=for comment
 use bytes; print map sprintf(" 0x%02X", ord $_), split '', "\x{feff}";
 0xEF 0xBB 0xBF

=for comment
 If toke.c is modified to support UTF-32, add mention of those here.
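
A minimal sketch of BOM detection, assuming the start of the file has
been read as raw bytes (no decoding layer); C<detect_bom> is a
hypothetical name. It builds the UTF-8 BOM with the portable
mechanism shown above rather than hard-coding its bytes.

  use strict;
  use warnings;

  # Hypothetical helper: returns (encoding name, BOM length in bytes),
  # or an empty list if the bytes do not start with a known BOM.
  sub detect_bom {
      my ($bytes) = @_;
      my $utf8_bom = "\x{FEFF}";
      utf8::encode($utf8_bom);          # 0xEF 0xBB 0xBF on ASCII platforms
      my $n = length $utf8_bom;
      return ('UTF-8', $n)   if substr($bytes, 0, $n) eq $utf8_bom;
      return ('UTF-16BE', 2) if substr($bytes, 0, 2) eq "\xFE\xFF";
      return ('UTF-16LE', 2) if substr($bytes, 0, 2) eq "\xFF\xFE";
      return;
  }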

=item *

A naive but often sufficient heuristic on ASCII platforms, for testing
whether the first highbit byte-sequence in a BOM-less file (whether in
code or in Pod!) is valid as UTF-8 (RFC 2279), is to check whether
the first byte in the sequence is in the range 0xC2 - 0xFD
I<and> whether the next byte is in the range
0x80 - 0xBF. If so, the parser may conclude that this file is in
UTF-8, and all highbit sequences in the file should be assumed to
be UTF-8. Otherwise the parser should treat the file as being
in CP-1252. (A better check, which also works on EBCDIC platforms,
is to pass a copy of the sequence to
L<utf8::decode()|utf8>, which performs a full validity check on the
sequence and returns TRUE if it is valid UTF-8, FALSE otherwise. This
function is always pre-loaded, is fast because it is written in C, and
will only get called at most once, so you don't need to avoid it out of
performance concerns.)
In the unlikely circumstance that the first highbit
sequence in a truly non-UTF-8 file happens to appear to be UTF-8, one
can cater to our heuristic (as well as any more intelligent heuristic)
by prefacing that line with a comment line containing a highbit
sequence that is clearly I<not> valid as UTF-8. A line consisting
of simply "#", an e-acute, and any non-highbit byte
is sufficient to establish this file's encoding.

=for comment
 If/WHEN some brave soul makes these heuristics into a generic
 text-file class (or PerlIO layer?), we can presumably delete
 mention of these icky details from this file, and can instead
 tell people to just use appropriate class/layer.
 Auto-recognition of newline sequences would be another desirable
 feature of such a class/layer.
 HINT HINT HINT.
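
Both the naive check and the fuller C<utf8::decode()> check described
in this item can be sketched as follows, assuming the file's bytes are
in memory without any decoding layer and that the platform is ASCII;
C<guess_encoding> is a hypothetical name, and answering "UTF-8" for a
file with no highbit bytes at all is an assumption of this sketch.

  use strict;
  use warnings;

  # Hypothetical helper: guess 'UTF-8' or 'CP-1252' for BOM-less bytes.
  # With $full_check true, utf8::decode() validates the sequence;
  # otherwise only the naive two-byte test is applied.
  sub guess_encoding {
      my ($bytes, $full_check) = @_;
      my ($seq) = $bytes =~ /([\x80-\xFF]+)/      # first high-bit run
          or return 'UTF-8';                      # all ASCII, also valid UTF-8
      my $looks_utf8;
      if ($full_check) {
          my $copy = $seq;                        # decode() works in place
          $looks_utf8 = utf8::decode($copy);
      } else {
          $looks_utf8 = $seq =~ /\A[\xC2-\xFD][\x80-\xBF]/;
      }
      return $looks_utf8 ? 'UTF-8' : 'CP-1252';
  }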

=for comment
 "The probability that a string of characters
 in any other encoding appears as valid UTF-8 is low" - RFC2279

=item *

Pod processors must treat a "=for [label] [content...]" paragraph as
meaning the same thing as a "=begin [label]" paragraph, content, and
an "=end [label]" paragraph. (The parser may conflate these two
constructs, or may leave them distinct, in the expectation that the
formatter will nevertheless treat them the same.)

=item *

When rendering Pod to a format that allows comments (i.e., to nearly
any format other than plaintext), a Pod formatter must insert comment
text identifying its name and version number, and the name and
version numbers of any modules it might be using to process the Pod.
Minimal examples:

  %% POD::Pod2PS v3.14159, using POD::Parser v1.92

  <!-- Pod::HTML v3.14159, using POD::Parser v1.92 -->

  {\doccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08}

  .\" Pod::Man version 3.14159, using POD::Parser version 1.92

Formatters may also insert additional comments, including: the
release date of the Pod formatter program, the contact address for
the author(s) of the formatter, the current time, the name of the
input file, the formatting options in effect, the version of Perl
used, etc.

Formatters may also choose to note errors/warnings as comments,
besides or instead of emitting them otherwise (as in messages to
STDERR, or C<die>ing).
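
For the HTML example above, building the comment could look like the
following minimal sketch. C<identifying_comment> is a hypothetical
name, and the formatter and module names and versions are whatever the
caller passes in; nothing is looked up from a real module.

  use strict;
  use warnings;

  # Hypothetical helper: build an identifying HTML/XHTML comment.
  sub identifying_comment {
      my ($formatter, $version, %uses) = @_;
      my $text = "$formatter v$version";
      $text .= ", using $_ v$uses{$_}" for sort keys %uses;
      $text =~ s/-(?=-)/- /g;         # "--" is not allowed in XML comments
      return "<!-- $text -->";
  }

Called as C<< identifying_comment('Pod::HTML', '3.14159', 'POD::Parser' => '1.92') >>,
it returns the HTML line shown above.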

=item *

Pod parsers I<may> emit warnings or error messages ("Unknown E code
EE<lt>zslig>!") to STDERR (whether through printing to STDERR, or
C<warn>ing/C<carp>ing, or C<die>ing/C<croak>ing), but I<must> allow
suppressing all such STDERR output, and instead allow an option for
reporting errors/warnings
in some other way, whether by triggering a callback, or noting errors
in some attribute of the document object, or some similarly unobtrusive
mechanism -- or even by appending a "Pod Errors" section to the end of
the parsed form of the document.

=item *

In cases of exceptionally aberrant documents, Pod parsers may abort the
parse. Even then, using C<die>ing/C<croak>ing is to be avoided; where
possible, the parser library may simply close the input file
and add text like "*** Formatting Aborted ***" to the end of the
(partial) in-memory document.

=item *

In paragraphs where formatting codes (like EE<lt>...>, BE<lt>...>)
are understood (i.e., I<not> verbatim paragraphs, but I<including>
ordinary paragraphs, and command paragraphs that produce renderable
text, like "=head1"), literal whitespace should generally be considered
"insignificant", in that one literal space has the same meaning as any
(nonzero) number of literal spaces, literal newlines, and literal tabs
(as long as this produces no blank lines, since those would terminate
the paragraph). Pod parsers should compact literal whitespace in each
processed paragraph, but may provide an option for overriding this
(since some processing tasks do not require it), or may follow
additional special rules (for example, specially treating
period-space-space or period-newline sequences).
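
A minimal sketch of the default whitespace compaction, assuming it is
applied to the text of one ordinary or command paragraph (never to a
verbatim one); C<compact_whitespace> is a hypothetical name, and the
optional special rules for period-space-space are left out.

  use strict;
  use warnings;

  # Hypothetical helper: collapse each run of literal spaces, tabs, and
  # newlines to one space, and trim the ends. A character class is used
  # instead of \s because \s can also match U+00A0 NO-BREAK SPACE.
  sub compact_whitespace {
      my ($para) = @_;
      $para =~ s/[ \t\r\n]+/ /g;
      $para =~ s/\A //;
      $para =~ s/ \z//;
      return $para;
  }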

=item *

Pod parsers should not, by default, try to coerce apostrophe (') and
quote (") into smart quotes (little 9's, 66's, 99's, etc.), nor try to
turn backtick (`) into anything else but a single backtick character
(distinct from an open quote character!), nor "--" into anything but
two minus signs. They I<must never> do any of those things to text
in CE<lt>...> formatting codes, and never I<ever> to text in verbatim
paragraphs.

=item *

When rendering Pod to a format that has two kinds of hyphens (-), one
that's a non-breaking hyphen, and another that's a breakable hyphen
(as in "object-oriented", which can be split across lines as
"object-", newline, "oriented"), formatters are encouraged to
generally translate "-" to non-breaking hyphen, but may apply
heuristics to convert some of these to breaking hyphens.

=item *

Pod formatters should make reasonable efforts to keep words of Perl
code from being broken across lines. For example, "Foo::Bar" in some
formatting systems is seen as eligible for being broken across lines
as "Foo::" newline "Bar" or even "Foo::-" newline "Bar". This should
be avoided where possible, either by disabling all line-breaking in
mid-word, or by wrapping particular words with internal punctuation
in "don't break this across lines" codes (which in some formats may
not be a single code, but might be a matter of inserting non-breaking
zero-width spaces between every pair of characters in a word).

=item *

Pod parsers should, by default, expand tabs in verbatim paragraphs as
they are processed, before passing them to the formatter or other
processor. Parsers may also allow an option for overriding this.

=item *

Pod parsers should, by default, remove newlines from the end of
ordinary and verbatim paragraphs before passing them to the
formatter.
For example, while the paragraph you're reading now
could be considered, in Pod source, to end with (and contain)
the newline(s) that end it, it should be processed as ending with
(and containing) the period character that ends this sentence.

=item *

Pod parsers, when reporting errors, should make some effort to report
an approximate line number ("Nested EE<lt>>'s in Paragraph #52, near
line 633 of Thing/Foo.pm!"), instead of merely noting the paragraph
number ("Nested EE<lt>>'s in Paragraph #52 of Thing/Foo.pm!"). Where
this is problematic, the paragraph number should at least be
accompanied by an excerpt from the paragraph ("Nested EE<lt>>'s in
Paragraph #52 of Thing/Foo.pm, which begins 'Read/write accessor for
the CE<lt>interest rate> attribute...'").

=item *

Pod parsers, when processing a series of verbatim paragraphs one
after another, should consider them to be one large verbatim
paragraph that happens to contain blank lines. I.e., these two
lines, which have a blank line between them:

	use Foo;

	print Foo->VERSION

should be unified into one paragraph ("\tuse Foo;\n\n\tprint
Foo->VERSION") before being passed to the formatter or other
processor. Parsers may also allow an option for overriding this.

While this might be too cumbersome to implement in event-based Pod
parsers, it is straightforward for parsers that return parse trees.

=item *

Pod formatters, where feasible, are advised to avoid splitting short
verbatim paragraphs (under twelve lines, say) across pages.

=item *

Pod parsers must treat a line with only spaces and/or tabs on it as a
"blank line" such as separates paragraphs. (Some older parsers
recognized only two adjacent newlines as a "blank line" but would not
recognize a newline, a space, and a newline, as a blank line. This
is noncompliant behavior.)
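
The blank-line rule in this item and the merging of adjacent verbatim
paragraphs described above can be sketched together. This minimal
sketch assumes the text of one Pod block is in memory with newlines
already normalized to "\n", and it ignores "=begin"/"=end" regions
(where indented paragraphs would be data, not verbatim);
C<split_paragraphs> is a hypothetical name.

  use strict;
  use warnings;

  # Hypothetical helper: split a Pod block into paragraphs. A line of
  # only spaces/tabs counts as blank, and consecutive verbatim
  # paragraphs are merged, joined by "\n\n".
  sub split_paragraphs {
      my ($text) = @_;
      my @paras = grep { /\S/ } split /\n(?:[ \t]*\n)+/, $text;
      s/\n+\z// for @paras;             # trailing newlines are not content
      my @merged;
      for my $p (@paras) {
          if ($p =~ /\A[ \t]/ and @merged and $merged[-1] =~ /\A[ \t]/) {
              $merged[-1] .= "\n\n$p";  # continue the verbatim paragraph
          } else {
              push @merged, $p;
          }
      }
      return @merged;
  }

Given the two tab-indented lines from the earlier example, separated by
a line holding only spaces and a tab, this returns the single paragraph
C<"\tuse Foo;\n\n\tprint Foo-E<gt>VERSION">.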

=item *

Authors of Pod formatters/processors should make every effort to
avoid writing their own Pod parser. There are already several on
CPAN, with a wide range of interface styles -- and one of them,
Pod::Simple, comes with modern versions of Perl.

=item *

Characters in Pod documents may be conveyed either as literals, or by
number in EE<lt>n> codes, or by an equivalent mnemonic, as in
EE<lt>eacute> which is exactly equivalent to EE<lt>233>. The numbers
are the Latin-1/Unicode values, even on EBCDIC platforms.

When referring to characters by using an EE<lt>n> numeric code, numbers
in the range 32-126 refer to those well-known US-ASCII characters (also
defined there by Unicode, with the same meaning), which all Pod
formatters must render faithfully. Characters whose EE<lt>E<gt> numbers
are in the ranges 0-31 and 127-159 should not be used (either as
literals or as EE<lt>number> codes), except for the literal byte-sequences for
newline (ASCII 13, ASCII 13 10, or ASCII 10), and tab (ASCII 9).

Numbers in the range 160-255 refer to Latin-1 characters (also
defined there by Unicode, with the same meaning). Numbers above
255 should be understood to refer to Unicode characters.

=item *

Be warned that some formatters cannot reliably render characters
outside 32-126, and many are able to handle 32-126 and 160-255, but
nothing above 255.

=item *

Besides the well-known "EE<lt>lt>" and "EE<lt>gt>" codes for
less-than and greater-than, Pod parsers must understand "EE<lt>sol>"
for "/" (solidus, slash), and "EE<lt>verbar>" for "|" (vertical bar,
pipe). Pod parsers should also understand "EE<lt>lchevron>" and
"EE<lt>rchevron>" as legacy codes for characters 171 and 187, i.e.,
"left-pointing double angle quotation mark" = "left pointing
guillemet" and "right-pointing double angle quotation mark" = "right
pointing guillemet".
(These look like little "<<" and ">>", and they
are now preferably expressed with the HTML/XHTML codes "EE<lt>laquo>"
and "EE<lt>raquo>".)

=item *

Pod parsers should understand all "EE<lt>html>" codes as defined
in the entity declarations in the most recent XHTML specification at
C<www.W3.org>. Pod parsers must understand at least the entities
that define characters in the range 160-255 (Latin-1). Pod parsers,
when faced with some unknown "EE<lt>I<identifier>>" code,
shouldn't simply replace it with the empty string (by default, at least),
but may pass it through as a string consisting of the literal characters
E, less-than, I<identifier>, greater-than. Or Pod parsers may offer the
alternative option of processing such unknown
"EE<lt>I<identifier>>" codes by firing an event especially
for such codes, or by adding a special node-type to the in-memory
document tree. Such "EE<lt>I<identifier>>" codes may have special meaning
to some processors, or some processors may choose to add them to
a special error report.

=item *

Pod parsers must also support the XHTML codes "EE<lt>quot>" for
character 34 (doublequote, "), "EE<lt>amp>" for character 38
(ampersand, &), and "EE<lt>apos>" for character 39 (apostrophe, ').

=item *

Note that in all cases of "EE<lt>whateverE<gt>", I<whatever> (whether
an htmlname, or a number in any base) must consist only of
alphanumeric characters -- that is, I<whatever> must match
C<m/\A\w+\z/>. So S<"EE<lt> 0 1 2 3 E<gt>"> is invalid, because
it contains spaces, which aren't alphanumeric characters. This
presumably does not I<need> special treatment by a Pod processor;
S<" 0 1 2 3 "> doesn't look like a number in any base, so it would
presumably be looked up in the table of HTML-like names. Since
there isn't (and cannot be) an HTML-like entity called S<" 0 1 2 3 ">,
this will be treated as an error.
However, Pod processors may
treat S<"EE<lt> 0 1 2 3 E<gt>"> or "EE<lt>e-acute>" as I<syntactically>
invalid, potentially earning a different error message than the
error message (or warning, or event) generated by a merely unknown
(but theoretically valid) htmlname, as in "EE<lt>qacute>"
[sic]. Pod parsers are not, however, required to make this
distinction.

=item *

Note that EE<lt>number> I<must not> be interpreted as simply
"codepoint I<number> in the current/native character set". It always
means only "the character represented by codepoint I<number> in
Unicode." (This is identical to the semantics of &#I<number>; in XML.)

This will likely require many formatters to have tables mapping from
treatable Unicode codepoints (such as the "\xE9" for the e-acute
character) to the escape sequences or codes necessary for conveying
such characters in the target output format. A converter to *roff
would, for example, know that "\xE9" (whether conveyed literally, or via
an EE<lt>...> sequence) is to be conveyed as "e\\*'".
Similarly, a program rendering Pod in a Mac OS application window would
presumably need to know that "\xE9" maps to codepoint 142 in the MacRoman
encoding that (at time of writing) is native for Mac OS. Such
Unicode2whatever mappings are presumably already widely available for
common output formats. (Such mappings may be incomplete! Implementers
are not expected to bend over backwards in an attempt to render
Cherokee syllabics, Etruscan runes, Byzantine musical symbols, or any
of the other weird things that Unicode can encode.) And
if a Pod document uses a character not found in such a mapping, the
formatter should consider it an unrenderable character.
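
Here is a minimal sketch of resolving the content of an EE<lt>...> code as described above. The following are all illustrative assumptions: the small name table, the sub name C<resolve_escape> and its three kinds of return value. A real parser would need every XHTML entity name, and L<Pod::Escapes|Pod::Escapes> already tabulates them. Reading a C<0x> prefix as hexadecimal and a leading C<0> as octal is also an assumption of this sketch. Every number is treated as a Unicode codepoint and only then converted to the native character set with C<utf8::unicode_to_native>.

```perl

    use strict;
    use warnings;

    # Illustrative subset of the names a parser must or should know; the
    # values are Unicode codepoints.
    my %NAME2CODEPOINT = (
        lt   => 60,  gt  => 62,  sol  => 47,  verbar => 124,
        quot => 34,  amp => 38,  apos => 39,
        lchevron => 171, rchevron => 187, laquo => 171, raquo => 187,
        nbsp => 160, shy => 173, eacute => 233, euro => 0x20AC,
    );

    # Resolve the content of an E<...> code.  Returns:
    #   - the character, for a known name or a number;
    #   - the literal text "E<name>", for a well-formed but unknown name;
    #   - undef, for syntactically invalid content (not m/\A\w+\z/).
    sub resolve_escape {
        my ($content) = @_;
        return undef unless $content =~ m/\A\w+\z/;

        my $codepoint;
        if    ($content =~ m/\A0[xX]([0-9a-fA-F]+)\z/) { $codepoint = hex $1 }
        elsif ($content =~ m/\A0([0-7]+)\z/)           { $codepoint = oct $1 }
        elsif ($content =~ m/\A[0-9]+\z/)              { $codepoint = $content }
        elsif (exists $NAME2CODEPOINT{$content}) {
            $codepoint = $NAME2CODEPOINT{$content};
        }
        else {
            return "E<$content>";    # unknown name: pass it through
        }

        # E<n> always means Unicode codepoint n, so convert to the native
        # character set (a no-op everywhere except EBCDIC platforms).
        return chr(utf8::unicode_to_native($codepoint));
    }

```

Returning C<undef> for content that fails C<m/\A\w+\z/> lets a caller produce the separate "syntactically invalid" message mentioned above. A well-formed unknown name such as C<qacute> survives as literal text instead of vanishing.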

=item *

If, surprisingly, the implementor of a Pod formatter can't find a
satisfactory pre-existing table mapping from Unicode characters to
escapes in the target format (e.g., a decent table of Unicode
characters to *roff escapes), it will be necessary to build such a
table. If you are in this circumstance, you should begin with the
characters in the range 0x00A0 - 0x00FF, which consists mostly of heavily
used accented characters. Then proceed (as patience permits and
fastidiousness compels) through the characters that the (X)HTML
standards groups judged important enough to merit mnemonics
for. These are declared in the (X)HTML specifications at the
www.W3.org site. At time of writing (September 2001), the most recent
entity declaration files are:

    http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
    http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
    http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent

Then you can progress through any remaining notable Unicode characters
in the range 0x2000-0x204D (consult the character tables at
www.unicode.org), and whatever else strikes your fancy. For example,
in F<xhtml-symbol.ent>, there is the entry:

    <!ENTITY infin "&#8734;"> <!-- infinity, U+221E ISOtech -->

While the mapping of "infin" to the character "\x{221E}" will (hopefully)
have been already handled by the Pod parser, the presence of the
character in this file means that it's important enough to
include in a formatter's table that maps from notable Unicode characters
to the codes necessary for rendering them. So for a Unicode-to-*roff
mapping, for example, this would merit the entry:

    "\x{221E}" => '\(in',

It is eagerly hoped that in the future, increasing numbers of formats
(and formatters) will support Unicode characters directly (as (X)HTML
does with C<&infin;>, C<&#x221E;>, or C<&#8734;>), reducing the need
for idiosyncratic mappings of Unicode-to-I<my_escapes>.
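
As an illustration of such a table, here is a deliberately tiny Unicode-to-*roff mapping in Perl, with a "?" fallback for unrenderable characters. Only the C<\x{E9}> and C<\x{221E}> entries come from this document. The rest are assumptions of the sketch:

=over

=item *

the backslash entry (C<\e> is the usual *roff escape for a printed backslash);

=item *

the sub names C<char_to_roff> and C<to_roff>;

=item *

the choice of "?" as the fallback.

=back

The sketch also ignores other *roff quoting rules, such as the special handling of lines that begin with a period.

```perl

    use strict;
    use warnings;

    # Tiny Unicode-to-*roff table.  Keys are characters; values are the
    # *roff input that renders them.
    my %UNI2ROFF = (
        "\\"       => q{\e},       # a literal backslash must be escaped
        "\x{E9}"   => q{e\*'},     # e-acute
        "\x{221E}" => q{\(in},     # infinity
    );

    # Map one character: table entries first, then printable US-ASCII
    # (32-126) and newline unchanged, and "?" for anything unrenderable.
    sub char_to_roff {
        my ($char) = @_;
        return $UNI2ROFF{$char} if exists $UNI2ROFF{$char};
        my $ord = ord $char;
        return $char if $char eq "\n" || ($ord >= 32 && $ord <= 126);
        return '?';
    }

    sub to_roff { return join '', map char_to_roff($_), split //, $_[0] }

```

A fuller version might try a Latin-1 fallback (such as "e" for C<\x{E9}>) before resorting to "?". It could also collect the unrenderable characters so that they can be listed in a warning.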

=item *

It is up to individual Pod formatters to display good judgement when
confronted with an unrenderable character (which is distinct from an
unknown EE<lt>thing> sequence that the parser couldn't resolve to
anything, renderable or not). It is good practice to map Latin letters
with diacritics (like "EE<lt>eacute>"/"EE<lt>233>") to the corresponding
unaccented US-ASCII letters (like a simple character 101, "e"), but
clearly this is often not feasible, and an unrenderable character may
be represented as "?", or the like. In attempting a sane fallback
(as from EE<lt>233> to "e"), Pod formatters may use the
%Latin1Code_to_fallback table in L<Pod::Escapes|Pod::Escapes>, or
L<Text::Unidecode|Text::Unidecode>, if available.

For example, this Pod text:

    magic is enabled if you set C<$Currency> to 'E<euro>'.

may be rendered as:
"magic is enabled if you set C<$Currency> to 'I<?>'" or as
"magic is enabled if you set C<$Currency> to 'B<[euro]>'", or as
"magic is enabled if you set C<$Currency> to '[x20AC]'", etc.

A Pod formatter may also note, in a comment or warning, a list of the
unrenderable characters it encountered.

=item *

EE<lt>...> may freely appear in any formatting code (other than
in another EE<lt>...> or in a ZE<lt>>). That is, "XE<lt>The
EE<lt>euro>1,000,000 Solution>" is valid, as is "LE<lt>The
EE<lt>euro>1,000,000 Solution|Million::Euros>".

=item *

Some Pod formatters output to formats that implement non-breaking
spaces as an individual character (which I'll call "NBSP"), and
others output to formats that implement non-breaking spaces just as
spaces wrapped in a "don't break this across lines" code.
Note that at the level of Pod, both sorts of codes can occur: Pod can contain an
NBSP character (whether as a literal, or as an "EE<lt>160>" or
"EE<lt>nbsp>" code); and Pod can contain "SE<lt>foo
IE<lt>barE<gt> baz>" codes, where "mere spaces" (character 32) in
such codes are taken to represent non-breaking spaces. Pod
parsers should consider supporting the optional parsing of "SE<lt>foo
IE<lt>barE<gt> baz>" as if it were
"fooI<NBSP>IE<lt>barE<gt>I<NBSP>baz", and, going the other way, the
optional parsing of groups of words joined by NBSPs as if each group
were in an SE<lt>...> code, so that formatters may use the
representation that maps best to what the output format demands.

=item *

Some processors may find that the C<SE<lt>...E<gt>> code is easiest to
implement by replacing each space in the parse tree under the content
of the S with an NBSP. But note: the replacement should apply I<not> to
spaces in I<all> text, but I<only> to spaces in I<printable> text. (This
distinction may or may not be evident in the particular tree/event
model implemented by the Pod parser.) For example, consider this
unusual case:

    S<L</Autoloaded Functions>>

This means that the space in the middle of the visible link text must
not be broken across lines. In other words, it's the same as this:

    L<"AutoloadedE<160>Functions"/Autoloaded Functions>

However, a misapplied space-to-NBSP replacement could (wrongly)
produce something equivalent to this:

    L<"AutoloadedE<160>Functions"/AutoloadedE<160>Functions>

...which is almost certainly not going to work as a hyperlink (assuming
this formatter outputs a format supporting hypertext).

Formatters may choose to just not support the S formatting code,
especially in cases where the output format simply has no NBSP
character/code and no code for "don't break this stuff across lines".
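
To make the printable-text-only rule concrete, here is a Perl sketch that rewrites spaces as NBSPs while leaving link targets alone. It assumes a hypothetical tree shape, not that of any particular parser. A text node is a plain string. A formatting code is an array reference holding its letter, a hash of attributes and its children, and an LE<lt>...> code's target is kept in the attributes. The sub name C<nbsp_printable> is also hypothetical. C<tr///r> requires Perl 5.14 or later.

```perl

    use strict;
    use warnings;

    # Hypothetical tree shape: a text node is a plain string, and a
    # formatting code is [ $letter, \%attributes, @children ].  For L<...>,
    # the target is an attribute and the children are the visible text.
    sub nbsp_printable {
        my ($node) = @_;
        return $node =~ tr/ /\x{A0}/r unless ref $node;    # printable text
        my ($letter, $attributes, @children) = @$node;
        # Copy the attributes unchanged: spaces in a link target must stay.
        return [ $letter, +{ %$attributes },
                 map { nbsp_printable($_) } @children ];
    }

    # S<L</Autoloaded Functions>>
    my $tree = [ 'S', {},
        [ 'L', { to => '/Autoloaded Functions' }, 'Autoloaded Functions' ] ];
    my $fixed = nbsp_printable($tree);

```

Applied to the SE<lt>LE<lt>/Autoloaded Functions>> example above, the sketch turns the visible text into C<"Autoloaded\x{A0}Functions">. The target keeps its ordinary space. This is the correct reading rather than the misapplied one.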

=item *

Besides the NBSP character discussed above, implementors are reminded
of the existence of the other "special" character in Latin-1, the
"soft hyphen" character, also known as "discretionary hyphen"
(i.e., C<EE<lt>173E<gt>> = C<EE<lt>0xADE<gt>> =
C<EE<lt>shyE<gt>>). This character expresses an optional hyphenation
point. That is, it normally renders as nothing, but may render as a
"-" if a formatter breaks the word at that point. Pod formatters
should, as appropriate, do one of the following: 1) render this with
a code with the same meaning (e.g., "\-" in RTF), 2) pass it through
in the expectation that the output format understands this character as
such, or 3) delete it.

For example:

    sigE<shy>action
    manuE<shy>script
    JarkE<shy>ko HieE<shy>taE<shy>nieE<shy>mi

These signal to a formatter that if it is to hyphenate "sigaction"
or "manuscript", then it should be done as
"sig-I<[linebreak]>action" or "manu-I<[linebreak]>script"
(and if it doesn't hyphenate it, then the C<EE<lt>shyE<gt>> doesn't
show up at all). And if it is to hyphenate "Jarkko" and/or
"Hietaniemi", it can do so only at the points where there is a
C<EE<lt>shyE<gt>> code.

In practice, it is anticipated that this character will not be used
often, but formatters should either support it or delete it.

=item *

If you think that you want to add a new command to Pod (like, say, a
"=biblio" command), consider whether you could get the same
effect with a for or begin/end sequence: "=for biblio ..." or "=begin
biblio" ... "=end biblio". Pod processors that don't understand
"=for biblio", etc., will simply ignore it, whereas they may complain
loudly if they see "=biblio".

=item *

Throughout this document, "Pod" has been the preferred spelling for
the name of the documentation format. One may also use "POD" or
"pod".
For the documentation that is (typically) in the Pod
format, you may use "pod", or "Pod", or "POD". Understanding these
distinctions is useful, but obsessing over how to spell them usually
is not.

=back

=head1 About LE<lt>...E<gt> Codes

As you can tell from a glance at L<perlpod|perlpod>, the LE<lt>...>
code is the most complex of the Pod formatting codes. The points below
will hopefully clarify what it means and how processors should deal
with it.

=over

=item *

In parsing an LE<lt>...> code, Pod parsers must distinguish at least
four attributes:

=over

=item First:

The link-text. If there is none, this must be C<undef>. (E.g., in
"LE<lt>Perl Functions|perlfunc>", the link-text is "Perl Functions".
In "LE<lt>Time::HiRes>" and even "LE<lt>|Time::HiRes>", there is no
link text. Note that link text may contain formatting.)

=item Second:

The possibly inferred link-text; i.e., if there was no real link
text, then this is the text that we'll infer in its place. (E.g., for
"LE<lt>Getopt::Std>", the inferred link text is "Getopt::Std".)

=item Third:

The name or URL, or C<undef> if none. (E.g., in "LE<lt>Perl
Functions|perlfunc>", the name (also sometimes called the page)
is "perlfunc". In "LE<lt>/CAVEATS>", the name is C<undef>.)

=item Fourth:

The section (AKA "item" in older perlpods), or C<undef> if none. E.g.,
in "LE<lt>Getopt::Std/DESCRIPTIONE<gt>", "DESCRIPTION" is the section. (Note
that this is not the same as a manpage section like the "5" in "man 5
crontab". "Section Foo" in the Pod sense means the part of the text
that's introduced by the heading or item whose text is "Foo".)

=back

Pod parsers may also note additional attributes including:

=over

=item Fifth:

A flag for whether item 3 (if present) is a URL (like
"http://lists.perl.org" is), in which case there should be no section
attribute; a Pod name (like "perldoc" and "Getopt::Std" are); or
possibly a man page name (like "crontab(5)" is).

=item Sixth:

The raw original LE<lt>...> content, before text is split on
"|", "/", etc., and before EE<lt>...> codes are expanded.

=back

(The above were numbered only for concise reference below. It is not
a requirement that these be passed as an actual list or array.)

For example:

    L<Foo::Bar>
      =>  undef,                          # link text
          "Foo::Bar",                     # possibly inferred link text
          "Foo::Bar",                     # name
          undef,                          # section
          'pod',                          # what sort of link
          "Foo::Bar"                      # original content

    L<Perlport's section on NL's|perlport/Newlines>
      =>  "Perlport's section on NL's",   # link text
          "Perlport's section on NL's",   # possibly inferred link text
          "perlport",                     # name
          "Newlines",                     # section
          'pod',                          # what sort of link
          "Perlport's section on NL's|perlport/Newlines"
                                          # original content

    L<perlport/Newlines>
      =>  undef,                          # link text
          '"Newlines" in perlport',       # possibly inferred link text
          "perlport",                     # name
          "Newlines",                     # section
          'pod',                          # what sort of link
          "perlport/Newlines"             # original content

    L<crontab(5)/"DESCRIPTION">
      =>  undef,                          # link text
          '"DESCRIPTION" in crontab(5)',  # possibly inferred link text
          "crontab(5)",                   # name
          "DESCRIPTION",                  # section
          'man',                          # what sort of link
          'crontab(5)/"DESCRIPTION"'      # original content

    L</Object Attributes>
      =>  undef,                          # link text
          '"Object Attributes"',          # possibly inferred link text
          undef,                          # name
          "Object Attributes",            # section
          'pod',                          # what sort of link
          "/Object Attributes"            # original content

    L<https://www.perl.org/>
      =>  undef,                          # link text
          "https://www.perl.org/",        # possibly inferred link text
          "https://www.perl.org/",        # name
          undef,                          # section
          'url',                          # what sort of link
          "https://www.perl.org/"         # original content

    L<Perl.org|https://www.perl.org/>
      =>  "Perl.org",                     # link text
          "Perl.org",                     # possibly inferred link text
          "https://www.perl.org/",        # name
          undef,                          # section
          'url',                          # what sort of link
          "Perl.org|https://www.perl.org/"  # original content

Note that you can distinguish URL-links from anything else by the
fact that they match C<m/\A\w+:[^:\s]\S*\z/>. So
C<LE<lt>http://www.perl.comE<gt>> is a URL, but
C<LE<lt>HTTP::ResponseE<gt>> isn't.

=item *

In the case of LE<lt>...> codes with no "text|" part in them,
older formatters have exhibited great variation in actually displaying
the link or cross reference. For example, LE<lt>crontab(5)> would render
as "the C<crontab(5)> manpage", or "in the C<crontab(5)> manpage"
or just "C<crontab(5)>".

Pod processors must now treat "text|"-less links as follows:

    L<name>         =>  L<name|name>
    L</section>     =>  L<"section"|/section>
    L<name/section> =>  L<"section" in name|name/section>

=item *

Note that section names might contain markup. I.e., if a section
starts with:

    =head2 About the C<-M> Operator

or with:

    =item About the C<-M> Operator

then a link to it would look like this:

    L<somedoc/About the C<-M> Operator>

Formatters may choose to ignore the markup for purposes of resolving
the link and use only the renderable characters in the section name,
as in:

    <h2><a name="About_the_-M_Operator">About the <code>-M</code>
    Operator</a></h2>

    ...

    <a href="somedoc#About_the_-M_Operator">"About the <code>-M</code>
    Operator" in somedoc</a>

=item *

Previous versions of perlpod distinguished C<LE<lt>name/"section"E<gt>>
links from C<LE<lt>name/itemE<gt>> links (and their targets). These
have been merged syntactically and semantically in the current
specification, and I<section> can refer either to a "=headI<n> Heading
Content" command or to a "=item Item Content" command. This
specification does not define the behavior in the case
of a given document having several things all seeming to produce the
same I<section> identifier (e.g., in HTML, several things all producing
the same I<anchorname> in <a name="I<anchorname>">...</a>
elements). Where Pod processors can control this behavior, they should
use the first such anchor. That is, C<LE<lt>Foo/BarE<gt>> refers to the
I<first> "Bar" section in Foo.

But for some processors/formats this cannot be easily controlled; as
with the HTML example, the behavior of multiple ambiguous
<a name="I<anchorname>">...</a> elements is most easily just left up to
browsers to decide.

=item *

In a C<LE<lt>text|...E<gt>> code, text may contain formatting codes
for formatting or for EE<lt>...> escapes, as in:

    L<B<ummE<234>stuff>|...>

For C<LE<lt>...E<gt>> codes without a "text|" part, only
C<EE<lt>...E<gt>> and C<ZE<lt>E<gt>> codes may occur. That is,
authors should not use "C<LE<lt>BE<lt>Foo::BarE<gt>E<gt>>".

Note, however, that formatting codes and ZE<lt>>'s can occur in any
and all parts of an LE<lt>...> (i.e., in I<name>, I<section>, I<text>,
and I<url>).

Authors must not nest LE<lt>...> codes. For example, "LE<lt>The
LE<lt>Foo::Bar> man page>" should be treated as an error.
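
The attribute breakdown and the rules for "text|"-less links can be combined into a small Perl sketch. It is an illustration under several assumptions:

=over

=item *

the sub name C<parse_link> and its hash keys are hypothetical;

=item *

the raw content is assumed not to contain "|" or "/" inside nested formatting codes;

=item *

a name ending in a parenthesized string is taken to be a man page;

=item *

when real link text is present, it is also returned as the possibly inferred text.

=back

The sketch does not detect nested LE<lt>...> codes.

```perl

    use strict;
    use warnings;

    # Split raw L<...> content into the six attributes described above.
    sub parse_link {
        my ($raw) = @_;
        my ($text, $rest) = (undef, $raw);
        ($text, $rest) = ($1, $2) if $raw =~ m/\A([^|]*)\|(.*)\z/s;
        $text = undef if defined $text && $text eq '';    # as in L<|Foo>

        my ($name, $section, $type);
        if ($rest =~ m/\A\w+:[^:\s]\S*\z/) {               # the URL test
            ($name, $type) = ($rest, 'url');
        }
        else {
            ($name, $section) = $rest =~ m{\A([^/]*)(?:/(.*))?\z}s;
            $name = undef if $name eq '';
            $section =~ s/\A"(.*)"\z/$1/s if defined $section;
            $type = defined $name && $name =~ m/\(\S+\)\z/ ? 'man' : 'pod';
        }

        # Infer link text following the rules for "text|"-less links.
        my $inferred =
              defined $text     ? $text
            : $type eq 'url'    ? $name
            : !defined $section ? $name
            : defined $name     ? qq{"$section" in $name}
            :                     qq{"$section"};

        return {
            text => $text, inferred => $inferred, name => $name,
            section => $section, type => $type, raw => $raw,
        };
    }

```

For the examples listed earlier that have no explicit link text, the sketch returns the listed link text, inferred text, name, section and link type.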

=item *

Note that Pod authors may use formatting codes inside the "text"
part of "LE<lt>text|name>" (and so on for LE<lt>text|/"sec">).

In other words, this is valid:

    Go read L<the docs on C<$.>|perlvar/"$.">

Some output formats that do allow rendering "LE<lt>...>" codes as
hypertext might not allow the link-text to be formatted; in
that case, formatters will have to just ignore that formatting.

=item *

At time of writing, C<LE<lt>nameE<gt>> values are of two types:
either the name of a Pod page like C<LE<lt>Foo::BarE<gt>> (which
might be a real Perl module or program in an @INC / PATH
directory, or a .pod file in those places); or the name of a Unix
man page, like C<LE<lt>crontab(5)E<gt>>. In theory, C<LE<lt>chmodE<gt>>
is ambiguous between a Pod page called "chmod" and the Unix man page
"chmod" (in whatever man-section). However, the presence of a string
in parens, as in "crontab(5)", is sufficient to signal that what
is being discussed is not a Pod page, and so is presumably a
Unix man page. The distinction is of no importance to many
Pod processors, but some processors that render to hypertext formats
may need to distinguish them in order to know how to render a
given C<LE<lt>fooE<gt>> code.

=item *

Previous versions of perlpod allowed for a C<LE<lt>sectionE<gt>> syntax (as in
C<LE<lt>Object AttributesE<gt>>), which was not easily distinguishable from
C<LE<lt>nameE<gt>> syntax, and for a C<LE<lt>"section"E<gt>> syntax, which was only
slightly less ambiguous. These syntaxes are no longer in the specification, and
have been replaced by the C<LE<lt>/sectionE<gt>> syntax (where the slash was
formerly optional). Pod parsers should tolerate the C<LE<lt>"section"E<gt>>
syntax, for a while at least.
The suggested heuristic for distinguishing
C<LE<lt>sectionE<gt>> from C<LE<lt>nameE<gt>> is that if it contains any
whitespace, it's a I<section>. Pod processors should warn about this being
deprecated syntax.

=back

=head1 About =over...=back Regions

"=over"..."=back" regions are used for various kinds of list-like
structures. (I use the term "region" here simply as a collective
term for everything from the "=over" to the matching "=back".)

=over

=item *

The non-zero numeric I<indentlevel> in "=over I<indentlevel>" ...
"=back" gives the formatter a hint as to how many
"spaces" (ems, or roughly equivalent units) it should tab over,
although many formatters will have to convert this to an absolute
measurement that may not exactly match the size of spaces (or M's)
in the document's base font. Other formatters may have to completely
ignore the number. The lack of any explicit I<indentlevel> parameter is
equivalent to an I<indentlevel> value of 4. Pod processors may
complain if I<indentlevel> is present but is not a positive number
matching C<m/\A(\d*\.)?\d+\z/>.

=item *

Authors of Pod formatters are reminded that "=over" ... "=back" may
map to several different constructs in their output format. For
example, in converting Pod to (X)HTML, it can map to any of
<ul>...</ul>, <ol>...</ol>, <dl>...</dl>, or
<blockquote>...</blockquote>. Similarly, "=item" can map to <li> or
<dt>.

=item *

Each "=over" ... "=back" region should be one of the following:

=over

=item *

An "=over" ... "=back" region containing only "=item *" commands,
each followed by some number of ordinary/verbatim paragraphs, other
nested "=over" ... "=back" regions, "=for..." paragraphs, and/or
"=begin"..."=end" regions.

(Pod processors must tolerate a bare "=item" as if it were "=item
*".) Whether "*" is rendered as a literal asterisk, an "o", or as
some kind of real bullet character, is left up to the Pod formatter,
and may depend on the level of nesting.

=item *

An "=over" ... "=back" region containing only
C<m/\A=item\s+\d+\.?\s*\z/> paragraphs, each one (or each group of them)
followed by some number of ordinary/verbatim paragraphs, other nested
"=over" ... "=back" regions, "=for..." paragraphs, and/or
"=begin"..."=end" regions. Note that the numbers must start at 1
in each region, and must proceed in order and without skipping
numbers.

(Pod processors must tolerate lines like "=item 1" as if they were
"=item 1.", with the period.)

=item *

An "=over" ... "=back" region containing only "=item [text]"
commands, each one (or each group of them) followed by some number of
ordinary/verbatim paragraphs, other nested "=over" ... "=back"
regions, "=for..." paragraphs, and/or "=begin"..."=end" regions.

The "=item [text]" paragraph should not match
C<m/\A=item\s+\d+\.?\s*\z/> or C<m/\A=item\s+\*\s*\z/>, nor should it
match just C<m/\A=item\s*\z/>.

=item *

An "=over" ... "=back" region containing no "=item" paragraphs at
all, and containing only some number of
ordinary/verbatim paragraphs, and possibly also some nested "=over"
... "=back" regions, "=for..." paragraphs, and "=begin"..."=end"
regions. Such an itemless "=over" ... "=back" region in Pod is
equivalent in meaning to a "<blockquote>...</blockquote>" element in
HTML.

=back

In all the above cases, you can determine which type of
"=over" ... "=back" you have by examining the first (non-"=cut",
non-"=pod") Pod paragraph after the "=over" command.
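
Following that rule, a processor can classify a region by looking at a single paragraph. In this Perl sketch, the sub name C<over_type> and its return labels are hypothetical. The input is assumed to be the paragraphs that follow the "=over" command, as strings with trailing newlines removed. The regular expressions are the ones given above.

```perl

    use strict;
    use warnings;

    # Classify an "=over" region by the first paragraph after "=over" that
    # is not "=cut" or "=pod".  A bare "=item" counts as "=item *".
    sub over_type {
        for my $para (@_) {
            next if $para =~ m/\A=(?:cut|pod)\b/;
            return 'empty'    if $para =~ m/\A=back\b/;
            return 'bulleted' if $para =~ m/\A=item\s*\z/
                              || $para =~ m/\A=item\s+\*\s*\z/;
            return 'numbered' if $para =~ m/\A=item\s+\d+\.?\s*\z/;
            return 'text'     if $para =~ m/\A=item\b/;
            return 'blockquote';    # no =item at all
        }
        return 'empty';    # no paragraphs followed the "=over"
    }

```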

=item *

Pod formatters I<must> tolerate arbitrarily large amounts of text
in the "=item I<text...>" paragraph. In practice, most such
paragraphs are short, as in:

    =item For cutting off our trade with all parts of the world

But they may be arbitrarily long:

    =item For transporting us beyond seas to be tried for pretended
    offenses

    =item He is at this time transporting large armies of foreign
    mercenaries to complete the works of death, desolation and
    tyranny, already begun with circumstances of cruelty and perfidy
    scarcely paralleled in the most barbarous ages, and totally
    unworthy the head of a civilized nation.

=item *

Pod processors should tolerate "=item *" / "=item I<number>" commands
with no accompanying paragraph. The middle item is an example:

    =over

    =item 1

    Pick up dry cleaning.

    =item 2

    =item 3

    Stop by the store. Get Abba Zabas, Stoli, and cheap lawn chairs.

    =back

=item *

No "=over" ... "=back" region may contain headings. Processors may
treat such a heading as an error.

=item *

Note that an "=over" ... "=back" region should have some
content. That is, authors should not have an empty region like this:

    =over

    =back

Pod processors seeing such a contentless "=over" ... "=back" region
may ignore it, or may report it as an error.

=item *

Processors must tolerate an "=over" list that goes off the end of the
document (i.e., which has no matching "=back"), but they may warn
about such a list.
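
The last few points lend themselves to a simple warning pass: no headings inside a region, no empty regions, and tolerance of a region with no matching "=back". This Perl sketch assumes its input is the document's Pod paragraphs in order, with non-Pod text already removed. It reports positions as paragraph numbers, although a real processor should prefer line numbers, as recommended earlier. The sub name C<check_over_regions> is hypothetical.

```perl

    use strict;
    use warnings;

    # Warn about empty "=over" regions, headings inside regions, and
    # regions that are never closed.  Paragraph numbers are 1-based.
    sub check_over_regions {
        my @paras = @_;
        my (@warnings, @open);    # @open: [paragraph number, content count]
        for my $i (0 .. $#paras) {
            my ($para, $num) = ($paras[$i], $i + 1);
            if ($para =~ m/\A=over\b/) {
                $open[-1][1]++ if @open;    # a nested region is content
                push @open, [ $num, 0 ];
            }
            elsif ($para =~ m/\A=back\b/) {
                if (!@open) {
                    push @warnings, "paragraph #$num: =back without =over";
                    next;
                }
                my $region = pop @open;
                push @warnings, "paragraph #$region->[0]: empty =over region"
                    unless $region->[1];
            }
            elsif (@open) {
                $open[-1][1]++;
                push @warnings, "paragraph #$num: heading inside =over region"
                    if $para =~ m/\A=head\d/;
            }
        }
        push @warnings, "paragraph #$_->[0]: =over never closed" for @open;
        return @warnings;
    }

```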

=item *

Authors of Pod formatters should note that this construct:

    =item Neque

    =item Porro

    =item Quisquam Est

    Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
    velit, sed quia non numquam eius modi tempora incidunt ut
    labore et dolore magnam aliquam quaerat voluptatem.

    =item Ut Enim

is semantically ambiguous, in a way that makes formatting decisions
a bit difficult. On the one hand, it could be a mention of an item
"Neque", a mention of another item "Porro", and a mention of another
item "Quisquam Est", with just the last one requiring the explanatory
paragraph "Qui dolorem ipsum quia dolor..."; and then an item
"Ut Enim". In that case, you'd want to format it like so:

    Neque

    Porro

    Quisquam Est
      Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
      velit, sed quia non numquam eius modi tempora incidunt ut
      labore et dolore magnam aliquam quaerat voluptatem.

    Ut Enim

But it could equally well be a discussion of three (related or equivalent)
items, "Neque", "Porro", and "Quisquam Est", followed by a paragraph
explaining them all, and then a new item "Ut Enim". In that case, you'd
probably want to format it like so:

    Neque
    Porro
    Quisquam Est
      Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
      velit, sed quia non numquam eius modi tempora incidunt ut
      labore et dolore magnam aliquam quaerat voluptatem.

    Ut Enim

But (for the foreseeable future), Pod does not provide any way for Pod
authors to distinguish which grouping is meant by the above
"=item"-cluster structure.
So formatters should format it like so:

    Neque

    Porro

    Quisquam Est

      Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
      velit, sed quia non numquam eius modi tempora incidunt ut
      labore et dolore magnam aliquam quaerat voluptatem.

    Ut Enim

That is, there should be (at least roughly) the same spacing between
items as between paragraphs (although that spacing may well be less
than the full height of a line of text). This leaves it to the reader
to use (con)textual cues to figure out whether the "Qui dolorem
ipsum..." paragraph applies to the "Quisquam Est" item or to all three
items "Neque", "Porro", and "Quisquam Est". While not an ideal
situation, this is preferable to providing formatting cues that may
actually be contrary to the author's intent.

=back

=head1 About Data Paragraphs and "=begin/=end" Regions

Data paragraphs are typically used for inlining non-Pod data that is
to be used (typically passed through) when rendering the document to
a specific format:

    =begin rtf

    \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}

    =end rtf

The exact same effect could, incidentally, be achieved with a single
"=for" paragraph:

    =for rtf \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}

(Although that is not formally a data paragraph, it has the same
meaning as one, and Pod parsers may parse it as one.)

Another example of a data paragraph:

    =begin html

    I like <em>PIE</em>!

    <hr>Especially pecan pie!

    =end html

If these were ordinary paragraphs, the Pod parser would try to
expand the "EE<lt>/em>" (in the first paragraph) as a formatting
code, just like "EE<lt>lt>" or "EE<lt>eacute>". But since this
But since this is in a "=begin I<identifier>"..."=end I<identifier>"
region I<and> the identifier "html" doesn't have a ":" prefix, the
contents of this region are stored as data paragraphs, instead of
being processed as ordinary paragraphs (or, if they began with spaces
and/or tabs, as verbatim paragraphs).

As a further example: at the time of writing, no "biblio" identifier is
supported, but suppose some processor were written to recognize it as
a way of (say) denoting a bibliographic reference (necessarily
containing formatting codes in ordinary paragraphs). The fact that
"biblio" paragraphs were meant for ordinary processing would be
indicated by prefacing each "biblio" identifier with a colon:

    =begin :biblio

    Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
    Programs.> Prentice-Hall, Englewood Cliffs, NJ.

    =end :biblio

This would signal to the parser that paragraphs in this begin...end
region are subject to normal handling as ordinary/verbatim paragraphs
(while still tagged as meant only for processors that understand the
"biblio" identifier). The same effect could be had with:

    =for :biblio
    Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
    Programs.> Prentice-Hall, Englewood Cliffs, NJ.

The ":" on these identifiers means simply "process this stuff
normally, even though the result will be for some special target".
I suggest that parser APIs report "biblio" as the target identifier,
but also report that it had a ":" prefix. (And similarly, with the
above "html", report "html" as the target identifier, and note the
I<lack> of a ":" prefix.)

Note that a "=begin I<identifier>"..."=end I<identifier>" region where
I<identifier> begins with a colon I<can> contain commands.
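
To follow the reporting suggestion above, a parser could split each
identifier into a target name and a colon flag, and then use that flag
to decide how the region's paragraphs are handled. What follows is a
minimal Perl sketch of that idea, not the API of any existing Pod
module; C<parse_identifier> and C<paragraph_handling> are hypothetical
names, and the sketch looks only at the identifier string itself.

    use strict;
    use warnings;

    # Hypothetical helper: split a "=begin"/"=end"/"=for" identifier
    # into its target name and a flag recording a ":" prefix.
    sub parse_identifier {
        my ($identifier) = @_;
        my $colon = ($identifier =~ s/\A://) ? 1 : 0;
        return { target => $identifier, colon => $colon };
    }

    # Hypothetical helper: paragraphs in a ":"-prefixed region get
    # normal Pod handling (ordinary, verbatim, or command); paragraphs
    # in any other region are data paragraphs.
    sub paragraph_handling {
        my ($info) = @_;
        return $info->{colon} ? 'normal' : 'data';
    }

    for my $id (':biblio', 'html') {
        my $info = parse_identifier($id);
        printf "%s => target=%s colon=%d handling=%s\n",
            $id, $info->{target}, $info->{colon},
            paragraph_handling($info);
    }

Run as-is, it prints one line per identifier. For ":biblio", it reports
target "biblio", colon flag 1, and normal handling. For "html", it
reports target "html", colon flag 0, and data handling. Keeping the
colon as a flag rather than as part of the name lets a processor
recognize its target either way, while still knowing whether commands
are allowed inside the region.
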
For example, the following ":biblio" region contains "=for", "=over",
"=item", and "=back" commands:

    =begin :biblio

    Wirth's classic is available in several editions, including:

    =for comment
     hm, check abebooks.com for how much used copies cost.

    =over

    =item

    Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
    Teubner, Stuttgart. [Yes, it's in German.]

    =item

    Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
    Programs.> Prentice-Hall, Englewood Cliffs, NJ.

    =back

    =end :biblio

Note, however, that a "=begin I<identifier>"..."=end I<identifier>"
region where I<identifier> does I<not> begin with a colon should not
directly contain "=head1" ... "=head4" commands, nor "=over", nor "=back",
nor "=item". For example, this may be considered invalid:

    =begin somedata

    This is a data paragraph.

    =head1 Don't do this!

    This is a data paragraph too.

    =end somedata

A Pod processor may signal that the above (specifically the "=head1"
paragraph) is an error. Note, however, that the following should
I<not> be treated as an error:

    =begin somedata

    This is a data paragraph.

    =cut

    # Yup, this isn't Pod anymore.
    sub excl { (rand() > .5) ? "hoo!" : "hah!" }

    =pod

    This is a data paragraph too.

    =end somedata

And this too is valid:

    =begin someformat

    This is a data paragraph.

      And this is a data paragraph.

    =begin someotherformat

    This is a data paragraph too.

      And this is a data paragraph too.

    =begin :yetanotherformat

    =head2 This is a command paragraph!

    This is an ordinary paragraph!

      And this is a verbatim paragraph!

    =end :yetanotherformat

    =end someotherformat

    Another data paragraph!

    =end someformat

The contents of the above "=begin :yetanotherformat" ...
"=end :yetanotherformat" region I<aren't> data paragraphs, because
the immediately containing region's identifier (":yetanotherformat")
begins with a colon. In practice, most regions that contain
data paragraphs will contain I<only> data paragraphs; however,
the above nesting is syntactically valid as Pod, even if it is
rare. Note, though, that the handlers for some formats, like "html",
will accept only data paragraphs, not nested regions; and they may
complain if, in a region targeted at them, they see nested regions or
any commands other than "=end", "=pod", and "=cut".

Also consider this valid structure:

    =begin :biblio

    Wirth's classic is available in several editions, including:

    =over

    =item

    Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
    Teubner, Stuttgart. [Yes, it's in German.]

    =item

    Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
    Programs.> Prentice-Hall, Englewood Cliffs, NJ.

    =back

    Buy buy buy!

    =begin html

    <img src='wirth_spokesmodeling_book.png'>

    <hr>

    =end html

    Now now now!

    =end :biblio

There, the "=begin html"..."=end html" region is nested inside
the larger "=begin :biblio"..."=end :biblio" region. Note that the
content of the "=begin html"..."=end html" region consists of data
paragraphs, because the immediately containing region's identifier
("html") I<doesn't> begin with a colon.

Pod parsers, when processing a series of data paragraphs one
after another (within a single region), should consider them to
be one large data paragraph that happens to contain blank lines.
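
As a minimal sketch of that rule, assume the parser still has the raw
source text between the "=begin" and "=end" lines, using "\n" newlines.
The region body can then be kept whole, trimming only the blank lines
that separate it from the two commands. The function name
C<single_data_paragraph> is hypothetical, and this is not how any
particular Pod module does it.

    use strict;
    use warnings;

    # Hypothetical helper: return the body of a data region as one
    # data paragraph. Interior blank lines are kept exactly as they
    # appear; only the blank lines next to "=begin" and "=end" go.
    sub single_data_paragraph {
        my ($body) = @_;
        $body =~ s/\A(?:[ \t]*\n)+//;      # blank lines after "=begin"
        $body =~ s/(?:^[ \t]*\n)+\z//m;    # blank lines before "=end"
        return $body;
    }

    # Raw text between "=begin html" and "=end html" in the
    # ":biblio" example above.
    my $body = "\n<img src='wirth_spokesmodeling_book.png'>\n\n<hr>\n\n";
    my $para = single_data_paragraph($body);
    print $para;

This prints the "<img ...>" line, an empty line, and the "<hr>" line.
Together these form the single data paragraph
"<img src='wirth_spokesmodeling_book.png'>\n\n<hr>\n". Working from the
source text, rather than rejoining paragraphs that were already split,
means any spaces or tabs on the interior blank lines survive unchanged.
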
For instance, the content of the nested "=begin html"..."=end html"
region in the ":biblio" example above I<may> be stored as two data
paragraphs (one consisting of
"<img src='wirth_spokesmodeling_book.png'>\n"
and another consisting of "<hr>\n"), but I<should> be stored as
a single data paragraph (consisting of
"<img src='wirth_spokesmodeling_book.png'>\n\n<hr>\n").

Pod processors should tolerate empty
"=begin I<something>"..."=end I<something>" regions,
empty "=begin :I<something>"..."=end :I<something>" regions, and
contentless "=for I<something>" and "=for :I<something>"
paragraphs. I.e., these should be tolerated:

    =for html

    =begin html

    =end html

    =begin :biblio

    =end :biblio

Incidentally, note that there's no easy way to express a data
paragraph starting with something that looks like a command. Consider:

    =begin stuff

    =shazbot

    =end stuff

There, "=shazbot" will be parsed as a Pod command "shazbot", not as a data
paragraph "=shazbot\n". However, you can express a data paragraph consisting
of "=shazbot\n" using this code:

    =for stuff =shazbot

The situation where this is necessary is presumably quite rare.

Note that "=end" commands must match the currently open "=begin" command. That
is, they must properly nest. For example, this is valid:

    =begin outer

    X

    =begin inner

    Y

    =end inner

    Z

    =end outer

while this is invalid:

    =begin outer

    X

    =begin inner

    Y

    =end outer

    Z

    =end inner

The latter is improper because when the "=end outer" command is seen, the
currently open region has the formatname "inner", not "outer". (It just
happens that "outer" is the formatname of a higher-up region.) This is
an error.
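
One way to picture this rule is as a stack of open region names:
"=begin" pushes a name, and "=end" must name the region on top of the
stack. The Perl sketch below assumes that the Pod has already been split
into paragraphs, one string each, and it looks only at "=begin" and
"=end". The function name C<first_region_error> is hypothetical. The
sketch stops at the first problem it finds.

    use strict;
    use warnings;

    # Hypothetical helper: return a message for the first improperly
    # nested "=begin"/"=end" pair, or undef if all regions nest.
    sub first_region_error {
        my (@paragraphs) = @_;
        my @open;    # stack of currently open formatnames
        for my $para (@paragraphs) {
            if ($para =~ /\A=begin\s+(\S+)/) {
                push @open, $1;
            }
            elsif ($para =~ /\A=end\b\s*(\S*)/) {
                my $name = $1;
                return '"=end" without a formatname' if $name eq '';
                return qq{"=end $name" but no region is open} if !@open;
                return qq{"=end $name" but "$open[-1]" is open}
                    if $open[-1] ne $name;
                pop @open;    # properly nested: close the innermost region
            }
        }
        return @open ? qq{"$open[-1]" is never closed} : undef;
    }

    my @valid   = ('=begin outer', 'X', '=begin inner', 'Y',
                   '=end inner', 'Z', '=end outer');
    my @invalid = ('=begin outer', 'X', '=begin inner', 'Y',
                   '=end outer', 'Z', '=end inner');
    for my $doc (\@valid, \@invalid) {
        my $error = first_region_error(@$doc);
        print defined $error ? "error: $error\n" : "ok\n";
    }

For the two "outer"/"inner" examples, this prints "ok" and then
'error: "=end outer" but "inner" is open'. It likewise rejects the
"=end hting" and bare "=end" cases described below.
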
Processors must by default report a mismatched "=end" as an error, and may
halt processing the document containing that error. A corollary of the
nesting rule is that regions cannot "overlap". That is, the invalid
"outer"/"inner" example does not represent a region called "outer" which
contains X and Y, overlapping a region called "inner" which contains Y and Z.
Because it is invalid (as all apparently overlapping regions would be), it
doesn't represent that, or anything at all.

Similarly, this is invalid:

    =begin thing

    =end hting

This is an error because the region is opened by "thing", and the "=end"
tries to close "hting" [sic].

This is also invalid:

    =begin thing

    =end

This is invalid because every "=end" command must have a formatname
parameter.

=head1 SEE ALSO

L<perlpod>, L<perlsyn/"PODs: Embedded Documentation">,
L<podchecker>

=head1 AUTHOR

Sean M. Burke

=cut