1#!/usr/bin/perl -w 2 3BEGIN { 4 chdir 't'; 5 unshift @INC, "../lib"; 6} 7 8use strict; 9use warnings; 10use feature 'unicode_strings'; 11 12use Carp; 13use Config; 14use Digest; 15use File::Find; 16use File::Spec; 17use Scalar::Util; 18use Text::Tabs; 19 20BEGIN { 21 if ( $Config{usecrosscompile} ) { 22 print "1..0 # Not all files are available during cross-compilation\n"; 23 exit 0; 24 } 25 if ($^O eq 'dec_osf') { 26 print "1..0 # $^O cannot handle this test\n"; 27 exit 0; 28 } 29 require '../regen/regen_lib.pl'; 30} 31 32sub DEBUG { 0 }; 33 34=pod 35 36=head1 NAME 37 38podcheck.t - Look for possible problems in the Perl pods 39 40=head1 SYNOPSIS 41 42 cd t 43 ./perl -I../lib porting/podcheck.t [--show_all] [--cpan] [--deltas] 44 [--counts] [--pedantic] [FILE ...] 45 46 ./perl -I../lib porting/podcheck.t --add_link MODULE ... 47 48 ./perl -I../lib porting/podcheck.t --regen 49 50=head1 DESCRIPTION 51 52podcheck.t is an extension of Pod::Checker. It looks for pod errors and 53potential errors in the files given as arguments, or if none specified, in all 54pods in the distribution workspace, except certain known special ones 55(specified below). It does additional checking beyond that done by 56Pod::Checker, and keeps a database of known potential problems, and will 57fail a pod only if the number of such problems differs from that given in the 58database. It also suppresses the C<(section) deprecated> message from 59Pod::Checker, since specifying the man page section number is quite proper to do. 60 61The additional checks it always makes are: 62 63=over 64 65=item Cross-pod link checking 66 67Pod::Checker verifies that links to an internal target in a pod are not 68broken. podcheck.t extends that (when called without FILE arguments) to 69external links. It does this by gathering up all the possible targets in the 70workspace, and cross-checking them. It also checks that a non-broken link 71points to just one target. (The destination pod could have two targets with 72the same name.) 73 74The way that the C<LE<lt>E<gt>> pod command works (for links outside the pod) 75is to actually create a link to C<search.cpan.org> with an embedded query for 76the desired pod or man page. That means that links outside the distribution 77are valid. podcheck.t doesn't verify the validity of such links, but instead 78keeps a database of those known to be valid. This means that if a link to a 79target not on the list is created, the target needs to be added to the data 80base. This is accomplished via the L<--add_link|/--add_link MODULE ...> 81option to podcheck.t, described below. 82 83=item An internal link that isn't so specified 84 85If a link is broken, but there is an existing internal target of the same 86name, it is likely that the internal target was meant, and the C<"/"> is 87missing from the C<LE<lt>E<gt>> pod command. 88 89=item Missing or duplicate NAME or missing NAME short description 90 91A pod can't be linked to unless it has a unique name. 92And a NAME should have a dash and short description after it. 93 94=item =encoding statement issues 95 96This indicates if an C<=encoding> statement should be present, or moved to the 97front of the pod. 98 99=back 100 101If the C<PERL_POD_PEDANTIC> environment variable is set or the C<--pedantic> 102command line argument is provided then a few more checks are made. 103The pedantic checks are: 104 105=over 106 107=item Verbatim paragraphs that wrap in an 80 (including 1 spare) column window 108 109It's annoying to have lines wrap when displaying pod documentation in a 110terminal window. This checks that all verbatim lines fit in a standard 80 111column window, even when using a pager that reserves a column for its own use. 112(Thus the check is for a net of 79 columns.) 113For those lines that don't fit, it tells you how much needs to be cut in 114order to fit. 115 116Often, the easiest thing to do to gain space for these is to lower the indent 117to just one space. 118 119=item Items that perhaps should be links 120 121There are mentions of apparent files in the pods that perhaps should be links 122instead, using C<LE<lt>...E<gt>> 123 124=item Items that perhaps should be C<FE<lt>...E<gt>> 125 126What look like path names enclosed in C<CE<lt>...E<gt>> should perhaps have 127C<FE<lt>...E<gt>> mark-up instead. 128 129=back 130 131A number of issues raised by podcheck.t and by the base Pod::Checker are not 132really problems, but merely potential problems, that is, false positives. 133After inspecting them and 134deciding that they aren't real problems, it is possible to shut up this program 135about them, unlike base Pod::Checker. For a valid link to an outside module 136or man page, call podcheck.t with the C<--add_link> option to add it to the 137the database of known links; for other causes, call podcheck.t with the C<--regen> 138option to regenerate the entire database. This tells it that all existing 139issues are to not be mentioned again. 140 141C<--regen> isn't fool-proof. The database merely keeps track of the number of these 142potential problems of each type for each pod. If a new problem of a given 143type is introduced into the pod, podcheck.t will spit out all of them. You 144then have to figure out which is the new one, and should it be changed or not. 145But doing it this way insulates the database from having to keep track of line 146numbers of problems, which may change, or the exact wording of each problem 147which might also change without affecting whether it is a problem or not. 148 149Also, if the count of potential problems of a given type for a pod decreases, 150the database must be regenerated so that it knows the new number. The program 151gives instructions when this happens. 152 153Some pods will have varying numbers of problems of a given type. This can 154be handled by manually editing the database file (see L</FILES>), and setting 155the number of those problems for that pod to a negative number. This will 156cause the corresponding error to always be suppressed no matter how many there 157actually are. 158 159Another problem is that there is currently no check that modules listed as 160valid in the database 161actually are. Thus any errors introduced there will remain there. 162 163=head2 Specially handled pods 164 165=over 166 167=item perltoc 168 169This pod is generated by pasting bits from other pods. Errors in those bits 170will show up as errors here, as well as for those other pods. Therefore 171errors here are suppressed, and the pod is checked only to verify that nodes 172within it actually exist that are externally linked to. 173 174=item perldelta 175 176The current perldelta pod is initialized from a template that contains 177placeholder text. Some of this text is in the form of links that don't really 178exist. Any such links that are listed in C<@perldelta_ignore_links> will not 179generate messages. It is presumed that these links will be cleaned up when 180the perldelta is cleaned up for release since they should be marked with 181C<XXX>. 182 183=item Porting/perldelta_template.pod 184 185This is not a pod, but a template for C<perldelta>. Any errors introduced 186here will show up when C<perldelta> is created from it. 187 188=item cpan-upstream pods 189 190See the L</--cpan> option documentation 191 192=item old perldeltas 193 194See the L</--deltas> option documentation 195 196=back 197 198=head1 OPTIONS 199 200=over 201 202=item --add_link MODULE ... 203 204Use this option to teach podcheck.t that the C<MODULE>s or man pages actually 205exist, and to silence any messages that links to them are broken. 206 207podcheck.t checks that links within the Perl core distribution are valid, but 208it doesn't check links to man pages or external modules. When it finds 209a broken link, it checks its database of external modules and man pages, 210and only if not found there does it raise a message. This option just adds 211the list of modules and man page references that follow it on the command line 212to that database. 213 214For example, 215 216 cd t 217 ./perl -I../lib porting/podcheck.t --add_link Unicode::Casing 218 219causes the external module "Unicode::Casing" to be added to the database, so 220C<LE<lt>Unicode::CasingE<gt>> will be considered valid. 221 222=item --regen 223 224Regenerate the database used by podcheck.t to include all the existing 225potential problems. Future runs of the program will not then flag any of 226these. Setting this option also sets C<--pedantic>. 227 228=item --cpan 229 230Normally, all pods in the cpan directory are skipped, except to make sure that 231any blead-upstream links to such pods are valid. 232This option will cause cpan upstream pods to be fully checked. 233 234=item --deltas 235 236Normally, all old perldelta pods are skipped, except to make sure that 237any links to such pods are valid. This is because they are considered 238stable, and perhaps trying to fix them will cause changes that will 239misrepresent Perl's history. But, this option will cause them to be fully 240checked. 241 242=item --show_all 243 244Normally, if the number of potential problems of a given type found for a 245pod matches the expected value in the database, they will not be displayed. 246This option forces the database to be ignored during the run, so all potential 247problems are displayed and will fail their respective pod test. Specifying 248any particular FILES to operate on automatically selects this option. 249 250=item --counts 251 252Instead of testing, this just dumps the counts of the occurrences of the 253various types of potential problems in the database. 254 255=item --pedantic 256 257There are three potential problems that are not checked for by default. 258This options enables them. The environment variable C<PERL_POD_PEDANTIC> 259can be set to 1 to enable this option also. 260This option is set when C<--regen> is used. 261 262=back 263 264=head1 FILES 265 266The database is stored in F<t/porting/known_pod_issues.dat> 267 268=head1 SEE ALSO 269 270L<Pod::Checker> 271 272=cut 273 274# VMS builds have a '.com' appended to utility and script names, and it adds a 275# trailing dot for any other file name that doesn't have a dot in it. The db 276# is stored without those things. This regex allows for these special file 277# names to be dealt with. It needs to be interpolated into a larger regex 278# that furnishes the closing boundary. 279my $vms_re = qr/ \. (?: com )? /x; 280 281# Some filenames in the MANIFEST match $vms_re, and so must not be handled the 282# same way that that the special vms ones are. This hash lists those. 283my %special_vms_files; 284 285# This is to get this to work across multiple file systems, including those 286# that are not case sensitive. The db is stored in lower case, Un*x style, 287# and all file name comparisons are done that way. 288sub canonicalize($) { 289 my $input = shift; 290 my ($volume, $directories, $file) 291 = File::Spec->splitpath(File::Spec->canonpath($input)); 292 # Assumes $volume is constant for everything in this directory structure 293 $directories = "" if ! $directories; 294 $file = "" if ! $file; 295 $file = lc join '/', File::Spec->splitdir($directories), $file; 296 $file =~ s! / /+ !/!gx; # Multiple slashes => single slash 297 298 # The db is stored without the special suffixes that are there in VMS, so 299 # strip them off to get the comparable name. But some files on all 300 # platforms have these suffixes, so this shouldn't happen for them, as any 301 # of their db entries will have the suffixes in them. The hash has been 302 # populated with these files. 303 if ($^O eq 'VMS' 304 && $file =~ / ( $vms_re ) $ /x 305 && ! exists $special_vms_files{$file}) 306 { 307 $file =~ s/ $1 $ //x; 308 } 309 return $file; 310} 311 312##################################################### 313# HOW IT WORKS (in general) 314# 315# If not called with specific files to check, the directory structure is 316# examined for files that have pods in them. Files that might not have to be 317# fully parsed (e.g. in cpan) are parsed enough at this time to find their 318# pod's NAME, and to get a checksum. 319# 320# Those kinds of files are sorted last, but otherwise the pods are parsed with 321# the package coded here, My::Pod::Checker, which is an extension to 322# Pod::Checker that adds some tests and suppresses others that aren't 323# appropriate. The latter module has no provision for capturing diagnostics, 324# so a package, Tie_Array_to_FH, is used to force them to be placed into an 325# array instead of printed. 326# 327# Parsing the files builds up a list of links. The files are gone through 328# again, doing cross-link checking and outputting all saved-up problems with 329# each pod. 330# 331# Sorting the files last that potentially don't need to be fully parsed allows 332# us to not parse them unless there is a link to an internal anchor in them 333# from something that we have already parsed. Keeping checksums allows us to 334# not parse copies of other pods. 335# 336##################################################### 337 338# 1 => Exclude low priority messages that aren't likely to be problems, and 339# has many false positives; higher numbers give more messages. 340my $Warnings_Level = 200; 341 342# perldelta during construction may have place holder links. N.B. This 343# variable is referred to by name in release_managers_guide.pod 344our @perldelta_ignore_links = ( "XXX", "perl5YYYdelta", "perldiag/message" ); 345 346# To see if two pods with the same NAME are actually copies of the same pod, 347# which is not an error, it uses a checksum to save work. 348my $digest_type = "SHA-1"; 349 350my $original_dir = File::Spec->rel2abs(File::Spec->curdir); 351my $data_dir = File::Spec->catdir($original_dir, 'porting'); 352my $known_issues = File::Spec->catfile($data_dir, 'known_pod_issues.dat'); 353my $MANIFEST = File::Spec->catfile(File::Spec->updir($original_dir), 'MANIFEST'); 354my $copy_fh; 355 356my $MAX_LINE_LENGTH = 79; # 79 columns 357my $INDENT = 7; # default nroff indent 358 359# Our warning messages. Better not have [('"] in them, as those are used as 360# delimiters for variable parts of the messages by poderror. 361my $broken_link = "Apparent broken link"; 362my $broken_internal_link = "Apparent internal link is missing its forward slash"; 363my $multiple_targets = "There is more than one target"; 364my $duplicate_name = "Pod NAME already used"; 365my $need_encoding = "Should have =encoding statement because have non-ASCII"; 366my $encoding_first = "=encoding must be first command (if present)"; 367my $no_name = "There is no NAME"; 368my $missing_name_description = "The NAME should have a dash and short description after it"; 369# the pedantic warnings messages 370my $line_length = "Verbatim line length including indents exceeds $MAX_LINE_LENGTH by"; 371my $C_not_linked = "? Should you be using L<...> instead of"; 372my $C_with_slash = "? Should you be using F<...> or maybe L<...> instead of"; 373 374# objects, tests, etc can't be pods, so don't look for them. Also skip 375# files output by the patch program. Could also ignore most of .gitignore 376# files, but not all, so don't. 377 378my $obj_ext = $Config{'obj_ext'}; $obj_ext =~ tr/.//d; # dot will be added back 379my $lib_ext = $Config{'lib_ext'}; $lib_ext =~ tr/.//d; 380my $lib_so = $Config{'so'}; $lib_so =~ tr/.//d; 381my $dl_ext = $Config{'dlext'}; $dl_ext =~ tr/.//d; 382 383# Not really pods, but can look like them. 384my %excluded_files = ( 385 canonicalize("lib/unicore/mktables") => 1, 386 canonicalize("Porting/make-rmg-checklist") => 1, 387 # this one is a POD, but unfinished, so skip 388 # it for now 389 canonicalize("Porting/perl5200delta.pod") => 1, 390 canonicalize("Porting/perldelta_template.pod") => 1, 391 canonicalize("regen/feature.pl") => 1, 392 canonicalize("regen/warnings.pl") => 1, 393 canonicalize("autodoc.pl") => 1, 394 canonicalize("configpm") => 1, 395 canonicalize("miniperl") => 1, 396 canonicalize("perl") => 1, 397 canonicalize('cpan/Pod-Perldoc/corpus/no-head.pod') => 1, 398 canonicalize('cpan/Pod-Perldoc/corpus/perlfunc.pod') => 1, 399 canonicalize('cpan/Pod-Perldoc/corpus/utf8.pod') => 1, 400 canonicalize("lib/unicore/mktables") => 1, 401 ); 402 403# This list should not include anything for which case sensitivity is 404# important, as it won't work on VMS, and won't show up until tested on VMS. 405# All or almost all such files should be listed in the MANIFEST, so that can 406# be examined for them, and each such file explicitly excluded, as is done for 407# .PL files in the loop just below this. For files not catchable this way, 408# is_pod_file() can be used to exclude these at a finer grained level. 409my $non_pods = qr/ (?: \. 410 (?: [achot] | zip | gz | bz2 | jar | tar | tgz 411 | orig | rej | patch # Patch program output 412 | sw[op] | \#.* # Editor droppings 413 | old # buildtoc output 414 | xs # pod should be in the .pm file 415 | al # autosplit files 416 | bs # bootstrap files 417 | (?i:sh) # shell scripts, hints, templates 418 | lst # assorted listing files 419 | bat # Windows,Netware,OS2 batch files 420 | cmd # Windows,Netware,OS2 command files 421 | lis # VMS compiler listings 422 | map # VMS linker maps 423 | opt # VMS linker options files 424 | mms # MM(K|S) description files 425 | ts # timestamp files generated during build 426 | $obj_ext # object files 427 | exe # $Config{'exe_ext'} might be empty string 428 | $lib_ext # object libraries 429 | $lib_so # shared libraries 430 | $dl_ext # dynamic libraries 431 | gif # GIF images (example files from CGI.pm) 432 | eg # examples from libnet 433 ) 434 $ 435 ) | ~$ | \ \(Autosaved\)\.txt$ # Other editor droppings 436 | ^cxx\$demangler_db\.$ # VMS name mangler database 437 | ^typemap\.?$ # typemap files 438 | ^(?i:Makefile\.PL)$ 439 /x; 440 441# '.PL' files should be excluded, as they aren't final pods, but often contain 442# material used in generating pods, and so can look like a pod. We can't use 443# the regexp above because case sensisitivity is important for these, as some 444# '.pl' files should be examined for pods. Instead look through the MANIFEST 445# for .PL files and get their full path names, so we can exclude each such 446# file explicitly. This works because other porting tests prohibit having two 447# files with the same names except for case. 448open my $manifest_fh, '<:bytes', $MANIFEST or die "Can't open $MANIFEST"; 449while (<$manifest_fh>) { 450 451 # While we have MANIFEST open, on VMS platforms, look for files that match 452 # the magic VMS file names that have to be handled specially. Add these 453 # to the list of them. 454 if ($^O eq 'VMS' && / ^ ( [^\t]* $vms_re ) \t /x) { 455 $special_vms_files{$1} = 1; 456 } 457 if (/ ^ ( [^\t]* \. PL ) \t /x) { 458 $excluded_files{canonicalize($1)} = 1; 459 } 460} 461close $manifest_fh, or die "Can't close $MANIFEST"; 462 463 464# Pod::Checker messages to suppress 465my @suppressed_messages = ( 466 "(section) in", # Checker is wrong to flag this 467 "multiple occurrence of link target", # We catch independently the ones 468 # that are real problems. 469 "unescaped <>", 470 "Entity number out of range", # Checker outputs this for anything above 471 # 255, but in fact all Unicode is valid 472 "No items in =over", # ie a blockquote 473); 474 475sub suppressed { 476 # Returns bool as to if input message is one that is to be suppressed 477 478 my $message = shift; 479 return grep { $message =~ /^\Q$_/i } @suppressed_messages; 480} 481 482{ # Closure to contain a simple subset of test.pl. This is to get rid of the 483 # unnecessary 'failed at' messages that would otherwise be output pointing 484 # to a particular line in this file. 485 486 my $current_test = 0; 487 my $planned; 488 489 sub plan { 490 my %plan = @_; 491 $planned = $plan{tests} + 1; # +1 for final test that files haven't 492 # been removed 493 print "1..$planned\n"; 494 return; 495 } 496 497 sub ok { 498 my $success = shift; 499 my $message = shift; 500 501 chomp $message; 502 503 $current_test++; 504 print "not " unless $success; 505 print "ok $current_test - $message\n"; 506 return $success; 507 } 508 509 sub skip { 510 my $why = shift; 511 my $n = @_ ? shift : 1; 512 for (1..$n) { 513 $current_test++; 514 print "ok $current_test # skip $why\n"; 515 } 516 no warnings 'exiting'; 517 last SKIP; 518 } 519 520 sub note { 521 my $message = shift; 522 523 chomp $message; 524 525 print $message =~ s/^/# /mgr; 526 print "\n"; 527 return; 528 } 529 530 END { 531 if ($planned && $planned != $current_test) { 532 print STDERR 533 "# Looks like you planned $planned tests but ran $current_test.\n"; 534 } 535 } 536} 537 538# List of known potential problems by pod and type. 539my %known_problems; 540 541# Pods given by the keys contain an interior node that is referred to from 542# outside it. 543my %has_referred_to_node; 544 545my $show_counts = 0; 546my $regen = 0; 547my $add_link = 0; 548my $show_all = 0; 549my $pedantic = 0; 550 551my $do_upstream_cpan = 0; # Assume that are to skip anything in /cpan 552my $do_deltas = 0; # And stable perldeltas 553 554while (@ARGV && substr($ARGV[0], 0, 1) eq '-') { 555 my $arg = shift @ARGV; 556 557 $arg =~ s/^--/-/; # Treat '--' the same as a single '-' 558 if ($arg eq '-regen') { 559 $regen = 1; 560 $pedantic = 1; 561 } 562 elsif ($arg eq '-add_link') { 563 $add_link = 1; 564 } 565 elsif ($arg eq '-cpan') { 566 $do_upstream_cpan = 1; 567 } 568 elsif ($arg eq '-deltas') { 569 $do_deltas = 1; 570 } 571 elsif ($arg eq '-show_all') { 572 $show_all = 1; 573 } 574 elsif ($arg eq '-counts') { 575 $show_counts = 1; 576 } 577 elsif ($arg eq '-pedantic') { 578 $pedantic = 1; 579 } 580 else { 581 die <<EOF; 582Unknown option '$arg' 583 584Usage: $0 [ --regen | --cpan | --show_all | FILE ... | --add_link MODULE ... ]\n" 585 --add_link -> Add the MODULE and man page references to the database 586 --regen -> Regenerate the data file for $0 587 --cpan -> Include files in the cpan subdirectory. 588 --deltas -> Include stable perldeltas 589 --show_all -> Show all known potential problems 590 --counts -> Don't test, but give summary counts of the currently 591 existing database 592 --pedantic -> Check for overly long lines in verbatim blocks 593EOF 594 } 595} 596 597$pedantic = 1 if exists $ENV{PERL_POD_PEDANTIC} and $ENV{PERL_POD_PEDANTIC}; 598my @files = @ARGV; 599 600my $cpan_or_deltas = $do_upstream_cpan || $do_deltas; 601if (($regen + $show_all + $show_counts + $add_link + $cpan_or_deltas ) > 1) { 602 croak "--regen, --show_all, --counts, and --add_link are mutually exclusive\n and none can be run with --cpan nor --deltas"; 603} 604 605my $has_input_files = @files; 606 607 608if ($add_link) { 609 if (! $has_input_files) { 610 croak "--add_link requires at least one module or man page reference"; 611 } 612} 613elsif ($has_input_files) { 614 if ($regen || $show_counts || $do_upstream_cpan || $do_deltas) { 615 croak "--regen, --counts, --deltas, and --cpan can't be used since using specific files"; 616 } 617 foreach my $file (@files) { 618 croak "Can't read file '$file'" if ! -r $file; 619 } 620} 621 622our %problems; # potential problems found in this run 623 624package My::Pod::Checker { # Extend Pod::Checker 625 use parent 'Pod::Checker'; 626 627 # Uses inside out hash to protect from typos 628 # For new fields, remember to add to destructor DESTROY() 629 my %indents; # Stack of indents from =over's in effect for 630 # current line 631 my %current_indent; # Current line's indent 632 my %filename; # The pod is store in this file 633 my %skip; # is SKIP set for this pod 634 my %in_NAME; # true if within NAME section 635 my %in_begin; # true if within =begin section 636 my %linkable_item; # Bool: if the latest =item is linkable. It isn't 637 # for bullet and number lists 638 my %linkable_nodes; # Pod::Checker adds all =items to its node list, 639 # but not all =items are linkable to 640 my %seen_encoding_cmd; # true if have =encoding earlier 641 my %command_count; # Number of commands seen 642 my %seen_pod_cmd; # true if have =pod earlier 643 my %warned_encoding; # true if already have warned about =encoding 644 # problems 645 646 sub DESTROY { 647 my $addr = Scalar::Util::refaddr $_[0]; 648 delete $command_count{$addr}; 649 delete $current_indent{$addr}; 650 delete $filename{$addr}; 651 delete $in_begin{$addr}; 652 delete $indents{$addr}; 653 delete $in_NAME{$addr}; 654 delete $linkable_item{$addr}; 655 delete $linkable_nodes{$addr}; 656 delete $seen_encoding_cmd{$addr}; 657 delete $seen_pod_cmd{$addr}; 658 delete $skip{$addr}; 659 delete $warned_encoding{$addr}; 660 return; 661 } 662 663 sub new { 664 my $class = shift; 665 my $filename = shift; 666 667 my $self = $class->SUPER::new(-quiet => 1, 668 -warnings => $Warnings_Level); 669 my $addr = Scalar::Util::refaddr $self; 670 $command_count{$addr} = 0; 671 $current_indent{$addr} = 0; 672 $filename{$addr} = $filename; 673 $in_begin{$addr} = 0; 674 $in_NAME{$addr} = 0; 675 $linkable_item{$addr} = 0; 676 $seen_encoding_cmd{$addr} = 0; 677 $seen_pod_cmd{$addr} = 0; 678 $warned_encoding{$addr} = 0; 679 return $self; 680 } 681 682 # re's for messages that Pod::Checker outputs 683 my $location = qr/ \b (?:in|at|on|near) \s+ /xi; 684 my $optional_location = qr/ (?: $location )? /xi; 685 my $line_reference = qr/ [('"]? $optional_location \b line \s+ 686 (?: \d+ | EOF | \Q???\E | - ) 687 [)'"]? /xi; 688 689 sub poderror { # Called to register a potential problem 690 691 # This adds an extra field to the parent hash, 'parameter'. It is 692 # used to extract the variable parts of a message leaving just the 693 # constant skeleton. This in turn allows the message to be 694 # categorized better, so that it shows up as a single type in our 695 # database, with the specifics of each occurrence not being stored with 696 # it. 697 698 my $self = shift; 699 my $opts = shift; 700 701 my $addr = Scalar::Util::refaddr $self; 702 return if $skip{$addr}; 703 704 # Input can be a string or hash. If a string, parse it to separate 705 # out the line number and convert to a hash for easier further 706 # processing 707 my $message; 708 if (ref $opts ne 'HASH') { 709 $message = join "", $opts, @_; 710 my $line_number; 711 if ($message =~ s/\s*($line_reference)//) { 712 ($line_number = $1) =~ s/\s*$optional_location//; 713 } 714 else { 715 $line_number = '???'; 716 } 717 $opts = { -msg => $message, -line => $line_number }; 718 } else { 719 $message = $opts->{'-msg'}; 720 721 } 722 723 $message =~ s/^\d+\s+//; 724 return if main::suppressed($message); 725 726 $self->SUPER::poderror($opts, @_); 727 728 $opts->{parameter} = "" unless $opts->{parameter}; 729 730 # The variable parts of the message tend to be enclosed in '...', 731 # "....", or (...). Extract them and put them in an extra field, 732 # 'parameter'. This is trickier because the matching delimiter to a 733 # '(' is its mirror, and not itself. Text::Balanced could be used 734 # instead. 735 while ($message =~ m/ \s* $optional_location ( [('"] )/xg) { 736 my $delimiter = $1; 737 my $start = $-[0]; 738 $delimiter = ')' if $delimiter eq '('; 739 740 # If there is no ending delimiter, don't consider it to be a 741 # variable part. Most likely it is a contraction like "Don't" 742 last unless $message =~ m/\G .+? \Q$delimiter/xg; 743 744 my $length = $+[0] - $start; 745 746 # Get the part up through the closing delimiter 747 my $special = substr($message, $start, $length); 748 $special =~ s/^\s+//; # No leading whitespace 749 750 # And add that variable part to the parameter, while removing it 751 # from the message. This isn't a foolproof way of finding the 752 # variable part. For example '(s)' can occur in e.g., 753 # 'paragraph(s)' 754 if ($special ne '(s)') { 755 substr($message, $start, $length) = ""; 756 pos $message = $start; 757 $opts->{-msg} = $message; 758 $opts->{parameter} .= " " if $opts->{parameter}; 759 $opts->{parameter} .= $special; 760 } 761 } 762 763 # Extract any additional line number given. This is often the 764 # beginning location of something whereas the main line number gives 765 # the ending one. 766 if ($message =~ /( $line_reference )/xi) { 767 my $line_ref = $1; 768 while ($message =~ s/\s*\Q$line_ref//) { 769 $opts->{-msg} = $message; 770 $opts->{parameter} .= " " if $opts->{parameter}; 771 $opts->{parameter} .= $line_ref; 772 } 773 } 774 775 Carp::carp("Couldn't extract line number from '$message'") if $message =~ /line \d+/; 776 push @{$problems{$filename{$addr}}{$message}}, $opts; 777 #push @{$problems{$self->get_filename}{$message}}, $opts; 778 } 779 780 sub check_encoding { # Does it need an =encoding statement? 781 my ($self, $paragraph, $line_num, $pod_para) = @_; 782 783 # Do nothing if there is an =encoding in the file, or if the line 784 # doesn't require an =encoding, or have already warned. 785 my $addr = Scalar::Util::refaddr $self; 786 return if $seen_encoding_cmd{$addr} 787 || $warned_encoding{$addr} 788 || $paragraph !~ /\P{ASCII}/; 789 790 $warned_encoding{$addr} = 1; 791 my ($file, $line) = $pod_para->file_line; 792 $self->poderror({ -line => $line, -file => $file, 793 -msg => $need_encoding 794 }); 795 return; 796 } 797 798 sub verbatim { 799 my ($self, $paragraph, $line_num, $pod_para) = @_; 800 $self->check_encoding($paragraph, $line_num, $pod_para); 801 802 $self->SUPER::verbatim($paragraph, $line_num, $pod_para); 803 804 my $addr = Scalar::Util::refaddr $self; 805 806 # Pick up the name, since the parent class doesn't in verbatim 807 # NAMEs; so treat as non-verbatim. The parent class only allows one 808 # paragraph in a NAME section, so if there is an extra blank line, it 809 # will trigger a message, but such a blank line is harmless, so skip 810 # in that case. 811 if ($in_NAME{$addr} && $paragraph =~ /\S/) { 812 $self->textblock($paragraph, $line_num, $pod_para); 813 } 814 815 my @lines = split /^/, $paragraph; 816 for my $i (0 .. @lines - 1) { 817 if ( my $encoding = $seen_encoding_cmd{$addr} ) { 818 require Encode; 819 $lines[$i] = Encode::decode($encoding, $lines[$i]); 820 } 821 $lines[$i] =~ s/\s+$//; 822 my $indent = $self->get_current_indent; 823 824 if ($pedantic) { # TODO: this check should be moved higher 825 # to avoid more unnecessary work 826 my $exceeds = length(Text::Tabs::expand($lines[$i])) 827 + $indent - $MAX_LINE_LENGTH; 828 next unless $exceeds > 0; 829 my ($file, $line) = $pod_para->file_line; 830 $self->poderror({ -line => $line + $i, -file => $file, 831 -msg => $line_length, 832 parameter => "+$exceeds (including " . ($indent - $INDENT) . " from =over's)", 833 }); 834 } 835 } 836 } 837 838 sub textblock { 839 my ($self, $paragraph, $line_num, $pod_para) = @_; 840 $self->check_encoding($paragraph, $line_num, $pod_para); 841 842 $self->SUPER::textblock($paragraph, $line_num, $pod_para); 843 844 my ($file, $line) = $pod_para->file_line; 845 my $addr = Scalar::Util::refaddr $self; 846 if ($in_NAME{$addr}) { 847 if (! $self->name) { 848 my $text = $self->interpolate($paragraph, $line_num); 849 if ($text =~ /^\s*(\S+?)\s*$/) { 850 $self->name($1); 851 $self->poderror({ -line => $line, -file => $file, 852 -msg => $missing_name_description, 853 parameter => $1}); 854 } 855 } 856 } 857 $paragraph = join " ", split /^/, $paragraph; 858 859 # Matches something that looks like a file name, but is enclosed in 860 # C<...> 861 my $C_path_re = qr{ \b ( C< 862 # exclude various things that have slashes 863 # in them but aren't paths 864 (?! 865 (?: (?: s | qr | m) / ) # regexes 866 | \d+/\d+> # probable fractions 867 | OS/2> 868 | Perl/Tk> 869 | origin/blead> 870 | origin/maint 871 | - # File names don't begin with "-" 872 ) 873 [-\w]+ (?: / [-\w]+ )+ (?: \. \w+ )? > ) 874 }x; 875 876 # If looks like a reference to other documentation by containing the 877 # word 'See' and then a likely pod directive, warn. 878 while ($paragraph =~ m{ 879 ( (?: \w+ \s+ )* ) # The phrase before, if any 880 \b [Ss]ee \s+ 881 ( ( [^L] ) 882 < 883 ( [^<]*? ) # The not < excludes nested C<L<... 884 > 885 ) 886 ( \s+ (?: under | in ) \s+ L< )? 887 }xg) { 888 my $prefix = $1 // ""; 889 my $construct = $2; # The whole thing, like C<...> 890 my $type = $3; 891 my $interior = $4; 892 my $trailing = $5; # After the whole thing ending in "L<" 893 894 # If the full phrase is something like, "you might see C<", or 895 # similar, it really isn't a reference to a link. The ones I saw 896 # all had the word "you" in them; and the "you" wasn't the 897 # beginning of a sentence. 898 if ($prefix !~ / \b you \b /x) { 899 900 # Now, find what the module or man page name within the 901 # construct would be if it actually has L<> syntax. If it 902 # doesn't have that syntax, will set the module to the entire 903 # interior. 904 $interior =~ m/ ^ 905 (?: [^|]+ \| )? # Optional arbitrary text ending 906 # in "|" 907 ( .+? ) # module, etc. name 908 (?: \/ .+ )? # target within module 909 $ 910 /xs; 911 my $module = $1; 912 if (! defined $trailing # not referring to something in another 913 # section 914 && $interior !~ /$non_pods/ 915 916 # C<> that look like files have their own message below, so 917 # exclude them 918 && $construct !~ /$C_path_re/g 919 920 # There can't be spaces (I think) in module names or man 921 # pages 922 && $module !~ / \s /x 923 924 # F<> that end in eg \.pl are almost certainly ok, as are 925 # those that look like a path with multiple "/" chars 926 && ($type ne "F" 927 || (! -e $interior 928 && $interior !~ /\.\w+$/ 929 && $interior !~ /\/.+\//) 930 ) 931 ) { 932 # TODO: move the checking of $pedantic higher up 933 $self->poderror({ -line => $line, -file => $file, 934 -msg => $C_not_linked, 935 parameter => $construct 936 }) if $pedantic; 937 } 938 } 939 } 940 while ($paragraph =~ m/$C_path_re/g) { 941 my $construct = $1; 942 # TODO: move the checking of $pedantic higher up 943 $self->poderror({ -line => $line, -file => $file, 944 -msg => $C_with_slash, 945 parameter => $construct 946 }) if $pedantic; 947 } 948 return; 949 } 950 951 sub command { 952 my ($self, $cmd, $paragraph, $line_num, $pod_para) = @_; 953 my $addr = Scalar::Util::refaddr $self; 954 if ($cmd eq "pod") { 955 $seen_pod_cmd{$addr}++; 956 } 957 elsif ($cmd eq "encoding") { 958 my ($file, $line) = $pod_para->file_line; 959 $seen_encoding_cmd{$addr} = $paragraph; # for later decoding 960 if ($command_count{$addr} != 1 && $seen_pod_cmd{$addr}) { 961 $self->poderror({ -line => $line, -file => $file, 962 -msg => $encoding_first 963 }); 964 } 965 } 966 $self->check_encoding($paragraph, $line_num, $pod_para); 967 968 # Pod::Check treats all =items as linkable, but the bullet and 969 # numbered lists really aren't. So keep our own list. This has to be 970 # processed before SUPER is called so that the list is started before 971 # the rest of it gets parsed. 972 if ($cmd eq 'item') { # Not linkable if item begins with * or a digit 973 $linkable_item{$addr} = ($paragraph !~ / ^ \s* 974 (?: [*] 975 | \d+ \.? (?: \$ | \s+ ) 976 )/x) 977 ? 1 978 : 0; 979 980 } 981 $self->SUPER::command($cmd, $paragraph, $line_num, $pod_para); 982 983 $command_count{$addr}++; 984 985 $in_NAME{$addr} = 0; # Will change to 1 below if necessary 986 $in_begin{$addr} = 0; # ibid 987 if ($cmd eq 'over') { 988 my $text = $self->interpolate($paragraph, $line_num); 989 my $indent = 4; # default 990 $indent = $1 if $text && $text =~ /^\s*(\d+)\s*$/; 991 push @{$indents{$addr}}, $indent; 992 $current_indent{$addr} += $indent; 993 } 994 elsif ($cmd eq 'back') { 995 if (@{$indents{$addr}}) { 996 $current_indent{$addr} -= pop @{$indents{$addr}}; 997 } 998 else { 999 # =back without corresponding =over, but should have 1000 # warned already 1001 $current_indent{$addr} = 0; 1002 } 1003 } 1004 elsif ($cmd =~ /^head/) { 1005 if (! $in_begin{$addr}) { 1006 1007 # If a particular formatter, then this command doesn't really 1008 # apply 1009 $current_indent{$addr} = 0; 1010 undef @{$indents{$addr}}; 1011 } 1012 1013 my $text = $self->interpolate($paragraph, $line_num); 1014 $in_NAME{$addr} = 1 if $cmd eq 'head1' 1015 && $text && $text =~ /^NAME\b/; 1016 } 1017 elsif ($cmd eq 'begin') { 1018 $in_begin{$addr} = 1; 1019 } 1020 1021 return; 1022 } 1023 1024 sub hyperlink { 1025 my $self = shift; 1026 1027 my $page; 1028 if ($_[0] && ($page = $_[0][1]{'-page'})) { 1029 my $node = $_[0][1]{'-node'}; 1030 1031 # If the hyperlink is to an interior node of another page, save it 1032 # so that we can see if we need to parse normally skipped files. 1033 $has_referred_to_node{$page} = 1 if $node; 1034 1035 # Ignore certain placeholder links in perldelta. Check if the 1036 # link is page-level, and also check if to a node within the page 1037 if ($self->name && $self->name eq "perldelta" 1038 && ((grep { $page eq $_ } @perldelta_ignore_links) 1039 || ($node 1040 && (grep { "$page/$node" eq $_ } @perldelta_ignore_links) 1041 ))) { 1042 return; 1043 } 1044 } 1045 return $self->SUPER::hyperlink($_[0]); 1046 } 1047 1048 sub node { 1049 my $self = shift; 1050 my $text = $_[0]; 1051 if($text) { 1052 $text =~ s/\s+$//s; # strip trailing whitespace 1053 $text =~ s/\s+/ /gs; # collapse whitespace 1054 my $addr = Scalar::Util::refaddr $self; 1055 push(@{$linkable_nodes{$addr}}, $text) if 1056 ! $current_indent{$addr} 1057 || $linkable_item{$addr}; 1058 } 1059 return $self->SUPER::node($_[0]); 1060 } 1061 1062 sub get_current_indent { 1063 return $INDENT + $current_indent{Scalar::Util::refaddr $_[0]}; 1064 } 1065 1066 sub get_filename { 1067 return $filename{Scalar::Util::refaddr $_[0]}; 1068 } 1069 1070 sub linkable_nodes { 1071 my $linkables = $linkable_nodes{Scalar::Util::refaddr $_[0]}; 1072 return undef unless $linkables; 1073 return @$linkables; 1074 } 1075 1076 sub get_skip { 1077 return $skip{Scalar::Util::refaddr $_[0]} // 0; 1078 } 1079 1080 sub set_skip { 1081 my $self = shift; 1082 $skip{Scalar::Util::refaddr $self} = shift; 1083 1084 # If skipping, no need to keep the problems for it 1085 delete $problems{$self->get_filename}; 1086 return; 1087 } 1088 1089 sub parse_from_file { 1090 # This overrides the super class method so that if an open fails on a 1091 # transitory file, it doesn't croak. It returns 1 if it did find the 1092 # file, 0 if it didn't 1093 1094 my $self = shift; 1095 my $filename = shift; 1096 # ignores 2nd param, which is output file. Always uses undef 1097 1098 if (open my $in_fh, '<:bytes', $filename) { 1099 $self->SUPER::parse_from_filehandle($in_fh, undef); 1100 close $in_fh; 1101 return 1; 1102 } 1103 1104 # If couldn't open file, perhaps it was transitory, and hence not an error 1105 return 0 unless -e $filename; 1106 1107 die "Can't open '$filename': $!\n"; 1108 } 1109} 1110 1111package Tie_Array_to_FH { # So printing actually goes to an array 1112 1113 my %array; 1114 1115 sub TIEHANDLE { 1116 my $class = shift; 1117 my $array_ref = shift; 1118 1119 my $self = bless \do{ my $anonymous_scalar }, $class; 1120 $array{Scalar::Util::refaddr $self} = $array_ref; 1121 1122 return $self; 1123 } 1124 1125 sub PRINT { 1126 my $self = shift; 1127 push @{$array{Scalar::Util::refaddr $self}}, @_; 1128 return 1; 1129 } 1130} 1131 1132 1133my %filename_to_checker; # Map a filename to it's pod checker object 1134my %id_to_checker; # Map a checksum to it's pod checker object 1135my %nodes; # key is filename, values are nodes in that file. 1136my %nodes_first_word; # same, but value is first word of each node 1137my %valid_modules; # List of modules known to exist outside us. 1138my %digests; # checksums of files, whose names are the keys 1139my %filename_to_pod; # Map a filename to its pod NAME 1140my %files_with_unknown_issues; 1141my %files_with_fixes; 1142 1143my $data_fh; 1144open $data_fh, '<:bytes', $known_issues or die "Can't open $known_issues"; 1145 1146my %counts; # For --counts param, count of each issue type 1147my %suppressed_files; # Files with at least one issue type to suppress 1148my $HEADER = <<END; 1149# This file is the data file for $0. 1150# There are three types of lines. 1151# Comment lines are white-space only or begin with a '#', like this one. Any 1152# changes you make to the comment lines will be lost when the file is 1153# regen'd. 1154# Lines without tab characters are simply NAMES of pods that the program knows 1155# will have links to them and the program does not check if those links are 1156# valid. 1157# All other lines should have three fields, each separated by a tab. The 1158# first field is the name of a pod; the second field is an error message 1159# generated by this program; and the third field is a count of how many 1160# known instances of that message there are in the pod. -1 means that the 1161# program can expect any number of this type of message. 1162END 1163 1164my @existing_issues; 1165 1166 1167while (<$data_fh>) { # Read the database 1168 chomp; 1169 next if /^\s*(?:#|$)/; # Skip comment and empty lines 1170 if (/\t/) { 1171 next if $show_all; 1172 if ($add_link) { # The issues are saved and later output unchanged 1173 push @existing_issues, $_; 1174 next; 1175 } 1176 1177 # Keep track of counts of each issue type for each file 1178 my ($filename, $message, $count) = split /\t/; 1179 $known_problems{$filename}{$message} = $count; 1180 1181 if ($show_counts) { 1182 if ($count < 0) { # -1 means to suppress this issue type 1183 $suppressed_files{$filename} = $filename; 1184 } 1185 else { 1186 $counts{$message} += $count; 1187 } 1188 } 1189 } 1190 else { # Lines without a tab are modules known to be valid 1191 $valid_modules{$_} = 1 1192 } 1193} 1194close $data_fh; 1195 1196if ($add_link) { 1197 $copy_fh = open_new($known_issues); 1198 1199 # Check for basic sanity, and add each command line argument 1200 foreach my $module (@files) { 1201 die "\"$module\" does not look like a module or man page" 1202 # Must look like (A or A::B or A::B::C ..., or foo(3C) 1203 if $module !~ /^ (?: \w+ (?: :: \w+ )* | \w+ \( \d \w* \) ) $/x; 1204 $valid_modules{$module} = 1 1205 } 1206 my_safer_print($copy_fh, $HEADER); 1207 foreach (sort { lc $a cmp lc $b } keys %valid_modules) { 1208 my_safer_print($copy_fh, $_, "\n"); 1209 } 1210 1211 # The rest of the db file is output unchanged. 1212 my_safer_print($copy_fh, join "\n", @existing_issues, ""); 1213 1214 close_and_rename($copy_fh); 1215 exit; 1216} 1217 1218if ($show_counts) { 1219 my $total = 0; 1220 foreach my $message (sort keys %counts) { 1221 $total += $counts{$message}; 1222 note(Text::Tabs::expand("$counts{$message}\t$message")); 1223 } 1224 note("-----\n" . Text::Tabs::expand("$total\tknown potential issues")); 1225 if (%suppressed_files) { 1226 note("\nFiles that have all messages of at least one type suppressed:"); 1227 note(join ",", keys %suppressed_files); 1228 } 1229 exit 0; 1230} 1231 1232# re to match files that are to be parsed only if there is an internal link 1233# to them. It does not include cpan, as whether those are parsed depends 1234# on a switch. Currently, only perltoc and the stable perldelta.pod's 1235# are included. The latter all have characters between 'perl' and 1236# 'delta'. (Actually the currently developed one matches as well, but 1237# is a duplicate of perldelta.pod, so can be skipped, so fine for it to 1238# match this. 1239my $only_for_interior_links_re = qr/ ^ pod\/perltoc.pod $ 1240 /x; 1241unless ($do_deltas) { 1242 $only_for_interior_links_re = qr/$only_for_interior_links_re | 1243 \b perl \d+ delta \. pod \b 1244 /x; 1245} 1246 1247{ # Closure 1248 my $first_time = 1; 1249 1250 sub output_thanks ($$$$) { # Called when an issue has been fixed 1251 my $filename = shift; 1252 my $original_count = shift; 1253 my $current_count = shift; 1254 my $message = shift; 1255 1256 $files_with_fixes{$filename} = 1; 1257 my $return; 1258 my $fixed_count = $original_count - $current_count; 1259 my $a_problem = ($fixed_count == 1) ? "a problem" : "multiple problems"; 1260 my $another_problem = ($fixed_count == 1) ? "another problem" : "another set of problems"; 1261 my $diff; 1262 if ($message) { 1263 $diff = <<EOF; 1264There were $original_count occurrences (now $current_count) in this pod of type 1265"$message", 1266EOF 1267 } else { 1268 $diff = <<EOF; 1269There are no longer any problems found in this pod! 1270EOF 1271 } 1272 1273 if ($first_time) { 1274 $first_time = 0; 1275 $return = <<EOF; 1276Thanks for fixing $a_problem! 1277$diff 1278Now you must teach $0 that this was fixed. 1279EOF 1280 } 1281 else { 1282 $return = <<EOF 1283Thanks for fixing $another_problem. 1284$diff 1285EOF 1286 } 1287 1288 return $return; 1289 } 1290} 1291 1292sub my_safer_print { # print, with error checking for outputting to db 1293 my ($fh, @lines) = @_; 1294 1295 if (! print $fh @lines) { 1296 my $save_error = $!; 1297 close($fh); 1298 die "Write failure: $save_error"; 1299 } 1300} 1301 1302sub extract_pod { # Extracts just the pod from a file; returns undef if file 1303 # doesn't exist 1304 my $filename = shift; 1305 1306 my @pod; 1307 1308 # Arrange for the output of Pod::Parser to be collected in an array we can 1309 # look at instead of being printed 1310 tie *ALREADY_FH, 'Tie_Array_to_FH', \@pod; 1311 if (open my $in_fh, '<:bytes', $filename) { 1312 my $parser = Pod::Parser->new(); 1313 $parser->parse_from_filehandle($in_fh, *ALREADY_FH); 1314 close $in_fh; 1315 1316 return join "", @pod 1317 } 1318 1319 # The file should already have been opened once to get here, so if that 1320 # fails, something is wrong. It's possible that a transitory file 1321 # containing a pod would get here, so if the file no longer exists just 1322 # return undef. 1323 return unless -e $filename; 1324 die "Can't open '$filename': $!\n"; 1325} 1326 1327my $digest = Digest->new($digest_type); 1328 1329# This is used as a callback from File::Find::find(), which always constructs 1330# pathnames using Unix separators 1331sub is_pod_file { 1332 # If $_ is a pod file, add it to the lists and do other prep work. 1333 1334 if (-d) { 1335 # Don't look at files in directories that are for tests, nor those 1336 # beginning with a dot 1337 if (m!/t\z! || m!/\.!) { 1338 $File::Find::prune = 1; 1339 } 1340 return; 1341 } 1342 1343 return unless -r && -s; # Can't check it if can't read it; no need to 1344 # check if 0 length 1345 return unless -f || -l; # Weird file types won't be pods 1346 1347 my ($leaf) = m!([^/]+)\z!; 1348 if (m!/\.! # No hidden Unix files 1349 || $leaf =~ $non_pods) { 1350 note("Not considering $_") if DEBUG; 1351 return; 1352 } 1353 1354 my $filename = $File::Find::name; 1355 1356 # $filename is relative, like './path'. Strip that initial part away. 1357 $filename =~ s!^\./!! or die 'Unexpected pathname "$filename"'; 1358 1359 return if $excluded_files{canonicalize($filename)}; 1360 1361 my $contents = do { 1362 local $/; 1363 my $candidate; 1364 if (! open $candidate, '<:bytes', $_) { 1365 1366 # If a transitory file was found earlier, the open could fail 1367 # legitimately and we just skip the file; also skip it if it is a 1368 # broken symbolic link, as it is probably just a build problem; 1369 # certainly not a file that we would want to check the pod of. 1370 # Otherwise fail it here and no reason to process it further. 1371 # (But the test count will be off too) 1372 ok(0, "Can't open '$filename': $!") 1373 if -r $filename && ! -l $filename; 1374 return; 1375 } 1376 <$candidate>; 1377 }; 1378 1379 # If the file is a .pm or .pod, having any initial '=' on a line is 1380 # grounds for testing it. Otherwise, require a head1 NAME line to 1381 # consider it as a potential pod 1382 if ($filename =~ /\.(?:pm|pod)/) { 1383 return unless $contents =~ /^=/m; 1384 } else { 1385 return unless $contents =~ /^=head1 +NAME/m; 1386 } 1387 1388 # Here, we know that the file is a pod. Add it to the list of files 1389 # to check and create a checker object for it. 1390 1391 push @files, $filename; 1392 my $checker = My::Pod::Checker->new($filename); 1393 $filename_to_checker{$filename} = $checker; 1394 1395 # In order to detect duplicate pods and only analyze them once, we 1396 # compute checksums for the file, so don't have to do an exact 1397 # compare. Note that if the pod is just part of the file, the 1398 # checksums can differ for the same pod. That special case is handled 1399 # later, since if the checksums of the whole file are the same, that 1400 # case won't even come up. We don't need the checksums for files that 1401 # we parse only if there is a link to its interior, but we do need its 1402 # NAME, which is also retrieved in the code below. 1403 1404 if ($filename =~ / (?: ^(cpan|lib|ext|dist)\/ ) 1405 | $only_for_interior_links_re 1406 /x) { 1407 $digest->add($contents); 1408 $digests{$filename} = $digest->digest; 1409 1410 # lib files aren't analyzed if they are duplicates of files copied 1411 # there from some other directory. But to determine this, we need 1412 # to know their NAMEs. We might as well find the NAME now while 1413 # the file is open. Similarly, cpan files aren't analyzed unless 1414 # we're analyzing all of them, or this particular file is linked 1415 # to by a file we are analyzing, and thus we will want to verify 1416 # that the target exists in it. We need to know at least the NAME 1417 # to see if it's worth analyzing, or so we can determine if a lib 1418 # file is a copy of a cpan one. 1419 if ($filename =~ m{ (?: ^ (?: cpan | lib ) / ) 1420 | $only_for_interior_links_re 1421 }x) { 1422 if ($contents =~ /^=head1 +NAME.*/mg) { 1423 # The NAME is the first non-spaces on the line up to a 1424 # comma, dash or end of line. Otherwise, it's invalid and 1425 # this pod doesn't have a legal name that we're smart 1426 # enough to find currently. But the parser will later 1427 # find it if it thinks there is a legal name, and set the 1428 # name 1429 if ($contents =~ /\G # continue from the line after =head1 1430 \s* # ignore any empty lines 1431 ^ \s* ( \S+?) \s* (?: [,-] | $ )/mx) { 1432 my $name = $1; 1433 $checker->name($name); 1434 $id_to_checker{$name} = $checker 1435 if $filename =~ m{^cpan/}; 1436 } 1437 } 1438 elsif ($filename =~ m{^cpan/}) { 1439 $id_to_checker{$digests{$filename}} = $checker; 1440 } 1441 } 1442 } 1443 1444 return; 1445} # End of is_pod_file() 1446 1447# Start of real code that isn't processing the command line (except the 1448# db is read in above, as is processing of the --add_link option). 1449# Here, @files contains list of files on the command line. If have any of 1450# these, unconditionally test them, and show all the errors, even the known 1451# ones, and, since not testing other pods, don't do cross-pod link tests. 1452# (Could add extra code to do cross-pod tests for the ones in the list.) 1453 1454if ($has_input_files) { 1455 undef %known_problems; 1456 $do_upstream_cpan = $do_deltas = 1; # In case one of the inputs is one 1457 # of these types 1458} 1459else { # No input files -- go find all the possibilities. 1460 if ($regen) { 1461 $copy_fh = open_new($known_issues); 1462 note("Regenerating $known_issues, please be patient..."); 1463 print $copy_fh $HEADER; 1464 } 1465 1466 # Move to the directory above us, but have to adjust @INC to account for 1467 # that. 1468 s{^\.\./lib$}{lib} for @INC; 1469 chdir File::Spec->updir; 1470 1471 # And look in this directory and all its subdirectories 1472 find( {wanted => \&is_pod_file, no_chdir => 1}, '.'); 1473 1474 # Add ourselves to the test 1475 push @files, "t/porting/podcheck.t"; 1476} 1477 1478# Now we know how many tests there will be. 1479plan (tests => scalar @files) if ! $regen; 1480 1481 1482 # Sort file names so we get consistent results, and to put cpan last, 1483 # preceeded by the ones that we don't generally parse. This is because both 1484 # these classes are generally parsed only if there is a link to the interior 1485 # of them, and we have to parse all others first to guarantee that they don't 1486 # have such a link. 'lib' files come just before these, as some of these are 1487 # duplicates of others. We already have figured this out when gathering the 1488 # data as a special case for all such files, but this, while unnecessary, 1489 # puts the derived file last in the output. 'readme' files come before those, 1490 # as those also could be duplicates of others, which are considered the 1491 # primary ones. These currently aren't figured out when gathering data, so 1492 # are done here. 1493 @files = sort { if ($a =~ /^cpan/) { 1494 return 1 if $b !~ /^cpan/; 1495 return lc $a cmp lc $b; 1496 } 1497 elsif ($b =~ /^cpan/) { 1498 return -1; 1499 } 1500 elsif ($a =~ /$only_for_interior_links_re/) { 1501 return 1 if $b !~ /$only_for_interior_links_re/; 1502 return lc $a cmp lc $b; 1503 } 1504 elsif ($b =~ /$only_for_interior_links_re/) { 1505 return -1; 1506 } 1507 elsif ($a =~ /^lib/) { 1508 return 1 if $b !~ /^lib/; 1509 return lc $a cmp lc $b; 1510 } 1511 elsif ($b =~ /^lib/) { 1512 return -1; 1513 } elsif ($a =~ /\breadme\b/i) { 1514 return 1 if $b !~ /\breadme\b/i; 1515 return lc $a cmp lc $b; 1516 } 1517 elsif ($b =~ /\breadme\b/i) { 1518 return -1; 1519 } 1520 else { 1521 return lc $a cmp lc $b; 1522 } 1523 } 1524 @files; 1525 1526# Now go through all the files and parse them 1527FILE: 1528foreach my $filename (@files) { 1529 my $parsed = 0; 1530 note("parsing $filename") if DEBUG; 1531 1532 # We may have already figured out some things in the process of generating 1533 # the file list. If so, we have a $checker object already. But if not, 1534 # generate one now. 1535 my $checker = $filename_to_checker{$filename}; 1536 if (! $checker) { 1537 $checker = My::Pod::Checker->new($filename); 1538 $filename_to_checker{$filename} = $checker; 1539 } 1540 1541 # We have set the name in the checker object if there is a possibility 1542 # that no further parsing is necessary, but otherwise do the parsing now. 1543 if (! $checker->name) { 1544 if (! $checker->parse_from_file($filename, undef)) { 1545 $checker->set_skip("$filename is transitory"); 1546 next FILE; 1547 } 1548 $parsed = 1; 1549 1550 } 1551 1552 if ($checker->num_errors() < 0) { # Returns negative if not a pod 1553 $checker->set_skip("$filename is not a pod"); 1554 } 1555 else { 1556 1557 # Here, is a pod. See if it is one that has already been tested, 1558 # or should be tested under another directory. Use either its NAME 1559 # if it has one, or a checksum if not. 1560 my $name = $checker->name; 1561 my $id; 1562 1563 if ($name) { 1564 $id = $name; 1565 } 1566 else { 1567 my $digest = Digest->new($digest_type); 1568 my $contents = extract_pod($filename); 1569 1570 # If the return is undef, it means that $filename was a transitory 1571 # file; skip it. 1572 next FILE unless defined $contents; 1573 $digest->add($contents); 1574 $id = $digest->digest; 1575 } 1576 1577 # If there is a match for this pod with something that we've already 1578 # processed, don't process it, and output why. 1579 my $prior_checker; 1580 if (defined ($prior_checker = $id_to_checker{$id}) 1581 && $prior_checker != $checker) # Could have defined the checker 1582 # earlier without pursuing it 1583 { 1584 1585 # If the pods are identical, then it's just a copy, and isn't an 1586 # error. First use the checksums we have already computed to see 1587 # if the entire files are identical, which means that the pods are 1588 # identical too. 1589 my $prior_filename = $prior_checker->get_filename; 1590 my $same = (! $name 1591 || ($digests{$prior_filename} 1592 && $digests{$filename} 1593 && $digests{$prior_filename} eq $digests{$filename})); 1594 1595 # If they differ, it could be that the files differ for some 1596 # reason, but the pods they contain are identical. Extract the 1597 # pods and do the comparisons on just those. 1598 if (! $same && $name) { 1599 my $contents = extract_pod($filename); 1600 1601 # If return is <undef>, it means that $filename no longer 1602 # exists. This means it was a transitory file, and should not 1603 # be tested. 1604 next FILE unless defined $contents; 1605 1606 my $prior_contents = extract_pod($prior_filename); 1607 1608 # If return is <undef>, it means that $prior_filename no 1609 # longer exists. This means it was a transitory file, and 1610 # should not have been tested, but we already did process it. 1611 # What we should do now is to back-out its records, and 1612 # process $filename in its stead. But backing out is not so 1613 # simple, and so I'm (khw) skipping that unless and until 1614 # experience shows that it is needed. We do go process 1615 # $filename, and there are potential false positive conflicts 1616 # with the transitory $prior_contents, and rerunning the test 1617 # should cause it to succeed. 1618 goto process_this_pod unless defined $prior_contents; 1619 1620 $same = $prior_contents eq $contents; 1621 } 1622 1623 if ($same) { 1624 $checker->set_skip("The pod of $filename is a duplicate of " 1625 . "the pod for $prior_filename"); 1626 } elsif ($prior_filename =~ /\breadme\b/i) { 1627 $checker->set_skip("$prior_filename is a README apparently for $filename"); 1628 } elsif ($filename =~ /\breadme\b/i) { 1629 $checker->set_skip("$filename is a README apparently for $prior_filename"); 1630 } elsif (! $do_upstream_cpan 1631 && $filename =~ /^cpan/ 1632 && $prior_filename =~ /^cpan/) 1633 { 1634 $checker->set_skip("CPAN is upstream for $filename"); 1635 } elsif ( $filename =~ /^utils/ or $prior_filename =~ /^utils/ ) { 1636 $checker->set_skip("$filename copy is in utils/"); 1637 } else { # Here have two pods with identical names that differ 1638 $prior_checker->poderror( 1639 { -msg => $duplicate_name, 1640 -line => "???", 1641 parameter => "'$filename' also has NAME '$name'" 1642 }); 1643 $checker->poderror( 1644 { -msg => $duplicate_name, 1645 -line => "???", 1646 parameter => "'$prior_filename' also has NAME '$name'" 1647 }); 1648 1649 # Changing the names helps later. 1650 $prior_checker->name("$name version arbitrarily numbered 1"); 1651 $checker->name("$name version arbitrarily numbered 2"); 1652 } 1653 1654 # In any event, don't process this pod that has the same name as 1655 # another. 1656 next FILE; 1657 } 1658 1659 process_this_pod: 1660 1661 # A unique pod. 1662 $id_to_checker{$id} = $checker; 1663 1664 my $parsed_for_links = ", but parsed for its interior links"; 1665 if ((! $do_upstream_cpan && $filename =~ /^cpan/) 1666 || $filename =~ $only_for_interior_links_re) 1667 { 1668 if ($filename =~ /^cpan/) { 1669 $checker->set_skip("CPAN is upstream for $filename"); 1670 } 1671 elsif ($filename =~ /perl\d+delta/) { 1672 if (! $do_deltas) { 1673 $checker->set_skip("$filename is a stable perldelta"); 1674 } 1675 } 1676 elsif ($filename =~ /perltoc/) { 1677 $checker->set_skip("$filename dependent on component pods"); 1678 } 1679 else { 1680 croak("Unexpected file '$filename' encountered that has parsing for interior-linking only"); 1681 } 1682 1683 if ($name && $has_referred_to_node{$name}) { 1684 $checker->set_skip($checker->get_skip() . $parsed_for_links); 1685 } 1686 } 1687 1688 # Need a name in order to process it, because not meaningful 1689 # otherwise, and also can't test links to this without a name. 1690 if (!defined $name) { 1691 $checker->poderror( { -msg => $no_name, 1692 -line => '???' 1693 }); 1694 next FILE; 1695 } 1696 1697 # For skipped files, just get its NAME 1698 my $skip; 1699 if (($skip = $checker->get_skip()) && $skip !~ /$parsed_for_links/) 1700 { 1701 $checker->node($name) if $name; 1702 } 1703 elsif (! $parsed) { 1704 if (! $checker->parse_from_file($filename, undef)) { 1705 $checker->set_skip("$filename is transitory"); 1706 next FILE; 1707 } 1708 } 1709 1710 # Go through everything in the file that could be an anchor that 1711 # could be a link target. Count how many there are of the same name. 1712 foreach my $node ($checker->linkable_nodes) { 1713 next FILE if ! $node; # Can be empty is like '=item *' 1714 if (exists $nodes{$name}{$node}) { 1715 $nodes{$name}{$node}++; 1716 } 1717 else { 1718 $nodes{$name}{$node} = 1; 1719 } 1720 1721 # Experiments have shown that cpan search can figure out the 1722 # target of a link even if the exact wording is incorrect, as long 1723 # as the first word is. This happens frequently in perlfunc.pod, 1724 # where the link will be just to the function, but the target 1725 # entry also includes parameters to the function. 1726 my $first_word = $node; 1727 if ($first_word =~ s/^(\S+)\s+\S.*/$1/) { 1728 $nodes_first_word{$name}{$first_word} = $node; 1729 } 1730 } 1731 $filename_to_pod{$filename} = $name; 1732 } 1733} 1734 1735# Here, all files have been parsed, and all links and link targets are stored. 1736# Now go through the files again and see which don't have matches. 1737if (! $has_input_files) { 1738 foreach my $filename (@files) { 1739 next if $filename_to_checker{$filename}->get_skip; 1740 my $checker = $filename_to_checker{$filename}; 1741 foreach my $link ($checker->hyperlink) { 1742 my $linked_to_page = $link->[1]->page; 1743 next unless $linked_to_page; # intra-file checks are handled by std 1744 # Pod::Checker 1745 1746 # Initialize the potential message. 1747 my %problem = ( -msg => $broken_link, 1748 -line => $link->[0], 1749 parameter => "to \"$linked_to_page\"", 1750 ); 1751 1752 # See if we have found the linked-to_file in our parse 1753 if (exists $nodes{$linked_to_page}) { 1754 my $node = $link->[1]->node; 1755 1756 # If link is only to the page-level, already have it 1757 next if ! $node; 1758 1759 # Transform pod language to what we are expecting 1760 $node =~ s,E<sol>,/,g; 1761 $node =~ s/E<verbar>/|/g; 1762 1763 # If link is to a node that exists in the file, is ok 1764 if ($nodes{$linked_to_page}{$node}) { 1765 1766 # But if the page has multiple targets with the same name, 1767 # it's ambiguous which one this should be to. 1768 if ($nodes{$linked_to_page}{$node} > 1) { 1769 $problem{-msg} = $multiple_targets; 1770 $problem{parameter} = "in $linked_to_page that $node could be pointing to"; 1771 $checker->poderror(\%problem); 1772 } 1773 } elsif (! $nodes_first_word{$linked_to_page}{$node}) { 1774 1775 # Here the link target was not found, either exactly or to 1776 # the first word. Is an error. 1777 $problem{parameter} =~ s,"$,/$node",; 1778 $checker->poderror(\%problem); 1779 } 1780 1781 } # Linked-to-file not in parse; maybe is in exception list 1782 elsif (! exists $valid_modules{$link->[1]->page}) { 1783 1784 # Here, is a link to a target that we can't find. Check if 1785 # there is an internal link on the page with the target name. 1786 # If so, it could be that they just forgot the initial '/' 1787 # But perldelta is handled specially: only do this if the 1788 # broken link isn't one of the known bad ones (that are 1789 # placemarkers and should be removed for the final) 1790 my $NAME = $filename_to_pod{$filename}; 1791 if (! defined $NAME) { 1792 $checker->poderror(\%problem); 1793 } 1794 else { 1795 if ($nodes{$NAME}{$linked_to_page}) { 1796 $problem{-msg} = $broken_internal_link; 1797 } 1798 $checker->poderror(\%problem); 1799 } 1800 } 1801 } 1802 } 1803} 1804 1805# If regenerating the data file, start with the modules for which we don't 1806# check targets. If you change the sort order, you need to run --regen before 1807# committing so that future commits that do run regen don't show irrelevant 1808# changes. 1809if ($regen) { 1810 foreach (sort { lc $a cmp lc $b } keys %valid_modules) { 1811 my_safer_print($copy_fh, $_, "\n"); 1812 } 1813} 1814 1815# Now ready to output the messages. 1816foreach my $filename (@files) { 1817 my $canonical = canonicalize($filename); 1818 SKIP: { 1819 my $skip = $filename_to_checker{$filename}->get_skip // ""; 1820 1821 if ($regen) { 1822 foreach my $message ( sort keys %{$problems{$filename}}) { 1823 my $count; 1824 1825 # Preserve a negative setting. 1826 if ($known_problems{$canonical}{$message} 1827 && $known_problems{$canonical}{$message} < 0) 1828 { 1829 $count = $known_problems{$canonical}{$message}; 1830 } 1831 else { 1832 $count = @{$problems{$filename}{$message}}; 1833 } 1834 my_safer_print($copy_fh, $canonical . "\t$message\t$count\n"); 1835 } 1836 next; 1837 } 1838 1839 skip($skip, 1) if $skip; 1840 my @diagnostics; 1841 my $thankful_diagnostics = 0; 1842 my $indent = ' '; 1843 1844 my $total_known = 0; 1845 foreach my $message ( sort keys %{$problems{$filename}}) { 1846 $known_problems{$canonical}{$message} = 0 1847 if ! $known_problems{$canonical}{$message}; 1848 my $diagnostic = ""; 1849 my $problem_count = scalar @{$problems{$filename}{$message}}; 1850 $total_known += $problem_count; 1851 next if $known_problems{$canonical}{$message} < 0; 1852 if ($problem_count > $known_problems{$canonical}{$message}) { 1853 1854 # Here we are about to output all the messages for this type, 1855 # subtract back this number we previously added in. 1856 $total_known -= $problem_count; 1857 1858 $diagnostic .= $indent . qq{"$message"}; 1859 if ($problem_count > 2) { 1860 $diagnostic .= " ($problem_count occurrences," 1861 . " expected $known_problems{$canonical}{$message})"; 1862 } 1863 foreach my $problem (@{$problems{$filename}{$message}}) { 1864 $diagnostic .= " " if $problem_count == 1; 1865 $diagnostic .= "\n$indent$indent"; 1866 $diagnostic .= "$problem->{parameter}" if $problem->{parameter}; 1867 $diagnostic .= " near line $problem->{-line}"; 1868 $diagnostic .= " $problem->{comment}" if $problem->{comment}; 1869 } 1870 $diagnostic .= "\n"; 1871 $files_with_unknown_issues{$filename} = 1; 1872 } elsif ($problem_count < $known_problems{$canonical}{$message}) { 1873 $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, $problem_count, $message); 1874 $thankful_diagnostics++; 1875 } 1876 push @diagnostics, $diagnostic if $diagnostic; 1877 } 1878 1879 # The above loop has output messages where there are current potential 1880 # issues. But it misses where there were some that have been entirely 1881 # fixed. For those, we need to look through the old issues 1882 foreach my $message ( sort keys %{$known_problems{$canonical}}) { 1883 next if $problems{$filename}{$message}; 1884 next if ! $known_problems{$canonical}{$message}; 1885 next if $known_problems{$canonical}{$message} < 0; # Preserve negs 1886 1887 next if !$pedantic and $message =~ 1888 /^(?:\Q$line_length\E|\Q$C_not_linked\E|\Q$C_with_slash\E)/; 1889 1890 my $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, 0, $message); 1891 push @diagnostics, $diagnostic if $diagnostic; 1892 $thankful_diagnostics++ if $diagnostic; 1893 } 1894 1895 my $output = "POD of $filename"; 1896 $output .= ", excluding $total_known not shown known potential problems" 1897 if $total_known; 1898 if (@diagnostics && @diagnostics == $thankful_diagnostics) { 1899 # Output fixed issues as passing to-do tests, so they do not 1900 # cause failures, but t/harness still flags them. 1901 $output .= " # TODO" 1902 } 1903 ok(@diagnostics == $thankful_diagnostics, $output); 1904 if (@diagnostics) { 1905 note(join "", @diagnostics, 1906 "See end of this test output for your options on silencing this"); 1907 } 1908 1909 delete $known_problems{$canonical}; 1910 } 1911} 1912 1913if (! $regen 1914 && ! ok (keys %known_problems == 0, "The known problems database includes no references to non-existent files")) 1915{ 1916 note("The following files were not found: " 1917 . join ", ", keys %known_problems); 1918 note("They will automatically be removed from the db the next time"); 1919 note(" cd t; ./perl -I../lib porting/podcheck.t --regen"); 1920 note("is run"); 1921} 1922 1923my $how_to = <<EOF; 1924 run this test script by hand, using the following formula (on 1925 Un*x-like machines): 1926 cd t 1927 ./perl -I../lib porting/podcheck.t --regen 1928EOF 1929 1930if (%files_with_unknown_issues) { 1931 my $were_count_files = scalar keys %files_with_unknown_issues; 1932 $were_count_files = ($were_count_files == 1) 1933 ? "was $were_count_files file" 1934 : "were $were_count_files files"; 1935 my $message = <<EOF; 1936 1937HOW TO GET THIS .t TO PASS 1938 1939There $were_count_files that had new potential problems identified. 1940Some of them may be real, and some of them may be false positives because 1941this program isn't as smart as it likes to think it is. You can teach this 1942program to ignore the issues it has identified, and hence pass, by doing the 1943following: 1944 19451) If a problem is about a link to an unknown module or man page that 1946 you know exists, re-run the command something like: 1947 ./perl -I../lib porting/podcheck.t --add_link MODULE man_page ... 1948 (MODULEs should look like Foo::Bar, and man_pages should look like 1949 bar(3c); don't do this for a module or man page that you aren't sure 1950 about; instead treat as another type of issue and follow the 1951 instructions below.) 1952 19532) For other issues, decide if each should be fixed now or not. Fix the 1954 ones you decided to, and rerun this test to verify that the fixes 1955 worked. 1956 19573) If there remain false positive or problems that you don't plan to fix right 1958 now, 1959$how_to 1960 That should cause all current potential problems to be accepted by 1961 the program, so that the next time it runs, they won't be flagged. 1962EOF 1963 if (%files_with_fixes) { 1964 $message .= " This step will also take care of the files that have fixes in them\n"; 1965 } 1966 1967 $message .= <<EOF; 1968 For a few files, such as perltoc, certain issues will always be 1969 expected, and more of the same will be added over time. For those, 1970 before you do the regen, you can edit 1971 $known_issues 1972 and find the entry for the module's file and specific error message, 1973 and change the count of known potential problems to -1. 1974EOF 1975 1976 note($message); 1977} elsif (%files_with_fixes) { 1978 note(<<EOF 1979To teach this test script that the potential problems have been fixed, 1980$how_to 1981EOF 1982 ); 1983} 1984 1985if ($regen) { 1986 chdir $original_dir || die "Can't change directories to $original_dir"; 1987 close_and_rename($copy_fh); 1988} 1989