1#!/usr/bin/perl -w 2 3package main; 4 5BEGIN { 6 chdir 't' if -d 't'; 7 @INC = "../lib"; 8 # Do not require test.pl, this file has its own framework. 9} 10 11use strict; 12use warnings; 13use feature 'unicode_strings'; 14no warnings 'experimental::builtin'; 15use builtin 'refaddr'; 16 17use Carp; 18use Config; 19use Digest; 20use File::Find; 21use File::Spec; 22use Text::Tabs; 23 24$| = 1; 25 26BEGIN { 27 if ( $Config{usecrosscompile} ) { 28 print "1..0 # Not all files are available during cross-compilation\n"; 29 exit 0; 30 } 31 if ($^O eq 'dec_osf') { 32 print "1..0 # $^O cannot handle this test\n"; 33 exit(0); 34 } 35 if ( $ENV{'PERL_BUILD_PACKAGING'} ) { 36 print "1..0 # This distro may have modified some files in cpan/. Skipping validation. \n"; 37 exit 0; 38 } 39 require '../regen/regen_lib.pl'; 40} 41 42sub DEBUG { 0 }; 43 44=pod 45 46=head1 NAME 47 48podcheck.t - Look for possible problems in the Perl pods 49 50=head1 SYNOPSIS 51 52 cd t 53 ./perl -I../lib porting/podcheck.t [--show-all] [--cpan] [--deltas] 54 [--counts] [--pedantic] [FILE ...] 55 56 ./perl -I../lib porting/podcheck.t --add-link MODULE ... 57 58 ./perl -I../lib porting/podcheck.t --regen 59 60=head1 DESCRIPTION 61 62podcheck.t is an extension of Pod::Checker. It looks for pod errors and 63potential errors in the files given as arguments, or if none specified, in all 64pods in the distribution workspace, except certain known special ones 65(specified below). It does additional checking beyond that done by 66Pod::Checker, and keeps a database of known potential problems, and will 67fail a pod only if the number of such problems differs from that given in the 68database. 69 70The additional checks it always makes are: 71 72=over 73 74=item Cross-pod link checking 75 76Pod::Checker verifies that links to an internal target in a pod are not 77broken. podcheck.t extends that (when called without FILE arguments) to 78external links. It does this by gathering up all the possible targets in the 79workspace, and cross-checking them. It also checks that a non-broken link 80points to just one target. (The destination pod could have two targets with 81the same name.) 82 83The way that the C<LE<lt>E<gt>> pod command works (for links outside the pod) 84is to actually create a link to C<metacpan.org> with an embedded query for 85the desired pod or man page. That means that links outside the distribution 86are valid. podcheck.t doesn't verify the validity of such links, but instead 87keeps a database of those known to be valid. This means that if a link to a 88target not on the list is created, the target needs to be added to the data 89base. This is accomplished via the L<--add-link|/--add-link MODULE ...> 90option to podcheck.t, described below. 91 92=item An internal link that isn't so specified 93 94If a link is broken, but there is an existing internal target of the same 95name, it is likely that the internal target was meant, and the C<"/"> is 96missing from the C<LE<lt>E<gt>> pod command. 97 98=item Missing or duplicate NAME or missing NAME short description 99 100A pod can't be linked to unless it has a unique name. 101And a NAME should have a dash and short description after it. 102 103=item Occurrences of the Unicode replacement character 104 105L<Pod::Simple> replaces bytes that aren't valid according to the document's 106encoding (declared or auto-detected) with C<\N{REPLACEMENT CHARACTER}>. 107 108=back 109 110If the C<PERL_POD_PEDANTIC> environment variable is set or the C<--pedantic> 111command line argument is provided, then a few more checks are made. 112The pedantic checks are: 113 114=over 115 116=item Verbatim paragraphs that wrap in an 80 (including 2 spare) column window 117 118Pod that inappropriately wraps is less legible. Pod formatters generally wrap 119correctly, except for too long verbatim lines. We assume that any display 120window has at least the traditional 80 columns, and check for verbatim lines 121that won't fit in that space, including when using a pager that reserves 2 122columns for its own use. (Thus the check is for a net of 78 columns.) 123For those lines that don't fit, it tells you how much needs to be cut in 124order to fit. 125 126Often, the easiest thing to do to gain space for these is to lower the indent 127to just one space. 128 129=item Items that perhaps should be links 130 131There are mentions of apparent files in the pods that perhaps should be links 132instead, using C<LE<lt>...E<gt>> 133 134=item Items that perhaps should be C<FE<lt>...E<gt>> 135 136What look like path names enclosed in C<CE<lt>...E<gt>> should perhaps have 137C<FE<lt>...E<gt>> mark-up instead. 138 139=back 140 141A number of issues raised by podcheck.t and by the base Pod::Checker are not 142really problems, but merely potential problems, that is, false positives. 143After inspecting them and 144deciding that they aren't real problems, it is possible to shut up this program 145about them, unlike base Pod::Checker. For a valid link to an outside module 146or man page, call podcheck.t with the C<--add-link> option to add it to the 147database of known links; for other causes, call podcheck.t with the C<--regen> 148option to regenerate the entire database. This tells it that all existing 149issues are to not be mentioned again. 150 151C<--regen> isn't fool-proof. The database merely keeps track of the number of these 152potential problems of each type for each pod. If a new problem of a given 153type is introduced into the pod, podcheck.t will spit out all of them. You 154then have to figure out which is the new one, and should it be changed or not. 155But doing it this way insulates the database from having to keep track of line 156numbers of problems, which may change, or the exact wording of each problem 157which might also change without affecting whether it is a problem or not. 158 159Also, if the count of potential problems of a given type for a pod decreases, 160the database must be regenerated so that it knows the new number. The program 161gives instructions when this happens. 162 163Some pods will have varying numbers of problems of a given type. This can 164be handled by manually editing the database file (see L</FILES>), and setting 165the number of those problems for that pod to a negative number. This will 166cause the corresponding error to always be suppressed no matter how many there 167actually are. 168 169Another problem is that there is currently no check that modules listed as 170valid in the database 171actually are. Thus any errors introduced there will remain there. 172 173=head2 Specially handled pods 174 175=over 176 177=item perltoc 178 179This pod is generated by pasting bits from other pods. Errors in those bits 180will show up as errors here, as well as for those other pods. Therefore 181errors here are suppressed, and the pod is checked only to verify that nodes 182within it actually exist that are externally linked to. 183 184=item perldelta 185 186The current perldelta pod is initialized from a template that contains 187placeholder text. Some of this text is in the form of links that don't really 188exist. Any such links that are listed in C<@perldelta_ignore_links> will not 189generate messages. It is presumed that these links will be cleaned up when 190the perldelta is cleaned up for release since they should be marked with 191C<XXX>. 192 193=item Porting/perldelta_template.pod 194 195This is not a pod, but a template for C<perldelta>. Any errors introduced 196by it will show up when C<perldelta> is created from it. 197 198=item cpan-upstream pods 199 200See the L</--cpan> option documentation 201 202=item old perldeltas 203 204See the L</--deltas> option documentation 205 206=back 207 208=head1 OPTIONS 209 210=over 211 212=item --add-link MODULE ... 213 214Use this option to teach podcheck.t that the C<MODULE>s or man pages actually 215exist, and to silence any messages that links to them are broken. 216 217podcheck.t checks that links within the Perl core distribution are valid, but 218it doesn't check links to man pages or external modules. When it finds 219a broken link, it checks its database of external modules and man pages, 220and only if not found there does it raise a message. This option just adds 221the list of modules and man page references that follow it on the command line 222to that database. 223 224For example, 225 226 cd t 227 ./perl -I../lib porting/podcheck.t --add-link Unicode::Casing 228 229causes the external module "Unicode::Casing" to be added to the database, so 230C<LE<lt>Unicode::CasingE<gt>> will be considered valid. 231 232=item --regen 233 234Regenerate the database used by podcheck.t to include all the existing 235potential problems. Future runs of the program will not then flag any of 236these. Setting this option also sets C<--pedantic>. 237 238=item --cpan 239 240Normally, all pods in the cpan directory are skipped, except to make sure that 241any blead-upstream links to such pods are valid. 242This option will cause cpan upstream pods to be fully checked. 243 244=item --deltas 245 246Normally, all old perldelta pods are skipped, except to make sure that 247any links to such pods are valid. This is because they are considered 248stable, and perhaps trying to fix them will cause changes that will 249misrepresent Perl's history. But, this option will cause them to be fully 250checked. 251 252=item --show-all 253 254Normally, if the number of potential problems of a given type found for a 255pod matches the expected value in the database, they will not be displayed. 256This option forces the database to be generally ignored during the run, so all 257potential problems are displayed and will fail their respective pod test. 258If, however, the database indicates that a particular problem type for a 259particular file is to be skipped, this option doesn't override that unless 260that particular file is passed specifically as one of the FILE parameters on 261the command line. And, passing particular FILEs selects this option in 262general. 263 264=item --counts 265 266Instead of testing, this just dumps the counts of the occurrences of the 267various types of potential problems in the database. 268 269=item --pedantic 270 271There are three potential problems that are not checked for by default. 272This options enables them. The environment variable C<PERL_POD_PEDANTIC> 273can be set to 1 to enable this option also. 274This option is set when C<--regen> is used. 275 276=back 277 278=head1 FILES 279 280The database is stored in F<t/porting/known_pod_issues.dat> 281 282=head1 SEE ALSO 283 284L<Pod::Checker> 285 286=cut 287 288# VMS builds have a '.com' appended to utility and script names, and it adds a 289# trailing dot for any other file name that doesn't have a dot in it. The db 290# is stored without those things. This regex allows for these special file 291# names to be dealt with. It needs to be interpolated into a larger regex 292# that furnishes the closing boundary. 293my $vms_re = qr/ \. (?: com )? /x; 294 295# Some filenames in the MANIFEST match $vms_re, and so must not be handled the 296# same way that the special vms ones are. This hash lists those. 297my %special_vms_files; 298 299# This is to get this to work across multiple file systems, including those 300# that are not case sensitive. The db is stored in lower case, Un*x style, 301# and all file name comparisons are done that way. 302sub canonicalize($) { 303 my $input = shift; 304 my ($volume, $directories, $file) 305 = File::Spec->splitpath(File::Spec->canonpath($input)); 306 # Assumes $volume is constant for everything in this directory structure 307 $directories = "" if ! $directories; 308 $file = "" if ! $file; 309 $file = lc join '/', File::Spec->splitdir($directories), $file; 310 $file =~ s! / /+ !/!gx; # Multiple slashes => single slash 311 312 # The db is stored without the special suffixes that are there in VMS, so 313 # strip them off to get the comparable name. But some files on all 314 # platforms have these suffixes, so this shouldn't happen for them, as any 315 # of their db entries will have the suffixes in them. The hash has been 316 # populated with these files. 317 if ($^O eq 'VMS' 318 && $file =~ / ( $vms_re ) $ /x 319 && ! exists $special_vms_files{$file}) 320 { 321 $file =~ s/ $1 $ //x; 322 } 323 return $file; 324} 325 326##################################################### 327# HOW IT WORKS (in general) 328# 329# If not called with specific files to check, the directory structure is 330# examined for files that have pods in them. Files that might not have to be 331# fully parsed (e.g. in cpan) are parsed enough at this time to find their 332# pod's NAME, and to get a checksum. 333# 334# Those kinds of files are sorted last, but otherwise the pods are parsed with 335# the package coded here, My::Pod::Checker, which is an extension to 336# Pod::Checker that adds some tests and suppresses others that aren't 337# appropriate. The latter module has no provision for capturing diagnostics, 338# so a package, Tie_Array_to_FH, is used to force them to be placed into an 339# array instead of printed. 340# 341# Parsing the files builds up a list of links. The files are gone through 342# again, doing cross-link checking and outputting all saved-up problems with 343# each pod. 344# 345# Sorting the files last that potentially don't need to be fully parsed allows 346# us to not parse them unless there is a link to an internal anchor in them 347# from something that we have already parsed. Keeping checksums allows us to 348# not parse copies of other pods. 349# 350##################################################### 351 352# 1 => Exclude low priority messages that aren't likely to be problems, and 353# has many false positives; higher numbers give more messages. 354my $Warnings_Level = 200; 355 356# perldelta during construction may have place holder links. N.B. This 357# variable is referred to by name in release_managers_guide.pod 358our @perldelta_ignore_links = ( "XXX", "perl5YYYdelta", "perldiag/message" ); 359 360# To see if two pods with the same NAME are actually copies of the same pod, 361# which is not an error, it uses a checksum to save work. 362my $digest_type = "SHA-1"; 363 364my $original_t_dir = File::Spec->rel2abs(File::Spec->curdir); 365my $data_dir = File::Spec->catdir($original_t_dir, 'porting'); 366my $known_issues = File::Spec->catfile($data_dir, 'known_pod_issues.dat'); 367my $MANIFEST = File::Spec->catfile(File::Spec->updir($original_t_dir), 'MANIFEST'); 368my $copy_fh; 369 370my $MAX_LINE_LENGTH = 78; # 78 columns 371my $INDENT = 4; # Lines other than '=head' lines are indented at 372 # least this much 373 374# Our warning messages. Better not have [('"] in them, as those are used as 375# delimiters for variable parts of the messages by poderror. 376my $broken_link = "Apparent broken link"; 377my $broken_internal_link = "Apparent internal link is missing its forward slash"; 378my $multiple_targets = "There is more than one target"; 379my $duplicate_name = "Pod NAME already used"; 380my $no_name = "There is no NAME"; 381my $missing_name_description = "The NAME should have a dash and short description after it"; 382my $replacement_character = "Unicode replacement character found"; 383# the pedantic warnings messages 384my $line_length = "Verbatim line length including indents exceeds $MAX_LINE_LENGTH by"; 385my $C_not_linked = "? Should you be using L<...> instead of"; 386my $C_with_slash = "? Should you be using F<...> or maybe L<...> instead of"; 387 388# objects, tests, etc can't be pods, so don't look for them. Also skip 389# files output by the patch program. Could also ignore most of .gitignore 390# files, but not all, so don't. 391 392my $obj_ext = $Config{'obj_ext'}; $obj_ext =~ tr/.//d; # dot will be added back 393my $lib_ext = $Config{'lib_ext'}; $lib_ext =~ tr/.//d; 394my $lib_so = $Config{'so'}; $lib_so =~ tr/.//d; 395my $dl_ext = $Config{'dlext'}; $dl_ext =~ tr/.//d; 396 397# Not really pods, but can look like them. 398my %excluded_files = ( 399 canonicalize("lib/unicore/mktables") => 1, 400 canonicalize("Porting/make-rmg-checklist") => 1, 401 canonicalize("Porting/perldelta_template.pod") => 1, 402 canonicalize("regen/feature.pl") => 1, 403 canonicalize("regen/warnings.pl") => 1, 404 canonicalize("autodoc.pl") => 1, 405 canonicalize("configpm") => 1, 406 canonicalize("miniperl") => 1, 407 canonicalize("perl") => 1, 408 canonicalize("lib/unicore/mktables") => 1, 409 canonicalize("dist/devel-ppport/parts/inc/ppphdoc") => 1, 410 ); 411 412# This list should not include anything for which case sensitivity is 413# important, as it won't work on VMS, and won't show up until tested on VMS. 414# All or almost all such files should be listed in the MANIFEST, so that can 415# be examined for them, and each such file explicitly excluded, as is done for 416# .PL files in the loop just below this. For files not catchable this way, 417# is_pod_file() can be used to exclude these at a finer grained level. 418my $non_pods = qr/ 419 (?: \. (?: [achot] | zip | gz | bz2 | jar | tar | tgz 420 | orig | rej | patch # Patch program output 421 | sw[op] | \#.* # Editor droppings 422 | old # buildtoc output 423 | xs # pod should be in the .pm file 424 | al # autosplit files 425 | bs # bootstrap files 426 | (?i:sh) # shell scripts, hints, templates 427 | lst # assorted listing files 428 | bat # Windows,OS2 batch files 429 | cmd # Windows,OS2 command files 430 | lis # VMS compiler listings 431 | map # VMS linker maps 432 | opt # VMS linker options files 433 | mms # MM(K|S) description files 434 | ts # timestamp files generated during build 435 | txt # plain text 436 | $obj_ext # object files 437 | exe # $Config{'exe_ext'} might be empty string 438 | $lib_ext # object libraries 439 | $lib_so # shared libraries 440 | $dl_ext # dynamic libraries 441 | gif # GIF images (example files from CGI.pm) 442 | eg # examples from libnet 443 | U # metaconfig unit 444 | core .* 445 ) 446 $ 447 ) | ~$ # Vim droppings 448 | \.bak$ # Other editor droppings 449 | \ \(Autosaved\)\.txt$ # Other editor droppings 450 | ^cxx\$demangler_db\.$ # VMS name mangler database 451 | ^typemap\.?$ # typemap files 452 | ^(?i:Makefile\.PL)$ 453 | ^core (?: $ | \. .* ) 454 | ^vgcore\.[1-9][0-9]*$ 455 | \b Changes \b 456 457 # This is a pod, but is part of a corpus to test agains; we 458 # don't care about any issues in it. 459 | ext\/Pod-Html\/corpus\/perlvar-copy.pod 460 /x; 461 462# Matches something that looks like a file name, but is enclosed in C<...> 463my $C_path_re = qr{ ^ 464 # exclude various things that have slashes 465 # in them but aren't paths 466 (?! 467 (?: (?: s | qr | m | tr | y ) / ) # regexes 468 | \d+/\d+ \b # probable fractions 469 | (?: [LF] < )+ 470 | OS/2 \b 471 | Perl/perl.git \b 472 | Perl/perl5.git \b 473 | Perl/Tk \b 474 | origin/blead \b 475 | origin/maint \b 476 ) 477 /? # Optional initial slash 478 \w+ # First component of path, doesn't begin with 479 # a minus 480 (?: / [-\w]+ )+ # Subsequent path components 481 (?: \. \w+ )? # Optional trailing dot and suffix 482 >* # Any enclosed L< F< have matching closing > 483 $ 484 }x; 485 486# '.PL' files should be excluded, as they aren't final pods, but often contain 487# material used in generating pods, and so can look like a pod. We can't use 488# the regexp above because case sensitivity is important for these, as some 489# '.pl' files should be examined for pods. Instead look through the MANIFEST 490# for .PL files and get their full path names, so we can exclude each such 491# file explicitly. This works because other porting tests prohibit having two 492# files with the same names except for case. 493open my $manifest_fh, '<:bytes', $MANIFEST or die "Can't open $MANIFEST"; 494while (<$manifest_fh>) { 495 496 # While we have MANIFEST open, on VMS platforms, look for files that match 497 # the magic VMS file names that have to be handled specially. Add these 498 # to the list of them. 499 if ($^O eq 'VMS' && / ^ ( [^\t]* $vms_re ) \t /x) { 500 $special_vms_files{$1} = 1; 501 } 502 if (/ ^ ( [^\t]* \. PL ) \t /x) { 503 $excluded_files{canonicalize($1)} = 1; 504 } 505} 506close $manifest_fh, or die "Can't close $MANIFEST"; 507 508 509# Pod::Checker messages to suppress 510my @suppressed_messages = ( 511 # We catch independently the ones that are real problems. 512 qr/multiple occurrences \(\d+\) of link target/, 513 514 "unescaped <>", # Not every '<' or '>' need be escaped 515 qr/No items in =over/, # i.e., a blockquote, which we consider legal 516); 517 518sub suppressed { 519 # Returns bool as to if input message is one that is to be suppressed 520 521 my $message = shift; 522 523 return grep { $message =~ /^$_/i } @suppressed_messages; 524} 525 526{ # Closure to contain a simple subset of test.pl. This is to get rid of the 527 # unnecessary 'failed at' messages that would otherwise be output pointing 528 # to a particular line in this file. 529 530 my $current_test = 0; 531 my $planned; 532 533 sub plan { 534 my %plan = @_; 535 $planned = $plan{tests} + 1; # +1 for final test that files haven't 536 # been removed 537 print "1..$planned\n"; 538 return; 539 } 540 541 sub ok { 542 my $success = shift; 543 my $message = shift; 544 545 chomp $message; 546 547 $current_test++; 548 print "not " unless $success; 549 print "ok $current_test - $message\n"; 550 return $success; 551 } 552 553 sub skip { 554 my $why = shift; 555 my $n = @_ ? shift : 1; 556 for (1..$n) { 557 $current_test++; 558 print "ok $current_test # skip $why\n"; 559 } 560 no warnings 'exiting'; 561 last SKIP; 562 } 563 564 sub _note { 565 my ($andle, $message) = @_; 566 567 chomp $message; 568 569 print $andle $message =~ s/^/# /mgr; 570 print $andle "\n"; 571 return; 572 } 573 574 sub note { unshift @_, \*STDOUT; goto &_note } 575 576 sub diag { unshift @_, \*STDERR; goto &_note } 577 578 END { 579 if ($planned && $planned != $current_test) { 580 print STDERR 581 "# Looks like you planned $planned tests but ran $current_test.\n"; 582 } 583 } 584} 585 586# List of known potential problems by pod and type. 587my %known_problems; 588 589# Pods given by the keys contain an interior node that is referred to from 590# outside it. 591my %has_referred_to_node; 592 593my $show_counts = 0; 594my $regen = 0; 595my $add_link = 0; 596my $show_all = 0; 597my $pedantic = 0; 598 599my $do_upstream_cpan = 0; # Assume that are to skip anything in /cpan 600my $do_deltas = 0; # And stable perldeltas 601 602while (@ARGV && substr($ARGV[0], 0, 1) eq '-') { 603 my $arg = shift @ARGV; 604 605 $arg =~ s/^--/-/; # Treat '--' the same as a single '-' 606 if ($arg eq '-regen') { 607 $regen = 1; 608 $pedantic = 1; 609 } 610 elsif ($arg =~ /^-add[-_]link$/) { 611 $add_link = 1; 612 } 613 elsif ($arg eq '-cpan') { 614 $do_upstream_cpan = 1; 615 } 616 elsif ($arg eq '-deltas') { 617 $do_deltas = 1; 618 } 619 elsif ($arg =~ /^-show[-_]all$/) { 620 $show_all = 1; 621 } 622 elsif ($arg eq '-counts') { 623 $show_counts = 1; 624 } 625 elsif ($arg eq '-pedantic') { 626 $pedantic = 1; 627 } 628 else { 629 die <<EOF; 630Unknown option '$arg' 631 632Usage: $0 [ --regen | --cpan | --show-all | FILE ... | --add-link MODULE ... ]\n" 633 --add-link -> Add the MODULE and man page references to the database 634 --regen -> Regenerate the data file for $0 635 --cpan -> Include files in the cpan subdirectory. 636 --deltas -> Include stable perldeltas 637 --show-all -> Show all known potential problems 638 --counts -> Don't test, but give summary counts of the currently 639 existing database 640 --pedantic -> Check for overly long lines in verbatim blocks 641EOF 642 } 643} 644 645$pedantic = 1 if exists $ENV{PERL_POD_PEDANTIC} and $ENV{PERL_POD_PEDANTIC}; 646my @files = @ARGV; 647 648my $cpan_or_deltas = $do_upstream_cpan || $do_deltas; 649if (($regen + $show_all + $show_counts + $add_link + $cpan_or_deltas ) > 1) { 650 croak "--regen, --show-all, --counts, and --add-link are mutually" 651 . " exclusive\n and none can be run with --cpan nor --deltas"; 652} 653 654my $has_input_files = @files; 655 656 657if ($add_link) { 658 if (! $has_input_files) { 659 croak "--add-link requires at least one module or man page reference"; 660 } 661} 662elsif ($has_input_files) { 663 if ($regen || $show_counts || $do_upstream_cpan || $do_deltas) { 664 croak "--regen, --counts, --deltas, and --cpan can't be used since using specific files"; 665 } 666 foreach my $file (@files) { 667 croak "Can't read file '$file'" if ! -r $file; 668 } 669} 670 671our %problems; # potential problems found in this run 672 673package My::Pod::Checker { # Extend Pod::Checker 674 use parent 'Pod::Checker'; 675 676 # Uses inside out hash to protect from typos 677 # For new fields, remember to add to destructor DESTROY() 678 my %CFL_text; # The text comprising the current C<>, F<>, or L<> 679 my %C_text; # If defined, are in a C<> section, and includes 680 # the accumulated text from that 681 my %current_indent; # Current line's indent 682 my %filename; # The pod is stored in this file 683 my %in_CFL; # count of stacked C<>, F<>, L<> directives 684 my %indents; # Stack of indents from =over's in effect for 685 # current line 686 my %in_for; # true if in a =for or =begin 687 my %in_NAME; # true if within NAME section 688 my %in_begin; # true if within =begin section 689 my %in_X; # true if in a X<> 690 my %linkable_item; # Bool: if the latest =item is linkable. It isn't 691 # for bullet and number lists 692 my %linkable_nodes; # Pod::Checker adds all =items to its node list, 693 # but not all =items are linkable-to 694 my %running_CFL_text; # The current text that is being accumulated until 695 # an end_FOO is found, and this includes any C<>, 696 # F<>, or L<> directives. 697 my %running_simple_text; # The currentt text that is being accumulated 698 # until an end_FOO is found, and all directives 699 # have been expanded into plain text 700 my %command_count; # Number of commands seen 701 my %seen_pod_cmd; # true if have =pod earlier 702 my %skip; # is SKIP set for this pod 703 my %start_line; # the first input line number in the thing 704 # currently being worked on 705 706 sub DESTROY { 707 my $addr = refaddr $_[0]; 708 delete $CFL_text{$addr}; 709 delete $C_text{$addr}; 710 delete $command_count{$addr}; 711 delete $current_indent{$addr}; 712 delete $filename{$addr}; 713 delete $in_begin{$addr}; 714 delete $in_CFL{$addr}; 715 delete $indents{$addr}; 716 delete $in_for{$addr}; 717 delete $in_NAME{$addr}; 718 delete $in_X{$addr}; 719 delete $linkable_item{$addr}; 720 delete $linkable_nodes{$addr}; 721 delete $running_CFL_text{$addr}; 722 delete $running_simple_text{$addr}; 723 delete $seen_pod_cmd{$addr}; 724 delete $skip{$addr}; 725 delete $start_line{$addr}; 726 return; 727 } 728 729 sub new { 730 my $class = shift; 731 my $filename = shift; 732 733 my $self = $class->SUPER::new(-quiet => 1, 734 -warnings => $Warnings_Level); 735 my $addr = refaddr $self; 736 $command_count{$addr} = 0; 737 $current_indent{$addr} = 0; 738 $filename{$addr} = $filename; 739 $in_begin{$addr} = 0; 740 $in_X{$addr} = 0; 741 $in_CFL{$addr} = 0; 742 $in_NAME{$addr} = 0; 743 $linkable_item{$addr} = 0; 744 $seen_pod_cmd{$addr} = 0; 745 return $self; 746 } 747 748 # re's for messages that Pod::Checker outputs 749 my $location = qr/ \b (?:in|at|on|near) \s+ /xi; 750 my $optional_location = qr/ (?: $location )? /xi; 751 my $line_reference = qr/ [('"]? $optional_location \b line \s+ 752 (?: \d+ | EOF | \Q???\E | - ) 753 [)'"]? /xi; 754 755 sub poderror { # Called to register a potential problem 756 757 # This adds an extra field to the parent hash, 'parameter'. It is 758 # used to extract the variable parts of a message leaving just the 759 # constant skeleton. This in turn allows the message to be 760 # categorized better, so that it shows up as a single type in our 761 # database, with the specifics of each occurrence not being stored with 762 # it. 763 764 my $self = shift; 765 my $opts = shift; 766 767 my $addr = refaddr $self; 768 return if $skip{$addr}; 769 770 # Input can be a string or hash. If a string, parse it to separate 771 # out the line number and convert to a hash for easier further 772 # processing 773 my $message; 774 if (ref $opts ne 'HASH') { 775 $message = join "", $opts, @_; 776 my $line_number; 777 if ($message =~ s/\s*($line_reference)//) { 778 ($line_number = $1) =~ s/\s*$optional_location//; 779 } 780 else { 781 $line_number = '???'; 782 } 783 $opts = { -msg => $message, -line => $line_number }; 784 } else { 785 $message = $opts->{'-msg'}; 786 787 } 788 789 $message =~ s/^\d+\s+//; 790 return if main::suppressed($message); 791 792 $self->SUPER::poderror($opts, @_); 793 794 $opts->{parameter} = "" unless $opts->{parameter}; 795 796 # The variable parts of the message tend to be enclosed in '...', 797 # "....", or (...). Extract them and put them in an extra field, 798 # 'parameter'. This is trickier because the matching delimiter to a 799 # '(' is its mirror, and not itself. Text::Balanced could be used 800 # instead. 801 while ($message =~ m/ \s* $optional_location ( [('"] )/xg) { 802 my $delimiter = $1; 803 my $start = $-[0]; 804 $delimiter = ')' if $delimiter eq '('; 805 806 # If there is no ending delimiter, don't consider it to be a 807 # variable part. Most likely it is a contraction like "Don't" 808 last unless $message =~ m/\G .+? \Q$delimiter/xg; 809 810 my $length = $+[0] - $start; 811 812 # Get the part up through the closing delimiter 813 my $special = substr($message, $start, $length); 814 $special =~ s/^\s+//; # No leading whitespace 815 816 # And add that variable part to the parameter, while removing it 817 # from the message. This isn't a foolproof way of finding the 818 # variable part. For example '(s)' can occur in e.g., 819 # 'paragraph(s)' 820 if ($special ne '(s)') { 821 substr($message, $start, $length) = ""; 822 pos $message = $start; 823 $opts->{-msg} = $message; 824 $opts->{parameter} .= " " if $opts->{parameter}; 825 $opts->{parameter} .= $special; 826 } 827 } 828 829 # Extract any additional line number given. This is often the 830 # beginning location of something whereas the main line number gives 831 # the ending one. 832 if ($message =~ /( $line_reference )/xi) { 833 my $line_ref = $1; 834 while ($message =~ s/\s*\Q$line_ref//) { 835 $opts->{-msg} = $message; 836 $opts->{parameter} .= " " if $opts->{parameter}; 837 $opts->{parameter} .= $line_ref; 838 } 839 } 840 841 Carp::carp("Couldn't extract line number from '$message'") if $message =~ /line \d+/; 842 push @{$problems{$filename{$addr}}{$message}}, $opts; 843 #push @{$problems{$self->get_filename}{$message}}, $opts; 844 } 845 846 # In the next subroutines, we keep track of the text of the current 847 # innermost thing, like F<fooC<bar>baz>. The things we care about raising 848 # messages about in this program all come from a single sequence of 849 # characters uninterrupted by other pod commands. Therefore we don't have 850 # to worry about recursion, and we can just set the string we care about 851 # to empty on entrance to each command. 852 853 sub handle_text { 854 # This is called by the parent class to deal with any straight text. 855 # We mostly just append this to the running current value which will 856 # be dealt with upon the end of the current construct, like a 857 # paragraph. But certain things don't contribute to checking the pod 858 # and are ignored. We also have set flags to indicate this text is 859 # going towards constructing certain constructs, and handle those 860 # specially. 861 862 my $self = shift; 863 my $addr = refaddr $self; 864 865 my $return = $self->SUPER::handle_text(@_); 866 867 if ($in_X{$addr} || $in_for{$addr}) { # ignore 868 return $return; 869 } 870 871 my $text = join "\n", @_; 872 $running_simple_text{$addr} .= $text; 873 874 # Keep separate tabs on C<>, F<>, and L<> directives, and one 875 # especially for C<> ones. 876 if ($in_CFL{$addr}) { 877 $CFL_text{$addr} .= $text; 878 $C_text{$addr} .= $text if defined $C_text{$addr}; 879 } 880 else { 881 # This variable is updated instead in the corresponding C, F, or L 882 # handler. 883 $running_CFL_text{$addr} .= $text; 884 } 885 886 # do this line-by-line so we can get the right line number 887 my @lines = split /^/, $running_simple_text{$addr}; 888 for my $i (0..$#lines) { 889 if ($lines[$i] =~ m/\N{REPLACEMENT CHARACTER}/) { 890 $self->poderror({ -line => $start_line{$addr} + $i, 891 -msg => $replacement_character, 892 parameter => "possibly invalid ". $self->encoding . " input at character " . pos $lines[$i], 893 }); 894 } 895 } 896 return $return; 897 } 898 899 # The start_FOO routines check that somehow a C<> construct hasn't escaped 900 # without being checked, and initialize things, and call the parent 901 # class's equivalent routine. 902 903 # The end_FOO routines close things off, and check the text that has been 904 # accumulated for FOO, then call the parent's corresponding routine. 905 906 sub start_Para { 907 my $self = shift; 908 check_see_but_not_link($self); 909 910 my $addr = refaddr $self; 911 $start_line{$addr} = $_[0]->{start_line}; 912 $running_CFL_text{$addr} = ""; 913 $running_simple_text{$addr} = ""; 914 return $self->SUPER::start_Para(@_); 915 } 916 917 sub start_item { 918 my $self = shift; 919 check_see_but_not_link($self); 920 921 my $addr = refaddr $self; 922 $start_line{$addr} = $_[0]->{start_line}; 923 $running_CFL_text{$addr} = ""; 924 $running_simple_text{$addr} = ""; 925 926 } 927 928 sub start_item_text { 929 my $self = shift; 930 start_item($self); 931 my $addr = refaddr $self; 932 933 # This is the only =item that is linkable 934 $linkable_item{$addr} = 1; 935 936 return $self->SUPER::start_item_text(@_); 937 } 938 939 sub start_item_number { 940 my $self = shift; 941 start_item($self); 942 943 return $self->SUPER::start_item_number(@_); 944 } 945 946 sub start_item_bullet { 947 my $self = shift; 948 start_item($self); 949 950 return $self->SUPER::start_item_bullet(@_); 951 } 952 953 sub end_item { # No difference in =item types endings 954 my $self = shift; 955 check_see_but_not_link($self); 956 return $self->SUPER::end_item(@_); 957 } 958 959 sub start_over { 960 my $self = shift; 961 check_see_but_not_link($self); 962 963 my $addr = refaddr $self; 964 $start_line{$addr} = $_[0]->{start_line}; 965 $running_CFL_text{$addr} = ""; 966 $running_simple_text{$addr} = ""; 967 968 # Save this indent on a stack, and keep track of total indent 969 my $indent = $_[0]{'indent'}; 970 push @{$indents{$addr}}, $indent; 971 $current_indent{$addr} += $indent; 972 973 return $self->SUPER::start_over(@_); 974 } 975 976 sub end_over_bullet { shift->end_over(@_) } 977 sub end_over_number { shift->end_over(@_) } 978 sub end_over_text { shift->end_over(@_) } 979 sub end_over_block { shift->end_over(@_) } 980 sub end_over_empty { shift->end_over(@_) } 981 sub end_over { 982 my $self = shift; 983 check_see_but_not_link($self); 984 985 my $addr = refaddr $self; 986 987 # Pop current indent 988 if (@{$indents{$addr}}) { 989 $current_indent{$addr} -= pop @{$indents{$addr}}; 990 } 991 else { 992 # =back without corresponding =over, but should have 993 # warned already 994 $current_indent{$addr} = 0; 995 } 996 } 997 998 sub check_see_but_not_link { 999 1000 # Looks through accumulated text for current element to see if it 1001 # refers to something that should be linked to, but isn't. 1002 1003 my $self = shift; 1004 my $addr = refaddr $self; 1005 1006 return unless defined $running_CFL_text{$addr}; 1007 1008 while ($running_CFL_text{$addr} =~ m{ 1009 ( (?: \w+ \s+ )* ) # The phrase before, if any 1010 \b [Ss]ee \s+ 1011 ( ( [^L] ) 1012 < 1013 ( [^<]*? ) # The not < excludes nested C<L<... 1014 > 1015 ) 1016 ( \s+ (?: under | in ) \s+ L< )? 1017 }xg) 1018 { 1019 my $prefix = $1 // ""; 1020 my $construct = $2; # The whole thing, like C<...> 1021 my $type = $3; 1022 my $interior = $4; 1023 my $trailing = $5; # After the whole thing ending in "L<" 1024 1025 # If the full phrase is something like, "you might see C<", or 1026 # similar, it really isn't a reference to a link. The ones I saw 1027 # all had the word "you" in them; and the "you" wasn't the 1028 # beginning of a sentence. 1029 if ($prefix !~ / \b you \b /x) { 1030 1031 # Now, find what the module or man page name within the 1032 # construct would be if it actually has L<> syntax. If it 1033 # doesn't have that syntax, will set the module to the entire 1034 # interior. 1035 if (! defined $trailing # not referring to something in another 1036 # section 1037 && $interior !~ /$non_pods/ 1038 1039 # There can't be spaces (I think) in module names or man 1040 # pages 1041 && $interior !~ / \s /x 1042 1043 # F<> that end in eg \.pl are almost certainly ok, as are 1044 # those that look like a path with multiple "/" chars 1045 && ($type ne "F" 1046 || (! -e $interior 1047 && $interior !~ /\.\w+$/ 1048 && $interior !~ /\/.+\//) 1049 ) 1050 ) { 1051 # TODO: move the checking of $pedantic higher up 1052 $self->poderror({ -line => $start_line{$addr}, 1053 -msg => $C_not_linked, 1054 parameter => $construct 1055 }); 1056 } 1057 } 1058 } 1059 1060 undef $running_CFL_text{$addr}; 1061 } 1062 1063 sub end_Para { 1064 my $self = shift; 1065 check_see_but_not_link($self); 1066 1067 my $addr = refaddr $self; 1068 if ($in_NAME{$addr}) { 1069 if ($running_simple_text{$addr} =~ /^\s*(\S+?)\s*$/) { 1070 $self->poderror({ -line => $start_line{$addr}, 1071 -msg => $missing_name_description, 1072 parameter => $1}); 1073 } 1074 $in_NAME{$addr} = 0; 1075 } 1076 $self->SUPER::end_Para(@_); 1077 } 1078 1079 sub start_head1 { 1080 my $self = shift; 1081 check_see_but_not_link($self); 1082 1083 my $addr = refaddr $self; 1084 $start_line{$addr} = $_[0]->{start_line}; 1085 $running_CFL_text{$addr} = ""; 1086 $running_simple_text{$addr} = ""; 1087 1088 return $self->SUPER::start_head1(@_); 1089 } 1090 1091 sub end_head1 { # This is called at the end of the =head line. 1092 my $self = shift; 1093 check_see_but_not_link($self); 1094 1095 my $addr = refaddr $self; 1096 1097 $in_NAME{$addr} = 1 if $running_simple_text{$addr} eq 'NAME'; 1098 return $self->SUPER::end_head(@_); 1099 } 1100 1101 sub start_Verbatim { 1102 my $self = shift; 1103 check_see_but_not_link($self); 1104 1105 my $addr = refaddr $self; 1106 $running_simple_text{$addr} = ""; 1107 $start_line{$addr} = $_[0]->{start_line}; 1108 return $self->SUPER::start_Verbatim(@_); 1109 } 1110 1111 sub end_Verbatim { 1112 my $self = shift; 1113 my $addr = refaddr $self; 1114 1115 # Pick up the name if it looks like one, since the parent class 1116 # doesn't handle verbatim NAMEs 1117 if ($in_NAME{$addr} 1118 && $running_simple_text{$addr} =~ /^\s*(\S+?)\s*[,-]/) 1119 { 1120 $self->name($1); 1121 } 1122 1123 my $indent = $self->get_current_indent; 1124 1125 # split the code by line. 1126 my @lines = split /^/, $running_simple_text{$addr}; 1127 1128 # We have two cases here. The verbatim text may be copied from 1129 # one of our files, in which case we check to make sure that the 1130 # code in the documentation matches that of the code in the 1131 # file, OR, we check the line lengths are appropriate. Copied text 1132 # is identified by the first line containing one of the following 1133 # strings: "file source: FILENAME" or "copied from: FILENAME" where 1134 # the FILENAME actually exists. 1135 # 1136 # Yes, this implies that where we are copying code verbatim from 1137 # one of our source files we do not check its length. This is 1138 # because it is a copy, and we shouldn't require the code to be 1139 # munged to be placed in the docs. This should only be used in 1140 # pod files which are used to document the internals for other 1141 # developers, as it may result in unpleasant to view HTML docs. 1142 1143 my $copied = 0; 1144 if ($lines[0] =~ /(?:file source|copied from):\s*([\/\w.]+)/i and -e $1) { 1145 # this text was copied from a source file. Make sure that it still 1146 # matches the source. This is a whitespace insensitive match. 1147 1148 my $file = $1; 1149 $copied = 1; 1150 my $pat = ""; 1151 foreach my $line (@lines[1..$#lines]) { 1152 # convert each line into a pattern fragment 1153 $line =~ s/(?:(\s+)|(\S+))/$1 ? "\\s++" : quotemeta($2)/ge; 1154 $pat .= $line; 1155 } 1156 # merge adjacent \s+ sequences 1157 $pat =~ s/(?:\\s\+\+){2,}/\\s++/g; 1158 # slurp the file 1159 my $slurped = do { 1160 open my $ifh, "<", $file 1161 or die "Failed to open '$file' for read: $!"; 1162 local $/; 1163 <$ifh>; 1164 }; 1165 if ($slurped !~ /$pat/) { 1166 $self->poderror({ -line => $start_line{$addr}, 1167 -msg => "Copied verbatim text is out of sync with source file", 1168 parameter => $file, 1169 }); 1170 } 1171 } 1172 if (!$copied) { 1173 # if the verbatim text has not been copied from one of our 1174 # source files we look at each line to verify it is short enough 1175 for my $i (0 .. @lines - 1) { 1176 $lines[$i] =~ s/\s+$//; 1177 my $exceeds = length(Text::Tabs::expand($lines[$i])) 1178 + $indent - $MAX_LINE_LENGTH; 1179 next unless $exceeds > 0; 1180 1181 $self->poderror({ -line => $start_line{$addr} + $i, 1182 -msg => $line_length, 1183 parameter => "+$exceeds (including " . ($indent - $INDENT) . 1184 " from =over's and $INDENT as base indent)", 1185 }); 1186 } 1187 } 1188 1189 undef $running_simple_text{$addr}; 1190 1191 # Parent class didn't bother to define this 1192 #return $self->SUPER::SUPER::end_Verbatim(@_); 1193 } 1194 1195 sub start_C { 1196 my $self = shift; 1197 my $addr = refaddr $self; 1198 1199 $C_text{$addr} = ""; 1200 1201 # If not in a stacked set of C<>, F<> and L<>, initialize the text for 1202 # them. 1203 $CFL_text{$addr} = "" if ! $in_CFL{$addr}; 1204 $in_CFL{$addr}++; 1205 1206 return $self->SUPER::start_C(@_); 1207 } 1208 1209 sub start_F { 1210 my $self = shift; 1211 my $addr = refaddr $self; 1212 1213 $CFL_text{$addr} = "" if ! $in_CFL{$addr}; 1214 $in_CFL{$addr}++; 1215 return $self->SUPER::start_F(@_); 1216 } 1217 1218 sub start_L { 1219 my $self = shift; 1220 my $addr = refaddr $self; 1221 1222 $CFL_text{$addr} = "" if ! $in_CFL{$addr}; 1223 $in_CFL{$addr}++; 1224 return $self->SUPER::start_L(@_); 1225 } 1226 1227 sub end_C { 1228 my $self = shift; 1229 my $addr = refaddr $self; 1230 1231 # Warn if looks like a file or link enclosed instead by this C<> 1232 if ($C_text{$addr} =~ qr/^ $C_path_re $/x) { 1233 # Here it does look like it could be a file path or a link. 1234 # But some varieties of regex patterns could also fit with what we 1235 # have so far. Weed those out as best we can. '/foo/' is almost 1236 # certainly meant to be a pattern, as is '/foo/g'. 1237 my $is_pattern; 1238 if ($C_text{$addr} !~ qr| ^ / [^/]* / ( [msixpodualngcr]* ) $ |x) { 1239 $is_pattern = 0; 1240 } 1241 else { 1242 1243 # Here, it looks like a pattern potentially followed by some 1244 # modifiers. To make doubly sure, don't count as patterns 1245 # those constructs which have more occurrences (generally 1) 1246 # of a modifier than is legal. 1247 my %counts; 1248 map { $counts{$_}++ } split "", $1; 1249 foreach my $modifier (keys %counts) { 1250 if ($counts{$modifier} > (($modifier eq 'a') 1251 ? 2 1252 : 1)) 1253 { 1254 $is_pattern = 0; 1255 last; 1256 } 1257 } 1258 $is_pattern = 1 unless defined $is_pattern; 1259 } 1260 1261 unless ($is_pattern) { 1262 $self->poderror({ -line => $start_line{$addr}, 1263 -msg => $C_with_slash, 1264 parameter => "C<$C_text{$addr}>" 1265 }); 1266 } 1267 } 1268 undef $C_text{$addr}; 1269 1270 # Add the current text to the running total. This was not done in 1271 # handle_text(), because it just sees the plain text of the innermost 1272 # stacked directive. We want to keep all the directive names 1273 # enclosing the text. Otherwise the fact that C<L<foobar>> is to a 1274 # link would be lost, as the L<> would be gone. 1275 $CFL_text{$addr} = "C<$CFL_text{$addr}>"; 1276 1277 # Add this text to the whole running total only if popping this 1278 # directive off the stack leaves it empty. As long as something is on 1279 # the stack, it gets added to $CFL_text (just above). It is only 1280 # entirely constructed when the stack is empty. 1281 $in_CFL{$addr}--; 1282 $running_CFL_text{$addr} .= $CFL_text{$addr} if ! $in_CFL{$addr}; 1283 1284 return $self->SUPER::end_C(@_); 1285 } 1286 1287 sub end_F { 1288 my $self = shift; 1289 my $addr = refaddr $self; 1290 1291 $CFL_text{$addr} = "F<$CFL_text{$addr}>"; 1292 $in_CFL{$addr}--; 1293 $running_CFL_text{$addr} .= $CFL_text{$addr} if ! $in_CFL{$addr}; 1294 return $self->SUPER::end_F(@_); 1295 } 1296 1297 sub end_L { 1298 my $self = shift; 1299 my $addr = refaddr $self; 1300 1301 $CFL_text{$addr} = "L<$CFL_text{$addr}>"; 1302 $in_CFL{$addr}--; 1303 $running_CFL_text{$addr} .= $CFL_text{$addr} if ! $in_CFL{$addr}; 1304 return $self->SUPER::end_L(@_); 1305 } 1306 1307 sub start_X { 1308 my $self = shift; 1309 my $addr = refaddr $self; 1310 1311 $in_X{$addr} = 1; 1312 return $self->SUPER::start_X(@_); 1313 } 1314 1315 sub end_X { 1316 my $self = shift; 1317 my $addr = refaddr $self; 1318 1319 $in_X{$addr} = 0; 1320 return $self->SUPER::end_X(@_); 1321 } 1322 1323 sub start_for { 1324 my $self = shift; 1325 my $addr = refaddr $self; 1326 1327 $in_for{$addr} = 1; 1328 return $self->SUPER::start_for(@_); 1329 } 1330 1331 sub end_for { 1332 my $self = shift; 1333 my $addr = refaddr $self; 1334 1335 $in_for{$addr} = 0; 1336 return $self->SUPER::end_for(@_); 1337 } 1338 1339 sub hyperlink { 1340 my ($self, $link) = @_; 1341 1342 if ($link && $link->type eq 'pod') { 1343 my $page = $link->page; 1344 my $node = $link->node; 1345 1346 # If the hyperlink is to an interior node of another page, save it 1347 # so that we can see if we need to parse normally skipped files. 1348 $has_referred_to_node{$page} = 1 if $node; 1349 1350 # Ignore certain placeholder links in perldelta. Check if the 1351 # link is page-level, and also check if to a node within the page 1352 if ( $self->name && $self->name eq "perldelta" 1353 && (( grep { $page eq $_ } @perldelta_ignore_links) 1354 || ( $node 1355 && (grep { "$page/$node" eq $_ } @perldelta_ignore_links) 1356 ))) { 1357 return; 1358 } 1359 } 1360 1361 return $self->SUPER::hyperlink($link); 1362 } 1363 1364 sub node { 1365 my $self = shift; 1366 my $text = $_[0]; 1367 if($text) { 1368 $text =~ s/\s+$//s; # strip trailing whitespace 1369 $text =~ s/\s+/ /gs; # collapse whitespace 1370 my $addr = refaddr $self; 1371 push(@{$linkable_nodes{$addr}}, $text) if 1372 ! $current_indent{$addr} 1373 || $linkable_item{$addr}; 1374 } 1375 return $self->SUPER::node($_[0]); 1376 } 1377 1378 sub get_current_indent { 1379 return $INDENT + $current_indent{refaddr $_[0]}; 1380 } 1381 1382 sub get_filename { 1383 return $filename{refaddr $_[0]}; 1384 } 1385 1386 sub linkable_nodes { 1387 my $linkables = $linkable_nodes{refaddr $_[0]}; 1388 return undef unless $linkables; 1389 return @$linkables; 1390 } 1391 1392 sub get_skip { 1393 return $skip{refaddr $_[0]} // 0; 1394 } 1395 1396 sub set_skip { 1397 my $self = shift; 1398 $skip{refaddr $self} = shift; 1399 1400 # If skipping, no need to keep the problems for it 1401 delete $problems{$self->get_filename}; 1402 return; 1403 } 1404 1405 sub parse_from_file { 1406 # This overrides the super class method so that if an open fails on a 1407 # transitory file, it doesn't croak. It returns 1 if it did find the 1408 # file, 0 if it didn't 1409 1410 my $self = shift; 1411 my $filename = shift; 1412 # ignores 2nd param, which is output file. Always uses undef 1413 1414 if (open my $in_fh, '<:bytes', $filename) { 1415 $self->SUPER::parse_from_file($in_fh, undef); 1416 close $in_fh; 1417 return 1; 1418 } 1419 1420 # If couldn't open file, perhaps it was transitory, and hence not an error 1421 return 0 unless -e $filename; 1422 1423 die "Can't open '$filename': $!\n"; 1424 } 1425} 1426 1427my %filename_to_checker; # Map a filename to its pod checker object 1428my %id_to_checker; # Map a checksum to its pod checker object 1429my %nodes; # key is filename, values are nodes in that file. 1430my %nodes_first_word; # same, but value is first word of each node 1431my %valid_modules; # List of modules known to exist outside us. 1432my %digests; # checksums of files, whose names are the keys 1433my %filename_to_pod; # Map a filename to its pod NAME 1434my %files_with_unknown_issues; 1435my %files_with_fixes; 1436 1437my $data_fh; 1438open $data_fh, '<:bytes', $known_issues or die "Can't open $known_issues"; 1439 1440my %counts; # For --counts param, count of each issue type 1441my %suppressed_files; # Files with at least one issue type to suppress 1442my $HEADER = <<END; 1443# This file is the data file for $0. 1444# There are three types of lines. 1445# Comment lines are white-space only or begin with a '#', like this one. Any 1446# changes you make to the comment lines will be lost when the file is 1447# regen'd. 1448# Lines without tab characters are simply NAMES of pods that the program knows 1449# will have links to them and the program does not check if those links are 1450# valid. 1451# All other lines should have three fields, each separated by a tab. The 1452# first field is the name of a pod; the second field is an error message 1453# generated by this program; and the third field is a count of how many 1454# known instances of that message there are in the pod. -1 means that the 1455# program can expect any number of this type of message. 1456END 1457 1458my @existing_issues; 1459 1460 1461while (<$data_fh>) { # Read the database 1462 chomp; 1463 next if /^\s*(?:#|$)/; # Skip comment and empty lines 1464 next if /^ [ < = > ]{7} /xx; # Skip version control conflict markers 1465 if (/\t/) { 1466 if ($add_link) { # The issues are saved and later output unchanged 1467 push @existing_issues, $_; 1468 next; 1469 } 1470 1471 # Keep track of counts of each issue type for each file 1472 my ($filename, $message, $count) = split /\t/; 1473 1474 # The way things aren't shown is to see if the count of the number of 1475 # warnings of a given type has changed. To show all, we pretend there 1476 # weren't any already stored. If the stored value is negative, it 1477 # means counting for this warning in this file is disabled, and hence 1478 # won't change. To skip showing those files under --show-all, we 1479 # retain the negatvie value. To show all occurrences of other 1480 # warnings, we skip setting their count, making them appear to have 1481 # had zero occurrences. 1482 next if $show_all && $count > 0; 1483 1484 $known_problems{$filename}{$message} = $count; 1485 1486 if ($show_counts) { 1487 if ($count < 0) { # -1 means to suppress this issue type 1488 $suppressed_files{$filename} = $filename; 1489 } 1490 else { 1491 $counts{$message} += $count; 1492 } 1493 } 1494 } 1495 else { # Lines without a tab are modules known to be valid 1496 $valid_modules{$_} = 1 1497 } 1498} 1499close $data_fh; 1500 1501if ($add_link) { 1502 $copy_fh = open_new($known_issues); 1503 1504 # Check for basic sanity, and add each command line argument 1505 foreach my $module (@files) { 1506 die "\"$module\" does not look like a module or man page" 1507 # Must look like (A or A::B or A::B::C ..., or foo(3C) 1508 if $module !~ /^ (?: \w+ (?: :: \w+ )* | \w+ \( \d \w* \) ) $/x; 1509 $valid_modules{$module} = 1 1510 } 1511 my_safer_print($copy_fh, $HEADER); 1512 foreach (sort { lc $a cmp lc $b } keys %valid_modules) { 1513 my_safer_print($copy_fh, $_, "\n"); 1514 } 1515 1516 # The rest of the db file is output unchanged. 1517 my_safer_print($copy_fh, join "\n", @existing_issues, ""); 1518 1519 close_and_rename($copy_fh); 1520 exit; 1521} 1522 1523if ($show_counts) { 1524 my $total = 0; 1525 foreach my $message (sort keys %counts) { 1526 $total += $counts{$message}; 1527 note(Text::Tabs::expand("$counts{$message}\t$message")); 1528 } 1529 note("-----\n" . Text::Tabs::expand("$total\tknown potential issues")); 1530 if (%suppressed_files) { 1531 note("\nFiles that have all messages of at least one type suppressed:"); 1532 note(join ", ", sort keys %suppressed_files); 1533 } 1534 exit 0; 1535} 1536 1537# re to match files that are to be parsed only if there is an internal link 1538# to them. It does not include cpan, as whether those are parsed depends 1539# on a switch. Currently, only perltoc and the stable perldelta.pod's 1540# are included. The latter all have characters between 'perl' and 1541# 'delta'. (Actually the currently developed one matches as well, but 1542# is a duplicate of perldelta.pod, so can be skipped, so fine for it to 1543# match this. 1544my $only_for_interior_links_re = qr/ ^ pod\/perltoc.pod $ 1545 /x; 1546unless ($do_deltas) { 1547 $only_for_interior_links_re = qr/$only_for_interior_links_re | 1548 \b perl \d+ delta \. pod \b 1549 /x; 1550} 1551 1552{ # Closure 1553 my $first_time = 1; 1554 1555 sub output_thanks ($$$$) { # Called when an issue has been fixed 1556 my $filename = shift; 1557 my $original_count = shift; 1558 my $current_count = shift; 1559 my $message = shift; 1560 1561 $files_with_fixes{$filename} = 1; 1562 my $return; 1563 my $fixed_count = $original_count - $current_count; 1564 my $a_problem = ($fixed_count == 1) ? "a problem" : "multiple problems"; 1565 my $another_problem = ($fixed_count == 1) ? "another problem" : "another set of problems"; 1566 my $diff; 1567 if ($message) { 1568 $diff = <<EOF; 1569There were $original_count occurrences (now $current_count) in this pod of type 1570"$message", 1571EOF 1572 } else { 1573 $diff = <<EOF; 1574There are no longer any problems found in this pod! 1575EOF 1576 } 1577 1578 if ($first_time) { 1579 $first_time = 0; 1580 $return = <<EOF; 1581Thanks for fixing $a_problem! 1582$diff 1583Now you must teach $0 that this was fixed. 1584EOF 1585 } 1586 else { 1587 $return = <<EOF 1588Thanks for fixing $another_problem. 1589$diff 1590EOF 1591 } 1592 1593 return $return; 1594 } 1595} 1596 1597sub my_safer_print { # print, with error checking for outputting to db 1598 my ($fh, @lines) = @_; 1599 1600 if (! print $fh @lines) { 1601 my $save_error = $!; 1602 close($fh); 1603 die "Write failure: $save_error"; 1604 } 1605} 1606 1607sub extract_pod { # Extracts just the pod from a file; returns undef if file 1608 # doesn't exist 1609 my $filename = shift; 1610 1611 if (open my $in_fh, '<:bytes', $filename) { 1612 use Pod::Simple::JustPod; 1613 my $parser = Pod::Simple::JustPod->new(); 1614 $parser->no_errata_section(1); 1615 $parser->no_whining(1); 1616 $parser->source_filename($filename); 1617 my $output; 1618 $parser->output_string( \$output ); 1619 $parser->parse_lines( <$in_fh>, undef ); 1620 close $in_fh; 1621 1622 return $output; 1623 } 1624 1625 # The file should already have been opened once to get here, so if that 1626 # fails, something is wrong. It's possible that a transitory file 1627 # containing a pod would get here, so if the file no longer exists just 1628 # return undef. 1629 return unless -e $filename; 1630 die "Can't open '$filename': $!\n"; 1631} 1632 1633my $digest = Digest->new($digest_type); 1634 1635# This is used as a callback from File::Find::find(), which always constructs 1636# pathnames using Unix separators 1637sub is_pod_file { 1638 # If $_ is a pod file, add it to the lists and do other prep work. 1639 1640 if (-d) { 1641 # Don't look at files in directories that are for tests, nor those 1642 # beginning with a dot, nor those in the directory where Windows 1643 # builds generate HTML from other POD sources. 1644 if (m!/t\z! || m!/\.! || m!^./win32/html\z!) { 1645 $File::Find::prune = 1; 1646 } 1647 return; 1648 } 1649 1650 return unless -r && -s; # Can't check it if can't read it; no need to 1651 # check if 0 length 1652 return unless -f || -l; # Weird file types won't be pods 1653 1654 my ($leaf) = m!([^/]+)\z!; 1655 if (m!/\.! # No hidden Unix files 1656 || $leaf =~ $non_pods) { 1657 note("Not considering $_") if DEBUG; 1658 return; 1659 } 1660 1661 my $filename = $File::Find::name; 1662 1663 # $filename is relative, like './path'. Strip that initial part away. 1664 $filename =~ s!^\./!! or die 'Unexpected pathname "$filename"'; 1665 1666 return if $excluded_files{canonicalize($filename)}; 1667 1668 my $contents = do { 1669 local $/; 1670 my $candidate; 1671 if (! open $candidate, '<:bytes', $_) { 1672 1673 # If a transitory file was found earlier, the open could fail 1674 # legitimately and we just skip the file; also skip it if it is a 1675 # broken symbolic link, as it is probably just a build problem; 1676 # certainly not a file that we would want to check the pod of. 1677 # Otherwise fail it here and no reason to process it further. 1678 # (But the test count will be off too) 1679 ok(0, "Can't open '$filename': $!") 1680 if -r $filename && ! -l $filename; 1681 return; 1682 } 1683 <$candidate>; 1684 }; 1685 1686 # If the file is a .pm or .pod, having any initial '=' on a line is 1687 # grounds for testing it. Otherwise, require a head1 NAME line to 1688 # consider it as a potential pod 1689 if ($filename =~ /\.(?:pm|pod)/) { 1690 return unless $contents =~ /^=/m; 1691 } else { 1692 return unless $contents =~ /^=head1 +NAME/m; 1693 } 1694 1695 # Here, we know that the file is a pod. Add it to the list of files 1696 # to check and create a checker object for it. 1697 1698 push @files, $filename; 1699 my $checker = My::Pod::Checker->new($filename); 1700 $filename_to_checker{$filename} = $checker; 1701 1702 # In order to detect duplicate pods and only analyze them once, we 1703 # compute checksums for the file, so don't have to do an exact 1704 # compare. Note that if the pod is just part of the file, the 1705 # checksums can differ for the same pod. That special case is handled 1706 # later, since if the checksums of the whole file are the same, that 1707 # case won't even come up. We don't need the checksums for files that 1708 # we parse only if there is a link to its interior, but we do need its 1709 # NAME, which is also retrieved in the code below. 1710 1711 if ($filename =~ / (?: ^(cpan|lib|ext|dist)\/ ) 1712 | $only_for_interior_links_re 1713 /x) 1714 { 1715 my $byte_contents = $contents; 1716 utf8::encode($byte_contents); 1717 $digest->add($byte_contents); # Doesn't handle Unicode 1718 $digests{$filename} = $digest->digest; 1719 1720 # lib files aren't analyzed if they are duplicates of files copied 1721 # there from some other directory. But to determine this, we need 1722 # to know their NAMEs. We might as well find the NAME now while 1723 # the file is open. Similarly, cpan files aren't analyzed unless 1724 # we're analyzing all of them, or this particular file is linked 1725 # to by a file we are analyzing, and thus we will want to verify 1726 # that the target exists in it. We need to know at least the NAME 1727 # to see if it's worth analyzing, or so we can determine if a lib 1728 # file is a copy of a cpan one. 1729 if ($filename =~ m{ (?: ^ (?: cpan | lib ) / ) 1730 | $only_for_interior_links_re 1731 }x) { 1732 if ($contents =~ /^=head1 +NAME.*/mg) { 1733 # The NAME is the first non-spaces on the line up to a 1734 # comma, dash or end of line. Otherwise, it's invalid and 1735 # this pod doesn't have a legal name that we're smart 1736 # enough to find currently. But the parser will later 1737 # find it if it thinks there is a legal name, and set the 1738 # name 1739 if ($contents =~ /\G # continue from the line after =head1 1740 \s* # ignore any empty lines 1741 1742 # ignore =for paragraphs followed by empty 1743 # lines 1744 (?: ^ =for .*? \n (?: [^\s]*? \n )* \s* )* 1745 1746 ^ \s* ( \S+?) \s* (?: [,-] | $ )/mx) { 1747 my $name = $1; 1748 $checker->name($name); 1749 $id_to_checker{$name} = $checker 1750 if $filename =~ m{^cpan/}; 1751 } 1752 } 1753 elsif ($filename =~ m{^cpan/}) { 1754 $id_to_checker{$digests{$filename}} = $checker; 1755 } 1756 } 1757 } 1758 1759 return; 1760} # End of is_pod_file() 1761 1762# Start of real code that isn't processing the command line (except the 1763# db is read in above, as is processing of the --add-link option). 1764# Here, @files contains list of files on the command line. If have any of 1765# these, unconditionally test them, and show all the errors, even the known 1766# ones, and, since not testing other pods, don't do cross-pod link tests. 1767# (Could add extra code to do cross-pod tests for the ones in the list.) 1768 1769if ($has_input_files) { 1770 undef %known_problems; 1771 $do_upstream_cpan = $do_deltas = 1; # In case one of the inputs is one 1772 # of these types 1773} 1774else { # No input files -- go find all the possibilities. 1775 if ($regen) { 1776 $copy_fh = open_new($known_issues); 1777 note("Regenerating $known_issues, please be patient..."); 1778 print $copy_fh $HEADER; 1779 } 1780 1781 # Move to the directory above us, but have to adjust @INC to account for 1782 # that. 1783 s{^\.\./lib$}{lib} for @INC; 1784 chdir File::Spec->updir; 1785 1786 # And look in this directory and all its subdirectories 1787 find( {wanted => \&is_pod_file, no_chdir => 1}, '.'); 1788 1789 # Add ourselves to the test 1790 push @files, "t/porting/podcheck.t"; 1791} 1792 1793# Now we know how many tests there will be. 1794plan (tests => scalar @files) if ! $regen; 1795 1796 1797# Sort file names so we get consistent results, and to put cpan last, 1798# preceded by the ones that we don't generally parse. This is because both 1799# these classes are generally parsed only if there is a link to the interior 1800# of them, and we have to parse all others first to guarantee that they don't 1801# have such a link. 'lib' files come just before these, as some of these are 1802# duplicates of others. We already have figured this out when gathering the 1803# data as a special case for all such files, but this, while unnecessary, 1804# puts the derived file last in the output. 'readme' files come before those, 1805# as those also could be duplicates of others, which are considered the 1806# primary ones. These currently aren't figured out when gathering data, so 1807# are done here. 1808@files = sort { if ($a =~ /^cpan/) { 1809 return 1 if $b !~ /^cpan/; 1810 return lc $a cmp lc $b; 1811 } 1812 elsif ($b =~ /^cpan/) { 1813 return -1; 1814 } 1815 elsif ($a =~ /$only_for_interior_links_re/) { 1816 return 1 if $b !~ /$only_for_interior_links_re/; 1817 return lc $a cmp lc $b; 1818 } 1819 elsif ($b =~ /$only_for_interior_links_re/) { 1820 return -1; 1821 } 1822 elsif ($a =~ /^lib/) { 1823 return 1 if $b !~ /^lib/; 1824 return lc $a cmp lc $b; 1825 } 1826 elsif ($b =~ /^lib/) { 1827 return -1; 1828 } elsif ($a =~ /\breadme\b/i) { 1829 return 1 if $b !~ /\breadme\b/i; 1830 return lc $a cmp lc $b; 1831 } 1832 elsif ($b =~ /\breadme\b/i) { 1833 return -1; 1834 } 1835 else { 1836 return lc $a cmp lc $b; 1837 } 1838 } 1839 @files; 1840 1841# Now go through all the files and parse them 1842FILE: 1843foreach my $filename (@files) { 1844 my $parsed = 0; 1845 note("parsing $filename") if DEBUG; 1846 1847 # We may have already figured out some things in the process of generating 1848 # the file list. If so, we have a $checker object already. But if not, 1849 # generate one now. 1850 my $checker = $filename_to_checker{$filename}; 1851 if (! $checker) { 1852 $checker = My::Pod::Checker->new($filename); 1853 $filename_to_checker{$filename} = $checker; 1854 } 1855 1856 # We have set the name in the checker object if there is a possibility 1857 # that no further parsing is necessary, but otherwise do the parsing now. 1858 if (! $checker->name) { 1859 if (! $checker->parse_from_file($filename, undef)) { 1860 $checker->set_skip("$filename is transitory"); 1861 next FILE; 1862 } 1863 $parsed = 1; 1864 } 1865 1866 if ($checker->num_errors() < 0) { # Returns negative if not a pod 1867 $checker->set_skip("$filename is not a pod"); 1868 } 1869 else { 1870 1871 # Here, is a pod. See if it is one that has already been tested, 1872 # or should be tested under another directory. Use either its NAME 1873 # if it has one, or a checksum if not. 1874 my $name = $checker->name; 1875 my $id; 1876 1877 if ($name) { 1878 $id = $name; 1879 } 1880 else { 1881 my $digest = Digest->new($digest_type); 1882 my $contents = extract_pod($filename); 1883 1884 # If the return is undef, it means that $filename was a transitory 1885 # file; skip it. 1886 next FILE unless defined $contents; 1887 my $byte_contents = $contents; 1888 utf8::encode($byte_contents); 1889 $digest->add($byte_contents); # Doesn't handle Unicode 1890 $id = $digest->digest; 1891 } 1892 1893 # If there is a match for this pod with something that we've already 1894 # processed, don't process it, and output why. 1895 my $prior_checker; 1896 if (defined ($prior_checker = $id_to_checker{$id}) 1897 && $prior_checker != $checker) # Could have defined the checker 1898 # earlier without pursuing it 1899 { 1900 1901 # If the pods are identical, then it's just a copy, and isn't an 1902 # error. First use the checksums we have already computed to see 1903 # if the entire files are identical, which means that the pods are 1904 # identical too. 1905 my $prior_filename = $prior_checker->get_filename; 1906 my $same = (! $name 1907 || ($digests{$prior_filename} 1908 && $digests{$filename} 1909 && $digests{$prior_filename} eq $digests{$filename})); 1910 1911 # If they differ, it could be that the files differ for some 1912 # reason, but the pods they contain are identical. Extract the 1913 # pods and do the comparisons on just those. 1914 if (! $same && $name) { 1915 my $contents = extract_pod($filename); 1916 1917 # If return is <undef>, it means that $filename no longer 1918 # exists. This means it was a transitory file, and should not 1919 # be tested. 1920 next FILE unless defined $contents; 1921 1922 my $prior_contents = extract_pod($prior_filename); 1923 1924 # If return is <undef>, it means that $prior_filename no 1925 # longer exists. This means it was a transitory file, and 1926 # should not have been tested, but we already did process it. 1927 # What we should do now is to back-out its records, and 1928 # process $filename in its stead. But backing out is not so 1929 # simple, and so I'm (khw) skipping that unless and until 1930 # experience shows that it is needed. We do go process 1931 # $filename, and there are potential false positive conflicts 1932 # with the transitory $prior_contents, and rerunning the test 1933 # should cause it to succeed. 1934 goto process_this_pod unless defined $prior_contents; 1935 1936 $same = $prior_contents eq $contents; 1937 } 1938 1939 use File::Basename 'basename'; 1940 if ($same) { 1941 $checker->set_skip("The pod of $filename is a duplicate of " 1942 . "the pod for $prior_filename"); 1943 } elsif ($prior_filename =~ /\breadme\b/i) { 1944 $checker->set_skip("$prior_filename is a README apparently for $filename"); 1945 } elsif ($filename =~ /\breadme\b/i) { 1946 $checker->set_skip("$filename is a README apparently for $prior_filename"); 1947 } elsif (! $do_upstream_cpan 1948 && $filename =~ /^cpan/ 1949 && $prior_filename =~ /^cpan/) 1950 { 1951 $checker->set_skip("CPAN is upstream for $filename"); 1952 } elsif ( $filename =~ /^utils/ or $prior_filename =~ /^utils/ ) { 1953 $checker->set_skip("$filename copy is in utils/"); 1954 } elsif ($prior_filename =~ /^(?:cpan|ext|dist)/ 1955 && $filename !~ /^(?:cpan|ext|dist)/ 1956 && basename($prior_filename) eq basename($filename)) 1957 { 1958 $checker->set_skip("$filename: Need to run make?"); 1959 } else { # Here have two pods with identical names that differ 1960 $prior_checker->poderror( 1961 { -msg => $duplicate_name, 1962 -line => "???", 1963 parameter => "'$filename' also has NAME '$name'" 1964 }); 1965 $checker->poderror( 1966 { -msg => $duplicate_name, 1967 -line => "???", 1968 parameter => "'$prior_filename' also has NAME '$name'" 1969 }); 1970 1971 # Changing the names helps later. 1972 $prior_checker->name("$name version arbitrarily numbered 1"); 1973 $checker->name("$name version arbitrarily numbered 2"); 1974 } 1975 1976 # In any event, don't process this pod that has the same name as 1977 # another. 1978 next FILE; 1979 } 1980 1981 process_this_pod: 1982 1983 # A unique pod. 1984 $id_to_checker{$id} = $checker; 1985 1986 my $parsed_for_links = ", but parsed for its interior links"; 1987 if ((! $do_upstream_cpan && $filename =~ /^cpan/) 1988 || $filename =~ $only_for_interior_links_re) 1989 { 1990 if ($filename =~ /^cpan/) { 1991 $checker->set_skip("CPAN is upstream for $filename"); 1992 } 1993 elsif ($filename =~ /perl\d+delta/) { 1994 if (! $do_deltas) { 1995 $checker->set_skip("$filename is a stable perldelta"); 1996 } 1997 } 1998 elsif ($filename =~ /perltoc/) { 1999 $checker->set_skip("$filename dependent on component pods"); 2000 } 2001 else { 2002 croak("Unexpected file '$filename' encountered that has parsing for interior-linking only"); 2003 } 2004 2005 if ($name && $has_referred_to_node{$name}) { 2006 $checker->set_skip($checker->get_skip() . $parsed_for_links); 2007 } 2008 } 2009 2010 # Need a name in order to process it, because not meaningful 2011 # otherwise, and also can't test links to this without a name. 2012 if (!defined $name) { 2013 $checker->poderror( { -msg => $no_name, 2014 -line => '???' 2015 }); 2016 next FILE; 2017 } 2018 2019 # For skipped files, just get its NAME 2020 my $skip; 2021 if (($skip = $checker->get_skip()) && $skip !~ /$parsed_for_links/) 2022 { 2023 $checker->node($name) if $name; 2024 } 2025 elsif (! $parsed) { 2026 if (! $checker->parse_from_file($filename, undef)) { 2027 $checker->set_skip("$filename is transitory"); 2028 next FILE; 2029 } 2030 } 2031 2032 # Go through everything in the file that could be an anchor that 2033 # could be a link target. Count how many there are of the same name. 2034 foreach my $node ($checker->linkable_nodes) { 2035 next FILE if ! $node; # Can be empty is like '=item *' 2036 $nodes{$name}{$node}++; 2037 2038 # Experiments have shown that cpan search can figure out the 2039 # target of a link even if the exact wording is incorrect, as long 2040 # as the first word is. This happens frequently in perlfunc.pod, 2041 # where the link will be just to the function, but the target 2042 # entry also includes parameters to the function. 2043 my $first_word = $node; 2044 if ($first_word =~ s/^(\S+)\s+\S.*/$1/) { 2045 $nodes_first_word{$name}{$first_word} = $node; 2046 } 2047 } 2048 $filename_to_pod{$filename} = $name; 2049 } 2050} 2051 2052# Here, all files have been parsed, and all links and link targets are stored. 2053# Now go through the files again and see which don't have matches. 2054if (! $has_input_files) { 2055 foreach my $filename (@files) { 2056 next if $filename_to_checker{$filename}->get_skip; 2057 2058 my $checker = $filename_to_checker{$filename}; 2059 foreach my $link ($checker->hyperlinks()) { 2060 my $linked_to_page = $link->page; 2061 next unless $linked_to_page; # intra-file checks are handled by std 2062 # Pod::Checker 2063 # Currently, we assume all external links are valid 2064 next if $link->type eq 'url'; 2065 2066 # Initialize the potential message. 2067 my %problem = ( -msg => $broken_link, 2068 -line => $link->line, 2069 parameter => "to \"$linked_to_page\"", 2070 ); 2071 2072 # See if we have found the linked-to_file in our parse 2073 if (exists $nodes{$linked_to_page}) { 2074 my $node = $link->node; 2075 2076 # If link is only to the page-level, already have it 2077 next if ! $node; 2078 2079 # If link is to a node that exists in the file, is ok 2080 if ($nodes{$linked_to_page}{$node}) { 2081 2082 # But if the page has multiple targets with the same name, 2083 # it's ambiguous which one this should be to. 2084 if ($nodes{$linked_to_page}{$node} > 1) { 2085 $problem{-msg} = $multiple_targets; 2086 $problem{parameter} = "in $linked_to_page that $node could be pointing to"; 2087 $checker->poderror(\%problem); 2088 } 2089 } elsif (! $nodes_first_word{$linked_to_page}{$node}) { 2090 2091 # Here the link target was not found, either exactly or to 2092 # the first word. Is an error. 2093 $problem{parameter} =~ s,"$,/$node",; 2094 $checker->poderror(\%problem); 2095 } 2096 2097 } # Linked-to-file not in parse; maybe is in exception list 2098 elsif (! exists $valid_modules{$link->page}) { 2099 2100 # Here, is a link to a target that we can't find. Check if 2101 # there is an internal link on the page with the target name. 2102 # If so, it could be that they just forgot the initial '/' 2103 # But perldelta is handled specially: only do this if the 2104 # broken link isn't one of the known bad ones (that are 2105 # placemarkers and should be removed for the final) 2106 my $NAME = $filename_to_pod{$filename}; 2107 if (! defined $NAME) { 2108 $checker->poderror(\%problem); 2109 } 2110 else { 2111 if ($nodes{$NAME}{$linked_to_page}) { 2112 $problem{-msg} = $broken_internal_link; 2113 } 2114 $checker->poderror(\%problem); 2115 } 2116 } 2117 } 2118 } 2119} 2120 2121# If regenerating the data file, start with the modules for which we don't 2122# check targets. If you change the sort order, you need to run --regen before 2123# committing so that future commits that do run regen don't show irrelevant 2124# changes. 2125if ($regen) { 2126 foreach (sort { lc $a cmp lc $b } keys %valid_modules) { 2127 my_safer_print($copy_fh, $_, "\n"); 2128 } 2129} 2130 2131# Now ready to output the messages. 2132foreach my $filename (@files) { 2133 my $canonical = canonicalize($filename); 2134 SKIP: { 2135 my $skip = $filename_to_checker{$filename}->get_skip // ""; 2136 2137 if ($regen) { 2138 foreach my $message ( sort keys %{$problems{$filename}}) { 2139 my $count; 2140 2141 # Preserve a negative setting. 2142 if ($known_problems{$canonical}{$message} 2143 && $known_problems{$canonical}{$message} < 0) 2144 { 2145 $count = $known_problems{$canonical}{$message}; 2146 } 2147 else { 2148 $count = @{$problems{$filename}{$message}}; 2149 } 2150 my_safer_print($copy_fh, $canonical . "\t$message\t$count\n"); 2151 } 2152 next; 2153 } 2154 2155 skip($skip, 1) if $skip; 2156 my @diagnostics; 2157 my $thankful_diagnostics = 0; 2158 my $indent = ' '; 2159 2160 my $total_known = 0; 2161 foreach my $message ( sort keys %{$problems{$filename}}) { 2162 $known_problems{$canonical}{$message} = 0 2163 if ! $known_problems{$canonical}{$message}; 2164 my $diagnostic = ""; 2165 my $problem_count = scalar @{$problems{$filename}{$message}}; 2166 $total_known += $problem_count; 2167 next if $known_problems{$canonical}{$message} < 0; 2168 2169 # If we have new problems not previously known, we output all of 2170 # such problems, as we can't know which are really new and which 2171 # not 2172 if ($problem_count > $known_problems{$canonical}{$message}) { 2173 2174 # Here we are about to output all the messages for this type, 2175 # subtract back this number we previously added in. 2176 $total_known -= $problem_count; 2177 2178 $diagnostic .= $indent . qq{"$message"}; 2179 if ($problem_count > 2) { 2180 $diagnostic .= " ($problem_count occurrences," 2181 . " expected $known_problems{$canonical}{$message})"; 2182 } 2183 foreach my $problem (@{$problems{$filename}{$message}}) { 2184 $diagnostic .= " " if $problem_count == 1; 2185 $diagnostic .= "\n$indent$indent"; 2186 $diagnostic .= "$problem->{parameter}" if $problem->{parameter}; 2187 $diagnostic .= " near line $problem->{-line} of " 2188 . $filename; 2189 $diagnostic .= " $problem->{comment}" if $problem->{comment}; 2190 } 2191 $diagnostic .= "\n"; 2192 $files_with_unknown_issues{$filename} = 1; 2193 } elsif ($problem_count < $known_problems{$canonical}{$message}) { 2194 $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, $problem_count, $message); 2195 $thankful_diagnostics++; 2196 } 2197 push @diagnostics, $diagnostic if $diagnostic; 2198 } 2199 2200 # The above loop has output messages where there are current potential 2201 # issues. But it misses where there were some that have been entirely 2202 # fixed. For those, we need to look through the old issues 2203 foreach my $message ( sort keys %{$known_problems{$canonical}}) { 2204 next if $problems{$filename}{$message}; 2205 next if ! $known_problems{$canonical}{$message}; 2206 next if $known_problems{$canonical}{$message} < 0; # Preserve negs 2207 2208 next if !$pedantic and $message =~ 2209 /^(?:\Q$line_length\E|\Q$C_not_linked\E|\Q$C_with_slash\E)/; 2210 2211 my $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, 0, $message); 2212 push @diagnostics, $diagnostic if $diagnostic; 2213 $thankful_diagnostics++ if $diagnostic; 2214 } 2215 2216 my $output = "POD of $filename"; 2217 $output .= ", excluding $total_known not shown known potential problems" 2218 if $total_known; 2219 if (@diagnostics && @diagnostics == $thankful_diagnostics) { 2220 # Output fixed issues as passing to-do tests, so they do not 2221 # cause failures, but t/harness still flags them. 2222 $output .= " # TODO" 2223 } 2224 ok(@diagnostics == $thankful_diagnostics, $output); 2225 if (@diagnostics) { 2226 diag(join "", @diagnostics, 2227 "See end of this test output for your options on silencing this"); 2228 } 2229 2230 delete $known_problems{$canonical}; 2231 } 2232} 2233 2234if (! $regen 2235 && ! ok (keys %known_problems == 0, "The known problems database ($data_dir/known_pod_issues.dat) includes no references to non-existent files")) 2236{ 2237 note("The following files were not found: " 2238 . join ", ", sort keys %known_problems); 2239 note("They will automatically be removed from the db the next time"); 2240 note(" cd t; ./perl -I../lib porting/podcheck.t --regen"); 2241 note("is run"); 2242} 2243 2244my $how_to = <<EOF; 2245 run this test script by hand, using the following formula (on 2246 Un*x-like machines): 2247 cd t 2248 ./perl -I../lib porting/podcheck.t --regen 2249EOF 2250 2251if (%files_with_unknown_issues) { 2252 my $were_count_files = scalar keys %files_with_unknown_issues; 2253 $were_count_files = ($were_count_files == 1) 2254 ? "was $were_count_files file" 2255 : "were $were_count_files files"; 2256 my $message = <<EOF; 2257 2258HOW TO GET ${\__FILE__} TO PASS 2259 2260There $were_count_files that had new potential problems identified. 2261Some of them may be real, and some of them may be false positives because 2262this program isn't as smart as it likes to think it is. You can teach this 2263program to ignore the issues it has identified, and hence pass, by doing the 2264following: 2265 22661) If a problem is about a link to an unknown module or man page that 2267 you know exists, re-run the command something like: 2268 ./perl -I../lib porting/podcheck.t --add-link { MODULE | man_page ... } 2269 (MODULEs should look like Foo::Bar, and man_pages should look like 2270 bar(3c); don't do this for a module or man page that you aren't sure 2271 about; instead treat as another type of issue and follow the 2272 instructions below.) 2273 22742) For other issues, decide if each should be fixed now or not. Fix the 2275 ones you decided to, and rerun this test to verify that the fixes 2276 worked. 2277 22783) If there remain false positive or problems that you don't plan to fix right 2279 now, 2280$how_to 2281 That should cause all current potential problems to be accepted by 2282 the program, so that the next time it runs, they won't be flagged. 2283EOF 2284 if (%files_with_fixes) { 2285 $message .= " This step will also take care of the files that have fixes in them\n"; 2286 } 2287 2288 $message .= <<EOF; 2289 For a few files, such as perltoc, certain issues will always be 2290 expected, and more of the same will be added over time. For those, 2291 before you do the regen, you can edit 2292 $known_issues 2293 and find the entry for the module's file and specific error message, 2294 and change the count of known potential problems to -1. 2295EOF 2296 2297 diag($message); 2298} elsif (%files_with_fixes) { 2299 diag(<<EOF 2300To teach this test script that the potential problems have been fixed, 2301$how_to 2302EOF 2303 ); 2304} 2305 2306if ($regen) { 2307 chdir $original_t_dir || die "Can't change directories to $original_t_dir"; 2308 close_and_rename($copy_fh); 2309} 2310 23111; 2312