xref: /openbsd/gnu/usr.bin/perl/t/porting/podcheck.t (revision 3d61058a)
1#!/usr/bin/perl -w
2
3package main;
4
5BEGIN {
6    chdir 't' if -d 't';
7    @INC = "../lib";
8    # Do not require test.pl, this file has its own framework.
9}
10
11use strict;
12use warnings;
13use feature 'unicode_strings';
14no warnings 'experimental::builtin';
15use builtin 'refaddr';
16
17use Carp;
18use Config;
19use Digest;
20use File::Find;
21use File::Spec;
22use Text::Tabs;
23
24$| = 1;
25
26BEGIN {
27    if ( $Config{usecrosscompile} ) {
28        print "1..0 # Not all files are available during cross-compilation\n";
29        exit 0;
30    }
31    if ($^O eq 'dec_osf') {
32        print "1..0 # $^O cannot handle this test\n";
33        exit(0);
34    }
35    if ( $ENV{'PERL_BUILD_PACKAGING'} ) {
36        print "1..0 # This distro may have modified some files in cpan/. Skipping validation. \n";
37        exit 0;
38    }
39    require '../regen/regen_lib.pl';
40}
41
42sub DEBUG { 0 };
43
44=pod
45
46=head1 NAME
47
48podcheck.t - Look for possible problems in the Perl pods
49
50=head1 SYNOPSIS
51
52 cd t
53 ./perl -I../lib porting/podcheck.t [--show-all] [--cpan] [--deltas]
54                                    [--counts] [--pedantic] [FILE ...]
55
56 ./perl -I../lib porting/podcheck.t --add-link MODULE ...
57
58 ./perl -I../lib porting/podcheck.t --regen
59
60=head1 DESCRIPTION
61
62podcheck.t is an extension of Pod::Checker.  It looks for pod errors and
63potential errors in the files given as arguments, or if none specified, in all
64pods in the distribution workspace, except certain known special ones
65(specified below).  It does additional checking beyond that done by
66Pod::Checker, and keeps a database of known potential problems, and will
67fail a pod only if the number of such problems differs from that given in the
68database.
69
70The additional checks it always makes are:
71
72=over
73
74=item Cross-pod link checking
75
76Pod::Checker verifies that links to an internal target in a pod are not
77broken.  podcheck.t extends that (when called without FILE arguments) to
78external links.  It does this by gathering up all the possible targets in the
79workspace, and cross-checking them.  It also checks that a non-broken link
80points to just one target.  (The destination pod could have two targets with
81the same name.)
82
83The way that the C<LE<lt>E<gt>> pod command works (for links outside the pod)
84is to actually create a link to C<metacpan.org> with an embedded query for
85the desired pod or man page.  That means that links outside the distribution
86are valid.  podcheck.t doesn't verify the validity of such links, but instead
87keeps a database of those known to be valid.  This means that if a link to a
88target not on the list is created, the target needs to be added to the data
89base.  This is accomplished via the L<--add-link|/--add-link MODULE ...>
90option to podcheck.t, described below.
91
92=item An internal link that isn't so specified
93
94If a link is broken, but there is an existing internal target of the same
95name, it is likely that the internal target was meant, and the C<"/"> is
96missing from the C<LE<lt>E<gt>> pod command.
97
98=item Missing or duplicate NAME or missing NAME short description
99
100A pod can't be linked to unless it has a unique name.
101And a NAME should have a dash and short description after it.
102
103=item Occurrences of the Unicode replacement character
104
105L<Pod::Simple> replaces bytes that aren't valid according to the document's
106encoding (declared or auto-detected) with C<\N{REPLACEMENT CHARACTER}>.
107
108=back
109
110If the C<PERL_POD_PEDANTIC> environment variable is set or the C<--pedantic>
111command line argument is provided, then a few more checks are made.
112The pedantic checks are:
113
114=over
115
116=item Verbatim paragraphs that wrap in an 80 (including 2 spare) column window
117
118Pod that inappropriately wraps is less legible.  Pod formatters generally wrap
119correctly, except for too long verbatim lines.  We assume that any display
120window has at least the traditional 80 columns, and check for verbatim lines
121that won't fit in that space, including when using a pager that reserves 2
122columns for its own use.  (Thus the check is for a net of 78 columns.)
123For those lines that don't fit, it tells you how much needs to be cut in
124order to fit.
125
126Often, the easiest thing to do to gain space for these is to lower the indent
127to just one space.
128
129=item Items that perhaps should be links
130
131There are mentions of apparent files in the pods that perhaps should be links
132instead, using C<LE<lt>...E<gt>>
133
134=item Items that perhaps should be C<FE<lt>...E<gt>>
135
136What look like path names enclosed in C<CE<lt>...E<gt>> should perhaps have
137C<FE<lt>...E<gt>> mark-up instead.
138
139=back
140
141A number of issues raised by podcheck.t and by the base Pod::Checker are not
142really problems, but merely potential problems, that is, false positives.
143After inspecting them and
144deciding that they aren't real problems, it is possible to shut up this program
145about them, unlike base Pod::Checker.  For a valid link to an outside module
146or man page, call podcheck.t with the C<--add-link> option to add it to the
147database of known links; for other causes, call podcheck.t with the C<--regen>
148option to regenerate the entire database.  This tells it that all existing
149issues are to not be mentioned again.
150
151C<--regen> isn't fool-proof.  The database merely keeps track of the number of these
152potential problems of each type for each pod.  If a new problem of a given
153type is introduced into the pod, podcheck.t will spit out all of them.  You
154then have to figure out which is the new one, and should it be changed or not.
155But doing it this way insulates the database from having to keep track of line
156numbers of problems, which may change, or the exact wording of each problem
157which might also change without affecting whether it is a problem or not.
158
159Also, if the count of potential problems of a given type for a pod decreases,
160the database must be regenerated so that it knows the new number.  The program
161gives instructions when this happens.
162
163Some pods will have varying numbers of problems of a given type.  This can
164be handled by manually editing the database file (see L</FILES>), and setting
165the number of those problems for that pod to a negative number.  This will
166cause the corresponding error to always be suppressed no matter how many there
167actually are.
168
169Another problem is that there is currently no check that modules listed as
170valid in the database
171actually are.  Thus any errors introduced there will remain there.
172
173=head2 Specially handled pods
174
175=over
176
177=item perltoc
178
179This pod is generated by pasting bits from other pods.  Errors in those bits
180will show up as errors here, as well as for those other pods.  Therefore
181errors here are suppressed, and the pod is checked only to verify that nodes
182within it actually exist that are externally linked to.
183
184=item perldelta
185
186The current perldelta pod is initialized from a template that contains
187placeholder text.  Some of this text is in the form of links that don't really
188exist.  Any such links that are listed in C<@perldelta_ignore_links> will not
189generate messages.  It is presumed that these links will be cleaned up when
190the perldelta is cleaned up for release since they should be marked with
191C<XXX>.
192
193=item Porting/perldelta_template.pod
194
195This is not a pod, but a template for C<perldelta>.  Any errors introduced
196by it will show up when C<perldelta> is created from it.
197
198=item cpan-upstream pods
199
200See the L</--cpan> option documentation
201
202=item old perldeltas
203
204See the L</--deltas> option documentation
205
206=back
207
208=head1 OPTIONS
209
210=over
211
212=item --add-link MODULE ...
213
214Use this option to teach podcheck.t that the C<MODULE>s or man pages actually
215exist, and to silence any messages that links to them are broken.
216
217podcheck.t checks that links within the Perl core distribution are valid, but
218it doesn't check links to man pages or external modules.  When it finds
219a broken link, it checks its database of external modules and man pages,
220and only if not found there does it raise a message.  This option just adds
221the list of modules and man page references that follow it on the command line
222to that database.
223
224For example,
225
226    cd t
227    ./perl -I../lib porting/podcheck.t --add-link Unicode::Casing
228
229causes the external module "Unicode::Casing" to be added to the database, so
230C<LE<lt>Unicode::CasingE<gt>> will be considered valid.
231
232=item --regen
233
234Regenerate the database used by podcheck.t to include all the existing
235potential problems.  Future runs of the program will not then flag any of
236these.  Setting this option also sets C<--pedantic>.
237
238=item --cpan
239
240Normally, all pods in the cpan directory are skipped, except to make sure that
241any blead-upstream links to such pods are valid.
242This option will cause cpan upstream pods to be fully checked.
243
244=item --deltas
245
246Normally, all old perldelta pods are skipped, except to make sure that
247any links to such pods are valid.  This is because they are considered
248stable, and perhaps trying to fix them will cause changes that will
249misrepresent Perl's history.  But, this option will cause them to be fully
250checked.
251
252=item --show-all
253
254Normally, if the number of potential problems of a given type found for a
255pod matches the expected value in the database, they will not be displayed.
256This option forces the database to be generally ignored during the run, so all
257potential problems are displayed and will fail their respective pod test.
258If, however, the database indicates that a particular problem type for a
259particular file is to be skipped, this option doesn't override that unless
260that particular file is passed specifically as one of the FILE parameters on
261the command line.  And, passing particular FILEs selects this option in
262general.
263
264=item --counts
265
266Instead of testing, this just dumps the counts of the occurrences of the
267various types of potential problems in the database.
268
269=item --pedantic
270
271There are three potential problems that are not checked for by default.
272This options enables them. The environment variable C<PERL_POD_PEDANTIC>
273can be set to 1 to enable this option also.
274This option is set when C<--regen> is used.
275
276=back
277
278=head1 FILES
279
280The database is stored in F<t/porting/known_pod_issues.dat>
281
282=head1 SEE ALSO
283
284L<Pod::Checker>
285
286=cut
287
288# VMS builds have a '.com' appended to utility and script names, and it adds a
289# trailing dot for any other file name that doesn't have a dot in it.  The db
290# is stored without those things.  This regex allows for these special file
291# names to be dealt with.  It needs to be interpolated into a larger regex
292# that furnishes the closing boundary.
293my $vms_re = qr/ \. (?: com )? /x;
294
295# Some filenames in the MANIFEST match $vms_re, and so must not be handled the
296# same way that the special vms ones are.  This hash lists those.
297my %special_vms_files;
298
299# This is to get this to work across multiple file systems, including those
300# that are not case sensitive.  The db is stored in lower case, Un*x style,
301# and all file name comparisons are done that way.
302sub canonicalize($) {
303    my $input = shift;
304    my ($volume, $directories, $file)
305                    = File::Spec->splitpath(File::Spec->canonpath($input));
306    # Assumes $volume is constant for everything in this directory structure
307    $directories = "" if ! $directories;
308    $file = "" if ! $file;
309    $file = lc join '/', File::Spec->splitdir($directories), $file;
310    $file =~ s! / /+ !/!gx;       # Multiple slashes => single slash
311
312    # The db is stored without the special suffixes that are there in VMS, so
313    # strip them off to get the comparable name.  But some files on all
314    # platforms have these suffixes, so this shouldn't happen for them, as any
315    # of their db entries will have the suffixes in them.  The hash has been
316    # populated with these files.
317    if ($^O eq 'VMS'
318        && $file =~ / ( $vms_re ) $ /x
319        && ! exists $special_vms_files{$file})
320    {
321        $file =~ s/ $1 $ //x;
322    }
323    return $file;
324}
325
326#####################################################
327# HOW IT WORKS (in general)
328#
329# If not called with specific files to check, the directory structure is
330# examined for files that have pods in them.  Files that might not have to be
331# fully parsed (e.g. in cpan) are parsed enough at this time to find their
332# pod's NAME, and to get a checksum.
333#
334# Those kinds of files are sorted last, but otherwise the pods are parsed with
335# the package coded here, My::Pod::Checker, which is an extension to
336# Pod::Checker that adds some tests and suppresses others that aren't
337# appropriate.  The latter module has no provision for capturing diagnostics,
338# so a package, Tie_Array_to_FH, is used to force them to be placed into an
339# array instead of printed.
340#
341# Parsing the files builds up a list of links.  The files are gone through
342# again, doing cross-link checking and outputting all saved-up problems with
343# each pod.
344#
345# Sorting the files last that potentially don't need to be fully parsed allows
346# us to not parse them unless there is a link to an internal anchor in them
347# from something that we have already parsed.  Keeping checksums allows us to
348# not parse copies of other pods.
349#
350#####################################################
351
352# 1 => Exclude low priority messages that aren't likely to be problems, and
353# has many false positives; higher numbers give more messages.
354my $Warnings_Level = 200;
355
356# perldelta during construction may have place holder links.  N.B.  This
357# variable is referred to by name in release_managers_guide.pod
358our @perldelta_ignore_links = ( "XXX", "perl5YYYdelta", "perldiag/message" );
359
360# To see if two pods with the same NAME are actually copies of the same pod,
361# which is not an error, it uses a checksum to save work.
362my $digest_type = "SHA-1";
363
364my $original_t_dir = File::Spec->rel2abs(File::Spec->curdir);
365my $data_dir = File::Spec->catdir($original_t_dir, 'porting');
366my $known_issues = File::Spec->catfile($data_dir, 'known_pod_issues.dat');
367my $MANIFEST = File::Spec->catfile(File::Spec->updir($original_t_dir), 'MANIFEST');
368my $copy_fh;
369
370my $MAX_LINE_LENGTH = 78;   # 78 columns
371my $INDENT = 4;             # Lines other than '=head' lines are indented at
372                            # least this much
373
374# Our warning messages.  Better not have [('"] in them, as those are used as
375# delimiters for variable parts of the messages by poderror.
376my $broken_link = "Apparent broken link";
377my $broken_internal_link = "Apparent internal link is missing its forward slash";
378my $multiple_targets = "There is more than one target";
379my $duplicate_name = "Pod NAME already used";
380my $no_name = "There is no NAME";
381my $missing_name_description = "The NAME should have a dash and short description after it";
382my $replacement_character = "Unicode replacement character found";
383# the pedantic warnings messages
384my $line_length = "Verbatim line length including indents exceeds $MAX_LINE_LENGTH by";
385my $C_not_linked = "? Should you be using L<...> instead of";
386my $C_with_slash = "? Should you be using F<...> or maybe L<...> instead of";
387
388# objects, tests, etc can't be pods, so don't look for them. Also skip
389# files output by the patch program.  Could also ignore most of .gitignore
390# files, but not all, so don't.
391
392my $obj_ext = $Config{'obj_ext'}; $obj_ext =~ tr/.//d; # dot will be added back
393my $lib_ext = $Config{'lib_ext'}; $lib_ext =~ tr/.//d;
394my $lib_so  = $Config{'so'};      $lib_so  =~ tr/.//d;
395my $dl_ext  = $Config{'dlext'};   $dl_ext  =~ tr/.//d;
396
397# Not really pods, but can look like them.
398my %excluded_files = (
399                        canonicalize("lib/unicore/mktables") => 1,
400                        canonicalize("Porting/make-rmg-checklist") => 1,
401                        canonicalize("Porting/perldelta_template.pod") => 1,
402                        canonicalize("regen/feature.pl") => 1,
403                        canonicalize("regen/warnings.pl") => 1,
404                        canonicalize("autodoc.pl") => 1,
405                        canonicalize("configpm") => 1,
406                        canonicalize("miniperl") => 1,
407                        canonicalize("perl") => 1,
408                        canonicalize("lib/unicore/mktables") => 1,
409                        canonicalize("dist/devel-ppport/parts/inc/ppphdoc") => 1,
410                    );
411
412# This list should not include anything for which case sensitivity is
413# important, as it won't work on VMS, and won't show up until tested on VMS.
414# All or almost all such files should be listed in the MANIFEST, so that can
415# be examined for them, and each such file explicitly excluded, as is done for
416# .PL files in the loop just below this.  For files not catchable this way,
417# is_pod_file() can be used to exclude these at a finer grained level.
418my $non_pods = qr/
419                (?: \. (?: [achot]  | zip | gz | bz2 | jar | tar | tgz
420                           | orig | rej | patch   # Patch program output
421                           | sw[op] | \#.*  # Editor droppings
422                           | old      # buildtoc output
423                           | xs       # pod should be in the .pm file
424                           | al       # autosplit files
425                           | bs       # bootstrap files
426                           | (?i:sh)  # shell scripts, hints, templates
427                           | lst      # assorted listing files
428                           | bat      # Windows,OS2 batch files
429                           | cmd      # Windows,OS2 command files
430                           | lis      # VMS compiler listings
431                           | map      # VMS linker maps
432                           | opt      # VMS linker options files
433                           | mms      # MM(K|S) description files
434                           | ts       # timestamp files generated during build
435                           | txt      # plain text
436                           | $obj_ext # object files
437                           | exe      # $Config{'exe_ext'} might be empty string
438                           | $lib_ext # object libraries
439                           | $lib_so  # shared libraries
440                           | $dl_ext  # dynamic libraries
441                           | gif      # GIF images (example files from CGI.pm)
442                           | eg       # examples from libnet
443                           | U        # metaconfig unit
444                           | core .*
445                       )
446                 $
447               ) | ~$                    # Vim droppings
448                 | \.bak$                # Other editor droppings
449                 | \ \(Autosaved\)\.txt$ # Other editor droppings
450                 | ^cxx\$demangler_db\.$ # VMS name mangler database
451                 | ^typemap\.?$          # typemap files
452                 | ^(?i:Makefile\.PL)$
453                 | ^core (?: $ | \. .* )
454                 | ^vgcore\.[1-9][0-9]*$
455                 | \b Changes \b
456
457                   # This is a pod, but is part of a corpus to test agains; we
458                   # don't care about any issues in it.
459                 | ext\/Pod-Html\/corpus\/perlvar-copy.pod
460             /x;
461
462# Matches something that looks like a file name, but is enclosed in C<...>
463my $C_path_re = qr{ ^
464                        # exclude various things that have slashes
465                        # in them but aren't paths
466                        (?!
467                            (?: (?: s | qr | m | tr | y ) / ) # regexes
468                            | \d+/\d+ \b       # probable fractions
469                            | (?: [LF] < )+
470                            | OS/2 \b
471                            | Perl/perl.git \b
472                            | Perl/perl5.git \b
473                            | Perl/Tk \b
474                            | origin/blead \b
475                            | origin/maint \b
476                        )
477                        /?  # Optional initial slash
478                        \w+ # First component of path, doesn't begin with
479                            # a minus
480                        (?: / [-\w]+ )+ # Subsequent path components
481                        (?: \. \w+ )?   # Optional trailing dot and suffix
482                        >*  # Any enclosed L< F< have matching closing >
483                        $
484                    }x;
485
486# '.PL' files should be excluded, as they aren't final pods, but often contain
487# material used in generating pods, and so can look like a pod.  We can't use
488# the regexp above because case sensitivity is important for these, as some
489# '.pl' files should be examined for pods.  Instead look through the MANIFEST
490# for .PL files and get their full path names, so we can exclude each such
491# file explicitly.  This works because other porting tests prohibit having two
492# files with the same names except for case.
493open my $manifest_fh, '<:bytes', $MANIFEST or die "Can't open $MANIFEST";
494while (<$manifest_fh>) {
495
496    # While we have MANIFEST open, on VMS platforms, look for files that match
497    # the magic VMS file names that have to be handled specially.  Add these
498    # to the list of them.
499    if ($^O eq 'VMS' && / ^ ( [^\t]* $vms_re ) \t /x) {
500        $special_vms_files{$1} = 1;
501    }
502    if (/ ^ ( [^\t]* \. PL ) \t /x) {
503        $excluded_files{canonicalize($1)} = 1;
504    }
505}
506close $manifest_fh, or die "Can't close $MANIFEST";
507
508
509# Pod::Checker messages to suppress
510my @suppressed_messages = (
511    # We catch independently the ones that are real problems.
512    qr/multiple occurrences \(\d+\) of link target/,
513
514    "unescaped <>",                 # Not every '<' or '>' need be escaped
515    qr/No items in =over/,          # i.e., a blockquote, which we consider legal
516);
517
518sub suppressed {
519    # Returns bool as to if input message is one that is to be suppressed
520
521    my $message = shift;
522
523    return grep { $message =~ /^$_/i } @suppressed_messages;
524}
525
526{   # Closure to contain a simple subset of test.pl.  This is to get rid of the
527    # unnecessary 'failed at' messages that would otherwise be output pointing
528    # to a particular line in this file.
529
530    my $current_test = 0;
531    my $planned;
532
533    sub plan {
534        my %plan = @_;
535        $planned = $plan{tests} + 1;    # +1 for final test that files haven't
536                                        # been removed
537        print "1..$planned\n";
538        return;
539    }
540
541    sub ok {
542        my $success = shift;
543        my $message = shift;
544
545        chomp $message;
546
547        $current_test++;
548        print "not " unless $success;
549        print "ok $current_test - $message\n";
550        return $success;
551    }
552
553    sub skip {
554        my $why = shift;
555        my $n    = @_ ? shift : 1;
556        for (1..$n) {
557            $current_test++;
558            print "ok $current_test # skip $why\n";
559        }
560        no warnings 'exiting';
561        last SKIP;
562    }
563
564    sub _note {
565        my ($andle, $message) = @_;
566
567        chomp $message;
568
569        print $andle $message =~ s/^/# /mgr;
570        print $andle "\n";
571        return;
572    }
573
574    sub note { unshift @_, \*STDOUT; goto &_note }
575
576    sub diag { unshift @_, \*STDERR; goto &_note }
577
578    END {
579        if ($planned && $planned != $current_test) {
580            print STDERR
581            "# Looks like you planned $planned tests but ran $current_test.\n";
582        }
583    }
584}
585
586# List of known potential problems by pod and type.
587my %known_problems;
588
589# Pods given by the keys contain an interior node that is referred to from
590# outside it.
591my %has_referred_to_node;
592
593my $show_counts = 0;
594my $regen = 0;
595my $add_link = 0;
596my $show_all = 0;
597my $pedantic = 0;
598
599my $do_upstream_cpan = 0; # Assume that are to skip anything in /cpan
600my $do_deltas = 0;        # And stable perldeltas
601
602while (@ARGV && substr($ARGV[0], 0, 1) eq '-') {
603    my $arg = shift @ARGV;
604
605    $arg =~ s/^--/-/; # Treat '--' the same as a single '-'
606    if ($arg eq '-regen') {
607        $regen = 1;
608        $pedantic = 1;
609    }
610    elsif ($arg =~ /^-add[-_]link$/) {
611        $add_link = 1;
612    }
613    elsif ($arg eq '-cpan') {
614        $do_upstream_cpan = 1;
615    }
616    elsif ($arg eq '-deltas') {
617        $do_deltas = 1;
618    }
619    elsif ($arg =~ /^-show[-_]all$/) {
620        $show_all = 1;
621    }
622    elsif ($arg eq '-counts') {
623        $show_counts = 1;
624    }
625    elsif ($arg eq '-pedantic') {
626        $pedantic = 1;
627    }
628    else {
629        die <<EOF;
630Unknown option '$arg'
631
632Usage: $0 [ --regen | --cpan | --show-all | FILE ... | --add-link MODULE ... ]\n"
633    --add-link -> Add the MODULE and man page references to the database
634    --regen    -> Regenerate the data file for $0
635    --cpan     -> Include files in the cpan subdirectory.
636    --deltas   -> Include stable perldeltas
637    --show-all -> Show all known potential problems
638    --counts   -> Don't test, but give summary counts of the currently
639                  existing database
640    --pedantic -> Check for overly long lines in verbatim blocks
641EOF
642    }
643}
644
645$pedantic = 1 if exists $ENV{PERL_POD_PEDANTIC} and $ENV{PERL_POD_PEDANTIC};
646my @files = @ARGV;
647
648my $cpan_or_deltas = $do_upstream_cpan || $do_deltas;
649if (($regen + $show_all + $show_counts + $add_link + $cpan_or_deltas ) > 1) {
650    croak "--regen, --show-all, --counts, and --add-link are mutually"
651        . " exclusive\n and none can be run with --cpan nor --deltas";
652}
653
654my $has_input_files = @files;
655
656
657if ($add_link) {
658    if (! $has_input_files) {
659        croak "--add-link requires at least one module or man page reference";
660    }
661}
662elsif ($has_input_files) {
663    if ($regen || $show_counts || $do_upstream_cpan || $do_deltas) {
664        croak "--regen, --counts, --deltas, and --cpan can't be used since using specific files";
665    }
666    foreach my $file (@files) {
667        croak "Can't read file '$file'" if ! -r $file;
668    }
669}
670
671our %problems;  # potential problems found in this run
672
673package My::Pod::Checker {      # Extend Pod::Checker
674    use parent 'Pod::Checker';
675
676    # Uses inside out hash to protect from typos
677    # For new fields, remember to add to destructor DESTROY()
678    my %CFL_text;           # The text comprising the current C<>, F<>, or L<>
679    my %C_text;             # If defined, are in a C<> section, and includes
680                            # the accumulated text from that
681    my %current_indent;     # Current line's indent
682    my %filename;           # The pod is stored in this file
683    my %in_CFL;             # count of stacked C<>, F<>, L<> directives
684    my %indents;            # Stack of indents from =over's in effect for
685                            # current line
686    my %in_for;             # true if in a =for or =begin
687    my %in_NAME;            # true if within NAME section
688    my %in_begin;           # true if within =begin section
689    my %in_X;               # true if in a X<>
690    my %linkable_item;      # Bool: if the latest =item is linkable.  It isn't
691                            # for bullet and number lists
692    my %linkable_nodes;     # Pod::Checker adds all =items to its node list,
693                            # but not all =items are linkable-to
694    my %running_CFL_text;   # The current text that is being accumulated until
695                            # an end_FOO is found, and this includes any C<>,
696                            # F<>, or L<> directives.
697    my %running_simple_text; # The currentt text that is being accumulated
698                            # until an end_FOO is found, and all directives
699                            # have been expanded into plain text
700    my %command_count;      # Number of commands seen
701    my %seen_pod_cmd;       # true if have =pod earlier
702    my %skip;               # is SKIP set for this pod
703    my %start_line;         # the first input line number in the thing
704                            # currently being worked on
705
706    sub DESTROY {
707        my $addr = refaddr $_[0];
708        delete $CFL_text{$addr};
709        delete $C_text{$addr};
710        delete $command_count{$addr};
711        delete $current_indent{$addr};
712        delete $filename{$addr};
713        delete $in_begin{$addr};
714        delete $in_CFL{$addr};
715        delete $indents{$addr};
716        delete $in_for{$addr};
717        delete $in_NAME{$addr};
718        delete $in_X{$addr};
719        delete $linkable_item{$addr};
720        delete $linkable_nodes{$addr};
721        delete $running_CFL_text{$addr};
722        delete $running_simple_text{$addr};
723        delete $seen_pod_cmd{$addr};
724        delete $skip{$addr};
725        delete $start_line{$addr};
726        return;
727    }
728
729    sub new {
730        my $class = shift;
731        my $filename = shift;
732
733        my $self = $class->SUPER::new(-quiet => 1,
734                                     -warnings => $Warnings_Level);
735        my $addr = refaddr $self;
736        $command_count{$addr} = 0;
737        $current_indent{$addr} = 0;
738        $filename{$addr} = $filename;
739        $in_begin{$addr} = 0;
740        $in_X{$addr} = 0;
741        $in_CFL{$addr} = 0;
742        $in_NAME{$addr} = 0;
743        $linkable_item{$addr} = 0;
744        $seen_pod_cmd{$addr} = 0;
745        return $self;
746    }
747
748    # re's for messages that Pod::Checker outputs
749    my $location = qr/ \b (?:in|at|on|near) \s+ /xi;
750    my $optional_location = qr/ (?: $location )? /xi;
751    my $line_reference = qr/ [('"]? $optional_location \b line \s+
752                             (?: \d+ | EOF | \Q???\E | - )
753                             [)'"]? /xi;
754
755    sub poderror {  # Called to register a potential problem
756
757        # This adds an extra field to the parent hash, 'parameter'.  It is
758        # used to extract the variable parts of a message leaving just the
759        # constant skeleton.  This in turn allows the message to be
760        # categorized better, so that it shows up as a single type in our
761        # database, with the specifics of each occurrence not being stored with
762        # it.
763
764        my $self = shift;
765        my $opts = shift;
766
767        my $addr = refaddr $self;
768        return if $skip{$addr};
769
770        # Input can be a string or hash.  If a string, parse it to separate
771        # out the line number and convert to a hash for easier further
772        # processing
773        my $message;
774        if (ref $opts ne 'HASH') {
775            $message = join "", $opts, @_;
776            my $line_number;
777            if ($message =~ s/\s*($line_reference)//) {
778                ($line_number = $1) =~ s/\s*$optional_location//;
779            }
780            else {
781                $line_number = '???';
782            }
783            $opts = { -msg => $message, -line => $line_number };
784        } else {
785            $message = $opts->{'-msg'};
786
787        }
788
789        $message =~ s/^\d+\s+//;
790        return if main::suppressed($message);
791
792        $self->SUPER::poderror($opts, @_);
793
794        $opts->{parameter} = "" unless $opts->{parameter};
795
796        # The variable parts of the message tend to be enclosed in '...',
797        # "....", or (...).  Extract them and put them in an extra field,
798        # 'parameter'.  This is trickier because the matching delimiter to a
799        # '(' is its mirror, and not itself.  Text::Balanced could be used
800        # instead.
801        while ($message =~ m/ \s* $optional_location ( [('"] )/xg) {
802            my $delimiter = $1;
803            my $start = $-[0];
804            $delimiter = ')' if $delimiter eq '(';
805
806            # If there is no ending delimiter, don't consider it to be a
807            # variable part.  Most likely it is a contraction like "Don't"
808            last unless $message =~ m/\G .+? \Q$delimiter/xg;
809
810            my $length = $+[0] - $start;
811
812            # Get the part up through the closing delimiter
813            my $special = substr($message, $start, $length);
814            $special =~ s/^\s+//;   # No leading whitespace
815
816            # And add that variable part to the parameter, while removing it
817            # from the message.  This isn't a foolproof way of finding the
818            # variable part.  For example '(s)' can occur in e.g.,
819            # 'paragraph(s)'
820            if ($special ne '(s)') {
821                substr($message, $start, $length) = "";
822                pos $message = $start;
823                $opts->{-msg} = $message;
824                $opts->{parameter} .= " " if $opts->{parameter};
825                $opts->{parameter} .= $special;
826            }
827        }
828
829        # Extract any additional line number given.  This is often the
830        # beginning location of something whereas the main line number gives
831        # the ending one.
832        if ($message =~ /( $line_reference )/xi) {
833            my $line_ref = $1;
834            while ($message =~ s/\s*\Q$line_ref//) {
835                $opts->{-msg} = $message;
836                $opts->{parameter} .= " " if $opts->{parameter};
837                $opts->{parameter} .= $line_ref;
838            }
839        }
840
841        Carp::carp("Couldn't extract line number from '$message'") if $message =~ /line \d+/;
842        push @{$problems{$filename{$addr}}{$message}}, $opts;
843        #push @{$problems{$self->get_filename}{$message}}, $opts;
844    }
845
846    # In the next subroutines, we keep track of the text of the current
847    # innermost thing, like F<fooC<bar>baz>.  The things we care about raising
848    # messages about in this program all come from a single sequence of
849    # characters uninterrupted by other pod commands.  Therefore we don't have
850    # to worry about recursion, and we can just set the string we care about
851    # to empty on entrance to each command.
852
853    sub handle_text {
854        # This is called by the parent class to deal with any straight text.
855        # We mostly just append this to the running current value which will
856        # be dealt with upon the end of the current construct, like a
857        # paragraph.  But certain things don't contribute to checking the pod
858        # and are ignored.  We also have set flags to indicate this text is
859        # going towards constructing certain constructs, and handle those
860        # specially.
861
862        my $self = shift;
863        my $addr = refaddr $self;
864
865        my $return = $self->SUPER::handle_text(@_);
866
867        if ($in_X{$addr} || $in_for{$addr}) { # ignore
868            return $return;
869        }
870
871        my $text = join "\n", @_;
872        $running_simple_text{$addr} .= $text;
873
874        # Keep separate tabs on C<>, F<>, and L<> directives, and one
875        # especially for C<> ones.
876        if ($in_CFL{$addr}) {
877            $CFL_text{$addr} .= $text;
878            $C_text{$addr} .= $text if defined $C_text{$addr};
879        }
880        else {
881            # This variable is updated instead in the corresponding C, F, or L
882            # handler.
883            $running_CFL_text{$addr} .= $text;
884        }
885
886        # do this line-by-line so we can get the right line number
887        my @lines = split /^/, $running_simple_text{$addr};
888        for my $i (0..$#lines) {
889            if ($lines[$i] =~ m/\N{REPLACEMENT CHARACTER}/) {
890                $self->poderror({ -line => $start_line{$addr} + $i,
891                    -msg => $replacement_character,
892                    parameter => "possibly invalid ". $self->encoding . " input at character " . pos $lines[$i],
893                });
894            }
895        }
896        return $return;
897    }
898
899    # The start_FOO routines check that somehow a C<> construct hasn't escaped
900    # without being checked, and initialize things, and call the parent
901    # class's equivalent routine.
902
903    # The end_FOO routines close things off, and check the text that has been
904    # accumulated for FOO, then call the parent's corresponding routine.
905
906    sub start_Para {
907        my $self = shift;
908        check_see_but_not_link($self);
909
910        my $addr = refaddr $self;
911        $start_line{$addr} = $_[0]->{start_line};
912        $running_CFL_text{$addr} = "";
913        $running_simple_text{$addr} = "";
914        return $self->SUPER::start_Para(@_);
915    }
916
917    sub start_item {
918        my $self = shift;
919        check_see_but_not_link($self);
920
921        my $addr = refaddr $self;
922        $start_line{$addr} = $_[0]->{start_line};
923        $running_CFL_text{$addr} = "";
924        $running_simple_text{$addr} = "";
925
926    }
927
928    sub start_item_text {
929        my $self = shift;
930        start_item($self);
931        my $addr = refaddr $self;
932
933        # This is the only =item that is linkable
934        $linkable_item{$addr} = 1;
935
936        return $self->SUPER::start_item_text(@_);
937    }
938
939    sub start_item_number {
940        my $self = shift;
941        start_item($self);
942
943        return $self->SUPER::start_item_number(@_);
944    }
945
946    sub start_item_bullet {
947        my $self = shift;
948        start_item($self);
949
950        return $self->SUPER::start_item_bullet(@_);
951    }
952
953    sub end_item {  # No difference in =item types endings
954        my $self = shift;
955        check_see_but_not_link($self);
956        return $self->SUPER::end_item(@_);
957    }
958
959    sub start_over {
960        my $self = shift;
961        check_see_but_not_link($self);
962
963        my $addr = refaddr $self;
964        $start_line{$addr} = $_[0]->{start_line};
965        $running_CFL_text{$addr} = "";
966        $running_simple_text{$addr} = "";
967
968        # Save this indent on a stack, and keep track of total indent
969        my $indent =  $_[0]{'indent'};
970        push @{$indents{$addr}}, $indent;
971        $current_indent{$addr} += $indent;
972
973        return $self->SUPER::start_over(@_);
974    }
975
976    sub end_over_bullet { shift->end_over(@_) }
977    sub end_over_number { shift->end_over(@_) }
978    sub end_over_text   { shift->end_over(@_) }
979    sub end_over_block  { shift->end_over(@_) }
980    sub end_over_empty  { shift->end_over(@_) }
981    sub end_over {
982        my $self = shift;
983        check_see_but_not_link($self);
984
985        my $addr = refaddr $self;
986
987        # Pop current indent
988        if (@{$indents{$addr}}) {
989            $current_indent{$addr} -= pop @{$indents{$addr}};
990        }
991        else {
992            # =back without corresponding =over, but should have
993            # warned already
994            $current_indent{$addr} = 0;
995        }
996    }
997
998    sub check_see_but_not_link {
999
1000        # Looks through accumulated text for current element to see if it
1001        # refers to something that should be linked to, but isn't.
1002
1003        my $self = shift;
1004        my $addr = refaddr $self;
1005
1006        return unless defined $running_CFL_text{$addr};
1007
1008        while ($running_CFL_text{$addr} =~ m{
1009                                ( (?: \w+ \s+ )* )  # The phrase before, if any
1010                                \b [Ss]ee \s+
1011                                ( ( [^L] )
1012                                  <
1013                                  ( [^<]*? )  # The not < excludes nested C<L<...
1014                                  >
1015                                )
1016                                ( \s+ (?: under | in ) \s+ L< )?
1017                            }xg)
1018        {
1019            my $prefix = $1 // "";
1020            my $construct = $2;     # The whole thing, like C<...>
1021            my $type = $3;
1022            my $interior = $4;
1023            my $trailing = $5;      # After the whole thing ending in "L<"
1024
1025            # If the full phrase is something like, "you might see C<", or
1026            # similar, it really isn't a reference to a link.  The ones I saw
1027            # all had the word "you" in them; and the "you" wasn't the
1028            # beginning of a sentence.
1029            if ($prefix !~ / \b you \b /x) {
1030
1031                # Now, find what the module or man page name within the
1032                # construct would be if it actually has L<> syntax.  If it
1033                # doesn't have that syntax, will set the module to the entire
1034                # interior.
1035                if (! defined $trailing # not referring to something in another
1036                                        # section
1037                    && $interior !~ /$non_pods/
1038
1039                    # There can't be spaces (I think) in module names or man
1040                    # pages
1041                    && $interior !~ / \s /x
1042
1043                    # F<> that end in eg \.pl are almost certainly ok, as are
1044                    # those that look like a path with multiple "/" chars
1045                    && ($type ne "F"
1046                        || (! -e $interior
1047                            && $interior !~ /\.\w+$/
1048                            && $interior !~ /\/.+\//)
1049                    )
1050                ) {
1051                    # TODO: move the checking of $pedantic higher up
1052                    $self->poderror({ -line => $start_line{$addr},
1053                        -msg => $C_not_linked,
1054                        parameter => $construct
1055                    });
1056                }
1057            }
1058        }
1059
1060        undef $running_CFL_text{$addr};
1061    }
1062
1063    sub end_Para {
1064        my $self = shift;
1065        check_see_but_not_link($self);
1066
1067        my $addr = refaddr $self;
1068        if ($in_NAME{$addr}) {
1069            if ($running_simple_text{$addr} =~ /^\s*(\S+?)\s*$/) {
1070                $self->poderror({ -line => $start_line{$addr},
1071                    -msg => $missing_name_description,
1072                    parameter => $1});
1073            }
1074            $in_NAME{$addr} = 0;
1075        }
1076        $self->SUPER::end_Para(@_);
1077    }
1078
1079    sub start_head1 {
1080        my $self = shift;
1081        check_see_but_not_link($self);
1082
1083        my $addr = refaddr $self;
1084        $start_line{$addr} = $_[0]->{start_line};
1085        $running_CFL_text{$addr} = "";
1086        $running_simple_text{$addr} = "";
1087
1088        return $self->SUPER::start_head1(@_);
1089    }
1090
1091    sub end_head1 {  # This is called at the end of the =head line.
1092        my $self = shift;
1093        check_see_but_not_link($self);
1094
1095        my $addr = refaddr $self;
1096
1097        $in_NAME{$addr} = 1 if $running_simple_text{$addr} eq 'NAME';
1098        return $self->SUPER::end_head(@_);
1099    }
1100
1101    sub start_Verbatim {
1102        my $self = shift;
1103        check_see_but_not_link($self);
1104
1105        my $addr = refaddr $self;
1106        $running_simple_text{$addr} = "";
1107        $start_line{$addr} = $_[0]->{start_line};
1108        return $self->SUPER::start_Verbatim(@_);
1109    }
1110
1111    sub end_Verbatim {
1112        my $self = shift;
1113        my $addr = refaddr $self;
1114
1115        # Pick up the name if it looks like one, since the parent class
1116        # doesn't handle verbatim NAMEs
1117        if ($in_NAME{$addr}
1118            && $running_simple_text{$addr} =~ /^\s*(\S+?)\s*[,-]/)
1119        {
1120            $self->name($1);
1121        }
1122
1123        my $indent = $self->get_current_indent;
1124
1125        # split the code by line.
1126        my @lines = split /^/, $running_simple_text{$addr};
1127
1128        # We have two cases here. The verbatim text may be copied from
1129        # one of our files, in which case we check to make sure that the
1130        # code in the documentation matches that of the code in the
1131        # file, OR, we check the line lengths are appropriate. Copied text
1132        # is identified by the first line containing one of the following
1133        # strings: "file source: FILENAME" or "copied from: FILENAME" where
1134        # the FILENAME actually exists.
1135        #
1136        # Yes, this implies that where we are copying code verbatim from
1137        # one of our source files we do not check its length. This is
1138        # because it is a copy, and we shouldn't require the code to be
1139        # munged to be placed in the docs. This should only be used in
1140        # pod files which are used to document the internals for other
1141        # developers, as it may result in unpleasant to view HTML docs.
1142
1143        my $copied = 0;
1144        if ($lines[0] =~ /(?:file source|copied from):\s*([\/\w.]+)/i and -e $1) {
1145            # this text was copied from a source file. Make sure that it still
1146            # matches the source. This is a whitespace insensitive match.
1147
1148            my $file = $1;
1149            $copied = 1;
1150            my $pat = "";
1151            foreach my $line (@lines[1..$#lines]) {
1152                # convert each line into a pattern fragment
1153                $line =~ s/(?:(\s+)|(\S+))/$1 ? "\\s++" : quotemeta($2)/ge;
1154                $pat .= $line;
1155            }
1156            # merge adjacent \s+ sequences
1157            $pat =~ s/(?:\\s\+\+){2,}/\\s++/g;
1158            # slurp the file
1159            my $slurped = do {
1160                open my $ifh, "<", $file
1161                    or die "Failed to open '$file' for read: $!";
1162                local $/;
1163                <$ifh>;
1164            };
1165            if ($slurped !~ /$pat/) {
1166                $self->poderror({ -line => $start_line{$addr},
1167                    -msg => "Copied verbatim text is out of sync with source file",
1168                    parameter => $file,
1169                });
1170            }
1171        }
1172        if (!$copied) {
1173            # if the verbatim text has not been copied from one of our
1174            # source files we look at each line to verify it is short enough
1175            for my $i (0 .. @lines - 1) {
1176                $lines[$i] =~ s/\s+$//;
1177                my $exceeds = length(Text::Tabs::expand($lines[$i]))
1178                            + $indent - $MAX_LINE_LENGTH;
1179                next unless $exceeds > 0;
1180
1181                $self->poderror({ -line => $start_line{$addr} + $i,
1182                    -msg => $line_length,
1183                    parameter => "+$exceeds (including " . ($indent - $INDENT) .
1184                                 " from =over's and $INDENT as base indent)",
1185                });
1186            }
1187        }
1188
1189        undef $running_simple_text{$addr};
1190
1191        # Parent class didn't bother to define this
1192        #return $self->SUPER::SUPER::end_Verbatim(@_);
1193    }
1194
1195    sub start_C {
1196        my $self = shift;
1197        my $addr = refaddr $self;
1198
1199        $C_text{$addr} = "";
1200
1201        # If not in a stacked set of C<>, F<> and L<>, initialize the text for
1202        # them.
1203        $CFL_text{$addr} = "" if ! $in_CFL{$addr};
1204        $in_CFL{$addr}++;
1205
1206        return $self->SUPER::start_C(@_);
1207    }
1208
1209    sub start_F {
1210        my $self = shift;
1211        my $addr = refaddr $self;
1212
1213        $CFL_text{$addr} = "" if ! $in_CFL{$addr};
1214        $in_CFL{$addr}++;
1215        return $self->SUPER::start_F(@_);
1216    }
1217
1218    sub start_L {
1219        my $self = shift;
1220        my $addr = refaddr $self;
1221
1222        $CFL_text{$addr} = "" if ! $in_CFL{$addr};
1223        $in_CFL{$addr}++;
1224        return $self->SUPER::start_L(@_);
1225    }
1226
1227    sub end_C {
1228        my $self = shift;
1229        my $addr = refaddr $self;
1230
1231        # Warn if looks like a file or link enclosed instead by this C<>
1232        if ($C_text{$addr} =~ qr/^ $C_path_re $/x) {
1233            # Here it does look like it could be a file path or a link.
1234            # But some varieties of regex patterns could also fit with what we
1235            # have so far.  Weed those out as best we can.  '/foo/' is almost
1236            # certainly meant to be a pattern, as is '/foo/g'.
1237            my $is_pattern;
1238            if ($C_text{$addr} !~ qr| ^ / [^/]* / ( [msixpodualngcr]* ) $ |x) {
1239                $is_pattern = 0;
1240            }
1241            else {
1242
1243                # Here, it looks like a pattern potentially followed by some
1244                # modifiers.  To make doubly sure, don't count as patterns
1245                # those constructs which have more occurrences (generally 1)
1246                # of a modifier than is legal.
1247                my %counts;
1248                map { $counts{$_}++ } split "", $1;
1249                foreach my $modifier (keys %counts) {
1250                    if ($counts{$modifier} > (($modifier eq 'a')
1251                                              ? 2
1252                                              : 1))
1253                    {
1254                        $is_pattern = 0;
1255                        last;
1256                    }
1257                }
1258                $is_pattern = 1 unless defined $is_pattern;
1259            }
1260
1261            unless ($is_pattern) {
1262                $self->poderror({ -line => $start_line{$addr},
1263                    -msg => $C_with_slash,
1264                    parameter => "C<$C_text{$addr}>"
1265                });
1266            }
1267        }
1268        undef $C_text{$addr};
1269
1270        # Add the current text to the running total.  This was not done in
1271        # handle_text(), because it just sees the plain text of the innermost
1272        # stacked directive.  We want to keep all the directive names
1273        # enclosing the text.  Otherwise the fact that C<L<foobar>> is to a
1274        # link would be lost, as the L<> would be gone.
1275        $CFL_text{$addr} = "C<$CFL_text{$addr}>";
1276
1277        # Add this text to the whole running total only if popping this
1278        # directive off the stack leaves it empty.  As long as something is on
1279        # the stack, it gets added to $CFL_text (just above).  It is only
1280        # entirely constructed when the stack is empty.
1281        $in_CFL{$addr}--;
1282        $running_CFL_text{$addr} .= $CFL_text{$addr} if ! $in_CFL{$addr};
1283
1284        return $self->SUPER::end_C(@_);
1285    }
1286
1287    sub end_F {
1288        my $self = shift;
1289        my $addr = refaddr $self;
1290
1291        $CFL_text{$addr} = "F<$CFL_text{$addr}>";
1292        $in_CFL{$addr}--;
1293        $running_CFL_text{$addr} .= $CFL_text{$addr} if ! $in_CFL{$addr};
1294        return $self->SUPER::end_F(@_);
1295    }
1296
1297    sub end_L {
1298        my $self = shift;
1299        my $addr = refaddr $self;
1300
1301        $CFL_text{$addr} = "L<$CFL_text{$addr}>";
1302        $in_CFL{$addr}--;
1303        $running_CFL_text{$addr} .= $CFL_text{$addr} if ! $in_CFL{$addr};
1304        return $self->SUPER::end_L(@_);
1305    }
1306
1307    sub start_X {
1308        my $self = shift;
1309        my $addr = refaddr $self;
1310
1311        $in_X{$addr} = 1;
1312        return $self->SUPER::start_X(@_);
1313    }
1314
1315    sub end_X {
1316        my $self = shift;
1317        my $addr = refaddr $self;
1318
1319        $in_X{$addr} = 0;
1320        return $self->SUPER::end_X(@_);
1321    }
1322
1323    sub start_for {
1324        my $self = shift;
1325        my $addr = refaddr $self;
1326
1327        $in_for{$addr} = 1;
1328        return $self->SUPER::start_for(@_);
1329    }
1330
1331    sub end_for {
1332        my $self = shift;
1333        my $addr = refaddr $self;
1334
1335        $in_for{$addr} = 0;
1336        return $self->SUPER::end_for(@_);
1337    }
1338
1339    sub hyperlink {
1340        my ($self, $link) = @_;
1341
1342        if ($link && $link->type eq 'pod') {
1343            my $page = $link->page;
1344            my $node = $link->node;
1345
1346            # If the hyperlink is to an interior node of another page, save it
1347            # so that we can see if we need to parse normally skipped files.
1348            $has_referred_to_node{$page} = 1 if $node;
1349
1350            # Ignore certain placeholder links in perldelta.  Check if the
1351            # link is page-level, and also check if to a node within the page
1352            if (   $self->name && $self->name eq "perldelta"
1353                && ((  grep { $page eq $_ } @perldelta_ignore_links)
1354                    || (   $node
1355                        && (grep { "$page/$node" eq $_ } @perldelta_ignore_links)
1356            ))) {
1357                return;
1358            }
1359        }
1360
1361        return $self->SUPER::hyperlink($link);
1362    }
1363
1364    sub node {
1365        my $self = shift;
1366        my $text = $_[0];
1367        if($text) {
1368            $text =~ s/\s+$//s; # strip trailing whitespace
1369            $text =~ s/\s+/ /gs; # collapse whitespace
1370            my $addr = refaddr $self;
1371            push(@{$linkable_nodes{$addr}}, $text) if
1372                                    ! $current_indent{$addr}
1373                                    || $linkable_item{$addr};
1374        }
1375        return $self->SUPER::node($_[0]);
1376    }
1377
1378    sub get_current_indent {
1379        return $INDENT + $current_indent{refaddr $_[0]};
1380    }
1381
1382    sub get_filename {
1383        return $filename{refaddr $_[0]};
1384    }
1385
1386    sub linkable_nodes {
1387        my $linkables = $linkable_nodes{refaddr $_[0]};
1388        return undef unless $linkables;
1389        return @$linkables;
1390    }
1391
1392    sub get_skip {
1393        return $skip{refaddr $_[0]} // 0;
1394    }
1395
1396    sub set_skip {
1397        my $self = shift;
1398        $skip{refaddr $self} = shift;
1399
1400        # If skipping, no need to keep the problems for it
1401        delete $problems{$self->get_filename};
1402        return;
1403    }
1404
1405    sub parse_from_file {
1406        # This overrides the super class method so that if an open fails on a
1407        # transitory file, it doesn't croak.  It returns 1 if it did find the
1408        # file, 0 if it didn't
1409
1410        my $self = shift;
1411        my $filename = shift;
1412        # ignores 2nd param, which is output file.  Always uses undef
1413
1414        if (open my $in_fh, '<:bytes', $filename) {
1415            $self->SUPER::parse_from_file($in_fh, undef);
1416            close $in_fh;
1417            return 1;
1418        }
1419
1420        # If couldn't open file, perhaps it was transitory, and hence not an error
1421        return 0 unless -e $filename;
1422
1423        die "Can't open '$filename': $!\n";
1424    }
1425}
1426
1427my %filename_to_checker; # Map a filename to its pod checker object
1428my %id_to_checker;       # Map a checksum to its pod checker object
1429my %nodes;               # key is filename, values are nodes in that file.
1430my %nodes_first_word;    # same, but value is first word of each node
1431my %valid_modules;       # List of modules known to exist outside us.
1432my %digests;             # checksums of files, whose names are the keys
1433my %filename_to_pod;     # Map a filename to its pod NAME
1434my %files_with_unknown_issues;
1435my %files_with_fixes;
1436
1437my $data_fh;
1438open $data_fh, '<:bytes', $known_issues or die "Can't open $known_issues";
1439
1440my %counts; # For --counts param, count of each issue type
1441my %suppressed_files;   # Files with at least one issue type to suppress
1442my $HEADER = <<END;
1443# This file is the data file for $0.
1444# There are three types of lines.
1445# Comment lines are white-space only or begin with a '#', like this one.  Any
1446#   changes you make to the comment lines will be lost when the file is
1447#   regen'd.
1448# Lines without tab characters are simply NAMES of pods that the program knows
1449#   will have links to them and the program does not check if those links are
1450#   valid.
1451# All other lines should have three fields, each separated by a tab.  The
1452#   first field is the name of a pod; the second field is an error message
1453#   generated by this program; and the third field is a count of how many
1454#   known instances of that message there are in the pod.  -1 means that the
1455#   program can expect any number of this type of message.
1456END
1457
1458my @existing_issues;
1459
1460
1461while (<$data_fh>) {    # Read the database
1462    chomp;
1463    next if /^\s*(?:#|$)/;          # Skip comment and empty lines
1464    next if /^ [ < = > ]{7} /xx;    # Skip version control conflict markers
1465    if (/\t/) {
1466        if ($add_link) {    # The issues are saved and later output unchanged
1467            push @existing_issues, $_;
1468            next;
1469        }
1470
1471        # Keep track of counts of each issue type for each file
1472        my ($filename, $message, $count) = split /\t/;
1473
1474        # The way things aren't shown is to see if the count of the number of
1475        # warnings of a given type has changed.  To show all, we pretend there
1476        # weren't any already stored.  If the stored value is negative, it
1477        # means counting for this warning in this file is disabled, and hence
1478        # won't change.  To skip showing those files under --show-all, we
1479        # retain the negatvie value.  To show all occurrences of other
1480        # warnings, we skip setting their count, making them appear to have
1481        # had zero occurrences.
1482        next if $show_all && $count > 0;
1483
1484        $known_problems{$filename}{$message} = $count;
1485
1486        if ($show_counts) {
1487            if ($count < 0) {   # -1 means to suppress this issue type
1488                $suppressed_files{$filename} = $filename;
1489            }
1490            else {
1491                $counts{$message} += $count;
1492            }
1493        }
1494    }
1495    else {  # Lines without a tab are modules known to be valid
1496        $valid_modules{$_} = 1
1497    }
1498}
1499close $data_fh;
1500
1501if ($add_link) {
1502    $copy_fh = open_new($known_issues);
1503
1504    # Check for basic sanity, and add each command line argument
1505    foreach my $module (@files) {
1506        die "\"$module\" does not look like a module or man page"
1507            # Must look like (A or A::B or A::B::C ..., or foo(3C)
1508            if $module !~ /^ (?: \w+ (?: :: \w+ )* | \w+ \( \d \w* \) ) $/x;
1509        $valid_modules{$module} = 1
1510    }
1511    my_safer_print($copy_fh, $HEADER);
1512    foreach (sort { lc $a cmp lc $b } keys %valid_modules) {
1513        my_safer_print($copy_fh, $_, "\n");
1514    }
1515
1516    # The rest of the db file is output unchanged.
1517    my_safer_print($copy_fh, join "\n", @existing_issues, "");
1518
1519    close_and_rename($copy_fh);
1520    exit;
1521}
1522
1523if ($show_counts) {
1524    my $total = 0;
1525    foreach my $message (sort keys %counts) {
1526        $total += $counts{$message};
1527        note(Text::Tabs::expand("$counts{$message}\t$message"));
1528    }
1529    note("-----\n" . Text::Tabs::expand("$total\tknown potential issues"));
1530    if (%suppressed_files) {
1531        note("\nFiles that have all messages of at least one type suppressed:");
1532        note(join ", ", sort keys %suppressed_files);
1533    }
1534    exit 0;
1535}
1536
1537# re to match files that are to be parsed only if there is an internal link
1538# to them.  It does not include cpan, as whether those are parsed depends
1539# on a switch.  Currently, only perltoc and the stable perldelta.pod's
1540# are included.  The latter all have characters between 'perl' and
1541# 'delta'.  (Actually the currently developed one matches as well, but
1542# is a duplicate of perldelta.pod, so can be skipped, so fine for it to
1543# match this.
1544my $only_for_interior_links_re = qr/ ^ pod\/perltoc.pod $
1545                                   /x;
1546unless ($do_deltas) {
1547    $only_for_interior_links_re = qr/$only_for_interior_links_re |
1548                                    \b perl \d+ delta \. pod \b
1549                                /x;
1550}
1551
1552{ # Closure
1553    my $first_time = 1;
1554
1555    sub output_thanks ($$$$) {  # Called when an issue has been fixed
1556        my $filename = shift;
1557        my $original_count = shift;
1558        my $current_count = shift;
1559        my $message = shift;
1560
1561        $files_with_fixes{$filename} = 1;
1562        my $return;
1563        my $fixed_count = $original_count - $current_count;
1564        my $a_problem = ($fixed_count == 1) ? "a problem" : "multiple problems";
1565        my $another_problem = ($fixed_count == 1) ? "another problem" : "another set of problems";
1566        my $diff;
1567        if ($message) {
1568            $diff = <<EOF;
1569There were $original_count occurrences (now $current_count) in this pod of type
1570"$message",
1571EOF
1572        } else {
1573            $diff = <<EOF;
1574There are no longer any problems found in this pod!
1575EOF
1576        }
1577
1578        if ($first_time) {
1579            $first_time = 0;
1580            $return = <<EOF;
1581Thanks for fixing $a_problem!
1582$diff
1583Now you must teach $0 that this was fixed.
1584EOF
1585        }
1586        else {
1587            $return = <<EOF
1588Thanks for fixing $another_problem.
1589$diff
1590EOF
1591        }
1592
1593        return $return;
1594    }
1595}
1596
1597sub my_safer_print {    # print, with error checking for outputting to db
1598    my ($fh, @lines) = @_;
1599
1600    if (! print $fh @lines) {
1601        my $save_error = $!;
1602        close($fh);
1603        die "Write failure: $save_error";
1604    }
1605}
1606
1607sub extract_pod {   # Extracts just the pod from a file; returns undef if file
1608                    # doesn't exist
1609    my $filename = shift;
1610
1611    if (open my $in_fh, '<:bytes', $filename) {
1612        use Pod::Simple::JustPod;
1613        my $parser = Pod::Simple::JustPod->new();
1614        $parser->no_errata_section(1);
1615        $parser->no_whining(1);
1616        $parser->source_filename($filename);
1617        my $output;
1618        $parser->output_string( \$output );
1619        $parser->parse_lines( <$in_fh>, undef );
1620        close $in_fh;
1621
1622        return $output;
1623    }
1624
1625    # The file should already have been opened once to get here, so if that
1626    # fails, something is wrong.  It's possible that a transitory file
1627    # containing a pod would get here, so if the file no longer exists just
1628    # return undef.
1629    return unless -e $filename;
1630    die "Can't open '$filename': $!\n";
1631}
1632
1633my $digest = Digest->new($digest_type);
1634
1635# This is used as a callback from File::Find::find(), which always constructs
1636# pathnames using Unix separators
1637sub is_pod_file {
1638    # If $_ is a pod file, add it to the lists and do other prep work.
1639
1640    if (-d) {
1641        # Don't look at files in directories that are for tests, nor those
1642        # beginning with a dot, nor those in the directory where Windows
1643        # builds generate HTML from other POD sources.
1644        if (m!/t\z! || m!/\.! || m!^./win32/html\z!) {
1645            $File::Find::prune = 1;
1646        }
1647        return;
1648    }
1649
1650    return unless -r && -s;    # Can't check it if can't read it; no need to
1651                               # check if 0 length
1652    return unless -f || -l;    # Weird file types won't be pods
1653
1654    my ($leaf) = m!([^/]+)\z!;
1655    if (m!/\.!                 # No hidden Unix files
1656        || $leaf =~ $non_pods) {
1657        note("Not considering $_") if DEBUG;
1658        return;
1659    }
1660
1661    my $filename = $File::Find::name;
1662
1663    # $filename is relative, like './path'.  Strip that initial part away.
1664    $filename =~ s!^\./!! or die 'Unexpected pathname "$filename"';
1665
1666    return if $excluded_files{canonicalize($filename)};
1667
1668    my $contents = do {
1669        local $/;
1670        my $candidate;
1671        if (! open $candidate, '<:bytes', $_) {
1672
1673            # If a transitory file was found earlier, the open could fail
1674            # legitimately and we just skip the file; also skip it if it is a
1675            # broken symbolic link, as it is probably just a build problem;
1676            # certainly not a file that we would want to check the pod of.
1677            # Otherwise fail it here and no reason to process it further.
1678            # (But the test count will be off too)
1679            ok(0, "Can't open '$filename': $!")
1680                                            if -r $filename && ! -l $filename;
1681            return;
1682        }
1683        <$candidate>;
1684    };
1685
1686    # If the file is a .pm or .pod, having any initial '=' on a line is
1687    # grounds for testing it.  Otherwise, require a head1 NAME line to
1688    # consider it as a potential pod
1689    if ($filename =~ /\.(?:pm|pod)/) {
1690        return unless $contents =~ /^=/m;
1691    } else {
1692        return unless $contents =~ /^=head1 +NAME/m;
1693    }
1694
1695    # Here, we know that the file is a pod.  Add it to the list of files
1696    # to check and create a checker object for it.
1697
1698    push @files, $filename;
1699    my $checker = My::Pod::Checker->new($filename);
1700    $filename_to_checker{$filename} = $checker;
1701
1702    # In order to detect duplicate pods and only analyze them once, we
1703    # compute checksums for the file, so don't have to do an exact
1704    # compare.  Note that if the pod is just part of the file, the
1705    # checksums can differ for the same pod.  That special case is handled
1706    # later, since if the checksums of the whole file are the same, that
1707    # case won't even come up.  We don't need the checksums for files that
1708    # we parse only if there is a link to its interior, but we do need its
1709    # NAME, which is also retrieved in the code below.
1710
1711    if ($filename =~ / (?: ^(cpan|lib|ext|dist)\/ )
1712                        | $only_for_interior_links_re
1713                    /x)
1714    {
1715        my $byte_contents = $contents;
1716        utf8::encode($byte_contents);
1717        $digest->add($byte_contents);   # Doesn't handle Unicode
1718        $digests{$filename} = $digest->digest;
1719
1720        # lib files aren't analyzed if they are duplicates of files copied
1721        # there from some other directory.  But to determine this, we need
1722        # to know their NAMEs.  We might as well find the NAME now while
1723        # the file is open.  Similarly, cpan files aren't analyzed unless
1724        # we're analyzing all of them, or this particular file is linked
1725        # to by a file we are analyzing, and thus we will want to verify
1726        # that the target exists in it.  We need to know at least the NAME
1727        # to see if it's worth analyzing, or so we can determine if a lib
1728        # file is a copy of a cpan one.
1729        if ($filename =~ m{ (?: ^ (?: cpan | lib ) / )
1730                            | $only_for_interior_links_re
1731                            }x) {
1732            if ($contents =~ /^=head1 +NAME.*/mg) {
1733                # The NAME is the first non-spaces on the line up to a
1734                # comma, dash or end of line.  Otherwise, it's invalid and
1735                # this pod doesn't have a legal name that we're smart
1736                # enough to find currently.  But the  parser will later
1737                # find it if it thinks there is a legal name, and set the
1738                # name
1739                if ($contents =~ /\G    # continue from the line after =head1
1740                                  \s*   # ignore any empty lines
1741
1742                                  # ignore =for paragraphs followed by empty
1743                                  # lines
1744                                  (?: ^ =for .*? \n (?: [^\s]*? \n )* \s* )*
1745
1746                                  ^ \s* ( \S+?) \s* (?: [,-] | $ )/mx) {
1747                    my $name = $1;
1748                    $checker->name($name);
1749                    $id_to_checker{$name} = $checker
1750                        if $filename =~ m{^cpan/};
1751                }
1752            }
1753            elsif ($filename =~ m{^cpan/}) {
1754                $id_to_checker{$digests{$filename}} = $checker;
1755            }
1756        }
1757    }
1758
1759    return;
1760} # End of is_pod_file()
1761
1762# Start of real code that isn't processing the command line (except the
1763# db is read in above, as is processing of the --add-link option).
1764# Here, @files contains list of files on the command line.  If have any of
1765# these, unconditionally test them, and show all the errors, even the known
1766# ones, and, since not testing other pods, don't do cross-pod link tests.
1767# (Could add extra code to do cross-pod tests for the ones in the list.)
1768
1769if ($has_input_files) {
1770    undef %known_problems;
1771    $do_upstream_cpan = $do_deltas = 1;  # In case one of the inputs is one
1772                                         # of these types
1773}
1774else { # No input files -- go find all the possibilities.
1775    if ($regen) {
1776        $copy_fh = open_new($known_issues);
1777        note("Regenerating $known_issues, please be patient...");
1778        print $copy_fh $HEADER;
1779    }
1780
1781    # Move to the directory above us, but have to adjust @INC to account for
1782    # that.
1783    s{^\.\./lib$}{lib} for @INC;
1784    chdir File::Spec->updir;
1785
1786    # And look in this directory and all its subdirectories
1787    find( {wanted => \&is_pod_file, no_chdir => 1}, '.');
1788
1789    # Add ourselves to the test
1790    push @files, "t/porting/podcheck.t";
1791}
1792
1793# Now we know how many tests there will be.
1794plan (tests => scalar @files) if ! $regen;
1795
1796
1797# Sort file names so we get consistent results, and to put cpan last,
1798# preceded by the ones that we don't generally parse.  This is because both
1799# these classes are generally parsed only if there is a link to the interior
1800# of them, and we have to parse all others first to guarantee that they don't
1801# have such a link. 'lib' files come just before these, as some of these are
1802# duplicates of others.  We already have figured this out when gathering the
1803# data as a special case for all such files, but this, while unnecessary,
1804# puts the derived file last in the output.  'readme' files come before those,
1805# as those also could be duplicates of others, which are considered the
1806# primary ones.  These currently aren't figured out when gathering data, so
1807# are done here.
1808@files = sort { if ($a =~ /^cpan/) {
1809                   return 1 if $b !~ /^cpan/;
1810                   return lc $a cmp lc $b;
1811               }
1812               elsif ($b =~ /^cpan/) {
1813                   return -1;
1814               }
1815               elsif ($a =~ /$only_for_interior_links_re/) {
1816                   return 1 if $b !~ /$only_for_interior_links_re/;
1817                   return lc $a cmp lc $b;
1818               }
1819               elsif ($b =~ /$only_for_interior_links_re/) {
1820                   return -1;
1821               }
1822               elsif ($a =~ /^lib/) {
1823                   return 1 if $b !~ /^lib/;
1824                   return lc $a cmp lc $b;
1825               }
1826               elsif ($b =~ /^lib/) {
1827                   return -1;
1828               } elsif ($a =~ /\breadme\b/i) {
1829                   return 1 if $b !~ /\breadme\b/i;
1830                   return lc $a cmp lc $b;
1831               }
1832               elsif ($b =~ /\breadme\b/i) {
1833                   return -1;
1834               }
1835               else {
1836                   return lc $a cmp lc $b;
1837               }
1838           }
1839           @files;
1840
1841# Now go through all the files and parse them
1842FILE:
1843foreach my $filename (@files) {
1844    my $parsed = 0;
1845    note("parsing $filename") if DEBUG;
1846
1847    # We may have already figured out some things in the process of generating
1848    # the file list.  If so, we have a $checker object already.  But if not,
1849    # generate one now.
1850    my $checker = $filename_to_checker{$filename};
1851    if (! $checker) {
1852        $checker = My::Pod::Checker->new($filename);
1853        $filename_to_checker{$filename} = $checker;
1854    }
1855
1856    # We have set the name in the checker object if there is a possibility
1857    # that no further parsing is necessary, but otherwise do the parsing now.
1858    if (! $checker->name) {
1859        if (! $checker->parse_from_file($filename, undef)) {
1860            $checker->set_skip("$filename is transitory");
1861            next FILE;
1862        }
1863        $parsed = 1;
1864    }
1865
1866    if ($checker->num_errors() < 0) {   # Returns negative if not a pod
1867        $checker->set_skip("$filename is not a pod");
1868    }
1869    else {
1870
1871        # Here, is a pod.  See if it is one that has already been tested,
1872        # or should be tested under another directory.  Use either its NAME
1873        # if it has one, or a checksum if not.
1874        my $name = $checker->name;
1875        my $id;
1876
1877        if ($name) {
1878            $id = $name;
1879        }
1880        else {
1881            my $digest = Digest->new($digest_type);
1882            my $contents = extract_pod($filename);
1883
1884            # If the return is undef, it means that $filename was a transitory
1885            # file; skip it.
1886            next FILE unless defined $contents;
1887            my $byte_contents = $contents;
1888            utf8::encode($byte_contents);
1889            $digest->add($byte_contents);   # Doesn't handle Unicode
1890            $id = $digest->digest;
1891        }
1892
1893        # If there is a match for this pod with something that we've already
1894        # processed, don't process it, and output why.
1895        my $prior_checker;
1896        if (defined ($prior_checker = $id_to_checker{$id})
1897            && $prior_checker != $checker)  # Could have defined the checker
1898                                            # earlier without pursuing it
1899        {
1900
1901            # If the pods are identical, then it's just a copy, and isn't an
1902            # error.  First use the checksums we have already computed to see
1903            # if the entire files are identical, which means that the pods are
1904            # identical too.
1905            my $prior_filename = $prior_checker->get_filename;
1906            my $same = (! $name
1907                        || ($digests{$prior_filename}
1908                            && $digests{$filename}
1909                            && $digests{$prior_filename} eq $digests{$filename}));
1910
1911            # If they differ, it could be that the files differ for some
1912            # reason, but the pods they contain are identical.  Extract the
1913            # pods and do the comparisons on just those.
1914            if (! $same && $name) {
1915                my $contents = extract_pod($filename);
1916
1917                # If return is <undef>, it means that $filename no longer
1918                # exists.  This means it was a transitory file, and should not
1919                # be tested.
1920                next FILE unless defined $contents;
1921
1922                my $prior_contents = extract_pod($prior_filename);
1923
1924                # If return is <undef>, it means that $prior_filename no
1925                # longer exists.  This means it was a transitory file, and
1926                # should not have been tested, but we already did process it.
1927                # What we should do now is to back-out its records, and
1928                # process $filename in its stead.  But backing out is not so
1929                # simple, and so I'm (khw) skipping that unless and until
1930                # experience shows that it is needed.  We do go process
1931                # $filename, and there are potential false positive conflicts
1932                # with the transitory $prior_contents, and rerunning the test
1933                # should cause it to succeed.
1934                goto process_this_pod unless defined $prior_contents;
1935
1936                $same = $prior_contents eq $contents;
1937            }
1938
1939            use File::Basename 'basename';
1940            if ($same) {
1941                $checker->set_skip("The pod of $filename is a duplicate of "
1942                                    . "the pod for $prior_filename");
1943            } elsif ($prior_filename =~ /\breadme\b/i) {
1944                $checker->set_skip("$prior_filename is a README apparently for $filename");
1945            } elsif ($filename =~ /\breadme\b/i) {
1946                $checker->set_skip("$filename is a README apparently for $prior_filename");
1947            } elsif (! $do_upstream_cpan
1948                     && $filename =~ /^cpan/
1949                     && $prior_filename =~ /^cpan/)
1950            {
1951                $checker->set_skip("CPAN is upstream for $filename");
1952            } elsif ( $filename =~ /^utils/ or $prior_filename =~ /^utils/ ) {
1953                $checker->set_skip("$filename copy is in utils/");
1954            } elsif ($prior_filename =~ /^(?:cpan|ext|dist)/
1955                     && $filename !~ /^(?:cpan|ext|dist)/
1956                     && basename($prior_filename) eq basename($filename))
1957            {
1958                $checker->set_skip("$filename: Need to run make?");
1959            } else { # Here have two pods with identical names that differ
1960                $prior_checker->poderror(
1961                        { -msg => $duplicate_name,
1962                            -line => "???",
1963                            parameter => "'$filename' also has NAME '$name'"
1964                        });
1965                $checker->poderror(
1966                    { -msg => $duplicate_name,
1967                        -line => "???",
1968                        parameter => "'$prior_filename' also has NAME '$name'"
1969                    });
1970
1971                # Changing the names helps later.
1972                $prior_checker->name("$name version arbitrarily numbered 1");
1973                $checker->name("$name version arbitrarily numbered 2");
1974            }
1975
1976            # In any event, don't process this pod that has the same name as
1977            # another.
1978            next FILE;
1979        }
1980
1981    process_this_pod:
1982
1983        # A unique pod.
1984        $id_to_checker{$id} = $checker;
1985
1986        my $parsed_for_links = ", but parsed for its interior links";
1987        if ((! $do_upstream_cpan && $filename =~ /^cpan/)
1988             || $filename =~ $only_for_interior_links_re)
1989        {
1990            if ($filename =~ /^cpan/) {
1991                $checker->set_skip("CPAN is upstream for $filename");
1992            }
1993            elsif ($filename =~ /perl\d+delta/) {
1994                if (! $do_deltas) {
1995                    $checker->set_skip("$filename is a stable perldelta");
1996                }
1997            }
1998            elsif ($filename =~ /perltoc/) {
1999                $checker->set_skip("$filename dependent on component pods");
2000            }
2001            else {
2002                croak("Unexpected file '$filename' encountered that has parsing for interior-linking only");
2003            }
2004
2005            if ($name && $has_referred_to_node{$name}) {
2006                $checker->set_skip($checker->get_skip() . $parsed_for_links);
2007            }
2008        }
2009
2010        # Need a name in order to process it, because not meaningful
2011        # otherwise, and also can't test links to this without a name.
2012        if (!defined $name) {
2013            $checker->poderror( { -msg => $no_name,
2014                                  -line => '???'
2015                                });
2016            next FILE;
2017        }
2018
2019        # For skipped files, just get its NAME
2020        my $skip;
2021        if (($skip = $checker->get_skip()) && $skip !~ /$parsed_for_links/)
2022        {
2023            $checker->node($name) if $name;
2024        }
2025        elsif (! $parsed) {
2026            if (! $checker->parse_from_file($filename, undef)) {
2027                $checker->set_skip("$filename is transitory");
2028                next FILE;
2029            }
2030        }
2031
2032        # Go through everything in the file that could be an anchor that
2033        # could be a link target.  Count how many there are of the same name.
2034        foreach my $node ($checker->linkable_nodes) {
2035            next FILE if ! $node;        # Can be empty is like '=item *'
2036            $nodes{$name}{$node}++;
2037
2038            # Experiments have shown that cpan search can figure out the
2039            # target of a link even if the exact wording is incorrect, as long
2040            # as the first word is.  This happens frequently in perlfunc.pod,
2041            # where the link will be just to the function, but the target
2042            # entry also includes parameters to the function.
2043            my $first_word = $node;
2044            if ($first_word =~ s/^(\S+)\s+\S.*/$1/) {
2045                $nodes_first_word{$name}{$first_word} = $node;
2046            }
2047        }
2048        $filename_to_pod{$filename} = $name;
2049    }
2050}
2051
2052# Here, all files have been parsed, and all links and link targets are stored.
2053# Now go through the files again and see which don't have matches.
2054if (! $has_input_files) {
2055    foreach my $filename (@files) {
2056        next if $filename_to_checker{$filename}->get_skip;
2057
2058        my $checker = $filename_to_checker{$filename};
2059        foreach my $link ($checker->hyperlinks()) {
2060            my $linked_to_page = $link->page;
2061            next unless $linked_to_page;   # intra-file checks are handled by std
2062                                           # Pod::Checker
2063            # Currently, we assume all external links are valid
2064            next if $link->type eq 'url';
2065
2066            # Initialize the potential message.
2067            my %problem = ( -msg => $broken_link,
2068                            -line => $link->line,
2069                            parameter => "to \"$linked_to_page\"",
2070                        );
2071
2072            # See if we have found the linked-to_file in our parse
2073            if (exists $nodes{$linked_to_page}) {
2074                my $node = $link->node;
2075
2076                # If link is only to the page-level, already have it
2077                next if ! $node;
2078
2079                # If link is to a node that exists in the file, is ok
2080                if ($nodes{$linked_to_page}{$node}) {
2081
2082                    # But if the page has multiple targets with the same name,
2083                    # it's ambiguous which one this should be to.
2084                    if ($nodes{$linked_to_page}{$node} > 1) {
2085                        $problem{-msg} = $multiple_targets;
2086                        $problem{parameter} = "in $linked_to_page that $node could be pointing to";
2087                        $checker->poderror(\%problem);
2088                    }
2089                } elsif (! $nodes_first_word{$linked_to_page}{$node}) {
2090
2091                    # Here the link target was not found, either exactly or to
2092                    # the first word.  Is an error.
2093                    $problem{parameter} =~ s,"$,/$node",;
2094                    $checker->poderror(\%problem);
2095                }
2096
2097            } # Linked-to-file not in parse; maybe is in exception list
2098            elsif (! exists $valid_modules{$link->page}) {
2099
2100                # Here, is a link to a target that we can't find.  Check if
2101                # there is an internal link on the page with the target name.
2102                # If so, it could be that they just forgot the initial '/'
2103                # But perldelta is handled specially: only do this if the
2104                # broken link isn't one of the known bad ones (that are
2105                # placemarkers and should be removed for the final)
2106                my $NAME = $filename_to_pod{$filename};
2107                if (! defined $NAME) {
2108                    $checker->poderror(\%problem);
2109                }
2110                else {
2111                    if ($nodes{$NAME}{$linked_to_page}) {
2112                        $problem{-msg} =  $broken_internal_link;
2113                    }
2114                    $checker->poderror(\%problem);
2115                }
2116            }
2117        }
2118    }
2119}
2120
2121# If regenerating the data file, start with the modules for which we don't
2122# check targets.  If you change the sort order, you need to run --regen before
2123# committing so that future commits that do run regen don't show irrelevant
2124# changes.
2125if ($regen) {
2126    foreach (sort { lc $a cmp lc $b } keys %valid_modules) {
2127        my_safer_print($copy_fh, $_, "\n");
2128    }
2129}
2130
2131# Now ready to output the messages.
2132foreach my $filename (@files) {
2133    my $canonical = canonicalize($filename);
2134    SKIP: {
2135        my $skip = $filename_to_checker{$filename}->get_skip // "";
2136
2137        if ($regen) {
2138            foreach my $message ( sort keys %{$problems{$filename}}) {
2139                my $count;
2140
2141                # Preserve a negative setting.
2142                if ($known_problems{$canonical}{$message}
2143                    && $known_problems{$canonical}{$message} < 0)
2144                {
2145                    $count = $known_problems{$canonical}{$message};
2146                }
2147                else {
2148                    $count = @{$problems{$filename}{$message}};
2149                }
2150                my_safer_print($copy_fh, $canonical . "\t$message\t$count\n");
2151            }
2152            next;
2153        }
2154
2155        skip($skip, 1) if $skip;
2156        my @diagnostics;
2157        my $thankful_diagnostics = 0;
2158        my $indent = '  ';
2159
2160        my $total_known = 0;
2161        foreach my $message ( sort keys %{$problems{$filename}}) {
2162            $known_problems{$canonical}{$message} = 0
2163                                    if ! $known_problems{$canonical}{$message};
2164            my $diagnostic = "";
2165            my $problem_count = scalar @{$problems{$filename}{$message}};
2166            $total_known += $problem_count;
2167            next if $known_problems{$canonical}{$message} < 0;
2168
2169            # If we have new problems not previously known, we output all of
2170            # such problems, as we can't know which are really new and which
2171            # not
2172            if ($problem_count > $known_problems{$canonical}{$message}) {
2173
2174                # Here we are about to output all the messages for this type,
2175                # subtract back this number we previously added in.
2176                $total_known -= $problem_count;
2177
2178                $diagnostic .= $indent . qq{"$message"};
2179                if ($problem_count > 2) {
2180                    $diagnostic .= "  ($problem_count occurrences,"
2181			. " expected $known_problems{$canonical}{$message})";
2182                }
2183                foreach my $problem (@{$problems{$filename}{$message}}) {
2184                    $diagnostic .= " " if $problem_count == 1;
2185                    $diagnostic .= "\n$indent$indent";
2186                    $diagnostic .= "$problem->{parameter}" if $problem->{parameter};
2187                    $diagnostic .= " near line $problem->{-line} of "
2188                                   . $filename;
2189                    $diagnostic .= " $problem->{comment}" if $problem->{comment};
2190                }
2191                $diagnostic .= "\n";
2192                $files_with_unknown_issues{$filename} = 1;
2193            } elsif ($problem_count < $known_problems{$canonical}{$message}) {
2194               $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, $problem_count, $message);
2195               $thankful_diagnostics++;
2196            }
2197            push @diagnostics, $diagnostic if $diagnostic;
2198        }
2199
2200        # The above loop has output messages where there are current potential
2201        # issues.  But it misses where there were some that have been entirely
2202        # fixed.  For those, we need to look through the old issues
2203        foreach my $message ( sort keys %{$known_problems{$canonical}}) {
2204            next if $problems{$filename}{$message};
2205            next if ! $known_problems{$canonical}{$message};
2206            next if $known_problems{$canonical}{$message} < 0; # Preserve negs
2207
2208            next if !$pedantic and $message =~
2209                /^(?:\Q$line_length\E|\Q$C_not_linked\E|\Q$C_with_slash\E)/;
2210
2211            my $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, 0, $message);
2212            push @diagnostics, $diagnostic if $diagnostic;
2213            $thankful_diagnostics++ if $diagnostic;
2214        }
2215
2216        my $output = "POD of $filename";
2217        $output .= ", excluding $total_known not shown known potential problems"
2218                                                                if $total_known;
2219        if (@diagnostics && @diagnostics == $thankful_diagnostics) {
2220            # Output fixed issues as passing to-do tests, so they do not
2221            # cause failures, but t/harness still flags them.
2222            $output .= " # TODO"
2223        }
2224        ok(@diagnostics == $thankful_diagnostics, $output);
2225        if (@diagnostics) {
2226            diag(join "", @diagnostics,
2227            "See end of this test output for your options on silencing this");
2228        }
2229
2230        delete $known_problems{$canonical};
2231    }
2232}
2233
2234if (! $regen
2235    && ! ok (keys %known_problems == 0, "The known problems database ($data_dir/known_pod_issues.dat) includes no references to non-existent files"))
2236{
2237    note("The following files were not found: "
2238         . join ", ", sort keys %known_problems);
2239    note("They will automatically be removed from the db the next time");
2240    note("  cd t; ./perl -I../lib porting/podcheck.t --regen");
2241    note("is run");
2242}
2243
2244my $how_to = <<EOF;
2245   run this test script by hand, using the following formula (on
2246   Un*x-like machines):
2247        cd t
2248        ./perl -I../lib porting/podcheck.t --regen
2249EOF
2250
2251if (%files_with_unknown_issues) {
2252    my $were_count_files = scalar keys %files_with_unknown_issues;
2253    $were_count_files = ($were_count_files == 1)
2254                        ? "was $were_count_files file"
2255                        : "were $were_count_files files";
2256    my $message = <<EOF;
2257
2258HOW TO GET ${\__FILE__} TO PASS
2259
2260There $were_count_files that had new potential problems identified.
2261Some of them may be real, and some of them may be false positives because
2262this program isn't as smart as it likes to think it is.  You can teach this
2263program to ignore the issues it has identified, and hence pass, by doing the
2264following:
2265
22661) If a problem is about a link to an unknown module or man page that
2267   you know exists, re-run the command something like:
2268      ./perl -I../lib porting/podcheck.t --add-link { MODULE | man_page ... }
2269   (MODULEs should look like Foo::Bar, and man_pages should look like
2270   bar(3c); don't do this for a module or man page that you aren't sure
2271   about; instead treat as another type of issue and follow the
2272   instructions below.)
2273
22742) For other issues, decide if each should be fixed now or not.  Fix the
2275   ones you decided to, and rerun this test to verify that the fixes
2276   worked.
2277
22783) If there remain false positive or problems that you don't plan to fix right
2279   now,
2280$how_to
2281   That should cause all current potential problems to be accepted by
2282   the program, so that the next time it runs, they won't be flagged.
2283EOF
2284    if (%files_with_fixes) {
2285        $message .= "   This step will also take care of the files that have fixes in them\n";
2286    }
2287
2288    $message .= <<EOF;
2289   For a few files, such as perltoc, certain issues will always be
2290   expected, and more of the same will be added over time.  For those,
2291   before you do the regen, you can edit
2292   $known_issues
2293   and find the entry for the module's file and specific error message,
2294   and change the count of known potential problems to -1.
2295EOF
2296
2297    diag($message);
2298} elsif (%files_with_fixes) {
2299    diag(<<EOF
2300To teach this test script that the potential problems have been fixed,
2301$how_to
2302EOF
2303    );
2304}
2305
2306if ($regen) {
2307    chdir $original_t_dir || die "Can't change directories to $original_t_dir";
2308    close_and_rename($copy_fh);
2309}
2310
23111;
2312