xref: /openbsd/gnu/usr.bin/perl/t/porting/podcheck.t (revision 5dea098c)
1#!/usr/bin/perl -w
2
3package main;
4
5BEGIN {
6    chdir 't' if -d 't';
7    @INC = "../lib";
8    # Do not require test.pl, this file has its own framework.
9}
10
11use strict;
12use warnings;
13use feature 'unicode_strings';
14no warnings 'experimental::builtin';
15use builtin 'refaddr';
16
17use Carp;
18use Config;
19use Digest;
20use File::Find;
21use File::Spec;
22use Text::Tabs;
23
24$| = 1;
25
26BEGIN {
27    if ( $Config{usecrosscompile} ) {
28        print "1..0 # Not all files are available during cross-compilation\n";
29        exit 0;
30    }
31    if ($^O eq 'dec_osf') {
32        print "1..0 # $^O cannot handle this test\n";
33        exit(0);
34    }
35    if ( $ENV{'PERL_BUILD_PACKAGING'} ) {
36        print "1..0 # This distro may have modified some files in cpan/. Skipping validation. \n";
37        exit 0;
38    }
39    require '../regen/regen_lib.pl';
40}
41
42sub DEBUG { 0 };
43
44=pod
45
46=head1 NAME
47
48podcheck.t - Look for possible problems in the Perl pods
49
50=head1 SYNOPSIS
51
52 cd t
53 ./perl -I../lib porting/podcheck.t [--show-all] [--cpan] [--deltas]
54                                    [--counts] [--pedantic] [FILE ...]
55
56 ./perl -I../lib porting/podcheck.t --add-link MODULE ...
57
58 ./perl -I../lib porting/podcheck.t --regen
59
60=head1 DESCRIPTION
61
62podcheck.t is an extension of Pod::Checker.  It looks for pod errors and
63potential errors in the files given as arguments, or if none specified, in all
64pods in the distribution workspace, except certain known special ones
65(specified below).  It does additional checking beyond that done by
66Pod::Checker, and keeps a database of known potential problems, and will
67fail a pod only if the number of such problems differs from that given in the
68database.
69
70The additional checks it always makes are:
71
72=over
73
74=item Cross-pod link checking
75
76Pod::Checker verifies that links to an internal target in a pod are not
77broken.  podcheck.t extends that (when called without FILE arguments) to
78external links.  It does this by gathering up all the possible targets in the
79workspace, and cross-checking them.  It also checks that a non-broken link
80points to just one target.  (The destination pod could have two targets with
81the same name.)
82
83The way that the C<LE<lt>E<gt>> pod command works (for links outside the pod)
84is to actually create a link to C<metacpan.org> with an embedded query for
85the desired pod or man page.  That means that links outside the distribution
86are valid.  podcheck.t doesn't verify the validity of such links, but instead
87keeps a database of those known to be valid.  This means that if a link to a
88target not on the list is created, the target needs to be added to the data
89base.  This is accomplished via the L<--add-link|/--add-link MODULE ...>
90option to podcheck.t, described below.
91
92=item An internal link that isn't so specified
93
94If a link is broken, but there is an existing internal target of the same
95name, it is likely that the internal target was meant, and the C<"/"> is
96missing from the C<LE<lt>E<gt>> pod command.
97
98=item Missing or duplicate NAME or missing NAME short description
99
100A pod can't be linked to unless it has a unique name.
101And a NAME should have a dash and short description after it.
102
103=item Occurrences of the Unicode replacement character
104
105L<Pod::Simple> replaces bytes that aren't valid according to the document's
106encoding (declared or auto-detected) with C<\N{REPLACEMENT CHARACTER}>.
107
108=back
109
110If the C<PERL_POD_PEDANTIC> environment variable is set or the C<--pedantic>
111command line argument is provided, then a few more checks are made.
112The pedantic checks are:
113
114=over
115
116=item Verbatim paragraphs that wrap in an 80 (including 2 spare) column window
117
118Pod that inappropriately wraps is less legible.  Pod formatters generally wrap
119correctly, except for too long verbatim lines.  We assume that any display
120window has at least the traditional 80 columns, and check for verbatim lines
121that won't fit in that space, including when using a pager that reserves 2
122columns for its own use.  (Thus the check is for a net of 78 columns.)
123For those lines that don't fit, it tells you how much needs to be cut in
124order to fit.
125
126Often, the easiest thing to do to gain space for these is to lower the indent
127to just one space.
128
129=item Items that perhaps should be links
130
131There are mentions of apparent files in the pods that perhaps should be links
132instead, using C<LE<lt>...E<gt>>
133
134=item Items that perhaps should be C<FE<lt>...E<gt>>
135
136What look like path names enclosed in C<CE<lt>...E<gt>> should perhaps have
137C<FE<lt>...E<gt>> mark-up instead.
138
139=back
140
141A number of issues raised by podcheck.t and by the base Pod::Checker are not
142really problems, but merely potential problems, that is, false positives.
143After inspecting them and
144deciding that they aren't real problems, it is possible to shut up this program
145about them, unlike base Pod::Checker.  For a valid link to an outside module
146or man page, call podcheck.t with the C<--add-link> option to add it to the
147database of known links; for other causes, call podcheck.t with the C<--regen>
148option to regenerate the entire database.  This tells it that all existing
149issues are to not be mentioned again.
150
151C<--regen> isn't fool-proof.  The database merely keeps track of the number of these
152potential problems of each type for each pod.  If a new problem of a given
153type is introduced into the pod, podcheck.t will spit out all of them.  You
154then have to figure out which is the new one, and should it be changed or not.
155But doing it this way insulates the database from having to keep track of line
156numbers of problems, which may change, or the exact wording of each problem
157which might also change without affecting whether it is a problem or not.
158
159Also, if the count of potential problems of a given type for a pod decreases,
160the database must be regenerated so that it knows the new number.  The program
161gives instructions when this happens.
162
163Some pods will have varying numbers of problems of a given type.  This can
164be handled by manually editing the database file (see L</FILES>), and setting
165the number of those problems for that pod to a negative number.  This will
166cause the corresponding error to always be suppressed no matter how many there
167actually are.
168
169Another problem is that there is currently no check that modules listed as
170valid in the database
171actually are.  Thus any errors introduced there will remain there.
172
173=head2 Specially handled pods
174
175=over
176
177=item perltoc
178
179This pod is generated by pasting bits from other pods.  Errors in those bits
180will show up as errors here, as well as for those other pods.  Therefore
181errors here are suppressed, and the pod is checked only to verify that nodes
182within it actually exist that are externally linked to.
183
184=item perldelta
185
186The current perldelta pod is initialized from a template that contains
187placeholder text.  Some of this text is in the form of links that don't really
188exist.  Any such links that are listed in C<@perldelta_ignore_links> will not
189generate messages.  It is presumed that these links will be cleaned up when
190the perldelta is cleaned up for release since they should be marked with
191C<XXX>.
192
193=item Porting/perldelta_template.pod
194
195This is not a pod, but a template for C<perldelta>.  Any errors introduced
196by it will show up when C<perldelta> is created from it.
197
198=item cpan-upstream pods
199
200See the L</--cpan> option documentation
201
202=item old perldeltas
203
204See the L</--deltas> option documentation
205
206=back
207
208=head1 OPTIONS
209
210=over
211
212=item --add-link MODULE ...
213
214Use this option to teach podcheck.t that the C<MODULE>s or man pages actually
215exist, and to silence any messages that links to them are broken.
216
217podcheck.t checks that links within the Perl core distribution are valid, but
218it doesn't check links to man pages or external modules.  When it finds
219a broken link, it checks its database of external modules and man pages,
220and only if not found there does it raise a message.  This option just adds
221the list of modules and man page references that follow it on the command line
222to that database.
223
224For example,
225
226    cd t
227    ./perl -I../lib porting/podcheck.t --add-link Unicode::Casing
228
229causes the external module "Unicode::Casing" to be added to the database, so
230C<LE<lt>Unicode::CasingE<gt>> will be considered valid.
231
232=item --regen
233
234Regenerate the database used by podcheck.t to include all the existing
235potential problems.  Future runs of the program will not then flag any of
236these.  Setting this option also sets C<--pedantic>.
237
238=item --cpan
239
240Normally, all pods in the cpan directory are skipped, except to make sure that
241any blead-upstream links to such pods are valid.
242This option will cause cpan upstream pods to be fully checked.
243
244=item --deltas
245
246Normally, all old perldelta pods are skipped, except to make sure that
247any links to such pods are valid.  This is because they are considered
248stable, and perhaps trying to fix them will cause changes that will
249misrepresent Perl's history.  But, this option will cause them to be fully
250checked.
251
252=item --show-all
253
254Normally, if the number of potential problems of a given type found for a
255pod matches the expected value in the database, they will not be displayed.
256This option forces the database to be generally ignored during the run, so all
257potential problems are displayed and will fail their respective pod test.
258If, however, the database indicates that a particular problem type for a
259particular file is to be skipped, this option doesn't override that unless
260that particular file is passed specifically as one of the FILE parameters on
261the command line.  And, passing particular FILEs selects this option in
262general.
263
264=item --counts
265
266Instead of testing, this just dumps the counts of the occurrences of the
267various types of potential problems in the database.
268
269=item --pedantic
270
271There are three potential problems that are not checked for by default.
272This options enables them. The environment variable C<PERL_POD_PEDANTIC>
273can be set to 1 to enable this option also.
274This option is set when C<--regen> is used.
275
276=back
277
278=head1 FILES
279
280The database is stored in F<t/porting/known_pod_issues.dat>
281
282=head1 SEE ALSO
283
284L<Pod::Checker>
285
286=cut
287
288# VMS builds have a '.com' appended to utility and script names, and it adds a
289# trailing dot for any other file name that doesn't have a dot in it.  The db
290# is stored without those things.  This regex allows for these special file
291# names to be dealt with.  It needs to be interpolated into a larger regex
292# that furnishes the closing boundary.
293my $vms_re = qr/ \. (?: com )? /x;
294
295# Some filenames in the MANIFEST match $vms_re, and so must not be handled the
296# same way that the special vms ones are.  This hash lists those.
297my %special_vms_files;
298
299# This is to get this to work across multiple file systems, including those
300# that are not case sensitive.  The db is stored in lower case, Un*x style,
301# and all file name comparisons are done that way.
302sub canonicalize($) {
303    my $input = shift;
304    my ($volume, $directories, $file)
305                    = File::Spec->splitpath(File::Spec->canonpath($input));
306    # Assumes $volume is constant for everything in this directory structure
307    $directories = "" if ! $directories;
308    $file = "" if ! $file;
309    $file = lc join '/', File::Spec->splitdir($directories), $file;
310    $file =~ s! / /+ !/!gx;       # Multiple slashes => single slash
311
312    # The db is stored without the special suffixes that are there in VMS, so
313    # strip them off to get the comparable name.  But some files on all
314    # platforms have these suffixes, so this shouldn't happen for them, as any
315    # of their db entries will have the suffixes in them.  The hash has been
316    # populated with these files.
317    if ($^O eq 'VMS'
318        && $file =~ / ( $vms_re ) $ /x
319        && ! exists $special_vms_files{$file})
320    {
321        $file =~ s/ $1 $ //x;
322    }
323    return $file;
324}
325
326#####################################################
327# HOW IT WORKS (in general)
328#
329# If not called with specific files to check, the directory structure is
330# examined for files that have pods in them.  Files that might not have to be
331# fully parsed (e.g. in cpan) are parsed enough at this time to find their
332# pod's NAME, and to get a checksum.
333#
334# Those kinds of files are sorted last, but otherwise the pods are parsed with
335# the package coded here, My::Pod::Checker, which is an extension to
336# Pod::Checker that adds some tests and suppresses others that aren't
337# appropriate.  The latter module has no provision for capturing diagnostics,
338# so a package, Tie_Array_to_FH, is used to force them to be placed into an
339# array instead of printed.
340#
341# Parsing the files builds up a list of links.  The files are gone through
342# again, doing cross-link checking and outputting all saved-up problems with
343# each pod.
344#
345# Sorting the files last that potentially don't need to be fully parsed allows
346# us to not parse them unless there is a link to an internal anchor in them
347# from something that we have already parsed.  Keeping checksums allows us to
348# not parse copies of other pods.
349#
350#####################################################
351
352# 1 => Exclude low priority messages that aren't likely to be problems, and
353# has many false positives; higher numbers give more messages.
354my $Warnings_Level = 200;
355
356# perldelta during construction may have place holder links.  N.B.  This
357# variable is referred to by name in release_managers_guide.pod
358our @perldelta_ignore_links = ( "XXX", "perl5YYYdelta", "perldiag/message" );
359
360# To see if two pods with the same NAME are actually copies of the same pod,
361# which is not an error, it uses a checksum to save work.
362my $digest_type = "SHA-1";
363
364my $original_t_dir = File::Spec->rel2abs(File::Spec->curdir);
365my $data_dir = File::Spec->catdir($original_t_dir, 'porting');
366my $known_issues = File::Spec->catfile($data_dir, 'known_pod_issues.dat');
367my $MANIFEST = File::Spec->catfile(File::Spec->updir($original_t_dir), 'MANIFEST');
368my $copy_fh;
369
370my $MAX_LINE_LENGTH = 78;   # 78 columns
371my $INDENT = 4;             # Lines other than '=head' lines are indented at
372                            # least this much
373
374# Our warning messages.  Better not have [('"] in them, as those are used as
375# delimiters for variable parts of the messages by poderror.
376my $broken_link = "Apparent broken link";
377my $broken_internal_link = "Apparent internal link is missing its forward slash";
378my $multiple_targets = "There is more than one target";
379my $duplicate_name = "Pod NAME already used";
380my $no_name = "There is no NAME";
381my $missing_name_description = "The NAME should have a dash and short description after it";
382my $replacement_character = "Unicode replacement character found";
383# the pedantic warnings messages
384my $line_length = "Verbatim line length including indents exceeds $MAX_LINE_LENGTH by";
385my $C_not_linked = "? Should you be using L<...> instead of";
386my $C_with_slash = "? Should you be using F<...> or maybe L<...> instead of";
387
388# objects, tests, etc can't be pods, so don't look for them. Also skip
389# files output by the patch program.  Could also ignore most of .gitignore
390# files, but not all, so don't.
391
392my $obj_ext = $Config{'obj_ext'}; $obj_ext =~ tr/.//d; # dot will be added back
393my $lib_ext = $Config{'lib_ext'}; $lib_ext =~ tr/.//d;
394my $lib_so  = $Config{'so'};      $lib_so  =~ tr/.//d;
395my $dl_ext  = $Config{'dlext'};   $dl_ext  =~ tr/.//d;
396
397# Not really pods, but can look like them.
398my %excluded_files = (
399                        canonicalize("lib/unicore/mktables") => 1,
400                        canonicalize("Porting/make-rmg-checklist") => 1,
401                        canonicalize("Porting/perldelta_template.pod") => 1,
402                        canonicalize("regen/feature.pl") => 1,
403                        canonicalize("regen/warnings.pl") => 1,
404                        canonicalize("autodoc.pl") => 1,
405                        canonicalize("configpm") => 1,
406                        canonicalize("miniperl") => 1,
407                        canonicalize("perl") => 1,
408                        canonicalize('cpan/Pod-Perldoc/corpus/no-head.pod') => 1,
409                        canonicalize('cpan/Pod-Perldoc/corpus/perlfunc.pod') => 1,
410                        canonicalize('cpan/Pod-Perldoc/corpus/utf8.pod') => 1,
411                        canonicalize("lib/unicore/mktables") => 1,
412                        canonicalize("dist/devel-ppport/parts/inc/ppphdoc") => 1,
413                    );
414
415# This list should not include anything for which case sensitivity is
416# important, as it won't work on VMS, and won't show up until tested on VMS.
417# All or almost all such files should be listed in the MANIFEST, so that can
418# be examined for them, and each such file explicitly excluded, as is done for
419# .PL files in the loop just below this.  For files not catchable this way,
420# is_pod_file() can be used to exclude these at a finer grained level.
421my $non_pods = qr/
422                (?: \. (?: [achot]  | zip | gz | bz2 | jar | tar | tgz
423                           | orig | rej | patch   # Patch program output
424                           | sw[op] | \#.*  # Editor droppings
425                           | old      # buildtoc output
426                           | xs       # pod should be in the .pm file
427                           | al       # autosplit files
428                           | bs       # bootstrap files
429                           | (?i:sh)  # shell scripts, hints, templates
430                           | lst      # assorted listing files
431                           | bat      # Windows,OS2 batch files
432                           | cmd      # Windows,OS2 command files
433                           | lis      # VMS compiler listings
434                           | map      # VMS linker maps
435                           | opt      # VMS linker options files
436                           | mms      # MM(K|S) description files
437                           | ts       # timestamp files generated during build
438                           | txt      # plain text
439                           | $obj_ext # object files
440                           | exe      # $Config{'exe_ext'} might be empty string
441                           | $lib_ext # object libraries
442                           | $lib_so  # shared libraries
443                           | $dl_ext  # dynamic libraries
444                           | gif      # GIF images (example files from CGI.pm)
445                           | eg       # examples from libnet
446                           | U        # metaconfig unit
447                           | core .*
448                       )
449                 $
450               ) | ~$
451                 | \ \(Autosaved\)\.txt$ # Other editor droppings
452                 | ^cxx\$demangler_db\.$ # VMS name mangler database
453                 | ^typemap\.?$          # typemap files
454                 | ^(?i:Makefile\.PL)$
455                 | ^core (?: $ | \. .* )
456                 | ^vgcore\.[1-9][0-9]*$
457                 | \b Changes \b
458             /x;
459
460# Matches something that looks like a file name, but is enclosed in C<...>
461my $C_path_re = qr{ ^
462                        # exclude various things that have slashes
463                        # in them but aren't paths
464                        (?!
465                            (?: (?: s | qr | m | tr | y ) / ) # regexes
466                            | \d+/\d+ \b       # probable fractions
467                            | (?: [LF] < )+
468                            | OS/2 \b
469                            | Perl/perl.git \b
470                            | Perl/perl5.git \b
471                            | Perl/Tk \b
472                            | origin/blead \b
473                            | origin/maint \b
474                        )
475                        /?  # Optional initial slash
476                        \w+ # First component of path, doesn't begin with
477                            # a minus
478                        (?: / [-\w]+ )+ # Subsequent path components
479                        (?: \. \w+ )?   # Optional trailing dot and suffix
480                        >*  # Any enclosed L< F< have matching closing >
481                        $
482                    }x;
483
484# '.PL' files should be excluded, as they aren't final pods, but often contain
485# material used in generating pods, and so can look like a pod.  We can't use
486# the regexp above because case sensitivity is important for these, as some
487# '.pl' files should be examined for pods.  Instead look through the MANIFEST
488# for .PL files and get their full path names, so we can exclude each such
489# file explicitly.  This works because other porting tests prohibit having two
490# files with the same names except for case.
491open my $manifest_fh, '<:bytes', $MANIFEST or die "Can't open $MANIFEST";
492while (<$manifest_fh>) {
493
494    # While we have MANIFEST open, on VMS platforms, look for files that match
495    # the magic VMS file names that have to be handled specially.  Add these
496    # to the list of them.
497    if ($^O eq 'VMS' && / ^ ( [^\t]* $vms_re ) \t /x) {
498        $special_vms_files{$1} = 1;
499    }
500    if (/ ^ ( [^\t]* \. PL ) \t /x) {
501        $excluded_files{canonicalize($1)} = 1;
502    }
503}
504close $manifest_fh, or die "Can't close $MANIFEST";
505
506
507# Pod::Checker messages to suppress
508my @suppressed_messages = (
509    # We catch independently the ones that are real problems.
510    qr/multiple occurrences \(\d+\) of link target/,
511
512    "unescaped <>",                 # Not every '<' or '>' need be escaped
513    qr/No items in =over/,          # i.e., a blockquote, which we consider legal
514);
515
516sub suppressed {
517    # Returns bool as to if input message is one that is to be suppressed
518
519    my $message = shift;
520
521    return grep { $message =~ /^$_/i } @suppressed_messages;
522}
523
524{   # Closure to contain a simple subset of test.pl.  This is to get rid of the
525    # unnecessary 'failed at' messages that would otherwise be output pointing
526    # to a particular line in this file.
527
528    my $current_test = 0;
529    my $planned;
530
531    sub plan {
532        my %plan = @_;
533        $planned = $plan{tests} + 1;    # +1 for final test that files haven't
534                                        # been removed
535        print "1..$planned\n";
536        return;
537    }
538
539    sub ok {
540        my $success = shift;
541        my $message = shift;
542
543        chomp $message;
544
545        $current_test++;
546        print "not " unless $success;
547        print "ok $current_test - $message\n";
548        return $success;
549    }
550
551    sub skip {
552        my $why = shift;
553        my $n    = @_ ? shift : 1;
554        for (1..$n) {
555            $current_test++;
556            print "ok $current_test # skip $why\n";
557        }
558        no warnings 'exiting';
559        last SKIP;
560    }
561
562    sub _note {
563        my ($andle, $message) = @_;
564
565        chomp $message;
566
567        print $andle $message =~ s/^/# /mgr;
568        print $andle "\n";
569        return;
570    }
571
572    sub note { unshift @_, \*STDOUT; goto &_note }
573
574    sub diag { unshift @_, \*STDERR; goto &_note }
575
576    END {
577        if ($planned && $planned != $current_test) {
578            print STDERR
579            "# Looks like you planned $planned tests but ran $current_test.\n";
580        }
581    }
582}
583
584# List of known potential problems by pod and type.
585my %known_problems;
586
587# Pods given by the keys contain an interior node that is referred to from
588# outside it.
589my %has_referred_to_node;
590
591my $show_counts = 0;
592my $regen = 0;
593my $add_link = 0;
594my $show_all = 0;
595my $pedantic = 0;
596
597my $do_upstream_cpan = 0; # Assume that are to skip anything in /cpan
598my $do_deltas = 0;        # And stable perldeltas
599
600while (@ARGV && substr($ARGV[0], 0, 1) eq '-') {
601    my $arg = shift @ARGV;
602
603    $arg =~ s/^--/-/; # Treat '--' the same as a single '-'
604    if ($arg eq '-regen') {
605        $regen = 1;
606        $pedantic = 1;
607    }
608    elsif ($arg =~ /^-add[-_]link$/) {
609        $add_link = 1;
610    }
611    elsif ($arg eq '-cpan') {
612        $do_upstream_cpan = 1;
613    }
614    elsif ($arg eq '-deltas') {
615        $do_deltas = 1;
616    }
617    elsif ($arg =~ /^-show[-_]all$/) {
618        $show_all = 1;
619    }
620    elsif ($arg eq '-counts') {
621        $show_counts = 1;
622    }
623    elsif ($arg eq '-pedantic') {
624        $pedantic = 1;
625    }
626    else {
627        die <<EOF;
628Unknown option '$arg'
629
630Usage: $0 [ --regen | --cpan | --show-all | FILE ... | --add-link MODULE ... ]\n"
631    --add-link -> Add the MODULE and man page references to the database
632    --regen    -> Regenerate the data file for $0
633    --cpan     -> Include files in the cpan subdirectory.
634    --deltas   -> Include stable perldeltas
635    --show-all -> Show all known potential problems
636    --counts   -> Don't test, but give summary counts of the currently
637                  existing database
638    --pedantic -> Check for overly long lines in verbatim blocks
639EOF
640    }
641}
642
643$pedantic = 1 if exists $ENV{PERL_POD_PEDANTIC} and $ENV{PERL_POD_PEDANTIC};
644my @files = @ARGV;
645
646my $cpan_or_deltas = $do_upstream_cpan || $do_deltas;
647if (($regen + $show_all + $show_counts + $add_link + $cpan_or_deltas ) > 1) {
648    croak "--regen, --show-all, --counts, and --add-link are mutually"
649        . " exclusive\n and none can be run with --cpan nor --deltas";
650}
651
652my $has_input_files = @files;
653
654
655if ($add_link) {
656    if (! $has_input_files) {
657        croak "--add-link requires at least one module or man page reference";
658    }
659}
660elsif ($has_input_files) {
661    if ($regen || $show_counts || $do_upstream_cpan || $do_deltas) {
662        croak "--regen, --counts, --deltas, and --cpan can't be used since using specific files";
663    }
664    foreach my $file (@files) {
665        croak "Can't read file '$file'" if ! -r $file;
666    }
667}
668
669our %problems;  # potential problems found in this run
670
671package My::Pod::Checker {      # Extend Pod::Checker
672    use parent 'Pod::Checker';
673
674    # Uses inside out hash to protect from typos
675    # For new fields, remember to add to destructor DESTROY()
676    my %CFL_text;           # The text comprising the current C<>, F<>, or L<>
677    my %C_text;             # If defined, are in a C<> section, and includes
678                            # the accumulated text from that
679    my %current_indent;     # Current line's indent
680    my %filename;           # The pod is stored in this file
681    my %in_CFL;             # count of stacked C<>, F<>, L<> directives
682    my %indents;            # Stack of indents from =over's in effect for
683                            # current line
684    my %in_for;             # true if in a =for or =begin
685    my %in_NAME;            # true if within NAME section
686    my %in_begin;           # true if within =begin section
687    my %in_X;               # true if in a X<>
688    my %linkable_item;      # Bool: if the latest =item is linkable.  It isn't
689                            # for bullet and number lists
690    my %linkable_nodes;     # Pod::Checker adds all =items to its node list,
691                            # but not all =items are linkable-to
692    my %running_CFL_text;   # The current text that is being accumulated until
693                            # an end_FOO is found, and this includes any C<>,
694                            # F<>, or L<> directives.
695    my %running_simple_text; # The currentt text that is being accumulated
696                            # until an end_FOO is found, and all directives
697                            # have been expanded into plain text
698    my %command_count;      # Number of commands seen
699    my %seen_pod_cmd;       # true if have =pod earlier
700    my %skip;               # is SKIP set for this pod
701    my %start_line;         # the first input line number in the thing
702                            # currently being worked on
703
704    sub DESTROY {
705        my $addr = refaddr $_[0];
706        delete $CFL_text{$addr};
707        delete $C_text{$addr};
708        delete $command_count{$addr};
709        delete $current_indent{$addr};
710        delete $filename{$addr};
711        delete $in_begin{$addr};
712        delete $in_CFL{$addr};
713        delete $indents{$addr};
714        delete $in_for{$addr};
715        delete $in_NAME{$addr};
716        delete $in_X{$addr};
717        delete $linkable_item{$addr};
718        delete $linkable_nodes{$addr};
719        delete $running_CFL_text{$addr};
720        delete $running_simple_text{$addr};
721        delete $seen_pod_cmd{$addr};
722        delete $skip{$addr};
723        delete $start_line{$addr};
724        return;
725    }
726
727    sub new {
728        my $class = shift;
729        my $filename = shift;
730
731        my $self = $class->SUPER::new(-quiet => 1,
732                                     -warnings => $Warnings_Level);
733        my $addr = refaddr $self;
734        $command_count{$addr} = 0;
735        $current_indent{$addr} = 0;
736        $filename{$addr} = $filename;
737        $in_begin{$addr} = 0;
738        $in_X{$addr} = 0;
739        $in_CFL{$addr} = 0;
740        $in_NAME{$addr} = 0;
741        $linkable_item{$addr} = 0;
742        $seen_pod_cmd{$addr} = 0;
743        return $self;
744    }
745
746    # re's for messages that Pod::Checker outputs
747    my $location = qr/ \b (?:in|at|on|near) \s+ /xi;
748    my $optional_location = qr/ (?: $location )? /xi;
749    my $line_reference = qr/ [('"]? $optional_location \b line \s+
750                             (?: \d+ | EOF | \Q???\E | - )
751                             [)'"]? /xi;
752
753    sub poderror {  # Called to register a potential problem
754
755        # This adds an extra field to the parent hash, 'parameter'.  It is
756        # used to extract the variable parts of a message leaving just the
757        # constant skeleton.  This in turn allows the message to be
758        # categorized better, so that it shows up as a single type in our
759        # database, with the specifics of each occurrence not being stored with
760        # it.
761
762        my $self = shift;
763        my $opts = shift;
764
765        my $addr = refaddr $self;
766        return if $skip{$addr};
767
768        # Input can be a string or hash.  If a string, parse it to separate
769        # out the line number and convert to a hash for easier further
770        # processing
771        my $message;
772        if (ref $opts ne 'HASH') {
773            $message = join "", $opts, @_;
774            my $line_number;
775            if ($message =~ s/\s*($line_reference)//) {
776                ($line_number = $1) =~ s/\s*$optional_location//;
777            }
778            else {
779                $line_number = '???';
780            }
781            $opts = { -msg => $message, -line => $line_number };
782        } else {
783            $message = $opts->{'-msg'};
784
785        }
786
787        $message =~ s/^\d+\s+//;
788        return if main::suppressed($message);
789
790        $self->SUPER::poderror($opts, @_);
791
792        $opts->{parameter} = "" unless $opts->{parameter};
793
794        # The variable parts of the message tend to be enclosed in '...',
795        # "....", or (...).  Extract them and put them in an extra field,
796        # 'parameter'.  This is trickier because the matching delimiter to a
797        # '(' is its mirror, and not itself.  Text::Balanced could be used
798        # instead.
799        while ($message =~ m/ \s* $optional_location ( [('"] )/xg) {
800            my $delimiter = $1;
801            my $start = $-[0];
802            $delimiter = ')' if $delimiter eq '(';
803
804            # If there is no ending delimiter, don't consider it to be a
805            # variable part.  Most likely it is a contraction like "Don't"
806            last unless $message =~ m/\G .+? \Q$delimiter/xg;
807
808            my $length = $+[0] - $start;
809
810            # Get the part up through the closing delimiter
811            my $special = substr($message, $start, $length);
812            $special =~ s/^\s+//;   # No leading whitespace
813
814            # And add that variable part to the parameter, while removing it
815            # from the message.  This isn't a foolproof way of finding the
816            # variable part.  For example '(s)' can occur in e.g.,
817            # 'paragraph(s)'
818            if ($special ne '(s)') {
819                substr($message, $start, $length) = "";
820                pos $message = $start;
821                $opts->{-msg} = $message;
822                $opts->{parameter} .= " " if $opts->{parameter};
823                $opts->{parameter} .= $special;
824            }
825        }
826
827        # Extract any additional line number given.  This is often the
828        # beginning location of something whereas the main line number gives
829        # the ending one.
830        if ($message =~ /( $line_reference )/xi) {
831            my $line_ref = $1;
832            while ($message =~ s/\s*\Q$line_ref//) {
833                $opts->{-msg} = $message;
834                $opts->{parameter} .= " " if $opts->{parameter};
835                $opts->{parameter} .= $line_ref;
836            }
837        }
838
839        Carp::carp("Couldn't extract line number from '$message'") if $message =~ /line \d+/;
840        push @{$problems{$filename{$addr}}{$message}}, $opts;
841        #push @{$problems{$self->get_filename}{$message}}, $opts;
842    }
843
844    # In the next subroutines, we keep track of the text of the current
845    # innermost thing, like F<fooC<bar>baz>.  The things we care about raising
846    # messages about in this program all come from a single sequence of
847    # characters uninterrupted by other pod commands.  Therefore we don't have
848    # to worry about recursion, and we can just set the string we care about
849    # to empty on entrance to each command.
850
851    sub handle_text {
852        # This is called by the parent class to deal with any straight text.
853        # We mostly just append this to the running current value which will
854        # be dealt with upon the end of the current construct, like a
855        # paragraph.  But certain things don't contribute to checking the pod
856        # and are ignored.  We also have set flags to indicate this text is
857        # going towards constructing certain constructs, and handle those
858        # specially.
859
860        my $self = shift;
861        my $addr = refaddr $self;
862
863        my $return = $self->SUPER::handle_text(@_);
864
865        if ($in_X{$addr} || $in_for{$addr}) { # ignore
866            return $return;
867        }
868
869        my $text = join "\n", @_;
870        $running_simple_text{$addr} .= $text;
871
872        # Keep separate tabs on C<>, F<>, and L<> directives, and one
873        # especially for C<> ones.
874        if ($in_CFL{$addr}) {
875            $CFL_text{$addr} .= $text;
876            $C_text{$addr} .= $text if defined $C_text{$addr};
877        }
878        else {
879            # This variable is updated instead in the corresponding C, F, or L
880            # handler.
881            $running_CFL_text{$addr} .= $text;
882        }
883
884        # do this line-by-line so we can get the right line number
885        my @lines = split /^/, $running_simple_text{$addr};
886        for my $i (0..$#lines) {
887            if ($lines[$i] =~ m/\N{REPLACEMENT CHARACTER}/) {
888                $self->poderror({ -line => $start_line{$addr} + $i,
889                    -msg => $replacement_character,
890                    parameter => "possibly invalid ". $self->encoding . " input at character " . pos $lines[$i],
891                });
892            }
893        }
894        return $return;
895    }
896
897    # The start_FOO routines check that somehow a C<> construct hasn't escaped
898    # without being checked, and initialize things, and call the parent
899    # class's equivalent routine.
900
901    # The end_FOO routines close things off, and check the text that has been
902    # accumulated for FOO, then call the parent's corresponding routine.
903
904    sub start_Para {
905        my $self = shift;
906        check_see_but_not_link($self);
907
908        my $addr = refaddr $self;
909        $start_line{$addr} = $_[0]->{start_line};
910        $running_CFL_text{$addr} = "";
911        $running_simple_text{$addr} = "";
912        return $self->SUPER::start_Para(@_);
913    }
914
915    sub start_item {
916        my $self = shift;
917        check_see_but_not_link($self);
918
919        my $addr = refaddr $self;
920        $start_line{$addr} = $_[0]->{start_line};
921        $running_CFL_text{$addr} = "";
922        $running_simple_text{$addr} = "";
923
924    }
925
926    sub start_item_text {
927        my $self = shift;
928        start_item($self);
929        my $addr = refaddr $self;
930
931        # This is the only =item that is linkable
932        $linkable_item{$addr} = 1;
933
934        return $self->SUPER::start_item_text(@_);
935    }
936
937    sub start_item_number {
938        my $self = shift;
939        start_item($self);
940
941        return $self->SUPER::start_item_number(@_);
942    }
943
944    sub start_item_bullet {
945        my $self = shift;
946        start_item($self);
947
948        return $self->SUPER::start_item_bullet(@_);
949    }
950
951    sub end_item {  # No difference in =item types endings
952        my $self = shift;
953        check_see_but_not_link($self);
954        return $self->SUPER::end_item(@_);
955    }
956
957    sub start_over {
958        my $self = shift;
959        check_see_but_not_link($self);
960
961        my $addr = refaddr $self;
962        $start_line{$addr} = $_[0]->{start_line};
963        $running_CFL_text{$addr} = "";
964        $running_simple_text{$addr} = "";
965
966        # Save this indent on a stack, and keep track of total indent
967        my $indent =  $_[0]{'indent'};
968        push @{$indents{$addr}}, $indent;
969        $current_indent{$addr} += $indent;
970
971        return $self->SUPER::start_over(@_);
972    }
973
974    sub end_over_bullet { shift->end_over(@_) }
975    sub end_over_number { shift->end_over(@_) }
976    sub end_over_text   { shift->end_over(@_) }
977    sub end_over_block  { shift->end_over(@_) }
978    sub end_over_empty  { shift->end_over(@_) }
979    sub end_over {
980        my $self = shift;
981        check_see_but_not_link($self);
982
983        my $addr = refaddr $self;
984
985        # Pop current indent
986        if (@{$indents{$addr}}) {
987            $current_indent{$addr} -= pop @{$indents{$addr}};
988        }
989        else {
990            # =back without corresponding =over, but should have
991            # warned already
992            $current_indent{$addr} = 0;
993        }
994    }
995
996    sub check_see_but_not_link {
997
998        # Looks through accumulated text for current element to see if it
999        # refers to something that should be linked to, but isn't.
1000
1001        my $self = shift;
1002        my $addr = refaddr $self;
1003
1004        return unless defined $running_CFL_text{$addr};
1005
1006        while ($running_CFL_text{$addr} =~ m{
1007                                ( (?: \w+ \s+ )* )  # The phrase before, if any
1008                                \b [Ss]ee \s+
1009                                ( ( [^L] )
1010                                  <
1011                                  ( [^<]*? )  # The not < excludes nested C<L<...
1012                                  >
1013                                )
1014                                ( \s+ (?: under | in ) \s+ L< )?
1015                            }xg)
1016        {
1017            my $prefix = $1 // "";
1018            my $construct = $2;     # The whole thing, like C<...>
1019            my $type = $3;
1020            my $interior = $4;
1021            my $trailing = $5;      # After the whole thing ending in "L<"
1022
1023            # If the full phrase is something like, "you might see C<", or
1024            # similar, it really isn't a reference to a link.  The ones I saw
1025            # all had the word "you" in them; and the "you" wasn't the
1026            # beginning of a sentence.
1027            if ($prefix !~ / \b you \b /x) {
1028
1029                # Now, find what the module or man page name within the
1030                # construct would be if it actually has L<> syntax.  If it
1031                # doesn't have that syntax, will set the module to the entire
1032                # interior.
1033                if (! defined $trailing # not referring to something in another
1034                                        # section
1035                    && $interior !~ /$non_pods/
1036
1037                    # There can't be spaces (I think) in module names or man
1038                    # pages
1039                    && $interior !~ / \s /x
1040
1041                    # F<> that end in eg \.pl are almost certainly ok, as are
1042                    # those that look like a path with multiple "/" chars
1043                    && ($type ne "F"
1044                        || (! -e $interior
1045                            && $interior !~ /\.\w+$/
1046                            && $interior !~ /\/.+\//)
1047                    )
1048                ) {
1049                    # TODO: move the checking of $pedantic higher up
1050                    $self->poderror({ -line => $start_line{$addr},
1051                        -msg => $C_not_linked,
1052                        parameter => $construct
1053                    });
1054                }
1055            }
1056        }
1057
1058        undef $running_CFL_text{$addr};
1059    }
1060
1061    sub end_Para {
1062        my $self = shift;
1063        check_see_but_not_link($self);
1064
1065        my $addr = refaddr $self;
1066        if ($in_NAME{$addr}) {
1067            if ($running_simple_text{$addr} =~ /^\s*(\S+?)\s*$/) {
1068                $self->poderror({ -line => $start_line{$addr},
1069                    -msg => $missing_name_description,
1070                    parameter => $1});
1071            }
1072            $in_NAME{$addr} = 0;
1073        }
1074        $self->SUPER::end_Para(@_);
1075    }
1076
1077    sub start_head1 {
1078        my $self = shift;
1079        check_see_but_not_link($self);
1080
1081        my $addr = refaddr $self;
1082        $start_line{$addr} = $_[0]->{start_line};
1083        $running_CFL_text{$addr} = "";
1084        $running_simple_text{$addr} = "";
1085
1086        return $self->SUPER::start_head1(@_);
1087    }
1088
1089    sub end_head1 {  # This is called at the end of the =head line.
1090        my $self = shift;
1091        check_see_but_not_link($self);
1092
1093        my $addr = refaddr $self;
1094
1095        $in_NAME{$addr} = 1 if $running_simple_text{$addr} eq 'NAME';
1096        return $self->SUPER::end_head(@_);
1097    }
1098
1099    sub start_Verbatim {
1100        my $self = shift;
1101        check_see_but_not_link($self);
1102
1103        my $addr = refaddr $self;
1104        $running_simple_text{$addr} = "";
1105        $start_line{$addr} = $_[0]->{start_line};
1106        return $self->SUPER::start_Verbatim(@_);
1107    }
1108
1109    sub end_Verbatim {
1110        my $self = shift;
1111        my $addr = refaddr $self;
1112
1113        # Pick up the name if it looks like one, since the parent class
1114        # doesn't handle verbatim NAMEs
1115        if ($in_NAME{$addr}
1116            && $running_simple_text{$addr} =~ /^\s*(\S+?)\s*[,-]/)
1117        {
1118            $self->name($1);
1119        }
1120
1121        my $indent = $self->get_current_indent;
1122
1123        # Look at each line to verify it is short enough
1124        my @lines = split /^/, $running_simple_text{$addr};
1125        for my $i (0 .. @lines - 1) {
1126            $lines[$i] =~ s/\s+$//;
1127            my $exceeds = length(Text::Tabs::expand($lines[$i]))
1128                        + $indent - $MAX_LINE_LENGTH;
1129            next unless $exceeds > 0;
1130
1131            $self->poderror({ -line => $start_line{$addr} + $i,
1132                -msg => $line_length,
1133                parameter => "+$exceeds (including " . ($indent - $INDENT) . " from =over's)",
1134            });
1135        }
1136
1137        undef $running_simple_text{$addr};
1138
1139        # Parent class didn't bother to define this
1140        #return $self->SUPER::SUPER::end_Verbatim(@_);
1141    }
1142
1143    sub start_C {
1144        my $self = shift;
1145        my $addr = refaddr $self;
1146
1147        $C_text{$addr} = "";
1148
1149        # If not in a stacked set of C<>, F<> and L<>, initialize the text for
1150        # them.
1151        $CFL_text{$addr} = "" if ! $in_CFL{$addr};
1152        $in_CFL{$addr}++;
1153
1154        return $self->SUPER::start_C(@_);
1155    }
1156
1157    sub start_F {
1158        my $self = shift;
1159        my $addr = refaddr $self;
1160
1161        $CFL_text{$addr} = "" if ! $in_CFL{$addr};
1162        $in_CFL{$addr}++;
1163        return $self->SUPER::start_F(@_);
1164    }
1165
1166    sub start_L {
1167        my $self = shift;
1168        my $addr = refaddr $self;
1169
1170        $CFL_text{$addr} = "" if ! $in_CFL{$addr};
1171        $in_CFL{$addr}++;
1172        return $self->SUPER::start_L(@_);
1173    }
1174
1175    sub end_C {
1176        my $self = shift;
1177        my $addr = refaddr $self;
1178
1179        # Warn if looks like a file or link enclosed instead by this C<>
1180        if ($C_text{$addr} =~ qr/^ $C_path_re $/x) {
1181            # Here it does look like it could be a file path or a link.
1182            # But some varieties of regex patterns could also fit with what we
1183            # have so far.  Weed those out as best we can.  '/foo/' is almost
1184            # certainly meant to be a pattern, as is '/foo/g'.
1185            my $is_pattern;
1186            if ($C_text{$addr} !~ qr| ^ / [^/]* / ( [msixpodualngcr]* ) $ |x) {
1187                $is_pattern = 0;
1188            }
1189            else {
1190
1191                # Here, it looks like a pattern potentially followed by some
1192                # modifiers.  To make doubly sure, don't count as patterns
1193                # those constructs which have more occurrences (generally 1)
1194                # of a modifier than is legal.
1195                my %counts;
1196                map { $counts{$_}++ } split "", $1;
1197                foreach my $modifier (keys %counts) {
1198                    if ($counts{$modifier} > (($modifier eq 'a')
1199                                              ? 2
1200                                              : 1))
1201                    {
1202                        $is_pattern = 0;
1203                        last;
1204                    }
1205                }
1206                $is_pattern = 1 unless defined $is_pattern;
1207            }
1208
1209            unless ($is_pattern) {
1210                $self->poderror({ -line => $start_line{$addr},
1211                    -msg => $C_with_slash,
1212                    parameter => "C<$C_text{$addr}>"
1213                });
1214            }
1215        }
1216        undef $C_text{$addr};
1217
1218        # Add the current text to the running total.  This was not done in
1219        # handle_text(), because it just sees the plain text of the innermost
1220        # stacked directive.  We want to keep all the directive names
1221        # enclosing the text.  Otherwise the fact that C<L<foobar>> is to a
1222        # link would be lost, as the L<> would be gone.
1223        $CFL_text{$addr} = "C<$CFL_text{$addr}>";
1224
1225        # Add this text to the whole running total only if popping this
1226        # directive off the stack leaves it empty.  As long as something is on
1227        # the stack, it gets added to $CFL_text (just above).  It is only
1228        # entirely constructed when the stack is empty.
1229        $in_CFL{$addr}--;
1230        $running_CFL_text{$addr} .= $CFL_text{$addr} if ! $in_CFL{$addr};
1231
1232        return $self->SUPER::end_C(@_);
1233    }
1234
1235    sub end_F {
1236        my $self = shift;
1237        my $addr = refaddr $self;
1238
1239        $CFL_text{$addr} = "F<$CFL_text{$addr}>";
1240        $in_CFL{$addr}--;
1241        $running_CFL_text{$addr} .= $CFL_text{$addr} if ! $in_CFL{$addr};
1242        return $self->SUPER::end_F(@_);
1243    }
1244
1245    sub end_L {
1246        my $self = shift;
1247        my $addr = refaddr $self;
1248
1249        $CFL_text{$addr} = "L<$CFL_text{$addr}>";
1250        $in_CFL{$addr}--;
1251        $running_CFL_text{$addr} .= $CFL_text{$addr} if ! $in_CFL{$addr};
1252        return $self->SUPER::end_L(@_);
1253    }
1254
1255    sub start_X {
1256        my $self = shift;
1257        my $addr = refaddr $self;
1258
1259        $in_X{$addr} = 1;
1260        return $self->SUPER::start_X(@_);
1261    }
1262
1263    sub end_X {
1264        my $self = shift;
1265        my $addr = refaddr $self;
1266
1267        $in_X{$addr} = 0;
1268        return $self->SUPER::end_X(@_);
1269    }
1270
1271    sub start_for {
1272        my $self = shift;
1273        my $addr = refaddr $self;
1274
1275        $in_for{$addr} = 1;
1276        return $self->SUPER::start_for(@_);
1277    }
1278
1279    sub end_for {
1280        my $self = shift;
1281        my $addr = refaddr $self;
1282
1283        $in_for{$addr} = 0;
1284        return $self->SUPER::end_for(@_);
1285    }
1286
1287    sub hyperlink {
1288        my ($self, $link) = @_;
1289
1290        if ($link && $link->type eq 'pod') {
1291            my $page = $link->page;
1292            my $node = $link->node;
1293
1294            # If the hyperlink is to an interior node of another page, save it
1295            # so that we can see if we need to parse normally skipped files.
1296            $has_referred_to_node{$page} = 1 if $node;
1297
1298            # Ignore certain placeholder links in perldelta.  Check if the
1299            # link is page-level, and also check if to a node within the page
1300            if (   $self->name && $self->name eq "perldelta"
1301                && ((  grep { $page eq $_ } @perldelta_ignore_links)
1302                    || (   $node
1303                        && (grep { "$page/$node" eq $_ } @perldelta_ignore_links)
1304            ))) {
1305                return;
1306            }
1307        }
1308
1309        return $self->SUPER::hyperlink($link);
1310    }
1311
1312    sub node {
1313        my $self = shift;
1314        my $text = $_[0];
1315        if($text) {
1316            $text =~ s/\s+$//s; # strip trailing whitespace
1317            $text =~ s/\s+/ /gs; # collapse whitespace
1318            my $addr = refaddr $self;
1319            push(@{$linkable_nodes{$addr}}, $text) if
1320                                    ! $current_indent{$addr}
1321                                    || $linkable_item{$addr};
1322        }
1323        return $self->SUPER::node($_[0]);
1324    }
1325
1326    sub get_current_indent {
1327        return $INDENT + $current_indent{refaddr $_[0]};
1328    }
1329
1330    sub get_filename {
1331        return $filename{refaddr $_[0]};
1332    }
1333
1334    sub linkable_nodes {
1335        my $linkables = $linkable_nodes{refaddr $_[0]};
1336        return undef unless $linkables;
1337        return @$linkables;
1338    }
1339
1340    sub get_skip {
1341        return $skip{refaddr $_[0]} // 0;
1342    }
1343
1344    sub set_skip {
1345        my $self = shift;
1346        $skip{refaddr $self} = shift;
1347
1348        # If skipping, no need to keep the problems for it
1349        delete $problems{$self->get_filename};
1350        return;
1351    }
1352
1353    sub parse_from_file {
1354        # This overrides the super class method so that if an open fails on a
1355        # transitory file, it doesn't croak.  It returns 1 if it did find the
1356        # file, 0 if it didn't
1357
1358        my $self = shift;
1359        my $filename = shift;
1360        # ignores 2nd param, which is output file.  Always uses undef
1361
1362        if (open my $in_fh, '<:bytes', $filename) {
1363            $self->SUPER::parse_from_file($in_fh, undef);
1364            close $in_fh;
1365            return 1;
1366        }
1367
1368        # If couldn't open file, perhaps it was transitory, and hence not an error
1369        return 0 unless -e $filename;
1370
1371        die "Can't open '$filename': $!\n";
1372    }
1373}
1374
1375my %filename_to_checker; # Map a filename to its pod checker object
1376my %id_to_checker;       # Map a checksum to its pod checker object
1377my %nodes;               # key is filename, values are nodes in that file.
1378my %nodes_first_word;    # same, but value is first word of each node
1379my %valid_modules;       # List of modules known to exist outside us.
1380my %digests;             # checksums of files, whose names are the keys
1381my %filename_to_pod;     # Map a filename to its pod NAME
1382my %files_with_unknown_issues;
1383my %files_with_fixes;
1384
1385my $data_fh;
1386open $data_fh, '<:bytes', $known_issues or die "Can't open $known_issues";
1387
1388my %counts; # For --counts param, count of each issue type
1389my %suppressed_files;   # Files with at least one issue type to suppress
1390my $HEADER = <<END;
1391# This file is the data file for $0.
1392# There are three types of lines.
1393# Comment lines are white-space only or begin with a '#', like this one.  Any
1394#   changes you make to the comment lines will be lost when the file is
1395#   regen'd.
1396# Lines without tab characters are simply NAMES of pods that the program knows
1397#   will have links to them and the program does not check if those links are
1398#   valid.
1399# All other lines should have three fields, each separated by a tab.  The
1400#   first field is the name of a pod; the second field is an error message
1401#   generated by this program; and the third field is a count of how many
1402#   known instances of that message there are in the pod.  -1 means that the
1403#   program can expect any number of this type of message.
1404END
1405
1406my @existing_issues;
1407
1408
1409while (<$data_fh>) {    # Read the database
1410    chomp;
1411    next if /^\s*(?:#|$)/;          # Skip comment and empty lines
1412    next if /^ [ < = > ]{7} /xx;    # Skip version control conflict markers
1413    if (/\t/) {
1414        if ($add_link) {    # The issues are saved and later output unchanged
1415            push @existing_issues, $_;
1416            next;
1417        }
1418
1419        # Keep track of counts of each issue type for each file
1420        my ($filename, $message, $count) = split /\t/;
1421
1422        # The way things aren't shown is to see if the count of the number of
1423        # warnings of a given type has changed.  To show all, we pretend there
1424        # weren't any already stored.  If the stored value is negative, it
1425        # means counting for this warning in this file is disabled, and hence
1426        # won't change.  To skip showing those files under --show-all, we
1427        # retain the negatvie value.  To show all occurrences of other
1428        # warnings, we skip setting their count, making them appear to have
1429        # had zero occurrences.
1430        next if $show_all && $count > 0;
1431
1432        $known_problems{$filename}{$message} = $count;
1433
1434        if ($show_counts) {
1435            if ($count < 0) {   # -1 means to suppress this issue type
1436                $suppressed_files{$filename} = $filename;
1437            }
1438            else {
1439                $counts{$message} += $count;
1440            }
1441        }
1442    }
1443    else {  # Lines without a tab are modules known to be valid
1444        $valid_modules{$_} = 1
1445    }
1446}
1447close $data_fh;
1448
1449if ($add_link) {
1450    $copy_fh = open_new($known_issues);
1451
1452    # Check for basic sanity, and add each command line argument
1453    foreach my $module (@files) {
1454        die "\"$module\" does not look like a module or man page"
1455            # Must look like (A or A::B or A::B::C ..., or foo(3C)
1456            if $module !~ /^ (?: \w+ (?: :: \w+ )* | \w+ \( \d \w* \) ) $/x;
1457        $valid_modules{$module} = 1
1458    }
1459    my_safer_print($copy_fh, $HEADER);
1460    foreach (sort { lc $a cmp lc $b } keys %valid_modules) {
1461        my_safer_print($copy_fh, $_, "\n");
1462    }
1463
1464    # The rest of the db file is output unchanged.
1465    my_safer_print($copy_fh, join "\n", @existing_issues, "");
1466
1467    close_and_rename($copy_fh);
1468    exit;
1469}
1470
1471if ($show_counts) {
1472    my $total = 0;
1473    foreach my $message (sort keys %counts) {
1474        $total += $counts{$message};
1475        note(Text::Tabs::expand("$counts{$message}\t$message"));
1476    }
1477    note("-----\n" . Text::Tabs::expand("$total\tknown potential issues"));
1478    if (%suppressed_files) {
1479        note("\nFiles that have all messages of at least one type suppressed:");
1480        note(join ", ", sort keys %suppressed_files);
1481    }
1482    exit 0;
1483}
1484
1485# re to match files that are to be parsed only if there is an internal link
1486# to them.  It does not include cpan, as whether those are parsed depends
1487# on a switch.  Currently, only perltoc and the stable perldelta.pod's
1488# are included.  The latter all have characters between 'perl' and
1489# 'delta'.  (Actually the currently developed one matches as well, but
1490# is a duplicate of perldelta.pod, so can be skipped, so fine for it to
1491# match this.
1492my $only_for_interior_links_re = qr/ ^ pod\/perltoc.pod $
1493                                   /x;
1494unless ($do_deltas) {
1495    $only_for_interior_links_re = qr/$only_for_interior_links_re |
1496                                    \b perl \d+ delta \. pod \b
1497                                /x;
1498}
1499
1500{ # Closure
1501    my $first_time = 1;
1502
1503    sub output_thanks ($$$$) {  # Called when an issue has been fixed
1504        my $filename = shift;
1505        my $original_count = shift;
1506        my $current_count = shift;
1507        my $message = shift;
1508
1509        $files_with_fixes{$filename} = 1;
1510        my $return;
1511        my $fixed_count = $original_count - $current_count;
1512        my $a_problem = ($fixed_count == 1) ? "a problem" : "multiple problems";
1513        my $another_problem = ($fixed_count == 1) ? "another problem" : "another set of problems";
1514        my $diff;
1515        if ($message) {
1516            $diff = <<EOF;
1517There were $original_count occurrences (now $current_count) in this pod of type
1518"$message",
1519EOF
1520        } else {
1521            $diff = <<EOF;
1522There are no longer any problems found in this pod!
1523EOF
1524        }
1525
1526        if ($first_time) {
1527            $first_time = 0;
1528            $return = <<EOF;
1529Thanks for fixing $a_problem!
1530$diff
1531Now you must teach $0 that this was fixed.
1532EOF
1533        }
1534        else {
1535            $return = <<EOF
1536Thanks for fixing $another_problem.
1537$diff
1538EOF
1539        }
1540
1541        return $return;
1542    }
1543}
1544
1545sub my_safer_print {    # print, with error checking for outputting to db
1546    my ($fh, @lines) = @_;
1547
1548    if (! print $fh @lines) {
1549        my $save_error = $!;
1550        close($fh);
1551        die "Write failure: $save_error";
1552    }
1553}
1554
1555sub extract_pod {   # Extracts just the pod from a file; returns undef if file
1556                    # doesn't exist
1557    my $filename = shift;
1558
1559    if (open my $in_fh, '<:bytes', $filename) {
1560        use Pod::Simple::JustPod;
1561        my $parser = Pod::Simple::JustPod->new();
1562        $parser->no_errata_section(1);
1563        $parser->no_whining(1);
1564        $parser->source_filename($filename);
1565        my $output;
1566        $parser->output_string( \$output );
1567        $parser->parse_lines( <$in_fh>, undef );
1568        close $in_fh;
1569
1570        return $output;
1571    }
1572
1573    # The file should already have been opened once to get here, so if that
1574    # fails, something is wrong.  It's possible that a transitory file
1575    # containing a pod would get here, so if the file no longer exists just
1576    # return undef.
1577    return unless -e $filename;
1578    die "Can't open '$filename': $!\n";
1579}
1580
1581my $digest = Digest->new($digest_type);
1582
1583# This is used as a callback from File::Find::find(), which always constructs
1584# pathnames using Unix separators
1585sub is_pod_file {
1586    # If $_ is a pod file, add it to the lists and do other prep work.
1587
1588    if (-d) {
1589        # Don't look at files in directories that are for tests, nor those
1590        # beginning with a dot
1591        if (m!/t\z! || m!/\.!) {
1592            $File::Find::prune = 1;
1593        }
1594        return;
1595    }
1596
1597    return unless -r && -s;    # Can't check it if can't read it; no need to
1598                               # check if 0 length
1599    return unless -f || -l;    # Weird file types won't be pods
1600
1601    my ($leaf) = m!([^/]+)\z!;
1602    if (m!/\.!                 # No hidden Unix files
1603        || $leaf =~ $non_pods) {
1604        note("Not considering $_") if DEBUG;
1605        return;
1606    }
1607
1608    my $filename = $File::Find::name;
1609
1610    # $filename is relative, like './path'.  Strip that initial part away.
1611    $filename =~ s!^\./!! or die 'Unexpected pathname "$filename"';
1612
1613    return if $excluded_files{canonicalize($filename)};
1614
1615    my $contents = do {
1616        local $/;
1617        my $candidate;
1618        if (! open $candidate, '<:bytes', $_) {
1619
1620            # If a transitory file was found earlier, the open could fail
1621            # legitimately and we just skip the file; also skip it if it is a
1622            # broken symbolic link, as it is probably just a build problem;
1623            # certainly not a file that we would want to check the pod of.
1624            # Otherwise fail it here and no reason to process it further.
1625            # (But the test count will be off too)
1626            ok(0, "Can't open '$filename': $!")
1627                                            if -r $filename && ! -l $filename;
1628            return;
1629        }
1630        <$candidate>;
1631    };
1632
1633    # If the file is a .pm or .pod, having any initial '=' on a line is
1634    # grounds for testing it.  Otherwise, require a head1 NAME line to
1635    # consider it as a potential pod
1636    if ($filename =~ /\.(?:pm|pod)/) {
1637        return unless $contents =~ /^=/m;
1638    } else {
1639        return unless $contents =~ /^=head1 +NAME/m;
1640    }
1641
1642    # Here, we know that the file is a pod.  Add it to the list of files
1643    # to check and create a checker object for it.
1644
1645    push @files, $filename;
1646    my $checker = My::Pod::Checker->new($filename);
1647    $filename_to_checker{$filename} = $checker;
1648
1649    # In order to detect duplicate pods and only analyze them once, we
1650    # compute checksums for the file, so don't have to do an exact
1651    # compare.  Note that if the pod is just part of the file, the
1652    # checksums can differ for the same pod.  That special case is handled
1653    # later, since if the checksums of the whole file are the same, that
1654    # case won't even come up.  We don't need the checksums for files that
1655    # we parse only if there is a link to its interior, but we do need its
1656    # NAME, which is also retrieved in the code below.
1657
1658    if ($filename =~ / (?: ^(cpan|lib|ext|dist)\/ )
1659                        | $only_for_interior_links_re
1660                    /x)
1661    {
1662        my $byte_contents = $contents;
1663        utf8::encode($byte_contents);
1664        $digest->add($byte_contents);   # Doesn't handle Unicode
1665        $digests{$filename} = $digest->digest;
1666
1667        # lib files aren't analyzed if they are duplicates of files copied
1668        # there from some other directory.  But to determine this, we need
1669        # to know their NAMEs.  We might as well find the NAME now while
1670        # the file is open.  Similarly, cpan files aren't analyzed unless
1671        # we're analyzing all of them, or this particular file is linked
1672        # to by a file we are analyzing, and thus we will want to verify
1673        # that the target exists in it.  We need to know at least the NAME
1674        # to see if it's worth analyzing, or so we can determine if a lib
1675        # file is a copy of a cpan one.
1676        if ($filename =~ m{ (?: ^ (?: cpan | lib ) / )
1677                            | $only_for_interior_links_re
1678                            }x) {
1679            if ($contents =~ /^=head1 +NAME.*/mg) {
1680                # The NAME is the first non-spaces on the line up to a
1681                # comma, dash or end of line.  Otherwise, it's invalid and
1682                # this pod doesn't have a legal name that we're smart
1683                # enough to find currently.  But the  parser will later
1684                # find it if it thinks there is a legal name, and set the
1685                # name
1686                if ($contents =~ /\G    # continue from the line after =head1
1687                                  \s*   # ignore any empty lines
1688
1689                                  # ignore =for paragraphs followed by empty
1690                                  # lines
1691                                  (?: ^ =for .*? \n (?: [^\s]*? \n )* \s* )*
1692
1693                                  ^ \s* ( \S+?) \s* (?: [,-] | $ )/mx) {
1694                    my $name = $1;
1695                    $checker->name($name);
1696                    $id_to_checker{$name} = $checker
1697                        if $filename =~ m{^cpan/};
1698                }
1699            }
1700            elsif ($filename =~ m{^cpan/}) {
1701                $id_to_checker{$digests{$filename}} = $checker;
1702            }
1703        }
1704    }
1705
1706    return;
1707} # End of is_pod_file()
1708
1709# Start of real code that isn't processing the command line (except the
1710# db is read in above, as is processing of the --add-link option).
1711# Here, @files contains list of files on the command line.  If have any of
1712# these, unconditionally test them, and show all the errors, even the known
1713# ones, and, since not testing other pods, don't do cross-pod link tests.
1714# (Could add extra code to do cross-pod tests for the ones in the list.)
1715
1716if ($has_input_files) {
1717    undef %known_problems;
1718    $do_upstream_cpan = $do_deltas = 1;  # In case one of the inputs is one
1719                                         # of these types
1720}
1721else { # No input files -- go find all the possibilities.
1722    if ($regen) {
1723        $copy_fh = open_new($known_issues);
1724        note("Regenerating $known_issues, please be patient...");
1725        print $copy_fh $HEADER;
1726    }
1727
1728    # Move to the directory above us, but have to adjust @INC to account for
1729    # that.
1730    s{^\.\./lib$}{lib} for @INC;
1731    chdir File::Spec->updir;
1732
1733    # And look in this directory and all its subdirectories
1734    find( {wanted => \&is_pod_file, no_chdir => 1}, '.');
1735
1736    # Add ourselves to the test
1737    push @files, "t/porting/podcheck.t";
1738}
1739
1740# Now we know how many tests there will be.
1741plan (tests => scalar @files) if ! $regen;
1742
1743
1744# Sort file names so we get consistent results, and to put cpan last,
1745# preceded by the ones that we don't generally parse.  This is because both
1746# these classes are generally parsed only if there is a link to the interior
1747# of them, and we have to parse all others first to guarantee that they don't
1748# have such a link. 'lib' files come just before these, as some of these are
1749# duplicates of others.  We already have figured this out when gathering the
1750# data as a special case for all such files, but this, while unnecessary,
1751# puts the derived file last in the output.  'readme' files come before those,
1752# as those also could be duplicates of others, which are considered the
1753# primary ones.  These currently aren't figured out when gathering data, so
1754# are done here.
1755@files = sort { if ($a =~ /^cpan/) {
1756                   return 1 if $b !~ /^cpan/;
1757                   return lc $a cmp lc $b;
1758               }
1759               elsif ($b =~ /^cpan/) {
1760                   return -1;
1761               }
1762               elsif ($a =~ /$only_for_interior_links_re/) {
1763                   return 1 if $b !~ /$only_for_interior_links_re/;
1764                   return lc $a cmp lc $b;
1765               }
1766               elsif ($b =~ /$only_for_interior_links_re/) {
1767                   return -1;
1768               }
1769               elsif ($a =~ /^lib/) {
1770                   return 1 if $b !~ /^lib/;
1771                   return lc $a cmp lc $b;
1772               }
1773               elsif ($b =~ /^lib/) {
1774                   return -1;
1775               } elsif ($a =~ /\breadme\b/i) {
1776                   return 1 if $b !~ /\breadme\b/i;
1777                   return lc $a cmp lc $b;
1778               }
1779               elsif ($b =~ /\breadme\b/i) {
1780                   return -1;
1781               }
1782               else {
1783                   return lc $a cmp lc $b;
1784               }
1785           }
1786           @files;
1787
1788# Now go through all the files and parse them
1789FILE:
1790foreach my $filename (@files) {
1791    my $parsed = 0;
1792    note("parsing $filename") if DEBUG;
1793
1794    # We may have already figured out some things in the process of generating
1795    # the file list.  If so, we have a $checker object already.  But if not,
1796    # generate one now.
1797    my $checker = $filename_to_checker{$filename};
1798    if (! $checker) {
1799        $checker = My::Pod::Checker->new($filename);
1800        $filename_to_checker{$filename} = $checker;
1801    }
1802
1803    # We have set the name in the checker object if there is a possibility
1804    # that no further parsing is necessary, but otherwise do the parsing now.
1805    if (! $checker->name) {
1806        if (! $checker->parse_from_file($filename, undef)) {
1807            $checker->set_skip("$filename is transitory");
1808            next FILE;
1809        }
1810        $parsed = 1;
1811    }
1812
1813    if ($checker->num_errors() < 0) {   # Returns negative if not a pod
1814        $checker->set_skip("$filename is not a pod");
1815    }
1816    else {
1817
1818        # Here, is a pod.  See if it is one that has already been tested,
1819        # or should be tested under another directory.  Use either its NAME
1820        # if it has one, or a checksum if not.
1821        my $name = $checker->name;
1822        my $id;
1823
1824        if ($name) {
1825            $id = $name;
1826        }
1827        else {
1828            my $digest = Digest->new($digest_type);
1829            my $contents = extract_pod($filename);
1830
1831            # If the return is undef, it means that $filename was a transitory
1832            # file; skip it.
1833            next FILE unless defined $contents;
1834            my $byte_contents = $contents;
1835            utf8::encode($byte_contents);
1836            $digest->add($byte_contents);   # Doesn't handle Unicode
1837            $id = $digest->digest;
1838        }
1839
1840        # If there is a match for this pod with something that we've already
1841        # processed, don't process it, and output why.
1842        my $prior_checker;
1843        if (defined ($prior_checker = $id_to_checker{$id})
1844            && $prior_checker != $checker)  # Could have defined the checker
1845                                            # earlier without pursuing it
1846        {
1847
1848            # If the pods are identical, then it's just a copy, and isn't an
1849            # error.  First use the checksums we have already computed to see
1850            # if the entire files are identical, which means that the pods are
1851            # identical too.
1852            my $prior_filename = $prior_checker->get_filename;
1853            my $same = (! $name
1854                        || ($digests{$prior_filename}
1855                            && $digests{$filename}
1856                            && $digests{$prior_filename} eq $digests{$filename}));
1857
1858            # If they differ, it could be that the files differ for some
1859            # reason, but the pods they contain are identical.  Extract the
1860            # pods and do the comparisons on just those.
1861            if (! $same && $name) {
1862                my $contents = extract_pod($filename);
1863
1864                # If return is <undef>, it means that $filename no longer
1865                # exists.  This means it was a transitory file, and should not
1866                # be tested.
1867                next FILE unless defined $contents;
1868
1869                my $prior_contents = extract_pod($prior_filename);
1870
1871                # If return is <undef>, it means that $prior_filename no
1872                # longer exists.  This means it was a transitory file, and
1873                # should not have been tested, but we already did process it.
1874                # What we should do now is to back-out its records, and
1875                # process $filename in its stead.  But backing out is not so
1876                # simple, and so I'm (khw) skipping that unless and until
1877                # experience shows that it is needed.  We do go process
1878                # $filename, and there are potential false positive conflicts
1879                # with the transitory $prior_contents, and rerunning the test
1880                # should cause it to succeed.
1881                goto process_this_pod unless defined $prior_contents;
1882
1883                $same = $prior_contents eq $contents;
1884            }
1885
1886            use File::Basename 'basename';
1887            if ($same) {
1888                $checker->set_skip("The pod of $filename is a duplicate of "
1889                                    . "the pod for $prior_filename");
1890            } elsif ($prior_filename =~ /\breadme\b/i) {
1891                $checker->set_skip("$prior_filename is a README apparently for $filename");
1892            } elsif ($filename =~ /\breadme\b/i) {
1893                $checker->set_skip("$filename is a README apparently for $prior_filename");
1894            } elsif (! $do_upstream_cpan
1895                     && $filename =~ /^cpan/
1896                     && $prior_filename =~ /^cpan/)
1897            {
1898                $checker->set_skip("CPAN is upstream for $filename");
1899            } elsif ( $filename =~ /^utils/ or $prior_filename =~ /^utils/ ) {
1900                $checker->set_skip("$filename copy is in utils/");
1901            } elsif ($prior_filename =~ /^(?:cpan|ext|dist)/
1902                     && $filename !~ /^(?:cpan|ext|dist)/
1903                     && basename($prior_filename) eq basename($filename))
1904            {
1905                $checker->set_skip("$filename: Need to run make?");
1906            } else { # Here have two pods with identical names that differ
1907                $prior_checker->poderror(
1908                        { -msg => $duplicate_name,
1909                            -line => "???",
1910                            parameter => "'$filename' also has NAME '$name'"
1911                        });
1912                $checker->poderror(
1913                    { -msg => $duplicate_name,
1914                        -line => "???",
1915                        parameter => "'$prior_filename' also has NAME '$name'"
1916                    });
1917
1918                # Changing the names helps later.
1919                $prior_checker->name("$name version arbitrarily numbered 1");
1920                $checker->name("$name version arbitrarily numbered 2");
1921            }
1922
1923            # In any event, don't process this pod that has the same name as
1924            # another.
1925            next FILE;
1926        }
1927
1928    process_this_pod:
1929
1930        # A unique pod.
1931        $id_to_checker{$id} = $checker;
1932
1933        my $parsed_for_links = ", but parsed for its interior links";
1934        if ((! $do_upstream_cpan && $filename =~ /^cpan/)
1935             || $filename =~ $only_for_interior_links_re)
1936        {
1937            if ($filename =~ /^cpan/) {
1938                $checker->set_skip("CPAN is upstream for $filename");
1939            }
1940            elsif ($filename =~ /perl\d+delta/) {
1941                if (! $do_deltas) {
1942                    $checker->set_skip("$filename is a stable perldelta");
1943                }
1944            }
1945            elsif ($filename =~ /perltoc/) {
1946                $checker->set_skip("$filename dependent on component pods");
1947            }
1948            else {
1949                croak("Unexpected file '$filename' encountered that has parsing for interior-linking only");
1950            }
1951
1952            if ($name && $has_referred_to_node{$name}) {
1953                $checker->set_skip($checker->get_skip() . $parsed_for_links);
1954            }
1955        }
1956
1957        # Need a name in order to process it, because not meaningful
1958        # otherwise, and also can't test links to this without a name.
1959        if (!defined $name) {
1960            $checker->poderror( { -msg => $no_name,
1961                                  -line => '???'
1962                                });
1963            next FILE;
1964        }
1965
1966        # For skipped files, just get its NAME
1967        my $skip;
1968        if (($skip = $checker->get_skip()) && $skip !~ /$parsed_for_links/)
1969        {
1970            $checker->node($name) if $name;
1971        }
1972        elsif (! $parsed) {
1973            if (! $checker->parse_from_file($filename, undef)) {
1974                $checker->set_skip("$filename is transitory");
1975                next FILE;
1976            }
1977        }
1978
1979        # Go through everything in the file that could be an anchor that
1980        # could be a link target.  Count how many there are of the same name.
1981        foreach my $node ($checker->linkable_nodes) {
1982            next FILE if ! $node;        # Can be empty is like '=item *'
1983            $nodes{$name}{$node}++;
1984
1985            # Experiments have shown that cpan search can figure out the
1986            # target of a link even if the exact wording is incorrect, as long
1987            # as the first word is.  This happens frequently in perlfunc.pod,
1988            # where the link will be just to the function, but the target
1989            # entry also includes parameters to the function.
1990            my $first_word = $node;
1991            if ($first_word =~ s/^(\S+)\s+\S.*/$1/) {
1992                $nodes_first_word{$name}{$first_word} = $node;
1993            }
1994        }
1995        $filename_to_pod{$filename} = $name;
1996    }
1997}
1998
1999# Here, all files have been parsed, and all links and link targets are stored.
2000# Now go through the files again and see which don't have matches.
2001if (! $has_input_files) {
2002    foreach my $filename (@files) {
2003        next if $filename_to_checker{$filename}->get_skip;
2004
2005        my $checker = $filename_to_checker{$filename};
2006        foreach my $link ($checker->hyperlinks()) {
2007            my $linked_to_page = $link->page;
2008            next unless $linked_to_page;   # intra-file checks are handled by std
2009                                           # Pod::Checker
2010            # Currently, we assume all external links are valid
2011            next if $link->type eq 'url';
2012
2013            # Initialize the potential message.
2014            my %problem = ( -msg => $broken_link,
2015                            -line => $link->line,
2016                            parameter => "to \"$linked_to_page\"",
2017                        );
2018
2019            # See if we have found the linked-to_file in our parse
2020            if (exists $nodes{$linked_to_page}) {
2021                my $node = $link->node;
2022
2023                # If link is only to the page-level, already have it
2024                next if ! $node;
2025
2026                # If link is to a node that exists in the file, is ok
2027                if ($nodes{$linked_to_page}{$node}) {
2028
2029                    # But if the page has multiple targets with the same name,
2030                    # it's ambiguous which one this should be to.
2031                    if ($nodes{$linked_to_page}{$node} > 1) {
2032                        $problem{-msg} = $multiple_targets;
2033                        $problem{parameter} = "in $linked_to_page that $node could be pointing to";
2034                        $checker->poderror(\%problem);
2035                    }
2036                } elsif (! $nodes_first_word{$linked_to_page}{$node}) {
2037
2038                    # Here the link target was not found, either exactly or to
2039                    # the first word.  Is an error.
2040                    $problem{parameter} =~ s,"$,/$node",;
2041                    $checker->poderror(\%problem);
2042                }
2043
2044            } # Linked-to-file not in parse; maybe is in exception list
2045            elsif (! exists $valid_modules{$link->page}) {
2046
2047                # Here, is a link to a target that we can't find.  Check if
2048                # there is an internal link on the page with the target name.
2049                # If so, it could be that they just forgot the initial '/'
2050                # But perldelta is handled specially: only do this if the
2051                # broken link isn't one of the known bad ones (that are
2052                # placemarkers and should be removed for the final)
2053                my $NAME = $filename_to_pod{$filename};
2054                if (! defined $NAME) {
2055                    $checker->poderror(\%problem);
2056                }
2057                else {
2058                    if ($nodes{$NAME}{$linked_to_page}) {
2059                        $problem{-msg} =  $broken_internal_link;
2060                    }
2061                    $checker->poderror(\%problem);
2062                }
2063            }
2064        }
2065    }
2066}
2067
2068# If regenerating the data file, start with the modules for which we don't
2069# check targets.  If you change the sort order, you need to run --regen before
2070# committing so that future commits that do run regen don't show irrelevant
2071# changes.
2072if ($regen) {
2073    foreach (sort { lc $a cmp lc $b } keys %valid_modules) {
2074        my_safer_print($copy_fh, $_, "\n");
2075    }
2076}
2077
2078# Now ready to output the messages.
2079foreach my $filename (@files) {
2080    my $canonical = canonicalize($filename);
2081    SKIP: {
2082        my $skip = $filename_to_checker{$filename}->get_skip // "";
2083
2084        if ($regen) {
2085            foreach my $message ( sort keys %{$problems{$filename}}) {
2086                my $count;
2087
2088                # Preserve a negative setting.
2089                if ($known_problems{$canonical}{$message}
2090                    && $known_problems{$canonical}{$message} < 0)
2091                {
2092                    $count = $known_problems{$canonical}{$message};
2093                }
2094                else {
2095                    $count = @{$problems{$filename}{$message}};
2096                }
2097                my_safer_print($copy_fh, $canonical . "\t$message\t$count\n");
2098            }
2099            next;
2100        }
2101
2102        skip($skip, 1) if $skip;
2103        my @diagnostics;
2104        my $thankful_diagnostics = 0;
2105        my $indent = '  ';
2106
2107        my $total_known = 0;
2108        foreach my $message ( sort keys %{$problems{$filename}}) {
2109            $known_problems{$canonical}{$message} = 0
2110                                    if ! $known_problems{$canonical}{$message};
2111            my $diagnostic = "";
2112            my $problem_count = scalar @{$problems{$filename}{$message}};
2113            $total_known += $problem_count;
2114            next if $known_problems{$canonical}{$message} < 0;
2115
2116            # If we have new problems not previously known, we output all of
2117            # such problems, as we can't know which are really new and which
2118            # not
2119            if ($problem_count > $known_problems{$canonical}{$message}) {
2120
2121                # Here we are about to output all the messages for this type,
2122                # subtract back this number we previously added in.
2123                $total_known -= $problem_count;
2124
2125                $diagnostic .= $indent . qq{"$message"};
2126                if ($problem_count > 2) {
2127                    $diagnostic .= "  ($problem_count occurrences,"
2128			. " expected $known_problems{$canonical}{$message})";
2129                }
2130                foreach my $problem (@{$problems{$filename}{$message}}) {
2131                    $diagnostic .= " " if $problem_count == 1;
2132                    $diagnostic .= "\n$indent$indent";
2133                    $diagnostic .= "$problem->{parameter}" if $problem->{parameter};
2134                    $diagnostic .= " near line $problem->{-line} of "
2135                                   . $filename;
2136                    $diagnostic .= " $problem->{comment}" if $problem->{comment};
2137                }
2138                $diagnostic .= "\n";
2139                $files_with_unknown_issues{$filename} = 1;
2140            } elsif ($problem_count < $known_problems{$canonical}{$message}) {
2141               $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, $problem_count, $message);
2142               $thankful_diagnostics++;
2143            }
2144            push @diagnostics, $diagnostic if $diagnostic;
2145        }
2146
2147        # The above loop has output messages where there are current potential
2148        # issues.  But it misses where there were some that have been entirely
2149        # fixed.  For those, we need to look through the old issues
2150        foreach my $message ( sort keys %{$known_problems{$canonical}}) {
2151            next if $problems{$filename}{$message};
2152            next if ! $known_problems{$canonical}{$message};
2153            next if $known_problems{$canonical}{$message} < 0; # Preserve negs
2154
2155            next if !$pedantic and $message =~
2156                /^(?:\Q$line_length\E|\Q$C_not_linked\E|\Q$C_with_slash\E)/;
2157
2158            my $diagnostic = output_thanks($filename, $known_problems{$canonical}{$message}, 0, $message);
2159            push @diagnostics, $diagnostic if $diagnostic;
2160            $thankful_diagnostics++ if $diagnostic;
2161        }
2162
2163        my $output = "POD of $filename";
2164        $output .= ", excluding $total_known not shown known potential problems"
2165                                                                if $total_known;
2166        if (@diagnostics && @diagnostics == $thankful_diagnostics) {
2167            # Output fixed issues as passing to-do tests, so they do not
2168            # cause failures, but t/harness still flags them.
2169            $output .= " # TODO"
2170        }
2171        ok(@diagnostics == $thankful_diagnostics, $output);
2172        if (@diagnostics) {
2173            diag(join "", @diagnostics,
2174            "See end of this test output for your options on silencing this");
2175        }
2176
2177        delete $known_problems{$canonical};
2178    }
2179}
2180
2181if (! $regen
2182    && ! ok (keys %known_problems == 0, "The known problems database ($data_dir/known_pod_issues.dat) includes no references to non-existent files"))
2183{
2184    note("The following files were not found: "
2185         . join ", ", sort keys %known_problems);
2186    note("They will automatically be removed from the db the next time");
2187    note("  cd t; ./perl -I../lib porting/podcheck.t --regen");
2188    note("is run");
2189}
2190
2191my $how_to = <<EOF;
2192   run this test script by hand, using the following formula (on
2193   Un*x-like machines):
2194        cd t
2195        ./perl -I../lib porting/podcheck.t --regen
2196EOF
2197
2198if (%files_with_unknown_issues) {
2199    my $were_count_files = scalar keys %files_with_unknown_issues;
2200    $were_count_files = ($were_count_files == 1)
2201                        ? "was $were_count_files file"
2202                        : "were $were_count_files files";
2203    my $message = <<EOF;
2204
2205HOW TO GET ${\__FILE__} TO PASS
2206
2207There $were_count_files that had new potential problems identified.
2208Some of them may be real, and some of them may be false positives because
2209this program isn't as smart as it likes to think it is.  You can teach this
2210program to ignore the issues it has identified, and hence pass, by doing the
2211following:
2212
22131) If a problem is about a link to an unknown module or man page that
2214   you know exists, re-run the command something like:
2215      ./perl -I../lib porting/podcheck.t --add-link MODULE man_page ...
2216   (MODULEs should look like Foo::Bar, and man_pages should look like
2217   bar(3c); don't do this for a module or man page that you aren't sure
2218   about; instead treat as another type of issue and follow the
2219   instructions below.)
2220
22212) For other issues, decide if each should be fixed now or not.  Fix the
2222   ones you decided to, and rerun this test to verify that the fixes
2223   worked.
2224
22253) If there remain false positive or problems that you don't plan to fix right
2226   now,
2227$how_to
2228   That should cause all current potential problems to be accepted by
2229   the program, so that the next time it runs, they won't be flagged.
2230EOF
2231    if (%files_with_fixes) {
2232        $message .= "   This step will also take care of the files that have fixes in them\n";
2233    }
2234
2235    $message .= <<EOF;
2236   For a few files, such as perltoc, certain issues will always be
2237   expected, and more of the same will be added over time.  For those,
2238   before you do the regen, you can edit
2239   $known_issues
2240   and find the entry for the module's file and specific error message,
2241   and change the count of known potential problems to -1.
2242EOF
2243
2244    diag($message);
2245} elsif (%files_with_fixes) {
2246    diag(<<EOF
2247To teach this test script that the potential problems have been fixed,
2248$how_to
2249EOF
2250    );
2251}
2252
2253if ($regen) {
2254    chdir $original_t_dir || die "Can't change directories to $original_t_dir";
2255    close_and_rename($copy_fh);
2256}
2257
22581;
2259