1=pod
2
3=head1 NAME
4
5Pod::Simple::XHTML -- format Pod as validating XHTML
6
7=head1 SYNOPSIS
8
9  use Pod::Simple::XHTML;
10
11  my $parser = Pod::Simple::XHTML->new();
12
13  ...
14
15  $parser->parse_file('path/to/file.pod');
16
17=head1 DESCRIPTION
18
19This class is a formatter that takes Pod and renders it as XHTML
20validating HTML.
21
22This is a subclass of L<Pod::Simple::Methody> and inherits all its
23methods. The implementation is entirely different than
24L<Pod::Simple::HTML>, but it largely preserves the same interface.
25
26=head2 Minimal code
27
28  use Pod::Simple::XHTML;
29  my $psx = Pod::Simple::XHTML->new;
30  $psx->output_string(\my $html);
31  $psx->parse_file('path/to/Module/Name.pm');
32  open my $out, '>', 'out.html' or die "Cannot open 'out.html': $!\n";
33  print $out $html;
34
35You can also control the character encoding and entities. For example, if
36you're sure that the POD is properly encoded (using the C<=encoding> command),
37you can prevent high-bit characters from being encoded as HTML entities and
38declare the output character set as UTF-8 before parsing, like so:
39
40  $psx->html_charset('UTF-8');
41  $psx->html_encode_chars(q{&<>'"});
42
43=cut
44
45package Pod::Simple::XHTML;
46use strict;
47use vars qw( $VERSION @ISA $HAS_HTML_ENTITIES );
48$VERSION = '3.35';
49use Pod::Simple::Methody ();
50@ISA = ('Pod::Simple::Methody');
51
52BEGIN {
53  $HAS_HTML_ENTITIES = eval "require HTML::Entities; 1";
54}
55
56my %entities = (
57  q{>} => 'gt',
58  q{<} => 'lt',
59  q{'} => '#39',
60  q{"} => 'quot',
61  q{&} => 'amp',
62);
63
64sub encode_entities {
65  my $self = shift;
66  my $ents = $self->html_encode_chars;
67  return HTML::Entities::encode_entities( $_[0], $ents ) if $HAS_HTML_ENTITIES;
68  if (defined $ents) {
69      $ents =~ s,(?<!\\)([]/]),\\$1,g;
70      $ents =~ s,(?<!\\)\\\z,\\\\,;
71  } else {
72      $ents = join '', keys %entities;
73  }
74  my $str = $_[0];
75  $str =~ s/([$ents])/'&' . ($entities{$1} || sprintf '#x%X', ord $1) . ';'/ge;
76  return $str;
77}
78
79#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
80
81=head1 METHODS
82
83Pod::Simple::XHTML offers a number of methods that modify the format of
84the HTML output. Call these after creating the parser object, but before
85the call to C<parse_file>:
86
87  my $parser = Pod::PseudoPod::HTML->new();
88  $parser->set_optional_param("value");
89  $parser->parse_file($file);
90
91=head2 perldoc_url_prefix
92
93In turning L<Foo::Bar> into http://whatever/Foo%3a%3aBar, what
94to put before the "Foo%3a%3aBar". The default value is
95"http://search.cpan.org/perldoc?".
96
97=head2 perldoc_url_postfix
98
99What to put after "Foo%3a%3aBar" in the URL. This option is not set by
100default.
101
102=head2 man_url_prefix
103
104In turning C<< L<crontab(5)> >> into http://whatever/man/1/crontab, what
105to put before the "1/crontab". The default value is
106"http://man.he.net/man".
107
108=head2 man_url_postfix
109
110What to put after "1/crontab" in the URL. This option is not set by default.
111
112=head2 title_prefix, title_postfix
113
114What to put before and after the title in the head. The values should
115already be &-escaped.
116
117=head2 html_css
118
119  $parser->html_css('path/to/style.css');
120
121The URL or relative path of a CSS file to include. This option is not
122set by default.
123
124=head2 html_javascript
125
126The URL or relative path of a JavaScript file to pull in. This option is
127not set by default.
128
129=head2 html_doctype
130
131A document type tag for the file. This option is not set by default.
132
133=head2 html_charset
134
135The character set to declare in the Content-Type meta tag created by default
136for C<html_header_tags>. Note that this option will be ignored if the value of
137C<html_header_tags> is changed. Defaults to "ISO-8859-1".
138
139=head2 html_header_tags
140
141Additional arbitrary HTML tags for the header of the document. The
142default value is just a content type header tag:
143
144  <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
145
146Add additional meta tags here, or blocks of inline CSS or JavaScript
147(wrapped in the appropriate tags).
148
149=head3 html_encode_chars
150
151A string containing all characters that should be encoded as HTML entities,
152specified using the regular expression character class syntax (what you find
153within brackets in regular expressions). This value will be passed as the
154second argument to the C<encode_entities> function of L<HTML::Entities>. If
155L<HTML::Entities> is not installed, then any characters other than C<&<>"'>
156will be encoded numerically.
157
158=head2 html_h_level
159
160This is the level of HTML "Hn" element to which a Pod "head1" corresponds.  For
161example, if C<html_h_level> is set to 2, a head1 will produce an H2, a head2
162will produce an H3, and so on.
163
164=head2 default_title
165
166Set a default title for the page if no title can be determined from the
167content. The value of this string should already be &-escaped.
168
169=head2 force_title
170
171Force a title for the page (don't try to determine it from the content).
172The value of this string should already be &-escaped.
173
174=head2 html_header, html_footer
175
176Set the HTML output at the beginning and end of each file. The default
177header includes a title, a doctype tag (if C<html_doctype> is set), a
178content tag (customized by C<html_header_tags>), a tag for a CSS file
179(if C<html_css> is set), and a tag for a Javascript file (if
180C<html_javascript> is set). The default footer simply closes the C<html>
181and C<body> tags.
182
183The options listed above customize parts of the default header, but
184setting C<html_header> or C<html_footer> completely overrides the
185built-in header or footer. These may be useful if you want to use
186template tags instead of literal HTML headers and footers or are
187integrating converted POD pages in a larger website.
188
189If you want no headers or footers output in the HTML, set these options
190to the empty string.
191
192=head2 index
193
194Whether to add a table-of-contents at the top of each page (called an
195index for the sake of tradition).
196
197=head2 anchor_items
198
199Whether to anchor every definition C<=item> directive. This needs to be
200enabled if you want to be able to link to specific C<=item> directives, which
201are output as C<< <dt> >> elements. Disabled by default.
202
203=head2 backlink
204
205Whether to turn every =head1 directive into a link pointing to the top
206of the page (specifically, the opening body tag).
207
208=cut
209
210__PACKAGE__->_accessorize(
211 'perldoc_url_prefix',
212 'perldoc_url_postfix',
213 'man_url_prefix',
214 'man_url_postfix',
215 'title_prefix',  'title_postfix',
216 'html_css',
217 'html_javascript',
218 'html_doctype',
219 'html_charset',
220 'html_encode_chars',
221 'html_h_level',
222 'title', # Used internally for the title extracted from the content
223 'default_title',
224 'force_title',
225 'html_header',
226 'html_footer',
227 'index',
228 'anchor_items',
229 'backlink',
230 'batch_mode', # whether we're in batch mode
231 'batch_mode_current_level',
232    # When in batch mode, how deep the current module is: 1 for "LWP",
233    #  2 for "LWP::Procotol", 3 for "LWP::Protocol::GHTTP", etc
234);
235
236#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
237
238=head1 SUBCLASSING
239
240If the standard options aren't enough, you may want to subclass
241Pod::Simple::XHMTL. These are the most likely candidates for methods
242you'll want to override when subclassing.
243
244=cut
245
246sub new {
247  my $self = shift;
248  my $new = $self->SUPER::new(@_);
249  $new->{'output_fh'} ||= *STDOUT{IO};
250  $new->perldoc_url_prefix('http://search.cpan.org/perldoc?');
251  $new->man_url_prefix('http://man.he.net/man');
252  $new->html_charset('ISO-8859-1');
253  $new->nix_X_codes(1);
254  $new->{'scratch'} = '';
255  $new->{'to_index'} = [];
256  $new->{'output'} = [];
257  $new->{'saved'} = [];
258  $new->{'ids'} = { '_podtop_' => 1 }; # used in <body>
259  $new->{'in_li'} = [];
260
261  $new->{'__region_targets'}  = [];
262  $new->{'__literal_targets'} = {};
263  $new->accept_targets_as_html( 'html', 'HTML' );
264
265  return $new;
266}
267
268sub html_header_tags {
269    my $self = shift;
270    return $self->{html_header_tags} = shift if @_;
271    return $self->{html_header_tags}
272        ||= '<meta http-equiv="Content-Type" content="text/html; charset='
273            . $self->html_charset . '" />';
274}
275
276#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
277
278=head2 handle_text
279
280This method handles the body of text within any element: it's the body
281of a paragraph, or everything between a "=begin" tag and the
282corresponding "=end" tag, or the text within an L entity, etc. You would
283want to override this if you are adding a custom element type that does
284more than just display formatted text. Perhaps adding a way to generate
285HTML tables from an extended version of POD.
286
287So, let's say you want to add a custom element called 'foo'. In your
288subclass's C<new> method, after calling C<SUPER::new> you'd call:
289
290  $new->accept_targets_as_text( 'foo' );
291
292Then override the C<start_for> method in the subclass to check for when
293"$flags->{'target'}" is equal to 'foo' and set a flag that marks that
294you're in a foo block (maybe "$self->{'in_foo'} = 1"). Then override the
295C<handle_text> method to check for the flag, and pass $text to your
296custom subroutine to construct the HTML output for 'foo' elements,
297something like:
298
299  sub handle_text {
300      my ($self, $text) = @_;
301      if ($self->{'in_foo'}) {
302          $self->{'scratch'} .= build_foo_html($text);
303          return;
304      }
305      $self->SUPER::handle_text($text);
306  }
307
308=head2 handle_code
309
310This method handles the body of text that is marked up to be code.
311You might for instance override this to plug in a syntax highlighter.
312The base implementation just escapes the text.
313
314The callback methods C<start_code> and C<end_code> emits the C<code> tags
315before and after C<handle_code> is invoked, so you might want to override these
316together with C<handle_code> if this wrapping isn't suitable.
317
318Note that the code might be broken into multiple segments if there are
319nested formatting codes inside a C<< CE<lt>...> >> sequence.  In between the
320calls to C<handle_code> other markup tags might have been emitted in that
321case.  The same is true for verbatim sections if the C<codes_in_verbatim>
322option is turned on.
323
324=head2 accept_targets_as_html
325
326This method behaves like C<accept_targets_as_text>, but also marks the region
327as one whose content should be emitted literally, without HTML entity escaping
328or wrapping in a C<div> element.
329
330=cut
331
332sub __in_literal_xhtml_region {
333    return unless @{ $_[0]{__region_targets} };
334    my $target = $_[0]{__region_targets}[-1];
335    return $_[0]{__literal_targets}{ $target };
336}
337
338sub accept_targets_as_html {
339    my ($self, @targets) = @_;
340    $self->accept_targets(@targets);
341    $self->{__literal_targets}{$_} = 1 for @targets;
342}
343
344sub handle_text {
345    # escape special characters in HTML (<, >, &, etc)
346    my $text = $_[0]->__in_literal_xhtml_region
347        ? $_[1]
348        : $_[0]->encode_entities( $_[1] );
349
350    if ($_[0]{'in_code'} && @{$_[0]{'in_code'}}) {
351        # Intentionally use the raw text in $_[1], even if we're not in a
352        # literal xhtml region, since handle_code calls encode_entities.
353        $_[0]->handle_code( $_[1], $_[0]{'in_code'}[-1] );
354    } else {
355        if ($_[0]->{in_for}) {
356            my $newlines = $_[0]->__in_literal_xhtml_region ? "\n\n" : '';
357            if ($_[0]->{started_for}) {
358                if ($text =~ /\S/) {
359                    delete $_[0]->{started_for};
360                    $_[0]{'scratch'} .= $text . $newlines;
361                }
362                # Otherwise, append nothing until we have something to append.
363            } else {
364                # The parser sometimes preserves newlines and sometimes doesn't!
365                $text =~ s/\n\z//;
366                $_[0]{'scratch'} .= $text . $newlines;
367            }
368        } else {
369            # Just plain text.
370            $_[0]{'scratch'} .= $text;
371        }
372    }
373
374    $_[0]{htext} .= $text if $_[0]{'in_head'};
375}
376
377sub start_code {
378    $_[0]{'scratch'} .= '<code>';
379}
380
381sub end_code {
382    $_[0]{'scratch'} .= '</code>';
383}
384
385sub handle_code {
386    $_[0]{'scratch'} .= $_[0]->encode_entities( $_[1] );
387}
388
389sub start_Para {
390    $_[0]{'scratch'} .= '<p>';
391}
392
393sub start_Verbatim {
394    $_[0]{'scratch'} = '<pre>';
395    push(@{$_[0]{'in_code'}}, 'Verbatim');
396    $_[0]->start_code($_[0]{'in_code'}[-1]);
397}
398
399sub start_head1 {  $_[0]{'in_head'} = 1; $_[0]{htext} = ''; }
400sub start_head2 {  $_[0]{'in_head'} = 2; $_[0]{htext} = ''; }
401sub start_head3 {  $_[0]{'in_head'} = 3; $_[0]{htext} = ''; }
402sub start_head4 {  $_[0]{'in_head'} = 4; $_[0]{htext} = ''; }
403
404sub start_item_number {
405    $_[0]{'scratch'} = "</li>\n" if ($_[0]{'in_li'}->[-1] && pop @{$_[0]{'in_li'}});
406    $_[0]{'scratch'} .= '<li><p>';
407    push @{$_[0]{'in_li'}}, 1;
408}
409
410sub start_item_bullet {
411    $_[0]{'scratch'} = "</li>\n" if ($_[0]{'in_li'}->[-1] && pop @{$_[0]{'in_li'}});
412    $_[0]{'scratch'} .= '<li><p>';
413    push @{$_[0]{'in_li'}}, 1;
414}
415
416sub start_item_text   {
417    # see end_item_text
418}
419
420sub start_over_bullet { $_[0]{'scratch'} = '<ul>'; push @{$_[0]{'in_li'}}, 0; $_[0]->emit }
421sub start_over_block  { $_[0]{'scratch'} = '<ul>'; $_[0]->emit }
422sub start_over_number { $_[0]{'scratch'} = '<ol>'; push @{$_[0]{'in_li'}}, 0; $_[0]->emit }
423sub start_over_text   {
424    $_[0]{'scratch'} = '<dl>';
425    $_[0]{'dl_level'}++;
426    $_[0]{'in_dd'} ||= [];
427    $_[0]->emit
428}
429
430sub end_over_block  { $_[0]{'scratch'} .= '</ul>'; $_[0]->emit }
431
432sub end_over_number   {
433    $_[0]{'scratch'} = "</li>\n" if ( pop @{$_[0]{'in_li'}} );
434    $_[0]{'scratch'} .= '</ol>';
435    pop @{$_[0]{'in_li'}};
436    $_[0]->emit;
437}
438
439sub end_over_bullet   {
440    $_[0]{'scratch'} = "</li>\n" if ( pop @{$_[0]{'in_li'}} );
441    $_[0]{'scratch'} .= '</ul>';
442    pop @{$_[0]{'in_li'}};
443    $_[0]->emit;
444}
445
446sub end_over_text   {
447    if ($_[0]{'in_dd'}[ $_[0]{'dl_level'} ]) {
448        $_[0]{'scratch'} = "</dd>\n";
449        $_[0]{'in_dd'}[ $_[0]{'dl_level'} ] = 0;
450    }
451    $_[0]{'scratch'} .= '</dl>';
452    $_[0]{'dl_level'}--;
453    $_[0]->emit;
454}
455
456# . . . . . Now the actual formatters:
457
458sub end_Para     { $_[0]{'scratch'} .= '</p>'; $_[0]->emit }
459sub end_Verbatim {
460    $_[0]->end_code(pop(@{$_[0]->{'in_code'}}));
461    $_[0]{'scratch'} .= '</pre>';
462    $_[0]->emit;
463}
464
465sub _end_head {
466    my $h = delete $_[0]{in_head};
467
468    my $add = $_[0]->html_h_level;
469    $add = 1 unless defined $add;
470    $h += $add - 1;
471
472    my $id = $_[0]->idify($_[0]{htext});
473    my $text = $_[0]{scratch};
474    $_[0]{'scratch'} = $_[0]->backlink && ($h - $add == 0)
475                         # backlinks enabled && =head1
476                         ? qq{<a href="#_podtop_"><h$h id="$id">$text</h$h></a>}
477                         : qq{<h$h id="$id">$text</h$h>};
478    $_[0]->emit;
479    push @{ $_[0]{'to_index'} }, [$h, $id, delete $_[0]{'htext'}];
480}
481
482sub end_head1       { shift->_end_head(@_); }
483sub end_head2       { shift->_end_head(@_); }
484sub end_head3       { shift->_end_head(@_); }
485sub end_head4       { shift->_end_head(@_); }
486
487sub end_item_bullet { $_[0]{'scratch'} .= '</p>'; $_[0]->emit }
488sub end_item_number { $_[0]{'scratch'} .= '</p>'; $_[0]->emit }
489
490sub end_item_text   {
491    # idify and anchor =item content if wanted
492    my $dt_id = $_[0]{'anchor_items'}
493                 ? ' id="'. $_[0]->idify($_[0]{'scratch'}) .'"'
494                 : '';
495
496    # reset scratch
497    my $text = $_[0]{scratch};
498    $_[0]{'scratch'} = '';
499
500    if ($_[0]{'in_dd'}[ $_[0]{'dl_level'} ]) {
501        $_[0]{'scratch'} = "</dd>\n";
502        $_[0]{'in_dd'}[ $_[0]{'dl_level'} ] = 0;
503    }
504
505    $_[0]{'scratch'} .= qq{<dt$dt_id>$text</dt>\n<dd>};
506    $_[0]{'in_dd'}[ $_[0]{'dl_level'} ] = 1;
507    $_[0]->emit;
508}
509
510# This handles =begin and =for blocks of all kinds.
511sub start_for {
512  my ($self, $flags) = @_;
513
514  push @{ $self->{__region_targets} }, $flags->{target_matching};
515  $self->{started_for} = 1;
516  $self->{in_for} = 1;
517
518  unless ($self->__in_literal_xhtml_region) {
519    $self->{scratch} .= '<div';
520    $self->{scratch} .= qq( class="$flags->{target}") if $flags->{target};
521    $self->{scratch} .= ">\n\n";
522  }
523}
524
525sub end_for {
526  my ($self) = @_;
527  delete $self->{started_for};
528  delete $self->{in_for};
529
530  if ($self->__in_literal_xhtml_region) {
531    # Remove trailine newlines.
532    $self->{'scratch'} =~ s/\s+\z//s;
533  } else {
534    $self->{'scratch'} .= '</div>';
535  }
536
537  pop @{ $self->{__region_targets} };
538  $self->emit;
539}
540
541sub start_Document {
542  my ($self) = @_;
543  if (defined $self->html_header) {
544    $self->{'scratch'} .= $self->html_header;
545    $self->emit unless $self->html_header eq "";
546  } else {
547    my ($doctype, $title, $metatags, $bodyid);
548    $doctype = $self->html_doctype || '';
549    $title = $self->force_title || $self->title || $self->default_title || '';
550    $metatags = $self->html_header_tags || '';
551    if (my $css = $self->html_css) {
552        if ($css !~ /<link/) {
553            # this is required to be compatible with Pod::Simple::BatchHTML
554            $metatags .= '<link rel="stylesheet" href="'
555                . $self->encode_entities($css) . '" type="text/css" />';
556        } else {
557            $metatags .= $css;
558        }
559    }
560    if ($self->html_javascript) {
561      $metatags .= qq{\n<script type="text/javascript" src="} .
562                    $self->html_javascript . '"></script>';
563    }
564    $bodyid = $self->backlink ? ' id="_podtop_"' : '';
565    $self->{'scratch'} .= <<"HTML";
566$doctype
567<html>
568<head>
569<title>$title</title>
570$metatags
571</head>
572<body$bodyid>
573HTML
574    $self->emit;
575  }
576}
577
578sub end_Document   {
579  my ($self) = @_;
580  my $to_index = $self->{'to_index'};
581  if ($self->index && @{ $to_index } ) {
582      my @out;
583      my $level  = 0;
584      my $indent = -1;
585      my $space  = '';
586      my $id     = ' id="index"';
587
588      for my $h (@{ $to_index }, [0]) {
589          my $target_level = $h->[0];
590          # Get to target_level by opening or closing ULs
591          if ($level == $target_level) {
592              $out[-1] .= '</li>';
593          } elsif ($level > $target_level) {
594              $out[-1] .= '</li>' if $out[-1] =~ /^\s+<li>/;
595              while ($level > $target_level) {
596                  --$level;
597                  push @out, ('  ' x --$indent) . '</li>' if @out && $out[-1] =~ m{^\s+<\/ul};
598                  push @out, ('  ' x --$indent) . '</ul>';
599              }
600              push @out, ('  ' x --$indent) . '</li>' if $level;
601          } else {
602              while ($level < $target_level) {
603                  ++$level;
604                  push @out, ('  ' x ++$indent) . '<li>' if @out && $out[-1]=~ /^\s*<ul/;
605                  push @out, ('  ' x ++$indent) . "<ul$id>";
606                  $id = '';
607              }
608              ++$indent;
609          }
610
611          next unless $level;
612          $space = '  '  x $indent;
613          push @out, sprintf '%s<li><a href="#%s">%s</a>',
614              $space, $h->[1], $h->[2];
615      }
616      # Splice the index in between the HTML headers and the first element.
617      my $offset = defined $self->html_header ? $self->html_header eq '' ? 0 : 1 : 1;
618      splice @{ $self->{'output'} }, $offset, 0, join "\n", @out;
619  }
620
621  if (defined $self->html_footer) {
622    $self->{'scratch'} .= $self->html_footer;
623    $self->emit unless $self->html_footer eq "";
624  } else {
625    $self->{'scratch'} .= "</body>\n</html>";
626    $self->emit;
627  }
628
629  if ($self->index) {
630      print {$self->{'output_fh'}} join ("\n\n", @{ $self->{'output'} }), "\n\n";
631      @{$self->{'output'}} = ();
632  }
633
634}
635
636# Handling code tags
637sub start_B { $_[0]{'scratch'} .= '<b>' }
638sub end_B   { $_[0]{'scratch'} .= '</b>' }
639
640sub start_C { push(@{$_[0]{'in_code'}}, 'C'); $_[0]->start_code($_[0]{'in_code'}[-1]); }
641sub end_C   { $_[0]->end_code(pop(@{$_[0]{'in_code'}})); }
642
643sub start_F { $_[0]{'scratch'} .= '<i>' }
644sub end_F   { $_[0]{'scratch'} .= '</i>' }
645
646sub start_I { $_[0]{'scratch'} .= '<i>' }
647sub end_I   { $_[0]{'scratch'} .= '</i>' }
648
649sub start_L {
650  my ($self, $flags) = @_;
651    my ($type, $to, $section) = @{$flags}{'type', 'to', 'section'};
652    my $url = $self->encode_entities(
653        $type eq 'url' ? $to
654            : $type eq 'pod' ? $self->resolve_pod_page_link($to, $section)
655            : $type eq 'man' ? $self->resolve_man_page_link($to, $section)
656            :                  undef
657    );
658
659    # If it's an unknown type, use an attribute-less <a> like HTML.pm.
660    $self->{'scratch'} .= '<a' . ($url ? ' href="'. $url . '">' : '>');
661}
662
663sub end_L   { $_[0]{'scratch'} .= '</a>' }
664
665sub start_S { $_[0]{'scratch'} .= '<span style="white-space: nowrap;">' }
666sub end_S   { $_[0]{'scratch'} .= '</span>' }
667
668sub emit {
669  my($self) = @_;
670  if ($self->index) {
671      push @{ $self->{'output'} }, $self->{'scratch'};
672  } else {
673      print {$self->{'output_fh'}} $self->{'scratch'}, "\n\n";
674  }
675  $self->{'scratch'} = '';
676  return;
677}
678
679=head2 resolve_pod_page_link
680
681  my $url = $pod->resolve_pod_page_link('Net::Ping', 'INSTALL');
682  my $url = $pod->resolve_pod_page_link('perlpodspec');
683  my $url = $pod->resolve_pod_page_link(undef, 'SYNOPSIS');
684
685Resolves a POD link target (typically a module or POD file name) and section
686name to a URL. The resulting link will be returned for the above examples as:
687
688  http://search.cpan.org/perldoc?Net::Ping#INSTALL
689  http://search.cpan.org/perldoc?perlpodspec
690  #SYNOPSIS
691
692Note that when there is only a section argument the URL will simply be a link
693to a section in the current document.
694
695=cut
696
697sub resolve_pod_page_link {
698    my ($self, $to, $section) = @_;
699    return undef unless defined $to || defined $section;
700    if (defined $section) {
701        $section = '#' . $self->idify($self->encode_entities($section), 1);
702        return $section unless defined $to;
703    } else {
704        $section = ''
705    }
706
707    return ($self->perldoc_url_prefix || '')
708        . $self->encode_entities($to) . $section
709        . ($self->perldoc_url_postfix || '');
710}
711
712=head2 resolve_man_page_link
713
714  my $url = $pod->resolve_man_page_link('crontab(5)', 'EXAMPLE CRON FILE');
715  my $url = $pod->resolve_man_page_link('crontab');
716
717Resolves a man page link target and numeric section to a URL. The resulting
718link will be returned for the above examples as:
719
720    http://man.he.net/man5/crontab
721    http://man.he.net/man1/crontab
722
723Note that the first argument is required. The section number will be parsed
724from it, and if it's missing will default to 1. The second argument is
725currently ignored, as L<man.he.net|http://man.he.net> does not currently
726include linkable IDs or anchor names in its pages. Subclass to link to a
727different man page HTTP server.
728
729=cut
730
731sub resolve_man_page_link {
732    my ($self, $to, $section) = @_;
733    return undef unless defined $to;
734    my ($page, $part) = $to =~ /^([^(]+)(?:[(](\d+)[)])?$/;
735    return undef unless $page;
736    return ($self->man_url_prefix || '')
737        . ($part || 1) . "/" . $self->encode_entities($page)
738        . ($self->man_url_postfix || '');
739
740}
741
742=head2 idify
743
744  my $id   = $pod->idify($text);
745  my $hash = $pod->idify($text, 1);
746
747This method turns an arbitrary string into a valid XHTML ID attribute value.
748The rules enforced, following
749L<http://webdesign.about.com/od/htmltags/a/aa031707.htm>, are:
750
751=over
752
753=item *
754
755The id must start with a letter (a-z or A-Z)
756
757=item *
758
759All subsequent characters can be letters, numbers (0-9), hyphens (-),
760underscores (_), colons (:), and periods (.).
761
762=item *
763
764The final character can't be a hyphen, colon, or period. URLs ending with these
765characters, while allowed by XHTML, can be awkward to extract from plain text.
766
767=item *
768
769Each id must be unique within the document.
770
771=back
772
773In addition, the returned value will be unique within the context of the
774Pod::Simple::XHTML object unless a second argument is passed a true value. ID
775attributes should always be unique within a single XHTML document, but pass
776the true value if you are creating not an ID but a URL hash to point to
777an ID (i.e., if you need to put the "#foo" in C<< <a href="#foo">foo</a> >>.
778
779=cut
780
781sub idify {
782    my ($self, $t, $not_unique) = @_;
783    for ($t) {
784        s/<[^>]+>//g;            # Strip HTML.
785        s/&[^;]+;//g;            # Strip entities.
786        s/^\s+//; s/\s+$//;      # Strip white space.
787        s/^([^a-zA-Z]+)$/pod$1/; # Prepend "pod" if no valid chars.
788        s/^[^a-zA-Z]+//;         # First char must be a letter.
789        s/[^-a-zA-Z0-9_:.]+/-/g; # All other chars must be valid.
790        s/[-:.]+$//;             # Strip trailing punctuation.
791    }
792    return $t if $not_unique;
793    my $i = '';
794    $i++ while $self->{ids}{"$t$i"}++;
795    return "$t$i";
796}
797
798=head2 batch_mode_page_object_init
799
800  $pod->batch_mode_page_object_init($batchconvobj, $module, $infile, $outfile, $depth);
801
802Called by L<Pod::Simple::HTMLBatch> so that the class has a chance to
803initialize the converter. Internally it sets the C<batch_mode> property to
804true and sets C<batch_mode_current_level()>, but Pod::Simple::XHTML does not
805currently use those features. Subclasses might, though.
806
807=cut
808
809sub batch_mode_page_object_init {
810  my ($self, $batchconvobj, $module, $infile, $outfile, $depth) = @_;
811  $self->batch_mode(1);
812  $self->batch_mode_current_level($depth);
813  return $self;
814}
815
816sub html_header_after_title {
817}
818
819
8201;
821
822__END__
823
824=head1 SEE ALSO
825
826L<Pod::Simple>, L<Pod::Simple::Text>, L<Pod::Spell>
827
828=head1 SUPPORT
829
830Questions or discussion about POD and Pod::Simple should be sent to the
831pod-people@perl.org mail list. Send an empty email to
832pod-people-subscribe@perl.org to subscribe.
833
834This module is managed in an open GitHub repository,
835L<https://github.com/perl-pod/pod-simple/>. Feel free to fork and contribute, or
836to clone L<git://github.com/perl-pod/pod-simple.git> and send patches!
837
838Patches against Pod::Simple are welcome. Please send bug reports to
839<bug-pod-simple@rt.cpan.org>.
840
841=head1 COPYRIGHT AND DISCLAIMERS
842
843Copyright (c) 2003-2005 Allison Randal.
844
845This library is free software; you can redistribute it and/or modify it
846under the same terms as Perl itself.
847
848This program is distributed in the hope that it will be useful, but
849without any warranty; without even the implied warranty of
850merchantability or fitness for a particular purpose.
851
852=head1 ACKNOWLEDGEMENTS
853
854Thanks to L<Hurricane Electric|http://he.net/> for permission to use its
855L<Linux man pages online|http://man.he.net/> site for man page links.
856
857Thanks to L<search.cpan.org|http://search.cpan.org/> for permission to use the
858site for Perl module links.
859
860=head1 AUTHOR
861
862Pod::Simpele::XHTML was created by Allison Randal <allison@perl.org>.
863
864Pod::Simple was created by Sean M. Burke <sburke@cpan.org>.
865But don't bother him, he's retired.
866
867Pod::Simple is maintained by:
868
869=over
870
871=item * Allison Randal C<allison@perl.org>
872
873=item * Hans Dieter Pearcey C<hdp@cpan.org>
874
875=item * David E. Wheeler C<dwheeler@cpan.org>
876
877=back
878
879=cut
880