Name            Date          Size       #Lines   LOC
..              03-May-2022   -
lib/HTML/       03-May-2022   -          1,931    918
scripts/        02-Apr-2011   -          914      109
t/              02-Apr-2011   -          1,094    909
tfiles/         03-May-2022   -          374      298
Build.PL        02-Apr-2011   934        46       38
Changes         02-Apr-2011   1.1 KiB    47       39
LICENSE         02-Apr-2011   14.7 KiB   266      219
MANIFEST        02-Apr-2011   1,014      51       50
MANIFEST.SKIP   02-Apr-2011   284        30       26
META.yml        02-Apr-2011   734        31       30
OldChanges      02-Apr-2011   8.3 KiB    232      185
README          02-Apr-2011   23.2 KiB   660      474
README.mkdn     02-Apr-2011   20.9 KiB   773      515

README

NAME
    HTML::GenToc - Generate a Table of Contents for HTML documents.

VERSION
    version 3.20

SYNOPSIS
      use HTML::GenToc;

      # create a new object with the default settings
      my $toc = HTML::GenToc->new();

      # or create one with explicit settings
      $toc = HTML::GenToc->new(title=>"Table of Contents",
                              toc_entry=>{
                                H1=>1,
                                H2=>2
                              },
                              toc_end=>{
                                H1=>'/H1',
                                H2=>'/H2'
                              }
        );

      # generate a ToC from a file (filenames are passed as an array reference)
      $toc->generate_toc(input=>[$html_file],
                         footer=>$footer_file,
                         header=>$header_file
        );

DESCRIPTION
    HTML::GenToc generates anchors and a table of contents for HTML
    documents. Depending on the arguments, it will insert the information it
    generates, or output to a string, a separate file or STDOUT.

    While it defaults to taking H1 and H2 elements as the significant
    elements to put into the table of contents, any tag can be defined as a
    significant element. The input does not have to be complete, pure HTML:
    one can input pseudo-HTML or page fragments, which makes it suitable
    for use on templates and HTML meta-languages such as WML.

    Also included in the distribution is hypertoc, a script which uses the
    module so that one can process files on the command line in a
    user-friendly manner.

DETAILS
    The ToC generated is a multi-level list containing links to the
    significant elements. HTML::GenToc inserts the links to the significant
    elements into the ToC at a level specified by the user.

    Example:

    If H1s are specified as level 1, then they appear in the first-level
    list of the ToC. If H2s are specified as level 2, then they appear in
    a second-level list in the ToC.

    Information on the significant elements and the level at which they
    should occur is passed in to the methods used by this object, or one
    can use the defaults.

    There are two phases to the ToC generation. The first phase is to put
    suitable anchors into the HTML documents, and the second phase is to
    generate the ToC from HTML documents which have anchors in them for the
    ToC to link to.

    For more information on controlling the contents of the created ToC, see
    "Formatting the ToC".

    HTML::GenToc also supports the ability to incorporate the ToC into the
    HTML document itself via the inline option. See "Inlining the ToC" for
    more information.

    In order for HTML::GenToc to support linking to significant elements,
    HTML::GenToc inserts anchors into the significant elements. One can use
    HTML::GenToc as a filter, outputting the result to another file, or one
    can overwrite the original file, with the original backed up with a
    suffix (default: "org") appended to the filename. One can also output
    the result to a string.

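    As a minimal sketch of the "output to a string" case described above,
    the following passes HTML content directly and asks for the result as
    a string. The sample HTML is made up for illustration, and the call is
    guarded so that the script still runs where HTML::GenToc is not
    installed. Only options documented in this README are used.

```perl
use strict;
use warnings;

# Illustrative HTML content (not taken from the distribution).
my $html = <<'HTML';
<html><head><title>Demo</title></head>
<body>
<h1>Introduction</h1>
<p>Some text.</p>
<h2>Background</h2>
<p>More text.</p>
</body></html>
HTML

my $result;
if (eval { require HTML::GenToc; 1 }) {
    my $toc = HTML::GenToc->new();

    # H1 and H2 are the default significant elements. With to_string
    # set, generate_toc returns its output instead of writing it out.
    $result = $toc->generate_toc(
        input     => $html,
        to_string => 1,
        quiet     => 1,
    );
    print $result;
}
else {
    warn "HTML::GenToc is not installed; skipping the example\n";
}
```
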
METHODS
    Default arguments can be set when the object is created, and overridden
    by setting arguments when the generate_toc method is called. Arguments
    are given as a hash of arguments.

  Method -- new
        $toc = HTML::GenToc->new();

        $toc = HTML::GenToc->new(toc_entry=>\%my_toc_entry,
            toc_end=>\%my_toc_end,
            bak=>'bak',
            ...
            );

    Creates a new HTML::GenToc object.

    These arguments will be used as defaults in invocations of other
    methods.

    See generate_toc for possible arguments.

  generate_toc
        $toc->generate_toc(outfile=>"index2.html");

        my $result_str = $toc->generate_toc(to_string=>1);

    Generates a table of contents for the significant elements in the HTML
    documents, optionally generating anchors for them first.

    Options

    bak
        bak => *string*

        If the input files are being overwritten (overwrite is on), copy
        each original file to "*filename*.*string*". If the value is
        empty, no backup file will be created. (default: org)

    debug
        debug => 1

        Enable verbose debugging output. Used for debugging this module; in
        other words, don't bother. (default: off)

    entrysep
        entrysep => *string*

        Separator string for non-<li> item entries (default: ", ")

    filenames
        filenames => \@filenames

        The filenames to use when creating table-of-contents links. This
        overrides the filenames given in the input option, and is expected
        to have exactly the same number of elements. This can also be used
        when passing in string-content to the input option, to give a (fake)
        filename to use for the links relating to that content.

    footer
        footer => *file_or_string*

        Either the filename of the file containing footer text for ToC; or a
        string containing the footer text.

    header
        header => *file_or_string*

        Either the filename of the file containing header text for ToC; or a
        string containing the header text.

    ignore_only_one
        ignore_only_one => 1

        If there would be only one item in the ToC, don't make a ToC.

    ignore_sole_first
        ignore_sole_first => 1

        If the first item in the ToC is of the highest level, AND it is the
        only one of that level, ignore it. This is useful in web-pages where
        there is only one H1 header but one doesn't know beforehand whether
        there will be only one.

    inline
        inline => 1

        Put ToC in document at a given point. See "Inlining the ToC" for
        more information.

    input
        input => \@filenames

        input => $content

        This is expected to be either a reference to an array of filenames,
        or a string containing content to process.

        The three main uses would be:

        (a) you have more than one file to process, so pass in multiple
            filenames

        (b) you have one file to process, so pass in its filename as the
            only array item

        (c) you have HTML content to process, so pass in just the content as
            a string

        (default: undefined)

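    The three uses of the input option can be sketched as plain argument
    hashes. The filenames and HTML below are placeholders, and input_kind
    is a hypothetical helper written for this example only (it is not part
    of HTML::GenToc); it just shows how the two shapes of value differ.

```perl
use strict;
use warnings;

# (a) several files: pass all the filenames in one array reference
my %many_files = (input => ['intro.html', 'usage.html']);

# (b) a single file: still an array reference, with one element
my %one_file = (input => ['index.html']);

# (c) HTML content held in a string; the filenames option supplies the
#     (fake) filename that the ToC links should refer to
my %content = (
    input     => '<h1>Title</h1><p>Body text</p>',
    filenames => ['index.html'],
);

# Hypothetical helper, for illustration only: report which of the
# documented shapes an input value has.
sub input_kind {
    my ($input) = @_;
    return 'undefined' unless defined $input;
    return ref($input) eq 'ARRAY' ? 'filenames' : 'content';
}

for my $args (\%many_files, \%one_file, \%content) {
    print input_kind($args->{input}), "\n";
    # Each hash could then be passed on, for example:
    #   $toc->generate_toc(%$args, to_string => 1);
}
```
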
    notoc_match
        notoc_match => *string*

        If there are certain individual tags you don't wish to include in
        the table of contents, even though they match the "significant
        elements", then if this pattern matches contents inside the tag (not
        the body), that tag will not be included, either in generating
        anchors or in generating the ToC. (default: 'class="notoc"')

    ol
        ol => 1

        Use an ordered list for level 1 ToC entries.

    ol_num_levels
        ol_num_levels => 2

        The number of levels deep the OL listing will go if ol is true. If
        set to zero, will use an ordered list for all levels. (default: 1)

    overwrite
        overwrite => 1

        Overwrite the input file with the output. (default: off)

    outfile
        outfile => *file*

        File to write the output to. This is where the modified HTML output
        goes to. Note that it doesn't make sense to use this option if you
        are processing more than one file. If you give '-' as the filename,
        then output will go to STDOUT. (default: STDOUT)

    quiet
        quiet => 1

        Suppress informative messages. (default: off)

    textonly
        textonly => 1

        Use only text content in significant elements.

    title
        title => *string*

        Title for ToC page (if not using header or inline or toc_only)
        (default: "Table of Contents")

    toc_after
        toc_after => \%toc_after_data

        %toc_after_data = ( *tag1* => *suffix1*, *tag2* => *suffix2* );

        toc_after => { H2=>'</em>' }

        For defining the layout of significant elements in the ToC.

        This expects a reference to a hash of tag=>suffix pairs.

        The *tag* is the HTML tag which marks the start of the element. The
        *suffix* is what is required to be appended to the Table of Contents
        entry generated for that tag.

        (default: undefined)

    toc_before
        toc_before => \%toc_before_data

        %toc_before_data = ( *tag1* => *prefix1*, *tag2* => *prefix2* );

        toc_before=>{ H2=>'<em>' }

        For defining the layout of significant elements in the ToC. The
        *tag* is the HTML tag which marks the start of the element. The
        *prefix* is what is required to be prepended to the Table of
        Contents entry generated for that tag.

        (default: undefined)

    toc_end
        toc_end => \%toc_end_data

        %toc_end_data = ( *tag1* => *endtag1*, *tag2* => *endtag2* );

        toc_end => { H1 => '/H1', H2 => '/H2' }

        For defining significant elements. The *tag* is the HTML tag which
        marks the start of the element. The *endtag* is the HTML tag which
        marks the end of the element. When matching in the input file, case
        is ignored (but make sure that all your *tag* options referring to
        the same tag are exactly the same!).

    toc_entry
        toc_entry => \%toc_entry_data

        %toc_entry_data = ( *tag1* => *level1*, *tag2* => *level2* );

        toc_entry => { H1 => 1, H2 => 2 }

        For defining significant elements. The *tag* is the HTML tag which
        marks the start of the element. The *level* is what level the tag is
        considered to be. The value of *level* must be numeric, and
        non-zero. If the value is negative, consecutive entries represented
        by the significant element will be separated by the value set by
        the entrysep option.

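    The four hash options above (toc_entry, toc_end, toc_before and
    toc_after) are keyed by the same tag names, so it can help to build
    them side by side. The sketch below treats H1 to H3 as significant and
    wraps H2 entries in <em>; the consistency check is an illustration of
    the rules stated above (numeric non-zero levels, identical tag
    spelling), not something the module is known to do in this form. The
    call into HTML::GenToc is guarded in case it is not installed.

```perl
use strict;
use warnings;

my %toc_entry  = (H1 => 1, H2 => 2, H3 => 3);
my %toc_end    = (H1 => '/H1', H2 => '/H2', H3 => '/H3');
my %toc_before = (H2 => '<em>');
my %toc_after  = (H2 => '</em>');

# Illustrative sanity check: every significant tag needs an end tag and
# a numeric, non-zero level, and layout hashes must use the same keys.
my @problems;
for my $tag (sort keys %toc_entry) {
    push @problems, "no end tag for $tag" unless exists $toc_end{$tag};
    push @problems, "level for $tag must be a non-zero number"
        unless $toc_entry{$tag} =~ /^-?\d+$/ && $toc_entry{$tag} != 0;
}
for my $tag (keys %toc_before, keys %toc_after) {
    push @problems, "$tag is not a significant element"
        unless exists $toc_entry{$tag};
}
die join('; ', @problems), "\n" if @problems;

if (eval { require HTML::GenToc; 1 }) {
    my $toc = HTML::GenToc->new(
        toc_entry  => \%toc_entry,
        toc_end    => \%toc_end,
        toc_before => \%toc_before,
        toc_after  => \%toc_after,
    );
    print $toc->generate_toc(
        input     => '<h1>Guide</h1><h2>Setup</h2><h3>Details</h3>',
        to_string => 1,
        quiet     => 1,
    );
}
else {
    warn "HTML::GenToc is not installed; skipping the example\n";
}
```
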
    toclabel
        toclabel => *string*

        HTML text that labels the ToC. Always used. (default: "<h1>Table of
        Contents</h1>")

    toc_tag
        toc_tag => *string*

        If a ToC is to be included inline, this is the pattern which is used
        to match the tag where the ToC should be put. This can be a
        start-tag, an end-tag or a comment, but the < should be left out;
        that is, if you want the ToC to be placed after the BODY tag, then
        give "BODY". If you want a special comment tag to mark where the ToC
        should go, then include the comment marks, for example: "!--toc--"
        (default: BODY)

    toc_tag_replace
        toc_tag_replace => 1

        In conjunction with toc_tag, this is a flag to say whether the given
        tag should be replaced, or if the ToC should be put after the tag.
        This can be useful if your toc_tag is a comment and you don't need
        it after you have the ToC in place. (default: false)

    toc_only
        toc_only => 1

        Output only the Table of Contents, that is, the Table of Contents
        plus the toclabel. If there is a header or a footer, these will also
        be output.

        If toc_only is false then if there is no header, and inline is not
        true, then a suitable HTML page header will be output, and if there
        is no footer and inline is not true, then an HTML page footer will
        be output.

        (default: false)

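    One possible use of toc_only, sketched under the assumption that the
    caller wants a bare ToC fragment to place in its own page template.
    The input HTML and the label text are invented for the example, and
    the call is guarded in case HTML::GenToc is not installed.

```perl
use strict;
use warnings;

my $body = '<h1>Install</h1><p>Steps.</p><h1>Configure</h1><p>Settings.</p>';

my $fragment;
if (eval { require HTML::GenToc; 1 }) {
    my $toc = HTML::GenToc->new();

    # toc_only asks for the toclabel plus the ToC, without the HTML
    # page header and footer that would otherwise be supplied.
    $fragment = $toc->generate_toc(
        input     => $body,
        toc_only  => 1,
        toclabel  => '<h2>On this page</h2>',
        to_string => 1,
        quiet     => 1,
    );
    print $fragment;
}
else {
    warn "HTML::GenToc is not installed; skipping the example\n";
}
```
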
    to_string
        to_string => 1

        Return the modified HTML output as a string. This *does* override
        other methods of output (unlike version 3.00). If *to_string* is
        false, the method will return 1 rather than a string.

    use_id
        use_id => 1

        Use id="*name*" for anchors rather than <a name="*name*"/> anchors.
        However if an anchor already exists for a Significant Element, this
        won't make an id for that particular element.

    useorg
        useorg => 1

        Use pre-existing backup files as the input source; that is, files of
        the form *infile*.*bak* (see input and bak).

INTERNAL METHODS
    These methods are documented for developer purposes and aren't intended
    to be used externally.

  make_anchor_name
        $toc->make_anchor_name(content=>$content,
            anchors=>\%anchors);

    Makes the anchor-name for one anchor. Bases the anchor on the content of
    the significant element. Ensures that anchors are unique.

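    To make the idea of content-based, unique anchor names concrete, here
    is a small stand-alone sketch. It is NOT the module's actual algorithm:
    sketch_anchor_name and its naming rules (strip markup, replace other
    characters with "_", add a numeric suffix on collision) are assumptions
    chosen for illustration.

```perl
use strict;
use warnings;

# Illustrative only: derive a name from element content and make it
# unique with respect to the names already recorded in %$seen.
sub sketch_anchor_name {
    my ($content, $seen) = @_;

    (my $name = $content) =~ s/<[^>]*>//g;   # drop any markup
    $name =~ s/[^A-Za-z0-9]+/_/g;            # other characters become "_"
    $name =~ s/^_+|_+$//g;                   # trim leading/trailing "_"
    $name = 'tag' if $name eq '';            # fallback for empty content

    # Append _1, _2, ... until the name has not been used before.
    my $unique = $name;
    my $n      = 1;
    $unique = $name . '_' . $n++ while $seen->{$unique};

    $seen->{$unique} = 1;
    return $unique;
}

my %anchors;
my @names = map { sketch_anchor_name($_, \%anchors) }
    ('The FOO command', 'The <em>FOO</em> command', 'Setup & use');
print "$_\n" for @names;
```

    The second heading has the same text as the first once its markup is
    removed, so it receives a suffix rather than reusing the first name.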
  make_anchors
        my $new_html = $toc->make_anchors(input=>$html,
            notoc_match=>$notoc_match,
            use_id=>$use_id,
            toc_entry=>\%toc_entries,
            toc_end=>\%toc_ends,
            );

    Makes the anchors in the given input string. Returns a string.

  make_toc_list
        my @toc_list = $toc->make_toc_list(input=>$html,
            labels=>\%labels,
            notoc_match=>$notoc_match,
            toc_entry=>\%toc_entry,
            toc_end=>\%toc_end,
            filename=>$filename);

    Makes a list of lists which represents the structure and content of (a
    portion of) the ToC from one file. Also updates a list of labels for the
    ToC entries.

  build_lol
    Build a list of lists of paths, given a list of hashes with info about
    paths.

  output_toc
        $self->output_toc(toc=>$toc_str,
            input=>\@input,
            filenames=>\@filenames);

    Puts the output (whether to file, STDOUT or string). The "output" in
    this case could be the ToC, the modified (anchors added) HTML, or both.

  put_toc_inline
        my $newhtml = $toc->put_toc_inline(toc_str=>$toc_str,
            filename=>$filename, in_string=>$in_string);

    Puts the given toc_str into the given input string; returns a string.

  cp
        cp($src, $dst);

    Copies file $src to $dst. Used for making backups of files.

FILE FORMATS
  Formatting the ToC
    The toc_entry and other related options give you control over how the
    ToC entries may look, but there are other options to affect the final
    appearance of the ToC file created.

    With the header option, the contents of the given file (or string) will
    be prepended before the generated ToC. This allows you to have
    introductory text, or any other text, before the ToC.

    Note:
        If you use the header option, make sure the file specified contains
        the opening HTML tag, the HEAD element (containing the TITLE
        element), and the opening BODY tag. However, these tags/elements
        should not be in the header file if the inline option is used. See
        "Inlining the ToC" for information on what the header file should
        contain for inlining the ToC.

    With the toclabel option, the contents of the given string will be
    prepended before the generated ToC (but after any text taken from a
    header file).

    With the footer option, the contents of the file will be appended after
    the generated ToC.

    Note:
        If you use the footer, make sure it includes the closing BODY and
        HTML tags (unless, of course, you are using the inline option).

    If the header option is not specified, the appropriate starting HTML
    markup will be added, unless the toc_only option is specified. If the
    footer option is not specified, the appropriate closing HTML markup will
    be added, unless the toc_only option is specified.

    If you do not want or need to deal with header and footer files, you
    can specify the title of the ToC file with the title option, and a
    heading, or label, to put before the list of ToC entries with the
    toclabel option. Both options have default values.

    If you do not want HTML page tags to be supplied, and just want the ToC
    itself, then specify the toc_only option. If there are no header or
    footer files, then this will simply output the contents of toclabel and
    the ToC itself.

  Inlining the ToC
    The ability to incorporate the ToC directly into an HTML document is
    supported via the inline option.

    Inlining will be done on the first file in the list of files processed,
    and will only be done if that file contains an opening tag matching the
    toc_tag value.

    If overwrite is true, then the first file in the list will be
    overwritten, with the generated ToC inserted at the appropriate spot.
    Otherwise a modified version of the first file is output to either
    STDOUT or to the output file defined by the outfile option.

    The options toc_tag and toc_tag_replace are used to determine where and
    how the ToC is inserted into the output.

    Example 1

        $toc->generate_toc(inline=>1,
                           toc_tag => 'BODY',
                           toc_tag_replace => 0,
                           ...
                           );

    This will put the generated ToC after the BODY tag of the first file. If
    the header option is specified, then the contents of the specified file
    are inserted after the BODY tag. If the toclabel option is not empty,
    then the text specified by the toclabel option is inserted. Then the ToC
    is inserted, and finally, if the footer option is specified, it inserts
    the footer. Then the rest of the input file follows as it was before.

    Example 2

        $toc->generate_toc(inline=>1,
                           toc_tag => '!--toc--',
                           toc_tag_replace => 1,
                           ...
                           );

    This will put the generated ToC at the first comment of the form
    <!--toc-->; that comment will be replaced by the ToC (in the order
    header toclabel ToC footer) followed by the rest of the input file.

    Note:
        The header file should not contain the beginning HTML tag and HEAD
        element since the HTML file being processed should already contain
        these tags/elements.

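    Example 2 can be written out as a complete script along the following
    lines. This is a sketch: the page content and the toclabel text are
    invented, string input with to_string is used instead of files so that
    nothing is written to disk, and the call is guarded in case
    HTML::GenToc is not installed.

```perl
use strict;
use warnings;

# Illustrative page with a comment marking where the ToC should go.
my $page = <<'HTML';
<html><head><title>Guide</title></head>
<body>
<!--toc-->
<h1>Install</h1>
<p>Installation steps.</p>
<h1>Configure</h1>
<p>Configuration settings.</p>
</body></html>
HTML

my $out;
if (eval { require HTML::GenToc; 1 }) {
    my $toc = HTML::GenToc->new();
    $out = $toc->generate_toc(
        input           => $page,
        inline          => 1,
        toc_tag         => '!--toc--',   # the comment, without < and >
        toc_tag_replace => 1,            # replace the comment with the ToC
        toclabel        => '<h2>Contents</h2>',
        to_string       => 1,
        quiet           => 1,
    );
    print $out;
}
else {
    warn "HTML::GenToc is not installed; skipping the example\n";
}
```
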
NOTES
    *   HTML::GenToc is smart enough to detect anchors inside significant
        elements. If the anchor defines the NAME attribute, HTML::GenToc
        uses the value. Else, it adds its own NAME attribute to the anchor.
        If use_id is true, then it likewise checks for and uses IDs.

    *   The TITLE element is treated specially if specified in the toc_entry
        option. It is illegal to insert anchors (A) into TITLE elements.
        Therefore, HTML::GenToc will actually link to the filename itself
        instead of the TITLE element of the document.

    *   HTML::GenToc will ignore a significant element if it does not
        contain any non-whitespace characters. A warning message is
        generated if such a condition exists.

    *   If you have a sequence of significant elements that change in a
        slightly disordered fashion, such as H1 -> H3 -> H2 or even H2 ->
        H1, HTML::GenToc still creates a list which is good HTML. However,
        if you are using an ordered list to that depth, you will get
        strange numbering, because an extra list element is inserted to
        nest the elements at the correct level.

        For example (H2 -> H1 with ol_num_levels=1):

            1.
                * My H2 Header
            2. My H1 Header

        For example (H1 -> H3 -> H2 with ol_num_levels=0 and H3 also being
        significant):

            1. My H1 Header
                1.
                    1. My H3 Header
                2. My H2 Header
            2. My Second H1 Header

        In cases such as this it may be better not to use the ol option.

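    The effect of skipped levels can be reproduced with a few lines of
    stand-alone Perl. The outline function below is a hypothetical
    illustration, not the module's code: it indents each entry by its
    level and emits an "(empty item)" placeholder for every level that was
    jumped over, which corresponds to the bare "1." items shown above.

```perl
use strict;
use warnings;

# Illustrative sketch: turn [level, text] pairs into an indented
# outline, adding an empty placeholder whenever a level is skipped.
sub outline {
    my @entries = @_;
    my @lines;
    my $prev = 0;
    for my $entry (@entries) {
        my ($level, $text) = @$entry;

        # Going deeper by more than one level: fill in the parents.
        for my $missing ($prev + 1 .. $level - 1) {
            push @lines, ('    ' x ($missing - 1)) . '(empty item)';
        }
        push @lines, ('    ' x ($level - 1)) . $text;
        $prev = $level;
    }
    return @lines;
}

# H2 followed by H1, as in the first example above.
print "$_\n" for outline([2, 'My H2 Header'], [1, 'My H1 Header']);

print "\n";

# H1 -> H3 -> H2, as in the second example above.
print "$_\n" for outline(
    [1, 'My H1 Header'],
    [3, 'My H3 Header'],
    [2, 'My H2 Header'],
    [1, 'My Second H1 Header'],
);
```
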
CAVEATS
    *   Version 3.10 (and above) generates more verbose (SEO-friendly)
        anchors than prior versions. Thus anchors generated with earlier
        versions will not match version 3.10 anchors.

    *   Version 3.00 (and above) of HTML::GenToc is not compatible with
        Version 2.x of HTML::GenToc. It is now designed to do everything in
        one pass, and has dropped certain options: the infile option is no
        longer used (it has been replaced with the input option); the
        toc_file option no longer exists; use the outfile option instead;
        the tocmap option is no longer supported. Also the old array-parsing
        of arguments is no longer supported. There is no longer a
        generate_anchors method; everything is done with generate_toc.

        It now generates lower-case tags rather than upper-case ones.

    *   HTML::GenToc is not very efficient (memory and speed), and can be
        slow for large documents.

    *   Now that generation of anchors and of the ToC are done in one pass,
        even more memory is used than was the case before. This is more
        notable when processing multiple files, since all files are read
        into memory before processing them.

    *   Invalid markup will be generated if a significant element is
        contained inside of an anchor. For example:

            <a name="foo"><h1>The FOO command</h1></a>

        will be converted to (if H1 is a significant element),

            <a name="foo"><h1><a name="The">The</a> FOO command</h1></a>

        which is illegal since anchors cannot be nested.

        It is better style to put anchor statements within the element to be
        anchored. For example, the following is preferred:

            <h1><a name="foo">The FOO command</a></h1>

        HTML::GenToc will detect the "foo" name and use it.

    *   name attributes without quotes are not recognized.

BUGS
    Tell me about them.

REQUIRES
    The installation of this module requires "Module::Build". The module
    depends on "HTML::SimpleParse", "HTML::Entities" and "HTML::LinkList"
    and uses "Data::Dumper" for debugging purposes. The hypertoc script
    depends on "Getopt::Long", "Getopt::ArgvFile" and "Pod::Usage". Testing
    of this distribution depends on "Test::More".

INSTALLATION
    To install this module, run the following commands:

        perl Build.PL
        ./Build
        ./Build test
        ./Build install

    Or, if you're on a platform (like DOS or Windows) that doesn't like the
    "./" notation, you can do this:

       perl Build.PL
       perl Build
       perl Build test
       perl Build install

    In order to install somewhere other than the default, such as in a
    directory under your home directory, like "/home/fred/perl", run

       perl Build.PL --install_base /home/fred/perl

    as the first step instead.

    This will install the files underneath /home/fred/perl.

    You will then need to alter the PERL5LIB variable so that Perl can
    find the modules, and the PATH variable so that the shell can find
    the script.

    That is, change your path to include /home/fred/perl/script (where
    the script will be):

            PATH=/home/fred/perl/script:${PATH}

    and change the PERL5LIB variable to add /home/fred/perl/lib:

            PERL5LIB=/home/fred/perl/lib:${PERL5LIB}

SEE ALSO
    perl(1) htmltoc(1) hypertoc(1)

AUTHOR
    Kathryn Andersen (RUBYKAT) http://www.katspace.org/tools/hypertoc/

    Based on htmltoc by Earl Hood ehood AT medusa.acs.uci.edu

    Contributions by Dan Dascalescu, <http://dandascalescu.com>

COPYRIGHT
    Copyright (C) 1994-1997 Earl Hood, ehood AT medusa.acs.uci.edu
    Copyright (C) 2002-2008 Kathryn Andersen

    This program is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
    Free Software Foundation; either version 2 of the License, or (at your
    option) any later version.

    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
    Public License for more details.

    You should have received a copy of the GNU General Public License along
    with this program; if not, write to the Free Software Foundation, Inc.,
    675 Mass Ave, Cambridge, MA 02139, USA.

README.mkdn

# NAME

HTML::GenToc - Generate a Table of Contents for HTML documents.

# VERSION

version 3.20

# SYNOPSIS

    use HTML::GenToc;

    # create a new object with the default settings
    my $toc = HTML::GenToc->new();

    # or create one with explicit settings
    $toc = HTML::GenToc->new(title=>"Table of Contents",
                            toc_entry=>{
                              H1=>1,
                              H2=>2
                            },
                            toc_end=>{
                              H1=>'/H1',
                              H2=>'/H2'
                            }
      );

    # generate a ToC from a file (filenames are passed as an array reference)
    $toc->generate_toc(input=>[$html_file],
                       footer=>$footer_file,
                       header=>$header_file
      );

# DESCRIPTION

HTML::GenToc generates anchors and a table of contents for
HTML documents.  Depending on the arguments, it will insert
the information it generates, or output to a string, a separate file
or STDOUT.

While it defaults to taking H1 and H2 elements as the significant
elements to put into the table of contents, any tag can be defined
as a significant element.  The input does not have to be complete,
pure HTML: one can input pseudo-HTML or page fragments, which makes
it suitable for use on templates and HTML meta-languages such as WML.

Also included in the distribution is hypertoc, a script which uses the
module so that one can process files on the command line in a
user-friendly manner.

# DETAILS

The ToC generated is a multi-level list containing links to the
significant elements. HTML::GenToc inserts the links to the significant
elements into the ToC at a level specified by the user.

__Example:__

If H1s are specified as level 1, then they appear in the first-level
list of the ToC. If H2s are specified as level 2, then
they appear in a second-level list in the ToC.

Information on the significant elements and the level at which they
should occur is passed in to the methods used by this object, or one
can use the defaults.

There are two phases to the ToC generation.  The first phase is to
put suitable anchors into the HTML documents, and the second phase
is to generate the ToC from HTML documents which have anchors
in them for the ToC to link to.

For more information on controlling the contents of the created ToC, see
"Formatting the ToC".

HTML::GenToc also supports the ability to incorporate the ToC into the HTML
document itself via the __inline__ option.  See "Inlining the ToC" for more
information.

In order for HTML::GenToc to support linking to significant elements,
HTML::GenToc inserts anchors into the significant elements.  One can
use HTML::GenToc as a filter, outputting the result to another file,
or one can overwrite the original file, with the original backed
up with a suffix (default: "org") appended to the filename.
One can also output the result to a string.

90Default arguments can be set when the object is created, and overridden
91by setting arguments when the generate_toc method is called.
92Arguments are given as a hash of arguments.
93
94## Method -- new
95
96    $toc = new HTML::GenToc();
97
98    $toc = new HTML::GenToc(toc_entry=>\%my_toc_entry,
99	toc_end=>\%my_toc_end,
100	bak=>'bak',
101    	...
102        );
103
104Creates a new HTML::GenToc object.
105
106These arguments will be used as defaults in invocations of other methods.
107
108See [generate_tod](http://search.cpan.org/perldoc?generate_tod) for possible arguments.
109
110## generate_toc
111
112    $toc->generate_toc(outfile=>"index2.html");
113
114    my $result_str = $toc->generate_toc(to_string=>1);
115
116Generates a table of contents for the significant elements in the HTML
117documents, optionally generating anchors for them first.
118
119__Options__
120
121- bak
122
123bak => _string_
124
125If the input file/files is/are being overwritten (__overwrite__ is on), copy
126the original file to "_filename_._string_".  If the value is empty, __no__
127backup file will be created.
128(default:org)
129
130- debug
131
132debug => 1
133
134Enable verbose debugging output.  Used for debugging this module;
135in other words, don't bother.
136(default:off)
137
138- entrysep
139
140entrysep => _string_
141
142Separator string for non-<li> item entries
143(default: ", ")
144
145- filenames
146
147filenames => \@filenames
148
149The filenames to use when creating table-of-contents links.
150This overrides the filenames given in the __input__ option,
151and is expected to have exactly the same number of elements.
152This can also be used when passing in string-content to the __input__
153option, to give a (fake) filename to use for the links relating
154to that content.
155
156- footer
157
158footer => _file_or_string_
159
160Either the filename of the file containing footer text for ToC;
161or a string containing the footer text.
162
163- header
164
165header => _file_or_string_
166
167Either the filename of the file containing header text for ToC;
168or a string containing the header text.
169
170- ignore_only_one
171
172ignore_only_one => 1
173
174If there would be only one item in the ToC, don't make a ToC.
175
176- ignore_sole_first
177
178ignore_sole_first => 1
179
180If the first item in the ToC is of the highest level,
181AND it is the only one of that level, ignore it.
182This is useful in web-pages where there is only one H1 header
183but one doesn't know beforehand whether there will be only one.
184
185- inline
186
187inline => 1
188
189Put ToC in document at a given point.
190See L</Inlining the ToC> for more information.
191
192- input
193
194input => \@filenames
195
196input => $content
197
198This is expected to be either a reference to an array of filenames,
199or a string containing content to process.
200
201The three main uses would be:
202
203    - (a)
204
205    you have more than one file to process, so pass in multiple filenames
206
207    - (b)
208
209    you have one file to process, so pass in its filename as the only array item
210
211    - (c)
212
213    you have HTML content to process, so pass in just the content as a string
214
215(default:undefined)
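
The call shapes above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the sample HTML, the temporary file, and the fake filename `alpha.html` are assumptions made for the example, and the module is loaded with `require` inside `eval` only so that the snippet does nothing harmful where HTML::GenToc is not installed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sample content (an assumption for this sketch).
my $content = "<h1>Alpha</h1>\n<p>text</p>\n<h2>Beta</h2>\n";

# Write the same content to a temporary file for the filename form.
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} $content;
close $fh or die "close $path: $!";

my %result;
if (eval { require HTML::GenToc; 1 }) {
    my $toc = HTML::GenToc->new();

    # (a)/(b) files: an array reference, here with a single filename.
    # For several files, list them all: input => [$file1, $file2]
    $result{file} = $toc->generate_toc(
        input     => [$path],
        toc_only  => 1,
        to_string => 1,
        quiet     => 1,
    );

    # (c) a string of HTML; "filenames" supplies the name used in links.
    $result{string} = $toc->generate_toc(
        input     => $content,
        filenames => ['alpha.html'],
        toc_only  => 1,
        to_string => 1,
        quiet     => 1,
    );
    print $result{string};
}
else {
    warn "HTML::GenToc is not installed; skipping the example\n";
}
```

With __to_string__ set, the generated ToC comes back as the return value instead of being written to __outfile__ or STDOUT.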
216
217- notoc_match
218
219notoc_match => _string_
220
If there are individual tags you don't wish to include in the
table of contents, even though they match the "significant elements",
give a pattern here.  If the pattern matches the contents of the tag
itself (its attributes, not the body of the element), that tag will be
skipped, both when generating anchors and when generating the ToC.
(default: `class="notoc"`)
226
227- ol
228
229ol => 1
230
231Use an ordered list for level 1 ToC entries.
232
233- ol_num_levels
234
235ol_num_levels => 2
236
237The number of levels deep the OL listing will go if __ol__ is true.
238If set to zero, will use an ordered list for all levels.
239(default:1)
240
241- overwrite
242
243overwrite => 1
244
245Overwrite the input file with the output.
246(default:off)
247
248- outfile
249
250outfile => _file_
251
252File to write the output to.  This is where the modified HTML
253output goes to.  Note that it doesn't make sense to use this option if you
254are processing more than one file.  If you give '-' as the filename, then
255output will go to STDOUT.
256(default: STDOUT)
257
258- quiet
259
260quiet => 1
261
262Suppress informative messages. (default: off)
263
264- textonly
265
266textonly => 1
267
268Use only text content in significant elements.
269
270- title
271
272title => _string_
273
274Title for ToC page (if not using __header__ or __inline__ or __toc_only__)
275(default: "Table of Contents")
276
277- toc_after
278
279toc_after => \%toc_after_data
280
%toc_after_data = ( _tag1_ => _suffix1_,
    _tag2_ => _suffix2_
    );
284
285toc_after => { H2=>'</em>' }
286
287For defining layout of significant elements in the ToC.
288
289This expects a reference to a hash of
290tag=>suffix pairs.
291
292The _tag_ is the HTML tag which marks the start of the element.  The
293_suffix_ is what is required to be appended to the Table of Contents
294entry generated for that tag.
295
296(default: undefined)
297
298- toc_before
299
300toc_before => \%toc_before_data
301
%toc_before_data = ( _tag1_ => _prefix1_,
    _tag2_ => _prefix2_
    );
305
306toc_before=>{ H2=>'<em>' }
307
308For defining the layout of significant elements in the ToC.  The _tag_
309is the HTML tag which marks the start of the element.  The _prefix_ is
310what is required to be prepended to the Table of Contents entry
311generated for that tag.
312
313(default: undefined)
314
315- toc_end
316
317toc_end => \%toc_end_data
318
%toc_end_data = ( _tag1_ => _endtag1_,
    _tag2_ => _endtag2_
    );
322
323toc_end => { H1 => '/H1', H2 => '/H2' }
324
For defining significant elements.  The _tag_ is the HTML tag which
marks the start of the element.  The _endtag_ is the HTML tag which marks
the end of the element.  When matching in the input file, case is
ignored (but make sure that all your _tag_ options referring to the
same tag are exactly the same!).
330
331- toc_entry
332
333toc_entry => \%toc_entry_data
334
%toc_entry_data = ( _tag1_ => _level1_,
    _tag2_ => _level2_
    );
338
339toc_entry => { H1 => 1, H2 => 2 }
340
For defining significant elements.  The _tag_ is the HTML tag which marks
the start of the element.  The _level_ is what level the tag is considered
to be.  The value of _level_ must be numeric and non-zero.  If the value
is negative, consecutive entries for that significant element will
be separated by the string set with the __entrysep__ option.
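
Because __toc_entry__ and __toc_end__ must agree on their keys and the levels must be non-zero numbers, it can help to check a configuration before passing it on. The helper below is a hypothetical sketch (the name `check_toc_options` is not part of HTML::GenToc), and it assumes that every tag in one hash should also appear in the other.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: returns a list of problems found in the options.
sub check_toc_options {
    my (%opt) = @_;
    my $entry = $opt{toc_entry} || {};
    my $end   = $opt{toc_end}   || {};
    my @problems;

    for my $tag (sort keys %$entry) {
        my $level = $entry->{$tag};
        push @problems, "$tag: level must be a non-zero number"
            unless looks_like_number($level) && $level != 0;
        # Keys are compared exactly, so "H2" and "h2" count as different.
        push @problems, "$tag: no matching key in toc_end"
            unless exists $end->{$tag};
    }
    for my $tag (sort keys %$end) {
        push @problems, "$tag: in toc_end but not in toc_entry"
            unless exists $entry->{$tag};
    }
    return @problems;
}

# A negative level asks for entries joined by entrysep.
my %options = (
    toc_entry => { H1 => 1, H2 => 2, H3 => -3 },
    toc_end   => { H1 => '/H1', H2 => '/H2', H3 => '/H3' },
    entrysep  => ' | ',
);
print "$_\n" for check_toc_options(%options);
```

If the list comes back empty, the same hash can be passed straight to the constructor or to __generate_toc__.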
346
347- toclabel
348
349toclabel => _string_
350
351HTML text that labels the ToC.  Always used.
352(default: "<h1>Table of Contents</h1>")
353
354- toc_tag
355
356toc_tag => _string_
357
If a ToC is to be included inline, this is the pattern which is used to
match the tag where the ToC should be put.  This can be a start-tag, an
end-tag or a comment, but the < should be left out; that is, if you
want the ToC to be placed after the BODY tag, then give "BODY".  If you
want a special comment tag to mark where the ToC should go, then include
the comment marks, for example: "!--toc--" (default:BODY)
364
365- toc_tag_replace
366
367toc_tag_replace => 1
368
369In conjunction with __toc_tag__, this is a flag to say whether the given tag
370should be replaced, or if the ToC should be put after the tag.
371This can be useful if your toc_tag is a comment and you don't need it
372after you have the ToC in place.
373(default:false)
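
The interaction of __toc_tag__ and __toc_tag_replace__ can be pictured with a small stand-alone sketch. It is not the module's implementation: `place_toc` is a hypothetical helper, and it quotes the tag literally rather than treating it as a pattern, to keep the example short.

```perl
use strict;
use warnings;

# Hypothetical helper: place $toc at the first tag matching $toc_tag.
sub place_toc {
    my ($html, $toc, $toc_tag, $replace) = @_;
    # The leading "<" is supplied here, which is why the option omits it.
    my $tag_re = qr/<\Q$toc_tag\E[^>]*>/i;
    if ($replace) {
        $html =~ s/$tag_re/$toc/;          # the tag itself disappears
    }
    else {
        $html =~ s/($tag_re)/$1\n$toc/;    # the tag is kept, the ToC follows it
    }
    return $html;
}

my $html = "<body>\n<!--toc-->\n<h1>One</h1>\n</body>\n";
my $toc  = "<ul><li>One</li></ul>";

print place_toc($html, $toc, 'BODY', 0);       # ToC after the BODY tag
print place_toc($html, $toc, '!--toc--', 1);   # comment replaced by the ToC
```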
374
375- toc_only
376
377toc_only => 1
378
379Output only the Table of Contents, that is, the Table of Contents plus
380the toclabel.  If there is a __header__ or a __footer__, these will also be
381output.
382
383If __toc_only__ is false then if there is no __header__, and __inline__ is
384not true, then a suitable HTML page header will be output, and if there
385is no __footer__ and __inline__ is not true, then a HTML page footer will
386be output.
387
388(default:false)
389
390- to_string
391
392to_string => 1
393
Return the modified HTML output as a string.  This _does_ override
other methods of output (unlike version 3.00).  If __to_string__ is false,
the method will return 1 rather than a string.
397
398- use_id
399
400use_id => 1
401
402Use id="_name_" for anchors rather than <a name="_name_"/> anchors.
403However if an anchor already exists for a Significant Element, this
404won't make an id for that particular element.
405
406- useorg
407
408useorg => 1
409
410Use pre-existing backup files as the input source; that is, files of the
411form _infile_._bak_  (see __input__ and __bak__).
412
413# INTERNAL METHODS
414
415These methods are documented for developer purposes and aren't intended
416to be used externally.
417
418## make_anchor_name
419
420    $toc->make_anchor_name(content=>$content,
421	anchors=>\%anchors);
422
423Makes the anchor-name for one anchor.
424Bases the anchor on the content of the significant element.
425Ensures that anchors are unique.
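
As a rough illustration of the idea (deriving a name from the content and keeping it unique), here is a hypothetical sketch. The function `unique_anchor` and its naming rules are assumptions made for this example; the names the module actually generates may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch: build an anchor name from element content and
# make sure it has not been handed out before.
sub unique_anchor {
    my ($content, $seen) = @_;
    (my $name = $content) =~ s/<[^>]*>//g;    # drop any markup
    $name =~ s/[^A-Za-z0-9]+/_/g;             # runs of other characters become "_"
    $name =~ s/^_+|_+$//g;                    # trim leading/trailing "_"
    $name = 'anchor' if $name eq '';

    # Append a counter until the name is unused.
    my $candidate = $name;
    my $n = 1;
    $candidate = $name . '_' . $n++ while $seen->{$candidate};
    $seen->{$candidate} = 1;
    return $candidate;
}

my %anchors;
print unique_anchor('The <em>FOO</em> command', \%anchors), "\n";
print unique_anchor('The FOO command', \%anchors), "\n";
```

The second call sees the same text again, so it receives a numbered variant of the first name.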
426
427## make_anchors
428
429    my $new_html = $toc->make_anchors(input=>$html,
430	notoc_match=>$notoc_match,
431	use_id=>$use_id,
432	toc_entry=>\%toc_entries,
433	toc_end=>\%toc_ends,
434	);
435
Makes the anchors for the given input string.
437Returns a string.
438
439## make_toc_list
440
441    my @toc_list = $toc->make_toc_list(input=>$html,
442	labels=>\%labels,
443	notoc_match=>$notoc_match,
444	toc_entry=>\%toc_entry,
445	toc_end=>\%toc_end,
446	filename=>$filename);
447
448Makes a list of lists which represents the structure and content
449of (a portion of) the ToC from one file.
450Also updates a list of labels for the ToC entries.
451
452## build_lol
453
454Build a list of lists of paths, given a list
455of hashes with info about paths.
456
457## output_toc
458
459    $self->output_toc(toc=>$toc_str,
460	input=>\@input,
461	filenames=>\@filenames);
462
463Put the output (whether to file, STDOUT or string).
464The "output" in this case could be the ToC, the modified
465(anchors added) HTML, or both.
466
467## put_toc_inline
468
469    my $newhtml = $toc->put_toc_inline(toc_str=>$toc_str,
470	filename=>$filename, in_string=>$in_string);
471
472Puts the given toc_str into the given input string;
473returns a string.
474
475## cp
476
477    cp($src, $dst);
478
479Copies file $src to $dst.
480Used for making backups of files.
481
482# FILE FORMATS
483
484## Formatting the ToC
485
The __toc_entry__ and other related options give you control over how the
ToC entries look, but there are other options which affect the final
appearance of the ToC file created.
489
490With the __header__ option, the contents of the given file (or string)
491will be prepended before the generated ToC. This allows you to have
492introductory text, or any other text, before the ToC.
493
494- Note:
495
If you use the __header__ option, make sure the file specified
contains the opening HTML tag, the HEAD element (containing the
TITLE element), and the opening BODY tag. However, these
tags/elements should not be in the header file if the __inline__
option is used. See "Inlining the ToC" for information on what
the header file should contain for inlining the ToC.
502
503With the __toclabel__ option, the contents of the given string will be
504prepended before the generated ToC (but after any text taken from a
505__header__ file).
506
507With the __footer__ option, the contents of the file will be appended
508after the generated ToC.
509
510- Note:
511
512If you use the __footer__, make sure it includes the closing BODY
513and HTML tags (unless, of course, you are using the __inline__ option).
514
515If the __header__ option is not specified, the appropriate starting
516HTML markup will be added, unless the __toc_only__ option is specified.
517If the __footer__ option is not specified, the appropriate closing
518HTML markup will be added, unless the __toc_only__ option is specified.
519
If you do not want or need to deal with header and footer files, you
can instead set the title of the ToC file with the __title__ option,
and the heading (or label) placed before the list of ToC entries with
the __toclabel__ option. Both options have default values.
524
525If you do not want HTML page tags to be supplied, and just want
526the ToC itself, then specify the __toc_only__ option.
527If there are no __header__ or __footer__ files, then this will simply
528output the contents of __toclabel__ and the ToC itself.
529
530## Inlining the ToC
531
532The ability to incorporate the ToC directly into an HTML document
533is supported via the __inline__ option.
534
535Inlining will be done on the first file in the list of files processed,
536and will only be done if that file contains an opening tag matching the
537__toc_tag__ value.
538
539If __overwrite__ is true, then the first file in the list will be
540overwritten, with the generated ToC inserted at the appropriate spot.
541Otherwise a modified version of the first file is output to either STDOUT
542or to the output file defined by the __outfile__ option.
543
544The options __toc_tag__ and __toc_tag_replace__ are used to determine where
545and how the ToC is inserted into the output.
546
547__Example 1__
548
549    $toc->generate_toc(inline=>1,
550		       toc_tag => 'BODY',
551		       toc_tag_replace => 0,
552		       ...
553		       );
554
555This will put the generated ToC after the BODY tag of the first file.
556If the __header__ option is specified, then the contents of the specified
557file are inserted after the BODY tag.  If the __toclabel__ option is not
558empty, then the text specified by the __toclabel__ option is inserted.
559Then the ToC is inserted, and finally, if the __footer__ option is
560specified, it inserts the footer.  Then the rest of the input file
561follows as it was before.
562
563__Example 2__
564
565    $toc->generate_toc(inline=>1,
566		       toc_tag => '!--toc--',
567		       toc_tag_replace => 1,
568		       ...
569		       );
570
This will replace the first comment of the form <!--toc--> with the
generated ToC, in this order:

- __header__
- __toclabel__
- the ToC
- __footer__

followed by the rest of the input file.
579
580- Note:
581
582The header file should not contain the beginning HTML tag
583and HEAD element since the HTML file being processed should
584already contain these tags/elements.
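
Example 2 can be tried end to end with a string as input. This is a minimal sketch: the sample page is an assumption made for the example, and the module is loaded with `require` inside `eval` only so the snippet is harmless where HTML::GenToc is not installed.

```perl
use strict;
use warnings;

# A small page with a comment marking where the ToC should go.
my $page = <<'HTML';
<html>
<head><title>Guide</title></head>
<body>
<!--toc-->
<h1>Introduction</h1>
<p>Some text.</p>
<h2>Details</h2>
<p>More text.</p>
</body>
</html>
HTML

my $with_toc;
if (eval { require HTML::GenToc; 1 }) {
    my $toc = HTML::GenToc->new();
    $with_toc = $toc->generate_toc(
        input           => $page,
        inline          => 1,
        toc_tag         => '!--toc--',
        toc_tag_replace => 1,    # drop the marker comment once the ToC is in
        to_string       => 1,    # return the result instead of printing it
        quiet           => 1,
    );
    print $with_toc;
}
else {
    warn "HTML::GenToc is not installed; skipping the example\n";
}
```

No __header__ or __footer__ is given here, so only the __toclabel__ and the ToC take the place of the comment.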
585
586# NOTES
587
588- *
589
590HTML::GenToc is smart enough to detect anchors inside significant
591elements. If the anchor defines the NAME attribute, HTML::GenToc uses
592the value. Else, it adds its own NAME attribute to the anchor.
593If __use_id__ is true, then it likewise checks for and uses IDs.
594
595- *
596
597The TITLE element is treated specially if specified in the __toc_entry__
598option. It is illegal to insert anchors (A) into TITLE elements.
599Therefore, HTML::GenToc will actually link to the filename itself
600instead of the TITLE element of the document.
601
602- *
603
604HTML::GenToc will ignore a significant element if it does not contain
605any non-whitespace characters. A warning message is generated if
606such a condition exists.
607
608- *
609
If you have a sequence of significant elements that change in a slightly
disordered fashion, such as H1 -> H3 -> H2 or even H2 -> H1,
HTML::GenToc will still create a list which is good HTML.  However, if
you are using an ordered list to that depth, you will get strange
numbering, because an extra list element is inserted to nest the
elements at the correct level.
616
617For example (H2 -> H1 with ol_num_levels=1):
618
619    1.
620	* My H2 Header
621    2. My H1 Header
622
623For example (H1 -> H3 -> H2 with ol_num_levels=0 and H3 also being
624significant):
625
626    1. My H1 Header
627	1.
628	    1. My H3 Header
629	2. My H2 Header
630    2. My Second H1 Header
631
632In cases such as this it may be better not to use the __ol__ option.
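
To decide whether a page is affected, one can look for places where the heading level jumps by more than one step. The following is a hypothetical stand-alone check, not part of HTML::GenToc; it assumes H1 to H6 are the significant elements and that level 1 is the top level.

```perl
use strict;
use warnings;

# Hypothetical helper: return the indexes of headings whose level is more
# than one step deeper than the heading before them.
sub find_level_jumps {
    my @levels = @_;
    my @jumps;
    my $prev = 0;    # treat the position before the first heading as level 0
    for my $i (0 .. $#levels) {
        push @jumps, $i if $levels[$i] > $prev + 1;
        $prev = $levels[$i];
    }
    return @jumps;
}

my $html = "<h1>A</h1><h3>B</h3><h2>C</h2><h1>D</h1>";

# Collect the heading levels in document order.
my @levels = $html =~ /<h([1-6])\b/gi;

for my $i (find_level_jumps(@levels)) {
    print "heading number ", $i + 1, " (H$levels[$i]) skips a level\n";
}
```

A page that starts with H2, or goes from H1 straight to H3, is reported; such pages are the ones where __ol__ gives the numbering shown above.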
633
634# CAVEATS
635
636- *
637
638Version 3.10 (and above) generates more verbose (SEO-friendly) anchors
639than prior versions. Thus anchors generated with earlier versions will
640not match version 3.10 anchors.
641
642- *
643
644Version 3.00 (and above) of HTML::GenToc is not compatible with
645Version 2.x of HTML::GenToc.  It is now designed to do everything
646in one pass, and has dropped certain options: the __infile__ option
647is no longer used (it has been replaced with the __input__ option);
648the __toc_file__ option no longer exists; use the __outfile__ option
649instead; the __tocmap__ option is no longer supported.  Also the old
650array-parsing of arguments is no longer supported.  There is no longer
651a __generate_anchors__ method; everything is done with __generate_toc__.
652
653It now generates lower-case tags rather than upper-case ones.
654
655- *
656
657HTML::GenToc is not very efficient (memory and speed), and can be
658slow for large documents.
659
660- *
661
662Now that generation of anchors and of the ToC are done in one pass,
663even more memory is used than was the case before.  This is more notable
664when processing multiple files, since all files are read into memory
665before processing them.
666
667- *
668
669Invalid markup will be generated if a significant element is
670contained inside of an anchor. For example:
671
672    <a name="foo"><h1>The FOO command</h1></a>
673
674will be converted to (if H1 is a significant element),
675
676    <a name="foo"><h1><a name="The">The</a> FOO command</h1></a>
677
678which is illegal since anchors cannot be nested.
679
680It is better style to put anchor statements within the element to
681be anchored. For example, the following is preferred:
682
683    <h1><a name="foo">The FOO command</a></h1>
684
685HTML::GenToc will detect the "foo" name and use it.
686
687- *
688
689name attributes without quotes are not recognized.
690
691# BUGS
692
693Tell me about them.
694
695# REQUIRES
696
697The installation of this module requires `Module::Build`.  The module
698depends on `HTML::SimpleParse`, `HTML::Entities` and `HTML::LinkList` and uses
699`Data::Dumper` for debugging purposes.  The hypertoc script depends on
700`Getopt::Long`, `Getopt::ArgvFile` and `Pod::Usage`.  Testing of this
701distribution depends on `Test::More`.
702
703# INSTALLATION
704
705To install this module, run the following commands:
706
707    perl Build.PL
708    ./Build
709    ./Build test
710    ./Build install
711
712Or, if you're on a platform (like DOS or Windows) that doesn't like the
713"./" notation, you can do this:
714
715   perl Build.PL
716   perl Build
717   perl Build test
718   perl Build install
719
In order to install somewhere other than the default, such as
in a directory under your home directory, like "/home/fred/perl",
run
723
724   perl Build.PL --install_base /home/fred/perl
725
726as the first step instead.
727
728This will install the files underneath /home/fred/perl.
729
730You will then need to make sure that you alter the PERL5LIB variable to
731find the modules, and the PATH variable to find the script.
732
Therefore you will need to change your path, to include
/home/fred/perl/script (where the script will be):

	PATH=/home/fred/perl/script:${PATH}

and the PERL5LIB variable, to add /home/fred/perl/lib:

	PERL5LIB=/home/fred/perl/lib:${PERL5LIB}
741
742# SEE ALSO
743
744perl(1)
745htmltoc(1)
746hypertoc(1)
747
748# AUTHOR
749
750Kathryn Andersen     (RUBYKAT)	http://www.katspace.org/tools/hypertoc/
751
752Based on htmltoc by Earl Hood       ehood AT medusa.acs.uci.edu
753
754Contributions by Dan Dascalescu, <http://dandascalescu.com>
755
756# COPYRIGHT
757
758Copyright (C) 1994-1997  Earl Hood, ehood AT medusa.acs.uci.edu
759Copyright (C) 2002-2008 Kathryn Andersen
760
761This program is free software; you can redistribute it and/or modify
762it under the terms of the GNU General Public License as published by
763the Free Software Foundation; either version 2 of the License, or
764(at your option) any later version.
765
766This program is distributed in the hope that it will be useful,
767but WITHOUT ANY WARRANTY; without even the implied warranty of
768MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
769GNU General Public License for more details.
770
771You should have received a copy of the GNU General Public License
772along with this program; if not, write to the Free Software
773Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.