README
1NAME
2 HTML::GenToc - Generate a Table of Contents for HTML documents.
3
4VERSION
5 version 3.20
6
7SYNOPSIS
8 use HTML::GenToc;
9
10 # create a new object
11 my $toc = HTML::GenToc->new();
12
13 $toc = HTML::GenToc->new(title=>"Table of Contents",
14 toc_entry=>{
15 H1=>1,
16 H2=>2
17 },
18 toc_end=>{
19 H1=>'/H1',
20 H2=>'/H2'
21 }
22 );
23
24 # generate a ToC from a file
25 $toc->generate_toc(input=>[$html_file],
26 footer=>$footer_file,
27 header=>$header_file
28 );
29
30DESCRIPTION
31 HTML::GenToc generates anchors and a table of contents for HTML
32 documents. Depending on the arguments, it will insert the information it
33 generates, or output to a string, a separate file or STDOUT.
34
35 While it defaults to taking H1 and H2 elements as the significant
36 elements to put into the table of contents, any tag can be defined as a
37 significant element. Also, the input need not be complete, pure
38 HTML: one can input pseudo-HTML or page fragments, which makes it
39 suitable for use on templates and HTML meta-languages such as
40 WML.
41
42 Also included in the distribution is hypertoc, a script which uses the
43 module so that one can process files on the command-line in a
44 user-friendly manner.
45
46DETAILS
47 The generated ToC is a multi-level list containing links to the
48 significant elements. HTML::GenToc places each link in the ToC at the
49 level that the user specifies for that kind of element.
50
51 Example:
52
53 If H1s are specified as level 1, then they appear in the first-level
54 list of the ToC. If H2s are specified as level 2, then they appear in
55 a second-level list in the ToC.
56
57 Information on the significant elements, and the level at which each
58 should occur, is passed to the methods of this object; alternatively,
59 one can use the defaults.
60
61 There are two phases to the ToC generation. The first phase is to put
62 suitable anchors into the HTML documents, and the second phase is to
63 generate the ToC from HTML documents which have anchors in them for the
64 ToC to link to.
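The two phases can be illustrated with a small, self-contained Perl sketch. It is not HTML::GenToc's implementation: the H1/H2 level map mirrors the documented defaults, but the "tocN" anchor scheme and the helper names add_anchors, toc_entries and render_outline are hypothetical, and the regexes only handle plain <h1>/<h2> start tags.

```perl
use strict;
use warnings;

# Assumed level map, mirroring the documented defaults (H1 => 1, H2 => 2).
my %level = (h1 => 1, h2 => 2);

# Phase 1: give each plain <h1>/<h2> start tag an id ("toc1", "toc2", ...).
sub add_anchors {
    my ($html) = @_;
    my $n = 0;
    $html =~ s{<(h[12])>}{'<' . $1 . ' id="toc' . ++$n . '">'}gie;
    return $html;
}

# Phase 2: collect [level, id, text] for every anchored element, in order.
sub toc_entries {
    my ($html) = @_;
    my @entries;
    while ($html =~ m{<(h[12]) id="([^"]+)">(.*?)</\1>}gis) {
        push @entries, [ $level{ lc $1 }, $2, $3 ];
    }
    return @entries;
}

# Render the entries as an indented outline; each level adds two spaces.
sub render_outline {
    my $out = '';
    for my $entry (@_) {
        my ($lvl, $id, $text) = @$entry;
        $out .= '  ' x ($lvl - 1) . "- $text (#$id)\n";
    }
    return $out;
}

my $html = '<h1>Intro</h1><p>text</p><h2>Setup</h2><h1>Usage</h1>';
print render_outline(toc_entries(add_anchors($html)));
# prints:
# - Intro (#toc1)
#   - Setup (#toc2)
# - Usage (#toc3)
```

The real module emits nested HTML lists rather than a text outline; the outline is used here only to make the level structure easy to see.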
65
66 For more information on controlling the contents of the created ToC, see
67 "Formatting the ToC".
68
69 HTML::GenToc also supports the ability to incorporate the ToC into the
70 HTML document itself via the inline option. See "Inlining the ToC" for
71 more information.
72
73 In order for HTML::GenToc to support linking to significant elements,
74 HTML::GenToc inserts anchors into the significant elements. One can use
75 HTML::GenToc as a filter, outputting the result to another file, or one
76 can overwrite the original file, with the original backed up with a
77 suffix (default: "org") appended to the filename. One can also output
78 the result to a string.
79
80METHODS
81 Default arguments can be set when the object is created, and overridden
82 by setting arguments when the generate_toc method is called. Arguments
83 are given as a hash of arguments.
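As a rough illustration of this defaults-then-overrides pattern, here is a minimal Perl sketch. The package My::TocLike and its effective_args method are hypothetical; they only show the general hash-merging idea and are not part of HTML::GenToc.

```perl
use strict;
use warnings;

# Hypothetical class: arguments given to new() are stored as defaults.
package My::TocLike;

sub new {
    my ($class, %defaults) = @_;
    return bless { defaults => { %defaults } }, $class;
}

# Merge defaults with per-call arguments. In a hash list, later pairs win,
# so values passed to this call override the stored defaults.
sub effective_args {
    my ($self, %call_args) = @_;
    return { %{ $self->{defaults} }, %call_args };
}

package main;

my $obj  = My::TocLike->new(bak => 'bak', quiet => 1);
my $args = $obj->effective_args(bak => 'orig');
print "$args->{bak} $args->{quiet}\n";    # prints "orig 1"
```

Copying the defaults into a fresh hash on each call keeps one invocation's overrides from leaking into the next.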
84
85 Method -- new
86 $toc = HTML::GenToc->new();
87
88 $toc = HTML::GenToc->new(toc_entry=>\%my_toc_entry,
89 toc_end=>\%my_toc_end,
90 bak=>'bak',
91 ...
92 );
93
94 Creates a new HTML::GenToc object.
95
96 These arguments will be used as defaults in invocations of other
97 methods.
98
99 See generate_toc for possible arguments.
100
101 generate_toc
102 $toc->generate_toc(outfile=>"index2.html");
103
104 my $result_str = $toc->generate_toc(to_string=>1);
105
106 Generates a table of contents for the significant elements in the HTML
107 documents, optionally generating anchors for them first.
108
109 Options
110
111 bak bak => *string*
112
113 If the input files are being overwritten (overwrite is on), copy each
114 original file to "*filename*.*string*". If the value is empty, no
115 backup file will be created. (default:org)
116
117 debug
118 debug => 1
119
120 Enable verbose debugging output. Used for debugging this module; in
121 other words, don't bother. (default:off)
122
123 entrysep
124 entrysep => *string*
125
126 Separator string for non-<li> item entries (default: ", ")
127
128 filenames
129 filenames => \@filenames
130
131 The filenames to use when creating table-of-contents links. This
132 overrides the filenames given in the input option, and is expected
133 to have exactly the same number of elements. This can also be used
134 when passing in string-content to the input option, to give a (fake)
135 filename to use for the links relating to that content.
136
137 footer
138 footer => *file_or_string*
139
140 Either the name of a file containing footer text for the ToC, or a
141 string containing the footer text.
142
143 header
144 header => *file_or_string*
145
146 Either the name of a file containing header text for the ToC, or a
147 string containing the header text.
148
149 ignore_only_one
150 ignore_only_one => 1
151
152 If there would be only one item in the ToC, don't make a ToC.
153
154 ignore_sole_first
155 ignore_sole_first => 1
156
157 If the first item in the ToC is of the highest level, AND it is the
158 only one of that level, ignore it. This is useful in web-pages where
159 there is only one H1 header but one doesn't know beforehand whether
160 there will be only one.
161
162 inline
163 inline => 1
164
165 Put ToC in document at a given point. See "Inlining the ToC" for
166 more information.
167
168 input
169 input => \@filenames
170
171 input => $content
172
173 This is expected to be either a reference to an array of filenames,
174 or a string containing content to process.
175
176 The three main uses would be:
177
178 (a) you have more than one file to process, so pass in multiple
179 filenames
180
181 (b) you have one file to process, so pass in its filename as the
182 only array item
183
184 (c) you have HTML content to process, so pass in just the content as
185 a string
186
187 (default:undefined)
188
189 notoc_match
190 notoc_match => *string*
191
192 A pattern for excluding individual elements that would otherwise count
193 as significant elements. The pattern is matched against the text
194 inside each element's start tag (such as its attributes), not its body;
195 if it matches, that element is excluded both when generating anchors
196 and when generating the ToC. (default: 'class="notoc"')
197
198 ol ol => 1
199
200 Use an ordered list for level 1 ToC entries.
201
202 ol_num_levels
203 ol_num_levels => 2
204
205 The number of levels deep the OL listing will go if ol is true. If
206 set to zero, will use an ordered list for all levels. (default:1)
207
208 overwrite
209 overwrite => 1
210
211 Overwrite the input file with the output. (default:off)
212
213 outfile
214 outfile => *file*
215
216 File to write the output to. This is where the modified HTML output
217 goes to. Note that it doesn't make sense to use this option if you
218 are processing more than one file. If you give '-' as the filename,
219 then output will go to STDOUT. (default: STDOUT)
220
221 quiet
222 quiet => 1
223
224 Suppress informative messages. (default: off)
225
226 textonly
227 textonly => 1
228
229 Use only text content in significant elements.
230
231 title
232 title => *string*
233
234 Title for ToC page (if not using header or inline or toc_only)
235 (default: "Table of Contents")
236
237 toc_after
238 toc_after => \%toc_after_data
239
240 %toc_after_data = { *tag1* => *suffix1*, *tag2* => *suffix2* };
241
242 toc_after => { H2=>'</em>' }
243
244 For defining layout of significant elements in the ToC.
245
246 This expects a reference to a hash of tag=>suffix pairs.
247
248 The *tag* is the HTML tag which marks the start of the element. The
249 *suffix* is what is required to be appended to the Table of Contents
250 entry generated for that tag.
251
252 (default: undefined)
253
254 toc_before
255 toc_before => \%toc_before_data
256
257 %toc_before_data = { *tag1* => *prefix1*, *tag2* => *prefix2* };
258
259 toc_before=>{ H2=>'<em>' }
260
261 For defining the layout of significant elements in the ToC. The
262 *tag* is the HTML tag which marks the start of the element. The
263 *prefix* is what is required to be prepended to the Table of
264 Contents entry generated for that tag.
265
266 (default: undefined)
267
268 toc_end
269 toc_end => \%toc_end_data
270
271 %toc_end_data = { *tag1* => *endtag1*, *tag2* => *endtag2* };
272
273 toc_end => { H1 => '/H1', H2 => '/H2' }
274
275 For defining significant elements. The *tag* is the HTML tag which
276 marks the start of the element. The *endtag* the HTML tag which
277 marks the end of the element. When matching in the input file, case
278 is ignored (but make sure that all your *tag* options referring to
279 the same tag are exactly the same!).
280
281 toc_entry
282 toc_entry => \%toc_entry_data
283
284 %toc_entry_data = { *tag1* => *level1*, *tag2* => *level2* };
285
286 toc_entry => { H1 => 1, H2 => 2 }
287
288 For defining significant elements. The *tag* is the HTML tag which
289 marks the start of the element. The *level* is what level the tag is
290 considered to be. The value of *level* must be numeric, and
291 non-zero. If the value is negative, consective entries represented
292 by the significant_element will be separated by the value set by
293 entrysep option.
294
295 toclabel
296 toclabel => *string*
297
298 HTML text that labels the ToC. Always used. (default: "<h1>Table of
299 Contents</h1>")
300
301 toc_tag
302 toc_tag => *string*
303
304 If a ToC is to be included inline, this is the pattern which is used
305 to match the tag where the ToC should be put. This can be a
306 start-tag, an end-tag or a comment, but the < should be left out;
307 that is, if you want the ToC to be placed after the BODY tag, then
308 give "BODY". If you want a special comment tag to mark where the ToC
309 should go, then include the comment marks, for example: "!--toc--"
310 (default:BODY)
311
312 toc_tag_replace
313 toc_tag_replace => 1
314
315 In conjunction with toc_tag, this is a flag to say whether the given
316 tag should be replaced, or if the ToC should be put after the tag.
317 This can be useful if your toc_tag is a comment and you don't need
318 it after you have the ToC in place. (default:false)
319
320 toc_only
321 toc_only => 1
322
323 Output only the Table of Contents, that is, the Table of Contents
324 plus the toclabel. If there is a header or a footer, these will also
325 be output.
326
327 If toc_only is false then if there is no header, and inline is not
328 true, then a suitable HTML page header will be output, and if there
329 is no footer and inline is not true, then a HTML page footer will be
330 output.
331
332 (default:false)
333
334 to_string
335 to_string => 1
336
337 Return the modified HTML output as a string. This *does* override
338 other methods of output (unlike version 3.00). If *to_string* is
339 false, the method will return 1 rather than a string.
340
341 use_id
342 use_id => 1
343
344 Use id="*name*" for anchors rather than <a name="*name*"/> anchors.
345 However if an anchor already exists for a Significant Element, this
346 won't make an id for that particular element.
347
348 useorg
349 useorg => 1
350
351 Use pre-existing backup files as the input source; that is, files of
352 the form *infile*.*bak* (see input and bak).
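To make the notoc_match rule concrete, the following sketch filters H1/H2 elements by testing the pattern against the attribute text of each start tag only. The helper significant_headings is hypothetical and its regex is a simplification; HTML::GenToc's own matching may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical sketch: the pattern is tested only against the attribute
# text of each start tag, never against the element's body.
my $notoc_match = 'class="notoc"';    # the documented default value

sub significant_headings {
    my ($html, $pattern) = @_;
    my @kept;
    while ($html =~ m{<(h[12])\b([^>]*)>(.*?)</\1>}gis) {
        my ($attrs, $body) = ($2, $3);
        next if $attrs =~ /$pattern/;    # skipped for anchors and ToC alike
        push @kept, $body;
    }
    return @kept;
}

my $html = '<h1>Intro</h1><h2 class="notoc">Skip me</h2><h2>Details</h2>';
print join(', ', significant_headings($html, $notoc_match)), "\n";
# prints: Intro, Details
```

Because only the start tag is examined, a heading whose visible text happens to contain the pattern is still included.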
353
354INTERNAL METHODS
355 These methods are documented for developer purposes and aren't intended
356 to be used externally.
357
358 make_anchor_name
359 $toc->make_anchor_name(content=>$content,
360 anchors=>\%anchors);
361
362 Makes the anchor-name for one anchor. Bases the anchor on the content of
363 the significant element. Ensures that anchors are unique.
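A hypothetical sketch of an anchor-name helper in the same spirit: derive a name from the element's text, then keep it unique by tracking names already used. The function sketch_anchor_name and its naming rules (underscores, numeric suffixes, a "section" fallback) are assumptions for illustration; HTML::GenToc's actual rules, which changed in version 3.10, are not reproduced here.

```perl
use strict;
use warnings;

# Hypothetical helper: build an anchor name from element content and make
# it unique against the names recorded in %$anchors.
sub sketch_anchor_name {
    my ($content, $anchors) = @_;
    (my $name = $content) =~ s/<[^>]*>//g;    # drop any inline markup
    $name =~ s/\W+/_/g;                       # non-word runs become "_"
    $name =~ s/^_+|_+$//g;                    # trim leading/trailing "_"
    $name = 'section' unless length $name;    # fallback for empty text
    my $unique = $name;
    my $n = 1;
    $unique = $name . '_' . $n++ while exists $anchors->{$unique};
    $anchors->{$unique} = 1;                  # remember it for next time
    return $unique;
}

my %anchors;
print sketch_anchor_name('The <em>FOO</em> command', \%anchors), "\n";  # The_FOO_command
print sketch_anchor_name('The FOO command', \%anchors), "\n";           # The_FOO_command_1
```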
364
365 make_anchors
366 my $new_html = $toc->make_anchors(input=>$html,
367 notoc_match=>$notoc_match,
368 use_id=>$use_id,
369 toc_entry=>\%toc_entries,
370 toc_end=>\%toc_ends,
371 );
372
373 Makes the anchors the given input string. Returns a string.
374
375 make_toc_list
376 my @toc_list = $toc->make_toc_list(input=>$html,
377 labels=>\%labels,
378 notoc_match=>$notoc_match,
379 toc_entry=>\%toc_entry,
380 toc_end=>\%toc_end,
381 filename=>$filename);
382
383 Makes a list of lists which represents the structure and content of (a
384 portion of) the ToC from one file. Also updates a list of labels for the
385 ToC entries.
386
387 build_lol
388 Build a list of lists of paths, given a list of hashes with info about
389 paths.
390
391 output_toc
392 $self->output_toc(toc=>$toc_str,
393 input=>\@input,
394 filenames=>\@filenames);
395
396 Put the output (whether to file, STDOUT or string). The "output" in this
397 case could be the ToC, the modified (anchors added) HTML, or both.
398
399 put_toc_inline
400 my $newhtml = $toc->put_toc_inline(toc_str=>$toc_str,
401 filename=>$filename, in_string=>$in_string);
402
403 Puts the given toc_str into the given input string; returns a string.
404
405 cp
406 cp($src, $dst);
407
408 Copies file $src to $dst. Used for making backups of files.
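A minimal sketch of the documented backup behaviour (see the bak option) using the core File::Copy module. The helper backup_before_overwrite is hypothetical and is not the module's cp function; it copies *file* to *file*.*suffix* and skips the copy when the suffix is empty.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical helper: copy FILE to FILE.SUFFIX before FILE is overwritten.
# An empty or undefined suffix means "make no backup", as with bak => ''.
sub backup_before_overwrite {
    my ($file, $suffix) = @_;
    return unless defined $suffix && length $suffix;
    my $backup = "$file.$suffix";
    copy($file, $backup) or die "Cannot copy $file to $backup: $!";
    return $backup;
}
```

For example, backup_before_overwrite('index.html', 'org') would create index.html.org and return that name, while an empty suffix returns undef and copies nothing.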
409
410FILE FORMATS
411 Formatting the ToC
412 The toc_entry and other related options give you control over how the ToC
413 entries may look, but there are other options to affect the final
414 appearance of the ToC file created.
415
416 With the header option, the contents of the given file (or string) will
417 be placed before the generated ToC. This allows you to have
418 introductory text, or any other text, before the ToC.
419
420 Note:
421 If you use the header option, make sure the file specified contains
422 the opening HTML tag, the HEAD element (containing the TITLE
423 element), and the opening BODY tag. However, these tags/elements
424 should not be in the header file if the inline option is used. See
425 "Inlining the ToC" for information on what the header file should
426 contain for inlining the ToC.
427
428 With the toclabel option, the contents of the given string will be
429 inserted before the generated ToC (but after any text taken from a
430 header file).
431
432 With the footer option, the contents of the file will be appended after
433 the generated ToC.
434
435 Note:
436 If you use the footer, make sure it includes the closing BODY and
437 HTML tags (unless, of course, you are using the inline option).
438
439 If the header option is not specified, the appropriate starting HTML
440 markup will be added, unless the toc_only option is specified. If the
441 footer option is not specified, the appropriate closing HTML markup will
442 be added, unless the toc_only option is specified.
443
444 If you do not want/need to deal with header, and footer, files, then you
445 are allowed to specify the title, title option, of the ToC file; and it
446 allows you to specify a heading, or label, to put before ToC entries'
447 list, the toclabel option. Both options have default values.
448
449 If you do not want HTML page tags to be supplied, and just want the ToC
450 itself, then specify the toc_only option. If there are no header or
451 footer files, then this will simply output the contents of toclabel and
452 the ToC itself.
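The order in which these pieces are put together can be summarised in a short Perl sketch. The function assemble_toc_page is hypothetical, and the default page start and end markup it emits is an assumption chosen for illustration; HTML::GenToc generates its own default markup.

```perl
use strict;
use warnings;

# Hypothetical sketch of the assembly order: header (or an assumed default
# page start), toclabel, the ToC, then footer (or an assumed default page
# end). With toc_only or inline, the default page markup is left out.
sub assemble_toc_page {
    my (%o) = @_;
    my $title = defined $o{title}    ? $o{title}    : 'Table of Contents';
    my $label = defined $o{toclabel} ? $o{toclabel} : '<h1>Table of Contents</h1>';
    my $no_page = $o{toc_only} || $o{inline};
    my $head = defined $o{header} ? $o{header}
             : $no_page           ? ''
             : "<html><head><title>$title</title></head><body>\n";
    my $foot = defined $o{footer} ? $o{footer}
             : $no_page           ? ''
             : "</body></html>\n";
    return $head . $label . "\n" . $o{toc} . $foot;
}

print assemble_toc_page(toc => "<ul><li>...</li></ul>\n", toc_only => 1);
```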
453
454 Inlining the ToC
455 The ability to incorporate the ToC directly into an HTML document is
456 supported via the inline option.
457
458 Inlining will be done on the first file in the list of files processed,
459 and will only be done if that file contains an opening tag matching the
460 toc_tag value.
461
462 If overwrite is true, then the first file in the list will be
463 overwritten, with the generated ToC inserted at the appropriate spot.
464 Otherwise a modified version of the first file is output to either
465 STDOUT or to the output file defined by the outfile option.
466
467 The options toc_tag and toc_tag_replace are used to determine where and
468 how the ToC is inserted into the output.
469
470 Example 1
471
472 $toc->generate_toc(inline=>1,
473 toc_tag => 'BODY',
474 toc_tag_replace => 0,
475 ...
476 );
477
478 This will put the generated ToC after the BODY tag of the first file. If
479 the header option is specified, then the contents of the specified file
480 are inserted after the BODY tag. If the toclabel option is not empty,
481 then the text specified by the toclabel option is inserted. Then the ToC
482 is inserted, and finally, if the footer option is specified, it inserts
483 the footer. Then the rest of the input file follows as it was before.
484
485 Example 2
486
487 $toc->generate_toc(inline=>1,
488 toc_tag => '!--toc--',
489 toc_tag_replace => 1,
490 ...
491 );
492
493 This will replace the first comment of the form <!--toc--> with the
494 generated ToC (in the order header, toclabel, ToC, footer), followed
495 by the rest of the input file.
496
497 Note:
498 The header file should not contain the beginning HTML tag and HEAD
499 element since the HTML file being processed should already contain
500 these tags/elements.
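The placement rules of the two examples above can be sketched in Perl as follows. The function put_inline is hypothetical; it treats toc_tag as a literal string (the module describes it as a pattern) and acts only on the first matching tag, leaving the input unchanged when nothing matches.

```perl
use strict;
use warnings;

# Hypothetical sketch: insert $toc after, or in place of, the first tag
# that starts with "<$toc_tag". Returns the HTML unchanged if none matches.
sub put_inline {
    my ($html, $toc, $toc_tag, $replace) = @_;
    return $html unless $html =~ /<\Q$toc_tag\E[^>]*>/i;
    my ($start, $end) = ($-[0], $+[0]);    # offsets of the matched tag
    my $keep = $replace ? $start : $end;   # drop the tag only when replacing
    return substr($html, 0, $keep) . $toc . substr($html, $end);
}

my $page = "<html><body>\n<!--toc-->\n<h1>Intro</h1>\n</body></html>\n";
print put_inline($page, "<ul>...</ul>", '!--toc--', 1);   # like Example 2
print put_inline($page, "<ul>...</ul>", 'BODY', 0);       # like Example 1
```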
501
502NOTES
503 * HTML::GenToc is smart enough to detect anchors inside significant
504 elements. If the anchor defines the NAME attribute, HTML::GenToc
505 uses the value. Else, it adds its own NAME attribute to the anchor.
506 If use_id is true, then it likewise checks for and uses IDs.
507
508 * The TITLE element is treated specially if specified in the toc_entry
509 option. It is illegal to insert anchors (A) into TITLE elements.
510 Therefore, HTML::GenToc will actually link to the filename itself
511 instead of the TITLE element of the document.
512
513 * HTML::GenToc will ignore a significant element if it does not
514 contain any non-whitespace characters. A warning message is
515 generated if such a condition exists.
516
517 * If you have a sequence of significant elements that change in a
518 slightly disordered fashion, such as H1 -> H3 -> H2 or even H2 ->
519 H1, though HTML::GenToc deals with this to create a list which is
520 still good HTML, if you are using an ordered list to that depth,
521 then you will get strange numbering, as an extra list element will
522 have been inserted to nest the elements at the correct level.
523
524 For example (H2 -> H1 with ol_num_levels=1):
525
526 1.
527 * My H2 Header
528 2. My H1 Header
529
530 For example (H1 -> H3 -> H2 with ol_num_levels=0 and H3 also being
531 significant):
532
533 1. My H1 Header
534 1.
535 1. My H3 Header
536 2. My H2 Header
537 2. My Second H1 Header
538
539 In cases such as this it may be better not to use the ol option.
540
541CAVEATS
542 * Version 3.10 (and above) generates more verbose (SEO-friendly)
543 anchors than prior versions. Thus anchors generated with earlier
544 versions will not match version 3.10 anchors.
545
546 * Version 3.00 (and above) of HTML::GenToc is not compatible with
547 Version 2.x of HTML::GenToc. It is now designed to do everything in
548 one pass, and has dropped certain options: the infile option is no
549 longer used (it has been replaced with the input option); the
550 toc_file option no longer exists; use the outfile option instead;
551 the tocmap option is no longer supported. Also the old array-parsing
552 of arguments is no longer supported. There is no longer a
553 generate_anchors method; everything is done with generate_toc.
554
555 It now generates lower-case tags rather than upper-case ones.
556
557 * HTML::GenToc is not very efficient (memory and speed), and can be
558 slow for large documents.
559
560 * Now that generation of anchors and of the ToC are done in one pass,
561 even more memory is used than was the case before. This is more
562 notable when processing multiple files, since all files are read
563 into memory before processing them.
564
565 * Invalid markup will be generated if a significant element is
566 contained inside of an anchor. For example:
567
568 <a name="foo"><h1>The FOO command</h1></a>
569
570 will be converted to (if H1 is a significant element),
571
572 <a name="foo"><h1><a name="The">The</a> FOO command</h1></a>
573
574 which is illegal since anchors cannot be nested.
575
576 It is better style to put anchor statements within the element to be
577 anchored. For example, the following is preferred:
578
579 <h1><a name="foo">The FOO command</a></h1>
580
581 HTML::GenToc will detect the "foo" name and use it.
582
583 * NAME attributes whose values are not quoted are not recognized.
584
585BUGS
586 Tell me about them.
587
588REQUIRES
589 The installation of this module requires "Module::Build". The module
590 depends on "HTML::SimpleParse", "HTML::Entities" and "HTML::LinkList"
591 and uses "Data::Dumper" for debugging purposes. The hypertoc script
592 depends on "Getopt::Long", "Getopt::ArgvFile" and "Pod::Usage". Testing
593 of this distribution depends on "Test::More".
594
595INSTALLATION
596 To install this module, run the following commands:
597
598 perl Build.PL
599 ./Build
600 ./Build test
601 ./Build install
602
603 Or, if you're on a platform (like DOS or Windows) that doesn't like the
604 "./" notation, you can do this:
605
606 perl Build.PL
607 perl Build
608 perl Build test
609 perl Build install
610
611 In order to install somewhere other than the default, such as in a
612 directory under your home directory, like "/home/fred/perl" go
613
614 perl Build.PL --install_base /home/fred/perl
615
616 as the first step instead.
617
618 This will install the files underneath /home/fred/perl.
619
620 You will then need to make sure that you alter the PERL5LIB variable to
621 find the modules, and the PATH variable to find the script.
622
623 Therefore you will need to change: your path, to include
624 /home/fred/perl/script (where the script will be)
625
626 PATH=/home/fred/perl/script:${PATH}
627
628 the PERL5LIB variable to add /home/fred/perl/lib
629
630 PERL5LIB=/home/fred/perl/lib:${PERL5LIB}
631
632SEE ALSO
633 perl(1) htmltoc(1) hypertoc(1)
634
635AUTHOR
636 Kathryn Andersen (RUBYKAT) http://www.katspace.org/tools/hypertoc/
637
638 Based on htmltoc by Earl Hood ehood AT medusa.acs.uci.edu
639
640 Contributions by Dan Dascalescu, <http://dandascalescu.com>
641
642COPYRIGHT
643 Copyright (C) 1994-1997 Earl Hood, ehood AT medusa.acs.uci.edu
644 Copyright (C) 2002-2008 Kathryn Andersen
645
646 This program is free software; you can redistribute it and/or modify it
647 under the terms of the GNU General Public License as published by the
648 Free Software Foundation; either version 2 of the License, or (at your
649 option) any later version.
650
651 This program is distributed in the hope that it will be useful, but
652 WITHOUT ANY WARRANTY; without even the implied warranty of
653 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
654 Public License for more details.
655
656 You should have received a copy of the GNU General Public License along
657 with this program; if not, write to the Free Software Foundation, Inc.,
658 675 Mass Ave, Cambridge, MA 02139, USA.
659
660
README.mkdn
1# NAME
2
3HTML::GenToc - Generate a Table of Contents for HTML documents.
4
5# VERSION
6
7version 3.20
8
9# SYNOPSIS
10
11 use HTML::GenToc;
12
13 # create a new object
14 my $toc = HTML::GenToc->new();
15
16 $toc = HTML::GenToc->new(title=>"Table of Contents",
17 toc_entry=>{
18 H1=>1,
19 H2=>2
20 },
21 toc_end=>{
22 H1=>'/H1',
23 H2=>'/H2'
24 }
25 );
26
27 # generate a ToC from a file
28 $toc->generate_toc(input=>[$html_file],
29 footer=>$footer_file,
30 header=>$header_file
31 );
32
33
34
35# DESCRIPTION
36
37HTML::GenToc generates anchors and a table of contents for
38HTML documents. Depending on the arguments, it will insert
39the information it generates, or output to a string, a separate file
40or STDOUT.
41
42While it defaults to taking H1 and H2 elements as the significant
43elements to put into the table of contents, any tag can be defined
44as a significant element. Also, the input need not be complete,
45pure HTML: one can input pseudo-HTML
46or page fragments, which makes it suitable for use on templates
47and HTML meta-languages such as WML.
48
49Also included in the distribution is hypertoc, a script which uses the
50module so that one can process files on the command-line in a
51user-friendly manner.
52
53# DETAILS
54
55The generated ToC is a multi-level list containing links to the
56significant elements. HTML::GenToc places each link in the ToC at the
57level that the user specifies for that kind of element.
58
59__Example:__
60
61If H1s are specified as level 1, then they appear in the
62first-level list of the ToC. If H2s are specified as level 2, then
63they appear in a second-level list in the ToC.
64
65Information on the significant elements, and the level at which each
66should occur, is passed to the methods of this object; alternatively,
67one can use the defaults.
68
69There are two phases to the ToC generation. The first phase is to
70put suitable anchors into the HTML documents, and the second phase
71is to generate the ToC from HTML documents which have anchors
72in them for the ToC to link to.
73
74For more information on controlling the contents of the created ToC, see
75[Formatting the ToC](#formatting-the-toc).
76
77HTML::GenToc also supports the ability to incorporate the ToC into the HTML
78document itself via the __inline__ option. See [Inlining the ToC](#inlining-the-toc) for more
79information.
80
81In order for HTML::GenToc to support linking to significant elements,
82HTML::GenToc inserts anchors into the significant elements. One can
83use HTML::GenToc as a filter, outputting the result to another file,
84or one can overwrite the original file, with the original backed
85up with a suffix (default: "org") appended to the filename.
86One can also output the result to a string.
87
88# METHODS
89
90Default arguments can be set when the object is created, and overridden
91by setting arguments when the generate_toc method is called.
92Arguments are given as a hash of arguments.
93
94## Method -- new
95
96 $toc = HTML::GenToc->new();
97
98 $toc = HTML::GenToc->new(toc_entry=>\%my_toc_entry,
99 toc_end=>\%my_toc_end,
100 bak=>'bak',
101 ...
102 );
103
104Creates a new HTML::GenToc object.
105
106These arguments will be used as defaults in invocations of other methods.
107
108See [generate_toc](#generate_toc) for possible arguments.
109
110## generate_toc
111
112 $toc->generate_toc(outfile=>"index2.html");
113
114 my $result_str = $toc->generate_toc(to_string=>1);
115
116Generates a table of contents for the significant elements in the HTML
117documents, optionally generating anchors for them first.
118
119__Options__
120
121- bak
122
123bak => _string_
124
125If the input files are being overwritten (__overwrite__ is on), copy
126each original file to "_filename_._string_". If the value is empty, __no__
127backup file will be created.
128(default:org)
129
130- debug
131
132debug => 1
133
134Enable verbose debugging output. Used for debugging this module;
135in other words, don't bother.
136(default:off)
137
138- entrysep
139
140entrysep => _string_
141
142Separator string for non-<li> item entries
143(default: ", ")
144
145- filenames
146
147filenames => \@filenames
148
149The filenames to use when creating table-of-contents links.
150This overrides the filenames given in the __input__ option,
151and is expected to have exactly the same number of elements.
152This can also be used when passing in string-content to the __input__
153option, to give a (fake) filename to use for the links relating
154to that content.
155
156- footer
157
158footer => _file_or_string_
159
160Either the name of a file containing footer text for the ToC,
161or a string containing the footer text.
162
163- header
164
165header => _file_or_string_
166
167Either the name of a file containing header text for the ToC,
168or a string containing the header text.
169
170- ignore_only_one
171
172ignore_only_one => 1
173
174If there would be only one item in the ToC, don't make a ToC.
175
176- ignore_sole_first
177
178ignore_sole_first => 1
179
180If the first item in the ToC is of the highest level,
181AND it is the only one of that level, ignore it.
182This is useful in web-pages where there is only one H1 header
183but one doesn't know beforehand whether there will be only one.
184
185- inline
186
187inline => 1
188
189Put ToC in document at a given point.
190See [Inlining the ToC](#inlining-the-toc) for more information.
191
192- input
193
194input => \@filenames
195
196input => $content
197
198This is expected to be either a reference to an array of filenames,
199or a string containing content to process.
200
201The three main uses would be:
202
203 - (a)
204
205 you have more than one file to process, so pass in multiple filenames
206
207 - (b)
208
209 you have one file to process, so pass in its filename as the only array item
210
211 - (c)
212
213 you have HTML content to process, so pass in just the content as a string
214
215(default:undefined)
216
217- notoc_match
218
219notoc_match => _string_
220
221A pattern for excluding individual elements that would otherwise count
222as significant elements. The pattern is matched against the text
223inside each element's start tag (such as its attributes), not its body;
224if it matches, that element is excluded both when generating anchors
225and when generating the ToC. (default: `class="notoc"`)
226
227- ol
228
229ol => 1
230
231Use an ordered list for level 1 ToC entries.
232
233- ol_num_levels
234
235ol_num_levels => 2
236
237The number of levels deep the OL listing will go if __ol__ is true.
238If set to zero, will use an ordered list for all levels.
239(default:1)
240
241- overwrite
242
243overwrite => 1
244
245Overwrite the input file with the output.
246(default:off)
247
248- outfile
249
250outfile => _file_
251
252File to write the output to. This is where the modified HTML
253output goes to. Note that it doesn't make sense to use this option if you
254are processing more than one file. If you give '-' as the filename, then
255output will go to STDOUT.
256(default: STDOUT)
257
258- quiet
259
260quiet => 1
261
262Suppress informative messages. (default: off)
263
264- textonly
265
266textonly => 1
267
268Use only text content in significant elements.
269
270- title
271
272title => _string_
273
274Title for ToC page (if not using __header__ or __inline__ or __toc_only__)
275(default: "Table of Contents")
276
277- toc_after
278
279toc_after => \%toc_after_data
280
281%toc_after_data = ( _tag1_ => _suffix1_,
282 _tag2_ => _suffix2_
283 );
284
285toc_after => { H2=>'</em>' }
286
287For defining layout of significant elements in the ToC.
288
289This expects a reference to a hash of
290tag=>suffix pairs.
291
292The _tag_ is the HTML tag which marks the start of the element. The
293_suffix_ is what is required to be appended to the Table of Contents
294entry generated for that tag.
295
296(default: undefined)
297
298- toc_before
299
300toc_before => \%toc_before_data
301
302%toc_before_data = ( _tag1_ => _prefix1_,
303 _tag2_ => _prefix2_
304 );
305
306toc_before=>{ H2=>'<em>' }
307
308For defining the layout of significant elements in the ToC. The _tag_
309is the HTML tag which marks the start of the element. The _prefix_ is
310what is required to be prepended to the Table of Contents entry
311generated for that tag.
312
313(default: undefined)
314
315- toc_end
316
317toc_end => \%toc_end_data
318
319%toc_end_data = ( _tag1_ => _endtag1_,
320 _tag2_ => _endtag2_
321 );
322
323toc_end => { H1 => '/H1', H2 => '/H2' }
324
325For defining significant elements. The _tag_ is the HTML tag which
326marks the start of the element. The _endtag_ is the HTML tag which marks
327the end of the element. When matching in the input file, case is
328ignored (but make sure that all your _tag_ options referring to the
329same tag are exactly the same!).
330
331- toc_entry
332
333toc_entry => \%toc_entry_data
334
335%toc_entry_data = ( _tag1_ => _level1_,
336 _tag2_ => _level2_
337 );
338
339toc_entry => { H1 => 1, H2 => 2 }
340
341For defining significant elements. The _tag_ is the HTML tag which marks
342the start of the element. The _level_ is what level the tag is considered
343to be. The value of _level_ must be numeric, and non-zero. If the value
344is negative, consective entries represented by the significant_element will
345be separated by the value set by __entrysep__ option.
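
The way __toc_entry__ and __toc_end__ work together can be sketched as follows: each start tag is paired with its end tag (given without the leading <), matching is case-insensitive, and each match carries the configured level. The function `find_significant` is hypothetical, uses a simple regular expression rather than HTML::GenToc's parser, and ignores negative levels and __entrysep__.

```perl
use strict;
use warnings;

# Hypothetical sketch (not HTML::GenToc's parser): collect significant
# elements using toc_entry (tag => level) and toc_end (tag => end tag,
# written without the leading '<'). Matching is case-insensitive.
sub find_significant {
    my ($html, $toc_entry, $toc_end) = @_;
    my @found;
    for my $tag (keys %$toc_entry) {
        my $end = $toc_end->{$tag} or next;    # skip tags with no end tag
        while ($html =~ m{<\Q$tag\E\b[^>]*>(.*?)<\Q$end\E\s*>}gis) {
            push @found, {
                tag   => $tag,
                level => $toc_entry->{$tag},
                text  => $1,
                pos   => $-[0],    # offset of the start tag
            };
        }
    }
    # Tags were searched one at a time, so restore document order.
    return sort { $a->{pos} <=> $b->{pos} } @found;
}

my $html = '<h1>Intro</h1><p>Text</p><H2 id="a">Details</H2>';
for my $e (find_significant($html, { H1 => 1, H2 => 2 },
                            { H1 => '/H1', H2 => '/H2' })) {
    print "$e->{level} $e->{text}\n";
}
# prints "1 Intro" then "2 Details"
```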

- toclabel

toclabel => _string_

HTML text that labels the ToC. Always used.
(default: "<h1>Table of Contents</h1>")

- toc_tag

toc_tag => _string_

If a ToC is to be included inline, this is the pattern which is used to
match the tag where the ToC should be put. This can be a start-tag, an
end-tag or a comment, but the < should be left out; that is, if you
want the ToC to be placed after the BODY tag, then give "BODY". If you
want a special comment tag to mark where the ToC should go, then include
the comment marks, for example: "!--toc--" (default: BODY)

- toc_tag_replace

toc_tag_replace => 1

In conjunction with __toc_tag__, this is a flag to say whether the given tag
should be replaced, or if the ToC should be put after the tag.
This can be useful if your __toc_tag__ is a comment and you don't need it
after you have the ToC in place.
(default: false)
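
A minimal sketch of how __toc_tag__ and __toc_tag_replace__ decide where the ToC goes: find the first tag consisting of < followed by the __toc_tag__ text (case-insensitively), then either keep that tag and put the ToC after it, or replace the tag. The function `insert_toc` is a hypothetical illustration, not HTML::GenToc's put_toc_inline.

```perl
use strict;
use warnings;

# Hypothetical sketch (not HTML::GenToc's code): put $toc after the first
# tag matching $toc_tag, or in place of that tag when $replace is true.
# $toc_tag is given without the leading '<', e.g. 'BODY' or '!--toc--'.
sub insert_toc {
    my ($html, $toc, $toc_tag, $replace) = @_;
    # "<" + toc_tag + rest of the tag up to ">", case-insensitive.
    return $html unless $html =~ m{<\Q$toc_tag\E[^>]*>}i;   # no match: unchanged
    my ($start, $end) = ($-[0], $+[0]);
    my $kept = $replace ? '' : substr($html, $start, $end - $start);
    return substr($html, 0, $start) . $kept . $toc . substr($html, $end);
}

my $page = "<html><body>\n<!--toc-->\n<h1>Intro</h1>\n</body></html>\n";
my $result = insert_toc($page, '<ul><li>Intro</li></ul>', '!--toc--', 1);
print $result;
```

With `$replace` false and 'BODY' as the tag, the same function keeps `<body>` and places the ToC directly after it; if no tag matches, the input comes back unchanged.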

- toc_only

toc_only => 1

Output only the Table of Contents, that is, the Table of Contents plus
the toclabel. If there is a __header__ or a __footer__, these will also be
output.

If __toc_only__ is false and __inline__ is not true, then a suitable HTML
page header will be output if there is no __header__, and an HTML page
footer will be output if there is no __footer__.

(default: false)

- to_string

to_string => 1

Return the modified HTML output as a string. This _does_ override
other methods of output (unlike version 3.00). If _to_string_ is false,
the method will return 1 rather than a string.

- use_id

use_id => 1

Use id="_name_" for anchors rather than <a name="_name_"/> anchors.
However, if an anchor already exists for a significant element, this
won't make an id for that particular element.
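
The two anchor styles that __use_id__ chooses between can be pictured with the sketch below. It assumes the non-id form wraps the element's content in an A element with a NAME attribute, matching the preferred style shown in the CAVEATS section; `anchor_element` is a hypothetical helper and HTML::GenToc's exact markup may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch (not HTML::GenToc's code): anchor one significant
# element either with an id attribute (use_id) or with an A element
# carrying a name attribute. Tags are emitted in lower case.
sub anchor_element {
    my ($tag, $content, $name, $use_id) = @_;
    $tag = lc $tag;
    return qq{<$tag id="$name">$content</$tag>} if $use_id;
    return qq{<$tag><a name="$name">$content</a></$tag>};
}

my $with_id   = anchor_element('H2', 'Installation', 'Installation', 1);
my $with_name = anchor_element('H2', 'Installation', 'Installation', 0);
print "$with_id\n";      # <h2 id="Installation">Installation</h2>
print "$with_name\n";    # <h2><a name="Installation">Installation</a></h2>
```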

- useorg

useorg => 1

Use pre-existing backup files as the input source; that is, files of the
form _infile_._bak_ (see __input__ and __bak__).

# INTERNAL METHODS

These methods are documented for developer purposes and aren't intended
to be used externally.

## make_anchor_name

    $toc->make_anchor_name(content=>$content,
        anchors=>\%anchors);

Makes the anchor-name for one anchor.
Bases the anchor on the content of the significant element.
Ensures that anchors are unique.
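
The duties listed above (derive a name from the content, keep it unique) might look like this in a simplified, standalone form. The naming scheme here (strip markup, turn whitespace into underscores, append a numeric suffix such as _1 on collisions) is an assumption for illustration; HTML::GenToc 3.10 and later use their own, more verbose rules, and `anchor_name_sketch` is not part of its API.

```perl
use strict;
use warnings;

# Hypothetical sketch (not HTML::GenToc's exact rules): derive an anchor name
# from an element's content and keep it unique via the %$anchors registry.
sub anchor_name_sketch {
    my (%args) = @_;
    my $name = $args{content};
    $name =~ s/<[^>]*>//g;             # drop markup inside the element
    $name =~ s/^\s+|\s+$//g;           # trim surrounding whitespace
    $name =~ s/\s+/_/g;                # whitespace runs become underscores
    $name =~ s/[^A-Za-z0-9_.-]//g;     # keep only simple name characters
    $name = 'section' if $name eq '';  # fallback for empty content
    # Add a numeric suffix until the name has not been used yet.
    my ($base, $n) = ($name, 1);
    $name = $base . '_' . $n++ while exists $args{anchors}{$name};
    $args{anchors}{$name} = 1;         # record it for later calls
    return $name;
}

my %anchors;
for my $content ('The <em>FOO</em> command', 'The FOO command') {
    my $name = anchor_name_sketch(content => $content, anchors => \%anchors);
    print "$name\n";
}
# prints The_FOO_command, then The_FOO_command_1
```

Passing the same `%anchors` hash to every call is what makes the names unique across a whole document (or set of files).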

## make_anchors

    my $new_html = $toc->make_anchors(input=>$html,
        notoc_match=>$notoc_match,
        use_id=>$use_id,
        toc_entry=>\%toc_entries,
        toc_end=>\%toc_ends,
        );

Makes the anchors in the given input string.
Returns a string.

## make_toc_list

    my @toc_list = $toc->make_toc_list(input=>$html,
        labels=>\%labels,
        notoc_match=>$notoc_match,
        toc_entry=>\%toc_entry,
        toc_end=>\%toc_end,
        filename=>$filename);

Makes a list of lists which represents the structure and content
of (a portion of) the ToC from one file.
Also updates a list of labels for the ToC entries.

## build_lol

Build a list of lists of paths, given a list
of hashes with info about paths.

## output_toc

    $self->output_toc(toc=>$toc_str,
        input=>\@input,
        filenames=>\@filenames);

Puts the output (whether to a file, STDOUT or a string).
The "output" in this case could be the ToC, the modified
(anchors added) HTML, or both.

## put_toc_inline

    my $newhtml = $toc->put_toc_inline(toc_str=>$toc_str,
        filename=>$filename, in_string=>$in_string);

Puts the given toc_str into the given input string;
returns a string.

## cp

    cp($src, $dst);

Copies file $src to $dst.
Used for making backups of files.
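
Making a backup copy before a file is overwritten can be done with the core `File::Copy` module, as in the sketch below. Whether HTML::GenToc's own cp() is implemented this way is not stated here; the helper `backup_file` and the ".bak" suffix (compare the _infile_._bak_ form described under __useorg__) are assumptions.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);

# Hypothetical helper: copy $src to "$src$suffix" (e.g. page.html.bak)
# before the original is overwritten. Dies if the copy fails.
sub backup_file {
    my ($src, $suffix) = @_;
    my $dst = $src . $suffix;
    copy($src, $dst) or die "Cannot copy $src to $dst: $!";
    return $dst;
}

# Usage: back up a small file in a temporary directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/page.html";
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} "<h1>Intro</h1>\n";
close $out or die "Cannot close $file: $!";

my $bak = backup_file($file, '.bak');
print "backup written to $bak\n";
```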

# FILE FORMATS

## Formatting the ToC

The __toc_entry__ and other related options give you control over how the
ToC entries look, but there are other options that affect the final
appearance of the ToC file created.

With the __header__ option, the contents of the given file (or string)
will be inserted before the generated ToC. This allows you to have
introductory text, or any other text, before the ToC.

- Note:

If you use the __header__ option, make sure the file specified
contains the opening HTML tag, the HEAD element (containing the
TITLE element), and the opening BODY tag. However, these
tags/elements should not be in the header file if the __inline__
option is used. See "Inlining the ToC" below for information on what
the header file should contain for inlining the ToC.

With the __toclabel__ option, the contents of the given string will be
inserted before the generated ToC (but after any text taken from a
__header__ file).

With the __footer__ option, the contents of the file will be appended
after the generated ToC.

- Note:

If you use the __footer__ option, make sure it includes the closing BODY
and HTML tags (unless, of course, you are using the __inline__ option).

If the __header__ option is not specified, the appropriate starting
HTML markup will be added, unless the __toc_only__ option is specified.
If the __footer__ option is not specified, the appropriate closing
HTML markup will be added, unless the __toc_only__ option is specified.

If you do not want to deal with header and footer files, you can instead
specify the title of the ToC file with the __title__ option, and a heading
(or label) to put before the list of ToC entries with the __toclabel__
option. Both options have default values.

If you do not want HTML page tags to be supplied, and just want
the ToC itself, then specify the __toc_only__ option.
If there are no __header__ or __footer__ files, then this will simply
output the contents of __toclabel__ and the ToC itself.
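
The order in which these pieces are put together for a stand-alone ToC file can be summarised in a short sketch. `assemble_toc_page` is hypothetical, header and footer are passed as strings, and the default page markup it generates is an assumption (HTML::GenToc's own default header and footer may differ); the fallbacks for __title__ and __toclabel__ are the documented defaults.

```perl
use strict;
use warnings;

# Hypothetical sketch (not HTML::GenToc's code) of the assembly order for a
# stand-alone ToC file: header, toclabel, ToC, footer. Default page markup
# is used only when no header/footer string is given and toc_only is false.
sub assemble_toc_page {
    my ($toc, %opt) = @_;
    my $title = $opt{title}    // 'Table of Contents';
    my $label = $opt{toclabel} // '<h1>Table of Contents</h1>';
    my $head  = $opt{header}
        // ($opt{toc_only} ? '' : "<html>\n<head><title>$title</title></head>\n<body>\n");
    my $foot  = $opt{footer}
        // ($opt{toc_only} ? '' : "</body>\n</html>\n");
    return $head . $label . "\n" . $toc . $foot;
}

my $page = assemble_toc_page("<ul><li><a href=\"#Intro\">Intro</a></li></ul>\n",
                             title => 'My Docs');
print $page;
```

Note that in this sketch a supplied header or footer is always used, even with __toc_only__, matching the description of that option.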

## Inlining the ToC

The ability to incorporate the ToC directly into an HTML document
is supported via the __inline__ option.

Inlining will be done on the first file in the list of files processed,
and will only be done if that file contains an opening tag matching the
__toc_tag__ value.

If __overwrite__ is true, then the first file in the list will be
overwritten, with the generated ToC inserted at the appropriate spot.
Otherwise a modified version of the first file is output to either STDOUT
or to the output file defined by the __outfile__ option.

The options __toc_tag__ and __toc_tag_replace__ are used to determine where
and how the ToC is inserted into the output.

__Example 1__

    $toc->generate_toc(inline=>1,
        toc_tag => 'BODY',
        toc_tag_replace => 0,
        ...
        );

This will put the generated ToC after the BODY tag of the first file.
If the __header__ option is specified, then the contents of the specified
file are inserted after the BODY tag. If the __toclabel__ option is not
empty, then the text specified by the __toclabel__ option is inserted.
Then the ToC is inserted, and finally, if the __footer__ option is
specified, the footer is inserted. Then the rest of the input file
follows as it was before.

__Example 2__

    $toc->generate_toc(inline=>1,
        toc_tag => '!--toc--',
        toc_tag_replace => 1,
        ...
        );

This will replace the first comment of the form `<!--toc-->` with the
generated output, in the order __header__, __toclabel__, ToC, __footer__,
followed by the rest of the input file.

- Note:

The header file should not contain the beginning HTML tag
and HEAD element since the HTML file being processed should
already contain these tags/elements.

# NOTES

- *

HTML::GenToc is smart enough to detect anchors inside significant
elements. If the anchor defines the NAME attribute, HTML::GenToc uses
the value. Otherwise, it adds its own NAME attribute to the anchor.
If __use_id__ is true, then it likewise checks for and uses IDs.

- *

The TITLE element is treated specially if specified in the __toc_entry__
option. Anchors (A elements) are not allowed inside TITLE elements.
Therefore, HTML::GenToc will actually link to the filename itself
instead of the TITLE element of the document.

- *

HTML::GenToc will ignore a significant element if it does not contain
any non-whitespace characters. A warning message is generated if
such a condition exists.

- *

If your significant elements change level in a slightly disordered
fashion, such as H1 -> H3 -> H2 or even H2 -> H1, HTML::GenToc still
creates a list which is good HTML. However, if you are using an ordered
list to that depth, you will get strange numbering, because an extra list
element is inserted to nest the elements at the correct level.

For example (H2 -> H1 with ol_num_levels=1):

    1.
       * My H2 Header
    2. My H1 Header

For example (H1 -> H3 -> H2 with ol_num_levels=0 and H3 also being
significant):

    1. My H1 Header
       1.
          1. My H3 Header
       2. My H2 Header
    2. My Second H1 Header

In cases such as this it may be better not to use the __ol__ option.
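
Why the extra list element appears can be seen from a small sketch that nests entries by level into ordered lists: when an entry is more than one level deeper than the current list (or the first entry is not at level 1), an empty `<li>` has to be opened to hold the nested `<ol>`. `nested_ol` is a hypothetical illustration that uses ordered lists at every level; it is not HTML::GenToc's list builder.

```perl
use strict;
use warnings;

# Hypothetical sketch (not HTML::GenToc's code): render [level, text]
# entries as nested <ol> lists. Level jumps get an empty <li> to nest in.
sub nested_ol {
    my @entries = @_;
    my $html  = '';
    my $depth = 0;      # number of <ol> elements currently open
    my @has_li;         # $has_li[$d]: is an <li> open in the list at depth $d?
    for my $entry (@entries) {
        my ($level, $text) = @$entry;
        # Go deeper, opening an empty <li> where no item exists to hold the list.
        while ($depth < $level) {
            if ($depth > 0 && !$has_li[$depth]) {
                $html .= '<li>';
                $has_li[$depth] = 1;
            }
            $html .= '<ol>';
            $has_li[++$depth] = 0;
        }
        # Go shallower, closing inner lists (their parent <li> stays open).
        while ($depth > $level) {
            $html .= '</li>' if $has_li[$depth];
            $html .= '</ol>';
            $depth--;
        }
        $html .= '</li>' if $has_li[$depth];   # close the previous sibling
        $html .= "<li>$text";
        $has_li[$depth] = 1;
    }
    # Close everything still open.
    while ($depth > 0) {
        $html .= '</li>' if $has_li[$depth];
        $html .= '</ol>';
        $depth--;
    }
    return $html;
}

# H2 -> H1, with ordered lists at both levels.
my $list = nested_ol([2, 'My H2 Header'], [1, 'My H1 Header']);
print "$list\n";
# <ol><li><ol><li>My H2 Header</li></ol></li><li>My H1 Header</li></ol>
```

The empty outer `<li>` is the extra item that shows up as a bare "1." in the numbering.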

# CAVEATS

- *

Version 3.10 (and above) generates more verbose (SEO-friendly) anchors
than prior versions. Thus anchors generated with earlier versions will
not match version 3.10 anchors.

- *

Version 3.00 (and above) of HTML::GenToc is not compatible with
Version 2.x of HTML::GenToc. It is now designed to do everything
in one pass, and has dropped certain options: the __infile__ option
is no longer used (it has been replaced with the __input__ option);
the __toc_file__ option no longer exists (use the __outfile__ option
instead); the __tocmap__ option is no longer supported. Also the old
array-parsing of arguments is no longer supported. There is no longer
a __generate_anchors__ method; everything is done with __generate_toc__.

It now generates lower-case tags rather than upper-case ones.

- *

HTML::GenToc is not very efficient (memory and speed), and can be
slow for large documents.

- *

Now that generation of anchors and of the ToC are done in one pass,
even more memory is used than was the case before. This is more noticeable
when processing multiple files, since all files are read into memory
before processing them.

- *

Invalid markup will be generated if a significant element is
contained inside of an anchor. For example:

    <a name="foo"><h1>The FOO command</h1></a>

will be converted to (if H1 is a significant element):

    <a name="foo"><h1><a name="The">The</a> FOO command</h1></a>

which is illegal since anchors cannot be nested.

It is better style to put anchors inside the element to
be anchored. For example, the following is preferred:

    <h1><a name="foo">The FOO command</a></h1>

HTML::GenToc will detect the "foo" name and use it.

- *

NAME attributes without quotes (such as name=foo) are not recognized.
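
The detection of existing anchors described in these caveats (and under NOTES) can be sketched with two regular expressions. `existing_anchor` is hypothetical; like the behaviour described above it ignores unquoted values, but it accepts both single and double quotes, which may not match HTML::GenToc exactly.

```perl
use strict;
use warnings;

# Hypothetical sketch (not HTML::GenToc's code): return an anchor name that
# already exists in a significant element, or undef. Only quoted attribute
# values are recognized, so name=foo is ignored.
sub existing_anchor {
    my ($element_html, $use_id) = @_;
    # A NAME attribute on an A element inside the significant element.
    return $1 if $element_html =~ m{<a\b[^>]*\sname\s*=\s*["']([^"']+)["']}i;
    # With use_id, an id attribute also counts.
    return $1 if $use_id && $element_html =~ m{\sid\s*=\s*["']([^"']+)["']}i;
    return undef;
}

for my $h1 ('<h1><a name="foo">The FOO command</a></h1>',
            '<h1><a name=foo>The FOO command</a></h1>') {
    my $name = existing_anchor($h1);
    print defined $name ? "found: $name\n" : "no usable name\n";
}
# prints "found: foo" then "no usable name"
```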

# BUGS

Tell me about them.

# REQUIRES

The installation of this module requires `Module::Build`. The module
depends on `HTML::SimpleParse`, `HTML::Entities` and `HTML::LinkList` and uses
`Data::Dumper` for debugging purposes. The hypertoc script depends on
`Getopt::Long`, `Getopt::ArgvFile` and `Pod::Usage`. Testing of this
distribution depends on `Test::More`.

# INSTALLATION

To install this module, run the following commands:

    perl Build.PL
    ./Build
    ./Build test
    ./Build install

Or, if you're on a platform (like DOS or Windows) that doesn't like the
"./" notation, you can do this:

    perl Build.PL
    perl Build
    perl Build test
    perl Build install

In order to install somewhere other than the default, such as
in a directory under your home directory, like "/home/fred/perl",
run

    perl Build.PL --install_base /home/fred/perl

as the first step instead.

This will install the files underneath /home/fred/perl.

You will then need to make sure that you alter the PERL5LIB variable to
find the modules, and the PATH variable to find the script.

Therefore you will need to add /home/fred/perl/script (where the script
will be) to your PATH:

    export PATH=/home/fred/perl/script:${PATH}

and add /home/fred/perl/lib to the PERL5LIB variable:

    export PERL5LIB=/home/fred/perl/lib:${PERL5LIB}

# SEE ALSO

perl(1)
htmltoc(1)
hypertoc(1)

# AUTHOR

Kathryn Andersen (RUBYKAT) http://www.katspace.org/tools/hypertoc/

Based on htmltoc by Earl Hood ehood AT medusa.acs.uci.edu

Contributions by Dan Dascalescu, <http://dandascalescu.com>

# COPYRIGHT

Copyright (C) 1994-1997 Earl Hood, ehood AT medusa.acs.uci.edu
Copyright (C) 2002-2008 Kathryn Andersen

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.