1%% /u/sy/beebe/src/htmlpty/htmlpty-1.00/INSTALL, Fri Nov 28 13:56:40 1997
2%% Edit by Nelson H. F. Beebe <beebe@plot79.math.utah.edu>
3
4==================
5QUICK INSTALLATION
6==================
7
8You can build and install html-pretty on UNIX systems with little
9difficulty, using the GNU standard incantation
10
11    ./configure && make all check install
12
13On systems with unusual compiler names (e.g., HAL SPARC64/OS_2.4.5),
14you may need to specify a C and C++ compiler to use; see the CC= and
15CCC= options to env in the example below.
16
17NB: you can safely ignore this warning [your line number may differ]:
18
19	flex  -t htmlpty.l | sed -e '/^# *line/d' > htmlpty.c
20	"htmlpty.l", line 620: warning, rule cannot be matched
21
22That rule is there intentionally, to avoid losing output in the event
23earlier rules fail to match all possible input patterns.
24
25If you want to repeat the process in the same directory, but for a
26different architecture, first do
27
28    make distclean
29
30to restore the directory to the state of a fresh distribution, then
31repeat the configure and make steps as before.
32
33If the builds, checks, and installs were successful, you can stop
34reading now!
35
36			 --------------------
37
38The configure script will create Makefile from Makefile.in, and
39config.h from config.hin, an essential step before make and compilers
40can be used.
41
42There are no top-level Makefile and config.h files included in a new
43distribution, since configure is expected to generate them.  However,
44for safety, there are backup copies of Makefile, config.h, configure,
45and htmlpty.c in the Backup subdirectory; they were prepared on a Sun
46Solaris 2.5 system.  You can use copies of them should you experience
47problems with configure, or if you are porting the software to a
48non-UNIX system that does not have a POSIX- or UNIX-like shell
49(several non-UNIX systems have such shells, including IBM PC DOS, IBM
50VM/CMS, and Microsoft Windows NT).
51
52If you wish to use a particular C and/or C++ compiler and optimization
53level, do it like this:
54
55    env CC=..C-compiler.. CCC=..C++-compiler.. ./configure && \
56	make OPT='-optlevel' all check install
57
58or if your system lacks the env command
59
60    sh/ksh/bash shell:
61    CC=..C-compiler.. CCC=..C++-compiler.. ./configure && \
62	make OPT='-optlevel' all check install
63
64    csh/tcsh shell:
65    setenv CC ..C-compiler..
66    setenv CCC ..C++-compiler..
67    ./configure && make OPT='-optlevel' all check install
68
69You do not need to rerun configure if you only change optimization
levels, but you should do so if you change other compiler options, since
71they may affect the visibility of symbols and declarations in system
72header files, and that in turn may require (automated) changes to the
73config.h file.
74
75In the Makefile, you may want to change the definitions of the BINDIR
76and MANDIR installation directories; the distribution version uses the
77Free Software Foundation's standards of /usr/local/bin and
78/usr/local/man, which avoids contaminating any vendor-provided
79directories.  This is readily done on the make command-line, like
80this:
81
82     make prefix=/some/other/place targets
83
84or, better, at configure time, like this:
85
86     ./configure --prefix=/some/other/place
87
88
89====================
90INSTALLATION DETAILS
91====================
92
93The Makefile contains about 60 convenience targets for picking
94particular compilers and optimization levels on different UNIX
95systems.  If you use them, be sure to run configure first with CC set
96to choose the appropriate compiler.
97
98In the Makefile, read carefully the comments on the merits of choosing
99lex or flex; flex is the default for the reasons stated there, but lex
100can be used on SOME, but not all, UNIX systems.  Although flex
usually produces faster lexical analyzers than lex, for html-pretty
1021.00, the lex version is slightly faster.  However, because the lex
103version has problems with some characters in the range 128..255 on
104some systems, flex is still the default lexical analyzer generator.
105
106On HP 9000/7xx HP-UX 10.01 systems, and possibly others, it is
107necessary to force yytext to be an array rather than a pointer; this
108is done by adding -DYYCHAR_ARRAY to the compilation options, e.g.
109
110	make XCFLAGS=-DYYCHAR_ARRAY
111
112Because some lex implementations allocate peculiar sizes for yytext[],
113html-pretty checks at startup that the expected size matches the
114actual size, and if they differ, it quits immediately with a message
115like this:
116
117**********************************************************************
118** FATAL ERROR: This program has inconsistent array sizes:
119**      YYLMAX = 8192
120**      sizeof(yytext) = 0
121** You must correct the lex-generated code and rebuild the program.
122** The Makefile has a sed filter step that should correct the error.
123** Did you override this by running lex manually?
124**********************************************************************
125
126As the message notes, the installation procedure tries to correct such
aberrations, but may not succeed on all systems.
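
The startup check described above can be sketched roughly as follows.
This is only an illustration, not the code in htmlpty.l: the function
names sizes_consistent() and check_yytext_size() are hypothetical, and
the yytext[] array here is a stand-in for the lex-generated one.

```c
#include <stddef.h>
#include <stdio.h>

#define YYLMAX 8192             /* expected size, as in the message above */

static char yytext[YYLMAX];     /* stand-in for the lex-generated array */

/* Return 1 if the two sizes agree; otherwise report them on err and
   return 0.  (Hypothetical helper, not part of html-pretty.) */
int sizes_consistent(size_t expected, size_t actual, FILE *err)
{
    if (expected == actual)
        return 1;
    fprintf(err, "** FATAL ERROR: This program has inconsistent array sizes:\n");
    fprintf(err, "**      YYLMAX = %lu\n", (unsigned long)expected);
    fprintf(err, "**      sizeof(yytext) = %lu\n", (unsigned long)actual);
    return 0;
}

/* Compare the macro with the real storage allocated for yytext[]. */
int check_yytext_size(void)
{
    return sizes_consistent((size_t)YYLMAX, sizeof(yytext), stderr);
}
```

A caller would run check_yytext_size() once at startup and exit with a
failure status if it returns 0.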
128
129In the Makefile, pick the appropriate definition of HOST, or if
130neither works (highly unlikely in the Internet world), you can specify
131it on the make command line with something like
132
133	make HOST=-D\"foo.bar.baz\"
134
135Then just type
136
137	make
138
139On Sun Solaris 2.x, with the default vendor C compiler, you may need
140to type
141
142	make CC='cc -Xc'
143
144because flex-generated code otherwise defines const to an empty string
145after including some system header files, and before including others.
146The result is function prototype name conflicts for names declared in
147multiple header files (e.g. rename()).  The -Xc option requests strict
148Standard C compilation, and avoids the problem.
149
On some systems (e.g. HP-UX with c89), you may need to add
-D_POSIX_SOURCE to make the utsname structures visible.
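
As an illustration of why the symbol matters, here is a minimal
sketch that defines _POSIX_SOURCE before any system header and then
uses struct utsname through the POSIX uname() call.  The function
name get_system_name() is hypothetical; I am only assuming that the
program needs the structure for something of this kind.

```c
#ifndef _POSIX_SOURCE
#define _POSIX_SOURCE 1         /* must precede every system header */
#endif

#include <stddef.h>
#include <string.h>
#include <sys/utsname.h>

/* Copy the operating system name into buf (always NUL-terminated).
   Return 0 on success, -1 on failure.  (Hypothetical helper.) */
int get_system_name(char *buf, size_t buflen)
{
    struct utsname u;

    if (buf == NULL || buflen == 0)
        return -1;
    if (uname(&u) < 0)
        return -1;
    strncpy(buf, u.sysname, buflen - 1);
    buf[buflen - 1] = '\0';
    return 0;
}
```

Passing -D_POSIX_SOURCE on the compiler command line has the same
effect as the #define at the top of this sketch.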
152
153html-pretty has been successfully built and tested on these systems:
154
155DECstation 3100		ULTRIX 4.3		cc, lcc, gcc, g++
156DEC Alpha 3000/400	OSF/1 2.0		cc, c89, gcc
157DEC Alpha 3000/300LX	OSF/1 3.0		cc, c89, gcc, g++
158HALstation 385		SPARC64/OS_2.4.5	hcc, FCC
159HP 9000/735		HP-UX 9.0.5		cc, c89, CC, gcc, g++
160IBM PC			MS DOS 5.0		tcc (2.0 and 3.0)
161IBM RS/6000		AIX 3.2.5		cc, xlC, gcc, g++
162MIPS RC6280		RISCos 2.1.1AC		cc
163NeXT Turbostation	Mach 3.0		cc, lcc, gcc, g++
164Sun SPARCstation	SunOS 4.1.3		cc, lcc, gcc, g++
165Sun SPARCstation	Solaris 2.3 and 2.4	cc, CC, lcc, gcc, g++
166Silicon Graphics Indigo	IRIX 5.3, 6.2, 6.4	cc, c89, CC, gcc, g++
167
With flex 2.5.1, all systems passed, and the same holds for the more
169recent flex 2.5.4.  If you get failures in check005 at the characters
170in 128..255 with flex, you probably have an older version and should
171upgrade (flex sources are on
172ftp://prep.ai.mit.edu/pub/gnu/flex*.tar.gz, and on the many sites that
173mirror the Free Software Foundation archive); flex 2.3 is one such
174failing version.
175
176With lex, some implementations lose characters with values in the
177range 128..255 on check005.in.
178
179On other systems, please try a C++ compiler if you have one, because
180it is more likely to catch problems than C compilers do.
181flex-generated code is C++ compatible, but some vendor lex
182implementations are still in the old K&R C mold, instead of conforming
183to 1989 ANSI/ISO Standard C, and produce C code that cannot be
184compiled with C++ compilers.
185
186To build, check, and install html-pretty, just use the conventional
187GNU standard incantation:
188
189	./configure && make all check install
190
191Installation will not happen if any of the validation checks fail.  In
192the current release, configure is a dummy script, but it may do more
193in later releases.
194
If you want to change the default compiler optimization level, set the
196variable OPT on the command line.  Other compiler flags can be
197supplied with the XCFLAGS variable.  For example
198
199	make OPT=-O3 XCFLAGS=-Xc all check install
200
If you need additional libraries, you can add a LIBS value on the make
command line.  For example, I found that compilation with gprof
profiling with C++ on Sun Solaris 2.5 needed this:
204
205	make OPT=-pg LIBS=-ldl
206
207Should you want to remove an installed version, just do
208
209	make uninstall
210
211When you finish, do
212
213	make clean
214
215to remove intermediate files, leaving the executable, or
216
217	make distclean
218
219to reduce everything to the state of the original distribution.
220
221If you do
222
223	make maintainer-clean
224
225[but please do NOT!]  you will also remove the bootstrap htmlpty.c
226file, which needs lex or flex to be rebuilt, the ASCII, HTML, PDF, and
227PostScript forms of the manual pages, which need man2ps, man2html,
228groff/nroff, and Adobe Acrobat Distiller to recreate, and the
229configure and config.hin files, which need GNU autoconf and autoheader
230to recreate.
231
232
233==================================
234EFFICIENCY ISSUES FOR HTMLPTY 0.11
235==================================
236
237lex (and flex)-generated lexical analyzers are generally quite
238efficient, and this one is no exception.  With version 0.11 of
239htmlpty, on a large test file of 78,300 lines (2.4MB), made by
240concatenating all of the .html files on my home file system, the flex
241version of html-pretty took only 10.72 sec (7306 lines/sec, 269K input
242bytes/sec) on an entry-level Sun SPARCstation LX workstation, with the
243code compiled at the highest optimization level (-xO4) with the Sun
244Solaris 2.4 native C compiler.  This optimization level results in
245inlining of short functions, of which there are many in this program.
246
A more general result is the ratio of html-pretty's run time to that
of a simple program which copies the same file with the loop
249
250	while ((c = getchar()) != EOF)
251            putchar(c);
252
253so that every input character is input and output individually.  The
254copy loop ran in 3.21 sec, and the time ratio is 3.34.  The output
255file size is 3.3MB, so adjusting for the total number of bytes input
and output, this ratio can be further scaled down by (2 * 2399316)/
(2399316 + 3295133) = 0.84, to a value of 2.81.
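
For anyone who wants to repeat the comparison, the copy loop can be
wrapped in a small function like the one below.  This is a sketch
under my own naming (copy_chars() is hypothetical); it uses getc() and
putc() on explicit streams, which behave like getchar() and putchar()
when the streams are stdin and stdout.

```c
#include <stdio.h>

/* Copy in to out one character at a time, and return the number of
   bytes copied.  (Hypothetical helper for timing comparisons.) */
unsigned long copy_chars(FILE *in, FILE *out)
{
    int c;                      /* int, not char, so EOF is distinct */
    unsigned long n = 0;

    while ((c = getc(in)) != EOF) {
        putc(c, out);
        n++;
    }
    return n;
}
```

Calling copy_chars(stdin, stdout) from main() gives a program
equivalent to the loop shown above, and the returned byte count is
convenient for the input/output size adjustment.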
258
259Line profiling with Sun tcov revealed hot spots inside the
260flex-generated code, about which one can do nothing, and in the
261copy_verbatim() routine, which is called a lot because the test file
262has a lot of <PRE> ... </PRE> environments.
263
264CPU time profiling with gprof gave this flat profile; note that
265_read() and _write() together account for about 45% of the CPU time.
266
267Each sample counts as 0.01 seconds.
268  %   cumulative   self              self     total
269 time   seconds   seconds    calls  ms/call  ms/call  name
270 27.05      7.41     7.41      594    12.47    12.47  _read
271 18.33     12.43     5.02      813     6.17     6.17  _write
272 10.55     15.32     2.89        1  2890.00 24003.83  yylex
273  8.80     17.73     2.41      659     3.66    13.24  copy_verbatim
274  8.54     20.07     2.34   112354     0.02     0.03  out_string
275  6.28     21.79     1.72                             _mcount
276  3.58     22.77     0.98                             oldarc
277  3.36     23.69     0.92   423743     0.00     0.00  .mul
278  2.34     24.33     0.64    28719     0.02     0.06  line_end
279  2.26     24.95     0.62    53771     0.01     0.01  _memccpy
280  1.53     25.37     0.42   261034     0.00     0.00  strchr
281  1.28     25.72     0.35    70858     0.00     0.01  blank
282  0.91     25.97     0.25    53273     0.00     0.07  fputs
283  0.91     26.22     0.25    27228     0.01     0.01  normalize_tag
284  0.62     26.39     0.17                             done
285  0.51     26.53     0.14    84749     0.00     0.04  out_yytext
286  0.51     26.67     0.14     8387     0.02     0.20  list_item
287  0.37     26.77     0.10     9998     0.01     0.17  pair
288  0.37     26.87     0.10        8    12.50    12.50  _open
289  0.33     26.96     0.09      608     0.15     0.15  memcpy
290  0.26     27.03     0.07   108476     0.00     0.00  _thr_main_stub
291  0.26     27.10     0.07    54999     0.00     0.00  _realbufend
292  0.26     27.17     0.07    53327     0.00     0.00  out_blank
293  0.15     27.21     0.04    12875     0.00     0.00  toupper
294  0.11     27.24     0.03    55004     0.00     0.00  _rw_rdlock_stub
295  0.11     27.27     0.03     3848     0.01     0.06  font
296  0.07     27.29     0.02     2158     0.01     0.31  paragraph
297...
298
and this hierarchical profile (functions whose names begin with
300underscore are library routines):
301
302granularity: each sample hit covers 4 byte(s) for 0.04% of 25.67 seconds
303
304index % time    self  children    called     name
305[1]     95.3    0.00   24.46       1         main [1]
306[2]     95.3    0.00   24.46                 _start [2]
307[3]     93.5    2.89   21.11       1         yylex [3]
308[4]     34.1    0.01    8.75     664         verbatim [4]
309[5]     34.0    2.41    6.32     659         copy_verbatim [5]
310[6]     29.1    0.00    7.46     307         fread [6]
311[7]     28.9    7.41    0.00     594         _read [7]
312[8]     28.8    0.01    7.37     590         __filbuf [8]
313[9]     27.9    0.00    7.17     295         yy_get_next_buffer [9]
314[10]    19.6    5.02    0.00     813         _write [10]
315[11]    19.4    0.01    4.97     804         _xflsbuf [11]
316[12]    15.4    0.25    3.71   53273         fputs [12]
317[13]    15.3    2.34    1.58  112354         out_string [13]
318[14]    12.1    0.14    2.96   84749         out_yytext [14]
319[15]     8.0    0.00    2.05     332         __flsbuf [15]
320[16]     6.9    0.64    1.14   28719         line_end [16]
321[17]     6.7    0.14    1.57    8387         list_item [17]
322[18]     6.6    0.10    1.59    9998         pair [18]
323[19]     3.8    0.98    0.00                 oldarc [19]
324[20]     3.6    0.92    0.00  423743         .mul [20]
325[21]     2.6    0.02    0.66    2158         paragraph [21]
326[22]     2.6    0.35    0.32   70858         blank [22]
327[23]     2.4    0.62    0.00   53771         _memccpy [23]
328[24]     1.8    0.00    0.46       1         out_banner [24]
329[25]     1.6    0.42    0.00  261034         strchr [25]
330[28]     1.3    0.25    0.09   27228         normalize_tag [28]
331[30]     0.9    0.03    0.20    3848         font [30]
332[31]     0.7    0.17    0.00                 done [31]
333[34]     0.4    0.09    0.00     608         memcpy [34]
334[35]     0.3    0.01    0.06     423         begin_list [35]
335[36]     0.3    0.07    0.00   53327         out_blank [36]
336[38]     0.3    0.01    0.05     425         standalone [38]
337[40]     0.2    0.02    0.03     426         out_newline [40]
338[46]     0.2    0.04    0.00   12875         toupper [46]
339[47]     0.1    0.00    0.04       2         fopen [47]
340[50]     0.1    0.01    0.02     158         line_break [50]
341[52]     0.1    0.00    0.03      27         fgets [52]
342[53]     0.1    0.00    0.02       1         ctime [53]
343...
344
345flex offers options for improved scanner efficiency, notably,
346uncompressed tables, and fullword, instead of halfword, storage.  Here
347is a comparison of the relative run times on a Sun SPARCstation LX
348with the native Solaris 2.4 C compiler, using the 2.4MB test file
349noted above:
350
351		------------------------------------
352		LFLAGS			-----OPT----
353					-g	-xO4
354		------------------------------------
355		(slow)			2.42	1.23
356		(fast) -Ca -Cf		2.21	1.00
357		------------------------------------
358
359The tradeoff is that the fast scanner has 3.65 times as many lines of
360C code, 4.10 times as many bytes of C code, and the executable is 6.28
361times larger.  Compilation of the slow scanner with optimization took
3624 minutes, and of the fast scanner, 6 minutes.
363
364However, use of the fast scanner uncovered two problems:
365
366	(a) in function yy_try_NUL_trans(), the generated code
367	was missing a variable declaration:
368		register char *yy_cp = yy_c_buf_p; /* BUG: lost with -Cf -Ca */
369	This is a flex 2.5.1 bug that will be reported to the flex author.
370
371	(b) "make check" failed in check005: three lines with
372	characters in 128..255 are missing a blank preceding those
373	characters.
374
375Thus, I will not use the fast scanner options until these problems are
376fixed.
377
378[08-Nov-1997]: Under flex version 2.5.4, I repeated this experiment,
379this time on a faster Sun UltraSPARC 2170 using the native Solaris
3802.5.1 C++ compiler.
381
The compilation and link time for "make all" with OPT=-g was 13.05 sec,
and with OPT=-O4, 185.58 sec, 14.22 times longer.
384
385The htmlpty.c file was 6231 lines and 179583 bytes long with the fast
386flex options -Ca -Cf, compared to 5220 lines and 130535 bytes without.
387
388The executable size for the fast flex version with OPT=-O4 was 395440
389bytes (382280 bytes stripped of symbols) compared with 425984 bytes
390(391632 bytes stripped of symbols) with OPT=-g.
391
392Although bug (a) above has disappeared, the "make check" failed as
393before in (b) above in check005; all other checks passed.  Thus, the
394fast flex options should still not be used.
395
396
397==================================
398EFFICIENCY ISSUES FOR HTMLPTY 1.00
399==================================
400
401During the development of version 1.00, considerable attention was
402given to improving the efficiency of the program.  Because it does
403much more than earlier versions, it is not as fast (compared to the
404cpchar program, it runs ?.?? times slower), but I believe the code is
405now acceptably fast, and further changes to the code are not likely to
406make much difference.
407
408In particular, I was concerned about the efficiency of low-level I/O
409routines, and the linear search through tag tables, called from
410do_tag() -> check_tag_nesting() -> search() when the
411-check-tag-nesting option has been specified.
412
413Hash table searches are used elsewhere in the program for class/tag
414lookups, and could be used by check_tag_nesting() as well, at the cost
415of some additional complexity in table allocation, generation, and
416freeing.  Fortunately, this does not seem to be necessary, thanks to
417some rather simple optimizations that avoid unnecessary strxxx()
418library function calls. One of them, a 3-line change in table.c in
419function get_name_by_style() and struct Style, reduced execution time
420by 40%.
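
The actual 3-line change is not reproduced here, but one common way
of avoiding unnecessary strxxx() calls in a linear search is to
compare first characters before calling strcmp().  The sketch below
shows that general technique only; the Entry type and lookup_style()
function are hypothetical and are not the code in table.c.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;           /* tag name */
    const char *style;          /* associated value */
} Entry;                        /* hypothetical table entry */

/* Linear search that rejects most entries with a one-character
   comparison, so strcmp() is called only on plausible candidates.
   If strcmp_calls is non-NULL, it counts the strcmp() calls made. */
const char *lookup_style(const Entry *table, size_t n, const char *name,
                         unsigned long *strcmp_calls)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (table[i].name[0] != name[0])
            continue;           /* cheap reject, no library call */
        if (strcmp_calls != NULL)
            ++*strcmp_calls;
        if (strcmp(table[i].name, name) == 0)
            return table[i].style;
    }
    return NULL;
}
```

The counter argument is there only to make the saving visible in a
test or a profile; production code would drop it.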
421
422The low-level I/O routines have been rewritten to replace the old
423double buffered scheme (old output + current line) with a new single
424buffered approach that uses two variables, big_last_verbatim_position
425and big_newline_position, to track the buffer positions of two
426critical fenceposts.  This optimization also reduced complexity,
427making the time-critical dputc() function a candidate for inlining:
428every output character goes through that function.
429
430With C++ compilation, several short functions are now compiled inline
431(they are identified by the INLINE attribute, which is defined in
432common.h to expand to inline in C++ compilation, and to an empty
433string in C compilation).
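
A definition along the following lines would give the behavior just
described.  It is a sketch of what such a macro might look like, not a
copy of common.h, and the helper function last_char_of() is
hypothetical.

```c
#include <stddef.h>
#include <string.h>

#if defined(__cplusplus)
#define INLINE inline           /* C++: request inline expansion */
#else
#define INLINE                  /* C: expands to nothing */
#endif

/* Return the last character of s, or 0 if s is empty.
   (Hypothetical short function of the kind worth inlining.) */
static INLINE int last_char_of(const char *s)
{
    size_t n = strlen(s);

    return (n > 0) ? (unsigned char)s[n - 1] : 0;
}
```

Because the attribute vanishes under C compilation, the same source
compiles unchanged with either kind of compiler.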
434
435Profiling with prof, gprof, pixie, pixstats, and tcov on several
architectures now shows that the time-critical code section is the
437yy_match loop in yylex(), and the yy_get_next_buffer() function, which
438together account for 40% to 70% of the run time, depending on the
439architecture.  Since they are part of the flex-generated code, nothing
440can be readily done to improve them, short of changing to a faster
441lexical analyzer, but as far as I'm aware, no one so far has claimed
442to improve on flex.
443
444Here is a sample profile, from a DEC Alpha 2100/5-250 OSF/1 3.2
system, using a 5MB test file made by concatenating all of the HTML
files in the Web tree at http://www.math.utah.edu/, running the
program like this:
448
449pixie htmlpty
450htmlpty.pixie -indent 0 -logfile /dev/null <test-file >/dev/null
451pixstats htmlpty
452
453    cycles %cycles  cum%     instrs  c/i      calls     c/call name
454 401335489  24.3%  24.3%  401335489  1.0        680     590199 yy_get_next_buffer__Xv
455 289269893  17.5%  41.8%  289269893  1.0          1  289269893 yylex__Xv
456 186740484  11.3%  53.1%  186740484  1.0    5328755         35 dputc__Xi
457 103959911   6.3%  59.3%  103959911  1.0    2735657         38 yyinput__Xv
458  59729355   3.6%  63.0%   59729355  1.0     421340        142 out_string__XPCc
459  56872936   3.4%  66.4%   56872936  1.0    2829934         20 out_char__Xi
460  56729130   3.4%  69.8%   56729130  1.0    3781942         15 __iswctype_sb
461  44619727   2.7%  72.5%   44619727  1.0    1962533         23 strncmp
462  44589019   2.7%  75.2%   44589019  1.0     185581        240 normalize_tag__XPc
463  38082936   2.3%  77.5%   38082936  1.0    1081174         35 NLstrchr
464  34056501   2.1%  79.6%   34056501  1.0        948      35925 copy_verbatim__Xv
465  25740013   1.6%  81.1%   25740013  1.0    1980001         13 isspace
466  23601100   1.4%  82.6%   23601100  1.0     887103         27 strcmp
467  20878692   1.3%  83.8%   20878692  1.0     123129        170 memcpy
468  18787220   1.1%  85.0%   18787220  1.0    1342541         14 last_char__Xi
469  18543783   1.1%  86.1%   18543783  1.0      48003        386 paragraph_contains__XPCc
470  18113750   1.1%  87.2%   18113750  1.0    1811375         10 indentation_size__Xv
471...
472
473When the same experiment is repeated, this time adding the
474-check-tag-nesting option, the profile looks like this:
475
476    cycles %cycles  cum%     instrs  c/i      calls     c/call name
477 401335489  18.6%  18.6%  401335489  1.0        680     590199 yy_get_next_buffer__Xv
478 289269893  13.4%  32.0%  289269893  1.0          1  289269893 yylex__Xv
479 207404011   9.6%  41.7%  207404011  1.0      92374       2245 get_name_by_style__XPCc
480 186740484   8.7%  50.3%  186740484  1.0    5328755         35 dputc__Xi
481 129298105   6.0%  56.3%  129298105  1.0     184240        702 _doprnt
482 103959911   4.8%  61.1%  103959911  1.0    2735657         38 yyinput__Xv
483  70219906   3.3%  64.4%   70219906  1.0    2335227         30 strcmp
484  69497989   3.2%  67.6%   69497989  1.0    3008303         23 strncmp
485  59729355   2.8%  70.4%   59729355  1.0     421340        142 out_string__XPCc
486  56872936   2.6%  73.0%   56872936  1.0    2829934         20 out_char__Xi
487  56730540   2.6%  75.7%   56730540  1.0    3782036         15 __iswctype_sb
488  55964230   2.6%  78.2%   55964230  1.0      52281       1070 search__XPCcPCc
489  48358354   2.2%  80.5%   48358354  1.0     958937         50 memcpy
490  44589019   2.1%  82.6%   44589019  1.0     185581        240 normalize_tag__XPc
491  38082936   1.8%  84.3%   38082936  1.0    1081174         35 NLstrchr
492  34056501   1.6%  85.9%   34056501  1.0        948      35925 copy_verbatim__Xv
493  25740065   1.2%  87.1%   25740065  1.0    1980005         13 isspace
494
This option seems to add about 10% to the execution time, but the
time attributed to strcmp() and strncmp() is still less than 7%.
497
Here is another sample profile, also from pixie, but for the program
499running on an SGI Challenge L system running IRIX 5.3:
500
501Procedures ordered by execution time:
502    cycles %cycles  cum%     instrs  cycles   calls     cycles procedure
503                                     /inst               /call
5042063493343  59.1%  59.1% 2063493343    1.0      680    3034549 yy_get_next_buffer__Fv
505 349231336  10.0%  69.1%  349231336    1.0        1  349231336 yylex__Fv
506 181381967   5.2%  74.3%  174277699    1.0   421340        430 out_string__FPCc
507 136797681   3.9%  78.2%  136797681    1.0  2735657         50 yyinput__Fv
508 102750980   2.9%  81.2%  102750980    1.0  2498820         41 out_verbatim_char__Fi
509  61240202   1.8%  82.9%   61240202    1.0      948      64599 copy_verbatim__Fv
510  61100974   1.8%  84.7%   61100974    1.0   185581        329 normalize_tag__FPc
511  49604130   1.4%  86.1%   49604130    1.0    48003       1033 paragraph_contains__FPCc
512  38372206   1.1%  87.2%   38372206    1.0  1962571         20 strncmp
513  38183748   1.1%  88.3%   38183748    1.0  1081059         35 strchr
514  37590583   1.1%  89.4%   30311767    1.2   227463        165 hash_lookup__FPCcP10Hash_Table
515  35579988   1.0%  90.4%   35579988    1.0   585204         61 trim_line__Fi
516
517On this machine, the lexical analyzer is using about 73% of the time.
518
519From a Sun SPARCstation LX running Solaris 2.5, here is part of the
520flat profile produced by gprof, using a 13MB test file formed from
521all of the HTML files on my home system:
522
523granularity: each sample hit covers 2 byte(s) for 0.00% of 1071.27 seconds
524
525   %  cumulative    self              self    total
526 time   seconds   seconds    calls  ms/call  ms/call name
527 68.1     729.23   729.23     1635   446.01   459.94  yy_get_next_buffer [4]
528  5.1     783.52    54.30 22060617     0.00     0.00  dputc [9]
529  5.0     836.82    53.30                            _mcount (615)
530  4.7     887.06    50.24     2843    17.67    17.67  _write [15]
531  2.8     917.23    30.17                            oldarc [19]
532  2.1     939.76    22.53     1636    13.77    13.77  _read [22]
533  2.0     961.68    21.92 11225462     0.00     0.06  input [5]
534  2.0     982.86    21.18                            next [23]
535  0.9     992.24     9.38                            done [27]
536  0.7     999.83     7.59 11419666     0.00     0.00  out_char [28]
537  0.7    1007.02     7.19        1  7190.15 945357.77  yylex [3]
538  0.5    1012.33     5.31                            _moncontrol [34]
539  0.4    1016.21     3.88    86155     0.05     0.05  _memcpy [35]
540  0.4    1020.00     3.79      107    35.42  1241.25  complex_markup_declaration [8]
541  0.3    1023.38     3.39 10640950     0.00     0.01  out_verbatim_char [12]
542  0.3    1026.67     3.29      678     4.85   804.24  copy_verbatim [7]
543  0.3    1029.46     2.79                            chainloop [42]
544  0.3    1032.20     2.74      274     9.98    10.37  doctype [41]
545  0.3    1034.93     2.73   346694     0.01     0.01  normalize_tag [36]
546  0.2    1036.69     1.76  1027598     0.00     0.00  strchr [48]
547  0.2    1038.33     1.65  1483003     0.00     0.00  indentation_size [43]
548  0.2    1039.95     1.62   276567     0.01     0.24  out_string [10]
549
550Clearly, on all three systems, I/O is the chief consumer of CPU time.
551
552To that end, I made some additional changes in the code to reduce the
553number of system calls for I/O, by redimensioning the big_buffer[]
554array from MAXBUF to MAXBIGBUF, and then adding calls in main() to
555setvbuf() (when available) to allocate input and output buffers, also
556of size MAXBIGBUF (default: max(16384,MAXBUF)).
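
The buffering change can be sketched as below.  The MAXBUF value is an
assumption made only so that the example compiles (the real value is
in the sources), the function name use_big_buffers() is hypothetical,
and the MAXBIGBUF definition simply follows the default stated above.

```c
#include <stdio.h>

#ifndef MAXBUF
#define MAXBUF 4096             /* assumed value, for illustration only */
#endif

/* Default stated above: max(16384, MAXBUF) */
#define MAXBIGBUF ((MAXBUF) > 16384 ? (MAXBUF) : 16384)

static char in_buffer[MAXBIGBUF];
static char out_buffer[MAXBIGBUF];

/* Give both streams large, fully buffered I/O buffers.  This must be
   called before any other operation on the streams.  Return 0 on
   success, -1 on failure.  (Hypothetical helper.) */
int use_big_buffers(FILE *in, FILE *out)
{
    if (setvbuf(in, in_buffer, _IOFBF, sizeof(in_buffer)) != 0)
        return -1;
    if (setvbuf(out, out_buffer, _IOFBF, sizeof(out_buffer)) != 0)
        return -1;
    return 0;
}
```

In main(), the call would be use_big_buffers(stdin, stdout), made
before the first read or write.  The buffers are static so that they
outlive the streams that use them.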
557
Experiments on DEC Alpha 2100-5/250 OSF/1 3.2, HP 9000/735 HP-UX 10.01,
559NeXT Mach 3.3, SGI Challenge L IRIX 5.3, and Sun Solaris 2.5 systems
560with MAXBIGBUF values of 2048, 4096, 16384, 65536, 262144, and 1048576
561showed little change in run times with a 5MB test input file. The HP
562system had the largest reduction, only about 5%, as MAXBIGBUF
563increased.
564
565I also made additional experiments with function inlining: choosing
566the top 15 time consumers from profiles (blank(), copy_verbatim(),
567dputc(), indentation_size(), last_char(), normalize_tag(),
568out_blank(), out_char(), out_string(), out_verbatim_char(),
569out_verbatim_string(), out_yytext(), trim_line(), yyinput(), and
570yylook()) and asking the C++ compiler to inline them reduced execution
571time by 37% on Sun systems.  I have therefore added INLINE directives
572to the definitions and declarations of those functions (except
573yyinput() and yylook(), which are in lex/flex-generated code).
574
575With the current version (SC4.0) of the Sun Solaris C++ compiler,
576inline directives do not seem to cause function inlining, but an
577explicit (but horrid) option
578
579-xinline=__0FGyylookv,__0FHyyinputv,__0FFdputci,__0FKout_stringPCc,__0FJlast_chari,strncmp,strchr,__0FIout_chari,__0FQindentation_sizev,__0FNnormalize_tagPc,__0FRout_verbatim_chari,strcmp,_memcpy,__0FKout_yytextv,__0FNcopy_verbatimv,__0FFblankv,__0FJout_blankv,__0FJtrim_linei,__0FTout_verbatim_stringPCc
580
581did so.
582
583One optimization that looked promising, but actually made the program
584run slightly slower, was to add additional patterns for the commonest
585tags (I chose the top 73 from a frequency-ordered list of all of the
586tags used in the Web tree at http://www.math.utah.edu/), and then have
587an action that cached the formatting function, e.g.,
588
589{BEGINPAIR}{B}{I}{G}{ENDTAG}			{ DO_TAG2("BIG") }
590
591(DO_TAG2() is still defined in htmlpty.l), instead of looking it up
592each time as do_tag() does.  Even though this greatly reduced the
593number of calls to get_action_by_name(), it appears that the
additional complexity in the lexical analyzer more than offset that
gain, and resulted in a slower program.
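
The caching idea itself, independent of the lexical analyzer, looks
something like the sketch below.  All of the names here
(find_action(), run_tag(), RUN_TAG_CACHED(), and the tiny action
table) are hypothetical stand-ins chosen for illustration; they are
not the definitions of do_tag(), DO_TAG2(), or get_action_by_name() in
the sources.

```c
#include <stddef.h>
#include <string.h>

typedef void (*Action)(void);

static unsigned long lookups = 0;       /* number of table searches */
static unsigned long big_count = 0;     /* number of "BIG" actions run */

static void do_bold(void) { }
static void do_big(void) { big_count++; }

static const struct { const char *name; Action action; } actions[] = {
    { "B", do_bold },
    { "BIG", do_big }
};

/* Look up the formatting function for a tag name by linear search. */
Action find_action(const char *name)
{
    size_t i;

    lookups++;
    for (i = 0; i < sizeof(actions) / sizeof(actions[0]); i++)
        if (strcmp(actions[i].name, name) == 0)
            return actions[i].action;
    return NULL;
}

/* Uncached path: search the table on every call. */
void run_tag(const char *name)
{
    Action a = find_action(name);

    if (a != NULL)
        a();
}

/* Cached path: each expansion has its own static slot, so the table
   is searched only the first time that call site is reached. */
#define RUN_TAG_CACHED(name)                    \
    do {                                        \
        static Action cached_ = NULL;           \
        if (cached_ == NULL)                    \
            cached_ = find_action(name);        \
        if (cached_ != NULL)                    \
            cached_();                          \
    } while (0)
```

Each cached call site costs one extra pattern in the scanner, which is
where the added complexity in the lexical analyzer comes from.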
596
597Because lex does not run a preprocessor, there is no way to retain
598those 73 patterns in the source file, but I have saved them in my
599development directory for possible future work.  The rest of the
600required support code is retained in htmlpty.l and table.c inside a
601
602#if defined(TAG_CACHE)
603...
604#endif /* defined(TAG_CACHE) */
605
606preprocessor conditional.
607
608
609===================
610IBM PC INSTALLATION
611===================
612
613Up to version 0.08, html-pretty built without problems under Turbo C
6142.0 and 3.0, and passed the validation suite.
615
616With version 0.09, the lex/flex-generated jump tables are larger, and
617the nasty Intel segmented memory architecture rears its ugly head, and
618it took me several hours of work to get a working version for IBM PC
619DOS.  I tried Microsoft C 5.0, 5.1, and 6.0, and Turbo C 2.0 and 3.0.
620I also have Microsoft C 7.0, but it will not run under SunPC.
621
622Compilation under Microsoft C 6.0 requires addition of -Dconst= ,
623because of an error in the compiler: it thinks that once an array is
624declared of type const char *, then you cannot assign to it!
625
626lex-generated htmlpty.c compiles with Microsoft C 5.0, 5.1, and 6.0,
627but gives incorrect output.
628
629The flex-generated htmlpty.c won't compile with Microsoft C 5.0, 5.1,
6306.0, or Turbo C 2.0 or 3.0 -- four complain about data group > 64K,
631even in the compact and huge memory models; Microsoft C 6.0 just
632produces this:
633
634htmlpty.c
635htmlpty.c(4781) : fatal error C1001: Internal Compiler Error
636                (compiler file '@(#)regMD.c:1.100', line 3837)
637                Contact Microsoft Product Support Services
638
639I then modified the flex-generated htmlpty.c to incorporate the huge
640attribute on two arrays:
641
642	static yyconst short int huge yy_nxt[15842] =
643	static yyconst short huge yy_chk[15842] =
644
645Microsoft C 5.0 and 5.1 compile in the huge model, but htmlpty goes
646into an infinite loop.  Microsoft C 6.0 produces the above fatal
647internal error message.
648
649tcc 2.0 won't compile at all: it seems to permit the huge attribute
650only on pointers, not on array objects.
651
652tcc 3.0 will compile the code in the compact and huge memory models,
653provided that -Dconst= is used to eliminate some apparent type
654conflict errors.  I don't understand what is happening here: running
655the tcc cpp (C preprocessor) on the code shows that yyconst expands to
656const through most of the code, then expands to an empty string in the
657rest, without ever having been redefined!
658
659The resulting executable from the tcc 3.0 compilation runs, and passes
660the validation suite.
661
662I then tried the lex-generated htmlpty.c, adding the huge attribute to
663these three lines:
664
	int huge yyvstop[] = {
	struct yywork { YYTYPE verify, advance; } huge yycrank[] = {
	struct yysvf huge yysvec[] = {
668
669This compiled and linked with tcc 3.0, but the output is incorrect.
670
671Conclusion: the only workable compiler that I have for building
672html-pretty on the IBM PC is Turbo C 3.0.
673
674Trapping of the warning and error messages sent to stderr requires the
675Microsoft errout executable; it is used in the pccheck.bat script.
676
For version 1.00, I prepared PC/config.h by hand and got a successful
build with Turbo C 3.0.  All but one of the validation tests passed.
The single failure is check016, whose nested style files seem to
exhaust PC memory, causing a complete hang, without fopen() ever
returning a failure.  By experiment, I found that if I replaced the
line

	-stylefile check016.st4

in check016.st3 by

	-print-stylefile

then the test would produce the expected results.  I don't have time
or patience to pursue a workaround for this, but the limitation does
not seem serious anyway: style files nested deeper than two levels are
unlikely to be necessary in practice, and such nesting is relatively
trivial to avoid.
695
696The batch file Test/docheck.bat can be used to run the checks.  PC DOS
697lacks an adequate file difference utility, so I ran the difference
698tests on a UNIX system using a simple csh/tcsh loop:
699
	foreach f (*.out *.err)
		echo ========== $f
		diff $f okay/$f
	end
704
Before running this loop, it may be necessary to convert the test
files so that both sets use the same line terminators, either DOS
CR-LF or UNIX LF, perhaps using the dos2ux/ux2dos utilities available
at

	ftp://ftp.math.utah.edu/pub/misc/dosmacux-x.y.*
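
If the dos2ux/ux2dos utilities are not at hand, the comparison itself
can be made insensitive to line terminators.  The following is only a
minimal sketch, in POSIX sh rather than csh, assuming the same layout
as the loop above (*.out and *.err files, with reference copies in
okay/).  The function name compare_outputs is made up for this
illustration, and tr stands in for dos2ux.

```shell
#!/bin/sh
# compare_outputs: diff each test output against its reference copy in
# okay/, ignoring CR-LF versus LF differences.
# (Hypothetical helper, not part of the html-pretty distribution.)
compare_outputs() {
    status=0
    tmp=${TMPDIR:-/tmp}/cmp.$$
    for f in *.out *.err
    do
        [ -f "$f" ] || continue        # skip a pattern that matched nothing
        echo "========== $f"
        # Delete carriage returns from both copies before comparing.
        tr -d '\r' < "$f" > "$tmp"
        tr -d '\r' < "okay/$f" | diff "$tmp" - || status=1
    done
    rm -f "$tmp"
    return $status
}
```

Run compare_outputs in the directory that holds the test output; it
returns a nonzero status if any file differs.  Note that tr -d '\r'
removes every carriage return, not only those that precede a line
feed, which should be harmless for text output like this.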
710
711Perhaps some PC installer will be able to build the program under
712newer compilers that avoid the obnoxious 64K segment limit, and send
713me a .exe file and a .bat file to build that program, for
714incorporation in later releases.
715
716
717======================
718TESTING AND VALIDATION
719======================
720
While a successful run of the validation suite with "make check"
gives confidence in the correct operation of the program, it is
helpful to test the program further with coverage analyzers (such as
Sun's tcov), profilers (such as prof, gprof, and pixie), and debuggers
with memory leak detection and pointer access checking (such as Sun's
dbx 3.1).  Also, testing against dozens of C and C++ compilers, with
maximal warnings requested, and runs with lint, help to weed out
errors that unnecessarily limit portability.
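
Repeating the build for several compilers can be scripted with the
make distclean and configure steps from the QUICK INSTALLATION
section.  A minimal sketch in POSIX sh follows; try_compilers and the
RUN variable are names invented here, and warning options are left
out because they differ from compiler to compiler.

```shell
#!/bin/sh
# try_compilers CC...: for each named C compiler, print the commands
# for a clean configure, build, and check cycle; execute them as well
# when RUN=1.  (Hypothetical helper; RUN is an assumption of this sketch.)
try_compilers() {
    for cc in "$@"
    do
        # make distclean fails harmlessly when no Makefile exists yet,
        # so it is separated by ";" rather than "&&".
        cmd="make distclean; env CC=$cc ./configure && make all check"
        echo "$cmd"
        if [ "${RUN:-0}" = 1 ]
        then
            sh -c "$cmd" || echo "FAILED with CC=$cc"
        fi
    done
}
```

With RUN unset, "try_compilers cc gcc" only prints the commands;
"RUN=1 try_compilers cc gcc" also runs them in the current directory.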
729
730------------------
731Memory utilization
732------------------
733
Runs under dbx show no memory leaks, and only one kind of pointer
violation: rui (read-from-uninitialized memory) errors.  These arise
in access to data returned by getpwnam(), and in access to the
lex/flex yytext[] buffer.  As far as I can see, they are bogus: the
returned data is valid, and can be displayed by the debugger, yet the
debugger complains when the program tries to access it.
741
	% dbx htmlpty
	(dbx) check -all
	(dbx) suppress rui
	(dbx) run test/check005.in >/dev/null
	(dbx) ...prettyprinter warnings...
	Checking for memory leaks...
	Leak Summary:
		actual leaks:         0  total size:       0 bytes
		possible leaks:       0  total size:       0 bytes

	Blocks in use Summary:
		blocks in use:      304  total size:   26108 bytes

	execution completed, exit code is 0
756
757Setting a breakpoint at the final return in main(), and requesting a
758memory usage report produces:
759
	 4286       return ((g_errors > 0) ? EXIT_FAILURE : EXIT_SUCCESS);
	(dbx) showmemuse -n 999 -a
	Checking for memory use...

	Blocks in use (biu) report:

	  Total   % of   Num of    Avg     Allocation trace
	  Size     All   Blocks    Block
				   Size
	========= ====== ======  ======== =======================================
	     8200 90.37%      1      8200  _findbuf < _doprnt < printf < generate_style_file < main
	      592  6.52%      1       592  calloc < _tzload < _ltzset_u < localtime_u < ctime < generate_style_file < main
	      148  1.63%      1       148  calloc < _tzload < _ltzset_u < localtime_u < ctime < generate_style_file < main
	       36  0.39%      1        36  calloc < _tzload < _ltzset_u < localtime_u < ctime < generate_style_file < main
	       36  0.39%      1        36  calloc < _tzload < _ltzset_u < localtime_u < ctime < generate_style_file < main
	       36  0.39%      1        36  strdup < _tzload < _ltzset_u < localtime_u < ctime < generate_style_file < main
	       13  0.14%      1        13  calloc < _tzload < _ltzset_u < localtime_u < ctime < generate_style_file < main
	       12  0.13%      1        12  tzcpy < getzname < _ltzset_u < localtime_u < ctime < generate_style_file < main

	Blocks in use Summary:
		blocks in use:        8  total size:    9073 bytes
781
The only unfreed memory left is that allocated by Sun library
routines, _findbuf() and _tzload(), neither of which is under user
control.
785
786
787----------------
Input robustness
789----------------
790
The fuzz package has also been applied to html-pretty.  It is
described in the paper

@String{j-CACM                  = "Communications of the ACM"}

@Article{Miller:1990:SRU,
  author =       "Barton P. Miller and Lars Fredriksen and Bryan So",
  title =        "Study of the Reliability of {UNIX} Utilities",
  journal =      j-CACM,
  volume =       "33",
  number =       "12",
  pages =        "33--44",
  month =        dec,
  year =         "1990",
  CODEN =        "CACMA2",
  ISSN =         "0001-0782",
  bibdate =      "Wed Aug 31 17:57:41 1994",
  acknowledgement = ack-nhfb,
}

and was revisited in 1995 in a technical report available at
ftp://ftp.cs.wisc.edu/par-distr-sys/fuzz.
812
The fuzz package, which essentially feeds random garbage to a
program's input stream, has turned up bugs in numerous UNIX utilities:
it made many of them dump core and, in at least one case, crashed the
entire operating system.
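
The idea is simple enough to sketch in a few lines of POSIX sh.  This
is only an illustration, not the fuzz package itself: the function
name fuzz_once is made up, and the sketch assumes a system that
provides /dev/urandom, which older UNIX systems may lack.

```shell
#!/bin/sh
# fuzz_once KBYTES COMMAND [ARGS...]: feed KBYTES kilobytes of random
# bytes to COMMAND's standard input and report whether it was killed
# by a signal.  (Hypothetical helper, not part of the fuzz package.)
fuzz_once() {
    kbytes=$1
    shift
    dd if=/dev/urandom bs=1024 count="$kbytes" 2>/dev/null |
        "$@" >/dev/null 2>&1
    rc=$?                       # exit status of COMMAND, not of dd
    # By convention, the shell reports death by signal N as 128 + N.
    if [ "$rc" -gt 128 ]
    then
        echo "$1: killed by signal $((rc - 128))"
        return 1
    fi
    echo "$1: survived (exit status $rc)"
    return 0
}
```

For example, "fuzz_once 64 ./htmlpty" feeds 64KB of random bytes to
htmlpty.  An exit status above 128 is only a convention for death by
signal; a program can also return such a status on its own.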
817
818Despite wide availability of the fuzz package after the 1990 paper,
819five years later, many of the same bugs were found in commercial UNIX
820systems; the system that fared the best in the fuzz tests was the Free
821Software Foundation's GNU system.
822
The fuzz tests found no problems in this program when run this way:
824
    cd /u/sy/beebe/src/fuzz/fuzz-1995-basic/src/fuzz/script1
    ln /u/sy/beebe/src/htmlpty/htmlpty-1.00/htmlpty ..

    ./run.stdin ../htmlpty

	../htmlpty < t1 >& /dev/null
	../htmlpty < t2 >& /dev/null
	../htmlpty < t3 >& /dev/null
	../htmlpty < t4 >& /dev/null
	../htmlpty < t5 >& /dev/null
	../htmlpty < t6 >& /dev/null
	../htmlpty < t7 >& /dev/null
	../htmlpty < t8 >& /dev/null
	../htmlpty < t9 >& /dev/null
	../htmlpty < t10 >& /dev/null
	../htmlpty < t11 >& /dev/null
	../htmlpty < t12 >& /dev/null

    ./run.file ../htmlpty

	../htmlpty t1 >& /dev/null
	../htmlpty t2 >& /dev/null
	../htmlpty t3 >& /dev/null
	../htmlpty t4 >& /dev/null
	../htmlpty t5 >& /dev/null
	../htmlpty t6 >& /dev/null
	../htmlpty t7 >& /dev/null
	../htmlpty t8 >& /dev/null
	../htmlpty t9 >& /dev/null
	../htmlpty t10 >& /dev/null
	../htmlpty t11 >& /dev/null
	../htmlpty t12 >& /dev/null

    cd ../script2

    ./run.stdin ../htmlpty

	../htmlpty < t1 >& /dev/null
	../htmlpty < t2 >& /dev/null
	../htmlpty < t3 >& /dev/null
	../htmlpty < t4 >& /dev/null
	../htmlpty < t5 >& /dev/null
	../htmlpty < t6 >& /dev/null
	../htmlpty < t7 >& /dev/null
	../htmlpty < t8 >& /dev/null
	../htmlpty < t9 >& /dev/null
	../htmlpty < t10 >& /dev/null
	../htmlpty < t11 >& /dev/null
	../htmlpty < t12 >& /dev/null

    ./run.file ../htmlpty

	../htmlpty t1 >& /dev/null
	../htmlpty t2 >& /dev/null
	../htmlpty t3 >& /dev/null
	../htmlpty t4 >& /dev/null
	../htmlpty t5 >& /dev/null
	../htmlpty t6 >& /dev/null
	../htmlpty t7 >& /dev/null
	../htmlpty t8 >& /dev/null
	../htmlpty t9 >& /dev/null
	../htmlpty t10 >& /dev/null
	../htmlpty t11 >& /dev/null
	../htmlpty t12 >& /dev/null

    rm -f ../htmlpty
891
892===============
893PROBLEM SYSTEMS
894===============
895
On a HALstation,

	uname -a
	SunOS hal 5.4 SPARC64/OS_2.4.5 sun4H sparc

building with the Fujitsu C++ compiler like this

	env CC=FCC ./configure && make all check

produces successful compilations, and all checks but one pass.
The failure is in check005, and it happens at all optimization levels,
and also with -g.  Each time, the Test/check005.out file has binary
garbage in it.  I was unable to debug this with either gdb or fdb:
gdb cannot find symbols such as big_next_position to print, and fdb
aborts immediately on startup.
911
912Changing from FCC to hcc, the HAL C/C++ compiler, produces a
913successful build and test.
914