%% /u/sy/beebe/src/htmlpty/htmlpty-1.00/INSTALL, Fri Nov 28 13:56:40 1997
%% Edit by Nelson H. F. Beebe <beebe@plot79.math.utah.edu>

==================
QUICK INSTALLATION
==================

You can build and install html-pretty on UNIX systems with little
difficulty, using the GNU standard incantation

        ./configure && make all check install

On systems with unusual compiler names (e.g., HAL SPARC64/OS_2.4.5),
you may need to specify a C and C++ compiler to use; see the CC= and
CCC= options to env in the example below.

NB: you can safely ignore this warning [your line number may differ]:

        flex -t htmlpty.l | sed -e '/^# *line/d' > htmlpty.c
        "htmlpty.l", line 620: warning, rule cannot be matched

That rule is there intentionally, to avoid losing output in the event
earlier rules fail to match all possible input patterns.

If you want to repeat the process in the same directory, but for a
different architecture, first do

        make distclean

to restore the directory to the state of a fresh distribution, then
repeat the configure and make steps as before.

If the builds, checks, and installs were successful, you can stop
reading now!

                        --------------------

The configure script will create Makefile from Makefile.in, and
config.h from config.hin, an essential step before make and compilers
can be used.

There are no top-level Makefile and config.h files included in a new
distribution, since configure is expected to generate them.  However,
for safety, there are backup copies of Makefile, config.h, configure,
and htmlpty.c in the Backup subdirectory; they were prepared on a Sun
Solaris 2.5 system.  You can use copies of them should you experience
problems with configure, or if you are porting the software to a
non-UNIX system that does not have a POSIX- or UNIX-like shell
(several non-UNIX systems have such shells, including IBM PC DOS, IBM
VM/CMS, and Microsoft Windows NT).

If you wish to use a particular C and/or C++ compiler and optimization
level, do it like this:

        env CC=..C-compiler.. CCC=..C++-compiler.. ./configure && \
                make OPT='-optlevel' all check install

or if your system lacks the env command

        sh/ksh/bash shell:
                CC=..C-compiler.. CCC=..C++-compiler.. ./configure && \
                        make OPT='-optlevel' all check install

        csh/tcsh shell:
                setenv CC ..C-compiler..
                setenv CCC ..C++-compiler..
                ./configure && make OPT='-optlevel' all check install

You do not need to rerun configure if you only change optimization
levels, but you should do so if you change other compiler options,
since they may affect the visibility of symbols and declarations in
system header files, and that in turn may require (automated) changes
to the config.h file.

In the Makefile, you may want to change the definitions of the BINDIR
and MANDIR installation directories; the distribution version uses the
Free Software Foundation's standards of /usr/local/bin and
/usr/local/man, which avoids contaminating any vendor-provided
directories.  This is readily done on the make command line, like
this:

        make prefix=/some/other/place targets

or, better, at configure time, like this:

        ./configure --prefix=/some/other/place


====================
INSTALLATION DETAILS
====================

The Makefile contains about 60 convenience targets for picking
particular compilers and optimization levels on different UNIX
systems.  If you use them, be sure to run configure first with CC set
to choose the appropriate compiler.

In the Makefile, read carefully the comments on the merits of choosing
lex or flex; flex is the default for the reasons stated there, but lex
can be used on SOME, but not all, UNIX systems.  Although flex
usually produces faster lexical analyzers than lex, for html-pretty
1.00, the lex version is slightly faster.  However, because the lex
version has problems with some characters in the range 128..255 on
some systems, flex is still the default lexical analyzer generator.

On HP 9000/7xx HP-UX 10.01 systems, and possibly others, it is
necessary to force yytext to be an array rather than a pointer; this
is done by adding -DYYCHAR_ARRAY to the compilation options, e.g.

        make XCFLAGS=-DYYCHAR_ARRAY

Because some lex implementations allocate peculiar sizes for yytext[],
html-pretty checks at startup that the expected size matches the
actual size, and if they differ, it quits immediately with a message
like this:

**********************************************************************
** FATAL ERROR: This program has inconsistent array sizes:
**      YYLMAX          = 8192
**      sizeof(yytext)  = 0
** You must correct the lex-generated code and rebuild the program.
** The Makefile has a sed filter step that should correct the error.
** Did you override this by running lex manually?
**********************************************************************

As the message notes, the installation procedure tries to correct such
aberrations, but may not succeed on all systems.

In the Makefile, pick the appropriate definition of HOST, or if
neither works (highly unlikely in the Internet world), you can specify
it on the make command line with something like

        make HOST=-D\"foo.bar.baz\"

Then just type

        make

On Sun Solaris 2.x, with the default vendor C compiler, you may need
to type

        make CC='cc -Xc'

because flex-generated code otherwise defines const to an empty string
after including some system header files, and before including others.
The result is conflicting function prototypes for names declared in
multiple header files (e.g. rename()).  The -Xc option requests strict
Standard C compilation, and avoids the problem.

On some systems (e.g. HP-UX with c89), you may need to add
-D_POSIX_SOURCE to make the utsname structures visible.

html-pretty has been successfully built and tested on these systems:

DECstation 3100                 ULTRIX 4.3              cc, lcc, gcc, g++
DEC Alpha 3000/400              OSF/1 2.0               cc, c89, gcc
DEC Alpha 3000/300LX            OSF/1 3.0               cc, c89, gcc, g++
HALstation 385                  SPARC64/OS_2.4.5        hcc, FCC
HP 9000/735                     HP-UX 9.0.5             cc, c89, CC, gcc, g++
IBM PC                          MS DOS 5.0              tcc (2.0 and 3.0)
IBM RS/6000                     AIX 3.2.5               cc, xlC, gcc, g++
MIPS RC6280                     RISCos 2.1.1AC          cc
NeXT Turbostation               Mach 3.0                cc, lcc, gcc, g++
Sun SPARCstation                SunOS 4.1.3             cc, lcc, gcc, g++
Sun SPARCstation                Solaris 2.3 and 2.4     cc, CC, lcc, gcc, g++
Silicon Graphics Indigo         IRIX 5.3, 6.2, 6.4      cc, c89, CC, gcc, g++

With flex 2.5.1, all systems passed, and the same holds for the more
recent flex 2.5.4.  If you get failures in check005 at the characters
in 128..255 with flex, you probably have an older version and should
upgrade (flex sources are on
ftp://prep.ai.mit.edu/pub/gnu/flex*.tar.gz, and on the many sites that
mirror the Free Software Foundation archive); flex 2.3 is one such
failing version.

With lex, some implementations lose characters with values in the
range 128..255 on check005.in.

On other systems, please try a C++ compiler if you have one, because
it is more likely to catch problems than C compilers are.
flex-generated code is C++ compatible, but some vendor lex
implementations are still in the old K&R C mold, instead of conforming
to 1989 ANSI/ISO Standard C, and produce C code that cannot be
compiled with C++ compilers.

To build, check, and install html-pretty, just use the conventional
GNU standard incantation:

        ./configure && make all check install

Installation will not happen if any of the validation checks fail.  In
the current release, configure is a dummy script, but it may do more
in later releases.
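
For readers curious about the yytext[] size check described earlier in
this section, the following is a minimal sketch of the idea, not the
code actually used in html-pretty.  The function name yytext_size_ok()
and its arguments are hypothetical; only YYLMAX and the wording of the
message come from the sample output shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define YYLMAX 8192     /* expected size, as in the sample message */

/* Hypothetical helper: return 1 when the actual buffer size matches
   YYLMAX; otherwise print a diagnostic modeled on the FATAL ERROR
   message and return 0 so that the caller can quit. */
static int yytext_size_ok(size_t actual_size, FILE *log)
{
    assert(log != NULL);
    if (actual_size == (size_t)YYLMAX)
        return 1;
    fprintf(log, "** FATAL ERROR: This program has inconsistent array sizes:\n");
    fprintf(log, "**      YYLMAX          = %lu\n", (unsigned long)YYLMAX);
    fprintf(log, "**      sizeof(yytext)  = %lu\n", (unsigned long)actual_size);
    return 0;
}
```

A caller would pass sizeof(yytext) as the first argument at startup;
that only gives a meaningful size when yytext is an array, which is
consistent with the sizeof(yytext) = 0 value in the sample message.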

If you want to change the default compiler optimization level, set the
variable OPT on the command line.  Other compiler flags can be
supplied with the XCFLAGS variable.  For example

        make OPT=-O3 XCFLAGS=-Xc all check install

If you need additional libraries, you can add a LIBS value on the make
command line, e.g., I found that compilation with gprof profiling with
C++ on Sun Solaris 2.5 needed this:

        make OPT=-pg LIBS=-ldl

Should you want to remove an installed version, just do

        make uninstall

When you finish, do

        make clean

to remove intermediate files, leaving the executable, or

        make distclean

to reduce everything to the state of the original distribution.

If you do

        make maintainer-clean

[but please do NOT!] you will also remove the bootstrap htmlpty.c
file, which needs lex or flex to be rebuilt, the ASCII, HTML, PDF, and
PostScript forms of the manual pages, which need man2ps, man2html,
groff/nroff, and Adobe Acrobat Distiller to recreate, and the
configure and config.hin files, which need GNU autoconf and autoheader
to recreate.


==================================
EFFICIENCY ISSUES FOR HTMLPTY 0.11
==================================

lex (and flex)-generated lexical analyzers are generally quite
efficient, and this one is no exception.  With version 0.11 of
htmlpty, on a large test file of 78,300 lines (2.4MB), made by
concatenating all of the .html files on my home file system, the flex
version of html-pretty took only 10.72 sec (7306 lines/sec, 269K input
bytes/sec) on an entry-level Sun SPARCstation LX workstation, with the
code compiled at the highest optimization level (-xO4) with the Sun
Solaris 2.4 native C compiler.  This optimization level results in
inlining of short functions, of which there are many in this program.

A more general result is the ratio of html-pretty's run time to that
of a simple program which copies the same file with the loop

        while ((c = getchar()) != EOF)
            putchar(c);

so that every input character is input and output individually.  The
copy loop ran in 3.21 sec, and the time ratio is 3.34.  The output
file size is 3.3MB, so adjusting for the total number of bytes input
and output, this ratio can be further scaled down by the factor
(2399316 + 3295133)/(2 * 2399316) = 1.19, to a value of 2.81.

Line profiling with Sun tcov revealed hot spots inside the
flex-generated code, about which one can do nothing, and in the
copy_verbatim() routine, which is called a lot because the test file
has a lot of <PRE> ... </PRE> environments.

CPU time profiling with gprof gave this flat profile; note that
_read() and _write() together account for about 45% of the CPU time.

Each sample counts as 0.01 seconds.
     % cumulative     self              self    total
  time    seconds  seconds    calls  ms/call  ms/call  name
 27.05       7.41     7.41      594    12.47    12.47  _read
 18.33      12.43     5.02      813     6.17     6.17  _write
 10.55      15.32     2.89        1  2890.00 24003.83  yylex
  8.80      17.73     2.41      659     3.66    13.24  copy_verbatim
  8.54      20.07     2.34   112354     0.02     0.03  out_string
  6.28      21.79     1.72                             _mcount
  3.58      22.77     0.98                             oldarc
  3.36      23.69     0.92   423743     0.00     0.00  .mul
  2.34      24.33     0.64    28719     0.02     0.06  line_end
  2.26      24.95     0.62    53771     0.01     0.01  _memccpy
  1.53      25.37     0.42   261034     0.00     0.00  strchr
  1.28      25.72     0.35    70858     0.00     0.01  blank
  0.91      25.97     0.25    53273     0.00     0.07  fputs
  0.91      26.22     0.25    27228     0.01     0.01  normalize_tag
  0.62      26.39     0.17                             done
  0.51      26.53     0.14    84749     0.00     0.04  out_yytext
  0.51      26.67     0.14     8387     0.02     0.20  list_item
  0.37      26.77     0.10     9998     0.01     0.17  pair
  0.37      26.87     0.10        8    12.50    12.50  _open
  0.33      26.96     0.09      608     0.15     0.15  memcpy
  0.26      27.03     0.07   108476     0.00     0.00  _thr_main_stub
  0.26      27.10     0.07    54999     0.00     0.00  _realbufend
  0.26      27.17     0.07    53327     0.00     0.00  out_blank
  0.15      27.21     0.04    12875     0.00     0.00  toupper
  0.11      27.24     0.03    55004     0.00     0.00  _rw_rdlock_stub
  0.11      27.27     0.03     3848     0.01     0.06  font
  0.07      27.29     0.02     2158     0.01     0.31  paragraph
...

and this hierarchical profile (functions whose names begin with
underscore are library routines):

granularity: each sample hit covers 4 byte(s) for 0.04% of 25.67 seconds

index  % time    self children   called  name
[1]      95.3    0.00    24.46        1  main [1]
[2]      95.3    0.00    24.46           _start [2]
[3]      93.5    2.89    21.11        1  yylex [3]
[4]      34.1    0.01     8.75      664  verbatim [4]
[5]      34.0    2.41     6.32      659  copy_verbatim [5]
[6]      29.1    0.00     7.46      307  fread [6]
[7]      28.9    7.41     0.00      594  _read [7]
[8]      28.8    0.01     7.37      590  __filbuf [8]
[9]      27.9    0.00     7.17      295  yy_get_next_buffer [9]
[10]     19.6    5.02     0.00      813  _write [10]
[11]     19.4    0.01     4.97      804  _xflsbuf [11]
[12]     15.4    0.25     3.71    53273  fputs [12]
[13]     15.3    2.34     1.58   112354  out_string [13]
[14]     12.1    0.14     2.96    84749  out_yytext [14]
[15]      8.0    0.00     2.05      332  __flsbuf [15]
[16]      6.9    0.64     1.14    28719  line_end [16]
[17]      6.7    0.14     1.57     8387  list_item [17]
[18]      6.6    0.10     1.59     9998  pair [18]
[19]      3.8    0.98     0.00           oldarc [19]
[20]      3.6    0.92     0.00   423743  .mul [20]
[21]      2.6    0.02     0.66     2158  paragraph [21]
[22]      2.6    0.35     0.32    70858  blank [22]
[23]      2.4    0.62     0.00    53771  _memccpy [23]
[24]      1.8    0.00     0.46        1  out_banner [24]
[25]      1.6    0.42     0.00   261034  strchr [25]
[28]      1.3    0.25     0.09    27228  normalize_tag [28]
[30]      0.9    0.03     0.20     3848  font [30]
[31]      0.7    0.17     0.00           done [31]
[34]      0.4    0.09     0.00      608  memcpy [34]
[35]      0.3    0.01     0.06      423  begin_list [35]
[36]      0.3    0.07     0.00    53327  out_blank [36]
[38]      0.3    0.01     0.05      425  standalone [38]
[40]      0.2    0.02     0.03      426  out_newline [40]
[46]      0.2    0.04     0.00    12875  toupper [46]
[47]      0.1    0.00     0.04        2  fopen [47]
[50]      0.1    0.01     0.02      158  line_break [50]
[52]      0.1    0.00     0.03       27  fgets [52]
[53]      0.1    0.00     0.02        1  ctime [53]
...
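
The character-copy baseline used earlier in this section for the time
ratio can be packaged as a small function, sketched below.  The name
copy_chars() and the choice of explicit FILE arguments are
illustrative assumptions; the original measurement used the plain
getchar()/putchar() loop on standard input and output.

```c
#include <assert.h>
#include <stdio.h>

/* Copy every character from in to out, one at a time, in the manner
   of the getchar()/putchar() loop; return the number of characters
   copied.  (Hypothetical helper, not part of html-pretty.) */
static long copy_chars(FILE *in, FILE *out)
{
    int c;
    long count = 0;

    assert(in != NULL && out != NULL);
    while ((c = getc(in)) != EOF) {
        putc(c, out);
        count++;
    }
    return count;
}
```

Calling copy_chars(stdin, stdout) from main() reproduces the baseline
program, and timing that run against html-pretty on the same input
file gives the kind of ratio discussed above.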

flex offers options for improved scanner efficiency, notably,
uncompressed tables, and fullword, instead of halfword, storage.  Here
is a comparison of the relative run times on a Sun SPARCstation LX
with the native Solaris 2.4 C compiler, using the 2.4MB test file
noted above:

        ------------------------------------
        LFLAGS                  -----OPT----
                                -g      -xO4
        ------------------------------------
        (slow)                  2.42    1.23
        (fast) -Ca -Cf          2.21    1.00
        ------------------------------------

The tradeoff is that the fast scanner has 3.65 times as many lines of
C code, 4.10 times as many bytes of C code, and the executable is 6.28
times larger.  Compilation of the slow scanner with optimization took
4 minutes, and of the fast scanner, 6 minutes.

However, use of the fast scanner uncovered two problems:

        (a) in function yy_try_NUL_trans(), the generated code
            was missing a variable declaration:
                register char *yy_cp = yy_c_buf_p; /* BUG: lost with -Cf -Ca */
            This is a flex 2.5.1 bug that will be reported to the flex author.

        (b) "make check" failed in check005: three lines with
            characters in 128..255 are missing a blank preceding those
            characters.

Thus, I will not use the fast scanner options until these problems are
fixed.

[08-Nov-1997]: Under flex version 2.5.4, I repeated this experiment,
this time on a faster Sun UltraSPARC 2170 using the native Solaris
2.5.1 C++ compiler.

The compilation and link time for "make all" with OPT=-g was 13.05 sec,
and with OPT=-O4, 185.58 sec, 14.22 times longer.

The htmlpty.c file was 6231 lines and 179583 bytes long with the fast
flex options -Ca -Cf, compared to 5220 lines and 130535 bytes without.

The executable size for the fast flex version with OPT=-O4 was 395440
bytes (382280 bytes stripped of symbols) compared with 425984 bytes
(391632 bytes stripped of symbols) with OPT=-g.

Although bug (a) above has disappeared, the "make check" failed as
before in (b) above in check005; all other checks passed.  Thus, the
fast flex options should still not be used.


==================================
EFFICIENCY ISSUES FOR HTMLPTY 1.00
==================================

During the development of version 1.00, considerable attention was
given to improving the efficiency of the program.  Because it does
much more than earlier versions, it is not as fast (compared to the
cpchar program, it runs ?.?? times slower), but I believe the code is
now acceptably fast, and further changes to the code are not likely to
make much difference.

In particular, I was concerned about the efficiency of low-level I/O
routines, and the linear search through tag tables, called from
do_tag() -> check_tag_nesting() -> search() when the
-check-tag-nesting option has been specified.

Hash table searches are used elsewhere in the program for class/tag
lookups, and could be used by check_tag_nesting() as well, at the cost
of some additional complexity in table allocation, generation, and
freeing.  Fortunately, this does not seem to be necessary, thanks to
some rather simple optimizations that avoid unnecessary strxxx()
library function calls.  One of them, a 3-line change in table.c in
function get_name_by_style() and struct Style, reduced execution time
by 40%.

The low-level I/O routines have been rewritten to replace the old
double buffered scheme (old output + current line) with a new single
buffered approach that uses two variables, big_last_verbatim_position
and big_newline_position, to track the buffer positions of two
critical fenceposts.  This optimization also reduced complexity,
making the time-critical dputc() function a candidate for inlining:
every output character goes through that function.
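
To make the single-buffer idea concrete, here is a much simplified
sketch of a character output routine that keeps one buffer and one
newline fencepost.  It is an illustration under stated assumptions,
not the real dputc(): big_position, big_output, and flush_prefix() are
hypothetical names, the buffer is deliberately tiny, the flushing
policy is invented for the example, and the second fencepost
(big_last_verbatim_position) is not modeled at all.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXBIGBUF 16    /* tiny on purpose, so flushing is easy to observe */

static char big_buffer[MAXBIGBUF];
static size_t big_position = 0;         /* next free slot (hypothetical) */
static size_t big_newline_position = 0; /* one past the last stored newline */
static FILE *big_output = NULL;         /* destination stream (hypothetical) */

/* Write out the first n buffered characters and slide the rest down. */
static void flush_prefix(size_t n)
{
    assert(n <= big_position);
    if (big_output != NULL)
        fwrite(big_buffer, 1, n, big_output);
    memmove(big_buffer, big_buffer + n, big_position - n);
    big_position -= n;
    big_newline_position =
        (big_newline_position > n) ? big_newline_position - n : 0;
}

/* Append one character.  When the buffer is full, flush complete
   lines only, or everything if no newline has been stored yet. */
static void dputc(int c)
{
    if (big_position == MAXBIGBUF)
        flush_prefix(big_newline_position > 0 ? big_newline_position
                                              : big_position);
    big_buffer[big_position++] = (char)c;
    if (c == '\n')
        big_newline_position = big_position;
}
```

Because the routine is only a few lines long and touches two indices,
it is the kind of function that a compiler can inline cheaply, which
is the point made in the paragraph above.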

With C++ compilation, several short functions are now compiled inline
(they are identified by the INLINE attribute, which is defined in
common.h to expand to inline in C++ compilation, and to an empty
string in C compilation).

Profiling with prof, gprof, pixie, pixstats, and tcov on several
architectures now shows that the time-critical code section is the
yy_match loop in yylex(), and the yy_get_next_buffer() function, which
together account for 40% to 70% of the run time, depending on the
architecture.  Since they are part of the flex-generated code, nothing
can be readily done to improve them, short of changing to a faster
lexical analyzer, but as far as I'm aware, no one so far has claimed
to improve on flex.

Here is a sample profile, from a DEC Alpha 2100/5-250 OSF/1 3.2
system, using a 5MB test file made by concatenating all of the HTML
files in the Web tree at http://www.math.utah.edu/, running the
program like this:

pixie htmlpty
htmlpty.pixie -indent 0 -logfile /dev/null <test-file >/dev/null
pixstats htmlpty

    cycles %cycles   cum%     instrs  c/i    calls    c/call  name
 401335489   24.3%  24.3%  401335489  1.0      680    590199  yy_get_next_buffer__Xv
 289269893   17.5%  41.8%  289269893  1.0        1 289269893  yylex__Xv
 186740484   11.3%  53.1%  186740484  1.0  5328755        35  dputc__Xi
 103959911    6.3%  59.3%  103959911  1.0  2735657        38  yyinput__Xv
  59729355    3.6%  63.0%   59729355  1.0   421340       142  out_string__XPCc
  56872936    3.4%  66.4%   56872936  1.0  2829934        20  out_char__Xi
  56729130    3.4%  69.8%   56729130  1.0  3781942        15  __iswctype_sb
  44619727    2.7%  72.5%   44619727  1.0  1962533        23  strncmp
  44589019    2.7%  75.2%   44589019  1.0   185581       240  normalize_tag__XPc
  38082936    2.3%  77.5%   38082936  1.0  1081174        35  NLstrchr
  34056501    2.1%  79.6%   34056501  1.0      948     35925  copy_verbatim__Xv
  25740013    1.6%  81.1%   25740013  1.0  1980001        13  isspace
  23601100    1.4%  82.6%   23601100  1.0   887103        27  strcmp
  20878692    1.3%  83.8%   20878692  1.0   123129       170  memcpy
  18787220    1.1%  85.0%   18787220  1.0  1342541        14  last_char__Xi
  18543783    1.1%  86.1%   18543783  1.0    48003       386  paragraph_contains__XPCc
  18113750    1.1%  87.2%   18113750  1.0  1811375        10  indentation_size__Xv
...

When the same experiment is repeated, this time adding the
-check-tag-nesting option, the profile looks like this:

    cycles %cycles   cum%     instrs  c/i    calls    c/call  name
 401335489   18.6%  18.6%  401335489  1.0      680    590199  yy_get_next_buffer__Xv
 289269893   13.4%  32.0%  289269893  1.0        1 289269893  yylex__Xv
 207404011    9.6%  41.7%  207404011  1.0    92374      2245  get_name_by_style__XPCc
 186740484    8.7%  50.3%  186740484  1.0  5328755        35  dputc__Xi
 129298105    6.0%  56.3%  129298105  1.0   184240       702  _doprnt
 103959911    4.8%  61.1%  103959911  1.0  2735657        38  yyinput__Xv
  70219906    3.3%  64.4%   70219906  1.0  2335227        30  strcmp
  69497989    3.2%  67.6%   69497989  1.0  3008303        23  strncmp
  59729355    2.8%  70.4%   59729355  1.0   421340       142  out_string__XPCc
  56872936    2.6%  73.0%   56872936  1.0  2829934        20  out_char__Xi
  56730540    2.6%  75.7%   56730540  1.0  3782036        15  __iswctype_sb
  55964230    2.6%  78.2%   55964230  1.0    52281      1070  search__XPCcPCc
  48358354    2.2%  80.5%   48358354  1.0   958937        50  memcpy
  44589019    2.1%  82.6%   44589019  1.0   185581       240  normalize_tag__XPc
  38082936    1.8%  84.3%   38082936  1.0  1081174        35  NLstrchr
  34056501    1.6%  85.9%   34056501  1.0      948     35925  copy_verbatim__Xv
  25740065    1.2%  87.1%   25740065  1.0  1980005        13  isspace

This option seems to add about 10% to the execution time, but even so,
the time attributed to strcmp() and strncmp() is less than 7%.
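
One common way to keep strcmp() time low in a linear table search is
to compare the first characters inline and call the library routine
only for plausible candidates.  The sketch below illustrates that
general technique; it is an assumption about the kind of optimization
meant by avoiding unnecessary strxxx() calls, not the actual search()
or get_name_by_style() code, and search_names() is a hypothetical
name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Linear search of a NULL-terminated name table.  First characters
   are compared inline, so strcmp() is called only when they agree.
   Returns the index of the match, or -1 if there is none. */
static int search_names(const char *const table[], const char *name)
{
    int k;

    assert(table != NULL && name != NULL);
    for (k = 0; table[k] != NULL; ++k) {
        if (table[k][0] == name[0] && strcmp(table[k], name) == 0)
            return k;
    }
    return -1;
}
```

With a table of tag names whose initial letters are well spread, most
entries are rejected by the single character comparison, so the number
of library calls per lookup drops sharply even though the search is
still linear.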

Here is another sample profile, also from pixie, but for the program
running on an SGI Challenge L system running IRIX 5.3:

Procedures ordered by execution time:
    cycles %cycles   cum%     instrs cycles    calls    cycles  procedure
                                      /inst              /call
2063493343   59.1%  59.1% 2063493343    1.0      680   3034549  yy_get_next_buffer__Fv
 349231336   10.0%  69.1%  349231336    1.0        1 349231336  yylex__Fv
 181381967    5.2%  74.3%  174277699    1.0   421340       430  out_string__FPCc
 136797681    3.9%  78.2%  136797681    1.0  2735657        50  yyinput__Fv
 102750980    2.9%  81.2%  102750980    1.0  2498820        41  out_verbatim_char__Fi
  61240202    1.8%  82.9%   61240202    1.0      948     64599  copy_verbatim__Fv
  61100974    1.8%  84.7%   61100974    1.0   185581       329  normalize_tag__FPc
  49604130    1.4%  86.1%   49604130    1.0    48003      1033  paragraph_contains__FPCc
  38372206    1.1%  87.2%   38372206    1.0  1962571        20  strncmp
  38183748    1.1%  88.3%   38183748    1.0  1081059        35  strchr
  37590583    1.1%  89.4%   30311767    1.2   227463       165  hash_lookup__FPCcP10Hash_Table
  35579988    1.0%  90.4%   35579988    1.0   585204        61  trim_line__Fi

On this machine, the lexical analyzer is using about 73% of the time.

From a Sun SPARCstation LX running Solaris 2.5, here is part of the
flat profile produced by gprof, using a 13MB test file formed from
all of the HTML files on my home system:

granularity: each sample hit covers 2 byte(s) for 0.00% of 1071.27 seconds

     % cumulative     self               self      total
  time    seconds  seconds     calls  ms/call    ms/call  name
  68.1     729.23   729.23      1635   446.01     459.94  yy_get_next_buffer [4]
   5.1     783.52    54.30  22060617     0.00       0.00  dputc [9]
   5.0     836.82    53.30                                _mcount (615)
   4.7     887.06    50.24      2843    17.67      17.67  _write [15]
   2.8     917.23    30.17                                oldarc [19]
   2.1     939.76    22.53      1636    13.77      13.77  _read [22]
   2.0     961.68    21.92  11225462     0.00       0.06  input [5]
   2.0     982.86    21.18                                next [23]
   0.9     992.24     9.38                                done [27]
   0.7     999.83     7.59  11419666     0.00       0.00  out_char [28]
   0.7    1007.02     7.19         1  7190.15  945357.77  yylex [3]
   0.5    1012.33     5.31                                _moncontrol [34]
   0.4    1016.21     3.88     86155     0.05       0.05  _memcpy [35]
   0.4    1020.00     3.79       107    35.42    1241.25  complex_markup_declaration [8]
   0.3    1023.38     3.39  10640950     0.00       0.01  out_verbatim_char [12]
   0.3    1026.67     3.29       678     4.85     804.24  copy_verbatim [7]
   0.3    1029.46     2.79                                chainloop [42]
   0.3    1032.20     2.74       274     9.98      10.37  doctype [41]
   0.3    1034.93     2.73    346694     0.01       0.01  normalize_tag [36]
   0.2    1036.69     1.76   1027598     0.00       0.00  strchr [48]
   0.2    1038.33     1.65   1483003     0.00       0.00  indentation_size [43]
   0.2    1039.95     1.62    276567     0.01       0.24  out_string [10]

Clearly, on all three systems, I/O is the chief consumer of CPU time.

To that end, I made some additional changes in the code to reduce the
number of system calls for I/O, by redimensioning the big_buffer[]
array from MAXBUF to MAXBIGBUF, and then adding calls in main() to
setvbuf() (when available) to allocate input and output buffers, also
of size MAXBIGBUF (default: max(16384,MAXBUF)).
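
A minimal sketch of the setvbuf() arrangement just described follows.
It assumes a placeholder value for MAXBUF (the real value is not
stated here), and the names use_big_buffers(), input_buffer, and
output_buffer are hypothetical; only MAXBIGBUF, its default of
max(16384,MAXBUF), and the use of setvbuf() come from the text above.

```c
#include <assert.h>
#include <stdio.h>

#define MAXBUF 4096     /* placeholder; the real MAXBUF is not given here */
#define MAXBIGBUF ((MAXBUF) > 16384 ? (MAXBUF) : 16384)

static char input_buffer[MAXBIGBUF];
static char output_buffer[MAXBIGBUF];

/* Request full buffering with large static buffers on both streams.
   This must be called before any other I/O is done on them.
   Returns 0 on success, -1 if either setvbuf() call fails. */
static int use_big_buffers(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);
    if (setvbuf(in, input_buffer, _IOFBF, sizeof(input_buffer)) != 0)
        return -1;
    if (setvbuf(out, output_buffer, _IOFBF, sizeof(output_buffer)) != 0)
        return -1;
    return 0;
}
```

In a program like this one, the call would be
use_big_buffers(stdin, stdout) at the top of main().  The buffers are
static so that they remain valid until the streams are closed at
program exit.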

Experiments on DEC Alpha 2100-5/250 OSF/1 3.2, HP 9000/735 HP-UX 10.01,
NeXT Mach 3.3, SGI Challenge L IRIX 5.3, and Sun Solaris 2.5 systems
with MAXBIGBUF values of 2048, 4096, 16384, 65536, 262144, and 1048576
showed little change in run times with a 5MB test input file.  The HP
system had the largest reduction, only about 5%, as MAXBIGBUF
increased.

I also made additional experiments with function inlining: choosing
the top 15 time consumers from profiles (blank(), copy_verbatim(),
dputc(), indentation_size(), last_char(), normalize_tag(),
out_blank(), out_char(), out_string(), out_verbatim_char(),
out_verbatim_string(), out_yytext(), trim_line(), yyinput(), and
yylook()) and asking the C++ compiler to inline them reduced execution
time by 37% on Sun systems.  I have therefore added INLINE directives
to the definitions and declarations of those functions (except
yyinput() and yylook(), which are in lex/flex-generated code).

With the current version (SC4.0) of the Sun Solaris C++ compiler,
inline directives do not seem to cause function inlining, but an
explicit (but horrid) option

-xinline=__0FGyylookv,__0FHyyinputv,__0FFdputci,__0FKout_stringPCc,__0FJlast_chari,strncmp,strchr,__0FIout_chari,__0FQindentation_sizev,__0FNnormalize_tagPc,__0FRout_verbatim_chari,strcmp,_memcpy,__0FKout_yytextv,__0FNcopy_verbatimv,__0FFblankv,__0FJout_blankv,__0FJtrim_linei,__0FTout_verbatim_stringPCc

did so.
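
The INLINE attribute mentioned above can be defined along the
following lines.  This is a sketch of the mechanism as described (it
expands to inline in C++ compilation and to nothing in C
compilation), not a copy of common.h; the helper last_char_of() is
hypothetical and merely stands in for one of the short functions.

```c
#include <assert.h>
#include <string.h>

/* INLINE expands to inline under C++ and to nothing under C. */
#if defined(__cplusplus)
#define INLINE inline
#else
#define INLINE
#endif

/* Hypothetical short function of the kind worth inlining: return the
   final character of s, or '\0' when s is empty. */
static INLINE int last_char_of(const char *s)
{
    size_t n;

    assert(s != NULL);
    n = strlen(s);
    return (n > 0) ? (unsigned char)s[n - 1] : '\0';
}
```

The same source then compiles unchanged with both C and C++
compilers, and only the C++ build carries the inlining request.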

One optimization that looked promising, but actually made the program
run slightly slower, was to add additional patterns for the commonest
tags (I chose the top 73 from a frequency-ordered list of all of the
tags used in the Web tree at http://www.math.utah.edu/), and then have
an action that cached the formatting function, e.g.,

{BEGINPAIR}{B}{I}{G}{ENDTAG}                    { DO_TAG2("BIG") }

(DO_TAG2() is still defined in htmlpty.l), instead of looking it up
each time as do_tag() does.  Even though this greatly reduced the
number of calls to get_action_by_name(), it appears that the
additional complexity in the lexical analyzer made up for that gain,
and resulted in a slower program.

Because lex does not run a preprocessor, there is no way to retain
those 73 patterns in the source file, but I have saved them in my
development directory for possible future work.  The rest of the
required support code is retained in htmlpty.l and table.c inside a

#if defined(TAG_CACHE)
...
#endif /* defined(TAG_CACHE) */

preprocessor conditional.


===================
IBM PC INSTALLATION
===================

Up to version 0.08, html-pretty built without problems under Turbo C
2.0 and 3.0, and passed the validation suite.

With version 0.09, the lex/flex-generated jump tables are larger, and
the nasty Intel segmented memory architecture rears its ugly head, and
it took me several hours of work to get a working version for IBM PC
DOS.  I tried Microsoft C 5.0, 5.1, and 6.0, and Turbo C 2.0 and 3.0.
I also have Microsoft C 7.0, but it will not run under SunPC.

Compilation under Microsoft C 6.0 requires addition of -Dconst= ,
because of an error in the compiler: it thinks that once an array is
declared of type const char *, then you cannot assign to it!

lex-generated htmlpty.c compiles with Microsoft C 5.0, 5.1, and 6.0,
but gives incorrect output.

The flex-generated htmlpty.c won't compile with Microsoft C 5.0, 5.1,
6.0, or Turbo C 2.0 or 3.0 -- four complain about data group > 64K,
even in the compact and huge memory models; Microsoft C 6.0 just
produces this:

htmlpty.c
htmlpty.c(4781) : fatal error C1001: Internal Compiler Error
                (compiler file '@(#)regMD.c:1.100', line 3837)
                Contact Microsoft Product Support Services

I then modified the flex-generated htmlpty.c to incorporate the huge
attribute on two arrays:

        static yyconst short int huge yy_nxt[15842] =
        static yyconst short huge yy_chk[15842] =

Microsoft C 5.0 and 5.1 compile in the huge model, but htmlpty goes
into an infinite loop.  Microsoft C 6.0 produces the above fatal
internal error message.

tcc 2.0 won't compile at all: it seems to permit the huge attribute
only on pointers, not on array objects.

tcc 3.0 will compile the code in the compact and huge memory models,
provided that -Dconst= is used to eliminate some apparent type
conflict errors.  I don't understand what is happening here: running
the tcc cpp (C preprocessor) on the code shows that yyconst expands to
const through most of the code, then expands to an empty string in the
rest, without ever having been redefined!

The resulting executable from the tcc 3.0 compilation runs, and passes
the validation suite.

I then tried the lex-generated htmlpty.c, adding the huge attribute to
these three lines:

        int huge yyvstop[] = {
        struct yywork { YYTYPE verify, advance; } huge yycrank[] = {
        struct yysvf huge yysvec[] = {

This compiled and linked with tcc 3.0, but the output is incorrect.

Conclusion: the only workable compiler that I have for building
html-pretty on the IBM PC is Turbo C 3.0.

Trapping of the warning and error messages sent to stderr requires the
Microsoft errout executable; it is used in the pccheck.bat script.
676 677For version 1.00, I prepared PC/config.h by hand and got a successful 678build with Turbo C 3.0. All but one of the validation tests passed. 679The single failure is check016, whose nested style files seem to 680exhaust PC memory, causing it to hang completely without getting a 681failure return from fopen(). By experiment, I found that if I 682replaced the line 683 684-stylefile check016.st4 685 686in check016.st3 by 687 688-print-stylefile 689 690then the test would produce the expected results. I don't have time 691or patience to pursue a workaround for this, but the limitation does 692not seem serious anyway, since style files nested deeper than two 693levels are unlikely to be necessary in practice, and relatively 694trivial to avoid. 695 696The batch file Test/docheck.bat can be used to run the checks. PC DOS 697lacks an adequate file difference utility, so I ran the difference 698tests on a UNIX system using a simple csh/tcsh loop: 699 700 foreach f (*.out *.err) 701 echo ========== $f 702 diff $f okay/$f 703 end 704 705Before running this loop, it may be necessary to change test file line 706terminators to either DOS CR-LF or UNIX LF conventions, perhaps using 707the dos2ux/ux2dos utilities available at 708 709 ftp://ftp.math.utah.edu/pub/misc/dosmacux-x.y.* 710 711Perhaps some PC installer will be able to build the program under 712newer compilers that avoid the obnoxious 64K segment limit, and send 713me a .exe file and a .bat file to build that program, for 714incorporation in later releases. 715 716 717====================== 718TESTING AND VALIDATION 719====================== 720 721While successful passing of the validation suite with "make check" 722gives confidence in the correct operation of the program, it is 723helpful to test the program further with coverage analyzers (such as 724Sun's tcov), profilers (such as prof, gprof, and pixie), and debuggers 725with memory leak detection and pointer access checking (such as Sun's 726dbx 3.1). 
Also, testing against dozens of C and C++ compilers, with 727maximal warnings requested, and runs with lint, helps to weed out 728errors that unnecessarily limit portability. 729 730------------------ 731Memory utilization 732------------------ 733 734Runs under dbx show no memory leaks, and only one kind of pointer 735violation, rui (read-from-uninitialized memory) errors, which arise in 736access to data returned by getpwnam(), and in access to the lex/flex 737yytext[] buffer, and which, as far as I can see, are bogus, since the 738returned data is valid, and can be displayed by the debugger, but 739which the debugger complains about when the program tries to access 740it. 741 742 % dbx htmlpty 743 (dbx) check -all 744 (dbx) suppress rui 745 (dbx) run test/check005.in >/dev/null 746 (dbx) ...prettyprinter warnings... 747 Checking for memory leaks... 748 Leak Summary: 749 actual leaks: 0 total size: 0 bytes 750 possible leaks: 0 total size: 0 bytes 751 752 Blocks in use Summary: 753 blocks in use: 304 total size: 26108 bytes 754 755 execution completed, exit code is 0 756 757Setting a breakpoint at the final return in main(), and requesting a 758memory usage report produces: 759 760 4286 return ((g_errors > 0) ? EXIT_FAILURE : EXIT_SUCCESS); 761 (dbx) showmemuse -n 999 -a 762 Checking for memory use... 

        Blocks in use (biu) report:

          Total     % of Num of    Avg   Allocation trace
          Size       All Blocks   Block
                                  Size
        ========= ====== ====== ======== =======================================
             8200 90.37%      1     8200 _findbuf < _doprnt < printf < generate_style_file < main
              592  6.52%      1      592 calloc < _tzload < _ltzset_u < localtime_u < ctime < generate_style_file < main
              148  1.63%      1      148 calloc < _tzload < _ltzset_u < localtime_u < ctime < generate_style_file < main
               36  0.39%      1       36 calloc < _tzload < _ltzset_u < localtime_u < ctime < generate_style_file < main
               36  0.39%      1       36 calloc < _tzload < _ltzset_u < localtime_u < ctime < generate_style_file < main
               36  0.39%      1       36 strdup < _tzload < _ltzset_u < localtime_u < ctime < generate_style_file < main
               13  0.14%      1       13 calloc < _tzload < _ltzset_u < localtime_u < ctime < generate_style_file < main
               12  0.13%      1       12 tzcpy < getzname < _ltzset_u < localtime_u < ctime < generate_style_file < main

        Blocks in use Summary:
                blocks in use:      8  total size:    9073 bytes

The only unfreed memory left is that allocated by the Sun library
routines _findbuf() and _tzload(), neither of which is under user
control.


----------------
Input Robustness
----------------

The fuzz package, described in the paper

@String{j-CACM = "Communications of the ACM"}

@Article{Miller:1990:SRU,
  author =          "Barton P. Miller and Lars Fredriksen and Bryan So",
  title =           "Study of the Reliability of {UNIX} Utilities",
  journal =         j-CACM,
  volume =          "33",
  number =          "12",
  pages =           "33--44",
  month =           dec,
  year =            "1990",
  CODEN =           "CACMA2",
  ISSN =            "0001-0782",
  bibdate =         "Wed Aug 31 17:57:41 1994",
  acknowledgement = ack-nhfb,
}

and followed up in 1995 with a revisit in a technical report available
at ftp://ftp.cs.wisc.edu/par-distr-sys/fuzz, has also been applied.
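
In outline, a fuzz test pipes random bytes into a program and watches
for abnormal termination.  The following is only a minimal sketch of
that idea in POSIX sh, not the Wisconsin fuzz package itself: the
function names fuzz_once and fuzz_many are hypothetical, and the
sketch assumes that /dev/urandom and "head -c" are available.

```shell
#!/bin/sh
# Hypothetical fuzz sketch; NOT the Wisconsin fuzz package.
# Assumes /dev/urandom and "head -c" exist on this system.

# fuzz_once PROGRAM NBYTES
# Feed NBYTES random bytes to PROGRAM on stdin, discarding its output.
# The return value is PROGRAM's exit status; by shell convention, a
# value above 128 means death by signal (e.g., 139 = 128 + SIGSEGV).
fuzz_once() {
    prog=$1
    nbytes=$2
    head -c "$nbytes" /dev/urandom | "$prog" > /dev/null 2>&1
}

# fuzz_many PROGRAM RUNS NBYTES
# Print one line per run; return nonzero if any run died on a signal.
fuzz_many() {
    prog=$1
    runs=$2
    nbytes=$3
    crashes=0
    i=1
    while [ "$i" -le "$runs" ]; do
        fuzz_once "$prog" "$nbytes"
        status=$?
        if [ "$status" -gt 128 ]; then
            echo "run $i: killed by signal $((status - 128))"
            crashes=$((crashes + 1))
        else
            echo "run $i: exit status $status"
        fi
        i=$((i + 1))
    done
    [ "$crashes" -eq 0 ]
}

# Example: five runs of 4096 random bytes each against cat.
fuzz_many cat 5 4096
```

To try the same idea on this program, something like
"fuzz_many ./htmlpty 12 10000" could be used after a build; the run
count and input size there are arbitrary choices, not values taken
from the fuzz package.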

The fuzz package, which essentially feeds random garbage to a
program's input stream, has turned up bugs in numerous UNIX utilities,
and was able to make many of them core dump, and in at least one case,
was able to crash the entire operating system.

Despite wide availability of the fuzz package after the 1990 paper,
five years later, many of the same bugs were found in commercial UNIX
systems; the system that fared the best in the fuzz tests was the Free
Software Foundation's GNU system.

The fuzz tests found no problems in this program, when run this way

        cd /u/sy/beebe/src/fuzz/fuzz-1995-basic/src/fuzz/script1
        ln /u/sy/beebe/src/htmlpty/htmlpty-1.00/htmlpty ..

        ./run.stdin ../htmlpty

        ../htmlpty < t1 >& /dev/null
        ../htmlpty < t2 >& /dev/null
        ../htmlpty < t3 >& /dev/null
        ../htmlpty < t4 >& /dev/null
        ../htmlpty < t5 >& /dev/null
        ../htmlpty < t6 >& /dev/null
        ../htmlpty < t7 >& /dev/null
        ../htmlpty < t8 >& /dev/null
        ../htmlpty < t9 >& /dev/null
        ../htmlpty < t10 >& /dev/null
        ../htmlpty < t11 >& /dev/null
        ../htmlpty < t12 >& /dev/null

        ./run.file ../htmlpty

        ../htmlpty t1 >& /dev/null
        ../htmlpty t2 >& /dev/null
        ../htmlpty t3 >& /dev/null
        ../htmlpty t4 >& /dev/null
        ../htmlpty t5 >& /dev/null
        ../htmlpty t6 >& /dev/null
        ../htmlpty t7 >& /dev/null
        ../htmlpty t8 >& /dev/null
        ../htmlpty t9 >& /dev/null
        ../htmlpty t10 >& /dev/null
        ../htmlpty t11 >& /dev/null
        ../htmlpty t12 >& /dev/null

        cd ../script2

        ./run.stdin ../htmlpty

        ../htmlpty < t1 >& /dev/null
        ../htmlpty < t2 >& /dev/null
        ../htmlpty < t3 >& /dev/null
        ../htmlpty < t4 >& /dev/null
        ../htmlpty < t5 >& /dev/null
        ../htmlpty < t6 >& /dev/null
        ../htmlpty < t7 >& /dev/null
        ../htmlpty < t8 >& /dev/null
        ../htmlpty < t9 >& /dev/null
        ../htmlpty < t10 >& /dev/null
        ../htmlpty < t11 >& /dev/null
        ../htmlpty < t12 >& /dev/null

        ./run.file ../htmlpty

        ../htmlpty t1 >& /dev/null
        ../htmlpty t2 >& /dev/null
        ../htmlpty t3 >& /dev/null
        ../htmlpty t4 >& /dev/null
        ../htmlpty t5 >& /dev/null
        ../htmlpty t6 >& /dev/null
        ../htmlpty t7 >& /dev/null
        ../htmlpty t8 >& /dev/null
        ../htmlpty t9 >& /dev/null
        ../htmlpty t10 >& /dev/null
        ../htmlpty t11 >& /dev/null
        ../htmlpty t12 >& /dev/null

        rm -f ../htmlpty

===============
PROBLEM SYSTEMS
===============

On a HALstation,

        uname -a
        SunOS hal 5.4 SPARC64/OS_2.4.5 sun4H sparc

building with the Fujitsu C++ compiler like this

        env CC=FCC ./configure && make all check

produces successful compilations, and all but a single check pass.
The failure is in check005, and it happens at all optimization levels,
including -g.  Each time, the Test/check005.out file has binary
garbage in it.  I was unable to debug this with either gdb or fdb:
gdb cannot usefully find symbols to print, such as big_next_position,
and fdb aborts immediately on startup.

Changing from FCC to hcc, the HAL C/C++ compiler, produces a
successful build and test.