1% This program by D. E. Knuth is not copyrighted and can be used freely. 2% Version 0 was released in December, 1981. 3% Version 1 was released in September, 1982, with version 0 of TeX. 4% Slight changes were made in October, 1982, for version 0.6 of TeX. 5% Version 1.2 introduced {:nnn} comments, added @@= and @@\ (December, 1982). 6% Version 1.4 added "history" (February, 1983). 7% Version 1.5 conformed to TeX version 0.96 and fixed @@\ (March, 1983). 8% Version 1.7 introduced the new change file format (June, 1983). 9% Version 2.0 was released in July, 1983, with version 0.999 of TeX. 10% Version 2.5 was released in November, 1983, with version 1.0 of TeX. 11% Version 2.6 fixed a bug: force-line-break after a constant (August, 1984). 12% Version 2.7 fixed the definition of check_sum_prime (May, 1985). 13% Version 2.8 fixed a bug in change_buffer movement (August, 1985). 14% Version 2.9 allows nonnumeric macros before their def (December, 1988). 15% Version 3, for Sewell's book, fixed long-line bug in input_ln (March, 1989). 16% Version 4 was major change to allow 8-bit input (September, 1989). 17% Version 4.1 conforms to ANSI standard for-loop rules (September, 1990). 18% Version 4.2 fixes stat report if phase one dies (March, 1991). 19% Version 4.3 fixes @@ bug in verbatim, catches extra } (September, 1991). 20% Version 4.4 activates debug_help on errors as advertised (February, 1993). 21% Version 4.5 prevents modno-comments from being split across lines (Dec 2002). 22 23% Here is TeX material that gets inserted after \input webmac 24\def\hang{\hangindent 3em\indent\ignorespaces} 25\font\ninerm=cmr9 26\let\mc=\ninerm % medium caps for names like SAIL 27\def\PASCAL{Pascal} 28\def\pb{$\.|\ldots\.|$} % Pascal brackets (|...|) 29\def\v{\.{\char'174}} % vertical (|) in typewriter font 30\mathchardef\BA="3224 % double arrow 31\def\({} % kludge for alphabetizing certain module names 32 33\def\title{TANGLE} 34\def\contentspagenumber{123} % should be odd 35\def\topofcontents{\null\vfill 36 \titlefalse % include headline on the contents page 37 \def\rheader{\mainfont Appendix E\hfil \contentspagenumber} 38 \centerline{\titlefont The {\ttitlefont TANGLE} processor} 39 \vskip 15pt 40 \centerline{(Version 4.5)} 41 \vfill} 42\pageno=\contentspagenumber \advance\pageno by 1 43 44@* Introduction. 45This program converts a \.{WEB} file to a \PASCAL\ file. It was written 46by D. E. Knuth in September, 1981; a somewhat similar {\mc SAIL} program had 47been developed in March, 1979. Since this program describes itself, a 48bootstrapping process involving hand-translation had to be used to get started. 49 50For large \.{WEB} files one should have a large memory, since \.{TANGLE} keeps 51all the \PASCAL\ text in memory (in an abbreviated form). The program uses 52a few features of the local \PASCAL\ compiler that may need to be changed in 53other installations: 54 55\yskip\item{1)} Case statements have a default. 56\item{2)} Input-output routines may need to be adapted for use with a particular 57character set and/or for printing messages on the user's terminal. 58 59\yskip\noindent 60These features are also present in the \PASCAL\ version of \TeX, where they 61are used in a similar (but more complex) way. System-dependent portions 62of \.{TANGLE} can be identified by looking at the entries for `system 63dependencies' in the index below. 64@!@^system dependencies@> 65 66The ``banner line'' defined here should be changed whenever \.{TANGLE} 67is modified. 68 69@d banner=='This is TANGLE, Version 4.5' 70 71@ The program begins with a fairly normal header, made up of pieces that 72@^system dependencies@> 73will mostly be filled in later. The \.{WEB} input comes from files |web_file| 74and |change_file|, the \PASCAL\ output goes to file |Pascal_file|, 75and the string pool output goes to file |pool|. 76 77If it is necessary to abort the job because of a fatal error, the program 78calls the `|jump_out|' procedure, which goes to the label |end_of_TANGLE|. 79 80@d end_of_TANGLE = 9999 {go here to wrap it up} 81 82@p @t\4@>@<Compiler directives@>@/ 83program TANGLE(@!web_file,@!change_file,@!Pascal_file,@!pool); 84label end_of_TANGLE; {go here to finish} 85const @<Constants in the outer block@>@/ 86type @<Types in the outer block@>@/ 87var @<Globals in the outer block@>@/ 88@<Error handling procedures@>@/ 89procedure initialize; 90 var @<Local variables for initialization@>@/ 91 begin @<Set initial values@>@/ 92 end; 93 94@ Some of this code is optional for use when debugging only; 95such material is enclosed between the delimiters |debug| and $|gubed|$. 96Other parts, delimited by |stat| and $|tats|$, are optionally included if 97statistics about \.{TANGLE}'s memory usage are desired. 98 99@d debug==@{ {change this to `$\\{debug}\equiv\null$' when debugging} 100@d gubed==@t@>@} {change this to `$\\{gubed}\equiv\null$' when debugging} 101@f debug==begin 102@f gubed==end 103@# 104@d stat==@{ {change this to `$\\{stat}\equiv\null$' 105 when gathering usage statistics} 106@d tats==@t@>@} {change this to `$\\{tats}\equiv\null$' 107 when gathering usage statistics} 108@f stat==begin 109@f tats==end 110 111@ The \PASCAL\ compiler used to develop this system has ``compiler 112directives'' that can appear in comments whose first character is a dollar sign. 113In production versions of \.{TANGLE} these directives tell the compiler that 114@^system dependencies@> 115it is safe to avoid range checks and to leave out the extra code it inserts 116for the \PASCAL\ debugger's benefit, although interrupts will occur if 117there is arithmetic overflow. 118 119@<Compiler directives@>= 120@{@&$C-,A+,D-@} {no range check, catch arithmetic overflow, no debug overhead} 121@!debug @{@&$C+,D+@}@+ gubed {but turn everything on when debugging} 122 123@ Labels are given symbolic names by the following definitions. We insert 124the label `|exit|:' just before the `\ignorespaces|end|\unskip' of a 125procedure in which we have used the `|return|' statement defined below; 126the label `|restart|' is occasionally used at the very beginning of a 127procedure; and the label `|reswitch|' is occasionally used just prior to 128a \&{case} statement in which some cases change the conditions and we wish to 129branch to the newly applicable case. 130Loops that are set up with the \&{loop} construction defined below are 131commonly exited by going to `|done|' or to `|found|' or to `|not_found|', 132and they are sometimes repeated by going to `|continue|'. 133 134@d exit=10 {go here to leave a procedure} 135@d restart=20 {go here to start a procedure again} 136@d reswitch=21 {go here to start a case statement again} 137@d continue=22 {go here to resume a loop} 138@d done=30 {go here to exit a loop} 139@d found=31 {go here when you've found it} 140@d not_found=32 {go here when you've found something else} 141 142@ Here are some macros for common programming idioms. 143 144@d incr(#) == #:=#+1 {increase a variable by unity} 145@d decr(#) == #:=#-1 {decrease a variable by unity} 146@d loop == @+ while true do@+ {repeat over and over until a |goto| happens} 147@d do_nothing == {empty statement} 148@d return == goto exit {terminate a procedure call} 149@f return == nil 150@f loop == xclause 151 152@ We assume that |case| statements may include a default case that applies 153if no matching label is found. Thus, we shall use constructions like 154@^system dependencies@> 155$$\vbox{\halign{#\hfil\cr 156|case x of|\cr 1571: $\langle\,$code for $x=1\,\rangle$;\cr 1583: $\langle\,$code for $x=3\,\rangle$;\cr 159|othercases| $\langle\,$code for |x<>1| and |x<>3|$\,\rangle$\cr 160|endcases|\cr}}$$ 161since most \PASCAL\ compilers have plugged this hole in the language by 162incorporating some sort of default mechanism. For example, the compiler 163used to develop \.{WEB} and \TeX\ allows `|others|:' as a default label, 164and other \PASCAL s allow syntaxes like `\ignorespaces|else|\unskip' or 165`\&{otherwise}' or `\\{otherwise}:', etc. The definitions of |othercases| 166and |endcases| should be changed to agree with local conventions. (Of 167course, if no default mechanism is available, the |case| statements of 168this program must be extended by listing all remaining cases. The author 169would have taken the trouble to modify \.{TANGLE} so that such extensions 170were done automatically, if he had not wanted to encourage \PASCAL\ 171compiler writers to make this important change in \PASCAL, where it belongs.) 172 173@d othercases == others: {default for cases not listed explicitly} 174@d endcases == @+end {follows the default case in an extended |case| statement} 175@f othercases == else 176@f endcases == end 177 178@ The following parameters are set big enough to handle \TeX, so they 179should be sufficient for most applications of \.{TANGLE}. 180 181@<Constants...@>= 182@!buf_size=100; {maximum length of input line} 183@!max_bytes=45000; {|1/ww| times the number of bytes in identifiers, 184 strings, and module names; must be less than 65536} 185@!max_toks=50000; {|1/zz| times the number of bytes in compressed \PASCAL\ code; 186 must be less than 65536} 187@!max_names=4000; {number of identifiers, strings, module names; 188 must be less than 10240} 189@!max_texts=2000; {number of replacement texts, must be less than 10240} 190@!hash_size=353; {should be prime} 191@!longest_name=400; {module names shouldn't be longer than this} 192@!line_length=72; {lines of \PASCAL\ output have at most this many characters} 193@!out_buf_size=144; {length of output buffer, should be twice |line_length|} 194@!stack_size=50; {number of simultaneous levels of macro expansion} 195@!max_id_length=12; {long identifiers are chopped to this length, which must 196 not exceed |line_length|} 197@!unambig_length=7; {identifiers must be unique if chopped to this length} 198 {note that 7 is more strict than \PASCAL's 8, but this can be varied} 199 200@ A global variable called |history| will contain one of four values 201at the end of every run: |spotless| means that no unusual messages were 202printed; |harmless_message| means that a message of possible interest 203was printed but no serious errors were detected; |error_message| means that 204at least one error was found; |fatal_message| means that the program 205terminated abnormally. The value of |history| does not influence the 206behavior of the program; it is simply computed for the convenience 207of systems that might want to use such information. 208 209@d spotless=0 {|history| value for normal jobs} 210@d harmless_message=1 {|history| value when non-serious info was printed} 211@d error_message=2 {|history| value when an error was noted} 212@d fatal_message=3 {|history| value when we had to stop prematurely} 213@# 214@d mark_harmless==@t@>@+if history=spotless then history:=harmless_message 215@d mark_error==history:=error_message 216@d mark_fatal==history:=fatal_message 217 218@<Glob...@>=@!history:spotless..fatal_message; {how bad was this run?} 219 220@ @<Set init...@>=history:=spotless; 221 222@* The character set. 223One of the main goals in the design of \.{WEB} has been to make it readily 224portable between a wide variety of computers. Yet \.{WEB} by its very 225nature must use a greater variety of characters than most computer 226programs deal with, and character encoding is one of the areas in which 227existing machines differ most widely from each other. 228 229To resolve this problem, all input to \.{WEAVE} and \.{TANGLE} is converted 230to an internal eight-bit code that is essentially standard ASCII, the ``American 231Standard Code for Information Interchange.'' The conversion is done 232immediately when each character is read in. Conversely, characters are 233converted from ASCII to the user's external representation just before 234they are output. (The original ASCII code was seven bits only; \.{WEB} now 235allows eight bits in an attempt to keep up with modern times.) 236 237Such an internal code is relevant to users of \.{WEB} only because it is 238the code used for preprocessed constants like \.{"A"}. If you are writing 239a program in \.{WEB} that makes use of such one-character constants, you 240should convert your input to ASCII form, like \.{WEAVE} and \.{TANGLE} do. 241Otherwise \.{WEB}'s internal coding scheme does not affect you. 242@^ASCII code@> 243 244Here is a table of the standard visible ASCII codes: 245$$\def\:{\char\count255\global\advance\count255 by 1} 246\count255='40 247\vbox{ 248\hbox{\hbox to 40pt{\it\hfill0\/\hfill}% 249\hbox to 40pt{\it\hfill1\/\hfill}% 250\hbox to 40pt{\it\hfill2\/\hfill}% 251\hbox to 40pt{\it\hfill3\/\hfill}% 252\hbox to 40pt{\it\hfill4\/\hfill}% 253\hbox to 40pt{\it\hfill5\/\hfill}% 254\hbox to 40pt{\it\hfill6\/\hfill}% 255\hbox to 40pt{\it\hfill7\/\hfill}} 256\vskip 4pt 257\hrule 258\def\^{\vrule height 10.5pt depth 4.5pt} 259\halign{\hbox to 0pt{\hskip -24pt\O{#0}\hfill}&\^ 260\hbox to 40pt{\tt\hfill#\hfill\^}& 261&\hbox to 40pt{\tt\hfill#\hfill\^}\cr 26204&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 26305&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 26406&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 26507&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 26610&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 26711&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 26812&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 26913&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 27014&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 27115&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 27216&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 27317&\:&\:&\:&\:&\:&\:&\:\cr} 274\hrule width 280pt}$$ 275(Actually, of course, code @'040 is an invisible blank space.) Code @'136 276was once an upward arrow (\.{\char'13}), and code @'137 was 277once a left arrow (\.^^X), in olden times when the first draft 278of ASCII code was prepared; but \.{WEB} works with today's standard 279ASCII in which those codes represent circumflex and underline as shown. 280 281@<Types...@>= 282@!ASCII_code=0..255; {eight-bit numbers, a subrange of the integers} 283 284@ The original \PASCAL\ compiler was designed in the late 60s, when six-bit 285character sets were common, so it did not make provision for lowercase 286letters. Nowadays, of course, we need to deal with both capital and small 287letters in a convenient way, so \.{WEB} assumes that it is being used 288with a \PASCAL\ whose character set contains at least the characters of 289standard ASCII as listed above. Some \PASCAL\ compilers use the original 290name |char| for the data type associated with the characters in text files, 291while other \PASCAL s consider |char| to be a 64-element subrange of a larger 292data type that has some other name. 293 294In order to accommodate this difference, we shall use the name |text_char| 295to stand for the data type of the characters in the input and output 296files. We shall also assume that |text_char| consists of the elements 297|chr(first_text_char)| through |chr(last_text_char)|, inclusive. The 298following definitions should be adjusted if necessary. 299@^system dependencies@> 300 301@d text_char == char {the data type of characters in text files} 302@d first_text_char=0 {ordinal number of the smallest element of |text_char|} 303@d last_text_char=255 {ordinal number of the largest element of |text_char|} 304 305@<Types...@>= 306@!text_file=packed file of text_char; 307 308@ The \.{WEAVE} and \.{TANGLE} processors convert between ASCII code and 309the user's external character set by means of arrays |xord| and |xchr| 310that are analogous to \PASCAL's |ord| and |chr| functions. 311 312@<Globals...@>= 313@!xord: array [text_char] of ASCII_code; 314 {specifies conversion of input characters} 315@!xchr: array [ASCII_code] of text_char; 316 {specifies conversion of output characters} 317 318@ If we assume that every system using \.{WEB} is able to read and write the 319visible characters of standard ASCII (although not necessarily using the 320ASCII codes to represent them), the following assignment statements initialize 321most of the |xchr| array properly, without needing any system-dependent 322changes. For example, the statement \.{xchr[@@\'101]:=\'A\'} that appears 323in the present \.{WEB} file might be encoded in, say, {\mc EBCDIC} code 324on the external medium on which it resides, but \.{TANGLE} will convert from 325this external code to ASCII and back again. Therefore the assignment 326statement \.{XCHR[65]:=\'A\'} will appear in the corresponding \PASCAL\ file, 327and \PASCAL\ will compile this statement so that |xchr[65]| receives the 328character \.A in the external (|char|) code. Note that it would be quite 329incorrect to say \.{xchr[@@\'101]:="A"}, because |"A"| is a constant of 330type |integer|, not |char|, and because we have $|"A"|=65$ regardless of 331the external character set. 332 333@<Set init...@>= 334xchr[@'40]:=' '; 335xchr[@'41]:='!'; 336xchr[@'42]:='"'; 337xchr[@'43]:='#'; 338xchr[@'44]:='$'; 339xchr[@'45]:='%'; 340xchr[@'46]:='&'; 341xchr[@'47]:='''';@/ 342xchr[@'50]:='('; 343xchr[@'51]:=')'; 344xchr[@'52]:='*'; 345xchr[@'53]:='+'; 346xchr[@'54]:=','; 347xchr[@'55]:='-'; 348xchr[@'56]:='.'; 349xchr[@'57]:='/';@/ 350xchr[@'60]:='0'; 351xchr[@'61]:='1'; 352xchr[@'62]:='2'; 353xchr[@'63]:='3'; 354xchr[@'64]:='4'; 355xchr[@'65]:='5'; 356xchr[@'66]:='6'; 357xchr[@'67]:='7';@/ 358xchr[@'70]:='8'; 359xchr[@'71]:='9'; 360xchr[@'72]:=':'; 361xchr[@'73]:=';'; 362xchr[@'74]:='<'; 363xchr[@'75]:='='; 364xchr[@'76]:='>'; 365xchr[@'77]:='?';@/ 366xchr[@'100]:='@@'; 367xchr[@'101]:='A'; 368xchr[@'102]:='B'; 369xchr[@'103]:='C'; 370xchr[@'104]:='D'; 371xchr[@'105]:='E'; 372xchr[@'106]:='F'; 373xchr[@'107]:='G';@/ 374xchr[@'110]:='H'; 375xchr[@'111]:='I'; 376xchr[@'112]:='J'; 377xchr[@'113]:='K'; 378xchr[@'114]:='L'; 379xchr[@'115]:='M'; 380xchr[@'116]:='N'; 381xchr[@'117]:='O';@/ 382xchr[@'120]:='P'; 383xchr[@'121]:='Q'; 384xchr[@'122]:='R'; 385xchr[@'123]:='S'; 386xchr[@'124]:='T'; 387xchr[@'125]:='U'; 388xchr[@'126]:='V'; 389xchr[@'127]:='W';@/ 390xchr[@'130]:='X'; 391xchr[@'131]:='Y'; 392xchr[@'132]:='Z'; 393xchr[@'133]:='['; 394xchr[@'134]:='\'; 395xchr[@'135]:=']'; 396xchr[@'136]:='^'; 397xchr[@'137]:='_';@/ 398xchr[@'140]:='`'; 399xchr[@'141]:='a'; 400xchr[@'142]:='b'; 401xchr[@'143]:='c'; 402xchr[@'144]:='d'; 403xchr[@'145]:='e'; 404xchr[@'146]:='f'; 405xchr[@'147]:='g';@/ 406xchr[@'150]:='h'; 407xchr[@'151]:='i'; 408xchr[@'152]:='j'; 409xchr[@'153]:='k'; 410xchr[@'154]:='l'; 411xchr[@'155]:='m'; 412xchr[@'156]:='n'; 413xchr[@'157]:='o';@/ 414xchr[@'160]:='p'; 415xchr[@'161]:='q'; 416xchr[@'162]:='r'; 417xchr[@'163]:='s'; 418xchr[@'164]:='t'; 419xchr[@'165]:='u'; 420xchr[@'166]:='v'; 421xchr[@'167]:='w';@/ 422xchr[@'170]:='x'; 423xchr[@'171]:='y'; 424xchr[@'172]:='z'; 425xchr[@'173]:='{'; 426xchr[@'174]:='|'; 427xchr[@'175]:='}'; 428xchr[@'176]:='~';@/ 429xchr[0]:=' '; xchr[@'177]:=' '; {these ASCII codes are not used} 430 431@ Some of the ASCII codes below @'40 have been given symbolic names in 432\.{WEAVE} and \.{TANGLE} because they are used with a special meaning. 433 434@d and_sign=@'4 {equivalent to `\.{and}'} 435@d not_sign=@'5 {equivalent to `\.{not}'} 436@d set_element_sign=@'6 {equivalent to `\.{in}'} 437@d tab_mark=@'11 {ASCII code used as tab-skip} 438@d line_feed=@'12 {ASCII code thrown away at end of line} 439@d form_feed=@'14 {ASCII code used at end of page} 440@d carriage_return=@'15 {ASCII code used at end of line} 441@d left_arrow=@'30 {equivalent to `\.{:=}'} 442@d not_equal=@'32 {equivalent to `\.{<>}'} 443@d less_or_equal=@'34 {equivalent to `\.{<=}'} 444@d greater_or_equal=@'35 {equivalent to `\.{>=}'} 445@d equivalence_sign=@'36 {equivalent to `\.{==}'} 446@d or_sign=@'37 {equivalent to `\.{or}'} 447 448@ When we initialize the |xord| array and the remaining parts of |xchr|, 449it will be convenient to make use of an index variable, |i|. 450 451@<Local variables for init...@>= 452@!i:0..255; 453 454@ Here now is the system-dependent part of the character set. 455If \.{WEB} is being implemented on a garden-variety \PASCAL\ for which 456only standard ASCII codes will appear in the input and output files, you 457don't need to make any changes here. But if you have, for example, an extended 458character set like the one in Appendix~C of {\sl The \TeX book}, the first 459line of code in this module should be changed to 460$$\hbox{|for i:=1 to @'37 do xchr[i]:=chr(i);|}$$ 461\.{WEB}'s character set is essentially identical to \TeX's, even with respect to 462characters less than @'40. 463@^system dependencies@> 464 465Changes to the present module will make \.{WEB} more friendly on computers 466that have an extended character set, so that one can type things like 467\.^^Z\ instead of \.{<>}. If you have an extended set of characters that 468are easily incorporated into text files, you can assign codes arbitrarily 469here, giving an |xchr| equivalent to whatever characters the users of 470\.{WEB} are allowed to have in their input files, provided that unsuitable 471characters do not correspond to special codes like |carriage_return| 472that are listed above. 473 474(The present file \.{TANGLE.WEB} does not contain any of the non-ASCII 475characters, because it is intended to be used with all implementations of 476\.{WEB}. It was originally created on a Stanford system that has a 477convenient extended character set, then ``sanitized'' by applying another 478program that transliterated all of the non-standard characters into 479standard equivalents.) 480 481@<Set init...@>= 482for i:=1 to @'37 do xchr[i]:=' '; 483for i:=@'200 to @'377 do xchr[i]:=' '; 484 485@ The following system-independent code makes the |xord| array contain a 486suitable inverse to the information in |xchr|. 487 488@<Set init...@>= 489for i:=first_text_char to last_text_char do xord[chr(i)]:=" "; 490for i:=1 to @'377 do xord[xchr[i]]:=i; 491xord[' ']:=" "; 492 493@* Input and output. 494The input conventions of this program are intended to be very much like those 495of \TeX\ (except, of course, that they are much simpler, because much less 496needs to be done). Furthermore they are identical to those of \.{WEAVE}. 497Therefore people who need to make modifications to all three systems 498should be able to do so without too many headaches. 499 500We use the standard \PASCAL\ input/output procedures in several places that 501\TeX\ cannot, since \.{TANGLE} does not have to deal with files that are named 502dynamically by the user, and since there is no input from the terminal. 503 504@ Terminal output is done by writing on file |term_out|, which is assumed to 505consist of characters of type |text_char|: 506@^system dependencies@> 507 508@d print(#)==write(term_out,#) {`|print|' means write on the terminal} 509@d print_ln(#)==write_ln(term_out,#) {`|print|' and then start new line} 510@d new_line==write_ln(term_out) {start new line} 511@d print_nl(#)== {print information starting on a new line} 512 begin new_line; print(#); 513 end 514 515@<Globals...@>= 516@!term_out:text_file; {the terminal as an output file} 517 518@ Different systems have different ways of specifying that the output on a 519certain file will appear on the user's terminal. Here is one way to do this 520on the \PASCAL\ system that was used in \.{TANGLE}'s initial development: 521@^system dependencies@> 522 523@<Set init...@>= 524rewrite(term_out,'TTY:'); {send |term_out| output to the terminal} 525 526@ The |update_terminal| procedure is called when we want 527to make sure that everything we have output to the terminal so far has 528actually left the computer's internal buffers and been sent. 529@^system dependencies@> 530 531@d update_terminal == break(term_out) {empty the terminal output buffer} 532 533@ The main input comes from |web_file|; this input may be overridden 534by changes in |change_file|. (If |change_file| is empty, there are no changes.) 535 536@<Globals...@>= 537@!web_file:text_file; {primary input} 538@!change_file:text_file; {updates} 539 540@ The following code opens the input files. Since these files were listed 541in the program header, we assume that the \PASCAL\ runtime system has 542already checked that suitable file names have been given; therefore no 543additional error checking needs to be done. 544@^system dependencies@> 545 546@p procedure open_input; {prepare to read |web_file| and |change_file|} 547begin reset(web_file); reset(change_file); 548end; 549 550@ The main output goes to |Pascal_file|, and string pool constants are 551written to the |pool| file. 552 553@<Globals...@>= 554@!Pascal_file: text_file; 555@!pool: text_file; 556 557@ The following code opens |Pascal_file| and |pool|. 558Since these files were listed in the program header, we assume that the 559\PASCAL\ runtime system has checked that suitable external file names have 560been given. 561@^system dependencies@> 562 563@<Set init...@>= 564rewrite(Pascal_file); rewrite(pool); 565 566@ Input goes into an array called |buffer|. 567 568@<Globals...@>=@!buffer: array[0..buf_size] of ASCII_code; 569 570@ The |input_ln| procedure brings the next line of input from the specified 571file into the |buffer| array and returns the value |true|, unless the file has 572already been entirely read, in which case it returns |false|. The conventions 573of \TeX\ are followed; i.e., |ASCII_code| numbers representing the next line 574of the file are input into |buffer[0]|, |buffer[1]|, \dots, 575|buffer[limit-1]|; trailing blanks are ignored; 576and the global variable |limit| is set to the length of the 577@^system dependencies@> 578line. The value of |limit| must be strictly less than |buf_size|. 579 580We assume that none of the |ASCII_code| values 581of |buffer[j]| for |0<=j<limit| is equal to 0, @'177, |line_feed|, |form_feed|, 582or |carriage_return|. 583 584@p function input_ln(var f:text_file):boolean; 585 {inputs a line or returns |false|} 586var final_limit:0..buf_size; {|limit| without trailing blanks} 587begin limit:=0; final_limit:=0; 588if eof(f) then input_ln:=false 589else begin while not eoln(f) do 590 begin buffer[limit]:=xord[f^]; get(f); 591 incr(limit); 592 if buffer[limit-1]<>" " then final_limit:=limit; 593 if limit=buf_size then 594 begin while not eoln(f) do get(f); 595 decr(limit); {keep |buffer[buf_size]| empty} 596 if final_limit>limit then final_limit:=limit; 597 print_nl('! Input line too long'); loc:=0; error; 598@.Input line too long@> 599 end; 600 end; 601 read_ln(f); limit:=final_limit; input_ln:=true; 602 end; 603end; 604 605@* Reporting errors to the user. 606The \.{TANGLE} processor operates in two phases: first it inputs the source 607file and stores a compressed representation of the program, then it produces 608the \PASCAL\ output from the compressed representation. 609 610The global variable |phase_one| tells whether we are in Phase I or not. 611 612@<Globals...@>= 613@!phase_one: boolean; {|true| in Phase I, |false| in Phase II} 614 615@ If an error is detected while we are debugging, 616we usually want to look at the contents of memory. 617A special procedure will be declared later for this purpose. 618 619@<Error handling...@>= 620@!debug @+ procedure debug_help; forward;@+ gubed 621 622@ During the first phase, syntax errors are reported to the user by saying 623$$\hbox{`|err_print('! Error message')|'},$$ 624followed by `|jump_out|' if no recovery from the error is provided. 625This will print the error message followed by an indication of where the error 626was spotted in the source file. Note that no period follows the error message, 627since the error routine will automatically supply a period. 628 629Errors that are noticed during the second phase are reported to the user 630in the same fashion, but the error message will be 631followed by an indication of where the error was spotted in the output file. 632 633The actual error indications are provided by a procedure called |error|. 634 635@d err_print(#)==begin new_line; print(#); error; 636 end 637 638@<Error handling...@>= 639procedure error; {prints '\..' and location of error message} 640var j: 0..out_buf_size; {index into |out_buf|} 641@!k,@!l: 0..buf_size; {indices into |buffer|} 642begin if phase_one then @<Print error location based on input buffer@> 643else @<Print error location based on output buffer@>; 644update_terminal; mark_error; 645@!debug debug_skipped:=debug_cycle; debug_help;@+gubed 646end; 647 648@ The error locations during Phase I can be indicated by using the global 649variables |loc|, |line|, and |changing|, which tell respectively the first 650unlooked-at position in |buffer|, the current line number, and whether or not 651the current line is from |change_file| or |web_file|. 652This routine should be modified on systems whose standard text editor 653has special line-numbering conventions. 654@^system dependencies@> 655 656@<Print error location based on input buffer@>= 657begin if changing then print('. (change file ')@+else print('. ('); 658print_ln('l.', line:1, ')'); 659if loc>=limit then l:=limit else l:=loc; 660for k:=1 to l do 661 if buffer[k-1]=tab_mark then print(' ') 662 else print(xchr[buffer[k-1]]); {print the characters already read} 663new_line; 664for k:=1 to l do print(' '); {space out the next line} 665for k:=l+1 to limit do print(xchr[buffer[k-1]]); {print the part not yet read} 666print(' '); {this space separates the message from future asterisks} 667end 668 669@ The position of errors detected during the second phase can be indicated 670by outputting the partially-filled output buffer, which contains |out_ptr| 671entries. 672 673@<Print error location based on output...@>= 674begin print_ln('. (l.',line:1,')'); 675for j:=1 to out_ptr do print(xchr[out_buf[j-1]]); {print current partial line} 676print('... '); {indicate that this information is partial} 677end 678 679@ The |jump_out| procedure just cuts across all active procedure levels 680and jumps out of the program. This is the only non-local |goto| statement 681in \.{TANGLE}. It is used when no recovery from a particular error has 682been provided. 683 684Some \PASCAL\ compilers do not implement non-local |goto| statements. 685@^system dependencies@> 686In such cases the code that appears at label |end_of_TANGLE| should be 687copied into the |jump_out| procedure, followed by a call to a system procedure 688that terminates the program. 689 690@d fatal_error(#)==begin new_line; print(#); error; mark_fatal; jump_out; 691 end 692 693@<Error handling...@>= 694procedure jump_out; 695begin goto end_of_TANGLE; 696end; 697 698@ Sometimes the program's behavior is far different from what it should be, 699and \.{TANGLE} prints an error message that is really for the \.{TANGLE} 700maintenance person, not the user. In such cases the program says 701|confusion('indication of where we are')|. 702 703@d confusion(#)==fatal_error('! This can''t happen (',#,')') 704@.This can't happen@> 705 706@ An overflow stop occurs if \.{TANGLE}'s tables aren't large enough. 707 708@d overflow(#)==fatal_error('! Sorry, ',#,' capacity exceeded') 709@.Sorry, x capacity exceeded@> 710 711 712@* Data structures. 713Most of the user's \PASCAL\ code is packed into eight-bit integers 714in two large arrays called |byte_mem| and |tok_mem|. 715The |byte_mem| array holds the names of identifiers, strings, and modules; 716the |tok_mem| array holds the replacement texts 717for macros and modules. Allocation is sequential, since things are deleted only 718during Phase II, and only in a last-in-first-out manner. 719 720Auxiliary arrays |byte_start| and |tok_start| are used as directories to 721|byte_mem| and |tok_mem|, and the |link|, |ilk|, |equiv|, and |text_link| 722arrays give further information about names. These auxiliary arrays 723consist of sixteen-bit items. 724 725@<Types...@>= 726@!eight_bits=0..255; {unsigned one-byte quantity} 727@!sixteen_bits=0..65535; {unsigned two-byte quantity} 728 729@ \.{TANGLE} has been designed to avoid the need for indices that are more 730than sixteen bits wide, so that it can be used on most computers. But 731there are programs that need more than 65536 tokens, and some programs 732even need more than 65536 bytes; \TeX\ is one of these. To get around 733this problem, a slight complication has been added to the data structures: 734|byte_mem| and |tok_mem| are two-dimensional arrays, whose first index is 735either 0 or 1. (For generality, the first index is actually allowed to run 736between 0 and |ww-1| in |byte_mem|, or between 0 and |zz-1| in |tok_mem|, 737where |ww| and |zz| are set to 2 and~3; the program will work for any 738positive values of |ww| and |zz|, and it can be simplified in obvious ways 739if |ww=1| or |zz=1|.) 740 741@d ww=2 {we multiply the byte capacity by approximately this amount} 742@d zz=3 {we multiply the token capacity by approximately this amount} 743 744@<Globals...@>= 745@!byte_mem: packed array [0..ww-1,0..max_bytes] of ASCII_code; 746 {characters of names} 747@!tok_mem: packed array [0..zz-1,0..max_toks] of eight_bits; {tokens} 748@!byte_start: array [0..max_names] of sixteen_bits; {directory into |byte_mem|} 749@!tok_start: array [0..max_texts] of sixteen_bits; {directory into |tok_mem|} 750@!link: array [0..max_names] of sixteen_bits; {hash table or tree links} 751@!ilk: array [0..max_names] of sixteen_bits; {type codes or tree links} 752@!equiv: array [0..max_names] of sixteen_bits; {info corresponding to names} 753@!text_link: array [0..max_texts] of sixteen_bits; {relates replacement texts} 754 755@ The names of identifiers are found by computing a hash address |h| and 756then looking at strings of bytes signified by |hash[h]|, |link[hash[h]]|, 757|link[link[hash[h]]]|, \dots, until either finding the desired name 758or encountering a zero. 759 760A `|name_pointer|' variable, which signifies a name, is an index into 761|byte_start|. The actual sequence of characters in the name pointed to by 762|p| appears in positions |byte_start[p]| to |byte_start[p+ww]-1|, inclusive, 763in the segment of |byte_mem| whose first index is |p mod ww|. Thus, when 764|ww=2| the even-numbered name bytes appear in |byte_mem[0,@t$*$@>]| 765and the odd-numbered ones appear in |byte_mem[1,@t$*$@>]|. 766The pointer 0 is used for undefined module names; we don't 767want to use it for the names of identifiers, since 0 stands for a null 768pointer in a linked list. 769 770Strings are treated like identifiers; the first character (a double-quote) 771distinguishes a string from an alphabetic name, but for \.{TANGLE}'s purposes 772strings behave like numeric macros. (A `string' here refers to the 773strings delimited by double-quotes that \.{TANGLE} processes. \PASCAL\ 774string constants delimited by single-quote marks are not given such special 775treatment; they simply appear as sequences of characters in the \PASCAL\ 776texts.) The total number of strings in the string 777pool is called |string_ptr|, and the total number of names in |byte_mem| 778is called |name_ptr|. The total number of bytes occupied in 779|byte_mem[w,@t$*$@>]| is called |byte_ptr[w]|. 780 781We usually have |byte_start[name_ptr+w]=byte_ptr[(name_ptr+w) mod ww]| 782for |0<=w<ww|, since these are the starting positions for the next |ww| 783names to be stored in |byte_mem|. 784 785@d length(#)==byte_start[#+ww]-byte_start[#] {the length of a name} 786 787@<Types...@>= 788@!name_pointer=0..max_names; {identifies a name} 789 790@ @<Global...@>= 791@!name_ptr:name_pointer; {first unused position in |byte_start|} 792@!string_ptr:name_pointer; {next number to be given to a string of length |<>1|} 793@!byte_ptr:array [0..ww-1] of 0..max_bytes; 794 {first unused position in |byte_mem|} 795@!pool_check_sum:integer; {sort of a hash for the whole string pool} 796 797@ @<Local variables for init...@>= 798@!wi: 0..ww-1; {to initialize the |byte_mem| indices} 799 800@ @<Set init...@>= 801for wi:=0 to ww-1 do 802 begin byte_start[wi]:=0; byte_ptr[wi]:=0; 803 end; 804byte_start[ww]:=0; {this makes name 0 of length zero} 805name_ptr:=1; string_ptr:=256; pool_check_sum:=271828; 806 807@ Replacement texts are stored in |tok_mem|, using similar conventions. 808A `|text_pointer|' variable is an index into |tok_start|, and the 809replacement text that corresponds to |p| runs from positions 810|tok_start[p]| to |tok_start[p+zz]-1|, inclusive, in the segment of 811|tok_mem| whose first index is |p mod zz|. Thus, when |zz=2| the 812even-numbered replacement texts appear in |tok_mem[0,@t$*$@>]| and the 813odd-numbered ones appear in |tok_mem[1,@t$*$@>]|. Furthermore, 814|text_link[p]| is used to connect pieces of text that have the same name, 815as we shall see later. The pointer 0 is used for undefined replacement 816texts. 817 818The first position of |tok_mem[z,@t$*$@>]| that is unoccupied by 819replacement text is called |tok_ptr[z]|, and the first unused location of 820|tok_start| is called |text_ptr|. We usually have the identity 821|tok_start[text_ptr+z]=tok_ptr[(text_ptr+z) mod zz]|, for |0<=z<zz|, since 822these are the starting positions for the next |zz| replacement texts to 823be stored in |tok_mem|. 824 825@<Types...@>= 826@!text_pointer=0..max_texts; {identifies a replacement text} 827 828@ It is convenient to maintain a variable |z| that is equal to |text_ptr 829mod zz|, so that we always insert tokens into segment |z| of |tok_mem|. 830 831@<Glob...@>= 832@t\hskip1em@>@!text_ptr:text_pointer; {first unused position in |tok_start|} 833@t\hskip1em@>@!tok_ptr:array[0..zz-1] of 0..max_toks; 834 {first unused position in a given segment of |tok_mem|} 835@t\hskip1em@>@!z:0..zz-1; {current segment of |tok_mem|} 836stat @!max_tok_ptr:array[0..zz-1] of 0..max_toks; 837 {largest values assumed by |tok_ptr|} 838tats 839 840@ @<Local variables for init...@>= 841@!zi:0..zz-1; {to initialize the |tok_mem| indices} 842 843@ @<Set init...@>= 844for zi:=0 to zz-1 do 845 begin tok_start[zi]:=0; tok_ptr[zi]:=0; 846 end; 847tok_start[zz]:=0; {this makes replacement text 0 of length zero} 848text_ptr:=1; z:=1 mod zz; 849 850@ Four types of identifiers are distinguished by their |ilk|: 851 852\yskip\hang |normal| identifiers will appear in the \PASCAL\ program as 853ordinary identifiers since they have not been defined to be macros; the 854corresponding value in the |equiv| array 855for such identifiers is a link in a secondary hash table that 856is used to check whether any two of them agree in their first |unambig_length| 857characters after underline symbols are removed and lowercase letters are 858changed to uppercase. 859 860\yskip\hang |numeric| identifiers have been defined to be numeric macros; 861their |equiv| value contains the corresponding numeric value plus $2^{15}$. 862Strings are treated as numeric macros. 863 864\yskip\hang |simple| identifiers have been defined to be simple macros; 865their |equiv| value points to the corresponding replacement text. 866 867\yskip\hang |parametric| identifiers have been defined to be parametric macros; 868like simple identifiers, their |equiv| value points to the replacement text. 869 870@d normal=0 {ordinary identifiers have |normal| ilk} 871@d numeric=1 {numeric macros and strings have |numeric| ilk} 872@d simple=2 {simple macros have |simple| ilk} 873@d parametric=3 {parametric macros have |parametric| ilk} 874 875@ The names of modules are stored in |byte_mem| together 876with the identifier names, but a hash table is not used for them because 877\.{TANGLE} needs to be able to recognize a module name when given a prefix of 878that name. A conventional binary seach tree is used to retrieve module names, 879with fields called |llink| and |rlink| in place of |link| and |ilk|. The 880root of this tree is |rlink[0]|. If |p| is a pointer to a module name, 881|equiv[p]| points to its replacement text, just as in simple and parametric 882macros, unless this replacement text has not yet been defined (in which case 883|equiv[p]=0|). 884 885@d llink==link {left link in binary search tree for module names} 886@d rlink==ilk {right link in binary search tree for module names} 887 888@<Set init...@>= 889rlink[0]:=0; {the binary search tree starts out with nothing in it} 890equiv[0]:=0; {the undefined module has no replacement text} 891 892@ Here is a little procedure that prints the text of a given name. 893 894@p procedure print_id(@!p:name_pointer); {print identifier or module name} 895var k:0..max_bytes; {index into |byte_mem|} 896@!w:0..ww-1; {segment of |byte_mem|} 897begin if p>=name_ptr then print('IMPOSSIBLE') 898else begin w:=p mod ww; 899 for k:=byte_start[p] to byte_start[p+ww]-1 do print(xchr[byte_mem[w,k]]); 900 end; 901end; 902 903@* Searching for identifiers. 904The hash table described above is updated by the |id_lookup| procedure, 905which finds a given identifier and returns a pointer to its index in 906|byte_start|. If the identifier was not already present, it is inserted with 907a given |ilk| code; and an error message is printed if the identifier is being 908doubly defined. 909 910Because of the way \.{TANGLE}'s scanning mechanism works, it is most convenient 911to let |id_lookup| search for an identifier that is present in the |buffer| 912array. Two other global variables specify its position in the buffer: the 913first character is |buffer[id_first]|, and the last is |buffer[id_loc-1]|. 914Furthermore, if the identifier is really a string, the global variable 915|double_chars| tells how many of the characters in the buffer appear 916twice (namely \.{@@@@} and \.{""}), since this additional information makes 917it easy to calculate the true length of the string. The final double-quote 918of the string is not included in its ``identifier,'' but the first one is, 919so the string length is |id_loc-id_first-double_chars-1|. 920 921We have mentioned that |normal| identifiers belong to two hash tables, 922one for their true names as they appear in the \.{WEB} file and the other 923when they have been reduced to their first |unambig_length| characters. 924The hash tables are kept by the method of simple chaining, where the 925heads of the individual lists appear in the |hash| and |chop_hash| arrays. 926If |h| is a hash code, the primary hash table list starts at |hash[h]| and 927proceeds through |link| pointers; the secondary hash table list starts at 928|chop_hash[h]| and proceeds through |equiv| pointers. Of course, the same 929identifier will probably have two different values of |h|. 930 931The |id_lookup| procedure uses an auxiliary array called |chopped_id| to 932contain up to |unambig_length| characters of the current identifier, if 933it is necessary to compute the secondary hash code. (This array could be 934declared local to |id_lookup|, but in general we are making all array 935declarations global in this program, because some compilers and some machine 936architectures make dynamic array allocation inefficient.) 937 938@<Glob...@>= 939@!id_first:0..buf_size; {where the current identifier begins in the buffer} 940@!id_loc:0..buf_size; {just after the current identifier in the buffer} 941@!double_chars:0..buf_size; {correction to length in case of strings} 942@# 943@!hash,@!chop_hash:array [0..hash_size] of sixteen_bits; {heads of hash lists} 944@!chopped_id:array [0..unambig_length] of ASCII_code; {chopped identifier} 945 946@ Initially all the hash lists are empty. 947 948@<Local variables for init...@>= 949@!h:0..hash_size; {index into hash-head arrays} 950 951@ @<Set init...@>= 952for h:=0 to hash_size-1 do 953 begin hash[h]:=0; chop_hash[h]:=0; 954 end; 955 956@ Here now is the main procedure for finding identifiers (and strings). 957The parameter |t| is set to |normal| except when the identifier is 958a macro name that is just being defined; in the latter case, |t| will be 959|numeric|, |simple|, or |parametric|. 960 961@p function id_lookup(@!t:eight_bits):name_pointer; {finds current identifier} 962label found, not_found; 963var c:eight_bits; {byte being chopped} 964@!i:0..buf_size; {index into |buffer|} 965@!h:0..hash_size; {hash code} 966@!k:0..max_bytes; {index into |byte_mem|} 967@!w:0..ww-1; {segment of |byte_mem|} 968@!l:0..buf_size; {length of the given identifier} 969@!p,@!q:name_pointer; {where the identifier is being sought} 970@!s:0..unambig_length; {index into |chopped_id|} 971begin l:=id_loc-id_first; {compute the length} 972@<Compute the hash code |h|@>; 973@<Compute the name location |p|@>; 974if (p=name_ptr)or(t<>normal) then 975 @<Update the tables and check for possible errors@>; 976id_lookup:=p; 977end; 978 979@ A simple hash code is used: If the sequence of 980ASCII codes is $c_1c_2\ldots c_m$, its hash value will be 981$$(2^{n-1}c_1+2^{n-2}c_2+\cdots+c_n)\,\bmod\,|hash_size|.$$ 982 983@<Compute the hash...@>= 984h:=buffer[id_first]; i:=id_first+1; 985while i<id_loc do 986 begin h:=(h+h+buffer[i]) mod hash_size; incr(i); 987 end 988 989@ If the identifier is new, it will be placed in position |p=name_ptr|, 990otherwise |p| will point to its existing location. 991 992@<Compute the name location...@>= 993p:=hash[h]; 994while p<>0 do 995 begin if length(p)=l then 996 @<Compare name |p| with current identifier, |goto found| if equal@>; 997 p:=link[p]; 998 end; 999p:=name_ptr; {the current identifier is new} 1000link[p]:=hash[h]; hash[h]:=p; {insert |p| at beginning of hash list} 1001found: 1002 1003@ @<Compare name |p|...@>= 1004begin i:=id_first; k:=byte_start[p]; w:=p mod ww; 1005while (i<id_loc)and(buffer[i]=byte_mem[w,k]) do 1006 begin incr(i); incr(k); 1007 end; 1008if i=id_loc then goto found; {all characters agree} 1009end 1010 1011@ @<Update the tables...@>= 1012begin if ((p<>name_ptr)and(t<>normal)and(ilk[p]=normal)) or 1013 ((p=name_ptr)and(t=normal)and(buffer[id_first]<>"""")) then 1014 @<Compute the secondary hash code |h| and put the first characters 1015 into the auxiliary array |chopped_id|@>; 1016if p<>name_ptr then 1017 @<Give double-definition error, if necessary, and change |p| to type |t|@> 1018else @<Enter a new identifier into the table at position |p|@>; 1019end 1020 1021@ The following routine, which is called into play when it is necessary to 1022look at the secondary hash table, computes the same hash function as before 1023(but on the chopped data), and places a zero after the chopped identifier 1024in |chopped_id| to serve as a convenient sentinel. 1025 1026@<Compute the secondary...@>= 1027begin i:=id_first; s:=0; h:=0; 1028while (i<id_loc)and(s<unambig_length) do 1029 begin if buffer[i]<>"_" then 1030 begin if buffer[i]>="a" then chopped_id[s]:=buffer[i]-@'40 1031 else chopped_id[s]:=buffer[i]; 1032 h:=(h+h+chopped_id[s]) mod hash_size; incr(s); 1033 end; 1034 incr(i); 1035 end; 1036chopped_id[s]:=0; 1037end 1038 1039@ If a nonnumeric macro has appeared before it was defined, \.{TANGLE} 1040will still work all right; after all, such behavior is typical of the 1041replacement texts for modules, which act very much like macros. 1042However, an undefined numeric macro may not be used on the right-hand 1043side of another numeric macro definition, so \.{TANGLE} finds it 1044simplest to make a blanket rule that numeric macros should be defined 1045before they are used. The following routine gives an error message and 1046also fixes up any damage that may have been caused. 1047 1048@<Give double...@>= {now |p<>name_ptr| and |t<>normal|} 1049begin if ilk[p]=normal then 1050 begin if t=numeric then err_print('! This identifier has already appeared'); 1051@.This identifier has already...@> 1052 @<Remove |p| from secondary hash table@>; 1053 end 1054else err_print('! This identifier was defined before'); 1055@.This identifier was defined...@> 1056ilk[p]:=t; 1057end 1058 1059@ When we have to remove a secondary hash entry, because a |normal| identifier 1060is changing to another |ilk|, the hash code |h| and chopped identifier have 1061already been computed. 1062 1063@<Remove |p| from secondary...@>= 1064q:=chop_hash[h]; 1065if q=p then chop_hash[h]:=equiv[p] 1066else begin while equiv[q]<>p do q:=equiv[q]; 1067 equiv[q]:=equiv[p]; 1068 end 1069 1070@ The following routine could make good use of a generalized |pack| procedure 1071that puts items into just part of a packed array instead of the whole thing. 1072 1073@<Enter a new identifier...@>= 1074begin if (t=normal)and(buffer[id_first]<>"""") then 1075 @<Check for ambiguity and update secondary hash@>; 1076w:=name_ptr mod ww; k:=byte_ptr[w]; 1077if k+l>max_bytes then overflow('byte memory'); 1078if name_ptr>max_names-ww then overflow('name'); 1079i:=id_first; {get ready to move the identifier into |byte_mem|} 1080while i<id_loc do 1081 begin byte_mem[w,k]:=buffer[i]; incr(k); incr(i); 1082 end; 1083byte_ptr[w]:=k; byte_start[name_ptr+ww]:=k; incr(name_ptr); 1084if buffer[id_first]<>"""" then ilk[p]:=t 1085else @<Define and output a new string of the pool@>; 1086end 1087 1088@ @<Check for ambig...@>= 1089begin q:=chop_hash[h]; 1090while q<>0 do 1091 begin @<Check if |q| conflicts with |p|@>; 1092 q:=equiv[q]; 1093 end; 1094equiv[p]:=chop_hash[h]; chop_hash[h]:=p; {put |p| at front of secondary list} 1095end 1096 1097@ @<Check if |q| conflicts...@>= 1098begin k:=byte_start[q]; s:=0; w:=q mod ww; 1099while (k<byte_start[q+ww]) and (s<unambig_length) do 1100 begin c:=byte_mem[w,k]; 1101 if c<>"_" then 1102 begin if c>="a" then c:=c-@'40; {merge lowercase with uppercase} 1103 if chopped_id[s]<>c then goto not_found; 1104 incr(s); 1105 end; 1106 incr(k); 1107 end; 1108if (k=byte_start[q+ww])and(chopped_id[s]<>0) then goto not_found; 1109print_nl('! Identifier conflict with '); 1110@.Identifier conflict...@> 1111for k:=byte_start[q] to byte_start[q+ww]-1 do print(xchr[byte_mem[w,k]]); 1112error; q:=0; {only one conflict will be printed, since |equiv[0]=0|} 1113not_found: 1114end 1115 1116@ We compute the string pool check sum by working modulo a prime number 1117that is large but not so large that overflow might occur. 1118 1119@d check_sum_prime==@'3777777667 {$2^{29}-73$} 1120@^preprocessed strings@> 1121 1122@<Define and output a new string...@>= 1123begin ilk[p]:=numeric; {strings are like numeric macros} 1124if l-double_chars=2 then {this string is for a single character} 1125 equiv[p]:=buffer[id_first+1]+@'100000 1126else begin equiv[p]:=string_ptr+@'100000; 1127 l:=l-double_chars-1; 1128 if l>99 then err_print('! Preprocessed string is too long'); 1129@.Preprocessed string is too long@> 1130 incr(string_ptr); 1131 write(pool,xchr["0"+l div 10],xchr["0"+l mod 10]); {output the length} 1132 pool_check_sum:=pool_check_sum+pool_check_sum+l; 1133 while pool_check_sum>check_sum_prime do 1134 pool_check_sum:=pool_check_sum-check_sum_prime; 1135 i:=id_first+1; 1136 while i<id_loc do 1137 begin write(pool,xchr[buffer[i]]); {output characters of string} 1138 pool_check_sum:=pool_check_sum+pool_check_sum+buffer[i]; 1139 while pool_check_sum>check_sum_prime do 1140 pool_check_sum:=pool_check_sum-check_sum_prime; 1141 if (buffer[i]="""") or (buffer[i]="@@") then 1142 i:=i+2 {omit second appearance of doubled character} 1143 else incr(i); 1144 end; 1145 write_ln(pool); 1146 end; 1147end 1148 1149@* Searching for module names. 1150The |mod_lookup| procedure finds the module name |mod_text[1..l]| in the 1151search tree, after inserting it if necessary, and returns a pointer to 1152where it was found. 1153 1154@<Glob...@>= 1155@!mod_text:array [0..longest_name] of ASCII_code; {name being sought for} 1156 1157@ According to the rules of \.{WEB}, no module name 1158should be a proper prefix of another, so a ``clean'' comparison should 1159occur between any two names. The result of |mod_lookup| is 0 if this 1160prefix condition is violated. An error message is printed when such violations 1161are detected during phase two of \.{WEAVE}. 1162 1163@d less=0 {the first name is lexicographically less than the second} 1164@d equal=1 {the first name is equal to the second} 1165@d greater=2 {the first name is lexicographically greater than the second} 1166@d prefix=3 {the first name is a proper prefix of the second} 1167@d extension=4 {the first name is a proper extension of the second} 1168 1169@p function mod_lookup(@!l:sixteen_bits):name_pointer; {finds module name} 1170label found; 1171var c:less..extension; {comparison between two names} 1172@!j:0..longest_name; {index into |mod_text|} 1173@!k:0..max_bytes; {index into |byte_mem|} 1174@!w:0..ww-1; {segment of |byte_mem|} 1175@!p:name_pointer; {current node of the search tree} 1176@!q:name_pointer; {father of node |p|} 1177begin c:=greater; q:=0; p:=rlink[0]; {|rlink[0]| is the root of the tree} 1178while p<>0 do 1179 begin @<Set \(|c| to the result of comparing the given name to 1180 name |p|@>; 1181 q:=p; 1182 if c=less then p:=llink[q] 1183 else if c=greater then p:=rlink[q] 1184 else goto found; 1185 end; 1186@<Enter a new module name into the tree@>; 1187found: if c<>equal then 1188 begin err_print('! Incompatible section names'); p:=0; 1189@.Incompatible module names@> 1190 end; 1191mod_lookup:=p; 1192end; 1193 1194@ @<Enter a new module name...@>= 1195w:=name_ptr mod ww; k:=byte_ptr[w]; 1196if k+l>max_bytes then overflow('byte memory'); 1197if name_ptr>max_names-ww then overflow('name'); 1198p:=name_ptr; 1199if c=less then llink[q]:=p else rlink[q]:=p; 1200llink[p]:=0; rlink[p]:=0; c:=equal; equiv[p]:=0; 1201for j:=1 to l do byte_mem[w,k+j-1]:=mod_text[j]; 1202byte_ptr[w]:=k+l; byte_start[name_ptr+ww]:=k+l; incr(name_ptr); 1203 1204@ @<Set \(|c|...@>= 1205begin k:=byte_start[p]; w:=p mod ww; c:=equal; j:=1; 1206while (k<byte_start[p+ww]) and (j<=l) and (mod_text[j]=byte_mem[w,k]) do 1207 begin incr(k); incr(j); 1208 end; 1209if k=byte_start[p+ww] then 1210 if j>l then c:=equal 1211 else c:=extension 1212else if j>l then c:=prefix 1213else if mod_text[j]<byte_mem[w,k] then c:=less 1214else c:=greater; 1215end 1216 1217@ The |prefix_lookup| procedure is supposed to find exactly one module 1218name that has |mod_text[1..l]| as a prefix. Actually the algorithm silently 1219accepts also the situation that some module name is a prefix of 1220|mod_text[1..l]|, because the user who painstakingly typed in more than 1221necessary probably doesn't want to be told about the wasted effort. 1222 1223@p function prefix_lookup(@!l:sixteen_bits):name_pointer; {finds name extension} 1224var c:less..extension; {comparison between two names} 1225@!count:0..max_names; {the number of hits} 1226@!j:0..longest_name; {index into |mod_text|} 1227@!k:0..max_bytes; {index into |byte_mem|} 1228@!w:0..ww-1; {segment of |byte_mem|} 1229@!p:name_pointer; {current node of the search tree} 1230@!q:name_pointer; {another place to resume the search after one branch is done} 1231@!r:name_pointer; {extension found} 1232begin q:=0; p:=rlink[0]; count:=0; r:=0; {begin search at root of tree} 1233while p<>0 do 1234 begin @<Set \(|c|...@>; 1235 if c=less then p:=llink[p] 1236 else if c=greater then p:=rlink[p] 1237 else begin r:=p; incr(count); q:=rlink[p]; p:=llink[p]; 1238 end; 1239 if p=0 then 1240 begin p:=q; q:=0; 1241 end; 1242 end; 1243if count<>1 then 1244 if count=0 then err_print('! Name does not match') 1245@.Name does not match@> 1246 else err_print('! Ambiguous prefix'); 1247@.Ambiguous prefix@> 1248prefix_lookup:=r; {the result will be 0 if there was no match} 1249end; 1250 1251@* Tokens. 1252Replacement texts, which represent \PASCAL\ code in a compressed format, 1253appear in |tok_mem| as mentioned above. The codes in 1254these texts are called `tokens'; some tokens occupy two consecutive 1255eight-bit byte positions, and the others take just one byte. 1256 1257If $p>0$ points to a replacement text, |tok_start[p]| is the |tok_mem| position 1258of the first eight-bit code of that text. If |text_link[p]=0|, 1259this is the replacement text for a macro, otherwise it is the replacement 1260text for a module. In the latter case |text_link[p]| is either equal to 1261|module_flag|, which means that there is no further text for this module, or 1262|text_link[p]| points to a 1263continuation of this replacement text; such links are created when 1264several modules have \PASCAL\ texts with the same name, and they also 1265tie together all the \PASCAL\ texts of unnamed modules. 1266The replacement text pointer for the first unnamed module 1267appears in |text_link[0]|, and the most recent such pointer is |last_unnamed|. 1268 1269@d module_flag==max_texts {final |text_link| in module replacement texts} 1270 1271@<Glob...@>= 1272@!last_unnamed:text_pointer; {most recent replacement text of unnamed module} 1273 1274@ @<Set init...@>= last_unnamed:=0; text_link[0]:=0; 1275 1276@ If the first byte of a token is less than @'200, the token occupies a 1277single byte. Otherwise we make a sixteen-bit token by combining two consecutive 1278bytes |a| and |b|. If |@'200<=a<@'250|, then $(a-@'200)\times2^8+b$ points 1279to an identifier; if |@'250<=a<@'320|, then 1280$(a-@'250)\times2^8+b$ points to a module name; otherwise, i.e., if 1281|@'320<=a<@'400|, then $(a-@'320)\times2^8+b$ is the number of the module 1282in which the current replacement text appears. 1283 1284Codes less than @'200 are 7-bit ASCII codes that represent themselves. 1285In particular, a single-character identifier like `|x|' will be a one-byte 1286token, while all longer identifiers will occupy two bytes. 1287 1288Some of the 7-bit ASCII codes will not be present, however, so we can 1289use them for special purposes. The following symbolic names are used: 1290 1291\yskip\hang |param| denotes insertion of a parameter. This occurs only in 1292the replacement texts of parametric macros, outside of single-quoted strings 1293in those texts. 1294 1295\hang |begin_comment| denotes \.{@@\{}, which will become either 1296\.{\{} or \.{[}. 1297 1298\hang |end_comment| denotes \.{@@\}}, which will become either 1299\.{\}} or \.{]}. 1300 1301\hang |octal| denotes the \.{@@\'} that precedes an octal constant. 1302 1303\hang |hex| denotes the \.{@@"} that precedes a hexadecimal constant. 1304 1305\hang |check_sum| denotes the \.{@@\char'44} that denotes the string pool 1306check sum. 1307 1308\hang |join| denotes the concatenation of adjacent items with no 1309space or line breaks allowed between them (the \.{@@\&} operation of \.{WEB}). 1310 1311\hang |double_dot| denotes `\.{..}' in \PASCAL. 1312 1313\hang |verbatim| denotes the \.{@@=} that begins a verbatim \PASCAL\ string. 1314It is also used for the end of the string. 1315 1316\hang |force_line| denotes the \.{@@\\} that forces a new line in the 1317\PASCAL\ output. 1318@^ASCII code@> 1319 1320@d param=0 {ASCII null code will not appear} 1321@d verbatim=@'2 {extended ASCII alpha should not appear} 1322@d force_line=@'3 {extended ASCII beta should not appear} 1323@d begin_comment=@'11 {ASCII tab mark will not appear} 1324@d end_comment=@'12 {ASCII line feed will not appear} 1325@d octal=@'14 {ASCII form feed will not appear} 1326@d hex=@'15 {ASCII carriage return will not appear} 1327@d double_dot=@'40 {ASCII space will not appear except in strings} 1328@d check_sum=@'175 {will not be confused with right brace} 1329@d join=@'177 {ASCII delete will not appear} 1330 1331@ The following procedure is used to enter a two-byte value into 1332|tok_mem| when a replacement text is being generated. 1333 1334@p procedure store_two_bytes(@!x:sixteen_bits); 1335 {stores high byte, then low byte} 1336begin if tok_ptr[z]+2>max_toks then overflow('token'); 1337tok_mem[z,tok_ptr[z]]:=x div@'400; {this could be done by a shift command} 1338tok_mem[z,tok_ptr[z]+1]:=x mod@'400; {this could be done by a logical and} 1339tok_ptr[z]:=tok_ptr[z]+2; 1340end; 1341 1342@ When \.{TANGLE} is being operated in debug mode, it has a procedure to display 1343a replacement text in symbolic form. This procedure has not been spruced up to 1344generate a real great format, but at least the results are not as bad as 1345a memory dump. 1346 1347@p @!debug procedure print_repl(@!p:text_pointer); 1348var k:0..max_toks; {index into |tok_mem|} 1349@!a: sixteen_bits; {current byte(s)} 1350@!zp: 0..zz-1; {segment of |tok_mem| being accessed} 1351begin if p>=text_ptr then print('BAD') 1352else begin k:=tok_start[p]; zp:=p mod zz; 1353 while k<tok_start[p+zz] do 1354 begin a:=tok_mem[zp,k]; 1355 if a>=@'200 then @<Display two-byte token starting with |a|@> 1356 else @<Display one-byte token |a|@>; 1357 incr(k); 1358 end; 1359 end; 1360end; 1361gubed 1362 1363@ @<Display two-byte...@>= 1364begin incr(k); 1365if a<@'250 then {identifier or string} 1366 begin a:=(a-@'200)*@'400+tok_mem[zp,k]; print_id(a); 1367 if byte_mem[a mod ww,byte_start[a]]="""" then print('"') 1368 else print(' '); 1369 end 1370else if a<@'320 then {module name} 1371 begin print('@@<'); print_id((a-@'250)*@'400+tok_mem[zp,k]); 1372 print('@@>'); 1373 end 1374else begin a:=(a-@'320)*@'400+tok_mem[zp,k]; {module number} 1375 print('@@',xchr["{"],a:1,'@@',xchr["}"]); {can't use right brace 1376 between \&{debug} and \&{gubed}} 1377 end; 1378end 1379 1380@ @<Display one-byte...@>= 1381case a of 1382begin_comment: print('@@',xchr["{"]); 1383end_comment: print('@@',xchr["}"]); {can't use right brace 1384 between \&{debug} and \&{gubed}} 1385octal: print('@@'''); 1386hex: print('@@"'); 1387check_sum: print('@@$'); 1388param: print('#'); 1389"@@": print('@@@@'); 1390verbatim: print('@@='); 1391force_line: print('@@\'); 1392othercases print(xchr[a]) 1393endcases 1394 1395@* Stacks for output. 1396Let's make sure that our data structures contain enough information to 1397produce the entire \PASCAL\ program as desired, by working next on the 1398algorithms that actually do produce that program. 1399 1400@ The output process uses a stack to keep track of what is going on at 1401different ``levels'' as the macros are being expanded. 1402Entries on this stack have five parts: 1403 1404\yskip\hang |end_field| is the |tok_mem| location where the replacement 1405text of a particular level will end; 1406 1407\hang |byte_field| is the |tok_mem| location from which the next token 1408on a particular level will be read; 1409 1410\hang |name_field| points to the name corresponding to a particular level; 1411 1412\hang |repl_field| points to the replacement text currently being read 1413at a particular level; 1414 1415\hang |mod_field| is the module number, or zero if this is a macro. 1416 1417\yskip\noindent The current values of these five quantities are referred to 1418quite frequently, so they are stored in a separate place instead of in 1419the |stack| array. We call the current values |cur_end|, |cur_byte|, 1420|cur_name|, |cur_repl|, and |cur_mod|. 1421 1422The global variable |stack_ptr| tells how many levels of output are 1423currently in progress. The end of all output occurs when the stack is 1424empty, i.e., when |stack_ptr=0|. 1425 1426@<Types...@>= 1427@t\4@>@!output_state=record 1428 @!end_field: sixteen_bits; {ending location of replacement text} 1429 @!byte_field: sixteen_bits; {present location within replacement text} 1430 @!name_field: name_pointer; {|byte_start| index for text being output} 1431 @!repl_field: text_pointer; {|tok_start| index for text being output} 1432 @!mod_field: 0..@'27777; {module number or zero if not a module} 1433 end; 1434 1435@ @d cur_end==cur_state.end_field {current ending location in |tok_mem|} 1436@d cur_byte==cur_state.byte_field {location of next output byte in |tok_mem|} 1437@d cur_name==cur_state.name_field {pointer to current name being expanded} 1438@d cur_repl==cur_state.repl_field {pointer to current replacement text} 1439@d cur_mod==cur_state.mod_field {current module number being expanded} 1440 1441@<Globals...@>= 1442@!cur_state : output_state; {|cur_end|, |cur_byte|, |cur_name|, 1443 |cur_repl|, |cur_mod|} 1444@!stack : array [1..stack_size] of output_state; {info for non-current levels} 1445@!stack_ptr: 0..stack_size; {first unused location in the output state stack} 1446 1447@ It is convenient to keep a global variable |zo| equal to |cur_repl mod zz|. 1448 1449@<Glob...@>= 1450@!zo:0..zz-1; {the segment of |tok_mem| from which output is coming} 1451 1452@ Parameters must also be stacked. They are placed in 1453|tok_mem| just above the other replacement texts, and dummy parameter 1454`names' are placed in |byte_start| just after the other names. 1455The variables |text_ptr| and |tok_ptr[z]| essentially serve as parameter 1456stack pointers during the output phase, so there is no need for a separate 1457data structure to handle this problem. 1458 1459@ There is an implicit stack corresponding to meta-comments that are output 1460via \.{@@\{} and \.{@@\}}. But this stack need not be represented in detail, 1461because we only need to know whether it is empty or not. A global variable 1462|brace_level| tells how many items would be on this stack if it were present. 1463 1464@<Globals...@>= 1465@!brace_level: eight_bits; {current depth of $\.{@@\{}\ldots\.{@@\}}$ nesting} 1466 1467@ To get the output process started, we will perform the following 1468initialization steps. We may assume that |text_link[0]| is nonzero, since it 1469points to the \PASCAL\ text in the first unnamed module that generates 1470code; if there are no such modules, there is nothing to output, and an 1471error message will have been generated before we do any of the initialization. 1472 1473@<Initialize the output stacks@>= 1474stack_ptr:=1; brace_level:=0; cur_name:=0; cur_repl:=text_link[0]; 1475zo:=cur_repl mod zz; cur_byte:=tok_start[cur_repl]; 1476cur_end:=tok_start[cur_repl+zz]; cur_mod:=0; 1477 1478@ When the replacement text for name |p| is to be inserted into the output, 1479the following subroutine is called to save the old level of output and get 1480the new one going. 1481 1482@p procedure push_level(@!p:name_pointer); {suspends the current level} 1483begin if stack_ptr=stack_size then overflow('stack') 1484else begin stack[stack_ptr]:=cur_state; {save |cur_end|, |cur_byte|, etc.} 1485 incr(stack_ptr); 1486 cur_name:=p; cur_repl:=equiv[p]; zo:=cur_repl mod zz; 1487 cur_byte:=tok_start[cur_repl]; cur_end:=tok_start[cur_repl+zz]; 1488 cur_mod:=0; 1489 end; 1490end; 1491 1492@ When we come to the end of a replacement text, the |pop_level| subroutine 1493does the right thing: It either moves to the continuation of this replacement 1494text or returns the state to the most recently stacked level. Part of this 1495subroutine, which updates the parameter stack, will be given later when we 1496study the parameter stack in more detail. 1497 1498@p procedure pop_level; {do this when |cur_byte| reaches |cur_end|} 1499label exit; 1500begin if text_link[cur_repl]=0 then {end of macro expansion} 1501 begin if ilk[cur_name]=parametric then 1502 @<Remove a parameter from the parameter stack@>; 1503 end 1504else if text_link[cur_repl]<module_flag then {link to a continuation} 1505 begin cur_repl:=text_link[cur_repl]; {we will stay on the same level} 1506 zo:=cur_repl mod zz; 1507 cur_byte:=tok_start[cur_repl]; cur_end:=tok_start[cur_repl+zz]; 1508 return; 1509 end; 1510decr(stack_ptr); {we will go down to the previous level} 1511if stack_ptr>0 then 1512 begin cur_state:=stack[stack_ptr]; zo:=cur_repl mod zz; 1513 end; 1514exit: end; 1515 1516@ The heart of the output procedure is the |get_output| routine, which produces 1517the next token of output that is not a reference to a macro. This procedure 1518handles all the stacking and unstacking that is necessary. It returns the 1519value |number| if the next output has a numeric value (the value of a 1520numeric macro or string), in which case |cur_val| has been set to the 1521number in question. The procedure also returns the value |module_number| 1522if the next output begins or ends the replacement text of some module, 1523in which case |cur_val| is that module's number (if beginning) or the 1524negative of that value (if ending). And it returns the value |identifier| 1525if the next output is an identifier of length two or more, in which case 1526|cur_val| points to that identifier name. 1527 1528@d number=@'200 {code returned by |get_output| when next output is numeric} 1529@d module_number=@'201 {code returned by |get_output| for module numbers} 1530@d identifier=@'202 {code returned by |get_output| for identifiers} 1531 1532@<Globals...@>= 1533@!cur_val:integer; {additional information corresponding to output token} 1534 1535@ If |get_output| finds that no more output remains, it returns the value zero. 1536 1537@p function get_output:sixteen_bits; {returns next token after macro expansion} 1538label restart, done, found; 1539var a:sixteen_bits; {value of current byte} 1540@!b:eight_bits; {byte being copied} 1541@!bal:sixteen_bits; {excess of \.( versus \.) while copying a parameter} 1542@!k:0..max_bytes; {index into |byte_mem|} 1543@!w:0..ww-1; {segment of |byte_mem|} 1544begin restart: if stack_ptr=0 then 1545 begin a:=0; goto found; 1546 end; 1547if cur_byte=cur_end then 1548 begin cur_val:=-cur_mod; pop_level; 1549 if cur_val=0 then goto restart; 1550 a:=module_number; goto found; 1551 end; 1552a:=tok_mem[zo,cur_byte]; incr(cur_byte); 1553if a<@'200 then {one-byte token} 1554 if a=param then 1555 @<Start scanning current macro parameter, |goto restart|@> 1556 else goto found; 1557a:=(a-@'200)*@'400+tok_mem[zo,cur_byte]; incr(cur_byte); 1558if a<@'24000 then {|@'24000=(@'250-@'200)*@'400|} 1559 @<Expand macro |a| and |goto found|, or |goto restart| if no output found@>; 1560if a<@'50000 then {|@'50000=(@'320-@'200)*@'400|} 1561 @<Expand module |a-@'24000|, |goto restart|@>; 1562cur_val:=a-@'50000; a:=module_number; cur_mod:=cur_val; 1563found: 1564@!debug if trouble_shooting then debug_help;@;@+gubed@/ 1565get_output:=a; 1566end; 1567 1568@ The user may have forgotten to give any \PASCAL\ text for a module name, 1569or the \PASCAL\ text may have been associated with a different name by mistake. 1570 1571@<Expand module |a-...@>= 1572begin a:=a-@'24000; 1573if equiv[a]<>0 then push_level(a) 1574else if a<>0 then 1575 begin print_nl('! Not present: <'); print_id(a); print('>'); error; 1576@.Not present: <section name>@> 1577 end; 1578goto restart; 1579end 1580 1581@ @<Expand macro ...@>= 1582begin case ilk[a] of 1583normal: begin cur_val:=a; a:=identifier; 1584 end; 1585numeric: begin cur_val:=equiv[a]-@'100000; a:=number; 1586 end; 1587simple: begin push_level(a); goto restart; 1588 end; 1589parametric: begin @<Put a parameter on the parameter stack, 1590 or |goto restart| if error occurs@>; 1591 push_level(a); goto restart; 1592 end; 1593othercases confusion('output') 1594endcases;@/ 1595goto found; 1596end 1597 1598@ We come now to the interesting part, the job of putting a parameter on 1599the parameter stack. First we pop the stack if necessary until getting to 1600a level that hasn't ended. Then the next character must be a `\.('; 1601and since parentheses are balanced on each level, the entire parameter must 1602be present, so we can copy it without difficulty. 1603 1604@<Put a parameter...@>= 1605while (cur_byte=cur_end)and(stack_ptr>0) do pop_level; 1606if (stack_ptr=0)or(tok_mem[zo,cur_byte]<>"(") then 1607 begin print_nl('! No parameter given for '); print_id(a); error; 1608@.No parameter given for macro@> 1609 goto restart; 1610 end; 1611@<Copy the parameter into |tok_mem|@>; 1612equiv[name_ptr]:=text_ptr; ilk[name_ptr]:=simple; w:=name_ptr mod ww; 1613k:=byte_ptr[w]; 1614@!debug if k=max_bytes then overflow('byte memory'); 1615byte_mem[w,k]:="#"; incr(k); byte_ptr[w]:=k; 1616gubed {this code has set the parameter identifier for debugging printouts} 1617if name_ptr>max_names-ww then overflow('name'); 1618byte_start[name_ptr+ww]:=k; incr(name_ptr); 1619if text_ptr>max_texts-zz then overflow('text'); 1620text_link[text_ptr]:=0; tok_start[text_ptr+zz]:=tok_ptr[z]; 1621incr(text_ptr); 1622z:=text_ptr mod zz 1623 1624@ The |pop_level| routine undoes the effect of parameter-pushing when 1625a parameter macro is finished: 1626 1627@<Remove a parameter...@>= 1628begin decr(name_ptr); decr(text_ptr); 1629z:=text_ptr mod zz; 1630stat if tok_ptr[z]>max_tok_ptr[z] then max_tok_ptr[z]:=tok_ptr[z]; 1631tats {the maximum value of |tok_ptr| occurs just before parameter popping} 1632tok_ptr[z]:=tok_start[text_ptr]; 1633@!debug decr(byte_ptr[name_ptr mod ww]);@+gubed 1634end 1635 1636@ When a parameter occurs in a replacement text, we treat it as a simple 1637macro in position (|name_ptr-1|): 1638 1639@<Start scanning...@>= 1640begin push_level(name_ptr-1); goto restart; 1641end 1642 1643@ Similarly, a |param| token encountered as we copy a parameter is converted 1644into a simple macro call for |name_ptr-1|. 1645Some care is needed to handle cases like \\{macro}|(#; print('#)'))|; the 1646\.{\#} token will have been changed to |param| outside of strings, but we 1647still must distinguish `real' parentheses from those in strings. 1648 1649@d app_repl(#)==begin if tok_ptr[z]=max_toks then overflow('token'); 1650 tok_mem[z,tok_ptr[z]]:=#; incr(tok_ptr[z]); end 1651 1652@<Copy the parameter...@>= 1653bal:=1; incr(cur_byte); {skip the opening `\.('} 1654loop@+ begin b:=tok_mem[zo,cur_byte]; incr(cur_byte); 1655 if b=param then store_two_bytes(name_ptr+@'77777) 1656 else begin if b>=@'200 then 1657 begin app_repl(b); 1658 b:=tok_mem[zo,cur_byte]; incr(cur_byte); 1659 end 1660 else case b of 1661 "(": incr(bal); 1662 ")": begin decr(bal); 1663 if bal=0 then goto done; 1664 end; 1665 "'": repeat app_repl(b); 1666 b:=tok_mem[zo,cur_byte]; incr(cur_byte); 1667 until b="'"; {copy string, don't change |bal|} 1668 othercases do_nothing 1669 endcases; 1670 app_repl(b); 1671 end; 1672 end; 1673done: 1674 1675@* Producing the output. 1676The |get_output| routine above handles most of the complexity of output 1677generation, but there are two further considerations that have a nontrivial 1678effect on \.{TANGLE}'s algorithms. 1679 1680First, we want to make sure that the output is broken into lines not 1681exceeding |line_length| characters per line, where these breaks occur at 1682valid places (e.g., not in the middle of a string or a constant or an 1683identifier, not between `\.<' and `\.>', not at a `\.{@@\&}' position 1684where quantities are being joined together). Therefore we assemble the 1685output into a buffer before deciding where the line breaks will appear. 1686However, we make very little attempt to make ``logical'' line breaks that 1687would enhance the readability of the output; people are supposed to read 1688the input of \.{TANGLE} or the \TeX ed output of \.{WEAVE}, but not the 1689tangled-up output. The only concession to readability is that a break after 1690a semicolon will be made if possible, since commonly used ``pretty 1691printing'' routines give better results in such cases. 1692 1693Second, we want to decimalize non-decimal constants, and to combine integer 1694quantities that are added or subtracted, because \PASCAL\ doesn't allow 1695constant expressions in subrange types or in case labels. This means we 1696want to have a procedure that treats a construction like \.{(E-15+17)} 1697as equivalent to `\.{(E+2)}', while also leaving `\.{(1E-15+17)}' and 1698`\.{(E-15+17*y)}' untouched. Consider also `\.{-15+17.5}' versus 1699`\.{-15+17..5}'. We shall not combine integers preceding or following 1700\.*, \./, \.{div}, \.{mod}, or \.{@@\&}. Note that if |y| has been defined 1701to equal $-2$, we must expand `\.{x*y}' into `\.{x*(-2)}'; but `\.{x-y}' 1702can expand into `\.{x+2}' and we can even change `\.{x - y mod z}' to 1703@^mod@> 1704`\.{x + 2 mod z}' because \PASCAL\ has a nonstandard \&{mod} operation! 1705 1706The following solution to these problems has been adopted: An array 1707|out_buf| contains characters that have been generated but not yet output, 1708and there are three pointers into this array. One of these, |out_ptr|, is 1709the number of characters currently in the buffer, and we will have 1710|1<=out_ptr<=line_length| most of the time. The second is |break_ptr|, 1711which is the largest value |<=out_ptr| such that we are definitely entitled 1712to end a line by outputting the characters |out_buf[1..(break_ptr-1)]|; 1713we will always have |break_ptr<=line_length|. Finally, |semi_ptr| is either 1714zero or the largest known value of a legal break after a semicolon or comment 1715on the current line; we will always have |semi_ptr<=break_ptr|. 1716 1717@<Globals...@>= 1718@!out_buf: array [0..out_buf_size] of ASCII_code; {assembled characters} 1719@!out_ptr: 0..out_buf_size; {first available place in |out_buf|} 1720@!break_ptr: 0..out_buf_size; {last breaking place in |out_buf|} 1721@!semi_ptr: 0..out_buf_size; {last semicolon breaking place in |out_buf|} 1722 1723@ Besides having those three pointers, 1724the output process is in one of several states: 1725 1726\yskip\hang |num_or_id| means that the last item in the buffer is a number or 1727identifier, hence a blank space or line break must be inserted if the next 1728item is also a number or identifier. 1729 1730\yskip\hang |unbreakable| means that the last item in the buffer was followed 1731by the \.{@@\&} operation that inhibits spaces between it and the next item. 1732 1733\yskip\hang |sign| means that the last item in the buffer is to be followed 1734by \.+ or \.-, depending on whether |out_app| is positive or negative. 1735 1736\yskip\hang |sign_val| means that the decimal equivalent of 1737$\vert|out_val|\vert$ should be appended to the buffer. If |out_val<0|, 1738or if |out_val=0| and |last_sign<0|, the number should be preceded by a minus 1739sign. Otherwise it should be preceded by the character |out_sign| unless 1740|out_sign=0|; the |out_sign| variable is either 0 or \.{"\ "} or \.{"+"}. 1741 1742\yskip\hang |sign_val_sign| is like |sign_val|, but also append \.+ or \.- 1743afterwards, depending on whether |out_app| is positive or negative. 1744 1745\yskip\hang |sign_val_val| is like |sign_val|, but also append the decimal 1746equivalent of |out_app| including its sign, using |last_sign| in case 1747|out_app=0|. 1748 1749\yskip\hang |misc| means none of the above. 1750 1751\yskip\noindent 1752For example, the output buffer and output state run through the following 1753sequence as we generate characters from `\.{(x-15+19-2)}': 1754$$\vbox{\halign{$\hfil#\hfil$\quad&#\hfil&\quad\hfil#\hfil&\quad 1755\hfil#\hfil&\quad\hfil#\hfil&\quad\hfil#\hfil\quad&\hfil#\hfil\cr 1756output&|out_buf|&|out_state|&|out_sign|&|out_val|&|out_app|&|last_sign|\cr 1757\noalign{\vskip 3pt} 1758(&\.(&|misc|\cr 1759x&\.{(x}&|num_or_id|\cr 1760-&\.{(x}&|sign|&&&$-1$&$-1$\cr 176115&\.{(x}&|sign_val|&\.{"+"}&$-15$&&$-15$\cr 1762+&\.{(x}&|sign_val_sign|&\.{"+"}&$-15$&$+1$&$+1$\cr 176319&\.{(x}&|sign_val_val|&\.{"+"}&$-15$&$+19$&$+1$\cr 1764-&\.{(x}&|sign_val_sign|&\.{"+"}&$+4$&$-1$&$-1$\cr 17652&\.{(x}&|sign_val_val|&\.{"+"}&$+4$&$-2$&$-2$\cr 1766)&\.{(x+2)}&|misc|\cr}}$$ 1767At each stage we have put as much into the buffer as possible without 1768knowing what is coming next. Examples like `\.{x-0.1}' indicate why 1769|last_sign| is needed to associate the proper sign with an output of zero. 1770 1771In states |num_or_id|, |unbreakable|, and |misc| the last item in the buffer 1772lies between |break_ptr| and |out_ptr-1|, inclusive; in the other states we 1773have |break_ptr=out_ptr|. 1774 1775The numeric values assigned to |num_or_id|, etc., have been chosen to 1776shorten some of the program logic; for example, the program makes use of 1777the fact that |sign+2=sign_val_sign|. 1778 1779@d misc=0 {state associated with special characters} 1780@d num_or_id=1 {state associated with numbers and identifiers} 1781@d sign=2 {state associated with pending \.+ or \.-} 1782@d sign_val=num_or_id+2 {state associated with pending sign and value} 1783@d sign_val_sign=sign+2 {|sign_val| followed by another pending sign} 1784@d sign_val_val=sign_val+2 {|sign_val| followed by another pending value} 1785@d unbreakable=sign_val_val+1 {state associated with \.{@@\&}} 1786 1787@<Globals...@>= 1788@!out_state:eight_bits; {current status of partial output} 1789@!out_val,@!out_app:integer; {pending values} 1790@!out_sign:ASCII_code; {sign to use if appending |out_val>=0|} 1791@!last_sign:-1..+1; {sign to use if appending a zero} 1792 1793@ During the output process, |line| will equal the number of the next line 1794to be output. 1795 1796@<Initialize the output buffer@>= 1797out_state:=misc; out_ptr:=0; break_ptr:=0; semi_ptr:=0; out_buf[0]:=0; line:=1; 1798 1799@ Here is a routine that is invoked when |out_ptr>line_length| 1800or when it is time to flush out the final line. The |flush_buffer| procedure 1801often writes out the line up to the current |break_ptr| position, then moves the 1802remaining information to the front of |out_buf|. However, it prefers to 1803write only up to |semi_ptr|, if the residual line won't be too long. 1804 1805@d check_break==if out_ptr>line_length then flush_buffer 1806 1807@p procedure flush_buffer; {writes one line to output file} 1808var k:0..out_buf_size; {index into |out_buf|} 1809@!b:0..out_buf_size; {value of |break_ptr| upon entry} 1810begin b:=break_ptr; 1811if (semi_ptr<>0)and(out_ptr-semi_ptr<=line_length) then break_ptr:=semi_ptr; 1812for k:=1 to break_ptr do write(Pascal_file,xchr[out_buf[k-1]]); 1813write_ln(Pascal_file); incr(line); 1814if line mod 100 = 0 then 1815 begin print('.'); 1816 if line mod 500 = 0 then print(line:1); 1817 update_terminal; {progress report} 1818 end; 1819if break_ptr<out_ptr then 1820 begin if out_buf[break_ptr]=" " then 1821 begin incr(break_ptr); {drop space at break} 1822 if break_ptr>b then b:=break_ptr; 1823 end; 1824 for k:=break_ptr to out_ptr-1 do out_buf[k-break_ptr]:=out_buf[k]; 1825 end; 1826out_ptr:=out_ptr-break_ptr; break_ptr:=b-break_ptr; semi_ptr:=0; 1827if out_ptr>line_length then 1828 begin err_print('! Long line must be truncated'); out_ptr:=line_length; 1829@.Long line must be truncated@> 1830 end; 1831end; 1832 1833@ @<Empty the last line from the buffer@>= 1834break_ptr:=out_ptr; semi_ptr:=0; flush_buffer; 1835if brace_level<>0 then 1836 err_print('! Program ended at brace level ',brace_level:1); 1837@.Program ended at brace level n@> 1838 1839@ Another simple and useful routine appends the decimal equivalent of 1840a nonnegative integer to the output buffer. 1841 1842@d app(#)==begin out_buf[out_ptr]:=#; incr(out_ptr); {append a single character} 1843 end 1844 1845@p procedure app_val(@!v:integer); {puts |v| into buffer, assumes |v>=0|} 1846var k:0..out_buf_size; {index into |out_buf|} 1847begin k:=out_buf_size; {first we put the digits at the very end of |out_buf|} 1848repeat out_buf[k]:=v mod 10; v:=v div 10; decr(k); 1849until v=0; 1850repeat incr(k); app(out_buf[k]+"0"); 1851until k=out_buf_size; {then we append them, most significant first} 1852end; 1853 1854@ The output states are kept up to date by the output routines, which are 1855called |send_out|, |send_val|, and |send_sign|. The |send_out| procedure 1856has two parameters: |t| tells the type of information being sent and 1857|v| contains the information proper. Some information may also be passed 1858in the array |out_contrib|. 1859 1860\yskip\hang If |t=misc| then |v| is a character to be output. 1861 1862\hang If |t=str| then |v| is the length of a string or something like `\.{<>}' 1863in |out_contrib|. 1864 1865\hang If |t=ident| then |v| is the length of an identifier in |out_contrib|. 1866 1867\hang If |t=frac| then |v| is the length of a fraction and/or exponent in 1868|out_contrib|. 1869 1870@d str=1 {|send_out| code for a string} 1871@d ident=2 {|send_out| code for an identifier} 1872@d frac=3 {|send_out| code for a fraction} 1873 1874@<Glob...@>= 1875@!out_contrib:array[1..line_length] of ASCII_code; {a contribution to |out_buf|} 1876 1877@ A slightly subtle point in the following code is that the user may ask 1878for a |join| operation (i.e., \.{@@\&}) following whatever is being sent 1879out. We will see later that |join| is implemented in part by calling 1880|send_out(frac,0)|. 1881 1882@p procedure send_out(@!t:eight_bits; @!v:sixteen_bits); 1883 {outputs |v| of type |t|} 1884label restart; 1885var k: 0..line_length; {index into |out_contrib|} 1886begin @<Get the buffer ready for appending the new information@>; 1887if t<>misc then for k:=1 to v do app(out_contrib[k]) 1888else app(v); 1889check_break; 1890if (t=misc)and((v=";")or(v="}")) then 1891 begin semi_ptr:=out_ptr; break_ptr:=out_ptr; 1892 end; 1893if t>=ident then out_state:=num_or_id {|t=ident| or |frac|} 1894else out_state:=misc {|t=str| or |misc|} 1895end; 1896 1897@ Here is where the buffer states for signs and values collapse into simpler 1898states, because we are about to append something that doesn't combine with 1899the previous integer constants. 1900 1901We use an ASCII-code trick: Since |","-1="+"| and |","+1="-"|, we have 1902|","-c=@t sign of $c$@>|, when $\vert c\vert=1$. 1903 1904@<Get the buffer ready...@>= 1905restart: case out_state of 1906num_or_id: if t<>frac then 1907 begin break_ptr:=out_ptr; 1908 if t=ident then app(" "); 1909 end; 1910sign: begin app(","-out_app); check_break; break_ptr:=out_ptr; 1911 end; 1912sign_val,sign_val_sign: begin @<Append \(|out_val| to buffer@>; 1913 out_state:=out_state-2; goto restart; 1914 end; 1915sign_val_val: @<Reduce |sign_val_val| to |sign_val| and |goto restart|@>; 1916misc: if t<>frac then break_ptr:=out_ptr;@/ 1917othercases do_nothing {this is for |unbreakable| state} 1918endcases 1919 1920@ @<Append \(|out_val|...@>= 1921if (out_val<0)or((out_val=0)and(last_sign<0)) then app("-") 1922else if out_sign>0 then app(out_sign); 1923app_val(abs(out_val)); check_break; 1924 1925@ @<Reduce |sign_val_val|...@>= 1926begin if (t=frac)or(@<Contribution is \.* or \./ or \.{DIV} or \.{MOD}@>) then 1927 begin @<Append \(|out_val| to buffer@>; 1928 out_sign:="+"; out_val:=out_app; 1929 end 1930else out_val:=out_val+out_app; 1931out_state:=sign_val; goto restart; 1932end 1933 1934@ @<Contribution is \.*...@>= 1935((t=ident)and(v=3)and@| 1936 (((out_contrib[1]="D")and(out_contrib[2]="I")and(out_contrib[3]="V")) or@| 1937 ((out_contrib[1]="M")and(out_contrib[2]="O")and(out_contrib[3]="D")) ))or@| 1938@^uppercase@> 1939 ((t=misc)and((v="*")or(v="/"))) 1940 1941@ The following routine is called with $v=\pm1$ when a plus or minus sign is 1942appended to the output. It extends \PASCAL\ to allow repeated signs 1943(e.g., `\.{--}' is equivalent to `\.+'), rather than to give an error message. 1944The signs following `\.E' in real constants are treated as part of a fraction, 1945so they are not seen by this routine. 1946 1947@p procedure send_sign(@!v:integer); 1948begin case out_state of 1949sign, sign_val_sign: out_app:=out_app*v; 1950sign_val:begin out_app:=v; out_state:=sign_val_sign; 1951 end; 1952sign_val_val: begin out_val:=out_val+out_app; out_app:=v; 1953 out_state:=sign_val_sign; 1954 end; 1955othercases begin break_ptr:=out_ptr; out_app:=v; out_state:=sign; 1956 end 1957endcases;@/ 1958last_sign:=out_app; 1959end; 1960 1961@ When a (signed) integer value is to be output, we call |send_val|. 1962 1963@d bad_case=666 {this is a label used below} 1964 1965@p procedure send_val(@!v:integer); {output the (signed) value |v|} 1966label bad_case, {go here if we can't keep |v| in the output state} 1967 exit; 1968begin case out_state of 1969num_or_id: begin @<If previous output was \.{DIV} or \.{MOD}, |goto bad_case|@>; 1970 out_sign:=" "; out_state:=sign_val; out_val:=v; break_ptr:=out_ptr; 1971 last_sign:=+1; 1972 end; 1973misc: begin @<If previous output was \.* or \./, |goto bad_case|@>; 1974 out_sign:=0; out_state:=sign_val; out_val:=v; break_ptr:=out_ptr; 1975 last_sign:=+1; 1976 end; 1977@t\4@>@<Handle cases of |send_val| when |out_state| contains a sign@>@; 1978othercases goto bad_case 1979endcases;@/ 1980return; 1981bad_case: @<Append the decimal value of |v|, with parentheses if negative@>; 1982exit: end; 1983 1984@ @<Handle cases of |send_val|...@>= 1985sign: begin out_sign:="+"; out_state:=sign_val; out_val:=out_app*v; 1986 end; 1987sign_val: begin out_state:=sign_val_val; out_app:=v; 1988 err_print('! Two numbers occurred without a sign between them'); 1989 end; 1990sign_val_sign: begin out_state:=sign_val_val; out_app:=out_app*v; 1991 end; 1992sign_val_val: begin out_val:=out_val+out_app; out_app:=v; 1993 err_print('! Two numbers occurred without a sign between them'); 1994@.Two numbers occurred...@> 1995 end; 1996 1997@ @<If previous output was \.*...@>= 1998if (out_ptr=break_ptr+1)and((out_buf[break_ptr]="*")or(out_buf[break_ptr]="/")) 1999 then goto bad_case 2000 2001@ @<If previous output was \.{DIV}...@>= 2002if (out_ptr=break_ptr+3)or 2003 ((out_ptr=break_ptr+4)and(out_buf[break_ptr]=" ")) then 2004@^uppercase@> 2005 if ((out_buf[out_ptr-3]="D")and(out_buf[out_ptr-2]="I")and 2006 (out_buf[out_ptr-1]="V"))or @/ 2007 ((out_buf[out_ptr-3]="M")and(out_buf[out_ptr-2]="O")and 2008 (out_buf[out_ptr-1]="D")) then@/ goto bad_case 2009 2010@ @<Append the decimal value...@>= 2011if v>=0 then 2012 begin if out_state=num_or_id then 2013 begin break_ptr:=out_ptr; app(" "); 2014 end; 2015 app_val(v); check_break; out_state:=num_or_id; 2016 end 2017else begin app("("); app("-"); app_val(-v); app(")"); check_break; 2018 out_state:=misc; 2019 end 2020 2021@* The big output switch. 2022To complete the output process, we need a routine that takes the results 2023of |get_output| and feeds them to |send_out|, |send_val|, or |send_sign|. 2024This procedure `|send_the_output|' will be invoked just once, as follows: 2025 2026@<Phase II: Output the contents of the compressed tables@>= 2027if text_link[0]=0 then 2028 begin print_nl('! No output was specified.'); mark_harmless; 2029@.No output was specified@> 2030 end 2031else begin print_nl('Writing the output file'); update_terminal;@/ 2032 @<Initialize the output stacks@>; 2033 @<Initialize the output buffer@>; 2034 send_the_output;@/ 2035 @<Empty the last line...@>; 2036 print_nl('Done.'); 2037 end 2038 2039@ A many-way switch is used to send the output: 2040 2041@d get_fraction=2 {this label is used below} 2042 2043@p procedure send_the_output; 2044label get_fraction, {go here to finish scanning a real constant} 2045 reswitch, continue; 2046var cur_char:eight_bits; {the latest character received} 2047 @!k:0..line_length; {index into |out_contrib|} 2048 @!j:0..max_bytes; {index into |byte_mem|} 2049 @!w:0..ww-1; {segment of |byte_mem|} 2050 @!n:integer; {number being scanned} 2051begin while stack_ptr>0 do 2052 begin cur_char:=get_output; 2053 reswitch: case cur_char of 2054 0: do_nothing; {this case might arise if output ends unexpectedly} 2055 @t\4@>@<Cases related to identifiers@>@; 2056 @t\4@>@<Cases related to constants, possibly leading to 2057 |get_fraction| or |reswitch|@>@; 2058 "+","-": send_sign(","-cur_char); 2059 @t\4@>@<Cases like \.{<>} and \.{:=}@>@; 2060 "'": @<Send a string, |goto reswitch|@>; 2061 @<Other printable characters@>: send_out(misc,cur_char); 2062 @t\4@>@<Cases involving \.{@@\{} and \.{@@\}}@>@; 2063 join: begin send_out(frac,0); out_state:=unbreakable; 2064 end; 2065 verbatim: @<Send verbatim string@>; 2066 force_line: @<Force a line break@>; 2067 othercases err_print('! Can''t output ASCII code ',cur_char:1) 2068@.Can't output ASCII code n@> 2069 endcases;@/ 2070 goto continue; 2071 get_fraction: @<Special code to finish real constants@>; 2072 continue: end; 2073end; 2074 2075@ @<Cases like \.{<>}...@>= 2076and_sign: begin out_contrib[1]:="A"; out_contrib[2]:="N"; out_contrib[3]:="D"; 2077@^uppercase@> 2078 send_out(ident,3); 2079 end; 2080not_sign: begin out_contrib[1]:="N"; out_contrib[2]:="O"; out_contrib[3]:="T"; 2081 send_out(ident,3); 2082 end; 2083set_element_sign: begin out_contrib[1]:="I"; out_contrib[2]:="N"; 2084 send_out(ident,2); 2085 end; 2086or_sign: begin out_contrib[1]:="O"; out_contrib[2]:="R"; send_out(ident,2); 2087 end; 2088left_arrow: begin out_contrib[1]:=":"; out_contrib[2]:="="; send_out(str,2); 2089 end; 2090not_equal: begin out_contrib[1]:="<"; out_contrib[2]:=">"; send_out(str,2); 2091 end; 2092less_or_equal: begin out_contrib[1]:="<"; out_contrib[2]:="="; send_out(str,2); 2093 end; 2094greater_or_equal: begin out_contrib[1]:=">"; out_contrib[2]:="="; 2095 send_out(str,2); 2096 end; 2097equivalence_sign: begin out_contrib[1]:="="; out_contrib[2]:="="; 2098 send_out(str,2); 2099 end; 2100double_dot: begin out_contrib[1]:="."; out_contrib[2]:="."; send_out(str,2); 2101 end; 2102 2103@ Please don't ask how all of the following characters can actually get 2104through \.{TANGLE} outside of strings. It seems that |""""| and |"{"| 2105cannot actually occur at this point of the program, but they have 2106been included just in case \.{TANGLE} changes. 2107 2108If \.{TANGLE} is producing code for a \PASCAL\ compiler that uses `\.{(.}' 2109and `\.{.)}' instead of square brackets (e.g., on machines with {\mc EBCDIC} 2110code), one should remove |"["| and |"]"| from this list and put them into 2111the preceding module in the appropriate way. Similarly, some compilers 2112want `\.\^' to be converted to `\.{@@}'. 2113@^system dependencies@>@^EBCDIC@> 2114 2115@<Other printable characters@>= 2116"!","""","#","$","%","&","(",")","*",",","/",":",";","<","=",">","?", 2117"@@","[","\","]","^","_","`","{","|" 2118 2119@ Single-character identifiers represent themselves, while longer ones 2120appear in |byte_mem|. All must be converted to uppercase, 2121with underlines removed. Extremely long identifiers must be chopped. 2122 2123(Some \PASCAL\ compilers work with lowercase letters instead of 2124uppercase. If this module of \.{TANGLE} is changed, it's also necessary 2125to change from uppercase to lowercase in the modules that are 2126listed in the index under ``uppercase''.) 2127@^system dependencies@> 2128@^uppercase@> 2129 2130@d up_to(#)==#-24,#-23,#-22,#-21,#-20,#-19,#-18,#-17,#-16,#-15,#-14, 2131 #-13,#-12,#-11,#-10,#-9,#-8,#-7,#-6,#-5,#-4,#-3,#-2,#-1,# 2132 2133@<Cases related to identifiers@>= 2134"A",up_to("Z"): begin out_contrib[1]:=cur_char; send_out(ident,1); 2135 end; 2136"a",up_to("z"): begin out_contrib[1]:=cur_char-@'40; send_out(ident,1); 2137 end; 2138identifier: begin k:=0; j:=byte_start[cur_val]; w:=cur_val mod ww; 2139 while (k<max_id_length)and(j<byte_start[cur_val+ww]) do 2140 begin incr(k); out_contrib[k]:=byte_mem[w,j]; incr(j); 2141 if out_contrib[k]>="a" then out_contrib[k]:=out_contrib[k]-@'40 2142 else if out_contrib[k]="_" then decr(k); 2143 end; 2144 send_out(ident,k); 2145 end; 2146 2147@ After sending a string, we need to look ahead at the next character, in order 2148to see if there were two consecutive single-quote marks. Afterwards we go to 2149|reswitch| to process the next character. 2150 2151@<Send a string...@>= 2152begin k:=1; out_contrib[1]:="'"; 2153repeat if k<line_length then incr(k); 2154out_contrib[k]:=get_output; 2155until (out_contrib[k]="'")or(stack_ptr=0); 2156if k=line_length then err_print('! String too long'); 2157@.String too long@> 2158send_out(str,k); cur_char:=get_output; 2159if cur_char="'" then out_state:=unbreakable; 2160goto reswitch; 2161end 2162 2163@ Sending a verbatim string is similar, but we don't have to look ahead. 2164 2165@<Send verbatim string@>= 2166begin k:=0; 2167repeat if k<line_length then incr(k); 2168out_contrib[k]:=get_output; 2169until (out_contrib[k]=verbatim)or(stack_ptr=0); 2170if k=line_length then err_print('! Verbatim string too long'); 2171@.Verbatim string too long@> 2172send_out(str,k-1); 2173end 2174 2175@ In order to encourage portable software, \.{TANGLE} complains 2176if the constants get dangerously close to the largest value representable 2177on a 32-bit computer ($2^{31}-1$). 2178 2179@d digits=="0","1","2","3","4","5","6","7","8","9" 2180 2181@<Cases related to constants...@>= 2182digits: begin n:=0; 2183 repeat cur_char:=cur_char-"0"; 2184 if n>=@'1463146314 then err_print('! Constant too big') 2185@.Constant too big@> 2186 else n:=10*n+cur_char; 2187 cur_char:=get_output; 2188 until (cur_char>"9")or(cur_char<"0"); 2189 send_val(n); k:=0; 2190 if cur_char="e" then cur_char:="E"; 2191@^uppercase@> 2192 if cur_char="E" then goto get_fraction 2193 else goto reswitch; 2194 end; 2195check_sum: send_val(pool_check_sum); 2196octal: begin n:=0; cur_char:="0"; 2197 repeat cur_char:=cur_char-"0"; 2198 if n>=@'2000000000 then err_print('! Constant too big') 2199 else n:=8*n+cur_char; 2200 cur_char:=get_output; 2201 until (cur_char>"7")or(cur_char<"0"); 2202 send_val(n); goto reswitch; 2203 end; 2204hex: begin n:=0; cur_char:="0"; 2205 repeat if cur_char>="A" then cur_char:=cur_char+10-"A" 2206 else cur_char:=cur_char-"0"; 2207 if n>=@"8000000 then err_print('! Constant too big') 2208 else n:=16*n+cur_char; 2209 cur_char:=get_output; 2210 until (cur_char>"F")or(cur_char<"0")or@| 2211 ((cur_char>"9")and(cur_char<"A")); 2212 send_val(n); goto reswitch; 2213 end; 2214number: send_val(cur_val); 2215".": begin k:=1; out_contrib[1]:="."; cur_char:=get_output; 2216 if cur_char="." then 2217 begin out_contrib[2]:="."; send_out(str,2); 2218 end 2219 else if (cur_char>="0")and(cur_char<="9") then goto get_fraction 2220 else begin send_out(misc,"."); goto reswitch; 2221 end; 2222 end; 2223 2224@ The following code appears at label `|get_fraction|', when we want to 2225scan to the end of a real constant. The first |k| characters of a fraction 2226have already been placed in |out_contrib|, and |cur_char| is the next character. 2227 2228@<Special code...@>= 2229repeat if k<line_length then incr(k); 2230out_contrib[k]:=cur_char; cur_char:=get_output; 2231if (out_contrib[k]="E")and((cur_char="+")or(cur_char="-")) then 2232@^uppercase@> 2233 begin if k<line_length then incr(k); 2234 out_contrib[k]:=cur_char; cur_char:=get_output; 2235 end 2236else if cur_char="e" then cur_char:="E"; 2237until (cur_char<>"E")and((cur_char<"0")or(cur_char>"9")); 2238if k=line_length then err_print('! Fraction too long'); 2239@.Fraction too long@> 2240send_out(frac,k); goto reswitch 2241 2242@ Some \PASCAL\ compilers do not recognize comments in braces, so the 2243comments must be delimited by `\.{(*}' and `\.{*)}'. 2244@^system dependencies@> 2245In such cases the statement `|out_contrib[1]:="{"|' that appears here should 2246be replaced by `\ignorespaces|begin out_contrib[1]:="("; out_contrib[2]:="*"; 2247incr(k); end|', and a similar change should be made to 2248`|out_contrib[k]:="}"|'. 2249 2250@<Cases involving \.{@@\{} and \.{@@\}}@>= 2251begin_comment: begin if brace_level=0 then send_out(misc,"{") 2252 else send_out(misc,"["); 2253 incr(brace_level); 2254 end; 2255end_comment: if brace_level>0 then 2256 begin decr(brace_level); 2257 if brace_level=0 then send_out(misc,"}") 2258 else send_out(misc,"]"); 2259 end 2260 else err_print('! Extra @@}'); 2261@.Extra \AT!\}@> 2262module_number: begin k:=2; 2263 if brace_level=0 then out_contrib[1]:="{" 2264 else out_contrib[1]:="["; 2265 if cur_val<0 then 2266 begin out_contrib[k]:=":"; cur_val:=-cur_val; incr(k); 2267 end; 2268 n:=10; 2269 while cur_val>=n do n:=10*n; 2270 repeat n:=n div 10; 2271 out_contrib[k]:="0"+(cur_val div n); cur_val:=cur_val mod n; incr(k); 2272 until n=1; 2273 if out_contrib[2]<>":" then 2274 begin out_contrib[k]:=":"; incr(k); 2275 end; 2276 if brace_level=0 then out_contrib[k]:="}" 2277 else out_contrib[k]:="]"; 2278 send_out(str,k); 2279 end; 2280 2281@ @<Force a line break@>= 2282begin send_out(str,0); {normalize the buffer} 2283while out_ptr>0 do 2284 begin if out_ptr<=line_length then break_ptr:=out_ptr; 2285 flush_buffer; 2286 end; 2287out_state:=misc; 2288end 2289 2290@* Introduction to the input phase. 2291We have now seen that \.{TANGLE} will be able to output the full 2292\PASCAL\ program, if we can only get that program into the byte memory in 2293the proper format. The input process is something like the output process 2294in reverse, since we compress the text as we read it in and we expand it 2295as we write it out. 2296 2297There are three main input routines. The most interesting is the one that gets 2298the next token of a \PASCAL\ text; the other two are used to scan rapidly past 2299\TeX\ text in the \.{WEB} source code. One of the latter routines will jump to 2300the next token that starts with `\.{@@}', and the other skips to the end 2301of a \PASCAL\ comment. 2302 2303@ But first we need to consider the low-level routine |get_line| 2304that takes care of merging |change_file| into |web_file|. The |get_line| 2305procedure also updates the line numbers for error messages. 2306 2307@<Globals...@>= 2308@!ii:integer; {general purpose |for| loop variable in the outer block} 2309@!line:integer; {the number of the current line in the current file} 2310@!other_line:integer; {the number of the current line in the input file that 2311 is not currently being read} 2312@!temp_line:integer; {used when interchanging |line| with |other_line|} 2313@!limit:0..buf_size; {the last character position occupied in the buffer} 2314@!loc:0..buf_size; {the next character position to be read from the buffer} 2315@!input_has_ended: boolean; {if |true|, there is no more input} 2316@!changing: boolean; {if |true|, the current line is from |change_file|} 2317 2318@ As we change |changing| from |true| to |false| and back again, we must 2319remember to swap the values of |line| and |other_line| so that the |err_print| 2320routine will be sure to report the correct line number. 2321 2322@d change_changing== 2323 changing := not changing; 2324 temp_line:=other_line; other_line:=line; line:=temp_line 2325 {|line @t$\null\BA\null$@> other_line|} 2326 2327@ When |changing| is |false|, the next line of |change_file| is kept in 2328|change_buffer[0..change_limit]|, for purposes of comparison with the next 2329line of |web_file|. After the change file has been completely input, we 2330set |change_limit:=0|, so that no further matches will be made. 2331 2332@<Globals...@>= 2333@!change_buffer:array[0..buf_size] of ASCII_code; 2334@!change_limit:0..buf_size; {the last position occupied in |change_buffer|} 2335 2336@ Here's a simple function that checks if the two buffers are different. 2337 2338@p function lines_dont_match:boolean; 2339label exit; 2340var k:0..buf_size; {index into the buffers} 2341begin lines_dont_match:=true; 2342if change_limit<>limit then return; 2343if limit>0 then 2344 for k:=0 to limit-1 do if change_buffer[k]<>buffer[k] then return; 2345lines_dont_match:=false; 2346exit: end; 2347 2348@ Procedure |prime_the_change_buffer| sets |change_buffer| in preparation 2349for the next matching operation. Since blank lines in the change file are 2350not used for matching, we have |(change_limit=0)and not changing| if and 2351only if the change file is exhausted. This procedure is called only 2352when |changing| is true; hence error messages will be reported correctly. 2353 2354@p procedure prime_the_change_buffer; 2355label continue, done, exit; 2356var k:0..buf_size; {index into the buffers} 2357begin change_limit:=0; {this value will be used if the change file ends} 2358@<Skip over comment lines in the change file; |return| if end of file@>; 2359@<Skip to the next nonblank line; |return| if end of file@>; 2360@<Move |buffer| and |limit| to |change_buffer| and |change_limit|@>; 2361exit: end; 2362 2363@ While looking for a line that begins with \.{@@x} in the change file, 2364we allow lines that begin with \.{@@}, as long as they don't begin with 2365\.{@@y} or \.{@@z} (which would probably indicate that the change file is 2366fouled up). 2367 2368@<Skip over comment lines in the change file...@>= 2369loop@+ begin incr(line); 2370 if not input_ln(change_file) then return; 2371 if limit<2 then goto continue; 2372 if buffer[0]<>"@@" then goto continue; 2373 if (buffer[1]>="X")and(buffer[1]<="Z") then 2374 buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify} 2375 if buffer[1]="x" then goto done; 2376 if (buffer[1]="y")or(buffer[1]="z") then 2377 begin loc:=2; err_print('! Where is the matching @@x?'); 2378@.Where is the match...@> 2379 end; 2380continue: end; 2381done: 2382 2383@ Here we are looking at lines following the \.{@@x}. 2384 2385@<Skip to the next nonblank line...@>= 2386repeat incr(line); 2387 if not input_ln(change_file) then 2388 begin err_print('! Change file ended after @@x'); 2389@.Change file ended...@> 2390 return; 2391 end; 2392until limit>0; 2393 2394@ @<Move |buffer| and |limit| to |change_buffer| and |change_limit|@>= 2395begin change_limit:=limit; 2396if limit>0 then for k:=0 to limit-1 do change_buffer[k]:=buffer[k]; 2397end 2398 2399@ The following procedure is used to see if the next change entry should 2400go into effect; it is called only when |changing| is false. 2401The idea is to test whether or not the current 2402contents of |buffer| matches the current contents of |change_buffer|. 2403If not, there's nothing more to do; but if so, a change is called for: 2404All of the text down to the \.{@@y} is supposed to match. An error 2405message is issued if any discrepancy is found. Then the procedure 2406prepares to read the next line from |change_file|. 2407 2408@p procedure check_change; {switches to |change_file| if the buffers match} 2409label exit; 2410var n:integer; {the number of discrepancies found} 2411@!k:0..buf_size; {index into the buffers} 2412begin if lines_dont_match then return; 2413n:=0; 2414loop@+ begin change_changing; {now it's |true|} 2415 incr(line); 2416 if not input_ln(change_file) then 2417 begin err_print('! Change file ended before @@y'); 2418@.Change file ended...@> 2419 change_limit:=0; change_changing; {|false| again} 2420 return; 2421 end; 2422 @<If the current line starts with \.{@@y}, 2423 report any discrepancies and |return|@>; 2424 @<Move |buffer| and |limit|...@>; 2425 change_changing; {now it's |false|} 2426 incr(line); 2427 if not input_ln(web_file) then 2428 begin err_print('! WEB file ended during a change'); 2429@.WEB file ended...@> 2430 input_has_ended:=true; return; 2431 end; 2432 if lines_dont_match then incr(n); 2433 end; 2434exit: end; 2435 2436@ @<If the current line starts with \.{@@y}...@>= 2437if limit>1 then if buffer[0]="@@" then 2438 begin if (buffer[1]>="X")and(buffer[1]<="Z") then 2439 buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify} 2440 if (buffer[1]="x")or(buffer[1]="z") then 2441 begin loc:=2; err_print('! Where is the matching @@y?'); 2442@.Where is the match...@> 2443 end 2444 else if buffer[1]="y" then 2445 begin if n>0 then 2446 begin loc:=2; err_print('! Hmm... ',n:1, 2447 ' of the preceding lines failed to match'); 2448@.Hmm... n of the preceding...@> 2449 end; 2450 return; 2451 end; 2452 end 2453 2454@ @<Initialize the input system@>= 2455open_input; line:=0; other_line:=0;@/ 2456changing:=true; prime_the_change_buffer; change_changing;@/ 2457limit:=0; loc:=1; buffer[0]:=" "; input_has_ended:=false; 2458 2459@ The |get_line| procedure is called when |loc>limit|; it puts the next 2460line of merged input into the buffer and updates the other variables 2461appropriately. A space is placed at the right end of the line. 2462 2463@p procedure get_line; {inputs the next line} 2464label restart; 2465begin restart: if changing then 2466 @<Read from |change_file| and maybe turn off |changing|@>; 2467if not changing then 2468 begin @<Read from |web_file| and maybe turn on |changing|@>; 2469 if changing then goto restart; 2470 end; 2471loc:=0; buffer[limit]:=" "; 2472end; 2473 2474@ @<Read from |web_file|...@>= 2475begin incr(line); 2476if not input_ln(web_file) then input_has_ended:=true 2477else if limit=change_limit then 2478 if buffer[0]=change_buffer[0] then 2479 if change_limit>0 then check_change; 2480end 2481 2482@ @<Read from |change_file|...@>= 2483begin incr(line); 2484if not input_ln(change_file) then 2485 begin err_print('! Change file ended without @@z'); 2486@.Change file ended...@> 2487 buffer[0]:="@@"; buffer[1]:="z"; limit:=2; 2488 end; 2489if limit>1 then {check if the change has ended} 2490 if buffer[0]="@@" then 2491 begin if (buffer[1]>="X")and(buffer[1]<="Z") then 2492 buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify} 2493 if (buffer[1]="x")or(buffer[1]="y") then 2494 begin loc:=2; err_print('! Where is the matching @@z?'); 2495@.Where is the match...@> 2496 end 2497 else if buffer[1]="z" then 2498 begin prime_the_change_buffer; change_changing; 2499 end; 2500 end; 2501end 2502 2503@ At the end of the program, we will tell the user if the change file 2504had a line that didn't match any relevant line in |web_file|. 2505 2506@<Check that all changes have been read@>= 2507if change_limit<>0 then {|changing| is false} 2508 begin for ii:=0 to change_limit do buffer[ii]:=change_buffer[ii]; 2509 limit:=change_limit; changing:=true; line:=other_line; loc:=change_limit; 2510 err_print('! Change file entry did not match'); 2511@.Change file entry did not match@> 2512 end 2513 2514@ Important milestones are reached during the input phase when certain 2515control codes are sensed. 2516 2517Control codes in \.{WEB} begin with `\.{@@}', and the next character 2518identifies the code. Some of these are of interest only to \.{WEAVE}, 2519so \.{TANGLE} ignores them; the others are converted by \.{TANGLE} into 2520internal code numbers by the |control_code| function below. The ordering 2521of these internal code numbers has been chosen to simplify the program logic; 2522larger numbers are given to the control codes that denote more significant 2523milestones. 2524 2525@d ignore=0 {control code of no interest to \.{TANGLE}} 2526@d control_text=@'203 {control code for `\.{@@t}', `\.{@@\^}', etc.} 2527@d format=@'204 {control code for `\.{@@f}'} 2528@d definition=@'205 {control code for `\.{@@d}'} 2529@d begin_Pascal=@'206 {control code for `\.{@@p}'} 2530@d module_name=@'207 {control code for `\.{@@<}'} 2531@d new_module=@'210 {control code for `\.{@@\ }' and `\.{@@*}'} 2532 2533@p function control_code(@!c:ASCII_code):eight_bits; {convert |c| after \.{@@}} 2534begin case c of 2535"@@": control_code:="@@"; {`quoted' at sign} 2536"'": control_code:=octal; {precedes octal constant} 2537"""": control_code:=hex; {precedes hexadecimal constant} 2538"$": control_code:=check_sum; {string pool check sum} 2539" ",tab_mark: control_code:=new_module; {beginning of a new module} 2540"*": begin print('*',module_count+1:1); 2541 update_terminal; {print a progress report} 2542 control_code:=new_module; {beginning of a new module} 2543 end; 2544"D","d": control_code:=definition; {macro definition} 2545"F","f": control_code:=format; {format definition} 2546"{": control_code:=begin_comment; {begin-comment delimiter} 2547"}": control_code:=end_comment; {end-comment delimiter} 2548"P","p": control_code:=begin_Pascal; {\PASCAL\ text in unnamed module} 2549"T","t","^",".",":": control_code:=control_text; {control text to be ignored} 2550"&": control_code:=join; {concatenate two tokens} 2551"<": control_code:=module_name; {beginning of a module name} 2552"=": control_code:=verbatim; {beginning of \PASCAL\ verbatim mode} 2553"\": control_code:=force_line; {force a new line in \PASCAL\ output} 2554othercases control_code:=ignore {ignore all other cases} 2555endcases; 2556end; 2557 2558@ The |skip_ahead| procedure reads through the input at fairly high speed 2559until finding the next non-ignorable control code, which it returns. 2560 2561@p function skip_ahead:eight_bits; {skip to next control code} 2562label done; 2563var c:eight_bits; {control code found} 2564begin loop begin if loc>limit then 2565 begin get_line; 2566 if input_has_ended then 2567 begin c:=new_module; goto done; 2568 end; 2569 end; 2570 buffer[limit+1]:="@@"; 2571 while buffer[loc]<>"@@" do incr(loc); 2572 if loc<=limit then 2573 begin loc:=loc+2; c:=control_code(buffer[loc-1]); 2574 if (c<>ignore)or(buffer[loc-1]=">") then goto done; 2575 end; 2576 end; 2577done: skip_ahead:=c; 2578end; 2579 2580@ The |skip_comment| procedure reads through the input at somewhat high speed 2581until finding the first unmatched right brace or until coming to the end 2582of the file. It ignores characters following `\.\\' characters, since all 2583braces that aren't nested are supposed to be hidden in that way. For 2584example, consider the process of skipping the first comment below, 2585where the string containing the right brace has been typed as \.{\`\\.\\\}\'} 2586in the \.{WEB} file. 2587 2588@p procedure skip_comment; {skips to next unmatched `\.\}'} 2589label exit; 2590var bal:eight_bits; {excess of left braces} 2591@!c:ASCII_code; {current character} 2592begin bal:=0; 2593loop@+ begin if loc>limit then 2594 begin get_line; 2595 if input_has_ended then 2596 begin err_print('! Input ended in mid-comment'); 2597@.Input ended in mid-comment@> 2598 return; 2599 end; 2600 end; 2601 c:=buffer[loc]; incr(loc); 2602 @<Do special things when |c="@@", "\", "{", "}"|; |return| at end@>; 2603 end; 2604exit:end; 2605 2606@ @<Do special things when |c="@@"...@>= 2607if c="@@" then 2608 begin c:=buffer[loc]; 2609 if (c<>" ")and(c<>tab_mark)and(c<>"*")and(c<>"z")and(c<>"Z") then incr(loc) 2610 else begin err_print('! Section ended in mid-comment'); 2611@.Section ended in mid-comment@> 2612 decr(loc); return; 2613 end 2614 end 2615else if (c="\")and(buffer[loc]<>"@@") then incr(loc) 2616else if c="{" then incr(bal) 2617else if c="}" then 2618 begin if bal=0 then return; 2619 decr(bal); 2620 end 2621 2622@* Inputting the next token. 2623As stated above, \.{TANGLE}'s most interesting input procedure is the 2624|get_next| routine that inputs the next token. However, the procedure 2625isn't especially difficult. 2626 2627In most cases the tokens output by |get_next| have the form used in 2628replacement texts, except that two-byte tokens are not produced. 2629An identifier that isn't one letter long is represented by the 2630output `|identifier|', and in such a case the global variables 2631|id_first| and |id_loc| will have been set to the appropriate values 2632needed by the |id_lookup| procedure. A string that begins with a 2633double-quote is also considered an |identifier|, and in such a case 2634the global variable |double_chars| will also have been set appropriately. 2635Control codes produce the corresponding output of the |control_code| 2636function above; and if that code is |module_name|, the value of |cur_module| 2637will point to the |byte_start| entry for that module name. 2638 2639Another global variable, |scanning_hex|, is |true| during the time that 2640the letters \.A through \.F should be treated as if they were digits. 2641 2642@<Globals...@>= 2643@!cur_module: name_pointer; {name of module just scanned} 2644@!scanning_hex: boolean; {are we scanning a hexadecimal constant?} 2645 2646@ @<Set init...@>= 2647scanning_hex:=false; 2648 2649@ At the top level, |get_next| is a multi-way switch based on the next 2650character in the input buffer. A |new_module| code is inserted at the 2651very end of the input file. 2652 2653@p function get_next:eight_bits; {produces the next input token} 2654label restart,done,found; 2655var c:eight_bits; {the current character} 2656@!d:eight_bits; {the next character} 2657@!j,@!k:0..longest_name; {indices into |mod_text|} 2658begin restart: if loc>limit then 2659 begin get_line; 2660 if input_has_ended then 2661 begin c:=new_module; goto found; 2662 end; 2663 end; 2664c:=buffer[loc]; incr(loc); 2665if scanning_hex then @<Go to |found| if |c| is a hexadecimal digit, 2666 otherwise set |scanning_hex:=false|@>; 2667case c of 2668"A",up_to("Z"),"a",up_to("z"): @<Get an identifier@>; 2669"""": @<Get a preprocessed string@>; 2670"@@": @<Get control code and possible module name@>; 2671@t\4@>@<Compress two-symbol combinations like `\.{:=}'@>@; 2672" ",tab_mark: goto restart; {ignore spaces and tabs} 2673"{": begin skip_comment; goto restart; 2674 end; 2675"}": begin err_print('! Extra }'); goto restart; 2676@.Extra \}@> 2677 end; 2678othercases if c>=128 then goto restart {ignore nonstandard characters} 2679 else do_nothing 2680endcases; 2681found:@!debug if trouble_shooting then debug_help;@;@+gubed@/ 2682get_next:=c; 2683end; 2684 2685@ @<Go to |found| if |c| is a hexadecimal digit...@>= 2686if ((c>="0")and(c<="9"))or((c>="A")and(c<="F")) then goto found 2687else scanning_hex:=false 2688 2689@ Note that the following code substitutes \.{@@\{} and \.{@@\}} for the 2690respective combinations `\.{(*}' and `\.{*)}'. Explicit braces should be used 2691for \TeX\ comments in \PASCAL\ text. 2692 2693@d compress(#)==begin if loc<=limit then begin c:=#; incr(loc); end; end 2694 2695@<Compress two-symbol...@>= 2696".": if buffer[loc]="." then compress(double_dot) 2697 else if buffer[loc]=")" then compress("]"); 2698":": if buffer[loc]="=" then compress(left_arrow); 2699"=": if buffer[loc]="=" then compress(equivalence_sign); 2700">": if buffer[loc]="=" then compress(greater_or_equal); 2701"<": if buffer[loc]="=" then compress(less_or_equal) 2702 else if buffer[loc]=">" then compress(not_equal); 2703"(": if buffer[loc]="*" then compress(begin_comment) 2704 else if buffer[loc]="." then compress("["); 2705"*": if buffer[loc]=")" then compress(end_comment); 2706 2707@ We have to look at the preceding character to make sure this isn't part 2708of a real constant, before trying to find an identifier starting with 2709`\.e' or `\.E'. 2710 2711@<Get an identifier@>= 2712begin if ((c="e")or(c="E"))and(loc>1) then 2713 if (buffer[loc-2]<="9")and(buffer[loc-2]>="0") then c:=0; 2714if c<>0 then 2715 begin decr(loc); id_first:=loc; 2716 repeat incr(loc); d:=buffer[loc]; 2717 until ((d<"0")or((d>"9")and(d<"A"))or((d>"Z")and(d<"a"))or(d>"z")) and 2718 (d<>"_"); 2719 if loc>id_first+1 then 2720 begin c:=identifier; id_loc:=loc; 2721 end; 2722 end 2723else c:="E"; {exponent of a real constant} 2724end 2725 2726@ A string that starts and ends with double-quote marks is converted into 2727an identifier that behaves like a numeric macro by means of the following 2728piece of the program. 2729@^preprocessed strings@> 2730 2731@<Get a preprocessed string@>= 2732begin double_chars:=0; id_first:=loc-1; 2733repeat d:=buffer[loc]; incr(loc); 2734 if (d="""")or(d="@@") then 2735 if buffer[loc]=d then 2736 begin incr(loc); d:=0; incr(double_chars); 2737 end 2738 else begin if d="@@" then err_print('! Double @@ sign missing') 2739@.Double \AT! sign missing@> 2740 end 2741 else if loc>limit then 2742 begin err_print('! String constant didn''t end'); d:=""""; 2743@.String constant didn't end@> 2744 end; 2745until d=""""; 2746id_loc:=loc-1; c:=identifier; 2747end 2748 2749@ After an \.{@@} sign has been scanned, the next character tells us 2750whether there is more work to do. 2751 2752@<Get control code and possible module name@>= 2753begin c:=control_code(buffer[loc]); incr(loc); 2754if c=ignore then goto restart 2755else if c=hex then scanning_hex:=true 2756else if c=module_name then 2757 @<Scan the \(module name and make |cur_module| point to it@> 2758else if c=control_text then 2759 begin repeat c:=skip_ahead; 2760 until c<>"@@"; 2761 if buffer[loc-1]<>">" then 2762 err_print('! Improper @@ within control text'); 2763@.Improper \AT! within control text@> 2764 goto restart; 2765 end; 2766end 2767 2768@ @<Scan the \(module name...@>= 2769begin @<Put module name into |mod_text[1..k]|@>; 2770if k>3 then 2771 begin if (mod_text[k]=".")and(mod_text[k-1]=".")and(mod_text[k-2]=".") then 2772 cur_module:=prefix_lookup(k-3) 2773 else cur_module:=mod_lookup(k); 2774 end 2775else cur_module:=mod_lookup(k); 2776end 2777 2778@ Module names are placed into the |mod_text| array with consecutive spaces, 2779tabs, and carriage-returns replaced by single spaces. There will be no 2780spaces at the beginning or the end. (We set |mod_text[0]:=" "| to facilitate 2781this, since the |mod_lookup| routine uses |mod_text[1]| as the first 2782character of the name.) 2783 2784@<Set init...@>=mod_text[0]:=" "; 2785 2786@ @<Put module name...@>= 2787k:=0; 2788loop@+ begin if loc>limit then 2789 begin get_line; 2790 if input_has_ended then 2791 begin err_print('! Input ended in section name'); 2792@.Input ended in section name@> 2793 goto done; 2794 end; 2795 end; 2796 d:=buffer[loc]; 2797 @<If end of name, |goto done|@>; 2798 incr(loc); if k<longest_name-1 then incr(k); 2799 if (d=" ")or(d=tab_mark) then 2800 begin d:=" "; if mod_text[k-1]=" " then decr(k); 2801 end; 2802 mod_text[k]:=d; 2803 end; 2804done: @<Check for overlong name@>; 2805if (mod_text[k]=" ")and(k>0) then decr(k); 2806 2807@ @<If end of name,...@>= 2808if d="@@" then 2809 begin d:=buffer[loc+1]; 2810 if d=">" then 2811 begin loc:=loc+2; goto done; 2812 end; 2813 if (d=" ")or(d=tab_mark)or(d="*") then 2814 begin err_print('! Section name didn''t end'); goto done; 2815@.Section name didn't end@> 2816 end; 2817 incr(k); mod_text[k]:="@@"; incr(loc); {now |d=buffer[loc]| again} 2818 end 2819 2820@ @<Check for overlong name@>= 2821if k>=longest_name-2 then 2822 begin print_nl('! Section name too long: '); 2823@.Section name too long@> 2824 for j:=1 to 25 do print(xchr[mod_text[j]]); 2825 print('...'); mark_harmless; 2826 end 2827 2828@* Scanning a numeric definition. 2829When \.{TANGLE} looks at the \PASCAL\ text following the `\.=' of a numeric 2830macro definition, it calls on the precedure |scan_numeric(p)|, where |p| 2831points to the name that is to be defined. This procedure evaluates the 2832right-hand side, which must consist entirely of integer constants and 2833defined numeric macros connected with \.+ and \.- signs (no parentheses). 2834It also sets the global variable |next_control| to the control code that 2835terminated this definition. 2836 2837A definition ends with the control codes |definition|, |format|, |module_name|, 2838|begin_Pascal|, and |new_module|, all of which can be recognized 2839by the fact that they are the largest values |get_next| can return. 2840 2841@d end_of_definition(#)==(#>=format) 2842 {is |#| a control code ending a definition?} 2843 2844@<Global...@>= 2845@!next_control:eight_bits; {control code waiting to be acted upon} 2846 2847@ The evaluation of a numeric expression makes use of two variables called the 2848|accumulator| and the |next_sign|. At the beginning, |accumulator| is zero and 2849|next_sign| is $+1$. When a \.+ or \.- is scanned, |next_sign| is multiplied 2850by the value of that sign. When a numeric value is scanned, it is multiplied by 2851|next_sign| and added to the |accumulator|, then |next_sign| is reset to $+1$. 2852 2853@d add_in(#)==begin accumulator:=accumulator+next_sign*(#); next_sign:=+1; 2854 end 2855 2856@p procedure scan_numeric(@!p:name_pointer); {defines numeric macros} 2857label reswitch, done; 2858var accumulator:integer; {accumulates sums} 2859@!next_sign:-1..+1; {sign to attach to next value} 2860@!q:name_pointer; {points to identifiers being evaluated} 2861@!val:integer; {constants being evaluated} 2862begin @<Set \(|accumulator| to the value of the right-hand side@>; 2863if abs(accumulator)>=@'100000 then 2864 begin err_print('! Value too big: ',accumulator:1); accumulator:=0; 2865@.Value too big@> 2866 end; 2867equiv[p]:=accumulator+@'100000; {name |p| now is defined to equal |accumulator|} 2868end; 2869 2870@ @<Set \(|accumulator| to the value of the right-hand side@>= 2871accumulator:=0; next_sign:=+1; 2872loop@+ begin next_control:=get_next; 2873 reswitch: case next_control of 2874 digits: begin @<Set |val| to value of decimal constant, and 2875 set |next_control| to the following token@>; 2876 add_in(val); goto reswitch; 2877 end; 2878 octal: begin @<Set |val| to value of octal constant, and 2879 set |next_control| to the following token@>; 2880 add_in(val); goto reswitch; 2881 end; 2882 hex: begin @<Set |val| to value of hexadecimal constant, and 2883 set |next_control| to the following token@>; 2884 add_in(val); goto reswitch; 2885 end; 2886 identifier: begin q:=id_lookup(normal); 2887 if ilk[q]<>numeric then 2888 begin next_control:="*"; goto reswitch; {leads to error} 2889 end; 2890 add_in(equiv[q]-@'100000); 2891 end; 2892 "+": do_nothing; 2893 "-": next_sign:=-next_sign; 2894 format, definition, module_name, begin_Pascal, new_module: goto done; 2895 ";": err_print('! Omit semicolon in numeric definition'); 2896@.Omit semicolon in numeric def...@> 2897 othercases @<Signal error, flush rest of the definition@> 2898 endcases; 2899 end; 2900done: 2901 2902@ @<Signal error, flush rest...@>= 2903begin err_print('! Improper numeric definition will be flushed'); 2904@.Improper numeric definition...@> 2905repeat next_control:=skip_ahead 2906until end_of_definition(next_control); 2907if next_control=module_name then 2908 begin {we want to scan the module name too} 2909 loc:=loc-2; next_control:=get_next; 2910 end; 2911accumulator:=0; goto done; 2912end 2913 2914@ @<Set |val| to value of decimal...@>= 2915val:=0; 2916repeat val:=10*val+next_control-"0"; next_control:=get_next; 2917until (next_control>"9")or(next_control<"0") 2918 2919@ @<Set |val| to value of octal...@>= 2920val:=0; next_control:="0"; 2921repeat val:=8*val+next_control-"0"; next_control:=get_next; 2922until (next_control>"7")or(next_control<"0") 2923 2924@ @<Set |val| to value of hex...@>= 2925val:=0; next_control:="0"; 2926repeat if next_control>="A" then next_control:=next_control+"0"+10-"A"; 2927val:=16*val+next_control-"0"; next_control:=get_next; 2928until (next_control>"F")or(next_control<"0")or@| 2929 ((next_control>"9")and(next_control<"A")) 2930 2931@* Scanning a macro definition. 2932The rules for generating the replacement texts corresponding to simple 2933macros, parametric macros, and \PASCAL\ texts of a module are almost 2934identical, so a single procedure is used for all three cases. The 2935differences are that 2936 2937\yskip\item{a)} The sign |#| denotes a parameter only when it appears 2938outside of strings in a parametric macro; otherwise it stands for the 2939ASCII character |#|. (This is not used in standard \PASCAL, but some 2940\PASCAL s allow, for example, `\.{/\#}' after a certain kind of file name.) 2941 2942\item{b)}Module names are not allowed in simple macros or parametric macros; 2943in fact, the appearance of a module name terminates such macros and denotes 2944the name of the current module. 2945 2946\item{c)}The symbols \.{@@d} and \.{@@f} and \.{@@p} are not allowed after 2947module names, while they terminate macro definitions. 2948 2949@ Therefore there is a procedure |scan_repl| whose parameter |t| specifies 2950either |simple| or |parametric| or |module_name|. After |scan_repl| has 2951acted, |cur_repl_text| will point to the replacement text just generated, and 2952|next_control| will contain the control code that terminated the activity. 2953 2954@<Globals...@>= 2955@!cur_repl_text:text_pointer; {replacement text formed by |scan_repl|} 2956 2957@ @p procedure scan_repl(@!t:eight_bits); {creates a replacement text} 2958label continue, done, found, reswitch; 2959var a:sixteen_bits; {the current token} 2960@!b:ASCII_code; {a character from the buffer} 2961@!bal:eight_bits; {left parentheses minus right parentheses} 2962begin bal:=0; 2963loop@+ begin continue: a:=get_next; 2964 case a of 2965 "(": incr(bal); 2966 ")": if bal=0 then err_print('! Extra )') 2967@.Extra )@> 2968 else decr(bal); 2969 "'": @<Copy a string from the buffer to |tok_mem|@>; 2970 "#": if t=parametric then a:=param; 2971 @t\4@>@<In cases that |a| is a non-ASCII token (|identifier|, 2972 |module_name|, etc.), either process it and change |a| to a byte 2973 that should be stored, or |goto continue| if |a| should be ignored, 2974 or |goto done| if |a| signals the end of this replacement text@>@; 2975 othercases do_nothing 2976 endcases;@/ 2977 app_repl(a); {store |a| in |tok_mem|} 2978 end; 2979done: next_control:=a; 2980@<Make sure the parentheses balance@>; 2981if text_ptr>max_texts-zz then overflow('text'); 2982cur_repl_text:=text_ptr; tok_start[text_ptr+zz]:=tok_ptr[z]; 2983incr(text_ptr); 2984if z=zz-1 then z:=0@+else incr(z); 2985end; 2986 2987@ @<Make sure the parentheses balance@>= 2988if bal>0 then 2989 begin if bal=1 then err_print('! Missing )') 2990 else err_print('! Missing ',bal:1,' )''s'); 2991@.Missing n )@> 2992 while bal>0 do 2993 begin app_repl(")"); decr(bal); 2994 end; 2995 end 2996 2997@ @<In cases that |a| is...@>= 2998identifier: begin a:=id_lookup(normal); app_repl((a div @'400)+@'200); 2999 a:=a mod @'400; 3000 end; 3001module_name: if t<>module_name then goto done 3002 else begin app_repl((cur_module div @'400)+@'250); 3003 a:=cur_module mod @'400; 3004 end; 3005verbatim: @<Copy verbatim string from the buffer to |tok_mem|@>; 3006definition, format, begin_Pascal: if t<>module_name then goto done 3007 else begin err_print('! @@',xchr[buffer[loc-1]], 3008@.\AT!p is ignored in Pascal text@> 3009@.\AT!d is ignored in Pascal text@> 3010@.\AT!f is ignored in Pascal text@> 3011 ' is ignored in Pascal text'); goto continue; 3012 end; 3013new_module: goto done; 3014 3015@ @<Copy a string...@>= 3016begin b:="'"; 3017loop@+ begin app_repl(b); 3018 if b="@@" then 3019 if buffer[loc]="@@" then incr(loc) {store only one \.{@@}} 3020 else err_print('! You should double @@ signs in strings'); 3021@.You should double \AT! signs@> 3022 if loc=limit then 3023 begin err_print('! String didn''t end'); 3024@.String didn't end@> 3025 buffer[loc]:="'"; buffer[loc+1]:=0; 3026 end; 3027 b:=buffer[loc]; incr(loc); 3028 if b="'" then 3029 begin if buffer[loc]<>"'" then goto found 3030 else begin incr(loc); app_repl("'"); 3031 end; 3032 end; 3033 end; 3034found: end {now |a| holds the final |"'"| that will be stored} 3035 3036@ @<Copy verbatim string...@>= 3037begin app_repl(verbatim); 3038buffer[limit+1]:="@@"; 3039reswitch: if buffer[loc]="@@" then 3040 begin if loc<limit then if buffer[loc+1]="@@" then 3041 begin app_repl("@@"); 3042 loc:=loc+2; 3043 goto reswitch; 3044 end; 3045 end 3046else begin app_repl(buffer[loc]); 3047 incr(loc); 3048 goto reswitch; 3049 end; 3050if loc>=limit then err_print('! Verbatim string didn''t end') 3051@.Verbatim string didn't end@> 3052else if buffer[loc+1]<>">" then 3053 err_print('! You should double @@ signs in verbatim strings'); 3054@.You should double \AT! signs@> 3055loc:=loc+2; 3056end {another |verbatim| byte will be stored, since |a=verbatim|} 3057 3058@ The following procedure is used to define a simple or parametric macro, 3059just after the `\.{==}' of its definition has been scanned. 3060 3061@p procedure define_macro(@!t:eight_bits); 3062var p:name_pointer; {the identifier being defined} 3063begin p:=id_lookup(t); scan_repl(t);@/ 3064equiv[p]:=cur_repl_text; text_link[cur_repl_text]:=0; 3065end; 3066 3067@* Scanning a module. 3068The |scan_module| procedure starts when `\.{@@\ }' or `\.{@@*}' has been 3069sensed in the input, and it proceeds until the end of that module. It 3070uses |module_count| to keep track of the current module number; with luck, 3071\.{WEAVE} and \.{TANGLE} will both assign the same numbers to modules. 3072 3073@<Globals...@>= 3074@!module_count:0..@'27777; {the current module number} 3075 3076@ The top level of |scan_module| is trivial. 3077@p procedure scan_module; 3078label continue, done, exit; 3079var p:name_pointer; {module name for the current module} 3080begin incr(module_count); 3081@<Scan the \(definition part of the current module@>; 3082@<Scan the \PASCAL\ part of the current module@>; 3083exit: end; 3084 3085@ @<Scan the \(definition part...@>= 3086next_control:=0; 3087loop@+ begin continue: while next_control<=format do 3088 begin next_control:=skip_ahead; 3089 if next_control=module_name then 3090 begin {we want to scan the module name too} 3091 loc:=loc-2; next_control:=get_next; 3092 end; 3093 end; 3094 if next_control<>definition then goto done; 3095 next_control:=get_next; {get identifier name} 3096 if next_control<>identifier then 3097 begin err_print('! Definition flushed, must start with ', 3098@.Definition flushed...@> 3099 'identifier of length > 1'); goto continue; 3100 end; 3101 next_control:=get_next; {get token after the identifier} 3102 if next_control="=" then 3103 begin scan_numeric(id_lookup(numeric)); goto continue; 3104 end 3105 else if next_control=equivalence_sign then 3106 begin define_macro(simple); goto continue; 3107 end 3108 else @<If the next text is `|(#)==|', call |define_macro| 3109 and |goto continue|@>; 3110 err_print('! Definition flushed since it starts badly'); 3111@.Definition flushed...@> 3112 end; 3113done: 3114 3115@ @<If the next text is `|(#)==|'...@>= 3116if next_control="(" then 3117 begin next_control:=get_next; 3118 if next_control="#" then 3119 begin next_control:=get_next; 3120 if next_control=")" then 3121 begin next_control:=get_next; 3122 if next_control="=" then 3123 begin err_print('! Use == for macros'); 3124@.Use == for macros@> 3125 next_control:=equivalence_sign; 3126 end; 3127 if next_control=equivalence_sign then 3128 begin define_macro(parametric); goto continue; 3129 end; 3130 end; 3131 end; 3132 end; 3133 3134@ @<Scan the \PASCAL...@>= 3135case next_control of 3136begin_Pascal:p:=0; 3137module_name: begin p:=cur_module; 3138 @<Check that |=| or |==| follows this module name, otherwise |return|@>; 3139 end; 3140othercases return 3141endcases;@/ 3142@<Insert the module number into |tok_mem|@>; 3143scan_repl(module_name); {now |cur_repl_text| points to the replacement text} 3144@<Update the data structure so that the replacement text is accessible@>; 3145 3146@ @<Check that |=|...@>= 3147repeat next_control:=get_next; 3148until next_control<>"+"; {allow optional `\.{+=}'} 3149if (next_control<>"=")and(next_control<>equivalence_sign) then 3150 begin err_print('! Pascal text flushed, = sign is missing'); 3151@.Pascal text flushed...@> 3152 repeat next_control:=skip_ahead; 3153 until next_control=new_module; 3154 return; 3155 end 3156 3157@ @<Insert the module number...@>= 3158store_two_bytes(@'150000+module_count); {|@'150000=@'320*@'400|} 3159 3160@ @<Update the data...@>= 3161if p=0 then {unnamed module} 3162 begin text_link[last_unnamed]:=cur_repl_text; last_unnamed:=cur_repl_text; 3163 end 3164else if equiv[p]=0 then equiv[p]:=cur_repl_text {first module of this name} 3165else begin p:=equiv[p]; 3166 while text_link[p]<module_flag do p:=text_link[p]; {find end of list} 3167 text_link[p]:=cur_repl_text; 3168 end; 3169text_link[cur_repl_text]:=module_flag; 3170 {mark this replacement text as a nonmacro} 3171 3172@* Debugging. 3173The \PASCAL\ debugger with which \.{TANGLE} was developed allows breakpoints 3174to be set, and variables can be read and changed, but procedures cannot be 3175executed. Therefore a `|debug_help|' procedure has been inserted in the main 3176loops of each phase of the program; when |ddt| and |dd| are set to appropriate 3177values, symbolic printouts of various tables will appear. 3178 3179The idea is to set a breakpoint inside the |debug_help| routine, at the 3180place of `\ignorespaces|breakpoint:|\unskip' below. Then when 3181|debug_help| is to be activated, set |trouble_shooting| equal to |true|. 3182The |debug_help| routine will prompt you for values of |ddt| and |dd|, 3183discontinuing this when |ddt<=0|; thus you type $2n+1$ integers, ending 3184with zero or a negative number. Then control either passes to the 3185breakpoint, allowing you to look at and/or change variables (if you typed 3186zero), or to exit the routine (if you typed a negative value). 3187 3188Another global variable, |debug_cycle|, can be used to skip silently 3189past calls on |debug_help|. If you set |debug_cycle>1|, the program stops 3190only every |debug_cycle| times |debug_help| is called; however, 3191any error stop will set |debug_cycle| to zero. 3192 3193@<Globals...@>= 3194@!debug@!trouble_shooting:boolean; {is |debug_help| wanted?} 3195@!ddt:integer; {operation code for the |debug_help| routine} 3196@!dd:integer; {operand in procedures performed by |debug_help|} 3197@!debug_cycle:integer; {threshold for |debug_help| stopping} 3198@!debug_skipped:integer; {we have skipped this many |debug_help| calls} 3199@!term_in:text_file; {the user's terminal as an input file} 3200gubed 3201 3202@ The debugging routine needs to read from the user's terminal. 3203@^system dependencies@> 3204@<Set init...@>= 3205@!debug trouble_shooting:=true; debug_cycle:=1; debug_skipped:=0;@/ 3206trouble_shooting:=false; debug_cycle:=99999; {use these when it almost works} 3207reset(term_in,'TTY:','/I'); {open |term_in| as the terminal, don't do a |get|} 3208gubed 3209 3210@ @d breakpoint=888 {place where a breakpoint is desirable} 3211@^system dependencies@> 3212 3213@p @!debug procedure debug_help; {routine to display various things} 3214label breakpoint,exit; 3215var k:integer; {index into various arrays} 3216begin incr(debug_skipped); 3217if debug_skipped<debug_cycle then return; 3218debug_skipped:=0; 3219loop@+ begin print_nl('#'); update_terminal; {prompt} 3220 read(term_in,ddt); {read a debug-command code} 3221 if ddt<0 then return 3222 else if ddt=0 then 3223 begin goto breakpoint;@\ {go to every label at least once} 3224 breakpoint: ddt:=0;@\ 3225 end 3226 else begin read(term_in,dd); 3227 case ddt of 3228 1: print_id(dd); 3229 2: print_repl(dd); 3230 3: for k:=1 to dd do print(xchr[buffer[k]]); 3231 4: for k:=1 to dd do print(xchr[mod_text[k]]); 3232 5: for k:=1 to out_ptr do print(xchr[out_buf[k]]); 3233 6: for k:=1 to dd do print(xchr[out_contrib[k]]); 3234 othercases print('?') 3235 endcases; 3236 end; 3237 end; 3238exit:end; 3239gubed 3240 3241@* The main program. 3242We have defined plenty of procedures, and it is time to put the last 3243pieces of the puzzle in place. Here is where \.{TANGLE} starts, and where 3244it ends. 3245@^system dependencies@> 3246 3247@p begin initialize; 3248@<Initialize the input system@>; 3249print_ln(banner); {print a ``banner line''} 3250@<Phase I: Read all the user's text and compress it into |tok_mem|@>; 3251stat for ii:=0 to zz-1 do max_tok_ptr[ii]:=tok_ptr[ii];@+tats@;@/ 3252@<Phase II:...@>; 3253end_of_TANGLE: 3254if string_ptr>256 then @<Finish off the string pool file@>; 3255stat @<Print statistics about memory usage@>;@+tats@;@/ 3256@t\4\4@>{here files should be closed if the operating system requires it} 3257@<Print the job |history|@>; 3258end. 3259 3260@ @<Phase I:...@>= 3261phase_one:=true; 3262module_count:=0; 3263repeat next_control:=skip_ahead; 3264until next_control=new_module; 3265while not input_has_ended do scan_module; 3266@<Check that all changes have been read@>; 3267phase_one:=false; 3268 3269@ @<Finish off the string pool file@>= 3270begin print_nl(string_ptr-256:1, ' strings written to string pool file.'); 3271write(pool,'*'); 3272for ii:=1 to 9 do 3273 begin out_buf[ii]:=pool_check_sum mod 10; 3274 pool_check_sum:=pool_check_sum div 10; 3275 end; 3276for ii:=9 downto 1 do write(pool,xchr["0"+out_buf[ii]]); 3277write_ln(pool); 3278end 3279 3280@ @<Glob...@>= 3281stat @!wo:0..ww-1; {segment of memory for which statistics are being printed} 3282tats 3283 3284@ @<Print statistics about memory usage@>= 3285print_nl('Memory usage statistics:'); 3286print_nl(name_ptr:1, ' names, ', text_ptr:1, ' replacement texts;'); 3287print_nl(byte_ptr[0]:1); 3288for wo:=1 to ww-1 do print('+',byte_ptr[wo]:1); 3289if phase_one then 3290 for ii:=0 to zz-1 do max_tok_ptr[ii]:=tok_ptr[ii]; 3291print(' bytes, ', max_tok_ptr[0]:1); 3292for ii:=1 to zz-1 do print('+',max_tok_ptr[ii]:1); 3293print(' tokens.'); 3294 3295@ Some implementations may wish to pass the |history| value to the 3296operating system so that it can be used to govern whether or not other 3297programs are started. Here we simply report the history to the user. 3298@^system dependencies@> 3299 3300@<Print the job |history|@>= 3301case history of 3302spotless: print_nl('(No errors were found.)'); 3303harmless_message: print_nl('(Did you see the warning message above?)'); 3304error_message: print_nl('(Pardon me, but I think I spotted something wrong.)'); 3305fatal_message: print_nl('(That was a fatal error, my friend.)'); 3306end {there are no other cases} 3307 3308@* System-dependent changes. 3309This module should be replaced, if necessary, by changes to the program 3310that are necessary to make \.{TANGLE} work at a particular installation. 3311It is usually best to design your change file so that all changes to 3312previous modules preserve the module numbering; then everybody's version 3313will be consistent with the printed program. More extensive changes, 3314which introduce new modules, can be inserted here; then only the index 3315itself will get a new module number. 3316@^system dependencies@> 3317 3318@* Index. 3319Here is a cross-reference table for the \.{TANGLE} processor. 3320All modules in which an identifier is 3321used are listed with that identifier, except that reserved words are 3322indexed only when they appear in format definitions, and the appearances 3323of identifiers in module names are not indexed. Underlined entries 3324correspond to where the identifier was declared. Error messages and 3325a few other things like ``ASCII code'' are indexed here too. 3326