1% otangle.web: Omega version of file tangle.web 2% 3% This file is part of the Omega project, which 4% is based in the web2c distribution of TeX. 5% 6% Copyright (c) 1994--1998 John Plaice and Yannis Haralambous 7% applies only to the changes to the original tangle.web. 8% 9% This program by D. E. Knuth is not copyrighted and can be used freely. 10% Version 0 was released in December, 1981. 11% Version 1 was released in September, 1982, with version 0 of TeX. 12% Slight changes were made in October, 1982, for version 0.6 of TeX. 13% Version 1.2 introduced {:nnn} comments, added @@= and @@\ (December, 1982). 14% Version 1.4 added "history" (February, 1983). 15% Version 1.5 conformed to TeX version 0.96 and fixed @@\ (March, 1983). 16% Version 1.7 introduced the new change file format (June, 1983). 17% Version 2.0 was released in July, 1983, with version 0.999 of TeX. 18% Version 2.5 was released in November, 1983, with version 1.0 of TeX. 19% Version 2.6 fixed a bug: force-line-break after a constant (August, 1984). 20% Version 2.7 fixed the definition of check_sum_prime (May, 1985). 21% Version 2.8 fixed a bug in change_buffer movement (August, 1985). 22% Version 2.9 allows nonnumeric macros before their def (December, 1988). 23% Version 3, for Sewell's book, fixed long-line bug in input_ln (March, 1989). 24% Version 4 was major change to allow 8-bit input (September, 1989). 25% Version 4.1 conforms to ANSI standard for-loop rules (September, 1990). 26% Version 4.2 fixes stat report if phase one dies (March, 1991). 27% Version 4.3 fixes @@ bug in verbatim, catches extra } (September, 1991). 28% Version 4.4 activates debug_help on errors as advertised (February, 1993). 29 30% Here is TeX material that gets inserted after \input webmac 31\def\hang{\hangindent 3em\indent\ignorespaces} 32\font\ninerm=cmr9 33\let\mc=\ninerm % medium caps for names like SAIL 34\def\PASCAL{Pascal} 35\def\pb{$\.|\ldots\.|$} % Pascal brackets (|...|) 36\def\v{\.{\char'174}} % vertical (|) in typewriter font 37\mathchardef\BA="3224 % double arrow 38\def\({} % kludge for alphabetizing certain module names 39 40\def\title{OTANGLE} 41\def\contentspagenumber{123} % should be odd 42\def\topofcontents{\null\vfill 43 \titlefalse % include headline on the contents page 44 \def\rheader{\mainfont Appendix E\hfil \contentspagenumber} 45 \centerline{\titlefont The {\ttitlefont OTANGLE} processor} 46 \vskip 15pt 47 \centerline{(Version 4.4)} 48 \vfill} 49\pageno=\contentspagenumber \advance\pageno by 1 50 51@* Introduction. 52This program converts a \.{WEB} file to a \PASCAL\ file. It was written 53by D. E. Knuth in September, 1981; a somewhat similar {\mc SAIL} program had 54been developed in March, 1979. Since this program describes itself, a 55bootstrapping process involving hand-translation had to be used to get started. 56 57For large \.{WEB} files one should have a large memory, since \.{TANGLE} keeps 58all the \PASCAL\ text in memory (in an abbreviated form). The program uses 59a few features of the local \PASCAL\ compiler that may need to be changed in 60other installations: 61 62\yskip\item{1)} Case statements have a default. 63\item{2)} Input-output routines may need to be adapted for use with a particular 64character set and/or for printing messages on the user's terminal. 65 66\yskip\noindent 67These features are also present in the \PASCAL\ version of \TeX, where they 68are used in a similar (but more complex) way. System-dependent portions 69of \.{TANGLE} can be identified by looking at the entries for `system 70dependencies' in the index below. 71@!@^system dependencies@> 72 73The ``banner line'' defined here should be changed whenever \.{TANGLE} 74is modified. 75 76@d banner=='This is OTANGLE, Version 4.4' 77 78@ The program begins with a fairly normal header, made up of pieces that 79@^system dependencies@> 80will mostly be filled in later. The \.{WEB} input comes from files |web_file| 81and |change_file|, the \PASCAL\ output goes to file |Pascal_file|, 82and the string pool output goes to file |pool|. 83 84If it is necessary to abort the job because of a fatal error, the program 85calls the `|jump_out|' procedure, which goes to the label |end_of_TANGLE|. 86 87@d end_of_TANGLE = 9999 {go here to wrap it up} 88 89@p @t\4@>@<Compiler directives@>@/ 90program OTANGLE(@!web_file,@!change_file,@!Pascal_file,@!pool); 91label end_of_TANGLE; {go here to finish} 92const @<Constants in the outer block@>@/ 93type @<Types in the outer block@>@/ 94var @<Globals in the outer block@>@/ 95@<Error handling procedures@>@/ 96procedure initialize; 97 var @<Local variables for initialization@>@/ 98 begin @<Set initial values@>@/ 99 end; 100 101@ Some of this code is optional for use when debugging only; 102such material is enclosed between the delimiters |debug| and $|gubed|$. 103Other parts, delimited by |stat| and $|tats|$, are optionally included if 104statistics about \.{TANGLE}'s memory usage are desired. 105 106@d debug==@{ {change this to `$\\{debug}\equiv\null$' when debugging} 107@d gubed==@t@>@} {change this to `$\\{gubed}\equiv\null$' when debugging} 108@f debug==begin 109@f gubed==end 110@# 111@d stat==@{ {change this to `$\\{stat}\equiv\null$' 112 when gathering usage statistics} 113@d tats==@t@>@} {change this to `$\\{tats}\equiv\null$' 114 when gathering usage statistics} 115@f stat==begin 116@f tats==end 117 118@ The \PASCAL\ compiler used to develop this system has ``compiler 119directives'' that can appear in comments whose first character is a dollar sign. 120In production versions of \.{TANGLE} these directives tell the compiler that 121@^system dependencies@> 122it is safe to avoid range checks and to leave out the extra code it inserts 123for the \PASCAL\ debugger's benefit, although interrupts will occur if 124there is arithmetic overflow. 125 126@<Compiler directives@>= 127@{@&$C-,A+,D-@} {no range check, catch arithmetic overflow, no debug overhead} 128@!debug @{@&$C+,D+@}@+ gubed {but turn everything on when debugging} 129 130@ Labels are given symbolic names by the following definitions. We insert 131the label `|exit|:' just before the `\ignorespaces|end|\unskip' of a 132procedure in which we have used the `|return|' statement defined below; 133the label `|restart|' is occasionally used at the very beginning of a 134procedure; and the label `|reswitch|' is occasionally used just prior to 135a \&{case} statement in which some cases change the conditions and we wish to 136branch to the newly applicable case. 137Loops that are set up with the \&{loop} construction defined below are 138commonly exited by going to `|done|' or to `|found|' or to `|not_found|', 139and they are sometimes repeated by going to `|continue|'. 140 141@d exit=10 {go here to leave a procedure} 142@d restart=20 {go here to start a procedure again} 143@d reswitch=21 {go here to start a case statement again} 144@d continue=22 {go here to resume a loop} 145@d done=30 {go here to exit a loop} 146@d found=31 {go here when you've found it} 147@d not_found=32 {go here when you've found something else} 148 149@ Here are some macros for common programming idioms. 150 151@d incr(#) == #:=#+1 {increase a variable by unity} 152@d decr(#) == #:=#-1 {decrease a variable by unity} 153@d loop == @+ while true do@+ {repeat over and over until a |goto| happens} 154@d do_nothing == {empty statement} 155@d return == goto exit {terminate a procedure call} 156@f return == nil 157@f loop == xclause 158 159@ We assume that |case| statements may include a default case that applies 160if no matching label is found. Thus, we shall use constructions like 161@^system dependencies@> 162$$\vbox{\halign{#\hfil\cr 163|case x of|\cr 1641: $\langle\,$code for $x=1\,\rangle$;\cr 1653: $\langle\,$code for $x=3\,\rangle$;\cr 166|othercases| $\langle\,$code for |x<>1| and |x<>3|$\,\rangle$\cr 167|endcases|\cr}}$$ 168since most \PASCAL\ compilers have plugged this hole in the language by 169incorporating some sort of default mechanism. For example, the compiler 170used to develop \.{WEB} and \TeX\ allows `|others|:' as a default label, 171and other \PASCAL s allow syntaxes like `\ignorespaces|else|\unskip' or 172`\&{otherwise}' or `\\{otherwise}:', etc. The definitions of |othercases| 173and |endcases| should be changed to agree with local conventions. (Of 174course, if no default mechanism is available, the |case| statements of 175this program must be extended by listing all remaining cases. The author 176would have taken the trouble to modify \.{TANGLE} so that such extensions 177were done automatically, if he had not wanted to encourage \PASCAL\ 178compiler writers to make this important change in \PASCAL, where it belongs.) 179 180@d othercases == others: {default for cases not listed explicitly} 181@d endcases == @+end {follows the default case in an extended |case| statement} 182@f othercases == else 183@f endcases == end 184 185@ The following parameters are set big enough to handle \TeX, so they 186should be sufficient for most applications of \.{TANGLE}. 187 188@<Constants...@>= 189@!buf_size=100; {maximum length of input line} 190@!max_bytes=65535; {|1/ww| times the number of bytes in identifiers, 191 strings, and module names; must be less than 65536} 192@!max_toks=65535; {|1/zz| times the number of bytes in compressed \PASCAL\ code; 193 must be less than 65536} 194@!max_names=10239; {number of identifiers, strings, module names; 195 must be less than 10240} 196@!max_texts=10239; {number of replacement texts, must be less than 10240} 197@!hash_size=353; {should be prime} 198@!longest_name=400; {module names shouldn't be longer than this} 199@!line_length=72; {lines of \PASCAL\ output have at most this many characters} 200@!out_buf_size=144; {length of output buffer, should be twice |line_length|} 201@!stack_size=50; {number of simultaneous levels of macro expansion} 202@!max_id_length=12; {long identifiers are chopped to this length, which must 203 not exceed |line_length|} 204@!unambig_length=7; {identifiers must be unique if chopped to this length} 205 {note that 7 is more strict than \PASCAL's 8, but this can be varied} 206 207@ A global variable called |history| will contain one of four values 208at the end of every run: |spotless| means that no unusual messages were 209printed; |harmless_message| means that a message of possible interest 210was printed but no serious errors were detected; |error_message| means that 211at least one error was found; |fatal_message| means that the program 212terminated abnormally. The value of |history| does not influence the 213behavior of the program; it is simply computed for the convenience 214of systems that might want to use such information. 215 216@d spotless=0 {|history| value for normal jobs} 217@d harmless_message=1 {|history| value when non-serious info was printed} 218@d error_message=2 {|history| value when an error was noted} 219@d fatal_message=3 {|history| value when we had to stop prematurely} 220@# 221@d mark_harmless==@t@>@+if history=spotless then history:=harmless_message 222@d mark_error==history:=error_message 223@d mark_fatal==history:=fatal_message 224@d biggest_char=65535 225@d number_chars=65536 226 227@<Glob...@>=@!history:spotless..fatal_message; {how bad was this run?} 228 229@ @<Set init...@>=history:=spotless; 230 231@* The character set. 232One of the main goals in the design of \.{WEB} has been to make it readily 233portable between a wide variety of computers. Yet \.{WEB} by its very 234nature must use a greater variety of characters than most computer 235programs deal with, and character encoding is one of the areas in which 236existing machines differ most widely from each other. 237 238To resolve this problem, all input to \.{WEAVE} and \.{TANGLE} is converted 239to an internal eight-bit code that is essentially standard ASCII, the ``American 240Standard Code for Information Interchange.'' The conversion is done 241immediately when each character is read in. Conversely, characters are 242converted from ASCII to the user's external representation just before 243they are output. (The original ASCII code was seven bits only; \.{WEB} now 244allows eight bits in an attempt to keep up with modern times.) 245 246Such an internal code is relevant to users of \.{WEB} only because it is 247the code used for preprocessed constants like \.{"A"}. If you are writing 248a program in \.{WEB} that makes use of such one-character constants, you 249should convert your input to ASCII form, like \.{WEAVE} and \.{TANGLE} do. 250Otherwise \.{WEB}'s internal coding scheme does not affect you. 251@^ASCII code@> 252 253Here is a table of the standard visible ASCII codes: 254$$\def\:{\char\count255\global\advance\count255 by 1} 255\count255='40 256\vbox{ 257\hbox{\hbox to 40pt{\it\hfill0\/\hfill}% 258\hbox to 40pt{\it\hfill1\/\hfill}% 259\hbox to 40pt{\it\hfill2\/\hfill}% 260\hbox to 40pt{\it\hfill3\/\hfill}% 261\hbox to 40pt{\it\hfill4\/\hfill}% 262\hbox to 40pt{\it\hfill5\/\hfill}% 263\hbox to 40pt{\it\hfill6\/\hfill}% 264\hbox to 40pt{\it\hfill7\/\hfill}} 265\vskip 4pt 266\hrule 267\def\^{\vrule height 10.5pt depth 4.5pt} 268\halign{\hbox to 0pt{\hskip -24pt\O{#0}\hfill}&\^ 269\hbox to 40pt{\tt\hfill#\hfill\^}& 270&\hbox to 40pt{\tt\hfill#\hfill\^}\cr 27104&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 27205&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 27306&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 27407&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 27510&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 27611&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 27712&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 27813&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 27914&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 28015&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 28116&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 28217&\:&\:&\:&\:&\:&\:&\:\cr} 283\hrule width 280pt}$$ 284(Actually, of course, code @'040 is an invisible blank space.) Code @'136 285was once an upward arrow (\.{\char'13}), and code @'137 was 286once a left arrow (\.^^X), in olden times when the first draft 287of ASCII code was prepared; but \.{WEB} works with today's standard 288ASCII in which those codes represent circumflex and underline as shown. 289 290@<Types...@>= 291@!ASCII_code=0..255; {eight-bit numbers, a subrange of the integers} 292 293@ The original \PASCAL\ compiler was designed in the late 60s, when six-bit 294character sets were common, so it did not make provision for lowercase 295letters. Nowadays, of course, we need to deal with both capital and small 296letters in a convenient way, so \.{WEB} assumes that it is being used 297with a \PASCAL\ whose character set contains at least the characters of 298standard ASCII as listed above. Some \PASCAL\ compilers use the original 299name |char| for the data type associated with the characters in text files, 300while other \PASCAL s consider |char| to be a 64-element subrange of a larger 301data type that has some other name. 302 303In order to accommodate this difference, we shall use the name |text_char| 304to stand for the data type of the characters in the input and output 305files. We shall also assume that |text_char| consists of the elements 306|chr(first_text_char)| through |chr(last_text_char)|, inclusive. The 307following definitions should be adjusted if necessary. 308@^system dependencies@> 309 310@d text_char == char {the data type of characters in text files} 311@d first_text_char=0 {ordinal number of the smallest element of |text_char|} 312@d last_text_char=255 {ordinal number of the largest element of |text_char|} 313 314@<Types...@>= 315@!text_file=packed file of text_char; 316 317@ The \.{WEAVE} and \.{TANGLE} processors convert between ASCII code and 318the user's external character set by means of arrays |xord| and |xchr| 319that are analogous to \PASCAL's |ord| and |chr| functions. 320 321@<Globals...@>= 322@!xord: array [text_char] of ASCII_code; 323 {specifies conversion of input characters} 324@!xchr: array [ASCII_code] of text_char; 325 {specifies conversion of output characters} 326 327@ If we assume that every system using \.{WEB} is able to read and write the 328visible characters of standard ASCII (although not necessarily using the 329ASCII codes to represent them), the following assignment statements initialize 330most of the |xchr| array properly, without needing any system-dependent 331changes. For example, the statement \.{xchr[@@\'101]:=\'A\'} that appears 332in the present \.{WEB} file might be encoded in, say, {\mc EBCDIC} code 333on the external medium on which it resides, but \.{TANGLE} will convert from 334this external code to ASCII and back again. Therefore the assignment 335statement \.{XCHR[65]:=\'A\'} will appear in the corresponding \PASCAL\ file, 336and \PASCAL\ will compile this statement so that |xchr[65]| receives the 337character \.A in the external (|char|) code. Note that it would be quite 338incorrect to say \.{xchr[@@\'101]:="A"}, because |"A"| is a constant of 339type |integer|, not |char|, and because we have $|"A"|=65$ regardless of 340the external character set. 341 342@<Set init...@>= 343xchr[@'40]:=' '; 344xchr[@'41]:='!'; 345xchr[@'42]:='"'; 346xchr[@'43]:='#'; 347xchr[@'44]:='$'; 348xchr[@'45]:='%'; 349xchr[@'46]:='&'; 350xchr[@'47]:='''';@/ 351xchr[@'50]:='('; 352xchr[@'51]:=')'; 353xchr[@'52]:='*'; 354xchr[@'53]:='+'; 355xchr[@'54]:=','; 356xchr[@'55]:='-'; 357xchr[@'56]:='.'; 358xchr[@'57]:='/';@/ 359xchr[@'60]:='0'; 360xchr[@'61]:='1'; 361xchr[@'62]:='2'; 362xchr[@'63]:='3'; 363xchr[@'64]:='4'; 364xchr[@'65]:='5'; 365xchr[@'66]:='6'; 366xchr[@'67]:='7';@/ 367xchr[@'70]:='8'; 368xchr[@'71]:='9'; 369xchr[@'72]:=':'; 370xchr[@'73]:=';'; 371xchr[@'74]:='<'; 372xchr[@'75]:='='; 373xchr[@'76]:='>'; 374xchr[@'77]:='?';@/ 375xchr[@'100]:='@@'; 376xchr[@'101]:='A'; 377xchr[@'102]:='B'; 378xchr[@'103]:='C'; 379xchr[@'104]:='D'; 380xchr[@'105]:='E'; 381xchr[@'106]:='F'; 382xchr[@'107]:='G';@/ 383xchr[@'110]:='H'; 384xchr[@'111]:='I'; 385xchr[@'112]:='J'; 386xchr[@'113]:='K'; 387xchr[@'114]:='L'; 388xchr[@'115]:='M'; 389xchr[@'116]:='N'; 390xchr[@'117]:='O';@/ 391xchr[@'120]:='P'; 392xchr[@'121]:='Q'; 393xchr[@'122]:='R'; 394xchr[@'123]:='S'; 395xchr[@'124]:='T'; 396xchr[@'125]:='U'; 397xchr[@'126]:='V'; 398xchr[@'127]:='W';@/ 399xchr[@'130]:='X'; 400xchr[@'131]:='Y'; 401xchr[@'132]:='Z'; 402xchr[@'133]:='['; 403xchr[@'134]:='\'; 404xchr[@'135]:=']'; 405xchr[@'136]:='^'; 406xchr[@'137]:='_';@/ 407xchr[@'140]:='`'; 408xchr[@'141]:='a'; 409xchr[@'142]:='b'; 410xchr[@'143]:='c'; 411xchr[@'144]:='d'; 412xchr[@'145]:='e'; 413xchr[@'146]:='f'; 414xchr[@'147]:='g';@/ 415xchr[@'150]:='h'; 416xchr[@'151]:='i'; 417xchr[@'152]:='j'; 418xchr[@'153]:='k'; 419xchr[@'154]:='l'; 420xchr[@'155]:='m'; 421xchr[@'156]:='n'; 422xchr[@'157]:='o';@/ 423xchr[@'160]:='p'; 424xchr[@'161]:='q'; 425xchr[@'162]:='r'; 426xchr[@'163]:='s'; 427xchr[@'164]:='t'; 428xchr[@'165]:='u'; 429xchr[@'166]:='v'; 430xchr[@'167]:='w';@/ 431xchr[@'170]:='x'; 432xchr[@'171]:='y'; 433xchr[@'172]:='z'; 434xchr[@'173]:='{'; 435xchr[@'174]:='|'; 436xchr[@'175]:='}'; 437xchr[@'176]:='~';@/ 438xchr[0]:=' '; xchr[@'177]:=' '; {these ASCII codes are not used} 439 440@ Some of the ASCII codes below @'40 have been given symbolic names in 441\.{WEAVE} and \.{TANGLE} because they are used with a special meaning. 442 443@d and_sign=@'4 {equivalent to `\.{and}'} 444@d not_sign=@'5 {equivalent to `\.{not}'} 445@d set_element_sign=@'6 {equivalent to `\.{in}'} 446@d tab_mark=@'11 {ASCII code used as tab-skip} 447@d line_feed=@'12 {ASCII code thrown away at end of line} 448@d form_feed=@'14 {ASCII code used at end of page} 449@d carriage_return=@'15 {ASCII code used at end of line} 450@d left_arrow=@'30 {equivalent to `\.{:=}'} 451@d not_equal=@'32 {equivalent to `\.{<>}'} 452@d less_or_equal=@'34 {equivalent to `\.{<=}'} 453@d greater_or_equal=@'35 {equivalent to `\.{>=}'} 454@d equivalence_sign=@'36 {equivalent to `\.{==}'} 455@d or_sign=@'37 {equivalent to `\.{or}'} 456 457@ When we initialize the |xord| array and the remaining parts of |xchr|, 458it will be convenient to make use of an index variable, |i|. 459 460@<Local variables for init...@>= 461@!i:0..255; 462 463@ Here now is the system-dependent part of the character set. 464If \.{WEB} is being implemented on a garden-variety \PASCAL\ for which 465only standard ASCII codes will appear in the input and output files, you 466don't need to make any changes here. But if you have, for example, an extended 467character set like the one in Appendix~C of {\sl The \TeX book}, the first 468line of code in this module should be changed to 469$$\hbox{|for i:=1 to @'37 do xchr[i]:=chr(i);|}$$ 470\.{WEB}'s character set is essentially identical to \TeX's, even with respect to 471characters less than @'40. 472@^system dependencies@> 473 474Changes to the present module will make \.{WEB} more friendly on computers 475that have an extended character set, so that one can type things like 476\.^^Z\ instead of \.{<>}. If you have an extended set of characters that 477are easily incorporated into text files, you can assign codes arbitrarily 478here, giving an |xchr| equivalent to whatever characters the users of 479\.{WEB} are allowed to have in their input files, provided that unsuitable 480characters do not correspond to special codes like |carriage_return| 481that are listed above. 482 483(The present file \.{TANGLE.WEB} does not contain any of the non-ASCII 484characters, because it is intended to be used with all implementations of 485\.{WEB}. It was originally created on a Stanford system that has a 486convenient extended character set, then ``sanitized'' by applying another 487program that transliterated all of the non-standard characters into 488standard equivalents.) 489 490@<Set init...@>= 491for i:=1 to @'37 do xchr[i]:=' '; 492for i:=@'200 to @'377 do xchr[i]:=' '; 493 494@ The following system-independent code makes the |xord| array contain a 495suitable inverse to the information in |xchr|. 496 497@<Set init...@>= 498for i:=first_text_char to last_text_char do xord[chr(i)]:=" "; 499for i:=1 to @'377 do xord[xchr[i]]:=i; 500xord[' ']:=" "; 501 502@* Input and output. 503The input conventions of this program are intended to be very much like those 504of \TeX\ (except, of course, that they are much simpler, because much less 505needs to be done). Furthermore they are identical to those of \.{WEAVE}. 506Therefore people who need to make modifications to all three systems 507should be able to do so without too many headaches. 508 509We use the standard \PASCAL\ input/output procedures in several places that 510\TeX\ cannot, since \.{TANGLE} does not have to deal with files that are named 511dynamically by the user, and since there is no input from the terminal. 512 513@ Terminal output is done by writing on file |term_out|, which is assumed to 514consist of characters of type |text_char|: 515@^system dependencies@> 516 517@d print(#)==write(term_out,#) {`|print|' means write on the terminal} 518@d print_ln(#)==write_ln(term_out,#) {`|print|' and then start new line} 519@d new_line==write_ln(term_out) {start new line} 520@d print_nl(#)== {print information starting on a new line} 521 begin new_line; print(#); 522 end 523 524@<Globals...@>= 525@!term_out:text_file; {the terminal as an output file} 526 527@ Different systems have different ways of specifying that the output on a 528certain file will appear on the user's terminal. Here is one way to do this 529on the \PASCAL\ system that was used in \.{TANGLE}'s initial development: 530@^system dependencies@> 531 532@<Set init...@>= 533rewrite(term_out,'TTY:'); {send |term_out| output to the terminal} 534 535@ The |update_terminal| procedure is called when we want 536to make sure that everything we have output to the terminal so far has 537actually left the computer's internal buffers and been sent. 538@^system dependencies@> 539 540@d update_terminal == break(term_out) {empty the terminal output buffer} 541 542@ The main input comes from |web_file|; this input may be overridden 543by changes in |change_file|. (If |change_file| is empty, there are no changes.) 544 545@<Globals...@>= 546@!web_file:text_file; {primary input} 547@!change_file:text_file; {updates} 548 549@ The following code opens the input files. Since these files were listed 550in the program header, we assume that the \PASCAL\ runtime system has 551already checked that suitable file names have been given; therefore no 552additional error checking needs to be done. 553@^system dependencies@> 554 555@p procedure open_input; {prepare to read |web_file| and |change_file|} 556begin reset(web_file); reset(change_file); 557end; 558 559@ The main output goes to |Pascal_file|, and string pool constants are 560written to the |pool| file. 561 562@<Globals...@>= 563@!Pascal_file: text_file; 564@!pool: text_file; 565 566@ The following code opens |Pascal_file| and |pool|. 567Since these files were listed in the program header, we assume that the 568\PASCAL\ runtime system has checked that suitable external file names have 569been given. 570@^system dependencies@> 571 572@<Set init...@>= 573rewrite(Pascal_file); rewrite(pool); 574 575@ Input goes into an array called |buffer|. 576 577@<Globals...@>=@!buffer: array[0..buf_size] of ASCII_code; 578 579@ The |input_ln| procedure brings the next line of input from the specified 580file into the |buffer| array and returns the value |true|, unless the file has 581already been entirely read, in which case it returns |false|. The conventions 582of \TeX\ are followed; i.e., |ASCII_code| numbers representing the next line 583of the file are input into |buffer[0]|, |buffer[1]|, \dots, 584|buffer[limit-1]|; trailing blanks are ignored; 585and the global variable |limit| is set to the length of the 586@^system dependencies@> 587line. The value of |limit| must be strictly less than |buf_size|. 588 589We assume that none of the |ASCII_code| values 590of |buffer[j]| for |0<=j<limit| is equal to 0, @'177, |line_feed|, |form_feed|, 591or |carriage_return|. 592 593@p function input_ln(var f:text_file):boolean; 594 {inputs a line or returns |false|} 595var final_limit:0..buf_size; {|limit| without trailing blanks} 596begin limit:=0; final_limit:=0; 597if eof(f) then input_ln:=false 598else begin while not eoln(f) do 599 begin buffer[limit]:=xord[f^]; get(f); 600 incr(limit); 601 if buffer[limit-1]<>" " then final_limit:=limit; 602 if limit=buf_size then 603 begin while not eoln(f) do get(f); 604 decr(limit); {keep |buffer[buf_size]| empty} 605 if final_limit>limit then final_limit:=limit; 606 print_nl('! Input line too long'); loc:=0; error; 607@.Input line too long@> 608 end; 609 end; 610 read_ln(f); limit:=final_limit; input_ln:=true; 611 end; 612end; 613 614@* Reporting errors to the user. 615The \.{TANGLE} processor operates in two phases: first it inputs the source 616file and stores a compressed representation of the program, then it produces 617the \PASCAL\ output from the compressed representation. 618 619The global variable |phase_one| tells whether we are in Phase I or not. 620 621@<Globals...@>= 622@!phase_one: boolean; {|true| in Phase I, |false| in Phase II} 623 624@ If an error is detected while we are debugging, 625we usually want to look at the contents of memory. 626A special procedure will be declared later for this purpose. 627 628@<Error handling...@>= 629@!debug @+ procedure debug_help; forward;@+ gubed 630 631@ During the first phase, syntax errors are reported to the user by saying 632$$\hbox{`|err_print('! Error message')|'},$$ 633followed by `|jump_out|' if no recovery from the error is provided. 634This will print the error message followed by an indication of where the error 635was spotted in the source file. Note that no period follows the error message, 636since the error routine will automatically supply a period. 637 638Errors that are noticed during the second phase are reported to the user 639in the same fashion, but the error message will be 640followed by an indication of where the error was spotted in the output file. 641 642The actual error indications are provided by a procedure called |error|. 643 644@d err_print(#)==begin new_line; print(#); error; 645 end 646 647@<Error handling...@>= 648procedure error; {prints '\..' and location of error message} 649var j: 0..out_buf_size; {index into |out_buf|} 650@!k,@!l: 0..buf_size; {indices into |buffer|} 651begin if phase_one then @<Print error location based on input buffer@> 652else @<Print error location based on output buffer@>; 653update_terminal; mark_error; 654@!debug debug_skipped:=debug_cycle; debug_help;@+gubed 655end; 656 657@ The error locations during Phase I can be indicated by using the global 658variables |loc|, |line|, and |changing|, which tell respectively the first 659unlooked-at position in |buffer|, the current line number, and whether or not 660the current line is from |change_file| or |web_file|. 661This routine should be modified on systems whose standard text editor 662has special line-numbering conventions. 663@^system dependencies@> 664 665@<Print error location based on input buffer@>= 666begin if changing then print('. (change file ')@+else print('. ('); 667print_ln('l.', line:1, ')'); 668if loc>=limit then l:=limit else l:=loc; 669for k:=1 to l do 670 if buffer[k-1]=tab_mark then print(' ') 671 else print(xchr[buffer[k-1]]); {print the characters already read} 672new_line; 673for k:=1 to l do print(' '); {space out the next line} 674for k:=l+1 to limit do print(xchr[buffer[k-1]]); {print the part not yet read} 675print(' '); {this space separates the message from future asterisks} 676end 677 678@ The position of errors detected during the second phase can be indicated 679by outputting the partially-filled output buffer, which contains |out_ptr| 680entries. 681 682@<Print error location based on output...@>= 683begin print_ln('. (l.',line:1,')'); 684for j:=1 to out_ptr do print(xchr[out_buf[j-1]]); {print current partial line} 685print('... '); {indicate that this information is partial} 686end 687 688@ The |jump_out| procedure just cuts across all active procedure levels 689and jumps out of the program. This is the only non-local |goto| statement 690in \.{TANGLE}. It is used when no recovery from a particular error has 691been provided. 692 693Some \PASCAL\ compilers do not implement non-local |goto| statements. 694@^system dependencies@> 695In such cases the code that appears at label |end_of_TANGLE| should be 696copied into the |jump_out| procedure, followed by a call to a system procedure 697that terminates the program. 698 699@d fatal_error(#)==begin new_line; print(#); error; mark_fatal; jump_out; 700 end 701 702@<Error handling...@>= 703procedure jump_out; 704begin goto end_of_TANGLE; 705end; 706 707@ Sometimes the program's behavior is far different from what it should be, 708and \.{TANGLE} prints an error message that is really for the \.{TANGLE} 709maintenance person, not the user. In such cases the program says 710|confusion('indication of where we are')|. 711 712@d confusion(#)==fatal_error('! This can''t happen (',#,')') 713@.This can't happen@> 714 715@ An overflow stop occurs if \.{TANGLE}'s tables aren't large enough. 716 717@d overflow(#)==fatal_error('! Sorry, ',#,' capacity exceeded') 718@.Sorry, x capacity exceeded@> 719 720 721@* Data structures. 722Most of the user's \PASCAL\ code is packed into eight-bit integers 723in two large arrays called |byte_mem| and |tok_mem|. 724The |byte_mem| array holds the names of identifiers, strings, and modules; 725the |tok_mem| array holds the replacement texts 726for macros and modules. Allocation is sequential, since things are deleted only 727during Phase II, and only in a last-in-first-out manner. 728 729Auxiliary arrays |byte_start| and |tok_start| are used as directories to 730|byte_mem| and |tok_mem|, and the |link|, |ilk|, |equiv|, and |text_link| 731arrays give further information about names. These auxiliary arrays 732consist of sixteen-bit items. 733 734@<Types...@>= 735@!eight_bits=0..255; {unsigned one-byte quantity} 736@!sixteen_bits=integer; {not quite a two-byte quantity} 737 738@ \.{TANGLE} has been designed to avoid the need for indices that are more 739than sixteen bits wide, so that it can be used on most computers. But 740there are programs that need more than 65536 tokens, and some programs 741even need more than 65536 bytes; \TeX\ is one of these. To get around 742this problem, a slight complication has been added to the data structures: 743|byte_mem| and |tok_mem| are two-dimensional arrays, whose first index is 744either 0 or 1. (For generality, the first index is actually allowed to run 745between 0 and |ww-1| in |byte_mem|, or between 0 and |zz-1| in |tok_mem|, 746where |ww| and |zz| are set to 2 and~3; the program will work for any 747positive values of |ww| and |zz|, and it can be simplified in obvious ways 748if |ww=1| or |zz=1|.) 749 750@d ww=2 {we multiply the byte capacity by approximately this amount} 751@d zz=3 {we multiply the token capacity by approximately this amount} 752 753@<Globals...@>= 754@!byte_mem: packed array [0..ww-1,0..max_bytes] of ASCII_code; 755 {characters of names} 756@!tok_mem: packed array [0..zz-1,0..max_toks] of eight_bits; {tokens} 757@!byte_start: array [0..max_names] of sixteen_bits; {directory into |byte_mem|} 758@!tok_start: array [0..max_texts] of sixteen_bits; {directory into |tok_mem|} 759@!link: array [0..max_names] of sixteen_bits; {hash table or tree links} 760@!ilk: array [0..max_names] of sixteen_bits; {type codes or tree links} 761@!equiv: array [0..max_names] of sixteen_bits; {info corresponding to names} 762@!text_link: array [0..max_texts] of sixteen_bits; {relates replacement texts} 763 764@ The names of identifiers are found by computing a hash address |h| and 765then looking at strings of bytes signified by |hash[h]|, |link[hash[h]]|, 766|link[link[hash[h]]]|, \dots, until either finding the desired name 767or encountering a zero. 768 769A `|name_pointer|' variable, which signifies a name, is an index into 770|byte_start|. The actual sequence of characters in the name pointed to by 771|p| appears in positions |byte_start[p]| to |byte_start[p+ww]-1|, inclusive, 772in the segment of |byte_mem| whose first index is |p mod ww|. Thus, when 773|ww=2| the even-numbered name bytes appear in |byte_mem[0,@t$*$@>]| 774and the odd-numbered ones appear in |byte_mem[1,@t$*$@>]|. 775The pointer 0 is used for undefined module names; we don't 776want to use it for the names of identifiers, since 0 stands for a null 777pointer in a linked list. 778 779Strings are treated like identifiers; the first character (a double-quote) 780distinguishes a string from an alphabetic name, but for \.{TANGLE}'s purposes 781strings behave like numeric macros. (A `string' here refers to the 782strings delimited by double-quotes that \.{TANGLE} processes. \PASCAL\ 783string constants delimited by single-quote marks are not given such special 784treatment; they simply appear as sequences of characters in the \PASCAL\ 785texts.) The total number of strings in the string 786pool is called |string_ptr|, and the total number of names in |byte_mem| 787is called |name_ptr|. The total number of bytes occupied in 788|byte_mem[w,@t$*$@>]| is called |byte_ptr[w]|. 789 790We usually have |byte_start[name_ptr+w]=byte_ptr[(name_ptr+w) mod ww]| 791for |0<=w<ww|, since these are the starting positions for the next |ww| 792names to be stored in |byte_mem|. 793 794@d length(#)==byte_start[#+ww]-byte_start[#] {the length of a name} 795 796@<Types...@>= 797@!name_pointer=0..max_names; {identifies a name} 798 799@ @<Global...@>= 800@!name_ptr:name_pointer; {first unused position in |byte_start|} 801@!string_ptr:name_pointer; {next number to be given to a string of length |<>1|} 802@!byte_ptr:array [0..ww-1] of 0..max_bytes; 803 {first unused position in |byte_mem|} 804@!pool_check_sum:integer; {sort of a hash for the whole string pool} 805 806@ @<Local variables for init...@>= 807@!wi: 0..ww-1; {to initialize the |byte_mem| indices} 808 809@ @<Set init...@>= 810for wi:=0 to ww-1 do 811 begin byte_start[wi]:=0; byte_ptr[wi]:=0; 812 end; 813byte_start[ww]:=0; {this makes name 0 of length zero} 814name_ptr:=1; string_ptr:=number_chars; pool_check_sum:=271828; 815 816@ Replacement texts are stored in |tok_mem|, using similar conventions. 817A `|text_pointer|' variable is an index into |tok_start|, and the 818replacement text that corresponds to |p| runs from positions 819|tok_start[p]| to |tok_start[p+zz]-1|, inclusive, in the segment of 820|tok_mem| whose first index is |p mod zz|. Thus, when |zz=2| the 821even-numbered replacement texts appear in |tok_mem[0,@t$*$@>]| and the 822odd-numbered ones appear in |tok_mem[1,@t$*$@>]|. Furthermore, 823|text_link[p]| is used to connect pieces of text that have the same name, 824as we shall see later. The pointer 0 is used for undefined replacement 825texts. 826 827The first position of |tok_mem[z,@t$*$@>]| that is unoccupied by 828replacement text is called |tok_ptr[z]|, and the first unused location of 829|tok_start| is called |text_ptr|. We usually have the identity 830|tok_start[text_ptr+z]=tok_ptr[(text_ptr+z) mod zz]|, for |0<=z<zz|, since 831these are the starting positions for the next |zz| replacement texts to 832be stored in |tok_mem|. 833 834@<Types...@>= 835@!text_pointer=0..max_texts; {identifies a replacement text} 836 837@ It is convenient to maintain a variable |z| that is equal to |text_ptr 838mod zz|, so that we always insert tokens into segment |z| of |tok_mem|. 839 840@<Glob...@>= 841@t\hskip1em@>@!text_ptr:text_pointer; {first unused position in |tok_start|} 842@t\hskip1em@>@!tok_ptr:array[0..zz-1] of 0..max_toks; 843 {first unused position in a given segment of |tok_mem|} 844@t\hskip1em@>@!z:0..zz-1; {current segment of |tok_mem|} 845stat @!max_tok_ptr:array[0..zz-1] of 0..max_toks; 846 {largest values assumed by |tok_ptr|} 847tats 848 849@ @<Local variables for init...@>= 850@!zi:0..zz-1; {to initialize the |tok_mem| indices} 851 852@ @<Set init...@>= 853for zi:=0 to zz-1 do 854 begin tok_start[zi]:=0; tok_ptr[zi]:=0; 855 end; 856tok_start[zz]:=0; {this makes replacement text 0 of length zero} 857text_ptr:=1; z:=1 mod zz; 858 859@ Four types of identifiers are distinguished by their |ilk|: 860 861\yskip\hang |normal| identifiers will appear in the \PASCAL\ program as 862ordinary identifiers since they have not been defined to be macros; the 863corresponding value in the |equiv| array 864for such identifiers is a link in a secondary hash table that 865is used to check whether any two of them agree in their first |unambig_length| 866characters after underline symbols are removed and lowercase letters are 867changed to uppercase. 868 869\yskip\hang |numeric| identifiers have been defined to be numeric macros; 870their |equiv| value contains the corresponding numeric value plus $2^{15}$. 871Strings are treated as numeric macros. 872 873\yskip\hang |simple| identifiers have been defined to be simple macros; 874their |equiv| value points to the corresponding replacement text. 875 876\yskip\hang |parametric| identifiers have been defined to be parametric macros; 877like simple identifiers, their |equiv| value points to the replacement text. 878 879@d normal=0 {ordinary identifiers have |normal| ilk} 880@d numeric=1 {numeric macros and strings have |numeric| ilk} 881@d simple=2 {simple macros have |simple| ilk} 882@d parametric=3 {parametric macros have |parametric| ilk} 883 884@ The names of modules are stored in |byte_mem| together 885with the identifier names, but a hash table is not used for them because 886\.{TANGLE} needs to be able to recognize a module name when given a prefix of 887that name. A conventional binary seach tree is used to retrieve module names, 888with fields called |llink| and |rlink| in place of |link| and |ilk|. The 889root of this tree is |rlink[0]|. If |p| is a pointer to a module name, 890|equiv[p]| points to its replacement text, just as in simple and parametric 891macros, unless this replacement text has not yet been defined (in which case 892|equiv[p]=0|). 893 894@d llink==link {left link in binary search tree for module names} 895@d rlink==ilk {right link in binary search tree for module names} 896 897@<Set init...@>= 898rlink[0]:=0; {the binary search tree starts out with nothing in it} 899equiv[0]:=0; {the undefined module has no replacement text} 900 901@ Here is a little procedure that prints the text of a given name. 902 903@p procedure print_id(@!p:name_pointer); {print identifier or module name} 904var k:0..max_bytes; {index into |byte_mem|} 905@!w:0..ww-1; {segment of |byte_mem|} 906begin if p>=name_ptr then print('IMPOSSIBLE') 907else begin w:=p mod ww; 908 for k:=byte_start[p] to byte_start[p+ww]-1 do print(xchr[byte_mem[w,k]]); 909 end; 910end; 911 912@* Searching for identifiers. 913The hash table described above is updated by the |id_lookup| procedure, 914which finds a given identifier and returns a pointer to its index in 915|byte_start|. If the identifier was not already present, it is inserted with 916a given |ilk| code; and an error message is printed if the identifier is being 917doubly defined. 918 919Because of the way \.{TANGLE}'s scanning mechanism works, it is most convenient 920to let |id_lookup| search for an identifier that is present in the |buffer| 921array. Two other global variables specify its position in the buffer: the 922first character is |buffer[id_first]|, and the last is |buffer[id_loc-1]|. 923Furthermore, if the identifier is really a string, the global variable 924|double_chars| tells how many of the characters in the buffer appear 925twice (namely \.{@@@@} and \.{""}), since this additional information makes 926it easy to calculate the true length of the string. The final double-quote 927of the string is not included in its ``identifier,'' but the first one is, 928so the string length is |id_loc-id_first-double_chars-1|. 929 930We have mentioned that |normal| identifiers belong to two hash tables, 931one for their true names as they appear in the \.{WEB} file and the other 932when they have been reduced to their first |unambig_length| characters. 933The hash tables are kept by the method of simple chaining, where the 934heads of the individual lists appear in the |hash| and |chop_hash| arrays. 935If |h| is a hash code, the primary hash table list starts at |hash[h]| and 936proceeds through |link| pointers; the secondary hash table list starts at 937|chop_hash[h]| and proceeds through |equiv| pointers. Of course, the same 938identifier will probably have two different values of |h|. 939 940The |id_lookup| procedure uses an auxiliary array called |chopped_id| to 941contain up to |unambig_length| characters of the current identifier, if 942it is necessary to compute the secondary hash code. (This array could be 943declared local to |id_lookup|, but in general we are making all array 944declarations global in this program, because some compilers and some machine 945architectures make dynamic array allocation inefficient.) 946 947@<Glob...@>= 948@!id_first:0..buf_size; {where the current identifier begins in the buffer} 949@!id_loc:0..buf_size; {just after the current identifier in the buffer} 950@!double_chars:0..buf_size; {correction to length in case of strings} 951@# 952@!hash,@!chop_hash:array [0..hash_size] of sixteen_bits; {heads of hash lists} 953@!chopped_id:array [0..unambig_length] of ASCII_code; {chopped identifier} 954 955@ Initially all the hash lists are empty. 956 957@<Local variables for init...@>= 958@!h:0..hash_size; {index into hash-head arrays} 959 960@ @<Set init...@>= 961for h:=0 to hash_size-1 do 962 begin hash[h]:=0; chop_hash[h]:=0; 963 end; 964 965@ Here now is the main procedure for finding identifiers (and strings). 966The parameter |t| is set to |normal| except when the identifier is 967a macro name that is just being defined; in the latter case, |t| will be 968|numeric|, |simple|, or |parametric|. 969 970@p function id_lookup(@!t:eight_bits):name_pointer; {finds current identifier} 971label found, not_found; 972var c:eight_bits; {byte being chopped} 973@!i:0..buf_size; {index into |buffer|} 974@!h:0..hash_size; {hash code} 975@!k:0..max_bytes; {index into |byte_mem|} 976@!w:0..ww-1; {segment of |byte_mem|} 977@!l:0..buf_size; {length of the given identifier} 978@!p,@!q:name_pointer; {where the identifier is being sought} 979@!s:0..unambig_length; {index into |chopped_id|} 980begin l:=id_loc-id_first; {compute the length} 981@<Compute the hash code |h|@>; 982@<Compute the name location |p|@>; 983if (p=name_ptr)or(t<>normal) then 984 @<Update the tables and check for possible errors@>; 985id_lookup:=p; 986end; 987 988@ A simple hash code is used: If the sequence of 989ASCII codes is $c_1c_2\ldots c_m$, its hash value will be 990$$(2^{n-1}c_1+2^{n-2}c_2+\cdots+c_n)\,\bmod\,|hash_size|.$$ 991 992@<Compute the hash...@>= 993h:=buffer[id_first]; i:=id_first+1; 994while i<id_loc do 995 begin h:=(h+h+buffer[i]) mod hash_size; incr(i); 996 end 997 998@ If the identifier is new, it will be placed in position |p=name_ptr|, 999otherwise |p| will point to its existing location. 1000 1001@<Compute the name location...@>= 1002p:=hash[h]; 1003while p<>0 do 1004 begin if length(p)=l then 1005 @<Compare name |p| with current identifier, |goto found| if equal@>; 1006 p:=link[p]; 1007 end; 1008p:=name_ptr; {the current identifier is new} 1009link[p]:=hash[h]; hash[h]:=p; {insert |p| at beginning of hash list} 1010found: 1011 1012@ @<Compare name |p|...@>= 1013begin i:=id_first; k:=byte_start[p]; w:=p mod ww; 1014while (i<id_loc)and(buffer[i]=byte_mem[w,k]) do 1015 begin incr(i); incr(k); 1016 end; 1017if i=id_loc then goto found; {all characters agree} 1018end 1019 1020@ @<Update the tables...@>= 1021begin if ((p<>name_ptr)and(t<>normal)and(ilk[p]=normal)) or 1022 ((p=name_ptr)and(t=normal)and(buffer[id_first]<>"""")) then 1023 @<Compute the secondary hash code |h| and put the first characters 1024 into the auxiliary array |chopped_id|@>; 1025if p<>name_ptr then 1026 @<Give double-definition error, if necessary, and change |p| to type |t|@> 1027else @<Enter a new identifier into the table at position |p|@>; 1028end 1029 1030@ The following routine, which is called into play when it is necessary to 1031look at the secondary hash table, computes the same hash function as before 1032(but on the chopped data), and places a zero after the chopped identifier 1033in |chopped_id| to serve as a convenient sentinel. 1034 1035@<Compute the secondary...@>= 1036begin i:=id_first; s:=0; h:=0; 1037while (i<id_loc)and(s<unambig_length) do 1038 begin if buffer[i]<>"_" then 1039 begin if buffer[i]>="a" then chopped_id[s]:=buffer[i]-@'40 1040 else chopped_id[s]:=buffer[i]; 1041 h:=(h+h+chopped_id[s]) mod hash_size; incr(s); 1042 end; 1043 incr(i); 1044 end; 1045chopped_id[s]:=0; 1046end 1047 1048@ If a nonnumeric macro has appeared before it was defined, \.{TANGLE} 1049will still work all right; after all, such behavior is typical of the 1050replacement texts for modules, which act very much like macros. 1051However, an undefined numeric macro may not be used on the right-hand 1052side of another numeric macro definition, so \.{TANGLE} finds it 1053simplest to make a blanket rule that numeric macros should be defined 1054before they are used. The following routine gives an error message and 1055also fixes up any damage that may have been caused. 1056 1057@<Give double...@>= {now |p<>name_ptr| and |t<>normal|} 1058begin if ilk[p]=normal then 1059 begin if t=numeric then err_print('! This identifier has already appeared'); 1060@.This identifier has already...@> 1061 @<Remove |p| from secondary hash table@>; 1062 end 1063else err_print('! This identifier was defined before'); 1064@.This identifier was defined...@> 1065ilk[p]:=t; 1066end 1067 1068@ When we have to remove a secondary hash entry, because a |normal| identifier 1069is changing to another |ilk|, the hash code |h| and chopped identifier have 1070already been computed. 1071 1072@<Remove |p| from secondary...@>= 1073q:=chop_hash[h]; 1074if q=p then chop_hash[h]:=equiv[p] 1075else begin while equiv[q]<>p do q:=equiv[q]; 1076 equiv[q]:=equiv[p]; 1077 end 1078 1079@ The following routine could make good use of a generalized |pack| procedure 1080that puts items into just part of a packed array instead of the whole thing. 1081 1082@<Enter a new identifier...@>= 1083begin if (t=normal)and(buffer[id_first]<>"""") then 1084 @<Check for ambiguity and update secondary hash@>; 1085w:=name_ptr mod ww; k:=byte_ptr[w]; 1086if k+l>max_bytes then overflow('byte memory'); 1087if name_ptr>max_names-ww then overflow('name'); 1088i:=id_first; {get ready to move the identifier into |byte_mem|} 1089while i<id_loc do 1090 begin byte_mem[w,k]:=buffer[i]; incr(k); incr(i); 1091 end; 1092byte_ptr[w]:=k; byte_start[name_ptr+ww]:=k; incr(name_ptr); 1093if buffer[id_first]<>"""" then ilk[p]:=t 1094else @<Define and output a new string of the pool@>; 1095end 1096 1097@ @<Check for ambig...@>= 1098begin q:=chop_hash[h]; 1099while q<>0 do 1100 begin @<Check if |q| conflicts with |p|@>; 1101 q:=equiv[q]; 1102 end; 1103equiv[p]:=chop_hash[h]; chop_hash[h]:=p; {put |p| at front of secondary list} 1104end 1105 1106@ @<Check if |q| conflicts...@>= 1107begin k:=byte_start[q]; s:=0; w:=q mod ww; 1108while (k<byte_start[q+ww]) and (s<unambig_length) do 1109 begin c:=byte_mem[w,k]; 1110 if c<>"_" then 1111 begin if c>="a" then c:=c-@'40; {merge lowercase with uppercase} 1112 if chopped_id[s]<>c then goto not_found; 1113 incr(s); 1114 end; 1115 incr(k); 1116 end; 1117if (k=byte_start[q+ww])and(chopped_id[s]<>0) then goto not_found; 1118print_nl('! Identifier conflict with '); 1119@.Identifier conflict...@> 1120for k:=byte_start[q] to byte_start[q+ww]-1 do print(xchr[byte_mem[w,k]]); 1121error; q:=0; {only one conflict will be printed, since |equiv[0]=0|} 1122not_found: 1123end 1124 1125@ We compute the string pool check sum by working modulo a prime number 1126that is large but not so large that overflow might occur. 1127 1128@d check_sum_prime==@'3777777667 {$2^{29}-73$} 1129@^preprocessed strings@> 1130 1131@<Define and output a new string...@>= 1132begin ilk[p]:=numeric; {strings are like numeric macros} 1133if l-double_chars=2 then {this string is for a single character} 1134 equiv[p]:=buffer[id_first+1]+1073741824 1135else begin equiv[p]:=string_ptr+1073741824; 1136 l:=l-double_chars-1; 1137 if l>99 then err_print('! Preprocessed string is too long'); 1138@.Preprocessed string is too long@> 1139 incr(string_ptr); 1140 write(pool,xchr["0"+l div 10],xchr["0"+l mod 10]); {output the length} 1141 pool_check_sum:=pool_check_sum+pool_check_sum+l; 1142 while pool_check_sum>check_sum_prime do 1143 pool_check_sum:=pool_check_sum-check_sum_prime; 1144 i:=id_first+1; 1145 while i<id_loc do 1146 begin write(pool,xchr[buffer[i]]); {output characters of string} 1147 pool_check_sum:=pool_check_sum+pool_check_sum+buffer[i]; 1148 while pool_check_sum>check_sum_prime do 1149 pool_check_sum:=pool_check_sum-check_sum_prime; 1150 if (buffer[i]="""") or (buffer[i]="@@") then 1151 i:=i+2 {omit second appearance of doubled character} 1152 else incr(i); 1153 end; 1154 write_ln(pool); 1155 end; 1156end 1157 1158@* Searching for module names. 1159The |mod_lookup| procedure finds the module name |mod_text[1..l]| in the 1160search tree, after inserting it if necessary, and returns a pointer to 1161where it was found. 1162 1163@<Glob...@>= 1164@!mod_text:array [0..longest_name] of ASCII_code; {name being sought for} 1165 1166@ According to the rules of \.{WEB}, no module name 1167should be a proper prefix of another, so a ``clean'' comparison should 1168occur between any two names. The result of |mod_lookup| is 0 if this 1169prefix condition is violated. An error message is printed when such violations 1170are detected during phase two of \.{WEAVE}. 1171 1172@d less=0 {the first name is lexicographically less than the second} 1173@d equal=1 {the first name is equal to the second} 1174@d greater=2 {the first name is lexicographically greater than the second} 1175@d prefix=3 {the first name is a proper prefix of the second} 1176@d extension=4 {the first name is a proper extension of the second} 1177 1178@p function mod_lookup(@!l:sixteen_bits):name_pointer; {finds module name} 1179label found; 1180var c:less..extension; {comparison between two names} 1181@!j:0..longest_name; {index into |mod_text|} 1182@!k:0..max_bytes; {index into |byte_mem|} 1183@!w:0..ww-1; {segment of |byte_mem|} 1184@!p:name_pointer; {current node of the search tree} 1185@!q:name_pointer; {father of node |p|} 1186begin c:=greater; q:=0; p:=rlink[0]; {|rlink[0]| is the root of the tree} 1187while p<>0 do 1188 begin @<Set \(|c| to the result of comparing the given name to 1189 name |p|@>; 1190 q:=p; 1191 if c=less then p:=llink[q] 1192 else if c=greater then p:=rlink[q] 1193 else goto found; 1194 end; 1195@<Enter a new module name into the tree@>; 1196found: if c<>equal then 1197 begin err_print('! Incompatible section names'); p:=0; 1198@.Incompatible module names@> 1199 end; 1200mod_lookup:=p; 1201end; 1202 1203@ @<Enter a new module name...@>= 1204w:=name_ptr mod ww; k:=byte_ptr[w]; 1205if k+l>max_bytes then overflow('byte memory'); 1206if name_ptr>max_names-ww then overflow('name'); 1207p:=name_ptr; 1208if c=less then llink[q]:=p else rlink[q]:=p; 1209llink[p]:=0; rlink[p]:=0; c:=equal; equiv[p]:=0; 1210for j:=1 to l do byte_mem[w,k+j-1]:=mod_text[j]; 1211byte_ptr[w]:=k+l; byte_start[name_ptr+ww]:=k+l; incr(name_ptr); 1212 1213@ @<Set \(|c|...@>= 1214begin k:=byte_start[p]; w:=p mod ww; c:=equal; j:=1; 1215while (k<byte_start[p+ww]) and (j<=l) and (mod_text[j]=byte_mem[w,k]) do 1216 begin incr(k); incr(j); 1217 end; 1218if k=byte_start[p+ww] then 1219 if j>l then c:=equal 1220 else c:=extension 1221else if j>l then c:=prefix 1222else if mod_text[j]<byte_mem[w,k] then c:=less 1223else c:=greater; 1224end 1225 1226@ The |prefix_lookup| procedure is supposed to find exactly one module 1227name that has |mod_text[1..l]| as a prefix. Actually the algorithm silently 1228accepts also the situation that some module name is a prefix of 1229|mod_text[1..l]|, because the user who painstakingly typed in more than 1230necessary probably doesn't want to be told about the wasted effort. 1231 1232@p function prefix_lookup(@!l:sixteen_bits):name_pointer; {finds name extension} 1233var c:less..extension; {comparison between two names} 1234@!count:0..max_names; {the number of hits} 1235@!j:0..longest_name; {index into |mod_text|} 1236@!k:0..max_bytes; {index into |byte_mem|} 1237@!w:0..ww-1; {segment of |byte_mem|} 1238@!p:name_pointer; {current node of the search tree} 1239@!q:name_pointer; {another place to resume the search after one branch is done} 1240@!r:name_pointer; {extension found} 1241begin q:=0; p:=rlink[0]; count:=0; r:=0; {begin search at root of tree} 1242while p<>0 do 1243 begin @<Set \(|c|...@>; 1244 if c=less then p:=llink[p] 1245 else if c=greater then p:=rlink[p] 1246 else begin r:=p; incr(count); q:=rlink[p]; p:=llink[p]; 1247 end; 1248 if p=0 then 1249 begin p:=q; q:=0; 1250 end; 1251 end; 1252if count<>1 then 1253 if count=0 then err_print('! Name does not match') 1254@.Name does not match@> 1255 else err_print('! Ambiguous prefix'); 1256@.Ambiguous prefix@> 1257prefix_lookup:=r; {the result will be 0 if there was no match} 1258end; 1259 1260@* Tokens. 1261Replacement texts, which represent \PASCAL\ code in a compressed format, 1262appear in |tok_mem| as mentioned above. The codes in 1263these texts are called `tokens'; some tokens occupy two consecutive 1264eight-bit byte positions, and the others take just one byte. 1265 1266If $p>0$ points to a replacement text, |tok_start[p]| is the |tok_mem| position 1267of the first eight-bit code of that text. If |text_link[p]=0|, 1268this is the replacement text for a macro, otherwise it is the replacement 1269text for a module. In the latter case |text_link[p]| is either equal to 1270|module_flag|, which means that there is no further text for this module, or 1271|text_link[p]| points to a 1272continuation of this replacement text; such links are created when 1273several modules have \PASCAL\ texts with the same name, and they also 1274tie together all the \PASCAL\ texts of unnamed modules. 1275The replacement text pointer for the first unnamed module 1276appears in |text_link[0]|, and the most recent such pointer is |last_unnamed|. 1277 1278@d module_flag==max_texts {final |text_link| in module replacement texts} 1279 1280@<Glob...@>= 1281@!last_unnamed:text_pointer; {most recent replacement text of unnamed module} 1282 1283@ @<Set init...@>= last_unnamed:=0; text_link[0]:=0; 1284 1285@ If the first byte of a token is less than @'200, the token occupies a 1286single byte. Otherwise we make a sixteen-bit token by combining two consecutive 1287bytes |a| and |b|. If |@'200<=a<@'250|, then $(a-@'200)\times2^8+b$ points 1288to an identifier; if |@'250<=a<@'320|, then 1289$(a-@'250)\times2^8+b$ points to a module name; otherwise, i.e., if 1290|@'320<=a<@'400|, then $(a-@'320)\times2^8+b$ is the number of the module 1291in which the current replacement text appears. 1292 1293Codes less than @'200 are 7-bit ASCII codes that represent themselves. 1294In particular, a single-character identifier like `|x|' will be a one-byte 1295token, while all longer identifiers will occupy two bytes. 1296 1297Some of the 7-bit ASCII codes will not be present, however, so we can 1298use them for special purposes. The following symbolic names are used: 1299 1300\yskip\hang |param| denotes insertion of a parameter. This occurs only in 1301the replacement texts of parametric macros, outside of single-quoted strings 1302in those texts. 1303 1304\hang |begin_comment| denotes \.{@@\{}, which will become either 1305\.{\{} or \.{[}. 1306 1307\hang |end_comment| denotes \.{@@\}}, which will become either 1308\.{\}} or \.{]}. 1309 1310\hang |octal| denotes the \.{@@\'} that precedes an octal constant. 1311 1312\hang |hex| denotes the \.{@@"} that precedes a hexadecimal constant. 1313 1314\hang |check_sum| denotes the \.{@@\char'44} that denotes the string pool 1315check sum. 1316 1317\hang |join| denotes the concatenation of adjacent items with no 1318space or line breaks allowed between them (the \.{@@\&} operation of \.{WEB}). 1319 1320\hang |double_dot| denotes `\.{..}' in \PASCAL. 1321 1322\hang |verbatim| denotes the \.{@@=} that begins a verbatim \PASCAL\ string. 1323It is also used for the end of the string. 1324 1325\hang |force_line| denotes the \.{@@\\} that forces a new line in the 1326\PASCAL\ output. 1327@^ASCII code@> 1328 1329@d param=0 {ASCII null code will not appear} 1330@d verbatim=@'2 {extended ASCII alpha should not appear} 1331@d force_line=@'3 {extended ASCII beta should not appear} 1332@d begin_comment=@'11 {ASCII tab mark will not appear} 1333@d end_comment=@'12 {ASCII line feed will not appear} 1334@d octal=@'14 {ASCII form feed will not appear} 1335@d hex=@'15 {ASCII carriage return will not appear} 1336@d double_dot=@'40 {ASCII space will not appear except in strings} 1337@d check_sum=@'175 {will not be confused with right brace} 1338@d join=@'177 {ASCII delete will not appear} 1339 1340@ The following procedure is used to enter a two-byte value into 1341|tok_mem| when a replacement text is being generated. 1342 1343@p procedure store_two_bytes(@!x:sixteen_bits); 1344 {stores high byte, then low byte} 1345begin if tok_ptr[z]+2>max_toks then overflow('token'); 1346tok_mem[z,tok_ptr[z]]:=x div@'400; {this could be done by a shift command} 1347tok_mem[z,tok_ptr[z]+1]:=x mod@'400; {this could be done by a logical and} 1348tok_ptr[z]:=tok_ptr[z]+2; 1349end; 1350 1351@ When \.{TANGLE} is being operated in debug mode, it has a procedure to display 1352a replacement text in symbolic form. This procedure has not been spruced up to 1353generate a real great format, but at least the results are not as bad as 1354a memory dump. 1355 1356@p @!debug procedure print_repl(@!p:text_pointer); 1357var k:0..max_toks; {index into |tok_mem|} 1358@!a: sixteen_bits; {current byte(s)} 1359@!zp: 0..zz-1; {segment of |tok_mem| being accessed} 1360begin if p>=text_ptr then print('BAD') 1361else begin k:=tok_start[p]; zp:=p mod zz; 1362 while k<tok_start[p+zz] do 1363 begin a:=tok_mem[zp,k]; 1364 if a>=@'200 then @<Display two-byte token starting with |a|@> 1365 else @<Display one-byte token |a|@>; 1366 incr(k); 1367 end; 1368 end; 1369end; 1370gubed 1371 1372@ @<Display two-byte...@>= 1373begin incr(k); 1374if a<@'250 then {identifier or string} 1375 begin a:=(a-@'200)*@'400+tok_mem[zp,k]; print_id(a); 1376 if byte_mem[a mod ww,byte_start[a]]="""" then print('"') 1377 else print(' '); 1378 end 1379else if a<@'320 then {module name} 1380 begin print('@@<'); print_id((a-@'250)*@'400+tok_mem[zp,k]); 1381 print('@@>'); 1382 end 1383else begin a:=(a-@'320)*@'400+tok_mem[zp,k]; {module number} 1384 print('@@',xchr["{"],a:1,'@@',xchr["}"]); {can't use right brace 1385 between \&{debug} and \&{gubed}} 1386 end; 1387end 1388 1389@ @<Display one-byte...@>= 1390case a of 1391begin_comment: print('@@',xchr["{"]); 1392end_comment: print('@@',xchr["}"]); {can't use right brace 1393 between \&{debug} and \&{gubed}} 1394octal: print('@@'''); 1395hex: print('@@"'); 1396check_sum: print('@@$'); 1397param: print('#'); 1398"@@": print('@@@@'); 1399verbatim: print('@@='); 1400force_line: print('@@\'); 1401othercases print(xchr[a]) 1402endcases 1403 1404@* Stacks for output. 1405Let's make sure that our data structures contain enough information to 1406produce the entire \PASCAL\ program as desired, by working next on the 1407algorithms that actually do produce that program. 1408 1409@ The output process uses a stack to keep track of what is going on at 1410different ``levels'' as the macros are being expanded. 1411Entries on this stack have five parts: 1412 1413\yskip\hang |end_field| is the |tok_mem| location where the replacement 1414text of a particular level will end; 1415 1416\hang |byte_field| is the |tok_mem| location from which the next token 1417on a particular level will be read; 1418 1419\hang |name_field| points to the name corresponding to a particular level; 1420 1421\hang |repl_field| points to the replacement text currently being read 1422at a particular level; 1423 1424\hang |mod_field| is the module number, or zero if this is a macro. 1425 1426\yskip\noindent The current values of these five quantities are referred to 1427quite frequently, so they are stored in a separate place instead of in 1428the |stack| array. We call the current values |cur_end|, |cur_byte|, 1429|cur_name|, |cur_repl|, and |cur_mod|. 1430 1431The global variable |stack_ptr| tells how many levels of output are 1432currently in progress. The end of all output occurs when the stack is 1433empty, i.e., when |stack_ptr=0|. 1434 1435@<Types...@>= 1436@t\4@>@!output_state=record 1437 @!end_field: sixteen_bits; {ending location of replacement text} 1438 @!byte_field: sixteen_bits; {present location within replacement text} 1439 @!name_field: name_pointer; {|byte_start| index for text being output} 1440 @!repl_field: text_pointer; {|tok_start| index for text being output} 1441 @!mod_field: 0..@'27777; {module number or zero if not a module} 1442 end; 1443 1444@ @d cur_end==cur_state.end_field {current ending location in |tok_mem|} 1445@d cur_byte==cur_state.byte_field {location of next output byte in |tok_mem|} 1446@d cur_name==cur_state.name_field {pointer to current name being expanded} 1447@d cur_repl==cur_state.repl_field {pointer to current replacement text} 1448@d cur_mod==cur_state.mod_field {current module number being expanded} 1449 1450@<Globals...@>= 1451@!cur_state : output_state; {|cur_end|, |cur_byte|, |cur_name|, 1452 |cur_repl|, |cur_mod|} 1453@!stack : array [1..stack_size] of output_state; {info for non-current levels} 1454@!stack_ptr: 0..stack_size; {first unused location in the output state stack} 1455 1456@ It is convenient to keep a global variable |zo| equal to |cur_repl mod zz|. 1457 1458@<Glob...@>= 1459@!zo:0..zz-1; {the segment of |tok_mem| from which output is coming} 1460 1461@ Parameters must also be stacked. They are placed in 1462|tok_mem| just above the other replacement texts, and dummy parameter 1463`names' are placed in |byte_start| just after the other names. 1464The variables |text_ptr| and |tok_ptr[z]| essentially serve as parameter 1465stack pointers during the output phase, so there is no need for a separate 1466data structure to handle this problem. 1467 1468@ There is an implicit stack corresponding to meta-comments that are output 1469via \.{@@\{} and \.{@@\}}. But this stack need not be represented in detail, 1470because we only need to know whether it is empty or not. A global variable 1471|brace_level| tells how many items would be on this stack if it were present. 1472 1473@<Globals...@>= 1474@!brace_level: eight_bits; {current depth of $\.{@@\{}\ldots\.{@@\}}$ nesting} 1475 1476@ To get the output process started, we will perform the following 1477initialization steps. We may assume that |text_link[0]| is nonzero, since it 1478points to the \PASCAL\ text in the first unnamed module that generates 1479code; if there are no such modules, there is nothing to output, and an 1480error message will have been generated before we do any of the initialization. 1481 1482@<Initialize the output stacks@>= 1483stack_ptr:=1; brace_level:=0; cur_name:=0; cur_repl:=text_link[0]; 1484zo:=cur_repl mod zz; cur_byte:=tok_start[cur_repl]; 1485cur_end:=tok_start[cur_repl+zz]; cur_mod:=0; 1486 1487@ When the replacement text for name |p| is to be inserted into the output, 1488the following subroutine is called to save the old level of output and get 1489the new one going. 1490 1491@p procedure push_level(@!p:name_pointer); {suspends the current level} 1492begin if stack_ptr=stack_size then overflow('stack') 1493else begin stack[stack_ptr]:=cur_state; {save |cur_end|, |cur_byte|, etc.} 1494 incr(stack_ptr); 1495 cur_name:=p; cur_repl:=equiv[p]; zo:=cur_repl mod zz; 1496 cur_byte:=tok_start[cur_repl]; cur_end:=tok_start[cur_repl+zz]; 1497 cur_mod:=0; 1498 end; 1499end; 1500 1501@ When we come to the end of a replacement text, the |pop_level| subroutine 1502does the right thing: It either moves to the continuation of this replacement 1503text or returns the state to the most recently stacked level. Part of this 1504subroutine, which updates the parameter stack, will be given later when we 1505study the parameter stack in more detail. 1506 1507@p procedure pop_level; {do this when |cur_byte| reaches |cur_end|} 1508label exit; 1509begin if text_link[cur_repl]=0 then {end of macro expansion} 1510 begin if ilk[cur_name]=parametric then 1511 @<Remove a parameter from the parameter stack@>; 1512 end 1513else if text_link[cur_repl]<module_flag then {link to a continuation} 1514 begin cur_repl:=text_link[cur_repl]; {we will stay on the same level} 1515 zo:=cur_repl mod zz; 1516 cur_byte:=tok_start[cur_repl]; cur_end:=tok_start[cur_repl+zz]; 1517 return; 1518 end; 1519decr(stack_ptr); {we will go down to the previous level} 1520if stack_ptr>0 then 1521 begin cur_state:=stack[stack_ptr]; zo:=cur_repl mod zz; 1522 end; 1523exit: end; 1524 1525@ The heart of the output procedure is the |get_output| routine, which produces 1526the next token of output that is not a reference to a macro. This procedure 1527handles all the stacking and unstacking that is necessary. It returns the 1528value |number| if the next output has a numeric value (the value of a 1529numeric macro or string), in which case |cur_val| has been set to the 1530number in question. The procedure also returns the value |module_number| 1531if the next output begins or ends the replacement text of some module, 1532in which case |cur_val| is that module's number (if beginning) or the 1533negative of that value (if ending). And it returns the value |identifier| 1534if the next output is an identifier of length two or more, in which case 1535|cur_val| points to that identifier name. 1536 1537@d number=@'200 {code returned by |get_output| when next output is numeric} 1538@d module_number=@'201 {code returned by |get_output| for module numbers} 1539@d identifier=@'202 {code returned by |get_output| for identifiers} 1540 1541@<Globals...@>= 1542@!cur_val:integer; {additional information corresponding to output token} 1543 1544@ If |get_output| finds that no more output remains, it returns the value zero. 1545 1546@p function get_output:sixteen_bits; {returns next token after macro expansion} 1547label restart, done, found; 1548var a:sixteen_bits; {value of current byte} 1549@!b:eight_bits; {byte being copied} 1550@!bal:sixteen_bits; {excess of \.( versus \.) while copying a parameter} 1551@!k:0..max_bytes; {index into |byte_mem|} 1552@!w:0..ww-1; {segment of |byte_mem|} 1553begin restart: if stack_ptr=0 then 1554 begin a:=0; goto found; 1555 end; 1556if cur_byte=cur_end then 1557 begin cur_val:=-cur_mod; pop_level; 1558 if cur_val=0 then goto restart; 1559 a:=module_number; goto found; 1560 end; 1561a:=tok_mem[zo,cur_byte]; incr(cur_byte); 1562if a<@'200 then {one-byte token} 1563 if a=param then 1564 @<Start scanning current macro parameter, |goto restart|@> 1565 else goto found; 1566a:=(a-@'200)*@'400+tok_mem[zo,cur_byte]; incr(cur_byte); 1567if a<@'24000 then {|@'24000=(@'250-@'200)*@'400|} 1568 @<Expand macro |a| and |goto found|, or |goto restart| if no output found@>; 1569if a<@'50000 then {|@'50000=(@'320-@'200)*@'400|} 1570 @<Expand module |a-@'24000|, |goto restart|@>; 1571cur_val:=a-@'50000; a:=module_number; cur_mod:=cur_val; 1572found: 1573@!debug if trouble_shooting then debug_help;@;@+gubed@/ 1574get_output:=a; 1575end; 1576 1577@ The user may have forgotten to give any \PASCAL\ text for a module name, 1578or the \PASCAL\ text may have been associated with a different name by mistake. 1579 1580@<Expand module |a-...@>= 1581begin a:=a-@'24000; 1582if equiv[a]<>0 then push_level(a) 1583else if a<>0 then 1584 begin print_nl('! Not present: <'); print_id(a); print('>'); error; 1585@.Not present: <section name>@> 1586 end; 1587goto restart; 1588end 1589 1590@ @<Expand macro ...@>= 1591begin case ilk[a] of 1592normal: begin cur_val:=a; a:=identifier; 1593 end; 1594numeric: begin cur_val:=equiv[a]-1073741824; a:=number; 1595 end; 1596simple: begin push_level(a); goto restart; 1597 end; 1598parametric: begin @<Put a parameter on the parameter stack, 1599 or |goto restart| if error occurs@>; 1600 push_level(a); goto restart; 1601 end; 1602othercases confusion('output') 1603endcases;@/ 1604goto found; 1605end 1606 1607@ We come now to the interesting part, the job of putting a parameter on 1608the parameter stack. First we pop the stack if necessary until getting to 1609a level that hasn't ended. Then the next character must be a `\.('; 1610and since parentheses are balanced on each level, the entire parameter must 1611be present, so we can copy it without difficulty. 1612 1613@<Put a parameter...@>= 1614while (cur_byte=cur_end)and(stack_ptr>0) do pop_level; 1615if (stack_ptr=0)or(tok_mem[zo,cur_byte]<>"(") then 1616 begin print_nl('! No parameter given for '); print_id(a); error; 1617@.No parameter given for macro@> 1618 goto restart; 1619 end; 1620@<Copy the parameter into |tok_mem|@>; 1621equiv[name_ptr]:=text_ptr; ilk[name_ptr]:=simple; w:=name_ptr mod ww; 1622k:=byte_ptr[w]; 1623@!debug if k=max_bytes then overflow('byte memory'); 1624byte_mem[w,k]:="#"; incr(k); byte_ptr[w]:=k; 1625gubed {this code has set the parameter identifier for debugging printouts} 1626if name_ptr>max_names-ww then overflow('name'); 1627byte_start[name_ptr+ww]:=k; incr(name_ptr); 1628if text_ptr>max_texts-zz then overflow('text'); 1629text_link[text_ptr]:=0; tok_start[text_ptr+zz]:=tok_ptr[z]; 1630incr(text_ptr); 1631z:=text_ptr mod zz 1632 1633@ The |pop_level| routine undoes the effect of parameter-pushing when 1634a parameter macro is finished: 1635 1636@<Remove a parameter...@>= 1637begin decr(name_ptr); decr(text_ptr); 1638z:=text_ptr mod zz; 1639stat if tok_ptr[z]>max_tok_ptr[z] then max_tok_ptr[z]:=tok_ptr[z]; 1640tats {the maximum value of |tok_ptr| occurs just before parameter popping} 1641tok_ptr[z]:=tok_start[text_ptr]; 1642@!debug decr(byte_ptr[name_ptr mod ww]);@+gubed 1643end 1644 1645@ When a parameter occurs in a replacement text, we treat it as a simple 1646macro in position (|name_ptr-1|): 1647 1648@<Start scanning...@>= 1649begin push_level(name_ptr-1); goto restart; 1650end 1651 1652@ Similarly, a |param| token encountered as we copy a parameter is converted 1653into a simple macro call for |name_ptr-1|. 1654Some care is needed to handle cases like \\{macro}|(#; print('#)'))|; the 1655\.{\#} token will have been changed to |param| outside of strings, but we 1656still must distinguish `real' parentheses from those in strings. 1657 1658@d app_repl(#)==begin if tok_ptr[z]=max_toks then overflow('token'); 1659 tok_mem[z,tok_ptr[z]]:=#; incr(tok_ptr[z]); end 1660 1661@<Copy the parameter...@>= 1662bal:=1; incr(cur_byte); {skip the opening `\.('} 1663loop@+ begin b:=tok_mem[zo,cur_byte]; incr(cur_byte); 1664 if b=param then store_two_bytes(name_ptr+@'77777) 1665 else begin if b>=@'200 then 1666 begin app_repl(b); 1667 b:=tok_mem[zo,cur_byte]; incr(cur_byte); 1668 end 1669 else case b of 1670 "(": incr(bal); 1671 ")": begin decr(bal); 1672 if bal=0 then goto done; 1673 end; 1674 "'": repeat app_repl(b); 1675 b:=tok_mem[zo,cur_byte]; incr(cur_byte); 1676 until b="'"; {copy string, don't change |bal|} 1677 othercases do_nothing 1678 endcases; 1679 app_repl(b); 1680 end; 1681 end; 1682done: 1683 1684@* Producing the output. 1685The |get_output| routine above handles most of the complexity of output 1686generation, but there are two further considerations that have a nontrivial 1687effect on \.{TANGLE}'s algorithms. 1688 1689First, we want to make sure that the output is broken into lines not 1690exceeding |line_length| characters per line, where these breaks occur at 1691valid places (e.g., not in the middle of a string or a constant or an 1692identifier, not between `\.<' and `\.>', not at a `\.{@@\&}' position 1693where quantities are being joined together). Therefore we assemble the 1694output into a buffer before deciding where the line breaks will appear. 1695However, we make very little attempt to make ``logical'' line breaks that 1696would enhance the readability of the output; people are supposed to read 1697the input of \.{TANGLE} or the \TeX ed output of \.{WEAVE}, but not the 1698tangled-up output. The only concession to readability is that a break after 1699a semicolon will be made if possible, since commonly used ``pretty 1700printing'' routines give better results in such cases. 1701 1702Second, we want to decimalize non-decimal constants, and to combine integer 1703quantities that are added or subtracted, because \PASCAL\ doesn't allow 1704constant expressions in subrange types or in case labels. This means we 1705want to have a procedure that treats a construction like \.{(E-15+17)} 1706as equivalent to `\.{(E+2)}', while also leaving `\.{(1E-15+17)}' and 1707`\.{(E-15+17*y)}' untouched. Consider also `\.{-15+17.5}' versus 1708`\.{-15+17..5}'. We shall not combine integers preceding or following 1709\.*, \./, \.{div}, \.{mod}, or \.{@@\&}. Note that if |y| has been defined 1710to equal $-2$, we must expand `\.{x*y}' into `\.{x*(-2)}'; but `\.{x-y}' 1711can expand into `\.{x+2}' and we can even change `\.{x - y mod z}' to 1712@^mod@> 1713`\.{x + 2 mod z}' because \PASCAL\ has a nonstandard \&{mod} operation! 1714 1715The following solution to these problems has been adopted: An array 1716|out_buf| contains characters that have been generated but not yet output, 1717and there are three pointers into this array. One of these, |out_ptr|, is 1718the number of characters currently in the buffer, and we will have 1719|1<=out_ptr<=line_length| most of the time. The second is |break_ptr|, 1720which is the largest value |<=out_ptr| such that we are definitely entitled 1721to end a line by outputting the characters |out_buf[1..(break_ptr-1)]|; 1722we will always have |break_ptr<=line_length|. Finally, |semi_ptr| is either 1723zero or the largest known value of a legal break after a semicolon or comment 1724on the current line; we will always have |semi_ptr<=break_ptr|. 1725 1726@<Globals...@>= 1727@!out_buf: array [0..out_buf_size] of ASCII_code; {assembled characters} 1728@!out_ptr: 0..out_buf_size; {first available place in |out_buf|} 1729@!break_ptr: 0..out_buf_size; {last breaking place in |out_buf|} 1730@!semi_ptr: 0..out_buf_size; {last semicolon breaking place in |out_buf|} 1731 1732@ Besides having those three pointers, 1733the output process is in one of several states: 1734 1735\yskip\hang |num_or_id| means that the last item in the buffer is a number or 1736identifier, hence a blank space or line break must be inserted if the next 1737item is also a number or identifier. 1738 1739\yskip\hang |unbreakable| means that the last item in the buffer was followed 1740by the \.{@@\&} operation that inhibits spaces between it and the next item. 1741 1742\yskip\hang |sign| means that the last item in the buffer is to be followed 1743by \.+ or \.-, depending on whether |out_app| is positive or negative. 1744 1745\yskip\hang |sign_val| means that the decimal equivalent of 1746$\vert|out_val|\vert$ should be appended to the buffer. If |out_val<0|, 1747or if |out_val=0| and |last_sign<0|, the number should be preceded by a minus 1748sign. Otherwise it should be preceded by the character |out_sign| unless 1749|out_sign=0|; the |out_sign| variable is either 0 or \.{"\ "} or \.{"+"}. 1750 1751\yskip\hang |sign_val_sign| is like |sign_val|, but also append \.+ or \.- 1752afterwards, depending on whether |out_app| is positive or negative. 1753 1754\yskip\hang |sign_val_val| is like |sign_val|, but also append the decimal 1755equivalent of |out_app| including its sign, using |last_sign| in case 1756|out_app=0|. 1757 1758\yskip\hang |misc| means none of the above. 1759 1760\yskip\noindent 1761For example, the output buffer and output state run through the following 1762sequence as we generate characters from `\.{(x-15+19-2)}': 1763$$\vbox{\halign{$\hfil#\hfil$\quad&#\hfil&\quad\hfil#\hfil&\quad 1764\hfil#\hfil&\quad\hfil#\hfil&\quad\hfil#\hfil\quad&\hfil#\hfil\cr 1765output&|out_buf|&|out_state|&|out_sign|&|out_val|&|out_app|&|last_sign|\cr 1766\noalign{\vskip 3pt} 1767(&\.(&|misc|\cr 1768x&\.{(x}&|num_or_id|\cr 1769-&\.{(x}&|sign|&&&$-1$&$-1$\cr 177015&\.{(x}&|sign_val|&\.{"+"}&$-15$&&$-15$\cr 1771+&\.{(x}&|sign_val_sign|&\.{"+"}&$-15$&$+1$&$+1$\cr 177219&\.{(x}&|sign_val_val|&\.{"+"}&$-15$&$+19$&$+1$\cr 1773-&\.{(x}&|sign_val_sign|&\.{"+"}&$+4$&$-1$&$-1$\cr 17742&\.{(x}&|sign_val_val|&\.{"+"}&$+4$&$-2$&$-2$\cr 1775)&\.{(x+2)}&|misc|\cr}}$$ 1776At each stage we have put as much into the buffer as possible without 1777knowing what is coming next. Examples like `\.{x-0.1}' indicate why 1778|last_sign| is needed to associate the proper sign with an output of zero. 1779 1780In states |num_or_id|, |unbreakable|, and |misc| the last item in the buffer 1781lies between |break_ptr| and |out_ptr-1|, inclusive; in the other states we 1782have |break_ptr=out_ptr|. 1783 1784The numeric values assigned to |num_or_id|, etc., have been chosen to 1785shorten some of the program logic; for example, the program makes use of 1786the fact that |sign+2=sign_val_sign|. 1787 1788@d misc=0 {state associated with special characters} 1789@d num_or_id=1 {state associated with numbers and identifiers} 1790@d sign=2 {state associated with pending \.+ or \.-} 1791@d sign_val=num_or_id+2 {state associated with pending sign and value} 1792@d sign_val_sign=sign+2 {|sign_val| followed by another pending sign} 1793@d sign_val_val=sign_val+2 {|sign_val| followed by another pending value} 1794@d unbreakable=sign_val_val+1 {state associated with \.{@@\&}} 1795 1796@<Globals...@>= 1797@!out_state:eight_bits; {current status of partial output} 1798@!out_val,@!out_app:integer; {pending values} 1799@!out_sign:ASCII_code; {sign to use if appending |out_val>=0|} 1800@!last_sign:-1..+1; {sign to use if appending a zero} 1801 1802@ During the output process, |line| will equal the number of the next line 1803to be output. 1804 1805@<Initialize the output buffer@>= 1806out_state:=misc; out_ptr:=0; break_ptr:=0; semi_ptr:=0; out_buf[0]:=0; line:=1; 1807 1808@ Here is a routine that is invoked when |out_ptr>line_length| 1809or when it is time to flush out the final line. The |flush_buffer| procedure 1810often writes out the line up to the current |break_ptr| position, then moves the 1811remaining information to the front of |out_buf|. However, it prefers to 1812write only up to |semi_ptr|, if the residual line won't be too long. 1813 1814@d check_break==if out_ptr>line_length then flush_buffer 1815 1816@p procedure flush_buffer; {writes one line to output file} 1817var k:0..out_buf_size; {index into |out_buf|} 1818@!b:0..out_buf_size; {value of |break_ptr| upon entry} 1819begin b:=break_ptr; 1820if (semi_ptr<>0)and(out_ptr-semi_ptr<=line_length) then break_ptr:=semi_ptr; 1821for k:=1 to break_ptr do write(Pascal_file,xchr[out_buf[k-1]]); 1822write_ln(Pascal_file); incr(line); 1823if line mod 100 = 0 then 1824 begin print('.'); 1825 if line mod 500 = 0 then print(line:1); 1826 update_terminal; {progress report} 1827 end; 1828if break_ptr<out_ptr then 1829 begin if out_buf[break_ptr]=" " then 1830 begin incr(break_ptr); {drop space at break} 1831 if break_ptr>b then b:=break_ptr; 1832 end; 1833 for k:=break_ptr to out_ptr-1 do out_buf[k-break_ptr]:=out_buf[k]; 1834 end; 1835out_ptr:=out_ptr-break_ptr; break_ptr:=b-break_ptr; semi_ptr:=0; 1836if out_ptr>line_length then 1837 begin err_print('! Long line must be truncated'); out_ptr:=line_length; 1838@.Long line must be truncated@> 1839 end; 1840end; 1841 1842@ @<Empty the last line from the buffer@>= 1843break_ptr:=out_ptr; semi_ptr:=0; flush_buffer; 1844if brace_level<>0 then 1845 err_print('! Program ended at brace level ',brace_level:1); 1846@.Program ended at brace level n@> 1847 1848@ Another simple and useful routine appends the decimal equivalent of 1849a nonnegative integer to the output buffer. 1850 1851@d app(#)==begin out_buf[out_ptr]:=#; incr(out_ptr); {append a single character} 1852 end 1853 1854@p procedure app_val(@!v:integer); {puts |v| into buffer, assumes |v>=0|} 1855var k:0..out_buf_size; {index into |out_buf|} 1856begin k:=out_buf_size; {first we put the digits at the very end of |out_buf|} 1857repeat out_buf[k]:=v mod 10; v:=v div 10; decr(k); 1858until v=0; 1859repeat incr(k); app(out_buf[k]+"0"); 1860until k=out_buf_size; {then we append them, most significant first} 1861end; 1862 1863@ The output states are kept up to date by the output routines, which are 1864called |send_out|, |send_val|, and |send_sign|. The |send_out| procedure 1865has two parameters: |t| tells the type of information being sent and 1866|v| contains the information proper. Some information may also be passed 1867in the array |out_contrib|. 1868 1869\yskip\hang If |t=misc| then |v| is a character to be output. 1870 1871\hang If |t=str| then |v| is the length of a string or something like `\.{<>}' 1872in |out_contrib|. 1873 1874\hang If |t=ident| then |v| is the length of an identifier in |out_contrib|. 1875 1876\hang If |t=frac| then |v| is the length of a fraction and/or exponent in 1877|out_contrib|. 1878 1879@d str=1 {|send_out| code for a string} 1880@d ident=2 {|send_out| code for an identifier} 1881@d frac=3 {|send_out| code for a fraction} 1882 1883@<Glob...@>= 1884@!out_contrib:array[1..line_length] of ASCII_code; {a contribution to |out_buf|} 1885 1886@ A slightly subtle point in the following code is that the user may ask 1887for a |join| operation (i.e., \.{@@\&}) following whatever is being sent 1888out. We will see later that |join| is implemented in part by calling 1889|send_out(frac,0)|. 1890 1891@p procedure send_out(@!t:eight_bits; @!v:sixteen_bits); 1892 {outputs |v| of type |t|} 1893label restart; 1894var k: 0..line_length; {index into |out_contrib|} 1895begin @<Get the buffer ready for appending the new information@>; 1896if t<>misc then for k:=1 to v do app(out_contrib[k]) 1897else app(v); 1898check_break; 1899if (t=misc)and((v=";")or(v="}")) then 1900 begin semi_ptr:=out_ptr; break_ptr:=out_ptr; 1901 end; 1902if t>=ident then out_state:=num_or_id {|t=ident| or |frac|} 1903else out_state:=misc {|t=str| or |misc|} 1904end; 1905 1906@ Here is where the buffer states for signs and values collapse into simpler 1907states, because we are about to append something that doesn't combine with 1908the previous integer constants. 1909 1910We use an ASCII-code trick: Since |","-1="+"| and |","+1="-"|, we have 1911|","-c=@t sign of $c$@>|, when $\vert c\vert=1$. 1912 1913@<Get the buffer ready...@>= 1914restart: case out_state of 1915num_or_id: if t<>frac then 1916 begin break_ptr:=out_ptr; 1917 if t=ident then app(" "); 1918 end; 1919sign: begin app(","-out_app); check_break; break_ptr:=out_ptr; 1920 end; 1921sign_val,sign_val_sign: begin @<Append \(|out_val| to buffer@>; 1922 out_state:=out_state-2; goto restart; 1923 end; 1924sign_val_val: @<Reduce |sign_val_val| to |sign_val| and |goto restart|@>; 1925misc: if t<>frac then break_ptr:=out_ptr;@/ 1926othercases do_nothing {this is for |unbreakable| state} 1927endcases 1928 1929@ @<Append \(|out_val|...@>= 1930if (out_val<0)or((out_val=0)and(last_sign<0)) then app("-") 1931else if out_sign>0 then app(out_sign); 1932app_val(abs(out_val)); check_break; 1933 1934@ @<Reduce |sign_val_val|...@>= 1935begin if (t=frac)or(@<Contribution is \.* or \./ or \.{DIV} or \.{MOD}@>) then 1936 begin @<Append \(|out_val| to buffer@>; 1937 out_sign:="+"; out_val:=out_app; 1938 end 1939else out_val:=out_val+out_app; 1940out_state:=sign_val; goto restart; 1941end 1942 1943@ @<Contribution is \.*...@>= 1944((t=ident)and(v=3)and@| 1945 (((out_contrib[1]="D")and(out_contrib[2]="I")and(out_contrib[3]="V")) or@| 1946 ((out_contrib[1]="M")and(out_contrib[2]="O")and(out_contrib[3]="D")) ))or@| 1947@^uppercase@> 1948 ((t=misc)and((v="*")or(v="/"))) 1949 1950@ The following routine is called with $v=\pm1$ when a plus or minus sign is 1951appended to the output. It extends \PASCAL\ to allow repeated signs 1952(e.g., `\.{--}' is equivalent to `\.+'), rather than to give an error message. 1953The signs following `\.E' in real constants are treated as part of a fraction, 1954so they are not seen by this routine. 1955 1956@p procedure send_sign(@!v:integer); 1957begin case out_state of 1958sign, sign_val_sign: out_app:=out_app*v; 1959sign_val:begin out_app:=v; out_state:=sign_val_sign; 1960 end; 1961sign_val_val: begin out_val:=out_val+out_app; out_app:=v; 1962 out_state:=sign_val_sign; 1963 end; 1964othercases begin break_ptr:=out_ptr; out_app:=v; out_state:=sign; 1965 end 1966endcases;@/ 1967last_sign:=out_app; 1968end; 1969 1970@ When a (signed) integer value is to be output, we call |send_val|. 1971 1972@d bad_case=666 {this is a label used below} 1973 1974@p procedure send_val(@!v:integer); {output the (signed) value |v|} 1975label bad_case, {go here if we can't keep |v| in the output state} 1976 exit; 1977begin case out_state of 1978num_or_id: begin @<If previous output was \.{DIV} or \.{MOD}, |goto bad_case|@>; 1979 out_sign:=" "; out_state:=sign_val; out_val:=v; break_ptr:=out_ptr; 1980 last_sign:=+1; 1981 end; 1982misc: begin @<If previous output was \.* or \./, |goto bad_case|@>; 1983 out_sign:=0; out_state:=sign_val; out_val:=v; break_ptr:=out_ptr; 1984 last_sign:=+1; 1985 end; 1986@t\4@>@<Handle cases of |send_val| when |out_state| contains a sign@>@; 1987othercases goto bad_case 1988endcases;@/ 1989return; 1990bad_case: @<Append the decimal value of |v|, with parentheses if negative@>; 1991exit: end; 1992 1993@ @<Handle cases of |send_val|...@>= 1994sign: begin out_sign:="+"; out_state:=sign_val; out_val:=out_app*v; 1995 end; 1996sign_val: begin out_state:=sign_val_val; out_app:=v; 1997 err_print('! Two numbers occurred without a sign between them'); 1998 end; 1999sign_val_sign: begin out_state:=sign_val_val; out_app:=out_app*v; 2000 end; 2001sign_val_val: begin out_val:=out_val+out_app; out_app:=v; 2002 err_print('! Two numbers occurred without a sign between them'); 2003@.Two numbers occurred...@> 2004 end; 2005 2006@ @<If previous output was \.*...@>= 2007if (out_ptr=break_ptr+1)and((out_buf[break_ptr]="*")or(out_buf[break_ptr]="/")) 2008 then goto bad_case 2009 2010@ @<If previous output was \.{DIV}...@>= 2011if (out_ptr=break_ptr+3)or 2012 ((out_ptr=break_ptr+4)and(out_buf[break_ptr]=" ")) then 2013@^uppercase@> 2014 if ((out_buf[out_ptr-3]="D")and(out_buf[out_ptr-2]="I")and 2015 (out_buf[out_ptr-1]="V"))or @/ 2016 ((out_buf[out_ptr-3]="M")and(out_buf[out_ptr-2]="O")and 2017 (out_buf[out_ptr-1]="D")) then@/ goto bad_case 2018 2019@ @<Append the decimal value...@>= 2020if v>=0 then 2021 begin if out_state=num_or_id then 2022 begin break_ptr:=out_ptr; app(" "); 2023 end; 2024 app_val(v); check_break; out_state:=num_or_id; 2025 end 2026else begin app("("); app("-"); app_val(-v); app(")"); check_break; 2027 out_state:=misc; 2028 end 2029 2030@* The big output switch. 2031To complete the output process, we need a routine that takes the results 2032of |get_output| and feeds them to |send_out|, |send_val|, or |send_sign|. 2033This procedure `|send_the_output|' will be invoked just once, as follows: 2034 2035@<Phase II: Output the contents of the compressed tables@>= 2036if text_link[0]=0 then 2037 begin print_nl('! No output was specified.'); mark_harmless; 2038@.No output was specified@> 2039 end 2040else begin print_nl('Writing the output file'); update_terminal;@/ 2041 @<Initialize the output stacks@>; 2042 @<Initialize the output buffer@>; 2043 send_the_output;@/ 2044 @<Empty the last line...@>; 2045 print_nl('Done.'); 2046 end 2047 2048@ A many-way switch is used to send the output: 2049 2050@d get_fraction=2 {this label is used below} 2051 2052@p procedure send_the_output; 2053label get_fraction, {go here to finish scanning a real constant} 2054 reswitch, continue; 2055var cur_char:eight_bits; {the latest character received} 2056 @!k:0..line_length; {index into |out_contrib|} 2057 @!j:0..max_bytes; {index into |byte_mem|} 2058 @!w:0..ww-1; {segment of |byte_mem|} 2059 @!n:integer; {number being scanned} 2060begin while stack_ptr>0 do 2061 begin cur_char:=get_output; 2062 reswitch: case cur_char of 2063 0: do_nothing; {this case might arise if output ends unexpectedly} 2064 @t\4@>@<Cases related to identifiers@>@; 2065 @t\4@>@<Cases related to constants, possibly leading to 2066 |get_fraction| or |reswitch|@>@; 2067 "+","-": send_sign(","-cur_char); 2068 @t\4@>@<Cases like \.{<>} and \.{:=}@>@; 2069 "'": @<Send a string, |goto reswitch|@>; 2070 @<Other printable characters@>: send_out(misc,cur_char); 2071 @t\4@>@<Cases involving \.{@@\{} and \.{@@\}}@>@; 2072 join: begin send_out(frac,0); out_state:=unbreakable; 2073 end; 2074 verbatim: @<Send verbatim string@>; 2075 force_line: @<Force a line break@>; 2076 othercases err_print('! Can''t output ASCII code ',cur_char:1) 2077@.Can't output ASCII code n@> 2078 endcases;@/ 2079 goto continue; 2080 get_fraction: @<Special code to finish real constants@>; 2081 continue: end; 2082end; 2083 2084@ @<Cases like \.{<>}...@>= 2085and_sign: begin out_contrib[1]:="A"; out_contrib[2]:="N"; out_contrib[3]:="D"; 2086@^uppercase@> 2087 send_out(ident,3); 2088 end; 2089not_sign: begin out_contrib[1]:="N"; out_contrib[2]:="O"; out_contrib[3]:="T"; 2090 send_out(ident,3); 2091 end; 2092set_element_sign: begin out_contrib[1]:="I"; out_contrib[2]:="N"; 2093 send_out(ident,2); 2094 end; 2095or_sign: begin out_contrib[1]:="O"; out_contrib[2]:="R"; send_out(ident,2); 2096 end; 2097left_arrow: begin out_contrib[1]:=":"; out_contrib[2]:="="; send_out(str,2); 2098 end; 2099not_equal: begin out_contrib[1]:="<"; out_contrib[2]:=">"; send_out(str,2); 2100 end; 2101less_or_equal: begin out_contrib[1]:="<"; out_contrib[2]:="="; send_out(str,2); 2102 end; 2103greater_or_equal: begin out_contrib[1]:=">"; out_contrib[2]:="="; 2104 send_out(str,2); 2105 end; 2106equivalence_sign: begin out_contrib[1]:="="; out_contrib[2]:="="; 2107 send_out(str,2); 2108 end; 2109double_dot: begin out_contrib[1]:="."; out_contrib[2]:="."; send_out(str,2); 2110 end; 2111 2112@ Please don't ask how all of the following characters can actually get 2113through \.{TANGLE} outside of strings. It seems that |""""| and |"{"| 2114cannot actually occur at this point of the program, but they have 2115been included just in case \.{TANGLE} changes. 2116 2117If \.{TANGLE} is producing code for a \PASCAL\ compiler that uses `\.{(.}' 2118and `\.{.)}' instead of square brackets (e.g., on machines with {\mc EBCDIC} 2119code), one should remove |"["| and |"]"| from this list and put them into 2120the preceding module in the appropriate way. Similarly, some compilers 2121want `\.\^' to be converted to `\.{@@}'. 2122@^system dependencies@>@^EBCDIC@> 2123 2124@<Other printable characters@>= 2125"!","""","#","$","%","&","(",")","*",",","/",":",";","<","=",">","?", 2126"@@","[","\","]","^","_","`","{","|" 2127 2128@ Single-character identifiers represent themselves, while longer ones 2129appear in |byte_mem|. All must be converted to uppercase, 2130with underlines removed. Extremely long identifiers must be chopped. 2131 2132(Some \PASCAL\ compilers work with lowercase letters instead of 2133uppercase. If this module of \.{TANGLE} is changed, it's also necessary 2134to change from uppercase to lowercase in the modules that are 2135listed in the index under ``uppercase''.) 2136@^system dependencies@> 2137@^uppercase@> 2138 2139@d up_to(#)==#-24,#-23,#-22,#-21,#-20,#-19,#-18,#-17,#-16,#-15,#-14, 2140 #-13,#-12,#-11,#-10,#-9,#-8,#-7,#-6,#-5,#-4,#-3,#-2,#-1,# 2141 2142@<Cases related to identifiers@>= 2143"A",up_to("Z"): begin out_contrib[1]:=cur_char; send_out(ident,1); 2144 end; 2145"a",up_to("z"): begin out_contrib[1]:=cur_char-@'40; send_out(ident,1); 2146 end; 2147identifier: begin k:=0; j:=byte_start[cur_val]; w:=cur_val mod ww; 2148 while (k<max_id_length)and(j<byte_start[cur_val+ww]) do 2149 begin incr(k); out_contrib[k]:=byte_mem[w,j]; incr(j); 2150 if out_contrib[k]>="a" then out_contrib[k]:=out_contrib[k]-@'40 2151 else if out_contrib[k]="_" then decr(k); 2152 end; 2153 send_out(ident,k); 2154 end; 2155 2156@ After sending a string, we need to look ahead at the next character, in order 2157to see if there were two consecutive single-quote marks. Afterwards we go to 2158|reswitch| to process the next character. 2159 2160@<Send a string...@>= 2161begin k:=1; out_contrib[1]:="'"; 2162repeat if k<line_length then incr(k); 2163out_contrib[k]:=get_output; 2164until (out_contrib[k]="'")or(stack_ptr=0); 2165if k=line_length then err_print('! String too long'); 2166@.String too long@> 2167send_out(str,k); cur_char:=get_output; 2168if cur_char="'" then out_state:=unbreakable; 2169goto reswitch; 2170end 2171 2172@ Sending a verbatim string is similar, but we don't have to look ahead. 2173 2174@<Send verbatim string@>= 2175begin k:=0; 2176repeat if k<line_length then incr(k); 2177out_contrib[k]:=get_output; 2178until (out_contrib[k]=verbatim)or(stack_ptr=0); 2179if k=line_length then err_print('! Verbatim string too long'); 2180@.Verbatim string too long@> 2181send_out(str,k-1); 2182end 2183 2184@ In order to encourage portable software, \.{TANGLE} complains 2185if the constants get dangerously close to the largest value representable 2186on a 32-bit computer ($2^{31}-1$). 2187 2188@d digits=="0","1","2","3","4","5","6","7","8","9" 2189 2190@<Cases related to constants...@>= 2191digits: begin n:=0; 2192 repeat cur_char:=cur_char-"0"; 2193 if n>=@'1463146314 then err_print('! Constant too big') 2194@.Constant too big@> 2195 else n:=10*n+cur_char; 2196 cur_char:=get_output; 2197 until (cur_char>"9")or(cur_char<"0"); 2198 send_val(n); k:=0; 2199 if cur_char="e" then cur_char:="E"; 2200@^uppercase@> 2201 if cur_char="E" then goto get_fraction 2202 else goto reswitch; 2203 end; 2204check_sum: send_val(pool_check_sum); 2205octal: begin n:=0; cur_char:="0"; 2206 repeat cur_char:=cur_char-"0"; 2207 if n>=@'2000000000 then err_print('! Constant too big') 2208 else n:=8*n+cur_char; 2209 cur_char:=get_output; 2210 until (cur_char>"7")or(cur_char<"0"); 2211 send_val(n); goto reswitch; 2212 end; 2213hex: begin n:=0; cur_char:="0"; 2214 repeat if cur_char>="A" then cur_char:=cur_char+10-"A" 2215 else cur_char:=cur_char-"0"; 2216 if n>=@"8000000 then err_print('! Constant too big') 2217 else n:=16*n+cur_char; 2218 cur_char:=get_output; 2219 until (cur_char>"F")or(cur_char<"0")or@| 2220 ((cur_char>"9")and(cur_char<"A")); 2221 send_val(n); goto reswitch; 2222 end; 2223number: send_val(cur_val); 2224".": begin k:=1; out_contrib[1]:="."; cur_char:=get_output; 2225 if cur_char="." then 2226 begin out_contrib[2]:="."; send_out(str,2); 2227 end 2228 else if (cur_char>="0")and(cur_char<="9") then goto get_fraction 2229 else begin send_out(misc,"."); goto reswitch; 2230 end; 2231 end; 2232 2233@ The following code appears at label `|get_fraction|', when we want to 2234scan to the end of a real constant. The first |k| characters of a fraction 2235have already been placed in |out_contrib|, and |cur_char| is the next character. 2236 2237@<Special code...@>= 2238repeat if k<line_length then incr(k); 2239out_contrib[k]:=cur_char; cur_char:=get_output; 2240if (out_contrib[k]="E")and((cur_char="+")or(cur_char="-")) then 2241@^uppercase@> 2242 begin if k<line_length then incr(k); 2243 out_contrib[k]:=cur_char; cur_char:=get_output; 2244 end 2245else if cur_char="e" then cur_char:="E"; 2246until (cur_char<>"E")and((cur_char<"0")or(cur_char>"9")); 2247if k=line_length then err_print('! Fraction too long'); 2248@.Fraction too long@> 2249send_out(frac,k); goto reswitch 2250 2251@ Some \PASCAL\ compilers do not recognize comments in braces, so the 2252comments must be delimited by `\.{(*}' and `\.{*)}'. 2253@^system dependencies@> 2254In such cases the statement `|send_out(misc,"{")|' that appears here should 2255be replaced by `\ignorespaces|begin out_contrib[1]:="("; out_contrib[2]:="*"; 2256send_out(str,2); end|', and a similar change should be made to 2257`|send_out(misc,"}")|'. 2258 2259@<Cases involving \.{@@\{} and \.{@@\}}@>= 2260begin_comment: begin if brace_level=0 then send_out(misc,"{") 2261 else send_out(misc,"["); 2262 incr(brace_level); 2263 end; 2264end_comment: if brace_level>0 then 2265 begin decr(brace_level); 2266 if brace_level=0 then send_out(misc,"}") 2267 else send_out(misc,"]"); 2268 end 2269 else err_print('! Extra @@}'); 2270@.Extra \AT!\}@> 2271module_number: begin if brace_level=0 then send_out(misc,"{") 2272 else send_out(misc,"["); 2273 if cur_val<0 then 2274 begin send_out(misc,":"); send_val(-cur_val); 2275 end 2276 else begin send_val(cur_val); send_out(misc,":"); 2277 end; 2278 if brace_level=0 then send_out(misc,"}") 2279 else send_out(misc,"]"); 2280 end; 2281 2282@ @<Force a line break@>= 2283begin send_out(str,0); {normalize the buffer} 2284while out_ptr>0 do 2285 begin if out_ptr<=line_length then break_ptr:=out_ptr; 2286 flush_buffer; 2287 end; 2288out_state:=misc; 2289end 2290 2291@* Introduction to the input phase. 2292We have now seen that \.{TANGLE} will be able to output the full 2293\PASCAL\ program, if we can only get that program into the byte memory in 2294the proper format. The input process is something like the output process 2295in reverse, since we compress the text as we read it in and we expand it 2296as we write it out. 2297 2298There are three main input routines. The most interesting is the one that gets 2299the next token of a \PASCAL\ text; the other two are used to scan rapidly past 2300\TeX\ text in the \.{WEB} source code. One of the latter routines will jump to 2301the next token that starts with `\.{@@}', and the other skips to the end 2302of a \PASCAL\ comment. 2303 2304@ But first we need to consider the low-level routine |get_line| 2305that takes care of merging |change_file| into |web_file|. The |get_line| 2306procedure also updates the line numbers for error messages. 2307 2308@<Globals...@>= 2309@!ii:integer; {general purpose |for| loop variable in the outer block} 2310@!line:integer; {the number of the current line in the current file} 2311@!other_line:integer; {the number of the current line in the input file that 2312 is not currently being read} 2313@!temp_line:integer; {used when interchanging |line| with |other_line|} 2314@!limit:0..buf_size; {the last character position occupied in the buffer} 2315@!loc:0..buf_size; {the next character position to be read from the buffer} 2316@!input_has_ended: boolean; {if |true|, there is no more input} 2317@!changing: boolean; {if |true|, the current line is from |change_file|} 2318 2319@ As we change |changing| from |true| to |false| and back again, we must 2320remember to swap the values of |line| and |other_line| so that the |err_print| 2321routine will be sure to report the correct line number. 2322 2323@d change_changing== 2324 changing := not changing; 2325 temp_line:=other_line; other_line:=line; line:=temp_line 2326 {|line @t$\null\BA\null$@> other_line|} 2327 2328@ When |changing| is |false|, the next line of |change_file| is kept in 2329|change_buffer[0..change_limit]|, for purposes of comparison with the next 2330line of |web_file|. After the change file has been completely input, we 2331set |change_limit:=0|, so that no further matches will be made. 2332 2333@<Globals...@>= 2334@!change_buffer:array[0..buf_size] of ASCII_code; 2335@!change_limit:0..buf_size; {the last position occupied in |change_buffer|} 2336 2337@ Here's a simple function that checks if the two buffers are different. 2338 2339@p function lines_dont_match:boolean; 2340label exit; 2341var k:0..buf_size; {index into the buffers} 2342begin lines_dont_match:=true; 2343if change_limit<>limit then return; 2344if limit>0 then 2345 for k:=0 to limit-1 do if change_buffer[k]<>buffer[k] then return; 2346lines_dont_match:=false; 2347exit: end; 2348 2349@ Procedure |prime_the_change_buffer| sets |change_buffer| in preparation 2350for the next matching operation. Since blank lines in the change file are 2351not used for matching, we have |(change_limit=0)and not changing| if and 2352only if the change file is exhausted. This procedure is called only 2353when |changing| is true; hence error messages will be reported correctly. 2354 2355@p procedure prime_the_change_buffer; 2356label continue, done, exit; 2357var k:0..buf_size; {index into the buffers} 2358begin change_limit:=0; {this value will be used if the change file ends} 2359@<Skip over comment lines in the change file; |return| if end of file@>; 2360@<Skip to the next nonblank line; |return| if end of file@>; 2361@<Move |buffer| and |limit| to |change_buffer| and |change_limit|@>; 2362exit: end; 2363 2364@ While looking for a line that begins with \.{@@x} in the change file, 2365we allow lines that begin with \.{@@}, as long as they don't begin with 2366\.{@@y} or \.{@@z} (which would probably indicate that the change file is 2367fouled up). 2368 2369@<Skip over comment lines in the change file...@>= 2370loop@+ begin incr(line); 2371 if not input_ln(change_file) then return; 2372 if limit<2 then goto continue; 2373 if buffer[0]<>"@@" then goto continue; 2374 if (buffer[1]>="X")and(buffer[1]<="Z") then 2375 buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify} 2376 if buffer[1]="x" then goto done; 2377 if (buffer[1]="y")or(buffer[1]="z") then 2378 begin loc:=2; err_print('! Where is the matching @@x?'); 2379@.Where is the match...@> 2380 end; 2381continue: end; 2382done: 2383 2384@ Here we are looking at lines following the \.{@@x}. 2385 2386@<Skip to the next nonblank line...@>= 2387repeat incr(line); 2388 if not input_ln(change_file) then 2389 begin err_print('! Change file ended after @@x'); 2390@.Change file ended...@> 2391 return; 2392 end; 2393until limit>0; 2394 2395@ @<Move |buffer| and |limit| to |change_buffer| and |change_limit|@>= 2396begin change_limit:=limit; 2397if limit>0 then for k:=0 to limit-1 do change_buffer[k]:=buffer[k]; 2398end 2399 2400@ The following procedure is used to see if the next change entry should 2401go into effect; it is called only when |changing| is false. 2402The idea is to test whether or not the current 2403contents of |buffer| matches the current contents of |change_buffer|. 2404If not, there's nothing more to do; but if so, a change is called for: 2405All of the text down to the \.{@@y} is supposed to match. An error 2406message is issued if any discrepancy is found. Then the procedure 2407prepares to read the next line from |change_file|. 2408 2409@p procedure check_change; {switches to |change_file| if the buffers match} 2410label exit; 2411var n:integer; {the number of discrepancies found} 2412@!k:0..buf_size; {index into the buffers} 2413begin if lines_dont_match then return; 2414n:=0; 2415loop@+ begin change_changing; {now it's |true|} 2416 incr(line); 2417 if not input_ln(change_file) then 2418 begin err_print('! Change file ended before @@y'); 2419@.Change file ended...@> 2420 change_limit:=0; change_changing; {|false| again} 2421 return; 2422 end; 2423 @<If the current line starts with \.{@@y}, 2424 report any discrepancies and |return|@>; 2425 @<Move |buffer| and |limit|...@>; 2426 change_changing; {now it's |false|} 2427 incr(line); 2428 if not input_ln(web_file) then 2429 begin err_print('! WEB file ended during a change'); 2430@.WEB file ended...@> 2431 input_has_ended:=true; return; 2432 end; 2433 if lines_dont_match then incr(n); 2434 end; 2435exit: end; 2436 2437@ @<If the current line starts with \.{@@y}...@>= 2438if limit>1 then if buffer[0]="@@" then 2439 begin if (buffer[1]>="X")and(buffer[1]<="Z") then 2440 buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify} 2441 if (buffer[1]="x")or(buffer[1]="z") then 2442 begin loc:=2; err_print('! Where is the matching @@y?'); 2443@.Where is the match...@> 2444 end 2445 else if buffer[1]="y" then 2446 begin if n>0 then 2447 begin loc:=2; err_print('! Hmm... ',n:1, 2448 ' of the preceding lines failed to match'); 2449@.Hmm... n of the preceding...@> 2450 end; 2451 return; 2452 end; 2453 end 2454 2455@ @<Initialize the input system@>= 2456open_input; line:=0; other_line:=0;@/ 2457changing:=true; prime_the_change_buffer; change_changing;@/ 2458limit:=0; loc:=1; buffer[0]:=" "; input_has_ended:=false; 2459 2460@ The |get_line| procedure is called when |loc>limit|; it puts the next 2461line of merged input into the buffer and updates the other variables 2462appropriately. A space is placed at the right end of the line. 2463 2464@p procedure get_line; {inputs the next line} 2465label restart; 2466begin restart: if changing then 2467 @<Read from |change_file| and maybe turn off |changing|@>; 2468if not changing then 2469 begin @<Read from |web_file| and maybe turn on |changing|@>; 2470 if changing then goto restart; 2471 end; 2472loc:=0; buffer[limit]:=" "; 2473end; 2474 2475@ @<Read from |web_file|...@>= 2476begin incr(line); 2477if not input_ln(web_file) then input_has_ended:=true 2478else if limit=change_limit then 2479 if buffer[0]=change_buffer[0] then 2480 if change_limit>0 then check_change; 2481end 2482 2483@ @<Read from |change_file|...@>= 2484begin incr(line); 2485if not input_ln(change_file) then 2486 begin err_print('! Change file ended without @@z'); 2487@.Change file ended...@> 2488 buffer[0]:="@@"; buffer[1]:="z"; limit:=2; 2489 end; 2490if limit>1 then {check if the change has ended} 2491 if buffer[0]="@@" then 2492 begin if (buffer[1]>="X")and(buffer[1]<="Z") then 2493 buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify} 2494 if (buffer[1]="x")or(buffer[1]="y") then 2495 begin loc:=2; err_print('! Where is the matching @@z?'); 2496@.Where is the match...@> 2497 end 2498 else if buffer[1]="z" then 2499 begin prime_the_change_buffer; change_changing; 2500 end; 2501 end; 2502end 2503 2504@ At the end of the program, we will tell the user if the change file 2505had a line that didn't match any relevant line in |web_file|. 2506 2507@<Check that all changes have been read@>= 2508if change_limit<>0 then {|changing| is false} 2509 begin for ii:=0 to change_limit do buffer[ii]:=change_buffer[ii]; 2510 limit:=change_limit; changing:=true; line:=other_line; loc:=change_limit; 2511 err_print('! Change file entry did not match'); 2512@.Change file entry did not match@> 2513 end 2514 2515@ Important milestones are reached during the input phase when certain 2516control codes are sensed. 2517 2518Control codes in \.{WEB} begin with `\.{@@}', and the next character 2519identifies the code. Some of these are of interest only to \.{WEAVE}, 2520so \.{TANGLE} ignores them; the others are converted by \.{TANGLE} into 2521internal code numbers by the |control_code| function below. The ordering 2522of these internal code numbers has been chosen to simplify the program logic; 2523larger numbers are given to the control codes that denote more significant 2524milestones. 2525 2526@d ignore=0 {control code of no interest to \.{TANGLE}} 2527@d control_text=@'203 {control code for `\.{@@t}', `\.{@@\^}', etc.} 2528@d format=@'204 {control code for `\.{@@f}'} 2529@d definition=@'205 {control code for `\.{@@d}'} 2530@d begin_Pascal=@'206 {control code for `\.{@@p}'} 2531@d module_name=@'207 {control code for `\.{@@<}'} 2532@d new_module=@'210 {control code for `\.{@@\ }' and `\.{@@*}'} 2533 2534@p function control_code(@!c:ASCII_code):eight_bits; {convert |c| after \.{@@}} 2535begin case c of 2536"@@": control_code:="@@"; {`quoted' at sign} 2537"'": control_code:=octal; {precedes octal constant} 2538"""": control_code:=hex; {precedes hexadecimal constant} 2539"$": control_code:=check_sum; {string pool check sum} 2540" ",tab_mark: control_code:=new_module; {beginning of a new module} 2541"*": begin print('*',module_count+1:1); 2542 update_terminal; {print a progress report} 2543 control_code:=new_module; {beginning of a new module} 2544 end; 2545"D","d": control_code:=definition; {macro definition} 2546"F","f": control_code:=format; {format definition} 2547"{": control_code:=begin_comment; {begin-comment delimiter} 2548"}": control_code:=end_comment; {end-comment delimiter} 2549"P","p": control_code:=begin_Pascal; {\PASCAL\ text in unnamed module} 2550"T","t","^",".",":": control_code:=control_text; {control text to be ignored} 2551"&": control_code:=join; {concatenate two tokens} 2552"<": control_code:=module_name; {beginning of a module name} 2553"=": control_code:=verbatim; {beginning of \PASCAL\ verbatim mode} 2554"\": control_code:=force_line; {force a new line in \PASCAL\ output} 2555othercases control_code:=ignore {ignore all other cases} 2556endcases; 2557end; 2558 2559@ The |skip_ahead| procedure reads through the input at fairly high speed 2560until finding the next non-ignorable control code, which it returns. 2561 2562@p function skip_ahead:eight_bits; {skip to next control code} 2563label done; 2564var c:eight_bits; {control code found} 2565begin loop begin if loc>limit then 2566 begin get_line; 2567 if input_has_ended then 2568 begin c:=new_module; goto done; 2569 end; 2570 end; 2571 buffer[limit+1]:="@@"; 2572 while buffer[loc]<>"@@" do incr(loc); 2573 if loc<=limit then 2574 begin loc:=loc+2; c:=control_code(buffer[loc-1]); 2575 if (c<>ignore)or(buffer[loc-1]=">") then goto done; 2576 end; 2577 end; 2578done: skip_ahead:=c; 2579end; 2580 2581@ The |skip_comment| procedure reads through the input at somewhat high speed 2582until finding the first unmatched right brace or until coming to the end 2583of the file. It ignores characters following `\.\\' characters, since all 2584braces that aren't nested are supposed to be hidden in that way. For 2585example, consider the process of skipping the first comment below, 2586where the string containing the right brace has been typed as \.{\`\\.\\\}\'} 2587in the \.{WEB} file. 2588 2589@p procedure skip_comment; {skips to next unmatched `\.\}'} 2590label exit; 2591var bal:eight_bits; {excess of left braces} 2592@!c:ASCII_code; {current character} 2593begin bal:=0; 2594loop@+ begin if loc>limit then 2595 begin get_line; 2596 if input_has_ended then 2597 begin err_print('! Input ended in mid-comment'); 2598@.Input ended in mid-comment@> 2599 return; 2600 end; 2601 end; 2602 c:=buffer[loc]; incr(loc); 2603 @<Do special things when |c="@@", "\", "{", "}"|; |return| at end@>; 2604 end; 2605exit:end; 2606 2607@ @<Do special things when |c="@@"...@>= 2608if c="@@" then 2609 begin c:=buffer[loc]; 2610 if (c<>" ")and(c<>tab_mark)and(c<>"*")and(c<>"z")and(c<>"Z") then incr(loc) 2611 else begin err_print('! Section ended in mid-comment'); 2612@.Section ended in mid-comment@> 2613 decr(loc); return; 2614 end 2615 end 2616else if (c="\")and(buffer[loc]<>"@@") then incr(loc) 2617else if c="{" then incr(bal) 2618else if c="}" then 2619 begin if bal=0 then return; 2620 decr(bal); 2621 end 2622 2623@* Inputting the next token. 2624As stated above, \.{TANGLE}'s most interesting input procedure is the 2625|get_next| routine that inputs the next token. However, the procedure 2626isn't especially difficult. 2627 2628In most cases the tokens output by |get_next| have the form used in 2629replacement texts, except that two-byte tokens are not produced. 2630An identifier that isn't one letter long is represented by the 2631output `|identifier|', and in such a case the global variables 2632|id_first| and |id_loc| will have been set to the appropriate values 2633needed by the |id_lookup| procedure. A string that begins with a 2634double-quote is also considered an |identifier|, and in such a case 2635the global variable |double_chars| will also have been set appropriately. 2636Control codes produce the corresponding output of the |control_code| 2637function above; and if that code is |module_name|, the value of |cur_module| 2638will point to the |byte_start| entry for that module name. 2639 2640Another global variable, |scanning_hex|, is |true| during the time that 2641the letters \.A through \.F should be treated as if they were digits. 2642 2643@<Globals...@>= 2644@!cur_module: name_pointer; {name of module just scanned} 2645@!scanning_hex: boolean; {are we scanning a hexadecimal constant?} 2646 2647@ @<Set init...@>= 2648scanning_hex:=false; 2649 2650@ At the top level, |get_next| is a multi-way switch based on the next 2651character in the input buffer. A |new_module| code is inserted at the 2652very end of the input file. 2653 2654@p function get_next:eight_bits; {produces the next input token} 2655label restart,done,found; 2656var c:eight_bits; {the current character} 2657@!d:eight_bits; {the next character} 2658@!j,@!k:0..longest_name; {indices into |mod_text|} 2659begin restart: if loc>limit then 2660 begin get_line; 2661 if input_has_ended then 2662 begin c:=new_module; goto found; 2663 end; 2664 end; 2665c:=buffer[loc]; incr(loc); 2666if scanning_hex then @<Go to |found| if |c| is a hexadecimal digit, 2667 otherwise set |scanning_hex:=false|@>; 2668case c of 2669"A",up_to("Z"),"a",up_to("z"): @<Get an identifier@>; 2670"""": @<Get a preprocessed string@>; 2671"@@": @<Get control code and possible module name@>; 2672@t\4@>@<Compress two-symbol combinations like `\.{:=}'@>@; 2673" ",tab_mark: goto restart; {ignore spaces and tabs} 2674"{": begin skip_comment; goto restart; 2675 end; 2676"}": begin err_print('! Extra }'); goto restart; 2677@.Extra \}@> 2678 end; 2679othercases if c>=128 then goto restart {ignore nonstandard characters} 2680 else do_nothing 2681endcases; 2682found:@!debug if trouble_shooting then debug_help;@;@+gubed@/ 2683get_next:=c; 2684end; 2685 2686@ @<Go to |found| if |c| is a hexadecimal digit...@>= 2687if ((c>="0")and(c<="9"))or((c>="A")and(c<="F")) then goto found 2688else scanning_hex:=false 2689 2690@ Note that the following code substitutes \.{@@\{} and \.{@@\}} for the 2691respective combinations `\.{(*}' and `\.{*)}'. Explicit braces should be used 2692for \TeX\ comments in \PASCAL\ text. 2693 2694@d compress(#)==begin if loc<=limit then begin c:=#; incr(loc); end; end 2695 2696@<Compress two-symbol...@>= 2697".": if buffer[loc]="." then compress(double_dot) 2698 else if buffer[loc]=")" then compress("]"); 2699":": if buffer[loc]="=" then compress(left_arrow); 2700"=": if buffer[loc]="=" then compress(equivalence_sign); 2701">": if buffer[loc]="=" then compress(greater_or_equal); 2702"<": if buffer[loc]="=" then compress(less_or_equal) 2703 else if buffer[loc]=">" then compress(not_equal); 2704"(": if buffer[loc]="*" then compress(begin_comment) 2705 else if buffer[loc]="." then compress("["); 2706"*": if buffer[loc]=")" then compress(end_comment); 2707 2708@ We have to look at the preceding character to make sure this isn't part 2709of a real constant, before trying to find an identifier starting with 2710`\.e' or `\.E'. 2711 2712@<Get an identifier@>= 2713begin if ((c="e")or(c="E"))and(loc>1) then 2714 if (buffer[loc-2]<="9")and(buffer[loc-2]>="0") then c:=0; 2715if c<>0 then 2716 begin decr(loc); id_first:=loc; 2717 repeat incr(loc); d:=buffer[loc]; 2718 until ((d<"0")or((d>"9")and(d<"A"))or((d>"Z")and(d<"a"))or(d>"z")) and 2719 (d<>"_"); 2720 if loc>id_first+1 then 2721 begin c:=identifier; id_loc:=loc; 2722 end; 2723 end 2724else c:="E"; {exponent of a real constant} 2725end 2726 2727@ A string that starts and ends with double-quote marks is converted into 2728an identifier that behaves like a numeric macro by means of the following 2729piece of the program. 2730@^preprocessed strings@> 2731 2732@<Get a preprocessed string@>= 2733begin double_chars:=0; id_first:=loc-1; 2734repeat d:=buffer[loc]; incr(loc); 2735 if (d="""")or(d="@@") then 2736 if buffer[loc]=d then 2737 begin incr(loc); d:=0; incr(double_chars); 2738 end 2739 else begin if d="@@" then err_print('! Double @@ sign missing') 2740@.Double \AT! sign missing@> 2741 end 2742 else if loc>limit then 2743 begin err_print('! String constant didn''t end'); d:=""""; 2744@.String constant didn't end@> 2745 end; 2746until d=""""; 2747id_loc:=loc-1; c:=identifier; 2748end 2749 2750@ After an \.{@@} sign has been scanned, the next character tells us 2751whether there is more work to do. 2752 2753@<Get control code and possible module name@>= 2754begin c:=control_code(buffer[loc]); incr(loc); 2755if c=ignore then goto restart 2756else if c=hex then scanning_hex:=true 2757else if c=module_name then 2758 @<Scan the \(module name and make |cur_module| point to it@> 2759else if c=control_text then 2760 begin repeat c:=skip_ahead; 2761 until c<>"@@"; 2762 if buffer[loc-1]<>">" then 2763 err_print('! Improper @@ within control text'); 2764@.Improper \AT! within control text@> 2765 goto restart; 2766 end; 2767end 2768 2769@ @<Scan the \(module name...@>= 2770begin @<Put module name into |mod_text[1..k]|@>; 2771if k>3 then 2772 begin if (mod_text[k]=".")and(mod_text[k-1]=".")and(mod_text[k-2]=".") then 2773 cur_module:=prefix_lookup(k-3) 2774 else cur_module:=mod_lookup(k); 2775 end 2776else cur_module:=mod_lookup(k); 2777end 2778 2779@ Module names are placed into the |mod_text| array with consecutive spaces, 2780tabs, and carriage-returns replaced by single spaces. There will be no 2781spaces at the beginning or the end. (We set |mod_text[0]:=" "| to facilitate 2782this, since the |mod_lookup| routine uses |mod_text[1]| as the first 2783character of the name.) 2784 2785@<Set init...@>=mod_text[0]:=" "; 2786 2787@ @<Put module name...@>= 2788k:=0; 2789loop@+ begin if loc>limit then 2790 begin get_line; 2791 if input_has_ended then 2792 begin err_print('! Input ended in section name'); 2793@.Input ended in section name@> 2794 goto done; 2795 end; 2796 end; 2797 d:=buffer[loc]; 2798 @<If end of name, |goto done|@>; 2799 incr(loc); if k<longest_name-1 then incr(k); 2800 if (d=" ")or(d=tab_mark) then 2801 begin d:=" "; if mod_text[k-1]=" " then decr(k); 2802 end; 2803 mod_text[k]:=d; 2804 end; 2805done: @<Check for overlong name@>; 2806if (mod_text[k]=" ")and(k>0) then decr(k); 2807 2808@ @<If end of name,...@>= 2809if d="@@" then 2810 begin d:=buffer[loc+1]; 2811 if d=">" then 2812 begin loc:=loc+2; goto done; 2813 end; 2814 if (d=" ")or(d=tab_mark)or(d="*") then 2815 begin err_print('! Section name didn''t end'); goto done; 2816@.Section name didn't end@> 2817 end; 2818 incr(k); mod_text[k]:="@@"; incr(loc); {now |d=buffer[loc]| again} 2819 end 2820 2821@ @<Check for overlong name@>= 2822if k>=longest_name-2 then 2823 begin print_nl('! Section name too long: '); 2824@.Section name too long@> 2825 for j:=1 to 25 do print(xchr[mod_text[j]]); 2826 print('...'); mark_harmless; 2827 end 2828 2829@* Scanning a numeric definition. 2830When \.{TANGLE} looks at the \PASCAL\ text following the `\.=' of a numeric 2831macro definition, it calls on the precedure |scan_numeric(p)|, where |p| 2832points to the name that is to be defined. This procedure evaluates the 2833right-hand side, which must consist entirely of integer constants and 2834defined numeric macros connected with \.+ and \.- signs (no parentheses). 2835It also sets the global variable |next_control| to the control code that 2836terminated this definition. 2837 2838A definition ends with the control codes |definition|, |format|, |module_name|, 2839|begin_Pascal|, and |new_module|, all of which can be recognized 2840by the fact that they are the largest values |get_next| can return. 2841 2842@d end_of_definition(#)==(#>=format) 2843 {is |#| a control code ending a definition?} 2844 2845@<Global...@>= 2846@!next_control:eight_bits; {control code waiting to be acted upon} 2847 2848@ The evaluation of a numeric expression makes use of two variables called the 2849|accumulator| and the |next_sign|. At the beginning, |accumulator| is zero and 2850|next_sign| is $+1$. When a \.+ or \.- is scanned, |next_sign| is multiplied 2851by the value of that sign. When a numeric value is scanned, it is multiplied by 2852|next_sign| and added to the |accumulator|, then |next_sign| is reset to $+1$. 2853 2854@d add_in(#)==begin accumulator:=accumulator+next_sign*(#); next_sign:=+1; 2855 end 2856 2857@p procedure scan_numeric(@!p:name_pointer); {defines numeric macros} 2858label reswitch, done; 2859var accumulator:integer; {accumulates sums} 2860@!next_sign:-1..+1; {sign to attach to next value} 2861@!q:name_pointer; {points to identifiers being evaluated} 2862@!val:integer; {constants being evaluated} 2863begin @<Set \(|accumulator| to the value of the right-hand side@>; 2864if abs(accumulator)>=1073741824 then 2865 begin err_print('! Value too big: ',accumulator:1); accumulator:=0; 2866@.Value too big@> 2867 end; 2868equiv[p]:=accumulator+1073741824; 2869 {name |p| now is defined to equal |accumulator|} 2870end; 2871 2872@ @<Set \(|accumulator| to the value of the right-hand side@>= 2873accumulator:=0; next_sign:=+1; 2874loop@+ begin next_control:=get_next; 2875 reswitch: case next_control of 2876 digits: begin @<Set |val| to value of decimal constant, and 2877 set |next_control| to the following token@>; 2878 add_in(val); goto reswitch; 2879 end; 2880 octal: begin @<Set |val| to value of octal constant, and 2881 set |next_control| to the following token@>; 2882 add_in(val); goto reswitch; 2883 end; 2884 hex: begin @<Set |val| to value of hexadecimal constant, and 2885 set |next_control| to the following token@>; 2886 add_in(val); goto reswitch; 2887 end; 2888 identifier: begin q:=id_lookup(normal); 2889 if ilk[q]<>numeric then 2890 begin next_control:="*"; goto reswitch; {leads to error} 2891 end; 2892 add_in(equiv[q]-1073741824); 2893 end; 2894 "+": do_nothing; 2895 "-": next_sign:=-next_sign; 2896 format, definition, module_name, begin_Pascal, new_module: goto done; 2897 ";": err_print('! Omit semicolon in numeric definition'); 2898@.Omit semicolon in numeric def...@> 2899 othercases @<Signal error, flush rest of the definition@> 2900 endcases; 2901 end; 2902done: 2903 2904@ @<Signal error, flush rest...@>= 2905begin err_print('! Improper numeric definition will be flushed'); 2906@.Improper numeric definition...@> 2907repeat next_control:=skip_ahead 2908until end_of_definition(next_control); 2909if next_control=module_name then 2910 begin {we want to scan the module name too} 2911 loc:=loc-2; next_control:=get_next; 2912 end; 2913accumulator:=0; goto done; 2914end 2915 2916@ @<Set |val| to value of decimal...@>= 2917val:=0; 2918repeat val:=10*val+next_control-"0"; next_control:=get_next; 2919until (next_control>"9")or(next_control<"0") 2920 2921@ @<Set |val| to value of octal...@>= 2922val:=0; next_control:="0"; 2923repeat val:=8*val+next_control-"0"; next_control:=get_next; 2924until (next_control>"7")or(next_control<"0") 2925 2926@ @<Set |val| to value of hex...@>= 2927val:=0; next_control:="0"; 2928repeat if next_control>="A" then next_control:=next_control+"0"+10-"A"; 2929val:=16*val+next_control-"0"; next_control:=get_next; 2930until (next_control>"F")or(next_control<"0")or@| 2931 ((next_control>"9")and(next_control<"A")) 2932 2933@* Scanning a macro definition. 2934The rules for generating the replacement texts corresponding to simple 2935macros, parametric macros, and \PASCAL\ texts of a module are almost 2936identical, so a single procedure is used for all three cases. The 2937differences are that 2938 2939\yskip\item{a)} The sign |#| denotes a parameter only when it appears 2940outside of strings in a parametric macro; otherwise it stands for the 2941ASCII character |#|. (This is not used in standard \PASCAL, but some 2942\PASCAL s allow, for example, `\.{/\#}' after a certain kind of file name.) 2943 2944\item{b)}Module names are not allowed in simple macros or parametric macros; 2945in fact, the appearance of a module name terminates such macros and denotes 2946the name of the current module. 2947 2948\item{c)}The symbols \.{@@d} and \.{@@f} and \.{@@p} are not allowed after 2949module names, while they terminate macro definitions. 2950 2951@ Therefore there is a procedure |scan_repl| whose parameter |t| specifies 2952either |simple| or |parametric| or |module_name|. After |scan_repl| has 2953acted, |cur_repl_text| will point to the replacement text just generated, and 2954|next_control| will contain the control code that terminated the activity. 2955 2956@<Globals...@>= 2957@!cur_repl_text:text_pointer; {replacement text formed by |scan_repl|} 2958 2959@ @p procedure scan_repl(@!t:eight_bits); {creates a replacement text} 2960label continue, done, found, reswitch; 2961var a:sixteen_bits; {the current token} 2962@!b:ASCII_code; {a character from the buffer} 2963@!bal:eight_bits; {left parentheses minus right parentheses} 2964begin bal:=0; 2965loop@+ begin continue: a:=get_next; 2966 case a of 2967 "(": incr(bal); 2968 ")": if bal=0 then err_print('! Extra )') 2969@.Extra )@> 2970 else decr(bal); 2971 "'": @<Copy a string from the buffer to |tok_mem|@>; 2972 "#": if t=parametric then a:=param; 2973 @t\4@>@<In cases that |a| is a non-ASCII token (|identifier|, 2974 |module_name|, etc.), either process it and change |a| to a byte 2975 that should be stored, or |goto continue| if |a| should be ignored, 2976 or |goto done| if |a| signals the end of this replacement text@>@; 2977 othercases do_nothing 2978 endcases;@/ 2979 app_repl(a); {store |a| in |tok_mem|} 2980 end; 2981done: next_control:=a; 2982@<Make sure the parentheses balance@>; 2983if text_ptr>max_texts-zz then overflow('text'); 2984cur_repl_text:=text_ptr; tok_start[text_ptr+zz]:=tok_ptr[z]; 2985incr(text_ptr); 2986if z=zz-1 then z:=0@+else incr(z); 2987end; 2988 2989@ @<Make sure the parentheses balance@>= 2990if bal>0 then 2991 begin if bal=1 then err_print('! Missing )') 2992 else err_print('! Missing ',bal:1,' )''s'); 2993@.Missing n )@> 2994 while bal>0 do 2995 begin app_repl(")"); decr(bal); 2996 end; 2997 end 2998 2999@ @<In cases that |a| is...@>= 3000identifier: begin a:=id_lookup(normal); app_repl((a div @'400)+@'200); 3001 a:=a mod @'400; 3002 end; 3003module_name: if t<>module_name then goto done 3004 else begin app_repl((cur_module div @'400)+@'250); 3005 a:=cur_module mod @'400; 3006 end; 3007verbatim: @<Copy verbatim string from the buffer to |tok_mem|@>; 3008definition, format, begin_Pascal: if t<>module_name then goto done 3009 else begin err_print('! @@',xchr[buffer[loc-1]], 3010@.\AT!p is ignored in Pascal text@> 3011@.\AT!d is ignored in Pascal text@> 3012@.\AT!f is ignored in Pascal text@> 3013 ' is ignored in Pascal text'); goto continue; 3014 end; 3015new_module: goto done; 3016 3017@ @<Copy a string...@>= 3018begin b:="'"; 3019loop@+ begin app_repl(b); 3020 if b="@@" then 3021 if buffer[loc]="@@" then incr(loc) {store only one \.{@@}} 3022 else err_print('! You should double @@ signs in strings'); 3023@.You should double \AT! signs@> 3024 if loc=limit then 3025 begin err_print('! String didn''t end'); 3026@.String didn't end@> 3027 buffer[loc]:="'"; buffer[loc+1]:=0; 3028 end; 3029 b:=buffer[loc]; incr(loc); 3030 if b="'" then 3031 begin if buffer[loc]<>"'" then goto found 3032 else begin incr(loc); app_repl("'"); 3033 end; 3034 end; 3035 end; 3036found: end {now |a| holds the final |"'"| that will be stored} 3037 3038@ @<Copy verbatim string...@>= 3039begin app_repl(verbatim); 3040buffer[limit+1]:="@@"; 3041reswitch: if buffer[loc]="@@" then 3042 begin if loc<limit then if buffer[loc+1]="@@" then 3043 begin app_repl("@@"); 3044 loc:=loc+2; 3045 goto reswitch; 3046 end; 3047 end 3048else begin app_repl(buffer[loc]); 3049 incr(loc); 3050 goto reswitch; 3051 end; 3052if loc>=limit then err_print('! Verbatim string didn''t end') 3053@.Verbatim string didn't end@> 3054else if buffer[loc+1]<>">" then 3055 err_print('! You should double @@ signs in verbatim strings'); 3056@.You should double \AT! signs@> 3057loc:=loc+2; 3058end {another |verbatim| byte will be stored, since |a=verbatim|} 3059 3060@ The following procedure is used to define a simple or parametric macro, 3061just after the `\.{==}' of its definition has been scanned. 3062 3063@p procedure define_macro(@!t:eight_bits); 3064var p:name_pointer; {the identifier being defined} 3065begin p:=id_lookup(t); scan_repl(t);@/ 3066equiv[p]:=cur_repl_text; text_link[cur_repl_text]:=0; 3067end; 3068 3069@* Scanning a module. 3070The |scan_module| procedure starts when `\.{@@\ }' or `\.{@@*}' has been 3071sensed in the input, and it proceeds until the end of that module. It 3072uses |module_count| to keep track of the current module number; with luck, 3073\.{WEAVE} and \.{TANGLE} will both assign the same numbers to modules. 3074 3075@<Globals...@>= 3076@!module_count:0..@'27777; {the current module number} 3077 3078@ The top level of |scan_module| is trivial. 3079@p procedure scan_module; 3080label continue, done, exit; 3081var p:name_pointer; {module name for the current module} 3082begin incr(module_count); 3083@<Scan the \(definition part of the current module@>; 3084@<Scan the \PASCAL\ part of the current module@>; 3085exit: end; 3086 3087@ @<Scan the \(definition part...@>= 3088next_control:=0; 3089loop@+ begin continue: while next_control<=format do 3090 begin next_control:=skip_ahead; 3091 if next_control=module_name then 3092 begin {we want to scan the module name too} 3093 loc:=loc-2; next_control:=get_next; 3094 end; 3095 end; 3096 if next_control<>definition then goto done; 3097 next_control:=get_next; {get identifier name} 3098 if next_control<>identifier then 3099 begin err_print('! Definition flushed, must start with ', 3100@.Definition flushed...@> 3101 'identifier of length > 1'); goto continue; 3102 end; 3103 next_control:=get_next; {get token after the identifier} 3104 if next_control="=" then 3105 begin scan_numeric(id_lookup(numeric)); goto continue; 3106 end 3107 else if next_control=equivalence_sign then 3108 begin define_macro(simple); goto continue; 3109 end 3110 else @<If the next text is `|(#)==|', call |define_macro| 3111 and |goto continue|@>; 3112 err_print('! Definition flushed since it starts badly'); 3113@.Definition flushed...@> 3114 end; 3115done: 3116 3117@ @<If the next text is `|(#)==|'...@>= 3118if next_control="(" then 3119 begin next_control:=get_next; 3120 if next_control="#" then 3121 begin next_control:=get_next; 3122 if next_control=")" then 3123 begin next_control:=get_next; 3124 if next_control="=" then 3125 begin err_print('! Use == for macros'); 3126@.Use == for macros@> 3127 next_control:=equivalence_sign; 3128 end; 3129 if next_control=equivalence_sign then 3130 begin define_macro(parametric); goto continue; 3131 end; 3132 end; 3133 end; 3134 end; 3135 3136@ @<Scan the \PASCAL...@>= 3137case next_control of 3138begin_Pascal:p:=0; 3139module_name: begin p:=cur_module; 3140 @<Check that |=| or |==| follows this module name, otherwise |return|@>; 3141 end; 3142othercases return 3143endcases;@/ 3144@<Insert the module number into |tok_mem|@>; 3145scan_repl(module_name); {now |cur_repl_text| points to the replacement text} 3146@<Update the data structure so that the replacement text is accessible@>; 3147 3148@ @<Check that |=|...@>= 3149repeat next_control:=get_next; 3150until next_control<>"+"; {allow optional `\.{+=}'} 3151if (next_control<>"=")and(next_control<>equivalence_sign) then 3152 begin err_print('! Pascal text flushed, = sign is missing'); 3153@.Pascal text flushed...@> 3154 repeat next_control:=skip_ahead; 3155 until next_control=new_module; 3156 return; 3157 end 3158 3159@ @<Insert the module number...@>= 3160store_two_bytes(@'150000+module_count); {|@'150000=@'320*@'400|} 3161 3162@ @<Update the data...@>= 3163if p=0 then {unnamed module} 3164 begin text_link[last_unnamed]:=cur_repl_text; last_unnamed:=cur_repl_text; 3165 end 3166else if equiv[p]=0 then equiv[p]:=cur_repl_text {first module of this name} 3167else begin p:=equiv[p]; 3168 while text_link[p]<module_flag do p:=text_link[p]; {find end of list} 3169 text_link[p]:=cur_repl_text; 3170 end; 3171text_link[cur_repl_text]:=module_flag; 3172 {mark this replacement text as a nonmacro} 3173 3174@* Debugging. 3175The \PASCAL\ debugger with which \.{TANGLE} was developed allows breakpoints 3176to be set, and variables can be read and changed, but procedures cannot be 3177executed. Therefore a `|debug_help|' procedure has been inserted in the main 3178loops of each phase of the program; when |ddt| and |dd| are set to appropriate 3179values, symbolic printouts of various tables will appear. 3180 3181The idea is to set a breakpoint inside the |debug_help| routine, at the 3182place of `\ignorespaces|breakpoint:|\unskip' below. Then when 3183|debug_help| is to be activated, set |trouble_shooting| equal to |true|. 3184The |debug_help| routine will prompt you for values of |ddt| and |dd|, 3185discontinuing this when |ddt<=0|; thus you type $2n+1$ integers, ending 3186with zero or a negative number. Then control either passes to the 3187breakpoint, allowing you to look at and/or change variables (if you typed 3188zero), or to exit the routine (if you typed a negative value). 3189 3190Another global variable, |debug_cycle|, can be used to skip silently 3191past calls on |debug_help|. If you set |debug_cycle>1|, the program stops 3192only every |debug_cycle| times |debug_help| is called; however, 3193any error stop will set |debug_cycle| to zero. 3194 3195@<Globals...@>= 3196@!debug@!trouble_shooting:boolean; {is |debug_help| wanted?} 3197@!ddt:integer; {operation code for the |debug_help| routine} 3198@!dd:integer; {operand in procedures performed by |debug_help|} 3199@!debug_cycle:integer; {threshold for |debug_help| stopping} 3200@!debug_skipped:integer; {we have skipped this many |debug_help| calls} 3201@!term_in:text_file; {the user's terminal as an input file} 3202gubed 3203 3204@ The debugging routine needs to read from the user's terminal. 3205@^system dependencies@> 3206@<Set init...@>= 3207@!debug trouble_shooting:=true; debug_cycle:=1; debug_skipped:=0;@/ 3208trouble_shooting:=false; debug_cycle:=99999; {use these when it almost works} 3209reset(term_in,'TTY:','/I'); {open |term_in| as the terminal, don't do a |get|} 3210gubed 3211 3212@ @d breakpoint=888 {place where a breakpoint is desirable} 3213@^system dependencies@> 3214 3215@p @!debug procedure debug_help; {routine to display various things} 3216label breakpoint,exit; 3217var k:integer; {index into various arrays} 3218begin incr(debug_skipped); 3219if debug_skipped<debug_cycle then return; 3220debug_skipped:=0; 3221loop@+ begin write(term_out,'#'); update_terminal; {prompt} 3222 read(term_in,ddt); {read a list of integers} 3223 if ddt<0 then return 3224 else if ddt=0 then 3225 begin goto breakpoint;@\ {go to every label at least once} 3226 breakpoint: ddt:=0;@\ 3227 end 3228 else begin read(term_in,dd); 3229 case ddt of 3230 1: print_id(dd); 3231 2: print_repl(dd); 3232 3: for k:=1 to dd do print(xchr[buffer[k]]); 3233 4: for k:=1 to dd do print(xchr[mod_text[k]]); 3234 5: for k:=1 to out_ptr do print(xchr[out_buf[k]]); 3235 6: for k:=1 to dd do print(xchr[out_contrib[k]]); 3236 othercases print('?') 3237 endcases; 3238 end; 3239 end; 3240exit:end; 3241gubed 3242 3243@* The main program. 3244We have defined plenty of procedures, and it is time to put the last 3245pieces of the puzzle in place. Here is where \.{TANGLE} starts, and where 3246it ends. 3247@^system dependencies@> 3248 3249@p begin initialize; 3250@<Initialize the input system@>; 3251print_ln(banner); {print a ``banner line''} 3252@<Phase I: Read all the user's text and compress it into |tok_mem|@>; 3253stat for ii:=0 to zz-1 do max_tok_ptr[ii]:=tok_ptr[ii];@+tats@;@/ 3254@<Phase II:...@>; 3255end_of_TANGLE: 3256if string_ptr>number_chars then @<Finish off the string pool file@>; 3257stat @<Print statistics about memory usage@>;@+tats@;@/ 3258@t\4\4@>{here files should be closed if the operating system requires it} 3259@<Print the job |history|@>; 3260end. 3261 3262@ @<Phase I:...@>= 3263phase_one:=true; 3264module_count:=0; 3265repeat next_control:=skip_ahead; 3266until next_control=new_module; 3267while not input_has_ended do scan_module; 3268@<Check that all changes have been read@>; 3269phase_one:=false; 3270 3271@ @<Finish off the string pool file@>= 3272begin print_nl(string_ptr-number_chars:1, 3273 ' strings written to string pool file.'); 3274write(pool,'*'); 3275for ii:=1 to 9 do 3276 begin out_buf[ii]:=pool_check_sum mod 10; 3277 pool_check_sum:=pool_check_sum div 10; 3278 end; 3279for ii:=9 downto 1 do write(pool,xchr["0"+out_buf[ii]]); 3280write_ln(pool); 3281end 3282 3283@ @<Glob...@>= 3284stat @!wo:0..ww-1; {segment of memory for which statistics are being printed} 3285tats 3286 3287@ @<Print statistics about memory usage@>= 3288print_nl('Memory usage statistics:'); 3289print_nl(name_ptr:1, ' names, ', text_ptr:1, ' replacement texts;'); 3290print_nl(byte_ptr[0]:1); 3291for wo:=1 to ww-1 do print('+',byte_ptr[wo]:1); 3292if phase_one then 3293 for ii:=0 to zz-1 do max_tok_ptr[ii]:=tok_ptr[ii]; 3294print(' bytes, ', max_tok_ptr[0]:1); 3295for ii:=1 to zz-1 do print('+',max_tok_ptr[ii]:1); 3296print(' tokens.'); 3297 3298@ Some implementations may wish to pass the |history| value to the 3299operating system so that it can be used to govern whether or not other 3300programs are started. Here we simply report the history to the user. 3301@^system dependencies@> 3302 3303@<Print the job |history|@>= 3304case history of 3305spotless: print_nl('(No errors were found.)'); 3306harmless_message: print_nl('(Did you see the warning message above?)'); 3307error_message: print_nl('(Pardon me, but I think I spotted something wrong.)'); 3308fatal_message: print_nl('(That was a fatal error, my friend.)'); 3309end {there are no other cases} 3310 3311@* System-dependent changes. 3312This module should be replaced, if necessary, by changes to the program 3313that are necessary to make \.{TANGLE} work at a particular installation. 3314It is usually best to design your change file so that all changes to 3315previous modules preserve the module numbering; then everybody's version 3316will be consistent with the printed program. More extensive changes, 3317which introduce new modules, can be inserted here; then only the index 3318itself will get a new module number. 3319@^system dependencies@> 3320 3321@* Index. 3322Here is a cross-reference table for the \.{TANGLE} processor. 3323All modules in which an identifier is 3324used are listed with that identifier, except that reserved words are 3325indexed only when they appear in format definitions, and the appearances 3326of identifiers in module names are not indexed. Underlined entries 3327correspond to where the identifier was declared. Error messages and 3328a few other things like ``ASCII code'' are indexed here too. 3329