1% This program by D. E. Knuth is not copyrighted and can be used freely.
2% Version 0 was released in December, 1981.
3% Version 1 was released in September, 1982, with version 0 of TeX.
4% Slight changes were made in October, 1982, for version 0.6 of TeX.
5% Version 1.2 introduced {:nnn} comments, added @@= and @@\ (December, 1982).
6% Version 1.4 added "history" (February, 1983).
7% Version 1.5 conformed to TeX version 0.96 and fixed @@\ (March, 1983).
8% Version 1.7 introduced the new change file format (June, 1983).
9% Version 2.0 was released in July, 1983, with version 0.999 of TeX.
10% Version 2.5 was released in November, 1983, with version 1.0 of TeX.
11% Version 2.6 fixed a bug: force-line-break after a constant (August, 1984).
12% Version 2.7 fixed the definition of check_sum_prime (May, 1985).
13% Version 2.8 fixed a bug in change_buffer movement (August, 1985).
14% Version 2.9 allows nonnumeric macros before their def (December, 1988).
15% Version 3, for Sewell's book, fixed long-line bug in input_ln (March, 1989).
16% Version 4 was major change to allow 8-bit input (September, 1989).
17% Version 4.1 conforms to ANSI standard for-loop rules (September, 1990).
18% Version 4.2 fixes stat report if phase one dies (March, 1991).
19% Version 4.3 fixes @@ bug in verbatim, catches extra } (September, 1991).
20% Version 4.4 activates debug_help on errors as advertised (February, 1993).
21% Version 4.5 prevents modno-comments from being split across lines (Dec 2002).
22
23% Here is TeX material that gets inserted after \input webmac
24\def\hang{\hangindent 3em\indent\ignorespaces}
25\font\ninerm=cmr9
26\let\mc=\ninerm % medium caps for names like SAIL
27\def\PASCAL{Pascal}
28\def\pb{$\.|\ldots\.|$} % Pascal brackets (|...|)
29\def\v{\.{\char'174}} % vertical (|) in typewriter font
30\mathchardef\BA="3224 % double arrow
31\def\({} % kludge for alphabetizing certain module names
32
33\def\title{TANGLE}
34\def\contentspagenumber{123} % should be odd
35\def\topofcontents{\null\vfill
36  \titlefalse % include headline on the contents page
37  \def\rheader{\mainfont Appendix E\hfil \contentspagenumber}
38  \centerline{\titlefont The {\ttitlefont TANGLE} processor}
39  \vskip 15pt
40  \centerline{(Version 4.5)}
41  \vfill}
42\pageno=\contentspagenumber \advance\pageno by 1
43
44@* Introduction.
45This program converts a \.{WEB} file to a \PASCAL\ file. It was written
46by D. E. Knuth in September, 1981; a somewhat similar {\mc SAIL} program had
47been developed in March, 1979. Since this program describes itself, a
48bootstrapping process involving hand-translation had to be used to get started.
49
50For large \.{WEB} files one should have a large memory, since \.{TANGLE} keeps
51all the \PASCAL\ text in memory (in an abbreviated form). The program uses
52a few features of the local \PASCAL\ compiler that may need to be changed in
53other installations:
54
55\yskip\item{1)} Case statements have a default.
56\item{2)} Input-output routines may need to be adapted for use with a particular
57character set and/or for printing messages on the user's terminal.
58
59\yskip\noindent
60These features are also present in the \PASCAL\ version of \TeX, where they
61are used in a similar (but more complex) way. System-dependent portions
62of \.{TANGLE} can be identified by looking at the entries for `system
63dependencies' in the index below.
64@!@^system dependencies@>
65
66The ``banner line'' defined here should be changed whenever \.{TANGLE}
67is modified.
68
69@d banner=='This is TANGLE, Version 4.5'
70
71@ The program begins with a fairly normal header, made up of pieces that
72@^system dependencies@>
73will mostly be filled in later. The \.{WEB} input comes from files |web_file|
74and |change_file|, the \PASCAL\ output goes to file |Pascal_file|,
75and the string pool output goes to file |pool|.
76
77If it is necessary to abort the job because of a fatal error, the program
78calls the `|jump_out|' procedure, which goes to the label |end_of_TANGLE|.
79
80@d end_of_TANGLE = 9999 {go here to wrap it up}
81
82@p @t\4@>@<Compiler directives@>@/
83program TANGLE(@!web_file,@!change_file,@!Pascal_file,@!pool);
84label end_of_TANGLE; {go here to finish}
85const @<Constants in the outer block@>@/
86type @<Types in the outer block@>@/
87var @<Globals in the outer block@>@/
88@<Error handling procedures@>@/
89procedure initialize;
90  var @<Local variables for initialization@>@/
91  begin @<Set initial values@>@/
92  end;
93
94@ Some of this code is optional for use when debugging only;
95such material is enclosed between the delimiters |debug| and $|gubed|$.
96Other parts, delimited by |stat| and $|tats|$, are optionally included if
97statistics about \.{TANGLE}'s memory usage are desired.
98
99@d debug==@{ {change this to `$\\{debug}\equiv\null$' when debugging}
100@d gubed==@t@>@} {change this to `$\\{gubed}\equiv\null$' when debugging}
101@f debug==begin
102@f gubed==end
103@#
104@d stat==@{ {change this to `$\\{stat}\equiv\null$'
105  when gathering usage statistics}
106@d tats==@t@>@} {change this to `$\\{tats}\equiv\null$'
107  when gathering usage statistics}
108@f stat==begin
109@f tats==end
110
111@ The \PASCAL\ compiler used to develop this system has ``compiler
112directives'' that can appear in comments whose first character is a dollar sign.
113In production versions of \.{TANGLE} these directives tell the compiler that
114@^system dependencies@>
115it is safe to avoid range checks and to leave out the extra code it inserts
116for the \PASCAL\ debugger's benefit, although interrupts will occur if
117there is arithmetic overflow.
118
119@<Compiler directives@>=
120@{@&$C-,A+,D-@} {no range check, catch arithmetic overflow, no debug overhead}
121@!debug @{@&$C+,D+@}@+ gubed {but turn everything on when debugging}
122
123@ Labels are given symbolic names by the following definitions. We insert
124the label `|exit|:' just before the `\ignorespaces|end|\unskip' of a
125procedure in which we have used the `|return|' statement defined below;
126the label `|restart|' is occasionally used at the very beginning of a
127procedure; and the label `|reswitch|' is occasionally used just prior to
128a \&{case} statement in which some cases change the conditions and we wish to
129branch to the newly applicable case.
130Loops that are set up with the \&{loop} construction defined below are
131commonly exited by going to `|done|' or to `|found|' or to `|not_found|',
132and they are sometimes repeated by going to `|continue|'.
133
134@d exit=10 {go here to leave a procedure}
135@d restart=20 {go here to start a procedure again}
136@d reswitch=21 {go here to start a case statement again}
137@d continue=22 {go here to resume a loop}
138@d done=30 {go here to exit a loop}
139@d found=31 {go here when you've found it}
140@d not_found=32 {go here when you've found something else}
141
142@ Here are some macros for common programming idioms.
143
144@d incr(#) == #:=#+1 {increase a variable by unity}
145@d decr(#) == #:=#-1 {decrease a variable by unity}
146@d loop == @+ while true do@+ {repeat over and over until a |goto| happens}
147@d do_nothing == {empty statement}
148@d return == goto exit {terminate a procedure call}
149@f return == nil
150@f loop == xclause
151
152@ We assume that |case| statements may include a default case that applies
153if no matching label is found. Thus, we shall use constructions like
154@^system dependencies@>
155$$\vbox{\halign{#\hfil\cr
156|case x of|\cr
1571: $\langle\,$code for $x=1\,\rangle$;\cr
1583: $\langle\,$code for $x=3\,\rangle$;\cr
159|othercases| $\langle\,$code for |x<>1| and |x<>3|$\,\rangle$\cr
160|endcases|\cr}}$$
161since most \PASCAL\ compilers have plugged this hole in the language by
162incorporating some sort of default mechanism. For example, the compiler
163used to develop \.{WEB} and \TeX\ allows `|others|:' as a default label,
164and other \PASCAL s allow syntaxes like `\ignorespaces|else|\unskip' or
165`\&{otherwise}' or `\\{otherwise}:', etc. The definitions of |othercases|
166and |endcases| should be changed to agree with local conventions. (Of
167course, if no default mechanism is available, the |case| statements of
168this program must be extended by listing all remaining cases. The author
169would have taken the trouble to modify \.{TANGLE} so that such extensions
170were done automatically, if he had not wanted to encourage \PASCAL\
171compiler writers to make this important change in \PASCAL, where it belongs.)
172
173@d othercases == others: {default for cases not listed explicitly}
174@d endcases == @+end {follows the default case in an extended |case| statement}
175@f othercases == else
176@f endcases == end
177
178@ The following parameters are set big enough to handle \TeX, so they
179should be sufficient for most applications of \.{TANGLE}.
180
181@<Constants...@>=
182@!buf_size=100; {maximum length of input line}
183@!max_bytes=45000; {|1/ww| times the number of bytes in identifiers,
184  strings, and module names; must be less than 65536}
185@!max_toks=50000; {|1/zz| times the number of bytes in compressed \PASCAL\ code;
186  must be less than 65536}
187@!max_names=4000; {number of identifiers, strings, module names;
188  must be less than 10240}
189@!max_texts=2000; {number of replacement texts, must be less than 10240}
190@!hash_size=353; {should be prime}
191@!longest_name=400; {module names shouldn't be longer than this}
192@!line_length=72; {lines of \PASCAL\ output have at most this many characters}
193@!out_buf_size=144; {length of output buffer, should be twice |line_length|}
194@!stack_size=50; {number of simultaneous levels of macro expansion}
195@!max_id_length=12; {long identifiers are chopped to this length, which must
196  not exceed |line_length|}
197@!unambig_length=7; {identifiers must be unique if chopped to this length}
198  {note that 7 is more strict than \PASCAL's 8, but this can be varied}
199
200@ A global variable called |history| will contain one of four values
201at the end of every run: |spotless| means that no unusual messages were
202printed; |harmless_message| means that a message of possible interest
203was printed but no serious errors were detected; |error_message| means that
204at least one error was found; |fatal_message| means that the program
205terminated abnormally. The value of |history| does not influence the
206behavior of the program; it is simply computed for the convenience
207of systems that might want to use such information.
208
209@d spotless=0 {|history| value for normal jobs}
210@d harmless_message=1 {|history| value when non-serious info was printed}
211@d error_message=2 {|history| value when an error was noted}
212@d fatal_message=3 {|history| value when we had to stop prematurely}
213@#
214@d mark_harmless==@t@>@+if history=spotless then history:=harmless_message
215@d mark_error==history:=error_message
216@d mark_fatal==history:=fatal_message
217
218@<Glob...@>=@!history:spotless..fatal_message; {how bad was this run?}
219
220@ @<Set init...@>=history:=spotless;
221
222@* The character set.
223One of the main goals in the design of \.{WEB} has been to make it readily
224portable between a wide variety of computers. Yet \.{WEB} by its very
225nature must use a greater variety of characters than most computer
226programs deal with, and character encoding is one of the areas in which
227existing machines differ most widely from each other.
228
229To resolve this problem, all input to \.{WEAVE} and \.{TANGLE} is converted
230to an internal eight-bit code that is essentially standard ASCII, the ``American
231Standard Code for Information Interchange.''  The conversion is done
232immediately when each character is read in. Conversely, characters are
233converted from ASCII to the user's external representation just before
234they are output. (The original ASCII code was seven bits only; \.{WEB} now
235allows eight bits in an attempt to keep up with modern times.)
236
237Such an internal code is relevant to users of \.{WEB} only because it is
238the code used for preprocessed constants like \.{"A"}. If you are writing
239a program in \.{WEB} that makes use of such one-character constants, you
240should convert your input to ASCII form, like \.{WEAVE} and \.{TANGLE} do.
241Otherwise \.{WEB}'s internal coding scheme does not affect you.
242@^ASCII code@>
243
244Here is a table of the standard visible ASCII codes:
245$$\def\:{\char\count255\global\advance\count255 by 1}
246\count255='40
247\vbox{
248\hbox{\hbox to 40pt{\it\hfill0\/\hfill}%
249\hbox to 40pt{\it\hfill1\/\hfill}%
250\hbox to 40pt{\it\hfill2\/\hfill}%
251\hbox to 40pt{\it\hfill3\/\hfill}%
252\hbox to 40pt{\it\hfill4\/\hfill}%
253\hbox to 40pt{\it\hfill5\/\hfill}%
254\hbox to 40pt{\it\hfill6\/\hfill}%
255\hbox to 40pt{\it\hfill7\/\hfill}}
256\vskip 4pt
257\hrule
258\def\^{\vrule height 10.5pt depth 4.5pt}
259\halign{\hbox to 0pt{\hskip -24pt\O{#0}\hfill}&\^
260\hbox to 40pt{\tt\hfill#\hfill\^}&
261&\hbox to 40pt{\tt\hfill#\hfill\^}\cr
26204&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
26305&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
26406&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
26507&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
26610&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
26711&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
26812&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
26913&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
27014&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
27115&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
27216&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
27317&\:&\:&\:&\:&\:&\:&\:\cr}
274\hrule width 280pt}$$
275(Actually, of course, code @'040 is an invisible blank space.)  Code @'136
276was once an upward arrow (\.{\char'13}), and code @'137 was
277once a left arrow (\.^^X), in olden times when the first draft
278of ASCII code was prepared; but \.{WEB} works with today's standard
279ASCII in which those codes represent circumflex and underline as shown.
280
281@<Types...@>=
282@!ASCII_code=0..255; {eight-bit numbers, a subrange of the integers}
283
284@ The original \PASCAL\ compiler was designed in the late 60s, when six-bit
285character sets were common, so it did not make provision for lowercase
286letters. Nowadays, of course, we need to deal with both capital and small
287letters in a convenient way, so \.{WEB} assumes that it is being used
288with a \PASCAL\ whose character set contains at least the characters of
289standard ASCII as listed above. Some \PASCAL\ compilers use the original
290name |char| for the data type associated with the characters in text files,
291while other \PASCAL s consider |char| to be a 64-element subrange of a larger
292data type that has some other name.
293
294In order to accommodate this difference, we shall use the name |text_char|
295to stand for the data type of the characters in the input and output
296files.  We shall also assume that |text_char| consists of the elements
297|chr(first_text_char)| through |chr(last_text_char)|, inclusive. The
298following definitions should be adjusted if necessary.
299@^system dependencies@>
300
301@d text_char == char {the data type of characters in text files}
302@d first_text_char=0 {ordinal number of the smallest element of |text_char|}
303@d last_text_char=255 {ordinal number of the largest element of |text_char|}
304
305@<Types...@>=
306@!text_file=packed file of text_char;
307
308@ The \.{WEAVE} and \.{TANGLE} processors convert between ASCII code and
309the user's external character set by means of arrays |xord| and |xchr|
310that are analogous to \PASCAL's |ord| and |chr| functions.
311
312@<Globals...@>=
313@!xord: array [text_char] of ASCII_code;
314  {specifies conversion of input characters}
315@!xchr: array [ASCII_code] of text_char;
316  {specifies conversion of output characters}
317
318@ If we assume that every system using \.{WEB} is able to read and write the
319visible characters of standard ASCII (although not necessarily using the
320ASCII codes to represent them), the following assignment statements initialize
321most of the |xchr| array properly, without needing any system-dependent
322changes. For example, the statement \.{xchr[@@\'101]:=\'A\'} that appears
323in the present \.{WEB} file might be encoded in, say, {\mc EBCDIC} code
324on the external medium on which it resides, but \.{TANGLE} will convert from
325this external code to ASCII and back again. Therefore the assignment
326statement \.{XCHR[65]:=\'A\'} will appear in the corresponding \PASCAL\ file,
327and \PASCAL\ will compile this statement so that |xchr[65]| receives the
328character \.A in the external (|char|) code. Note that it would be quite
329incorrect to say \.{xchr[@@\'101]:="A"}, because |"A"| is a constant of
330type |integer|, not |char|, and because we have $|"A"|=65$ regardless of
331the external character set.
332
333@<Set init...@>=
334xchr[@'40]:=' ';
335xchr[@'41]:='!';
336xchr[@'42]:='"';
337xchr[@'43]:='#';
338xchr[@'44]:='$';
339xchr[@'45]:='%';
340xchr[@'46]:='&';
341xchr[@'47]:='''';@/
342xchr[@'50]:='(';
343xchr[@'51]:=')';
344xchr[@'52]:='*';
345xchr[@'53]:='+';
346xchr[@'54]:=',';
347xchr[@'55]:='-';
348xchr[@'56]:='.';
349xchr[@'57]:='/';@/
350xchr[@'60]:='0';
351xchr[@'61]:='1';
352xchr[@'62]:='2';
353xchr[@'63]:='3';
354xchr[@'64]:='4';
355xchr[@'65]:='5';
356xchr[@'66]:='6';
357xchr[@'67]:='7';@/
358xchr[@'70]:='8';
359xchr[@'71]:='9';
360xchr[@'72]:=':';
361xchr[@'73]:=';';
362xchr[@'74]:='<';
363xchr[@'75]:='=';
364xchr[@'76]:='>';
365xchr[@'77]:='?';@/
366xchr[@'100]:='@@';
367xchr[@'101]:='A';
368xchr[@'102]:='B';
369xchr[@'103]:='C';
370xchr[@'104]:='D';
371xchr[@'105]:='E';
372xchr[@'106]:='F';
373xchr[@'107]:='G';@/
374xchr[@'110]:='H';
375xchr[@'111]:='I';
376xchr[@'112]:='J';
377xchr[@'113]:='K';
378xchr[@'114]:='L';
379xchr[@'115]:='M';
380xchr[@'116]:='N';
381xchr[@'117]:='O';@/
382xchr[@'120]:='P';
383xchr[@'121]:='Q';
384xchr[@'122]:='R';
385xchr[@'123]:='S';
386xchr[@'124]:='T';
387xchr[@'125]:='U';
388xchr[@'126]:='V';
389xchr[@'127]:='W';@/
390xchr[@'130]:='X';
391xchr[@'131]:='Y';
392xchr[@'132]:='Z';
393xchr[@'133]:='[';
394xchr[@'134]:='\';
395xchr[@'135]:=']';
396xchr[@'136]:='^';
397xchr[@'137]:='_';@/
398xchr[@'140]:='`';
399xchr[@'141]:='a';
400xchr[@'142]:='b';
401xchr[@'143]:='c';
402xchr[@'144]:='d';
403xchr[@'145]:='e';
404xchr[@'146]:='f';
405xchr[@'147]:='g';@/
406xchr[@'150]:='h';
407xchr[@'151]:='i';
408xchr[@'152]:='j';
409xchr[@'153]:='k';
410xchr[@'154]:='l';
411xchr[@'155]:='m';
412xchr[@'156]:='n';
413xchr[@'157]:='o';@/
414xchr[@'160]:='p';
415xchr[@'161]:='q';
416xchr[@'162]:='r';
417xchr[@'163]:='s';
418xchr[@'164]:='t';
419xchr[@'165]:='u';
420xchr[@'166]:='v';
421xchr[@'167]:='w';@/
422xchr[@'170]:='x';
423xchr[@'171]:='y';
424xchr[@'172]:='z';
425xchr[@'173]:='{';
426xchr[@'174]:='|';
427xchr[@'175]:='}';
428xchr[@'176]:='~';@/
429xchr[0]:=' '; xchr[@'177]:=' '; {these ASCII codes are not used}
430
431@ Some of the ASCII codes below @'40 have been given symbolic names in
432\.{WEAVE} and \.{TANGLE} because they are used with a special meaning.
433
434@d and_sign=@'4 {equivalent to `\.{and}'}
435@d not_sign=@'5 {equivalent to `\.{not}'}
436@d set_element_sign=@'6 {equivalent to `\.{in}'}
437@d tab_mark=@'11 {ASCII code used as tab-skip}
438@d line_feed=@'12 {ASCII code thrown away at end of line}
439@d form_feed=@'14 {ASCII code used at end of page}
440@d carriage_return=@'15 {ASCII code used at end of line}
441@d left_arrow=@'30 {equivalent to `\.{:=}'}
442@d not_equal=@'32 {equivalent to `\.{<>}'}
443@d less_or_equal=@'34 {equivalent to `\.{<=}'}
444@d greater_or_equal=@'35 {equivalent to `\.{>=}'}
445@d equivalence_sign=@'36 {equivalent to `\.{==}'}
446@d or_sign=@'37 {equivalent to `\.{or}'}
447
448@ When we initialize the |xord| array and the remaining parts of |xchr|,
449it will be convenient to make use of an index variable, |i|.
450
451@<Local variables for init...@>=
452@!i:0..255;
453
454@ Here now is the system-dependent part of the character set.
455If \.{WEB} is being implemented on a garden-variety \PASCAL\ for which
456only standard ASCII codes will appear in the input and output files, you
457don't need to make any changes here. But if you have, for example, an extended
458character set like the one in Appendix~C of {\sl The \TeX book}, the first
459line of code in this module should be changed to
460$$\hbox{|for i:=1 to @'37 do xchr[i]:=chr(i);|}$$
461\.{WEB}'s character set is essentially identical to \TeX's, even with respect to
462characters less than @'40.
463@^system dependencies@>
464
465Changes to the present module will make \.{WEB} more friendly on computers
466that have an extended character set, so that one can type things like
467\.^^Z\ instead of \.{<>}. If you have an extended set of characters that
468are easily incorporated into text files, you can assign codes arbitrarily
469here, giving an |xchr| equivalent to whatever characters the users of
470\.{WEB} are allowed to have in their input files, provided that unsuitable
471characters do not correspond to special codes like |carriage_return|
472that are listed above.
473
474(The present file \.{TANGLE.WEB} does not contain any of the non-ASCII
475characters, because it is intended to be used with all implementations of
476\.{WEB}.  It was originally created on a Stanford system that has a
477convenient extended character set, then ``sanitized'' by applying another
478program that transliterated all of the non-standard characters into
479standard equivalents.)
480
481@<Set init...@>=
482for i:=1 to @'37 do xchr[i]:=' ';
483for i:=@'200 to @'377 do xchr[i]:=' ';
484
485@ The following system-independent code makes the |xord| array contain a
486suitable inverse to the information in |xchr|.
487
488@<Set init...@>=
489for i:=first_text_char to last_text_char do xord[chr(i)]:=" ";
490for i:=1 to @'377 do xord[xchr[i]]:=i;
491xord[' ']:=" ";
492
493@* Input and output.
494The input conventions of this program are intended to be very much like those
495of \TeX\ (except, of course, that they are much simpler, because much less
496needs to be done). Furthermore they are identical to those of \.{WEAVE}.
497Therefore people who need to make modifications to all three systems
498should be able to do so without too many headaches.
499
500We use the standard \PASCAL\ input/output procedures in several places that
501\TeX\ cannot, since \.{TANGLE} does not have to deal with files that are named
502dynamically by the user, and since there is no input from the terminal.
503
504@ Terminal output is done by writing on file |term_out|, which is assumed to
505consist of characters of type |text_char|:
506@^system dependencies@>
507
508@d print(#)==write(term_out,#) {`|print|' means write on the terminal}
509@d print_ln(#)==write_ln(term_out,#) {`|print|' and then start new line}
510@d new_line==write_ln(term_out) {start new line}
511@d print_nl(#)==  {print information starting on a new line}
512  begin new_line; print(#);
513  end
514
515@<Globals...@>=
516@!term_out:text_file; {the terminal as an output file}
517
518@ Different systems have different ways of specifying that the output on a
519certain file will appear on the user's terminal. Here is one way to do this
520on the \PASCAL\ system that was used in \.{TANGLE}'s initial development:
521@^system dependencies@>
522
523@<Set init...@>=
524rewrite(term_out,'TTY:'); {send |term_out| output to the terminal}
525
526@ The |update_terminal| procedure is called when we want
527to make sure that everything we have output to the terminal so far has
528actually left the computer's internal buffers and been sent.
529@^system dependencies@>
530
531@d update_terminal == break(term_out) {empty the terminal output buffer}
532
533@ The main input comes from |web_file|; this input may be overridden
534by changes in |change_file|. (If |change_file| is empty, there are no changes.)
535
536@<Globals...@>=
537@!web_file:text_file; {primary input}
538@!change_file:text_file; {updates}
539
540@ The following code opens the input files.  Since these files were listed
541in the program header, we assume that the \PASCAL\ runtime system has
542already checked that suitable file names have been given; therefore no
543additional error checking needs to be done.
544@^system dependencies@>
545
546@p procedure open_input; {prepare to read |web_file| and |change_file|}
547begin reset(web_file); reset(change_file);
548end;
549
550@ The main output goes to |Pascal_file|, and string pool constants are
551written to the |pool| file.
552
553@<Globals...@>=
554@!Pascal_file: text_file;
555@!pool: text_file;
556
557@ The following code opens |Pascal_file| and |pool|.
558Since these files were listed in the program header, we assume that the
559\PASCAL\ runtime system has checked that suitable external file names have
560been given.
561@^system dependencies@>
562
563@<Set init...@>=
564rewrite(Pascal_file); rewrite(pool);
565
566@ Input goes into an array called |buffer|.
567
568@<Globals...@>=@!buffer: array[0..buf_size] of ASCII_code;
569
570@ The |input_ln| procedure brings the next line of input from the specified
571file into the |buffer| array and returns the value |true|, unless the file has
572already been entirely read, in which case it returns |false|. The conventions
573of \TeX\ are followed; i.e., |ASCII_code| numbers representing the next line
574of the file are input into |buffer[0]|, |buffer[1]|, \dots,
575|buffer[limit-1]|; trailing blanks are ignored;
576and the global variable |limit| is set to the length of the
577@^system dependencies@>
578line. The value of |limit| must be strictly less than |buf_size|.
579
580We assume that none of the |ASCII_code| values
581of |buffer[j]| for |0<=j<limit| is equal to 0, @'177, |line_feed|, |form_feed|,
582or |carriage_return|.
583
584@p function input_ln(var f:text_file):boolean;
585  {inputs a line or returns |false|}
586var final_limit:0..buf_size; {|limit| without trailing blanks}
587begin limit:=0; final_limit:=0;
588if eof(f) then input_ln:=false
589else  begin while not eoln(f) do
590    begin buffer[limit]:=xord[f^]; get(f);
591    incr(limit);
592    if buffer[limit-1]<>" " then final_limit:=limit;
593    if limit=buf_size then
594      begin while not eoln(f) do get(f);
595      decr(limit); {keep |buffer[buf_size]| empty}
596      if final_limit>limit then final_limit:=limit;
597      print_nl('! Input line too long'); loc:=0; error;
598@.Input line too long@>
599      end;
600    end;
601  read_ln(f); limit:=final_limit; input_ln:=true;
602  end;
603end;
604
605@* Reporting errors to the user.
606The \.{TANGLE} processor operates in two phases: first it inputs the source
607file and stores a compressed representation of the program, then it produces
608the \PASCAL\ output from the compressed representation.
609
610The global variable |phase_one| tells whether we are in Phase I or not.
611
612@<Globals...@>=
613@!phase_one: boolean; {|true| in Phase I, |false| in Phase II}
614
615@ If an error is detected while we are debugging,
616we usually want to look at the contents of memory.
617A special procedure will be declared later for this purpose.
618
619@<Error handling...@>=
620@!debug @+ procedure debug_help; forward;@+ gubed
621
622@ During the first phase, syntax errors are reported to the user by saying
623$$\hbox{`|err_print('! Error message')|'},$$
624followed by `|jump_out|' if no recovery from the error is provided.
625This will print the error message followed by an indication of where the error
626was spotted in the source file. Note that no period follows the error message,
627since the error routine will automatically supply a period.
628
629Errors that are noticed during the second phase are reported to the user
630in the same fashion, but the error message will be
631followed by an indication of where the error was spotted in the output file.
632
633The actual error indications are provided by a procedure called |error|.
634
635@d err_print(#)==begin new_line; print(#); error;
636  end
637
638@<Error handling...@>=
639procedure error; {prints '\..' and location of error message}
640var j: 0..out_buf_size; {index into |out_buf|}
641@!k,@!l: 0..buf_size; {indices into |buffer|}
642begin if phase_one then @<Print error location based on input buffer@>
643else @<Print error location based on output buffer@>;
644update_terminal; mark_error;
645@!debug debug_skipped:=debug_cycle; debug_help;@+gubed
646end;
647
648@ The error locations during Phase I can be indicated by using the global
649variables |loc|, |line|, and |changing|, which tell respectively the first
650unlooked-at position in |buffer|, the current line number, and whether or not
651the current line is from |change_file| or |web_file|.
652This routine should be modified on systems whose standard text editor
653has special line-numbering conventions.
654@^system dependencies@>
655
656@<Print error location based on input buffer@>=
657begin if changing then print('. (change file ')@+else print('. (');
658print_ln('l.', line:1, ')');
659if loc>=limit then l:=limit else l:=loc;
660for k:=1 to l do
661  if buffer[k-1]=tab_mark then print(' ')
662  else print(xchr[buffer[k-1]]); {print the characters already read}
663new_line;
664for k:=1 to l do print(' '); {space out the next line}
665for k:=l+1 to limit do print(xchr[buffer[k-1]]); {print the part not yet read}
666print(' '); {this space separates the message from future asterisks}
667end
668
669@ The position of errors detected during the second phase can be indicated
670by outputting the partially-filled output buffer, which contains |out_ptr|
671entries.
672
673@<Print error location based on output...@>=
674begin print_ln('. (l.',line:1,')');
675for j:=1 to out_ptr do print(xchr[out_buf[j-1]]); {print current partial line}
676print('... '); {indicate that this information is partial}
677end
678
679@ The |jump_out| procedure just cuts across all active procedure levels
680and jumps out of the program. This is the only non-local |goto| statement
681in \.{TANGLE}. It is used when no recovery from a particular error has
682been provided.
683
684Some \PASCAL\ compilers do not implement non-local |goto| statements.
685@^system dependencies@>
686In such cases the code that appears at label |end_of_TANGLE| should be
687copied into the |jump_out| procedure, followed by a call to a system procedure
688that terminates the program.
689
690@d fatal_error(#)==begin new_line; print(#); error; mark_fatal; jump_out;
691  end
692
693@<Error handling...@>=
694procedure jump_out;
695begin goto end_of_TANGLE;
696end;
697
698@ Sometimes the program's behavior is far different from what it should be,
699and \.{TANGLE} prints an error message that is really for the \.{TANGLE}
700maintenance person, not the user. In such cases the program says
701|confusion('indication of where we are')|.
702
703@d confusion(#)==fatal_error('! This can''t happen (',#,')')
704@.This can't happen@>
705
706@ An overflow stop occurs if \.{TANGLE}'s tables aren't large enough.
707
708@d overflow(#)==fatal_error('! Sorry, ',#,' capacity exceeded')
709@.Sorry, x capacity exceeded@>
710
711
712@* Data structures.
713Most of the user's \PASCAL\ code is packed into eight-bit integers
714in two large arrays called |byte_mem| and |tok_mem|.
715The |byte_mem| array holds the names of identifiers, strings, and modules;
716the |tok_mem| array holds the replacement texts
717for macros and modules. Allocation is sequential, since things are deleted only
718during Phase II, and only in a last-in-first-out manner.
719
720Auxiliary arrays |byte_start| and |tok_start| are used as directories to
721|byte_mem| and |tok_mem|, and the |link|, |ilk|, |equiv|, and |text_link|
722arrays give further information about names. These auxiliary arrays
723consist of sixteen-bit items.
724
725@<Types...@>=
726@!eight_bits=0..255; {unsigned one-byte quantity}
727@!sixteen_bits=0..65535; {unsigned two-byte quantity}
728
729@ \.{TANGLE} has been designed to avoid the need for indices that are more
730than sixteen bits wide, so that it can be used on most computers. But
731there are programs that need more than 65536 tokens, and some programs
732even need more than 65536 bytes; \TeX\ is one of these.  To get around
733this problem, a slight complication has been added to the data structures:
734|byte_mem| and |tok_mem| are two-dimensional arrays, whose first index is
735either 0 or 1. (For generality, the first index is actually allowed to run
736between 0 and |ww-1| in |byte_mem|, or between 0 and |zz-1| in |tok_mem|,
737where |ww| and |zz| are set to 2 and~3; the program will work for any
738positive values of |ww| and |zz|, and it can be simplified in obvious ways
739if |ww=1| or |zz=1|.)
740
741@d ww=2 {we multiply the byte capacity by approximately this amount}
742@d zz=3 {we multiply the token capacity by approximately this amount}
743
744@<Globals...@>=
745@!byte_mem: packed array [0..ww-1,0..max_bytes] of ASCII_code;
746  {characters of names}
747@!tok_mem: packed array [0..zz-1,0..max_toks] of eight_bits; {tokens}
748@!byte_start: array [0..max_names] of sixteen_bits; {directory into |byte_mem|}
749@!tok_start: array [0..max_texts] of sixteen_bits; {directory into |tok_mem|}
750@!link: array [0..max_names] of sixteen_bits; {hash table or tree links}
751@!ilk: array [0..max_names] of sixteen_bits; {type codes or tree links}
752@!equiv: array [0..max_names] of sixteen_bits; {info corresponding to names}
753@!text_link: array [0..max_texts] of sixteen_bits; {relates replacement texts}
754
755@ The names of identifiers are found by computing a hash address |h| and
756then looking at strings of bytes signified by |hash[h]|, |link[hash[h]]|,
757|link[link[hash[h]]]|, \dots, until either finding the desired name
758or encountering a zero.
759
760A `|name_pointer|' variable, which signifies a name, is an index into
761|byte_start|. The actual sequence of characters in the name pointed to by
762|p| appears in positions |byte_start[p]| to |byte_start[p+ww]-1|, inclusive,
763in the segment of |byte_mem| whose first index is |p mod ww|. Thus, when
764|ww=2| the even-numbered name bytes appear in |byte_mem[0,@t$*$@>]|
765and the odd-numbered ones appear in |byte_mem[1,@t$*$@>]|.
766The pointer 0 is used for undefined module names; we don't
767want to use it for the names of identifiers, since 0 stands for a null
768pointer in a linked list.
769
770Strings are treated like identifiers; the first character (a double-quote)
771distinguishes a string from an alphabetic name, but for \.{TANGLE}'s purposes
772strings behave like numeric macros. (A `string' here refers to the
773strings delimited by double-quotes that \.{TANGLE} processes. \PASCAL\
774string constants delimited by single-quote marks are not given such special
775treatment; they simply appear as sequences of characters in the \PASCAL\
776texts.)  The total number of strings in the string
777pool is called |string_ptr|, and the total number of names in |byte_mem|
778is called |name_ptr|. The total number of bytes occupied in
779|byte_mem[w,@t$*$@>]| is called |byte_ptr[w]|.
780
781We usually have |byte_start[name_ptr+w]=byte_ptr[(name_ptr+w) mod ww]|
782for |0<=w<ww|, since these are the starting positions for the next |ww|
783names to be stored in |byte_mem|.
784
785@d length(#)==byte_start[#+ww]-byte_start[#] {the length of a name}
786
787@<Types...@>=
788@!name_pointer=0..max_names; {identifies a name}
789
790@ @<Global...@>=
791@!name_ptr:name_pointer; {first unused position in |byte_start|}
792@!string_ptr:name_pointer; {next number to be given to a string of length |<>1|}
793@!byte_ptr:array [0..ww-1] of 0..max_bytes;
794  {first unused position in |byte_mem|}
795@!pool_check_sum:integer; {sort of a hash for the whole string pool}
796
797@ @<Local variables for init...@>=
798@!wi: 0..ww-1; {to initialize the |byte_mem| indices}
799
800@ @<Set init...@>=
801for wi:=0 to ww-1 do
802  begin byte_start[wi]:=0; byte_ptr[wi]:=0;
803  end;
804byte_start[ww]:=0; {this makes name 0 of length zero}
805name_ptr:=1; string_ptr:=256; pool_check_sum:=271828;
806
807@ Replacement texts are stored in |tok_mem|, using similar conventions.
808A `|text_pointer|' variable is an index into |tok_start|, and the
809replacement text that corresponds to |p| runs from positions
810|tok_start[p]| to |tok_start[p+zz]-1|, inclusive, in the segment of
811|tok_mem| whose first index is |p mod zz|. Thus, when |zz=2| the
812even-numbered replacement texts appear in |tok_mem[0,@t$*$@>]| and the
813odd-numbered ones appear in |tok_mem[1,@t$*$@>]|.  Furthermore,
814|text_link[p]| is used to connect pieces of text that have the same name,
815as we shall see later. The pointer 0 is used for undefined replacement
816texts.
817
818The first position of |tok_mem[z,@t$*$@>]| that is unoccupied by
819replacement text is called |tok_ptr[z]|, and the first unused location of
820|tok_start| is called |text_ptr|.  We usually have the identity
821|tok_start[text_ptr+z]=tok_ptr[(text_ptr+z) mod zz]|, for |0<=z<zz|, since
822these are the starting positions for the next |zz| replacement texts to
823be stored in |tok_mem|.
824
825@<Types...@>=
826@!text_pointer=0..max_texts; {identifies a replacement text}
827
828@ It is convenient to maintain a variable |z| that is equal to |text_ptr
829mod zz|, so that we always insert tokens into segment |z| of |tok_mem|.
830
831@<Glob...@>=
832@t\hskip1em@>@!text_ptr:text_pointer; {first unused position in |tok_start|}
833@t\hskip1em@>@!tok_ptr:array[0..zz-1] of 0..max_toks;
834  {first unused position in a given segment of |tok_mem|}
835@t\hskip1em@>@!z:0..zz-1; {current segment of |tok_mem|}
836stat @!max_tok_ptr:array[0..zz-1] of 0..max_toks;
837  {largest values assumed by |tok_ptr|}
838tats
839
840@ @<Local variables for init...@>=
841@!zi:0..zz-1; {to initialize the |tok_mem| indices}
842
843@ @<Set init...@>=
844for zi:=0 to zz-1 do
845  begin tok_start[zi]:=0; tok_ptr[zi]:=0;
846  end;
847tok_start[zz]:=0; {this makes replacement text 0 of length zero}
848text_ptr:=1; z:=1 mod zz;
849
850@ Four types of identifiers are distinguished by their |ilk|:
851
852\yskip\hang |normal| identifiers will appear in the \PASCAL\ program as
853ordinary identifiers since they have not been defined to be macros; the
854corresponding value in the |equiv| array
855for such identifiers is a link in a secondary hash table that
856is used to check whether any two of them agree in their first |unambig_length|
857characters after underline symbols are removed and lowercase letters are
858changed to uppercase.
859
860\yskip\hang |numeric| identifiers have been defined to be numeric macros;
861their |equiv| value contains the corresponding numeric value plus $2^{15}$.
862Strings are treated as numeric macros.
863
864\yskip\hang |simple| identifiers have been defined to be simple macros;
865their |equiv| value points to the corresponding replacement text.
866
867\yskip\hang |parametric| identifiers have been defined to be parametric macros;
868like simple identifiers, their |equiv| value points to the replacement text.
869
870@d normal=0 {ordinary identifiers have |normal| ilk}
871@d numeric=1 {numeric macros and strings have |numeric| ilk}
872@d simple=2 {simple macros have |simple| ilk}
873@d parametric=3 {parametric macros have |parametric| ilk}
874
875@ The names of modules are stored in |byte_mem| together
876with the identifier names, but a hash table is not used for them because
877\.{TANGLE} needs to be able to recognize a module name when given a prefix of
878that name. A conventional binary seach tree is used to retrieve module names,
879with fields called |llink| and |rlink| in place of |link| and |ilk|. The
880root of this tree is |rlink[0]|. If |p| is a pointer to a module name,
881|equiv[p]| points to its replacement text, just as in simple and parametric
882macros, unless this replacement text has not yet been defined (in which case
883|equiv[p]=0|).
884
885@d llink==link {left link in binary search tree for module names}
886@d rlink==ilk {right link in binary search tree for module names}
887
888@<Set init...@>=
889rlink[0]:=0; {the binary search tree starts out with nothing in it}
890equiv[0]:=0; {the undefined module has no replacement text}
891
892@ Here is a little procedure that prints the text of a given name.
893
894@p procedure print_id(@!p:name_pointer); {print identifier or module name}
895var k:0..max_bytes; {index into |byte_mem|}
896@!w:0..ww-1; {segment of |byte_mem|}
897begin if p>=name_ptr then print('IMPOSSIBLE')
898else  begin w:=p mod ww;
899  for k:=byte_start[p] to byte_start[p+ww]-1 do print(xchr[byte_mem[w,k]]);
900  end;
901end;
902
903@* Searching for identifiers.
904The hash table described above is updated by the |id_lookup| procedure,
905which finds a given identifier and returns a pointer to its index in
906|byte_start|. If the identifier was not already present, it is inserted with
907a given |ilk| code; and an error message is printed if the identifier is being
908doubly defined.
909
910Because of the way \.{TANGLE}'s scanning mechanism works, it is most convenient
911to let |id_lookup| search for an identifier that is present in the |buffer|
912array. Two other global variables specify its position in the buffer: the
913first character is |buffer[id_first]|, and the last is |buffer[id_loc-1]|.
914Furthermore, if the identifier is really a string, the global variable
915|double_chars| tells how many of the characters in the buffer appear
916twice (namely \.{@@@@} and \.{""}), since this additional information makes
917it easy to calculate the true length of the string. The final double-quote
918of the string is not included in its ``identifier,'' but the first one is,
919so the string length is |id_loc-id_first-double_chars-1|.
920
921We have mentioned that |normal| identifiers belong to two hash tables,
922one for their true names as they appear in the \.{WEB} file and the other
923when they have been reduced to their first |unambig_length| characters.
924The hash tables are kept by the method of simple chaining, where the
925heads of the individual lists appear in the |hash| and |chop_hash| arrays.
926If |h| is a hash code, the primary hash table list starts at |hash[h]| and
927proceeds through |link| pointers; the secondary hash table list starts at
928|chop_hash[h]| and proceeds through |equiv| pointers. Of course, the same
929identifier will probably have two different values of |h|.
930
931The |id_lookup| procedure uses an auxiliary array called |chopped_id| to
932contain up to |unambig_length| characters of the current identifier, if
933it is necessary to compute the secondary hash code. (This array could be
934declared local to |id_lookup|, but in general we are making all array
935declarations global in this program, because some compilers and some machine
936architectures make dynamic array allocation inefficient.)
937
938@<Glob...@>=
939@!id_first:0..buf_size; {where the current identifier begins in the buffer}
940@!id_loc:0..buf_size; {just after the current identifier in the buffer}
941@!double_chars:0..buf_size; {correction to length in case of strings}
942@#
943@!hash,@!chop_hash:array [0..hash_size] of sixteen_bits; {heads of hash lists}
944@!chopped_id:array [0..unambig_length] of ASCII_code; {chopped identifier}
945
946@ Initially all the hash lists are empty.
947
948@<Local variables for init...@>=
949@!h:0..hash_size; {index into hash-head arrays}
950
951@ @<Set init...@>=
952for h:=0 to hash_size-1 do
953  begin hash[h]:=0; chop_hash[h]:=0;
954  end;
955
956@ Here now is the main procedure for finding identifiers (and strings).
957The parameter |t| is set to |normal| except when the identifier is
958a macro name that is just being defined; in the latter case, |t| will be
959|numeric|, |simple|, or |parametric|.
960
961@p function id_lookup(@!t:eight_bits):name_pointer; {finds current identifier}
962label found, not_found;
963var c:eight_bits; {byte being chopped}
964@!i:0..buf_size; {index into |buffer|}
965@!h:0..hash_size; {hash code}
966@!k:0..max_bytes; {index into |byte_mem|}
967@!w:0..ww-1; {segment of |byte_mem|}
968@!l:0..buf_size; {length of the given identifier}
969@!p,@!q:name_pointer; {where the identifier is being sought}
970@!s:0..unambig_length; {index into |chopped_id|}
971begin l:=id_loc-id_first; {compute the length}
972@<Compute the hash code |h|@>;
973@<Compute the name location |p|@>;
974if (p=name_ptr)or(t<>normal) then
975  @<Update the tables and check for possible errors@>;
976id_lookup:=p;
977end;
978
979@ A simple hash code is used: If the sequence of
980ASCII codes is $c_1c_2\ldots c_m$, its hash value will be
981$$(2^{n-1}c_1+2^{n-2}c_2+\cdots+c_n)\,\bmod\,|hash_size|.$$
982
983@<Compute the hash...@>=
984h:=buffer[id_first]; i:=id_first+1;
985while i<id_loc do
986  begin h:=(h+h+buffer[i]) mod hash_size; incr(i);
987  end
988
989@ If the identifier is new, it will be placed in position |p=name_ptr|,
990otherwise |p| will point to its existing location.
991
992@<Compute the name location...@>=
993p:=hash[h];
994while p<>0 do
995  begin if length(p)=l then
996      @<Compare name |p| with current identifier, |goto found| if equal@>;
997  p:=link[p];
998  end;
999p:=name_ptr; {the current identifier is new}
1000link[p]:=hash[h]; hash[h]:=p; {insert |p| at beginning of hash list}
1001found:
1002
1003@ @<Compare name |p|...@>=
1004begin i:=id_first; k:=byte_start[p]; w:=p mod ww;
1005while (i<id_loc)and(buffer[i]=byte_mem[w,k]) do
1006  begin incr(i); incr(k);
1007  end;
1008if i=id_loc then goto found; {all characters agree}
1009end
1010
1011@ @<Update the tables...@>=
1012begin if ((p<>name_ptr)and(t<>normal)and(ilk[p]=normal)) or
1013    ((p=name_ptr)and(t=normal)and(buffer[id_first]<>"""")) then
1014  @<Compute the secondary hash code |h| and put the first characters
1015  into the auxiliary array |chopped_id|@>;
1016if p<>name_ptr then
1017  @<Give double-definition error, if necessary, and change |p| to type |t|@>
1018else @<Enter a new identifier into the table at position |p|@>;
1019end
1020
1021@ The following routine, which is called into play when it is necessary to
1022look at the secondary hash table, computes the same hash function as before
1023(but on the chopped data), and places a zero after the chopped identifier
1024in |chopped_id| to serve as a convenient sentinel.
1025
1026@<Compute the secondary...@>=
1027begin i:=id_first; s:=0; h:=0;
1028while (i<id_loc)and(s<unambig_length) do
1029  begin if buffer[i]<>"_" then
1030    begin if buffer[i]>="a" then chopped_id[s]:=buffer[i]-@'40
1031    else chopped_id[s]:=buffer[i];
1032    h:=(h+h+chopped_id[s]) mod hash_size; incr(s);
1033    end;
1034  incr(i);
1035  end;
1036chopped_id[s]:=0;
1037end
1038
1039@ If a nonnumeric macro has appeared before it was defined, \.{TANGLE}
1040will still work all right; after all, such behavior is typical of the
1041replacement texts for modules, which act very much like macros.
1042However, an undefined numeric macro may not be used on the right-hand
1043side of another numeric macro definition, so \.{TANGLE} finds it
1044simplest to make a blanket rule that numeric macros should be defined
1045before they are used. The following routine gives an error message and
1046also fixes up any damage that may have been caused.
1047
1048@<Give double...@>= {now |p<>name_ptr| and |t<>normal|}
1049begin if ilk[p]=normal then
1050  begin if t=numeric then err_print('! This identifier has already appeared');
1051@.This identifier has already...@>
1052  @<Remove |p| from secondary hash table@>;
1053  end
1054else err_print('! This identifier was defined before');
1055@.This identifier was defined...@>
1056ilk[p]:=t;
1057end
1058
1059@ When we have to remove a secondary hash entry, because a |normal| identifier
1060is changing to another |ilk|, the hash code |h| and chopped identifier have
1061already been computed.
1062
1063@<Remove |p| from secondary...@>=
1064q:=chop_hash[h];
1065if q=p then chop_hash[h]:=equiv[p]
1066else  begin while equiv[q]<>p do q:=equiv[q];
1067  equiv[q]:=equiv[p];
1068  end
1069
1070@ The following routine could make good use of a generalized |pack| procedure
1071that puts items into just part of a packed array instead of the whole thing.
1072
1073@<Enter a new identifier...@>=
1074begin if (t=normal)and(buffer[id_first]<>"""") then
1075  @<Check for ambiguity and update secondary hash@>;
1076w:=name_ptr mod ww; k:=byte_ptr[w];
1077if k+l>max_bytes then overflow('byte memory');
1078if name_ptr>max_names-ww then overflow('name');
1079i:=id_first; {get ready to move the identifier into |byte_mem|}
1080while i<id_loc do
1081  begin byte_mem[w,k]:=buffer[i]; incr(k); incr(i);
1082  end;
1083byte_ptr[w]:=k; byte_start[name_ptr+ww]:=k; incr(name_ptr);
1084if buffer[id_first]<>"""" then ilk[p]:=t
1085else @<Define and output a new string of the pool@>;
1086end
1087
1088@ @<Check for ambig...@>=
1089begin q:=chop_hash[h];
1090while q<>0 do
1091  begin @<Check if |q| conflicts with |p|@>;
1092  q:=equiv[q];
1093  end;
1094equiv[p]:=chop_hash[h]; chop_hash[h]:=p; {put |p| at front of secondary list}
1095end
1096
1097@ @<Check if |q| conflicts...@>=
1098begin k:=byte_start[q]; s:=0; w:=q mod ww;
1099while (k<byte_start[q+ww]) and (s<unambig_length) do
1100  begin c:=byte_mem[w,k];
1101  if c<>"_" then
1102    begin if c>="a" then c:=c-@'40; {merge lowercase with uppercase}
1103    if chopped_id[s]<>c then goto not_found;
1104    incr(s);
1105    end;
1106  incr(k);
1107  end;
1108if (k=byte_start[q+ww])and(chopped_id[s]<>0) then goto not_found;
1109print_nl('! Identifier conflict with ');
1110@.Identifier conflict...@>
1111for k:=byte_start[q] to byte_start[q+ww]-1 do print(xchr[byte_mem[w,k]]);
1112error; q:=0; {only one conflict will be printed, since |equiv[0]=0|}
1113not_found:
1114end
1115
1116@ We compute the string pool check sum by working modulo a prime number
1117that is large but not so large that overflow might occur.
1118
1119@d check_sum_prime==@'3777777667 {$2^{29}-73$}
1120@^preprocessed strings@>
1121
1122@<Define and output a new string...@>=
1123begin ilk[p]:=numeric; {strings are like numeric macros}
1124if l-double_chars=2 then {this string is for a single character}
1125  equiv[p]:=buffer[id_first+1]+@'100000
1126else  begin equiv[p]:=string_ptr+@'100000;
1127  l:=l-double_chars-1;
1128  if l>99 then err_print('! Preprocessed string is too long');
1129@.Preprocessed string is too long@>
1130  incr(string_ptr);
1131  write(pool,xchr["0"+l div 10],xchr["0"+l mod 10]); {output the length}
1132  pool_check_sum:=pool_check_sum+pool_check_sum+l;
1133  while pool_check_sum>check_sum_prime do
1134    pool_check_sum:=pool_check_sum-check_sum_prime;
1135  i:=id_first+1;
1136  while i<id_loc do
1137    begin write(pool,xchr[buffer[i]]); {output characters of string}
1138    pool_check_sum:=pool_check_sum+pool_check_sum+buffer[i];
1139    while pool_check_sum>check_sum_prime do
1140      pool_check_sum:=pool_check_sum-check_sum_prime;
1141    if (buffer[i]="""") or (buffer[i]="@@") then
1142      i:=i+2 {omit second appearance of doubled character}
1143    else incr(i);
1144    end;
1145  write_ln(pool);
1146  end;
1147end
1148
1149@* Searching for module names.
1150The |mod_lookup| procedure finds the module name |mod_text[1..l]| in the
1151search tree, after inserting it if necessary, and returns a pointer to
1152where it was found.
1153
1154@<Glob...@>=
1155@!mod_text:array [0..longest_name] of ASCII_code; {name being sought for}
1156
1157@ According to the rules of \.{WEB}, no module name
1158should be a proper prefix of another, so a ``clean'' comparison should
1159occur between any two names. The result of |mod_lookup| is 0 if this
1160prefix condition is violated. An error message is printed when such violations
1161are detected during phase two of \.{WEAVE}.
1162
1163@d less=0 {the first name is lexicographically less than the second}
1164@d equal=1 {the first name is equal to the second}
1165@d greater=2 {the first name is lexicographically greater than the second}
1166@d prefix=3 {the first name is a proper prefix of the second}
1167@d extension=4 {the first name is a proper extension of the second}
1168
1169@p function mod_lookup(@!l:sixteen_bits):name_pointer; {finds module name}
1170label found;
1171var c:less..extension; {comparison between two names}
1172@!j:0..longest_name; {index into |mod_text|}
1173@!k:0..max_bytes; {index into |byte_mem|}
1174@!w:0..ww-1; {segment of |byte_mem|}
1175@!p:name_pointer; {current node of the search tree}
1176@!q:name_pointer; {father of node |p|}
1177begin c:=greater; q:=0; p:=rlink[0]; {|rlink[0]| is the root of the tree}
1178while p<>0 do
1179  begin @<Set \(|c| to the result of comparing the given name to
1180    name |p|@>;
1181  q:=p;
1182  if c=less then p:=llink[q]
1183  else if c=greater then p:=rlink[q]
1184  else goto found;
1185  end;
1186@<Enter a new module name into the tree@>;
1187found: if c<>equal then
1188  begin err_print('! Incompatible section names'); p:=0;
1189@.Incompatible module names@>
1190  end;
1191mod_lookup:=p;
1192end;
1193
1194@ @<Enter a new module name...@>=
1195w:=name_ptr mod ww; k:=byte_ptr[w];
1196if k+l>max_bytes then overflow('byte memory');
1197if name_ptr>max_names-ww then overflow('name');
1198p:=name_ptr;
1199if c=less then llink[q]:=p else rlink[q]:=p;
1200llink[p]:=0; rlink[p]:=0; c:=equal; equiv[p]:=0;
1201for j:=1 to l do byte_mem[w,k+j-1]:=mod_text[j];
1202byte_ptr[w]:=k+l; byte_start[name_ptr+ww]:=k+l; incr(name_ptr);
1203
1204@ @<Set \(|c|...@>=
1205begin k:=byte_start[p]; w:=p mod ww; c:=equal; j:=1;
1206while (k<byte_start[p+ww]) and (j<=l) and (mod_text[j]=byte_mem[w,k]) do
1207  begin incr(k); incr(j);
1208  end;
1209if k=byte_start[p+ww] then
1210  if j>l then c:=equal
1211  else c:=extension
1212else if j>l then c:=prefix
1213else if mod_text[j]<byte_mem[w,k] then c:=less
1214else c:=greater;
1215end
1216
1217@ The |prefix_lookup| procedure is supposed to find exactly one module
1218name that has |mod_text[1..l]| as a prefix. Actually the algorithm silently
1219accepts also the situation that some module name is a prefix of
1220|mod_text[1..l]|, because the user who painstakingly typed in more than
1221necessary probably doesn't want to be told about the wasted effort.
1222
1223@p function prefix_lookup(@!l:sixteen_bits):name_pointer; {finds name extension}
1224var c:less..extension; {comparison between two names}
1225@!count:0..max_names; {the number of hits}
1226@!j:0..longest_name; {index into |mod_text|}
1227@!k:0..max_bytes; {index into |byte_mem|}
1228@!w:0..ww-1; {segment of |byte_mem|}
1229@!p:name_pointer; {current node of the search tree}
1230@!q:name_pointer; {another place to resume the search after one branch is done}
1231@!r:name_pointer; {extension found}
1232begin q:=0; p:=rlink[0]; count:=0; r:=0; {begin search at root of tree}
1233while p<>0 do
1234  begin @<Set \(|c|...@>;
1235  if c=less then p:=llink[p]
1236  else if c=greater then p:=rlink[p]
1237  else  begin r:=p; incr(count); q:=rlink[p]; p:=llink[p];
1238    end;
1239  if p=0 then
1240    begin p:=q; q:=0;
1241    end;
1242  end;
1243if count<>1 then
1244  if count=0 then err_print('! Name does not match')
1245@.Name does not match@>
1246  else err_print('! Ambiguous prefix');
1247@.Ambiguous prefix@>
1248prefix_lookup:=r; {the result will be 0 if there was no match}
1249end;
1250
1251@* Tokens.
1252Replacement texts, which represent \PASCAL\ code in a compressed format,
1253appear in |tok_mem| as mentioned above. The codes in
1254these texts are called `tokens'; some tokens occupy two consecutive
1255eight-bit byte positions, and the others take just one byte.
1256
1257If $p>0$ points to a replacement text, |tok_start[p]| is the |tok_mem| position
1258of the first eight-bit code of that text. If |text_link[p]=0|,
1259this is the replacement text for a macro, otherwise it is the replacement
1260text for a module. In the latter case |text_link[p]| is either equal to
1261|module_flag|, which means that there is no further text for this module, or
1262|text_link[p]| points to a
1263continuation of this replacement text; such links are created when
1264several modules have \PASCAL\ texts with the same name, and they also
1265tie together all the \PASCAL\ texts of unnamed modules.
1266The replacement text pointer for the first unnamed module
1267appears in |text_link[0]|, and the most recent such pointer is |last_unnamed|.
1268
1269@d module_flag==max_texts {final |text_link| in module replacement texts}
1270
1271@<Glob...@>=
1272@!last_unnamed:text_pointer; {most recent replacement text of unnamed module}
1273
1274@ @<Set init...@>= last_unnamed:=0; text_link[0]:=0;
1275
1276@ If the first byte of a token is less than @'200, the token occupies a
1277single byte. Otherwise we make a sixteen-bit token by combining two consecutive
1278bytes |a| and |b|. If |@'200<=a<@'250|, then $(a-@'200)\times2^8+b$ points
1279to an identifier; if |@'250<=a<@'320|, then
1280$(a-@'250)\times2^8+b$ points to a module name; otherwise, i.e., if
1281|@'320<=a<@'400|, then $(a-@'320)\times2^8+b$ is the number of the module
1282in which the current replacement text appears.
1283
1284Codes less than @'200 are 7-bit ASCII codes that represent themselves.
1285In particular, a single-character identifier like `|x|' will be a one-byte
1286token, while all longer identifiers will occupy two bytes.
1287
1288Some of the 7-bit ASCII codes will not be present, however, so we can
1289use them for special purposes. The following symbolic names are used:
1290
1291\yskip\hang |param| denotes insertion of a parameter. This occurs only in
1292the replacement texts of parametric macros, outside of single-quoted strings
1293in those texts.
1294
1295\hang |begin_comment| denotes \.{@@\{}, which will become either
1296\.{\{} or \.{[}.
1297
1298\hang |end_comment| denotes \.{@@\}}, which will become either
1299\.{\}} or \.{]}.
1300
1301\hang |octal| denotes the \.{@@\'} that precedes an octal constant.
1302
1303\hang |hex| denotes the \.{@@"} that precedes a hexadecimal constant.
1304
1305\hang |check_sum| denotes the \.{@@\char'44} that denotes the string pool
1306check sum.
1307
1308\hang |join| denotes the concatenation of adjacent items with no
1309space or line breaks allowed between them (the \.{@@\&} operation of \.{WEB}).
1310
1311\hang |double_dot| denotes `\.{..}' in \PASCAL.
1312
1313\hang |verbatim| denotes the \.{@@=} that begins a verbatim \PASCAL\ string.
1314It is also used for the end of the string.
1315
1316\hang |force_line| denotes the \.{@@\\} that forces a new line in the
1317\PASCAL\ output.
1318@^ASCII code@>
1319
1320@d param=0 {ASCII null code will not appear}
1321@d verbatim=@'2 {extended ASCII alpha should not appear}
1322@d force_line=@'3 {extended ASCII beta should not appear}
1323@d begin_comment=@'11 {ASCII tab mark will not appear}
1324@d end_comment=@'12 {ASCII line feed will not appear}
1325@d octal=@'14 {ASCII form feed will not appear}
1326@d hex=@'15 {ASCII carriage return will not appear}
1327@d double_dot=@'40 {ASCII space will not appear except in strings}
1328@d check_sum=@'175 {will not be confused with right brace}
1329@d join=@'177 {ASCII delete will not appear}
1330
1331@ The following procedure is used to enter a two-byte value into
1332|tok_mem| when a replacement text is being generated.
1333
1334@p procedure store_two_bytes(@!x:sixteen_bits);
1335  {stores high byte, then low byte}
1336begin if tok_ptr[z]+2>max_toks then overflow('token');
1337tok_mem[z,tok_ptr[z]]:=x div@'400; {this could be done by a shift command}
1338tok_mem[z,tok_ptr[z]+1]:=x mod@'400; {this could be done by a logical and}
1339tok_ptr[z]:=tok_ptr[z]+2;
1340end;
1341
1342@ When \.{TANGLE} is being operated in debug mode, it has a procedure to display
1343a replacement text in symbolic form. This procedure has not been spruced up to
1344generate a real great format, but at least the results are not as bad as
1345a memory dump.
1346
1347@p @!debug procedure print_repl(@!p:text_pointer);
1348var k:0..max_toks; {index into |tok_mem|}
1349@!a: sixteen_bits; {current byte(s)}
1350@!zp: 0..zz-1; {segment of |tok_mem| being accessed}
1351begin if p>=text_ptr then print('BAD')
1352else  begin k:=tok_start[p]; zp:=p mod zz;
1353  while k<tok_start[p+zz] do
1354    begin a:=tok_mem[zp,k];
1355    if a>=@'200 then @<Display two-byte token starting with |a|@>
1356    else @<Display one-byte token |a|@>;
1357    incr(k);
1358    end;
1359  end;
1360end;
1361gubed
1362
1363@ @<Display two-byte...@>=
1364begin incr(k);
1365if a<@'250 then {identifier or string}
1366  begin a:=(a-@'200)*@'400+tok_mem[zp,k]; print_id(a);
1367  if byte_mem[a mod ww,byte_start[a]]="""" then print('"')
1368  else print(' ');
1369  end
1370else if a<@'320 then {module name}
1371  begin print('@@<'); print_id((a-@'250)*@'400+tok_mem[zp,k]);
1372  print('@@>');
1373  end
1374else  begin a:=(a-@'320)*@'400+tok_mem[zp,k]; {module number}
1375  print('@@',xchr["{"],a:1,'@@',xchr["}"]); {can't use right brace
1376    between \&{debug} and \&{gubed}}
1377  end;
1378end
1379
1380@ @<Display one-byte...@>=
1381case a of
1382begin_comment: print('@@',xchr["{"]);
1383end_comment: print('@@',xchr["}"]); {can't use right brace
1384    between \&{debug} and \&{gubed}}
1385octal: print('@@''');
1386hex: print('@@"');
1387check_sum: print('@@$');
1388param: print('#');
1389"@@": print('@@@@');
1390verbatim: print('@@=');
1391force_line: print('@@\');
1392othercases print(xchr[a])
1393endcases
1394
1395@* Stacks for output.
1396Let's make sure that our data structures contain enough information to
1397produce the entire \PASCAL\ program as desired, by working next on the
1398algorithms that actually do produce that program.
1399
1400@ The output process uses a stack to keep track of what is going on at
1401different ``levels'' as the macros are being expanded.
1402Entries on this stack have five parts:
1403
1404\yskip\hang |end_field| is the |tok_mem| location where the replacement
1405text of a particular level will end;
1406
1407\hang |byte_field| is the |tok_mem| location from which the next token
1408on a particular level will be read;
1409
1410\hang |name_field| points to the name corresponding to a particular level;
1411
1412\hang |repl_field| points to the replacement text currently being read
1413at a particular level;
1414
1415\hang |mod_field| is the module number, or zero if this is a macro.
1416
1417\yskip\noindent The current values of these five quantities are referred to
1418quite frequently, so they are stored in a separate place instead of in
1419the |stack| array. We call the current values |cur_end|, |cur_byte|,
1420|cur_name|, |cur_repl|, and |cur_mod|.
1421
1422The global variable |stack_ptr| tells how many levels of output are
1423currently in progress. The end of all output occurs when the stack is
1424empty, i.e., when |stack_ptr=0|.
1425
1426@<Types...@>=
1427@t\4@>@!output_state=record
1428  @!end_field: sixteen_bits; {ending location of replacement text}
1429  @!byte_field: sixteen_bits; {present location within replacement text}
1430  @!name_field: name_pointer; {|byte_start| index for text being output}
1431  @!repl_field: text_pointer; {|tok_start| index for text being output}
1432  @!mod_field: 0..@'27777; {module number or zero if not a module}
1433  end;
1434
1435@ @d cur_end==cur_state.end_field {current ending location in |tok_mem|}
1436@d cur_byte==cur_state.byte_field {location of next output byte in |tok_mem|}
1437@d cur_name==cur_state.name_field {pointer to current name being expanded}
1438@d cur_repl==cur_state.repl_field {pointer to current replacement text}
1439@d cur_mod==cur_state.mod_field {current module number being expanded}
1440
1441@<Globals...@>=
1442@!cur_state : output_state; {|cur_end|, |cur_byte|, |cur_name|,
1443  |cur_repl|, |cur_mod|}
1444@!stack : array [1..stack_size] of output_state; {info for non-current levels}
1445@!stack_ptr: 0..stack_size; {first unused location in the output state stack}
1446
1447@ It is convenient to keep a global variable |zo| equal to |cur_repl mod zz|.
1448
1449@<Glob...@>=
1450@!zo:0..zz-1; {the segment of |tok_mem| from which output is coming}
1451
1452@ Parameters must also be stacked. They are placed in
1453|tok_mem| just above the other replacement texts, and dummy parameter
1454`names' are placed in |byte_start| just after the other names.
1455The variables |text_ptr| and |tok_ptr[z]| essentially serve as parameter
1456stack pointers during the output phase, so there is no need for a separate
1457data structure to handle this problem.
1458
1459@ There is an implicit stack corresponding to meta-comments that are output
1460via \.{@@\{} and \.{@@\}}. But this stack need not be represented in detail,
1461because we only need to know whether it is empty or not. A global variable
1462|brace_level| tells how many items would be on this stack if it were present.
1463
1464@<Globals...@>=
1465@!brace_level: eight_bits; {current depth of $\.{@@\{}\ldots\.{@@\}}$ nesting}
1466
1467@ To get the output process started, we will perform the following
1468initialization steps. We may assume that |text_link[0]| is nonzero, since it
1469points to the \PASCAL\ text in the first unnamed module that generates
1470code; if there are no such modules, there is nothing to output, and an
1471error message will have been generated before we do any of the initialization.
1472
1473@<Initialize the output stacks@>=
1474stack_ptr:=1; brace_level:=0; cur_name:=0; cur_repl:=text_link[0];
1475zo:=cur_repl mod zz; cur_byte:=tok_start[cur_repl];
1476cur_end:=tok_start[cur_repl+zz]; cur_mod:=0;
1477
1478@ When the replacement text for name |p| is to be inserted into the output,
1479the following subroutine is called to save the old level of output and get
1480the new one going.
1481
1482@p procedure push_level(@!p:name_pointer); {suspends the current level}
1483begin if stack_ptr=stack_size then overflow('stack')
1484else  begin stack[stack_ptr]:=cur_state; {save |cur_end|, |cur_byte|, etc.}
1485  incr(stack_ptr);
1486  cur_name:=p; cur_repl:=equiv[p]; zo:=cur_repl mod zz;
1487  cur_byte:=tok_start[cur_repl]; cur_end:=tok_start[cur_repl+zz];
1488  cur_mod:=0;
1489  end;
1490end;
1491
1492@ When we come to the end of a replacement text, the |pop_level| subroutine
1493does the right thing: It either moves to the continuation of this replacement
1494text or returns the state to the most recently stacked level. Part of this
1495subroutine, which updates the parameter stack, will be given later when we
1496study the parameter stack in more detail.
1497
1498@p procedure pop_level; {do this when |cur_byte| reaches |cur_end|}
1499label exit;
1500begin if text_link[cur_repl]=0 then {end of macro expansion}
1501  begin if ilk[cur_name]=parametric then
1502    @<Remove a parameter from the parameter stack@>;
1503  end
1504else if text_link[cur_repl]<module_flag then {link to a continuation}
1505  begin cur_repl:=text_link[cur_repl]; {we will stay on the same level}
1506  zo:=cur_repl mod zz;
1507  cur_byte:=tok_start[cur_repl]; cur_end:=tok_start[cur_repl+zz];
1508  return;
1509  end;
1510decr(stack_ptr); {we will go down to the previous level}
1511if stack_ptr>0 then
1512  begin cur_state:=stack[stack_ptr]; zo:=cur_repl mod zz;
1513  end;
1514exit: end;
1515
1516@ The heart of the output procedure is the |get_output| routine, which produces
1517the next token of output that is not a reference to a macro. This procedure
1518handles all the stacking and unstacking that is necessary. It returns the
1519value |number| if the next output has a numeric value (the value of a
1520numeric macro or string), in which case |cur_val| has been set to the
1521number in question. The procedure also returns the value |module_number|
1522if the next output begins or ends the replacement text of some module,
1523in which case |cur_val| is that module's number (if beginning) or the
1524negative of that value (if ending). And it returns the value |identifier|
1525if the next output is an identifier of length two or more, in which case
1526|cur_val| points to that identifier name.
1527
1528@d number=@'200 {code returned by |get_output| when next output is numeric}
1529@d module_number=@'201 {code returned by |get_output| for module numbers}
1530@d identifier=@'202 {code returned by |get_output| for identifiers}
1531
1532@<Globals...@>=
1533@!cur_val:integer; {additional information corresponding to output token}
1534
1535@ If |get_output| finds that no more output remains, it returns the value zero.
1536
1537@p function get_output:sixteen_bits; {returns next token after macro expansion}
1538label restart, done, found;
1539var a:sixteen_bits; {value of current byte}
1540@!b:eight_bits; {byte being copied}
1541@!bal:sixteen_bits; {excess of \.( versus \.) while copying a parameter}
1542@!k:0..max_bytes; {index into |byte_mem|}
1543@!w:0..ww-1; {segment of |byte_mem|}
1544begin restart: if stack_ptr=0 then
1545  begin a:=0; goto found;
1546  end;
1547if cur_byte=cur_end then
1548  begin cur_val:=-cur_mod; pop_level;
1549  if cur_val=0 then goto restart;
1550  a:=module_number; goto found;
1551  end;
1552a:=tok_mem[zo,cur_byte]; incr(cur_byte);
1553if a<@'200 then {one-byte token}
1554  if a=param then
1555      @<Start scanning current macro parameter, |goto restart|@>
1556  else goto found;
1557a:=(a-@'200)*@'400+tok_mem[zo,cur_byte]; incr(cur_byte);
1558if a<@'24000 then {|@'24000=(@'250-@'200)*@'400|}
1559  @<Expand macro |a| and |goto found|, or |goto restart| if no output found@>;
1560if a<@'50000 then {|@'50000=(@'320-@'200)*@'400|}
1561  @<Expand module |a-@'24000|, |goto restart|@>;
1562cur_val:=a-@'50000; a:=module_number; cur_mod:=cur_val;
1563found:
1564@!debug if trouble_shooting then debug_help;@;@+gubed@/
1565get_output:=a;
1566end;
1567
1568@ The user may have forgotten to give any \PASCAL\ text for a module name,
1569or the \PASCAL\ text may have been associated with a different name by mistake.
1570
1571@<Expand module |a-...@>=
1572begin a:=a-@'24000;
1573if equiv[a]<>0 then push_level(a)
1574else if a<>0 then
1575  begin print_nl('! Not present: <'); print_id(a); print('>'); error;
1576@.Not present: <section name>@>
1577  end;
1578goto restart;
1579end
1580
1581@ @<Expand macro ...@>=
1582begin case ilk[a] of
1583normal: begin cur_val:=a; a:=identifier;
1584  end;
1585numeric: begin cur_val:=equiv[a]-@'100000; a:=number;
1586  end;
1587simple: begin push_level(a); goto restart;
1588  end;
1589parametric: begin @<Put a parameter on the parameter stack,
1590  or |goto restart| if error occurs@>;
1591  push_level(a); goto restart;
1592  end;
1593othercases confusion('output')
1594endcases;@/
1595goto found;
1596end
1597
1598@ We come now to the interesting part, the job of putting a parameter on
1599the parameter stack. First we pop the stack if necessary until getting to
1600a level that hasn't ended. Then the next character must be a `\.(';
1601and since parentheses are balanced on each level, the entire parameter must
1602be present, so we can copy it without difficulty.
1603
1604@<Put a parameter...@>=
1605while (cur_byte=cur_end)and(stack_ptr>0) do pop_level;
1606if (stack_ptr=0)or(tok_mem[zo,cur_byte]<>"(") then
1607  begin print_nl('! No parameter given for '); print_id(a); error;
1608@.No parameter given for macro@>
1609  goto restart;
1610  end;
1611@<Copy the parameter into |tok_mem|@>;
1612equiv[name_ptr]:=text_ptr; ilk[name_ptr]:=simple; w:=name_ptr mod ww;
1613k:=byte_ptr[w];
1614@!debug if k=max_bytes then overflow('byte memory');
1615byte_mem[w,k]:="#"; incr(k); byte_ptr[w]:=k;
1616gubed {this code has set the parameter identifier for debugging printouts}
1617if name_ptr>max_names-ww then overflow('name');
1618byte_start[name_ptr+ww]:=k; incr(name_ptr);
1619if text_ptr>max_texts-zz then overflow('text');
1620text_link[text_ptr]:=0; tok_start[text_ptr+zz]:=tok_ptr[z];
1621incr(text_ptr);
1622z:=text_ptr mod zz
1623
1624@ The |pop_level| routine undoes the effect of parameter-pushing when
1625a parameter macro is finished:
1626
1627@<Remove a parameter...@>=
1628begin decr(name_ptr); decr(text_ptr);
1629z:=text_ptr mod zz;
1630stat if tok_ptr[z]>max_tok_ptr[z] then max_tok_ptr[z]:=tok_ptr[z];
1631tats {the maximum value of |tok_ptr| occurs just before parameter popping}
1632tok_ptr[z]:=tok_start[text_ptr];
1633@!debug decr(byte_ptr[name_ptr mod ww]);@+gubed
1634end
1635
1636@ When a parameter occurs in a replacement text, we treat it as a simple
1637macro in position (|name_ptr-1|):
1638
1639@<Start scanning...@>=
1640begin push_level(name_ptr-1); goto restart;
1641end
1642
1643@ Similarly, a |param| token encountered as we copy a parameter is converted
1644into a simple macro call for |name_ptr-1|.
1645Some care is needed to handle cases like \\{macro}|(#; print('#)'))|; the
1646\.{\#} token will have been changed to |param| outside of strings, but we
1647still must distinguish `real' parentheses from those in strings.
1648
1649@d app_repl(#)==begin if tok_ptr[z]=max_toks then overflow('token');
1650  tok_mem[z,tok_ptr[z]]:=#; incr(tok_ptr[z]); end
1651
1652@<Copy the parameter...@>=
1653bal:=1; incr(cur_byte); {skip the opening `\.('}
1654loop@+  begin b:=tok_mem[zo,cur_byte]; incr(cur_byte);
1655  if b=param then store_two_bytes(name_ptr+@'77777)
1656  else  begin if b>=@'200 then
1657      begin app_repl(b);
1658      b:=tok_mem[zo,cur_byte]; incr(cur_byte);
1659      end
1660    else   case b of
1661      "(": incr(bal);
1662      ")":  begin decr(bal);
1663        if bal=0 then goto done;
1664        end;
1665      "'": repeat app_repl(b);
1666        b:=tok_mem[zo,cur_byte]; incr(cur_byte);
1667        until b="'"; {copy string, don't change |bal|}
1668      othercases do_nothing
1669      endcases;
1670    app_repl(b);
1671    end;
1672  end;
1673done:
1674
1675@* Producing the output.
1676The |get_output| routine above handles most of the complexity of output
1677generation, but there are two further considerations that have a nontrivial
1678effect on \.{TANGLE}'s algorithms.
1679
1680First, we want to make sure that the output is broken into lines not
1681exceeding |line_length| characters per line, where these breaks occur at
1682valid places (e.g., not in the middle of a string or a constant or an
1683identifier, not between `\.<' and `\.>', not at a `\.{@@\&}' position
1684where quantities are being joined together). Therefore we assemble the
1685output into a buffer before deciding where the line breaks will appear.
1686However, we make very little attempt to make ``logical'' line breaks that
1687would enhance the readability of the output; people are supposed to read
1688the input of \.{TANGLE} or the \TeX ed output of \.{WEAVE}, but not the
1689tangled-up output. The only concession to readability is that a break after
1690a semicolon will be made if possible, since commonly used ``pretty
1691printing'' routines give better results in such cases.
1692
1693Second, we want to decimalize non-decimal constants, and to combine integer
1694quantities that are added or subtracted, because \PASCAL\ doesn't allow
1695constant expressions in subrange types or in case labels. This means we
1696want to have a procedure that treats a construction like \.{(E-15+17)}
1697as equivalent to `\.{(E+2)}', while also leaving `\.{(1E-15+17)}' and
1698`\.{(E-15+17*y)}' untouched. Consider also `\.{-15+17.5}' versus
1699`\.{-15+17..5}'. We shall not combine integers preceding or following
1700\.*, \./, \.{div}, \.{mod}, or \.{@@\&}. Note that if |y| has been defined
1701to equal $-2$, we must expand `\.{x*y}' into `\.{x*(-2)}'; but `\.{x-y}'
1702can expand into `\.{x+2}' and we can even change `\.{x - y mod z}' to
1703@^mod@>
1704`\.{x + 2 mod z}' because \PASCAL\ has a nonstandard \&{mod} operation!
1705
1706The following solution to these problems has been adopted: An array
1707|out_buf| contains characters that have been generated but not yet output,
1708and there are three pointers into this array. One of these, |out_ptr|, is
1709the number of characters currently in the buffer, and we will have
1710|1<=out_ptr<=line_length| most of the time. The second is |break_ptr|,
1711which is the largest value |<=out_ptr| such that we are definitely entitled
1712to end a line by outputting the characters |out_buf[1..(break_ptr-1)]|;
1713we will always have |break_ptr<=line_length|. Finally, |semi_ptr| is either
1714zero or the largest known value of a legal break after a semicolon or comment
1715on the current line; we will always have |semi_ptr<=break_ptr|.
1716
1717@<Globals...@>=
1718@!out_buf: array [0..out_buf_size] of ASCII_code; {assembled characters}
1719@!out_ptr: 0..out_buf_size; {first available place in |out_buf|}
1720@!break_ptr: 0..out_buf_size; {last breaking place in |out_buf|}
1721@!semi_ptr: 0..out_buf_size; {last semicolon breaking place in |out_buf|}
1722
1723@ Besides having those three pointers,
1724the output process is in one of several states:
1725
1726\yskip\hang |num_or_id| means that the last item in the buffer is a number or
1727identifier, hence a blank space or line break must be inserted if the next
1728item is also a number or identifier.
1729
1730\yskip\hang |unbreakable| means that the last item in the buffer was followed
1731by the \.{@@\&} operation that inhibits spaces between it and the next item.
1732
1733\yskip\hang |sign| means that the last item in the buffer is to be followed
1734by \.+ or \.-, depending on whether |out_app| is positive or negative.
1735
1736\yskip\hang |sign_val| means that the decimal equivalent of
1737$\vert|out_val|\vert$ should be appended to the buffer. If |out_val<0|,
1738or if |out_val=0| and |last_sign<0|, the number should be preceded by a minus
1739sign. Otherwise it should be preceded by the character |out_sign| unless
1740|out_sign=0|; the |out_sign| variable is either 0 or \.{"\ "} or \.{"+"}.
1741
1742\yskip\hang |sign_val_sign| is like |sign_val|, but also append \.+ or \.-
1743afterwards, depending on whether |out_app| is positive or negative.
1744
1745\yskip\hang |sign_val_val| is like |sign_val|, but also append the decimal
1746equivalent of |out_app| including its sign, using |last_sign| in case
1747|out_app=0|.
1748
1749\yskip\hang |misc| means none of the above.
1750
1751\yskip\noindent
1752For example, the output buffer and output state run through the following
1753sequence as we generate characters from `\.{(x-15+19-2)}':
1754$$\vbox{\halign{$\hfil#\hfil$\quad&#\hfil&\quad\hfil#\hfil&\quad
1755\hfil#\hfil&\quad\hfil#\hfil&\quad\hfil#\hfil\quad&\hfil#\hfil\cr
1756output&|out_buf|&|out_state|&|out_sign|&|out_val|&|out_app|&|last_sign|\cr
1757\noalign{\vskip 3pt}
1758(&\.(&|misc|\cr
1759x&\.{(x}&|num_or_id|\cr
1760-&\.{(x}&|sign|&&&$-1$&$-1$\cr
176115&\.{(x}&|sign_val|&\.{"+"}&$-15$&&$-15$\cr
1762+&\.{(x}&|sign_val_sign|&\.{"+"}&$-15$&$+1$&$+1$\cr
176319&\.{(x}&|sign_val_val|&\.{"+"}&$-15$&$+19$&$+1$\cr
1764-&\.{(x}&|sign_val_sign|&\.{"+"}&$+4$&$-1$&$-1$\cr
17652&\.{(x}&|sign_val_val|&\.{"+"}&$+4$&$-2$&$-2$\cr
1766)&\.{(x+2)}&|misc|\cr}}$$
1767At each stage we have put as much into the buffer as possible without
1768knowing what is coming next. Examples like `\.{x-0.1}' indicate why
1769|last_sign| is needed to associate the proper sign with an output of zero.
1770
1771In states |num_or_id|, |unbreakable|, and |misc| the last item in the buffer
1772lies between |break_ptr| and |out_ptr-1|, inclusive; in the other states we
1773have |break_ptr=out_ptr|.
1774
1775The numeric values assigned to |num_or_id|, etc., have been chosen to
1776shorten some of the program logic; for example, the program makes use of
1777the fact that |sign+2=sign_val_sign|.
1778
1779@d misc=0 {state associated with special characters}
1780@d num_or_id=1 {state associated with numbers and identifiers}
1781@d sign=2 {state associated with pending \.+ or \.-}
1782@d sign_val=num_or_id+2 {state associated with pending sign and value}
1783@d sign_val_sign=sign+2 {|sign_val| followed by another pending sign}
1784@d sign_val_val=sign_val+2 {|sign_val| followed by another pending value}
1785@d unbreakable=sign_val_val+1 {state associated with \.{@@\&}}
1786
1787@<Globals...@>=
1788@!out_state:eight_bits; {current status of partial output}
1789@!out_val,@!out_app:integer; {pending values}
1790@!out_sign:ASCII_code; {sign to use if appending |out_val>=0|}
1791@!last_sign:-1..+1; {sign to use if appending a zero}
1792
1793@ During the output process, |line| will equal the number of the next line
1794to be output.
1795
1796@<Initialize the output buffer@>=
1797out_state:=misc; out_ptr:=0; break_ptr:=0; semi_ptr:=0; out_buf[0]:=0; line:=1;
1798
1799@ Here is a routine that is invoked when |out_ptr>line_length|
1800or when it is time to flush out the final line. The |flush_buffer| procedure
1801often writes out the line up to the current |break_ptr| position, then moves the
1802remaining information to the front of |out_buf|. However, it prefers to
1803write only up to |semi_ptr|, if the residual line won't be too long.
1804
1805@d check_break==if out_ptr>line_length then flush_buffer
1806
1807@p procedure flush_buffer; {writes one line to output file}
1808var k:0..out_buf_size; {index into |out_buf|}
1809@!b:0..out_buf_size; {value of |break_ptr| upon entry}
1810begin b:=break_ptr;
1811if (semi_ptr<>0)and(out_ptr-semi_ptr<=line_length) then break_ptr:=semi_ptr;
1812for k:=1 to break_ptr do write(Pascal_file,xchr[out_buf[k-1]]);
1813write_ln(Pascal_file); incr(line);
1814if line mod 100 = 0 then
1815  begin print('.');
1816  if line mod 500 = 0 then print(line:1);
1817  update_terminal; {progress report}
1818  end;
1819if break_ptr<out_ptr then
1820  begin if out_buf[break_ptr]=" " then
1821    begin incr(break_ptr); {drop space at break}
1822    if break_ptr>b then b:=break_ptr;
1823    end;
1824  for k:=break_ptr to out_ptr-1 do out_buf[k-break_ptr]:=out_buf[k];
1825  end;
1826out_ptr:=out_ptr-break_ptr; break_ptr:=b-break_ptr; semi_ptr:=0;
1827if out_ptr>line_length then
1828  begin err_print('! Long line must be truncated'); out_ptr:=line_length;
1829@.Long line must be truncated@>
1830  end;
1831end;
1832
1833@ @<Empty the last line from the buffer@>=
1834break_ptr:=out_ptr; semi_ptr:=0; flush_buffer;
1835if brace_level<>0 then
1836  err_print('! Program ended at brace level ',brace_level:1);
1837@.Program ended at brace level n@>
1838
1839@ Another simple and useful routine appends the decimal equivalent of
1840a nonnegative integer to the output buffer.
1841
1842@d app(#)==begin out_buf[out_ptr]:=#; incr(out_ptr); {append a single character}
1843  end
1844
1845@p procedure app_val(@!v:integer); {puts |v| into buffer, assumes |v>=0|}
1846var k:0..out_buf_size; {index into |out_buf|}
1847begin k:=out_buf_size; {first we put the digits at the very end of |out_buf|}
1848repeat out_buf[k]:=v mod 10; v:=v div 10; decr(k);
1849until v=0;
1850repeat incr(k); app(out_buf[k]+"0");
1851until k=out_buf_size; {then we append them, most significant first}
1852end;
1853
1854@ The output states are kept up to date by the output routines, which are
1855called |send_out|, |send_val|, and |send_sign|. The |send_out| procedure
1856has two parameters: |t| tells the type of information being sent and
1857|v| contains the information proper. Some information may also be passed
1858in the array |out_contrib|.
1859
1860\yskip\hang If |t=misc| then |v| is a character to be output.
1861
1862\hang If |t=str| then |v| is the length of a string or something like `\.{<>}'
1863in |out_contrib|.
1864
1865\hang If |t=ident| then |v| is the length of an identifier in |out_contrib|.
1866
1867\hang If |t=frac| then |v| is the length of a fraction and/or exponent in
1868|out_contrib|.
1869
1870@d str=1 {|send_out| code for a string}
1871@d ident=2 {|send_out| code for an identifier}
1872@d frac=3 {|send_out| code for a fraction}
1873
1874@<Glob...@>=
1875@!out_contrib:array[1..line_length] of ASCII_code; {a contribution to |out_buf|}
1876
1877@ A slightly subtle point in the following code is that the user may ask
1878for a |join| operation (i.e., \.{@@\&}) following whatever is being sent
1879out.  We will see later that |join| is implemented in part by calling
1880|send_out(frac,0)|.
1881
1882@p procedure send_out(@!t:eight_bits; @!v:sixteen_bits);
1883  {outputs |v| of type |t|}
1884label restart;
1885var k: 0..line_length; {index into |out_contrib|}
1886begin @<Get the buffer ready for appending the new information@>;
1887if t<>misc then for k:=1 to v do app(out_contrib[k])
1888else app(v);
1889check_break;
1890if (t=misc)and((v=";")or(v="}")) then
1891  begin semi_ptr:=out_ptr; break_ptr:=out_ptr;
1892  end;
1893if t>=ident then out_state:=num_or_id {|t=ident| or |frac|}
1894else out_state:=misc {|t=str| or |misc|}
1895end;
1896
1897@ Here is where the buffer states for signs and values collapse into simpler
1898states, because we are about to append something that doesn't combine with
1899the previous integer constants.
1900
1901We use an ASCII-code trick: Since |","-1="+"| and |","+1="-"|, we have
1902|","-c=@t sign of $c$@>|, when $\vert c\vert=1$.
1903
1904@<Get the buffer ready...@>=
1905restart: case out_state of
1906num_or_id: if t<>frac then
1907  begin break_ptr:=out_ptr;
1908  if t=ident then app(" ");
1909  end;
1910sign: begin app(","-out_app); check_break; break_ptr:=out_ptr;
1911  end;
1912sign_val,sign_val_sign: begin @<Append \(|out_val| to buffer@>;
1913  out_state:=out_state-2; goto restart;
1914  end;
1915sign_val_val: @<Reduce |sign_val_val| to |sign_val| and |goto restart|@>;
1916misc: if t<>frac then break_ptr:=out_ptr;@/
1917othercases do_nothing {this is for |unbreakable| state}
1918endcases
1919
1920@ @<Append \(|out_val|...@>=
1921if (out_val<0)or((out_val=0)and(last_sign<0)) then app("-")
1922else if out_sign>0 then app(out_sign);
1923app_val(abs(out_val)); check_break;
1924
1925@ @<Reduce |sign_val_val|...@>=
1926begin if (t=frac)or(@<Contribution is \.* or \./ or \.{DIV} or \.{MOD}@>) then
1927  begin @<Append \(|out_val| to buffer@>;
1928  out_sign:="+"; out_val:=out_app;
1929  end
1930else out_val:=out_val+out_app;
1931out_state:=sign_val; goto restart;
1932end
1933
1934@ @<Contribution is \.*...@>=
1935((t=ident)and(v=3)and@|
1936 (((out_contrib[1]="D")and(out_contrib[2]="I")and(out_contrib[3]="V")) or@|
1937 ((out_contrib[1]="M")and(out_contrib[2]="O")and(out_contrib[3]="D")) ))or@|
1938@^uppercase@>
1939 ((t=misc)and((v="*")or(v="/")))
1940
1941@ The following routine is called with $v=\pm1$ when a plus or minus sign is
1942appended to the output. It extends \PASCAL\ to allow repeated signs
1943(e.g., `\.{--}' is equivalent to `\.+'), rather than to give an error message.
1944The signs following `\.E' in real constants are treated as part of a fraction,
1945so they are not seen by this routine.
1946
1947@p procedure send_sign(@!v:integer);
1948begin case out_state of
1949sign, sign_val_sign: out_app:=out_app*v;
1950sign_val:begin out_app:=v; out_state:=sign_val_sign;
1951  end;
1952sign_val_val: begin out_val:=out_val+out_app; out_app:=v;
1953  out_state:=sign_val_sign;
1954  end;
1955othercases begin break_ptr:=out_ptr; out_app:=v; out_state:=sign;
1956  end
1957endcases;@/
1958last_sign:=out_app;
1959end;
1960
1961@ When a (signed) integer value is to be output, we call |send_val|.
1962
1963@d bad_case=666 {this is a label used below}
1964
1965@p procedure send_val(@!v:integer); {output the (signed) value |v|}
1966label bad_case, {go here if we can't keep |v| in the output state}
1967  exit;
1968begin case out_state of
1969num_or_id: begin @<If previous output was \.{DIV} or \.{MOD}, |goto bad_case|@>;
1970  out_sign:=" "; out_state:=sign_val; out_val:=v; break_ptr:=out_ptr;
1971  last_sign:=+1;
1972  end;
1973misc: begin @<If previous output was \.* or \./, |goto bad_case|@>;
1974  out_sign:=0; out_state:=sign_val; out_val:=v; break_ptr:=out_ptr;
1975  last_sign:=+1;
1976  end;
1977@t\4@>@<Handle cases of |send_val| when |out_state| contains a sign@>@;
1978othercases goto bad_case
1979endcases;@/
1980return;
1981bad_case: @<Append the decimal value of |v|, with parentheses if negative@>;
1982exit: end;
1983
1984@ @<Handle cases of |send_val|...@>=
1985sign: begin out_sign:="+"; out_state:=sign_val; out_val:=out_app*v;
1986  end;
1987sign_val: begin out_state:=sign_val_val; out_app:=v;
1988  err_print('! Two numbers occurred without a sign between them');
1989  end;
1990sign_val_sign: begin out_state:=sign_val_val; out_app:=out_app*v;
1991  end;
1992sign_val_val: begin out_val:=out_val+out_app; out_app:=v;
1993  err_print('! Two numbers occurred without a sign between them');
1994@.Two numbers occurred...@>
1995  end;
1996
1997@ @<If previous output was \.*...@>=
1998if (out_ptr=break_ptr+1)and((out_buf[break_ptr]="*")or(out_buf[break_ptr]="/"))
1999  then goto bad_case
2000
2001@ @<If previous output was \.{DIV}...@>=
2002if (out_ptr=break_ptr+3)or
2003 ((out_ptr=break_ptr+4)and(out_buf[break_ptr]=" ")) then
2004@^uppercase@>
2005  if ((out_buf[out_ptr-3]="D")and(out_buf[out_ptr-2]="I")and
2006    (out_buf[out_ptr-1]="V"))or @/
2007     ((out_buf[out_ptr-3]="M")and(out_buf[out_ptr-2]="O")and
2008    (out_buf[out_ptr-1]="D")) then@/ goto bad_case
2009
2010@ @<Append the decimal value...@>=
2011if v>=0 then
2012  begin if out_state=num_or_id then
2013    begin break_ptr:=out_ptr; app(" ");
2014    end;
2015  app_val(v); check_break; out_state:=num_or_id;
2016  end
2017else  begin app("("); app("-"); app_val(-v); app(")"); check_break;
2018  out_state:=misc;
2019  end
2020
2021@* The big output switch.
2022To complete the output process, we need a routine that takes the results
2023of |get_output| and feeds them to |send_out|, |send_val|, or |send_sign|.
2024This procedure `|send_the_output|' will be invoked just once, as follows:
2025
2026@<Phase II: Output the contents of the compressed tables@>=
2027if text_link[0]=0 then
2028  begin print_nl('! No output was specified.'); mark_harmless;
2029@.No output was specified@>
2030  end
2031else  begin print_nl('Writing the output file'); update_terminal;@/
2032  @<Initialize the output stacks@>;
2033  @<Initialize the output buffer@>;
2034  send_the_output;@/
2035  @<Empty the last line...@>;
2036  print_nl('Done.');
2037  end
2038
2039@ A many-way switch is used to send the output:
2040
2041@d get_fraction=2 {this label is used below}
2042
2043@p procedure send_the_output;
2044label get_fraction, {go here to finish scanning a real constant}
2045  reswitch, continue;
2046var cur_char:eight_bits; {the latest character received}
2047  @!k:0..line_length; {index into |out_contrib|}
2048  @!j:0..max_bytes; {index into |byte_mem|}
2049  @!w:0..ww-1; {segment of |byte_mem|}
2050  @!n:integer; {number being scanned}
2051begin while stack_ptr>0 do
2052  begin cur_char:=get_output;
2053  reswitch: case cur_char of
2054  0: do_nothing; {this case might arise if output ends unexpectedly}
2055  @t\4@>@<Cases related to identifiers@>@;
2056  @t\4@>@<Cases related to constants, possibly leading to
2057    |get_fraction| or |reswitch|@>@;
2058  "+","-": send_sign(","-cur_char);
2059  @t\4@>@<Cases like \.{<>} and \.{:=}@>@;
2060  "'": @<Send a string, |goto reswitch|@>;
2061  @<Other printable characters@>: send_out(misc,cur_char);
2062  @t\4@>@<Cases involving \.{@@\{} and \.{@@\}}@>@;
2063  join: begin send_out(frac,0); out_state:=unbreakable;
2064    end;
2065  verbatim: @<Send verbatim string@>;
2066  force_line: @<Force a line break@>;
2067  othercases err_print('! Can''t output ASCII code ',cur_char:1)
2068@.Can't output ASCII code n@>
2069  endcases;@/
2070  goto continue;
2071  get_fraction: @<Special code to finish real constants@>;
2072  continue: end;
2073end;
2074
2075@ @<Cases like \.{<>}...@>=
2076and_sign: begin out_contrib[1]:="A"; out_contrib[2]:="N"; out_contrib[3]:="D";
2077@^uppercase@>
2078  send_out(ident,3);
2079  end;
2080not_sign: begin out_contrib[1]:="N"; out_contrib[2]:="O"; out_contrib[3]:="T";
2081  send_out(ident,3);
2082  end;
2083set_element_sign: begin out_contrib[1]:="I"; out_contrib[2]:="N";
2084  send_out(ident,2);
2085  end;
2086or_sign: begin out_contrib[1]:="O"; out_contrib[2]:="R"; send_out(ident,2);
2087  end;
2088left_arrow: begin out_contrib[1]:=":"; out_contrib[2]:="="; send_out(str,2);
2089  end;
2090not_equal: begin out_contrib[1]:="<"; out_contrib[2]:=">"; send_out(str,2);
2091  end;
2092less_or_equal: begin out_contrib[1]:="<"; out_contrib[2]:="="; send_out(str,2);
2093  end;
2094greater_or_equal: begin out_contrib[1]:=">"; out_contrib[2]:="=";
2095  send_out(str,2);
2096  end;
2097equivalence_sign: begin out_contrib[1]:="="; out_contrib[2]:="=";
2098  send_out(str,2);
2099  end;
2100double_dot: begin out_contrib[1]:="."; out_contrib[2]:="."; send_out(str,2);
2101  end;
2102
2103@ Please don't ask how all of the following characters can actually get
2104through \.{TANGLE} outside of strings. It seems that |""""| and |"{"|
2105cannot actually occur at this point of the program, but they have
2106been included just in case \.{TANGLE} changes.
2107
2108If \.{TANGLE} is producing code for a \PASCAL\ compiler that uses `\.{(.}'
2109and `\.{.)}' instead of square brackets (e.g., on machines with {\mc EBCDIC}
2110code), one should remove |"["| and |"]"| from this list and put them into
2111the preceding module in the appropriate way. Similarly, some compilers
2112want `\.\^' to be converted to `\.{@@}'.
2113@^system dependencies@>@^EBCDIC@>
2114
2115@<Other printable characters@>=
2116"!","""","#","$","%","&","(",")","*",",","/",":",";","<","=",">","?",
2117"@@","[","\","]","^","_","`","{","|"
2118
2119@ Single-character identifiers represent themselves, while longer ones
2120appear in |byte_mem|. All must be converted to uppercase,
2121with underlines removed. Extremely long identifiers must be chopped.
2122
2123(Some \PASCAL\ compilers work with lowercase letters instead of
2124uppercase. If this module of \.{TANGLE} is changed, it's also necessary
2125to change from uppercase to lowercase in the modules that are
2126listed in the index under ``uppercase''.)
2127@^system dependencies@>
2128@^uppercase@>
2129
2130@d up_to(#)==#-24,#-23,#-22,#-21,#-20,#-19,#-18,#-17,#-16,#-15,#-14,
2131  #-13,#-12,#-11,#-10,#-9,#-8,#-7,#-6,#-5,#-4,#-3,#-2,#-1,#
2132
2133@<Cases related to identifiers@>=
2134"A",up_to("Z"): begin out_contrib[1]:=cur_char; send_out(ident,1);
2135  end;
2136"a",up_to("z"): begin out_contrib[1]:=cur_char-@'40; send_out(ident,1);
2137  end;
2138identifier: begin k:=0; j:=byte_start[cur_val]; w:=cur_val mod ww;
2139  while (k<max_id_length)and(j<byte_start[cur_val+ww]) do
2140    begin incr(k); out_contrib[k]:=byte_mem[w,j]; incr(j);
2141    if out_contrib[k]>="a" then out_contrib[k]:=out_contrib[k]-@'40
2142    else if out_contrib[k]="_" then decr(k);
2143    end;
2144  send_out(ident,k);
2145  end;
2146
2147@ After sending a string, we need to look ahead at the next character, in order
2148to see if there were two consecutive single-quote marks. Afterwards we go to
2149|reswitch| to process the next character.
2150
2151@<Send a string...@>=
2152begin k:=1; out_contrib[1]:="'";
2153repeat if k<line_length then incr(k);
2154out_contrib[k]:=get_output;
2155until (out_contrib[k]="'")or(stack_ptr=0);
2156if k=line_length then err_print('! String too long');
2157@.String too long@>
2158send_out(str,k); cur_char:=get_output;
2159if cur_char="'" then out_state:=unbreakable;
2160goto reswitch;
2161end
2162
2163@ Sending a verbatim string is similar, but we don't have to look ahead.
2164
2165@<Send verbatim string@>=
2166begin k:=0;
2167repeat if k<line_length then incr(k);
2168out_contrib[k]:=get_output;
2169until (out_contrib[k]=verbatim)or(stack_ptr=0);
2170if k=line_length then err_print('! Verbatim string too long');
2171@.Verbatim string too long@>
2172send_out(str,k-1);
2173end
2174
2175@ In order to encourage portable software, \.{TANGLE} complains
2176if the constants get dangerously close to the largest value representable
2177on a 32-bit computer ($2^{31}-1$).
2178
2179@d digits=="0","1","2","3","4","5","6","7","8","9"
2180
2181@<Cases related to constants...@>=
2182digits: begin n:=0;
2183  repeat cur_char:=cur_char-"0";
2184  if n>=@'1463146314 then err_print('! Constant too big')
2185@.Constant too big@>
2186  else n:=10*n+cur_char;
2187  cur_char:=get_output;
2188  until (cur_char>"9")or(cur_char<"0");
2189  send_val(n); k:=0;
2190  if cur_char="e" then cur_char:="E";
2191@^uppercase@>
2192  if cur_char="E" then goto get_fraction
2193  else goto reswitch;
2194  end;
2195check_sum: send_val(pool_check_sum);
2196octal: begin n:=0; cur_char:="0";
2197  repeat cur_char:=cur_char-"0";
2198  if n>=@'2000000000 then err_print('! Constant too big')
2199  else n:=8*n+cur_char;
2200  cur_char:=get_output;
2201  until (cur_char>"7")or(cur_char<"0");
2202  send_val(n); goto reswitch;
2203  end;
2204hex: begin n:=0; cur_char:="0";
2205  repeat if cur_char>="A" then cur_char:=cur_char+10-"A"
2206  else cur_char:=cur_char-"0";
2207  if n>=@"8000000 then err_print('! Constant too big')
2208  else n:=16*n+cur_char;
2209  cur_char:=get_output;
2210  until (cur_char>"F")or(cur_char<"0")or@|
2211    ((cur_char>"9")and(cur_char<"A"));
2212  send_val(n); goto reswitch;
2213  end;
2214number: send_val(cur_val);
2215".":  begin k:=1; out_contrib[1]:="."; cur_char:=get_output;
2216  if cur_char="." then
2217    begin out_contrib[2]:="."; send_out(str,2);
2218    end
2219  else if (cur_char>="0")and(cur_char<="9") then goto get_fraction
2220  else  begin send_out(misc,"."); goto reswitch;
2221    end;
2222  end;
2223
2224@ The following code appears at label `|get_fraction|', when we want to
2225scan to the end of a real constant. The first |k| characters of a fraction
2226have already been placed in |out_contrib|, and |cur_char| is the next character.
2227
2228@<Special code...@>=
2229repeat if k<line_length then incr(k);
2230out_contrib[k]:=cur_char; cur_char:=get_output;
2231if (out_contrib[k]="E")and((cur_char="+")or(cur_char="-")) then
2232@^uppercase@>
2233  begin if k<line_length then incr(k);
2234  out_contrib[k]:=cur_char; cur_char:=get_output;
2235  end
2236else if cur_char="e" then cur_char:="E";
2237until (cur_char<>"E")and((cur_char<"0")or(cur_char>"9"));
2238if k=line_length then err_print('! Fraction too long');
2239@.Fraction too long@>
2240send_out(frac,k); goto reswitch
2241
2242@ Some \PASCAL\ compilers do not recognize comments in braces, so the
2243comments must be delimited by `\.{(*}' and `\.{*)}'.
2244@^system dependencies@>
2245In such cases the statement `|out_contrib[1]:="{"|' that appears here should
2246be replaced by `\ignorespaces|begin out_contrib[1]:="("; out_contrib[2]:="*";
2247incr(k); end|', and a similar change should be made to
2248`|out_contrib[k]:="}"|'.
2249
2250@<Cases involving \.{@@\{} and \.{@@\}}@>=
2251begin_comment: begin if brace_level=0 then send_out(misc,"{")
2252  else send_out(misc,"[");
2253  incr(brace_level);
2254  end;
2255end_comment: if brace_level>0 then
2256    begin decr(brace_level);
2257    if brace_level=0 then send_out(misc,"}")
2258    else send_out(misc,"]");
2259    end
2260  else err_print('! Extra @@}');
2261@.Extra \AT!\}@>
2262module_number: begin k:=2;
2263  if brace_level=0 then out_contrib[1]:="{"
2264  else out_contrib[1]:="[";
2265  if cur_val<0 then
2266    begin out_contrib[k]:=":"; cur_val:=-cur_val; incr(k);
2267    end;
2268  n:=10;
2269  while cur_val>=n do n:=10*n;
2270  repeat n:=n div 10;
2271    out_contrib[k]:="0"+(cur_val div n); cur_val:=cur_val mod n; incr(k);
2272  until n=1;
2273  if out_contrib[2]<>":" then
2274    begin out_contrib[k]:=":"; incr(k);
2275    end;
2276  if brace_level=0 then out_contrib[k]:="}"
2277  else out_contrib[k]:="]";
2278  send_out(str,k);
2279  end;
2280
2281@ @<Force a line break@>=
2282begin send_out(str,0); {normalize the buffer}
2283while out_ptr>0 do
2284  begin if out_ptr<=line_length then break_ptr:=out_ptr;
2285  flush_buffer;
2286  end;
2287out_state:=misc;
2288end
2289
2290@* Introduction to the input phase.
2291We have now seen that \.{TANGLE} will be able to output the full
2292\PASCAL\ program, if we can only get that program into the byte memory in
2293the proper format. The input process is something like the output process
2294in reverse, since we compress the text as we read it in and we expand it
2295as we write it out.
2296
2297There are three main input routines. The most interesting is the one that gets
2298the next token of a \PASCAL\ text; the other two are used to scan rapidly past
2299\TeX\ text in the \.{WEB} source code. One of the latter routines will jump to
2300the next token that starts with `\.{@@}', and the other skips to the end
2301of a \PASCAL\ comment.
2302
2303@ But first we need to consider the low-level routine |get_line|
2304that takes care of merging |change_file| into |web_file|. The |get_line|
2305procedure also updates the line numbers for error messages.
2306
2307@<Globals...@>=
2308@!ii:integer; {general purpose |for| loop variable in the outer block}
2309@!line:integer; {the number of the current line in the current file}
2310@!other_line:integer; {the number of the current line in the input file that
2311  is not currently being read}
2312@!temp_line:integer; {used when interchanging |line| with |other_line|}
2313@!limit:0..buf_size; {the last character position occupied in the buffer}
2314@!loc:0..buf_size; {the next character position to be read from the buffer}
2315@!input_has_ended: boolean; {if |true|, there is no more input}
2316@!changing: boolean; {if |true|, the current line is from |change_file|}
2317
2318@ As we change |changing| from |true| to |false| and back again, we must
2319remember to swap the values of |line| and |other_line| so that the |err_print|
2320routine will be sure to report the correct line number.
2321
2322@d change_changing==
2323  changing := not changing;
2324  temp_line:=other_line; other_line:=line; line:=temp_line
2325    {|line @t$\null\BA\null$@> other_line|}
2326
2327@ When |changing| is |false|, the next line of |change_file| is kept in
2328|change_buffer[0..change_limit]|, for purposes of comparison with the next
2329line of |web_file|. After the change file has been completely input, we
2330set |change_limit:=0|, so that no further matches will be made.
2331
2332@<Globals...@>=
2333@!change_buffer:array[0..buf_size] of ASCII_code;
2334@!change_limit:0..buf_size; {the last position occupied in |change_buffer|}
2335
2336@ Here's a simple function that checks if the two buffers are different.
2337
2338@p function lines_dont_match:boolean;
2339label exit;
2340var k:0..buf_size; {index into the buffers}
2341begin lines_dont_match:=true;
2342if change_limit<>limit then return;
2343if limit>0 then
2344  for k:=0 to limit-1 do if change_buffer[k]<>buffer[k] then return;
2345lines_dont_match:=false;
2346exit: end;
2347
2348@ Procedure |prime_the_change_buffer| sets |change_buffer| in preparation
2349for the next matching operation. Since blank lines in the change file are
2350not used for matching, we have |(change_limit=0)and not changing| if and
2351only if the change file is exhausted. This procedure is called only
2352when |changing| is true; hence error messages will be reported correctly.
2353
2354@p procedure prime_the_change_buffer;
2355label continue, done, exit;
2356var k:0..buf_size; {index into the buffers}
2357begin change_limit:=0; {this value will be used if the change file ends}
2358@<Skip over comment lines in the change file; |return| if end of file@>;
2359@<Skip to the next nonblank line; |return| if end of file@>;
2360@<Move |buffer| and |limit| to |change_buffer| and |change_limit|@>;
2361exit: end;
2362
2363@ While looking for a line that begins with \.{@@x} in the change file,
2364we allow lines that begin with \.{@@}, as long as they don't begin with
2365\.{@@y} or \.{@@z} (which would probably indicate that the change file is
2366fouled up).
2367
2368@<Skip over comment lines in the change file...@>=
2369loop@+  begin incr(line);
2370  if not input_ln(change_file) then return;
2371  if limit<2 then goto continue;
2372  if buffer[0]<>"@@" then goto continue;
2373  if (buffer[1]>="X")and(buffer[1]<="Z") then
2374    buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify}
2375  if buffer[1]="x" then goto done;
2376  if (buffer[1]="y")or(buffer[1]="z") then
2377    begin loc:=2; err_print('! Where is the matching @@x?');
2378@.Where is the match...@>
2379    end;
2380continue: end;
2381done:
2382
2383@ Here we are looking at lines following the \.{@@x}.
2384
2385@<Skip to the next nonblank line...@>=
2386repeat incr(line);
2387  if not input_ln(change_file) then
2388    begin err_print('! Change file ended after @@x');
2389@.Change file ended...@>
2390    return;
2391    end;
2392until limit>0;
2393
2394@ @<Move |buffer| and |limit| to |change_buffer| and |change_limit|@>=
2395begin change_limit:=limit;
2396if limit>0 then for k:=0 to limit-1 do change_buffer[k]:=buffer[k];
2397end
2398
2399@ The following procedure is used to see if the next change entry should
2400go into effect; it is called only when |changing| is false.
2401The idea is to test whether or not the current
2402contents of |buffer| matches the current contents of |change_buffer|.
2403If not, there's nothing more to do; but if so, a change is called for:
2404All of the text down to the \.{@@y} is supposed to match. An error
2405message is issued if any discrepancy is found. Then the procedure
2406prepares to read the next line from |change_file|.
2407
2408@p procedure check_change; {switches to |change_file| if the buffers match}
2409label exit;
2410var n:integer; {the number of discrepancies found}
2411@!k:0..buf_size; {index into the buffers}
2412begin if lines_dont_match then return;
2413n:=0;
2414loop@+  begin change_changing; {now it's |true|}
2415  incr(line);
2416  if not input_ln(change_file) then
2417    begin err_print('! Change file ended before @@y');
2418@.Change file ended...@>
2419    change_limit:=0;  change_changing; {|false| again}
2420    return;
2421    end;
2422  @<If the current line starts with \.{@@y},
2423    report any discrepancies and |return|@>;
2424  @<Move |buffer| and |limit|...@>;
2425  change_changing; {now it's |false|}
2426  incr(line);
2427  if not input_ln(web_file) then
2428    begin err_print('! WEB file ended during a change');
2429@.WEB file ended...@>
2430    input_has_ended:=true; return;
2431    end;
2432  if lines_dont_match then incr(n);
2433  end;
2434exit: end;
2435
2436@ @<If the current line starts with \.{@@y}...@>=
2437if limit>1 then if buffer[0]="@@" then
2438  begin if (buffer[1]>="X")and(buffer[1]<="Z") then
2439    buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify}
2440  if (buffer[1]="x")or(buffer[1]="z") then
2441    begin loc:=2; err_print('! Where is the matching @@y?');
2442@.Where is the match...@>
2443    end
2444  else if buffer[1]="y" then
2445    begin if n>0 then
2446      begin loc:=2; err_print('! Hmm... ',n:1,
2447        ' of the preceding lines failed to match');
2448@.Hmm... n of the preceding...@>
2449      end;
2450    return;
2451    end;
2452  end
2453
2454@ @<Initialize the input system@>=
2455open_input; line:=0; other_line:=0;@/
2456changing:=true; prime_the_change_buffer; change_changing;@/
2457limit:=0; loc:=1; buffer[0]:=" "; input_has_ended:=false;
2458
2459@ The |get_line| procedure is called when |loc>limit|; it puts the next
2460line of merged input into the buffer and updates the other variables
2461appropriately. A space is placed at the right end of the line.
2462
2463@p procedure get_line; {inputs the next line}
2464label restart;
2465begin restart: if changing then
2466  @<Read from |change_file| and maybe turn off |changing|@>;
2467if not changing then
2468  begin @<Read from |web_file| and maybe turn on |changing|@>;
2469  if changing then goto restart;
2470  end;
2471loc:=0; buffer[limit]:=" ";
2472end;
2473
2474@ @<Read from |web_file|...@>=
2475begin incr(line);
2476if not input_ln(web_file) then input_has_ended:=true
2477else if limit=change_limit then
2478  if buffer[0]=change_buffer[0] then
2479    if change_limit>0 then check_change;
2480end
2481
2482@ @<Read from |change_file|...@>=
2483begin incr(line);
2484if not input_ln(change_file) then
2485  begin err_print('! Change file ended without @@z');
2486@.Change file ended...@>
2487  buffer[0]:="@@"; buffer[1]:="z"; limit:=2;
2488  end;
2489if limit>1 then {check if the change has ended}
2490  if buffer[0]="@@" then
2491    begin if (buffer[1]>="X")and(buffer[1]<="Z") then
2492      buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify}
2493    if (buffer[1]="x")or(buffer[1]="y") then
2494      begin loc:=2; err_print('! Where is the matching @@z?');
2495@.Where is the match...@>
2496      end
2497    else if buffer[1]="z" then
2498      begin prime_the_change_buffer; change_changing;
2499      end;
2500    end;
2501end
2502
2503@ At the end of the program, we will tell the user if the change file
2504had a line that didn't match any relevant line in |web_file|.
2505
2506@<Check that all changes have been read@>=
2507if change_limit<>0 then {|changing| is false}
2508  begin for ii:=0 to change_limit do buffer[ii]:=change_buffer[ii];
2509  limit:=change_limit; changing:=true; line:=other_line; loc:=change_limit;
2510  err_print('! Change file entry did not match');
2511@.Change file entry did not match@>
2512  end
2513
2514@ Important milestones are reached during the input phase when certain
2515control codes are sensed.
2516
2517Control codes in \.{WEB} begin with `\.{@@}', and the next character
2518identifies the code. Some of these are of interest only to \.{WEAVE},
2519so \.{TANGLE} ignores them; the others are converted by \.{TANGLE} into
2520internal code numbers by the |control_code| function below. The ordering
2521of these internal code numbers has been chosen to simplify the program logic;
2522larger numbers are given to the control codes that denote more significant
2523milestones.
2524
2525@d ignore=0 {control code of no interest to \.{TANGLE}}
2526@d control_text=@'203 {control code for `\.{@@t}', `\.{@@\^}', etc.}
2527@d format=@'204 {control code for `\.{@@f}'}
2528@d definition=@'205 {control code for `\.{@@d}'}
2529@d begin_Pascal=@'206 {control code for `\.{@@p}'}
2530@d module_name=@'207 {control code for `\.{@@<}'}
2531@d new_module=@'210 {control code for `\.{@@\ }' and `\.{@@*}'}
2532
2533@p function control_code(@!c:ASCII_code):eight_bits; {convert |c| after \.{@@}}
2534begin case c of
2535"@@": control_code:="@@"; {`quoted' at sign}
2536"'": control_code:=octal; {precedes octal constant}
2537"""": control_code:=hex; {precedes hexadecimal constant}
2538"$": control_code:=check_sum; {string pool check sum}
2539" ",tab_mark: control_code:=new_module; {beginning of a new module}
2540"*": begin print('*',module_count+1:1);
2541  update_terminal; {print a progress report}
2542  control_code:=new_module; {beginning of a new module}
2543  end;
2544"D","d": control_code:=definition; {macro definition}
2545"F","f": control_code:=format; {format definition}
2546"{": control_code:=begin_comment; {begin-comment delimiter}
2547"}": control_code:=end_comment; {end-comment delimiter}
2548"P","p": control_code:=begin_Pascal; {\PASCAL\ text in unnamed module}
2549"T","t","^",".",":": control_code:=control_text; {control text to be ignored}
2550"&": control_code:=join; {concatenate two tokens}
2551"<": control_code:=module_name; {beginning of a module name}
2552"=": control_code:=verbatim; {beginning of \PASCAL\ verbatim mode}
2553"\": control_code:=force_line; {force a new line in \PASCAL\ output}
2554othercases control_code:=ignore {ignore all other cases}
2555endcases;
2556end;
2557
2558@ The |skip_ahead| procedure reads through the input at fairly high speed
2559until finding the next non-ignorable control code, which it returns.
2560
2561@p function skip_ahead:eight_bits; {skip to next control code}
2562label done;
2563var c:eight_bits; {control code found}
2564begin loop begin if loc>limit then
2565    begin get_line;
2566    if input_has_ended then
2567      begin c:=new_module; goto done;
2568      end;
2569    end;
2570  buffer[limit+1]:="@@";
2571  while buffer[loc]<>"@@" do incr(loc);
2572  if loc<=limit then
2573    begin loc:=loc+2; c:=control_code(buffer[loc-1]);
2574    if (c<>ignore)or(buffer[loc-1]=">") then goto done;
2575    end;
2576  end;
2577done: skip_ahead:=c;
2578end;
2579
2580@ The |skip_comment| procedure reads through the input at somewhat high speed
2581until finding the first unmatched right brace or until coming to the end
2582of the file. It ignores characters following `\.\\' characters, since all
2583braces that aren't nested are supposed to be hidden in that way. For
2584example, consider the process of skipping the first comment below,
2585where the string containing the right brace has been typed as \.{\`\\.\\\}\'}
2586in the \.{WEB} file.
2587
2588@p procedure skip_comment; {skips to next unmatched `\.\}'}
2589label exit;
2590var bal:eight_bits; {excess of left braces}
2591@!c:ASCII_code; {current character}
2592begin bal:=0;
2593loop@+  begin if loc>limit then
2594    begin get_line;
2595    if input_has_ended then
2596      begin err_print('! Input ended in mid-comment');
2597@.Input ended in mid-comment@>
2598      return;
2599      end;
2600    end;
2601  c:=buffer[loc]; incr(loc);
2602  @<Do special things when |c="@@", "\", "{", "}"|; |return| at end@>;
2603  end;
2604exit:end;
2605
2606@ @<Do special things when |c="@@"...@>=
2607if c="@@" then
2608  begin c:=buffer[loc];
2609  if (c<>" ")and(c<>tab_mark)and(c<>"*")and(c<>"z")and(c<>"Z") then incr(loc)
2610  else  begin err_print('! Section ended in mid-comment');
2611@.Section ended in mid-comment@>
2612    decr(loc); return;
2613    end
2614  end
2615else if (c="\")and(buffer[loc]<>"@@") then incr(loc)
2616else if c="{" then incr(bal)
2617else if c="}" then
2618  begin if bal=0 then return;
2619  decr(bal);
2620  end
2621
2622@* Inputting the next token.
2623As stated above, \.{TANGLE}'s most interesting input procedure is the
2624|get_next| routine that inputs the next token. However, the procedure
2625isn't especially difficult.
2626
2627In most cases the tokens output by |get_next| have the form used in
2628replacement texts, except that two-byte tokens are not produced.
2629An identifier that isn't one letter long is represented by the
2630output `|identifier|', and in such a case the global variables
2631|id_first| and |id_loc| will have been set to the appropriate values
2632needed by the |id_lookup| procedure. A string that begins with a
2633double-quote is also considered an |identifier|, and in such a case
2634the global variable |double_chars| will also have been set appropriately.
2635Control codes produce the corresponding output of the |control_code|
2636function above; and if that code is |module_name|, the value of |cur_module|
2637will point to the |byte_start| entry for that module name.
2638
2639Another global variable, |scanning_hex|, is |true| during the time that
2640the letters \.A through \.F should be treated as if they were digits.
2641
2642@<Globals...@>=
2643@!cur_module: name_pointer; {name of module just scanned}
2644@!scanning_hex: boolean; {are we scanning a hexadecimal constant?}
2645
2646@ @<Set init...@>=
2647scanning_hex:=false;
2648
2649@ At the top level, |get_next| is a multi-way switch based on the next
2650character in the input buffer. A |new_module| code is inserted at the
2651very end of the input file.
2652
2653@p function get_next:eight_bits; {produces the next input token}
2654label restart,done,found;
2655var c:eight_bits; {the current character}
2656@!d:eight_bits; {the next character}
2657@!j,@!k:0..longest_name; {indices into |mod_text|}
2658begin restart: if loc>limit then
2659  begin get_line;
2660  if input_has_ended then
2661    begin c:=new_module; goto found;
2662    end;
2663  end;
2664c:=buffer[loc]; incr(loc);
2665if scanning_hex then @<Go to |found| if |c| is a hexadecimal digit,
2666  otherwise set |scanning_hex:=false|@>;
2667case c of
2668"A",up_to("Z"),"a",up_to("z"): @<Get an identifier@>;
2669"""": @<Get a preprocessed string@>;
2670"@@": @<Get control code and possible module name@>;
2671@t\4@>@<Compress two-symbol combinations like `\.{:=}'@>@;
2672" ",tab_mark: goto restart; {ignore spaces and tabs}
2673"{": begin skip_comment; goto restart;
2674  end;
2675"}": begin err_print('! Extra }'); goto restart;
2676@.Extra \}@>
2677  end;
2678othercases if c>=128 then goto restart {ignore nonstandard characters}
2679  else do_nothing
2680endcases;
2681found:@!debug if trouble_shooting then debug_help;@;@+gubed@/
2682get_next:=c;
2683end;
2684
2685@ @<Go to |found| if |c| is a hexadecimal digit...@>=
2686if ((c>="0")and(c<="9"))or((c>="A")and(c<="F")) then goto found
2687else scanning_hex:=false
2688
2689@ Note that the following code substitutes \.{@@\{} and \.{@@\}} for the
2690respective combinations `\.{(*}' and `\.{*)}'. Explicit braces should be used
2691for \TeX\ comments in \PASCAL\ text.
2692
2693@d compress(#)==begin if loc<=limit then begin c:=#; incr(loc); end; end
2694
2695@<Compress two-symbol...@>=
2696".": if buffer[loc]="." then compress(double_dot)
2697  else if buffer[loc]=")" then compress("]");
2698":": if buffer[loc]="=" then compress(left_arrow);
2699"=": if buffer[loc]="=" then compress(equivalence_sign);
2700">": if buffer[loc]="=" then compress(greater_or_equal);
2701"<": if buffer[loc]="=" then compress(less_or_equal)
2702  else if buffer[loc]=">" then compress(not_equal);
2703"(": if buffer[loc]="*" then compress(begin_comment)
2704  else if buffer[loc]="." then compress("[");
2705"*": if buffer[loc]=")" then compress(end_comment);
2706
2707@ We have to look at the preceding character to make sure this isn't part
2708of a real constant, before trying to find an identifier starting with
2709`\.e' or `\.E'.
2710
2711@<Get an identifier@>=
2712begin if ((c="e")or(c="E"))and(loc>1) then
2713  if (buffer[loc-2]<="9")and(buffer[loc-2]>="0") then c:=0;
2714if c<>0 then
2715  begin decr(loc); id_first:=loc;
2716  repeat incr(loc); d:=buffer[loc];
2717  until ((d<"0")or((d>"9")and(d<"A"))or((d>"Z")and(d<"a"))or(d>"z")) and
2718    (d<>"_");
2719  if loc>id_first+1 then
2720    begin c:=identifier; id_loc:=loc;
2721    end;
2722  end
2723else c:="E"; {exponent of a real constant}
2724end
2725
2726@ A string that starts and ends with double-quote marks is converted into
2727an identifier that behaves like a numeric macro by means of the following
2728piece of the program.
2729@^preprocessed strings@>
2730
2731@<Get a preprocessed string@>=
2732begin double_chars:=0; id_first:=loc-1;
2733repeat d:=buffer[loc]; incr(loc);
2734  if (d="""")or(d="@@") then
2735    if buffer[loc]=d then
2736      begin incr(loc); d:=0; incr(double_chars);
2737      end
2738    else  begin if d="@@" then err_print('! Double @@ sign missing')
2739@.Double \AT! sign missing@>
2740      end
2741  else if loc>limit then
2742    begin err_print('! String constant didn''t end'); d:="""";
2743@.String constant didn't end@>
2744    end;
2745until d="""";
2746id_loc:=loc-1; c:=identifier;
2747end
2748
2749@ After an \.{@@} sign has been scanned, the next character tells us
2750whether there is more work to do.
2751
2752@<Get control code and possible module name@>=
2753begin c:=control_code(buffer[loc]); incr(loc);
2754if c=ignore then goto restart
2755else if c=hex then scanning_hex:=true
2756else if c=module_name then
2757  @<Scan the \(module name and make |cur_module| point to it@>
2758else if c=control_text then
2759  begin repeat c:=skip_ahead;
2760  until c<>"@@";
2761  if buffer[loc-1]<>">" then
2762    err_print('! Improper @@ within control text');
2763@.Improper \AT! within control text@>
2764  goto restart;
2765  end;
2766end
2767
2768@ @<Scan the \(module name...@>=
2769begin @<Put module name into |mod_text[1..k]|@>;
2770if k>3 then
2771  begin if (mod_text[k]=".")and(mod_text[k-1]=".")and(mod_text[k-2]=".") then
2772    cur_module:=prefix_lookup(k-3)
2773  else cur_module:=mod_lookup(k);
2774  end
2775else cur_module:=mod_lookup(k);
2776end
2777
2778@ Module names are placed into the |mod_text| array with consecutive spaces,
2779tabs, and carriage-returns replaced by single spaces. There will be no
2780spaces at the beginning or the end. (We set |mod_text[0]:=" "| to facilitate
2781this, since the |mod_lookup| routine uses |mod_text[1]| as the first
2782character of the name.)
2783
2784@<Set init...@>=mod_text[0]:=" ";
2785
2786@ @<Put module name...@>=
2787k:=0;
2788loop@+  begin if loc>limit then
2789    begin get_line;
2790    if input_has_ended then
2791      begin err_print('! Input ended in section name');
2792@.Input ended in section name@>
2793      goto done;
2794      end;
2795    end;
2796  d:=buffer[loc];
2797  @<If end of name, |goto done|@>;
2798  incr(loc); if k<longest_name-1 then incr(k);
2799  if (d=" ")or(d=tab_mark) then
2800    begin d:=" "; if mod_text[k-1]=" " then decr(k);
2801    end;
2802  mod_text[k]:=d;
2803  end;
2804done: @<Check for overlong name@>;
2805if (mod_text[k]=" ")and(k>0) then decr(k);
2806
2807@ @<If end of name,...@>=
2808if d="@@" then
2809  begin d:=buffer[loc+1];
2810  if d=">" then
2811    begin loc:=loc+2; goto done;
2812    end;
2813  if (d=" ")or(d=tab_mark)or(d="*") then
2814    begin err_print('! Section name didn''t end'); goto done;
2815@.Section name didn't end@>
2816    end;
2817  incr(k); mod_text[k]:="@@"; incr(loc); {now |d=buffer[loc]| again}
2818  end
2819
2820@ @<Check for overlong name@>=
2821if k>=longest_name-2 then
2822  begin print_nl('! Section name too long: ');
2823@.Section name too long@>
2824  for j:=1 to 25 do print(xchr[mod_text[j]]);
2825  print('...'); mark_harmless;
2826  end
2827
2828@* Scanning a numeric definition.
2829When \.{TANGLE} looks at the \PASCAL\ text following the `\.=' of a numeric
2830macro definition, it calls on the precedure |scan_numeric(p)|, where |p|
2831points to the name that is to be defined. This procedure evaluates the
2832right-hand side, which must consist entirely of integer constants and
2833defined numeric macros connected with \.+ and \.- signs (no parentheses).
2834It also sets the global variable |next_control| to the control code that
2835terminated this definition.
2836
2837A definition ends with the control codes |definition|, |format|, |module_name|,
2838|begin_Pascal|, and |new_module|, all of which can be recognized
2839by the fact that they are the largest values |get_next| can return.
2840
2841@d end_of_definition(#)==(#>=format)
2842  {is |#| a control code ending a definition?}
2843
2844@<Global...@>=
2845@!next_control:eight_bits; {control code waiting to be acted upon}
2846
2847@ The evaluation of a numeric expression makes use of two variables called the
2848|accumulator| and the |next_sign|. At the beginning, |accumulator| is zero and
2849|next_sign| is $+1$. When a \.+ or \.- is scanned, |next_sign| is multiplied
2850by the value of that sign. When a numeric value is scanned, it is multiplied by
2851|next_sign| and added to the |accumulator|, then |next_sign| is reset to $+1$.
2852
2853@d add_in(#)==begin accumulator:=accumulator+next_sign*(#); next_sign:=+1;
2854  end
2855
2856@p procedure scan_numeric(@!p:name_pointer); {defines numeric macros}
2857label reswitch, done;
2858var accumulator:integer; {accumulates sums}
2859@!next_sign:-1..+1; {sign to attach to next value}
2860@!q:name_pointer; {points to identifiers being evaluated}
2861@!val:integer; {constants being evaluated}
2862begin @<Set \(|accumulator| to the value of the right-hand side@>;
2863if abs(accumulator)>=@'100000 then
2864  begin err_print('! Value too big: ',accumulator:1); accumulator:=0;
2865@.Value too big@>
2866  end;
2867equiv[p]:=accumulator+@'100000; {name |p| now is defined to equal |accumulator|}
2868end;
2869
2870@ @<Set \(|accumulator| to the value of the right-hand side@>=
2871accumulator:=0; next_sign:=+1;
2872loop@+  begin next_control:=get_next;
2873  reswitch: case next_control of
2874  digits: begin @<Set |val| to value of decimal constant, and
2875      set |next_control| to the following token@>;
2876    add_in(val); goto reswitch;
2877    end;
2878  octal: begin @<Set |val| to value of octal constant, and
2879      set |next_control| to the following token@>;
2880    add_in(val); goto reswitch;
2881    end;
2882  hex: begin @<Set |val| to value of hexadecimal constant, and
2883      set |next_control| to the following token@>;
2884    add_in(val); goto reswitch;
2885    end;
2886  identifier: begin q:=id_lookup(normal);
2887    if ilk[q]<>numeric then
2888      begin next_control:="*"; goto reswitch; {leads to error}
2889      end;
2890    add_in(equiv[q]-@'100000);
2891    end;
2892  "+": do_nothing;
2893  "-": next_sign:=-next_sign;
2894  format, definition, module_name, begin_Pascal, new_module: goto done;
2895  ";": err_print('! Omit semicolon in numeric definition');
2896@.Omit semicolon in numeric def...@>
2897  othercases @<Signal error, flush rest of the definition@>
2898  endcases;
2899  end;
2900done:
2901
2902@ @<Signal error, flush rest...@>=
2903begin err_print('! Improper numeric definition will be flushed');
2904@.Improper numeric definition...@>
2905repeat next_control:=skip_ahead
2906until end_of_definition(next_control);
2907if next_control=module_name then
2908  begin {we want to scan the module name too}
2909  loc:=loc-2; next_control:=get_next;
2910  end;
2911accumulator:=0; goto done;
2912end
2913
2914@ @<Set |val| to value of decimal...@>=
2915val:=0;
2916repeat val:=10*val+next_control-"0"; next_control:=get_next;
2917until (next_control>"9")or(next_control<"0")
2918
2919@ @<Set |val| to value of octal...@>=
2920val:=0; next_control:="0";
2921repeat val:=8*val+next_control-"0"; next_control:=get_next;
2922until (next_control>"7")or(next_control<"0")
2923
2924@ @<Set |val| to value of hex...@>=
2925val:=0; next_control:="0";
2926repeat if next_control>="A" then next_control:=next_control+"0"+10-"A";
2927val:=16*val+next_control-"0"; next_control:=get_next;
2928until (next_control>"F")or(next_control<"0")or@|
2929  ((next_control>"9")and(next_control<"A"))
2930
2931@* Scanning a macro definition.
2932The rules for generating the replacement texts corresponding to simple
2933macros, parametric macros, and \PASCAL\ texts of a module are almost
2934identical, so a single procedure is used for all three cases. The
2935differences are that
2936
2937\yskip\item{a)} The sign |#| denotes a parameter only when it appears
2938outside of strings in a parametric macro; otherwise it stands for the
2939ASCII character |#|. (This is not used in standard \PASCAL, but some
2940\PASCAL s allow, for example, `\.{/\#}' after a certain kind of file name.)
2941
2942\item{b)}Module names are not allowed in simple macros or parametric macros;
2943in fact, the appearance of a module name terminates such macros and denotes
2944the name of the current module.
2945
2946\item{c)}The symbols \.{@@d} and \.{@@f} and \.{@@p} are not allowed after
2947module names, while they terminate macro definitions.
2948
2949@ Therefore there is a procedure |scan_repl| whose parameter |t| specifies
2950either |simple| or |parametric| or |module_name|. After |scan_repl| has
2951acted, |cur_repl_text| will point to the replacement text just generated, and
2952|next_control| will contain the control code that terminated the activity.
2953
2954@<Globals...@>=
2955@!cur_repl_text:text_pointer; {replacement text formed by |scan_repl|}
2956
2957@ @p procedure scan_repl(@!t:eight_bits); {creates a replacement text}
2958label continue, done, found, reswitch;
2959var a:sixteen_bits; {the current token}
2960@!b:ASCII_code; {a character from the buffer}
2961@!bal:eight_bits; {left parentheses minus right parentheses}
2962begin bal:=0;
2963loop@+  begin continue: a:=get_next;
2964  case a of
2965  "(": incr(bal);
2966  ")": if bal=0 then err_print('! Extra )')
2967@.Extra )@>
2968    else decr(bal);
2969  "'": @<Copy a string from the buffer to |tok_mem|@>;
2970  "#": if t=parametric then a:=param;
2971  @t\4@>@<In cases that |a| is a non-ASCII token (|identifier|,
2972  |module_name|, etc.), either process it and change |a| to a byte
2973  that should be stored, or |goto continue| if |a| should be ignored,
2974  or |goto done| if |a| signals the end of this replacement text@>@;
2975  othercases do_nothing
2976  endcases;@/
2977  app_repl(a); {store |a| in |tok_mem|}
2978  end;
2979done: next_control:=a;
2980@<Make sure the parentheses balance@>;
2981if text_ptr>max_texts-zz then overflow('text');
2982cur_repl_text:=text_ptr; tok_start[text_ptr+zz]:=tok_ptr[z];
2983incr(text_ptr);
2984if z=zz-1 then z:=0@+else incr(z);
2985end;
2986
2987@ @<Make sure the parentheses balance@>=
2988if bal>0 then
2989  begin if bal=1 then err_print('! Missing )')
2990  else err_print('! Missing ',bal:1,' )''s');
2991@.Missing n )@>
2992  while bal>0 do
2993    begin app_repl(")"); decr(bal);
2994    end;
2995  end
2996
2997@ @<In cases that |a| is...@>=
2998identifier: begin a:=id_lookup(normal); app_repl((a div @'400)+@'200);
2999  a:=a mod @'400;
3000  end;
3001module_name: if t<>module_name then goto done
3002  else  begin app_repl((cur_module div @'400)+@'250);
3003    a:=cur_module mod @'400;
3004    end;
3005verbatim: @<Copy verbatim string from the buffer to |tok_mem|@>;
3006definition, format, begin_Pascal: if t<>module_name then goto done
3007  else  begin err_print('! @@',xchr[buffer[loc-1]],
3008@.\AT!p is ignored in Pascal text@>
3009@.\AT!d is ignored in Pascal text@>
3010@.\AT!f is ignored in Pascal text@>
3011      ' is ignored in Pascal text'); goto continue;
3012    end;
3013new_module: goto done;
3014
3015@ @<Copy a string...@>=
3016begin b:="'";
3017loop@+  begin app_repl(b);
3018  if b="@@" then
3019    if buffer[loc]="@@" then incr(loc) {store only one \.{@@}}
3020    else err_print('! You should double @@ signs in strings');
3021@.You should double \AT! signs@>
3022  if loc=limit then
3023    begin err_print('! String didn''t end');
3024@.String didn't end@>
3025    buffer[loc]:="'"; buffer[loc+1]:=0;
3026    end;
3027  b:=buffer[loc]; incr(loc);
3028  if b="'" then
3029    begin if buffer[loc]<>"'" then goto found
3030    else  begin incr(loc); app_repl("'");
3031      end;
3032    end;
3033  end;
3034found: end {now |a| holds the final |"'"| that will be stored}
3035
3036@ @<Copy verbatim string...@>=
3037begin app_repl(verbatim);
3038buffer[limit+1]:="@@";
3039reswitch: if buffer[loc]="@@" then
3040  begin if loc<limit then if buffer[loc+1]="@@" then
3041    begin app_repl("@@");
3042    loc:=loc+2;
3043    goto reswitch;
3044    end;
3045  end
3046else begin app_repl(buffer[loc]);
3047  incr(loc);
3048  goto reswitch;
3049  end;
3050if loc>=limit then err_print('! Verbatim string didn''t end')
3051@.Verbatim string didn't end@>
3052else if buffer[loc+1]<>">" then
3053  err_print('! You should double @@ signs in verbatim strings');
3054@.You should double \AT! signs@>
3055loc:=loc+2;
3056end {another |verbatim| byte will be stored, since |a=verbatim|}
3057
3058@ The following procedure is used to define a simple or parametric macro,
3059just after the `\.{==}' of its definition has been scanned.
3060
3061@p procedure define_macro(@!t:eight_bits);
3062var p:name_pointer; {the identifier being defined}
3063begin p:=id_lookup(t); scan_repl(t);@/
3064equiv[p]:=cur_repl_text; text_link[cur_repl_text]:=0;
3065end;
3066
3067@* Scanning a module.
3068The |scan_module| procedure starts when `\.{@@\ }' or `\.{@@*}' has been
3069sensed in the input, and it proceeds until the end of that module.  It
3070uses |module_count| to keep track of the current module number; with luck,
3071\.{WEAVE} and \.{TANGLE} will both assign the same numbers to modules.
3072
3073@<Globals...@>=
3074@!module_count:0..@'27777; {the current module number}
3075
3076@ The top level of |scan_module| is trivial.
3077@p procedure scan_module;
3078label continue, done, exit;
3079var p:name_pointer; {module name for the current module}
3080begin incr(module_count);
3081@<Scan the \(definition part of the current module@>;
3082@<Scan the \PASCAL\ part of the current module@>;
3083exit: end;
3084
3085@ @<Scan the \(definition part...@>=
3086next_control:=0;
3087loop@+  begin continue: while next_control<=format do
3088    begin next_control:=skip_ahead;
3089    if next_control=module_name then
3090      begin {we want to scan the module name too}
3091      loc:=loc-2; next_control:=get_next;
3092      end;
3093    end;
3094  if next_control<>definition then goto done;
3095  next_control:=get_next; {get identifier name}
3096  if next_control<>identifier then
3097    begin err_print('! Definition flushed, must start with ',
3098@.Definition flushed...@>
3099      'identifier of length > 1'); goto continue;
3100    end;
3101  next_control:=get_next; {get token after the identifier}
3102  if next_control="=" then
3103    begin scan_numeric(id_lookup(numeric)); goto continue;
3104    end
3105  else if next_control=equivalence_sign then
3106    begin define_macro(simple); goto continue;
3107    end
3108  else @<If the next text is `|(#)==|', call |define_macro|
3109    and |goto continue|@>;
3110  err_print('! Definition flushed since it starts badly');
3111@.Definition flushed...@>
3112  end;
3113done:
3114
3115@ @<If the next text is `|(#)==|'...@>=
3116if next_control="(" then
3117  begin next_control:=get_next;
3118  if next_control="#" then
3119    begin next_control:=get_next;
3120    if next_control=")" then
3121      begin next_control:=get_next;
3122      if next_control="=" then
3123        begin err_print('! Use == for macros');
3124@.Use == for macros@>
3125        next_control:=equivalence_sign;
3126        end;
3127      if next_control=equivalence_sign then
3128        begin define_macro(parametric); goto continue;
3129        end;
3130      end;
3131    end;
3132  end;
3133
3134@ @<Scan the \PASCAL...@>=
3135case next_control of
3136begin_Pascal:p:=0;
3137module_name: begin p:=cur_module;
3138  @<Check that |=| or |==| follows this module name, otherwise |return|@>;
3139  end;
3140othercases return
3141endcases;@/
3142@<Insert the module number into |tok_mem|@>;
3143scan_repl(module_name); {now |cur_repl_text| points to the replacement text}
3144@<Update the data structure so that the replacement text is accessible@>;
3145
3146@ @<Check that |=|...@>=
3147repeat next_control:=get_next;
3148until next_control<>"+"; {allow optional `\.{+=}'}
3149if (next_control<>"=")and(next_control<>equivalence_sign) then
3150  begin err_print('! Pascal text flushed, = sign is missing');
3151@.Pascal text flushed...@>
3152  repeat next_control:=skip_ahead;
3153  until next_control=new_module;
3154  return;
3155  end
3156
3157@ @<Insert the module number...@>=
3158store_two_bytes(@'150000+module_count); {|@'150000=@'320*@'400|}
3159
3160@ @<Update the data...@>=
3161if p=0 then {unnamed module}
3162  begin text_link[last_unnamed]:=cur_repl_text; last_unnamed:=cur_repl_text;
3163  end
3164else if equiv[p]=0 then equiv[p]:=cur_repl_text {first module of this name}
3165else  begin p:=equiv[p];
3166  while text_link[p]<module_flag do p:=text_link[p]; {find end of list}
3167  text_link[p]:=cur_repl_text;
3168  end;
3169text_link[cur_repl_text]:=module_flag;
3170  {mark this replacement text as a nonmacro}
3171
3172@* Debugging.
3173The \PASCAL\ debugger with which \.{TANGLE} was developed allows breakpoints
3174to be set, and variables can be read and changed, but procedures cannot be
3175executed. Therefore a `|debug_help|' procedure has been inserted in the main
3176loops of each phase of the program; when |ddt| and |dd| are set to appropriate
3177values, symbolic printouts of various tables will appear.
3178
3179The idea is to set a breakpoint inside the |debug_help| routine, at the
3180place of `\ignorespaces|breakpoint:|\unskip' below.  Then when
3181|debug_help| is to be activated, set |trouble_shooting| equal to |true|.
3182The |debug_help| routine will prompt you for values of |ddt| and |dd|,
3183discontinuing this when |ddt<=0|; thus you type $2n+1$ integers, ending
3184with zero or a negative number. Then control either passes to the
3185breakpoint, allowing you to look at and/or change variables (if you typed
3186zero), or to exit the routine (if you typed a negative value).
3187
3188Another global variable, |debug_cycle|, can be used to skip silently
3189past calls on |debug_help|. If you set |debug_cycle>1|, the program stops
3190only every |debug_cycle| times |debug_help| is called; however,
3191any error stop will set |debug_cycle| to zero.
3192
3193@<Globals...@>=
3194@!debug@!trouble_shooting:boolean; {is |debug_help| wanted?}
3195@!ddt:integer; {operation code for the |debug_help| routine}
3196@!dd:integer; {operand in procedures performed by |debug_help|}
3197@!debug_cycle:integer; {threshold for |debug_help| stopping}
3198@!debug_skipped:integer; {we have skipped this many |debug_help| calls}
3199@!term_in:text_file; {the user's terminal as an input file}
3200gubed
3201
3202@ The debugging routine needs to read from the user's terminal.
3203@^system dependencies@>
3204@<Set init...@>=
3205@!debug trouble_shooting:=true; debug_cycle:=1; debug_skipped:=0;@/
3206trouble_shooting:=false; debug_cycle:=99999; {use these when it almost works}
3207reset(term_in,'TTY:','/I'); {open |term_in| as the terminal, don't do a |get|}
3208gubed
3209
3210@ @d breakpoint=888 {place where a breakpoint is desirable}
3211@^system dependencies@>
3212
3213@p @!debug procedure debug_help; {routine to display various things}
3214label breakpoint,exit;
3215var k:integer; {index into various arrays}
3216begin incr(debug_skipped);
3217if debug_skipped<debug_cycle then return;
3218debug_skipped:=0;
3219loop@+  begin print_nl('#'); update_terminal; {prompt}
3220  read(term_in,ddt); {read a debug-command code}
3221  if ddt<0 then return
3222  else if ddt=0 then
3223    begin goto breakpoint;@\ {go to every label at least once}
3224    breakpoint: ddt:=0;@\
3225    end
3226  else  begin read(term_in,dd);
3227    case ddt of
3228    1: print_id(dd);
3229    2: print_repl(dd);
3230    3: for k:=1 to dd do print(xchr[buffer[k]]);
3231    4: for k:=1 to dd do print(xchr[mod_text[k]]);
3232    5: for k:=1 to out_ptr do print(xchr[out_buf[k]]);
3233    6: for k:=1 to dd do print(xchr[out_contrib[k]]);
3234    othercases print('?')
3235    endcases;
3236    end;
3237  end;
3238exit:end;
3239gubed
3240
3241@* The main program.
3242We have defined plenty of procedures, and it is time to put the last
3243pieces of the puzzle in place. Here is where \.{TANGLE} starts, and where
3244it ends.
3245@^system dependencies@>
3246
3247@p begin initialize;
3248@<Initialize the input system@>;
3249print_ln(banner); {print a ``banner line''}
3250@<Phase I: Read all the user's text and compress it into |tok_mem|@>;
3251stat for ii:=0 to zz-1 do max_tok_ptr[ii]:=tok_ptr[ii];@+tats@;@/
3252@<Phase II:...@>;
3253end_of_TANGLE:
3254if string_ptr>256 then @<Finish off the string pool file@>;
3255stat @<Print statistics about memory usage@>;@+tats@;@/
3256@t\4\4@>{here files should be closed if the operating system requires it}
3257@<Print the job |history|@>;
3258end.
3259
3260@ @<Phase I:...@>=
3261phase_one:=true;
3262module_count:=0;
3263repeat next_control:=skip_ahead;
3264until next_control=new_module;
3265while not input_has_ended do scan_module;
3266@<Check that all changes have been read@>;
3267phase_one:=false;
3268
3269@ @<Finish off the string pool file@>=
3270begin print_nl(string_ptr-256:1, ' strings written to string pool file.');
3271write(pool,'*');
3272for ii:=1 to 9 do
3273  begin out_buf[ii]:=pool_check_sum mod 10;
3274  pool_check_sum:=pool_check_sum div 10;
3275  end;
3276for ii:=9 downto 1 do write(pool,xchr["0"+out_buf[ii]]);
3277write_ln(pool);
3278end
3279
3280@ @<Glob...@>=
3281stat @!wo:0..ww-1; {segment of memory for which statistics are being printed}
3282tats
3283
3284@ @<Print statistics about memory usage@>=
3285print_nl('Memory usage statistics:');
3286print_nl(name_ptr:1, ' names, ', text_ptr:1, ' replacement texts;');
3287print_nl(byte_ptr[0]:1);
3288for wo:=1 to ww-1 do print('+',byte_ptr[wo]:1);
3289if phase_one then
3290  for ii:=0 to zz-1 do max_tok_ptr[ii]:=tok_ptr[ii];
3291print(' bytes, ', max_tok_ptr[0]:1);
3292for ii:=1 to zz-1 do print('+',max_tok_ptr[ii]:1);
3293print(' tokens.');
3294
3295@ Some implementations may wish to pass the |history| value to the
3296operating system so that it can be used to govern whether or not other
3297programs are started. Here we simply report the history to the user.
3298@^system dependencies@>
3299
3300@<Print the job |history|@>=
3301case history of
3302spotless: print_nl('(No errors were found.)');
3303harmless_message: print_nl('(Did you see the warning message above?)');
3304error_message: print_nl('(Pardon me, but I think I spotted something wrong.)');
3305fatal_message: print_nl('(That was a fatal error, my friend.)');
3306end {there are no other cases}
3307
3308@* System-dependent changes.
3309This module should be replaced, if necessary, by changes to the program
3310that are necessary to make \.{TANGLE} work at a particular installation.
3311It is usually best to design your change file so that all changes to
3312previous modules preserve the module numbering; then everybody's version
3313will be consistent with the printed program. More extensive changes,
3314which introduce new modules, can be inserted here; then only the index
3315itself will get a new module number.
3316@^system dependencies@>
3317
3318@* Index.
3319Here is a cross-reference table for the \.{TANGLE} processor.
3320All modules in which an identifier is
3321used are listed with that identifier, except that reserved words are
3322indexed only when they appear in format definitions, and the appearances
3323of identifiers in module names are not indexed. Underlined entries
3324correspond to where the identifier was declared. Error messages and
3325a few other things like ``ASCII code'' are indexed here too.
3326