1% otangle.web: Omega version of file tangle.web
2%
3% This file is part of the Omega project, which
4% is based in the web2c distribution of TeX.
5%
6% Copyright (c) 1994--1998 John Plaice and Yannis Haralambous
7% applies only to the changes to the original tangle.web.
8%
9% This program by D. E. Knuth is not copyrighted and can be used freely.
10% Version 0 was released in December, 1981.
11% Version 1 was released in September, 1982, with version 0 of TeX.
12% Slight changes were made in October, 1982, for version 0.6 of TeX.
13% Version 1.2 introduced {:nnn} comments, added @@= and @@\ (December, 1982).
14% Version 1.4 added "history" (February, 1983).
15% Version 1.5 conformed to TeX version 0.96 and fixed @@\ (March, 1983).
16% Version 1.7 introduced the new change file format (June, 1983).
17% Version 2.0 was released in July, 1983, with version 0.999 of TeX.
18% Version 2.5 was released in November, 1983, with version 1.0 of TeX.
19% Version 2.6 fixed a bug: force-line-break after a constant (August, 1984).
20% Version 2.7 fixed the definition of check_sum_prime (May, 1985).
21% Version 2.8 fixed a bug in change_buffer movement (August, 1985).
22% Version 2.9 allows nonnumeric macros before their def (December, 1988).
23% Version 3, for Sewell's book, fixed long-line bug in input_ln (March, 1989).
24% Version 4 was major change to allow 8-bit input (September, 1989).
25% Version 4.1 conforms to ANSI standard for-loop rules (September, 1990).
26% Version 4.2 fixes stat report if phase one dies (March, 1991).
27% Version 4.3 fixes @@ bug in verbatim, catches extra } (September, 1991).
28% Version 4.4 activates debug_help on errors as advertised (February, 1993).
29
30% Here is TeX material that gets inserted after \input webmac
31\def\hang{\hangindent 3em\indent\ignorespaces}
32\font\ninerm=cmr9
33\let\mc=\ninerm % medium caps for names like SAIL
34\def\PASCAL{Pascal}
35\def\pb{$\.|\ldots\.|$} % Pascal brackets (|...|)
36\def\v{\.{\char'174}} % vertical (|) in typewriter font
37\mathchardef\BA="3224 % double arrow
38\def\({} % kludge for alphabetizing certain module names
39
40\def\title{OTANGLE}
41\def\contentspagenumber{123} % should be odd
42\def\topofcontents{\null\vfill
43  \titlefalse % include headline on the contents page
44  \def\rheader{\mainfont Appendix E\hfil \contentspagenumber}
45  \centerline{\titlefont The {\ttitlefont OTANGLE} processor}
46  \vskip 15pt
47  \centerline{(Version 4.4)}
48  \vfill}
49\pageno=\contentspagenumber \advance\pageno by 1
50
51@* Introduction.
52This program converts a \.{WEB} file to a \PASCAL\ file. It was written
53by D. E. Knuth in September, 1981; a somewhat similar {\mc SAIL} program had
54been developed in March, 1979. Since this program describes itself, a
55bootstrapping process involving hand-translation had to be used to get started.
56
57For large \.{WEB} files one should have a large memory, since \.{TANGLE} keeps
58all the \PASCAL\ text in memory (in an abbreviated form). The program uses
59a few features of the local \PASCAL\ compiler that may need to be changed in
60other installations:
61
62\yskip\item{1)} Case statements have a default.
63\item{2)} Input-output routines may need to be adapted for use with a particular
64character set and/or for printing messages on the user's terminal.
65
66\yskip\noindent
67These features are also present in the \PASCAL\ version of \TeX, where they
68are used in a similar (but more complex) way. System-dependent portions
69of \.{TANGLE} can be identified by looking at the entries for `system
70dependencies' in the index below.
71@!@^system dependencies@>
72
73The ``banner line'' defined here should be changed whenever \.{TANGLE}
74is modified.
75
76@d banner=='This is OTANGLE, Version 4.4'
77
78@ The program begins with a fairly normal header, made up of pieces that
79@^system dependencies@>
80will mostly be filled in later. The \.{WEB} input comes from files |web_file|
81and |change_file|, the \PASCAL\ output goes to file |Pascal_file|,
82and the string pool output goes to file |pool|.
83
84If it is necessary to abort the job because of a fatal error, the program
85calls the `|jump_out|' procedure, which goes to the label |end_of_TANGLE|.
86
87@d end_of_TANGLE = 9999 {go here to wrap it up}
88
89@p @t\4@>@<Compiler directives@>@/
90program OTANGLE(@!web_file,@!change_file,@!Pascal_file,@!pool);
91label end_of_TANGLE; {go here to finish}
92const @<Constants in the outer block@>@/
93type @<Types in the outer block@>@/
94var @<Globals in the outer block@>@/
95@<Error handling procedures@>@/
96procedure initialize;
97  var @<Local variables for initialization@>@/
98  begin @<Set initial values@>@/
99  end;
100
101@ Some of this code is optional for use when debugging only;
102such material is enclosed between the delimiters |debug| and $|gubed|$.
103Other parts, delimited by |stat| and $|tats|$, are optionally included if
104statistics about \.{TANGLE}'s memory usage are desired.
105
106@d debug==@{ {change this to `$\\{debug}\equiv\null$' when debugging}
107@d gubed==@t@>@} {change this to `$\\{gubed}\equiv\null$' when debugging}
108@f debug==begin
109@f gubed==end
110@#
111@d stat==@{ {change this to `$\\{stat}\equiv\null$'
112  when gathering usage statistics}
113@d tats==@t@>@} {change this to `$\\{tats}\equiv\null$'
114  when gathering usage statistics}
115@f stat==begin
116@f tats==end
117
118@ The \PASCAL\ compiler used to develop this system has ``compiler
119directives'' that can appear in comments whose first character is a dollar sign.
120In production versions of \.{TANGLE} these directives tell the compiler that
121@^system dependencies@>
122it is safe to avoid range checks and to leave out the extra code it inserts
123for the \PASCAL\ debugger's benefit, although interrupts will occur if
124there is arithmetic overflow.
125
126@<Compiler directives@>=
127@{@&$C-,A+,D-@} {no range check, catch arithmetic overflow, no debug overhead}
128@!debug @{@&$C+,D+@}@+ gubed {but turn everything on when debugging}
129
130@ Labels are given symbolic names by the following definitions. We insert
131the label `|exit|:' just before the `\ignorespaces|end|\unskip' of a
132procedure in which we have used the `|return|' statement defined below;
133the label `|restart|' is occasionally used at the very beginning of a
134procedure; and the label `|reswitch|' is occasionally used just prior to
135a \&{case} statement in which some cases change the conditions and we wish to
136branch to the newly applicable case.
137Loops that are set up with the \&{loop} construction defined below are
138commonly exited by going to `|done|' or to `|found|' or to `|not_found|',
139and they are sometimes repeated by going to `|continue|'.
140
141@d exit=10 {go here to leave a procedure}
142@d restart=20 {go here to start a procedure again}
143@d reswitch=21 {go here to start a case statement again}
144@d continue=22 {go here to resume a loop}
145@d done=30 {go here to exit a loop}
146@d found=31 {go here when you've found it}
147@d not_found=32 {go here when you've found something else}
148
149@ Here are some macros for common programming idioms.
150
151@d incr(#) == #:=#+1 {increase a variable by unity}
152@d decr(#) == #:=#-1 {decrease a variable by unity}
153@d loop == @+ while true do@+ {repeat over and over until a |goto| happens}
154@d do_nothing == {empty statement}
155@d return == goto exit {terminate a procedure call}
156@f return == nil
157@f loop == xclause
158
159@ We assume that |case| statements may include a default case that applies
160if no matching label is found. Thus, we shall use constructions like
161@^system dependencies@>
162$$\vbox{\halign{#\hfil\cr
163|case x of|\cr
1641: $\langle\,$code for $x=1\,\rangle$;\cr
1653: $\langle\,$code for $x=3\,\rangle$;\cr
166|othercases| $\langle\,$code for |x<>1| and |x<>3|$\,\rangle$\cr
167|endcases|\cr}}$$
168since most \PASCAL\ compilers have plugged this hole in the language by
169incorporating some sort of default mechanism. For example, the compiler
170used to develop \.{WEB} and \TeX\ allows `|others|:' as a default label,
171and other \PASCAL s allow syntaxes like `\ignorespaces|else|\unskip' or
172`\&{otherwise}' or `\\{otherwise}:', etc. The definitions of |othercases|
173and |endcases| should be changed to agree with local conventions. (Of
174course, if no default mechanism is available, the |case| statements of
175this program must be extended by listing all remaining cases. The author
176would have taken the trouble to modify \.{TANGLE} so that such extensions
177were done automatically, if he had not wanted to encourage \PASCAL\
178compiler writers to make this important change in \PASCAL, where it belongs.)
179
180@d othercases == others: {default for cases not listed explicitly}
181@d endcases == @+end {follows the default case in an extended |case| statement}
182@f othercases == else
183@f endcases == end
184
185@ The following parameters are set big enough to handle \TeX, so they
186should be sufficient for most applications of \.{TANGLE}.
187
188@<Constants...@>=
189@!buf_size=100; {maximum length of input line}
190@!max_bytes=65535; {|1/ww| times the number of bytes in identifiers,
191  strings, and module names; must be less than 65536}
192@!max_toks=65535; {|1/zz| times the number of bytes in compressed \PASCAL\ code;
193  must be less than 65536}
194@!max_names=10239; {number of identifiers, strings, module names;
195  must be less than 10240}
196@!max_texts=10239; {number of replacement texts, must be less than 10240}
197@!hash_size=353; {should be prime}
198@!longest_name=400; {module names shouldn't be longer than this}
199@!line_length=72; {lines of \PASCAL\ output have at most this many characters}
200@!out_buf_size=144; {length of output buffer, should be twice |line_length|}
201@!stack_size=50; {number of simultaneous levels of macro expansion}
202@!max_id_length=12; {long identifiers are chopped to this length, which must
203  not exceed |line_length|}
204@!unambig_length=7; {identifiers must be unique if chopped to this length}
205  {note that 7 is more strict than \PASCAL's 8, but this can be varied}
206
207@ A global variable called |history| will contain one of four values
208at the end of every run: |spotless| means that no unusual messages were
209printed; |harmless_message| means that a message of possible interest
210was printed but no serious errors were detected; |error_message| means that
211at least one error was found; |fatal_message| means that the program
212terminated abnormally. The value of |history| does not influence the
213behavior of the program; it is simply computed for the convenience
214of systems that might want to use such information.
215
216@d spotless=0 {|history| value for normal jobs}
217@d harmless_message=1 {|history| value when non-serious info was printed}
218@d error_message=2 {|history| value when an error was noted}
219@d fatal_message=3 {|history| value when we had to stop prematurely}
220@#
221@d mark_harmless==@t@>@+if history=spotless then history:=harmless_message
222@d mark_error==history:=error_message
223@d mark_fatal==history:=fatal_message
224@d biggest_char=65535
225@d number_chars=65536
226
227@<Glob...@>=@!history:spotless..fatal_message; {how bad was this run?}
228
229@ @<Set init...@>=history:=spotless;
230
231@* The character set.
232One of the main goals in the design of \.{WEB} has been to make it readily
233portable between a wide variety of computers. Yet \.{WEB} by its very
234nature must use a greater variety of characters than most computer
235programs deal with, and character encoding is one of the areas in which
236existing machines differ most widely from each other.
237
238To resolve this problem, all input to \.{WEAVE} and \.{TANGLE} is converted
239to an internal eight-bit code that is essentially standard ASCII, the ``American
240Standard Code for Information Interchange.''  The conversion is done
241immediately when each character is read in. Conversely, characters are
242converted from ASCII to the user's external representation just before
243they are output. (The original ASCII code was seven bits only; \.{WEB} now
244allows eight bits in an attempt to keep up with modern times.)
245
246Such an internal code is relevant to users of \.{WEB} only because it is
247the code used for preprocessed constants like \.{"A"}. If you are writing
248a program in \.{WEB} that makes use of such one-character constants, you
249should convert your input to ASCII form, like \.{WEAVE} and \.{TANGLE} do.
250Otherwise \.{WEB}'s internal coding scheme does not affect you.
251@^ASCII code@>
252
253Here is a table of the standard visible ASCII codes:
254$$\def\:{\char\count255\global\advance\count255 by 1}
255\count255='40
256\vbox{
257\hbox{\hbox to 40pt{\it\hfill0\/\hfill}%
258\hbox to 40pt{\it\hfill1\/\hfill}%
259\hbox to 40pt{\it\hfill2\/\hfill}%
260\hbox to 40pt{\it\hfill3\/\hfill}%
261\hbox to 40pt{\it\hfill4\/\hfill}%
262\hbox to 40pt{\it\hfill5\/\hfill}%
263\hbox to 40pt{\it\hfill6\/\hfill}%
264\hbox to 40pt{\it\hfill7\/\hfill}}
265\vskip 4pt
266\hrule
267\def\^{\vrule height 10.5pt depth 4.5pt}
268\halign{\hbox to 0pt{\hskip -24pt\O{#0}\hfill}&\^
269\hbox to 40pt{\tt\hfill#\hfill\^}&
270&\hbox to 40pt{\tt\hfill#\hfill\^}\cr
27104&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
27205&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
27306&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
27407&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
27510&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
27611&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
27712&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
27813&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
27914&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
28015&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
28116&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
28217&\:&\:&\:&\:&\:&\:&\:\cr}
283\hrule width 280pt}$$
284(Actually, of course, code @'040 is an invisible blank space.)  Code @'136
285was once an upward arrow (\.{\char'13}), and code @'137 was
286once a left arrow (\.^^X), in olden times when the first draft
287of ASCII code was prepared; but \.{WEB} works with today's standard
288ASCII in which those codes represent circumflex and underline as shown.
289
290@<Types...@>=
291@!ASCII_code=0..255; {eight-bit numbers, a subrange of the integers}
292
293@ The original \PASCAL\ compiler was designed in the late 60s, when six-bit
294character sets were common, so it did not make provision for lowercase
295letters. Nowadays, of course, we need to deal with both capital and small
296letters in a convenient way, so \.{WEB} assumes that it is being used
297with a \PASCAL\ whose character set contains at least the characters of
298standard ASCII as listed above. Some \PASCAL\ compilers use the original
299name |char| for the data type associated with the characters in text files,
300while other \PASCAL s consider |char| to be a 64-element subrange of a larger
301data type that has some other name.
302
303In order to accommodate this difference, we shall use the name |text_char|
304to stand for the data type of the characters in the input and output
305files.  We shall also assume that |text_char| consists of the elements
306|chr(first_text_char)| through |chr(last_text_char)|, inclusive. The
307following definitions should be adjusted if necessary.
308@^system dependencies@>
309
310@d text_char == char {the data type of characters in text files}
311@d first_text_char=0 {ordinal number of the smallest element of |text_char|}
312@d last_text_char=255 {ordinal number of the largest element of |text_char|}
313
314@<Types...@>=
315@!text_file=packed file of text_char;
316
317@ The \.{WEAVE} and \.{TANGLE} processors convert between ASCII code and
318the user's external character set by means of arrays |xord| and |xchr|
319that are analogous to \PASCAL's |ord| and |chr| functions.
320
321@<Globals...@>=
322@!xord: array [text_char] of ASCII_code;
323  {specifies conversion of input characters}
324@!xchr: array [ASCII_code] of text_char;
325  {specifies conversion of output characters}
326
327@ If we assume that every system using \.{WEB} is able to read and write the
328visible characters of standard ASCII (although not necessarily using the
329ASCII codes to represent them), the following assignment statements initialize
330most of the |xchr| array properly, without needing any system-dependent
331changes. For example, the statement \.{xchr[@@\'101]:=\'A\'} that appears
332in the present \.{WEB} file might be encoded in, say, {\mc EBCDIC} code
333on the external medium on which it resides, but \.{TANGLE} will convert from
334this external code to ASCII and back again. Therefore the assignment
335statement \.{XCHR[65]:=\'A\'} will appear in the corresponding \PASCAL\ file,
336and \PASCAL\ will compile this statement so that |xchr[65]| receives the
337character \.A in the external (|char|) code. Note that it would be quite
338incorrect to say \.{xchr[@@\'101]:="A"}, because |"A"| is a constant of
339type |integer|, not |char|, and because we have $|"A"|=65$ regardless of
340the external character set.
341
342@<Set init...@>=
343xchr[@'40]:=' ';
344xchr[@'41]:='!';
345xchr[@'42]:='"';
346xchr[@'43]:='#';
347xchr[@'44]:='$';
348xchr[@'45]:='%';
349xchr[@'46]:='&';
350xchr[@'47]:='''';@/
351xchr[@'50]:='(';
352xchr[@'51]:=')';
353xchr[@'52]:='*';
354xchr[@'53]:='+';
355xchr[@'54]:=',';
356xchr[@'55]:='-';
357xchr[@'56]:='.';
358xchr[@'57]:='/';@/
359xchr[@'60]:='0';
360xchr[@'61]:='1';
361xchr[@'62]:='2';
362xchr[@'63]:='3';
363xchr[@'64]:='4';
364xchr[@'65]:='5';
365xchr[@'66]:='6';
366xchr[@'67]:='7';@/
367xchr[@'70]:='8';
368xchr[@'71]:='9';
369xchr[@'72]:=':';
370xchr[@'73]:=';';
371xchr[@'74]:='<';
372xchr[@'75]:='=';
373xchr[@'76]:='>';
374xchr[@'77]:='?';@/
375xchr[@'100]:='@@';
376xchr[@'101]:='A';
377xchr[@'102]:='B';
378xchr[@'103]:='C';
379xchr[@'104]:='D';
380xchr[@'105]:='E';
381xchr[@'106]:='F';
382xchr[@'107]:='G';@/
383xchr[@'110]:='H';
384xchr[@'111]:='I';
385xchr[@'112]:='J';
386xchr[@'113]:='K';
387xchr[@'114]:='L';
388xchr[@'115]:='M';
389xchr[@'116]:='N';
390xchr[@'117]:='O';@/
391xchr[@'120]:='P';
392xchr[@'121]:='Q';
393xchr[@'122]:='R';
394xchr[@'123]:='S';
395xchr[@'124]:='T';
396xchr[@'125]:='U';
397xchr[@'126]:='V';
398xchr[@'127]:='W';@/
399xchr[@'130]:='X';
400xchr[@'131]:='Y';
401xchr[@'132]:='Z';
402xchr[@'133]:='[';
403xchr[@'134]:='\';
404xchr[@'135]:=']';
405xchr[@'136]:='^';
406xchr[@'137]:='_';@/
407xchr[@'140]:='`';
408xchr[@'141]:='a';
409xchr[@'142]:='b';
410xchr[@'143]:='c';
411xchr[@'144]:='d';
412xchr[@'145]:='e';
413xchr[@'146]:='f';
414xchr[@'147]:='g';@/
415xchr[@'150]:='h';
416xchr[@'151]:='i';
417xchr[@'152]:='j';
418xchr[@'153]:='k';
419xchr[@'154]:='l';
420xchr[@'155]:='m';
421xchr[@'156]:='n';
422xchr[@'157]:='o';@/
423xchr[@'160]:='p';
424xchr[@'161]:='q';
425xchr[@'162]:='r';
426xchr[@'163]:='s';
427xchr[@'164]:='t';
428xchr[@'165]:='u';
429xchr[@'166]:='v';
430xchr[@'167]:='w';@/
431xchr[@'170]:='x';
432xchr[@'171]:='y';
433xchr[@'172]:='z';
434xchr[@'173]:='{';
435xchr[@'174]:='|';
436xchr[@'175]:='}';
437xchr[@'176]:='~';@/
438xchr[0]:=' '; xchr[@'177]:=' '; {these ASCII codes are not used}
439
440@ Some of the ASCII codes below @'40 have been given symbolic names in
441\.{WEAVE} and \.{TANGLE} because they are used with a special meaning.
442
443@d and_sign=@'4 {equivalent to `\.{and}'}
444@d not_sign=@'5 {equivalent to `\.{not}'}
445@d set_element_sign=@'6 {equivalent to `\.{in}'}
446@d tab_mark=@'11 {ASCII code used as tab-skip}
447@d line_feed=@'12 {ASCII code thrown away at end of line}
448@d form_feed=@'14 {ASCII code used at end of page}
449@d carriage_return=@'15 {ASCII code used at end of line}
450@d left_arrow=@'30 {equivalent to `\.{:=}'}
451@d not_equal=@'32 {equivalent to `\.{<>}'}
452@d less_or_equal=@'34 {equivalent to `\.{<=}'}
453@d greater_or_equal=@'35 {equivalent to `\.{>=}'}
454@d equivalence_sign=@'36 {equivalent to `\.{==}'}
455@d or_sign=@'37 {equivalent to `\.{or}'}
456
457@ When we initialize the |xord| array and the remaining parts of |xchr|,
458it will be convenient to make use of an index variable, |i|.
459
460@<Local variables for init...@>=
461@!i:0..255;
462
463@ Here now is the system-dependent part of the character set.
464If \.{WEB} is being implemented on a garden-variety \PASCAL\ for which
465only standard ASCII codes will appear in the input and output files, you
466don't need to make any changes here. But if you have, for example, an extended
467character set like the one in Appendix~C of {\sl The \TeX book}, the first
468line of code in this module should be changed to
469$$\hbox{|for i:=1 to @'37 do xchr[i]:=chr(i);|}$$
470\.{WEB}'s character set is essentially identical to \TeX's, even with respect to
471characters less than @'40.
472@^system dependencies@>
473
474Changes to the present module will make \.{WEB} more friendly on computers
475that have an extended character set, so that one can type things like
476\.^^Z\ instead of \.{<>}. If you have an extended set of characters that
477are easily incorporated into text files, you can assign codes arbitrarily
478here, giving an |xchr| equivalent to whatever characters the users of
479\.{WEB} are allowed to have in their input files, provided that unsuitable
480characters do not correspond to special codes like |carriage_return|
481that are listed above.
482
483(The present file \.{TANGLE.WEB} does not contain any of the non-ASCII
484characters, because it is intended to be used with all implementations of
485\.{WEB}.  It was originally created on a Stanford system that has a
486convenient extended character set, then ``sanitized'' by applying another
487program that transliterated all of the non-standard characters into
488standard equivalents.)
489
490@<Set init...@>=
491for i:=1 to @'37 do xchr[i]:=' ';
492for i:=@'200 to @'377 do xchr[i]:=' ';
493
494@ The following system-independent code makes the |xord| array contain a
495suitable inverse to the information in |xchr|.
496
497@<Set init...@>=
498for i:=first_text_char to last_text_char do xord[chr(i)]:=" ";
499for i:=1 to @'377 do xord[xchr[i]]:=i;
500xord[' ']:=" ";
501
502@* Input and output.
503The input conventions of this program are intended to be very much like those
504of \TeX\ (except, of course, that they are much simpler, because much less
505needs to be done). Furthermore they are identical to those of \.{WEAVE}.
506Therefore people who need to make modifications to all three systems
507should be able to do so without too many headaches.
508
509We use the standard \PASCAL\ input/output procedures in several places that
510\TeX\ cannot, since \.{TANGLE} does not have to deal with files that are named
511dynamically by the user, and since there is no input from the terminal.
512
513@ Terminal output is done by writing on file |term_out|, which is assumed to
514consist of characters of type |text_char|:
515@^system dependencies@>
516
517@d print(#)==write(term_out,#) {`|print|' means write on the terminal}
518@d print_ln(#)==write_ln(term_out,#) {`|print|' and then start new line}
519@d new_line==write_ln(term_out) {start new line}
520@d print_nl(#)==  {print information starting on a new line}
521  begin new_line; print(#);
522  end
523
524@<Globals...@>=
525@!term_out:text_file; {the terminal as an output file}
526
527@ Different systems have different ways of specifying that the output on a
528certain file will appear on the user's terminal. Here is one way to do this
529on the \PASCAL\ system that was used in \.{TANGLE}'s initial development:
530@^system dependencies@>
531
532@<Set init...@>=
533rewrite(term_out,'TTY:'); {send |term_out| output to the terminal}
534
535@ The |update_terminal| procedure is called when we want
536to make sure that everything we have output to the terminal so far has
537actually left the computer's internal buffers and been sent.
538@^system dependencies@>
539
540@d update_terminal == break(term_out) {empty the terminal output buffer}
541
542@ The main input comes from |web_file|; this input may be overridden
543by changes in |change_file|. (If |change_file| is empty, there are no changes.)
544
545@<Globals...@>=
546@!web_file:text_file; {primary input}
547@!change_file:text_file; {updates}
548
549@ The following code opens the input files.  Since these files were listed
550in the program header, we assume that the \PASCAL\ runtime system has
551already checked that suitable file names have been given; therefore no
552additional error checking needs to be done.
553@^system dependencies@>
554
555@p procedure open_input; {prepare to read |web_file| and |change_file|}
556begin reset(web_file); reset(change_file);
557end;
558
559@ The main output goes to |Pascal_file|, and string pool constants are
560written to the |pool| file.
561
562@<Globals...@>=
563@!Pascal_file: text_file;
564@!pool: text_file;
565
566@ The following code opens |Pascal_file| and |pool|.
567Since these files were listed in the program header, we assume that the
568\PASCAL\ runtime system has checked that suitable external file names have
569been given.
570@^system dependencies@>
571
572@<Set init...@>=
573rewrite(Pascal_file); rewrite(pool);
574
575@ Input goes into an array called |buffer|.
576
577@<Globals...@>=@!buffer: array[0..buf_size] of ASCII_code;
578
579@ The |input_ln| procedure brings the next line of input from the specified
580file into the |buffer| array and returns the value |true|, unless the file has
581already been entirely read, in which case it returns |false|. The conventions
582of \TeX\ are followed; i.e., |ASCII_code| numbers representing the next line
583of the file are input into |buffer[0]|, |buffer[1]|, \dots,
584|buffer[limit-1]|; trailing blanks are ignored;
585and the global variable |limit| is set to the length of the
586@^system dependencies@>
587line. The value of |limit| must be strictly less than |buf_size|.
588
589We assume that none of the |ASCII_code| values
590of |buffer[j]| for |0<=j<limit| is equal to 0, @'177, |line_feed|, |form_feed|,
591or |carriage_return|.
592
593@p function input_ln(var f:text_file):boolean;
594  {inputs a line or returns |false|}
595var final_limit:0..buf_size; {|limit| without trailing blanks}
596begin limit:=0; final_limit:=0;
597if eof(f) then input_ln:=false
598else  begin while not eoln(f) do
599    begin buffer[limit]:=xord[f^]; get(f);
600    incr(limit);
601    if buffer[limit-1]<>" " then final_limit:=limit;
602    if limit=buf_size then
603      begin while not eoln(f) do get(f);
604      decr(limit); {keep |buffer[buf_size]| empty}
605      if final_limit>limit then final_limit:=limit;
606      print_nl('! Input line too long'); loc:=0; error;
607@.Input line too long@>
608      end;
609    end;
610  read_ln(f); limit:=final_limit; input_ln:=true;
611  end;
612end;
613
614@* Reporting errors to the user.
615The \.{TANGLE} processor operates in two phases: first it inputs the source
616file and stores a compressed representation of the program, then it produces
617the \PASCAL\ output from the compressed representation.
618
619The global variable |phase_one| tells whether we are in Phase I or not.
620
621@<Globals...@>=
622@!phase_one: boolean; {|true| in Phase I, |false| in Phase II}
623
624@ If an error is detected while we are debugging,
625we usually want to look at the contents of memory.
626A special procedure will be declared later for this purpose.
627
628@<Error handling...@>=
629@!debug @+ procedure debug_help; forward;@+ gubed
630
631@ During the first phase, syntax errors are reported to the user by saying
632$$\hbox{`|err_print('! Error message')|'},$$
633followed by `|jump_out|' if no recovery from the error is provided.
634This will print the error message followed by an indication of where the error
635was spotted in the source file. Note that no period follows the error message,
636since the error routine will automatically supply a period.
637
638Errors that are noticed during the second phase are reported to the user
639in the same fashion, but the error message will be
640followed by an indication of where the error was spotted in the output file.
641
642The actual error indications are provided by a procedure called |error|.
643
644@d err_print(#)==begin new_line; print(#); error;
645  end
646
647@<Error handling...@>=
648procedure error; {prints '\..' and location of error message}
649var j: 0..out_buf_size; {index into |out_buf|}
650@!k,@!l: 0..buf_size; {indices into |buffer|}
651begin if phase_one then @<Print error location based on input buffer@>
652else @<Print error location based on output buffer@>;
653update_terminal; mark_error;
654@!debug debug_skipped:=debug_cycle; debug_help;@+gubed
655end;
656
657@ The error locations during Phase I can be indicated by using the global
658variables |loc|, |line|, and |changing|, which tell respectively the first
659unlooked-at position in |buffer|, the current line number, and whether or not
660the current line is from |change_file| or |web_file|.
661This routine should be modified on systems whose standard text editor
662has special line-numbering conventions.
663@^system dependencies@>
664
665@<Print error location based on input buffer@>=
666begin if changing then print('. (change file ')@+else print('. (');
667print_ln('l.', line:1, ')');
668if loc>=limit then l:=limit else l:=loc;
669for k:=1 to l do
670  if buffer[k-1]=tab_mark then print(' ')
671  else print(xchr[buffer[k-1]]); {print the characters already read}
672new_line;
673for k:=1 to l do print(' '); {space out the next line}
674for k:=l+1 to limit do print(xchr[buffer[k-1]]); {print the part not yet read}
675print(' '); {this space separates the message from future asterisks}
676end
677
678@ The position of errors detected during the second phase can be indicated
679by outputting the partially-filled output buffer, which contains |out_ptr|
680entries.
681
682@<Print error location based on output...@>=
683begin print_ln('. (l.',line:1,')');
684for j:=1 to out_ptr do print(xchr[out_buf[j-1]]); {print current partial line}
685print('... '); {indicate that this information is partial}
686end
687
688@ The |jump_out| procedure just cuts across all active procedure levels
689and jumps out of the program. This is the only non-local |goto| statement
690in \.{TANGLE}. It is used when no recovery from a particular error has
691been provided.
692
693Some \PASCAL\ compilers do not implement non-local |goto| statements.
694@^system dependencies@>
695In such cases the code that appears at label |end_of_TANGLE| should be
696copied into the |jump_out| procedure, followed by a call to a system procedure
697that terminates the program.
698
699@d fatal_error(#)==begin new_line; print(#); error; mark_fatal; jump_out;
700  end
701
702@<Error handling...@>=
703procedure jump_out;
704begin goto end_of_TANGLE;
705end;
706
707@ Sometimes the program's behavior is far different from what it should be,
708and \.{TANGLE} prints an error message that is really for the \.{TANGLE}
709maintenance person, not the user. In such cases the program says
710|confusion('indication of where we are')|.
711
712@d confusion(#)==fatal_error('! This can''t happen (',#,')')
713@.This can't happen@>
714
715@ An overflow stop occurs if \.{TANGLE}'s tables aren't large enough.
716
717@d overflow(#)==fatal_error('! Sorry, ',#,' capacity exceeded')
718@.Sorry, x capacity exceeded@>
719
720
721@* Data structures.
722Most of the user's \PASCAL\ code is packed into eight-bit integers
723in two large arrays called |byte_mem| and |tok_mem|.
724The |byte_mem| array holds the names of identifiers, strings, and modules;
725the |tok_mem| array holds the replacement texts
726for macros and modules. Allocation is sequential, since things are deleted only
727during Phase II, and only in a last-in-first-out manner.
728
729Auxiliary arrays |byte_start| and |tok_start| are used as directories to
730|byte_mem| and |tok_mem|, and the |link|, |ilk|, |equiv|, and |text_link|
731arrays give further information about names. These auxiliary arrays
732consist of sixteen-bit items.
733
734@<Types...@>=
735@!eight_bits=0..255; {unsigned one-byte quantity}
736@!sixteen_bits=integer; {not quite a two-byte quantity}
737
738@ \.{TANGLE} has been designed to avoid the need for indices that are more
739than sixteen bits wide, so that it can be used on most computers. But
740there are programs that need more than 65536 tokens, and some programs
741even need more than 65536 bytes; \TeX\ is one of these.  To get around
742this problem, a slight complication has been added to the data structures:
743|byte_mem| and |tok_mem| are two-dimensional arrays, whose first index is
744either 0 or 1. (For generality, the first index is actually allowed to run
745between 0 and |ww-1| in |byte_mem|, or between 0 and |zz-1| in |tok_mem|,
746where |ww| and |zz| are set to 2 and~3; the program will work for any
747positive values of |ww| and |zz|, and it can be simplified in obvious ways
748if |ww=1| or |zz=1|.)
749
750@d ww=2 {we multiply the byte capacity by approximately this amount}
751@d zz=3 {we multiply the token capacity by approximately this amount}
752
753@<Globals...@>=
754@!byte_mem: packed array [0..ww-1,0..max_bytes] of ASCII_code;
755  {characters of names}
756@!tok_mem: packed array [0..zz-1,0..max_toks] of eight_bits; {tokens}
757@!byte_start: array [0..max_names] of sixteen_bits; {directory into |byte_mem|}
758@!tok_start: array [0..max_texts] of sixteen_bits; {directory into |tok_mem|}
759@!link: array [0..max_names] of sixteen_bits; {hash table or tree links}
760@!ilk: array [0..max_names] of sixteen_bits; {type codes or tree links}
761@!equiv: array [0..max_names] of sixteen_bits; {info corresponding to names}
762@!text_link: array [0..max_texts] of sixteen_bits; {relates replacement texts}
763
764@ The names of identifiers are found by computing a hash address |h| and
765then looking at strings of bytes signified by |hash[h]|, |link[hash[h]]|,
766|link[link[hash[h]]]|, \dots, until either finding the desired name
767or encountering a zero.
768
769A `|name_pointer|' variable, which signifies a name, is an index into
770|byte_start|. The actual sequence of characters in the name pointed to by
771|p| appears in positions |byte_start[p]| to |byte_start[p+ww]-1|, inclusive,
772in the segment of |byte_mem| whose first index is |p mod ww|. Thus, when
773|ww=2| the even-numbered name bytes appear in |byte_mem[0,@t$*$@>]|
774and the odd-numbered ones appear in |byte_mem[1,@t$*$@>]|.
775The pointer 0 is used for undefined module names; we don't
776want to use it for the names of identifiers, since 0 stands for a null
777pointer in a linked list.
778
779Strings are treated like identifiers; the first character (a double-quote)
780distinguishes a string from an alphabetic name, but for \.{TANGLE}'s purposes
781strings behave like numeric macros. (A `string' here refers to the
782strings delimited by double-quotes that \.{TANGLE} processes. \PASCAL\
783string constants delimited by single-quote marks are not given such special
784treatment; they simply appear as sequences of characters in the \PASCAL\
785texts.)  The total number of strings in the string
786pool is called |string_ptr|, and the total number of names in |byte_mem|
787is called |name_ptr|. The total number of bytes occupied in
788|byte_mem[w,@t$*$@>]| is called |byte_ptr[w]|.
789
790We usually have |byte_start[name_ptr+w]=byte_ptr[(name_ptr+w) mod ww]|
791for |0<=w<ww|, since these are the starting positions for the next |ww|
792names to be stored in |byte_mem|.
793
794@d length(#)==byte_start[#+ww]-byte_start[#] {the length of a name}
795
796@<Types...@>=
797@!name_pointer=0..max_names; {identifies a name}
798
799@ @<Global...@>=
800@!name_ptr:name_pointer; {first unused position in |byte_start|}
801@!string_ptr:name_pointer; {next number to be given to a string of length |<>1|}
802@!byte_ptr:array [0..ww-1] of 0..max_bytes;
803  {first unused position in |byte_mem|}
804@!pool_check_sum:integer; {sort of a hash for the whole string pool}
805
806@ @<Local variables for init...@>=
807@!wi: 0..ww-1; {to initialize the |byte_mem| indices}
808
809@ @<Set init...@>=
810for wi:=0 to ww-1 do
811  begin byte_start[wi]:=0; byte_ptr[wi]:=0;
812  end;
813byte_start[ww]:=0; {this makes name 0 of length zero}
814name_ptr:=1; string_ptr:=number_chars; pool_check_sum:=271828;
815
816@ Replacement texts are stored in |tok_mem|, using similar conventions.
817A `|text_pointer|' variable is an index into |tok_start|, and the
818replacement text that corresponds to |p| runs from positions
819|tok_start[p]| to |tok_start[p+zz]-1|, inclusive, in the segment of
820|tok_mem| whose first index is |p mod zz|. Thus, when |zz=2| the
821even-numbered replacement texts appear in |tok_mem[0,@t$*$@>]| and the
822odd-numbered ones appear in |tok_mem[1,@t$*$@>]|.  Furthermore,
823|text_link[p]| is used to connect pieces of text that have the same name,
824as we shall see later. The pointer 0 is used for undefined replacement
825texts.
826
827The first position of |tok_mem[z,@t$*$@>]| that is unoccupied by
828replacement text is called |tok_ptr[z]|, and the first unused location of
829|tok_start| is called |text_ptr|.  We usually have the identity
830|tok_start[text_ptr+z]=tok_ptr[(text_ptr+z) mod zz]|, for |0<=z<zz|, since
831these are the starting positions for the next |zz| replacement texts to
832be stored in |tok_mem|.
833
834@<Types...@>=
835@!text_pointer=0..max_texts; {identifies a replacement text}
836
837@ It is convenient to maintain a variable |z| that is equal to |text_ptr
838mod zz|, so that we always insert tokens into segment |z| of |tok_mem|.
839
840@<Glob...@>=
841@t\hskip1em@>@!text_ptr:text_pointer; {first unused position in |tok_start|}
842@t\hskip1em@>@!tok_ptr:array[0..zz-1] of 0..max_toks;
843  {first unused position in a given segment of |tok_mem|}
844@t\hskip1em@>@!z:0..zz-1; {current segment of |tok_mem|}
845stat @!max_tok_ptr:array[0..zz-1] of 0..max_toks;
846  {largest values assumed by |tok_ptr|}
847tats
848
849@ @<Local variables for init...@>=
850@!zi:0..zz-1; {to initialize the |tok_mem| indices}
851
852@ @<Set init...@>=
853for zi:=0 to zz-1 do
854  begin tok_start[zi]:=0; tok_ptr[zi]:=0;
855  end;
856tok_start[zz]:=0; {this makes replacement text 0 of length zero}
857text_ptr:=1; z:=1 mod zz;
858
859@ Four types of identifiers are distinguished by their |ilk|:
860
861\yskip\hang |normal| identifiers will appear in the \PASCAL\ program as
862ordinary identifiers since they have not been defined to be macros; the
863corresponding value in the |equiv| array
864for such identifiers is a link in a secondary hash table that
865is used to check whether any two of them agree in their first |unambig_length|
866characters after underline symbols are removed and lowercase letters are
867changed to uppercase.
868
869\yskip\hang |numeric| identifiers have been defined to be numeric macros;
870their |equiv| value contains the corresponding numeric value plus $2^{15}$.
871Strings are treated as numeric macros.
872
873\yskip\hang |simple| identifiers have been defined to be simple macros;
874their |equiv| value points to the corresponding replacement text.
875
876\yskip\hang |parametric| identifiers have been defined to be parametric macros;
877like simple identifiers, their |equiv| value points to the replacement text.
878
879@d normal=0 {ordinary identifiers have |normal| ilk}
880@d numeric=1 {numeric macros and strings have |numeric| ilk}
881@d simple=2 {simple macros have |simple| ilk}
882@d parametric=3 {parametric macros have |parametric| ilk}
883
884@ The names of modules are stored in |byte_mem| together
885with the identifier names, but a hash table is not used for them because
886\.{TANGLE} needs to be able to recognize a module name when given a prefix of
887that name. A conventional binary seach tree is used to retrieve module names,
888with fields called |llink| and |rlink| in place of |link| and |ilk|. The
889root of this tree is |rlink[0]|. If |p| is a pointer to a module name,
890|equiv[p]| points to its replacement text, just as in simple and parametric
891macros, unless this replacement text has not yet been defined (in which case
892|equiv[p]=0|).
893
894@d llink==link {left link in binary search tree for module names}
895@d rlink==ilk {right link in binary search tree for module names}
896
897@<Set init...@>=
898rlink[0]:=0; {the binary search tree starts out with nothing in it}
899equiv[0]:=0; {the undefined module has no replacement text}
900
901@ Here is a little procedure that prints the text of a given name.
902
903@p procedure print_id(@!p:name_pointer); {print identifier or module name}
904var k:0..max_bytes; {index into |byte_mem|}
905@!w:0..ww-1; {segment of |byte_mem|}
906begin if p>=name_ptr then print('IMPOSSIBLE')
907else  begin w:=p mod ww;
908  for k:=byte_start[p] to byte_start[p+ww]-1 do print(xchr[byte_mem[w,k]]);
909  end;
910end;
911
912@* Searching for identifiers.
913The hash table described above is updated by the |id_lookup| procedure,
914which finds a given identifier and returns a pointer to its index in
915|byte_start|. If the identifier was not already present, it is inserted with
916a given |ilk| code; and an error message is printed if the identifier is being
917doubly defined.
918
919Because of the way \.{TANGLE}'s scanning mechanism works, it is most convenient
920to let |id_lookup| search for an identifier that is present in the |buffer|
921array. Two other global variables specify its position in the buffer: the
922first character is |buffer[id_first]|, and the last is |buffer[id_loc-1]|.
923Furthermore, if the identifier is really a string, the global variable
924|double_chars| tells how many of the characters in the buffer appear
925twice (namely \.{@@@@} and \.{""}), since this additional information makes
926it easy to calculate the true length of the string. The final double-quote
927of the string is not included in its ``identifier,'' but the first one is,
928so the string length is |id_loc-id_first-double_chars-1|.
929
930We have mentioned that |normal| identifiers belong to two hash tables,
931one for their true names as they appear in the \.{WEB} file and the other
932when they have been reduced to their first |unambig_length| characters.
933The hash tables are kept by the method of simple chaining, where the
934heads of the individual lists appear in the |hash| and |chop_hash| arrays.
935If |h| is a hash code, the primary hash table list starts at |hash[h]| and
936proceeds through |link| pointers; the secondary hash table list starts at
937|chop_hash[h]| and proceeds through |equiv| pointers. Of course, the same
938identifier will probably have two different values of |h|.
939
940The |id_lookup| procedure uses an auxiliary array called |chopped_id| to
941contain up to |unambig_length| characters of the current identifier, if
942it is necessary to compute the secondary hash code. (This array could be
943declared local to |id_lookup|, but in general we are making all array
944declarations global in this program, because some compilers and some machine
945architectures make dynamic array allocation inefficient.)
946
947@<Glob...@>=
948@!id_first:0..buf_size; {where the current identifier begins in the buffer}
949@!id_loc:0..buf_size; {just after the current identifier in the buffer}
950@!double_chars:0..buf_size; {correction to length in case of strings}
951@#
952@!hash,@!chop_hash:array [0..hash_size] of sixteen_bits; {heads of hash lists}
953@!chopped_id:array [0..unambig_length] of ASCII_code; {chopped identifier}
954
955@ Initially all the hash lists are empty.
956
957@<Local variables for init...@>=
958@!h:0..hash_size; {index into hash-head arrays}
959
960@ @<Set init...@>=
961for h:=0 to hash_size-1 do
962  begin hash[h]:=0; chop_hash[h]:=0;
963  end;
964
965@ Here now is the main procedure for finding identifiers (and strings).
966The parameter |t| is set to |normal| except when the identifier is
967a macro name that is just being defined; in the latter case, |t| will be
968|numeric|, |simple|, or |parametric|.
969
970@p function id_lookup(@!t:eight_bits):name_pointer; {finds current identifier}
971label found, not_found;
972var c:eight_bits; {byte being chopped}
973@!i:0..buf_size; {index into |buffer|}
974@!h:0..hash_size; {hash code}
975@!k:0..max_bytes; {index into |byte_mem|}
976@!w:0..ww-1; {segment of |byte_mem|}
977@!l:0..buf_size; {length of the given identifier}
978@!p,@!q:name_pointer; {where the identifier is being sought}
979@!s:0..unambig_length; {index into |chopped_id|}
980begin l:=id_loc-id_first; {compute the length}
981@<Compute the hash code |h|@>;
982@<Compute the name location |p|@>;
983if (p=name_ptr)or(t<>normal) then
984  @<Update the tables and check for possible errors@>;
985id_lookup:=p;
986end;
987
988@ A simple hash code is used: If the sequence of
989ASCII codes is $c_1c_2\ldots c_m$, its hash value will be
990$$(2^{n-1}c_1+2^{n-2}c_2+\cdots+c_n)\,\bmod\,|hash_size|.$$
991
992@<Compute the hash...@>=
993h:=buffer[id_first]; i:=id_first+1;
994while i<id_loc do
995  begin h:=(h+h+buffer[i]) mod hash_size; incr(i);
996  end
997
998@ If the identifier is new, it will be placed in position |p=name_ptr|,
999otherwise |p| will point to its existing location.
1000
1001@<Compute the name location...@>=
1002p:=hash[h];
1003while p<>0 do
1004  begin if length(p)=l then
1005      @<Compare name |p| with current identifier, |goto found| if equal@>;
1006  p:=link[p];
1007  end;
1008p:=name_ptr; {the current identifier is new}
1009link[p]:=hash[h]; hash[h]:=p; {insert |p| at beginning of hash list}
1010found:
1011
1012@ @<Compare name |p|...@>=
1013begin i:=id_first; k:=byte_start[p]; w:=p mod ww;
1014while (i<id_loc)and(buffer[i]=byte_mem[w,k]) do
1015  begin incr(i); incr(k);
1016  end;
1017if i=id_loc then goto found; {all characters agree}
1018end
1019
1020@ @<Update the tables...@>=
1021begin if ((p<>name_ptr)and(t<>normal)and(ilk[p]=normal)) or
1022    ((p=name_ptr)and(t=normal)and(buffer[id_first]<>"""")) then
1023  @<Compute the secondary hash code |h| and put the first characters
1024  into the auxiliary array |chopped_id|@>;
1025if p<>name_ptr then
1026  @<Give double-definition error, if necessary, and change |p| to type |t|@>
1027else @<Enter a new identifier into the table at position |p|@>;
1028end
1029
1030@ The following routine, which is called into play when it is necessary to
1031look at the secondary hash table, computes the same hash function as before
1032(but on the chopped data), and places a zero after the chopped identifier
1033in |chopped_id| to serve as a convenient sentinel.
1034
1035@<Compute the secondary...@>=
1036begin i:=id_first; s:=0; h:=0;
1037while (i<id_loc)and(s<unambig_length) do
1038  begin if buffer[i]<>"_" then
1039    begin if buffer[i]>="a" then chopped_id[s]:=buffer[i]-@'40
1040    else chopped_id[s]:=buffer[i];
1041    h:=(h+h+chopped_id[s]) mod hash_size; incr(s);
1042    end;
1043  incr(i);
1044  end;
1045chopped_id[s]:=0;
1046end
1047
1048@ If a nonnumeric macro has appeared before it was defined, \.{TANGLE}
1049will still work all right; after all, such behavior is typical of the
1050replacement texts for modules, which act very much like macros.
1051However, an undefined numeric macro may not be used on the right-hand
1052side of another numeric macro definition, so \.{TANGLE} finds it
1053simplest to make a blanket rule that numeric macros should be defined
1054before they are used. The following routine gives an error message and
1055also fixes up any damage that may have been caused.
1056
1057@<Give double...@>= {now |p<>name_ptr| and |t<>normal|}
1058begin if ilk[p]=normal then
1059  begin if t=numeric then err_print('! This identifier has already appeared');
1060@.This identifier has already...@>
1061  @<Remove |p| from secondary hash table@>;
1062  end
1063else err_print('! This identifier was defined before');
1064@.This identifier was defined...@>
1065ilk[p]:=t;
1066end
1067
1068@ When we have to remove a secondary hash entry, because a |normal| identifier
1069is changing to another |ilk|, the hash code |h| and chopped identifier have
1070already been computed.
1071
1072@<Remove |p| from secondary...@>=
1073q:=chop_hash[h];
1074if q=p then chop_hash[h]:=equiv[p]
1075else  begin while equiv[q]<>p do q:=equiv[q];
1076  equiv[q]:=equiv[p];
1077  end
1078
1079@ The following routine could make good use of a generalized |pack| procedure
1080that puts items into just part of a packed array instead of the whole thing.
1081
1082@<Enter a new identifier...@>=
1083begin if (t=normal)and(buffer[id_first]<>"""") then
1084  @<Check for ambiguity and update secondary hash@>;
1085w:=name_ptr mod ww; k:=byte_ptr[w];
1086if k+l>max_bytes then overflow('byte memory');
1087if name_ptr>max_names-ww then overflow('name');
1088i:=id_first; {get ready to move the identifier into |byte_mem|}
1089while i<id_loc do
1090  begin byte_mem[w,k]:=buffer[i]; incr(k); incr(i);
1091  end;
1092byte_ptr[w]:=k; byte_start[name_ptr+ww]:=k; incr(name_ptr);
1093if buffer[id_first]<>"""" then ilk[p]:=t
1094else @<Define and output a new string of the pool@>;
1095end
1096
1097@ @<Check for ambig...@>=
1098begin q:=chop_hash[h];
1099while q<>0 do
1100  begin @<Check if |q| conflicts with |p|@>;
1101  q:=equiv[q];
1102  end;
1103equiv[p]:=chop_hash[h]; chop_hash[h]:=p; {put |p| at front of secondary list}
1104end
1105
1106@ @<Check if |q| conflicts...@>=
1107begin k:=byte_start[q]; s:=0; w:=q mod ww;
1108while (k<byte_start[q+ww]) and (s<unambig_length) do
1109  begin c:=byte_mem[w,k];
1110  if c<>"_" then
1111    begin if c>="a" then c:=c-@'40; {merge lowercase with uppercase}
1112    if chopped_id[s]<>c then goto not_found;
1113    incr(s);
1114    end;
1115  incr(k);
1116  end;
1117if (k=byte_start[q+ww])and(chopped_id[s]<>0) then goto not_found;
1118print_nl('! Identifier conflict with ');
1119@.Identifier conflict...@>
1120for k:=byte_start[q] to byte_start[q+ww]-1 do print(xchr[byte_mem[w,k]]);
1121error; q:=0; {only one conflict will be printed, since |equiv[0]=0|}
1122not_found:
1123end
1124
1125@ We compute the string pool check sum by working modulo a prime number
1126that is large but not so large that overflow might occur.
1127
1128@d check_sum_prime==@'3777777667 {$2^{29}-73$}
1129@^preprocessed strings@>
1130
1131@<Define and output a new string...@>=
1132begin ilk[p]:=numeric; {strings are like numeric macros}
1133if l-double_chars=2 then {this string is for a single character}
1134  equiv[p]:=buffer[id_first+1]+1073741824
1135else  begin equiv[p]:=string_ptr+1073741824;
1136  l:=l-double_chars-1;
1137  if l>99 then err_print('! Preprocessed string is too long');
1138@.Preprocessed string is too long@>
1139  incr(string_ptr);
1140  write(pool,xchr["0"+l div 10],xchr["0"+l mod 10]); {output the length}
1141  pool_check_sum:=pool_check_sum+pool_check_sum+l;
1142  while pool_check_sum>check_sum_prime do
1143    pool_check_sum:=pool_check_sum-check_sum_prime;
1144  i:=id_first+1;
1145  while i<id_loc do
1146    begin write(pool,xchr[buffer[i]]); {output characters of string}
1147    pool_check_sum:=pool_check_sum+pool_check_sum+buffer[i];
1148    while pool_check_sum>check_sum_prime do
1149      pool_check_sum:=pool_check_sum-check_sum_prime;
1150    if (buffer[i]="""") or (buffer[i]="@@") then
1151      i:=i+2 {omit second appearance of doubled character}
1152    else incr(i);
1153    end;
1154  write_ln(pool);
1155  end;
1156end
1157
1158@* Searching for module names.
1159The |mod_lookup| procedure finds the module name |mod_text[1..l]| in the
1160search tree, after inserting it if necessary, and returns a pointer to
1161where it was found.
1162
1163@<Glob...@>=
1164@!mod_text:array [0..longest_name] of ASCII_code; {name being sought for}
1165
1166@ According to the rules of \.{WEB}, no module name
1167should be a proper prefix of another, so a ``clean'' comparison should
1168occur between any two names. The result of |mod_lookup| is 0 if this
1169prefix condition is violated. An error message is printed when such violations
1170are detected during phase two of \.{WEAVE}.
1171
1172@d less=0 {the first name is lexicographically less than the second}
1173@d equal=1 {the first name is equal to the second}
1174@d greater=2 {the first name is lexicographically greater than the second}
1175@d prefix=3 {the first name is a proper prefix of the second}
1176@d extension=4 {the first name is a proper extension of the second}
1177
1178@p function mod_lookup(@!l:sixteen_bits):name_pointer; {finds module name}
1179label found;
1180var c:less..extension; {comparison between two names}
1181@!j:0..longest_name; {index into |mod_text|}
1182@!k:0..max_bytes; {index into |byte_mem|}
1183@!w:0..ww-1; {segment of |byte_mem|}
1184@!p:name_pointer; {current node of the search tree}
1185@!q:name_pointer; {father of node |p|}
1186begin c:=greater; q:=0; p:=rlink[0]; {|rlink[0]| is the root of the tree}
1187while p<>0 do
1188  begin @<Set \(|c| to the result of comparing the given name to
1189    name |p|@>;
1190  q:=p;
1191  if c=less then p:=llink[q]
1192  else if c=greater then p:=rlink[q]
1193  else goto found;
1194  end;
1195@<Enter a new module name into the tree@>;
1196found: if c<>equal then
1197  begin err_print('! Incompatible section names'); p:=0;
1198@.Incompatible module names@>
1199  end;
1200mod_lookup:=p;
1201end;
1202
1203@ @<Enter a new module name...@>=
1204w:=name_ptr mod ww; k:=byte_ptr[w];
1205if k+l>max_bytes then overflow('byte memory');
1206if name_ptr>max_names-ww then overflow('name');
1207p:=name_ptr;
1208if c=less then llink[q]:=p else rlink[q]:=p;
1209llink[p]:=0; rlink[p]:=0; c:=equal; equiv[p]:=0;
1210for j:=1 to l do byte_mem[w,k+j-1]:=mod_text[j];
1211byte_ptr[w]:=k+l; byte_start[name_ptr+ww]:=k+l; incr(name_ptr);
1212
1213@ @<Set \(|c|...@>=
1214begin k:=byte_start[p]; w:=p mod ww; c:=equal; j:=1;
1215while (k<byte_start[p+ww]) and (j<=l) and (mod_text[j]=byte_mem[w,k]) do
1216  begin incr(k); incr(j);
1217  end;
1218if k=byte_start[p+ww] then
1219  if j>l then c:=equal
1220  else c:=extension
1221else if j>l then c:=prefix
1222else if mod_text[j]<byte_mem[w,k] then c:=less
1223else c:=greater;
1224end
1225
1226@ The |prefix_lookup| procedure is supposed to find exactly one module
1227name that has |mod_text[1..l]| as a prefix. Actually the algorithm silently
1228accepts also the situation that some module name is a prefix of
1229|mod_text[1..l]|, because the user who painstakingly typed in more than
1230necessary probably doesn't want to be told about the wasted effort.
1231
1232@p function prefix_lookup(@!l:sixteen_bits):name_pointer; {finds name extension}
1233var c:less..extension; {comparison between two names}
1234@!count:0..max_names; {the number of hits}
1235@!j:0..longest_name; {index into |mod_text|}
1236@!k:0..max_bytes; {index into |byte_mem|}
1237@!w:0..ww-1; {segment of |byte_mem|}
1238@!p:name_pointer; {current node of the search tree}
1239@!q:name_pointer; {another place to resume the search after one branch is done}
1240@!r:name_pointer; {extension found}
1241begin q:=0; p:=rlink[0]; count:=0; r:=0; {begin search at root of tree}
1242while p<>0 do
1243  begin @<Set \(|c|...@>;
1244  if c=less then p:=llink[p]
1245  else if c=greater then p:=rlink[p]
1246  else  begin r:=p; incr(count); q:=rlink[p]; p:=llink[p];
1247    end;
1248  if p=0 then
1249    begin p:=q; q:=0;
1250    end;
1251  end;
1252if count<>1 then
1253  if count=0 then err_print('! Name does not match')
1254@.Name does not match@>
1255  else err_print('! Ambiguous prefix');
1256@.Ambiguous prefix@>
1257prefix_lookup:=r; {the result will be 0 if there was no match}
1258end;
1259
1260@* Tokens.
1261Replacement texts, which represent \PASCAL\ code in a compressed format,
1262appear in |tok_mem| as mentioned above. The codes in
1263these texts are called `tokens'; some tokens occupy two consecutive
1264eight-bit byte positions, and the others take just one byte.
1265
1266If $p>0$ points to a replacement text, |tok_start[p]| is the |tok_mem| position
1267of the first eight-bit code of that text. If |text_link[p]=0|,
1268this is the replacement text for a macro, otherwise it is the replacement
1269text for a module. In the latter case |text_link[p]| is either equal to
1270|module_flag|, which means that there is no further text for this module, or
1271|text_link[p]| points to a
1272continuation of this replacement text; such links are created when
1273several modules have \PASCAL\ texts with the same name, and they also
1274tie together all the \PASCAL\ texts of unnamed modules.
1275The replacement text pointer for the first unnamed module
1276appears in |text_link[0]|, and the most recent such pointer is |last_unnamed|.
1277
1278@d module_flag==max_texts {final |text_link| in module replacement texts}
1279
1280@<Glob...@>=
1281@!last_unnamed:text_pointer; {most recent replacement text of unnamed module}
1282
1283@ @<Set init...@>= last_unnamed:=0; text_link[0]:=0;
1284
1285@ If the first byte of a token is less than @'200, the token occupies a
1286single byte. Otherwise we make a sixteen-bit token by combining two consecutive
1287bytes |a| and |b|. If |@'200<=a<@'250|, then $(a-@'200)\times2^8+b$ points
1288to an identifier; if |@'250<=a<@'320|, then
1289$(a-@'250)\times2^8+b$ points to a module name; otherwise, i.e., if
1290|@'320<=a<@'400|, then $(a-@'320)\times2^8+b$ is the number of the module
1291in which the current replacement text appears.
1292
1293Codes less than @'200 are 7-bit ASCII codes that represent themselves.
1294In particular, a single-character identifier like `|x|' will be a one-byte
1295token, while all longer identifiers will occupy two bytes.
1296
1297Some of the 7-bit ASCII codes will not be present, however, so we can
1298use them for special purposes. The following symbolic names are used:
1299
1300\yskip\hang |param| denotes insertion of a parameter. This occurs only in
1301the replacement texts of parametric macros, outside of single-quoted strings
1302in those texts.
1303
1304\hang |begin_comment| denotes \.{@@\{}, which will become either
1305\.{\{} or \.{[}.
1306
1307\hang |end_comment| denotes \.{@@\}}, which will become either
1308\.{\}} or \.{]}.
1309
1310\hang |octal| denotes the \.{@@\'} that precedes an octal constant.
1311
1312\hang |hex| denotes the \.{@@"} that precedes a hexadecimal constant.
1313
1314\hang |check_sum| denotes the \.{@@\char'44} that denotes the string pool
1315check sum.
1316
1317\hang |join| denotes the concatenation of adjacent items with no
1318space or line breaks allowed between them (the \.{@@\&} operation of \.{WEB}).
1319
1320\hang |double_dot| denotes `\.{..}' in \PASCAL.
1321
1322\hang |verbatim| denotes the \.{@@=} that begins a verbatim \PASCAL\ string.
1323It is also used for the end of the string.
1324
1325\hang |force_line| denotes the \.{@@\\} that forces a new line in the
1326\PASCAL\ output.
1327@^ASCII code@>
1328
1329@d param=0 {ASCII null code will not appear}
1330@d verbatim=@'2 {extended ASCII alpha should not appear}
1331@d force_line=@'3 {extended ASCII beta should not appear}
1332@d begin_comment=@'11 {ASCII tab mark will not appear}
1333@d end_comment=@'12 {ASCII line feed will not appear}
1334@d octal=@'14 {ASCII form feed will not appear}
1335@d hex=@'15 {ASCII carriage return will not appear}
1336@d double_dot=@'40 {ASCII space will not appear except in strings}
1337@d check_sum=@'175 {will not be confused with right brace}
1338@d join=@'177 {ASCII delete will not appear}
1339
1340@ The following procedure is used to enter a two-byte value into
1341|tok_mem| when a replacement text is being generated.
1342
1343@p procedure store_two_bytes(@!x:sixteen_bits);
1344  {stores high byte, then low byte}
1345begin if tok_ptr[z]+2>max_toks then overflow('token');
1346tok_mem[z,tok_ptr[z]]:=x div@'400; {this could be done by a shift command}
1347tok_mem[z,tok_ptr[z]+1]:=x mod@'400; {this could be done by a logical and}
1348tok_ptr[z]:=tok_ptr[z]+2;
1349end;
1350
1351@ When \.{TANGLE} is being operated in debug mode, it has a procedure to display
1352a replacement text in symbolic form. This procedure has not been spruced up to
1353generate a real great format, but at least the results are not as bad as
1354a memory dump.
1355
1356@p @!debug procedure print_repl(@!p:text_pointer);
1357var k:0..max_toks; {index into |tok_mem|}
1358@!a: sixteen_bits; {current byte(s)}
1359@!zp: 0..zz-1; {segment of |tok_mem| being accessed}
1360begin if p>=text_ptr then print('BAD')
1361else  begin k:=tok_start[p]; zp:=p mod zz;
1362  while k<tok_start[p+zz] do
1363    begin a:=tok_mem[zp,k];
1364    if a>=@'200 then @<Display two-byte token starting with |a|@>
1365    else @<Display one-byte token |a|@>;
1366    incr(k);
1367    end;
1368  end;
1369end;
1370gubed
1371
1372@ @<Display two-byte...@>=
1373begin incr(k);
1374if a<@'250 then {identifier or string}
1375  begin a:=(a-@'200)*@'400+tok_mem[zp,k]; print_id(a);
1376  if byte_mem[a mod ww,byte_start[a]]="""" then print('"')
1377  else print(' ');
1378  end
1379else if a<@'320 then {module name}
1380  begin print('@@<'); print_id((a-@'250)*@'400+tok_mem[zp,k]);
1381  print('@@>');
1382  end
1383else  begin a:=(a-@'320)*@'400+tok_mem[zp,k]; {module number}
1384  print('@@',xchr["{"],a:1,'@@',xchr["}"]); {can't use right brace
1385    between \&{debug} and \&{gubed}}
1386  end;
1387end
1388
1389@ @<Display one-byte...@>=
1390case a of
1391begin_comment: print('@@',xchr["{"]);
1392end_comment: print('@@',xchr["}"]); {can't use right brace
1393    between \&{debug} and \&{gubed}}
1394octal: print('@@''');
1395hex: print('@@"');
1396check_sum: print('@@$');
1397param: print('#');
1398"@@": print('@@@@');
1399verbatim: print('@@=');
1400force_line: print('@@\');
1401othercases print(xchr[a])
1402endcases
1403
1404@* Stacks for output.
1405Let's make sure that our data structures contain enough information to
1406produce the entire \PASCAL\ program as desired, by working next on the
1407algorithms that actually do produce that program.
1408
1409@ The output process uses a stack to keep track of what is going on at
1410different ``levels'' as the macros are being expanded.
1411Entries on this stack have five parts:
1412
1413\yskip\hang |end_field| is the |tok_mem| location where the replacement
1414text of a particular level will end;
1415
1416\hang |byte_field| is the |tok_mem| location from which the next token
1417on a particular level will be read;
1418
1419\hang |name_field| points to the name corresponding to a particular level;
1420
1421\hang |repl_field| points to the replacement text currently being read
1422at a particular level;
1423
1424\hang |mod_field| is the module number, or zero if this is a macro.
1425
1426\yskip\noindent The current values of these five quantities are referred to
1427quite frequently, so they are stored in a separate place instead of in
1428the |stack| array. We call the current values |cur_end|, |cur_byte|,
1429|cur_name|, |cur_repl|, and |cur_mod|.
1430
1431The global variable |stack_ptr| tells how many levels of output are
1432currently in progress. The end of all output occurs when the stack is
1433empty, i.e., when |stack_ptr=0|.
1434
1435@<Types...@>=
1436@t\4@>@!output_state=record
1437  @!end_field: sixteen_bits; {ending location of replacement text}
1438  @!byte_field: sixteen_bits; {present location within replacement text}
1439  @!name_field: name_pointer; {|byte_start| index for text being output}
1440  @!repl_field: text_pointer; {|tok_start| index for text being output}
1441  @!mod_field: 0..@'27777; {module number or zero if not a module}
1442  end;
1443
1444@ @d cur_end==cur_state.end_field {current ending location in |tok_mem|}
1445@d cur_byte==cur_state.byte_field {location of next output byte in |tok_mem|}
1446@d cur_name==cur_state.name_field {pointer to current name being expanded}
1447@d cur_repl==cur_state.repl_field {pointer to current replacement text}
1448@d cur_mod==cur_state.mod_field {current module number being expanded}
1449
1450@<Globals...@>=
1451@!cur_state : output_state; {|cur_end|, |cur_byte|, |cur_name|,
1452  |cur_repl|, |cur_mod|}
1453@!stack : array [1..stack_size] of output_state; {info for non-current levels}
1454@!stack_ptr: 0..stack_size; {first unused location in the output state stack}
1455
1456@ It is convenient to keep a global variable |zo| equal to |cur_repl mod zz|.
1457
1458@<Glob...@>=
1459@!zo:0..zz-1; {the segment of |tok_mem| from which output is coming}
1460
1461@ Parameters must also be stacked. They are placed in
1462|tok_mem| just above the other replacement texts, and dummy parameter
1463`names' are placed in |byte_start| just after the other names.
1464The variables |text_ptr| and |tok_ptr[z]| essentially serve as parameter
1465stack pointers during the output phase, so there is no need for a separate
1466data structure to handle this problem.
1467
1468@ There is an implicit stack corresponding to meta-comments that are output
1469via \.{@@\{} and \.{@@\}}. But this stack need not be represented in detail,
1470because we only need to know whether it is empty or not. A global variable
1471|brace_level| tells how many items would be on this stack if it were present.
1472
1473@<Globals...@>=
1474@!brace_level: eight_bits; {current depth of $\.{@@\{}\ldots\.{@@\}}$ nesting}
1475
1476@ To get the output process started, we will perform the following
1477initialization steps. We may assume that |text_link[0]| is nonzero, since it
1478points to the \PASCAL\ text in the first unnamed module that generates
1479code; if there are no such modules, there is nothing to output, and an
1480error message will have been generated before we do any of the initialization.
1481
1482@<Initialize the output stacks@>=
1483stack_ptr:=1; brace_level:=0; cur_name:=0; cur_repl:=text_link[0];
1484zo:=cur_repl mod zz; cur_byte:=tok_start[cur_repl];
1485cur_end:=tok_start[cur_repl+zz]; cur_mod:=0;
1486
1487@ When the replacement text for name |p| is to be inserted into the output,
1488the following subroutine is called to save the old level of output and get
1489the new one going.
1490
1491@p procedure push_level(@!p:name_pointer); {suspends the current level}
1492begin if stack_ptr=stack_size then overflow('stack')
1493else  begin stack[stack_ptr]:=cur_state; {save |cur_end|, |cur_byte|, etc.}
1494  incr(stack_ptr);
1495  cur_name:=p; cur_repl:=equiv[p]; zo:=cur_repl mod zz;
1496  cur_byte:=tok_start[cur_repl]; cur_end:=tok_start[cur_repl+zz];
1497  cur_mod:=0;
1498  end;
1499end;
1500
1501@ When we come to the end of a replacement text, the |pop_level| subroutine
1502does the right thing: It either moves to the continuation of this replacement
1503text or returns the state to the most recently stacked level. Part of this
1504subroutine, which updates the parameter stack, will be given later when we
1505study the parameter stack in more detail.
1506
1507@p procedure pop_level; {do this when |cur_byte| reaches |cur_end|}
1508label exit;
1509begin if text_link[cur_repl]=0 then {end of macro expansion}
1510  begin if ilk[cur_name]=parametric then
1511    @<Remove a parameter from the parameter stack@>;
1512  end
1513else if text_link[cur_repl]<module_flag then {link to a continuation}
1514  begin cur_repl:=text_link[cur_repl]; {we will stay on the same level}
1515  zo:=cur_repl mod zz;
1516  cur_byte:=tok_start[cur_repl]; cur_end:=tok_start[cur_repl+zz];
1517  return;
1518  end;
1519decr(stack_ptr); {we will go down to the previous level}
1520if stack_ptr>0 then
1521  begin cur_state:=stack[stack_ptr]; zo:=cur_repl mod zz;
1522  end;
1523exit: end;
1524
1525@ The heart of the output procedure is the |get_output| routine, which produces
1526the next token of output that is not a reference to a macro. This procedure
1527handles all the stacking and unstacking that is necessary. It returns the
1528value |number| if the next output has a numeric value (the value of a
1529numeric macro or string), in which case |cur_val| has been set to the
1530number in question. The procedure also returns the value |module_number|
1531if the next output begins or ends the replacement text of some module,
1532in which case |cur_val| is that module's number (if beginning) or the
1533negative of that value (if ending). And it returns the value |identifier|
1534if the next output is an identifier of length two or more, in which case
1535|cur_val| points to that identifier name.
1536
1537@d number=@'200 {code returned by |get_output| when next output is numeric}
1538@d module_number=@'201 {code returned by |get_output| for module numbers}
1539@d identifier=@'202 {code returned by |get_output| for identifiers}
1540
1541@<Globals...@>=
1542@!cur_val:integer; {additional information corresponding to output token}
1543
1544@ If |get_output| finds that no more output remains, it returns the value zero.
1545
1546@p function get_output:sixteen_bits; {returns next token after macro expansion}
1547label restart, done, found;
1548var a:sixteen_bits; {value of current byte}
1549@!b:eight_bits; {byte being copied}
1550@!bal:sixteen_bits; {excess of \.( versus \.) while copying a parameter}
1551@!k:0..max_bytes; {index into |byte_mem|}
1552@!w:0..ww-1; {segment of |byte_mem|}
1553begin restart: if stack_ptr=0 then
1554  begin a:=0; goto found;
1555  end;
1556if cur_byte=cur_end then
1557  begin cur_val:=-cur_mod; pop_level;
1558  if cur_val=0 then goto restart;
1559  a:=module_number; goto found;
1560  end;
1561a:=tok_mem[zo,cur_byte]; incr(cur_byte);
1562if a<@'200 then {one-byte token}
1563  if a=param then
1564      @<Start scanning current macro parameter, |goto restart|@>
1565  else goto found;
1566a:=(a-@'200)*@'400+tok_mem[zo,cur_byte]; incr(cur_byte);
1567if a<@'24000 then {|@'24000=(@'250-@'200)*@'400|}
1568  @<Expand macro |a| and |goto found|, or |goto restart| if no output found@>;
1569if a<@'50000 then {|@'50000=(@'320-@'200)*@'400|}
1570  @<Expand module |a-@'24000|, |goto restart|@>;
1571cur_val:=a-@'50000; a:=module_number; cur_mod:=cur_val;
1572found:
1573@!debug if trouble_shooting then debug_help;@;@+gubed@/
1574get_output:=a;
1575end;
1576
1577@ The user may have forgotten to give any \PASCAL\ text for a module name,
1578or the \PASCAL\ text may have been associated with a different name by mistake.
1579
1580@<Expand module |a-...@>=
1581begin a:=a-@'24000;
1582if equiv[a]<>0 then push_level(a)
1583else if a<>0 then
1584  begin print_nl('! Not present: <'); print_id(a); print('>'); error;
1585@.Not present: <section name>@>
1586  end;
1587goto restart;
1588end
1589
1590@ @<Expand macro ...@>=
1591begin case ilk[a] of
1592normal: begin cur_val:=a; a:=identifier;
1593  end;
1594numeric: begin cur_val:=equiv[a]-1073741824; a:=number;
1595  end;
1596simple: begin push_level(a); goto restart;
1597  end;
1598parametric: begin @<Put a parameter on the parameter stack,
1599  or |goto restart| if error occurs@>;
1600  push_level(a); goto restart;
1601  end;
1602othercases confusion('output')
1603endcases;@/
1604goto found;
1605end
1606
1607@ We come now to the interesting part, the job of putting a parameter on
1608the parameter stack. First we pop the stack if necessary until getting to
1609a level that hasn't ended. Then the next character must be a `\.(';
1610and since parentheses are balanced on each level, the entire parameter must
1611be present, so we can copy it without difficulty.
1612
1613@<Put a parameter...@>=
1614while (cur_byte=cur_end)and(stack_ptr>0) do pop_level;
1615if (stack_ptr=0)or(tok_mem[zo,cur_byte]<>"(") then
1616  begin print_nl('! No parameter given for '); print_id(a); error;
1617@.No parameter given for macro@>
1618  goto restart;
1619  end;
1620@<Copy the parameter into |tok_mem|@>;
1621equiv[name_ptr]:=text_ptr; ilk[name_ptr]:=simple; w:=name_ptr mod ww;
1622k:=byte_ptr[w];
1623@!debug if k=max_bytes then overflow('byte memory');
1624byte_mem[w,k]:="#"; incr(k); byte_ptr[w]:=k;
1625gubed {this code has set the parameter identifier for debugging printouts}
1626if name_ptr>max_names-ww then overflow('name');
1627byte_start[name_ptr+ww]:=k; incr(name_ptr);
1628if text_ptr>max_texts-zz then overflow('text');
1629text_link[text_ptr]:=0; tok_start[text_ptr+zz]:=tok_ptr[z];
1630incr(text_ptr);
1631z:=text_ptr mod zz
1632
1633@ The |pop_level| routine undoes the effect of parameter-pushing when
1634a parameter macro is finished:
1635
1636@<Remove a parameter...@>=
1637begin decr(name_ptr); decr(text_ptr);
1638z:=text_ptr mod zz;
1639stat if tok_ptr[z]>max_tok_ptr[z] then max_tok_ptr[z]:=tok_ptr[z];
1640tats {the maximum value of |tok_ptr| occurs just before parameter popping}
1641tok_ptr[z]:=tok_start[text_ptr];
1642@!debug decr(byte_ptr[name_ptr mod ww]);@+gubed
1643end
1644
1645@ When a parameter occurs in a replacement text, we treat it as a simple
1646macro in position (|name_ptr-1|):
1647
1648@<Start scanning...@>=
1649begin push_level(name_ptr-1); goto restart;
1650end
1651
1652@ Similarly, a |param| token encountered as we copy a parameter is converted
1653into a simple macro call for |name_ptr-1|.
1654Some care is needed to handle cases like \\{macro}|(#; print('#)'))|; the
1655\.{\#} token will have been changed to |param| outside of strings, but we
1656still must distinguish `real' parentheses from those in strings.
1657
1658@d app_repl(#)==begin if tok_ptr[z]=max_toks then overflow('token');
1659  tok_mem[z,tok_ptr[z]]:=#; incr(tok_ptr[z]); end
1660
1661@<Copy the parameter...@>=
1662bal:=1; incr(cur_byte); {skip the opening `\.('}
1663loop@+  begin b:=tok_mem[zo,cur_byte]; incr(cur_byte);
1664  if b=param then store_two_bytes(name_ptr+@'77777)
1665  else  begin if b>=@'200 then
1666      begin app_repl(b);
1667      b:=tok_mem[zo,cur_byte]; incr(cur_byte);
1668      end
1669    else   case b of
1670      "(": incr(bal);
1671      ")":  begin decr(bal);
1672        if bal=0 then goto done;
1673        end;
1674      "'": repeat app_repl(b);
1675        b:=tok_mem[zo,cur_byte]; incr(cur_byte);
1676        until b="'"; {copy string, don't change |bal|}
1677      othercases do_nothing
1678      endcases;
1679    app_repl(b);
1680    end;
1681  end;
1682done:
1683
1684@* Producing the output.
1685The |get_output| routine above handles most of the complexity of output
1686generation, but there are two further considerations that have a nontrivial
1687effect on \.{TANGLE}'s algorithms.
1688
1689First, we want to make sure that the output is broken into lines not
1690exceeding |line_length| characters per line, where these breaks occur at
1691valid places (e.g., not in the middle of a string or a constant or an
1692identifier, not between `\.<' and `\.>', not at a `\.{@@\&}' position
1693where quantities are being joined together). Therefore we assemble the
1694output into a buffer before deciding where the line breaks will appear.
1695However, we make very little attempt to make ``logical'' line breaks that
1696would enhance the readability of the output; people are supposed to read
1697the input of \.{TANGLE} or the \TeX ed output of \.{WEAVE}, but not the
1698tangled-up output. The only concession to readability is that a break after
1699a semicolon will be made if possible, since commonly used ``pretty
1700printing'' routines give better results in such cases.
1701
1702Second, we want to decimalize non-decimal constants, and to combine integer
1703quantities that are added or subtracted, because \PASCAL\ doesn't allow
1704constant expressions in subrange types or in case labels. This means we
1705want to have a procedure that treats a construction like \.{(E-15+17)}
1706as equivalent to `\.{(E+2)}', while also leaving `\.{(1E-15+17)}' and
1707`\.{(E-15+17*y)}' untouched. Consider also `\.{-15+17.5}' versus
1708`\.{-15+17..5}'. We shall not combine integers preceding or following
1709\.*, \./, \.{div}, \.{mod}, or \.{@@\&}. Note that if |y| has been defined
1710to equal $-2$, we must expand `\.{x*y}' into `\.{x*(-2)}'; but `\.{x-y}'
1711can expand into `\.{x+2}' and we can even change `\.{x - y mod z}' to
1712@^mod@>
1713`\.{x + 2 mod z}' because \PASCAL\ has a nonstandard \&{mod} operation!
1714
1715The following solution to these problems has been adopted: An array
1716|out_buf| contains characters that have been generated but not yet output,
1717and there are three pointers into this array. One of these, |out_ptr|, is
1718the number of characters currently in the buffer, and we will have
1719|1<=out_ptr<=line_length| most of the time. The second is |break_ptr|,
1720which is the largest value |<=out_ptr| such that we are definitely entitled
1721to end a line by outputting the characters |out_buf[1..(break_ptr-1)]|;
1722we will always have |break_ptr<=line_length|. Finally, |semi_ptr| is either
1723zero or the largest known value of a legal break after a semicolon or comment
1724on the current line; we will always have |semi_ptr<=break_ptr|.
1725
1726@<Globals...@>=
1727@!out_buf: array [0..out_buf_size] of ASCII_code; {assembled characters}
1728@!out_ptr: 0..out_buf_size; {first available place in |out_buf|}
1729@!break_ptr: 0..out_buf_size; {last breaking place in |out_buf|}
1730@!semi_ptr: 0..out_buf_size; {last semicolon breaking place in |out_buf|}
1731
1732@ Besides having those three pointers,
1733the output process is in one of several states:
1734
1735\yskip\hang |num_or_id| means that the last item in the buffer is a number or
1736identifier, hence a blank space or line break must be inserted if the next
1737item is also a number or identifier.
1738
1739\yskip\hang |unbreakable| means that the last item in the buffer was followed
1740by the \.{@@\&} operation that inhibits spaces between it and the next item.
1741
1742\yskip\hang |sign| means that the last item in the buffer is to be followed
1743by \.+ or \.-, depending on whether |out_app| is positive or negative.
1744
1745\yskip\hang |sign_val| means that the decimal equivalent of
1746$\vert|out_val|\vert$ should be appended to the buffer. If |out_val<0|,
1747or if |out_val=0| and |last_sign<0|, the number should be preceded by a minus
1748sign. Otherwise it should be preceded by the character |out_sign| unless
1749|out_sign=0|; the |out_sign| variable is either 0 or \.{"\ "} or \.{"+"}.
1750
1751\yskip\hang |sign_val_sign| is like |sign_val|, but also append \.+ or \.-
1752afterwards, depending on whether |out_app| is positive or negative.
1753
1754\yskip\hang |sign_val_val| is like |sign_val|, but also append the decimal
1755equivalent of |out_app| including its sign, using |last_sign| in case
1756|out_app=0|.
1757
1758\yskip\hang |misc| means none of the above.
1759
1760\yskip\noindent
1761For example, the output buffer and output state run through the following
1762sequence as we generate characters from `\.{(x-15+19-2)}':
1763$$\vbox{\halign{$\hfil#\hfil$\quad&#\hfil&\quad\hfil#\hfil&\quad
1764\hfil#\hfil&\quad\hfil#\hfil&\quad\hfil#\hfil\quad&\hfil#\hfil\cr
1765output&|out_buf|&|out_state|&|out_sign|&|out_val|&|out_app|&|last_sign|\cr
1766\noalign{\vskip 3pt}
1767(&\.(&|misc|\cr
1768x&\.{(x}&|num_or_id|\cr
1769-&\.{(x}&|sign|&&&$-1$&$-1$\cr
177015&\.{(x}&|sign_val|&\.{"+"}&$-15$&&$-15$\cr
1771+&\.{(x}&|sign_val_sign|&\.{"+"}&$-15$&$+1$&$+1$\cr
177219&\.{(x}&|sign_val_val|&\.{"+"}&$-15$&$+19$&$+1$\cr
1773-&\.{(x}&|sign_val_sign|&\.{"+"}&$+4$&$-1$&$-1$\cr
17742&\.{(x}&|sign_val_val|&\.{"+"}&$+4$&$-2$&$-2$\cr
1775)&\.{(x+2)}&|misc|\cr}}$$
1776At each stage we have put as much into the buffer as possible without
1777knowing what is coming next. Examples like `\.{x-0.1}' indicate why
1778|last_sign| is needed to associate the proper sign with an output of zero.
1779
1780In states |num_or_id|, |unbreakable|, and |misc| the last item in the buffer
1781lies between |break_ptr| and |out_ptr-1|, inclusive; in the other states we
1782have |break_ptr=out_ptr|.
1783
1784The numeric values assigned to |num_or_id|, etc., have been chosen to
1785shorten some of the program logic; for example, the program makes use of
1786the fact that |sign+2=sign_val_sign|.
1787
1788@d misc=0 {state associated with special characters}
1789@d num_or_id=1 {state associated with numbers and identifiers}
1790@d sign=2 {state associated with pending \.+ or \.-}
1791@d sign_val=num_or_id+2 {state associated with pending sign and value}
1792@d sign_val_sign=sign+2 {|sign_val| followed by another pending sign}
1793@d sign_val_val=sign_val+2 {|sign_val| followed by another pending value}
1794@d unbreakable=sign_val_val+1 {state associated with \.{@@\&}}
1795
1796@<Globals...@>=
1797@!out_state:eight_bits; {current status of partial output}
1798@!out_val,@!out_app:integer; {pending values}
1799@!out_sign:ASCII_code; {sign to use if appending |out_val>=0|}
1800@!last_sign:-1..+1; {sign to use if appending a zero}
1801
1802@ During the output process, |line| will equal the number of the next line
1803to be output.
1804
1805@<Initialize the output buffer@>=
1806out_state:=misc; out_ptr:=0; break_ptr:=0; semi_ptr:=0; out_buf[0]:=0; line:=1;
1807
1808@ Here is a routine that is invoked when |out_ptr>line_length|
1809or when it is time to flush out the final line. The |flush_buffer| procedure
1810often writes out the line up to the current |break_ptr| position, then moves the
1811remaining information to the front of |out_buf|. However, it prefers to
1812write only up to |semi_ptr|, if the residual line won't be too long.
1813
1814@d check_break==if out_ptr>line_length then flush_buffer
1815
1816@p procedure flush_buffer; {writes one line to output file}
1817var k:0..out_buf_size; {index into |out_buf|}
1818@!b:0..out_buf_size; {value of |break_ptr| upon entry}
1819begin b:=break_ptr;
1820if (semi_ptr<>0)and(out_ptr-semi_ptr<=line_length) then break_ptr:=semi_ptr;
1821for k:=1 to break_ptr do write(Pascal_file,xchr[out_buf[k-1]]);
1822write_ln(Pascal_file); incr(line);
1823if line mod 100 = 0 then
1824  begin print('.');
1825  if line mod 500 = 0 then print(line:1);
1826  update_terminal; {progress report}
1827  end;
1828if break_ptr<out_ptr then
1829  begin if out_buf[break_ptr]=" " then
1830    begin incr(break_ptr); {drop space at break}
1831    if break_ptr>b then b:=break_ptr;
1832    end;
1833  for k:=break_ptr to out_ptr-1 do out_buf[k-break_ptr]:=out_buf[k];
1834  end;
1835out_ptr:=out_ptr-break_ptr; break_ptr:=b-break_ptr; semi_ptr:=0;
1836if out_ptr>line_length then
1837  begin err_print('! Long line must be truncated'); out_ptr:=line_length;
1838@.Long line must be truncated@>
1839  end;
1840end;
1841
1842@ @<Empty the last line from the buffer@>=
1843break_ptr:=out_ptr; semi_ptr:=0; flush_buffer;
1844if brace_level<>0 then
1845  err_print('! Program ended at brace level ',brace_level:1);
1846@.Program ended at brace level n@>
1847
1848@ Another simple and useful routine appends the decimal equivalent of
1849a nonnegative integer to the output buffer.
1850
1851@d app(#)==begin out_buf[out_ptr]:=#; incr(out_ptr); {append a single character}
1852  end
1853
1854@p procedure app_val(@!v:integer); {puts |v| into buffer, assumes |v>=0|}
1855var k:0..out_buf_size; {index into |out_buf|}
1856begin k:=out_buf_size; {first we put the digits at the very end of |out_buf|}
1857repeat out_buf[k]:=v mod 10; v:=v div 10; decr(k);
1858until v=0;
1859repeat incr(k); app(out_buf[k]+"0");
1860until k=out_buf_size; {then we append them, most significant first}
1861end;
1862
1863@ The output states are kept up to date by the output routines, which are
1864called |send_out|, |send_val|, and |send_sign|. The |send_out| procedure
1865has two parameters: |t| tells the type of information being sent and
1866|v| contains the information proper. Some information may also be passed
1867in the array |out_contrib|.
1868
1869\yskip\hang If |t=misc| then |v| is a character to be output.
1870
1871\hang If |t=str| then |v| is the length of a string or something like `\.{<>}'
1872in |out_contrib|.
1873
1874\hang If |t=ident| then |v| is the length of an identifier in |out_contrib|.
1875
1876\hang If |t=frac| then |v| is the length of a fraction and/or exponent in
1877|out_contrib|.
1878
1879@d str=1 {|send_out| code for a string}
1880@d ident=2 {|send_out| code for an identifier}
1881@d frac=3 {|send_out| code for a fraction}
1882
1883@<Glob...@>=
1884@!out_contrib:array[1..line_length] of ASCII_code; {a contribution to |out_buf|}
1885
1886@ A slightly subtle point in the following code is that the user may ask
1887for a |join| operation (i.e., \.{@@\&}) following whatever is being sent
1888out.  We will see later that |join| is implemented in part by calling
1889|send_out(frac,0)|.
1890
1891@p procedure send_out(@!t:eight_bits; @!v:sixteen_bits);
1892  {outputs |v| of type |t|}
1893label restart;
1894var k: 0..line_length; {index into |out_contrib|}
1895begin @<Get the buffer ready for appending the new information@>;
1896if t<>misc then for k:=1 to v do app(out_contrib[k])
1897else app(v);
1898check_break;
1899if (t=misc)and((v=";")or(v="}")) then
1900  begin semi_ptr:=out_ptr; break_ptr:=out_ptr;
1901  end;
1902if t>=ident then out_state:=num_or_id {|t=ident| or |frac|}
1903else out_state:=misc {|t=str| or |misc|}
1904end;
1905
1906@ Here is where the buffer states for signs and values collapse into simpler
1907states, because we are about to append something that doesn't combine with
1908the previous integer constants.
1909
1910We use an ASCII-code trick: Since |","-1="+"| and |","+1="-"|, we have
1911|","-c=@t sign of $c$@>|, when $\vert c\vert=1$.
1912
1913@<Get the buffer ready...@>=
1914restart: case out_state of
1915num_or_id: if t<>frac then
1916  begin break_ptr:=out_ptr;
1917  if t=ident then app(" ");
1918  end;
1919sign: begin app(","-out_app); check_break; break_ptr:=out_ptr;
1920  end;
1921sign_val,sign_val_sign: begin @<Append \(|out_val| to buffer@>;
1922  out_state:=out_state-2; goto restart;
1923  end;
1924sign_val_val: @<Reduce |sign_val_val| to |sign_val| and |goto restart|@>;
1925misc: if t<>frac then break_ptr:=out_ptr;@/
1926othercases do_nothing {this is for |unbreakable| state}
1927endcases
1928
1929@ @<Append \(|out_val|...@>=
1930if (out_val<0)or((out_val=0)and(last_sign<0)) then app("-")
1931else if out_sign>0 then app(out_sign);
1932app_val(abs(out_val)); check_break;
1933
1934@ @<Reduce |sign_val_val|...@>=
1935begin if (t=frac)or(@<Contribution is \.* or \./ or \.{DIV} or \.{MOD}@>) then
1936  begin @<Append \(|out_val| to buffer@>;
1937  out_sign:="+"; out_val:=out_app;
1938  end
1939else out_val:=out_val+out_app;
1940out_state:=sign_val; goto restart;
1941end
1942
1943@ @<Contribution is \.*...@>=
1944((t=ident)and(v=3)and@|
1945 (((out_contrib[1]="D")and(out_contrib[2]="I")and(out_contrib[3]="V")) or@|
1946 ((out_contrib[1]="M")and(out_contrib[2]="O")and(out_contrib[3]="D")) ))or@|
1947@^uppercase@>
1948 ((t=misc)and((v="*")or(v="/")))
1949
1950@ The following routine is called with $v=\pm1$ when a plus or minus sign is
1951appended to the output. It extends \PASCAL\ to allow repeated signs
1952(e.g., `\.{--}' is equivalent to `\.+'), rather than to give an error message.
1953The signs following `\.E' in real constants are treated as part of a fraction,
1954so they are not seen by this routine.
1955
1956@p procedure send_sign(@!v:integer);
1957begin case out_state of
1958sign, sign_val_sign: out_app:=out_app*v;
1959sign_val:begin out_app:=v; out_state:=sign_val_sign;
1960  end;
1961sign_val_val: begin out_val:=out_val+out_app; out_app:=v;
1962  out_state:=sign_val_sign;
1963  end;
1964othercases begin break_ptr:=out_ptr; out_app:=v; out_state:=sign;
1965  end
1966endcases;@/
1967last_sign:=out_app;
1968end;
1969
1970@ When a (signed) integer value is to be output, we call |send_val|.
1971
1972@d bad_case=666 {this is a label used below}
1973
1974@p procedure send_val(@!v:integer); {output the (signed) value |v|}
1975label bad_case, {go here if we can't keep |v| in the output state}
1976  exit;
1977begin case out_state of
1978num_or_id: begin @<If previous output was \.{DIV} or \.{MOD}, |goto bad_case|@>;
1979  out_sign:=" "; out_state:=sign_val; out_val:=v; break_ptr:=out_ptr;
1980  last_sign:=+1;
1981  end;
1982misc: begin @<If previous output was \.* or \./, |goto bad_case|@>;
1983  out_sign:=0; out_state:=sign_val; out_val:=v; break_ptr:=out_ptr;
1984  last_sign:=+1;
1985  end;
1986@t\4@>@<Handle cases of |send_val| when |out_state| contains a sign@>@;
1987othercases goto bad_case
1988endcases;@/
1989return;
1990bad_case: @<Append the decimal value of |v|, with parentheses if negative@>;
1991exit: end;
1992
1993@ @<Handle cases of |send_val|...@>=
1994sign: begin out_sign:="+"; out_state:=sign_val; out_val:=out_app*v;
1995  end;
1996sign_val: begin out_state:=sign_val_val; out_app:=v;
1997  err_print('! Two numbers occurred without a sign between them');
1998  end;
1999sign_val_sign: begin out_state:=sign_val_val; out_app:=out_app*v;
2000  end;
2001sign_val_val: begin out_val:=out_val+out_app; out_app:=v;
2002  err_print('! Two numbers occurred without a sign between them');
2003@.Two numbers occurred...@>
2004  end;
2005
2006@ @<If previous output was \.*...@>=
2007if (out_ptr=break_ptr+1)and((out_buf[break_ptr]="*")or(out_buf[break_ptr]="/"))
2008  then goto bad_case
2009
2010@ @<If previous output was \.{DIV}...@>=
2011if (out_ptr=break_ptr+3)or
2012 ((out_ptr=break_ptr+4)and(out_buf[break_ptr]=" ")) then
2013@^uppercase@>
2014  if ((out_buf[out_ptr-3]="D")and(out_buf[out_ptr-2]="I")and
2015    (out_buf[out_ptr-1]="V"))or @/
2016     ((out_buf[out_ptr-3]="M")and(out_buf[out_ptr-2]="O")and
2017    (out_buf[out_ptr-1]="D")) then@/ goto bad_case
2018
2019@ @<Append the decimal value...@>=
2020if v>=0 then
2021  begin if out_state=num_or_id then
2022    begin break_ptr:=out_ptr; app(" ");
2023    end;
2024  app_val(v); check_break; out_state:=num_or_id;
2025  end
2026else  begin app("("); app("-"); app_val(-v); app(")"); check_break;
2027  out_state:=misc;
2028  end
2029
2030@* The big output switch.
2031To complete the output process, we need a routine that takes the results
2032of |get_output| and feeds them to |send_out|, |send_val|, or |send_sign|.
2033This procedure `|send_the_output|' will be invoked just once, as follows:
2034
2035@<Phase II: Output the contents of the compressed tables@>=
2036if text_link[0]=0 then
2037  begin print_nl('! No output was specified.'); mark_harmless;
2038@.No output was specified@>
2039  end
2040else  begin print_nl('Writing the output file'); update_terminal;@/
2041  @<Initialize the output stacks@>;
2042  @<Initialize the output buffer@>;
2043  send_the_output;@/
2044  @<Empty the last line...@>;
2045  print_nl('Done.');
2046  end
2047
2048@ A many-way switch is used to send the output:
2049
2050@d get_fraction=2 {this label is used below}
2051
2052@p procedure send_the_output;
2053label get_fraction, {go here to finish scanning a real constant}
2054  reswitch, continue;
2055var cur_char:eight_bits; {the latest character received}
2056  @!k:0..line_length; {index into |out_contrib|}
2057  @!j:0..max_bytes; {index into |byte_mem|}
2058  @!w:0..ww-1; {segment of |byte_mem|}
2059  @!n:integer; {number being scanned}
2060begin while stack_ptr>0 do
2061  begin cur_char:=get_output;
2062  reswitch: case cur_char of
2063  0: do_nothing; {this case might arise if output ends unexpectedly}
2064  @t\4@>@<Cases related to identifiers@>@;
2065  @t\4@>@<Cases related to constants, possibly leading to
2066    |get_fraction| or |reswitch|@>@;
2067  "+","-": send_sign(","-cur_char);
2068  @t\4@>@<Cases like \.{<>} and \.{:=}@>@;
2069  "'": @<Send a string, |goto reswitch|@>;
2070  @<Other printable characters@>: send_out(misc,cur_char);
2071  @t\4@>@<Cases involving \.{@@\{} and \.{@@\}}@>@;
2072  join: begin send_out(frac,0); out_state:=unbreakable;
2073    end;
2074  verbatim: @<Send verbatim string@>;
2075  force_line: @<Force a line break@>;
2076  othercases err_print('! Can''t output ASCII code ',cur_char:1)
2077@.Can't output ASCII code n@>
2078  endcases;@/
2079  goto continue;
2080  get_fraction: @<Special code to finish real constants@>;
2081  continue: end;
2082end;
2083
2084@ @<Cases like \.{<>}...@>=
2085and_sign: begin out_contrib[1]:="A"; out_contrib[2]:="N"; out_contrib[3]:="D";
2086@^uppercase@>
2087  send_out(ident,3);
2088  end;
2089not_sign: begin out_contrib[1]:="N"; out_contrib[2]:="O"; out_contrib[3]:="T";
2090  send_out(ident,3);
2091  end;
2092set_element_sign: begin out_contrib[1]:="I"; out_contrib[2]:="N";
2093  send_out(ident,2);
2094  end;
2095or_sign: begin out_contrib[1]:="O"; out_contrib[2]:="R"; send_out(ident,2);
2096  end;
2097left_arrow: begin out_contrib[1]:=":"; out_contrib[2]:="="; send_out(str,2);
2098  end;
2099not_equal: begin out_contrib[1]:="<"; out_contrib[2]:=">"; send_out(str,2);
2100  end;
2101less_or_equal: begin out_contrib[1]:="<"; out_contrib[2]:="="; send_out(str,2);
2102  end;
2103greater_or_equal: begin out_contrib[1]:=">"; out_contrib[2]:="=";
2104  send_out(str,2);
2105  end;
2106equivalence_sign: begin out_contrib[1]:="="; out_contrib[2]:="=";
2107  send_out(str,2);
2108  end;
2109double_dot: begin out_contrib[1]:="."; out_contrib[2]:="."; send_out(str,2);
2110  end;
2111
2112@ Please don't ask how all of the following characters can actually get
2113through \.{TANGLE} outside of strings. It seems that |""""| and |"{"|
2114cannot actually occur at this point of the program, but they have
2115been included just in case \.{TANGLE} changes.
2116
2117If \.{TANGLE} is producing code for a \PASCAL\ compiler that uses `\.{(.}'
2118and `\.{.)}' instead of square brackets (e.g., on machines with {\mc EBCDIC}
2119code), one should remove |"["| and |"]"| from this list and put them into
2120the preceding module in the appropriate way. Similarly, some compilers
2121want `\.\^' to be converted to `\.{@@}'.
2122@^system dependencies@>@^EBCDIC@>
2123
2124@<Other printable characters@>=
2125"!","""","#","$","%","&","(",")","*",",","/",":",";","<","=",">","?",
2126"@@","[","\","]","^","_","`","{","|"
2127
2128@ Single-character identifiers represent themselves, while longer ones
2129appear in |byte_mem|. All must be converted to uppercase,
2130with underlines removed. Extremely long identifiers must be chopped.
2131
2132(Some \PASCAL\ compilers work with lowercase letters instead of
2133uppercase. If this module of \.{TANGLE} is changed, it's also necessary
2134to change from uppercase to lowercase in the modules that are
2135listed in the index under ``uppercase''.)
2136@^system dependencies@>
2137@^uppercase@>
2138
2139@d up_to(#)==#-24,#-23,#-22,#-21,#-20,#-19,#-18,#-17,#-16,#-15,#-14,
2140  #-13,#-12,#-11,#-10,#-9,#-8,#-7,#-6,#-5,#-4,#-3,#-2,#-1,#
2141
2142@<Cases related to identifiers@>=
2143"A",up_to("Z"): begin out_contrib[1]:=cur_char; send_out(ident,1);
2144  end;
2145"a",up_to("z"): begin out_contrib[1]:=cur_char-@'40; send_out(ident,1);
2146  end;
2147identifier: begin k:=0; j:=byte_start[cur_val]; w:=cur_val mod ww;
2148  while (k<max_id_length)and(j<byte_start[cur_val+ww]) do
2149    begin incr(k); out_contrib[k]:=byte_mem[w,j]; incr(j);
2150    if out_contrib[k]>="a" then out_contrib[k]:=out_contrib[k]-@'40
2151    else if out_contrib[k]="_" then decr(k);
2152    end;
2153  send_out(ident,k);
2154  end;
2155
2156@ After sending a string, we need to look ahead at the next character, in order
2157to see if there were two consecutive single-quote marks. Afterwards we go to
2158|reswitch| to process the next character.
2159
2160@<Send a string...@>=
2161begin k:=1; out_contrib[1]:="'";
2162repeat if k<line_length then incr(k);
2163out_contrib[k]:=get_output;
2164until (out_contrib[k]="'")or(stack_ptr=0);
2165if k=line_length then err_print('! String too long');
2166@.String too long@>
2167send_out(str,k); cur_char:=get_output;
2168if cur_char="'" then out_state:=unbreakable;
2169goto reswitch;
2170end
2171
2172@ Sending a verbatim string is similar, but we don't have to look ahead.
2173
2174@<Send verbatim string@>=
2175begin k:=0;
2176repeat if k<line_length then incr(k);
2177out_contrib[k]:=get_output;
2178until (out_contrib[k]=verbatim)or(stack_ptr=0);
2179if k=line_length then err_print('! Verbatim string too long');
2180@.Verbatim string too long@>
2181send_out(str,k-1);
2182end
2183
2184@ In order to encourage portable software, \.{TANGLE} complains
2185if the constants get dangerously close to the largest value representable
2186on a 32-bit computer ($2^{31}-1$).
2187
2188@d digits=="0","1","2","3","4","5","6","7","8","9"
2189
2190@<Cases related to constants...@>=
2191digits: begin n:=0;
2192  repeat cur_char:=cur_char-"0";
2193  if n>=@'1463146314 then err_print('! Constant too big')
2194@.Constant too big@>
2195  else n:=10*n+cur_char;
2196  cur_char:=get_output;
2197  until (cur_char>"9")or(cur_char<"0");
2198  send_val(n); k:=0;
2199  if cur_char="e" then cur_char:="E";
2200@^uppercase@>
2201  if cur_char="E" then goto get_fraction
2202  else goto reswitch;
2203  end;
2204check_sum: send_val(pool_check_sum);
2205octal: begin n:=0; cur_char:="0";
2206  repeat cur_char:=cur_char-"0";
2207  if n>=@'2000000000 then err_print('! Constant too big')
2208  else n:=8*n+cur_char;
2209  cur_char:=get_output;
2210  until (cur_char>"7")or(cur_char<"0");
2211  send_val(n); goto reswitch;
2212  end;
2213hex: begin n:=0; cur_char:="0";
2214  repeat if cur_char>="A" then cur_char:=cur_char+10-"A"
2215  else cur_char:=cur_char-"0";
2216  if n>=@"8000000 then err_print('! Constant too big')
2217  else n:=16*n+cur_char;
2218  cur_char:=get_output;
2219  until (cur_char>"F")or(cur_char<"0")or@|
2220    ((cur_char>"9")and(cur_char<"A"));
2221  send_val(n); goto reswitch;
2222  end;
2223number: send_val(cur_val);
2224".":  begin k:=1; out_contrib[1]:="."; cur_char:=get_output;
2225  if cur_char="." then
2226    begin out_contrib[2]:="."; send_out(str,2);
2227    end
2228  else if (cur_char>="0")and(cur_char<="9") then goto get_fraction
2229  else  begin send_out(misc,"."); goto reswitch;
2230    end;
2231  end;
2232
2233@ The following code appears at label `|get_fraction|', when we want to
2234scan to the end of a real constant. The first |k| characters of a fraction
2235have already been placed in |out_contrib|, and |cur_char| is the next character.
2236
2237@<Special code...@>=
2238repeat if k<line_length then incr(k);
2239out_contrib[k]:=cur_char; cur_char:=get_output;
2240if (out_contrib[k]="E")and((cur_char="+")or(cur_char="-")) then
2241@^uppercase@>
2242  begin if k<line_length then incr(k);
2243  out_contrib[k]:=cur_char; cur_char:=get_output;
2244  end
2245else if cur_char="e" then cur_char:="E";
2246until (cur_char<>"E")and((cur_char<"0")or(cur_char>"9"));
2247if k=line_length then err_print('! Fraction too long');
2248@.Fraction too long@>
2249send_out(frac,k); goto reswitch
2250
2251@ Some \PASCAL\ compilers do not recognize comments in braces, so the
2252comments must be delimited by `\.{(*}' and `\.{*)}'.
2253@^system dependencies@>
2254In such cases the statement `|send_out(misc,"{")|' that appears here should
2255be replaced by `\ignorespaces|begin out_contrib[1]:="("; out_contrib[2]:="*";
2256send_out(str,2); end|', and a similar change should be made to
2257`|send_out(misc,"}")|'.
2258
2259@<Cases involving \.{@@\{} and \.{@@\}}@>=
2260begin_comment: begin if brace_level=0 then send_out(misc,"{")
2261  else send_out(misc,"[");
2262  incr(brace_level);
2263  end;
2264end_comment: if brace_level>0 then
2265    begin decr(brace_level);
2266    if brace_level=0 then send_out(misc,"}")
2267    else send_out(misc,"]");
2268    end
2269  else err_print('! Extra @@}');
2270@.Extra \AT!\}@>
2271module_number: begin if brace_level=0 then send_out(misc,"{")
2272  else send_out(misc,"[");
2273  if cur_val<0 then
2274    begin send_out(misc,":"); send_val(-cur_val);
2275    end
2276  else  begin send_val(cur_val); send_out(misc,":");
2277    end;
2278  if brace_level=0 then send_out(misc,"}")
2279  else send_out(misc,"]");
2280  end;
2281
2282@ @<Force a line break@>=
2283begin send_out(str,0); {normalize the buffer}
2284while out_ptr>0 do
2285  begin if out_ptr<=line_length then break_ptr:=out_ptr;
2286  flush_buffer;
2287  end;
2288out_state:=misc;
2289end
2290
2291@* Introduction to the input phase.
2292We have now seen that \.{TANGLE} will be able to output the full
2293\PASCAL\ program, if we can only get that program into the byte memory in
2294the proper format. The input process is something like the output process
2295in reverse, since we compress the text as we read it in and we expand it
2296as we write it out.
2297
2298There are three main input routines. The most interesting is the one that gets
2299the next token of a \PASCAL\ text; the other two are used to scan rapidly past
2300\TeX\ text in the \.{WEB} source code. One of the latter routines will jump to
2301the next token that starts with `\.{@@}', and the other skips to the end
2302of a \PASCAL\ comment.
2303
2304@ But first we need to consider the low-level routine |get_line|
2305that takes care of merging |change_file| into |web_file|. The |get_line|
2306procedure also updates the line numbers for error messages.
2307
2308@<Globals...@>=
2309@!ii:integer; {general purpose |for| loop variable in the outer block}
2310@!line:integer; {the number of the current line in the current file}
2311@!other_line:integer; {the number of the current line in the input file that
2312  is not currently being read}
2313@!temp_line:integer; {used when interchanging |line| with |other_line|}
2314@!limit:0..buf_size; {the last character position occupied in the buffer}
2315@!loc:0..buf_size; {the next character position to be read from the buffer}
2316@!input_has_ended: boolean; {if |true|, there is no more input}
2317@!changing: boolean; {if |true|, the current line is from |change_file|}
2318
2319@ As we change |changing| from |true| to |false| and back again, we must
2320remember to swap the values of |line| and |other_line| so that the |err_print|
2321routine will be sure to report the correct line number.
2322
2323@d change_changing==
2324  changing := not changing;
2325  temp_line:=other_line; other_line:=line; line:=temp_line
2326    {|line @t$\null\BA\null$@> other_line|}
2327
2328@ When |changing| is |false|, the next line of |change_file| is kept in
2329|change_buffer[0..change_limit]|, for purposes of comparison with the next
2330line of |web_file|. After the change file has been completely input, we
2331set |change_limit:=0|, so that no further matches will be made.
2332
2333@<Globals...@>=
2334@!change_buffer:array[0..buf_size] of ASCII_code;
2335@!change_limit:0..buf_size; {the last position occupied in |change_buffer|}
2336
2337@ Here's a simple function that checks if the two buffers are different.
2338
2339@p function lines_dont_match:boolean;
2340label exit;
2341var k:0..buf_size; {index into the buffers}
2342begin lines_dont_match:=true;
2343if change_limit<>limit then return;
2344if limit>0 then
2345  for k:=0 to limit-1 do if change_buffer[k]<>buffer[k] then return;
2346lines_dont_match:=false;
2347exit: end;
2348
2349@ Procedure |prime_the_change_buffer| sets |change_buffer| in preparation
2350for the next matching operation. Since blank lines in the change file are
2351not used for matching, we have |(change_limit=0)and not changing| if and
2352only if the change file is exhausted. This procedure is called only
2353when |changing| is true; hence error messages will be reported correctly.
2354
2355@p procedure prime_the_change_buffer;
2356label continue, done, exit;
2357var k:0..buf_size; {index into the buffers}
2358begin change_limit:=0; {this value will be used if the change file ends}
2359@<Skip over comment lines in the change file; |return| if end of file@>;
2360@<Skip to the next nonblank line; |return| if end of file@>;
2361@<Move |buffer| and |limit| to |change_buffer| and |change_limit|@>;
2362exit: end;
2363
2364@ While looking for a line that begins with \.{@@x} in the change file,
2365we allow lines that begin with \.{@@}, as long as they don't begin with
2366\.{@@y} or \.{@@z} (which would probably indicate that the change file is
2367fouled up).
2368
2369@<Skip over comment lines in the change file...@>=
2370loop@+  begin incr(line);
2371  if not input_ln(change_file) then return;
2372  if limit<2 then goto continue;
2373  if buffer[0]<>"@@" then goto continue;
2374  if (buffer[1]>="X")and(buffer[1]<="Z") then
2375    buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify}
2376  if buffer[1]="x" then goto done;
2377  if (buffer[1]="y")or(buffer[1]="z") then
2378    begin loc:=2; err_print('! Where is the matching @@x?');
2379@.Where is the match...@>
2380    end;
2381continue: end;
2382done:
2383
2384@ Here we are looking at lines following the \.{@@x}.
2385
2386@<Skip to the next nonblank line...@>=
2387repeat incr(line);
2388  if not input_ln(change_file) then
2389    begin err_print('! Change file ended after @@x');
2390@.Change file ended...@>
2391    return;
2392    end;
2393until limit>0;
2394
2395@ @<Move |buffer| and |limit| to |change_buffer| and |change_limit|@>=
2396begin change_limit:=limit;
2397if limit>0 then for k:=0 to limit-1 do change_buffer[k]:=buffer[k];
2398end
2399
2400@ The following procedure is used to see if the next change entry should
2401go into effect; it is called only when |changing| is false.
2402The idea is to test whether or not the current
2403contents of |buffer| matches the current contents of |change_buffer|.
2404If not, there's nothing more to do; but if so, a change is called for:
2405All of the text down to the \.{@@y} is supposed to match. An error
2406message is issued if any discrepancy is found. Then the procedure
2407prepares to read the next line from |change_file|.
2408
2409@p procedure check_change; {switches to |change_file| if the buffers match}
2410label exit;
2411var n:integer; {the number of discrepancies found}
2412@!k:0..buf_size; {index into the buffers}
2413begin if lines_dont_match then return;
2414n:=0;
2415loop@+  begin change_changing; {now it's |true|}
2416  incr(line);
2417  if not input_ln(change_file) then
2418    begin err_print('! Change file ended before @@y');
2419@.Change file ended...@>
2420    change_limit:=0;  change_changing; {|false| again}
2421    return;
2422    end;
2423  @<If the current line starts with \.{@@y},
2424    report any discrepancies and |return|@>;
2425  @<Move |buffer| and |limit|...@>;
2426  change_changing; {now it's |false|}
2427  incr(line);
2428  if not input_ln(web_file) then
2429    begin err_print('! WEB file ended during a change');
2430@.WEB file ended...@>
2431    input_has_ended:=true; return;
2432    end;
2433  if lines_dont_match then incr(n);
2434  end;
2435exit: end;
2436
2437@ @<If the current line starts with \.{@@y}...@>=
2438if limit>1 then if buffer[0]="@@" then
2439  begin if (buffer[1]>="X")and(buffer[1]<="Z") then
2440    buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify}
2441  if (buffer[1]="x")or(buffer[1]="z") then
2442    begin loc:=2; err_print('! Where is the matching @@y?');
2443@.Where is the match...@>
2444    end
2445  else if buffer[1]="y" then
2446    begin if n>0 then
2447      begin loc:=2; err_print('! Hmm... ',n:1,
2448        ' of the preceding lines failed to match');
2449@.Hmm... n of the preceding...@>
2450      end;
2451    return;
2452    end;
2453  end
2454
2455@ @<Initialize the input system@>=
2456open_input; line:=0; other_line:=0;@/
2457changing:=true; prime_the_change_buffer; change_changing;@/
2458limit:=0; loc:=1; buffer[0]:=" "; input_has_ended:=false;
2459
2460@ The |get_line| procedure is called when |loc>limit|; it puts the next
2461line of merged input into the buffer and updates the other variables
2462appropriately. A space is placed at the right end of the line.
2463
2464@p procedure get_line; {inputs the next line}
2465label restart;
2466begin restart: if changing then
2467  @<Read from |change_file| and maybe turn off |changing|@>;
2468if not changing then
2469  begin @<Read from |web_file| and maybe turn on |changing|@>;
2470  if changing then goto restart;
2471  end;
2472loc:=0; buffer[limit]:=" ";
2473end;
2474
2475@ @<Read from |web_file|...@>=
2476begin incr(line);
2477if not input_ln(web_file) then input_has_ended:=true
2478else if limit=change_limit then
2479  if buffer[0]=change_buffer[0] then
2480    if change_limit>0 then check_change;
2481end
2482
2483@ @<Read from |change_file|...@>=
2484begin incr(line);
2485if not input_ln(change_file) then
2486  begin err_print('! Change file ended without @@z');
2487@.Change file ended...@>
2488  buffer[0]:="@@"; buffer[1]:="z"; limit:=2;
2489  end;
2490if limit>1 then {check if the change has ended}
2491  if buffer[0]="@@" then
2492    begin if (buffer[1]>="X")and(buffer[1]<="Z") then
2493      buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify}
2494    if (buffer[1]="x")or(buffer[1]="y") then
2495      begin loc:=2; err_print('! Where is the matching @@z?');
2496@.Where is the match...@>
2497      end
2498    else if buffer[1]="z" then
2499      begin prime_the_change_buffer; change_changing;
2500      end;
2501    end;
2502end
2503
2504@ At the end of the program, we will tell the user if the change file
2505had a line that didn't match any relevant line in |web_file|.
2506
2507@<Check that all changes have been read@>=
2508if change_limit<>0 then {|changing| is false}
2509  begin for ii:=0 to change_limit do buffer[ii]:=change_buffer[ii];
2510  limit:=change_limit; changing:=true; line:=other_line; loc:=change_limit;
2511  err_print('! Change file entry did not match');
2512@.Change file entry did not match@>
2513  end
2514
2515@ Important milestones are reached during the input phase when certain
2516control codes are sensed.
2517
2518Control codes in \.{WEB} begin with `\.{@@}', and the next character
2519identifies the code. Some of these are of interest only to \.{WEAVE},
2520so \.{TANGLE} ignores them; the others are converted by \.{TANGLE} into
2521internal code numbers by the |control_code| function below. The ordering
2522of these internal code numbers has been chosen to simplify the program logic;
2523larger numbers are given to the control codes that denote more significant
2524milestones.
2525
2526@d ignore=0 {control code of no interest to \.{TANGLE}}
2527@d control_text=@'203 {control code for `\.{@@t}', `\.{@@\^}', etc.}
2528@d format=@'204 {control code for `\.{@@f}'}
2529@d definition=@'205 {control code for `\.{@@d}'}
2530@d begin_Pascal=@'206 {control code for `\.{@@p}'}
2531@d module_name=@'207 {control code for `\.{@@<}'}
2532@d new_module=@'210 {control code for `\.{@@\ }' and `\.{@@*}'}
2533
2534@p function control_code(@!c:ASCII_code):eight_bits; {convert |c| after \.{@@}}
2535begin case c of
2536"@@": control_code:="@@"; {`quoted' at sign}
2537"'": control_code:=octal; {precedes octal constant}
2538"""": control_code:=hex; {precedes hexadecimal constant}
2539"$": control_code:=check_sum; {string pool check sum}
2540" ",tab_mark: control_code:=new_module; {beginning of a new module}
2541"*": begin print('*',module_count+1:1);
2542  update_terminal; {print a progress report}
2543  control_code:=new_module; {beginning of a new module}
2544  end;
2545"D","d": control_code:=definition; {macro definition}
2546"F","f": control_code:=format; {format definition}
2547"{": control_code:=begin_comment; {begin-comment delimiter}
2548"}": control_code:=end_comment; {end-comment delimiter}
2549"P","p": control_code:=begin_Pascal; {\PASCAL\ text in unnamed module}
2550"T","t","^",".",":": control_code:=control_text; {control text to be ignored}
2551"&": control_code:=join; {concatenate two tokens}
2552"<": control_code:=module_name; {beginning of a module name}
2553"=": control_code:=verbatim; {beginning of \PASCAL\ verbatim mode}
2554"\": control_code:=force_line; {force a new line in \PASCAL\ output}
2555othercases control_code:=ignore {ignore all other cases}
2556endcases;
2557end;
2558
2559@ The |skip_ahead| procedure reads through the input at fairly high speed
2560until finding the next non-ignorable control code, which it returns.
2561
2562@p function skip_ahead:eight_bits; {skip to next control code}
2563label done;
2564var c:eight_bits; {control code found}
2565begin loop begin if loc>limit then
2566    begin get_line;
2567    if input_has_ended then
2568      begin c:=new_module; goto done;
2569      end;
2570    end;
2571  buffer[limit+1]:="@@";
2572  while buffer[loc]<>"@@" do incr(loc);
2573  if loc<=limit then
2574    begin loc:=loc+2; c:=control_code(buffer[loc-1]);
2575    if (c<>ignore)or(buffer[loc-1]=">") then goto done;
2576    end;
2577  end;
2578done: skip_ahead:=c;
2579end;
2580
2581@ The |skip_comment| procedure reads through the input at somewhat high speed
2582until finding the first unmatched right brace or until coming to the end
2583of the file. It ignores characters following `\.\\' characters, since all
2584braces that aren't nested are supposed to be hidden in that way. For
2585example, consider the process of skipping the first comment below,
2586where the string containing the right brace has been typed as \.{\`\\.\\\}\'}
2587in the \.{WEB} file.
2588
2589@p procedure skip_comment; {skips to next unmatched `\.\}'}
2590label exit;
2591var bal:eight_bits; {excess of left braces}
2592@!c:ASCII_code; {current character}
2593begin bal:=0;
2594loop@+  begin if loc>limit then
2595    begin get_line;
2596    if input_has_ended then
2597      begin err_print('! Input ended in mid-comment');
2598@.Input ended in mid-comment@>
2599      return;
2600      end;
2601    end;
2602  c:=buffer[loc]; incr(loc);
2603  @<Do special things when |c="@@", "\", "{", "}"|; |return| at end@>;
2604  end;
2605exit:end;
2606
2607@ @<Do special things when |c="@@"...@>=
2608if c="@@" then
2609  begin c:=buffer[loc];
2610  if (c<>" ")and(c<>tab_mark)and(c<>"*")and(c<>"z")and(c<>"Z") then incr(loc)
2611  else  begin err_print('! Section ended in mid-comment');
2612@.Section ended in mid-comment@>
2613    decr(loc); return;
2614    end
2615  end
2616else if (c="\")and(buffer[loc]<>"@@") then incr(loc)
2617else if c="{" then incr(bal)
2618else if c="}" then
2619  begin if bal=0 then return;
2620  decr(bal);
2621  end
2622
2623@* Inputting the next token.
2624As stated above, \.{TANGLE}'s most interesting input procedure is the
2625|get_next| routine that inputs the next token. However, the procedure
2626isn't especially difficult.
2627
2628In most cases the tokens output by |get_next| have the form used in
2629replacement texts, except that two-byte tokens are not produced.
2630An identifier that isn't one letter long is represented by the
2631output `|identifier|', and in such a case the global variables
2632|id_first| and |id_loc| will have been set to the appropriate values
2633needed by the |id_lookup| procedure. A string that begins with a
2634double-quote is also considered an |identifier|, and in such a case
2635the global variable |double_chars| will also have been set appropriately.
2636Control codes produce the corresponding output of the |control_code|
2637function above; and if that code is |module_name|, the value of |cur_module|
2638will point to the |byte_start| entry for that module name.
2639
2640Another global variable, |scanning_hex|, is |true| during the time that
2641the letters \.A through \.F should be treated as if they were digits.
2642
2643@<Globals...@>=
2644@!cur_module: name_pointer; {name of module just scanned}
2645@!scanning_hex: boolean; {are we scanning a hexadecimal constant?}
2646
2647@ @<Set init...@>=
2648scanning_hex:=false;
2649
2650@ At the top level, |get_next| is a multi-way switch based on the next
2651character in the input buffer. A |new_module| code is inserted at the
2652very end of the input file.
2653
2654@p function get_next:eight_bits; {produces the next input token}
2655label restart,done,found;
2656var c:eight_bits; {the current character}
2657@!d:eight_bits; {the next character}
2658@!j,@!k:0..longest_name; {indices into |mod_text|}
2659begin restart: if loc>limit then
2660  begin get_line;
2661  if input_has_ended then
2662    begin c:=new_module; goto found;
2663    end;
2664  end;
2665c:=buffer[loc]; incr(loc);
2666if scanning_hex then @<Go to |found| if |c| is a hexadecimal digit,
2667  otherwise set |scanning_hex:=false|@>;
2668case c of
2669"A",up_to("Z"),"a",up_to("z"): @<Get an identifier@>;
2670"""": @<Get a preprocessed string@>;
2671"@@": @<Get control code and possible module name@>;
2672@t\4@>@<Compress two-symbol combinations like `\.{:=}'@>@;
2673" ",tab_mark: goto restart; {ignore spaces and tabs}
2674"{": begin skip_comment; goto restart;
2675  end;
2676"}": begin err_print('! Extra }'); goto restart;
2677@.Extra \}@>
2678  end;
2679othercases if c>=128 then goto restart {ignore nonstandard characters}
2680  else do_nothing
2681endcases;
2682found:@!debug if trouble_shooting then debug_help;@;@+gubed@/
2683get_next:=c;
2684end;
2685
2686@ @<Go to |found| if |c| is a hexadecimal digit...@>=
2687if ((c>="0")and(c<="9"))or((c>="A")and(c<="F")) then goto found
2688else scanning_hex:=false
2689
2690@ Note that the following code substitutes \.{@@\{} and \.{@@\}} for the
2691respective combinations `\.{(*}' and `\.{*)}'. Explicit braces should be used
2692for \TeX\ comments in \PASCAL\ text.
2693
2694@d compress(#)==begin if loc<=limit then begin c:=#; incr(loc); end; end
2695
2696@<Compress two-symbol...@>=
2697".": if buffer[loc]="." then compress(double_dot)
2698  else if buffer[loc]=")" then compress("]");
2699":": if buffer[loc]="=" then compress(left_arrow);
2700"=": if buffer[loc]="=" then compress(equivalence_sign);
2701">": if buffer[loc]="=" then compress(greater_or_equal);
2702"<": if buffer[loc]="=" then compress(less_or_equal)
2703  else if buffer[loc]=">" then compress(not_equal);
2704"(": if buffer[loc]="*" then compress(begin_comment)
2705  else if buffer[loc]="." then compress("[");
2706"*": if buffer[loc]=")" then compress(end_comment);
2707
2708@ We have to look at the preceding character to make sure this isn't part
2709of a real constant, before trying to find an identifier starting with
2710`\.e' or `\.E'.
2711
2712@<Get an identifier@>=
2713begin if ((c="e")or(c="E"))and(loc>1) then
2714  if (buffer[loc-2]<="9")and(buffer[loc-2]>="0") then c:=0;
2715if c<>0 then
2716  begin decr(loc); id_first:=loc;
2717  repeat incr(loc); d:=buffer[loc];
2718  until ((d<"0")or((d>"9")and(d<"A"))or((d>"Z")and(d<"a"))or(d>"z")) and
2719    (d<>"_");
2720  if loc>id_first+1 then
2721    begin c:=identifier; id_loc:=loc;
2722    end;
2723  end
2724else c:="E"; {exponent of a real constant}
2725end
2726
2727@ A string that starts and ends with double-quote marks is converted into
2728an identifier that behaves like a numeric macro by means of the following
2729piece of the program.
2730@^preprocessed strings@>
2731
2732@<Get a preprocessed string@>=
2733begin double_chars:=0; id_first:=loc-1;
2734repeat d:=buffer[loc]; incr(loc);
2735  if (d="""")or(d="@@") then
2736    if buffer[loc]=d then
2737      begin incr(loc); d:=0; incr(double_chars);
2738      end
2739    else  begin if d="@@" then err_print('! Double @@ sign missing')
2740@.Double \AT! sign missing@>
2741      end
2742  else if loc>limit then
2743    begin err_print('! String constant didn''t end'); d:="""";
2744@.String constant didn't end@>
2745    end;
2746until d="""";
2747id_loc:=loc-1; c:=identifier;
2748end
2749
2750@ After an \.{@@} sign has been scanned, the next character tells us
2751whether there is more work to do.
2752
2753@<Get control code and possible module name@>=
2754begin c:=control_code(buffer[loc]); incr(loc);
2755if c=ignore then goto restart
2756else if c=hex then scanning_hex:=true
2757else if c=module_name then
2758  @<Scan the \(module name and make |cur_module| point to it@>
2759else if c=control_text then
2760  begin repeat c:=skip_ahead;
2761  until c<>"@@";
2762  if buffer[loc-1]<>">" then
2763    err_print('! Improper @@ within control text');
2764@.Improper \AT! within control text@>
2765  goto restart;
2766  end;
2767end
2768
2769@ @<Scan the \(module name...@>=
2770begin @<Put module name into |mod_text[1..k]|@>;
2771if k>3 then
2772  begin if (mod_text[k]=".")and(mod_text[k-1]=".")and(mod_text[k-2]=".") then
2773    cur_module:=prefix_lookup(k-3)
2774  else cur_module:=mod_lookup(k);
2775  end
2776else cur_module:=mod_lookup(k);
2777end
2778
2779@ Module names are placed into the |mod_text| array with consecutive spaces,
2780tabs, and carriage-returns replaced by single spaces. There will be no
2781spaces at the beginning or the end. (We set |mod_text[0]:=" "| to facilitate
2782this, since the |mod_lookup| routine uses |mod_text[1]| as the first
2783character of the name.)
2784
2785@<Set init...@>=mod_text[0]:=" ";
2786
2787@ @<Put module name...@>=
2788k:=0;
2789loop@+  begin if loc>limit then
2790    begin get_line;
2791    if input_has_ended then
2792      begin err_print('! Input ended in section name');
2793@.Input ended in section name@>
2794      goto done;
2795      end;
2796    end;
2797  d:=buffer[loc];
2798  @<If end of name, |goto done|@>;
2799  incr(loc); if k<longest_name-1 then incr(k);
2800  if (d=" ")or(d=tab_mark) then
2801    begin d:=" "; if mod_text[k-1]=" " then decr(k);
2802    end;
2803  mod_text[k]:=d;
2804  end;
2805done: @<Check for overlong name@>;
2806if (mod_text[k]=" ")and(k>0) then decr(k);
2807
2808@ @<If end of name,...@>=
2809if d="@@" then
2810  begin d:=buffer[loc+1];
2811  if d=">" then
2812    begin loc:=loc+2; goto done;
2813    end;
2814  if (d=" ")or(d=tab_mark)or(d="*") then
2815    begin err_print('! Section name didn''t end'); goto done;
2816@.Section name didn't end@>
2817    end;
2818  incr(k); mod_text[k]:="@@"; incr(loc); {now |d=buffer[loc]| again}
2819  end
2820
2821@ @<Check for overlong name@>=
2822if k>=longest_name-2 then
2823  begin print_nl('! Section name too long: ');
2824@.Section name too long@>
2825  for j:=1 to 25 do print(xchr[mod_text[j]]);
2826  print('...'); mark_harmless;
2827  end
2828
2829@* Scanning a numeric definition.
2830When \.{TANGLE} looks at the \PASCAL\ text following the `\.=' of a numeric
2831macro definition, it calls on the precedure |scan_numeric(p)|, where |p|
2832points to the name that is to be defined. This procedure evaluates the
2833right-hand side, which must consist entirely of integer constants and
2834defined numeric macros connected with \.+ and \.- signs (no parentheses).
2835It also sets the global variable |next_control| to the control code that
2836terminated this definition.
2837
2838A definition ends with the control codes |definition|, |format|, |module_name|,
2839|begin_Pascal|, and |new_module|, all of which can be recognized
2840by the fact that they are the largest values |get_next| can return.
2841
2842@d end_of_definition(#)==(#>=format)
2843  {is |#| a control code ending a definition?}
2844
2845@<Global...@>=
2846@!next_control:eight_bits; {control code waiting to be acted upon}
2847
2848@ The evaluation of a numeric expression makes use of two variables called the
2849|accumulator| and the |next_sign|. At the beginning, |accumulator| is zero and
2850|next_sign| is $+1$. When a \.+ or \.- is scanned, |next_sign| is multiplied
2851by the value of that sign. When a numeric value is scanned, it is multiplied by
2852|next_sign| and added to the |accumulator|, then |next_sign| is reset to $+1$.
2853
2854@d add_in(#)==begin accumulator:=accumulator+next_sign*(#); next_sign:=+1;
2855  end
2856
2857@p procedure scan_numeric(@!p:name_pointer); {defines numeric macros}
2858label reswitch, done;
2859var accumulator:integer; {accumulates sums}
2860@!next_sign:-1..+1; {sign to attach to next value}
2861@!q:name_pointer; {points to identifiers being evaluated}
2862@!val:integer; {constants being evaluated}
2863begin @<Set \(|accumulator| to the value of the right-hand side@>;
2864if abs(accumulator)>=1073741824 then
2865  begin err_print('! Value too big: ',accumulator:1); accumulator:=0;
2866@.Value too big@>
2867  end;
2868equiv[p]:=accumulator+1073741824;
2869  {name |p| now is defined to equal |accumulator|}
2870end;
2871
2872@ @<Set \(|accumulator| to the value of the right-hand side@>=
2873accumulator:=0; next_sign:=+1;
2874loop@+  begin next_control:=get_next;
2875  reswitch: case next_control of
2876  digits: begin @<Set |val| to value of decimal constant, and
2877      set |next_control| to the following token@>;
2878    add_in(val); goto reswitch;
2879    end;
2880  octal: begin @<Set |val| to value of octal constant, and
2881      set |next_control| to the following token@>;
2882    add_in(val); goto reswitch;
2883    end;
2884  hex: begin @<Set |val| to value of hexadecimal constant, and
2885      set |next_control| to the following token@>;
2886    add_in(val); goto reswitch;
2887    end;
2888  identifier: begin q:=id_lookup(normal);
2889    if ilk[q]<>numeric then
2890      begin next_control:="*"; goto reswitch; {leads to error}
2891      end;
2892    add_in(equiv[q]-1073741824);
2893    end;
2894  "+": do_nothing;
2895  "-": next_sign:=-next_sign;
2896  format, definition, module_name, begin_Pascal, new_module: goto done;
2897  ";": err_print('! Omit semicolon in numeric definition');
2898@.Omit semicolon in numeric def...@>
2899  othercases @<Signal error, flush rest of the definition@>
2900  endcases;
2901  end;
2902done:
2903
2904@ @<Signal error, flush rest...@>=
2905begin err_print('! Improper numeric definition will be flushed');
2906@.Improper numeric definition...@>
2907repeat next_control:=skip_ahead
2908until end_of_definition(next_control);
2909if next_control=module_name then
2910  begin {we want to scan the module name too}
2911  loc:=loc-2; next_control:=get_next;
2912  end;
2913accumulator:=0; goto done;
2914end
2915
2916@ @<Set |val| to value of decimal...@>=
2917val:=0;
2918repeat val:=10*val+next_control-"0"; next_control:=get_next;
2919until (next_control>"9")or(next_control<"0")
2920
2921@ @<Set |val| to value of octal...@>=
2922val:=0; next_control:="0";
2923repeat val:=8*val+next_control-"0"; next_control:=get_next;
2924until (next_control>"7")or(next_control<"0")
2925
2926@ @<Set |val| to value of hex...@>=
2927val:=0; next_control:="0";
2928repeat if next_control>="A" then next_control:=next_control+"0"+10-"A";
2929val:=16*val+next_control-"0"; next_control:=get_next;
2930until (next_control>"F")or(next_control<"0")or@|
2931  ((next_control>"9")and(next_control<"A"))
2932
2933@* Scanning a macro definition.
2934The rules for generating the replacement texts corresponding to simple
2935macros, parametric macros, and \PASCAL\ texts of a module are almost
2936identical, so a single procedure is used for all three cases. The
2937differences are that
2938
2939\yskip\item{a)} The sign |#| denotes a parameter only when it appears
2940outside of strings in a parametric macro; otherwise it stands for the
2941ASCII character |#|. (This is not used in standard \PASCAL, but some
2942\PASCAL s allow, for example, `\.{/\#}' after a certain kind of file name.)
2943
2944\item{b)}Module names are not allowed in simple macros or parametric macros;
2945in fact, the appearance of a module name terminates such macros and denotes
2946the name of the current module.
2947
2948\item{c)}The symbols \.{@@d} and \.{@@f} and \.{@@p} are not allowed after
2949module names, while they terminate macro definitions.
2950
2951@ Therefore there is a procedure |scan_repl| whose parameter |t| specifies
2952either |simple| or |parametric| or |module_name|. After |scan_repl| has
2953acted, |cur_repl_text| will point to the replacement text just generated, and
2954|next_control| will contain the control code that terminated the activity.
2955
2956@<Globals...@>=
2957@!cur_repl_text:text_pointer; {replacement text formed by |scan_repl|}
2958
2959@ @p procedure scan_repl(@!t:eight_bits); {creates a replacement text}
2960label continue, done, found, reswitch;
2961var a:sixteen_bits; {the current token}
2962@!b:ASCII_code; {a character from the buffer}
2963@!bal:eight_bits; {left parentheses minus right parentheses}
2964begin bal:=0;
2965loop@+  begin continue: a:=get_next;
2966  case a of
2967  "(": incr(bal);
2968  ")": if bal=0 then err_print('! Extra )')
2969@.Extra )@>
2970    else decr(bal);
2971  "'": @<Copy a string from the buffer to |tok_mem|@>;
2972  "#": if t=parametric then a:=param;
2973  @t\4@>@<In cases that |a| is a non-ASCII token (|identifier|,
2974  |module_name|, etc.), either process it and change |a| to a byte
2975  that should be stored, or |goto continue| if |a| should be ignored,
2976  or |goto done| if |a| signals the end of this replacement text@>@;
2977  othercases do_nothing
2978  endcases;@/
2979  app_repl(a); {store |a| in |tok_mem|}
2980  end;
2981done: next_control:=a;
2982@<Make sure the parentheses balance@>;
2983if text_ptr>max_texts-zz then overflow('text');
2984cur_repl_text:=text_ptr; tok_start[text_ptr+zz]:=tok_ptr[z];
2985incr(text_ptr);
2986if z=zz-1 then z:=0@+else incr(z);
2987end;
2988
2989@ @<Make sure the parentheses balance@>=
2990if bal>0 then
2991  begin if bal=1 then err_print('! Missing )')
2992  else err_print('! Missing ',bal:1,' )''s');
2993@.Missing n )@>
2994  while bal>0 do
2995    begin app_repl(")"); decr(bal);
2996    end;
2997  end
2998
2999@ @<In cases that |a| is...@>=
3000identifier: begin a:=id_lookup(normal); app_repl((a div @'400)+@'200);
3001  a:=a mod @'400;
3002  end;
3003module_name: if t<>module_name then goto done
3004  else  begin app_repl((cur_module div @'400)+@'250);
3005    a:=cur_module mod @'400;
3006    end;
3007verbatim: @<Copy verbatim string from the buffer to |tok_mem|@>;
3008definition, format, begin_Pascal: if t<>module_name then goto done
3009  else  begin err_print('! @@',xchr[buffer[loc-1]],
3010@.\AT!p is ignored in Pascal text@>
3011@.\AT!d is ignored in Pascal text@>
3012@.\AT!f is ignored in Pascal text@>
3013      ' is ignored in Pascal text'); goto continue;
3014    end;
3015new_module: goto done;
3016
3017@ @<Copy a string...@>=
3018begin b:="'";
3019loop@+  begin app_repl(b);
3020  if b="@@" then
3021    if buffer[loc]="@@" then incr(loc) {store only one \.{@@}}
3022    else err_print('! You should double @@ signs in strings');
3023@.You should double \AT! signs@>
3024  if loc=limit then
3025    begin err_print('! String didn''t end');
3026@.String didn't end@>
3027    buffer[loc]:="'"; buffer[loc+1]:=0;
3028    end;
3029  b:=buffer[loc]; incr(loc);
3030  if b="'" then
3031    begin if buffer[loc]<>"'" then goto found
3032    else  begin incr(loc); app_repl("'");
3033      end;
3034    end;
3035  end;
3036found: end {now |a| holds the final |"'"| that will be stored}
3037
3038@ @<Copy verbatim string...@>=
3039begin app_repl(verbatim);
3040buffer[limit+1]:="@@";
3041reswitch: if buffer[loc]="@@" then
3042  begin if loc<limit then if buffer[loc+1]="@@" then
3043    begin app_repl("@@");
3044    loc:=loc+2;
3045    goto reswitch;
3046    end;
3047  end
3048else begin app_repl(buffer[loc]);
3049  incr(loc);
3050  goto reswitch;
3051  end;
3052if loc>=limit then err_print('! Verbatim string didn''t end')
3053@.Verbatim string didn't end@>
3054else if buffer[loc+1]<>">" then
3055  err_print('! You should double @@ signs in verbatim strings');
3056@.You should double \AT! signs@>
3057loc:=loc+2;
3058end {another |verbatim| byte will be stored, since |a=verbatim|}
3059
3060@ The following procedure is used to define a simple or parametric macro,
3061just after the `\.{==}' of its definition has been scanned.
3062
3063@p procedure define_macro(@!t:eight_bits);
3064var p:name_pointer; {the identifier being defined}
3065begin p:=id_lookup(t); scan_repl(t);@/
3066equiv[p]:=cur_repl_text; text_link[cur_repl_text]:=0;
3067end;
3068
3069@* Scanning a module.
3070The |scan_module| procedure starts when `\.{@@\ }' or `\.{@@*}' has been
3071sensed in the input, and it proceeds until the end of that module.  It
3072uses |module_count| to keep track of the current module number; with luck,
3073\.{WEAVE} and \.{TANGLE} will both assign the same numbers to modules.
3074
3075@<Globals...@>=
3076@!module_count:0..@'27777; {the current module number}
3077
3078@ The top level of |scan_module| is trivial.
3079@p procedure scan_module;
3080label continue, done, exit;
3081var p:name_pointer; {module name for the current module}
3082begin incr(module_count);
3083@<Scan the \(definition part of the current module@>;
3084@<Scan the \PASCAL\ part of the current module@>;
3085exit: end;
3086
3087@ @<Scan the \(definition part...@>=
3088next_control:=0;
3089loop@+  begin continue: while next_control<=format do
3090    begin next_control:=skip_ahead;
3091    if next_control=module_name then
3092      begin {we want to scan the module name too}
3093      loc:=loc-2; next_control:=get_next;
3094      end;
3095    end;
3096  if next_control<>definition then goto done;
3097  next_control:=get_next; {get identifier name}
3098  if next_control<>identifier then
3099    begin err_print('! Definition flushed, must start with ',
3100@.Definition flushed...@>
3101      'identifier of length > 1'); goto continue;
3102    end;
3103  next_control:=get_next; {get token after the identifier}
3104  if next_control="=" then
3105    begin scan_numeric(id_lookup(numeric)); goto continue;
3106    end
3107  else if next_control=equivalence_sign then
3108    begin define_macro(simple); goto continue;
3109    end
3110  else @<If the next text is `|(#)==|', call |define_macro|
3111    and |goto continue|@>;
3112  err_print('! Definition flushed since it starts badly');
3113@.Definition flushed...@>
3114  end;
3115done:
3116
3117@ @<If the next text is `|(#)==|'...@>=
3118if next_control="(" then
3119  begin next_control:=get_next;
3120  if next_control="#" then
3121    begin next_control:=get_next;
3122    if next_control=")" then
3123      begin next_control:=get_next;
3124      if next_control="=" then
3125        begin err_print('! Use == for macros');
3126@.Use == for macros@>
3127        next_control:=equivalence_sign;
3128        end;
3129      if next_control=equivalence_sign then
3130        begin define_macro(parametric); goto continue;
3131        end;
3132      end;
3133    end;
3134  end;
3135
3136@ @<Scan the \PASCAL...@>=
3137case next_control of
3138begin_Pascal:p:=0;
3139module_name: begin p:=cur_module;
3140  @<Check that |=| or |==| follows this module name, otherwise |return|@>;
3141  end;
3142othercases return
3143endcases;@/
3144@<Insert the module number into |tok_mem|@>;
3145scan_repl(module_name); {now |cur_repl_text| points to the replacement text}
3146@<Update the data structure so that the replacement text is accessible@>;
3147
3148@ @<Check that |=|...@>=
3149repeat next_control:=get_next;
3150until next_control<>"+"; {allow optional `\.{+=}'}
3151if (next_control<>"=")and(next_control<>equivalence_sign) then
3152  begin err_print('! Pascal text flushed, = sign is missing');
3153@.Pascal text flushed...@>
3154  repeat next_control:=skip_ahead;
3155  until next_control=new_module;
3156  return;
3157  end
3158
3159@ @<Insert the module number...@>=
3160store_two_bytes(@'150000+module_count); {|@'150000=@'320*@'400|}
3161
3162@ @<Update the data...@>=
3163if p=0 then {unnamed module}
3164  begin text_link[last_unnamed]:=cur_repl_text; last_unnamed:=cur_repl_text;
3165  end
3166else if equiv[p]=0 then equiv[p]:=cur_repl_text {first module of this name}
3167else  begin p:=equiv[p];
3168  while text_link[p]<module_flag do p:=text_link[p]; {find end of list}
3169  text_link[p]:=cur_repl_text;
3170  end;
3171text_link[cur_repl_text]:=module_flag;
3172  {mark this replacement text as a nonmacro}
3173
3174@* Debugging.
3175The \PASCAL\ debugger with which \.{TANGLE} was developed allows breakpoints
3176to be set, and variables can be read and changed, but procedures cannot be
3177executed. Therefore a `|debug_help|' procedure has been inserted in the main
3178loops of each phase of the program; when |ddt| and |dd| are set to appropriate
3179values, symbolic printouts of various tables will appear.
3180
3181The idea is to set a breakpoint inside the |debug_help| routine, at the
3182place of `\ignorespaces|breakpoint:|\unskip' below.  Then when
3183|debug_help| is to be activated, set |trouble_shooting| equal to |true|.
3184The |debug_help| routine will prompt you for values of |ddt| and |dd|,
3185discontinuing this when |ddt<=0|; thus you type $2n+1$ integers, ending
3186with zero or a negative number. Then control either passes to the
3187breakpoint, allowing you to look at and/or change variables (if you typed
3188zero), or to exit the routine (if you typed a negative value).
3189
3190Another global variable, |debug_cycle|, can be used to skip silently
3191past calls on |debug_help|. If you set |debug_cycle>1|, the program stops
3192only every |debug_cycle| times |debug_help| is called; however,
3193any error stop will set |debug_cycle| to zero.
3194
3195@<Globals...@>=
3196@!debug@!trouble_shooting:boolean; {is |debug_help| wanted?}
3197@!ddt:integer; {operation code for the |debug_help| routine}
3198@!dd:integer; {operand in procedures performed by |debug_help|}
3199@!debug_cycle:integer; {threshold for |debug_help| stopping}
3200@!debug_skipped:integer; {we have skipped this many |debug_help| calls}
3201@!term_in:text_file; {the user's terminal as an input file}
3202gubed
3203
3204@ The debugging routine needs to read from the user's terminal.
3205@^system dependencies@>
3206@<Set init...@>=
3207@!debug trouble_shooting:=true; debug_cycle:=1; debug_skipped:=0;@/
3208trouble_shooting:=false; debug_cycle:=99999; {use these when it almost works}
3209reset(term_in,'TTY:','/I'); {open |term_in| as the terminal, don't do a |get|}
3210gubed
3211
3212@ @d breakpoint=888 {place where a breakpoint is desirable}
3213@^system dependencies@>
3214
3215@p @!debug procedure debug_help; {routine to display various things}
3216label breakpoint,exit;
3217var k:integer; {index into various arrays}
3218begin incr(debug_skipped);
3219if debug_skipped<debug_cycle then return;
3220debug_skipped:=0;
3221loop@+  begin write(term_out,'#'); update_terminal; {prompt}
3222  read(term_in,ddt); {read a list of integers}
3223  if ddt<0 then return
3224  else if ddt=0 then
3225    begin goto breakpoint;@\ {go to every label at least once}
3226    breakpoint: ddt:=0;@\
3227    end
3228  else  begin read(term_in,dd);
3229    case ddt of
3230    1: print_id(dd);
3231    2: print_repl(dd);
3232    3: for k:=1 to dd do print(xchr[buffer[k]]);
3233    4: for k:=1 to dd do print(xchr[mod_text[k]]);
3234    5: for k:=1 to out_ptr do print(xchr[out_buf[k]]);
3235    6: for k:=1 to dd do print(xchr[out_contrib[k]]);
3236    othercases print('?')
3237    endcases;
3238    end;
3239  end;
3240exit:end;
3241gubed
3242
3243@* The main program.
3244We have defined plenty of procedures, and it is time to put the last
3245pieces of the puzzle in place. Here is where \.{TANGLE} starts, and where
3246it ends.
3247@^system dependencies@>
3248
3249@p begin initialize;
3250@<Initialize the input system@>;
3251print_ln(banner); {print a ``banner line''}
3252@<Phase I: Read all the user's text and compress it into |tok_mem|@>;
3253stat for ii:=0 to zz-1 do max_tok_ptr[ii]:=tok_ptr[ii];@+tats@;@/
3254@<Phase II:...@>;
3255end_of_TANGLE:
3256if string_ptr>number_chars then @<Finish off the string pool file@>;
3257stat @<Print statistics about memory usage@>;@+tats@;@/
3258@t\4\4@>{here files should be closed if the operating system requires it}
3259@<Print the job |history|@>;
3260end.
3261
3262@ @<Phase I:...@>=
3263phase_one:=true;
3264module_count:=0;
3265repeat next_control:=skip_ahead;
3266until next_control=new_module;
3267while not input_has_ended do scan_module;
3268@<Check that all changes have been read@>;
3269phase_one:=false;
3270
3271@ @<Finish off the string pool file@>=
3272begin print_nl(string_ptr-number_chars:1,
3273               ' strings written to string pool file.');
3274write(pool,'*');
3275for ii:=1 to 9 do
3276  begin out_buf[ii]:=pool_check_sum mod 10;
3277  pool_check_sum:=pool_check_sum div 10;
3278  end;
3279for ii:=9 downto 1 do write(pool,xchr["0"+out_buf[ii]]);
3280write_ln(pool);
3281end
3282
3283@ @<Glob...@>=
3284stat @!wo:0..ww-1; {segment of memory for which statistics are being printed}
3285tats
3286
3287@ @<Print statistics about memory usage@>=
3288print_nl('Memory usage statistics:');
3289print_nl(name_ptr:1, ' names, ', text_ptr:1, ' replacement texts;');
3290print_nl(byte_ptr[0]:1);
3291for wo:=1 to ww-1 do print('+',byte_ptr[wo]:1);
3292if phase_one then
3293  for ii:=0 to zz-1 do max_tok_ptr[ii]:=tok_ptr[ii];
3294print(' bytes, ', max_tok_ptr[0]:1);
3295for ii:=1 to zz-1 do print('+',max_tok_ptr[ii]:1);
3296print(' tokens.');
3297
3298@ Some implementations may wish to pass the |history| value to the
3299operating system so that it can be used to govern whether or not other
3300programs are started. Here we simply report the history to the user.
3301@^system dependencies@>
3302
3303@<Print the job |history|@>=
3304case history of
3305spotless: print_nl('(No errors were found.)');
3306harmless_message: print_nl('(Did you see the warning message above?)');
3307error_message: print_nl('(Pardon me, but I think I spotted something wrong.)');
3308fatal_message: print_nl('(That was a fatal error, my friend.)');
3309end {there are no other cases}
3310
3311@* System-dependent changes.
3312This module should be replaced, if necessary, by changes to the program
3313that are necessary to make \.{TANGLE} work at a particular installation.
3314It is usually best to design your change file so that all changes to
3315previous modules preserve the module numbering; then everybody's version
3316will be consistent with the printed program. More extensive changes,
3317which introduce new modules, can be inserted here; then only the index
3318itself will get a new module number.
3319@^system dependencies@>
3320
3321@* Index.
3322Here is a cross-reference table for the \.{TANGLE} processor.
3323All modules in which an identifier is
3324used are listed with that identifier, except that reserved words are
3325indexed only when they appear in format definitions, and the appearances
3326of identifiers in module names are not indexed. Underlined entries
3327correspond to where the identifier was declared. Error messages and
3328a few other things like ``ASCII code'' are indexed here too.
3329