1% This program is copyright (C) 1982 by D. E. Knuth; all rights are reserved.
2% Copying of this file is authorized only if (1) you are D. E. Knuth, or if
3% (2) you make absolutely no changes to your copy. (The WEB system provides
4% for alterations via an auxiliary file; the master file should stay intact.)
5% See Appendix H of the WEB manual for hints on how to install this program.
6% And see Appendix A of the TRIP manual for details about how to validate it.
7
8% TeX is a trademark of the American Mathematical Society.
9% METAFONT is a trademark of Addison-Wesley Publishing Company.
10
11% Version 0 was released in September 1982 after it passed a variety of tests.
12% Version 1 was released in November 1983 after thorough testing.
13% Version 1.1 fixed ``disappearing font identifiers'' et alia (July 1984).
14% Version 1.2 allowed `0' in response to an error, et alia (October 1984).
15% Version 1.3 made memory allocation more flexible and local (November 1984).
16% Version 1.4 fixed accents right after line breaks, et alia (April 1985).
17% Version 1.5 fixed \the\toks after other expansion in \edefs (August 1985).
18% Version 2.0 (almost identical to 1.5) corresponds to "Volume B" (April 1986).
19% Version 2.1 corrected anomalies in discretionary breaks (January 1987).
20% Version 2.2 corrected "(Please type...)" with null \endlinechar (April 1987).
21% Version 2.3 avoided incomplete page in premature termination (August 1987).
22% Version 2.4 fixed \noaligned rules in indented displays (August 1987).
23% Version 2.5 saved cur_order when expanding tokens (September 1987).
24% Version 2.6 added 10sp slop when shipping leaders (November 1987).
25% Version 2.7 improved rounding of negative-width characters (November 1987).
26% Version 2.8 fixed weird bug if no \patterns are used (December 1987).
27% Version 2.9 made \csname\endcsname's "relax" local (December 1987).
28% Version 2.91 fixed \outer\def\a0{}\a\a bug (April 1988).
29% Version 2.92 fixed \patterns, also file names with complex macros (May 1988).
30% Version 2.93 fixed negative halving in allocator when mem_min<0 (June 1988).
31% Version 2.94 kept open_log_file from calling fatal_error (November 1988).
32% Version 2.95 solved that problem a better way (December 1988).
33% Version 2.96 corrected bug in "Infinite shrinkage" recovery (January 1989).
34% Version 2.97 corrected blunder in creating 2.95 (February 1989).
35% Version 2.98 omitted save_for_after at outer level (March 1989).
36% Version 2.99 caught $$\begingroup\halign..$$ (June 1989).
37% Version 2.991 caught .5\ifdim.6... (June 1989).
38% Version 2.992 introduced major changes for 8-bit extensions (September 1989).
39% Version 2.993 fixed a save_stack synchronization bug et alia (December 1989).
40% Version 3.0 fixed unusual displays; was more \output robust (March 1990).
41% Version 3.1 fixed nullfont, disabled \write{\the\prevgraf} (September 1990).
42% Version 3.14 fixed unprintable font names and corrected typos (March 1991).
43% Version 3.141 more of same; reconstituted ligatures better (March 1992).
44% Version 3.1415 preserved nonexplicit kerns, tidied up (February 1993).
45% Version 3.14159 allowed fontmemsize to change; bulletproofing (March 1995).
46% Version 3.141592 fixed \xleaders, glueset, weird alignments (December 2002).
47% Version 3.1415926 was a general cleanup with minor fixes (February 2008).
48% Version 3.14159265 was similar (January 2014).
49
50% A reward of $327.68 will be paid to the first finder of any remaining bug.
51
52% Although considerable effort has been expended to make the TeX program
53% correct and reliable, no warranty is implied; the author disclaims any
54% obligation or liability for damages, including but not limited to
55% special, indirect, or consequential damages arising out of or in
56% connection with the use or performance of this software. This work has
57% been a ``labor of love'' and the author hopes that users enjoy it.
58
59% Here is TeX material that gets inserted after \input webmac
60\def\hang{\hangindent 3em\noindent\ignorespaces}
61\def\hangg#1 {\hang\hbox{#1 }}
62\def\textindent#1{\hangindent2.5em\noindent\hbox to2.5em{\hss#1 }\ignorespaces}
63\font\ninerm=cmr9
64\let\mc=\ninerm % medium caps for names like SAIL
65\def\PASCAL{Pascal}
66\def\ph{\hbox{Pascal-H}}
67\def\pct!{{\char`\%}} % percent sign in ordinary text
68\font\logo=logo10 % font used for the METAFONT logo
69\def\MF{{\logo META}\-{\logo FONT}}
70\def\<#1>{$\langle#1\rangle$}
71\def\section{\mathhexbox278}
72
73\def\(#1){} % this is used to make section names sort themselves better
74\def\9#1{} % this is used for sort keys in the index via @@:sort key}{entry@@>
75
76\outer\def\N#1. \[#2]#3.{\MN#1.\vfil\eject % begin starred section
77  \def\rhead{PART #2:\uppercase{#3}} % define running headline
78  \message{*\modno} % progress report
79  \edef\next{\write\cont{\Z{\?#2]#3}{\modno}{\the\pageno}}}\next
80  \ifon\startsection{\bf\ignorespaces#3.\quad}\ignorespaces}
81\let\?=\relax % we want to be able to \write a \?
82
83\def\title{\TeX82}
84\def\topofcontents{\hsize 5.5in
85  \vglue 0pt plus 1fil minus 1.5in
86  \def\?##1]{\hbox to 1in{\hfil##1.\ }}
87  }
88\def\botofcontents{\vskip 0pt plus 1fil minus 1.5in}
89\pageno=3
90\def\glob{13} % this should be the section number of "<Global...>"
91\def\gglob{20, 26} % this should be the next two sections of "<Global...>"
92
93@* \[1] Introduction.
94This is \TeX, a document compiler intended to produce typesetting of high
95quality.
96The \PASCAL\ program that follows is the definition of \TeX82, a standard
97@:PASCAL}{\PASCAL@>
98@!@:TeX82}{\TeX82@>
99version of \TeX\ that is designed to be highly portable so that identical output
100will be obtainable on a great variety of computers.
101
102The main purpose of the following program is to explain the algorithms of \TeX\
103as clearly as possible. As a result, the program will not necessarily be very
104efficient when a particular \PASCAL\ compiler has translated it into a
105particular machine language. However, the program has been written so that it
106can be tuned to run efficiently in a wide variety of operating environments
107by making comparatively few changes. Such flexibility is possible because
108the documentation that follows is written in the \.{WEB} language, which is
109at a higher level than \PASCAL; the preprocessing step that converts \.{WEB}
110to \PASCAL\ is able to introduce most of the necessary refinements.
111Semi-automatic translation to other languages is also feasible, because the
112program below does not make extensive use of features that are peculiar to
113\PASCAL.
114
115A large piece of software like \TeX\ has inherent complexity that cannot
116be reduced below a certain level of difficulty, although each individual
117part is fairly simple by itself. The \.{WEB} language is intended to make
118the algorithms as readable as possible, by reflecting the way the
119individual program pieces fit together and by providing the
120cross-references that connect different parts. Detailed comments about
121what is going on, and about why things were done in certain ways, have
122been liberally sprinkled throughout the program.  These comments explain
123features of the implementation, but they rarely attempt to explain the
124\TeX\ language itself, since the reader is supposed to be familiar with
125{\sl The \TeX book}.
126@.WEB@>
127@:TeXbook}{\sl The \TeX book@>
128
129@ The present implementation has a long ancestry, beginning in the summer
130of~1977, when Michael~F. Plass and Frank~M. Liang designed and coded
131a prototype
132@^Plass, Michael Frederick@>
133@^Liang, Franklin Mark@>
134@^Knuth, Donald Ervin@>
135based on some specifications that the author had made in May of that year.
136This original proto\TeX\ included macro definitions and elementary
137manipulations on boxes and glue, but it did not have line-breaking,
138page-breaking, mathematical formulas, alignment routines, error recovery,
139or the present semantic nest; furthermore,
140it used character lists instead of token lists, so that a control sequence
141like \.{\\halign} was represented by a list of seven characters. A
142complete version of \TeX\ was designed and coded by the author in late
1431977 and early 1978; that program, like its prototype, was written in the
144{\mc SAIL} language, for which an excellent debugging system was
145available. Preliminary plans to convert the {\mc SAIL} code into a form
146somewhat like the present ``web'' were developed by Luis Trabb~Pardo and
147@^Trabb Pardo, Luis Isidoro@>
148the author at the beginning of 1979, and a complete implementation was
149created by Ignacio~A. Zabala in 1979 and 1980. The \TeX82 program, which
150@^Zabala Salelles, Ignacio Andr\'es@>
151was written by the author during the latter part of 1981 and the early
152part of 1982, also incorporates ideas from the 1979 implementation of
153@^Guibas, Leonidas Ioannis@>
154@^Sedgewick, Robert@>
155@^Wyatt, Douglas Kirk@>
156\TeX\ in {\mc MESA} that was written by Leonidas Guibas, Robert Sedgewick,
157and Douglas Wyatt at the Xerox Palo Alto Research Center.  Several hundred
158refinements were introduced into \TeX82 based on the experiences gained with
159the original implementations, so that essentially every part of the system
160has been substantially improved. After the appearance of ``Version 0'' in
161September 1982, this program benefited greatly from the comments of
162many other people, notably David~R. Fuchs and Howard~W. Trickey.
163A final revision in September 1989 extended the input character set to
164eight-bit codes and introduced the ability to hyphenate words from
165different languages, based on some ideas of Michael~J. Ferguson.
166@^Fuchs, David Raymond@>
167@^Trickey, Howard Wellington@>
168@^Ferguson, Michael John@>
169
170No doubt there still is plenty of room for improvement, but the author
171is firmly committed to keeping \TeX82 ``frozen'' from now on; stability
172and reliability are to be its main virtues.
173
174On the other hand, the \.{WEB} description can be extended without changing
175the core of \TeX82 itself, and the program has been designed so that such
176extensions are not extremely difficult to make.
177The |banner| string defined here should be changed whenever \TeX\
178undergoes any modifications, so that it will be clear which version of
179\TeX\ might be the guilty party when a problem arises.
180@^extensions to \TeX@>
181@^system dependencies@>
182
183If this program is changed, the resulting system should not be called
184`\TeX'; the official name `\TeX' by itself is reserved
185for software systems that are fully compatible with each other.
186A special test suite called the ``\.{TRIP} test'' is available for
187helping to determine whether a particular implementation deserves to be
188known as `\TeX' [cf.~Stanford Computer Science report CS1027,
189November 1984].
190
191@d banner=='This is TeX, Version 3.14159265' {printed when \TeX\ starts}
192
193@ Different \PASCAL s have slightly different conventions, and the present
194@!@:PASCAL H}{\ph@>
195program expresses \TeX\ in terms of the \PASCAL\ that was
196available to the author in 1982. Constructions that apply to
197this particular compiler, which we shall call \ph, should help the
198reader see how to make an appropriate interface for other systems
199if necessary. (\ph\ is Charles Hedrick's modification of a compiler
200@^Hedrick, Charles Locke@>
201for the DECsystem-10 that was originally developed at the University of
202Hamburg; cf.\ {\sl SOFTWARE---Practice \AM\ Experience \bf6} (1976),
20329--42. The \TeX\ program below is intended to be adaptable, without
204extensive changes, to most other versions of \PASCAL, so it does not fully
205use the admirable features of \ph. Indeed, a conscious effort has been
206made here to avoid using several idiosyncratic features of standard
207\PASCAL\ itself, so that most of the code can be translated mechanically
208into other high-level languages. For example, the `\&{with}' and `\\{new}'
209features are not used, nor are pointer types, set types, or enumerated
210scalar types; there are no `\&{var}' parameters, except in the case of files;
211there are no tag fields on variant records; there are no assignments
212|real:=integer|; no procedures are declared local to other procedures.)
213
214The portions of this program that involve system-dependent code, where
215changes might be necessary because of differences between \PASCAL\ compilers
216and/or differences between
217operating systems, can be identified by looking at the sections whose
218numbers are listed under `system dependencies' in the index. Furthermore,
219the index entries for `dirty \PASCAL' list all places where the restrictions
220of \PASCAL\ have not been followed perfectly, for one reason or another.
221@!@^system dependencies@>
222@!@^dirty \PASCAL@>
223
224Incidentally, \PASCAL's standard |round| function can be problematical,
225because it disagrees with the IEEE floating-point standard.
226Many implementors have
227therefore chosen to substitute their own home-grown rounding procedure.
228
229@ The program begins with a normal \PASCAL\ program heading, whose
230components will be filled in later, using the conventions of \.{WEB}.
231@.WEB@>
232For example, the portion of the program called `\X\glob:Global
233variables\X' below will be replaced by a sequence of variable declarations
234that starts in $\section\glob$ of this documentation. In this way, we are able
235to define each individual global variable when we are prepared to
236understand what it means; we do not have to define all of the globals at
237once.  Cross references in $\section\glob$, where it says ``See also
238sections \gglob, \dots,'' also make it possible to look at the set of
239all global variables, if desired.  Similar remarks apply to the other
240portions of the program heading.
241
242Actually the heading shown here is not quite normal: The |program| line
243does not mention any |output| file, because \ph\ would ask the \TeX\ user
244to specify a file name if |output| were specified here.
245@:PASCAL H}{\ph@>
246@^system dependencies@>
247
248@d mtype==t@&y@&p@&e {this is a \.{WEB} coding trick:}
249@f mtype==type {`\&{mtype}' will be equivalent to `\&{type}'}
250@f type==true {but `|type|' will not be treated as a reserved word}
251
252@p @t\4@>@<Compiler directives@>@/
253program TEX; {all file names are defined dynamically}
254label @<Labels in the outer block@>@/
255const @<Constants in the outer block@>@/
256mtype @<Types in the outer block@>@/
257var @<Global variables@>@/
258@#
259procedure initialize; {this procedure gets things started properly}
260  var @<Local variables for initialization@>@/
261  begin @<Initialize whatever \TeX\ might access@>@;
262  end;@#
263@t\4@>@<Basic printing procedures@>@/
264@t\4@>@<Error handling procedures@>@/
265
266@ The overall \TeX\ program begins with the heading just shown, after which
267comes a bunch of procedure declarations and function declarations.
268Finally we will get to the main program, which begins with the
269comment `|start_here|'. If you want to skip down to the
270main program now, you can look up `|start_here|' in the index.
271But the author suggests that the best way to understand this program
272is to follow pretty much the order of \TeX's components as they appear in the
273\.{WEB} description you are now reading, since the present ordering is
274intended to combine the advantages of the ``bottom up'' and ``top down''
275approaches to the problem of understanding a somewhat complicated system.
276
277@ Three labels must be declared in the main program, so we give them
278symbolic names.
279
280@d start_of_TEX=1 {go here when \TeX's variables are initialized}
281@d end_of_TEX=9998 {go here to close files and terminate gracefully}
282@d final_end=9999 {this label marks the ending of the program}
283
284@<Labels in the out...@>=
285start_of_TEX@t\hskip-2pt@>, end_of_TEX@t\hskip-2pt@>,@,final_end;
286  {key control points}
287
288@ Some of the code below is intended to be used only when diagnosing the
289strange behavior that sometimes occurs when \TeX\ is being installed or
290when system wizards are fooling around with \TeX\ without quite knowing
291what they are doing. Such code will not normally be compiled; it is
292delimited by the codewords `$|debug|\ldots|gubed|$', with apologies
293to people who wish to preserve the purity of English.
294
295Similarly, there is some conditional code delimited by
296`$|stat|\ldots|tats|$' that is intended for use when statistics are to be
297kept about \TeX's memory usage.  The |stat| $\ldots$ |tats| code also
298implements diagnostic information for \.{\\tracingparagraphs} and
299\.{\\tracingpages}.
300@^debugging@>
301
302@d debug==@{ {change this to `$\\{debug}\equiv\null$' when debugging}
303@d gubed==@t@>@} {change this to `$\\{gubed}\equiv\null$' when debugging}
304@f debug==begin
305@f gubed==end
306@#
307@d stat==@{ {change this to `$\\{stat}\equiv\null$' when gathering
308  usage statistics}
309@d tats==@t@>@} {change this to `$\\{tats}\equiv\null$' when gathering
310  usage statistics}
311@f stat==begin
312@f tats==end
313
314@ This program has two important variations: (1) There is a long and slow
315version called \.{INITEX}, which does the extra calculations needed to
316@.INITEX@>
317initialize \TeX's internal tables; and (2)~there is a shorter and faster
318production version, which cuts the initialization to a bare minimum.
319Parts of the program that are needed in (1) but not in (2) are delimited by
320the codewords `$|init|\ldots|tini|$'.
321
322@d init== {change this to `$\\{init}\equiv\.{@@\{}$' in the production version}
323@d tini== {change this to `$\\{tini}\equiv\.{@@\}}$' in the production version}
324@f init==begin
325@f tini==end
326
327@<Initialize whatever...@>=
328@<Set initial values of key variables@>@/
329@!init @<Initialize table entries (done by \.{INITEX} only)@>@;@+tini
330
331@ If the first character of a \PASCAL\ comment is a dollar sign,
332\ph\ treats the comment as a list of ``compiler directives'' that will
333affect the translation of this program into machine language.  The
334directives shown below specify full checking and inclusion of the \PASCAL\
335debugger when \TeX\ is being debugged, but they cause range checking and other
336redundant code to be eliminated when the production system is being generated.
337Arithmetic overflow will be detected in all cases.
338@:PASCAL H}{\ph@>
339@^system dependencies@>
340@^overflow in arithmetic@>
341
342@<Compiler directives@>=
343@{@&$C-,A+,D-@} {no range check, catch arithmetic overflow, no debug overhead}
344@!debug @{@&$C+,D+@}@+ gubed {but turn everything on when debugging}
345
346@ This \TeX\ implementation conforms to the rules of the {\sl Pascal User
347@:PASCAL}{\PASCAL@>
348@^system dependencies@>
349Manual} published by Jensen and Wirth in 1975, except where system-dependent
350@^Wirth, Niklaus@>
351@^Jensen, Kathleen@>
352code is necessary to make a useful system program, and except in another
353respect where such conformity would unnecessarily obscure the meaning
354and clutter up the code: We assume that |case| statements may include a
355default case that applies if no matching label is found. Thus, we shall use
356constructions like
357$$\vbox{\halign{\ignorespaces#\hfil\cr
358|case x of|\cr
3591: $\langle\,$code for $x=1\,\rangle$;\cr
3603: $\langle\,$code for $x=3\,\rangle$;\cr
361|othercases| $\langle\,$code for |x<>1| and |x<>3|$\,\rangle$\cr
362|endcases|\cr}}$$
363since most \PASCAL\ compilers have plugged this hole in the language by
364incorporating some sort of default mechanism. For example, the \ph\
365compiler allows `|others|:' as a default label, and other \PASCAL s allow
366syntaxes like `\&{else}' or `\&{otherwise}' or `\\{otherwise}:', etc. The
367definitions of |othercases| and |endcases| should be changed to agree with
368local conventions.  Note that no semicolon appears before |endcases| in
369this program, so the definition of |endcases| should include a semicolon
370if the compiler wants one. (Of course, if no default mechanism is
371available, the |case| statements of \TeX\ will have to be laboriously
372extended by listing all remaining cases. People who are stuck with such
373\PASCAL s have, in fact, done this, successfully but not happily!)
374@:PASCAL H}{\ph@>
375
376@d othercases == others: {default for cases not listed explicitly}
377@d endcases == @+end {follows the default case in an extended |case| statement}
378@f othercases == else
379@f endcases == end
380
381@ The following parameters can be changed at compile time to extend or
382reduce \TeX's capacity. They may have different values in \.{INITEX} and
383in production versions of \TeX.
384@.INITEX@>
385@^system dependencies@>
386
387@<Constants...@>=
388@!mem_max=30000; {greatest index in \TeX's internal |mem| array;
389  must be strictly less than |max_halfword|;
390  must be equal to |mem_top| in \.{INITEX}, otherwise |>=mem_top|}
391@!mem_min=0; {smallest index in \TeX's internal |mem| array;
392  must be |min_halfword| or more;
393  must be equal to |mem_bot| in \.{INITEX}, otherwise |<=mem_bot|}
394@!buf_size=500; {maximum number of characters simultaneously present in
395  current lines of open files and in control sequences between
396  \.{\\csname} and \.{\\endcsname}; must not exceed |max_halfword|}
397@!error_line=72; {width of context lines on terminal error messages}
398@!half_error_line=42; {width of first lines of contexts in terminal
399  error messages; should be between 30 and |error_line-15|}
400@!max_print_line=79; {width of longest text lines output; should be at least 60}
401@!stack_size=200; {maximum number of simultaneous input sources}
402@!max_in_open=6; {maximum number of input files and error insertions that
403  can be going on simultaneously}
404@!font_max=75; {maximum internal font number; must not exceed |max_quarterword|
405  and must be at most |font_base+256|}
406@!font_mem_size=20000; {number of words of |font_info| for all fonts}
407@!param_size=60; {maximum number of simultaneous macro parameters}
408@!nest_size=40; {maximum number of semantic levels simultaneously active}
409@!max_strings=3000; {maximum number of strings; must not exceed |max_halfword|}
410@!string_vacancies=8000; {the minimum number of characters that should be
411  available for the user's control sequences and font names,
412  after \TeX's own error messages are stored}
413@!pool_size=32000; {maximum number of characters in strings, including all
414  error messages and help texts, and the names of all fonts and
415  control sequences; must exceed |string_vacancies| by the total
416  length of \TeX's own strings, which is currently about 23000}
417@!save_size=600; {space for saving values outside of current group; must be
418  at most |max_halfword|}
419@!trie_size=8000; {space for hyphenation patterns; should be larger for
420  \.{INITEX} than it is in production versions of \TeX}
421@!trie_op_size=500; {space for ``opcodes'' in the hyphenation patterns}
422@!dvi_buf_size=800; {size of the output buffer; must be a multiple of 8}
423@!file_name_size=40; {file names shouldn't be longer than this}
424@!pool_name='TeXformats:TEX.POOL                     ';
425  {string of length |file_name_size|; tells where the string pool appears}
426@.TeXformats@>
427
428@ Like the preceding parameters, the following quantities can be changed
429at compile time to extend or reduce \TeX's capacity. But if they are changed,
430it is necessary to rerun the initialization program \.{INITEX}
431@.INITEX@>
432to generate new tables for the production \TeX\ program.
433One can't simply make helter-skelter changes to the following constants,
434since certain rather complex initialization
435numbers are computed from them. They are defined here using
436\.{WEB} macros, instead of being put into \PASCAL's |const| list, in order to
437emphasize this distinction.
438
439@d mem_bot=0 {smallest index in the |mem| array dumped by \.{INITEX};
440  must not be less than |mem_min|}
441@d mem_top==30000 {largest index in the |mem| array dumped by \.{INITEX};
442  must be substantially larger than |mem_bot|
443  and not greater than |mem_max|}
444@d font_base=0 {smallest internal font number; must not be less
445  than |min_quarterword|}
446@d hash_size=2100 {maximum number of control sequences; it should be at most
447  about |(mem_max-mem_min)/10|}
448@d hash_prime=1777 {a prime number equal to about 85\pct! of |hash_size|}
449@d hyph_size=307 {another prime; the number of \.{\\hyphenation} exceptions}
450@^system dependencies@>
451
452@ In case somebody has inadvertently made bad settings of the ``constants,''
453\TeX\ checks them using a global variable called |bad|.
454
455This is the first of many sections of \TeX\ where global variables are
456defined.
457
458@<Glob...@>=
459@!bad:integer; {is some ``constant'' wrong?}
460
461@ Later on we will say `\ignorespaces|if mem_max>=max_halfword then bad:=14|',
462or something similar. (We can't do that until |max_halfword| has been defined.)
463
464@<Check the ``constant'' values for consistency@>=
465bad:=0;
466if (half_error_line<30)or(half_error_line>error_line-15) then bad:=1;
467if max_print_line<60 then bad:=2;
468if dvi_buf_size mod 8<>0 then bad:=3;
469if mem_bot+1100>mem_top then bad:=4;
470if hash_prime>hash_size then bad:=5;
471if max_in_open>=128 then bad:=6;
472if mem_top<256+11 then bad:=7; {we will want |null_list>255|}
473
474@ Labels are given symbolic names by the following definitions, so that
475occasional |goto| statements will be meaningful. We insert the label
476`|exit|' just before the `\ignorespaces|end|\unskip' of a procedure in
477which we have used the `|return|' statement defined below; the label
478`|restart|' is occasionally used at the very beginning of a procedure; and
479the label `|reswitch|' is occasionally used just prior to a |case|
480statement in which some cases change the conditions and we wish to branch
481to the newly applicable case.  Loops that are set up with the |loop|
482construction defined below are commonly exited by going to `|done|' or to
483`|found|' or to `|not_found|', and they are sometimes repeated by going to
484`|continue|'.  If two or more parts of a subroutine start differently but
485end up the same, the shared code may be gathered together at
486`|common_ending|'.
487
488Incidentally, this program never declares a label that isn't actually used,
489because some fussy \PASCAL\ compilers will complain about redundant labels.
490
491@d exit=10 {go here to leave a procedure}
492@d restart=20 {go here to start a procedure again}
493@d reswitch=21 {go here to start a case statement again}
494@d continue=22 {go here to resume a loop}
495@d done=30 {go here to exit a loop}
496@d done1=31 {like |done|, when there is more than one loop}
497@d done2=32 {for exiting the second loop in a long block}
498@d done3=33 {for exiting the third loop in a very long block}
499@d done4=34 {for exiting the fourth loop in an extremely long block}
500@d done5=35 {for exiting the fifth loop in an immense block}
501@d done6=36 {for exiting the sixth loop in a block}
502@d found=40 {go here when you've found it}
503@d found1=41 {like |found|, when there's more than one per routine}
504@d found2=42 {like |found|, when there's more than two per routine}
505@d not_found=45 {go here when you've found nothing}
506@d common_ending=50 {go here when you want to merge with another branch}
507
508@ Here are some macros for common programming idioms.
509
510@d incr(#) == #:=#+1 {increase a variable by unity}
511@d decr(#) == #:=#-1 {decrease a variable by unity}
512@d negate(#) == #:=-# {change the sign of a variable}
513@d loop == @+ while true do@+ {repeat over and over until a |goto| happens}
514@f loop == xclause
515  {\.{WEB}'s |xclause| acts like `\ignorespaces|while true do|\unskip'}
516@d do_nothing == {empty statement}
517@d return == goto exit {terminate a procedure call}
518@f return == nil
519@d empty=0 {symbolic name for a null constant}
520
521@* \[2] The character set.
522In order to make \TeX\ readily portable to a wide variety of
523computers, all of its input text is converted to an internal eight-bit
524code that includes standard ASCII, the ``American Standard Code for
525Information Interchange.''  This conversion is done immediately when each
526character is read in. Conversely, characters are converted from ASCII to
527the user's external representation just before they are output to a
528text file.
529
530Such an internal code is relevant to users of \TeX\ primarily because it
531governs the positions of characters in the fonts. For example, the
532character `\.A' has ASCII code $65=@'101$, and when \TeX\ typesets
533this letter it specifies character number 65 in the current font.
534If that font actually has `\.A' in a different position, \TeX\ doesn't
535know what the real position is; the program that does the actual printing from
536\TeX's device-independent files is responsible for converting from ASCII to
537a particular font encoding.
538@^ASCII code@>
539
540\TeX's internal code also defines the value of constants
541that begin with a reverse apostrophe; and it provides an index to the
542\.{\\catcode}, \.{\\mathcode}, \.{\\uccode}, \.{\\lccode}, and \.{\\delcode}
543tables.
544
545@ Characters of text that have been converted to \TeX's internal form
546are said to be of type |ASCII_code|, which is a subrange of the integers.
547
548@<Types...@>=
549@!ASCII_code=0..255; {eight-bit numbers}
550
551@ The original \PASCAL\ compiler was designed in the late 60s, when six-bit
552character sets were common, so it did not make provision for lowercase
553letters. Nowadays, of course, we need to deal with both capital and small
554letters in a convenient way, especially in a program for typesetting;
555so the present specification of \TeX\ has been written under the assumption
556that the \PASCAL\ compiler and run-time system permit the use of text files
557with more than 64 distinguishable characters. More precisely, we assume that
558the character set contains at least the letters and symbols associated
559with ASCII codes @'40 through @'176; all of these characters are now
560available on most computer terminals.
561
562Since we are dealing with more characters than were present in the first
563\PASCAL\ compilers, we have to decide what to call the associated data
564type. Some \PASCAL s use the original name |char| for the
565characters in text files, even though there now are more than 64 such
566characters, while other \PASCAL s consider |char| to be a 64-element
567subrange of a larger data type that has some other name.
568
569In order to accommodate this difference, we shall use the name |text_char|
570to stand for the data type of the characters that are converted to and
571from |ASCII_code| when they are input and output. We shall also assume
572that |text_char| consists of the elements |chr(first_text_char)| through
573|chr(last_text_char)|, inclusive. The following definitions should be
574adjusted if necessary.
575@^system dependencies@>
576
577@d text_char == char {the data type of characters in text files}
578@d first_text_char=0 {ordinal number of the smallest element of |text_char|}
579@d last_text_char=255 {ordinal number of the largest element of |text_char|}
580
581@<Local variables for init...@>=
582@!i:integer;
583
584@ The \TeX\ processor converts between ASCII code and
585the user's external character set by means of arrays |xord| and |xchr|
586that are analogous to \PASCAL's |ord| and |chr| functions.
587
588@<Glob...@>=
589@!xord: array [text_char] of ASCII_code;
590  {specifies conversion of input characters}
591@!xchr: array [ASCII_code] of text_char;
592  {specifies conversion of output characters}
593
594@ Since we are assuming that our \PASCAL\ system is able to read and
595write the visible characters of standard ASCII (although not
596necessarily using the ASCII codes to represent them), the following
597assignment statements initialize the standard part of the |xchr| array
598properly, without needing any system-dependent changes. On the other
599hand, it is possible to implement \TeX\ with less complete character
600sets, and in such cases it will be necessary to change something here.
601@^system dependencies@>
602
603@<Set init...@>=
604xchr[@'40]:=' ';
605xchr[@'41]:='!';
606xchr[@'42]:='"';
607xchr[@'43]:='#';
608xchr[@'44]:='$';
609xchr[@'45]:='%';
610xchr[@'46]:='&';
611xchr[@'47]:='''';@/
612xchr[@'50]:='(';
613xchr[@'51]:=')';
614xchr[@'52]:='*';
615xchr[@'53]:='+';
616xchr[@'54]:=',';
617xchr[@'55]:='-';
618xchr[@'56]:='.';
619xchr[@'57]:='/';@/
620xchr[@'60]:='0';
621xchr[@'61]:='1';
622xchr[@'62]:='2';
623xchr[@'63]:='3';
624xchr[@'64]:='4';
625xchr[@'65]:='5';
626xchr[@'66]:='6';
627xchr[@'67]:='7';@/
628xchr[@'70]:='8';
629xchr[@'71]:='9';
630xchr[@'72]:=':';
631xchr[@'73]:=';';
632xchr[@'74]:='<';
633xchr[@'75]:='=';
634xchr[@'76]:='>';
635xchr[@'77]:='?';@/
636xchr[@'100]:='@@';
637xchr[@'101]:='A';
638xchr[@'102]:='B';
639xchr[@'103]:='C';
640xchr[@'104]:='D';
641xchr[@'105]:='E';
642xchr[@'106]:='F';
643xchr[@'107]:='G';@/
644xchr[@'110]:='H';
645xchr[@'111]:='I';
646xchr[@'112]:='J';
647xchr[@'113]:='K';
648xchr[@'114]:='L';
649xchr[@'115]:='M';
650xchr[@'116]:='N';
651xchr[@'117]:='O';@/
652xchr[@'120]:='P';
653xchr[@'121]:='Q';
654xchr[@'122]:='R';
655xchr[@'123]:='S';
656xchr[@'124]:='T';
657xchr[@'125]:='U';
658xchr[@'126]:='V';
659xchr[@'127]:='W';@/
660xchr[@'130]:='X';
661xchr[@'131]:='Y';
662xchr[@'132]:='Z';
663xchr[@'133]:='[';
664xchr[@'134]:='\';
665xchr[@'135]:=']';
666xchr[@'136]:='^';
667xchr[@'137]:='_';@/
668xchr[@'140]:='`';
669xchr[@'141]:='a';
670xchr[@'142]:='b';
671xchr[@'143]:='c';
672xchr[@'144]:='d';
673xchr[@'145]:='e';
674xchr[@'146]:='f';
675xchr[@'147]:='g';@/
676xchr[@'150]:='h';
677xchr[@'151]:='i';
678xchr[@'152]:='j';
679xchr[@'153]:='k';
680xchr[@'154]:='l';
681xchr[@'155]:='m';
682xchr[@'156]:='n';
683xchr[@'157]:='o';@/
684xchr[@'160]:='p';
685xchr[@'161]:='q';
686xchr[@'162]:='r';
687xchr[@'163]:='s';
688xchr[@'164]:='t';
689xchr[@'165]:='u';
690xchr[@'166]:='v';
691xchr[@'167]:='w';@/
692xchr[@'170]:='x';
693xchr[@'171]:='y';
694xchr[@'172]:='z';
695xchr[@'173]:='{';
696xchr[@'174]:='|';
697xchr[@'175]:='}';
698xchr[@'176]:='~';@/
699
700@ Some of the ASCII codes without visible characters have been given symbolic
701names in this program because they are used with a special meaning.
702
703@d null_code=@'0 {ASCII code that might disappear}
704@d carriage_return=@'15 {ASCII code used at end of line}
705@d invalid_code=@'177 {ASCII code that many systems prohibit in text files}
706
707@ The ASCII code is ``standard'' only to a certain extent, since many
708computer installations have found it advantageous to have ready access
709to more than 94 printing characters. Appendix~C of {\sl The \TeX book\/}
710gives a complete specification of the intended correspondence between
711characters and \TeX's internal representation.
712@:TeXbook}{\sl The \TeX book@>
713
714If \TeX\ is being used
715on a garden-variety \PASCAL\ for which only standard ASCII
716codes will appear in the input and output files, it doesn't really matter
717what codes are specified in |xchr[0..@'37]|, but the safest policy is to
718blank everything out by using the code shown below.
719
720However, other settings of |xchr| will make \TeX\ more friendly on
721computers that have an extended character set, so that users can type things
722like `\.^^Z' instead of `\.{\\ne}'. People with extended character sets can
723assign codes arbitrarily, giving an |xchr| equivalent to whatever
724characters the users of \TeX\ are allowed to have in their input files.
725It is best to make the codes correspond to the intended interpretations as
726shown in Appendix~C whenever possible; but this is not necessary. For
727example, in countries with an alphabet of more than 26 letters, it is
728usually best to map the additional letters into codes less than~@'40.
729To get the most ``permissive'' character set, change |' '| on the
730right of these assignment statements to |chr(i)|.
731@^character set dependencies@>
732@^system dependencies@>
733
734@<Set init...@>=
735for i:=0 to @'37 do xchr[i]:=' ';
736for i:=@'177 to @'377 do xchr[i]:=' ';
737
738@ The following system-independent code makes the |xord| array contain a
739suitable inverse to the information in |xchr|. Note that if |xchr[i]=xchr[j]|
740where |i<j<@'177|, the value of |xord[xchr[i]]| will turn out to be
741|j| or more; hence, standard ASCII code numbers will be used instead of
742codes below @'40 in case there is a coincidence.
743
744@<Set init...@>=
745for i:=first_text_char to last_text_char do xord[chr(i)]:=invalid_code;
746for i:=@'200 to @'377 do xord[xchr[i]]:=i;
747for i:=0 to @'176 do xord[xchr[i]]:=i;
748
749@* \[3] Input and output.
750The bane of portability is the fact that different operating systems treat
751input and output quite differently, perhaps because computer scientists
752have not given sufficient attention to this problem. People have felt somehow
753that input and output are not part of ``real'' programming. Well, it is true
754that some kinds of programming are more fun than others. With existing
755input/output conventions being so diverse and so messy, the only sources of
756joy in such parts of the code are the rare occasions when one can find a
757way to make the program a little less bad than it might have been. We have
758two choices, either to attack I/O now and get it over with, or to postpone
759I/O until near the end. Neither prospect is very attractive, so let's
760get it over with.
761
762The basic operations we need to do are (1)~inputting and outputting of
763text, to or from a file or the user's terminal; (2)~inputting and
764outputting of eight-bit bytes, to or from a file; (3)~instructing the
765operating system to initiate (``open'') or to terminate (``close'') input or
766output from a specified file; (4)~testing whether the end of an input
767file has been reached.
768
769\TeX\ needs to deal with two kinds of files.
770We shall use the term |alpha_file| for a file that contains textual data,
771and the term |byte_file| for a file that contains eight-bit binary information.
772These two types turn out to be the same on many computers, but
773sometimes there is a significant distinction, so we shall be careful to
774distinguish between them. Standard protocols for transferring
775such files from computer to computer, via high-speed networks, are
776now becoming available to more and more communities of users.
777
778The program actually makes use also of a third kind of file, called a
779|word_file|, when dumping and reloading base information for its own
780initialization.  We shall define a word file later; but it will be possible
781for us to specify simple operations on word files before they are defined.
782
783@<Types...@>=
784@!eight_bits=0..255; {unsigned one-byte quantity}
785@!alpha_file=packed file of text_char; {files that contain textual data}
786@!byte_file=packed file of eight_bits; {files that contain binary data}
787
788@ Most of what we need to do with respect to input and output can be handled
789by the I/O facilities that are standard in \PASCAL, i.e., the routines
790called |get|, |put|, |eof|, and so on. But
791standard \PASCAL\ does not allow file variables to be associated with file
792names that are determined at run time, so it cannot be used to implement
793\TeX; some sort of extension to \PASCAL's ordinary |reset| and |rewrite|
794is crucial for our purposes. We shall assume that |name_of_file| is a variable
795of an appropriate type such that the \PASCAL\ run-time system being used to
796implement \TeX\ can open a file whose external name is specified by
797|name_of_file|.
798@^system dependencies@>
799
800@<Glob...@>=
801@!name_of_file:packed array[1..file_name_size] of char;@;@/
802  {on some systems this may be a \&{record} variable}
803@!name_length:0..file_name_size;@/{this many characters are actually
804  relevant in |name_of_file| (the rest are blank)}
805
806@ The \ph\ compiler with which the present version of \TeX\ was prepared has
807extended the rules of \PASCAL\ in a very convenient way. To open file~|f|,
808we can write
809$$\vbox{\halign{#\hfil\qquad&#\hfil\cr
810|reset(f,@t\\{name}@>,'/O')|&for input;\cr
811|rewrite(f,@t\\{name}@>,'/O')|&for output.\cr}}$$
812The `\\{name}' parameter, which is of type `{\bf packed array
813$[\langle\\{any}\rangle]$ of \\{char}}', stands for the name of
814the external file that is being opened for input or output.
815Blank spaces that might appear in \\{name} are ignored.
816
817The `\.{/O}' parameter tells the operating system not to issue its own
818error messages if something goes wrong. If a file of the specified name
819cannot be found, or if such a file cannot be opened for some other reason
820(e.g., someone may already be trying to write the same file), we will have
821|@!erstat(f)<>0| after an unsuccessful |reset| or |rewrite|.  This allows
822\TeX\ to undertake appropriate corrective action.
823@:PASCAL H}{\ph@>
824@^system dependencies@>
825
826\TeX's file-opening procedures return |false| if no file identified by
827|name_of_file| could be opened.
828
829@d reset_OK(#)==erstat(#)=0
830@d rewrite_OK(#)==erstat(#)=0
831
832@p function a_open_in(var f:alpha_file):boolean;
833  {open a text file for input}
834begin reset(f,name_of_file,'/O'); a_open_in:=reset_OK(f);
835end;
836@#
837function a_open_out(var f:alpha_file):boolean;
838  {open a text file for output}
839begin rewrite(f,name_of_file,'/O'); a_open_out:=rewrite_OK(f);
840end;
841@#
842function b_open_in(var f:byte_file):boolean;
843  {open a binary file for input}
844begin reset(f,name_of_file,'/O'); b_open_in:=reset_OK(f);
845end;
846@#
847function b_open_out(var f:byte_file):boolean;
848  {open a binary file for output}
849begin rewrite(f,name_of_file,'/O'); b_open_out:=rewrite_OK(f);
850end;
851@#
852function w_open_in(var f:word_file):boolean;
853  {open a word file for input}
854begin reset(f,name_of_file,'/O'); w_open_in:=reset_OK(f);
855end;
856@#
857function w_open_out(var f:word_file):boolean;
858  {open a word file for output}
859begin rewrite(f,name_of_file,'/O'); w_open_out:=rewrite_OK(f);
860end;
861
862@ Files can be closed with the \ph\ routine `|close(f)|', which
863@:PASCAL H}{\ph@>
864@^system dependencies@>
865should be used when all input or output with respect to |f| has been completed.
866This makes |f| available to be opened again, if desired; and if |f| was used for
867output, the |close| operation makes the corresponding external file appear
868on the user's area, ready to be read.
869
870These procedures should not generate error messages if a file is
871being closed before it has been successfully opened.
872
873@p procedure a_close(var f:alpha_file); {close a text file}
874begin close(f);
875end;
876@#
877procedure b_close(var f:byte_file); {close a binary file}
878begin close(f);
879end;
880@#
881procedure w_close(var f:word_file); {close a word file}
882begin close(f);
883end;
884
885@ Binary input and output are done with \PASCAL's ordinary |get| and |put|
886procedures, so we don't have to make any other special arrangements for
887binary~I/O. Text output is also easy to do with standard \PASCAL\ routines.
888The treatment of text input is more difficult, however, because
889of the necessary translation to |ASCII_code| values.
890\TeX's conventions should be efficient, and they should
891blend nicely with the user's operating environment.
892
893@ Input from text files is read one line at a time, using a routine called
894|input_ln|. This function is defined in terms of global variables called
895|buffer|, |first|, and |last| that will be described in detail later; for
896now, it suffices for us to know that |buffer| is an array of |ASCII_code|
897values, and that |first| and |last| are indices into this array
898representing the beginning and ending of a line of text.
899
900@<Glob...@>=
901@!buffer:array[0..buf_size] of ASCII_code; {lines of characters being read}
902@!first:0..buf_size; {the first unused position in |buffer|}
903@!last:0..buf_size; {end of the line just input to |buffer|}
904@!max_buf_stack:0..buf_size; {largest index used in |buffer|}
905
906@ The |input_ln| function brings the next line of input from the specified
907file into available positions of the buffer array and returns the value
908|true|, unless the file has already been entirely read, in which case it
909returns |false| and sets |last:=first|.  In general, the |ASCII_code|
910numbers that represent the next line of the file are input into
911|buffer[first]|, |buffer[first+1]|, \dots, |buffer[last-1]|; and the
912global variable |last| is set equal to |first| plus the length of the
913line. Trailing blanks are removed from the line; thus, either |last=first|
914(in which case the line was entirely blank) or |buffer[last-1]<>" "|.
915
916An overflow error is given, however, if the normal actions of |input_ln|
917would make |last>=buf_size|; this is done so that other parts of \TeX\
918can safely look at the contents of |buffer[last+1]| without overstepping
919the bounds of the |buffer| array. Upon entry to |input_ln|, the condition
920|first<buf_size| will always hold, so that there is always room for an
921``empty'' line.
922
923The variable |max_buf_stack|, which is used to keep track of how large
924the |buf_size| parameter must be to accommodate the present job, is
925also kept up to date by |input_ln|.
926
927If the |bypass_eoln| parameter is |true|, |input_ln| will do a |get|
928before looking at the first character of the line; this skips over
929an |eoln| that was in |f^|. The procedure does not do a |get| when it
930reaches the end of the line; therefore it can be used to acquire input
931from the user's terminal as well as from ordinary text files.
932
933Standard \PASCAL\ says that a file should have |eoln| immediately
934before |eof|, but \TeX\ needs only a weaker restriction: If |eof|
935occurs in the middle of a line, the system function |eoln| should return
936a |true| result (even though |f^| will be undefined).
937
938Since the inner loop of |input_ln| is part of \TeX's ``inner loop''---each
939character of input comes in at this place---it is wise to reduce system
940overhead by making use of special routines that read in an entire array
941of characters at once, if such routines are available. The following
942code uses standard \PASCAL\ to illustrate what needs to be done, but
943finer tuning is often possible at well-developed \PASCAL\ sites.
944@^inner loop@>
945
946@p function input_ln(var f:alpha_file;@!bypass_eoln:boolean):boolean;
947  {inputs the next line or returns |false|}
948var last_nonblank:0..buf_size; {|last| with trailing blanks removed}
949begin if bypass_eoln then if not eof(f) then get(f);
950  {input the first character of the line into |f^|}
951last:=first; {cf.\ Matthew 19\thinspace:\thinspace30}
952if eof(f) then input_ln:=false
953else  begin last_nonblank:=first;
954  while not eoln(f) do
955    begin if last>=max_buf_stack then
956      begin max_buf_stack:=last+1;
957      if max_buf_stack=buf_size then
958        @<Report overflow of the input buffer, and abort@>;
959      end;
960    buffer[last]:=xord[f^]; get(f); incr(last);
961    if buffer[last-1]<>" " then last_nonblank:=last;
962    end;
963  last:=last_nonblank; input_ln:=true;
964  end;
965end;
966
967@ The user's terminal acts essentially like other files of text, except
968that it is used both for input and for output. When the terminal is
969considered an input file, the file variable is called |term_in|, and when it
970is considered an output file the file variable is |term_out|.
971@^system dependencies@>
972
973@<Glob...@>=
974@!term_in:alpha_file; {the terminal as an input file}
975@!term_out:alpha_file; {the terminal as an output file}
976
977@ Here is how to open the terminal files
978in \ph. The `\.{/I}' switch suppresses the first |get|.
979@:PASCAL H}{\ph@>
980@^system dependencies@>
981
982@d t_open_in==reset(term_in,'TTY:','/O/I') {open the terminal for text input}
983@d t_open_out==rewrite(term_out,'TTY:','/O') {open the terminal for text output}
984
985@ Sometimes it is necessary to synchronize the input/output mixture that
986happens on the user's terminal, and three system-dependent
987procedures are used for this
988purpose. The first of these, |update_terminal|, is called when we want
989to make sure that everything we have output to the terminal so far has
990actually left the computer's internal buffers and been sent.
991The second, |clear_terminal|, is called when we wish to cancel any
992input that the user may have typed ahead (since we are about to
993issue an unexpected error message). The third, |wake_up_terminal|,
994is supposed to revive the terminal if the user has disabled it by
995some instruction to the operating system.  The following macros show how
996these operations can be specified in \ph:
997@:PASCAL H}{\ph@>
998@^system dependencies@>
999
1000@d update_terminal == break(term_out) {empty the terminal output buffer}
1001@d clear_terminal == break_in(term_in,true) {clear the terminal input buffer}
1002@d wake_up_terminal == do_nothing {cancel the user's cancellation of output}
1003
1004@ We need a special routine to read the first line of \TeX\ input from
1005the user's terminal. This line is different because it is read before we
1006have opened the transcript file; there is sort of a ``chicken and
1007egg'' problem here. If the user types `\.{\\input paper}' on the first
1008line, or if some macro invoked by that line does such an \.{\\input},
1009the transcript file will be named `\.{paper.log}'; but if no \.{\\input}
1010commands are performed during the first line of terminal input, the transcript
1011file will acquire its default name `\.{texput.log}'. (The transcript file
1012will not contain error messages generated by the first line before the
1013first \.{\\input} command.)
1014@.texput@>
1015
1016The first line is even more special if we are lucky enough to have an operating
1017system that treats \TeX\ differently from a run-of-the-mill \PASCAL\ object
1018program. It's nice to let the user start running a \TeX\ job by typing
1019a command line like `\.{tex paper}'; in such a case, \TeX\ will operate
1020as if the first line of input were `\.{paper}', i.e., the first line will
1021consist of the remainder of the command line, after the part that invoked
1022\TeX.
1023
1024The first line is special also because it may be read before \TeX\ has
1025input a format file. In such cases, normal error messages cannot yet
1026be given. The following code uses concepts that will be explained later.
1027(If the \PASCAL\ compiler does not support non-local |@!goto|\unskip, the
1028@^system dependencies@>
1029statement `|goto final_end|' should be replaced by something that
1030quietly terminates the program.)
1031
1032@<Report overflow of the input buffer, and abort@>=
1033if format_ident=0 then
1034  begin write_ln(term_out,'Buffer size exceeded!'); goto final_end;
1035@.Buffer size exceeded@>
1036  end
1037else begin cur_input.loc_field:=first; cur_input.limit_field:=last-1;
1038  overflow("buffer size",buf_size);
1039@:TeX capacity exceeded buffer size}{\quad buffer size@>
1040  end
1041
1042@ Different systems have different ways to get started. But regardless of
1043what conventions are adopted, the routine that initializes the terminal
1044should satisfy the following specifications:
1045
1046\yskip\textindent{1)}It should open file |term_in| for input from the
1047  terminal. (The file |term_out| will already be open for output to the
1048  terminal.)
1049
1050\textindent{2)}If the user has given a command line, this line should be
1051  considered the first line of terminal input. Otherwise the
1052  user should be prompted with `\.{**}', and the first line of input
1053  should be whatever is typed in response.
1054
1055\textindent{3)}The first line of input, which might or might not be a
1056  command line, should appear in locations |first| to |last-1| of the
1057  |buffer| array.
1058
1059\textindent{4)}The global variable |loc| should be set so that the
1060  character to be read next by \TeX\ is in |buffer[loc]|. This
1061  character should not be blank, and we should have |loc<last|.
1062
1063\yskip\noindent(It may be necessary to prompt the user several times
1064before a non-blank line comes in. The prompt is `\.{**}' instead of the
1065later `\.*' because the meaning is slightly different: `\.{\\input}' need
1066not be typed immediately after~`\.{**}'.)
1067
1068@d loc==cur_input.loc_field {location of first unread character in |buffer|}
1069
1070@ The following program does the required initialization
1071without retrieving a possible command line.
1072It should be clear how to modify this routine to deal with command lines,
1073if the system permits them.
1074@^system dependencies@>
1075
1076@p function init_terminal:boolean; {gets the terminal input started}
1077label exit;
1078begin t_open_in;
1079loop@+begin wake_up_terminal; write(term_out,'**'); update_terminal;
1080@.**@>
1081  if not input_ln(term_in,true) then {this shouldn't happen}
1082    begin write_ln(term_out);
1083    write(term_out,'! End of file on the terminal... why?');
1084@.End of file on the terminal@>
1085    init_terminal:=false; return;
1086    end;
1087  loc:=first;
1088  while (loc<last)and(buffer[loc]=" ") do incr(loc);
1089  if loc<last then
1090    begin init_terminal:=true;
1091    return; {return unless the line was all blank}
1092    end;
1093  write_ln(term_out,'Please type the name of your input file.');
1094  end;
1095exit:end;
1096
1097@* \[4] String handling.
1098Control sequence names and diagnostic messages are variable-length strings
1099of eight-bit characters. Since \PASCAL\ does not have a well-developed string
1100mechanism, \TeX\ does all of its string processing by homegrown methods.
1101
1102Elaborate facilities for dynamic strings are not needed, so all of the
1103necessary operations can be handled with a simple data structure.
1104The array |str_pool| contains all of the (eight-bit) ASCII codes in all
1105of the strings, and the array |str_start| contains indices of the starting
1106points of each string. Strings are referred to by integer numbers, so that
1107string number |s| comprises the characters |str_pool[j]| for
1108|str_start[s]<=j<str_start[s+1]|. Additional integer variables
1109|pool_ptr| and |str_ptr| indicate the number of entries used so far
1110in |str_pool| and |str_start|, respectively; locations
1111|str_pool[pool_ptr]| and |str_start[str_ptr]| are
1112ready for the next string to be allocated.
1113
1114String numbers 0 to 255 are reserved for strings that correspond to single
1115ASCII characters. This is in accordance with the conventions of \.{WEB},
1116@.WEB@>
1117which converts single-character strings into the ASCII code number of the
1118single character involved, while it converts other strings into integers
1119and builds a string pool file. Thus, when the string constant \.{"."} appears
1120in the program below, \.{WEB} converts it into the integer 46, which is the
1121ASCII code for a period, while \.{WEB} will convert a string like \.{"hello"}
1122into some integer greater than~255. String number 46 will presumably be the
1123single character `\..'; but some ASCII codes have no standard visible
1124representation, and \TeX\ sometimes needs to be able to print an arbitrary
1125ASCII character, so the first 256 strings are used to specify exactly what
1126should be printed for each of the 256 possibilities.
1127
1128Elements of the |str_pool| array must be ASCII codes that can actually
1129be printed; i.e., they must have an |xchr| equivalent in the local
1130character set. (This restriction applies only to preloaded strings,
1131not to those generated dynamically by the user.)
1132
1133Some \PASCAL\ compilers won't pack integers into a single byte unless the
1134integers lie in the range |-128..127|. To accommodate such systems
1135we access the string pool only via macros that can easily be redefined.
1136@^system dependencies@>
1137
1138@d si(#) == # {convert from |ASCII_code| to |packed_ASCII_code|}
1139@d so(#) == # {convert from |packed_ASCII_code| to |ASCII_code|}
1140
1141@<Types...@>=
1142@!pool_pointer = 0..pool_size; {for variables that point into |str_pool|}
1143@!str_number = 0..max_strings; {for variables that point into |str_start|}
1144@!packed_ASCII_code = 0..255; {elements of |str_pool| array}
1145
1146@ @<Glob...@>=
1147@!str_pool:packed array[pool_pointer] of packed_ASCII_code; {the characters}
1148@!str_start : array[str_number] of pool_pointer; {the starting pointers}
1149@!pool_ptr : pool_pointer; {first unused position in |str_pool|}
1150@!str_ptr : str_number; {number of the current string being created}
1151@!init_pool_ptr : pool_pointer; {the starting value of |pool_ptr|}
1152@!init_str_ptr : str_number; {the starting value of |str_ptr|}
1153
1154@ Several of the elementary string operations are performed using \.{WEB}
1155macros instead of \PASCAL\ procedures, because many of the
1156operations are done quite frequently and we want to avoid the
1157overhead of procedure calls. For example, here is
1158a simple macro that computes the length of a string.
1159@.WEB@>
1160
1161@d length(#)==(str_start[#+1]-str_start[#]) {the number of characters
1162  in string number \#}
1163
1164@ The length of the current string is called |cur_length|:
1165
1166@d cur_length == (pool_ptr - str_start[str_ptr])
1167
1168@ Strings are created by appending character codes to |str_pool|.
1169The |append_char| macro, defined here, does not check to see if the
1170value of |pool_ptr| has gotten too high; this test is supposed to be
1171made before |append_char| is used. There is also a |flush_char|
1172macro, which erases the last character appended.
1173
1174To test if there is room to append |l| more characters to |str_pool|,
1175we shall write |str_room(l)|, which aborts \TeX\ and gives an
1176apologetic error message if there isn't enough room.
1177
1178@d append_char(#) == {put |ASCII_code| \# at the end of |str_pool|}
1179begin str_pool[pool_ptr]:=si(#); incr(pool_ptr);
1180end
1181@d flush_char == decr(pool_ptr) {forget the last character in the pool}
1182@d str_room(#) == {make sure that the pool hasn't overflowed}
1183  begin if pool_ptr+# > pool_size then
1184  overflow("pool size",pool_size-init_pool_ptr);
1185@:TeX capacity exceeded pool size}{\quad pool size@>
1186  end
1187
1188@ Once a sequence of characters has been appended to |str_pool|, it
1189officially becomes a string when the function |make_string| is called.
1190This function returns the identification number of the new string as its
1191value.
1192
1193@p function make_string : str_number; {current string enters the pool}
1194begin if str_ptr=max_strings then
1195  overflow("number of strings",max_strings-init_str_ptr);
1196@:TeX capacity exceeded number of strings}{\quad number of strings@>
1197incr(str_ptr); str_start[str_ptr]:=pool_ptr;
1198make_string:=str_ptr-1;
1199end;
1200
1201@ To destroy the most recently made string, we say |flush_string|.
1202
1203@d flush_string==begin decr(str_ptr); pool_ptr:=str_start[str_ptr];
1204  end
1205
1206@ The following subroutine compares string |s| with another string of the
1207same length that appears in |buffer| starting at position |k|;
1208the result is |true| if and only if the strings are equal.
1209Empirical tests indicate that |str_eq_buf| is used in such a way that
1210it tends to return |true| about 80 percent of the time.
1211
1212@p function str_eq_buf(@!s:str_number;@!k:integer):boolean;
1213  {test equality of strings}
1214label not_found; {loop exit}
1215var j: pool_pointer; {running index}
1216@!result: boolean; {result of comparison}
1217begin j:=str_start[s];
1218while j<str_start[s+1] do
1219  begin if so(str_pool[j])<>buffer[k] then
1220    begin result:=false; goto not_found;
1221    end;
1222  incr(j); incr(k);
1223  end;
1224result:=true;
1225not_found: str_eq_buf:=result;
1226end;
1227
1228@ Here is a similar routine, but it compares two strings in the string pool,
1229and it does not assume that they have the same length.
1230
1231@p function str_eq_str(@!s,@!t:str_number):boolean;
1232  {test equality of strings}
1233label not_found; {loop exit}
1234var j,@!k: pool_pointer; {running indices}
1235@!result: boolean; {result of comparison}
1236begin result:=false;
1237if length(s)<>length(t) then goto not_found;
1238j:=str_start[s]; k:=str_start[t];
1239while j<str_start[s+1] do
1240  begin if str_pool[j]<>str_pool[k] then goto not_found;
1241  incr(j); incr(k);
1242  end;
1243result:=true;
1244not_found: str_eq_str:=result;
1245end;
1246
1247@ The initial values of |str_pool|, |str_start|, |pool_ptr|,
1248and |str_ptr| are computed by the \.{INITEX} program, based in part
1249on the information that \.{WEB} has output while processing \TeX.
1250@.INITEX@>
1251@^string pool@>
1252
1253@p @!init function get_strings_started:boolean; {initializes the string pool,
1254  but returns |false| if something goes wrong}
1255label done,exit;
1256var k,@!l:0..255; {small indices or counters}
1257@!m,@!n:text_char; {characters input from |pool_file|}
1258@!g:str_number; {garbage}
1259@!a:integer; {accumulator for check sum}
1260@!c:boolean; {check sum has been checked}
1261begin pool_ptr:=0; str_ptr:=0; str_start[0]:=0;
1262@<Make the first 256 strings@>;
1263@<Read the other strings from the \.{TEX.POOL} file and return |true|,
1264  or give an error message and return |false|@>;
1265exit:end;
1266tini
1267
1268@ @d app_lc_hex(#)==l:=#;
1269  if l<10 then append_char(l+"0")@+else append_char(l-10+"a")
1270
1271@<Make the first 256...@>=
1272for k:=0 to 255 do
1273  begin if (@<Character |k| cannot be printed@>) then
1274    begin append_char("^"); append_char("^");
1275    if k<@'100 then append_char(k+@'100)
1276    else if k<@'200 then append_char(k-@'100)
1277    else begin app_lc_hex(k div 16); app_lc_hex(k mod 16);
1278      end;
1279    end
1280  else append_char(k);
1281  g:=make_string;
1282  end
1283
1284@ The first 128 strings will contain 95 standard ASCII characters, and the
1285other 33 characters will be printed in three-symbol form like `\.{\^\^A}'
1286unless a system-dependent change is made here. Installations that have
1287an extended character set, where for example |xchr[@'32]=@t\.{\'^^Z\'}@>|,
1288would like string @'32 to be the single character @'32 instead of the
1289three characters @'136, @'136, @'132 (\.{\^\^Z}). On the other hand,
1290even people with an extended character set will want to represent string
1291@'15 by \.{\^\^M}, since @'15 is |carriage_return|; the idea is to
1292produce visible strings instead of tabs or line-feeds or carriage-returns
1293or bell-rings or characters that are treated anomalously in text files.
1294
1295Unprintable characters of codes 128--255 are, similarly, rendered
1296\.{\^\^80}--\.{\^\^ff}.
1297
1298The boolean expression defined here should be |true| unless \TeX\
1299internal code number~|k| corresponds to a non-troublesome visible
1300symbol in the local character set.  An appropriate formula for the
1301extended character set recommended in {\sl The \TeX book\/} would, for
1302example, be `|k in [0,@'10..@'12,@'14,@'15,@'33,@'177..@'377]|'.
1303If character |k| cannot be printed, and |k<@'200|, then character |k+@'100| or
1304|k-@'100| must be printable; moreover, ASCII codes |[@'41..@'46,
1305@'60..@'71, @'136, @'141..@'146, @'160..@'171]| must be printable.
1306Thus, at least 81 printable characters are needed.
1307@:TeXbook}{\sl The \TeX book@>
1308@^character set dependencies@>
1309@^system dependencies@>
1310
1311@<Character |k| cannot be printed@>=
1312  (k<" ")or(k>"~")
1313
1314@ When the \.{WEB} system program called \.{TANGLE} processes the \.{TEX.WEB}
1315description that you are now reading, it outputs the \PASCAL\ program
1316\.{TEX.PAS} and also a string pool file called \.{TEX.POOL}. The \.{INITEX}
1317@.WEB@>@.INITEX@>
1318program reads the latter file, where each string appears as a two-digit decimal
1319length followed by the string itself, and the information is recorded in
1320\TeX's string memory.
1321
1322@<Glob...@>=
1323@!init @!pool_file:alpha_file; {the string-pool file output by \.{TANGLE}}
1324tini
1325
1326@ @d bad_pool(#)==begin wake_up_terminal; write_ln(term_out,#);
1327  a_close(pool_file); get_strings_started:=false; return;
1328  end
1329@<Read the other strings...@>=
1330name_of_file:=pool_name; {we needn't set |name_length|}
1331if a_open_in(pool_file) then
1332  begin c:=false;
1333  repeat @<Read one string, but return |false| if the
1334    string memory space is getting too tight for comfort@>;
1335  until c;
1336  a_close(pool_file); get_strings_started:=true;
1337  end
1338else  bad_pool('! I can''t read TEX.POOL.')
1339@.I can't read TEX.POOL@>
1340
1341@ @<Read one string...@>=
1342begin if eof(pool_file) then bad_pool('! TEX.POOL has no check sum.');
1343@.TEX.POOL has no check sum@>
1344read(pool_file,m,n); {read two digits of string length}
1345if m='*' then @<Check the pool check sum@>
1346else  begin if (xord[m]<"0")or(xord[m]>"9")or@|
1347      (xord[n]<"0")or(xord[n]>"9") then
1348    bad_pool('! TEX.POOL line doesn''t begin with two digits.');
1349@.TEX.POOL line doesn't...@>
1350  l:=xord[m]*10+xord[n]-"0"*11; {compute the length}
1351  if pool_ptr+l+string_vacancies>pool_size then
1352    bad_pool('! You have to increase POOLSIZE.');
1353@.You have to increase POOLSIZE@>
1354  for k:=1 to l do
1355    begin if eoln(pool_file) then m:=' '@+else read(pool_file,m);
1356    append_char(xord[m]);
1357    end;
1358  read_ln(pool_file); g:=make_string;
1359  end;
1360end
1361
1362@ The \.{WEB} operation \.{@@\$} denotes the value that should be at the
1363end of this \.{TEX.POOL} file; any other value means that the wrong pool
1364file has been loaded.
1365@^check sum@>
1366
1367@<Check the pool check sum@>=
1368begin a:=0; k:=1;
1369loop@+  begin if (xord[n]<"0")or(xord[n]>"9") then
1370  bad_pool('! TEX.POOL check sum doesn''t have nine digits.');
1371@.TEX.POOL check sum...@>
1372  a:=10*a+xord[n]-"0";
1373  if k=9 then goto done;
1374  incr(k); read(pool_file,n);
1375  end;
1376done: if a<>@$ then bad_pool('! TEX.POOL doesn''t match; TANGLE me again.');
1377@.TEX.POOL doesn't match@>
1378c:=true;
1379end
1380
1381@* \[5] On-line and off-line printing.
1382Messages that are sent to a user's terminal and to the transcript-log file
1383are produced by several `|print|' procedures. These procedures will
1384direct their output to a variety of places, based on the setting of
1385the global variable |selector|, which has the following possible
1386values:
1387
1388\yskip
1389\hang |term_and_log|, the normal setting, prints on the terminal and on the
1390  transcript file.
1391
1392\hang |log_only|, prints only on the transcript file.
1393
1394\hang |term_only|, prints only on the terminal.
1395
1396\hang |no_print|, doesn't print at all. This is used only in rare cases
1397  before the transcript file is open.
1398
1399\hang |pseudo|, puts output into a cyclic buffer that is used
1400  by the |show_context| routine; when we get to that routine we shall discuss
1401  the reasoning behind this curious mode.
1402
1403\hang |new_string|, appends the output to the current string in the
1404  string pool.
1405
1406\hang 0 to 15, prints on one of the sixteen files for \.{\\write} output.
1407
1408\yskip
1409\noindent The symbolic names `|term_and_log|', etc., have been assigned
1410numeric codes that satisfy the convenient relations |no_print+1=term_only|,
1411|no_print+2=log_only|, |term_only+2=log_only+1=term_and_log|.
1412
1413Three additional global variables, |tally| and |term_offset| and
1414|file_offset|, record the number of characters that have been printed
1415since they were most recently cleared to zero. We use |tally| to record
1416the length of (possibly very long) stretches of printing; |term_offset|
1417and |file_offset|, on the other hand, keep track of how many characters
1418have appeared so far on the current line that has been output to the
1419terminal or to the transcript file, respectively.
1420
1421@d no_print=16 {|selector| setting that makes data disappear}
1422@d term_only=17 {printing is destined for the terminal only}
1423@d log_only=18 {printing is destined for the transcript file only}
1424@d term_and_log=19 {normal |selector| setting}
1425@d pseudo=20 {special |selector| setting for |show_context|}
1426@d new_string=21 {printing is deflected to the string pool}
1427@d max_selector=21 {highest selector setting}
1428
1429@<Glob...@>=
1430@!log_file : alpha_file; {transcript of \TeX\ session}
1431@!selector : 0..max_selector; {where to print a message}
1432@!dig : array[0..22] of 0..15; {digits in a number being output}
1433@!tally : integer; {the number of characters recently printed}
1434@!term_offset : 0..max_print_line;
1435  {the number of characters on the current terminal line}
1436@!file_offset : 0..max_print_line;
1437  {the number of characters on the current file line}
1438@!trick_buf:array[0..error_line] of ASCII_code; {circular buffer for
1439  pseudoprinting}
1440@!trick_count: integer; {threshold for pseudoprinting, explained later}
1441@!first_count: integer; {another variable for pseudoprinting}
1442
1443@ @<Initialize the output routines@>=
1444selector:=term_only; tally:=0; term_offset:=0; file_offset:=0;
1445
1446@ Macro abbreviations for output to the terminal and to the log file are
1447defined here for convenience. Some systems need special conventions
1448for terminal output, and it is possible to adhere to those conventions
1449by changing |wterm|, |wterm_ln|, and |wterm_cr| in this section.
1450@^system dependencies@>
1451
1452@d wterm(#)==write(term_out,#)
1453@d wterm_ln(#)==write_ln(term_out,#)
1454@d wterm_cr==write_ln(term_out)
1455@d wlog(#)==write(log_file,#)
1456@d wlog_ln(#)==write_ln(log_file,#)
1457@d wlog_cr==write_ln(log_file)
1458
1459@ To end a line of text output, we call |print_ln|.
1460
1461@<Basic print...@>=
1462procedure print_ln; {prints an end-of-line}
1463begin case selector of
1464term_and_log: begin wterm_cr; wlog_cr;
1465  term_offset:=0; file_offset:=0;
1466  end;
1467log_only: begin wlog_cr; file_offset:=0;
1468  end;
1469term_only: begin wterm_cr; term_offset:=0;
1470  end;
1471no_print,pseudo,new_string: do_nothing;
1472othercases write_ln(write_file[selector])
1473endcases;@/
1474end; {|tally| is not affected}
1475
1476@ The |print_char| procedure sends one character to the desired destination,
1477using the |xchr| array to map it into an external character compatible with
1478|input_ln|. All printing comes through |print_ln| or |print_char|.
1479
1480@<Basic printing...@>=
1481procedure print_char(@!s:ASCII_code); {prints a single character}
1482label exit;
1483begin if @<Character |s| is the current new-line character@> then
1484 if selector<pseudo then
1485  begin print_ln; return;
1486  end;
1487case selector of
1488term_and_log: begin wterm(xchr[s]); wlog(xchr[s]);
1489  incr(term_offset); incr(file_offset);
1490  if term_offset=max_print_line then
1491    begin wterm_cr; term_offset:=0;
1492    end;
1493  if file_offset=max_print_line then
1494    begin wlog_cr; file_offset:=0;
1495    end;
1496  end;
1497log_only: begin wlog(xchr[s]); incr(file_offset);
1498  if file_offset=max_print_line then print_ln;
1499  end;
1500term_only: begin wterm(xchr[s]); incr(term_offset);
1501  if term_offset=max_print_line then print_ln;
1502  end;
1503no_print: do_nothing;
1504pseudo: if tally<trick_count then trick_buf[tally mod error_line]:=s;
1505new_string: begin if pool_ptr<pool_size then append_char(s);
1506  end; {we drop characters if the string space is full}
1507othercases write(write_file[selector],xchr[s])
1508endcases;@/
1509incr(tally);
1510exit:end;
1511
1512@ An entire string is output by calling |print|. Note that if we are outputting
1513the single standard ASCII character \.c, we could call |print("c")|, since
1514|"c"=99| is the number of a single-character string, as explained above. But
1515|print_char("c")| is quicker, so \TeX\ goes directly to the |print_char|
1516routine when it knows that this is safe. (The present implementation
1517assumes that it is always safe to print a visible ASCII character.)
1518@^system dependencies@>
1519
1520@<Basic print...@>=
1521procedure print(@!s:integer); {prints string |s|}
1522label exit;
1523var j:pool_pointer; {current character code position}
1524@!nl:integer; {new-line character to restore}
1525begin if s>=str_ptr then s:="???" {this can't happen}
1526@.???@>
1527else if s<256 then
1528  if s<0 then s:="???" {can't happen}
1529  else begin if selector>pseudo then
1530      begin print_char(s); return; {internal strings are not expanded}
1531      end;
1532    if (@<Character |s| is the current new-line character@>) then
1533      if selector<pseudo then
1534        begin print_ln; return;
1535        end;
1536    nl:=new_line_char; new_line_char:=-1;
1537      {temporarily disable new-line character}
1538    j:=str_start[s];
1539    while j<str_start[s+1] do
1540      begin print_char(so(str_pool[j])); incr(j);
1541      end;
1542    new_line_char:=nl; return;
1543    end;
1544j:=str_start[s];
1545while j<str_start[s+1] do
1546  begin print_char(so(str_pool[j])); incr(j);
1547  end;
1548exit:end;
1549
1550@ Control sequence names, file names, and strings constructed with
1551\.{\\string} might contain |ASCII_code| values that can't
1552be printed using |print_char|. Therefore we use |slow_print| for them:
1553
1554@<Basic print...@>=
1555procedure slow_print(@!s:integer); {prints string |s|}
1556var j:pool_pointer; {current character code position}
1557begin if (s>=str_ptr) or (s<256) then print(s)
1558else begin j:=str_start[s];
1559  while j<str_start[s+1] do
1560    begin print(so(str_pool[j])); incr(j);
1561    end;
1562  end;
1563end;
1564
1565@ Here is the very first thing that \TeX\ prints: a headline that identifies
1566the version number and format package. The |term_offset| variable is temporarily
1567incorrect, but the discrepancy is not serious since we assume that the banner
1568and format identifier together will occupy at most |max_print_line|
1569character positions.
1570
1571@<Initialize the output...@>=
1572wterm(banner);
1573if format_ident=0 then wterm_ln(' (no format preloaded)')
1574else  begin slow_print(format_ident); print_ln;
1575  end;
1576update_terminal;
1577
1578@ The procedure |print_nl| is like |print|, but it makes sure that the
1579string appears at the beginning of a new line.
1580
1581@<Basic print...@>=
1582procedure print_nl(@!s:str_number); {prints string |s| at beginning of line}
1583begin if ((term_offset>0)and(odd(selector)))or@|
1584  ((file_offset>0)and(selector>=log_only)) then print_ln;
1585print(s);
1586end;
1587
1588@ The procedure |print_esc| prints a string that is preceded by
1589the user's escape character (which is usually a backslash).
1590
1591@<Basic print...@>=
1592procedure print_esc(@!s:str_number); {prints escape character, then |s|}
1593var c:integer; {the escape character code}
1594begin  @<Set variable |c| to the current escape character@>;
1595if c>=0 then if c<256 then print(c);
1596slow_print(s);
1597end;
1598
1599@ An array of digits in the range |0..15| is printed by |print_the_digs|.
1600
1601@<Basic print...@>=
1602procedure print_the_digs(@!k:eight_bits);
1603  {prints |dig[k-1]|$\,\ldots\,$|dig[0]|}
1604begin while k>0 do
1605  begin decr(k);
1606  if dig[k]<10 then print_char("0"+dig[k])
1607  else print_char("A"-10+dig[k]);
1608  end;
1609end;
1610
1611@ The following procedure, which prints out the decimal representation of a
1612given integer |n|, has been written carefully so that it works properly
1613if |n=0| or if |(-n)| would cause overflow. It does not apply |mod| or |div|
1614to negative arguments, since such operations are not implemented consistently
1615by all \PASCAL\ compilers.
1616
1617@<Basic print...@>=
1618procedure print_int(@!n:integer); {prints an integer in decimal form}
1619var k:0..23; {index to current digit; we assume that $|n|<10^{23}$}
1620@!m:integer; {used to negate |n| in possibly dangerous cases}
1621begin k:=0;
1622if n<0 then
1623  begin print_char("-");
1624  if n>-100000000 then negate(n)
1625  else  begin m:=-1-n; n:=m div 10; m:=(m mod 10)+1; k:=1;
1626    if m<10 then dig[0]:=m
1627    else  begin dig[0]:=0; incr(n);
1628      end;
1629    end;
1630  end;
1631repeat dig[k]:=n mod 10; n:=n div 10; incr(k);
1632until n=0;
1633print_the_digs(k);
1634end;
1635
1636@ Here is a trivial procedure to print two digits; it is usually called with
1637a parameter in the range |0<=n<=99|.
1638
1639@p procedure print_two(@!n:integer); {prints two least significant digits}
1640begin n:=abs(n) mod 100; print_char("0"+(n div 10));
1641print_char("0"+(n mod 10));
1642end;
1643
1644@ Hexadecimal printing of nonnegative integers is accomplished by |print_hex|.
1645
1646@p procedure print_hex(@!n:integer);
1647  {prints a positive integer in hexadecimal form}
1648var k:0..22; {index to current digit; we assume that $0\L n<16^{22}$}
1649begin k:=0; print_char("""");
1650repeat dig[k]:=n mod 16; n:=n div 16; incr(k);
1651until n=0;
1652print_the_digs(k);
1653end;
1654
1655@ Old versions of \TeX\ needed a procedure called |print_ASCII| whose function
1656is now subsumed by |print|. We retain the old name here as a possible aid to
1657future software arch\ae ologists.
1658
1659@d print_ASCII == print
1660
1661@ Roman numerals are produced by the |print_roman_int| routine.  Readers
1662who like puzzles might enjoy trying to figure out how this tricky code
1663works; therefore no explanation will be given. Notice that 1990 yields
1664\.{mcmxc}, not \.{mxm}.
1665
1666@p procedure print_roman_int(@!n:integer);
1667label exit;
1668var j,@!k: pool_pointer; {mysterious indices into |str_pool|}
1669@!u,@!v: nonnegative_integer; {mysterious numbers}
1670begin j:=str_start["m2d5c2l5x2v5i"]; v:=1000;
1671loop@+  begin while n>=v do
1672    begin print_char(so(str_pool[j])); n:=n-v;
1673    end;
1674  if n<=0 then return; {nonpositive input produces no output}
1675  k:=j+2; u:=v div (so(str_pool[k-1])-"0");
1676  if str_pool[k-1]=si("2") then
1677    begin k:=k+2; u:=u div (so(str_pool[k-1])-"0");
1678    end;
1679  if n+u>=v then
1680    begin print_char(so(str_pool[k])); n:=n+u;
1681    end
1682  else  begin j:=j+2; v:=v div (so(str_pool[j-1])-"0");
1683    end;
1684  end;
1685exit:end;
1686
1687@ The |print| subroutine will not print a string that is still being
1688created. The following procedure will.
1689
1690@p procedure print_current_string; {prints a yet-unmade string}
1691var j:pool_pointer; {points to current character code}
1692begin j:=str_start[str_ptr];
1693while j<pool_ptr do
1694  begin print_char(so(str_pool[j])); incr(j);
1695  end;
1696end;
1697
1698@ Here is a procedure that asks the user to type a line of input,
1699assuming that the |selector| setting is either |term_only| or |term_and_log|.
1700The input is placed into locations |first| through |last-1| of the
1701|buffer| array, and echoed on the transcript file if appropriate.
1702
1703This procedure is never called when |interaction<scroll_mode|.
1704
1705@d prompt_input(#)==begin wake_up_terminal; print(#); term_input;
1706    end {prints a string and gets a line of input}
1707
1708@p procedure term_input; {gets a line from the terminal}
1709var k:0..buf_size; {index into |buffer|}
1710begin update_terminal; {now the user sees the prompt for sure}
1711if not input_ln(term_in,true) then fatal_error("End of file on the terminal!");
1712@.End of file on the terminal@>
1713term_offset:=0; {the user's line ended with \<\rm return>}
1714decr(selector); {prepare to echo the input}
1715if last<>first then for k:=first to last-1 do print(buffer[k]);
1716print_ln; incr(selector); {restore previous status}
1717end;
1718
1719@* \[6] Reporting errors.
1720When something anomalous is detected, \TeX\ typically does something like this:
1721$$\vbox{\halign{#\hfil\cr
1722|print_err("Something anomalous has been detected");|\cr
1723|help3("This is the first line of my offer to help.")|\cr
1724|("This is the second line. I'm trying to")|\cr
1725|("explain the best way for you to proceed.");|\cr
1726|error;|\cr}}$$
1727A two-line help message would be given using |help2|, etc.; these informal
1728helps should use simple vocabulary that complements the words used in the
1729official error message that was printed. (Outside the U.S.A., the help
1730messages should preferably be translated into the local vernacular. Each
1731line of help is at most 60 characters long, in the present implementation,
1732so that |max_print_line| will not be exceeded.)
1733
1734The |print_err| procedure supplies a `\.!' before the official message,
1735and makes sure that the terminal is awake if a stop is going to occur.
1736The |error| procedure supplies a `\..' after the official message, then it
1737shows the location of the error; and if |interaction=error_stop_mode|,
1738it also enters into a dialog with the user, during which time the help
1739message may be printed.
1740@^system dependencies@>
1741
1742@ The global variable |interaction| has four settings, representing increasing
1743amounts of user interaction:
1744
1745@d batch_mode=0 {omits all stops and omits terminal output}
1746@d nonstop_mode=1 {omits all stops}
1747@d scroll_mode=2 {omits error stops}
1748@d error_stop_mode=3 {stops at every opportunity to interact}
1749@d print_err(#)==begin if interaction=error_stop_mode then wake_up_terminal;
1750  print_nl("! "); print(#);
1751  end
1752
1753@<Glob...@>=
1754@!interaction:batch_mode..error_stop_mode; {current level of interaction}
1755
1756@ @<Set init...@>=interaction:=error_stop_mode;
1757
1758@ \TeX\ is careful not to call |error| when the print |selector| setting
1759might be unusual. The only possible values of |selector| at the time of
1760error messages are
1761
1762\yskip\hang|no_print| (when |interaction=batch_mode|
1763  and |log_file| not yet open);
1764
1765\hang|term_only| (when |interaction>batch_mode| and |log_file| not yet open);
1766
1767\hang|log_only| (when |interaction=batch_mode| and |log_file| is open);
1768
1769\hang|term_and_log| (when |interaction>batch_mode| and |log_file| is open).
1770
1771@<Initialize the print |selector| based on |interaction|@>=
1772if interaction=batch_mode then selector:=no_print@+else selector:=term_only
1773
1774@ A global variable |deletions_allowed| is set |false| if the |get_next|
1775routine is active when |error| is called; this ensures that |get_next|
1776and related routines like |get_token| will never be called recursively.
1777A similar interlock is provided by |set_box_allowed|.
1778@^recursion@>
1779
1780The global variable |history| records the worst level of error that
1781has been detected. It has four possible values: |spotless|, |warning_issued|,
1782|error_message_issued|, and |fatal_error_stop|.
1783
1784Another global variable, |error_count|, is increased by one when an
1785|error| occurs without an interactive dialog, and it is reset to zero at
1786the end of every paragraph.  If |error_count| reaches 100, \TeX\ decides
1787that there is no point in continuing further.
1788
1789@d spotless=0 {|history| value when nothing has been amiss yet}
1790@d warning_issued=1 {|history| value when |begin_diagnostic| has been called}
1791@d error_message_issued=2 {|history| value when |error| has been called}
1792@d fatal_error_stop=3 {|history| value when termination was premature}
1793
1794@<Glob...@>=
1795@!deletions_allowed:boolean; {is it safe for |error| to call |get_token|?}
1796@!set_box_allowed:boolean; {is it safe to do a \.{\\setbox} assignment?}
1797@!history:spotless..fatal_error_stop; {has the source input been clean so far?}
1798@!error_count:-1..100; {the number of scrolled errors since the
1799  last paragraph ended}
1800
1801@ The value of |history| is initially |fatal_error_stop|, but it will
1802be changed to |spotless| if \TeX\ survives the initialization process.
1803
1804@<Set init...@>=
1805deletions_allowed:=true; set_box_allowed:=true;
1806error_count:=0; {|history| is initialized elsewhere}
1807
1808@ Since errors can be detected almost anywhere in \TeX, we want to declare the
1809error procedures near the beginning of the program. But the error procedures
1810in turn use some other procedures, which need to be declared |forward|
1811before we get to |error| itself.
1812
1813It is possible for |error| to be called recursively if some error arises
1814when |get_token| is being used to delete a token, and/or if some fatal error
1815occurs while \TeX\ is trying to fix a non-fatal one. But such recursion
1816@^recursion@>
1817is never more than two levels deep.
1818
1819@<Error handling...@>=
1820procedure@?normalize_selector; forward;@t\2@>@/
1821procedure@?get_token; forward;@t\2@>@/
1822procedure@?term_input; forward;@t\2@>@/
1823procedure@?show_context; forward;@t\2@>@/
1824procedure@?begin_file_reading; forward;@t\2@>@/
1825procedure@?open_log_file; forward;@t\2@>@/
1826procedure@?close_files_and_terminate; forward;@t\2@>@/
1827procedure@?clear_for_error_prompt; forward;@t\2@>@/
1828procedure@?give_err_help; forward;@t\2@>@/
1829@t\4\hskip-\fontdimen2\font@>@;@+@!debug@+procedure@?debug_help;
1830  forward;@;@+gubed
1831
1832@ Individual lines of help are recorded in the array |help_line|, which
1833contains entries in positions |0..(help_ptr-1)|. They should be printed
1834in reverse order, i.e., with |help_line[0]| appearing last.
1835
1836@d hlp1(#)==help_line[0]:=#;@+end
1837@d hlp2(#)==help_line[1]:=#; hlp1
1838@d hlp3(#)==help_line[2]:=#; hlp2
1839@d hlp4(#)==help_line[3]:=#; hlp3
1840@d hlp5(#)==help_line[4]:=#; hlp4
1841@d hlp6(#)==help_line[5]:=#; hlp5
1842@d help0==help_ptr:=0 {sometimes there might be no help}
1843@d help1==@+begin help_ptr:=1; hlp1 {use this with one help line}
1844@d help2==@+begin help_ptr:=2; hlp2 {use this with two help lines}
1845@d help3==@+begin help_ptr:=3; hlp3 {use this with three help lines}
1846@d help4==@+begin help_ptr:=4; hlp4 {use this with four help lines}
1847@d help5==@+begin help_ptr:=5; hlp5 {use this with five help lines}
1848@d help6==@+begin help_ptr:=6; hlp6 {use this with six help lines}
1849
1850@<Glob...@>=
1851@!help_line:array[0..5] of str_number; {helps for the next |error|}
1852@!help_ptr:0..6; {the number of help lines present}
1853@!use_err_help:boolean; {should the |err_help| list be shown?}
1854
1855@ @<Set init...@>=
1856help_ptr:=0; use_err_help:=false;
1857
1858@ The |jump_out| procedure just cuts across all active procedure levels and
1859goes to |end_of_TEX|. This is the only nontrivial |@!goto| statement in the
1860whole program. It is used when there is no recovery from a particular error.
1861
1862Some \PASCAL\ compilers do not implement non-local |goto| statements.
1863@^system dependencies@>
1864In such cases the body of |jump_out| should simply be
1865`|close_files_and_terminate|;\thinspace' followed by a call on some system
1866procedure that quietly terminates the program.
1867
1868@<Error hand...@>=
1869procedure jump_out;
1870begin goto end_of_TEX;
1871end;
1872
1873@ Here now is the general |error| routine.
1874
1875@<Error hand...@>=
1876procedure error; {completes the job of error reporting}
1877label continue,exit;
1878var c:ASCII_code; {what the user types}
1879@!s1,@!s2,@!s3,@!s4:integer;
1880  {used to save global variables when deleting tokens}
1881begin if history<error_message_issued then history:=error_message_issued;
1882print_char("."); show_context;
1883if interaction=error_stop_mode then @<Get user's advice and |return|@>;
1884incr(error_count);
1885if error_count=100 then
1886  begin print_nl("(That makes 100 errors; please try again.)");
1887@.That makes 100 errors...@>
1888  history:=fatal_error_stop; jump_out;
1889  end;
1890@<Put help message on the transcript file@>;
1891exit:end;
1892
1893@ @<Get user's advice...@>=
1894loop@+begin continue: clear_for_error_prompt; prompt_input("? ");
1895@.?\relax@>
1896  if last=first then return;
1897  c:=buffer[first];
1898  if c>="a" then c:=c+"A"-"a"; {convert to uppercase}
1899  @<Interpret code |c| and |return| if done@>;
1900  end
1901
1902@ It is desirable to provide an `\.E' option here that gives the user
1903an easy way to return from \TeX\ to the system editor, with the offending
1904line ready to be edited. But such an extension requires some system
1905wizardry, so the present implementation simply types out the name of the
1906file that should be
1907edited and the relevant line number.
1908@^system dependencies@>
1909
1910There is a secret `\.D' option available when the debugging routines haven't
1911been commented~out.
1912@^debugging@>
1913
1914@<Interpret code |c| and |return| if done@>=
1915case c of
1916"0","1","2","3","4","5","6","7","8","9": if deletions_allowed then
1917  @<Delete \(c)|c-"0"| tokens and |goto continue|@>;
1918@t\4\4@>@;@+@!debug "D": begin debug_help; goto continue;@+end;@+gubed@/
1919"E": if base_ptr>0 then
1920  begin print_nl("You want to edit file ");
1921@.You want to edit file x@>
1922  slow_print(input_stack[base_ptr].name_field);
1923  print(" at line "); print_int(line);
1924  interaction:=scroll_mode; jump_out;
1925  end;
1926"H": @<Print the help information and |goto continue|@>;
1927"I":@<Introduce new material from the terminal and |return|@>;
1928"Q","R","S":@<Change the interaction level and |return|@>;
1929"X":begin interaction:=scroll_mode; jump_out;
1930  end;
1931othercases do_nothing
1932endcases;@/
1933@<Print the menu of available options@>
1934
1935@ @<Print the menu...@>=
1936begin print("Type <return> to proceed, S to scroll future error messages,");@/
1937@.Type <return> to proceed...@>
1938print_nl("R to run without stopping, Q to run quietly,");@/
1939print_nl("I to insert something, ");
1940if base_ptr>0 then print("E to edit your file,");
1941if deletions_allowed then
1942  print_nl("1 or ... or 9 to ignore the next 1 to 9 tokens of input,");
1943print_nl("H for help, X to quit.");
1944end
1945
1946@ Here the author of \TeX\ apologizes for making use of the numerical
1947relation between |"Q"|, |"R"|, |"S"|, and the desired interaction settings
1948|batch_mode|, |nonstop_mode|, |scroll_mode|.
1949@^Knuth, Donald Ervin@>
1950
1951@<Change the interaction...@>=
1952begin error_count:=0; interaction:=batch_mode+c-"Q";
1953print("OK, entering ");
1954case c of
1955"Q":begin print_esc("batchmode"); decr(selector);
1956  end;
1957"R":print_esc("nonstopmode");
1958"S":print_esc("scrollmode");
1959end; {there are no other cases}
1960print("..."); print_ln; update_terminal; return;
1961end
1962
1963@ When the following code is executed, |buffer[(first+1)..(last-1)]| may
1964contain the material inserted by the user; otherwise another prompt will
1965be given. In order to understand this part of the program fully, you need
1966to be familiar with \TeX's input stacks.
1967
1968@<Introduce new material...@>=
1969begin begin_file_reading; {enter a new syntactic level for terminal input}
1970{now |state=mid_line|, so an initial blank space will count as a blank}
1971if last>first+1 then
1972  begin loc:=first+1; buffer[first]:=" ";
1973  end
1974else  begin prompt_input("insert>"); loc:=first;
1975@.insert>@>
1976  end;
1977first:=last;
1978cur_input.limit_field:=last-1; {no |end_line_char| ends this line}
1979return;
1980end
1981
1982@ We allow deletion of up to 99 tokens at a time.
1983
1984@<Delete \(c)|c-"0"| tokens...@>=
1985begin s1:=cur_tok; s2:=cur_cmd; s3:=cur_chr; s4:=align_state;
1986align_state:=1000000; OK_to_interrupt:=false;
1987if (last>first+1) and (buffer[first+1]>="0")and(buffer[first+1]<="9") then
1988  c:=c*10+buffer[first+1]-"0"*11
1989else c:=c-"0";
1990while c>0 do
1991  begin get_token; {one-level recursive call of |error| is possible}
1992  decr(c);
1993  end;
1994cur_tok:=s1; cur_cmd:=s2; cur_chr:=s3; align_state:=s4; OK_to_interrupt:=true;
1995help2("I have just deleted some text, as you asked.")@/
1996("You can now delete more, or insert, or whatever.");
1997show_context; goto continue;
1998end
1999
2000@ @<Print the help info...@>=
2001begin if use_err_help then
2002  begin give_err_help; use_err_help:=false;
2003  end
2004else  begin if help_ptr=0 then
2005    help2("Sorry, I don't know how to help in this situation.")@/
2006    @t\kern1em@>("Maybe you should try asking a human?");
2007  repeat decr(help_ptr); print(help_line[help_ptr]); print_ln;
2008  until help_ptr=0;
2009  end;
2010help4("Sorry, I already gave what help I could...")@/
2011  ("Maybe you should try asking a human?")@/
2012  ("An error might have occurred before I noticed any problems.")@/
2013  ("``If all else fails, read the instructions.''");@/
2014goto continue;
2015end
2016
2017@ @<Put help message on the transcript file@>=
2018if interaction>batch_mode then decr(selector); {avoid terminal output}
2019if use_err_help then
2020  begin print_ln; give_err_help;
2021  end
2022else while help_ptr>0 do
2023  begin decr(help_ptr); print_nl(help_line[help_ptr]);
2024  end;
2025print_ln;
2026if interaction>batch_mode then incr(selector); {re-enable terminal output}
2027print_ln
2028
2029@ A dozen or so error messages end with a parenthesized integer, so we
2030save a teeny bit of program space by declaring the following procedure:
2031
2032@p procedure int_error(@!n:integer);
2033begin print(" ("); print_int(n); print_char(")"); error;
2034end;
2035
2036@ In anomalous cases, the print selector might be in an unknown state;
2037the following subroutine is called to fix things just enough to keep
2038running a bit longer.
2039
2040@p procedure normalize_selector;
2041begin if log_opened then selector:=term_and_log
2042else selector:=term_only;
2043if job_name=0 then open_log_file;
2044if interaction=batch_mode then decr(selector);
2045end;
2046
2047@ The following procedure prints \TeX's last words before dying.
2048
2049@d succumb==begin if interaction=error_stop_mode then
2050    interaction:=scroll_mode; {no more interaction}
2051  if log_opened then error;
2052  @!debug if interaction>batch_mode then debug_help;@+gubed@;@/
2053  history:=fatal_error_stop; jump_out; {irrecoverable error}
2054  end
2055
2056@<Error hand...@>=
2057procedure fatal_error(@!s:str_number); {prints |s|, and that's it}
2058begin normalize_selector;@/
2059print_err("Emergency stop"); help1(s); succumb;
2060@.Emergency stop@>
2061end;
2062
2063@ Here is the most dreaded error message.
2064
2065@<Error hand...@>=
2066procedure overflow(@!s:str_number;@!n:integer); {stop due to finiteness}
2067begin normalize_selector;
2068print_err("TeX capacity exceeded, sorry [");
2069@.TeX capacity exceeded ...@>
2070print(s); print_char("="); print_int(n); print_char("]");
2071help2("If you really absolutely need more capacity,")@/
2072  ("you can ask a wizard to enlarge me.");
2073succumb;
2074end;
2075
2076@ The program might sometime run completely amok, at which point there is
2077no choice but to stop. If no previous error has been detected, that's bad
2078news; a message is printed that is really intended for the \TeX\
2079maintenance person instead of the user (unless the user has been
2080particularly diabolical).  The index entries for `this can't happen' may
2081help to pinpoint the problem.
2082@^dry rot@>
2083
2084@<Error hand...@>=
2085procedure confusion(@!s:str_number);
2086  {consistency check violated; |s| tells where}
2087begin normalize_selector;
2088if history<error_message_issued then
2089  begin print_err("This can't happen ("); print(s); print_char(")");
2090@.This can't happen@>
2091  help1("I'm broken. Please show this to someone who can fix can fix");
2092  end
2093else  begin print_err("I can't go on meeting you like this");
2094@.I can't go on...@>
2095  help2("One of your faux pas seems to have wounded me deeply...")@/
2096    ("in fact, I'm barely conscious. Please fix it and try again.");
2097  end;
2098succumb;
2099end;
2100
2101@ Users occasionally want to interrupt \TeX\ while it's running.
2102If the \PASCAL\ runtime system allows this, one can implement
2103a routine that sets the global variable |interrupt| to some nonzero value
2104when such an interrupt is signalled. Otherwise there is probably at least
2105a way to make |interrupt| nonzero using the \PASCAL\ debugger.
2106@^system dependencies@>
2107@^debugging@>
2108
2109@d check_interrupt==begin if interrupt<>0 then pause_for_instructions;
2110  end
2111
2112@<Global...@>=
2113@!interrupt:integer; {should \TeX\ pause for instructions?}
2114@!OK_to_interrupt:boolean; {should interrupts be observed?}
2115
2116@ @<Set init...@>=
2117interrupt:=0; OK_to_interrupt:=true;
2118
2119@ When an interrupt has been detected, the program goes into its
2120highest interaction level and lets the user have nearly the full flexibility of
2121the |error| routine.  \TeX\ checks for interrupts only at times when it is
2122safe to do this.
2123
2124@p procedure pause_for_instructions;
2125begin if OK_to_interrupt then
2126  begin interaction:=error_stop_mode;
2127  if (selector=log_only)or(selector=no_print) then
2128    incr(selector);
2129  print_err("Interruption");
2130@.Interruption@>
2131  help3("You rang?")@/
2132  ("Try to insert some instructions for me (e.g.,`I\showlists'),")@/
2133  ("unless you just want to quit by typing `X'.");
2134  deletions_allowed:=false; error; deletions_allowed:=true;
2135  interrupt:=0;
2136  end;
2137end;
2138
2139@* \[7] Arithmetic with scaled dimensions.
2140The principal computations performed by \TeX\ are done entirely in terms of
2141integers less than $2^{31}$ in magnitude; and divisions are done only when both
2142dividend and divisor are nonnegative. Thus, the arithmetic specified in this
2143program can be carried out in exactly the same way on a wide variety of
2144computers, including some small ones. Why? Because the arithmetic
2145calculations need to be spelled out precisely in order to guarantee that
2146\TeX\ will produce identical output on different machines. If some
2147quantities were rounded differently in different implementations, we would
2148find that line breaks and even page breaks might occur in different places.
2149Hence the arithmetic of \TeX\ has been designed with care, and systems that
2150claim to be implementations of \TeX82 should follow precisely the
2151@:TeX82}{\TeX82@>
2152calculations as they appear in the present program.
2153
2154(Actually there are three places where \TeX\ uses |div| with a possibly negative
2155numerator. These are harmless; see |div| in the index. Also if the user
2156sets the \.{\\time} or the \.{\\year} to a negative value, some diagnostic
2157information will involve negative-numerator division. The same remarks
2158apply for |mod| as well as for |div|.)
2159
2160@ Here is a routine that calculates half of an integer, using an
2161unambiguous convention with respect to signed odd numbers.
2162
2163@p function half(@!x:integer):integer;
2164begin if odd(x) then half:=(x+1) div 2
2165else half:=x @!div 2;
2166end;
2167
2168@ Fixed-point arithmetic is done on {\sl scaled integers\/} that are multiples
2169of $2^{-16}$. In other words, a binary point is assumed to be sixteen bit
2170positions from the right end of a binary computer word.
2171
2172@d unity == @'200000 {$2^{16}$, represents 1.00000}
2173@d two == @'400000 {$2^{17}$, represents 2.00000}
2174
2175@<Types...@>=
2176@!scaled = integer; {this type is used for scaled integers}
2177@!nonnegative_integer=0..@'17777777777; {$0\L x<2^{31}$}
2178@!small_number=0..63; {this type is self-explanatory}
2179
2180@ The following function is used to create a scaled integer from a given decimal
2181fraction $(.d_0d_1\ldots d_{k-1})$, where |0<=k<=17|. The digit $d_i$ is
2182given in |dig[i]|, and the calculation produces a correctly rounded result.
2183
2184@p function round_decimals(@!k:small_number) : scaled;
2185  {converts a decimal fraction}
2186var a:integer; {the accumulator}
2187begin a:=0;
2188while k>0 do
2189  begin decr(k); a:=(a+dig[k]*two) div 10;
2190  end;
2191round_decimals:=(a+1) div 2;
2192end;
2193
2194@ Conversely, here is a procedure analogous to |print_int|. If the output
2195of this procedure is subsequently read by \TeX\ and converted by the
2196|round_decimals| routine above, it turns out that the original value will
2197be reproduced exactly; the ``simplest'' such decimal number is output,
2198but there is always at least one digit following the decimal point.
2199
2200The invariant relation in the \&{repeat} loop is that a sequence of
2201decimal digits yet to be printed will yield the original number if and only if
2202they form a fraction~$f$ in the range $s-\delta\L10\cdot2^{16}f<s$.
2203We can stop if and only if $f=0$ satisfies this condition; the loop will
2204terminate before $s$ can possibly become zero.
2205
2206@p procedure print_scaled(@!s:scaled); {prints scaled real, rounded to five
2207  digits}
2208var delta:scaled; {amount of allowable inaccuracy}
2209begin if s<0 then
2210  begin print_char("-"); negate(s); {print the sign, if negative}
2211  end;
2212print_int(s div unity); {print the integer part}
2213print_char(".");
2214s:=10*(s mod unity)+5; delta:=10;
2215repeat if delta>unity then s:=s+@'100000-50000; {round the last digit}
2216print_char("0"+(s div unity)); s:=10*(s mod unity); delta:=delta*10;
2217until s<=delta;
2218end;
2219
2220@ Physical sizes that a \TeX\ user specifies for portions of documents are
2221represented internally as scaled points. Thus, if we define an `sp' (scaled
2222@^sp@>
2223point) as a unit equal to $2^{-16}$ printer's points, every dimension
2224inside of \TeX\ is an integer number of sp. There are exactly
22254,736,286.72 sp per inch.  Users are not allowed to specify dimensions
2226larger than $2^{30}-1$ sp, which is a distance of about 18.892 feet (5.7583
2227meters); two such quantities can be added without overflow on a 32-bit
2228computer.
2229
2230The present implementation of \TeX\ does not check for overflow when
2231@^overflow in arithmetic@>
2232dimensions are added or subtracted. This could be done by inserting a
2233few dozen tests of the form `\ignorespaces|if x>=@'10000000000 then
2234@t\\{report\_overflow}@>|', but the chance of overflow is so remote that
2235such tests do not seem worthwhile.
2236
2237\TeX\ needs to do only a few arithmetic operations on scaled quantities,
2238other than addition and subtraction, and the following subroutines do most of
2239the work. A single computation might use several subroutine calls, and it is
2240desirable to avoid producing multiple error messages in case of arithmetic
2241overflow; so the routines set the global variable |arith_error| to |true|
2242instead of reporting errors directly to the user. Another global variable,
2243|remainder|, holds the remainder after a division.
2244
2245@<Glob...@>=
2246@!arith_error:boolean; {has arithmetic overflow occurred recently?}
2247@!remainder:scaled; {amount subtracted to get an exact division}
2248
2249@ The first arithmetical subroutine we need computes $nx+y$, where |x|
2250and~|y| are |scaled| and |n| is an integer. We will also use it to
2251multiply integers.
2252
2253@d nx_plus_y(#)==mult_and_add(#,@'7777777777)
2254@d mult_integers(#)==mult_and_add(#,0,@'17777777777)
2255
2256@p function mult_and_add(@!n:integer;@!x,@!y,@!max_answer:scaled):scaled;
2257begin if n<0 then
2258  begin negate(x); negate(n);
2259  end;
2260if n=0 then mult_and_add:=y
2261else if ((x<=(max_answer-y) div n)and(-x<=(max_answer+y) div n)) then
2262  mult_and_add:=n*x+y
2263else  begin arith_error:=true; mult_and_add:=0;
2264  end;
2265end;
2266
2267@ We also need to divide scaled dimensions by integers.
2268
2269@p function x_over_n(@!x:scaled;@!n:integer):scaled;
2270var negative:boolean; {should |remainder| be negated?}
2271begin negative:=false;
2272if n=0 then
2273  begin arith_error:=true; x_over_n:=0; remainder:=x;
2274  end
2275else  begin if n<0 then
2276    begin negate(x); negate(n); negative:=true;
2277    end;
2278  if x>=0 then
2279    begin x_over_n:=x div n; remainder:=x mod n;
2280    end
2281  else  begin x_over_n:=-((-x) div n); remainder:=-((-x) mod n);
2282    end;
2283  end;
2284if negative then negate(remainder);
2285end;
2286
2287@ Then comes the multiplication of a scaled number by a fraction |n/d|,
2288where |n| and |d| are nonnegative integers |<=@t$2^{16}$@>| and |d| is
2289positive. It would be too dangerous to multiply by~|n| and then divide
2290by~|d|, in separate operations, since overflow might well occur; and it
2291would be too inaccurate to divide by |d| and then multiply by |n|. Hence
2292this subroutine simulates 1.5-precision arithmetic.
2293
2294@p function xn_over_d(@!x:scaled; @!n,@!d:integer):scaled;
2295var positive:boolean; {was |x>=0|?}
2296@!t,@!u,@!v:nonnegative_integer; {intermediate quantities}
2297begin if x>=0 then positive:=true
2298else  begin negate(x); positive:=false;
2299  end;
2300t:=(x mod @'100000)*n;
2301u:=(x div @'100000)*n+(t div @'100000);
2302v:=(u mod d)*@'100000 + (t mod @'100000);
2303if u div d>=@'100000 then arith_error:=true
2304else u:=@'100000*(u div d) + (v div d);
2305if positive then
2306  begin xn_over_d:=u; remainder:=v mod d;
2307  end
2308else  begin xn_over_d:=-u; remainder:=-(v mod d);
2309  end;
2310end;
2311
2312@ The next subroutine is used to compute the ``badness'' of glue, when a
2313total~|t| is supposed to be made from amounts that sum to~|s|.  According
2314to {\sl The \TeX book}, the badness of this situation is $100(t/s)^3$;
2315however, badness is simply a heuristic, so we need not squeeze out the
2316last drop of accuracy when computing it. All we really want is an
2317approximation that has similar properties.
2318@:TeXbook}{\sl The \TeX book@>
2319
2320The actual method used to compute the badness is easier to read from the
2321program than to describe in words. It produces an integer value that is a
2322reasonably close approximation to $100(t/s)^3$, and all implementations
2323of \TeX\ should use precisely this method. Any badness of $2^{13}$ or more is
2324treated as infinitely bad, and represented by 10000.
2325
2326It is not difficult to prove that $$\hbox{|badness(t+1,s)>=badness(t,s)
2327>=badness(t,s+1)|}.$$ The badness function defined here is capable of
2328computing at most 1095 distinct values, but that is plenty.
2329
2330@d inf_bad = 10000 {infinitely bad value}
2331
2332@p function badness(@!t,@!s:scaled):halfword; {compute badness, given |t>=0|}
2333var r:integer; {approximation to $\alpha t/s$, where $\alpha^3\approx
2334  100\cdot2^{18}$}
2335begin if t=0 then badness:=0
2336else if s<=0 then badness:=inf_bad
2337else  begin if t<=7230584 then  r:=(t*297) div s {$297^3=99.94\times2^{18}$}
2338  else if s>=1663497 then r:=t div (s div 297)
2339  else r:=t;
2340  if r>1290 then badness:=inf_bad {$1290^3<2^{31}<1291^3$}
2341  else badness:=(r*r*r+@'400000) div @'1000000;
2342  end; {that was $r^3/2^{18}$, rounded to the nearest integer}
2343end;
2344
2345@ When \TeX\ ``packages'' a list into a box, it needs to calculate the
2346proportionality ratio by which the glue inside the box should stretch
2347or shrink. This calculation does not affect \TeX's decision making,
2348so the precise details of rounding, etc., in the glue calculation are not
2349of critical importance for the consistency of results on different computers.
2350
2351We shall use the type |glue_ratio| for such proportionality ratios.
2352A glue ratio should take the same amount of memory as an
2353|integer| (usually 32 bits) if it is to blend smoothly with \TeX's
2354other data structures. Thus |glue_ratio| should be equivalent to
2355|short_real| in some implementations of \PASCAL. Alternatively,
2356it is possible to deal with glue ratios using nothing but fixed-point
2357arithmetic; see {\sl TUGboat \bf3},1 (March 1982), 10--27. (But the
2358routines cited there must be modified to allow negative glue ratios.)
2359@^system dependencies@>
2360
2361@d set_glue_ratio_zero(#) == #:=0.0 {store the representation of zero ratio}
2362@d set_glue_ratio_one(#) == #:=1.0 {store the representation of unit ratio}
2363@d float(#) == # {convert from |glue_ratio| to type |real|}
2364@d unfloat(#) == # {convert from |real| to type |glue_ratio|}
2365@d float_constant(#) == #.0 {convert |integer| constant to |real|}
2366
2367@<Types...@>=
2368@!glue_ratio=real; {one-word representation of a glue expansion factor}
2369
2370@* \[8] Packed data.
2371In order to make efficient use of storage space, \TeX\ bases its major data
2372structures on a |memory_word|, which contains either a (signed) integer,
2373possibly scaled, or a (signed) |glue_ratio|, or a small number of
2374fields that are one half or one quarter of the size used for storing
2375integers.
2376
2377If |x| is a variable of type |memory_word|, it contains up to four
2378fields that can be referred to as follows:
2379$$\vbox{\halign{\hfil#&#\hfil&#\hfil\cr
2380|x|&.|int|&(an |integer|)\cr
2381|x|&.|sc|\qquad&(a |scaled| integer)\cr
2382|x|&.|gr|&(a |glue_ratio|)\cr
2383|x.hh.lh|, |x.hh|&.|rh|&(two halfword fields)\cr
2384|x.hh.b0|, |x.hh.b1|, |x.hh|&.|rh|&(two quarterword fields, one halfword
2385  field)\cr
2386|x.qqqq.b0|, |x.qqqq.b1|, |x.qqqq|&.|b2|, |x.qqqq.b3|\hskip-100pt
2387  &\qquad\qquad\qquad(four quarterword fields)\cr}}$$
2388This is somewhat cumbersome to write, and not very readable either, but
2389macros will be used to make the notation shorter and more transparent.
2390The \PASCAL\ code below gives a formal definition of |memory_word| and
2391its subsidiary types, using packed variant records. \TeX\ makes no
2392assumptions about the relative positions of the fields within a word.
2393
2394Since we are assuming 32-bit integers, a halfword must contain at least
239516 bits, and a quarterword must contain at least 8 bits.
2396@^system dependencies@>
2397But it doesn't hurt to have more bits; for example, with enough 36-bit
2398words you might be able to have |mem_max| as large as 262142, which is
2399eight times as much memory as anybody had during the first four years of
2400\TeX's existence.
2401
2402N.B.: Valuable memory space will be dreadfully wasted unless \TeX\ is compiled
2403by a \PASCAL\ that packs all of the |memory_word| variants into
2404the space of a single integer. This means, for example, that |glue_ratio|
2405words should be |short_real| instead of |real| on some computers. Some
2406\PASCAL\ compilers will pack an integer whose subrange is `|0..255|' into
2407an eight-bit field, but others insist on allocating space for an additional
2408sign bit; on such systems you can get 256 values into a quarterword only
2409if the subrange is `|-128..127|'.
2410
2411The present implementation tries to accommodate as many variations as possible,
2412so it makes few assumptions. If integers having the subrange
2413`|min_quarterword..max_quarterword|' can be packed into a quarterword,
2414and if integers having the subrange `|min_halfword..max_halfword|'
2415can be packed into a halfword, everything should work satisfactorily.
2416
2417It is usually most efficient to have |min_quarterword=min_halfword=0|,
2418so one should try to achieve this unless it causes a severe problem.
2419The values defined here are recommended for most 32-bit computers.
2420
2421@d min_quarterword=0 {smallest allowable value in a |quarterword|}
2422@d max_quarterword=255 {largest allowable value in a |quarterword|}
2423@d min_halfword==0 {smallest allowable value in a |halfword|}
2424@d max_halfword==65535 {largest allowable value in a |halfword|}
2425
2426@ Here are the inequalities that the quarterword and halfword values
2427must satisfy (or rather, the inequalities that they mustn't satisfy):
2428
2429@<Check the ``constant''...@>=
2430init if (mem_min<>mem_bot)or(mem_max<>mem_top) then bad:=10;@+tini@;@/
2431if (mem_min>mem_bot)or(mem_max<mem_top) then bad:=10;
2432if (min_quarterword>0)or(max_quarterword<127) then bad:=11;
2433if (min_halfword>0)or(max_halfword<32767) then bad:=12;
2434if (min_quarterword<min_halfword)or@|
2435  (max_quarterword>max_halfword) then bad:=13;
2436if (mem_min<min_halfword)or(mem_max>=max_halfword)or@|
2437  (mem_bot-mem_min>max_halfword+1) then bad:=14;
2438if (font_base<min_quarterword)or(font_max>max_quarterword) then bad:=15;
2439if font_max>font_base+256 then bad:=16;
2440if (save_size>max_halfword)or(max_strings>max_halfword) then bad:=17;
2441if buf_size>max_halfword then bad:=18;
2442if max_quarterword-min_quarterword<255 then bad:=19;
2443
2444@ The operation of adding or subtracting |min_quarterword| occurs quite
2445frequently in \TeX, so it is convenient to abbreviate this operation
2446by using the macros |qi| and |qo| for input and output to and from
2447quarterword format.
2448
2449The inner loop of \TeX\ will run faster with respect to compilers
2450that don't optimize expressions like `|x+0|' and `|x-0|', if these
2451macros are simplified in the obvious way when |min_quarterword=0|.
2452@^inner loop@>@^system dependencies@>
2453
2454@d qi(#)==#+min_quarterword
2455  {to put an |eight_bits| item into a quarterword}
2456@d qo(#)==#-min_quarterword
2457  {to take an |eight_bits| item out of a quarterword}
2458@d hi(#)==#+min_halfword
2459  {to put a sixteen-bit item into a halfword}
2460@d ho(#)==#-min_halfword
2461  {to take a sixteen-bit item from a halfword}
2462
2463@ The reader should study the following definitions closely:
2464@^system dependencies@>
2465
2466@d sc==int {|scaled| data is equivalent to |integer|}
2467
2468@<Types...@>=
2469@!quarterword = min_quarterword..max_quarterword; {1/4 of a word}
2470@!halfword=min_halfword..max_halfword; {1/2 of a word}
2471@!two_choices = 1..2; {used when there are two variants in a record}
2472@!four_choices = 1..4; {used when there are four variants in a record}
2473@!two_halves = packed record@;@/
2474  @!rh:halfword;
2475  case two_choices of
2476  1: (@!lh:halfword);
2477  2: (@!b0:quarterword; @!b1:quarterword);
2478  end;
2479@!four_quarters = packed record@;@/
2480  @!b0:quarterword;
2481  @!b1:quarterword;
2482  @!b2:quarterword;
2483  @!b3:quarterword;
2484  end;
2485@!memory_word = record@;@/
2486  case four_choices of
2487  1: (@!int:integer);
2488  2: (@!gr:glue_ratio);
2489  3: (@!hh:two_halves);
2490  4: (@!qqqq:four_quarters);
2491  end;
2492@!word_file = file of memory_word;
2493
2494@ When debugging, we may want to print a |memory_word| without knowing
2495what type it is; so we print it in all modes.
2496@^dirty \PASCAL@>@^debugging@>
2497
2498@p @!debug procedure print_word(@!w:memory_word);
2499  {prints |w| in all ways}
2500begin print_int(w.int); print_char(" ");@/
2501print_scaled(w.sc); print_char(" ");@/
2502print_scaled(round(unity*float(w.gr))); print_ln;@/
2503@^real multiplication@>
2504print_int(w.hh.lh); print_char("="); print_int(w.hh.b0); print_char(":");
2505print_int(w.hh.b1); print_char(";"); print_int(w.hh.rh); print_char(" ");@/
2506print_int(w.qqqq.b0); print_char(":"); print_int(w.qqqq.b1); print_char(":");
2507print_int(w.qqqq.b2); print_char(":"); print_int(w.qqqq.b3);
2508end;
2509gubed
2510
2511@* \[9] Dynamic memory allocation.
2512The \TeX\ system does nearly all of its own memory allocation, so that it
2513can readily be transported into environments that do not have automatic
2514facilities for strings, garbage collection, etc., and so that it can be in
2515control of what error messages the user receives. The dynamic storage
2516requirements of \TeX\ are handled by providing a large array |mem| in
2517which consecutive blocks of words are used as nodes by the \TeX\ routines.
2518
2519Pointer variables are indices into this array, or into another array
2520called |eqtb| that will be explained later. A pointer variable might
2521also be a special flag that lies outside the bounds of |mem|, so we
2522allow pointers to assume any |halfword| value. The minimum halfword
2523value represents a null pointer. \TeX\ does not assume that |mem[null]| exists.
2524
2525@d pointer==halfword {a flag or a location in |mem| or |eqtb|}
2526@d null==min_halfword {the null pointer}
2527
2528@<Glob...@>=
2529@!temp_ptr:pointer; {a pointer variable for occasional emergency use}
2530
2531@ The |mem| array is divided into two regions that are allocated separately,
2532but the dividing line between these two regions is not fixed; they grow
2533together until finding their ``natural'' size in a particular job.
2534Locations less than or equal to |lo_mem_max| are used for storing
2535variable-length records consisting of two or more words each. This region
2536is maintained using an algorithm similar to the one described in exercise
25372.5--19 of {\sl The Art of Computer Programming}. However, no size field
2538appears in the allocated nodes; the program is responsible for knowing the
2539relevant size when a node is freed. Locations greater than or equal to
2540|hi_mem_min| are used for storing one-word records; a conventional
2541\.{AVAIL} stack is used for allocation in this region.
2542
2543Locations of |mem| between |mem_bot| and |mem_top| may be dumped as part
2544of preloaded format files, by the \.{INITEX} preprocessor.
2545@.INITEX@>
2546Production versions of \TeX\ may extend the memory at both ends in order to
2547provide more space; locations between |mem_min| and |mem_bot| are always
2548used for variable-size nodes, and locations between |mem_top| and |mem_max|
2549are always used for single-word nodes.
2550
2551The key pointers that govern |mem| allocation have a prescribed order:
2552$$\advance\thickmuskip-2mu
2553\hbox{|null<=mem_min<=mem_bot<lo_mem_max<
2554  hi_mem_min<mem_top<=mem_end<=mem_max|.}$$
2555
2556Empirical tests show that the present implementation of \TeX\ tends to
2557spend about 9\pct! of its running time allocating nodes, and about 6\pct!
2558deallocating them after their use.
2559
2560@<Glob...@>=
2561@!mem : array[mem_min..mem_max] of memory_word; {the big dynamic storage area}
2562@!lo_mem_max : pointer; {the largest location of variable-size memory in use}
2563@!hi_mem_min : pointer; {the smallest location of one-word memory in use}
2564
2565@ In order to study the memory requirements of particular applications, it
2566is possible to prepare a version of \TeX\ that keeps track of current and
2567maximum memory usage. When code between the delimiters |@!stat| $\ldots$
2568|tats| is not ``commented out,'' \TeX\ will run a bit slower but it will
2569report these statistics when |tracing_stats| is sufficiently large.
2570
2571@<Glob...@>=
2572@!var_used, @!dyn_used : integer; {how much memory is in use}
2573
2574@ Let's consider the one-word memory region first, since it's the
2575simplest. The pointer variable |mem_end| holds the highest-numbered location
2576of |mem| that has ever been used. The free locations of |mem| that
2577occur between |hi_mem_min| and |mem_end|, inclusive, are of type
2578|two_halves|, and we write |info(p)| and |link(p)| for the |lh|
2579and |rh| fields of |mem[p]| when it is of this type. The single-word
2580free locations form a linked list
2581$$|avail|,\;\hbox{|link(avail)|},\;\hbox{|link(link(avail))|},\;\ldots$$
2582terminated by |null|.
2583
2584@d link(#) == mem[#].hh.rh {the |link| field of a memory word}
2585@d info(#) == mem[#].hh.lh {the |info| field of a memory word}
2586
2587@<Glob...@>=
2588@!avail : pointer; {head of the list of available one-word nodes}
2589@!mem_end : pointer; {the last one-word node used in |mem|}
2590
2591@ If memory is exhausted, it might mean that the user has forgotten
2592a right brace. We will define some procedures later that try to help
2593pinpoint the trouble.
2594
2595@p @<Declare the procedure called |show_token_list|@>@/
2596@<Declare the procedure called |runaway|@>
2597
2598@ The function |get_avail| returns a pointer to a new one-word node whose
2599|link| field is null. However, \TeX\ will halt if there is no more room left.
2600@^inner loop@>
2601
2602If the available-space list is empty, i.e., if |avail=null|,
2603we try first to increase |mem_end|. If that cannot be done, i.e., if
2604|mem_end=mem_max|, we try to decrease |hi_mem_min|. If that cannot be
2605done, i.e., if |hi_mem_min=lo_mem_max+1|, we have to quit.
2606
2607@p function get_avail : pointer; {single-word node allocation}
2608var p:pointer; {the new node being got}
2609begin p:=avail; {get top location in the |avail| stack}
2610if p<>null then avail:=link(avail) {and pop it off}
2611else if mem_end<mem_max then {or go into virgin territory}
2612  begin incr(mem_end); p:=mem_end;
2613  end
2614else   begin decr(hi_mem_min); p:=hi_mem_min;
2615  if hi_mem_min<=lo_mem_max then
2616    begin runaway; {if memory is exhausted, display possible runaway text}
2617    overflow("main memory size",mem_max+1-mem_min);
2618      {quit; all one-word nodes are busy}
2619@:TeX capacity exceeded main memory size}{\quad main memory size@>
2620    end;
2621  end;
2622link(p):=null; {provide an oft-desired initialization of the new node}
2623@!stat incr(dyn_used);@+tats@;{maintain statistics}
2624get_avail:=p;
2625end;
2626
2627@ Conversely, a one-word node is recycled by calling |free_avail|.
2628This routine is part of \TeX's ``inner loop,'' so we want it to be fast.
2629@^inner loop@>
2630
2631@d free_avail(#)== {single-word node liberation}
2632  begin link(#):=avail; avail:=#;
2633  @!stat decr(dyn_used);@+tats@/
2634  end
2635
2636@ There's also a |fast_get_avail| routine, which saves the procedure-call
2637overhead at the expense of extra programming. This routine is used in
2638the places that would otherwise account for the most calls of |get_avail|.
2639@^inner loop@>
2640
2641@d fast_get_avail(#)==@t@>@;@/
2642  begin #:=avail; {avoid |get_avail| if possible, to save time}
2643  if #=null then #:=get_avail
2644  else  begin avail:=link(#); link(#):=null;
2645    @!stat incr(dyn_used);@+tats@/
2646    end;
2647  end
2648
2649@ The procedure |flush_list(p)| frees an entire linked list of
2650one-word nodes that starts at position |p|.
2651@^inner loop@>
2652
2653@p procedure flush_list(@!p:pointer); {makes list of single-word nodes
2654  available}
2655var @!q,@!r:pointer; {list traversers}
2656begin if p<>null then
2657  begin r:=p;
2658  repeat q:=r; r:=link(r); @!stat decr(dyn_used);@+tats@/
2659  until r=null; {now |q| is the last node on the list}
2660  link(q):=avail; avail:=p;
2661  end;
2662end;
2663
2664@ The available-space list that keeps track of the variable-size portion
2665of |mem| is a nonempty, doubly-linked circular list of empty nodes,
2666pointed to by the roving pointer |rover|.
2667
2668Each empty node has size 2 or more; the first word contains the special
2669value |max_halfword| in its |link| field and the size in its |info| field;
2670the second word contains the two pointers for double linking.
2671
2672Each nonempty node also has size 2 or more. Its first word is of type
2673|two_halves|\kern-1pt, and its |link| field is never equal to |max_halfword|.
2674Otherwise there is complete flexibility with respect to the contents
2675of its other fields and its other words.
2676
2677(We require |mem_max<max_halfword| because terrible things can happen
2678when |max_halfword| appears in the |link| field of a nonempty node.)
2679
2680@d empty_flag == max_halfword {the |link| of an empty variable-size node}
2681@d is_empty(#) == (link(#)=empty_flag) {tests for empty node}
2682@d node_size == info {the size field in empty variable-size nodes}
2683@d llink(#) == info(#+1) {left link in doubly-linked list of empty nodes}
2684@d rlink(#) == link(#+1) {right link in doubly-linked list of empty nodes}
2685
2686@<Glob...@>=
2687@!rover : pointer; {points to some node in the list of empties}
2688
2689@ A call to |get_node| with argument |s| returns a pointer to a new node
2690of size~|s|, which must be 2~or more. The |link| field of the first word
2691of this new node is set to null. An overflow stop occurs if no suitable
2692space exists.
2693
2694If |get_node| is called with $s=2^{30}$, it simply merges adjacent free
2695areas and returns the value |max_halfword|.
2696
2697@p function get_node(@!s:integer):pointer; {variable-size node allocation}
2698label found,exit,restart;
2699var p:pointer; {the node currently under inspection}
2700@!q:pointer; {the node physically after node |p|}
2701@!r:integer; {the newly allocated node, or a candidate for this honor}
2702@!t:integer; {temporary register}
2703begin restart: p:=rover; {start at some free node in the ring}
2704repeat @<Try to allocate within node |p| and its physical successors,
2705  and |goto found| if allocation was possible@>;
2706@^inner loop@>
2707p:=rlink(p); {move to the next node in the ring}
2708until p=rover; {repeat until the whole list has been traversed}
2709if s=@'10000000000 then
2710  begin get_node:=max_halfword; return;
2711  end;
2712if lo_mem_max+2<hi_mem_min then if lo_mem_max+2<=mem_bot+max_halfword then
2713  @<Grow more variable-size memory and |goto restart|@>;
2714overflow("main memory size",mem_max+1-mem_min);
2715  {sorry, nothing satisfactory is left}
2716@:TeX capacity exceeded main memory size}{\quad main memory size@>
2717found: link(r):=null; {this node is now nonempty}
2718@!stat var_used:=var_used+s; {maintain usage statistics}
2719tats@;@/
2720get_node:=r;
2721exit:end;
2722
2723@ The lower part of |mem| grows by 1000 words at a time, unless
2724we are very close to going under. When it grows, we simply link
2725a new node into the available-space list. This method of controlled
2726growth helps to keep the |mem| usage consecutive when \TeX\ is
2727implemented on ``virtual memory'' systems.
2728@^virtual memory@>
2729
2730@<Grow more variable-size memory and |goto restart|@>=
2731begin if hi_mem_min-lo_mem_max>=1998 then t:=lo_mem_max+1000
2732else t:=lo_mem_max+1+(hi_mem_min-lo_mem_max) div 2;
2733  {|lo_mem_max+2<=t<hi_mem_min|}
2734p:=llink(rover); q:=lo_mem_max; rlink(p):=q; llink(rover):=q;@/
2735if t>mem_bot+max_halfword then t:=mem_bot+max_halfword;
2736rlink(q):=rover; llink(q):=p; link(q):=empty_flag; node_size(q):=t-lo_mem_max;@/
2737lo_mem_max:=t; link(lo_mem_max):=null; info(lo_mem_max):=null;
2738rover:=q; goto restart;
2739end
2740
2741@ Empirical tests show that the routine in this section performs a
2742node-merging operation about 0.75 times per allocation, on the average,
2743after which it finds that |r>p+1| about 95\pct! of the time.
2744
2745@<Try to allocate...@>=
2746q:=p+node_size(p); {find the physical successor}
2747@^inner loop@>
2748while is_empty(q) do {merge node |p| with node |q|}
2749  begin t:=rlink(q);
2750  if q=rover then rover:=t;
2751  llink(t):=llink(q); rlink(llink(q)):=t;@/
2752  q:=q+node_size(q);
2753  end;
2754r:=q-s;
2755if r>p+1 then @<Allocate from the top of node |p| and |goto found|@>;
2756if r=p then if rlink(p)<>p then
2757  @<Allocate entire node |p| and |goto found|@>;
2758node_size(p):=q-p {reset the size in case it grew}
2759
2760@ @<Allocate from the top...@>=
2761begin node_size(p):=r-p; {store the remaining size}
2762@^inner loop@>
2763rover:=p; {start searching here next time}
2764goto found;
2765end
2766
2767@ Here we delete node |p| from the ring, and let |rover| rove around.
2768
2769@<Allocate entire...@>=
2770begin rover:=rlink(p); t:=llink(p);
2771llink(rover):=t; rlink(t):=rover;
2772goto found;
2773end
2774
2775@ Conversely, when some variable-size node |p| of size |s| is no longer needed,
2776the operation |free_node(p,s)| will make its words available, by inserting
2777|p| as a new empty node just before where |rover| now points.
2778@^inner loop@>
2779
2780@p procedure free_node(@!p:pointer; @!s:halfword); {variable-size node
2781  liberation}
2782var q:pointer; {|llink(rover)|}
2783begin node_size(p):=s; link(p):=empty_flag;
2784q:=llink(rover); llink(p):=q; rlink(p):=rover; {set both links}
2785llink(rover):=p; rlink(q):=p; {insert |p| into the ring}
2786@!stat var_used:=var_used-s;@+tats@;{maintain statistics}
2787end;
2788
2789@ Just before \.{INITEX} writes out the memory, it sorts the doubly linked
2790available space list. The list is probably very short at such times, so a
2791simple insertion sort is used. The smallest available location will be
2792pointed to by |rover|, the next-smallest by |rlink(rover)|, etc.
2793
2794@p @!init procedure sort_avail; {sorts the available variable-size nodes
2795  by location}
2796var p,@!q,@!r: pointer; {indices into |mem|}
2797@!old_rover:pointer; {initial |rover| setting}
2798begin p:=get_node(@'10000000000); {merge adjacent free areas}
2799p:=rlink(rover); rlink(rover):=max_halfword; old_rover:=rover;
2800while p<>old_rover do @<Sort \(p)|p| into the list starting at |rover|
2801  and advance |p| to |rlink(p)|@>;
2802p:=rover;
2803while rlink(p)<>max_halfword do
2804  begin llink(rlink(p)):=p; p:=rlink(p);
2805  end;
2806rlink(p):=rover; llink(rover):=p;
2807end;
2808tini
2809
2810@ The following |while| loop is guaranteed to
2811terminate, since the list that starts at
2812|rover| ends with |max_halfword| during the sorting procedure.
2813
2814@<Sort \(p)|p|...@>=
2815if p<rover then
2816  begin q:=p; p:=rlink(q); rlink(q):=rover; rover:=q;
2817  end
2818else  begin q:=rover;
2819  while rlink(q)<p do q:=rlink(q);
2820  r:=rlink(p); rlink(p):=rlink(q); rlink(q):=p; p:=r;
2821  end
2822
2823@* \[10] Data structures for boxes and their friends.
2824From the computer's standpoint, \TeX's chief mission is to create
2825horizontal and vertical lists. We shall now investigate how the elements
2826of these lists are represented internally as nodes in the dynamic memory.
2827
2828A horizontal or vertical list is linked together by |link| fields in
2829the first word of each node. Individual nodes represent boxes, glue,
2830penalties, or special things like discretionary hyphens; because of this
2831variety, some nodes are longer than others, and we must distinguish different
2832kinds of nodes. We do this by putting a `|type|' field in the first word,
2833together with the link and an optional `|subtype|'.
2834
2835@d type(#) == mem[#].hh.b0 {identifies what kind of node this is}
2836@d subtype(#) == mem[#].hh.b1 {secondary identification in some cases}
2837
2838@ A |@!char_node|, which represents a single character, is the most important
2839kind of node because it accounts for the vast majority of all boxes.
2840Special precautions are therefore taken to ensure that a |char_node| does
2841not take up much memory space. Every such node is one word long, and in fact
2842it is identifiable by this property, since other kinds of nodes have at least
2843two words, and they appear in |mem| locations less than |hi_mem_min|.
2844This makes it possible to omit the |type| field in a |char_node|, leaving
2845us room for two bytes that identify a |font| and a |character| within
2846that font.
2847
2848Note that the format of a |char_node| allows for up to 256 different
2849fonts and up to 256 characters per font; but most implementations will
2850probably limit the total number of fonts to fewer than 75 per job,
2851and most fonts will stick to characters whose codes are
2852less than 128 (since higher codes
2853are more difficult to access on most keyboards).
2854
2855Extensions of \TeX\ intended for oriental languages will need even more
2856than $256\times256$ possible characters, when we consider different sizes
2857@^oriental characters@>@^Chinese characters@>@^Japanese characters@>
2858and styles of type.  It is suggested that Chinese and Japanese fonts be
2859handled by representing such characters in two consecutive |char_node|
2860entries: The first of these has |font=font_base|, and its |link| points
2861to the second;
2862the second identifies the font and the character dimensions.
2863The saving feature about oriental characters is that most of them have
2864the same box dimensions. The |character| field of the first |char_node|
2865is a ``\\{charext}'' that distinguishes between graphic symbols whose
2866dimensions are identical for typesetting purposes. (See the \MF\ manual.)
2867Such an extension of \TeX\ would not be difficult; further details are
2868left to the reader.
2869
2870In order to make sure that the |character| code fits in a quarterword,
2871\TeX\ adds the quantity |min_quarterword| to the actual code.
2872
2873Character nodes appear only in horizontal lists, never in vertical lists.
2874
2875@d is_char_node(#) == (#>=hi_mem_min)
2876  {does the argument point to a |char_node|?}
2877@d font == type {the font code in a |char_node|}
2878@d character == subtype {the character code in a |char_node|}
2879
2880@ An |hlist_node| stands for a box that was made from a horizontal list.
2881Each |hlist_node| is seven words long, and contains the following fields
2882(in addition to the mandatory |type| and |link|, which we shall not
2883mention explicitly when discussing the other node types): The |height| and
2884|width| and |depth| are scaled integers denoting the dimensions of the
2885box.  There is also a |shift_amount| field, a scaled integer indicating
2886how much this box should be lowered (if it appears in a horizontal list),
2887or how much it should be moved to the right (if it appears in a vertical
2888list). There is a |list_ptr| field, which points to the beginning of the
2889list from which this box was fabricated; if |list_ptr| is |null|, the box
2890is empty. Finally, there are three fields that represent the setting of
2891the glue:  |glue_set(p)| is a word of type |glue_ratio| that represents
2892the proportionality constant for glue setting; |glue_sign(p)| is
2893|stretching| or |shrinking| or |normal| depending on whether or not the
2894glue should stretch or shrink or remain rigid; and |glue_order(p)|
2895specifies the order of infinity to which glue setting applies (|normal|,
2896|fil|, |fill|, or |filll|). The |subtype| field is not used.
2897
2898@d hlist_node=0 {|type| of hlist nodes}
2899@d box_node_size=7 {number of words to allocate for a box node}
2900@d width_offset=1 {position of |width| field in a box node}
2901@d depth_offset=2 {position of |depth| field in a box node}
2902@d height_offset=3 {position of |height| field in a box node}
2903@d width(#) == mem[#+width_offset].sc {width of the box, in sp}
2904@d depth(#) == mem[#+depth_offset].sc {depth of the box, in sp}
2905@d height(#) == mem[#+height_offset].sc {height of the box, in sp}
2906@d shift_amount(#) == mem[#+4].sc {repositioning distance, in sp}
2907@d list_offset=5 {position of |list_ptr| field in a box node}
2908@d list_ptr(#) == link(#+list_offset) {beginning of the list inside the box}
2909@d glue_order(#) == subtype(#+list_offset) {applicable order of infinity}
2910@d glue_sign(#) == type(#+list_offset) {stretching or shrinking}
2911@d normal=0 {the most common case when several cases are named}
2912@d stretching = 1 {glue setting applies to the stretch components}
2913@d shrinking = 2 {glue setting applies to the shrink components}
2914@d glue_offset = 6 {position of |glue_set| in a box node}
2915@d glue_set(#) == mem[#+glue_offset].gr
2916  {a word of type |glue_ratio| for glue setting}
2917
2918@ The |new_null_box| function returns a pointer to an |hlist_node| in
2919which all subfields have the values corresponding to `\.{\\hbox\{\}}'.
2920The |subtype| field is set to |min_quarterword|, since that's the desired
2921|span_count| value if this |hlist_node| is changed to an |unset_node|.
2922
2923@p function new_null_box:pointer; {creates a new box node}
2924var p:pointer; {the new node}
2925begin p:=get_node(box_node_size); type(p):=hlist_node;
2926subtype(p):=min_quarterword;
2927width(p):=0; depth(p):=0; height(p):=0; shift_amount(p):=0; list_ptr(p):=null;
2928glue_sign(p):=normal; glue_order(p):=normal; set_glue_ratio_zero(glue_set(p));
2929new_null_box:=p;
2930end;
2931
2932@ A |vlist_node| is like an |hlist_node| in all respects except that it
2933contains a vertical list.
2934
2935@d vlist_node=1 {|type| of vlist nodes}
2936
2937@ A |rule_node| stands for a solid black rectangle; it has |width|,
2938|depth|, and |height| fields just as in an |hlist_node|. However, if
2939any of these dimensions is $-2^{30}$, the actual value will be determined
2940by running the rule up to the boundary of the innermost enclosing box.
2941This is called a ``running dimension.'' The |width| is never running in
2942an hlist; the |height| and |depth| are never running in a~vlist.
2943
2944@d rule_node=2 {|type| of rule nodes}
2945@d rule_node_size=4 {number of words to allocate for a rule node}
2946@d null_flag==-@'10000000000 {$-2^{30}$, signifies a missing item}
2947@d is_running(#) == (#=null_flag) {tests for a running dimension}
2948
2949@ A new rule node is delivered by the |new_rule| function. It
2950makes all the dimensions ``running,'' so you have to change the
2951ones that are not allowed to run.
2952
2953@p function new_rule:pointer;
2954var p:pointer; {the new node}
2955begin p:=get_node(rule_node_size); type(p):=rule_node;
2956subtype(p):=0; {the |subtype| is not used}
2957width(p):=null_flag; depth(p):=null_flag; height(p):=null_flag;
2958new_rule:=p;
2959end;
2960
2961@ Insertions are represented by |ins_node| records, where the |subtype|
2962indicates the corresponding box number. For example, `\.{\\insert 250}'
2963leads to an |ins_node| whose |subtype| is |250+min_quarterword|.
2964The |height| field of an |ins_node| is slightly misnamed; it actually holds
2965the natural height plus depth of the vertical list being inserted.
2966The |depth| field holds the |split_max_depth| to be used in case this
2967insertion is split, and the |split_top_ptr| points to the corresponding
2968|split_top_skip|. The |float_cost| field holds the |floating_penalty| that
2969will be used if this insertion floats to a subsequent page after a
2970split insertion of the same class.  There is one more field, the
2971|ins_ptr|, which points to the beginning of the vlist for the insertion.
2972
2973@d ins_node=3 {|type| of insertion nodes}
2974@d ins_node_size=5 {number of words to allocate for an insertion}
2975@d float_cost(#)==mem[#+1].int {the |floating_penalty| to be used}
2976@d ins_ptr(#)==info(#+4) {the vertical list to be inserted}
2977@d split_top_ptr(#)==link(#+4) {the |split_top_skip| to be used}
2978
2979@ A |mark_node| has a |mark_ptr| field that points to the reference count
2980of a token list that contains the user's \.{\\mark} text.
2981This field occupies a full word instead of a halfword, because
2982there's nothing to put in the other halfword; it is easier in \PASCAL\ to
2983use the full word than to risk leaving garbage in the unused half.
2984
2985@d mark_node=4 {|type| of a mark node}
2986@d small_node_size=2 {number of words to allocate for most node types}
2987@d mark_ptr(#)==mem[#+1].int {head of the token list for a mark}
2988
2989@ An |adjust_node|, which occurs only in horizontal lists,
2990specifies material that will be moved out into the surrounding
2991vertical list; i.e., it is used to implement \TeX's `\.{\\vadjust}'
2992operation.  The |adjust_ptr| field points to the vlist containing this
2993material.
2994
2995@d adjust_node=5 {|type| of an adjust node}
2996@d adjust_ptr==mark_ptr {vertical list to be moved out of horizontal list}
2997
2998@ A |ligature_node|, which occurs only in horizontal lists, specifies
2999a character that was fabricated from the interaction of two or more
3000actual characters.  The second word of the node, which is called the
3001|lig_char| word, contains |font| and |character| fields just as in a
3002|char_node|. The characters that generated the ligature have not been
3003forgotten, since they are needed for diagnostic messages and for
3004hyphenation; the |lig_ptr| field points to a linked list of character
3005nodes for all original characters that have been deleted. (This list
3006might be empty if the characters that generated the ligature were
3007retained in other nodes.)
3008
3009The |subtype| field is 0, plus 2 and/or 1 if the original source of the
3010ligature included implicit left and/or right boundaries.
3011
3012@d ligature_node=6 {|type| of a ligature node}
3013@d lig_char(#)==#+1 {the word where the ligature is to be found}
3014@d lig_ptr(#)==link(lig_char(#)) {the list of characters}
3015
3016@ The |new_ligature| function creates a ligature node having given
3017contents of the |font|, |character|, and |lig_ptr| fields. We also have
3018a |new_lig_item| function, which returns a two-word node having a given
3019|character| field. Such nodes are used for temporary processing as ligatures
3020are being created.
3021
3022@p function new_ligature(@!f,@!c:quarterword; @!q:pointer):pointer;
3023var p:pointer; {the new node}
3024begin p:=get_node(small_node_size); type(p):=ligature_node;
3025font(lig_char(p)):=f; character(lig_char(p)):=c; lig_ptr(p):=q;
3026subtype(p):=0; new_ligature:=p;
3027end;
3028@#
3029function new_lig_item(@!c:quarterword):pointer;
3030var p:pointer; {the new node}
3031begin p:=get_node(small_node_size); character(p):=c; lig_ptr(p):=null;
3032new_lig_item:=p;
3033end;
3034
3035@ A |disc_node|, which occurs only in horizontal lists, specifies a
3036``dis\-cretion\-ary'' line break. If such a break occurs at node |p|, the text
3037that starts at |pre_break(p)| will precede the break, the text that starts at
3038|post_break(p)| will follow the break, and text that appears in the next
3039|replace_count(p)| nodes will be ignored. For example, an ordinary
3040discretionary hyphen, indicated by `\.{\\-}', yields a |disc_node| with
3041|pre_break| pointing to a |char_node| containing a hyphen, |post_break=null|,
3042and |replace_count=0|. All three of the discretionary texts must be
3043lists that consist entirely of character, kern, box, rule, and ligature nodes.
3044
3045If |pre_break(p)=null|, the |ex_hyphen_penalty| will be charged for this
3046break.  Otherwise the |hyphen_penalty| will be charged.  The texts will
3047actually be substituted into the list by the line-breaking algorithm if it
3048decides to make the break, and the discretionary node will disappear at
3049that time; thus, the output routine sees only discretionaries that were
3050not chosen.
3051
3052@d disc_node=7 {|type| of a discretionary node}
3053@d replace_count==subtype {how many subsequent nodes to replace}
3054@d pre_break==llink {text that precedes a discretionary break}
3055@d post_break==rlink {text that follows a discretionary break}
3056
3057@p function new_disc:pointer; {creates an empty |disc_node|}
3058var p:pointer; {the new node}
3059begin p:=get_node(small_node_size); type(p):=disc_node;
3060replace_count(p):=0; pre_break(p):=null; post_break(p):=null;
3061new_disc:=p;
3062end;
3063
3064@ A |whatsit_node| is a wild card reserved for extensions to \TeX. The
3065|subtype| field in its first word says what `\\{whatsit}' it is, and
3066implicitly determines the node size (which must be 2 or more) and the
3067format of the remaining words. When a |whatsit_node| is encountered
3068in a list, special actions are invoked; knowledgeable people who are
3069careful not to mess up the rest of \TeX\ are able to make \TeX\ do new
3070things by adding code at the end of the program. For example, there
3071might be a `\TeX nicolor' extension to specify different colors of ink,
3072@^extensions to \TeX@>
3073and the whatsit node might contain the desired parameters.
3074
3075The present implementation of \TeX\ treats the features associated with
3076`\.{\\write}' and `\.{\\special}' as if they were extensions, in order to
3077illustrate how such routines might be coded. We shall defer further
3078discussion of extensions until the end of this program.
3079
3080@d whatsit_node=8 {|type| of special extension nodes}
3081
3082@ A |math_node|, which occurs only in horizontal lists, appears before and
3083after mathematical formulas. The |subtype| field is |before| before the
3084formula and |after| after it. There is a |width| field, which represents
3085the amount of surrounding space inserted by \.{\\mathsurround}.
3086
3087@d math_node=9 {|type| of a math node}
3088@d before=0 {|subtype| for math node that introduces a formula}
3089@d after=1 {|subtype| for math node that winds up a formula}
3090
3091@p function new_math(@!w:scaled;@!s:small_number):pointer;
3092var p:pointer; {the new node}
3093begin p:=get_node(small_node_size); type(p):=math_node;
3094subtype(p):=s; width(p):=w; new_math:=p;
3095end;
3096
3097@ \TeX\ makes use of the fact that |hlist_node|, |vlist_node|,
3098|rule_node|, |ins_node|, |mark_node|, |adjust_node|, |ligature_node|,
3099|disc_node|, |whatsit_node|, and |math_node| are at the low end of the
3100type codes, by permitting a break at glue in a list if and only if the
3101|type| of the previous node is less than |math_node|. Furthermore, a
3102node is discarded after a break if its type is |math_node| or~more.
3103
3104@d precedes_break(#)==(type(#)<math_node)
3105@d non_discardable(#)==(type(#)<math_node)
3106
3107@ A |glue_node| represents glue in a list. However, it is really only
3108a pointer to a separate glue specification, since \TeX\ makes use of the
3109fact that many essentially identical nodes of glue are usually present.
3110If |p| points to a |glue_node|, |glue_ptr(p)| points to
3111another packet of words that specify the stretch and shrink components, etc.
3112
3113Glue nodes also serve to represent leaders; the |subtype| is used to
3114distinguish between ordinary glue (which is called |normal|) and the three
3115kinds of leaders (which are called |a_leaders|, |c_leaders|, and |x_leaders|).
3116The |leader_ptr| field points to a rule node or to a box node containing the
3117leaders; it is set to |null| in ordinary glue nodes.
3118
3119Many kinds of glue are computed from \TeX's ``skip'' parameters, and
3120it is helpful to know which parameter has led to a particular glue node.
3121Therefore the |subtype| is set to indicate the source of glue, whenever
3122it originated as a parameter. We will be defining symbolic names for the
3123parameter numbers later (e.g., |line_skip_code=0|, |baseline_skip_code=1|,
3124etc.); it suffices for now to say that the |subtype| of parametric glue
3125will be the same as the parameter number, plus~one.
3126
3127In math formulas there are two more possibilities for the |subtype| in a
3128glue node: |mu_glue| denotes an \.{\\mskip} (where the units are scaled \.{mu}
3129instead of scaled \.{pt}); and |cond_math_glue| denotes the `\.{\\nonscript}'
3130feature that cancels the glue node immediately following if it appears
3131in a subscript.
3132
3133@d glue_node=10 {|type| of node that points to a glue specification}
3134@d cond_math_glue=98 {special |subtype| to suppress glue in the next node}
3135@d mu_glue=99 {|subtype| for math glue}
3136@d a_leaders=100 {|subtype| for aligned leaders}
3137@d c_leaders=101 {|subtype| for centered leaders}
3138@d x_leaders=102 {|subtype| for expanded leaders}
3139@d glue_ptr==llink {pointer to a glue specification}
3140@d leader_ptr==rlink {pointer to box or rule node for leaders}
3141
3142@ A glue specification has a halfword reference count in its first word,
3143@^reference counts@>
3144representing |null| plus the number of glue nodes that point to it (less one).
3145Note that the reference count appears in the same position as
3146the |link| field in list nodes; this is the field that is initialized
3147to |null| when a node is allocated, and it is also the field that is flagged
3148by |empty_flag| in empty nodes.
3149
3150Glue specifications also contain three |scaled| fields, for the |width|,
3151|stretch|, and |shrink| dimensions. Finally, there are two one-byte
3152fields called |stretch_order| and |shrink_order|; these contain the
3153orders of infinity (|normal|, |fil|, |fill|, or |filll|)
3154corresponding to the stretch and shrink values.
3155
3156@d glue_spec_size=4 {number of words to allocate for a glue specification}
3157@d glue_ref_count(#) == link(#) {reference count of a glue specification}
3158@d stretch(#) == mem[#+2].sc {the stretchability of this glob of glue}
3159@d shrink(#) == mem[#+3].sc {the shrinkability of this glob of glue}
3160@d stretch_order == type {order of infinity for stretching}
3161@d shrink_order == subtype {order of infinity for shrinking}
3162@d fil=1 {first-order infinity}
3163@d fill=2 {second-order infinity}
3164@d filll=3 {third-order infinity}
3165
3166@<Types...@>=
3167@!glue_ord=normal..filll; {infinity to the 0, 1, 2, or 3 power}
3168
3169@ Here is a function that returns a pointer to a copy of a glue spec.
3170The reference count in the copy is |null|, because there is assumed
3171to be exactly one reference to the new specification.
3172
3173@p function new_spec(@!p:pointer):pointer; {duplicates a glue specification}
3174var q:pointer; {the new spec}
3175begin q:=get_node(glue_spec_size);@/
3176mem[q]:=mem[p]; glue_ref_count(q):=null;@/
3177width(q):=width(p); stretch(q):=stretch(p); shrink(q):=shrink(p);
3178new_spec:=q;
3179end;
3180
3181@ And here's a function that creates a glue node for a given parameter
3182identified by its code number; for example,
3183|new_param_glue(line_skip_code)| returns a pointer to a glue node for the
3184current \.{\\lineskip}.
3185
3186@p function new_param_glue(@!n:small_number):pointer;
3187var p:pointer; {the new node}
3188@!q:pointer; {the glue specification}
3189begin p:=get_node(small_node_size); type(p):=glue_node; subtype(p):=n+1;
3190leader_ptr(p):=null;@/
3191q:=@<Current |mem| equivalent of glue parameter number |n|@>@t@>;
3192glue_ptr(p):=q; incr(glue_ref_count(q));
3193new_param_glue:=p;
3194end;
3195
3196@ Glue nodes that are more or less anonymous are created by |new_glue|,
3197whose argument points to a glue specification.
3198
3199@p function new_glue(@!q:pointer):pointer;
3200var p:pointer; {the new node}
3201begin p:=get_node(small_node_size); type(p):=glue_node; subtype(p):=normal;
3202leader_ptr(p):=null; glue_ptr(p):=q; incr(glue_ref_count(q));
3203new_glue:=p;
3204end;
3205
3206@ Still another subroutine is needed: This one is sort of a combination
3207of |new_param_glue| and |new_glue|. It creates a glue node for one of
3208the current glue parameters, but it makes a fresh copy of the glue
3209specification, since that specification will probably be subject to change,
3210while the parameter will stay put. The global variable |temp_ptr| is
3211set to the address of the new spec.
3212
3213@p function new_skip_param(@!n:small_number):pointer;
3214var p:pointer; {the new node}
3215begin temp_ptr:=new_spec(@<Current |mem| equivalent of glue parameter...@>);
3216p:=new_glue(temp_ptr); glue_ref_count(temp_ptr):=null; subtype(p):=n+1;
3217new_skip_param:=p;
3218end;
3219
3220@ A |kern_node| has a |width| field to specify a (normally negative)
3221amount of spacing. This spacing correction appears in horizontal lists
3222between letters like A and V when the font designer said that it looks
3223better to move them closer together or further apart. A kern node can
3224also appear in a vertical list, when its `|width|' denotes additional
3225spacing in the vertical direction. The |subtype| is either |normal| (for
3226kerns inserted from font information or math mode calculations) or |explicit|
3227(for kerns inserted from \.{\\kern} and \.{\\/} commands) or |acc_kern|
3228(for kerns inserted from non-math accents) or |mu_glue| (for kerns
3229inserted from \.{\\mkern} specifications in math formulas).
3230
3231@d kern_node=11 {|type| of a kern node}
3232@d explicit=1 {|subtype| of kern nodes from \.{\\kern} and \.{\\/}}
3233@d acc_kern=2 {|subtype| of kern nodes from accents}
3234
3235@ The |new_kern| function creates a kern node having a given width.
3236
3237@p function new_kern(@!w:scaled):pointer;
3238var p:pointer; {the new node}
3239begin p:=get_node(small_node_size); type(p):=kern_node;
3240subtype(p):=normal;
3241width(p):=w;
3242new_kern:=p;
3243end;
3244
3245@ A |penalty_node| specifies the penalty associated with line or page
3246breaking, in its |penalty| field. This field is a fullword integer, but
3247the full range of integer values is not used: Any penalty |>=10000| is
3248treated as infinity, and no break will be allowed for such high values.
3249Similarly, any penalty |<=-10000| is treated as negative infinity, and a
3250break will be forced.
3251
3252@d penalty_node=12 {|type| of a penalty node}
3253@d inf_penalty=inf_bad {``infinite'' penalty value}
3254@d eject_penalty=-inf_penalty {``negatively infinite'' penalty value}
3255@d penalty(#) == mem[#+1].int {the added cost of breaking a list here}
3256
3257@ Anyone who has been reading the last few sections of the program will
3258be able to guess what comes next.
3259
3260@p function new_penalty(@!m:integer):pointer;
3261var p:pointer; {the new node}
3262begin p:=get_node(small_node_size); type(p):=penalty_node;
3263subtype(p):=0; {the |subtype| is not used}
3264penalty(p):=m; new_penalty:=p;
3265end;
3266
3267@ You might think that we have introduced enough node types by now. Well,
3268almost, but there is one more: An |unset_node| has nearly the same format
3269as an |hlist_node| or |vlist_node|; it is used for entries in \.{\\halign}
3270or \.{\\valign} that are not yet in their final form, since the box
3271dimensions are their ``natural'' sizes before any glue adjustment has been
3272made. The |glue_set| word is not present; instead, we have a |glue_stretch|
3273field, which contains the total stretch of order |glue_order| that is
3274present in the hlist or vlist being boxed.
3275Similarly, the |shift_amount| field is replaced by a |glue_shrink| field,
3276containing the total shrink of order |glue_sign| that is present.
3277The |subtype| field is called |span_count|; an unset box typically
3278contains the data for |qo(span_count)+1| columns.
3279Unset nodes will be changed to box nodes when alignment is completed.
3280
3281@d unset_node=13 {|type| for an unset node}
3282@d glue_stretch(#)==mem[#+glue_offset].sc {total stretch in an unset node}
3283@d glue_shrink==shift_amount {total shrink in an unset node}
3284@d span_count==subtype {indicates the number of spanned columns}
3285
3286@ In fact, there are still more types coming. When we get to math formula
3287processing we will see that a |style_node| has |type=14|; and a number
3288of larger type codes will also be defined, for use in math mode only.
3289
3290@ Warning: If any changes are made to these data structure layouts, such as
3291changing any of the node sizes or even reordering the words of nodes,
3292the |copy_node_list| procedure and the memory initialization code
3293below may have to be changed. Such potentially dangerous parts of the
3294program are listed in the index under `data structure assumptions'.
3295@!@^data structure assumptions@>
3296However, other references to the nodes are made symbolically in terms of
3297the \.{WEB} macro definitions above, so that format changes will leave
3298\TeX's other algorithms intact.
3299@^system dependencies@>
3300
3301@* \[11] Memory layout.
3302Some areas of |mem| are dedicated to fixed usage, since static allocation is
3303more efficient than dynamic allocation when we can get away with it. For
3304example, locations |mem_bot| to |mem_bot+3| are always used to store the
3305specification for glue that is `\.{0pt plus 0pt minus 0pt}'. The
3306following macro definitions accomplish the static allocation by giving
3307symbolic names to the fixed positions. Static variable-size nodes appear
3308in locations |mem_bot| through |lo_mem_stat_max|, and static single-word nodes
3309appear in locations |hi_mem_stat_min| through |mem_top|, inclusive. It is
3310harmless to let |lig_trick| and |garbage| share the same location of |mem|.
3311
3312@d zero_glue==mem_bot {specification for \.{0pt plus 0pt minus 0pt}}
3313@d fil_glue==zero_glue+glue_spec_size {\.{0pt plus 1fil minus 0pt}}
3314@d fill_glue==fil_glue+glue_spec_size {\.{0pt plus 1fill minus 0pt}}
3315@d ss_glue==fill_glue+glue_spec_size {\.{0pt plus 1fil minus 1fil}}
3316@d fil_neg_glue==ss_glue+glue_spec_size {\.{0pt plus -1fil minus 0pt}}
3317@d lo_mem_stat_max==fil_neg_glue+glue_spec_size-1 {largest statically
3318  allocated word in the variable-size |mem|}
3319@#
3320@d page_ins_head==mem_top {list of insertion data for current page}
3321@d contrib_head==mem_top-1 {vlist of items not yet on current page}
3322@d page_head==mem_top-2 {vlist for current page}
3323@d temp_head==mem_top-3 {head of a temporary list of some kind}
3324@d hold_head==mem_top-4 {head of a temporary list of another kind}
3325@d adjust_head==mem_top-5 {head of adjustment list returned by |hpack|}
3326@d active==mem_top-7 {head of active list in |line_break|, needs two words}
3327@d align_head==mem_top-8 {head of preamble list for alignments}
3328@d end_span==mem_top-9 {tail of spanned-width lists}
3329@d omit_template==mem_top-10 {a constant token list}
3330@d null_list==mem_top-11 {permanently empty list}
3331@d lig_trick==mem_top-12 {a ligature masquerading as a |char_node|}
3332@d garbage==mem_top-12 {used for scrap information}
3333@d backup_head==mem_top-13 {head of token list built by |scan_keyword|}
3334@d hi_mem_stat_min==mem_top-13 {smallest statically allocated word in
3335  the one-word |mem|}
3336@d hi_mem_stat_usage=14 {the number of one-word nodes always present}
3337
3338@ The following code gets |mem| off to a good start, when \TeX\ is
3339initializing itself the slow~way.
3340
3341@<Local variables for init...@>=
3342@!k:integer; {index into |mem|, |eqtb|, etc.}
3343
3344@ @<Initialize table entries...@>=
3345for k:=mem_bot+1 to lo_mem_stat_max do mem[k].sc:=0;
3346  {all glue dimensions are zeroed}
3347@^data structure assumptions@>
3348k:=mem_bot;@+while k<=lo_mem_stat_max do
3349    {set first words of glue specifications}
3350  begin glue_ref_count(k):=null+1;
3351  stretch_order(k):=normal; shrink_order(k):=normal;
3352  k:=k+glue_spec_size;
3353  end;
3354stretch(fil_glue):=unity; stretch_order(fil_glue):=fil;@/
3355stretch(fill_glue):=unity; stretch_order(fill_glue):=fill;@/
3356stretch(ss_glue):=unity; stretch_order(ss_glue):=fil;@/
3357shrink(ss_glue):=unity; shrink_order(ss_glue):=fil;@/
3358stretch(fil_neg_glue):=-unity; stretch_order(fil_neg_glue):=fil;@/
3359rover:=lo_mem_stat_max+1;
3360link(rover):=empty_flag; {now initialize the dynamic memory}
3361node_size(rover):=1000; {which is a 1000-word available node}
3362llink(rover):=rover; rlink(rover):=rover;@/
3363lo_mem_max:=rover+1000; link(lo_mem_max):=null; info(lo_mem_max):=null;@/
3364for k:=hi_mem_stat_min to mem_top do
3365  mem[k]:=mem[lo_mem_max]; {clear list heads}
3366@<Initialize the special list heads and constant nodes@>;
3367avail:=null; mem_end:=mem_top;
3368hi_mem_min:=hi_mem_stat_min; {initialize the one-word memory}
3369var_used:=lo_mem_stat_max+1-mem_bot; dyn_used:=hi_mem_stat_usage;
3370  {initialize statistics}
3371
3372@ If \TeX\ is extended improperly, the |mem| array might get screwed up.
3373For example, some pointers might be wrong, or some ``dead'' nodes might not
3374have been freed when the last reference to them disappeared. Procedures
3375|check_mem| and |search_mem| are available to help diagnose such
3376problems. These procedures make use of two arrays called |free| and
3377|was_free| that are present only if \TeX's debugging routines have
3378been included. (You may want to decrease the size of |mem| while you
3379@^debugging@>
3380are debugging.)
3381
3382@<Glob...@>=
3383@!debug @!free: packed array [mem_min..mem_max] of boolean; {free cells}
3384@t\hskip10pt@>@!was_free: packed array [mem_min..mem_max] of boolean;
3385  {previously free cells}
3386@t\hskip10pt@>@!was_mem_end,@!was_lo_max,@!was_hi_min: pointer;
3387  {previous |mem_end|, |lo_mem_max|, and |hi_mem_min|}
3388@t\hskip10pt@>@!panicking:boolean; {do we want to check memory constantly?}
3389gubed
3390
3391@ @<Set initial...@>=
3392@!debug was_mem_end:=mem_min; {indicate that everything was previously free}
3393was_lo_max:=mem_min; was_hi_min:=mem_max;
3394panicking:=false;
3395gubed
3396
3397@ Procedure |check_mem| makes sure that the available space lists of
3398|mem| are well formed, and it optionally prints out all locations
3399that are reserved now but were free the last time this procedure was called.
3400
3401@p @!debug procedure check_mem(@!print_locs : boolean);
3402label done1,done2; {loop exits}
3403var p,@!q:pointer; {current locations of interest in |mem|}
3404@!clobbered:boolean; {is something amiss?}
3405begin for p:=mem_min to lo_mem_max do free[p]:=false; {you can probably
3406  do this faster}
3407for p:=hi_mem_min to mem_end do free[p]:=false; {ditto}
3408@<Check single-word |avail| list@>;
3409@<Check variable-size |avail| list@>;
3410@<Check flags of unavailable nodes@>;
3411if print_locs then @<Print newly busy locations@>;
3412for p:=mem_min to lo_mem_max do was_free[p]:=free[p];
3413for p:=hi_mem_min to mem_end do was_free[p]:=free[p];
3414  {|was_free:=free| might be faster}
3415was_mem_end:=mem_end; was_lo_max:=lo_mem_max; was_hi_min:=hi_mem_min;
3416end;
3417gubed
3418
3419@ @<Check single-word...@>=
3420p:=avail; q:=null; clobbered:=false;
3421while p<>null do
3422  begin if (p>mem_end)or(p<hi_mem_min) then clobbered:=true
3423  else if free[p] then clobbered:=true;
3424  if clobbered then
3425    begin print_nl("AVAIL list clobbered at ");
3426@.AVAIL list clobbered...@>
3427    print_int(q); goto done1;
3428    end;
3429  free[p]:=true; q:=p; p:=link(q);
3430  end;
3431done1:
3432
3433@ @<Check variable-size...@>=
3434p:=rover; q:=null; clobbered:=false;
3435repeat if (p>=lo_mem_max)or(p<mem_min) then clobbered:=true
3436  else if (rlink(p)>=lo_mem_max)or(rlink(p)<mem_min) then clobbered:=true
3437  else if  not(is_empty(p))or(node_size(p)<2)or@|
3438   (p+node_size(p)>lo_mem_max)or@| (llink(rlink(p))<>p) then clobbered:=true;
3439  if clobbered then
3440  begin print_nl("Double-AVAIL list clobbered at ");
3441  print_int(q); goto done2;
3442  end;
3443for q:=p to p+node_size(p)-1 do {mark all locations free}
3444  begin if free[q] then
3445    begin print_nl("Doubly free location at ");
3446@.Doubly free location...@>
3447    print_int(q); goto done2;
3448    end;
3449  free[q]:=true;
3450  end;
3451q:=p; p:=rlink(p);
3452until p=rover;
3453done2:
3454
3455@ @<Check flags...@>=
3456p:=mem_min;
3457while p<=lo_mem_max do {node |p| should not be empty}
3458  begin if is_empty(p) then
3459    begin print_nl("Bad flag at "); print_int(p);
3460@.Bad flag...@>
3461    end;
3462  while (p<=lo_mem_max) and not free[p] do incr(p);
3463  while (p<=lo_mem_max) and free[p] do incr(p);
3464  end
3465
3466@ @<Print newly busy...@>=
3467begin print_nl("New busy locs:");
3468for p:=mem_min to lo_mem_max do
3469  if not free[p] and ((p>was_lo_max) or was_free[p]) then
3470    begin print_char(" "); print_int(p);
3471    end;
3472for p:=hi_mem_min to mem_end do
3473  if not free[p] and
3474   ((p<was_hi_min) or (p>was_mem_end) or was_free[p]) then
3475    begin print_char(" "); print_int(p);
3476    end;
3477end
3478
3479@ The |search_mem| procedure attempts to answer the question ``Who points
3480to node~|p|?'' In doing so, it fetches |link| and |info| fields of |mem|
3481that might not be of type |two_halves|. Strictly speaking, this is
3482@^dirty \PASCAL@>
3483undefined in \PASCAL, and it can lead to ``false drops'' (words that seem to
3484point to |p| purely by coincidence). But for debugging purposes, we want
3485to rule out the places that do {\sl not\/} point to |p|, so a few false
3486drops are tolerable.
3487
3488@p @!debug procedure search_mem(@!p:pointer); {look for pointers to |p|}
3489var q:integer; {current position being searched}
3490begin for q:=mem_min to lo_mem_max do
3491  begin if link(q)=p then
3492    begin print_nl("LINK("); print_int(q); print_char(")");
3493    end;
3494  if info(q)=p then
3495    begin print_nl("INFO("); print_int(q); print_char(")");
3496    end;
3497  end;
3498for q:=hi_mem_min to mem_end do
3499  begin if link(q)=p then
3500    begin print_nl("LINK("); print_int(q); print_char(")");
3501    end;
3502  if info(q)=p then
3503    begin print_nl("INFO("); print_int(q); print_char(")");
3504    end;
3505  end;
3506@<Search |eqtb| for equivalents equal to |p|@>;
3507@<Search |save_stack| for equivalents that point to |p|@>;
3508@<Search |hyph_list| for pointers to |p|@>;
3509end;
3510gubed
3511
3512@* \[12] Displaying boxes.
3513We can reinforce our knowledge of the data structures just introduced
3514by considering two procedures that display a list in symbolic form.
3515The first of these, called |short_display|, is used in ``overfull box''
3516messages to give the top-level description of a list. The other one,
3517called |show_node_list|, prints a detailed description of exactly what
3518is in the data structure.
3519
3520The philosophy of |short_display| is to ignore the fine points about exactly
3521what is inside boxes, except that ligatures and discretionary breaks are
3522expanded. As a result, |short_display| is a recursive procedure, but the
3523recursion is never more than one level deep.
3524@^recursion@>
3525
3526A global variable |font_in_short_display| keeps track of the font code that
3527is assumed to be present when |short_display| begins; deviations from this
3528font will be printed.
3529
3530@<Glob...@>=
3531@!font_in_short_display:integer; {an internal font number}
3532
3533@ Boxes, rules, inserts, whatsits, marks, and things in general that are
3534sort of ``complicated'' are indicated only by printing `\.{[]}'.
3535
3536@p procedure short_display(@!p:integer); {prints highlights of list |p|}
3537var n:integer; {for replacement counts}
3538begin while p>mem_min do
3539  begin if is_char_node(p) then
3540    begin if p<=mem_end then
3541      begin if font(p)<>font_in_short_display then
3542        begin if (font(p)<font_base)or(font(p)>font_max) then
3543          print_char("*")
3544@.*\relax@>
3545        else @<Print the font identifier for |font(p)|@>;
3546        print_char(" "); font_in_short_display:=font(p);
3547        end;
3548      print_ASCII(qo(character(p)));
3549      end;
3550    end
3551  else @<Print a short indication of the contents of node |p|@>;
3552  p:=link(p);
3553  end;
3554end;
3555
3556@ @<Print a short indication of the contents of node |p|@>=
3557case type(p) of
3558hlist_node,vlist_node,ins_node,whatsit_node,mark_node,adjust_node,
3559  unset_node: print("[]");
3560rule_node: print_char("|");
3561glue_node: if glue_ptr(p)<>zero_glue then print_char(" ");
3562math_node: print_char("$");
3563ligature_node: short_display(lig_ptr(p));
3564disc_node: begin short_display(pre_break(p));
3565  short_display(post_break(p));@/
3566  n:=replace_count(p);
3567  while n>0 do
3568    begin if link(p)<>null then p:=link(p);
3569    decr(n);
3570    end;
3571  end;
3572othercases do_nothing
3573endcases
3574
3575@ The |show_node_list| routine requires some auxiliary subroutines: one to
3576print a font-and-character combination, one to print a token list without
3577its reference count, and one to print a rule dimension.
3578
3579@p procedure print_font_and_char(@!p:integer); {prints |char_node| data}
3580begin if p>mem_end then print_esc("CLOBBERED.")
3581else  begin if (font(p)<font_base)or(font(p)>font_max) then print_char("*")
3582@.*\relax@>
3583  else @<Print the font identifier for |font(p)|@>;
3584  print_char(" "); print_ASCII(qo(character(p)));
3585  end;
3586end;
3587@#
3588procedure print_mark(@!p:integer); {prints token list data in braces}
3589begin print_char("{");
3590if (p<hi_mem_min)or(p>mem_end) then print_esc("CLOBBERED.")
3591else show_token_list(link(p),null,max_print_line-10);
3592print_char("}");
3593end;
3594@#
3595procedure print_rule_dimen(@!d:scaled); {prints dimension in rule node}
3596begin if is_running(d) then print_char("*") else print_scaled(d);
3597@.*\relax@>
3598end;
3599
3600@ Then there is a subroutine that prints glue stretch and shrink, possibly
3601followed by the name of finite units:
3602
3603@p procedure print_glue(@!d:scaled;@!order:integer;@!s:str_number);
3604  {prints a glue component}
3605begin print_scaled(d);
3606if (order<normal)or(order>filll) then print("foul")
3607else if order>normal then
3608  begin print("fil");
3609  while order>fil do
3610    begin print_char("l"); decr(order);
3611    end;
3612  end
3613else if s<>0 then print(s);
3614end;
3615
3616@ The next subroutine prints a whole glue specification.
3617
3618@p procedure print_spec(@!p:integer;@!s:str_number);
3619  {prints a glue specification}
3620begin if (p<mem_min)or(p>=lo_mem_max) then print_char("*")
3621@.*\relax@>
3622else  begin print_scaled(width(p));
3623  if s<>0 then print(s);
3624  if stretch(p)<>0 then
3625    begin print(" plus "); print_glue(stretch(p),stretch_order(p),s);
3626    end;
3627  if shrink(p)<>0 then
3628    begin print(" minus "); print_glue(shrink(p),shrink_order(p),s);
3629    end;
3630  end;
3631end;
3632
3633@ We also need to declare some procedures that appear later in this
3634documentation.
3635
3636@p @<Declare procedures needed for displaying the elements of mlists@>@;
3637@<Declare the procedure called |print_skip_param|@>
3638
3639@ Since boxes can be inside of boxes, |show_node_list| is inherently recursive,
3640@^recursion@>
3641up to a given maximum number of levels.  The history of nesting is indicated
3642by the current string, which will be printed at the beginning of each line;
3643the length of this string, namely |cur_length|, is the depth of nesting.
3644
3645Recursive calls on |show_node_list| therefore use the following pattern:
3646
3647@d node_list_display(#)==
3648  begin append_char("."); show_node_list(#); flush_char;
3649  end {|str_room| need not be checked; see |show_box| below}
3650
3651@ A global variable called |depth_threshold| is used to record the maximum
3652depth of nesting for which |show_node_list| will show information.  If we
3653have |depth_threshold=0|, for example, only the top level information will
3654be given and no sublists will be traversed. Another global variable, called
3655|breadth_max|, tells the maximum number of items to show at each level;
3656|breadth_max| had better be positive, or you won't see anything.
3657
3658@<Glob...@>=
3659@!depth_threshold : integer; {maximum nesting depth in box displays}
3660@!breadth_max : integer; {maximum number of items shown at the same list level}
3661
3662@ Now we are ready for |show_node_list| itself. This procedure has been
3663written to be ``extra robust'' in the sense that it should not crash or get
3664into a loop even if the data structures have been messed up by bugs in
3665the rest of the program. You can safely call its parent routine
3666|show_box(p)| for arbitrary values of |p| when you are debugging \TeX.
3667However, in the presence of bad data, the procedure may
3668@^dirty \PASCAL@>@^debugging@>
3669fetch a |memory_word| whose variant is different from the way it was stored;
3670for example, it might try to read |mem[p].hh| when |mem[p]|
3671contains a scaled integer, if |p| is a pointer that has been
3672clobbered or chosen at random.
3673
3674@p procedure show_node_list(@!p:integer); {prints a node list symbolically}
3675label exit;
3676var n:integer; {the number of items already printed at this level}
3677@!g:real; {a glue ratio, as a floating point number}
3678begin if cur_length>depth_threshold then
3679  begin if p>null then print(" []");
3680    {indicate that there's been some truncation}
3681  return;
3682  end;
3683n:=0;
3684while p>mem_min do
3685  begin print_ln; print_current_string; {display the nesting history}
3686  if p>mem_end then {pointer out of range}
3687    begin print("Bad link, display aborted."); return;
3688@.Bad link...@>
3689    end;
3690  incr(n); if n>breadth_max then {time to stop}
3691    begin print("etc."); return;
3692@.etc@>
3693    end;
3694  @<Display node |p|@>;
3695  p:=link(p);
3696  end;
3697exit:
3698end;
3699
3700@ @<Display node |p|@>=
3701if is_char_node(p) then print_font_and_char(p)
3702else  case type(p) of
3703  hlist_node,vlist_node,unset_node: @<Display box |p|@>;
3704  rule_node: @<Display rule |p|@>;
3705  ins_node: @<Display insertion |p|@>;
3706  whatsit_node: @<Display the whatsit node |p|@>;
3707  glue_node: @<Display glue |p|@>;
3708  kern_node: @<Display kern |p|@>;
3709  math_node: @<Display math node |p|@>;
3710  ligature_node: @<Display ligature |p|@>;
3711  penalty_node: @<Display penalty |p|@>;
3712  disc_node: @<Display discretionary |p|@>;
3713  mark_node: @<Display mark |p|@>;
3714  adjust_node: @<Display adjustment |p|@>;
3715  @t\4@>@<Cases of |show_node_list| that arise in mlists only@>@;
3716  othercases print("Unknown node type!")
3717  endcases
3718
3719@ @<Display box |p|@>=
3720begin if type(p)=hlist_node then print_esc("h")
3721else if type(p)=vlist_node then print_esc("v")
3722else print_esc("unset");
3723print("box("); print_scaled(height(p)); print_char("+");
3724print_scaled(depth(p)); print(")x"); print_scaled(width(p));
3725if type(p)=unset_node then
3726  @<Display special fields of the unset node |p|@>
3727else  begin @<Display the value of |glue_set(p)|@>;
3728  if shift_amount(p)<>0 then
3729    begin print(", shifted "); print_scaled(shift_amount(p));
3730    end;
3731  end;
3732node_list_display(list_ptr(p)); {recursive call}
3733end
3734
3735@ @<Display special fields of the unset node |p|@>=
3736begin if span_count(p)<>min_quarterword then
3737  begin print(" ("); print_int(qo(span_count(p))+1);
3738  print(" columns)");
3739  end;
3740if glue_stretch(p)<>0 then
3741  begin print(", stretch "); print_glue(glue_stretch(p),glue_order(p),0);
3742  end;
3743if glue_shrink(p)<>0 then
3744  begin print(", shrink "); print_glue(glue_shrink(p),glue_sign(p),0);
3745  end;
3746end
3747
3748@ The code will have to change in this place if |glue_ratio| is
3749a structured type instead of an ordinary |real|. Note that this routine
3750should avoid arithmetic errors even if the |glue_set| field holds an
3751arbitrary random value. The following code assumes that a properly
3752formed nonzero |real| number has absolute value $2^{20}$ or more when
3753it is regarded as an integer; this precaution was adequate to prevent
3754floating point underflow on the author's computer.
3755@^system dependencies@>
3756@^dirty \PASCAL@>
3757
3758@<Display the value of |glue_set(p)|@>=
3759g:=float(glue_set(p));
3760if (g<>float_constant(0))and(glue_sign(p)<>normal) then
3761  begin print(", glue set ");
3762  if glue_sign(p)=shrinking then print("- ");
3763  if abs(mem[p+glue_offset].int)<@'4000000 then print("?.?")
3764  else if abs(g)>float_constant(20000) then
3765    begin if g>float_constant(0) then print_char(">")
3766    else print("< -");
3767    print_glue(20000*unity,glue_order(p),0);
3768    end
3769  else print_glue(round(unity*g),glue_order(p),0);
3770@^real multiplication@>
3771  end
3772
3773@ @<Display rule |p|@>=
3774begin print_esc("rule("); print_rule_dimen(height(p)); print_char("+");
3775print_rule_dimen(depth(p)); print(")x"); print_rule_dimen(width(p));
3776end
3777
3778@ @<Display insertion |p|@>=
3779begin print_esc("insert"); print_int(qo(subtype(p)));
3780print(", natural size "); print_scaled(height(p));
3781print("; split("); print_spec(split_top_ptr(p),0);
3782print_char(","); print_scaled(depth(p));
3783print("); float cost "); print_int(float_cost(p));
3784node_list_display(ins_ptr(p)); {recursive call}
3785end
3786
3787@ @<Display glue |p|@>=
3788if subtype(p)>=a_leaders then @<Display leaders |p|@>
3789else  begin print_esc("glue");
3790  if subtype(p)<>normal then
3791    begin print_char("(");
3792    if subtype(p)<cond_math_glue then
3793      print_skip_param(subtype(p)-1)
3794    else if subtype(p)=cond_math_glue then print_esc("nonscript")
3795    else print_esc("mskip");
3796    print_char(")");
3797    end;
3798  if subtype(p)<>cond_math_glue then
3799    begin print_char(" ");
3800    if subtype(p)<cond_math_glue then print_spec(glue_ptr(p),0)
3801    else print_spec(glue_ptr(p),"mu");
3802    end;
3803  end
3804
3805@ @<Display leaders |p|@>=
3806begin print_esc("");
3807if subtype(p)=c_leaders then print_char("c")
3808else if subtype(p)=x_leaders then print_char("x");
3809print("leaders "); print_spec(glue_ptr(p),0);
3810node_list_display(leader_ptr(p)); {recursive call}
3811end
3812
3813@ An ``explicit'' kern value is indicated implicitly by an explicit space.
3814
3815@<Display kern |p|@>=
3816if subtype(p)<>mu_glue then
3817  begin print_esc("kern");
3818  if subtype(p)<>normal then print_char(" ");
3819  print_scaled(width(p));
3820  if subtype(p)=acc_kern then print(" (for accent)");
3821@.for accent@>
3822  end
3823else  begin print_esc("mkern"); print_scaled(width(p)); print("mu");
3824  end
3825
3826@ @<Display math node |p|@>=
3827begin print_esc("math");
3828if subtype(p)=before then print("on")
3829else print("off");
3830if width(p)<>0 then
3831  begin print(", surrounded "); print_scaled(width(p));
3832  end;
3833end
3834
3835@ @<Display ligature |p|@>=
3836begin print_font_and_char(lig_char(p)); print(" (ligature ");
3837if subtype(p)>1 then print_char("|");
3838font_in_short_display:=font(lig_char(p)); short_display(lig_ptr(p));
3839if odd(subtype(p)) then print_char("|");
3840print_char(")");
3841end
3842
3843@ @<Display penalty |p|@>=
3844begin print_esc("penalty "); print_int(penalty(p));
3845end
3846
3847@ The |post_break| list of a discretionary node is indicated by a prefixed
3848`\.{\char'174}' instead of the `\..' before the |pre_break| list.
3849
3850@<Display discretionary |p|@>=
3851begin print_esc("discretionary");
3852if replace_count(p)>0 then
3853  begin print(" replacing "); print_int(replace_count(p));
3854  end;
3855node_list_display(pre_break(p)); {recursive call}
3856append_char("|"); show_node_list(post_break(p)); flush_char; {recursive call}
3857end
3858
3859@ @<Display mark |p|@>=
3860begin print_esc("mark"); print_mark(mark_ptr(p));
3861end
3862
3863@ @<Display adjustment |p|@>=
3864begin print_esc("vadjust"); node_list_display(adjust_ptr(p)); {recursive call}
3865end
3866
3867@ The recursive machinery is started by calling |show_box|.
3868@^recursion@>
3869
3870@p procedure show_box(@!p:pointer);
3871begin @<Assign the values |depth_threshold:=show_box_depth| and
3872  |breadth_max:=show_box_breadth|@>;
3873if breadth_max<=0 then breadth_max:=5;
3874if pool_ptr+depth_threshold>=pool_size then
3875  depth_threshold:=pool_size-pool_ptr-1;
3876  {now there's enough room for prefix string}
3877show_node_list(p); {the show starts at |p|}
3878print_ln;
3879end;
3880
3881@* \[13] Destroying boxes.
3882When we are done with a node list, we are obliged to return it to free
3883storage, including all of its sublists. The recursive procedure
3884|flush_node_list| does this for us.
3885
3886@ First, however, we shall consider two non-recursive procedures that do
3887simpler tasks. The first of these, |delete_token_ref|, is called when
3888a pointer to a token list's reference count is being removed. This means
3889that the token list should disappear if the reference count was |null|,
3890otherwise the count should be decreased by one.
3891@^reference counts@>
3892
3893@d token_ref_count(#) == info(#) {reference count preceding a token list}
3894
3895@p procedure delete_token_ref(@!p:pointer); {|p| points to the reference count
3896  of a token list that is losing one reference}
3897begin if token_ref_count(p)=null then flush_list(p)
3898else decr(token_ref_count(p));
3899end;
3900
3901@ Similarly, |delete_glue_ref| is called when a pointer to a glue
3902specification is being withdrawn.
3903@^reference counts@>
3904@d fast_delete_glue_ref(#)==@t@>@;@/
3905  begin if glue_ref_count(#)=null then free_node(#,glue_spec_size)
3906  else decr(glue_ref_count(#));
3907  end
3908
3909@p procedure delete_glue_ref(@!p:pointer); {|p| points to a glue specification}
3910fast_delete_glue_ref(p);
3911
3912@ Now we are ready to delete any node list, recursively.
3913In practice, the nodes deleted are usually charnodes (about 2/3 of the time),
3914and they are glue nodes in about half of the remaining cases.
3915@^recursion@>
3916
3917@p procedure flush_node_list(@!p:pointer); {erase list of nodes starting at |p|}
3918label done; {go here when node |p| has been freed}
3919var q:pointer; {successor to node |p|}
3920begin while p<>null do
3921@^inner loop@>
3922  begin q:=link(p);
3923  if is_char_node(p) then free_avail(p)
3924  else  begin case type(p) of
3925    hlist_node,vlist_node,unset_node: begin flush_node_list(list_ptr(p));
3926      free_node(p,box_node_size); goto done;
3927      end;
3928    rule_node: begin free_node(p,rule_node_size); goto done;
3929      end;
3930    ins_node: begin flush_node_list(ins_ptr(p));
3931      delete_glue_ref(split_top_ptr(p));
3932      free_node(p,ins_node_size); goto done;
3933      end;
3934    whatsit_node: @<Wipe out the whatsit node |p| and |goto done|@>;
3935    glue_node: begin fast_delete_glue_ref(glue_ptr(p));
3936      if leader_ptr(p)<>null then flush_node_list(leader_ptr(p));
3937      end;
3938    kern_node,math_node,penalty_node: do_nothing;
3939    ligature_node: flush_node_list(lig_ptr(p));
3940    mark_node: delete_token_ref(mark_ptr(p));
3941    disc_node: begin flush_node_list(pre_break(p));
3942      flush_node_list(post_break(p));
3943      end;
3944    adjust_node: flush_node_list(adjust_ptr(p));
3945    @t\4@>@<Cases of |flush_node_list| that arise in mlists only@>@;
3946    othercases confusion("flushing")
3947@:this can't happen flushing}{\quad flushing@>
3948    endcases;@/
3949    free_node(p,small_node_size);
3950    done:end;
3951  p:=q;
3952  end;
3953end;
3954
3955@* \[14] Copying boxes.
3956Another recursive operation that acts on boxes is sometimes needed: The
3957procedure |copy_node_list| returns a pointer to another node list that has
3958the same structure and meaning as the original. Note that since glue
3959specifications and token lists have reference counts, we need not make
3960copies of them. Reference counts can never get too large to fit in a
3961halfword, since each pointer to a node is in a different memory address,
3962and the total number of memory addresses fits in a halfword.
3963@^recursion@>
3964@^reference counts@>
3965
3966(Well, there actually are also references from outside |mem|; if the
3967|save_stack| is made arbitrarily large, it would theoretically be possible
3968to break \TeX\ by overflowing a reference count. But who would want to do that?)
3969
3970@d add_token_ref(#)==incr(token_ref_count(#)) {new reference to a token list}
3971@d add_glue_ref(#)==incr(glue_ref_count(#)) {new reference to a glue spec}
3972
3973@ The copying procedure copies words en masse without bothering
3974to look at their individual fields. If the node format changes---for
3975example, if the size is altered, or if some link field is moved to another
3976relative position---then this code may need to be changed too.
3977@^data structure assumptions@>
3978
3979@p function copy_node_list(@!p:pointer):pointer; {makes a duplicate of the
3980  node list that starts at |p| and returns a pointer to the new list}
3981var h:pointer; {temporary head of copied list}
3982@!q:pointer; {previous position in new list}
3983@!r:pointer; {current node being fabricated for new list}
3984@!words:0..5; {number of words remaining to be copied}
3985begin h:=get_avail; q:=h;
3986while p<>null do
3987  begin @<Make a copy of node |p| in node |r|@>;
3988  link(q):=r; q:=r; p:=link(p);
3989  end;
3990link(q):=null; q:=link(h); free_avail(h);
3991copy_node_list:=q;
3992end;
3993
3994@ @<Make a copy of node |p|...@>=
3995words:=1; {this setting occurs in more branches than any other}
3996if is_char_node(p) then r:=get_avail
3997else @<Case statement to copy different types and set |words| to the number
3998  of initial words not yet copied@>;
3999while words>0 do
4000  begin decr(words); mem[r+words]:=mem[p+words];
4001  end
4002
4003@ @<Case statement to copy...@>=
4004case type(p) of
4005hlist_node,vlist_node,unset_node: begin r:=get_node(box_node_size);
4006  mem[r+6]:=mem[p+6]; mem[r+5]:=mem[p+5]; {copy the last two words}
4007  list_ptr(r):=copy_node_list(list_ptr(p)); {this affects |mem[r+5]|}
4008  words:=5;
4009  end;
4010rule_node: begin r:=get_node(rule_node_size); words:=rule_node_size;
4011  end;
4012ins_node: begin r:=get_node(ins_node_size); mem[r+4]:=mem[p+4];
4013  add_glue_ref(split_top_ptr(p));
4014  ins_ptr(r):=copy_node_list(ins_ptr(p)); {this affects |mem[r+4]|}
4015  words:=ins_node_size-1;
4016  end;
4017whatsit_node:@<Make a partial copy of the whatsit node |p| and make |r|
4018  point to it; set |words| to the number of initial words not yet copied@>;
4019glue_node: begin r:=get_node(small_node_size); add_glue_ref(glue_ptr(p));
4020  glue_ptr(r):=glue_ptr(p); leader_ptr(r):=copy_node_list(leader_ptr(p));
4021  end;
4022kern_node,math_node,penalty_node: begin r:=get_node(small_node_size);
4023  words:=small_node_size;
4024  end;
4025ligature_node: begin r:=get_node(small_node_size);
4026  mem[lig_char(r)]:=mem[lig_char(p)]; {copy |font| and |character|}
4027  lig_ptr(r):=copy_node_list(lig_ptr(p));
4028  end;
4029disc_node: begin r:=get_node(small_node_size);
4030  pre_break(r):=copy_node_list(pre_break(p));
4031  post_break(r):=copy_node_list(post_break(p));
4032  end;
4033mark_node: begin r:=get_node(small_node_size); add_token_ref(mark_ptr(p));
4034  words:=small_node_size;
4035  end;
4036adjust_node: begin r:=get_node(small_node_size);
4037  adjust_ptr(r):=copy_node_list(adjust_ptr(p));
4038  end; {|words=1=small_node_size-1|}
4039othercases confusion("copying")
4040@:this can't happen copying}{\quad copying@>
4041endcases
4042
4043@* \[15] The command codes.
4044Before we can go any further, we need to define symbolic names for the internal
4045code numbers that represent the various commands obeyed by \TeX. These codes
4046are somewhat arbitrary, but not completely so. For example, the command
4047codes for character types are fixed by the language, since a user says,
4048e.g., `\.{\\catcode \`\\\${} = 3}' to make \.{\char'44} a math delimiter,
4049and the command code |math_shift| is equal to~3. Some other codes have
4050been made adjacent so that |case| statements in the program need not consider
4051cases that are widely spaced, or so that |case| statements can be replaced
4052by |if| statements.
4053
4054At any rate, here is the list, for future reference. First come the
4055``catcode'' commands, several of which share their numeric codes with
4056ordinary commands when the catcode cannot emerge from \TeX's scanning routine.
4057
4058@d escape=0 {escape delimiter (called \.\\ in {\sl The \TeX book\/})}
4059@:TeXbook}{\sl The \TeX book@>
4060@d relax=0 {do nothing ( \.{\\relax} )}
4061@d left_brace=1 {beginning of a group ( \.\{ )}
4062@d right_brace=2 {ending of a group ( \.\} )}
4063@d math_shift=3 {mathematics shift character ( \.\$ )}
4064@d tab_mark=4 {alignment delimiter ( \.\&, \.{\\span} )}
4065@d car_ret=5 {end of line ( |carriage_return|, \.{\\cr}, \.{\\crcr} )}
4066@d out_param=5 {output a macro parameter}
4067@d mac_param=6 {macro parameter symbol ( \.\# )}
4068@d sup_mark=7 {superscript ( \.{\char'136} )}
4069@d sub_mark=8 {subscript ( \.{\char'137} )}
4070@d ignore=9 {characters to ignore ( \.{\^\^@@} )}
4071@d endv=9 {end of \<v_j> list in alignment template}
4072@d spacer=10 {characters equivalent to blank space ( \.{\ } )}
4073@d letter=11 {characters regarded as letters ( \.{A..Z}, \.{a..z} )}
4074@d other_char=12 {none of the special character types}
4075@d active_char=13 {characters that invoke macros ( \.{\char`\~} )}
4076@d par_end=13 {end of paragraph ( \.{\\par} )}
4077@d match=13 {match a macro parameter}
4078@d comment=14 {characters that introduce comments ( \.\% )}
4079@d end_match=14 {end of parameters to macro}
4080@d stop=14 {end of job ( \.{\\end}, \.{\\dump} )}
4081@d invalid_char=15 {characters that shouldn't appear ( \.{\^\^?} )}
4082@d delim_num=15 {specify delimiter numerically ( \.{\\delimiter} )}
4083@d max_char_code=15 {largest catcode for individual characters}
4084
4085@ Next are the ordinary run-of-the-mill command codes.  Codes that are
4086|min_internal| or more represent internal quantities that might be
4087expanded by `\.{\\the}'.
4088
4089@d char_num=16 {character specified numerically ( \.{\\char} )}
4090@d math_char_num=17 {explicit math code ( \.{\\mathchar} )}
4091@d mark=18 {mark definition ( \.{\\mark} )}
4092@d xray=19 {peek inside of \TeX\ ( \.{\\show}, \.{\\showbox}, etc.~)}
4093@d make_box=20 {make a box ( \.{\\box}, \.{\\copy}, \.{\\hbox}, etc.~)}
4094@d hmove=21 {horizontal motion ( \.{\\moveleft}, \.{\\moveright} )}
4095@d vmove=22 {vertical motion ( \.{\\raise}, \.{\\lower} )}
4096@d un_hbox=23 {unglue a box ( \.{\\unhbox}, \.{\\unhcopy} )}
4097@d un_vbox=24 {unglue a box ( \.{\\unvbox}, \.{\\unvcopy} )}
4098@d remove_item=25 {nullify last item ( \.{\\unpenalty},
4099  \.{\\unkern}, \.{\\unskip} )}
4100@d hskip=26 {horizontal glue ( \.{\\hskip}, \.{\\hfil}, etc.~)}
4101@d vskip=27 {vertical glue ( \.{\\vskip}, \.{\\vfil}, etc.~)}
4102@d mskip=28 {math glue ( \.{\\mskip} )}
4103@d kern=29 {fixed space ( \.{\\kern})}
4104@d mkern=30 {math kern ( \.{\\mkern} )}
4105@d leader_ship=31 {use a box ( \.{\\shipout}, \.{\\leaders}, etc.~)}
4106@d halign=32 {horizontal table alignment ( \.{\\halign} )}
4107@d valign=33 {vertical table alignment ( \.{\\valign} )}
4108@d no_align=34 {temporary escape from alignment ( \.{\\noalign} )}
4109@d vrule=35 {vertical rule ( \.{\\vrule} )}
4110@d hrule=36 {horizontal rule ( \.{\\hrule} )}
4111@d insert=37 {vlist inserted in box ( \.{\\insert} )}
4112@d vadjust=38 {vlist inserted in enclosing paragraph ( \.{\\vadjust} )}
4113@d ignore_spaces=39 {gobble |spacer| tokens ( \.{\\ignorespaces} )}
4114@d after_assignment=40 {save till assignment is done ( \.{\\afterassignment} )}
4115@d after_group=41 {save till group is done ( \.{\\aftergroup} )}
4116@d break_penalty=42 {additional badness ( \.{\\penalty} )}
4117@d start_par=43 {begin paragraph ( \.{\\indent}, \.{\\noindent} )}
4118@d ital_corr=44 {italic correction ( \.{\\/} )}
4119@d accent=45 {attach accent in text ( \.{\\accent} )}
4120@d math_accent=46 {attach accent in math ( \.{\\mathaccent} )}
4121@d discretionary=47 {discretionary texts ( \.{\\-}, \.{\\discretionary} )}
4122@d eq_no=48 {equation number ( \.{\\eqno}, \.{\\leqno} )}
4123@d left_right=49 {variable delimiter ( \.{\\left}, \.{\\right} )}
4124@d math_comp=50 {component of formula ( \.{\\mathbin}, etc.~)}
4125@d limit_switch=51 {diddle limit conventions ( \.{\\displaylimits}, etc.~)}
4126@d above=52 {generalized fraction ( \.{\\above}, \.{\\atop}, etc.~)}
4127@d math_style=53 {style specification ( \.{\\displaystyle}, etc.~)}
4128@d math_choice=54 {choice specification ( \.{\\mathchoice} )}
4129@d non_script=55 {conditional math glue ( \.{\\nonscript} )}
4130@d vcenter=56 {vertically center a vbox ( \.{\\vcenter} )}
4131@d case_shift=57 {force specific case ( \.{\\lowercase}, \.{\\uppercase}~)}
4132@d message=58 {send to user ( \.{\\message}, \.{\\errmessage} )}
4133@d extension=59 {extensions to \TeX\ ( \.{\\write}, \.{\\special}, etc.~)}
4134@d in_stream=60 {files for reading ( \.{\\openin}, \.{\\closein} )}
4135@d begin_group=61 {begin local grouping ( \.{\\begingroup} )}
4136@d end_group=62 {end local grouping ( \.{\\endgroup} )}
4137@d omit=63 {omit alignment template ( \.{\\omit} )}
4138@d ex_space=64 {explicit space ( \.{\\\ } )}
4139@d no_boundary=65 {suppress boundary ligatures ( \.{\\noboundary} )}
4140@d radical=66 {square root and similar signs ( \.{\\radical} )}
4141@d end_cs_name=67 {end control sequence ( \.{\\endcsname} )}
4142@d min_internal=68 {the smallest code that can follow \.{\\the}}
4143@d char_given=68 {character code defined by \.{\\chardef}}
4144@d math_given=69 {math code defined by \.{\\mathchardef}}
4145@d last_item=70 {most recent item ( \.{\\lastpenalty},
4146  \.{\\lastkern}, \.{\\lastskip} )}
4147@d max_non_prefixed_command=70 {largest command code that can't be \.{\\global}}
4148
4149@ The next codes are special; they all relate to mode-independent
4150assignment of values to \TeX's internal registers or tables.
4151Codes that are |max_internal| or less represent internal quantities
4152that might be expanded by `\.{\\the}'.
4153
4154@d toks_register=71 {token list register ( \.{\\toks} )}
4155@d assign_toks=72 {special token list ( \.{\\output}, \.{\\everypar}, etc.~)}
4156@d assign_int=73 {user-defined integer ( \.{\\tolerance}, \.{\\day}, etc.~)}
4157@d assign_dimen=74 {user-defined length ( \.{\\hsize}, etc.~)}
4158@d assign_glue=75 {user-defined glue ( \.{\\baselineskip}, etc.~)}
4159@d assign_mu_glue=76 {user-defined muglue ( \.{\\thinmuskip}, etc.~)}
4160@d assign_font_dimen=77 {user-defined font dimension ( \.{\\fontdimen} )}
4161@d assign_font_int=78 {user-defined font integer ( \.{\\hyphenchar},
4162  \.{\\skewchar} )}
4163@d set_aux=79 {specify state info ( \.{\\spacefactor}, \.{\\prevdepth} )}
4164@d set_prev_graf=80 {specify state info ( \.{\\prevgraf} )}
4165@d set_page_dimen=81 {specify state info ( \.{\\pagegoal}, etc.~)}
4166@d set_page_int=82 {specify state info ( \.{\\deadcycles},
4167  \.{\\insertpenalties} )}
4168@d set_box_dimen=83 {change dimension of box ( \.{\\wd}, \.{\\ht}, \.{\\dp} )}
4169@d set_shape=84 {specify fancy paragraph shape ( \.{\\parshape} )}
4170@d def_code=85 {define a character code ( \.{\\catcode}, etc.~)}
4171@d def_family=86 {declare math fonts ( \.{\\textfont}, etc.~)}
4172@d set_font=87 {set current font ( font identifiers )}
4173@d def_font=88 {define a font file ( \.{\\font} )}
4174@d register=89 {internal register ( \.{\\count}, \.{\\dimen}, etc.~)}
4175@d max_internal=89 {the largest code that can follow \.{\\the}}
4176@d advance=90 {advance a register or parameter ( \.{\\advance} )}
4177@d multiply=91 {multiply a register or parameter ( \.{\\multiply} )}
4178@d divide=92 {divide a register or parameter ( \.{\\divide} )}
4179@d prefix=93 {qualify a definition ( \.{\\global}, \.{\\long}, \.{\\outer} )}
4180@d let=94 {assign a command code ( \.{\\let}, \.{\\futurelet} )}
4181@d shorthand_def=95 {code definition ( \.{\\chardef}, \.{\\countdef}, etc.~)}
4182@d read_to_cs=96 {read into a control sequence ( \.{\\read} )}
4183@d def=97 {macro definition ( \.{\\def}, \.{\\gdef}, \.{\\xdef}, \.{\\edef} )}
4184@d set_box=98 {set a box ( \.{\\setbox} )}
4185@d hyph_data=99 {hyphenation data ( \.{\\hyphenation}, \.{\\patterns} )}
4186@d set_interaction=100 {define level of interaction ( \.{\\batchmode}, etc.~)}
4187@d max_command=100 {the largest command code seen at |big_switch|}
4188
4189@ The remaining command codes are extra special, since they cannot get through
4190\TeX's scanner to the main control routine. They have been given values higher
4191than |max_command| so that their special nature is easily discernible.
4192The ``expandable'' commands come first.
4193
4194@d undefined_cs=max_command+1 {initial state of most |eq_type| fields}
4195@d expand_after=max_command+2 {special expansion ( \.{\\expandafter} )}
4196@d no_expand=max_command+3 {special nonexpansion ( \.{\\noexpand} )}
4197@d input=max_command+4 {input a source file ( \.{\\input}, \.{\\endinput} )}
4198@d if_test=max_command+5 {conditional text ( \.{\\if}, \.{\\ifcase}, etc.~)}
4199@d fi_or_else=max_command+6 {delimiters for conditionals ( \.{\\else}, etc.~)}
4200@d cs_name=max_command+7 {make a control sequence from tokens ( \.{\\csname} )}
4201@d convert=max_command+8 {convert to text ( \.{\\number}, \.{\\string}, etc.~)}
4202@d the=max_command+9 {expand an internal quantity ( \.{\\the} )}
4203@d top_bot_mark=max_command+10 {inserted mark ( \.{\\topmark}, etc.~)}
4204@d call=max_command+11 {non-long, non-outer control sequence}
4205@d long_call=max_command+12 {long, non-outer control sequence}
4206@d outer_call=max_command+13 {non-long, outer control sequence}
4207@d long_outer_call=max_command+14 {long, outer control sequence}
4208@d end_template=max_command+15 {end of an alignment template}
4209@d dont_expand=max_command+16 {the following token was marked by \.{\\noexpand}}
4210@d glue_ref=max_command+17 {the equivalent points to a glue specification}
4211@d shape_ref=max_command+18 {the equivalent points to a parshape specification}
4212@d box_ref=max_command+19 {the equivalent points to a box node, or is |null|}
4213@d data=max_command+20 {the equivalent is simply a halfword number}
4214
4215@* \[16] The semantic nest.
4216\TeX\ is typically in the midst of building many lists at once. For example,
4217when a math formula is being processed, \TeX\ is in math mode and
4218working on an mlist; this formula has temporarily interrupted \TeX\ from
4219being in horizontal mode and building the hlist of a paragraph; and this
4220paragraph has temporarily interrupted \TeX\ from being in vertical mode
4221and building the vlist for the next page of a document. Similarly, when a
4222\.{\\vbox} occurs inside of an \.{\\hbox}, \TeX\ is temporarily
4223interrupted from working in restricted horizontal mode, and it enters
4224internal vertical mode.  The ``semantic nest'' is a stack that
4225keeps track of what lists and modes are currently suspended.
4226
4227At each level of processing we are in one of six modes:
4228
4229\yskip\hang|vmode| stands for vertical mode (the page builder);
4230
4231\hang|hmode| stands for horizontal mode (the paragraph builder);
4232
4233\hang|mmode| stands for displayed formula mode;
4234
4235\hang|-vmode| stands for internal vertical mode (e.g., in a \.{\\vbox});
4236
4237\hang|-hmode| stands for restricted horizontal mode (e.g., in an \.{\\hbox});
4238
4239\hang|-mmode| stands for math formula mode (not displayed).
4240
4241\yskip\noindent The mode is temporarily set to zero while processing \.{\\write}
4242texts in the |ship_out| routine.
4243
4244Numeric values are assigned to |vmode|, |hmode|, and |mmode| so that
4245\TeX's ``big semantic switch'' can select the appropriate thing to
4246do by computing the value |abs(mode)+cur_cmd|, where |mode| is the current
4247mode and |cur_cmd| is the current command code.
4248
4249@d vmode=1 {vertical mode}
4250@d hmode=vmode+max_command+1 {horizontal mode}
4251@d mmode=hmode+max_command+1 {math mode}
4252
4253@p procedure print_mode(@!m:integer); {prints the mode represented by |m|}
4254begin if m>0 then
4255  case m div (max_command+1) of
4256  0:print("vertical");
4257  1:print("horizontal");
4258  2:print("display math");
4259  end
4260else if m=0 then print("no")
4261else  case (-m) div (max_command+1) of
4262  0:print("internal vertical");
4263  1:print("restricted horizontal");
4264  2:print("math");
4265  end;
4266print(" mode");
4267end;
4268
4269@ The state of affairs at any semantic level can be represented by
4270five values:
4271
4272\yskip\hang|mode| is the number representing the semantic mode, as
4273just explained.
4274
4275\yskip\hang|head| is a |pointer| to a list head for the list being built;
4276|link(head)| therefore points to the first element of the list, or
4277to |null| if the list is empty.
4278
4279\yskip\hang|tail| is a |pointer| to the final node of the list being
4280built; thus, |tail=head| if and only if the list is empty.
4281
4282\yskip\hang|prev_graf| is the number of lines of the current paragraph that
4283have already been put into the present vertical list.
4284
4285\yskip\hang|aux| is an auxiliary |memory_word| that gives further information
4286that is needed to characterize the situation.
4287
4288\yskip\noindent
4289In vertical mode, |aux| is also known as |prev_depth|; it is the scaled
4290value representing the depth of the previous box, for use in baseline
4291calculations, or it is |<=-1000|pt if the next box on the vertical list is to
4292be exempt from baseline calculations.  In horizontal mode, |aux| is also
4293known as |space_factor| and |clang|; it holds the current space factor used in
4294spacing calculations, and the current language used for hyphenation.
4295(The value of |clang| is undefined in restricted horizontal mode.)
4296In math mode, |aux| is also known as |incompleat_noad|; if
4297not |null|, it points to a record that represents the numerator of a
4298generalized fraction for which the denominator is currently being formed
4299in the current list.
4300
4301There is also a sixth quantity, |mode_line|, which correlates
4302the semantic nest with the user's input; |mode_line| contains the source
4303line number at which the current level of nesting was entered. The negative
4304of this line number is the |mode_line| at the level of the
4305user's output routine.
4306
4307In horizontal mode, the |prev_graf| field is used for initial language data.
4308
4309The semantic nest is an array called |nest| that holds the |mode|, |head|,
4310|tail|, |prev_graf|, |aux|, and |mode_line| values for all semantic levels
4311below the currently active one. Information about the currently active
4312level is kept in the global quantities |mode|, |head|, |tail|, |prev_graf|,
4313|aux|, and |mode_line|, which live in a \PASCAL\ record that is ready to
4314be pushed onto |nest| if necessary.
4315
4316@d ignore_depth==-65536000 {|prev_depth| value that is ignored}
4317
4318@<Types...@>=
4319@!list_state_record=record@!mode_field:-mmode..mmode;@+
4320  @!head_field,@!tail_field: pointer;
4321  @!pg_field,@!ml_field: integer;@+
4322  @!aux_field: memory_word;
4323  end;
4324
4325@ @d mode==cur_list.mode_field {current mode}
4326@d head==cur_list.head_field {header node of current list}
4327@d tail==cur_list.tail_field {final node on current list}
4328@d prev_graf==cur_list.pg_field {number of paragraph lines accumulated}
4329@d aux==cur_list.aux_field {auxiliary data about the current list}
4330@d prev_depth==aux.sc {the name of |aux| in vertical mode}
4331@d space_factor==aux.hh.lh {part of |aux| in horizontal mode}
4332@d clang==aux.hh.rh {the other part of |aux| in horizontal mode}
4333@d incompleat_noad==aux.int {the name of |aux| in math mode}
4334@d mode_line==cur_list.ml_field {source file line number at beginning of list}
4335
4336@<Glob...@>=
4337@!nest:array[0..nest_size] of list_state_record;
4338@!nest_ptr:0..nest_size; {first unused location of |nest|}
4339@!max_nest_stack:0..nest_size; {maximum of |nest_ptr| when pushing}
4340@!cur_list:list_state_record; {the ``top'' semantic state}
4341@!shown_mode:-mmode..mmode; {most recent mode shown by \.{\\tracingcommands}}
4342
4343@ Here is a common way to make the current list grow:
4344
4345@d tail_append(#)==begin link(tail):=#; tail:=link(tail);
4346  end
4347
4348@ We will see later that the vertical list at the bottom semantic level is split
4349into two parts; the ``current page'' runs from |page_head| to |page_tail|,
4350and the ``contribution list'' runs from |contrib_head| to |tail| of
4351semantic level zero. The idea is that contributions are first formed in
4352vertical mode, then ``contributed'' to the current page (during which time
4353the page-breaking decisions are made). For now, we don't need to know
4354any more details about the page-building process.
4355
4356@<Set init...@>=
4357nest_ptr:=0; max_nest_stack:=0;
4358mode:=vmode; head:=contrib_head; tail:=contrib_head;
4359prev_depth:=ignore_depth; mode_line:=0;
4360prev_graf:=0; shown_mode:=0;
4361@<Start a new current page@>;
4362
4363@ When \TeX's work on one level is interrupted, the state is saved by
4364calling |push_nest|. This routine changes |head| and |tail| so that
4365a new (empty) list is begun; it does not change |mode| or |aux|.
4366
4367@p procedure push_nest; {enter a new semantic level, save the old}
4368begin if nest_ptr>max_nest_stack then
4369  begin max_nest_stack:=nest_ptr;
4370  if nest_ptr=nest_size then overflow("semantic nest size",nest_size);
4371@:TeX capacity exceeded semantic nest size}{\quad semantic nest size@>
4372  end;
4373nest[nest_ptr]:=cur_list; {stack the record}
4374incr(nest_ptr); head:=get_avail; tail:=head; prev_graf:=0; mode_line:=line;
4375end;
4376
4377@ Conversely, when \TeX\ is finished on the current level, the former
4378state is restored by calling |pop_nest|. This routine will never be
4379called at the lowest semantic level, nor will it be called unless |head|
4380is a node that should be returned to free memory.
4381
4382@p procedure pop_nest; {leave a semantic level, re-enter the old}
4383begin free_avail(head); decr(nest_ptr); cur_list:=nest[nest_ptr];
4384end;
4385
4386@ Here is a procedure that displays what \TeX\ is working on, at all levels.
4387
4388@p procedure@?print_totals; forward;@t\2@>
4389procedure show_activities;
4390var p:0..nest_size; {index into |nest|}
4391@!m:-mmode..mmode; {mode}
4392@!a:memory_word; {auxiliary}
4393@!q,@!r:pointer; {for showing the current page}
4394@!t:integer; {ditto}
4395begin nest[nest_ptr]:=cur_list; {put the top level into the array}
4396print_nl(""); print_ln;
4397for p:=nest_ptr downto 0 do
4398  begin m:=nest[p].mode_field; a:=nest[p].aux_field;
4399  print_nl("### "); print_mode(m);
4400  print(" entered at line "); print_int(abs(nest[p].ml_field));
4401  if m=hmode then if nest[p].pg_field <> @'40600000 then
4402    begin print(" (language"); print_int(nest[p].pg_field mod @'200000);
4403    print(":hyphenmin"); print_int(nest[p].pg_field div @'20000000);
4404    print_char(","); print_int((nest[p].pg_field div @'200000) mod @'100);
4405    print_char(")");
4406    end;
4407  if nest[p].ml_field<0 then print(" (\output routine)");
4408  if p=0 then
4409    begin @<Show the status of the current page@>;
4410    if link(contrib_head)<>null then
4411      print_nl("### recent contributions:");
4412    end;
4413  show_box(link(nest[p].head_field));
4414  @<Show the auxiliary field, |a|@>;
4415  end;
4416end;
4417
4418@ @<Show the auxiliary...@>=
4419case abs(m) div (max_command+1) of
44200: begin print_nl("prevdepth ");
4421  if a.sc<=ignore_depth then print("ignored")
4422  else print_scaled(a.sc);
4423  if nest[p].pg_field<>0 then
4424    begin print(", prevgraf ");
4425    print_int(nest[p].pg_field); print(" line");
4426    if nest[p].pg_field<>1 then print_char("s");
4427    end;
4428  end;
44291: begin print_nl("spacefactor "); print_int(a.hh.lh);
4430  if m>0 then@+ if a.hh.rh>0 then
4431    begin print(", current language "); print_int(a.hh.rh);@+
4432    end;
4433  end;
44342: if a.int<>null then
4435  begin print("this will be denominator of:"); show_box(a.int);@+
4436  end;
4437end {there are no other cases}
4438
4439@* \[17] The table of equivalents.
4440Now that we have studied the data structures for \TeX's semantic routines,
4441we ought to consider the data structures used by its syntactic routines. In
4442other words, our next concern will be
4443the tables that \TeX\ looks at when it is scanning
4444what the user has written.
4445
4446The biggest and most important such table is called |eqtb|. It holds the
4447current ``equivalents'' of things; i.e., it explains what things mean
4448or what their current values are, for all quantities that are subject to
4449the nesting structure provided by \TeX's grouping mechanism. There are six
4450parts to |eqtb|:
4451
4452\yskip\hangg 1) |eqtb[active_base..(hash_base-1)]| holds the current
4453equivalents of single-character control sequences.
4454
4455\yskip\hangg 2) |eqtb[hash_base..(glue_base-1)]| holds the current
4456equivalents of multiletter control sequences.
4457
4458\yskip\hangg 3) |eqtb[glue_base..(local_base-1)]| holds the current
4459equivalents of glue parameters like the current baselineskip.
4460
4461\yskip\hangg 4) |eqtb[local_base..(int_base-1)]| holds the current
4462equivalents of local halfword quantities like the current box registers,
4463the current ``catcodes,'' the current font, and a pointer to the current
4464paragraph shape.
4465
4466\yskip\hangg 5) |eqtb[int_base..(dimen_base-1)]| holds the current
4467equivalents of fullword integer parameters like the current hyphenation
4468penalty.
4469
4470\yskip\hangg 6) |eqtb[dimen_base..eqtb_size]| holds the current equivalents
4471of fullword dimension parameters like the current hsize or amount of
4472hanging indentation.
4473
4474\yskip\noindent Note that, for example, the current amount of
4475baselineskip glue is determined by the setting of a particular location
4476in region~3 of |eqtb|, while the current meaning of the control sequence
4477`\.{\\baselineskip}' (which might have been changed by \.{\\def} or
4478\.{\\let}) appears in region~2.
4479
4480@ Each entry in |eqtb| is a |memory_word|. Most of these words are of type
4481|two_halves|, and subdivided into three fields:
4482
4483\yskip\hangg 1) The |eq_level| (a quarterword) is the level of grouping at
4484which this equivalent was defined. If the level is |level_zero|, the
4485equivalent has never been defined; |level_one| refers to the outer level
4486(outside of all groups), and this level is also used for global
4487definitions that never go away. Higher levels are for equivalents that
4488will disappear at the end of their group.  @^global definitions@>
4489
4490\yskip\hangg 2) The |eq_type| (another quarterword) specifies what kind of
4491entry this is. There are many types, since each \TeX\ primitive like
4492\.{\\hbox}, \.{\\def}, etc., has its own special code. The list of
4493command codes above includes all possible settings of the |eq_type| field.
4494
4495\yskip\hangg 3) The |equiv| (a halfword) is the current equivalent value.
4496This may be a font number, a pointer into |mem|, or a variety of other
4497things.
4498
4499@d eq_level_field(#)==#.hh.b1
4500@d eq_type_field(#)==#.hh.b0
4501@d equiv_field(#)==#.hh.rh
4502@d eq_level(#)==eq_level_field(eqtb[#]) {level of definition}
4503@d eq_type(#)==eq_type_field(eqtb[#]) {command code for equivalent}
4504@d equiv(#)==equiv_field(eqtb[#]) {equivalent value}
4505@d level_zero=min_quarterword {level for undefined quantities}
4506@d level_one=level_zero+1 {outermost level for defined quantities}
4507
4508@ Many locations in |eqtb| have symbolic names. The purpose of the next
4509paragraphs is to define these names, and to set up the initial values of the
4510equivalents.
4511
4512In the first region we have 256 equivalents for ``active characters'' that
4513act as control sequences, followed by 256 equivalents for single-character
4514control sequences.
4515
4516Then comes region~2, which corresponds to the hash table that we will
4517define later.  The maximum address in this region is used for a dummy
4518control sequence that is perpetually undefined. There also are several
4519locations for control sequences that are perpetually defined
4520(since they are used in error recovery).
4521
4522@d active_base=1 {beginning of region 1, for active character equivalents}
4523@d single_base=active_base+256 {equivalents of one-character control sequences}
4524@d null_cs=single_base+256 {equivalent of \.{\\csname\\endcsname}}
4525@d hash_base=null_cs+1 {beginning of region 2, for the hash table}
4526@d frozen_control_sequence=hash_base+hash_size {for error recovery}
4527@d frozen_protection=frozen_control_sequence {inaccessible but definable}
4528@d frozen_cr=frozen_control_sequence+1 {permanent `\.{\\cr}'}
4529@d frozen_end_group=frozen_control_sequence+2 {permanent `\.{\\endgroup}'}
4530@d frozen_right=frozen_control_sequence+3 {permanent `\.{\\right}'}
4531@d frozen_fi=frozen_control_sequence+4 {permanent `\.{\\fi}'}
4532@d frozen_end_template=frozen_control_sequence+5 {permanent `\.{\\endtemplate}'}
4533@d frozen_endv=frozen_control_sequence+6 {second permanent `\.{\\endtemplate}'}
4534@d frozen_relax=frozen_control_sequence+7 {permanent `\.{\\relax}'}
4535@d end_write=frozen_control_sequence+8 {permanent `\.{\\endwrite}'}
4536@d frozen_dont_expand=frozen_control_sequence+9
4537  {permanent `\.{\\notexpanded:}'}
4538@d frozen_null_font=frozen_control_sequence+10
4539  {permanent `\.{\\nullfont}'}
4540@d font_id_base=frozen_null_font-font_base
4541  {begins table of 257 permanent font identifiers}
4542@d undefined_control_sequence=frozen_null_font+257 {dummy location}
4543@d glue_base=undefined_control_sequence+1 {beginning of region 3}
4544
4545@<Initialize table entries...@>=
4546eq_type(undefined_control_sequence):=undefined_cs;
4547equiv(undefined_control_sequence):=null;
4548eq_level(undefined_control_sequence):=level_zero;
4549for k:=active_base to undefined_control_sequence-1 do
4550  eqtb[k]:=eqtb[undefined_control_sequence];
4551
4552@ Here is a routine that displays the current meaning of an |eqtb| entry
4553in region 1 or~2. (Similar routines for the other regions will appear
4554below.)
4555
4556@<Show equivalent |n|, in region 1 or 2@>=
4557begin sprint_cs(n); print_char("="); print_cmd_chr(eq_type(n),equiv(n));
4558if eq_type(n)>=call then
4559  begin print_char(":"); show_token_list(link(equiv(n)),null,32);
4560  end;
4561end
4562
4563@ Region 3 of |eqtb| contains the 256 \.{\\skip} registers, as well as the
4564glue parameters defined here. It is important that the ``muskip''
4565parameters have larger numbers than the others.
4566
4567@d line_skip_code=0 {interline glue if |baseline_skip| is infeasible}
4568@d baseline_skip_code=1 {desired glue between baselines}
4569@d par_skip_code=2 {extra glue just above a paragraph}
4570@d above_display_skip_code=3 {extra glue just above displayed math}
4571@d below_display_skip_code=4 {extra glue just below displayed math}
4572@d above_display_short_skip_code=5
4573  {glue above displayed math following short lines}
4574@d below_display_short_skip_code=6
4575  {glue below displayed math following short lines}
4576@d left_skip_code=7 {glue at left of justified lines}
4577@d right_skip_code=8 {glue at right of justified lines}
4578@d top_skip_code=9 {glue at top of main pages}
4579@d split_top_skip_code=10 {glue at top of split pages}
4580@d tab_skip_code=11 {glue between aligned entries}
4581@d space_skip_code=12 {glue between words (if not |zero_glue|)}
4582@d xspace_skip_code=13 {glue after sentences (if not |zero_glue|)}
4583@d par_fill_skip_code=14 {glue on last line of paragraph}
4584@d thin_mu_skip_code=15 {thin space in math formula}
4585@d med_mu_skip_code=16 {medium space in math formula}
4586@d thick_mu_skip_code=17 {thick space in math formula}
4587@d glue_pars=18 {total number of glue parameters}
4588@d skip_base=glue_base+glue_pars {table of 256 ``skip'' registers}
4589@d mu_skip_base=skip_base+256 {table of 256 ``muskip'' registers}
4590@d local_base=mu_skip_base+256 {beginning of region 4}
4591@#
4592@d skip(#)==equiv(skip_base+#) {|mem| location of glue specification}
4593@d mu_skip(#)==equiv(mu_skip_base+#) {|mem| location of math glue spec}
4594@d glue_par(#)==equiv(glue_base+#) {|mem| location of glue specification}
4595@d line_skip==glue_par(line_skip_code)
4596@d baseline_skip==glue_par(baseline_skip_code)
4597@d par_skip==glue_par(par_skip_code)
4598@d above_display_skip==glue_par(above_display_skip_code)
4599@d below_display_skip==glue_par(below_display_skip_code)
4600@d above_display_short_skip==glue_par(above_display_short_skip_code)
4601@d below_display_short_skip==glue_par(below_display_short_skip_code)
4602@d left_skip==glue_par(left_skip_code)
4603@d right_skip==glue_par(right_skip_code)
4604@d top_skip==glue_par(top_skip_code)
4605@d split_top_skip==glue_par(split_top_skip_code)
4606@d tab_skip==glue_par(tab_skip_code)
4607@d space_skip==glue_par(space_skip_code)
4608@d xspace_skip==glue_par(xspace_skip_code)
4609@d par_fill_skip==glue_par(par_fill_skip_code)
4610@d thin_mu_skip==glue_par(thin_mu_skip_code)
4611@d med_mu_skip==glue_par(med_mu_skip_code)
4612@d thick_mu_skip==glue_par(thick_mu_skip_code)
4613
4614@<Current |mem| equivalent of glue parameter number |n|@>=glue_par(n)
4615
4616@ Sometimes we need to convert \TeX's internal code numbers into symbolic
4617form. The |print_skip_param| routine gives the symbolic name of a glue
4618parameter.
4619
4620@<Declare the procedure called |print_skip_param|@>=
4621procedure print_skip_param(@!n:integer);
4622begin case n of
4623line_skip_code: print_esc("lineskip");
4624baseline_skip_code: print_esc("baselineskip");
4625par_skip_code: print_esc("parskip");
4626above_display_skip_code: print_esc("abovedisplayskip");
4627below_display_skip_code: print_esc("belowdisplayskip");
4628above_display_short_skip_code: print_esc("abovedisplayshortskip");
4629below_display_short_skip_code: print_esc("belowdisplayshortskip");
4630left_skip_code: print_esc("leftskip");
4631right_skip_code: print_esc("rightskip");
4632top_skip_code: print_esc("topskip");
4633split_top_skip_code: print_esc("splittopskip");
4634tab_skip_code: print_esc("tabskip");
4635space_skip_code: print_esc("spaceskip");
4636xspace_skip_code: print_esc("xspaceskip");
4637par_fill_skip_code: print_esc("parfillskip");
4638thin_mu_skip_code: print_esc("thinmuskip");
4639med_mu_skip_code: print_esc("medmuskip");
4640thick_mu_skip_code: print_esc("thickmuskip");
4641othercases print("[unknown glue parameter!]")
4642endcases;
4643end;
4644
4645@ The symbolic names for glue parameters are put into \TeX's hash table
4646by using the routine called |primitive|, defined below. Let us enter them
4647now, so that we don't have to list all those parameter names anywhere else.
4648
4649@<Put each of \TeX's primitives into the hash table@>=
4650primitive("lineskip",assign_glue,glue_base+line_skip_code);@/
4651@!@:line_skip_}{\.{\\lineskip} primitive@>
4652primitive("baselineskip",assign_glue,glue_base+baseline_skip_code);@/
4653@!@:baseline_skip_}{\.{\\baselineskip} primitive@>
4654primitive("parskip",assign_glue,glue_base+par_skip_code);@/
4655@!@:par_skip_}{\.{\\parskip} primitive@>
4656primitive("abovedisplayskip",assign_glue,glue_base+above_display_skip_code);@/
4657@!@:above_display_skip_}{\.{\\abovedisplayskip} primitive@>
4658primitive("belowdisplayskip",assign_glue,glue_base+below_display_skip_code);@/
4659@!@:below_display_skip_}{\.{\\belowdisplayskip} primitive@>
4660primitive("abovedisplayshortskip",
4661  assign_glue,glue_base+above_display_short_skip_code);@/
4662@!@:above_display_short_skip_}{\.{\\abovedisplayshortskip} primitive@>
4663primitive("belowdisplayshortskip",
4664  assign_glue,glue_base+below_display_short_skip_code);@/
4665@!@:below_display_short_skip_}{\.{\\belowdisplayshortskip} primitive@>
4666primitive("leftskip",assign_glue,glue_base+left_skip_code);@/
4667@!@:left_skip_}{\.{\\leftskip} primitive@>
4668primitive("rightskip",assign_glue,glue_base+right_skip_code);@/
4669@!@:right_skip_}{\.{\\rightskip} primitive@>
4670primitive("topskip",assign_glue,glue_base+top_skip_code);@/
4671@!@:top_skip_}{\.{\\topskip} primitive@>
4672primitive("splittopskip",assign_glue,glue_base+split_top_skip_code);@/
4673@!@:split_top_skip_}{\.{\\splittopskip} primitive@>
4674primitive("tabskip",assign_glue,glue_base+tab_skip_code);@/
4675@!@:tab_skip_}{\.{\\tabskip} primitive@>
4676primitive("spaceskip",assign_glue,glue_base+space_skip_code);@/
4677@!@:space_skip_}{\.{\\spaceskip} primitive@>
4678primitive("xspaceskip",assign_glue,glue_base+xspace_skip_code);@/
4679@!@:xspace_skip_}{\.{\\xspaceskip} primitive@>
4680primitive("parfillskip",assign_glue,glue_base+par_fill_skip_code);@/
4681@!@:par_fill_skip_}{\.{\\parfillskip} primitive@>
4682primitive("thinmuskip",assign_mu_glue,glue_base+thin_mu_skip_code);@/
4683@!@:thin_mu_skip_}{\.{\\thinmuskip} primitive@>
4684primitive("medmuskip",assign_mu_glue,glue_base+med_mu_skip_code);@/
4685@!@:med_mu_skip_}{\.{\\medmuskip} primitive@>
4686primitive("thickmuskip",assign_mu_glue,glue_base+thick_mu_skip_code);@/
4687@!@:thick_mu_skip_}{\.{\\thickmuskip} primitive@>
4688
4689@ @<Cases of |print_cmd_chr| for symbolic printing of primitives@>=
4690assign_glue,assign_mu_glue: if chr_code<skip_base then
4691    print_skip_param(chr_code-glue_base)
4692  else if chr_code<mu_skip_base then
4693    begin print_esc("skip"); print_int(chr_code-skip_base);
4694    end
4695  else  begin print_esc("muskip"); print_int(chr_code-mu_skip_base);
4696    end;
4697
4698@ All glue parameters and registers are initially `\.{0pt plus0pt minus0pt}'.
4699
4700@<Initialize table entries...@>=
4701equiv(glue_base):=zero_glue; eq_level(glue_base):=level_one;
4702eq_type(glue_base):=glue_ref;
4703for k:=glue_base+1 to local_base-1 do eqtb[k]:=eqtb[glue_base];
4704glue_ref_count(zero_glue):=glue_ref_count(zero_glue)+local_base-glue_base;
4705
4706@ @<Show equivalent |n|, in region 3@>=
4707if n<skip_base then
4708  begin print_skip_param(n-glue_base); print_char("=");
4709  if n<glue_base+thin_mu_skip_code then print_spec(equiv(n),"pt")
4710  else print_spec(equiv(n),"mu");
4711  end
4712else if n<mu_skip_base then
4713  begin print_esc("skip"); print_int(n-skip_base); print_char("=");
4714  print_spec(equiv(n),"pt");
4715  end
4716else  begin print_esc("muskip"); print_int(n-mu_skip_base); print_char("=");
4717  print_spec(equiv(n),"mu");
4718  end
4719
4720@ Region 4 of |eqtb| contains the local quantities defined here. The
4721bulk of this region is taken up by five tables that are indexed by eight-bit
4722characters; these tables are important to both the syntactic and semantic
4723portions of \TeX. There are also a bunch of special things like font and
4724token parameters, as well as the tables of \.{\\toks} and \.{\\box}
4725registers.
4726
4727@d par_shape_loc=local_base {specifies paragraph shape}
4728@d output_routine_loc=local_base+1 {points to token list for \.{\\output}}
4729@d every_par_loc=local_base+2 {points to token list for \.{\\everypar}}
4730@d every_math_loc=local_base+3 {points to token list for \.{\\everymath}}
4731@d every_display_loc=local_base+4 {points to token list for \.{\\everydisplay}}
4732@d every_hbox_loc=local_base+5 {points to token list for \.{\\everyhbox}}
4733@d every_vbox_loc=local_base+6 {points to token list for \.{\\everyvbox}}
4734@d every_job_loc=local_base+7 {points to token list for \.{\\everyjob}}
4735@d every_cr_loc=local_base+8 {points to token list for \.{\\everycr}}
4736@d err_help_loc=local_base+9 {points to token list for \.{\\errhelp}}
4737@d toks_base=local_base+10 {table of 256 token list registers}
4738@d box_base=toks_base+256 {table of 256 box registers}
4739@d cur_font_loc=box_base+256 {internal font number outside math mode}
4740@d math_font_base=cur_font_loc+1 {table of 48 math font numbers}
4741@d cat_code_base=math_font_base+48
4742  {table of 256 command codes (the ``catcodes'')}
4743@d lc_code_base=cat_code_base+256 {table of 256 lowercase mappings}
4744@d uc_code_base=lc_code_base+256 {table of 256 uppercase mappings}
4745@d sf_code_base=uc_code_base+256 {table of 256 spacefactor mappings}
4746@d math_code_base=sf_code_base+256 {table of 256 math mode mappings}
4747@d int_base=math_code_base+256 {beginning of region 5}
4748@#
4749@d par_shape_ptr==equiv(par_shape_loc)
4750@d output_routine==equiv(output_routine_loc)
4751@d every_par==equiv(every_par_loc)
4752@d every_math==equiv(every_math_loc)
4753@d every_display==equiv(every_display_loc)
4754@d every_hbox==equiv(every_hbox_loc)
4755@d every_vbox==equiv(every_vbox_loc)
4756@d every_job==equiv(every_job_loc)
4757@d every_cr==equiv(every_cr_loc)
4758@d err_help==equiv(err_help_loc)
4759@d toks(#)==equiv(toks_base+#)
4760@d box(#)==equiv(box_base+#)
4761@d cur_font==equiv(cur_font_loc)
4762@d fam_fnt(#)==equiv(math_font_base+#)
4763@d cat_code(#)==equiv(cat_code_base+#)
4764@d lc_code(#)==equiv(lc_code_base+#)
4765@d uc_code(#)==equiv(uc_code_base+#)
4766@d sf_code(#)==equiv(sf_code_base+#)
4767@d math_code(#)==equiv(math_code_base+#)
4768  {Note: |math_code(c)| is the true math code plus |min_halfword|}
4769
4770@<Put each...@>=
4771primitive("output",assign_toks,output_routine_loc);
4772@!@:output_}{\.{\\output} primitive@>
4773primitive("everypar",assign_toks,every_par_loc);
4774@!@:every_par_}{\.{\\everypar} primitive@>
4775primitive("everymath",assign_toks,every_math_loc);
4776@!@:every_math_}{\.{\\everymath} primitive@>
4777primitive("everydisplay",assign_toks,every_display_loc);
4778@!@:every_display_}{\.{\\everydisplay} primitive@>
4779primitive("everyhbox",assign_toks,every_hbox_loc);
4780@!@:every_hbox_}{\.{\\everyhbox} primitive@>
4781primitive("everyvbox",assign_toks,every_vbox_loc);
4782@!@:every_vbox_}{\.{\\everyvbox} primitive@>
4783primitive("everyjob",assign_toks,every_job_loc);
4784@!@:every_job_}{\.{\\everyjob} primitive@>
4785primitive("everycr",assign_toks,every_cr_loc);
4786@!@:every_cr_}{\.{\\everycr} primitive@>
4787primitive("errhelp",assign_toks,err_help_loc);
4788@!@:err_help_}{\.{\\errhelp} primitive@>
4789
4790@ @<Cases of |print_cmd_chr|...@>=
4791assign_toks: if chr_code>=toks_base then
4792  begin print_esc("toks"); print_int(chr_code-toks_base);
4793  end
4794else  case chr_code of
4795  output_routine_loc: print_esc("output");
4796  every_par_loc: print_esc("everypar");
4797  every_math_loc: print_esc("everymath");
4798  every_display_loc: print_esc("everydisplay");
4799  every_hbox_loc: print_esc("everyhbox");
4800  every_vbox_loc: print_esc("everyvbox");
4801  every_job_loc: print_esc("everyjob");
4802  every_cr_loc: print_esc("everycr");
4803  othercases print_esc("errhelp")
4804  endcases;
4805
4806@ We initialize most things to null or undefined values. An undefined font
4807is represented by the internal code |font_base|.
4808
4809However, the character code tables are given initial values based on the
4810conventional interpretation of ASCII code. These initial values should
4811not be changed when \TeX\ is adapted for use with non-English languages;
4812all changes to the initialization conventions should be made in format
4813packages, not in \TeX\ itself, so that global interchange of formats is
4814possible.
4815
4816@d null_font==font_base
4817@d var_code==@'70000 {math code meaning ``use the current family''}
4818
4819@<Initialize table entries...@>=
4820par_shape_ptr:=null; eq_type(par_shape_loc):=shape_ref;
4821eq_level(par_shape_loc):=level_one;@/
4822for k:=output_routine_loc to toks_base+255 do
4823  eqtb[k]:=eqtb[undefined_control_sequence];
4824box(0):=null; eq_type(box_base):=box_ref; eq_level(box_base):=level_one;
4825for k:=box_base+1 to box_base+255 do eqtb[k]:=eqtb[box_base];
4826cur_font:=null_font; eq_type(cur_font_loc):=data;
4827eq_level(cur_font_loc):=level_one;@/
4828for k:=math_font_base to math_font_base+47 do eqtb[k]:=eqtb[cur_font_loc];
4829equiv(cat_code_base):=0; eq_type(cat_code_base):=data;
4830eq_level(cat_code_base):=level_one;@/
4831for k:=cat_code_base+1 to int_base-1 do eqtb[k]:=eqtb[cat_code_base];
4832for k:=0 to 255 do
4833  begin cat_code(k):=other_char; math_code(k):=hi(k); sf_code(k):=1000;
4834  end;
4835cat_code(carriage_return):=car_ret; cat_code(" "):=spacer;
4836cat_code("\"):=escape; cat_code("%"):=comment;
4837cat_code(invalid_code):=invalid_char; cat_code(null_code):=ignore;
4838for k:="0" to "9" do math_code(k):=hi(k+var_code);
4839for k:="A" to "Z" do
4840  begin cat_code(k):=letter; cat_code(k+"a"-"A"):=letter;@/
4841  math_code(k):=hi(k+var_code+@"100);
4842  math_code(k+"a"-"A"):=hi(k+"a"-"A"+var_code+@"100);@/
4843  lc_code(k):=k+"a"-"A"; lc_code(k+"a"-"A"):=k+"a"-"A";@/
4844  uc_code(k):=k; uc_code(k+"a"-"A"):=k;@/
4845  sf_code(k):=999;
4846  end;
4847
4848@ @<Show equivalent |n|, in region 4@>=
4849if n=par_shape_loc then
4850  begin print_esc("parshape"); print_char("=");
4851  if par_shape_ptr=null then print_char("0")
4852  else print_int(info(par_shape_ptr));
4853  end
4854else if n<toks_base then
4855  begin print_cmd_chr(assign_toks,n); print_char("=");
4856  if equiv(n)<>null then show_token_list(link(equiv(n)),null,32);
4857  end
4858else if n<box_base then
4859  begin print_esc("toks"); print_int(n-toks_base); print_char("=");
4860  if equiv(n)<>null then show_token_list(link(equiv(n)),null,32);
4861  end
4862else if n<cur_font_loc then
4863  begin print_esc("box"); print_int(n-box_base); print_char("=");
4864  if equiv(n)=null then print("void")
4865  else  begin depth_threshold:=0; breadth_max:=1; show_node_list(equiv(n));
4866    end;
4867  end
4868else if n<cat_code_base then @<Show the font identifier in |eqtb[n]|@>
4869else @<Show the halfword code in |eqtb[n]|@>
4870
4871@ @<Show the font identifier in |eqtb[n]|@>=
4872begin if n=cur_font_loc then print("current font")
4873else if n<math_font_base+16 then
4874  begin print_esc("textfont"); print_int(n-math_font_base);
4875  end
4876else if n<math_font_base+32 then
4877  begin print_esc("scriptfont"); print_int(n-math_font_base-16);
4878  end
4879else  begin print_esc("scriptscriptfont"); print_int(n-math_font_base-32);
4880  end;
4881print_char("=");@/
4882print_esc(hash[font_id_base+equiv(n)].rh);
4883  {that's |font_id_text(equiv(n))|}
4884end
4885
4886@ @<Show the halfword code in |eqtb[n]|@>=
4887if n<math_code_base then
4888  begin if n<lc_code_base then
4889    begin print_esc("catcode"); print_int(n-cat_code_base);
4890    end
4891  else if n<uc_code_base then
4892    begin print_esc("lccode"); print_int(n-lc_code_base);
4893    end
4894  else if n<sf_code_base then
4895    begin print_esc("uccode"); print_int(n-uc_code_base);
4896    end
4897  else  begin print_esc("sfcode"); print_int(n-sf_code_base);
4898    end;
4899  print_char("="); print_int(equiv(n));
4900  end
4901else  begin print_esc("mathcode"); print_int(n-math_code_base);
4902  print_char("="); print_int(ho(equiv(n)));
4903  end
4904
4905@ Region 5 of |eqtb| contains the integer parameters and registers defined
4906here, as well as the |del_code| table. The latter table differs from the
4907|cat_code..math_code| tables that precede it, since delimiter codes are
4908fullword integers while the other kinds of codes occupy at most a
4909halfword. This is what makes region~5 different from region~4. We will
4910store the |eq_level| information in an auxiliary array of quarterwords
4911that will be defined later.
4912
4913@d pretolerance_code=0 {badness tolerance before hyphenation}
4914@d tolerance_code=1 {badness tolerance after hyphenation}
4915@d line_penalty_code=2 {added to the badness of every line}
4916@d hyphen_penalty_code=3 {penalty for break after discretionary hyphen}
4917@d ex_hyphen_penalty_code=4 {penalty for break after explicit hyphen}
4918@d club_penalty_code=5 {penalty for creating a club line}
4919@d widow_penalty_code=6 {penalty for creating a widow line}
4920@d display_widow_penalty_code=7 {ditto, just before a display}
4921@d broken_penalty_code=8 {penalty for breaking a page at a broken line}
4922@d bin_op_penalty_code=9 {penalty for breaking after a binary operation}
4923@d rel_penalty_code=10 {penalty for breaking after a relation}
4924@d pre_display_penalty_code=11
4925  {penalty for breaking just before a displayed formula}
4926@d post_display_penalty_code=12
4927  {penalty for breaking just after a displayed formula}
4928@d inter_line_penalty_code=13 {additional penalty between lines}
4929@d double_hyphen_demerits_code=14 {demerits for double hyphen break}
4930@d final_hyphen_demerits_code=15 {demerits for final hyphen break}
4931@d adj_demerits_code=16 {demerits for adjacent incompatible lines}
4932@d mag_code=17 {magnification ratio}
4933@d delimiter_factor_code=18 {ratio for variable-size delimiters}
4934@d looseness_code=19 {change in number of lines for a paragraph}
4935@d time_code=20 {current time of day}
4936@d day_code=21 {current day of the month}
4937@d month_code=22 {current month of the year}
4938@d year_code=23 {current year of our Lord}
4939@d show_box_breadth_code=24 {nodes per level in |show_box|}
4940@d show_box_depth_code=25 {maximum level in |show_box|}
4941@d hbadness_code=26 {hboxes exceeding this badness will be shown by |hpack|}
4942@d vbadness_code=27 {vboxes exceeding this badness will be shown by |vpack|}
4943@d pausing_code=28 {pause after each line is read from a file}
4944@d tracing_online_code=29 {show diagnostic output on terminal}
4945@d tracing_macros_code=30 {show macros as they are being expanded}
4946@d tracing_stats_code=31 {show memory usage if \TeX\ knows it}
4947@d tracing_paragraphs_code=32 {show line-break calculations}
4948@d tracing_pages_code=33 {show page-break calculations}
4949@d tracing_output_code=34 {show boxes when they are shipped out}
4950@d tracing_lost_chars_code=35 {show characters that aren't in the font}
4951@d tracing_commands_code=36 {show command codes at |big_switch|}
4952@d tracing_restores_code=37 {show equivalents when they are restored}
4953@d uc_hyph_code=38 {hyphenate words beginning with a capital letter}
4954@d output_penalty_code=39 {penalty found at current page break}
4955@d max_dead_cycles_code=40 {bound on consecutive dead cycles of output}
4956@d hang_after_code=41 {hanging indentation changes after this many lines}
4957@d floating_penalty_code=42 {penalty for insertions heldover after a split}
4958@d global_defs_code=43 {override \.{\\global} specifications}
4959@d cur_fam_code=44 {current family}
4960@d escape_char_code=45 {escape character for token output}
4961@d default_hyphen_char_code=46 {value of \.{\\hyphenchar} when a font is loaded}
4962@d default_skew_char_code=47 {value of \.{\\skewchar} when a font is loaded}
4963@d end_line_char_code=48 {character placed at the right end of the buffer}
4964@d new_line_char_code=49 {character that prints as |print_ln|}
4965@d language_code=50 {current hyphenation table}
4966@d left_hyphen_min_code=51 {minimum left hyphenation fragment size}
4967@d right_hyphen_min_code=52 {minimum right hyphenation fragment size}
4968@d holding_inserts_code=53 {do not remove insertion nodes from \.{\\box255}}
4969@d error_context_lines_code=54 {maximum intermediate line pairs shown}
4970@d int_pars=55 {total number of integer parameters}
4971@d count_base=int_base+int_pars {256 user \.{\\count} registers}
4972@d del_code_base=count_base+256 {256 delimiter code mappings}
4973@d dimen_base=del_code_base+256 {beginning of region 6}
4974@#
4975@d del_code(#)==eqtb[del_code_base+#].int
4976@d count(#)==eqtb[count_base+#].int
4977@d int_par(#)==eqtb[int_base+#].int {an integer parameter}
4978@d pretolerance==int_par(pretolerance_code)
4979@d tolerance==int_par(tolerance_code)
4980@d line_penalty==int_par(line_penalty_code)
4981@d hyphen_penalty==int_par(hyphen_penalty_code)
4982@d ex_hyphen_penalty==int_par(ex_hyphen_penalty_code)
4983@d club_penalty==int_par(club_penalty_code)
4984@d widow_penalty==int_par(widow_penalty_code)
4985@d display_widow_penalty==int_par(display_widow_penalty_code)
4986@d broken_penalty==int_par(broken_penalty_code)
4987@d bin_op_penalty==int_par(bin_op_penalty_code)
4988@d rel_penalty==int_par(rel_penalty_code)
4989@d pre_display_penalty==int_par(pre_display_penalty_code)
4990@d post_display_penalty==int_par(post_display_penalty_code)
4991@d inter_line_penalty==int_par(inter_line_penalty_code)
4992@d double_hyphen_demerits==int_par(double_hyphen_demerits_code)
4993@d final_hyphen_demerits==int_par(final_hyphen_demerits_code)
4994@d adj_demerits==int_par(adj_demerits_code)
4995@d mag==int_par(mag_code)
4996@d delimiter_factor==int_par(delimiter_factor_code)
4997@d looseness==int_par(looseness_code)
4998@d time==int_par(time_code)
4999@d day==int_par(day_code)
5000@d month==int_par(month_code)
5001@d year==int_par(year_code)
5002@d show_box_breadth==int_par(show_box_breadth_code)
5003@d show_box_depth==int_par(show_box_depth_code)
5004@d hbadness==int_par(hbadness_code)
5005@d vbadness==int_par(vbadness_code)
5006@d pausing==int_par(pausing_code)
5007@d tracing_online==int_par(tracing_online_code)
5008@d tracing_macros==int_par(tracing_macros_code)
5009@d tracing_stats==int_par(tracing_stats_code)
5010@d tracing_paragraphs==int_par(tracing_paragraphs_code)
5011@d tracing_pages==int_par(tracing_pages_code)
5012@d tracing_output==int_par(tracing_output_code)
5013@d tracing_lost_chars==int_par(tracing_lost_chars_code)
5014@d tracing_commands==int_par(tracing_commands_code)
5015@d tracing_restores==int_par(tracing_restores_code)
5016@d uc_hyph==int_par(uc_hyph_code)
5017@d output_penalty==int_par(output_penalty_code)
5018@d max_dead_cycles==int_par(max_dead_cycles_code)
5019@d hang_after==int_par(hang_after_code)
5020@d floating_penalty==int_par(floating_penalty_code)
5021@d global_defs==int_par(global_defs_code)
5022@d cur_fam==int_par(cur_fam_code)
5023@d escape_char==int_par(escape_char_code)
5024@d default_hyphen_char==int_par(default_hyphen_char_code)
5025@d default_skew_char==int_par(default_skew_char_code)
5026@d end_line_char==int_par(end_line_char_code)
5027@d new_line_char==int_par(new_line_char_code)
5028@d language==int_par(language_code)
5029@d left_hyphen_min==int_par(left_hyphen_min_code)
5030@d right_hyphen_min==int_par(right_hyphen_min_code)
5031@d holding_inserts==int_par(holding_inserts_code)
5032@d error_context_lines==int_par(error_context_lines_code)
5033
5034@<Assign the values |depth_threshold:=show_box_depth|...@>=
5035depth_threshold:=show_box_depth;
5036breadth_max:=show_box_breadth
5037
5038@ We can print the symbolic name of an integer parameter as follows.
5039
5040@p procedure print_param(@!n:integer);
5041begin case n of
5042pretolerance_code:print_esc("pretolerance");
5043tolerance_code:print_esc("tolerance");
5044line_penalty_code:print_esc("linepenalty");
5045hyphen_penalty_code:print_esc("hyphenpenalty");
5046ex_hyphen_penalty_code:print_esc("exhyphenpenalty");
5047club_penalty_code:print_esc("clubpenalty");
5048widow_penalty_code:print_esc("widowpenalty");
5049display_widow_penalty_code:print_esc("displaywidowpenalty");
5050broken_penalty_code:print_esc("brokenpenalty");
5051bin_op_penalty_code:print_esc("binoppenalty");
5052rel_penalty_code:print_esc("relpenalty");
5053pre_display_penalty_code:print_esc("predisplaypenalty");
5054post_display_penalty_code:print_esc("postdisplaypenalty");
5055inter_line_penalty_code:print_esc("interlinepenalty");
5056double_hyphen_demerits_code:print_esc("doublehyphendemerits");
5057final_hyphen_demerits_code:print_esc("finalhyphendemerits");
5058adj_demerits_code:print_esc("adjdemerits");
5059mag_code:print_esc("mag");
5060delimiter_factor_code:print_esc("delimiterfactor");
5061looseness_code:print_esc("looseness");
5062time_code:print_esc("time");
5063day_code:print_esc("day");
5064month_code:print_esc("month");
5065year_code:print_esc("year");
5066show_box_breadth_code:print_esc("showboxbreadth");
5067show_box_depth_code:print_esc("showboxdepth");
5068hbadness_code:print_esc("hbadness");
5069vbadness_code:print_esc("vbadness");
5070pausing_code:print_esc("pausing");
5071tracing_online_code:print_esc("tracingonline");
5072tracing_macros_code:print_esc("tracingmacros");
5073tracing_stats_code:print_esc("tracingstats");
5074tracing_paragraphs_code:print_esc("tracingparagraphs");
5075tracing_pages_code:print_esc("tracingpages");
5076tracing_output_code:print_esc("tracingoutput");
5077tracing_lost_chars_code:print_esc("tracinglostchars");
5078tracing_commands_code:print_esc("tracingcommands");
5079tracing_restores_code:print_esc("tracingrestores");
5080uc_hyph_code:print_esc("uchyph");
5081output_penalty_code:print_esc("outputpenalty");
5082max_dead_cycles_code:print_esc("maxdeadcycles");
5083hang_after_code:print_esc("hangafter");
5084floating_penalty_code:print_esc("floatingpenalty");
5085global_defs_code:print_esc("globaldefs");
5086cur_fam_code:print_esc("fam");
5087escape_char_code:print_esc("escapechar");
5088default_hyphen_char_code:print_esc("defaulthyphenchar");
5089default_skew_char_code:print_esc("defaultskewchar");
5090end_line_char_code:print_esc("endlinechar");
5091new_line_char_code:print_esc("newlinechar");
5092language_code:print_esc("language");
5093left_hyphen_min_code:print_esc("lefthyphenmin");
5094right_hyphen_min_code:print_esc("righthyphenmin");
5095holding_inserts_code:print_esc("holdinginserts");
5096error_context_lines_code:print_esc("errorcontextlines");
5097othercases print("[unknown integer parameter!]")
5098endcases;
5099end;
5100
5101@ The integer parameter names must be entered into the hash table.
5102
5103@<Put each...@>=
5104primitive("pretolerance",assign_int,int_base+pretolerance_code);@/
5105@!@:pretolerance_}{\.{\\pretolerance} primitive@>
5106primitive("tolerance",assign_int,int_base+tolerance_code);@/
5107@!@:tolerance_}{\.{\\tolerance} primitive@>
5108primitive("linepenalty",assign_int,int_base+line_penalty_code);@/
5109@!@:line_penalty_}{\.{\\linepenalty} primitive@>
5110primitive("hyphenpenalty",assign_int,int_base+hyphen_penalty_code);@/
5111@!@:hyphen_penalty_}{\.{\\hyphenpenalty} primitive@>
5112primitive("exhyphenpenalty",assign_int,int_base+ex_hyphen_penalty_code);@/
5113@!@:ex_hyphen_penalty_}{\.{\\exhyphenpenalty} primitive@>
5114primitive("clubpenalty",assign_int,int_base+club_penalty_code);@/
5115@!@:club_penalty_}{\.{\\clubpenalty} primitive@>
5116primitive("widowpenalty",assign_int,int_base+widow_penalty_code);@/
5117@!@:widow_penalty_}{\.{\\widowpenalty} primitive@>
5118primitive("displaywidowpenalty",
5119  assign_int,int_base+display_widow_penalty_code);@/
5120@!@:display_widow_penalty_}{\.{\\displaywidowpenalty} primitive@>
5121primitive("brokenpenalty",assign_int,int_base+broken_penalty_code);@/
5122@!@:broken_penalty_}{\.{\\brokenpenalty} primitive@>
5123primitive("binoppenalty",assign_int,int_base+bin_op_penalty_code);@/
5124@!@:bin_op_penalty_}{\.{\\binoppenalty} primitive@>
5125primitive("relpenalty",assign_int,int_base+rel_penalty_code);@/
5126@!@:rel_penalty_}{\.{\\relpenalty} primitive@>
5127primitive("predisplaypenalty",assign_int,int_base+pre_display_penalty_code);@/
5128@!@:pre_display_penalty_}{\.{\\predisplaypenalty} primitive@>
5129primitive("postdisplaypenalty",assign_int,int_base+post_display_penalty_code);@/
5130@!@:post_display_penalty_}{\.{\\postdisplaypenalty} primitive@>
5131primitive("interlinepenalty",assign_int,int_base+inter_line_penalty_code);@/
5132@!@:inter_line_penalty_}{\.{\\interlinepenalty} primitive@>
5133primitive("doublehyphendemerits",
5134  assign_int,int_base+double_hyphen_demerits_code);@/
5135@!@:double_hyphen_demerits_}{\.{\\doublehyphendemerits} primitive@>
5136primitive("finalhyphendemerits",
5137  assign_int,int_base+final_hyphen_demerits_code);@/
5138@!@:final_hyphen_demerits_}{\.{\\finalhyphendemerits} primitive@>
5139primitive("adjdemerits",assign_int,int_base+adj_demerits_code);@/
5140@!@:adj_demerits_}{\.{\\adjdemerits} primitive@>
5141primitive("mag",assign_int,int_base+mag_code);@/
5142@!@:mag_}{\.{\\mag} primitive@>
5143primitive("delimiterfactor",assign_int,int_base+delimiter_factor_code);@/
5144@!@:delimiter_factor_}{\.{\\delimiterfactor} primitive@>
5145primitive("looseness",assign_int,int_base+looseness_code);@/
5146@!@:looseness_}{\.{\\looseness} primitive@>
5147primitive("time",assign_int,int_base+time_code);@/
5148@!@:time_}{\.{\\time} primitive@>
5149primitive("day",assign_int,int_base+day_code);@/
5150@!@:day_}{\.{\\day} primitive@>
5151primitive("month",assign_int,int_base+month_code);@/
5152@!@:month_}{\.{\\month} primitive@>
5153primitive("year",assign_int,int_base+year_code);@/
5154@!@:year_}{\.{\\year} primitive@>
5155primitive("showboxbreadth",assign_int,int_base+show_box_breadth_code);@/
5156@!@:show_box_breadth_}{\.{\\showboxbreadth} primitive@>
5157primitive("showboxdepth",assign_int,int_base+show_box_depth_code);@/
5158@!@:show_box_depth_}{\.{\\showboxdepth} primitive@>
5159primitive("hbadness",assign_int,int_base+hbadness_code);@/
5160@!@:hbadness_}{\.{\\hbadness} primitive@>
5161primitive("vbadness",assign_int,int_base+vbadness_code);@/
5162@!@:vbadness_}{\.{\\vbadness} primitive@>
5163primitive("pausing",assign_int,int_base+pausing_code);@/
5164@!@:pausing_}{\.{\\pausing} primitive@>
5165primitive("tracingonline",assign_int,int_base+tracing_online_code);@/
5166@!@:tracing_online_}{\.{\\tracingonline} primitive@>
5167primitive("tracingmacros",assign_int,int_base+tracing_macros_code);@/
5168@!@:tracing_macros_}{\.{\\tracingmacros} primitive@>
5169primitive("tracingstats",assign_int,int_base+tracing_stats_code);@/
5170@!@:tracing_stats_}{\.{\\tracingstats} primitive@>
5171primitive("tracingparagraphs",assign_int,int_base+tracing_paragraphs_code);@/
5172@!@:tracing_paragraphs_}{\.{\\tracingparagraphs} primitive@>
5173primitive("tracingpages",assign_int,int_base+tracing_pages_code);@/
5174@!@:tracing_pages_}{\.{\\tracingpages} primitive@>
5175primitive("tracingoutput",assign_int,int_base+tracing_output_code);@/
5176@!@:tracing_output_}{\.{\\tracingoutput} primitive@>
5177primitive("tracinglostchars",assign_int,int_base+tracing_lost_chars_code);@/
5178@!@:tracing_lost_chars_}{\.{\\tracinglostchars} primitive@>
5179primitive("tracingcommands",assign_int,int_base+tracing_commands_code);@/
5180@!@:tracing_commands_}{\.{\\tracingcommands} primitive@>
5181primitive("tracingrestores",assign_int,int_base+tracing_restores_code);@/
5182@!@:tracing_restores_}{\.{\\tracingrestores} primitive@>
5183primitive("uchyph",assign_int,int_base+uc_hyph_code);@/
5184@!@:uc_hyph_}{\.{\\uchyph} primitive@>
5185primitive("outputpenalty",assign_int,int_base+output_penalty_code);@/
5186@!@:output_penalty_}{\.{\\outputpenalty} primitive@>
5187primitive("maxdeadcycles",assign_int,int_base+max_dead_cycles_code);@/
5188@!@:max_dead_cycles_}{\.{\\maxdeadcycles} primitive@>
5189primitive("hangafter",assign_int,int_base+hang_after_code);@/
5190@!@:hang_after_}{\.{\\hangafter} primitive@>
5191primitive("floatingpenalty",assign_int,int_base+floating_penalty_code);@/
5192@!@:floating_penalty_}{\.{\\floatingpenalty} primitive@>
5193primitive("globaldefs",assign_int,int_base+global_defs_code);@/
5194@!@:global_defs_}{\.{\\globaldefs} primitive@>
5195primitive("fam",assign_int,int_base+cur_fam_code);@/
5196@!@:fam_}{\.{\\fam} primitive@>
5197primitive("escapechar",assign_int,int_base+escape_char_code);@/
5198@!@:escape_char_}{\.{\\escapechar} primitive@>
5199primitive("defaulthyphenchar",assign_int,int_base+default_hyphen_char_code);@/
5200@!@:default_hyphen_char_}{\.{\\defaulthyphenchar} primitive@>
5201primitive("defaultskewchar",assign_int,int_base+default_skew_char_code);@/
5202@!@:default_skew_char_}{\.{\\defaultskewchar} primitive@>
5203primitive("endlinechar",assign_int,int_base+end_line_char_code);@/
5204@!@:end_line_char_}{\.{\\endlinechar} primitive@>
5205primitive("newlinechar",assign_int,int_base+new_line_char_code);@/
5206@!@:new_line_char_}{\.{\\newlinechar} primitive@>
5207primitive("language",assign_int,int_base+language_code);@/
5208@!@:language_}{\.{\\language} primitive@>
5209primitive("lefthyphenmin",assign_int,int_base+left_hyphen_min_code);@/
5210@!@:left_hyphen_min_}{\.{\\lefthyphenmin} primitive@>
5211primitive("righthyphenmin",assign_int,int_base+right_hyphen_min_code);@/
5212@!@:right_hyphen_min_}{\.{\\righthyphenmin} primitive@>
5213primitive("holdinginserts",assign_int,int_base+holding_inserts_code);@/
5214@!@:holding_inserts_}{\.{\\holdinginserts} primitive@>
5215primitive("errorcontextlines",assign_int,int_base+error_context_lines_code);@/
5216@!@:error_context_lines_}{\.{\\errorcontextlines} primitive@>
5217
5218@ @<Cases of |print_cmd_chr|...@>=
5219assign_int: if chr_code<count_base then print_param(chr_code-int_base)
5220  else  begin print_esc("count"); print_int(chr_code-count_base);
5221    end;
5222
5223@ The integer parameters should really be initialized by a macro package;
5224the following initialization does the minimum to keep \TeX\ from
5225complete failure.
5226@^null delimiter@>
5227
5228@<Initialize table entries...@>=
5229for k:=int_base to del_code_base-1 do eqtb[k].int:=0;
5230mag:=1000; tolerance:=10000; hang_after:=1; max_dead_cycles:=25;
5231escape_char:="\"; end_line_char:=carriage_return;
5232for k:=0 to 255 do del_code(k):=-1;
5233del_code("."):=0; {this null delimiter is used in error recovery}
5234
5235@ The following procedure, which is called just before \TeX\ initializes its
5236input and output, establishes the initial values of the date and time.
5237@^system dependencies@>
5238Since standard \PASCAL\ cannot provide such information, something special
5239is needed. The program here simply specifies July 4, 1776, at noon; but
5240users probably want a better approximation to the truth.
5241
5242@p procedure fix_date_and_time;
5243begin time:=12*60; {minutes since midnight}
5244day:=4; {fourth day of the month}
5245month:=7; {seventh month of the year}
5246year:=1776; {Anno Domini}
5247end;
5248
5249@ @<Show equivalent |n|, in region 5@>=
5250begin if n<count_base then print_param(n-int_base)
5251else if  n<del_code_base then
5252  begin print_esc("count"); print_int(n-count_base);
5253  end
5254else  begin print_esc("delcode"); print_int(n-del_code_base);
5255  end;
5256print_char("="); print_int(eqtb[n].int);
5257end
5258
5259@ @<Set variable |c| to the current escape character@>=c:=escape_char
5260
5261@ @<Character |s| is the current new-line character@>=s=new_line_char
5262
5263@ \TeX\ is occasionally supposed to print diagnostic information that
5264goes only into the transcript file, unless |tracing_online| is positive.
5265Here are two routines that adjust the destination of print commands:
5266
5267@p procedure begin_diagnostic; {prepare to do some tracing}
5268begin old_setting:=selector;
5269if (tracing_online<=0)and(selector=term_and_log) then
5270  begin decr(selector);
5271  if history=spotless then history:=warning_issued;
5272  end;
5273end;
5274@#
5275procedure end_diagnostic(@!blank_line:boolean);
5276  {restore proper conditions after tracing}
5277begin print_nl("");
5278if blank_line then print_ln;
5279selector:=old_setting;
5280end;
5281
5282@ Of course we had better declare another global variable, if the previous
5283routines are going to work.
5284
5285@<Glob...@>=
5286@!old_setting:0..max_selector;
5287
5288@ The final region of |eqtb| contains the dimension parameters defined
5289here, and the 256 \.{\\dimen} registers.
5290
5291@d par_indent_code=0 {indentation of paragraphs}
5292@d math_surround_code=1 {space around math in text}
5293@d line_skip_limit_code=2 {threshold for |line_skip| instead of |baseline_skip|}
5294@d hsize_code=3 {line width in horizontal mode}
5295@d vsize_code=4 {page height in vertical mode}
5296@d max_depth_code=5 {maximum depth of boxes on main pages}
5297@d split_max_depth_code=6 {maximum depth of boxes on split pages}
5298@d box_max_depth_code=7 {maximum depth of explicit vboxes}
5299@d hfuzz_code=8 {tolerance for overfull hbox messages}
5300@d vfuzz_code=9 {tolerance for overfull vbox messages}
5301@d delimiter_shortfall_code=10 {maximum amount uncovered by variable delimiters}
5302@d null_delimiter_space_code=11 {blank space in null delimiters}
5303@d script_space_code=12 {extra space after subscript or superscript}
5304@d pre_display_size_code=13 {length of text preceding a display}
5305@d display_width_code=14 {length of line for displayed equation}
5306@d display_indent_code=15 {indentation of line for displayed equation}
5307@d overfull_rule_code=16 {width of rule that identifies overfull hboxes}
5308@d hang_indent_code=17 {amount of hanging indentation}
5309@d h_offset_code=18 {amount of horizontal offset when shipping pages out}
5310@d v_offset_code=19 {amount of vertical offset when shipping pages out}
5311@d emergency_stretch_code=20 {reduces badnesses on final pass of line-breaking}
5312@d dimen_pars=21 {total number of dimension parameters}
5313@d scaled_base=dimen_base+dimen_pars
5314  {table of 256 user-defined \.{\\dimen} registers}
5315@d eqtb_size=scaled_base+255 {largest subscript of |eqtb|}
5316@#
5317@d dimen(#)==eqtb[scaled_base+#].sc
5318@d dimen_par(#)==eqtb[dimen_base+#].sc {a scaled quantity}
5319@d par_indent==dimen_par(par_indent_code)
5320@d math_surround==dimen_par(math_surround_code)
5321@d line_skip_limit==dimen_par(line_skip_limit_code)
5322@d hsize==dimen_par(hsize_code)
5323@d vsize==dimen_par(vsize_code)
5324@d max_depth==dimen_par(max_depth_code)
5325@d split_max_depth==dimen_par(split_max_depth_code)
5326@d box_max_depth==dimen_par(box_max_depth_code)
5327@d hfuzz==dimen_par(hfuzz_code)
5328@d vfuzz==dimen_par(vfuzz_code)
5329@d delimiter_shortfall==dimen_par(delimiter_shortfall_code)
5330@d null_delimiter_space==dimen_par(null_delimiter_space_code)
5331@d script_space==dimen_par(script_space_code)
5332@d pre_display_size==dimen_par(pre_display_size_code)
5333@d display_width==dimen_par(display_width_code)
5334@d display_indent==dimen_par(display_indent_code)
5335@d overfull_rule==dimen_par(overfull_rule_code)
5336@d hang_indent==dimen_par(hang_indent_code)
5337@d h_offset==dimen_par(h_offset_code)
5338@d v_offset==dimen_par(v_offset_code)
5339@d emergency_stretch==dimen_par(emergency_stretch_code)
5340
5341@p procedure print_length_param(@!n:integer);
5342begin case n of
5343par_indent_code:print_esc("parindent");
5344math_surround_code:print_esc("mathsurround");
5345line_skip_limit_code:print_esc("lineskiplimit");
5346hsize_code:print_esc("hsize");
5347vsize_code:print_esc("vsize");
5348max_depth_code:print_esc("maxdepth");
5349split_max_depth_code:print_esc("splitmaxdepth");
5350box_max_depth_code:print_esc("boxmaxdepth");
5351hfuzz_code:print_esc("hfuzz");
5352vfuzz_code:print_esc("vfuzz");
5353delimiter_shortfall_code:print_esc("delimitershortfall");
5354null_delimiter_space_code:print_esc("nulldelimiterspace");
5355script_space_code:print_esc("scriptspace");
5356pre_display_size_code:print_esc("predisplaysize");
5357display_width_code:print_esc("displaywidth");
5358display_indent_code:print_esc("displayindent");
5359overfull_rule_code:print_esc("overfullrule");
5360hang_indent_code:print_esc("hangindent");
5361h_offset_code:print_esc("hoffset");
5362v_offset_code:print_esc("voffset");
5363emergency_stretch_code:print_esc("emergencystretch");
5364othercases print("[unknown dimen parameter!]")
5365endcases;
5366end;
5367
5368@ @<Put each...@>=
5369primitive("parindent",assign_dimen,dimen_base+par_indent_code);@/
5370@!@:par_indent_}{\.{\\parindent} primitive@>
5371primitive("mathsurround",assign_dimen,dimen_base+math_surround_code);@/
5372@!@:math_surround_}{\.{\\mathsurround} primitive@>
5373primitive("lineskiplimit",assign_dimen,dimen_base+line_skip_limit_code);@/
5374@!@:line_skip_limit_}{\.{\\lineskiplimit} primitive@>
5375primitive("hsize",assign_dimen,dimen_base+hsize_code);@/
5376@!@:hsize_}{\.{\\hsize} primitive@>
5377primitive("vsize",assign_dimen,dimen_base+vsize_code);@/
5378@!@:vsize_}{\.{\\vsize} primitive@>
5379primitive("maxdepth",assign_dimen,dimen_base+max_depth_code);@/
5380@!@:max_depth_}{\.{\\maxdepth} primitive@>
5381primitive("splitmaxdepth",assign_dimen,dimen_base+split_max_depth_code);@/
5382@!@:split_max_depth_}{\.{\\splitmaxdepth} primitive@>
5383primitive("boxmaxdepth",assign_dimen,dimen_base+box_max_depth_code);@/
5384@!@:box_max_depth_}{\.{\\boxmaxdepth} primitive@>
5385primitive("hfuzz",assign_dimen,dimen_base+hfuzz_code);@/
5386@!@:hfuzz_}{\.{\\hfuzz} primitive@>
5387primitive("vfuzz",assign_dimen,dimen_base+vfuzz_code);@/
5388@!@:vfuzz_}{\.{\\vfuzz} primitive@>
5389primitive("delimitershortfall",
5390  assign_dimen,dimen_base+delimiter_shortfall_code);@/
5391@!@:delimiter_shortfall_}{\.{\\delimitershortfall} primitive@>
5392primitive("nulldelimiterspace",
5393  assign_dimen,dimen_base+null_delimiter_space_code);@/
5394@!@:null_delimiter_space_}{\.{\\nulldelimiterspace} primitive@>
5395primitive("scriptspace",assign_dimen,dimen_base+script_space_code);@/
5396@!@:script_space_}{\.{\\scriptspace} primitive@>
5397primitive("predisplaysize",assign_dimen,dimen_base+pre_display_size_code);@/
5398@!@:pre_display_size_}{\.{\\predisplaysize} primitive@>
5399primitive("displaywidth",assign_dimen,dimen_base+display_width_code);@/
5400@!@:display_width_}{\.{\\displaywidth} primitive@>
5401primitive("displayindent",assign_dimen,dimen_base+display_indent_code);@/
5402@!@:display_indent_}{\.{\\displayindent} primitive@>
5403primitive("overfullrule",assign_dimen,dimen_base+overfull_rule_code);@/
5404@!@:overfull_rule_}{\.{\\overfullrule} primitive@>
5405primitive("hangindent",assign_dimen,dimen_base+hang_indent_code);@/
5406@!@:hang_indent_}{\.{\\hangindent} primitive@>
5407primitive("hoffset",assign_dimen,dimen_base+h_offset_code);@/
5408@!@:h_offset_}{\.{\\hoffset} primitive@>
5409primitive("voffset",assign_dimen,dimen_base+v_offset_code);@/
5410@!@:v_offset_}{\.{\\voffset} primitive@>
5411primitive("emergencystretch",assign_dimen,dimen_base+emergency_stretch_code);@/
5412@!@:emergency_stretch_}{\.{\\emergencystretch} primitive@>
5413
5414@ @<Cases of |print_cmd_chr|...@>=
5415assign_dimen: if chr_code<scaled_base then
5416    print_length_param(chr_code-dimen_base)
5417  else  begin print_esc("dimen"); print_int(chr_code-scaled_base);
5418    end;
5419
5420@ @<Initialize table entries...@>=
5421for k:=dimen_base to eqtb_size do eqtb[k].sc:=0;
5422
5423@ @<Show equivalent |n|, in region 6@>=
5424begin if n<scaled_base then print_length_param(n-dimen_base)
5425else  begin print_esc("dimen"); print_int(n-scaled_base);
5426  end;
5427print_char("="); print_scaled(eqtb[n].sc); print("pt");
5428end
5429
5430@ Here is a procedure that displays the contents of |eqtb[n]|
5431symbolically.
5432
5433@p@t\4@>@<Declare the procedure called |print_cmd_chr|@>@;@/
5434@!stat procedure show_eqtb(@!n:pointer);
5435begin if n<active_base then print_char("?") {this can't happen}
5436else if n<glue_base then @<Show equivalent |n|, in region 1 or 2@>
5437else if n<local_base then @<Show equivalent |n|, in region 3@>
5438else if n<int_base then @<Show equivalent |n|, in region 4@>
5439else if n<dimen_base then @<Show equivalent |n|, in region 5@>
5440else if n<=eqtb_size then @<Show equivalent |n|, in region 6@>
5441else print_char("?"); {this can't happen either}
5442end;
5443tats
5444
5445@ The last two regions of |eqtb| have fullword values instead of the
5446three fields |eq_level|, |eq_type|, and |equiv|. An |eq_type| is unnecessary,
5447but \TeX\ needs to store the |eq_level| information in another array
5448called |xeq_level|.
5449
5450@<Glob...@>=
5451@!eqtb:array[active_base..eqtb_size] of memory_word;
5452@!xeq_level:array[int_base..eqtb_size] of quarterword;
5453
5454@ @<Set init...@>=
5455for k:=int_base to eqtb_size do xeq_level[k]:=level_one;
5456
5457@ When the debugging routine |search_mem| is looking for pointers having a
5458given value, it is interested only in regions 1 to~3 of~|eqtb|, and in the
5459first part of region~4.
5460
5461@<Search |eqtb| for equivalents equal to |p|@>=
5462for q:=active_base to box_base+255 do
5463  begin if equiv(q)=p then
5464    begin print_nl("EQUIV("); print_int(q); print_char(")");
5465    end;
5466  end
5467
5468@* \[18] The hash table.
5469Control sequences are stored and retrieved by means of a fairly standard hash
5470table algorithm called the method of ``coalescing lists'' (cf.\ Algorithm 6.4C
5471in {\sl The Art of Computer Programming\/}). Once a control sequence enters the
5472table, it is never removed, because there are complicated situations
5473involving \.{\\gdef} where the removal of a control sequence at the end of
5474a group would be a mistake preventable only by the introduction of a
5475complicated reference-count mechanism.
5476
5477The actual sequence of letters forming a control sequence identifier is
5478stored in the |str_pool| array together with all the other strings. An
5479auxiliary array |hash| consists of items with two halfword fields per
5480word. The first of these, called |next(p)|, points to the next identifier
5481belonging to the same coalesced list as the identifier corresponding to~|p|;
5482and the other, called |text(p)|, points to the |str_start| entry for
5483|p|'s identifier. If position~|p| of the hash table is empty, we have
5484|text(p)=0|; if position |p| is either empty or the end of a coalesced
5485hash list, we have |next(p)=0|. An auxiliary pointer variable called
5486|hash_used| is maintained in such a way that all locations |p>=hash_used|
5487are nonempty. The global variable |cs_count| tells how many multiletter
5488control sequences have been defined, if statistics are being kept.
5489
5490A global boolean variable called |no_new_control_sequence| is set to
5491|true| during the time that new hash table entries are forbidden.
5492
5493@d next(#) == hash[#].lh {link for coalesced lists}
5494@d text(#) == hash[#].rh {string number for control sequence name}
5495@d hash_is_full == (hash_used=hash_base) {test if all positions are occupied}
5496@d font_id_text(#) == text(font_id_base+#) {a frozen font identifier's name}
5497
5498@<Glob...@>=
5499@!hash: array[hash_base..undefined_control_sequence-1] of two_halves;
5500  {the hash table}
5501@!hash_used:pointer; {allocation pointer for |hash|}
5502@!no_new_control_sequence:boolean; {are new identifiers legal?}
5503@!cs_count:integer; {total number of known identifiers}
5504
5505@ @<Set init...@>=
5506no_new_control_sequence:=true; {new identifiers are usually forbidden}
5507next(hash_base):=0; text(hash_base):=0;
5508for k:=hash_base+1 to undefined_control_sequence-1 do hash[k]:=hash[hash_base];
5509
5510@ @<Initialize table entries...@>=
5511hash_used:=frozen_control_sequence; {nothing is used}
5512cs_count:=0;
5513eq_type(frozen_dont_expand):=dont_expand;
5514text(frozen_dont_expand):="notexpanded:";
5515@.notexpanded:@>
5516
5517@ Here is the subroutine that searches the hash table for an identifier
5518that matches a given string of length |l>1| appearing in |buffer[j..
5519(j+l-1)]|. If the identifier is found, the corresponding hash table address
5520is returned. Otherwise, if the global variable |no_new_control_sequence|
5521is |true|, the dummy address |undefined_control_sequence| is returned.
5522Otherwise the identifier is inserted into the hash table and its location
5523is returned.
5524
5525@p function id_lookup(@!j,@!l:integer):pointer; {search the hash table}
5526label found; {go here if you found it}
5527var h:integer; {hash code}
5528@!d:integer; {number of characters in incomplete current string}
5529@!p:pointer; {index in |hash| array}
5530@!k:pointer; {index in |buffer| array}
5531begin @<Compute the hash code |h|@>;
5532p:=h+hash_base; {we start searching here; note that |0<=h<hash_prime|}
5533loop@+begin if text(p)>0 then if length(text(p))=l then
5534    if str_eq_buf(text(p),j) then goto found;
5535  if next(p)=0 then
5536    begin if no_new_control_sequence then
5537      p:=undefined_control_sequence
5538    else @<Insert a new control sequence after |p|, then make
5539      |p| point to it@>;
5540    goto found;
5541    end;
5542  p:=next(p);
5543  end;
5544found: id_lookup:=p;
5545end;
5546
5547@ @<Insert a new control...@>=
5548begin if text(p)>0 then
5549  begin repeat if hash_is_full then overflow("hash size",hash_size);
5550@:TeX capacity exceeded hash size}{\quad hash size@>
5551  decr(hash_used);
5552  until text(hash_used)=0; {search for an empty location in |hash|}
5553  next(p):=hash_used; p:=hash_used;
5554  end;
5555str_room(l); d:=cur_length;
5556while pool_ptr>str_start[str_ptr] do
5557  begin decr(pool_ptr); str_pool[pool_ptr+l]:=str_pool[pool_ptr];
5558  end; {move current string up to make room for another}
5559for k:=j to j+l-1 do append_char(buffer[k]);
5560text(p):=make_string; pool_ptr:=pool_ptr+d;
5561@!stat incr(cs_count);@+tats@;@/
5562end
5563
5564@ The value of |hash_prime| should be roughly 85\pct! of |hash_size|, and it
5565should be a prime number.  The theory of hashing tells us to expect fewer
5566than two table probes, on the average, when the search is successful.
5567[See J.~S. Vitter, {\sl Journal of the ACM\/ \bf30} (1983), 231--258.]
5568@^Vitter, Jeffrey Scott@>
5569
5570@<Compute the hash code |h|@>=
5571h:=buffer[j];
5572for k:=j+1 to j+l-1 do
5573  begin h:=h+h+buffer[k];
5574  while h>=hash_prime do h:=h-hash_prime;
5575  end
5576
5577@ Single-character control sequences do not need to be looked up in a hash
5578table, since we can use the character code itself as a direct address.
5579The procedure |print_cs| prints the name of a control sequence, given
5580a pointer to its address in |eqtb|. A space is printed after the name
5581unless it is a single nonletter or an active character. This procedure
5582might be invoked with invalid data, so it is ``extra robust.'' The
5583individual characters must be printed one at a time using |print|, since
5584they may be unprintable.
5585
5586@<Basic printing...@>=
5587procedure print_cs(@!p:integer); {prints a purported control sequence}
5588begin if p<hash_base then {single character}
5589  if p>=single_base then
5590    if p=null_cs then
5591      begin print_esc("csname"); print_esc("endcsname"); print_char(" ");
5592      end
5593    else  begin print_esc(p-single_base);
5594      if cat_code(p-single_base)=letter then print_char(" ");
5595      end
5596  else if p<active_base then print_esc("IMPOSSIBLE.")
5597@.IMPOSSIBLE@>
5598  else print(p-active_base)
5599else if p>=undefined_control_sequence then print_esc("IMPOSSIBLE.")
5600else if (text(p)<0)or(text(p)>=str_ptr) then print_esc("NONEXISTENT.")
5601@.NONEXISTENT@>
5602else  begin print_esc(text(p));
5603  print_char(" ");
5604  end;
5605end;
5606
5607@ Here is a similar procedure; it avoids the error checks, and it never
5608prints a space after the control sequence.
5609
5610@<Basic printing procedures@>=
5611procedure sprint_cs(@!p:pointer); {prints a control sequence}
5612begin if p<hash_base then
5613  if p<single_base then print(p-active_base)
5614  else  if p<null_cs then print_esc(p-single_base)
5615    else  begin print_esc("csname"); print_esc("endcsname");
5616      end
5617else print_esc(text(p));
5618end;
5619
5620@ We need to put \TeX's ``primitive'' control sequences into the hash
5621table, together with their command code (which will be the |eq_type|)
5622and an operand (which will be the |equiv|). The |primitive| procedure
5623does this, in a way that no \TeX\ user can. The global value |cur_val|
5624contains the new |eqtb| pointer after |primitive| has acted.
5625
5626@p @!init procedure primitive(@!s:str_number;@!c:quarterword;@!o:halfword);
5627var k:pool_pointer; {index into |str_pool|}
5628@!j:small_number; {index into |buffer|}
5629@!l:small_number; {length of the string}
5630begin if s<256 then cur_val:=s+single_base
5631else  begin k:=str_start[s]; l:=str_start[s+1]-k;
5632    {we will move |s| into the (empty) |buffer|}
5633  for j:=0 to l-1 do buffer[j]:=so(str_pool[k+j]);
5634  cur_val:=id_lookup(0,l); {|no_new_control_sequence| is |false|}
5635  flush_string; text(cur_val):=s; {we don't want to have the string twice}
5636  end;
5637eq_level(cur_val):=level_one; eq_type(cur_val):=c; equiv(cur_val):=o;
5638end;
5639tini
5640
5641@ Many of \TeX's primitives need no |equiv|, since they are identifiable
5642by their |eq_type| alone. These primitives are loaded into the hash table
5643as follows:
5644
5645@<Put each of \TeX's primitives into the hash table@>=
5646primitive(" ",ex_space,0);@/
5647@!@:Single-character primitives /}{\quad\.{\\\ }@>
5648primitive("/",ital_corr,0);@/
5649@!@:Single-character primitives /}{\quad\.{\\/}@>
5650primitive("accent",accent,0);@/
5651@!@:accent_}{\.{\\accent} primitive@>
5652primitive("advance",advance,0);@/
5653@!@:advance_}{\.{\\advance} primitive@>
5654primitive("afterassignment",after_assignment,0);@/
5655@!@:after_assignment_}{\.{\\afterassignment} primitive@>
5656primitive("aftergroup",after_group,0);@/
5657@!@:after_group_}{\.{\\aftergroup} primitive@>
5658primitive("begingroup",begin_group,0);@/
5659@!@:begin_group_}{\.{\\begingroup} primitive@>
5660primitive("char",char_num,0);@/
5661@!@:char_}{\.{\\char} primitive@>
5662primitive("csname",cs_name,0);@/
5663@!@:cs_name_}{\.{\\csname} primitive@>
5664primitive("delimiter",delim_num,0);@/
5665@!@:delimiter_}{\.{\\delimiter} primitive@>
5666primitive("divide",divide,0);@/
5667@!@:divide_}{\.{\\divide} primitive@>
5668primitive("endcsname",end_cs_name,0);@/
5669@!@:end_cs_name_}{\.{\\endcsname} primitive@>
5670primitive("endgroup",end_group,0);
5671@!@:end_group_}{\.{\\endgroup} primitive@>
5672text(frozen_end_group):="endgroup"; eqtb[frozen_end_group]:=eqtb[cur_val];@/
5673primitive("expandafter",expand_after,0);@/
5674@!@:expand_after_}{\.{\\expandafter} primitive@>
5675primitive("font",def_font,0);@/
5676@!@:font_}{\.{\\font} primitive@>
5677primitive("fontdimen",assign_font_dimen,0);@/
5678@!@:font_dimen_}{\.{\\fontdimen} primitive@>
5679primitive("halign",halign,0);@/
5680@!@:halign_}{\.{\\halign} primitive@>
5681primitive("hrule",hrule,0);@/
5682@!@:hrule_}{\.{\\hrule} primitive@>
5683primitive("ignorespaces",ignore_spaces,0);@/
5684@!@:ignore_spaces_}{\.{\\ignorespaces} primitive@>
5685primitive("insert",insert,0);@/
5686@!@:insert_}{\.{\\insert} primitive@>
5687primitive("mark",mark,0);@/
5688@!@:mark_}{\.{\\mark} primitive@>
5689primitive("mathaccent",math_accent,0);@/
5690@!@:math_accent_}{\.{\\mathaccent} primitive@>
5691primitive("mathchar",math_char_num,0);@/
5692@!@:math_char_}{\.{\\mathchar} primitive@>
5693primitive("mathchoice",math_choice,0);@/
5694@!@:math_choice_}{\.{\\mathchoice} primitive@>
5695primitive("multiply",multiply,0);@/
5696@!@:multiply_}{\.{\\multiply} primitive@>
5697primitive("noalign",no_align,0);@/
5698@!@:no_align_}{\.{\\noalign} primitive@>
5699primitive("noboundary",no_boundary,0);@/
5700@!@:no_boundary_}{\.{\\noboundary} primitive@>
5701primitive("noexpand",no_expand,0);@/
5702@!@:no_expand_}{\.{\\noexpand} primitive@>
5703primitive("nonscript",non_script,0);@/
5704@!@:non_script_}{\.{\\nonscript} primitive@>
5705primitive("omit",omit,0);@/
5706@!@:omit_}{\.{\\omit} primitive@>
5707primitive("parshape",set_shape,0);@/
5708@!@:par_shape_}{\.{\\parshape} primitive@>
5709primitive("penalty",break_penalty,0);@/
5710@!@:penalty_}{\.{\\penalty} primitive@>
5711primitive("prevgraf",set_prev_graf,0);@/
5712@!@:prev_graf_}{\.{\\prevgraf} primitive@>
5713primitive("radical",radical,0);@/
5714@!@:radical_}{\.{\\radical} primitive@>
5715primitive("read",read_to_cs,0);@/
5716@!@:read_}{\.{\\read} primitive@>
5717primitive("relax",relax,256); {cf.\ |scan_file_name|}
5718@!@:relax_}{\.{\\relax} primitive@>
5719text(frozen_relax):="relax"; eqtb[frozen_relax]:=eqtb[cur_val];@/
5720primitive("setbox",set_box,0);@/
5721@!@:set_box_}{\.{\\setbox} primitive@>
5722primitive("the",the,0);@/
5723@!@:the_}{\.{\\the} primitive@>
5724primitive("toks",toks_register,0);@/
5725@!@:toks_}{\.{\\toks} primitive@>
5726primitive("vadjust",vadjust,0);@/
5727@!@:vadjust_}{\.{\\vadjust} primitive@>
5728primitive("valign",valign,0);@/
5729@!@:valign_}{\.{\\valign} primitive@>
5730primitive("vcenter",vcenter,0);@/
5731@!@:vcenter_}{\.{\\vcenter} primitive@>
5732primitive("vrule",vrule,0);@/
5733@!@:vrule_}{\.{\\vrule} primitive@>
5734
5735@ Each primitive has a corresponding inverse, so that it is possible to
5736display the cryptic numeric contents of |eqtb| in symbolic form.
5737Every call of |primitive| in this program is therefore accompanied by some
5738straightforward code that forms part of the |print_cmd_chr| routine
5739below.
5740
5741@<Cases of |print_cmd_chr|...@>=
5742accent: print_esc("accent");
5743advance: print_esc("advance");
5744after_assignment: print_esc("afterassignment");
5745after_group: print_esc("aftergroup");
5746assign_font_dimen: print_esc("fontdimen");
5747begin_group: print_esc("begingroup");
5748break_penalty: print_esc("penalty");
5749char_num: print_esc("char");
5750cs_name: print_esc("csname");
5751def_font: print_esc("font");
5752delim_num: print_esc("delimiter");
5753divide: print_esc("divide");
5754end_cs_name: print_esc("endcsname");
5755end_group: print_esc("endgroup");
5756ex_space: print_esc(" ");
5757expand_after: print_esc("expandafter");
5758halign: print_esc("halign");
5759hrule: print_esc("hrule");
5760ignore_spaces: print_esc("ignorespaces");
5761insert: print_esc("insert");
5762ital_corr: print_esc("/");
5763mark: print_esc("mark");
5764math_accent: print_esc("mathaccent");
5765math_char_num: print_esc("mathchar");
5766math_choice: print_esc("mathchoice");
5767multiply: print_esc("multiply");
5768no_align: print_esc("noalign");
5769no_boundary:print_esc("noboundary");
5770no_expand: print_esc("noexpand");
5771non_script: print_esc("nonscript");
5772omit: print_esc("omit");
5773radical: print_esc("radical");
5774read_to_cs: print_esc("read");
5775relax: print_esc("relax");
5776set_box: print_esc("setbox");
5777set_prev_graf: print_esc("prevgraf");
5778set_shape: print_esc("parshape");
5779the: print_esc("the");
5780toks_register: print_esc("toks");
5781vadjust: print_esc("vadjust");
5782valign: print_esc("valign");
5783vcenter: print_esc("vcenter");
5784vrule: print_esc("vrule");
5785
5786@ We will deal with the other primitives later, at some point in the program
5787where their |eq_type| and |equiv| values are more meaningful.  For example,
5788the primitives for math mode will be loaded when we consider the routines
5789that deal with formulas. It is easy to find where each particular
5790primitive was treated by looking in the index at the end; for example, the
5791section where |"radical"| entered |eqtb| is listed under `\.{\\radical}
5792primitive'. (Primitives consisting of a single nonalphabetic character,
5793@!like `\.{\\/}', are listed under `Single-character primitives'.)
5794@!@^Single-character primitives@>
5795
5796Meanwhile, this is a convenient place to catch up on something we were unable
5797to do before the hash table was defined:
5798
5799@<Print the font identifier for |font(p)|@>=
5800print_esc(font_id_text(font(p)))
5801
5802@* \[19] Saving and restoring equivalents.
5803The nested structure provided by `$\.{\char'173}\ldots\.{\char'175}$' groups
5804in \TeX\ means that |eqtb| entries valid in outer groups should be saved
5805and restored later if they are overridden inside the braces. When a new |eqtb|
5806value is being assigned, the program therefore checks to see if the previous
5807entry belongs to an outer level. In such a case, the old value is placed
5808on the |save_stack| just before the new value enters |eqtb|. At the
5809end of a grouping level, i.e., when the right brace is sensed, the
5810|save_stack| is used to restore the outer values, and the inner ones are
5811destroyed.
5812
5813Entries on the |save_stack| are of type |memory_word|. The top item on
5814this stack is |save_stack[p]|, where |p=save_ptr-1|; it contains three
5815fields called |save_type|, |save_level|, and |save_index|, and it is
5816interpreted in one of four ways:
5817
5818\yskip\hangg 1) If |save_type(p)=restore_old_value|, then
5819|save_index(p)| is a location in |eqtb| whose current value should
5820be destroyed at the end of the current group and replaced by |save_stack[p-1]|.
5821Furthermore if |save_index(p)>=int_base|, then |save_level(p)|
5822should replace the corresponding entry in |xeq_level|.
5823
5824\yskip\hangg 2) If |save_type(p)=restore_zero|, then |save_index(p)|
5825is a location in |eqtb| whose current value should be destroyed at the end
5826of the current group, when it should be
5827replaced by the current value of |eqtb[undefined_control_sequence]|.
5828
5829\yskip\hangg 3) If |save_type(p)=insert_token|, then |save_index(p)|
5830is a token that should be inserted into \TeX's input when the current
5831group ends.
5832
5833\yskip\hangg 4) If |save_type(p)=level_boundary|, then |save_level(p)|
5834is a code explaining what kind of group we were previously in, and
5835|save_index(p)| points to the level boundary word at the bottom of
5836the entries for that group.
5837
5838@d save_type(#)==save_stack[#].hh.b0 {classifies a |save_stack| entry}
5839@d save_level(#)==save_stack[#].hh.b1
5840  {saved level for regions 5 and 6, or group code}
5841@d save_index(#)==save_stack[#].hh.rh
5842  {|eqtb| location or token or |save_stack| location}
5843@d restore_old_value=0 {|save_type| when a value should be restored later}
5844@d restore_zero=1 {|save_type| when an undefined entry should be restored}
5845@d insert_token=2 {|save_type| when a token is being saved for later use}
5846@d level_boundary=3 {|save_type| corresponding to beginning of group}
5847
5848@ Here are the group codes that are used to discriminate between different
5849kinds of groups. They allow \TeX\ to decide what special actions, if any,
5850should be performed when a group ends.
5851\def\grp{\.{\char'173...\char'175}}
5852
5853Some groups are not supposed to be ended by right braces. For example,
5854the `\.\$' that begins a math formula causes a |math_shift_group| to
5855be started, and this should be terminated by a matching `\.\$'. Similarly,
5856a group that starts with \.{\\left} should end with \.{\\right}, and
5857one that starts with \.{\\begingroup} should end with \.{\\endgroup}.
5858
5859@d bottom_level=0 {group code for the outside world}
5860@d simple_group=1 {group code for local structure only}
5861@d hbox_group=2 {code for `\.{\\hbox}\grp'}
5862@d adjusted_hbox_group=3 {code for `\.{\\hbox}\grp' in vertical mode}
5863@d vbox_group=4 {code for `\.{\\vbox}\grp'}
5864@d vtop_group=5 {code for `\.{\\vtop}\grp'}
5865@d align_group=6 {code for `\.{\\halign}\grp', `\.{\\valign}\grp'}
5866@d no_align_group=7 {code for `\.{\\noalign}\grp'}
5867@d output_group=8 {code for output routine}
5868@d math_group=9 {code for, e.g., `\.{\char'136}\grp'}
5869@d disc_group=10 {code for `\.{\\discretionary}\grp\grp\grp'}
5870@d insert_group=11 {code for `\.{\\insert}\grp', `\.{\\vadjust}\grp'}
5871@d vcenter_group=12 {code for `\.{\\vcenter}\grp'}
5872@d math_choice_group=13 {code for `\.{\\mathchoice}\grp\grp\grp\grp'}
5873@d semi_simple_group=14 {code for `\.{\\begingroup...\\endgroup}'}
5874@d math_shift_group=15 {code for `\.{\$...\$}'}
5875@d math_left_group=16 {code for `\.{\\left...\\right}'}
5876@d max_group_code=16
5877
5878@<Types...@>=
5879@!group_code=0..max_group_code; {|save_level| for a level boundary}
5880
5881@ The global variable |cur_group| keeps track of what sort of group we are
5882currently in. Another global variable, |cur_boundary|, points to the
5883topmost |level_boundary| word.  And |cur_level| is the current depth of
5884nesting. The routines are designed to preserve the condition that no entry
5885in the |save_stack| or in |eqtb| ever has a level greater than |cur_level|.
5886
5887@ @<Glob...@>=
5888@!save_stack : array[0..save_size] of memory_word;
5889@!save_ptr : 0..save_size; {first unused entry on |save_stack|}
5890@!max_save_stack:0..save_size; {maximum usage of save stack}
5891@!cur_level: quarterword; {current nesting level for groups}
5892@!cur_group: group_code; {current group type}
5893@!cur_boundary: 0..save_size; {where the current level begins}
5894
5895@ At this time it might be a good idea for the reader to review the introduction
5896to |eqtb| that was given above just before the long lists of parameter names.
5897Recall that the ``outer level'' of the program is |level_one|, since
5898undefined control sequences are assumed to be ``defined'' at |level_zero|.
5899
5900@<Set init...@>=
5901save_ptr:=0; cur_level:=level_one; cur_group:=bottom_level; cur_boundary:=0;
5902max_save_stack:=0;
5903
5904@ The following macro is used to test if there is room for up to six more
5905entries on |save_stack|. By making a conservative test like this, we can
5906get by with testing for overflow in only a few places.
5907
5908@d check_full_save_stack==if save_ptr>max_save_stack then
5909  begin max_save_stack:=save_ptr;
5910  if max_save_stack>save_size-6 then overflow("save size",save_size);
5911@:TeX capacity exceeded save size}{\quad save size@>
5912  end
5913
5914@ Procedure |new_save_level| is called when a group begins. The
5915argument is a group identification code like `|hbox_group|'. After
5916calling this routine, it is safe to put five more entries on |save_stack|.
5917
5918In some cases integer-valued items are placed onto the
5919|save_stack| just below a |level_boundary| word, because this is a
5920convenient place to keep information that is supposed to ``pop up'' just
5921when the group has finished.
5922For example, when `\.{\\hbox to 100pt}\grp' is being treated, the 100pt
5923dimension is stored on |save_stack| just before |new_save_level| is
5924called.
5925
5926We use the notation |saved(k)| to stand for an integer item that
5927appears in location |save_ptr+k| of the save stack.
5928
5929@d saved(#)==save_stack[save_ptr+#].int
5930
5931@p procedure new_save_level(@!c:group_code); {begin a new level of grouping}
5932begin check_full_save_stack;
5933save_type(save_ptr):=level_boundary; save_level(save_ptr):=cur_group;
5934save_index(save_ptr):=cur_boundary;
5935if cur_level=max_quarterword then overflow("grouping levels",
5936@:TeX capacity exceeded grouping levels}{\quad grouping levels@>
5937  max_quarterword-min_quarterword);
5938  {quit if |(cur_level+1)| is too big to be stored in |eqtb|}
5939cur_boundary:=save_ptr; incr(cur_level); incr(save_ptr); cur_group:=c;
5940end;
5941
5942@ Just before an entry of |eqtb| is changed, the following procedure should
5943be called to update the other data structures properly. It is important
5944to keep in mind that reference counts in |mem| include references from
5945within |save_stack|, so these counts must be handled carefully.
5946@^reference counts@>
5947
5948@p procedure eq_destroy(@!w:memory_word); {gets ready to forget |w|}
5949var q:pointer; {|equiv| field of |w|}
5950begin case eq_type_field(w) of
5951call,long_call,outer_call,long_outer_call: delete_token_ref(equiv_field(w));
5952glue_ref: delete_glue_ref(equiv_field(w));
5953shape_ref: begin q:=equiv_field(w); {we need to free a \.{\\parshape} block}
5954  if q<>null then free_node(q,info(q)+info(q)+1);
5955  end; {such a block is |2n+1| words long, where |n=info(q)|}
5956box_ref: flush_node_list(equiv_field(w));
5957othercases do_nothing
5958endcases;
5959end;
5960
5961@ To save a value of |eqtb[p]| that was established at level |l|, we
5962can use the following subroutine.
5963
5964@p procedure eq_save(@!p:pointer;@!l:quarterword); {saves |eqtb[p]|}
5965begin check_full_save_stack;
5966if l=level_zero then save_type(save_ptr):=restore_zero
5967else  begin save_stack[save_ptr]:=eqtb[p]; incr(save_ptr);
5968  save_type(save_ptr):=restore_old_value;
5969  end;
5970save_level(save_ptr):=l; save_index(save_ptr):=p; incr(save_ptr);
5971end;
5972
5973@ The procedure |eq_define| defines an |eqtb| entry having specified
5974|eq_type| and |equiv| fields, and saves the former value if appropriate.
5975This procedure is used only for entries in the first four regions of |eqtb|,
5976i.e., only for entries that have |eq_type| and |equiv| fields.
5977After calling this routine, it is safe to put four more entries on
5978|save_stack|, provided that there was room for four more entries before
5979the call, since |eq_save| makes the necessary test.
5980
5981@p procedure eq_define(@!p:pointer;@!t:quarterword;@!e:halfword);
5982  {new data for |eqtb|}
5983begin if eq_level(p)=cur_level then eq_destroy(eqtb[p])
5984else if cur_level>level_one then eq_save(p,eq_level(p));
5985eq_level(p):=cur_level; eq_type(p):=t; equiv(p):=e;
5986end;
5987
5988@ The counterpart of |eq_define| for the remaining (fullword) positions in
5989|eqtb| is called |eq_word_define|. Since |xeq_level[p]>=level_one| for all
5990|p|, a `|restore_zero|' will never be used in this case.
5991
5992@p procedure eq_word_define(@!p:pointer;@!w:integer);
5993begin if xeq_level[p]<>cur_level then
5994  begin eq_save(p,xeq_level[p]); xeq_level[p]:=cur_level;
5995  end;
5996eqtb[p].int:=w;
5997end;
5998
5999@ The |eq_define| and |eq_word_define| routines take care of local definitions.
6000@^global definitions@>
6001Global definitions are done in almost the same way, but there is no need
6002to save old values, and the new value is associated with |level_one|.
6003
6004@p procedure geq_define(@!p:pointer;@!t:quarterword;@!e:halfword);
6005  {global |eq_define|}
6006begin eq_destroy(eqtb[p]);
6007eq_level(p):=level_one; eq_type(p):=t; equiv(p):=e;
6008end;
6009@#
6010procedure geq_word_define(@!p:pointer;@!w:integer); {global |eq_word_define|}
6011begin eqtb[p].int:=w; xeq_level[p]:=level_one;
6012end;
6013
6014@ Subroutine |save_for_after| puts a token on the stack for save-keeping.
6015
6016@p procedure save_for_after(@!t:halfword);
6017begin if cur_level>level_one then
6018  begin check_full_save_stack;
6019  save_type(save_ptr):=insert_token; save_level(save_ptr):=level_zero;
6020  save_index(save_ptr):=t; incr(save_ptr);
6021  end;
6022end;
6023
6024@ The |unsave| routine goes the other way, taking items off of |save_stack|.
6025This routine takes care of restoration when a level ends; everything
6026belonging to the topmost group is cleared off of the save stack.
6027
6028@p@t\4@>@<Declare the procedure called |restore_trace|@>@;@/
6029procedure@?back_input; forward; @t\2@>
6030procedure unsave; {pops the top level off the save stack}
6031label done;
6032var p:pointer; {position to be restored}
6033@!l:quarterword; {saved level, if in fullword regions of |eqtb|}
6034@!t:halfword; {saved value of |cur_tok|}
6035begin if cur_level>level_one then
6036  begin decr(cur_level);
6037  @<Clear off top level from |save_stack|@>;
6038  end
6039else confusion("curlevel"); {|unsave| is not used when |cur_group=bottom_level|}
6040@:this can't happen curlevel}{\quad curlevel@>
6041end;
6042
6043@ @<Clear off...@>=
6044loop@+begin decr(save_ptr);
6045  if save_type(save_ptr)=level_boundary then goto done;
6046  p:=save_index(save_ptr);
6047  if save_type(save_ptr)=insert_token then
6048    @<Insert token |p| into \TeX's input@>
6049  else  begin if save_type(save_ptr)=restore_old_value then
6050      begin l:=save_level(save_ptr); decr(save_ptr);
6051      end
6052    else save_stack[save_ptr]:=eqtb[undefined_control_sequence];
6053    @<Store \(s)|save_stack[save_ptr]| in |eqtb[p]|, unless
6054      |eqtb[p]| holds a global value@>;
6055    end;
6056  end;
6057done: cur_group:=save_level(save_ptr); cur_boundary:=save_index(save_ptr)
6058
6059@ A global definition, which sets the level to |level_one|,
6060@^global definitions@>
6061will not be undone by |unsave|. If at least one global definition of
6062|eqtb[p]| has been carried out within the group that just ended, the
6063last such definition will therefore survive.
6064
6065@<Store \(s)|save...@>=
6066if p<int_base then
6067  if eq_level(p)=level_one then
6068    begin eq_destroy(save_stack[save_ptr]); {destroy the saved value}
6069    @!stat if tracing_restores>0 then restore_trace(p,"retaining");@+tats@;@/
6070    end
6071  else  begin eq_destroy(eqtb[p]); {destroy the current value}
6072    eqtb[p]:=save_stack[save_ptr]; {restore the saved value}
6073    @!stat if tracing_restores>0 then restore_trace(p,"restoring");@+tats@;@/
6074    end
6075else if xeq_level[p]<>level_one then
6076  begin eqtb[p]:=save_stack[save_ptr]; xeq_level[p]:=l;
6077  @!stat if tracing_restores>0 then restore_trace(p,"restoring");@+tats@;@/
6078  end
6079else  begin
6080  @!stat if tracing_restores>0 then restore_trace(p,"retaining");@+tats@;@/
6081  end
6082
6083@ @<Declare the procedure called |restore_trace|@>=
6084@!stat procedure restore_trace(@!p:pointer;@!s:str_number);
6085  {|eqtb[p]| has just been restored or retained}
6086begin begin_diagnostic; print_char("{"); print(s); print_char(" ");
6087show_eqtb(p); print_char("}");
6088end_diagnostic(false);
6089end;
6090tats
6091
6092@ When looking for possible pointers to a memory location, it is helpful
6093to look for references from |eqtb| that might be waiting on the
6094save stack. Of course, we might find spurious pointers too; but this
6095routine is merely an aid when debugging, and at such times we are
6096grateful for any scraps of information, even if they prove to be irrelevant.
6097@^dirty \PASCAL@>
6098
6099@<Search |save_stack| for equivalents that point to |p|@>=
6100if save_ptr>0 then for q:=0 to save_ptr-1 do
6101  begin if equiv_field(save_stack[q])=p then
6102    begin print_nl("SAVE("); print_int(q); print_char(")");
6103    end;
6104  end
6105
6106@ Most of the parameters kept in |eqtb| can be changed freely, but there's
6107an exception:  The magnification should not be used with two different
6108values during any \TeX\ job, since a single magnification is applied to an
6109entire run. The global variable |mag_set| is set to the current magnification
6110whenever it becomes necessary to ``freeze'' it at a particular value.
6111
6112@<Glob...@>=
6113@!mag_set:integer; {if nonzero, this magnification should be used henceforth}
6114
6115@ @<Set init...@>=
6116mag_set:=0;
6117
6118@ The |prepare_mag| subroutine is called whenever \TeX\ wants to use |mag|
6119for magnification.
6120
6121@p procedure prepare_mag;
6122begin if (mag_set>0)and(mag<>mag_set) then
6123  begin print_err("Incompatible magnification ("); print_int(mag);
6124@.Incompatible magnification@>
6125  print(");"); print_nl(" the previous value will be retained");
6126  help2("I can handle only one magnification ratio per job. So I've")@/
6127  ("reverted to the magnification you used earlier on this run.");@/
6128  int_error(mag_set);
6129  geq_word_define(int_base+mag_code,mag_set); {|mag:=mag_set|}
6130  end;
6131if (mag<=0)or(mag>32768) then
6132  begin print_err("Illegal magnification has been changed to 1000");@/
6133@.Illegal magnification...@>
6134  help1("The magnification ratio must be between 1 and 32768.");
6135  int_error(mag); geq_word_define(int_base+mag_code,1000);
6136  end;
6137mag_set:=mag;
6138end;
6139
6140@* \[20] Token lists.
6141A \TeX\ token is either a character or a control sequence, and it is
6142@^token@>
6143represented internally in one of two ways: (1)~A character whose ASCII
6144code number is |c| and whose command code is |m| is represented as the
6145number $2^8m+c$; the command code is in the range |1<=m<=14|. (2)~A control
6146sequence whose |eqtb| address is |p| is represented as the number
6147|cs_token_flag+p|. Here |cs_token_flag=@t$2^{12}-1$@>| is larger than
6148$2^8m+c$, yet it is small enough that |cs_token_flag+p< max_halfword|;
6149thus, a token fits comfortably in a halfword.
6150
6151A token |t| represents a |left_brace| command if and only if
6152|t<left_brace_limit|; it represents a |right_brace| command if and only if
6153we have |left_brace_limit<=t<right_brace_limit|; and it represents a |match| or
6154|end_match| command if and only if |match_token<=t<=end_match_token|.
6155The following definitions take care of these token-oriented constants
6156and a few others.
6157
6158@d cs_token_flag==@'7777 {amount added to the |eqtb| location in a
6159  token that stands for a control sequence; is a multiple of~256, less~1}
6160@d left_brace_token=@'0400 {$2^8\cdot|left_brace|$}
6161@d left_brace_limit=@'1000 {$2^8\cdot(|left_brace|+1)$}
6162@d right_brace_token=@'1000 {$2^8\cdot|right_brace|$}
6163@d right_brace_limit=@'1400 {$2^8\cdot(|right_brace|+1)$}
6164@d math_shift_token=@'1400 {$2^8\cdot|math_shift|$}
6165@d tab_token=@'2000 {$2^8\cdot|tab_mark|$}
6166@d out_param_token=@'2400 {$2^8\cdot|out_param|$}
6167@d space_token=@'5040 {$2^8\cdot|spacer|+|" "|$}
6168@d letter_token=@'5400 {$2^8\cdot|letter|$}
6169@d other_token=@'6000 {$2^8\cdot|other_char|$}
6170@d match_token=@'6400 {$2^8\cdot|match|$}
6171@d end_match_token=@'7000 {$2^8\cdot|end_match|$}
6172
6173@ @<Check the ``constant''...@>=
6174if cs_token_flag+undefined_control_sequence>max_halfword then bad:=21;
6175
6176@ A token list is a singly linked list of one-word nodes in |mem|, where
6177each word contains a token and a link. Macro definitions, output-routine
6178definitions, marks, \.{\\write} texts, and a few other things
6179are remembered by \TeX\ in the form
6180of token lists, usually preceded by a node with a reference count in its
6181|token_ref_count| field. The token stored in location |p| is called
6182|info(p)|.
6183
6184Three special commands appear in the token lists of macro definitions.
6185When |m=match|, it means that \TeX\ should scan a parameter
6186for the current macro; when |m=end_match|, it means that parameter
6187matching should end and \TeX\ should start reading the macro text; and
6188when |m=out_param|, it means that \TeX\ should insert parameter
6189number |c| into the text at this point.
6190
6191The enclosing \.{\char'173} and \.{\char'175} characters of a macro
6192definition are omitted, but the final right brace of an output routine
6193is included at the end of its token list.
6194
6195Here is an example macro definition that illustrates these conventions.
6196After \TeX\ processes the text
6197$$\.{\\def\\mac a\#1\#2 \\b \{\#1\\-a \#\#1\#2 \#2\}}$$
6198the definition of \.{\\mac} is represented as a token list containing
6199$$\def\,{\hskip2pt}
6200\vbox{\halign{\hfil#\hfil\cr
6201(reference count), |letter|\,\.a, |match|\,\#, |match|\,\#, |spacer|\,\.\ ,
6202\.{\\b}, |end_match|,\cr
6203|out_param|\,1, \.{\\-}, |letter|\,\.a, |spacer|\,\.\ , |mac_param|\,\#,
6204|other_char|\,\.1,\cr
6205|out_param|\,2, |spacer|\,\.\ , |out_param|\,2.\cr}}$$
6206The procedure |scan_toks| builds such token lists, and |macro_call|
6207does the parameter matching.
6208@^reference counts@>
6209
6210Examples such as
6211$$\.{\\def\\m\{\\def\\m\{a\}\ b\}}$$
6212explain why reference counts would be needed even if \TeX\ had no \.{\\let}
6213operation: When the token list for \.{\\m} is being read, the redefinition of
6214\.{\\m} changes the |eqtb| entry before the token list has been fully
6215consumed, so we dare not simply destroy a token list when its
6216control sequence is being redefined.
6217
6218If the parameter-matching part of a definition ends with `\.{\#\{}',
6219the corresponding token list will have `\.\{' just before the `|end_match|'
6220and also at the very end. The first `\.\{' is used to delimit the parameter; the
6221second one keeps the first from disappearing.
6222
6223@ The procedure |show_token_list|, which prints a symbolic form of
6224the token list that starts at a given node |p|, illustrates these
6225conventions. The token list being displayed should not begin with a reference
6226count. However, the procedure is intended to be robust, so that if the
6227memory links are awry or if |p| is not really a pointer to a token list,
6228nothing catastrophic will happen.
6229
6230An additional parameter |q| is also given; this parameter is either null
6231or it points to a node in the token list where a certain magic computation
6232takes place that will be explained later. (Basically, |q| is non-null when
6233we are printing the two-line context information at the time of an error
6234message; |q| marks the place corresponding to where the second line
6235should begin.)
6236
6237For example, if |p| points to the node containing the first \.a in the
6238token list above, then |show_token_list| will print the string
6239$$\hbox{`\.{a\#1\#2\ \\b\ ->\#1\\-a\ \#\#1\#2\ \#2}';}$$
6240and if |q| points to the node containing the second \.a,
6241the magic computation will be performed just before the second \.a is printed.
6242
6243The generation will stop, and `\.{\\ETC.}' will be printed, if the length
6244of printing exceeds a given limit~|l|. Anomalous entries are printed in the
6245form of control sequences that are not followed by a blank space, e.g.,
6246`\.{\\BAD.}'; this cannot be confused with actual control sequences because
6247a real control sequence named \.{BAD} would come out `\.{\\BAD\ }'.
6248
6249@<Declare the procedure called |show_token_list|@>=
6250procedure show_token_list(@!p,@!q:integer;@!l:integer);
6251label exit;
6252var m,@!c:integer; {pieces of a token}
6253@!match_chr:ASCII_code; {character used in a `|match|'}
6254@!n:ASCII_code; {the highest parameter number, as an ASCII digit}
6255begin match_chr:="#"; n:="0"; tally:=0;
6256while (p<>null) and (tally<l) do
6257  begin if p=q then @<Do magic computation@>;
6258  @<Display token |p|, and |return| if there are problems@>;
6259  p:=link(p);
6260  end;
6261if p<>null then print_esc("ETC.");
6262@.ETC@>
6263exit:
6264end;
6265
6266@ @<Display token |p|...@>=
6267if (p<hi_mem_min) or (p>mem_end) then
6268  begin print_esc("CLOBBERED."); return;
6269@.CLOBBERED@>
6270  end;
6271if info(p)>=cs_token_flag then print_cs(info(p)-cs_token_flag)
6272else  begin m:=info(p) div @'400; c:=info(p) mod @'400;
6273  if info(p)<0 then print_esc("BAD.")
6274@.BAD@>
6275  else @<Display the token $(|m|,|c|)$@>;
6276  end
6277
6278@ The procedure usually ``learns'' the character code used for macro
6279parameters by seeing one in a |match| command before it runs into any
6280|out_param| commands.
6281
6282@<Display the token ...@>=
6283case m of
6284left_brace,right_brace,math_shift,tab_mark,sup_mark,sub_mark,spacer,
6285  letter,other_char: print(c);
6286mac_param: begin print(c); print(c);
6287  end;
6288out_param: begin print(match_chr);
6289  if c<=9 then print_char(c+"0")
6290  else  begin print_char("!"); return;
6291    end;
6292  end;
6293match: begin match_chr:=c; print(c); incr(n); print_char(n);
6294  if n>"9" then return;
6295  end;
6296end_match: print("->");
6297@.->@>
6298othercases print_esc("BAD.")
6299@.BAD@>
6300endcases
6301
6302@ Here's the way we sometimes want to display a token list, given a pointer
6303to its reference count; the pointer may be null.
6304
6305@p procedure token_show(@!p:pointer);
6306begin if p<>null then show_token_list(link(p),null,10000000);
6307end;
6308
6309@ The |print_meaning| subroutine displays |cur_cmd| and |cur_chr| in
6310symbolic form, including the expansion of a macro or mark.
6311
6312@p procedure print_meaning;
6313begin print_cmd_chr(cur_cmd,cur_chr);
6314if cur_cmd>=call then
6315  begin print_char(":"); print_ln; token_show(cur_chr);
6316  end
6317else if cur_cmd=top_bot_mark then
6318  begin print_char(":"); print_ln;
6319  token_show(cur_mark[cur_chr]);
6320  end;
6321end;
6322
6323@* \[21] Introduction to the syntactic routines.
6324Let's pause a moment now and try to look at the Big Picture.
6325The \TeX\ program consists of three main parts: syntactic routines,
6326semantic routines, and output routines. The chief purpose of the
6327syntactic routines is to deliver the user's input to the semantic routines,
6328one token at a time. The semantic routines act as an interpreter
6329responding to these tokens, which may be regarded as commands. And the
6330output routines are periodically called on to convert box-and-glue
6331lists into a compact set of instructions that will be sent
6332to a typesetter. We have discussed the basic data structures and utility
6333routines of \TeX, so we are good and ready to plunge into the real activity by
6334considering the syntactic routines.
6335
6336Our current goal is to come to grips with the |get_next| procedure,
6337which is the keystone of \TeX's input mechanism. Each call of |get_next|
6338sets the value of three variables |cur_cmd|, |cur_chr|, and |cur_cs|,
6339representing the next input token.
6340$$\vbox{\halign{#\hfil\cr
6341  \hbox{|cur_cmd| denotes a command code from the long list of codes
6342   given above;}\cr
6343  \hbox{|cur_chr| denotes a character code or other modifier of the command
6344   code;}\cr
6345  \hbox{|cur_cs| is the |eqtb| location of the current control sequence,}\cr
6346  \hbox{\qquad if the current token was a control sequence,
6347   otherwise it's zero.}\cr}}$$
6348Underlying this external behavior of |get_next| is all the machinery
6349necessary to convert from character files to tokens. At a given time we
6350may be only partially finished with the reading of several files (for
6351which \.{\\input} was specified), and partially finished with the expansion
6352of some user-defined macros and/or some macro parameters, and partially
6353finished with the generation of some text in a template for \.{\\halign},
6354and so on. When reading a character file, special characters must be
6355classified as math delimiters, etc.; comments and extra blank spaces must
6356be removed, paragraphs must be recognized, and control sequences must be
6357found in the hash table. Furthermore there are occasions in which the
6358scanning routines have looked ahead for a word like `\.{plus}' but only
6359part of that word was found, hence a few characters must be put back
6360into the input and scanned again.
6361
6362To handle these situations, which might all be present simultaneously,
6363\TeX\ uses various stacks that hold information about the incomplete
6364activities, and there is a finite state control for each level of the
6365input mechanism. These stacks record the current state of an implicitly
6366recursive process, but the |get_next| procedure is not recursive.
6367Therefore it will not be difficult to translate these algorithms into
6368low-level languages that do not support recursion.
6369
6370@<Glob...@>=
6371@!cur_cmd: eight_bits; {current command set by |get_next|}
6372@!cur_chr: halfword; {operand of current command}
6373@!cur_cs: pointer; {control sequence found here, zero if none found}
6374@!cur_tok: halfword; {packed representative of |cur_cmd| and |cur_chr|}
6375
6376@ The |print_cmd_chr| routine prints a symbolic interpretation of a
6377command code and its modifier. This is used in certain `\.{You can\'t}'
6378error messages, and in the implementation of diagnostic routines like
6379\.{\\show}.
6380
6381The body of |print_cmd_chr| is a rather tedious listing of print
6382commands, and most of it is essentially an inverse to the |primitive|
6383routine that enters a \TeX\ primitive into |eqtb|. Therefore much of
6384this procedure appears elsewhere in the program,
6385together with the corresponding |primitive| calls.
6386
6387@d chr_cmd(#)==begin print(#); print_ASCII(chr_code);
6388  end
6389
6390@<Declare the procedure called |print_cmd_chr|@>=
6391procedure print_cmd_chr(@!cmd:quarterword;@!chr_code:halfword);
6392begin case cmd of
6393left_brace: chr_cmd("begin-group character ");
6394right_brace: chr_cmd("end-group character ");
6395math_shift: chr_cmd("math shift character ");
6396mac_param: chr_cmd("macro parameter character ");
6397sup_mark: chr_cmd("superscript character ");
6398sub_mark: chr_cmd("subscript character ");
6399endv: print("end of alignment template");
6400spacer: chr_cmd("blank space ");
6401letter: chr_cmd("the letter ");
6402other_char: chr_cmd("the character ");
6403@t\4@>@<Cases of |print_cmd_chr| for symbolic printing of primitives@>@/
6404othercases print("[unknown command code!]")
6405endcases;
6406end;
6407
6408@ Here is a procedure that displays the current command.
6409
6410@p procedure show_cur_cmd_chr;
6411begin begin_diagnostic; print_nl("{");
6412if mode<>shown_mode then
6413  begin print_mode(mode); print(": "); shown_mode:=mode;
6414  end;
6415print_cmd_chr(cur_cmd,cur_chr); print_char("}");
6416end_diagnostic(false);
6417end;
6418
6419@* \[22] Input stacks and states.
6420This implementation of
6421\TeX\ uses two different conventions for representing sequential stacks.
6422@^stack conventions@>@^conventions for representing stacks@>
6423
6424\yskip\hangg 1) If there is frequent access to the top entry, and if the
6425stack is essentially never empty, then the top entry is kept in a global
6426variable (even better would be a machine register), and the other entries
6427appear in the array $\\{stack}[0\to(\\{ptr}-1)]$. For example, the
6428semantic stack described above is handled this way, and so is the input
6429stack that we are about to study.
6430
6431\yskip\hangg 2) If there is infrequent top access, the entire stack contents
6432are in the array $\\{stack}[0\to(\\{ptr}-1)]$. For example, the |save_stack|
6433is treated this way, as we have seen.
6434
6435\yskip\noindent
6436The state of \TeX's input mechanism appears in the input stack, whose
6437entries are records with six fields, called |state|, |index|, |start|, |loc|,
6438|limit|, and |name|. This stack is maintained with
6439convention~(1), so it is declared in the following way:
6440
6441@<Types...@>=
6442@!in_state_record = record
6443  @!state_field, @!index_field: quarterword;
6444  @!start_field,@!loc_field, @!limit_field, @!name_field: halfword;
6445  end;
6446
6447@ @<Glob...@>=
6448@!input_stack : array[0..stack_size] of in_state_record;
6449@!input_ptr : 0..stack_size; {first unused location of |input_stack|}
6450@!max_in_stack: 0..stack_size; {largest value of |input_ptr| when pushing}
6451@!cur_input : in_state_record;
6452  {the ``top'' input state, according to convention (1)}
6453
6454@ We've already defined the special variable |loc==cur_input.loc_field|
6455in our discussion of basic input-output routines. The other components of
6456|cur_input| are defined in the same way:
6457
6458@d state==cur_input.state_field {current scanner state}
6459@d index==cur_input.index_field {reference for buffer information}
6460@d start==cur_input.start_field {starting position in |buffer|}
6461@d limit==cur_input.limit_field {end of current line in |buffer|}
6462@d name==cur_input.name_field {name of the current file}
6463
6464@ Let's look more closely now at the control variables
6465(|state|,~|index|,~|start|,~|loc|,~|limit|,~|name|),
6466assuming that \TeX\ is reading a line of characters that have been input
6467from some file or from the user's terminal. There is an array called
6468|buffer| that acts as a stack of all lines of characters that are
6469currently being read from files, including all lines on subsidiary
6470levels of the input stack that are not yet completed. \TeX\ will return to
6471the other lines when it is finished with the present input file.
6472
6473(Incidentally, on a machine with byte-oriented addressing, it might be
6474appropriate to combine |buffer| with the |str_pool| array,
6475letting the buffer entries grow downward from the top of the string pool
6476and checking that these two tables don't bump into each other.)
6477
6478The line we are currently working on begins in position |start| of the
6479buffer; the next character we are about to read is |buffer[loc]|; and
6480|limit| is the location of the last character present.  If |loc>limit|,
6481the line has been completely read. Usually |buffer[limit]| is the
6482|end_line_char|, denoting the end of a line, but this is not
6483true if the current line is an insertion that was entered on the user's
6484terminal in response to an error message.
6485
6486The |name| variable is a string number that designates the name of
6487the current file, if we are reading a text file. It is zero if we
6488are reading from the terminal; it is |n+1| if we are reading from
6489input stream |n|, where |0<=n<=16|. (Input stream 16 stands for
6490an invalid stream number; in such cases the input is actually from
6491the terminal, under control of the procedure |read_toks|.)
6492
6493The |state| variable has one of three values, when we are scanning such
6494files:
6495$$\baselineskip 15pt\vbox{\halign{#\hfil\cr
64961) |state=mid_line| is the normal state.\cr
64972) |state=skip_blanks| is like |mid_line|, but blanks are ignored.\cr
64983) |state=new_line| is the state at the beginning of a line.\cr}}$$
6499These state values are assigned numeric codes so that if we add the state
6500code to the next character's command code, we get distinct values. For
6501example, `|mid_line+spacer|' stands for the case that a blank
6502space character occurs in the middle of a line when it is not being
6503ignored; after this case is processed, the next value of |state| will
6504be |skip_blanks|.
6505
6506@d mid_line=1 {|state| code when scanning a line of characters}
6507@d skip_blanks=2+max_char_code {|state| code when ignoring blanks}
6508@d new_line=3+max_char_code+max_char_code {|state| code at start of line}
6509
6510@ Additional information about the current line is available via the
6511|index| variable, which counts how many lines of characters are present
6512in the buffer below the current level. We have |index=0| when reading
6513from the terminal and prompting the user for each line; then if the user types,
6514e.g., `\.{\\input paper}', we will have |index=1| while reading
6515the file \.{paper.tex}. However, it does not follow that |index| is the
6516same as the input stack pointer, since many of the levels on the input
6517stack may come from token lists. For example, the instruction `\.{\\input
6518paper}' might occur in a token list.
6519
6520The global variable |in_open| is equal to the |index|
6521value of the highest non-token-list level. Thus, the number of partially read
6522lines in the buffer is |in_open+1|, and we have |in_open=index|
6523when we are not reading a token list.
6524
6525If we are not currently reading from the terminal, or from an input
6526stream, we are reading from the file variable |input_file[index]|. We use
6527the notation |terminal_input| as a convenient abbreviation for |name=0|,
6528and |cur_file| as an abbreviation for |input_file[index]|.
6529
6530The global variable |line| contains the line number in the topmost
6531open file, for use in error messages. If we are not reading from
6532the terminal, |line_stack[index]| holds the line number for the
6533enclosing level, so that |line| can be restored when the current
6534file has been read. Line numbers should never be negative, since the
6535negative of the current line number is used to identify the user's output
6536routine in the |mode_line| field of the semantic nest entries.
6537
6538If more information about the input state is needed, it can be
6539included in small arrays like those shown here. For example,
6540the current page or segment number in the input file might be
6541put into a variable |@!page|, maintained for enclosing levels in
6542`\ignorespaces|@!page_stack:array[1..max_in_open] of integer|\unskip'
6543by analogy with |line_stack|.
6544@^system dependencies@>
6545
6546@d terminal_input==(name=0) {are we reading from the terminal?}
6547@d cur_file==input_file[index] {the current |alpha_file| variable}
6548
6549@<Glob...@>=
6550@!in_open : 0..max_in_open; {the number of lines in the buffer, less one}
6551@!open_parens : 0..max_in_open; {the number of open text files}
6552@!input_file : array[1..max_in_open] of alpha_file;
6553@!line : integer; {current line number in the current source file}
6554@!line_stack : array[1..max_in_open] of integer;
6555
6556@ Users of \TeX\ sometimes forget to balance left and right braces properly,
6557and one of the ways \TeX\ tries to spot such errors is by considering an
6558input file as broken into subfiles by control sequences that
6559are declared to be \.{\\outer}.
6560
6561A variable called |scanner_status| tells \TeX\ whether or not to complain
6562when a subfile ends. This variable has six possible values:
6563
6564\yskip\hang|normal|, means that a subfile can safely end here without incident.
6565
6566\yskip\hang|skipping|, means that a subfile can safely end here, but not a file,
6567because we're reading past some conditional text that was not selected.
6568
6569\yskip\hang|defining|, means that a subfile shouldn't end now because a
6570macro is being defined.
6571
6572\yskip\hang|matching|, means that a subfile shouldn't end now because a
6573macro is being used and we are searching for the end of its arguments.
6574
6575\yskip\hang|aligning|, means that a subfile shouldn't end now because we are
6576not finished with the preamble of an \.{\\halign} or \.{\\valign}.
6577
6578\yskip\hang|absorbing|, means that a subfile shouldn't end now because we are
6579reading a balanced token list for \.{\\message}, \.{\\write}, etc.
6580
6581\yskip\noindent
6582If the |scanner_status| is not |normal|, the variable |warning_index| points
6583to the |eqtb| location for the relevant control sequence name to print
6584in an error message.
6585
6586@d skipping=1 {|scanner_status| when passing conditional text}
6587@d defining=2 {|scanner_status| when reading a macro definition}
6588@d matching=3 {|scanner_status| when reading macro arguments}
6589@d aligning=4 {|scanner_status| when reading an alignment preamble}
6590@d absorbing=5 {|scanner_status| when reading a balanced text}
6591
6592@<Glob...@>=
6593@!scanner_status : normal..absorbing; {can a subfile end now?}
6594@!warning_index : pointer; {identifier relevant to non-|normal| scanner status}
6595@!def_ref : pointer; {reference count of token list being defined}
6596
6597@ Here is a procedure that uses |scanner_status| to print a warning message
6598when a subfile has ended, and at certain other crucial times:
6599
6600@<Declare the procedure called |runaway|@>=
6601procedure runaway;
6602var p:pointer; {head of runaway list}
6603begin if scanner_status>skipping then
6604  begin print_nl("Runaway ");
6605@.Runaway...@>
6606  case scanner_status of
6607  defining: begin print("definition"); p:=def_ref;
6608    end;
6609  matching: begin print("argument"); p:=temp_head;
6610    end;
6611  aligning: begin print("preamble"); p:=hold_head;
6612    end;
6613  absorbing: begin print("text"); p:=def_ref;
6614    end;
6615  end; {there are no other cases}
6616  print_char("?");print_ln; show_token_list(link(p),null,error_line-10);
6617  end;
6618end;
6619
6620@ However, all this discussion about input state really applies only to the
6621case that we are inputting from a file. There is another important case,
6622namely when we are currently getting input from a token list. In this case
6623|state=token_list|, and the conventions about the other state variables
6624are different:
6625
6626\yskip\hang|loc| is a pointer to the current node in the token list, i.e.,
6627the node that will be read next. If |loc=null|, the token list has been
6628fully read.
6629
6630\yskip\hang|start| points to the first node of the token list; this node
6631may or may not contain a reference count, depending on the type of token
6632list involved.
6633
6634\yskip\hang|token_type|, which takes the place of |index| in the
6635discussion above, is a code number that explains what kind of token list
6636is being scanned.
6637
6638\yskip\hang|name| points to the |eqtb| address of the control sequence
6639being expanded, if the current token list is a macro.
6640
6641\yskip\hang|param_start|, which takes the place of |limit|, tells where
6642the parameters of the current macro begin in the |param_stack|, if the
6643current token list is a macro.
6644
6645\yskip\noindent The |token_type| can take several values, depending on
6646where the current token list came from:
6647
6648\yskip\hang|parameter|, if a parameter is being scanned;
6649
6650\hang|u_template|, if the \<u_j> part of an alignment
6651template is being scanned;
6652
6653\hang|v_template|, if the \<v_j> part of an alignment
6654template is being scanned;
6655
6656\hang|backed_up|, if the token list being scanned has been inserted as
6657`to be read again'.
6658
6659\hang|inserted|, if the token list being scanned has been inserted as
6660the text expansion of a \.{\\count} or similar variable;
6661
6662\hang|macro|, if a user-defined control sequence is being scanned;
6663
6664\hang|output_text|, if an \.{\\output} routine is being scanned;
6665
6666\hang|every_par_text|, if the text of \.{\\everypar} is being scanned;
6667
6668\hang|every_math_text|, if the text of \.{\\everymath} is being scanned;
6669
6670\hang|every_display_text|, if the text of \.{\\everydisplay} is being scanned;
6671
6672\hang|every_hbox_text|, if the text of \.{\\everyhbox} is being scanned;
6673
6674\hang|every_vbox_text|, if the text of \.{\\everyvbox} is being scanned;
6675
6676\hang|every_job_text|, if the text of \.{\\everyjob} is being scanned;
6677
6678\hang|every_cr_text|, if the text of \.{\\everycr} is being scanned;
6679
6680\hang|mark_text|, if the text of a \.{\\mark} is being scanned;
6681
6682\hang|write_text|, if the text of a \.{\\write} is being scanned.
6683
6684\yskip\noindent
6685The codes for |output_text|, |every_par_text|, etc., are equal to a constant
6686plus the corresponding codes for token list parameters |output_routine_loc|,
6687|every_par_loc|, etc.  The token list begins with a reference count if and
6688only if |token_type>=macro|.
6689@^reference counts@>
6690
6691@d token_list=0 {|state| code when scanning a token list}
6692@d token_type==index {type of current token list}
6693@d param_start==limit {base of macro parameters in |param_stack|}
6694@d parameter=0 {|token_type| code for parameter}
6695@d u_template=1 {|token_type| code for \<u_j> template}
6696@d v_template=2 {|token_type| code for \<v_j> template}
6697@d backed_up=3 {|token_type| code for text to be reread}
6698@d inserted=4 {|token_type| code for inserted texts}
6699@d macro=5 {|token_type| code for defined control sequences}
6700@d output_text=6 {|token_type| code for output routines}
6701@d every_par_text=7 {|token_type| code for \.{\\everypar}}
6702@d every_math_text=8 {|token_type| code for \.{\\everymath}}
6703@d every_display_text=9 {|token_type| code for \.{\\everydisplay}}
6704@d every_hbox_text=10 {|token_type| code for \.{\\everyhbox}}
6705@d every_vbox_text=11 {|token_type| code for \.{\\everyvbox}}
6706@d every_job_text=12 {|token_type| code for \.{\\everyjob}}
6707@d every_cr_text=13 {|token_type| code for \.{\\everycr}}
6708@d mark_text=14 {|token_type| code for \.{\\topmark}, etc.}
6709@d write_text=15 {|token_type| code for \.{\\write}}
6710
6711@ The |param_stack| is an auxiliary array used to hold pointers to the token
6712lists for parameters at the current level and subsidiary levels of input.
6713This stack is maintained with convention (2), and it grows at a different
6714rate from the others.
6715
6716@<Glob...@>=
6717@!param_stack:array [0..param_size] of pointer;
6718  {token list pointers for parameters}
6719@!param_ptr:0..param_size; {first unused entry in |param_stack|}
6720@!max_param_stack:integer;
6721  {largest value of |param_ptr|, will be |<=param_size+9|}
6722
6723@ The input routines must also interact with the processing of
6724\.{\\halign} and \.{\\valign}, since the appearance of tab marks and
6725\.{\\cr} in certain places is supposed to trigger the beginning of special
6726\<v_j> template text in the scanner. This magic is accomplished by an
6727|align_state| variable that is increased by~1 when a `\.{\char'173}' is
6728scanned and decreased by~1 when a `\.{\char'175}' is scanned. The |align_state|
6729is nonzero during the \<u_j> template, after which it is set to zero; the
6730\<v_j> template begins when a tab mark or \.{\\cr} occurs at a time that
6731|align_state=0|.
6732
6733@<Glob...@>=
6734@!align_state:integer; {group level with respect to current alignment}
6735
6736@ Thus, the ``current input state'' can be very complicated indeed; there
6737can be many levels and each level can arise in a variety of ways. The
6738|show_context| procedure, which is used by \TeX's error-reporting routine to
6739print out the current input state on all levels down to the most recent
6740line of characters from an input file, illustrates most of these conventions.
6741The global variable |base_ptr| contains the lowest level that was
6742displayed by this procedure.
6743
6744@<Glob...@>=
6745@!base_ptr:0..stack_size; {shallowest level shown by |show_context|}
6746
6747@ The status at each level is indicated by printing two lines, where the first
6748line indicates what was read so far and the second line shows what remains
6749to be read. The context is cropped, if necessary, so that the first line
6750contains at most |half_error_line| characters, and the second contains
6751at most |error_line|. Non-current input levels whose |token_type| is
6752`|backed_up|' are shown only if they have not been fully read.
6753
6754@p procedure show_context; {prints where the scanner is}
6755label done;
6756var old_setting:0..max_selector; {saved |selector| setting}
6757@!nn:integer; {number of contexts shown so far, less one}
6758@!bottom_line:boolean; {have we reached the final context to be shown?}
6759@<Local variables for formatting calculations@>@/
6760begin base_ptr:=input_ptr; input_stack[base_ptr]:=cur_input;
6761  {store current state}
6762nn:=-1; bottom_line:=false;
6763loop@+begin cur_input:=input_stack[base_ptr]; {enter into the context}
6764  if (state<>token_list) then
6765    if (name>17) or (base_ptr=0) then bottom_line:=true;
6766  if (base_ptr=input_ptr)or bottom_line or(nn<error_context_lines) then
6767    @<Display the current context@>
6768  else if nn=error_context_lines then
6769    begin print_nl("..."); incr(nn); {omitted if |error_context_lines<0|}
6770    end;
6771  if bottom_line then goto done;
6772  decr(base_ptr);
6773  end;
6774done: cur_input:=input_stack[input_ptr]; {restore original state}
6775end;
6776
6777@ @<Display the current context@>=
6778begin if (base_ptr=input_ptr) or (state<>token_list) or
6779   (token_type<>backed_up) or (loc<>null) then
6780    {we omit backed-up token lists that have already been read}
6781  begin tally:=0; {get ready to count characters}
6782  old_setting:=selector;
6783  if state<>token_list then
6784    begin @<Print location of current line@>;
6785    @<Pseudoprint the line@>;
6786    end
6787  else  begin @<Print type of token list@>;
6788    @<Pseudoprint the token list@>;
6789    end;
6790  selector:=old_setting; {stop pseudoprinting}
6791  @<Print two lines using the tricky pseudoprinted information@>;
6792  incr(nn);
6793  end;
6794end
6795
6796@ This routine should be changed, if necessary, to give the best possible
6797indication of where the current line resides in the input file.
6798For example, on some systems it is best to print both a page and line number.
6799@^system dependencies@>
6800
6801@<Print location of current line@>=
6802if name<=17 then
6803  if terminal_input then
6804    if base_ptr=0 then print_nl("<*>") else print_nl("<insert> ")
6805  else  begin print_nl("<read ");
6806    if name=17 then print_char("*")@+else print_int(name-1);
6807@.*\relax@>
6808    print_char(">");
6809    end
6810else  begin print_nl("l."); print_int(line);
6811  end;
6812print_char(" ")
6813
6814@ @<Print type of token list@>=
6815case token_type of
6816parameter: print_nl("<argument> ");
6817u_template,v_template: print_nl("<template> ");
6818backed_up: if loc=null then print_nl("<recently read> ")
6819  else print_nl("<to be read again> ");
6820inserted: print_nl("<inserted text> ");
6821macro: begin print_ln; print_cs(name);
6822  end;
6823output_text: print_nl("<output> ");
6824every_par_text: print_nl("<everypar> ");
6825every_math_text: print_nl("<everymath> ");
6826every_display_text: print_nl("<everydisplay> ");
6827every_hbox_text: print_nl("<everyhbox> ");
6828every_vbox_text: print_nl("<everyvbox> ");
6829every_job_text: print_nl("<everyjob> ");
6830every_cr_text: print_nl("<everycr> ");
6831mark_text: print_nl("<mark> ");
6832write_text: print_nl("<write> ");
6833othercases print_nl("?") {this should never happen}
6834endcases
6835
6836@ Here it is necessary to explain a little trick. We don't want to store a long
6837string that corresponds to a token list, because that string might take up
6838lots of memory; and we are printing during a time when an error message is
6839being given, so we dare not do anything that might overflow one of \TeX's
6840tables. So `pseudoprinting' is the answer: We enter a mode of printing
6841that stores characters into a buffer of length |error_line|, where character
6842$k+1$ is placed into \hbox{|trick_buf[k mod error_line]|} if
6843|k<trick_count|, otherwise character |k| is dropped. Initially we set
6844|tally:=0| and |trick_count:=1000000|; then when we reach the
6845point where transition from line 1 to line 2 should occur, we
6846set |first_count:=tally| and |trick_count:=@tmax@>(error_line,
6847tally+1+error_line-half_error_line)|. At the end of the
6848pseudoprinting, the values of |first_count|, |tally|, and
6849|trick_count| give us all the information we need to print the two lines,
6850and all of the necessary text is in |trick_buf|.
6851
6852Namely, let |l| be the length of the descriptive information that appears
6853on the first line. The length of the context information gathered for that
6854line is |k=first_count|, and the length of the context information
6855gathered for line~2 is $m=\min(|tally|, |trick_count|)-k$. If |l+k<=h|,
6856where |h=half_error_line|, we print |trick_buf[0..k-1]| after the
6857descriptive information on line~1, and set |n:=l+k|; here |n| is the
6858length of line~1. If $l+k>h$, some cropping is necessary, so we set |n:=h|
6859and print `\.{...}' followed by
6860$$\hbox{|trick_buf[(l+k-h+3)..k-1]|,}$$
6861where subscripts of |trick_buf| are circular modulo |error_line|. The
6862second line consists of |n|~spaces followed by |trick_buf[k..(k+m-1)]|,
6863unless |n+m>error_line|; in the latter case, further cropping is done.
6864This is easier to program than to explain.
6865
6866@<Local variables for formatting...@>=
6867@!i:0..buf_size; {index into |buffer|}
6868@!j:0..buf_size; {end of current line in |buffer|}
6869@!l:0..half_error_line; {length of descriptive information on line 1}
6870@!m:integer; {context information gathered for line 2}
6871@!n:0..error_line; {length of line 1}
6872@!p: integer; {starting or ending place in |trick_buf|}
6873@!q: integer; {temporary index}
6874
6875@ The following code sets up the print routines so that they will gather
6876the desired information.
6877
6878@d begin_pseudoprint==
6879  begin l:=tally; tally:=0; selector:=pseudo;
6880  trick_count:=1000000;
6881  end
6882@d set_trick_count==
6883  begin first_count:=tally;
6884  trick_count:=tally+1+error_line-half_error_line;
6885  if trick_count<error_line then trick_count:=error_line;
6886  end
6887
6888@ And the following code uses the information after it has been gathered.
6889
6890@<Print two lines using the tricky pseudoprinted information@>=
6891if trick_count=1000000 then set_trick_count;
6892  {|set_trick_count| must be performed}
6893if tally<trick_count then m:=tally-first_count
6894else m:=trick_count-first_count; {context on line 2}
6895if l+first_count<=half_error_line then
6896  begin p:=0; n:=l+first_count;
6897  end
6898else  begin print("..."); p:=l+first_count-half_error_line+3;
6899  n:=half_error_line;
6900  end;
6901for q:=p to first_count-1 do print_char(trick_buf[q mod error_line]);
6902print_ln;
6903for q:=1 to n do print_char(" "); {print |n| spaces to begin line~2}
6904if m+n<=error_line then p:=first_count+m else p:=first_count+(error_line-n-3);
6905for q:=first_count to p-1 do print_char(trick_buf[q mod error_line]);
6906if m+n>error_line then print("...")
6907
6908@ But the trick is distracting us from our current goal, which is to
6909understand the input state. So let's concentrate on the data structures that
6910are being pseudoprinted as we finish up the |show_context| procedure.
6911
6912@<Pseudoprint the line@>=
6913begin_pseudoprint;
6914if buffer[limit]=end_line_char then j:=limit
6915else j:=limit+1; {determine the effective end of the line}
6916if j>0 then for i:=start to j-1 do
6917  begin if i=loc then set_trick_count;
6918  print(buffer[i]);
6919  end
6920
6921@ @<Pseudoprint the token list@>=
6922begin_pseudoprint;
6923if token_type<macro then show_token_list(start,loc,100000)
6924else show_token_list(link(start),loc,100000) {avoid reference count}
6925
6926@ Here is the missing piece of |show_token_list| that is activated when the
6927token beginning line~2 is about to be shown:
6928
6929@<Do magic computation@>=set_trick_count
6930
6931@* \[23] Maintaining the input stacks.
6932The following subroutines change the input status in commonly needed ways.
6933
6934First comes |push_input|, which stores the current state and creates a
6935new level (having, initially, the same properties as the old).
6936
6937@d push_input==@t@> {enter a new input level, save the old}
6938  begin if input_ptr>max_in_stack then
6939    begin max_in_stack:=input_ptr;
6940    if input_ptr=stack_size then overflow("input stack size",stack_size);
6941@:TeX capacity exceeded input stack size}{\quad input stack size@>
6942    end;
6943  input_stack[input_ptr]:=cur_input; {stack the record}
6944  incr(input_ptr);
6945  end
6946
6947@ And of course what goes up must come down.
6948
6949@d pop_input==@t@> {leave an input level, re-enter the old}
6950  begin decr(input_ptr); cur_input:=input_stack[input_ptr];
6951  end
6952
6953@ Here is a procedure that starts a new level of token-list input, given
6954a token list |p| and its type |t|. If |t=macro|, the calling routine should
6955set |name| and |loc|.
6956
6957@d back_list(#)==begin_token_list(#,backed_up) {backs up a simple token list}
6958@d ins_list(#)==begin_token_list(#,inserted) {inserts a simple token list}
6959
6960@p procedure begin_token_list(@!p:pointer;@!t:quarterword);
6961begin push_input; state:=token_list; start:=p; token_type:=t;
6962if t>=macro then {the token list starts with a reference count}
6963  begin add_token_ref(p);
6964  if t=macro then param_start:=param_ptr
6965  else  begin loc:=link(p);
6966    if tracing_macros>1 then
6967      begin begin_diagnostic; print_nl("");
6968      case t of
6969      mark_text:print_esc("mark");
6970      write_text:print_esc("write");
6971      othercases print_cmd_chr(assign_toks,t-output_text+output_routine_loc)
6972      endcases;@/
6973      print("->"); token_show(p); end_diagnostic(false);
6974      end;
6975    end;
6976  end
6977else loc:=p;
6978end;
6979
6980@ When a token list has been fully scanned, the following computations
6981should be done as we leave that level of input. The |token_type| tends
6982to be equal to either |backed_up| or |inserted| about 2/3 of the time.
6983@^inner loop@>
6984
6985@p procedure end_token_list; {leave a token-list input level}
6986begin if token_type>=backed_up then {token list to be deleted}
6987  begin if token_type<=inserted then flush_list(start)
6988  else  begin delete_token_ref(start); {update reference count}
6989    if token_type=macro then {parameters must be flushed}
6990      while param_ptr>param_start do
6991        begin decr(param_ptr);
6992        flush_list(param_stack[param_ptr]);
6993        end;
6994    end;
6995  end
6996else if token_type=u_template then
6997  if align_state>500000 then align_state:=0
6998  else fatal_error("(interwoven alignment preambles are not allowed)");
6999@.interwoven alignment preambles...@>
7000pop_input;
7001check_interrupt;
7002end;
7003
7004@ Sometimes \TeX\ has read too far and wants to ``unscan'' what it has
7005seen. The |back_input| procedure takes care of this by putting the token
7006just scanned back into the input stream, ready to be read again. This
7007procedure can be used only if |cur_tok| represents the token to be
7008replaced. Some applications of \TeX\ use this procedure a lot,
7009so it has been slightly optimized for speed.
7010@^inner loop@>
7011
7012@p procedure back_input; {undoes one token of input}
7013var p:pointer; {a token list of length one}
7014begin while (state=token_list)and(loc=null)and(token_type<>v_template) do
7015  end_token_list; {conserve stack space}
7016p:=get_avail; info(p):=cur_tok;
7017if cur_tok<right_brace_limit then
7018  if cur_tok<left_brace_limit then decr(align_state)
7019  else incr(align_state);
7020push_input; state:=token_list; start:=p; token_type:=backed_up;
7021loc:=p; {that was |back_list(p)|, without procedure overhead}
7022end;
7023
7024@ @<Insert token |p| into \TeX's input@>=
7025begin t:=cur_tok; cur_tok:=p; back_input; cur_tok:=t;
7026end
7027
7028@ The |back_error| routine is used when we want to replace an offending token
7029just before issuing an error message. This routine, like |back_input|,
7030requires that |cur_tok| has been set. We disable interrupts during the
7031call of |back_input| so that the help message won't be lost.
7032
7033@p procedure back_error; {back up one token and call |error|}
7034begin OK_to_interrupt:=false; back_input; OK_to_interrupt:=true; error;
7035end;
7036@#
7037procedure ins_error; {back up one inserted token and call |error|}
7038begin OK_to_interrupt:=false; back_input; token_type:=inserted;
7039OK_to_interrupt:=true; error;
7040end;
7041
7042@ The |begin_file_reading| procedure starts a new level of input for lines
7043of characters to be read from a file, or as an insertion from the
7044terminal. It does not take care of opening the file, nor does it set |loc|
7045or |limit| or |line|.
7046@^system dependencies@>
7047
7048@p procedure begin_file_reading;
7049begin if in_open=max_in_open then overflow("text input levels",max_in_open);
7050@:TeX capacity exceeded text input levels}{\quad text input levels@>
7051if first=buf_size then overflow("buffer size",buf_size);
7052@:TeX capacity exceeded buffer size}{\quad buffer size@>
7053incr(in_open); push_input; index:=in_open;
7054line_stack[index]:=line; start:=first; state:=mid_line;
7055name:=0; {|terminal_input| is now |true|}
7056end;
7057
7058@ Conversely, the variables must be downdated when such a level of input
7059is finished:
7060
7061@p procedure end_file_reading;
7062begin first:=start; line:=line_stack[index];
7063if name>17 then a_close(cur_file); {forget it}
7064pop_input; decr(in_open);
7065end;
7066
7067@ In order to keep the stack from overflowing during a long sequence of
7068inserted `\.{\\show}' commands, the following routine removes completed
7069error-inserted lines from memory.
7070
7071@p procedure clear_for_error_prompt;
7072begin while (state<>token_list)and terminal_input and@|
7073  (input_ptr>0)and(loc>limit) do end_file_reading;
7074print_ln; clear_terminal;
7075end;
7076
7077@ To get \TeX's whole input mechanism going, we perform the following
7078actions.
7079
7080@<Initialize the input routines@>=
7081begin input_ptr:=0; max_in_stack:=0;
7082in_open:=0; open_parens:=0; max_buf_stack:=0;
7083param_ptr:=0; max_param_stack:=0;
7084first:=buf_size; repeat buffer[first]:=0; decr(first); until first=0;
7085scanner_status:=normal; warning_index:=null; first:=1;
7086state:=new_line; start:=1; index:=0; line:=0; name:=0;
7087force_eof:=false;
7088align_state:=1000000;@/
7089if not init_terminal then goto final_end;
7090limit:=last; first:=last+1; {|init_terminal| has set |loc| and |last|}
7091end
7092
7093@* \[24] Getting the next token.
7094The heart of \TeX's input mechanism is the |get_next| procedure, which
7095we shall develop in the next few sections of the program. Perhaps we
7096shouldn't actually call it the ``heart,'' however, because it really acts
7097as \TeX's eyes and mouth, reading the source files and gobbling them up.
7098And it also helps \TeX\ to regurgitate stored token lists that are to be
7099processed again.
7100@^eyes and mouth@>
7101
7102The main duty of |get_next| is to input one token and to set |cur_cmd|
7103and |cur_chr| to that token's command code and modifier. Furthermore, if
7104the input token is a control sequence, the |eqtb| location of that control
7105sequence is stored in |cur_cs|; otherwise |cur_cs| is set to zero.
7106
7107Underlying this simple description is a certain amount of complexity
7108because of all the cases that need to be handled.
7109However, the inner loop of |get_next| is reasonably short and fast.
7110
7111When |get_next| is asked to get the next token of a \.{\\read} line,
7112it sets |cur_cmd=cur_chr=cur_cs=0| in the case that no more tokens
7113appear on that line. (There might not be any tokens at all, if the
7114|end_line_char| has |ignore| as its catcode.)
7115
7116@ The value of |par_loc| is the |eqtb| address of `\.{\\par}'. This quantity
7117is needed because a blank line of input is supposed to be exactly equivalent
7118to the appearance of \.{\\par}; we must set |cur_cs:=par_loc|
7119when detecting a blank line.
7120
7121@<Glob...@>=
7122@!par_loc:pointer; {location of `\.{\\par}' in |eqtb|}
7123@!par_token:halfword; {token representing `\.{\\par}'}
7124
7125@ @<Put each...@>=
7126primitive("par",par_end,256); {cf.\ |scan_file_name|}
7127@!@:par_}{\.{\\par} primitive@>
7128par_loc:=cur_val; par_token:=cs_token_flag+par_loc;
7129
7130@ @<Cases of |print_cmd_chr|...@>=
7131par_end:print_esc("par");
7132
7133@ Before getting into |get_next|, let's consider the subroutine that
7134is called when an `\.{\\outer}' control sequence has been scanned or
7135when the end of a file has been reached. These two cases are distinguished
7136by |cur_cs|, which is zero at the end of a file.
7137
7138@p procedure check_outer_validity;
7139var p:pointer; {points to inserted token list}
7140@!q:pointer; {auxiliary pointer}
7141begin if scanner_status<>normal then
7142  begin deletions_allowed:=false;
7143  @<Back up an outer control sequence so that it can be reread@>;
7144  if scanner_status>skipping then
7145    @<Tell the user what has run away and try to recover@>
7146  else  begin print_err("Incomplete "); print_cmd_chr(if_test,cur_if);
7147@.Incomplete \\if...@>
7148    print("; all text was ignored after line "); print_int(skip_line);
7149    help3("A forbidden control sequence occurred in skipped text.")@/
7150    ("This kind of error happens when you say `\if...' and forget")@/
7151    ("the matching `\fi'. I've inserted a `\fi'; this might work.");
7152    if cur_cs<>0 then cur_cs:=0
7153    else help_line[2]:=@|
7154      "The file ended while I was skipping conditional text.";
7155    cur_tok:=cs_token_flag+frozen_fi; ins_error;
7156    end;
7157  deletions_allowed:=true;
7158  end;
7159end;
7160
7161@ An outer control sequence that occurs in a \.{\\read} will not be reread,
7162since the error recovery for \.{\\read} is not very powerful.
7163
7164@<Back up an outer control sequence so that it can be reread@>=
7165if cur_cs<>0 then
7166  begin if (state=token_list)or(name<1)or(name>17) then
7167    begin p:=get_avail; info(p):=cs_token_flag+cur_cs;
7168    back_list(p); {prepare to read the control sequence again}
7169    end;
7170  cur_cmd:=spacer; cur_chr:=" "; {replace it by a space}
7171  end
7172
7173@ @<Tell the user what has run away...@>=
7174begin runaway; {print a definition, argument, or preamble}
7175if cur_cs=0 then print_err("File ended")
7176@.File ended while scanning...@>
7177else  begin cur_cs:=0; print_err("Forbidden control sequence found");
7178@.Forbidden control sequence...@>
7179  end;
7180print(" while scanning ");
7181@<Print either `\.{definition}' or `\.{use}' or `\.{preamble}' or `\.{text}',
7182  and insert tokens that should lead to recovery@>;
7183print(" of "); sprint_cs(warning_index);
7184help4("I suspect you have forgotten a `}', causing me")@/
7185("to read past where you wanted me to stop.")@/
7186("I'll try to recover; but if the error is serious,")@/
7187("you'd better type `E' or `X' now and fix your file.");@/
7188error;
7189end
7190
7191@ The recovery procedure can't be fully understood without knowing more
7192about the \TeX\ routines that should be aborted, but we can sketch the
7193ideas here:  For a runaway definition we will insert a right brace; for a
7194runaway preamble, we will insert a special \.{\\cr} token and a right
7195brace; and for a runaway argument, we will set |long_state| to
7196|outer_call| and insert \.{\\par}.
7197
7198@<Print either `\.{definition}' or ...@>=
7199p:=get_avail;
7200case scanner_status of
7201defining:begin print("definition"); info(p):=right_brace_token+"}";
7202  end;
7203matching:begin print("use"); info(p):=par_token; long_state:=outer_call;
7204  end;
7205aligning:begin print("preamble"); info(p):=right_brace_token+"}"; q:=p;
7206  p:=get_avail; link(p):=q; info(p):=cs_token_flag+frozen_cr;
7207  align_state:=-1000000;
7208  end;
7209absorbing:begin print("text"); info(p):=right_brace_token+"}";
7210  end;
7211end; {there are no other cases}
7212ins_list(p)
7213
7214@ We need to mention a procedure here that may be called by |get_next|.
7215
7216@p procedure@?firm_up_the_line; forward;
7217
7218@ Now we're ready to take the plunge into |get_next| itself. Parts of
7219this routine are executed more often than any other instructions of \TeX.
7220@^mastication@>@^inner loop@>
7221
7222@d switch=25 {a label in |get_next|}
7223@d start_cs=26 {another}
7224
7225@p procedure get_next; {sets |cur_cmd|, |cur_chr|, |cur_cs| to next token}
7226label restart, {go here to get the next input token}
7227  switch, {go here to eat the next character from a file}
7228  reswitch, {go here to digest it again}
7229  start_cs, {go here to start looking for a control sequence}
7230  found, {go here when a control sequence has been found}
7231  exit; {go here when the next input token has been got}
7232var k:0..buf_size; {an index into |buffer|}
7233@!t:halfword; {a token}
7234@!cat:0..max_char_code; {|cat_code(cur_chr)|, usually}
7235@!c,@!cc:ASCII_code; {constituents of a possible expanded code}
7236@!d:2..3; {number of excess characters in an expanded code}
7237begin restart: cur_cs:=0;
7238if state<>token_list then
7239@<Input from external file, |goto restart| if no input found@>
7240else @<Input from token list, |goto restart| if end of list or
7241  if a parameter needs to be expanded@>;
7242@<If an alignment entry has just ended, take appropriate action@>;
7243exit:end;
7244
7245@ An alignment entry ends when a tab or \.{\\cr} occurs, provided that the
7246current level of braces is the same as the level that was present at the
7247beginning of that alignment entry; i.e., provided that |align_state| has
7248returned to the value it had after the \<u_j> template for that entry.
7249@^inner loop@>
7250
7251@<If an alignment entry has just ended, take appropriate action@>=
7252if cur_cmd<=car_ret then if cur_cmd>=tab_mark then if align_state=0 then
7253  @<Insert the \(v)\<v_j> template and |goto restart|@>
7254
7255@ @<Input from external file, |goto restart| if no input found@>=
7256@^inner loop@>
7257begin switch: if loc<=limit then {current line not yet finished}
7258  begin cur_chr:=buffer[loc]; incr(loc);
7259  reswitch: cur_cmd:=cat_code(cur_chr);
7260  @<Change state if necessary, and |goto switch| if the
7261    current character should be ignored,
7262    or |goto reswitch| if the current character
7263    changes to another@>;
7264  end
7265else  begin state:=new_line;@/
7266  @<Move to next line of file,
7267    or |goto restart| if there is no next line,
7268    or |return| if a \.{\\read} line has finished@>;
7269  check_interrupt;
7270  goto switch;
7271  end;
7272end
7273
7274@ The following 48-way switch accomplishes the scanning quickly, assuming
7275that a decent \PASCAL\ compiler has translated the code. Note that the numeric
7276values for |mid_line|, |skip_blanks|, and |new_line| are spaced
7277apart from each other by |max_char_code+1|, so we can add a character's
7278command code to the state to get a single number that characterizes both.
7279
7280@d any_state_plus(#) == mid_line+#,skip_blanks+#,new_line+#
7281
7282@<Change state if necessary...@>=
7283case state+cur_cmd of
7284@<Cases where character is ignored@>: goto switch;
7285any_state_plus(escape): @<Scan a control sequence
7286  and set |state:=skip_blanks| or |mid_line|@>;
7287any_state_plus(active_char): @<Process an active-character control sequence
7288  and set |state:=mid_line|@>;
7289any_state_plus(sup_mark): @<If this |sup_mark| starts an expanded character
7290  like~\.{\^\^A} or~\.{\^\^df}, then |goto reswitch|,
7291  otherwise set |state:=mid_line|@>;
7292any_state_plus(invalid_char): @<Decry the invalid character and
7293  |goto restart|@>;
7294@t\4@>@<Handle situations involving spaces, braces, changes of state@>@;
7295othercases do_nothing
7296endcases
7297
7298@ @<Cases where character is ignored@>=
7299any_state_plus(ignore),skip_blanks+spacer,new_line+spacer
7300
7301@ We go to |restart| instead of to |switch|, because |state| might equal
7302|token_list| after the error has been dealt with
7303(cf.\ |clear_for_error_prompt|).
7304
7305@<Decry the invalid...@>=
7306begin print_err("Text line contains an invalid character");
7307@.Text line contains...@>
7308help2("A funny symbol that I can't read has just been input.")@/
7309("Continue, and I'll forget that it ever happened.");@/
7310deletions_allowed:=false; error; deletions_allowed:=true;
7311goto restart;
7312end
7313
7314@ @d add_delims_to(#)==#+math_shift,#+tab_mark,#+mac_param,
7315  #+sub_mark,#+letter,#+other_char
7316
7317@<Handle situations involving spaces, braces, changes of state@>=
7318mid_line+spacer:@<Enter |skip_blanks| state, emit a space@>;
7319mid_line+car_ret:@<Finish line, emit a space@>;
7320skip_blanks+car_ret,any_state_plus(comment):
7321  @<Finish line, |goto switch|@>;
7322new_line+car_ret:@<Finish line, emit a \.{\\par}@>;
7323mid_line+left_brace: incr(align_state);
7324skip_blanks+left_brace,new_line+left_brace: begin
7325  state:=mid_line; incr(align_state);
7326  end;
7327mid_line+right_brace: decr(align_state);
7328skip_blanks+right_brace,new_line+right_brace: begin
7329  state:=mid_line; decr(align_state);
7330  end;
7331add_delims_to(skip_blanks),add_delims_to(new_line): state:=mid_line;
7332
7333@ When a character of type |spacer| gets through, its character code is
7334changed to $\.{"\ "}=@'40$. This means that the ASCII codes for tab and space,
7335and for the space inserted at the end of a line, will
7336be treated alike when macro parameters are being matched. We do this
7337since such characters are indistinguishable on most computer terminal displays.
7338
7339@<Finish line, emit a space@>=
7340begin loc:=limit+1; cur_cmd:=spacer; cur_chr:=" ";
7341end
7342
7343@ The following code is performed only when |cur_cmd=spacer|.
7344
7345@<Enter |skip_blanks| state, emit a space@>=
7346begin state:=skip_blanks; cur_chr:=" ";
7347end
7348
7349@ @<Finish line, |goto switch|@>=
7350begin loc:=limit+1; goto switch;
7351end
7352
7353@ @<Finish line, emit a \.{\\par}@>=
7354begin loc:=limit+1; cur_cs:=par_loc; cur_cmd:=eq_type(cur_cs);
7355cur_chr:=equiv(cur_cs);
7356if cur_cmd>=outer_call then check_outer_validity;
7357end
7358
7359@ Notice that a code like \.{\^\^8} becomes \.x if not followed by a hex digit.
7360
7361@d is_hex(#)==(((#>="0")and(#<="9"))or((#>="a")and(#<="f")))
7362@d hex_to_cur_chr==
7363  if c<="9" then cur_chr:=c-"0" @+else cur_chr:=c-"a"+10;
7364  if cc<="9" then cur_chr:=16*cur_chr+cc-"0"
7365  else cur_chr:=16*cur_chr+cc-"a"+10
7366
7367@<If this |sup_mark| starts an expanded character...@>=
7368begin if cur_chr=buffer[loc] then if loc<limit then
7369  begin c:=buffer[loc+1]; @+if c<@'200 then {yes we have an expanded char}
7370    begin loc:=loc+2;
7371    if is_hex(c) then if loc<=limit then
7372      begin cc:=buffer[loc]; @+if is_hex(cc) then
7373        begin incr(loc); hex_to_cur_chr; goto reswitch;
7374        end;
7375      end;
7376    if c<@'100 then cur_chr:=c+@'100 @+else cur_chr:=c-@'100;
7377    goto reswitch;
7378    end;
7379  end;
7380state:=mid_line;
7381end
7382
7383@ @<Process an active-character...@>=
7384begin cur_cs:=cur_chr+active_base;
7385cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs); state:=mid_line;
7386if cur_cmd>=outer_call then check_outer_validity;
7387end
7388
7389@ Control sequence names are scanned only when they appear in some line of
7390a file; once they have been scanned the first time, their |eqtb| location
7391serves as a unique identification, so \TeX\ doesn't need to refer to the
7392original name any more except when it prints the equivalent in symbolic form.
7393
7394The program that scans a control sequence has been written carefully
7395in order to avoid the blowups that might otherwise occur if a malicious
7396user tried something like `\.{\\catcode\'15=0}'. The algorithm might
7397look at |buffer[limit+1]|, but it never looks at |buffer[limit+2]|.
7398
7399If expanded characters like `\.{\^\^A}' or `\.{\^\^df}'
7400appear in or just following
7401a control sequence name, they are converted to single characters in the
7402buffer and the process is repeated, slowly but surely.
7403
7404@<Scan a control...@>=
7405begin if loc>limit then cur_cs:=null_cs {|state| is irrelevant in this case}
7406else  begin start_cs: k:=loc; cur_chr:=buffer[k]; cat:=cat_code(cur_chr);
7407  incr(k);
7408  if cat=letter then state:=skip_blanks
7409  else if cat=spacer then state:=skip_blanks
7410  else state:=mid_line;
7411  if (cat=letter)and(k<=limit) then
7412    @<Scan ahead in the buffer until finding a nonletter;
7413    if an expanded code is encountered, reduce it
7414    and |goto start_cs|; otherwise if a multiletter control
7415    sequence is found, adjust |cur_cs| and |loc|, and
7416    |goto found|@>
7417  else @<If an expanded code is present, reduce it and |goto start_cs|@>;
7418  cur_cs:=single_base+buffer[loc]; incr(loc);
7419  end;
7420found: cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs);
7421if cur_cmd>=outer_call then check_outer_validity;
7422end
7423
7424@ Whenever we reach the following piece of code, we will have
7425|cur_chr=buffer[k-1]| and |k<=limit+1| and |cat=cat_code(cur_chr)|. If an
7426expanded code like \.{\^\^A} or \.{\^\^df} appears in |buffer[(k-1)..(k+1)]|
7427or |buffer[(k-1)..(k+2)]|, we
7428will store the corresponding code in |buffer[k-1]| and shift the rest of
7429the buffer left two or three places.
7430
7431@<If an expanded...@>=
7432begin if buffer[k]=cur_chr then @+if cat=sup_mark then @+if k<limit then
7433  begin c:=buffer[k+1]; @+if c<@'200 then {yes, one is indeed present}
7434    begin d:=2;
7435    if is_hex(c) then @+if k+2<=limit then
7436      begin cc:=buffer[k+2]; @+if is_hex(cc) then incr(d);
7437      end;
7438    if d>2 then
7439      begin hex_to_cur_chr; buffer[k-1]:=cur_chr;
7440      end
7441    else if c<@'100 then buffer[k-1]:=c+@'100
7442    else buffer[k-1]:=c-@'100;
7443    limit:=limit-d; first:=first-d;
7444    while k<=limit do
7445      begin buffer[k]:=buffer[k+d]; incr(k);
7446      end;
7447    goto start_cs;
7448    end;
7449  end;
7450end
7451
7452@ @<Scan ahead in the buffer...@>=
7453begin repeat cur_chr:=buffer[k]; cat:=cat_code(cur_chr); incr(k);
7454until (cat<>letter)or(k>limit);
7455@<If an expanded...@>;
7456if cat<>letter then decr(k);
7457  {now |k| points to first nonletter}
7458if k>loc+1 then {multiletter control sequence has been scanned}
7459  begin cur_cs:=id_lookup(loc,k-loc); loc:=k; goto found;
7460  end;
7461end
7462
7463@ Let's consider now what happens when |get_next| is looking at a token list.
7464
7465@<Input from token list, |goto restart| if end of list or
7466  if a parameter needs to be expanded@>=
7467if loc<>null then {list not exhausted}
7468@^inner loop@>
7469  begin t:=info(loc); loc:=link(loc); {move to next}
7470  if t>=cs_token_flag then {a control sequence token}
7471    begin cur_cs:=t-cs_token_flag;
7472    cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs);
7473    if cur_cmd>=outer_call then
7474      if cur_cmd=dont_expand then
7475        @<Get the next token, suppressing expansion@>
7476      else check_outer_validity;
7477    end
7478  else  begin cur_cmd:=t div @'400; cur_chr:=t mod @'400;
7479    case cur_cmd of
7480    left_brace: incr(align_state);
7481    right_brace: decr(align_state);
7482    out_param: @<Insert macro parameter and |goto restart|@>;
7483    othercases do_nothing
7484    endcases;
7485    end;
7486  end
7487else  begin {we are done with this token list}
7488  end_token_list; goto restart; {resume previous level}
7489  end
7490
7491@ The present point in the program is reached only when the |expand|
7492routine has inserted a special marker into the input. In this special
7493case, |info(loc)| is known to be a control sequence token, and |link(loc)=null|.
7494
7495@d no_expand_flag=257 {this characterizes a special variant of |relax|}
7496
7497@<Get the next token, suppressing expansion@>=
7498begin cur_cs:=info(loc)-cs_token_flag; loc:=null;@/
7499cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs);
7500if cur_cmd>max_command then
7501  begin cur_cmd:=relax; cur_chr:=no_expand_flag;
7502  end;
7503end
7504
7505@ @<Insert macro parameter...@>=
7506begin begin_token_list(param_stack[param_start+cur_chr-1],parameter);
7507goto restart;
7508end
7509
7510@ All of the easy branches of |get_next| have now been taken care of.
7511There is one more branch.
7512
7513@d end_line_char_inactive == (end_line_char<0)or(end_line_char>255)
7514
7515@<Move to next line of file, or |goto restart|...@>=
7516if name>17 then @<Read next line of file into |buffer|, or
7517  |goto restart| if the file has ended@>
7518else  begin if not terminal_input then {\.{\\read} line has ended}
7519    begin cur_cmd:=0; cur_chr:=0; return;
7520    end;
7521  if input_ptr>0 then {text was inserted during error recovery}
7522    begin end_file_reading; goto restart; {resume previous level}
7523    end;
7524  if selector<log_only then open_log_file;
7525  if interaction>nonstop_mode then
7526    begin if end_line_char_inactive then incr(limit);
7527    if limit=start then {previous line was empty}
7528      print_nl("(Please type a command or say `\end')");
7529@.Please type...@>
7530    print_ln; first:=start;
7531    prompt_input("*"); {input on-line into |buffer|}
7532@.*\relax@>
7533    limit:=last;
7534    if end_line_char_inactive then decr(limit)
7535    else  buffer[limit]:=end_line_char;
7536    first:=limit+1;
7537    loc:=start;
7538    end
7539  else fatal_error("*** (job aborted, no legal \end found)");
7540@.job aborted@>
7541    {nonstop mode, which is intended for overnight batch processing,
7542    never waits for on-line input}
7543  end
7544
7545@ The global variable |force_eof| is normally |false|; it is set |true|
7546by an \.{\\endinput} command.
7547
7548@<Glob...@>=
7549@!force_eof:boolean; {should the next \.{\\input} be aborted early?}
7550
7551@ @<Read next line of file into |buffer|, or
7552  |goto restart| if the file has ended@>=
7553begin incr(line); first:=start;
7554if not force_eof then
7555  begin if input_ln(cur_file,true) then {not end of file}
7556    firm_up_the_line {this sets |limit|}
7557  else force_eof:=true;
7558  end;
7559if force_eof then
7560  begin print_char(")"); decr(open_parens);
7561  update_terminal; {show user that file has been read}
7562  force_eof:=false;
7563  end_file_reading; {resume previous level}
7564  check_outer_validity; goto restart;
7565  end;
7566if end_line_char_inactive then decr(limit)
7567else  buffer[limit]:=end_line_char;
7568first:=limit+1; loc:=start; {ready to read}
7569end
7570
7571@ If the user has set the |pausing| parameter to some positive value,
7572and if nonstop mode has not been selected, each line of input is displayed
7573on the terminal and the transcript file, followed by `\.{=>}'.
7574\TeX\ waits for a response. If the response is simply |carriage_return|, the
7575line is accepted as it stands, otherwise the line typed is
7576used instead of the line in the file.
7577
7578@p procedure firm_up_the_line;
7579var k:0..buf_size; {an index into |buffer|}
7580begin limit:=last;
7581if pausing>0 then if interaction>nonstop_mode then
7582  begin wake_up_terminal; print_ln;
7583  if start<limit then for k:=start to limit-1 do print(buffer[k]);
7584  first:=limit; prompt_input("=>"); {wait for user response}
7585@.=>@>
7586  if last>first then
7587    begin for k:=first to last-1 do {move line down in buffer}
7588      buffer[k+start-first]:=buffer[k];
7589    limit:=start+last-first;
7590    end;
7591  end;
7592end;
7593
7594@ Since |get_next| is used so frequently in \TeX, it is convenient
7595to define three related procedures that do a little more:
7596
7597\yskip\hang|get_token| not only sets |cur_cmd| and |cur_chr|, it
7598also sets |cur_tok|, a packed halfword version of the current token.
7599
7600\yskip\hang|get_x_token|, meaning ``get an expanded token,'' is like
7601|get_token|, but if the current token turns out to be a user-defined
7602control sequence (i.e., a macro call), or a conditional,
7603or something like \.{\\topmark} or \.{\\expandafter} or \.{\\csname},
7604it is eliminated from the input by beginning the expansion of the macro
7605or the evaluation of the conditional.
7606
7607\yskip\hang|x_token| is like |get_x_token| except that it assumes that
7608|get_next| has already been called.
7609
7610\yskip\noindent
7611In fact, these three procedures account for almost every use of |get_next|.
7612
7613@ No new control sequences will be defined except during a call of
7614|get_token|, or when \.{\\csname} compresses a token list, because
7615|no_new_control_sequence| is always |true| at other times.
7616
7617@p procedure get_token; {sets |cur_cmd|, |cur_chr|, |cur_tok|}
7618begin no_new_control_sequence:=false; get_next; no_new_control_sequence:=true;
7619@^inner loop@>
7620if cur_cs=0 then cur_tok:=(cur_cmd*@'400)+cur_chr
7621else cur_tok:=cs_token_flag+cur_cs;
7622end;
7623
7624@* \[25] Expanding the next token.
7625Only a dozen or so command codes |>max_command| can possibly be returned by
7626|get_next|; in increasing order, they are |undefined_cs|, |expand_after|,
7627|no_expand|, |input|, |if_test|, |fi_or_else|, |cs_name|, |convert|, |the|,
7628|top_bot_mark|, |call|, |long_call|, |outer_call|, |long_outer_call|, and
7629|end_template|.{\emergencystretch=40pt\par}
7630
7631The |expand| subroutine is used when |cur_cmd>max_command|. It removes a
7632``call'' or a conditional or one of the other special operations just
7633listed.  It follows that |expand| might invoke itself recursively. In all
7634cases, |expand| destroys the current token, but it sets things up so that
7635the next |get_next| will deliver the appropriate next token. The value of
7636|cur_tok| need not be known when |expand| is called.
7637
7638Since several of the basic scanning routines communicate via global variables,
7639their values are saved as local variables of |expand| so that
7640recursive calls don't invalidate them.
7641@^recursion@>
7642
7643@p@t\4@>@<Declare the procedure called |macro_call|@>@;@/
7644@t\4@>@<Declare the procedure called |insert_relax|@>@;@/
7645procedure@?pass_text; forward;@t\2@>
7646procedure@?start_input; forward;@t\2@>
7647procedure@?conditional; forward;@t\2@>
7648procedure@?get_x_token; forward;@t\2@>
7649procedure@?conv_toks; forward;@t\2@>
7650procedure@?ins_the_toks; forward;@t\2@>
7651procedure expand;
7652var t:halfword; {token that is being ``expanded after''}
7653@!p,@!q,@!r:pointer; {for list manipulation}
7654@!j:0..buf_size; {index into |buffer|}
7655@!cv_backup:integer; {to save the global quantity |cur_val|}
7656@!cvl_backup,@!radix_backup,@!co_backup:small_number;
7657  {to save |cur_val_level|, etc.}
7658@!backup_backup:pointer; {to save |link(backup_head)|}
7659@!save_scanner_status:small_number; {temporary storage of |scanner_status|}
7660begin cv_backup:=cur_val; cvl_backup:=cur_val_level; radix_backup:=radix;
7661co_backup:=cur_order; backup_backup:=link(backup_head);
7662if cur_cmd<call then @<Expand a nonmacro@>
7663else if cur_cmd<end_template then macro_call
7664else @<Insert a token containing |frozen_endv|@>;
7665cur_val:=cv_backup; cur_val_level:=cvl_backup; radix:=radix_backup;
7666cur_order:=co_backup; link(backup_head):=backup_backup;
7667end;
7668
7669@ @<Expand a nonmacro@>=
7670begin if tracing_commands>1 then show_cur_cmd_chr;
7671case cur_cmd of
7672top_bot_mark:@<Insert the \(a)appropriate mark text into the scanner@>;
7673expand_after:@<Expand the token after the next token@>;
7674no_expand:@<Suppress expansion of the next token@>;
7675cs_name:@<Manufacture a control sequence name@>;
7676convert:conv_toks; {this procedure is discussed in Part 27 below}
7677the:ins_the_toks; {this procedure is discussed in Part 27 below}
7678if_test:conditional; {this procedure is discussed in Part 28 below}
7679fi_or_else:@<Terminate the current conditional and skip to \.{\\fi}@>;
7680input:@<Initiate or terminate input from a file@>;
7681othercases @<Complain about an undefined macro@>
7682endcases;
7683end
7684
7685@ It takes only a little shuffling to do what \TeX\ calls \.{\\expandafter}.
7686
7687@<Expand the token after...@>=
7688begin get_token; t:=cur_tok; get_token;
7689if cur_cmd>max_command then expand@+else back_input;
7690cur_tok:=t; back_input;
7691end
7692
7693@ The implementation of \.{\\noexpand} is a bit trickier, because it is
7694necessary to insert a special `|dont_expand|' marker into \TeX's reading
7695mechanism.  This special marker is processed by |get_next|, but it does
7696not slow down the inner loop.
7697
7698Since \.{\\outer} macros might arise here, we must also
7699clear the |scanner_status| temporarily.
7700
7701@<Suppress expansion...@>=
7702begin save_scanner_status:=scanner_status; scanner_status:=normal;
7703get_token; scanner_status:=save_scanner_status; t:=cur_tok;
7704back_input; {now |start| and |loc| point to the backed-up token |t|}
7705if t>=cs_token_flag then
7706  begin p:=get_avail; info(p):=cs_token_flag+frozen_dont_expand;
7707  link(p):=loc; start:=p; loc:=p;
7708  end;
7709end
7710
7711@ @<Complain about an undefined macro@>=
7712begin print_err("Undefined control sequence");
7713@.Undefined control sequence@>
7714help5("The control sequence at the end of the top line")@/
7715("of your error message was never \def'ed. If you have")@/
7716("misspelled it (e.g., `\hobx'), type `I' and the correct")@/
7717("spelling (e.g., `I\hbox'). Otherwise just continue,")@/
7718("and I'll forget about whatever was undefined.");
7719error;
7720end
7721
7722@ The |expand| procedure and some other routines that construct token
7723lists find it convenient to use the following macros, which are valid only if
7724the variables |p| and |q| are reserved for token-list building.
7725
7726@d store_new_token(#)==begin q:=get_avail; link(p):=q; info(q):=#;
7727  p:=q; {|link(p)| is |null|}
7728  end
7729@d fast_store_new_token(#)==begin fast_get_avail(q); link(p):=q; info(q):=#;
7730  p:=q; {|link(p)| is |null|}
7731  end
7732
7733@ @<Manufacture a control...@>=
7734begin r:=get_avail; p:=r; {head of the list of characters}
7735repeat get_x_token;
7736if cur_cs=0 then store_new_token(cur_tok);
7737until cur_cs<>0;
7738if cur_cmd<>end_cs_name then @<Complain about missing \.{\\endcsname}@>;
7739@<Look up the characters of list |r| in the hash table, and set |cur_cs|@>;
7740flush_list(r);
7741if eq_type(cur_cs)=undefined_cs then
7742  begin eq_define(cur_cs,relax,256); {N.B.: The |save_stack| might change}
7743  end; {the control sequence will now match `\.{\\relax}'}
7744cur_tok:=cur_cs+cs_token_flag; back_input;
7745end
7746
7747@ @<Complain about missing \.{\\endcsname}@>=
7748begin print_err("Missing "); print_esc("endcsname"); print(" inserted");
7749@.Missing \\endcsname...@>
7750help2("The control sequence marked <to be read again> should")@/
7751  ("not appear between \csname and \endcsname.");
7752back_error;
7753end
7754
7755@ @<Look up the characters of list |r| in the hash table...@>=
7756j:=first; p:=link(r);
7757while p<>null do
7758  begin if j>=max_buf_stack then
7759    begin max_buf_stack:=j+1;
7760    if max_buf_stack=buf_size then
7761      overflow("buffer size",buf_size);
7762@:TeX capacity exceeded buffer size}{\quad buffer size@>
7763    end;
7764  buffer[j]:=info(p) mod @'400; incr(j); p:=link(p);
7765  end;
7766if j>first+1 then
7767  begin no_new_control_sequence:=false; cur_cs:=id_lookup(first,j-first);
7768  no_new_control_sequence:=true;
7769  end
7770else if j=first then cur_cs:=null_cs {the list is empty}
7771else cur_cs:=single_base+buffer[first] {the list has length one}
7772
7773@ An |end_template| command is effectively changed to an |endv| command
7774by the following code. (The reason for this is discussed below; the
7775|frozen_end_template| at the end of the template has passed the
7776|check_outer_validity| test, so its mission of error detection has been
7777accomplished.)
7778
7779@<Insert a token containing |frozen_endv|@>=
7780begin cur_tok:=cs_token_flag+frozen_endv; back_input;
7781end
7782
7783@ The processing of \.{\\input} involves the |start_input| subroutine,
7784which will be declared later; the processing of \.{\\endinput} is trivial.
7785
7786@<Put each...@>=
7787primitive("input",input,0);@/
7788@!@:input_}{\.{\\input} primitive@>
7789primitive("endinput",input,1);@/
7790@!@:end_input_}{\.{\\endinput} primitive@>
7791
7792@ @<Cases of |print_cmd_chr|...@>=
7793input: if chr_code=0 then print_esc("input")@+else print_esc("endinput");
7794
7795@ @<Initiate or terminate input...@>=
7796if cur_chr>0 then force_eof:=true
7797else if name_in_progress then insert_relax
7798else start_input
7799
7800@ Sometimes the expansion looks too far ahead, so we want to insert
7801a harmless \.{\\relax} into the user's input.
7802
7803@<Declare the procedure called |insert_relax|@>=
7804procedure insert_relax;
7805begin cur_tok:=cs_token_flag+cur_cs; back_input;
7806cur_tok:=cs_token_flag+frozen_relax; back_input; token_type:=inserted;
7807end;
7808
7809@ Here is a recursive procedure that is \TeX's usual way to get the
7810next token of input. It has been slightly optimized to take account of
7811common cases.
7812
7813@p procedure get_x_token; {sets |cur_cmd|, |cur_chr|, |cur_tok|,
7814  and expands macros}
7815label restart,done;
7816begin restart: get_next;
7817@^inner loop@>
7818if cur_cmd<=max_command then goto done;
7819if cur_cmd>=call then
7820  if cur_cmd<end_template then macro_call
7821  else  begin cur_cs:=frozen_endv; cur_cmd:=endv;
7822    goto done; {|cur_chr=null_list|}
7823    end
7824else expand;
7825goto restart;
7826done: if cur_cs=0 then cur_tok:=(cur_cmd*@'400)+cur_chr
7827else cur_tok:=cs_token_flag+cur_cs;
7828end;
7829
7830@ The |get_x_token| procedure is equivalent to two consecutive
7831procedure calls: |get_next; x_token|.
7832
7833@p procedure x_token; {|get_x_token| without the initial |get_next|}
7834begin while cur_cmd>max_command do
7835  begin expand;
7836  get_next;
7837  end;
7838if cur_cs=0 then cur_tok:=(cur_cmd*@'400)+cur_chr
7839else cur_tok:=cs_token_flag+cur_cs;
7840end;
7841
7842@ A control sequence that has been \.{\\def}'ed by the user is expanded by
7843\TeX's |macro_call| procedure.
7844
7845Before we get into the details of |macro_call|, however, let's consider the
7846treatment of primitives like \.{\\topmark}, since they are essentially
7847macros without parameters. The token lists for such marks are kept in a
7848global array of five pointers; we refer to the individual entries of this
7849array by symbolic names |top_mark|, etc. The value of |top_mark| is either
7850|null| or a pointer to the reference count of a token list.
7851
7852@d top_mark_code=0 {the mark in effect at the previous page break}
7853@d first_mark_code=1 {the first mark between |top_mark| and |bot_mark|}
7854@d bot_mark_code=2 {the mark in effect at the current page break}
7855@d split_first_mark_code=3 {the first mark found by \.{\\vsplit}}
7856@d split_bot_mark_code=4 {the last mark found by \.{\\vsplit}}
7857@d top_mark==cur_mark[top_mark_code]
7858@d first_mark==cur_mark[first_mark_code]
7859@d bot_mark==cur_mark[bot_mark_code]
7860@d split_first_mark==cur_mark[split_first_mark_code]
7861@d split_bot_mark==cur_mark[split_bot_mark_code]
7862
7863@<Glob...@>=
7864@!cur_mark:array[top_mark_code..split_bot_mark_code] of pointer;
7865  {token lists for marks}
7866
7867@ @<Set init...@>=
7868top_mark:=null; first_mark:=null; bot_mark:=null;
7869split_first_mark:=null; split_bot_mark:=null;
7870
7871@ @<Put each...@>=
7872primitive("topmark",top_bot_mark,top_mark_code);
7873@!@:top_mark_}{\.{\\topmark} primitive@>
7874primitive("firstmark",top_bot_mark,first_mark_code);
7875@!@:first_mark_}{\.{\\firstmark} primitive@>
7876primitive("botmark",top_bot_mark,bot_mark_code);
7877@!@:bot_mark_}{\.{\\botmark} primitive@>
7878primitive("splitfirstmark",top_bot_mark,split_first_mark_code);
7879@!@:split_first_mark_}{\.{\\splitfirstmark} primitive@>
7880primitive("splitbotmark",top_bot_mark,split_bot_mark_code);
7881@!@:split_bot_mark_}{\.{\\splitbotmark} primitive@>
7882
7883@ @<Cases of |print_cmd_chr|...@>=
7884top_bot_mark: case chr_code of
7885  first_mark_code: print_esc("firstmark");
7886  bot_mark_code: print_esc("botmark");
7887  split_first_mark_code: print_esc("splitfirstmark");
7888  split_bot_mark_code: print_esc("splitbotmark");
7889  othercases print_esc("topmark")
7890  endcases;
7891
7892@ The following code is activated when |cur_cmd=top_bot_mark| and
7893when |cur_chr| is a code like |top_mark_code|.
7894
7895@<Insert the \(a)appropriate mark text into the scanner@>=
7896begin if cur_mark[cur_chr]<>null then
7897  begin_token_list(cur_mark[cur_chr],mark_text);
7898end
7899
7900@ Now let's consider |macro_call| itself, which is invoked when \TeX\ is
7901scanning a control sequence whose |cur_cmd| is either |call|, |long_call|,
7902|outer_call|, or |long_outer_call|.  The control sequence definition
7903appears in the token list whose reference count is in location |cur_chr|
7904of |mem|.
7905
7906The global variable |long_state| will be set to |call| or to |long_call|,
7907depending on whether or not the control sequence disallows \.{\\par}
7908in its parameters. The |get_next| routine will set |long_state| to
7909|outer_call| and emit \.{\\par}, if a file ends or if an \.{\\outer}
7910control sequence occurs in the midst of an argument.
7911
7912@<Glob...@>=
7913@!long_state:call..long_outer_call; {governs the acceptance of \.{\\par}}
7914
7915@ The parameters, if any, must be scanned before the macro is expanded.
7916Parameters are token lists without reference counts. They are placed on
7917an auxiliary stack called |pstack| while they are being scanned, since
7918the |param_stack| may be losing entries during the matching process.
7919(Note that |param_stack| can't be gaining entries, since |macro_call| is
7920the only routine that puts anything onto |param_stack|, and it
7921is not recursive.)
7922
7923@<Glob...@>=
7924@!pstack:array[0..8] of pointer; {arguments supplied to a macro}
7925
7926@ After parameter scanning is complete, the parameters are moved to the
7927|param_stack|. Then the macro body is fed to the scanner; in other words,
7928|macro_call| places the defined text of the control sequence at the
7929top of\/ \TeX's input stack, so that |get_next| will proceed to read it
7930next.
7931
7932The global variable |cur_cs| contains the |eqtb| address of the control sequence
7933being expanded, when |macro_call| begins. If this control sequence has not been
7934declared \.{\\long}, i.e., if its command code in the |eq_type| field is
7935not |long_call| or |long_outer_call|, its parameters are not allowed to contain
7936the control sequence \.{\\par}. If an illegal \.{\\par} appears, the macro
7937call is aborted, and the \.{\\par} will be rescanned.
7938
7939@<Declare the procedure called |macro_call|@>=
7940procedure macro_call; {invokes a user-defined control sequence}
7941label exit, continue, done, done1, found;
7942var r:pointer; {current node in the macro's token list}
7943@!p:pointer; {current node in parameter token list being built}
7944@!q:pointer; {new node being put into the token list}
7945@!s:pointer; {backup pointer for parameter matching}
7946@!t:pointer; {cycle pointer for backup recovery}
7947@!u,@!v:pointer; {auxiliary pointers for backup recovery}
7948@!rbrace_ptr:pointer; {one step before the last |right_brace| token}
7949@!n:small_number; {the number of parameters scanned}
7950@!unbalance:halfword; {unmatched left braces in current parameter}
7951@!m:halfword; {the number of tokens or groups (usually)}
7952@!ref_count:pointer; {start of the token list}
7953@!save_scanner_status:small_number; {|scanner_status| upon entry}
7954@!save_warning_index:pointer; {|warning_index| upon entry}
7955@!match_chr:ASCII_code; {character used in parameter}
7956begin save_scanner_status:=scanner_status; save_warning_index:=warning_index;
7957warning_index:=cur_cs; ref_count:=cur_chr; r:=link(ref_count); n:=0;
7958if tracing_macros>0 then @<Show the text of the macro being expanded@>;
7959if info(r)<>end_match_token then
7960  @<Scan the parameters and make |link(r)| point to the macro body; but
7961    |return| if an illegal \.{\\par} is detected@>;
7962@<Feed the macro body and its parameters to the scanner@>;
7963exit:scanner_status:=save_scanner_status; warning_index:=save_warning_index;
7964end;
7965
7966@ Before we put a new token list on the input stack, it is wise to clean off
7967all token lists that have recently been depleted. Then a user macro that ends
7968with a call to itself will not require unbounded stack space.
7969
7970@<Feed the macro body and its parameters to the scanner@>=
7971while (state=token_list)and(loc=null)and(token_type<>v_template) do
7972  end_token_list; {conserve stack space}
7973begin_token_list(ref_count,macro); name:=warning_index; loc:=link(r);
7974if n>0 then
7975  begin if param_ptr+n>max_param_stack then
7976    begin max_param_stack:=param_ptr+n;
7977    if max_param_stack>param_size then
7978      overflow("parameter stack size",param_size);
7979@:TeX capacity exceeded parameter stack size}{\quad parameter stack size@>
7980    end;
7981  for m:=0 to n-1 do param_stack[param_ptr+m]:=pstack[m];
7982  param_ptr:=param_ptr+n;
7983  end
7984
7985@ At this point, the reader will find it advisable to review the explanation
7986of token list format that was presented earlier, since many aspects of that
7987format are of importance chiefly in the |macro_call| routine.
7988
7989The token list might begin with a string of compulsory tokens before the
7990first |match| or |end_match|. In that case the macro name is supposed to be
7991followed by those tokens; the following program will set |s=null| to
7992represent this restriction. Otherwise |s| will be set to the first token of
7993a string that will delimit the next parameter.
7994
7995@<Scan the parameters and make |link(r)| point to the macro body...@>=
7996begin scanner_status:=matching; unbalance:=0;
7997long_state:=eq_type(cur_cs);
7998if long_state>=outer_call then long_state:=long_state-2;
7999repeat link(temp_head):=null;
8000if (info(r)>match_token+255)or(info(r)<match_token) then s:=null
8001else  begin match_chr:=info(r)-match_token; s:=link(r); r:=s;
8002  p:=temp_head; m:=0;
8003  end;
8004@<Scan a parameter until its delimiter string has been found; or, if |s=null|,
8005  simply scan the delimiter string@>;@/
8006{now |info(r)| is a token whose command code is either |match| or |end_match|}
8007until info(r)=end_match_token;
8008end
8009
8010@ If |info(r)| is a |match| or |end_match| command, it cannot be equal to
8011any token found by |get_token|. Therefore an undelimited parameter---i.e.,
8012a |match| that is immediately followed by |match| or |end_match|---will
8013always fail the test `|cur_tok=info(r)|' in the following algorithm.
8014
8015@<Scan a parameter until its delimiter string has been found; or, ...@>=
8016continue: get_token; {set |cur_tok| to the next token of input}
8017if cur_tok=info(r) then
8018  @<Advance \(r)|r|; |goto found| if the parameter delimiter has been
8019    fully matched, otherwise |goto continue|@>;
8020@<Contribute the recently matched tokens to the current parameter, and
8021  |goto continue| if a partial match is still in effect;
8022  but abort if |s=null|@>;
8023if cur_tok=par_token then if long_state<>long_call then
8024  @<Report a runaway argument and abort@>;
8025if cur_tok<right_brace_limit then
8026  if cur_tok<left_brace_limit then
8027    @<Contribute an entire group to the current parameter@>
8028  else @<Report an extra right brace and |goto continue|@>
8029else @<Store the current token, but |goto continue| if it is
8030   a blank space that would become an undelimited parameter@>;
8031incr(m);
8032if info(r)>end_match_token then goto continue;
8033if info(r)<match_token then goto continue;
8034found: if s<>null then @<Tidy up the parameter just scanned, and tuck it away@>
8035
8036@ @<Store the current token, but |goto continue| if it is...@>=
8037begin if cur_tok=space_token then
8038  if info(r)<=end_match_token then
8039    if info(r)>=match_token then goto continue;
8040store_new_token(cur_tok);
8041end
8042
8043@ A slightly subtle point arises here: When the parameter delimiter ends
8044with `\.{\#\{}', the token list will have a left brace both before and
8045after the |end_match|\kern-.4pt. Only one of these should affect the
8046|align_state|, but both will be scanned, so we must make a correction.
8047
8048@<Advance \(r)|r|; |goto found| if the parameter delimiter has been fully...@>=
8049begin r:=link(r);
8050if (info(r)>=match_token)and(info(r)<=end_match_token) then
8051  begin if cur_tok<left_brace_limit then decr(align_state);
8052  goto found;
8053  end
8054else goto continue;
8055end
8056
8057@ @<Report an extra right brace and |goto continue|@>=
8058begin back_input; print_err("Argument of "); sprint_cs(warning_index);
8059@.Argument of \\x has...@>
8060print(" has an extra }");
8061help6("I've run across a `}' that doesn't seem to match anything.")@/
8062  ("For example, `\def\a#1{...}' and `\a}' would produce")@/
8063  ("this error. If you simply proceed now, the `\par' that")@/
8064  ("I've just inserted will cause me to report a runaway")@/
8065  ("argument that might be the root of the problem. But if")@/
8066  ("your `}' was spurious, just type `2' and it will go away.");
8067incr(align_state); long_state:=call; cur_tok:=par_token; ins_error;
8068goto continue;
8069end {a white lie; the \.{\\par} won't always trigger a runaway}
8070
8071@ If |long_state=outer_call|, a runaway argument has already been reported.
8072
8073@<Report a runaway argument and abort@>=
8074begin if long_state=call then
8075  begin runaway; print_err("Paragraph ended before ");
8076@.Paragraph ended before...@>
8077  sprint_cs(warning_index); print(" was complete");
8078  help3("I suspect you've forgotten a `}', causing me to apply this")@/
8079    ("control sequence to too much text. How can we recover?")@/
8080    ("My plan is to forget the whole thing and hope for the best.");
8081  back_error;
8082  end;
8083pstack[n]:=link(temp_head); align_state:=align_state-unbalance;
8084for m:=0 to n do flush_list(pstack[m]);
8085return;
8086end
8087
8088@ When the following code becomes active, we have matched tokens from |s| to
8089the predecessor of |r|, and we have found that |cur_tok<>info(r)|. An
8090interesting situation now presents itself: If the parameter is to be
8091delimited by a string such as `\.{ab}', and if we have scanned `\.{aa}',
8092we want to contribute one `\.a' to the current parameter and resume
8093looking for a `\.b'. The program must account for such partial matches and
8094for others that can be quite complex.  But most of the time we have |s=r|
8095and nothing needs to be done.
8096
8097Incidentally, it is possible for \.{\\par} tokens to sneak in to certain
8098parameters of non-\.{\\long} macros. For example, consider a case like
8099`\.{\\def\\a\#1\\par!\{...\}}' where the first \.{\\par} is not followed
8100by an exclamation point. In such situations it does not seem appropriate
8101to prohibit the \.{\\par}, so \TeX\ keeps quiet about this bending of
8102the rules.
8103
8104@<Contribute the recently matched tokens to the current parameter...@>=
8105if s<>r then
8106  if s=null then @<Report an improper use of the macro and abort@>
8107  else  begin t:=s;
8108    repeat store_new_token(info(t)); incr(m); u:=link(t); v:=s;
8109    loop@+  begin if u=r then
8110        if cur_tok<>info(v) then goto done
8111        else  begin r:=link(v); goto continue;
8112          end;
8113      if info(u)<>info(v) then goto done;
8114      u:=link(u); v:=link(v);
8115      end;
8116    done: t:=link(t);
8117    until t=r;
8118    r:=s; {at this point, no tokens are recently matched}
8119    end
8120
8121@ @<Report an improper use...@>=
8122begin print_err("Use of "); sprint_cs(warning_index);
8123@.Use of x doesn't match...@>
8124print(" doesn't match its definition");
8125help4("If you say, e.g., `\def\a1{...}', then you must always")@/
8126  ("put `1' after `\a', since control sequence names are")@/
8127  ("made up of letters only. The macro here has not been")@/
8128  ("followed by the required stuff, so I'm ignoring it.");
8129error; return;
8130end
8131
8132@ @<Contribute an entire group to the current parameter@>=
8133begin unbalance:=1;
8134@^inner loop@>
8135loop@+  begin fast_store_new_token(cur_tok); get_token;
8136  if cur_tok=par_token then if long_state<>long_call then
8137    @<Report a runaway argument and abort@>;
8138  if cur_tok<right_brace_limit then
8139    if cur_tok<left_brace_limit then incr(unbalance)
8140    else  begin decr(unbalance);
8141      if unbalance=0 then goto done1;
8142      end;
8143  end;
8144done1: rbrace_ptr:=p; store_new_token(cur_tok);
8145end
8146
8147@ If the parameter consists of a single group enclosed in braces, we must
8148strip off the enclosing braces. That's why |rbrace_ptr| was introduced.
8149
8150@<Tidy up the parameter just scanned, and tuck it away@>=
8151begin if (m=1)and(info(p)<right_brace_limit)and(p<>temp_head) then
8152  begin link(rbrace_ptr):=null; free_avail(p);
8153  p:=link(temp_head); pstack[n]:=link(p); free_avail(p);
8154  end
8155else pstack[n]:=link(temp_head);
8156incr(n);
8157if tracing_macros>0 then
8158  begin begin_diagnostic; print_nl(match_chr); print_int(n);
8159  print("<-"); show_token_list(pstack[n-1],null,1000);
8160  end_diagnostic(false);
8161  end;
8162end
8163
8164@ @<Show the text of the macro being expanded@>=
8165begin begin_diagnostic; print_ln; print_cs(warning_index);
8166token_show(ref_count); end_diagnostic(false);
8167end
8168
8169@* \[26] Basic scanning subroutines.
8170Let's turn now to some procedures that \TeX\ calls upon frequently to digest
8171certain kinds of patterns in the input. Most of these are quite simple;
8172some are quite elaborate. Almost all of the routines call |get_x_token|,
8173which can cause them to be invoked recursively.
8174@^stomach@>
8175@^recursion@>
8176
8177@ The |scan_left_brace| routine is called when a left brace is supposed to be
8178the next non-blank token. (The term ``left brace'' means, more precisely,
8179a character whose catcode is |left_brace|.) \TeX\ allows \.{\\relax} to
8180appear before the |left_brace|.
8181
8182@p procedure scan_left_brace; {reads a mandatory |left_brace|}
8183begin @<Get the next non-blank non-relax non-call token@>;
8184if cur_cmd<>left_brace then
8185  begin print_err("Missing { inserted");
8186@.Missing \{ inserted@>
8187  help4("A left brace was mandatory here, so I've put one in.")@/
8188    ("You might want to delete and/or insert some corrections")@/
8189    ("so that I will find a matching right brace soon.")@/
8190    ("(If you're confused by all this, try typing `I}' now.)");
8191  back_error; cur_tok:=left_brace_token+"{"; cur_cmd:=left_brace;
8192  cur_chr:="{"; incr(align_state);
8193  end;
8194end;
8195
8196@ @<Get the next non-blank non-relax non-call token@>=
8197repeat get_x_token;
8198until (cur_cmd<>spacer)and(cur_cmd<>relax)
8199
8200@ The |scan_optional_equals| routine looks for an optional `\.=' sign preceded
8201by optional spaces; `\.{\\relax}' is not ignored here.
8202
8203@p procedure scan_optional_equals;
8204begin  @<Get the next non-blank non-call token@>;
8205if cur_tok<>other_token+"=" then back_input;
8206end;
8207
8208@ @<Get the next non-blank non-call token@>=
8209repeat get_x_token;
8210until cur_cmd<>spacer
8211
8212@ In case you are getting bored, here is a slightly less trivial routine:
8213Given a string of lowercase letters, like `\.{pt}' or `\.{plus}' or
8214`\.{width}', the |scan_keyword| routine checks to see whether the next
8215tokens of input match this string. The match must be exact, except that
8216uppercase letters will match their lowercase counterparts; uppercase
8217equivalents are determined by subtracting |"a"-"A"|, rather than using the
8218|uc_code| table, since \TeX\ uses this routine only for its own limited
8219set of keywords.
8220
8221If a match is found, the characters are effectively removed from the input
8222and |true| is returned. Otherwise |false| is returned, and the input
8223is left essentially unchanged (except for the fact that some macros
8224may have been expanded, etc.).
8225@^inner loop@>
8226
8227@p function scan_keyword(@!s:str_number):boolean; {look for a given string}
8228label exit;
8229var p:pointer; {tail of the backup list}
8230@!q:pointer; {new node being added to the token list via |store_new_token|}
8231@!k:pool_pointer; {index into |str_pool|}
8232begin p:=backup_head; link(p):=null; k:=str_start[s];
8233while k<str_start[s+1] do
8234  begin get_x_token; {recursion is possible here}
8235@^recursion@>
8236  if (cur_cs=0)and@|
8237   ((cur_chr=so(str_pool[k]))or(cur_chr=so(str_pool[k])-"a"+"A")) then
8238    begin store_new_token(cur_tok); incr(k);
8239    end
8240  else if (cur_cmd<>spacer)or(p<>backup_head) then
8241    begin back_input;
8242    if p<>backup_head then back_list(link(backup_head));
8243    scan_keyword:=false; return;
8244    end;
8245  end;
8246flush_list(link(backup_head)); scan_keyword:=true;
8247exit:end;
8248
8249@ Here is a procedure that sounds an alarm when mu and non-mu units
8250are being switched.
8251
8252@p procedure mu_error;
8253begin print_err("Incompatible glue units");
8254@.Incompatible glue units@>
8255help1("I'm going to assume that 1mu=1pt when they're mixed.");
8256error;
8257end;
8258
8259@ The next routine `|scan_something_internal|' is used to fetch internal
8260numeric quantities like `\.{\\hsize}', and also to handle the `\.{\\the}'
8261when expanding constructions like `\.{\\the\\toks0}' and
8262`\.{\\the\\baselineskip}'. Soon we will be considering the |scan_int|
8263procedure, which calls |scan_something_internal|; on the other hand,
8264|scan_something_internal| also calls |scan_int|, for constructions like
8265`\.{\\catcode\`\\\$}' or `\.{\\fontdimen} \.3 \.{\\ff}'. So we
8266have to declare |scan_int| as a |forward| procedure. A few other
8267procedures are also declared at this point.
8268
8269@p procedure@?scan_int; forward; {scans an integer value}
8270@t\4\4@>@<Declare procedures that scan restricted classes of integers@>@;
8271@t\4\4@>@<Declare procedures that scan font-related stuff@>
8272
8273@ \TeX\ doesn't know exactly what to expect when |scan_something_internal|
8274begins.  For example, an integer or dimension or glue value could occur
8275immediately after `\.{\\hskip}'; and one can even say \.{\\the} with
8276respect to token lists in constructions like
8277`\.{\\xdef\\o\{\\the\\output\}}'.  On the other hand, only integers are
8278allowed after a construction like `\.{\\count}'. To handle the various
8279possibilities, |scan_something_internal| has a |level| parameter, which
8280tells the ``highest'' kind of quantity that |scan_something_internal| is
8281allowed to produce. Six levels are distinguished, namely |int_val|,
8282|dimen_val|, |glue_val|, |mu_val|, |ident_val|, and |tok_val|.
8283
8284The output of |scan_something_internal| (and of the other routines
8285|scan_int|, |scan_dimen|, and |scan_glue| below) is put into the global
8286variable |cur_val|, and its level is put into |cur_val_level|. The highest
8287values of |cur_val_level| are special: |mu_val| is used only when
8288|cur_val| points to something in a ``muskip'' register, or to one of the
8289three parameters \.{\\thinmuskip}, \.{\\medmuskip}, \.{\\thickmuskip};
8290|ident_val| is used only when |cur_val| points to a font identifier;
8291|tok_val| is used only when |cur_val| points to |null| or to the reference
8292count of a token list. The last two cases are allowed only when
8293|scan_something_internal| is called with |level=tok_val|.
8294
8295If the output is glue, |cur_val| will point to a glue specification, and
8296the reference count of that glue will have been updated to reflect this
8297reference; if the output is a nonempty token list, |cur_val| will point to
8298its reference count, but in this case the count will not have been updated.
8299Otherwise |cur_val| will contain the integer or scaled value in question.
8300
8301@d int_val=0 {integer values}
8302@d dimen_val=1 {dimension values}
8303@d glue_val=2 {glue specifications}
8304@d mu_val=3 {math glue specifications}
8305@d ident_val=4 {font identifier}
8306@d tok_val=5 {token lists}
8307
8308@<Glob...@>=
8309@!cur_val:integer; {value returned by numeric scanners}
8310@!cur_val_level:int_val..tok_val; {the ``level'' of this value}
8311
8312@ The hash table is initialized with `\.{\\count}', `\.{\\dimen}', `\.{\\skip}',
8313and `\.{\\muskip}' all having |register| as their command code; they are
8314distinguished by the |chr_code|, which is either |int_val|, |dimen_val|,
8315|glue_val|, or |mu_val|.
8316
8317@<Put each...@>=
8318primitive("count",register,int_val);
8319@!@:count_}{\.{\\count} primitive@>
8320primitive("dimen",register,dimen_val);
8321@!@:dimen_}{\.{\\dimen} primitive@>
8322primitive("skip",register,glue_val);
8323@!@:skip_}{\.{\\skip} primitive@>
8324primitive("muskip",register,mu_val);
8325@!@:mu_skip_}{\.{\\muskip} primitive@>
8326
8327@ @<Cases of |print_cmd_chr|...@>=
8328register: if chr_code=int_val then print_esc("count")
8329  else if chr_code=dimen_val then print_esc("dimen")
8330  else if chr_code=glue_val then print_esc("skip")
8331  else print_esc("muskip");
8332
8333@ OK, we're ready for |scan_something_internal| itself. A second parameter,
8334|negative|, is set |true| if the value that is found should be negated.
8335It is assumed that |cur_cmd| and |cur_chr| represent the first token of
8336the internal quantity to be scanned; an error will be signalled if
8337|cur_cmd<min_internal| or |cur_cmd>max_internal|.
8338
8339@d scanned_result_end(#)==cur_val_level:=#;@+end
8340@d scanned_result(#)==@+begin cur_val:=#;scanned_result_end
8341
8342@p procedure scan_something_internal(@!level:small_number;@!negative:boolean);
8343  {fetch an internal parameter}
8344var m:halfword; {|chr_code| part of the operand token}
8345@!p:0..nest_size; {index into |nest|}
8346begin m:=cur_chr;
8347case cur_cmd of
8348def_code: @<Fetch a character code from some table@>;
8349toks_register,assign_toks,def_family,set_font,def_font: @<Fetch a token list or
8350  font identifier, provided that |level=tok_val|@>;
8351assign_int: scanned_result(eqtb[m].int)(int_val);
8352assign_dimen: scanned_result(eqtb[m].sc)(dimen_val);
8353assign_glue: scanned_result(equiv(m))(glue_val);
8354assign_mu_glue: scanned_result(equiv(m))(mu_val);
8355set_aux: @<Fetch the |space_factor| or the |prev_depth|@>;
8356set_prev_graf: @<Fetch the |prev_graf|@>;
8357set_page_int:@<Fetch the |dead_cycles| or the |insert_penalties|@>;
8358set_page_dimen: @<Fetch something on the |page_so_far|@>;
8359set_shape: @<Fetch the |par_shape| size@>;
8360set_box_dimen: @<Fetch a box dimension@>;
8361char_given,math_given: scanned_result(cur_chr)(int_val);
8362assign_font_dimen: @<Fetch a font dimension@>;
8363assign_font_int: @<Fetch a font integer@>;
8364register: @<Fetch a register@>;
8365last_item: @<Fetch an item in the current node, if appropriate@>;
8366othercases @<Complain that \.{\\the} can't do this; give zero result@>
8367endcases;@/
8368while cur_val_level>level do @<Convert \(c)|cur_val| to a lower level@>;
8369@<Fix the reference count, if any, and negate |cur_val| if |negative|@>;
8370end;
8371
8372@ @<Fetch a character code from some table@>=
8373begin scan_char_num;
8374if m=math_code_base then scanned_result(ho(math_code(cur_val)))(int_val)
8375else if m<math_code_base then scanned_result(equiv(m+cur_val))(int_val)
8376else scanned_result(eqtb[m+cur_val].int)(int_val);
8377end
8378
8379@ @<Fetch a token list...@>=
8380if level<>tok_val then
8381  begin print_err("Missing number, treated as zero");
8382@.Missing number...@>
8383  help3("A number should have been here; I inserted `0'.")@/
8384    ("(If you can't figure out why I needed to see a number,")@/
8385    ("look up `weird error' in the index to The TeXbook.)");
8386@:TeXbook}{\sl The \TeX book@>
8387  back_error; scanned_result(0)(dimen_val);
8388  end
8389else if cur_cmd<=assign_toks then
8390  begin if cur_cmd<assign_toks then {|cur_cmd=toks_register|}
8391    begin scan_eight_bit_int; m:=toks_base+cur_val;
8392    end;
8393  scanned_result(equiv(m))(tok_val);
8394  end
8395else  begin back_input; scan_font_ident;
8396  scanned_result(font_id_base+cur_val)(ident_val);
8397  end
8398
8399@ Users refer to `\.{\\the\\spacefactor}' only in horizontal
8400mode, and to `\.{\\the\\prevdepth}' only in vertical mode; so we put the
8401associated mode in the modifier part of the |set_aux| command.
8402The |set_page_int| command has modifier 0 or 1, for `\.{\\deadcycles}' and
8403`\.{\\insertpenalties}', respectively. The |set_box_dimen| command is
8404modified by either |width_offset|, |height_offset|, or |depth_offset|.
8405And the |last_item| command is modified by either |int_val|, |dimen_val|,
8406|glue_val|, |input_line_no_code|, or |badness_code|.
8407
8408@d input_line_no_code=glue_val+1 {code for \.{\\inputlineno}}
8409@d badness_code=glue_val+2 {code for \.{\\badness}}
8410
8411@<Put each...@>=
8412primitive("spacefactor",set_aux,hmode);
8413@!@:space_factor_}{\.{\\spacefactor} primitive@>
8414primitive("prevdepth",set_aux,vmode);@/
8415@!@:prev_depth_}{\.{\\prevdepth} primitive@>
8416primitive("deadcycles",set_page_int,0);
8417@!@:dead_cycles_}{\.{\\deadcycles} primitive@>
8418primitive("insertpenalties",set_page_int,1);
8419@!@:insert_penalties_}{\.{\\insertpenalties} primitive@>
8420primitive("wd",set_box_dimen,width_offset);
8421@!@:wd_}{\.{\\wd} primitive@>
8422primitive("ht",set_box_dimen,height_offset);
8423@!@:ht_}{\.{\\ht} primitive@>
8424primitive("dp",set_box_dimen,depth_offset);
8425@!@:dp_}{\.{\\dp} primitive@>
8426primitive("lastpenalty",last_item,int_val);
8427@!@:last_penalty_}{\.{\\lastpenalty} primitive@>
8428primitive("lastkern",last_item,dimen_val);
8429@!@:last_kern_}{\.{\\lastkern} primitive@>
8430primitive("lastskip",last_item,glue_val);
8431@!@:last_skip_}{\.{\\lastskip} primitive@>
8432primitive("inputlineno",last_item,input_line_no_code);
8433@!@:input_line_no_}{\.{\\inputlineno} primitive@>
8434primitive("badness",last_item,badness_code);
8435@!@:badness_}{\.{\\badness} primitive@>
8436
8437@ @<Cases of |print_cmd_chr|...@>=
8438set_aux: if chr_code=vmode then print_esc("prevdepth")
8439@+else print_esc("spacefactor");
8440set_page_int: if chr_code=0 then print_esc("deadcycles")
8441@+else print_esc("insertpenalties");
8442set_box_dimen: if chr_code=width_offset then print_esc("wd")
8443else if chr_code=height_offset then print_esc("ht")
8444else print_esc("dp");
8445last_item: case chr_code of
8446  int_val: print_esc("lastpenalty");
8447  dimen_val: print_esc("lastkern");
8448  glue_val: print_esc("lastskip");
8449  input_line_no_code: print_esc("inputlineno");
8450  othercases print_esc("badness")
8451  endcases;
8452
8453@ @<Fetch the |space_factor| or the |prev_depth|@>=
8454if abs(mode)<>m then
8455  begin print_err("Improper "); print_cmd_chr(set_aux,m);
8456@.Improper \\spacefactor@>
8457@.Improper \\prevdepth@>
8458  help4("You can refer to \spacefactor only in horizontal mode;")@/
8459    ("you can refer to \prevdepth only in vertical mode; and")@/
8460    ("neither of these is meaningful inside \write. So")@/
8461    ("I'm forgetting what you said and using zero instead.");
8462  error;
8463  if level<>tok_val then scanned_result(0)(dimen_val)
8464  else scanned_result(0)(int_val);
8465  end
8466else if m=vmode then scanned_result(prev_depth)(dimen_val)
8467else scanned_result(space_factor)(int_val)
8468
8469@ @<Fetch the |dead_cycles| or the |insert_penalties|@>=
8470begin if m=0 then cur_val:=dead_cycles@+else cur_val:=insert_penalties;
8471cur_val_level:=int_val;
8472end
8473
8474@ @<Fetch a box dimension@>=
8475begin scan_eight_bit_int;
8476if box(cur_val)=null then cur_val:=0 @+else cur_val:=mem[box(cur_val)+m].sc;
8477cur_val_level:=dimen_val;
8478end
8479
8480@ Inside an \.{\\output} routine, a user may wish to look at the page totals
8481that were present at the moment when output was triggered.
8482
8483@d max_dimen==@'7777777777 {$2^{30}-1$}
8484
8485@<Fetch something on the |page_so_far|@>=
8486begin if (page_contents=empty) and (not output_active) then
8487  if m=0 then cur_val:=max_dimen@+else cur_val:=0
8488else cur_val:=page_so_far[m];
8489cur_val_level:=dimen_val;
8490end
8491
8492@ @<Fetch the |prev_graf|@>=
8493if mode=0 then scanned_result(0)(int_val) {|prev_graf=0| within \.{\\write}}
8494else begin nest[nest_ptr]:=cur_list; p:=nest_ptr;
8495  while abs(nest[p].mode_field)<>vmode do decr(p);
8496  scanned_result(nest[p].pg_field)(int_val);
8497  end
8498
8499@ @<Fetch the |par_shape| size@>=
8500begin if par_shape_ptr=null then cur_val:=0
8501else cur_val:=info(par_shape_ptr);
8502cur_val_level:=int_val;
8503end
8504
8505@ Here is where \.{\\lastpenalty}, \.{\\lastkern}, and \.{\\lastskip} are
8506implemented. The reference count for \.{\\lastskip} will be updated later.
8507
8508We also handle \.{\\inputlineno} and \.{\\badness} here, because they are
8509legal in similar contexts.
8510
8511@<Fetch an item in the current node...@>=
8512if cur_chr>glue_val then
8513  begin if cur_chr=input_line_no_code then cur_val:=line
8514  else cur_val:=last_badness; {|cur_chr=badness_code|}
8515  cur_val_level:=int_val;
8516  end
8517else begin if cur_chr=glue_val then cur_val:=zero_glue@+else cur_val:=0;
8518  cur_val_level:=cur_chr;
8519  if not is_char_node(tail)and(mode<>0) then
8520    case cur_chr of
8521    int_val: if type(tail)=penalty_node then cur_val:=penalty(tail);
8522    dimen_val: if type(tail)=kern_node then cur_val:=width(tail);
8523    glue_val: if type(tail)=glue_node then
8524      begin cur_val:=glue_ptr(tail);
8525      if subtype(tail)=mu_glue then cur_val_level:=mu_val;
8526      end;
8527    end {there are no other cases}
8528  else if (mode=vmode)and(tail=head) then
8529    case cur_chr of
8530    int_val: cur_val:=last_penalty;
8531    dimen_val: cur_val:=last_kern;
8532    glue_val: if last_glue<>max_halfword then cur_val:=last_glue;
8533    end; {there are no other cases}
8534  end
8535
8536@ @<Fetch a font dimension@>=
8537begin find_font_dimen(false); font_info[fmem_ptr].sc:=0;
8538scanned_result(font_info[cur_val].sc)(dimen_val);
8539end
8540
8541@ @<Fetch a font integer@>=
8542begin scan_font_ident;
8543if m=0 then scanned_result(hyphen_char[cur_val])(int_val)
8544else scanned_result(skew_char[cur_val])(int_val);
8545end
8546
8547@ @<Fetch a register@>=
8548begin scan_eight_bit_int;
8549case m of
8550int_val:cur_val:=count(cur_val);
8551dimen_val:cur_val:=dimen(cur_val);
8552glue_val: cur_val:=skip(cur_val);
8553mu_val: cur_val:=mu_skip(cur_val);
8554end; {there are no other cases}
8555cur_val_level:=m;
8556end
8557
8558@ @<Complain that \.{\\the} can't do this; give zero result@>=
8559begin print_err("You can't use `"); print_cmd_chr(cur_cmd,cur_chr);
8560@.You can't use x after ...@>
8561print("' after "); print_esc("the");
8562help1("I'm forgetting what you said and using zero instead.");
8563error;
8564if level<>tok_val then scanned_result(0)(dimen_val)
8565else scanned_result(0)(int_val);
8566end
8567
8568@ When a |glue_val| changes to a |dimen_val|, we use the width component
8569of the glue; there is no need to decrease the reference count, since it
8570has not yet been increased.  When a |dimen_val| changes to an |int_val|,
8571we use scaled points so that the value doesn't actually change. And when a
8572|mu_val| changes to a |glue_val|, the value doesn't change either.
8573
8574@<Convert \(c)|cur_val| to a lower level@>=
8575begin if cur_val_level=glue_val then cur_val:=width(cur_val)
8576else if cur_val_level=mu_val then mu_error;
8577decr(cur_val_level);
8578end
8579
8580@ If |cur_val| points to a glue specification at this point, the reference
8581count for the glue does not yet include the reference by |cur_val|.
8582If |negative| is |true|, |cur_val_level| is known to be |<=mu_val|.
8583
8584@<Fix the reference count, if any, ...@>=
8585if negative then
8586  if cur_val_level>=glue_val then
8587    begin cur_val:=new_spec(cur_val);
8588    @<Negate all three glue components of |cur_val|@>;
8589    end
8590  else negate(cur_val)
8591else if (cur_val_level>=glue_val)and(cur_val_level<=mu_val) then
8592  add_glue_ref(cur_val)
8593
8594@ @<Negate all three...@>=
8595begin negate(width(cur_val));
8596negate(stretch(cur_val));
8597negate(shrink(cur_val));
8598end
8599
8600@ Our next goal is to write the |scan_int| procedure, which scans anything that
8601\TeX\ treats as an integer. But first we might as well look at some simple
8602applications of |scan_int| that have already been made inside of
8603|scan_something_internal|.
8604
8605@ @<Declare procedures that scan restricted classes of integers@>=
8606procedure scan_eight_bit_int;
8607begin scan_int;
8608if (cur_val<0)or(cur_val>255) then
8609  begin print_err("Bad register code");
8610@.Bad register code@>
8611  help2("A register number must be between 0 and 255.")@/
8612    ("I changed this one to zero."); int_error(cur_val); cur_val:=0;
8613  end;
8614end;
8615
8616@ @<Declare procedures that scan restricted classes of integers@>=
8617procedure scan_char_num;
8618begin scan_int;
8619if (cur_val<0)or(cur_val>255) then
8620  begin print_err("Bad character code");
8621@.Bad character code@>
8622  help2("A character number must be between 0 and 255.")@/
8623    ("I changed this one to zero."); int_error(cur_val); cur_val:=0;
8624  end;
8625end;
8626
8627@ While we're at it, we might as well deal with similar routines that
8628will be needed later.
8629
8630@<Declare procedures that scan restricted classes of integers@>=
8631procedure scan_four_bit_int;
8632begin scan_int;
8633if (cur_val<0)or(cur_val>15) then
8634  begin print_err("Bad number");
8635@.Bad number@>
8636  help2("Since I expected to read a number between 0 and 15,")@/
8637    ("I changed this one to zero."); int_error(cur_val); cur_val:=0;
8638  end;
8639end;
8640
8641@ @<Declare procedures that scan restricted classes of integers@>=
8642procedure scan_fifteen_bit_int;
8643begin scan_int;
8644if (cur_val<0)or(cur_val>@'77777) then
8645  begin print_err("Bad mathchar");
8646@.Bad mathchar@>
8647  help2("A mathchar number must be between 0 and 32767.")@/
8648    ("I changed this one to zero."); int_error(cur_val); cur_val:=0;
8649  end;
8650end;
8651
8652@ @<Declare procedures that scan restricted classes of integers@>=
8653procedure scan_twenty_seven_bit_int;
8654begin scan_int;
8655if (cur_val<0)or(cur_val>@'777777777) then
8656  begin print_err("Bad delimiter code");
8657@.Bad delimiter code@>
8658  help2("A numeric delimiter code must be between 0 and 2^{27}-1.")@/
8659    ("I changed this one to zero."); int_error(cur_val); cur_val:=0;
8660  end;
8661end;
8662
8663@ An integer number can be preceded by any number of spaces and `\.+' or
8664`\.-' signs. Then comes either a decimal constant (i.e., radix 10), an
8665octal constant (i.e., radix 8, preceded by~\.\'), a hexadecimal constant
8666(radix 16, preceded by~\."), an alphabetic constant (preceded by~\.\`), or
8667an internal variable. After scanning is complete,
8668|cur_val| will contain the answer, which must be at most
8669$2^{31}-1=2147483647$ in absolute value. The value of |radix| is set to
867010, 8, or 16 in the cases of decimal, octal, or hexadecimal constants,
8671otherwise |radix| is set to zero. An optional space follows a constant.
8672
8673@d octal_token=other_token+"'" {apostrophe, indicates an octal constant}
8674@d hex_token=other_token+"""" {double quote, indicates a hex constant}
8675@d alpha_token=other_token+"`" {reverse apostrophe, precedes alpha constants}
8676@d point_token=other_token+"." {decimal point}
8677@d continental_point_token=other_token+"," {decimal point, Eurostyle}
8678
8679@<Glob...@>=
8680@!radix:small_number; {|scan_int| sets this to 8, 10, 16, or zero}
8681
8682@ We initialize the following global variables just in case |expand|
8683comes into action before any of the basic scanning routines has assigned
8684them a value.
8685
8686@<Set init...@>=
8687cur_val:=0; cur_val_level:=int_val; radix:=0; cur_order:=normal;
8688
8689@ The |scan_int| routine is used also to scan the integer part of a
8690fraction; for example, the `\.3' in `\.{3.14159}' will be found by
8691|scan_int|. The |scan_dimen| routine assumes that |cur_tok=point_token|
8692after the integer part of such a fraction has been scanned by |scan_int|,
8693and that the decimal point has been backed up to be scanned again.
8694
8695@p procedure scan_int; {sets |cur_val| to an integer}
8696label done;
8697var negative:boolean; {should the answer be negated?}
8698@!m:integer; {|@t$2^{31}$@> div radix|, the threshold of danger}
8699@!d:small_number; {the digit just scanned}
8700@!vacuous:boolean; {have no digits appeared?}
8701@!OK_so_far:boolean; {has an error message been issued?}
8702begin radix:=0; OK_so_far:=true;@/
8703@<Get the next non-blank non-sign token; set |negative| appropriately@>;
8704if cur_tok=alpha_token then @<Scan an alphabetic character code into |cur_val|@>
8705else if (cur_cmd>=min_internal)and(cur_cmd<=max_internal) then
8706  scan_something_internal(int_val,false)
8707else @<Scan a numeric constant@>;
8708if negative then negate(cur_val);
8709end;
8710
8711@ @<Get the next non-blank non-sign token...@>=
8712negative:=false;
8713repeat @<Get the next non-blank non-call token@>;
8714if cur_tok=other_token+"-" then
8715  begin negative := not negative; cur_tok:=other_token+"+";
8716  end;
8717until cur_tok<>other_token+"+"
8718
8719@ A space is ignored after an alphabetic character constant, so that
8720such constants behave like numeric ones.
8721
8722@<Scan an alphabetic character code into |cur_val|@>=
8723begin get_token; {suppress macro expansion}
8724if cur_tok<cs_token_flag then
8725  begin cur_val:=cur_chr;
8726  if cur_cmd<=right_brace then
8727    if cur_cmd=right_brace then incr(align_state)
8728    else decr(align_state);
8729  end
8730else if cur_tok<cs_token_flag+single_base then
8731  cur_val:=cur_tok-cs_token_flag-active_base
8732else cur_val:=cur_tok-cs_token_flag-single_base;
8733if cur_val>255 then
8734  begin print_err("Improper alphabetic constant");
8735@.Improper alphabetic constant@>
8736  help2("A one-character control sequence belongs after a ` mark.")@/
8737    ("So I'm essentially inserting \0 here.");
8738  cur_val:="0"; back_error;
8739  end
8740else @<Scan an optional space@>;
8741end
8742
8743@ @<Scan an optional space@>=
8744begin get_x_token; if cur_cmd<>spacer then back_input;
8745end
8746
8747@ @<Scan a numeric constant@>=
8748begin radix:=10; m:=214748364;
8749if cur_tok=octal_token then
8750  begin radix:=8; m:=@'2000000000; get_x_token;
8751  end
8752else if cur_tok=hex_token then
8753  begin radix:=16; m:=@'1000000000; get_x_token;
8754  end;
8755vacuous:=true; cur_val:=0;@/
8756@<Accumulate the constant until |cur_tok| is not a suitable digit@>;
8757if vacuous then @<Express astonishment that no number was here@>
8758else if cur_cmd<>spacer then back_input;
8759end
8760
8761@ @d infinity==@'17777777777 {the largest positive value that \TeX\ knows}
8762@d zero_token=other_token+"0" {zero, the smallest digit}
8763@d A_token=letter_token+"A" {the smallest special hex digit}
8764@d other_A_token=other_token+"A" {special hex digit of type |other_char|}
8765
8766@<Accumulate the constant...@>=
8767loop@+  begin if (cur_tok<zero_token+radix)and(cur_tok>=zero_token)and
8768    (cur_tok<=zero_token+9) then d:=cur_tok-zero_token
8769  else if radix=16 then
8770    if (cur_tok<=A_token+5)and(cur_tok>=A_token) then d:=cur_tok-A_token+10
8771    else if (cur_tok<=other_A_token+5)and(cur_tok>=other_A_token) then
8772      d:=cur_tok-other_A_token+10
8773    else goto done
8774  else goto done;
8775  vacuous:=false;
8776  if (cur_val>=m)and((cur_val>m)or(d>7)or(radix<>10)) then
8777    begin if OK_so_far then
8778      begin print_err("Number too big");
8779@.Number too big@>
8780      help2("I can only go up to 2147483647='17777777777=""7FFFFFFF,")@/
8781        ("so I'm using that number instead of yours.");
8782      error; cur_val:=infinity; OK_so_far:=false;
8783      end;
8784    end
8785  else cur_val:=cur_val*radix+d;
8786  get_x_token;
8787  end;
8788done:
8789
8790@ @<Express astonishment...@>=
8791begin print_err("Missing number, treated as zero");
8792@.Missing number...@>
8793help3("A number should have been here; I inserted `0'.")@/
8794  ("(If you can't figure out why I needed to see a number,")@/
8795  ("look up `weird error' in the index to The TeXbook.)");
8796@:TeXbook}{\sl The \TeX book@>
8797back_error;
8798end
8799
8800@ The |scan_dimen| routine is similar to |scan_int|, but it sets |cur_val| to
8801a |scaled| value, i.e., an integral number of sp. One of its main tasks
8802is therefore to interpret the abbreviations for various kinds of units and
8803to convert measurements to scaled points.
8804
8805There are three parameters: |mu| is |true| if the finite units must be
8806`\.{mu}', while |mu| is |false| if `\.{mu}' units are disallowed;
8807|inf| is |true| if the infinite units `\.{fil}', `\.{fill}', `\.{filll}'
8808are permitted; and |shortcut| is |true| if |cur_val| already contains
8809an integer and only the units need to be considered.
8810
8811The order of infinity that was found in the case of infinite glue is returned
8812in the global variable |cur_order|.
8813
8814@<Glob...@>=
8815@!cur_order:glue_ord; {order of infinity found by |scan_dimen|}
8816
8817@ Constructions like `\.{-\'77 pt}' are legal dimensions, so |scan_dimen|
8818may begin with |scan_int|. This explains why it is convenient to use
8819|scan_int| also for the integer part of a decimal fraction.
8820
8821Several branches of |scan_dimen| work with |cur_val| as an integer and
8822with an auxiliary fraction |f|, so that the actual quantity of interest is
8823$|cur_val|+|f|/2^{16}$. At the end of the routine, this ``unpacked''
8824representation is put into the single word |cur_val|, which suddenly
8825switches significance from |integer| to |scaled|.
8826
8827@d attach_fraction=88 {go here to pack |cur_val| and |f| into |cur_val|}
8828@d attach_sign=89 {go here when |cur_val| is correct except perhaps for sign}
8829@d scan_normal_dimen==scan_dimen(false,false,false)
8830
8831@p procedure scan_dimen(@!mu,@!inf,@!shortcut:boolean);
8832  {sets |cur_val| to a dimension}
8833label done, done1, done2, found, not_found, attach_fraction, attach_sign;
8834var negative:boolean; {should the answer be negated?}
8835@!f:integer; {numerator of a fraction whose denominator is $2^{16}$}
8836@<Local variables for dimension calculations@>@;
8837begin f:=0; arith_error:=false; cur_order:=normal; negative:=false;
8838if not shortcut then
8839  begin @<Get the next non-blank non-sign...@>;
8840  if (cur_cmd>=min_internal)and(cur_cmd<=max_internal) then
8841    @<Fetch an internal dimension and |goto attach_sign|,
8842      or fetch an internal integer@>
8843  else  begin back_input;
8844    if cur_tok=continental_point_token then cur_tok:=point_token;
8845    if cur_tok<>point_token then scan_int
8846    else  begin radix:=10; cur_val:=0;
8847      end;
8848    if cur_tok=continental_point_token then cur_tok:=point_token;
8849    if (radix=10)and(cur_tok=point_token) then @<Scan decimal fraction@>;
8850    end;
8851  end;
8852if cur_val<0 then {in this case |f=0|}
8853  begin negative := not negative; negate(cur_val);
8854  end;
8855@<Scan units and set |cur_val| to $x\cdot(|cur_val|+f/2^{16})$, where there
8856  are |x| sp per unit; |goto attach_sign| if the units are internal@>;
8857@<Scan an optional space@>;
8858attach_sign: if arith_error or(abs(cur_val)>=@'10000000000) then
8859  @<Report that this dimension is out of range@>;
8860if negative then negate(cur_val);
8861end;
8862
8863@ @<Fetch an internal dimension and |goto attach_sign|...@>=
8864if mu then
8865  begin scan_something_internal(mu_val,false);
8866  @<Coerce glue to a dimension@>;
8867  if cur_val_level=mu_val then goto attach_sign;
8868  if cur_val_level<>int_val then mu_error;
8869  end
8870else  begin scan_something_internal(dimen_val,false);
8871  if cur_val_level=dimen_val then goto attach_sign;
8872  end
8873
8874@ @<Local variables for dimension calculations@>=
8875@!num,@!denom:1..65536; {conversion ratio for the scanned units}
8876@!k,@!kk:small_number; {number of digits in a decimal fraction}
8877@!p,@!q:pointer; {top of decimal digit stack}
8878@!v:scaled; {an internal dimension}
8879@!save_cur_val:integer; {temporary storage of |cur_val|}
8880
8881@ The following code is executed when |scan_something_internal| was
8882called asking for |mu_val|, when we really wanted a ``mudimen'' instead
8883of ``muglue.''
8884
8885@<Coerce glue to a dimension@>=
8886if cur_val_level>=glue_val then
8887  begin v:=width(cur_val); delete_glue_ref(cur_val); cur_val:=v;
8888  end
8889
8890@ When the following code is executed, we have |cur_tok=point_token|, but this
8891token has been backed up using |back_input|; we must first discard it.
8892
8893It turns out that a decimal point all by itself is equivalent to `\.{0.0}'.
8894Let's hope people don't use that fact.
8895
8896@<Scan decimal fraction@>=
8897begin k:=0; p:=null; get_token; {|point_token| is being re-scanned}
8898loop@+  begin get_x_token;
8899  if (cur_tok>zero_token+9)or(cur_tok<zero_token) then goto done1;
8900  if k<17 then {digits for |k>=17| cannot affect the result}
8901    begin q:=get_avail; link(q):=p; info(q):=cur_tok-zero_token;
8902    p:=q; incr(k);
8903    end;
8904  end;
8905done1: for kk:=k downto 1 do
8906  begin dig[kk-1]:=info(p); q:=p; p:=link(p); free_avail(q);
8907  end;
8908f:=round_decimals(k);
8909if cur_cmd<>spacer then back_input;
8910end
8911
8912@ Now comes the harder part: At this point in the program, |cur_val| is a
8913nonnegative integer and $f/2^{16}$ is a nonnegative fraction less than 1;
8914we want to multiply the sum of these two quantities by the appropriate
8915factor, based on the specified units, in order to produce a |scaled|
8916result, and we want to do the calculation with fixed point arithmetic that
8917does not overflow.
8918
8919@<Scan units and set |cur_val| to $x\cdot(|cur_val|+f/2^{16})$...@>=
8920if inf then @<Scan for \(f)\.{fil} units; |goto attach_fraction| if found@>;
8921@<Scan for \(u)units that are internal dimensions;
8922  |goto attach_sign| with |cur_val| set if found@>;
8923if mu then @<Scan for \(m)\.{mu} units and |goto attach_fraction|@>;
8924if scan_keyword("true") then @<Adjust \(f)for the magnification ratio@>;
8925@.true@>
8926if scan_keyword("pt") then goto attach_fraction; {the easy case}
8927@.pt@>
8928@<Scan for \(a)all other units and adjust |cur_val| and |f| accordingly;
8929  |goto done| in the case of scaled points@>;
8930attach_fraction: if cur_val>=@'40000 then arith_error:=true
8931else cur_val:=cur_val*unity+f;
8932done:
8933
8934@ A specification like `\.{filllll}' or `\.{fill L L L}' will lead to two
8935error messages (one for each additional keyword \.{"l"}).
8936
8937@<Scan for \(f)\.{fil} units...@>=
8938if scan_keyword("fil") then
8939@.fil@>
8940  begin cur_order:=fil;
8941  while scan_keyword("l") do
8942    begin if cur_order=filll then
8943      begin print_err("Illegal unit of measure (");
8944@.Illegal unit of measure@>
8945      print("replaced by filll)");
8946      help1("I dddon't go any higher than filll."); error;
8947      end
8948    else incr(cur_order);
8949    end;
8950  goto attach_fraction;
8951  end
8952
8953@ @<Scan for \(u)units that are internal dimensions...@>=
8954save_cur_val:=cur_val;
8955@<Get the next non-blank non-call...@>;
8956if (cur_cmd<min_internal)or(cur_cmd>max_internal) then back_input
8957else  begin if mu then
8958    begin scan_something_internal(mu_val,false); @<Coerce glue...@>;
8959    if cur_val_level<>mu_val then mu_error;
8960    end
8961  else scan_something_internal(dimen_val,false);
8962  v:=cur_val; goto found;
8963  end;
8964if mu then goto not_found;
8965if scan_keyword("em") then v:=(@<The em width for |cur_font|@>)
8966@.em@>
8967else if scan_keyword("ex") then v:=(@<The x-height for |cur_font|@>)
8968@.ex@>
8969else goto not_found;
8970@<Scan an optional space@>;
8971found:cur_val:=nx_plus_y(save_cur_val,v,xn_over_d(v,f,@'200000));
8972goto attach_sign;
8973not_found:
8974
8975@ @<Scan for \(m)\.{mu} units and |goto attach_fraction|@>=
8976if scan_keyword("mu") then goto attach_fraction
8977@.mu@>
8978else  begin print_err("Illegal unit of measure ("); print("mu inserted)");
8979@.Illegal unit of measure@>
8980  help4("The unit of measurement in math glue must be mu.")@/
8981    ("To recover gracefully from this error, it's best to")@/
8982    ("delete the erroneous units; e.g., type `2' to delete")@/
8983    ("two letters. (See Chapter 27 of The TeXbook.)");
8984@:TeXbook}{\sl The \TeX book@>
8985  error; goto attach_fraction;
8986  end
8987
8988@ @<Adjust \(f)for the magnification ratio@>=
8989begin prepare_mag;
8990if mag<>1000 then
8991  begin cur_val:=xn_over_d(cur_val,1000,mag);
8992  f:=(1000*f+@'200000*remainder) div mag;
8993  cur_val:=cur_val+(f div @'200000); f:=f mod @'200000;
8994  end;
8995end
8996
8997@ The necessary conversion factors can all be specified exactly as
8998fractions whose numerator and denominator sum to 32768 or less.
8999According to the definitions here, $\rm2660\,dd\approx1000.33297\,mm$;
9000this agrees well with the value $\rm1000.333\,mm$ cited by Bosshard
9001@^Bosshard, Hans Rudolf@>
9002in {\sl Technische Grundlagen zur Satzherstellung\/} (Bern, 1980).
9003
9004@d set_conversion_end(#)== denom:=#; end
9005@d set_conversion(#)==@+begin num:=#; set_conversion_end
9006
9007@<Scan for \(a)all other units and adjust |cur_val| and |f|...@>=
9008if scan_keyword("in") then set_conversion(7227)(100)
9009@.in@>
9010else if scan_keyword("pc") then set_conversion(12)(1)
9011@.pc@>
9012else if scan_keyword("cm") then set_conversion(7227)(254)
9013@.cm@>
9014else if scan_keyword("mm") then set_conversion(7227)(2540)
9015@.mm@>
9016else if scan_keyword("bp") then set_conversion(7227)(7200)
9017@.bp@>
9018else if scan_keyword("dd") then set_conversion(1238)(1157)
9019@.dd@>
9020else if scan_keyword("cc") then set_conversion(14856)(1157)
9021@.cc@>
9022else if scan_keyword("sp") then goto done
9023@.sp@>
9024else @<Complain about unknown unit and |goto done2|@>;
9025cur_val:=xn_over_d(cur_val,num,denom);
9026f:=(num*f+@'200000*remainder) div denom;@/
9027cur_val:=cur_val+(f div @'200000); f:=f mod @'200000;
9028done2:
9029
9030@ @<Complain about unknown unit...@>=
9031begin print_err("Illegal unit of measure ("); print("pt inserted)");
9032@.Illegal unit of measure@>
9033help6("Dimensions can be in units of em, ex, in, pt, pc,")@/
9034  ("cm, mm, dd, cc, bp, or sp; but yours is a new one!")@/
9035  ("I'll assume that you meant to say pt, for printer's points.")@/
9036  ("To recover gracefully from this error, it's best to")@/
9037  ("delete the erroneous units; e.g., type `2' to delete")@/
9038  ("two letters. (See Chapter 27 of The TeXbook.)");
9039@:TeXbook}{\sl The \TeX book@>
9040error; goto done2;
9041end
9042
9043
9044@ @<Report that this dimension is out of range@>=
9045begin print_err("Dimension too large");
9046@.Dimension too large@>
9047help2("I can't work with sizes bigger than about 19 feet.")@/
9048  ("Continue and I'll use the largest value I can.");@/
9049error; cur_val:=max_dimen; arith_error:=false;
9050end
9051
9052@ The final member of \TeX's value-scanning trio is |scan_glue|, which
9053makes |cur_val| point to a glue specification. The reference count of that
9054glue spec will take account of the fact that |cur_val| is pointing to~it.
9055
9056The |level| parameter should be either |glue_val| or |mu_val|.
9057
9058Since |scan_dimen| was so much more complex than |scan_int|, we might expect
9059|scan_glue| to be even worse. But fortunately, it is very simple, since
9060most of the work has already been done.
9061
9062@p procedure scan_glue(@!level:small_number);
9063  {sets |cur_val| to a glue spec pointer}
9064label exit;
9065var negative:boolean; {should the answer be negated?}
9066@!q:pointer; {new glue specification}
9067@!mu:boolean; {does |level=mu_val|?}
9068begin mu:=(level=mu_val); @<Get the next non-blank non-sign...@>;
9069if (cur_cmd>=min_internal)and(cur_cmd<=max_internal) then
9070  begin scan_something_internal(level,negative);
9071  if cur_val_level>=glue_val then
9072    begin if cur_val_level<>level then mu_error;
9073    return;
9074    end;
9075  if cur_val_level=int_val then scan_dimen(mu,false,true)
9076  else if level=mu_val then mu_error;
9077  end
9078else  begin back_input; scan_dimen(mu,false,false);
9079  if negative then negate(cur_val);
9080  end;
9081@<Create a new glue specification whose width is |cur_val|; scan for its
9082  stretch and shrink components@>;
9083exit:end;
9084
9085@ @<Create a new glue specification whose width is |cur_val|...@>=
9086q:=new_spec(zero_glue); width(q):=cur_val;
9087if scan_keyword("plus") then
9088@.plus@>
9089  begin scan_dimen(mu,true,false);
9090  stretch(q):=cur_val; stretch_order(q):=cur_order;
9091  end;
9092if scan_keyword("minus") then
9093@.minus@>
9094  begin scan_dimen(mu,true,false);
9095  shrink(q):=cur_val; shrink_order(q):=cur_order;
9096  end;
9097cur_val:=q
9098
9099@ Here's a similar procedure that returns a pointer to a rule node. This
9100routine is called just after \TeX\ has seen \.{\\hrule} or \.{\\vrule};
9101therefore |cur_cmd| will be either |hrule| or |vrule|. The idea is to store
9102the default rule dimensions in the node, then to override them if
9103`\.{height}' or `\.{width}' or `\.{depth}' specifications are
9104found (in any order).
9105
9106@d default_rule=26214 {0.4\thinspace pt}
9107
9108@p function scan_rule_spec:pointer;
9109label reswitch;
9110var q:pointer; {the rule node being created}
9111begin q:=new_rule; {|width|, |depth|, and |height| all equal |null_flag| now}
9112if cur_cmd=vrule then width(q):=default_rule
9113else  begin height(q):=default_rule; depth(q):=0;
9114  end;
9115reswitch: if scan_keyword("width") then
9116@.width@>
9117  begin scan_normal_dimen; width(q):=cur_val; goto reswitch;
9118  end;
9119if scan_keyword("height") then
9120@.height@>
9121  begin scan_normal_dimen; height(q):=cur_val; goto reswitch;
9122  end;
9123if scan_keyword("depth") then
9124@.depth@>
9125  begin scan_normal_dimen; depth(q):=cur_val; goto reswitch;
9126  end;
9127scan_rule_spec:=q;
9128end;
9129
9130@* \[27] Building token lists.
9131The token lists for macros and for other things like \.{\\mark} and \.{\\output}
9132and \.{\\write} are produced by a procedure called |scan_toks|.
9133
9134Before we get into the details of |scan_toks|, let's consider a much
9135simpler task, that of converting the current string into a token list.
9136The |str_toks| function does this; it classifies spaces as type |spacer|
9137and everything else as type |other_char|.
9138
9139The token list created by |str_toks| begins at |link(temp_head)| and ends
9140at the value |p| that is returned. (If |p=temp_head|, the list is empty.)
9141
9142@p function str_toks(@!b:pool_pointer):pointer;
9143  {changes the string |str_pool[b..pool_ptr]| to a token list}
9144var p:pointer; {tail of the token list}
9145@!q:pointer; {new node being added to the token list via |store_new_token|}
9146@!t:halfword; {token being appended}
9147@!k:pool_pointer; {index into |str_pool|}
9148begin str_room(1);
9149p:=temp_head; link(p):=null; k:=b;
9150while k<pool_ptr do
9151  begin t:=so(str_pool[k]);
9152  if t=" " then t:=space_token
9153  else t:=other_token+t;
9154  fast_store_new_token(t);
9155  incr(k);
9156  end;
9157pool_ptr:=b; str_toks:=p;
9158end;
9159
9160@ The main reason for wanting |str_toks| is the next function,
9161|the_toks|, which has similar input/output characteristics.
9162
9163This procedure is supposed to scan something like `\.{\\skip\\count12}',
9164i.e., whatever can follow `\.{\\the}', and it constructs a token list
9165containing something like `\.{-3.0pt minus 0.5fill}'.
9166
9167@p function the_toks:pointer;
9168var old_setting:0..max_selector; {holds |selector| setting}
9169@!p,@!q,@!r:pointer; {used for copying a token list}
9170@!b:pool_pointer; {base of temporary string}
9171begin get_x_token; scan_something_internal(tok_val,false);
9172if cur_val_level>=ident_val then @<Copy the token list@>
9173else begin old_setting:=selector; selector:=new_string; b:=pool_ptr;
9174  case cur_val_level of
9175  int_val:print_int(cur_val);
9176  dimen_val:begin print_scaled(cur_val); print("pt");
9177    end;
9178  glue_val: begin print_spec(cur_val,"pt"); delete_glue_ref(cur_val);
9179    end;
9180  mu_val: begin print_spec(cur_val,"mu"); delete_glue_ref(cur_val);
9181    end;
9182  end; {there are no other cases}
9183  selector:=old_setting; the_toks:=str_toks(b);
9184  end;
9185end;
9186
9187@ @<Copy the token list@>=
9188begin p:=temp_head; link(p):=null;
9189if cur_val_level=ident_val then store_new_token(cs_token_flag+cur_val)
9190else if cur_val<>null then
9191  begin r:=link(cur_val); {do not copy the reference count}
9192  while r<>null do
9193    begin fast_store_new_token(info(r)); r:=link(r);
9194    end;
9195  end;
9196the_toks:=p;
9197end
9198
9199@ Here's part of the |expand| subroutine that we are now ready to complete:
9200
9201@p procedure ins_the_toks;
9202begin link(garbage):=the_toks; ins_list(link(temp_head));
9203end;
9204
9205@ The primitives \.{\\number}, \.{\\romannumeral}, \.{\\string}, \.{\\meaning},
9206\.{\\fontname}, and \.{\\jobname} are defined as follows.
9207
9208@d number_code=0 {command code for \.{\\number}}
9209@d roman_numeral_code=1 {command code for \.{\\romannumeral}}
9210@d string_code=2 {command code for \.{\\string}}
9211@d meaning_code=3 {command code for \.{\\meaning}}
9212@d font_name_code=4 {command code for \.{\\fontname}}
9213@d job_name_code=5 {command code for \.{\\jobname}}
9214
9215@<Put each...@>=
9216primitive("number",convert,number_code);@/
9217@!@:number_}{\.{\\number} primitive@>
9218primitive("romannumeral",convert,roman_numeral_code);@/
9219@!@:roman_numeral_}{\.{\\romannumeral} primitive@>
9220primitive("string",convert,string_code);@/
9221@!@:string_}{\.{\\string} primitive@>
9222primitive("meaning",convert,meaning_code);@/
9223@!@:meaning_}{\.{\\meaning} primitive@>
9224primitive("fontname",convert,font_name_code);@/
9225@!@:font_name_}{\.{\\fontname} primitive@>
9226primitive("jobname",convert,job_name_code);@/
9227@!@:job_name_}{\.{\\jobname} primitive@>
9228
9229@ @<Cases of |print_cmd_chr|...@>=
9230convert: case chr_code of
9231  number_code: print_esc("number");
9232  roman_numeral_code: print_esc("romannumeral");
9233  string_code: print_esc("string");
9234  meaning_code: print_esc("meaning");
9235  font_name_code: print_esc("fontname");
9236  othercases print_esc("jobname")
9237  endcases;
9238
9239@ The procedure |conv_toks| uses |str_toks| to insert the token list
9240for |convert| functions into the scanner; `\.{\\outer}' control sequences
9241are allowed to follow `\.{\\string}' and `\.{\\meaning}'.
9242
9243@p procedure conv_toks;
9244var old_setting:0..max_selector; {holds |selector| setting}
9245@!c:number_code..job_name_code; {desired type of conversion}
9246@!save_scanner_status:small_number; {|scanner_status| upon entry}
9247@!b:pool_pointer; {base of temporary string}
9248begin c:=cur_chr; @<Scan the argument for command |c|@>;
9249old_setting:=selector; selector:=new_string; b:=pool_ptr;
9250@<Print the result of command |c|@>;
9251selector:=old_setting; link(garbage):=str_toks(b); ins_list(link(temp_head));
9252end;
9253
9254@ @<Scan the argument for command |c|@>=
9255case c of
9256number_code,roman_numeral_code: scan_int;
9257string_code, meaning_code: begin save_scanner_status:=scanner_status;
9258  scanner_status:=normal; get_token; scanner_status:=save_scanner_status;
9259  end;
9260font_name_code: scan_font_ident;
9261job_name_code: if job_name=0 then open_log_file;
9262end {there are no other cases}
9263
9264@ @<Print the result of command |c|@>=
9265case c of
9266number_code: print_int(cur_val);
9267roman_numeral_code: print_roman_int(cur_val);
9268string_code:if cur_cs<>0 then sprint_cs(cur_cs)
9269  else print_char(cur_chr);
9270meaning_code: print_meaning;
9271font_name_code: begin print(font_name[cur_val]);
9272  if font_size[cur_val]<>font_dsize[cur_val] then
9273    begin print(" at "); print_scaled(font_size[cur_val]);
9274    print("pt");
9275    end;
9276  end;
9277job_name_code: print(job_name);
9278end {there are no other cases}
9279
9280@ Now we can't postpone the difficulties any longer; we must bravely tackle
9281|scan_toks|. This function returns a pointer to the tail of a new token
9282list, and it also makes |def_ref| point to the reference count at the
9283head of that list.
9284
9285There are two boolean parameters, |macro_def| and |xpand|. If |macro_def|
9286is true, the goal is to create the token list for a macro definition;
9287otherwise the goal is to create the token list for some other \TeX\
9288primitive: \.{\\mark}, \.{\\output}, \.{\\everypar}, \.{\\lowercase},
9289\.{\\uppercase}, \.{\\message}, \.{\\errmessage}, \.{\\write}, or
9290\.{\\special}. In the latter cases a left brace must be scanned next; this
9291left brace will not be part of the token list, nor will the matching right
9292brace that comes at the end. If |xpand| is false, the token list will
9293simply be copied from the input using |get_token|. Otherwise all expandable
9294tokens will be expanded until unexpandable tokens are left, except that
9295the results of expanding `\.{\\the}' are not expanded further.
9296If both |macro_def| and |xpand| are true, the expansion applies
9297only to the macro body (i.e., to the material following the first
9298|left_brace| character).
9299
9300The value of |cur_cs| when |scan_toks| begins should be the |eqtb|
9301address of the control sequence to display in ``runaway'' error
9302messages.
9303
9304@p function scan_toks(@!macro_def,@!xpand:boolean):pointer;
9305label found,done,done1,done2;
9306var t:halfword; {token representing the highest parameter number}
9307@!s:halfword; {saved token}
9308@!p:pointer; {tail of the token list being built}
9309@!q:pointer; {new node being added to the token list via |store_new_token|}
9310@!unbalance:halfword; {number of unmatched left braces}
9311@!hash_brace:halfword; {possible `\.{\#\{}' token}
9312begin if macro_def then scanner_status:=defining
9313@+else scanner_status:=absorbing;
9314warning_index:=cur_cs; def_ref:=get_avail; token_ref_count(def_ref):=null;
9315p:=def_ref; hash_brace:=0; t:=zero_token;
9316if macro_def then @<Scan and build the parameter part of the macro definition@>
9317else scan_left_brace; {remove the compulsory left brace}
9318@<Scan and build the body of the token list; |goto found| when finished@>;
9319found: scanner_status:=normal;
9320if hash_brace<>0 then store_new_token(hash_brace);
9321scan_toks:=p;
9322end;
9323
9324@ @<Scan and build the parameter part...@>=
9325begin loop begin get_token; {set |cur_cmd|, |cur_chr|, |cur_tok|}
9326  if cur_tok<right_brace_limit then goto done1;
9327  if cur_cmd=mac_param then
9328    @<If the next character is a parameter number, make |cur_tok|
9329      a |match| token; but if it is a left brace, store
9330      `|left_brace|, |end_match|', set |hash_brace|, and |goto done|@>;
9331  store_new_token(cur_tok);
9332  end;
9333done1: store_new_token(end_match_token);
9334if cur_cmd=right_brace then
9335  @<Express shock at the missing left brace; |goto found|@>;
9336done: end
9337
9338@ @<Express shock...@>=
9339begin print_err("Missing { inserted"); incr(align_state);
9340@.Missing \{ inserted@>
9341help2("Where was the left brace? You said something like `\def\a}',")@/
9342  ("which I'm going to interpret as `\def\a{}'."); error; goto found;
9343end
9344
9345@ @<If the next character is a parameter number...@>=
9346begin s:=match_token+cur_chr; get_token;
9347if cur_cmd=left_brace then
9348  begin hash_brace:=cur_tok;
9349  store_new_token(cur_tok); store_new_token(end_match_token);
9350  goto done;
9351  end;
9352if t=zero_token+9 then
9353  begin print_err("You already have nine parameters");
9354@.You already have nine...@>
9355  help1("I'm going to ignore the # sign you just used."); error;
9356  end
9357else  begin incr(t);
9358  if cur_tok<>t then
9359    begin print_err("Parameters must be numbered consecutively");
9360@.Parameters...consecutively@>
9361    help2("I've inserted the digit you should have used after the #.")@/
9362      ("Type `1' to delete what you did use."); back_error;
9363    end;
9364  cur_tok:=s;
9365  end;
9366end
9367
9368@ @<Scan and build the body of the token list; |goto found| when finished@>=
9369unbalance:=1;
9370loop@+  begin if xpand then @<Expand the next part of the input@>
9371  else get_token;
9372  if cur_tok<right_brace_limit then
9373    if cur_cmd<right_brace then incr(unbalance)
9374    else  begin decr(unbalance);
9375      if unbalance=0 then goto found;
9376      end
9377  else if cur_cmd=mac_param then
9378    if macro_def then @<Look for parameter number or \.{\#\#}@>;
9379  store_new_token(cur_tok);
9380  end
9381
9382@ Here we insert an entire token list created by |the_toks| without
9383expanding it further.
9384
9385@<Expand the next part of the input@>=
9386begin loop begin get_next;
9387  if cur_cmd<=max_command then goto done2;
9388  if cur_cmd<>the then expand
9389  else  begin q:=the_toks;
9390    if link(temp_head)<>null then
9391      begin link(p):=link(temp_head); p:=q;
9392      end;
9393    end;
9394  end;
9395done2: x_token
9396end
9397
9398@ @<Look for parameter number...@>=
9399begin s:=cur_tok;
9400if xpand then get_x_token else get_token;
9401if cur_cmd<>mac_param then
9402  if (cur_tok<=zero_token)or(cur_tok>t) then
9403    begin print_err("Illegal parameter number in definition of ");
9404@.Illegal parameter number...@>
9405    sprint_cs(warning_index);
9406    help3("You meant to type ## instead of #, right?")@/
9407    ("Or maybe a } was forgotten somewhere earlier, and things")@/
9408    ("are all screwed up? I'm going to assume that you meant ##.");
9409    back_error; cur_tok:=s;
9410    end
9411  else cur_tok:=out_param_token-"0"+cur_chr;
9412end
9413
9414@ Another way to create a token list is via the \.{\\read} command. The
9415sixteen files potentially usable for reading appear in the following
9416global variables. The value of |read_open[n]| will be |closed| if
9417stream number |n| has not been opened or if it has been fully read;
9418|just_open| if an \.{\\openin} but not a \.{\\read} has been done;
9419and |normal| if it is open and ready to read the next line.
9420
9421@d closed=2 {not open, or at end of file}
9422@d just_open=1 {newly opened, first line not yet read}
9423
9424@<Glob...@>=
9425@!read_file:array[0..15] of alpha_file; {used for \.{\\read}}
9426@!read_open:array[0..16] of normal..closed; {state of |read_file[n]|}
9427
9428@ @<Set init...@>=
9429for k:=0 to 16 do read_open[k]:=closed;
9430
9431@ The |read_toks| procedure constructs a token list like that for any
9432macro definition, and makes |cur_val| point to it. Parameter |r| points
9433to the control sequence that will receive this token list.
9434
9435@p procedure read_toks(@!n:integer;@!r:pointer);
9436label done;
9437var p:pointer; {tail of the token list}
9438@!q:pointer; {new node being added to the token list via |store_new_token|}
9439@!s:integer; {saved value of |align_state|}
9440@!m:small_number; {stream number}
9441begin scanner_status:=defining; warning_index:=r;
9442def_ref:=get_avail; token_ref_count(def_ref):=null;
9443p:=def_ref; {the reference count}
9444store_new_token(end_match_token);
9445if (n<0)or(n>15) then m:=16@+else m:=n;
9446s:=align_state; align_state:=1000000; {disable tab marks, etc.}
9447repeat @<Input and store tokens from the next line of the file@>;
9448until align_state=1000000;
9449cur_val:=def_ref; scanner_status:=normal; align_state:=s;
9450end;
9451
9452@ @<Input and store tokens from the next line of the file@>=
9453begin_file_reading; name:=m+1;
9454if read_open[m]=closed then @<Input for \.{\\read} from the terminal@>
9455else if read_open[m]=just_open then @<Input the first line of |read_file[m]|@>
9456else @<Input the next line of |read_file[m]|@>;
9457limit:=last;
9458if end_line_char_inactive then decr(limit)
9459else  buffer[limit]:=end_line_char;
9460first:=limit+1; loc:=start; state:=new_line;@/
9461loop@+  begin get_token;
9462  if cur_tok=0 then goto done;
9463    {|cur_cmd=cur_chr=0| will occur at the end of the line}
9464  if align_state<1000000 then {unmatched `\.\}' aborts the line}
9465    begin repeat get_token; until cur_tok=0;
9466    align_state:=1000000; goto done;
9467    end;
9468  store_new_token(cur_tok);
9469  end;
9470done: end_file_reading
9471
9472@ Here we input on-line into the |buffer| array, prompting the user explicitly
9473if |n>=0|.  The value of |n| is set negative so that additional prompts
9474will not be given in the case of multi-line input.
9475
9476@<Input for \.{\\read} from the terminal@>=
9477if interaction>nonstop_mode then
9478  if n<0 then prompt_input("")
9479  else  begin wake_up_terminal;
9480    print_ln; sprint_cs(r); prompt_input("="); n:=-1;
9481    end
9482else fatal_error("*** (cannot \read from terminal in nonstop modes)")
9483@.cannot \\read@>
9484
9485@ The first line of a file must be treated specially, since |input_ln|
9486must be told not to start with |get|.
9487@^system dependencies@>
9488
9489@<Input the first line of |read_file[m]|@>=
9490if input_ln(read_file[m],false) then read_open[m]:=normal
9491else  begin a_close(read_file[m]); read_open[m]:=closed;
9492  end
9493
9494@ An empty line is appended at the end of a |read_file|.
9495@^empty line at end of file@>
9496
9497@<Input the next line of |read_file[m]|@>=
9498begin if not input_ln(read_file[m],true) then
9499  begin a_close(read_file[m]); read_open[m]:=closed;
9500  if align_state<>1000000 then
9501    begin runaway;
9502    print_err("File ended within "); print_esc("read");
9503@.File ended within \\read@>
9504    help1("This \read has unbalanced braces.");
9505    align_state:=1000000; error;
9506    end;
9507  end;
9508end
9509
9510@* \[28] Conditional processing.
9511We consider now the way \TeX\ handles various kinds of \.{\\if} commands.
9512
9513@d if_char_code=0 { `\.{\\if}' }
9514@d if_cat_code=1 { `\.{\\ifcat}' }
9515@d if_int_code=2 { `\.{\\ifnum}' }
9516@d if_dim_code=3 { `\.{\\ifdim}' }
9517@d if_odd_code=4 { `\.{\\ifodd}' }
9518@d if_vmode_code=5 { `\.{\\ifvmode}' }
9519@d if_hmode_code=6 { `\.{\\ifhmode}' }
9520@d if_mmode_code=7 { `\.{\\ifmmode}' }
9521@d if_inner_code=8 { `\.{\\ifinner}' }
9522@d if_void_code=9 { `\.{\\ifvoid}' }
9523@d if_hbox_code=10 { `\.{\\ifhbox}' }
9524@d if_vbox_code=11 { `\.{\\ifvbox}' }
9525@d ifx_code=12 { `\.{\\ifx}' }
9526@d if_eof_code=13 { `\.{\\ifeof}' }
9527@d if_true_code=14 { `\.{\\iftrue}' }
9528@d if_false_code=15 { `\.{\\iffalse}' }
9529@d if_case_code=16 { `\.{\\ifcase}' }
9530
9531@<Put each...@>=
9532primitive("if",if_test,if_char_code);
9533@!@:if_char_}{\.{\\if} primitive@>
9534primitive("ifcat",if_test,if_cat_code);
9535@!@:if_cat_code_}{\.{\\ifcat} primitive@>
9536primitive("ifnum",if_test,if_int_code);
9537@!@:if_int_}{\.{\\ifnum} primitive@>
9538primitive("ifdim",if_test,if_dim_code);
9539@!@:if_dim_}{\.{\\ifdim} primitive@>
9540primitive("ifodd",if_test,if_odd_code);
9541@!@:if_odd_}{\.{\\ifodd} primitive@>
9542primitive("ifvmode",if_test,if_vmode_code);
9543@!@:if_vmode_}{\.{\\ifvmode} primitive@>
9544primitive("ifhmode",if_test,if_hmode_code);
9545@!@:if_hmode_}{\.{\\ifhmode} primitive@>
9546primitive("ifmmode",if_test,if_mmode_code);
9547@!@:if_mmode_}{\.{\\ifmmode} primitive@>
9548primitive("ifinner",if_test,if_inner_code);
9549@!@:if_inner_}{\.{\\ifinner} primitive@>
9550primitive("ifvoid",if_test,if_void_code);
9551@!@:if_void_}{\.{\\ifvoid} primitive@>
9552primitive("ifhbox",if_test,if_hbox_code);
9553@!@:if_hbox_}{\.{\\ifhbox} primitive@>
9554primitive("ifvbox",if_test,if_vbox_code);
9555@!@:if_vbox_}{\.{\\ifvbox} primitive@>
9556primitive("ifx",if_test,ifx_code);
9557@!@:ifx_}{\.{\\ifx} primitive@>
9558primitive("ifeof",if_test,if_eof_code);
9559@!@:if_eof_}{\.{\\ifeof} primitive@>
9560primitive("iftrue",if_test,if_true_code);
9561@!@:if_true_}{\.{\\iftrue} primitive@>
9562primitive("iffalse",if_test,if_false_code);
9563@!@:if_false_}{\.{\\iffalse} primitive@>
9564primitive("ifcase",if_test,if_case_code);
9565@!@:if_case_}{\.{\\ifcase} primitive@>
9566
9567@ @<Cases of |print_cmd_chr|...@>=
9568if_test: case chr_code of
9569  if_cat_code:print_esc("ifcat");
9570  if_int_code:print_esc("ifnum");
9571  if_dim_code:print_esc("ifdim");
9572  if_odd_code:print_esc("ifodd");
9573  if_vmode_code:print_esc("ifvmode");
9574  if_hmode_code:print_esc("ifhmode");
9575  if_mmode_code:print_esc("ifmmode");
9576  if_inner_code:print_esc("ifinner");
9577  if_void_code:print_esc("ifvoid");
9578  if_hbox_code:print_esc("ifhbox");
9579  if_vbox_code:print_esc("ifvbox");
9580  ifx_code:print_esc("ifx");
9581  if_eof_code:print_esc("ifeof");
9582  if_true_code:print_esc("iftrue");
9583  if_false_code:print_esc("iffalse");
9584  if_case_code:print_esc("ifcase");
9585  othercases print_esc("if")
9586  endcases;
9587
9588@ Conditions can be inside conditions, and this nesting has a stack
9589that is independent of the |save_stack|.
9590
9591Four global variables represent the top of the condition stack:
9592|cond_ptr| points to pushed-down entries, if any; |if_limit| specifies
9593the largest code of a |fi_or_else| command that is syntactically legal;
9594|cur_if| is the name of the current type of conditional; and |if_line|
9595is the line number at which it began.
9596
9597If no conditions are currently in progress, the condition stack has the
9598special state |cond_ptr=null|, |if_limit=normal|, |cur_if=0|, |if_line=0|.
9599Otherwise |cond_ptr| points to a two-word node; the |type|, |subtype|, and
9600|link| fields of the first word contain |if_limit|, |cur_if|, and
9601|cond_ptr| at the next level, and the second word contains the
9602corresponding |if_line|.
9603
9604@d if_node_size=2 {number of words in stack entry for conditionals}
9605@d if_line_field(#)==mem[#+1].int
9606@d if_code=1 {code for \.{\\if...} being evaluated}
9607@d fi_code=2 {code for \.{\\fi}}
9608@d else_code=3 {code for \.{\\else}}
9609@d or_code=4 {code for \.{\\or}}
9610
9611@<Glob...@>=
9612@!cond_ptr:pointer; {top of the condition stack}
9613@!if_limit:normal..or_code; {upper bound on |fi_or_else| codes}
9614@!cur_if:small_number; {type of conditional being worked on}
9615@!if_line:integer; {line where that conditional began}
9616
9617@ @<Set init...@>=
9618cond_ptr:=null; if_limit:=normal; cur_if:=0; if_line:=0;
9619
9620@ @<Put each...@>=
9621primitive("fi",fi_or_else,fi_code);
9622@!@:fi_}{\.{\\fi} primitive@>
9623text(frozen_fi):="fi"; eqtb[frozen_fi]:=eqtb[cur_val];
9624primitive("or",fi_or_else,or_code);
9625@!@:or_}{\.{\\or} primitive@>
9626primitive("else",fi_or_else,else_code);
9627@!@:else_}{\.{\\else} primitive@>
9628
9629@ @<Cases of |print_cmd_chr|...@>=
9630fi_or_else: if chr_code=fi_code then print_esc("fi")
9631  else if chr_code=or_code then print_esc("or")
9632  else print_esc("else");
9633
9634@ When we skip conditional text, we keep track of the line number
9635where skipping began, for use in error messages.
9636
9637@<Glob...@>=
9638@!skip_line:integer; {skipping began here}
9639
9640@ Here is a procedure that ignores text until coming to an \.{\\or},
9641\.{\\else}, or \.{\\fi} at level zero of $\.{\\if}\ldots\.{\\fi}$
9642nesting. After it has acted, |cur_chr| will indicate the token that
9643was found, but |cur_tok| will not be set (because this makes the
9644procedure run faster).
9645
9646@p procedure pass_text;
9647label done;
9648var l:integer; {level of $\.{\\if}\ldots\.{\\fi}$ nesting}
9649@!save_scanner_status:small_number; {|scanner_status| upon entry}
9650begin save_scanner_status:=scanner_status; scanner_status:=skipping; l:=0;
9651skip_line:=line;
9652loop@+  begin get_next;
9653  if cur_cmd=fi_or_else then
9654    begin if l=0 then goto done;
9655    if cur_chr=fi_code then decr(l);
9656    end
9657  else if cur_cmd=if_test then incr(l);
9658  end;
9659done: scanner_status:=save_scanner_status;
9660end;
9661
9662@ When we begin to process a new \.{\\if}, we set |if_limit:=if_code|; then
9663if\/ \.{\\or} or \.{\\else} or \.{\\fi} occurs before the current \.{\\if}
9664condition has been evaluated, \.{\\relax} will be inserted.
9665For example, a sequence of commands like `\.{\\ifvoid1\\else...\\fi}'
9666would otherwise require something after the `\.1'.
9667
9668@<Push the condition stack@>=
9669begin p:=get_node(if_node_size); link(p):=cond_ptr; type(p):=if_limit;
9670subtype(p):=cur_if; if_line_field(p):=if_line;
9671cond_ptr:=p; cur_if:=cur_chr; if_limit:=if_code; if_line:=line;
9672end
9673
9674@ @<Pop the condition stack@>=
9675begin p:=cond_ptr; if_line:=if_line_field(p);
9676cur_if:=subtype(p); if_limit:=type(p); cond_ptr:=link(p);
9677free_node(p,if_node_size);
9678end
9679
9680@ Here's a procedure that changes the |if_limit| code corresponding to
9681a given value of |cond_ptr|.
9682
9683@p procedure change_if_limit(@!l:small_number;@!p:pointer);
9684label exit;
9685var q:pointer;
9686begin if p=cond_ptr then if_limit:=l {that's the easy case}
9687else  begin q:=cond_ptr;
9688  loop@+  begin if q=null then confusion("if");
9689@:this can't happen if}{\quad if@>
9690    if link(q)=p then
9691      begin type(q):=l; return;
9692      end;
9693    q:=link(q);
9694    end;
9695  end;
9696exit:end;
9697
9698@ A condition is started when the |expand| procedure encounters
9699an |if_test| command; in that case |expand| reduces to |conditional|,
9700which is a recursive procedure.
9701@^recursion@>
9702
9703@p procedure conditional;
9704label exit,common_ending;
9705var b:boolean; {is the condition true?}
9706@!r:"<"..">"; {relation to be evaluated}
9707@!m,@!n:integer; {to be tested against the second operand}
9708@!p,@!q:pointer; {for traversing token lists in \.{\\ifx} tests}
9709@!save_scanner_status:small_number; {|scanner_status| upon entry}
9710@!save_cond_ptr:pointer; {|cond_ptr| corresponding to this conditional}
9711@!this_if:small_number; {type of this conditional}
9712begin @<Push the condition stack@>;@+save_cond_ptr:=cond_ptr;this_if:=cur_chr;@/
9713@<Either process \.{\\ifcase} or set |b| to the value of a boolean condition@>;
9714if tracing_commands>1 then @<Display the value of |b|@>;
9715if b then
9716  begin change_if_limit(else_code,save_cond_ptr);
9717  return; {wait for \.{\\else} or \.{\\fi}}
9718  end;
9719@<Skip to \.{\\else} or \.{\\fi}, then |goto common_ending|@>;
9720common_ending: if cur_chr=fi_code then @<Pop the condition stack@>
9721else if_limit:=fi_code; {wait for \.{\\fi}}
9722exit:end;
9723
9724@ In a construction like `\.{\\if\\iftrue abc\\else d\\fi}', the first
9725\.{\\else} that we come to after learning that the \.{\\if} is false is
9726not the \.{\\else} we're looking for. Hence the following curious
9727logic is needed.
9728
9729@ @<Skip to \.{\\else} or \.{\\fi}...@>=
9730loop@+  begin pass_text;
9731  if cond_ptr=save_cond_ptr then
9732    begin if cur_chr<>or_code then goto common_ending;
9733    print_err("Extra "); print_esc("or");
9734@.Extra \\or@>
9735    help1("I'm ignoring this; it doesn't match any \if.");
9736    error;
9737    end
9738  else if cur_chr=fi_code then @<Pop the condition stack@>;
9739  end
9740
9741@ @<Either process \.{\\ifcase} or set |b|...@>=
9742case this_if of
9743if_char_code, if_cat_code: @<Test if two characters match@>;
9744if_int_code, if_dim_code: @<Test relation between integers or dimensions@>;
9745if_odd_code: @<Test if an integer is odd@>;
9746if_vmode_code: b:=(abs(mode)=vmode);
9747if_hmode_code: b:=(abs(mode)=hmode);
9748if_mmode_code: b:=(abs(mode)=mmode);
9749if_inner_code: b:=(mode<0);
9750if_void_code, if_hbox_code, if_vbox_code: @<Test box register status@>;
9751ifx_code: @<Test if two tokens match@>;
9752if_eof_code: begin scan_four_bit_int; b:=(read_open[cur_val]=closed);
9753  end;
9754if_true_code: b:=true;
9755if_false_code: b:=false;
9756if_case_code: @<Select the appropriate case
9757  and |return| or |goto common_ending|@>;
9758end {there are no other cases}
9759
9760@ @<Display the value of |b|@>=
9761begin begin_diagnostic;
9762if b then print("{true}")@+else print("{false}");
9763end_diagnostic(false);
9764end
9765
9766@ Here we use the fact that |"<"|, |"="|, and |">"| are consecutive ASCII
9767codes.
9768@^ASCII code@>
9769
9770@<Test relation between integers or dimensions@>=
9771begin if this_if=if_int_code then scan_int@+else scan_normal_dimen;
9772n:=cur_val; @<Get the next non-blank non-call...@>;
9773if (cur_tok>=other_token+"<")and(cur_tok<=other_token+">") then
9774  r:=cur_tok-other_token
9775else  begin print_err("Missing = inserted for ");
9776@.Missing = inserted@>
9777  print_cmd_chr(if_test,this_if);
9778  help1("I was expecting to see `<', `=', or `>'. Didn't.");
9779  back_error; r:="=";
9780  end;
9781if this_if=if_int_code then scan_int@+else scan_normal_dimen;
9782case r of
9783"<": b:=(n<cur_val);
9784"=": b:=(n=cur_val);
9785">": b:=(n>cur_val);
9786end;
9787end
9788
9789@ @<Test if an integer is odd@>=
9790begin scan_int; b:=odd(cur_val);
9791end
9792
9793@ @<Test box register status@>=
9794begin scan_eight_bit_int; p:=box(cur_val);
9795if this_if=if_void_code then b:=(p=null)
9796else if p=null then b:=false
9797else if this_if=if_hbox_code then b:=(type(p)=hlist_node)
9798else b:=(type(p)=vlist_node);
9799end
9800
9801@ An active character will be treated as category 13 following
9802\.{\\if\\noexpand} or following \.{\\ifcat\\noexpand}. We use the fact that
9803active characters have the smallest tokens, among all control sequences.
9804
9805@d get_x_token_or_active_char==@t@>@;
9806  begin get_x_token;
9807  if cur_cmd=relax then if cur_chr=no_expand_flag then
9808    begin cur_cmd:=active_char;
9809    cur_chr:=cur_tok-cs_token_flag-active_base;
9810    end;
9811  end
9812
9813@<Test if two characters match@>=
9814begin get_x_token_or_active_char;
9815if (cur_cmd>active_char)or(cur_chr>255) then {not a character}
9816  begin m:=relax; n:=256;
9817  end
9818else  begin m:=cur_cmd; n:=cur_chr;
9819  end;
9820get_x_token_or_active_char;
9821if (cur_cmd>active_char)or(cur_chr>255) then
9822  begin cur_cmd:=relax; cur_chr:=256;
9823  end;
9824if this_if=if_char_code then b:=(n=cur_chr)@+else b:=(m=cur_cmd);
9825end
9826
9827@ Note that `\.{\\ifx}' will declare two macros different if one is \\{long}
9828or \\{outer} and the other isn't, even though the texts of the macros are
9829the same.
9830
9831We need to reset |scanner_status|, since \.{\\outer} control sequences
9832are allowed, but we might be scanning a macro definition or preamble.
9833
9834@<Test if two tokens match@>=
9835begin save_scanner_status:=scanner_status; scanner_status:=normal;
9836get_next; n:=cur_cs; p:=cur_cmd; q:=cur_chr;
9837get_next; if cur_cmd<>p then b:=false
9838else if cur_cmd<call then b:=(cur_chr=q)
9839else @<Test if two macro texts match@>;
9840scanner_status:=save_scanner_status;
9841end
9842
9843@ Note also that `\.{\\ifx}' decides that macros \.{\\a} and \.{\\b} are
9844different in examples like this:
9845$$\vbox{\halign{\.{#}\hfil&\qquad\.{#}\hfil\cr
9846  {}\\def\\a\{\\c\}&
9847  {}\\def\\c\{\}\cr
9848  {}\\def\\b\{\\d\}&
9849  {}\\def\\d\{\}\cr}}$$
9850
9851@<Test if two macro texts match@>=
9852begin p:=link(cur_chr); q:=link(equiv(n)); {omit reference counts}
9853if p=q then b:=true
9854else begin while (p<>null)and(q<>null) do
9855    if info(p)<>info(q) then p:=null
9856    else  begin p:=link(p); q:=link(q);
9857      end;
9858  b:=((p=null)and(q=null));
9859  end;
9860end
9861
9862@ @<Select the appropriate case and |return| or |goto common_ending|@>=
9863begin scan_int; n:=cur_val; {|n| is the number of cases to pass}
9864if tracing_commands>1 then
9865  begin begin_diagnostic; print("{case "); print_int(n); print_char("}");
9866  end_diagnostic(false);
9867  end;
9868while n<>0 do
9869  begin pass_text;
9870  if cond_ptr=save_cond_ptr then
9871    if cur_chr=or_code then decr(n)
9872    else goto common_ending
9873  else if cur_chr=fi_code then @<Pop the condition stack@>;
9874  end;
9875change_if_limit(or_code,save_cond_ptr);
9876return; {wait for \.{\\or}, \.{\\else}, or \.{\\fi}}
9877end
9878
9879@ The processing of conditionals is complete except for the following
9880code, which is actually part of |expand|. It comes into play when
9881\.{\\or}, \.{\\else}, or \.{\\fi} is scanned.
9882
9883@<Terminate the current conditional and skip to \.{\\fi}@>=
9884if cur_chr>if_limit then
9885  if if_limit=if_code then insert_relax {condition not yet evaluated}
9886  else  begin print_err("Extra "); print_cmd_chr(fi_or_else,cur_chr);
9887@.Extra \\or@>
9888@.Extra \\else@>
9889@.Extra \\fi@>
9890    help1("I'm ignoring this; it doesn't match any \if.");
9891    error;
9892    end
9893else  begin while cur_chr<>fi_code do pass_text; {skip to \.{\\fi}}
9894  @<Pop the condition stack@>;
9895  end
9896
9897@* \[29] File names.
9898It's time now to fret about file names.  Besides the fact that different
9899operating systems treat files in different ways, we must cope with the
9900fact that completely different naming conventions are used by different
9901groups of people. The following programs show what is required for one
9902particular operating system; similar routines for other systems are not
9903difficult to devise.
9904@^fingers@>
9905@^system dependencies@>
9906
9907\TeX\ assumes that a file name has three parts: the name proper; its
9908``extension''; and a ``file area'' where it is found in an external file
9909system.  The extension of an input file or a write file is assumed to be
9910`\.{.tex}' unless otherwise specified; it is `\.{.log}' on the
9911transcript file that records each run of \TeX; it is `\.{.tfm}' on the font
9912metric files that describe characters in the fonts \TeX\ uses; it is
9913`\.{.dvi}' on the output files that specify typesetting information; and it
9914is `\.{.fmt}' on the format files written by \.{INITEX} to initialize \TeX.
9915The file area can be arbitrary on input files, but files are usually
9916output to the user's current area.  If an input file cannot be
9917found on the specified area, \TeX\ will look for it on a special system
9918area; this special area is intended for commonly used input files like
9919\.{webmac.tex}.
9920
9921Simple uses of \TeX\ refer only to file names that have no explicit
9922extension or area. For example, a person usually says `\.{\\input} \.{paper}'
9923or `\.{\\font\\tenrm} \.= \.{helvetica}' instead of `\.{\\input}
9924\.{paper.new}' or `\.{\\font\\tenrm} \.= \.{<csd.knuth>test}'. Simple file
9925names are best, because they make the \TeX\ source files portable;
9926whenever a file name consists entirely of letters and digits, it should be
9927treated in the same way by all implementations of \TeX. However, users
9928need the ability to refer to other files in their environment, especially
9929when responding to error messages concerning unopenable files; therefore
9930we want to let them use the syntax that appears in their favorite
9931operating system.
9932
9933The following procedures don't allow spaces to be part of
9934file names; but some users seem to like names that are spaced-out.
9935System-dependent changes to allow such things should probably
9936be made with reluctance, and only when an entire file name that
9937includes spaces is ``quoted'' somehow.
9938
9939@ In order to isolate the system-dependent aspects of file names, the
9940@^system dependencies@>
9941system-independent parts of \TeX\ are expressed in terms
9942of three system-dependent
9943procedures called |begin_name|, |more_name|, and |end_name|. In
9944essence, if the user-specified characters of the file name are $c_1\ldots c_n$,
9945the system-independent driver program does the operations
9946$$|begin_name|;\,|more_name|(c_1);\,\ldots\,;\,|more_name|(c_n);
9947\,|end_name|.$$
9948These three procedures communicate with each other via global variables.
9949Afterwards the file name will appear in the string pool as three strings
9950called |cur_name|\penalty10000\hskip-.05em,
9951|cur_area|, and |cur_ext|; the latter two are null (i.e.,
9952|""|), unless they were explicitly specified by the user.
9953
9954Actually the situation is slightly more complicated, because \TeX\ needs
9955to know when the file name ends. The |more_name| routine is a function
9956(with side effects) that returns |true| on the calls |more_name|$(c_1)$,
9957\dots, |more_name|$(c_{n-1})$. The final call |more_name|$(c_n)$
9958returns |false|; or, it returns |true| and the token following $c_n$ is
9959something like `\.{\\hbox}' (i.e., not a character). In other words,
9960|more_name| is supposed to return |true| unless it is sure that the
9961file name has been completely scanned; and |end_name| is supposed to be able
9962to finish the assembly of |cur_name|, |cur_area|, and |cur_ext| regardless of
9963whether $|more_name|(c_n)$ returned |true| or |false|.
9964
9965@<Glob...@>=
9966@!cur_name:str_number; {name of file just scanned}
9967@!cur_area:str_number; {file area just scanned, or \.{""}}
9968@!cur_ext:str_number; {file extension just scanned, or \.{""}}
9969
9970@ The file names we shall deal with for illustrative purposes have the
9971following structure:  If the name contains `\.>' or `\.:', the file area
9972consists of all characters up to and including the final such character;
9973otherwise the file area is null.  If the remaining file name contains
9974`\..', the file extension consists of all such characters from the first
9975remaining `\..' to the end, otherwise the file extension is null.
9976@^system dependencies@>
9977
9978We can scan such file names easily by using two global variables that keep track
9979of the occurrences of area and extension delimiters:
9980
9981@<Glob...@>=
9982@!area_delimiter:pool_pointer; {the most recent `\.>' or `\.:', if any}
9983@!ext_delimiter:pool_pointer; {the relevant `\..', if any}
9984
9985@ Input files that can't be found in the user's area may appear in a standard
9986system area called |TEX_area|. Font metric files whose areas are not given
9987explicitly are assumed to appear in a standard system area called
9988|TEX_font_area|.  These system area names will, of course, vary from place
9989to place.
9990@^system dependencies@>
9991
9992@d TEX_area=="TeXinputs:"
9993@.TeXinputs@>
9994@d TEX_font_area=="TeXfonts:"
9995@.TeXfonts@>
9996
9997@ Here now is the first of the system-dependent routines for file name scanning.
9998@^system dependencies@>
9999
10000@p procedure begin_name;
10001begin area_delimiter:=0; ext_delimiter:=0;
10002end;
10003
10004@ And here's the second. The string pool might change as the file name is
10005being scanned, since a new \.{\\csname} might be entered; therefore we keep
10006|area_delimiter| and |ext_delimiter| relative to the beginning of the current
10007string, instead of assigning an absolute address like |pool_ptr| to them.
10008@^system dependencies@>
10009
10010@p function more_name(@!c:ASCII_code):boolean;
10011begin if c=" " then more_name:=false
10012else  begin str_room(1); append_char(c); {contribute |c| to the current string}
10013  if (c=">")or(c=":") then
10014    begin area_delimiter:=cur_length; ext_delimiter:=0;
10015    end
10016  else if (c=".")and(ext_delimiter=0) then ext_delimiter:=cur_length;
10017  more_name:=true;
10018  end;
10019end;
10020
10021@ The third.
10022@^system dependencies@>
10023
10024@p procedure end_name;
10025begin if str_ptr+3>max_strings then
10026  overflow("number of strings",max_strings-init_str_ptr);
10027@:TeX capacity exceeded number of strings}{\quad number of strings@>
10028if area_delimiter=0 then cur_area:=""
10029else  begin cur_area:=str_ptr;
10030  str_start[str_ptr+1]:=str_start[str_ptr]+area_delimiter; incr(str_ptr);
10031  end;
10032if ext_delimiter=0 then
10033  begin cur_ext:=""; cur_name:=make_string;
10034  end
10035else  begin cur_name:=str_ptr;
10036  str_start[str_ptr+1]:=str_start[str_ptr]+ext_delimiter-area_delimiter-1;
10037  incr(str_ptr); cur_ext:=make_string;
10038  end;
10039end;
10040
10041@ Conversely, here is a routine that takes three strings and prints a file
10042name that might have produced them. (The routine is system dependent, because
10043some operating systems put the file area last instead of first.)
10044@^system dependencies@>
10045
10046@<Basic printing...@>=
10047procedure print_file_name(@!n,@!a,@!e:integer);
10048begin slow_print(a); slow_print(n); slow_print(e);
10049end;
10050
10051@ Another system-dependent routine is needed to convert three internal
10052\TeX\ strings
10053into the |name_of_file| value that is used to open files. The present code
10054allows both lowercase and uppercase letters in the file name.
10055@^system dependencies@>
10056
10057@d append_to_name(#)==begin c:=#; incr(k);
10058  if k<=file_name_size then name_of_file[k]:=xchr[c];
10059  end
10060
10061@p procedure pack_file_name(@!n,@!a,@!e:str_number);
10062var k:integer; {number of positions filled in |name_of_file|}
10063@!c: ASCII_code; {character being packed}
10064@!j:pool_pointer; {index into |str_pool|}
10065begin k:=0;
10066for j:=str_start[a] to str_start[a+1]-1 do append_to_name(so(str_pool[j]));
10067for j:=str_start[n] to str_start[n+1]-1 do append_to_name(so(str_pool[j]));
10068for j:=str_start[e] to str_start[e+1]-1 do append_to_name(so(str_pool[j]));
10069if k<=file_name_size then name_length:=k@+else name_length:=file_name_size;
10070for k:=name_length+1 to file_name_size do name_of_file[k]:=' ';
10071end;
10072
10073@ A messier routine is also needed, since format file names must be scanned
10074before \TeX's string mechanism has been initialized. We shall use the
10075global variable |TEX_format_default| to supply the text for default system areas
10076and extensions related to format files.
10077@^system dependencies@>
10078
10079@d format_default_length=20 {length of the |TEX_format_default| string}
10080@d format_area_length=11 {length of its area part}
10081@d format_ext_length=4 {length of its `\.{.fmt}' part}
10082@d format_extension=".fmt" {the extension, as a \.{WEB} constant}
10083
10084@<Glob...@>=
10085@!TEX_format_default:packed array[1..format_default_length] of char;
10086
10087@ @<Set init...@>=
10088TEX_format_default:='TeXformats:plain.fmt';
10089@.TeXformats@>
10090@.plain@>
10091@^system dependencies@>
10092
10093@ @<Check the ``constant'' values for consistency@>=
10094if format_default_length>file_name_size then bad:=31;
10095
10096@ Here is the messy routine that was just mentioned. It sets |name_of_file|
10097from the first |n| characters of |TEX_format_default|, followed by
10098|buffer[a..b]|, followed by the last |format_ext_length| characters of
10099|TEX_format_default|.
10100
10101We dare not give error messages here, since \TeX\ calls this routine before
10102the |error| routine is ready to roll. Instead, we simply drop excess characters,
10103since the error will be detected in another way when a strange file name
10104isn't found.
10105@^system dependencies@>
10106
10107@p procedure pack_buffered_name(@!n:small_number;@!a,@!b:integer);
10108var k:integer; {number of positions filled in |name_of_file|}
10109@!c: ASCII_code; {character being packed}
10110@!j:integer; {index into |buffer| or |TEX_format_default|}
10111begin if n+b-a+1+format_ext_length>file_name_size then
10112  b:=a+file_name_size-n-1-format_ext_length;
10113k:=0;
10114for j:=1 to n do append_to_name(xord[TEX_format_default[j]]);
10115for j:=a to b do append_to_name(buffer[j]);
10116for j:=format_default_length-format_ext_length+1 to format_default_length do
10117  append_to_name(xord[TEX_format_default[j]]);
10118if k<=file_name_size then name_length:=k@+else name_length:=file_name_size;
10119for k:=name_length+1 to file_name_size do name_of_file[k]:=' ';
10120end;
10121
10122@ Here is the only place we use |pack_buffered_name|. This part of the program
10123becomes active when a ``virgin'' \TeX\ is trying to get going, just after
10124the preliminary initialization, or when the user is substituting another
10125format file by typing `\.\&' after the initial `\.{**}' prompt.  The buffer
10126contains the first line of input in |buffer[loc..(last-1)]|, where
10127|loc<last| and |buffer[loc]<>" "|.
10128
10129@<Declare the function called |open_fmt_file|@>=
10130function open_fmt_file:boolean;
10131label found,exit;
10132var j:0..buf_size; {the first space after the format file name}
10133begin j:=loc;
10134if buffer[loc]="&" then
10135  begin incr(loc); j:=loc; buffer[last]:=" ";
10136  while buffer[j]<>" " do incr(j);
10137  pack_buffered_name(0,loc,j-1); {try first without the system file area}
10138  if w_open_in(fmt_file) then goto found;
10139  pack_buffered_name(format_area_length,loc,j-1);
10140    {now try the system format file area}
10141  if w_open_in(fmt_file) then goto found;
10142  wake_up_terminal;
10143  wterm_ln('Sorry, I can''t find that format;',' will try PLAIN.');
10144@.Sorry, I can't find...@>
10145  update_terminal;
10146  end;
10147  {now pull out all the stops: try for the system \.{plain} file}
10148pack_buffered_name(format_default_length-format_ext_length,1,0);
10149if not w_open_in(fmt_file) then
10150  begin wake_up_terminal;
10151  wterm_ln('I can''t find the PLAIN format file!');
10152@.I can't find PLAIN...@>
10153@.plain@>
10154  open_fmt_file:=false; return;
10155  end;
10156found:loc:=j; open_fmt_file:=true;
10157exit:end;
10158
10159@ Operating systems often make it possible to determine the exact name (and
10160possible version number) of a file that has been opened. The following routine,
10161which simply makes a \TeX\ string from the value of |name_of_file|, should
10162ideally be changed to deduce the full name of file~|f|, which is the file
10163most recently opened, if it is possible to do this in a \PASCAL\ program.
10164@^system dependencies@>
10165
10166This routine might be called after string memory has overflowed, hence
10167we dare not use `|str_room|'.
10168
10169@p function make_name_string:str_number;
10170var k:1..file_name_size; {index into |name_of_file|}
10171begin if (pool_ptr+name_length>pool_size)or(str_ptr=max_strings)or
10172 (cur_length>0) then
10173  make_name_string:="?"
10174else  begin for k:=1 to name_length do append_char(xord[name_of_file[k]]);
10175  make_name_string:=make_string;
10176  end;
10177end;
10178function a_make_name_string(var f:alpha_file):str_number;
10179begin a_make_name_string:=make_name_string;
10180end;
10181function b_make_name_string(var f:byte_file):str_number;
10182begin b_make_name_string:=make_name_string;
10183end;
10184function w_make_name_string(var f:word_file):str_number;
10185begin w_make_name_string:=make_name_string;
10186end;
10187
10188@ Now let's consider the ``driver''
10189routines by which \TeX\ deals with file names
10190in a system-independent manner.  First comes a procedure that looks for a
10191file name in the input by calling |get_x_token| for the information.
10192
10193@p procedure scan_file_name;
10194label done;
10195begin name_in_progress:=true; begin_name;
10196@<Get the next non-blank non-call...@>;
10197loop@+begin if (cur_cmd>other_char)or(cur_chr>255) then {not a character}
10198    begin back_input; goto done;
10199    end;
10200  if not more_name(cur_chr) then goto done;
10201  get_x_token;
10202  end;
10203done: end_name; name_in_progress:=false;
10204end;
10205
10206@ The global variable |name_in_progress| is used to prevent recursive
10207use of |scan_file_name|, since the |begin_name| and other procedures
10208communicate via global variables. Recursion would arise only by
10209devious tricks like `\.{\\input\\input f}'; such attempts at sabotage
10210must be thwarted. Furthermore, |name_in_progress| prevents \.{\\input}
10211@^recursion@>
10212from being initiated when a font size specification is being scanned.
10213
10214Another global variable, |job_name|, contains the file name that was first
10215\.{\\input} by the user. This name is extended by `\.{.log}' and `\.{.dvi}'
10216and `\.{.fmt}' in the names of \TeX's output files.
10217
10218@<Glob...@>=
10219@!name_in_progress:boolean; {is a file name being scanned?}
10220@!job_name:str_number; {principal file name}
10221@!log_opened:boolean; {has the transcript file been opened?}
10222
10223@ Initially |job_name=0|; it becomes nonzero as soon as the true name is known.
10224We have |job_name=0| if and only if the `\.{log}' file has not been opened,
10225except of course for a short time just after |job_name| has become nonzero.
10226
10227@<Initialize the output...@>=
10228job_name:=0; name_in_progress:=false; log_opened:=false;
10229
10230@ Here is a routine that manufactures the output file names, assuming that
10231|job_name<>0|. It ignores and changes the current settings of |cur_area|
10232and |cur_ext|.
10233
10234@d pack_cur_name==pack_file_name(cur_name,cur_area,cur_ext)
10235
10236@p procedure pack_job_name(@!s:str_number); {|s = ".log"|, |".dvi"|, or
10237  |format_extension|}
10238begin cur_area:=""; cur_ext:=s;
10239cur_name:=job_name; pack_cur_name;
10240end;
10241
10242@ If some trouble arises when \TeX\ tries to open a file, the following
10243routine calls upon the user to supply another file name. Parameter~|s|
10244is used in the error message to identify the type of file; parameter~|e|
10245is the default extension if none is given. Upon exit from the routine,
10246variables |cur_name|, |cur_area|, |cur_ext|, and |name_of_file| are
10247ready for another attempt at file opening.
10248
10249@p procedure prompt_file_name(@!s,@!e:str_number);
10250label done;
10251var k:0..buf_size; {index into |buffer|}
10252begin if interaction=scroll_mode then wake_up_terminal;
10253if s="input file name" then print_err("I can't find file `")
10254@.I can't find file x@>
10255else print_err("I can't write on file `");
10256@.I can't write on file x@>
10257print_file_name(cur_name,cur_area,cur_ext); print("'.");
10258if e=".tex" then show_context;
10259print_nl("Please type another "); print(s);
10260@.Please type...@>
10261if interaction<scroll_mode then
10262  fatal_error("*** (job aborted, file error in nonstop mode)");
10263@.job aborted, file error...@>
10264clear_terminal; prompt_input(": "); @<Scan file name in the buffer@>;
10265if cur_ext="" then cur_ext:=e;
10266pack_cur_name;
10267end;
10268
10269@ @<Scan file name in the buffer@>=
10270begin begin_name; k:=first;
10271while (buffer[k]=" ")and(k<last) do incr(k);
10272loop@+  begin if k=last then goto done;
10273  if not more_name(buffer[k]) then goto done;
10274  incr(k);
10275  end;
10276done:end_name;
10277end
10278
10279@ Here's an example of how these conventions are used. Whenever it is time to
10280ship out a box of stuff, we shall use the macro |ensure_dvi_open|.
10281
10282@d ensure_dvi_open==if output_file_name=0 then
10283  begin if job_name=0 then open_log_file;
10284  pack_job_name(".dvi");
10285  while not b_open_out(dvi_file) do
10286    prompt_file_name("file name for output",".dvi");
10287  output_file_name:=b_make_name_string(dvi_file);
10288  end
10289
10290@<Glob...@>=
10291@!dvi_file: byte_file; {the device-independent output goes here}
10292@!output_file_name: str_number; {full name of the output file}
10293@!log_name:str_number; {full name of the log file}
10294
10295@ @<Initialize the output...@>=output_file_name:=0;
10296
10297@ The |open_log_file| routine is used to open the transcript file and to help
10298it catch up to what has previously been printed on the terminal.
10299
10300@p procedure open_log_file;
10301var old_setting:0..max_selector; {previous |selector| setting}
10302@!k:0..buf_size; {index into |months| and |buffer|}
10303@!l:0..buf_size; {end of first input line}
10304@!months:packed array [1..36] of char; {abbreviations of month names}
10305begin old_setting:=selector;
10306if job_name=0 then job_name:="texput";
10307@.texput@>
10308pack_job_name(".log");
10309while not a_open_out(log_file) do @<Try to get a different log file name@>;
10310log_name:=a_make_name_string(log_file);
10311selector:=log_only; log_opened:=true;
10312@<Print the banner line, including the date and time@>;
10313input_stack[input_ptr]:=cur_input; {make sure bottom level is in memory}
10314print_nl("**");
10315@.**@>
10316l:=input_stack[0].limit_field; {last position of first line}
10317if buffer[l]=end_line_char then decr(l);
10318for k:=1 to l do print(buffer[k]);
10319print_ln; {now the transcript file contains the first line of input}
10320selector:=old_setting+2; {|log_only| or |term_and_log|}
10321end;
10322
10323@ Sometimes |open_log_file| is called at awkward moments when \TeX\ is
10324unable to print error messages or even to |show_context|.
10325The |prompt_file_name| routine can result in a |fatal_error|, but the |error|
10326routine will not be invoked because |log_opened| will be false.
10327
10328The normal idea of |batch_mode| is that nothing at all should be written
10329on the terminal. However, in the unusual case that
10330no log file could be opened, we make an exception and allow
10331an explanatory message to be seen.
10332
10333Incidentally, the program always refers to the log file as a `\.{transcript
10334file}', because some systems cannot use the extension `\.{.log}' for
10335this file.
10336
10337@<Try to get a different log file name@>=
10338begin selector:=term_only;
10339prompt_file_name("transcript file name",".log");
10340end
10341
10342@ @<Print the banner...@>=
10343begin wlog(banner);
10344slow_print(format_ident); print("  ");
10345print_int(day); print_char(" ");
10346months:='JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC';
10347for k:=3*month-2 to 3*month do wlog(months[k]);
10348print_char(" "); print_int(year); print_char(" ");
10349print_two(time div 60); print_char(":"); print_two(time mod 60);
10350end
10351
10352@ Let's turn now to the procedure that is used to initiate file reading
10353when an `\.{\\input}' command is being processed.
10354
10355@p procedure start_input; {\TeX\ will \.{\\input} something}
10356label done;
10357begin scan_file_name; {set |cur_name| to desired file name}
10358if cur_ext="" then cur_ext:=".tex";
10359pack_cur_name;
10360loop@+  begin begin_file_reading; {set up |cur_file| and new level of input}
10361  if a_open_in(cur_file) then goto done;
10362  if cur_area="" then
10363    begin pack_file_name(cur_name,TEX_area,cur_ext);
10364    if a_open_in(cur_file) then goto done;
10365    end;
10366  end_file_reading; {remove the level that didn't work}
10367  prompt_file_name("input file name",".tex");
10368  end;
10369done: name:=a_make_name_string(cur_file);
10370if job_name=0 then
10371  begin job_name:=cur_name; open_log_file;
10372  end; {|open_log_file| doesn't |show_context|, so |limit|
10373    and |loc| needn't be set to meaningful values yet}
10374if term_offset+length(name)>max_print_line-2 then print_ln
10375else if (term_offset>0)or(file_offset>0) then print_char(" ");
10376print_char("("); incr(open_parens); slow_print(name); update_terminal;
10377state:=new_line;
10378if name=str_ptr-1 then {we can conserve string pool space now}
10379  begin flush_string; name:=cur_name;
10380  end;
10381@<Read the first line of the new file@>;
10382end;
10383
10384@ Here we have to remember to tell the |input_ln| routine not to
10385start with a |get|. If the file is empty, it is considered to
10386contain a single blank line.
10387@^system dependencies@>
10388@^empty line at end of file@>
10389
10390@<Read the first line...@>=
10391begin line:=1;
10392if input_ln(cur_file,false) then do_nothing;
10393firm_up_the_line;
10394if end_line_char_inactive then decr(limit)
10395else  buffer[limit]:=end_line_char;
10396first:=limit+1; loc:=start;
10397end
10398
10399@* \[30] Font metric data.
10400\TeX\ gets its knowledge about fonts from font metric files, also called
10401\.{TFM} files; the `\.T' in `\.{TFM}' stands for \TeX,
10402but other programs know about them too.
10403@:TFM files}{\.{TFM} files@>
10404@^font metric files@>
10405
10406The information in a \.{TFM} file appears in a sequence of 8-bit bytes.
10407Since the number of bytes is always a multiple of 4, we could
10408also regard the file as a sequence of 32-bit words, but \TeX\ uses the
10409byte interpretation. The format of \.{TFM} files was designed by
10410Lyle Ramshaw in 1980. The intent is to convey a lot of different kinds
10411@^Ramshaw, Lyle Harold@>
10412of information in a compact but useful form.
10413
10414@<Glob...@>=
10415@!tfm_file:byte_file;
10416
10417@ The first 24 bytes (6 words) of a \.{TFM} file contain twelve 16-bit
10418integers that give the lengths of the various subsequent portions
10419of the file. These twelve integers are, in order:
10420$$\vbox{\halign{\hfil#&$\null=\null$#\hfil\cr
10421|lf|&length of the entire file, in words;\cr
10422|lh|&length of the header data, in words;\cr
10423|bc|&smallest character code in the font;\cr
10424|ec|&largest character code in the font;\cr
10425|nw|&number of words in the width table;\cr
10426|nh|&number of words in the height table;\cr
10427|nd|&number of words in the depth table;\cr
10428|ni|&number of words in the italic correction table;\cr
10429|nl|&number of words in the lig/kern table;\cr
10430|nk|&number of words in the kern table;\cr
10431|ne|&number of words in the extensible character table;\cr
10432|np|&number of font parameter words.\cr}}$$
10433They are all nonnegative and less than $2^{15}$. We must have |bc-1<=ec<=255|,
10434and
10435$$\hbox{|lf=6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np|.}$$
10436Note that a font may contain as many as 256 characters (if |bc=0| and |ec=255|),
10437and as few as 0 characters (if |bc=ec+1|).
10438
10439Incidentally, when two or more 8-bit bytes are combined to form an integer of
1044016 or more bits, the most significant bytes appear first in the file.
10441This is called BigEndian order.
10442@!@^BigEndian order@>
10443
10444@ The rest of the \.{TFM} file may be regarded as a sequence of ten data
10445arrays having the informal specification
10446$$\def\arr$[#1]#2${\&{array} $[#1]$ \&{of} #2}
10447\vbox{\halign{\hfil\\{#}&$\,:\,$\arr#\hfil\cr
10448header&|[0..lh-1]@t\\{stuff}@>|\cr
10449char\_info&|[bc..ec]char_info_word|\cr
10450width&|[0..nw-1]fix_word|\cr
10451height&|[0..nh-1]fix_word|\cr
10452depth&|[0..nd-1]fix_word|\cr
10453italic&|[0..ni-1]fix_word|\cr
10454lig\_kern&|[0..nl-1]lig_kern_command|\cr
10455kern&|[0..nk-1]fix_word|\cr
10456exten&|[0..ne-1]extensible_recipe|\cr
10457param&|[1..np]fix_word|\cr}}$$
10458The most important data type used here is a |@!fix_word|, which is
10459a 32-bit representation of a binary fraction. A |fix_word| is a signed
10460quantity, with the two's complement of the entire word used to represent
10461negation. Of the 32 bits in a |fix_word|, exactly 12 are to the left of the
10462binary point; thus, the largest |fix_word| value is $2048-2^{-20}$, and
10463the smallest is $-2048$. We will see below, however, that all but two of
10464the |fix_word| values must lie between $-16$ and $+16$.
10465
10466@ The first data array is a block of header information, which contains
10467general facts about the font. The header must contain at least two words,
10468|header[0]| and |header[1]|, whose meaning is explained below.
10469Additional header information of use to other software routines might
10470also be included, but \TeX82 does not need to know about such details.
10471For example, 16 more words of header information are in use at the Xerox
10472Palo Alto Research Center; the first ten specify the character coding
10473scheme used (e.g., `\.{XEROX text}' or `\.{TeX math symbols}'), the next five
10474give the font identifier (e.g., `\.{HELVETICA}' or `\.{CMSY}'), and the
10475last gives the ``face byte.'' The program that converts \.{DVI} files
10476to Xerox printing format gets this information by looking at the \.{TFM}
10477file, which it needs to read anyway because of other information that
10478is not explicitly repeated in \.{DVI}~format.
10479
10480\yskip\hang|header[0]| is a 32-bit check sum that \TeX\ will copy into
10481the \.{DVI} output file. Later on when the \.{DVI} file is printed,
10482possibly on another computer, the actual font that gets used is supposed
10483to have a check sum that agrees with the one in the \.{TFM} file used by
10484\TeX. In this way, users will be warned about potential incompatibilities.
10485(However, if the check sum is zero in either the font file or the \.{TFM}
10486file, no check is made.)  The actual relation between this check sum and
10487the rest of the \.{TFM} file is not important; the check sum is simply an
10488identification number with the property that incompatible fonts almost
10489always have distinct check sums.
10490@^check sum@>
10491
10492\yskip\hang|header[1]| is a |fix_word| containing the design size of
10493the font, in units of \TeX\ points. This number must be at least 1.0; it is
10494fairly arbitrary, but usually the design size is 10.0 for a ``10 point''
10495font, i.e., a font that was designed to look best at a 10-point size,
10496whatever that really means. When a \TeX\ user asks for a font
10497`\.{at} $\delta$ \.{pt}', the effect is to override the design size
10498and replace it by $\delta$, and to multiply the $x$ and~$y$ coordinates
10499of the points in the font image by a factor of $\delta$ divided by the
10500design size.  {\sl All other dimensions in the\/ \.{TFM} file are
10501|fix_word|\kern-1pt\ numbers in design-size units}, with the exception of
10502|param[1]| (which denotes the slant ratio). Thus, for example, the value
10503of |param[6]|, which defines the \.{em} unit, is often the |fix_word| value
10504$2^{20}=1.0$, since many fonts have a design size equal to one em.
10505The other dimensions must be less than 16 design-size units in absolute
10506value; thus, |header[1]| and |param[1]| are the only |fix_word|
10507entries in the whole \.{TFM} file whose first byte might be something
10508besides 0 or 255.
10509
10510@ Next comes the |char_info| array, which contains one |@!char_info_word|
10511per character. Each word in this part of the file contains six fields
10512packed into four bytes as follows.
10513
10514\yskip\hang first byte: |@!width_index| (8 bits)\par
10515\hang second byte: |@!height_index| (4 bits) times 16, plus |@!depth_index|
10516  (4~bits)\par
10517\hang third byte: |@!italic_index| (6 bits) times 4, plus |@!tag|
10518  (2~bits)\par
10519\hang fourth byte: |@!remainder| (8 bits)\par
10520\yskip\noindent
10521The actual width of a character is \\{width}|[width_index]|, in design-size
10522units; this is a device for compressing information, since many characters
10523have the same width. Since it is quite common for many characters
10524to have the same height, depth, or italic correction, the \.{TFM} format
10525imposes a limit of 16 different heights, 16 different depths, and
1052664 different italic corrections.
10527
10528@!@^italic correction@>
10529The italic correction of a character has two different uses.
10530(a)~In ordinary text, the italic correction is added to the width only if
10531the \TeX\ user specifies `\.{\\/}' after the character.
10532(b)~In math formulas, the italic correction is always added to the width,
10533except with respect to the positioning of subscripts.
10534
10535Incidentally, the relation $\\{width}[0]=\\{height}[0]=\\{depth}[0]=
10536\\{italic}[0]=0$ should always hold, so that an index of zero implies a
10537value of zero.  The |width_index| should never be zero unless the
10538character does not exist in the font, since a character is valid if and
10539only if it lies between |bc| and |ec| and has a nonzero |width_index|.
10540
10541@ The |tag| field in a |char_info_word| has four values that explain how to
10542interpret the |remainder| field.
10543
10544\yskip\hangg|tag=0| (|no_tag|) means that |remainder| is unused.\par
10545\hangg|tag=1| (|lig_tag|) means that this character has a ligature/kerning
10546program starting at position |remainder| in the |lig_kern| array.\par
10547\hangg|tag=2| (|list_tag|) means that this character is part of a chain of
10548characters of ascending sizes, and not the largest in the chain.  The
10549|remainder| field gives the character code of the next larger character.\par
10550\hangg|tag=3| (|ext_tag|) means that this character code represents an
10551extensible character, i.e., a character that is built up of smaller pieces
10552so that it can be made arbitrarily large. The pieces are specified in
10553|@!exten[remainder]|.\par
10554\yskip\noindent
10555Characters with |tag=2| and |tag=3| are treated as characters with |tag=0|
10556unless they are used in special circumstances in math formulas. For example,
10557the \.{\\sum} operation looks for a |list_tag|, and the \.{\\left}
10558operation looks for both |list_tag| and |ext_tag|.
10559
10560@d no_tag=0 {vanilla character}
10561@d lig_tag=1 {character has a ligature/kerning program}
10562@d list_tag=2 {character has a successor in a charlist}
10563@d ext_tag=3 {character is extensible}
10564
10565@ The |lig_kern| array contains instructions in a simple programming language
10566that explains what to do for special letter pairs. Each word in this array is a
10567|@!lig_kern_command| of four bytes.
10568
10569\yskip\hang first byte: |skip_byte|, indicates that this is the final program
10570  step if the byte is 128 or more, otherwise the next step is obtained by
10571  skipping this number of intervening steps.\par
10572\hang second byte: |next_char|, ``if |next_char| follows the current character,
10573  then perform the operation and stop, otherwise continue.''\par
10574\hang third byte: |op_byte|, indicates a ligature step if less than~128,
10575  a kern step otherwise.\par
10576\hang fourth byte: |remainder|.\par
10577\yskip\noindent
10578In a kern step, an
10579additional space equal to |kern[256*(op_byte-128)+remainder]| is inserted
10580between the current character and |next_char|. This amount is
10581often negative, so that the characters are brought closer together
10582by kerning; but it might be positive.
10583
10584There are eight kinds of ligature steps, having |op_byte| codes $4a+2b+c$ where
10585$0\le a\le b+c$ and $0\le b,c\le1$. The character whose code is
10586|remainder| is inserted between the current character and |next_char|;
10587then the current character is deleted if $b=0$, and |next_char| is
10588deleted if $c=0$; then we pass over $a$~characters to reach the next
10589current character (which may have a ligature/kerning program of its own).
10590
10591If the very first instruction of the |lig_kern| array has |skip_byte=255|,
10592the |next_char| byte is the so-called right boundary character of this font;
10593the value of |next_char| need not lie between |bc| and~|ec|.
10594If the very last instruction of the |lig_kern| array has |skip_byte=255|,
10595there is a special ligature/kerning program for a left boundary character,
10596beginning at location |256*op_byte+remainder|.
10597The interpretation is that \TeX\ puts implicit boundary characters
10598before and after each consecutive string of characters from the same font.
10599These implicit characters do not appear in the output, but they can affect
10600ligatures and kerning.
10601
10602If the very first instruction of a character's |lig_kern| program has
10603|skip_byte>128|, the program actually begins in location
10604|256*op_byte+remainder|. This feature allows access to large |lig_kern|
10605arrays, because the first instruction must otherwise
10606appear in a location |<=255|.
10607
10608Any instruction with |skip_byte>128| in the |lig_kern| array must satisfy
10609the condition
10610$$\hbox{|256*op_byte+remainder<nl|.}$$
10611If such an instruction is encountered during
10612normal program execution, it denotes an unconditional halt; no ligature
10613or kerning command is performed.
10614
10615@d stop_flag==qi(128) {value indicating `\.{STOP}' in a lig/kern program}
10616@d kern_flag==qi(128) {op code for a kern step}
10617@d skip_byte(#)==#.b0
10618@d next_char(#)==#.b1
10619@d op_byte(#)==#.b2
10620@d rem_byte(#)==#.b3
10621
10622@ Extensible characters are specified by an |@!extensible_recipe|, which
10623consists of four bytes called |@!top|, |@!mid|, |@!bot|, and |@!rep| (in this
10624order). These bytes are the character codes of individual pieces used to
10625build up a large symbol.  If |top|, |mid|, or |bot| are zero, they are not
10626present in the built-up result. For example, an extensible vertical line is
10627like an extensible bracket, except that the top and bottom pieces are missing.
10628
10629Let $T$, $M$, $B$, and $R$ denote the respective pieces, or an empty box
10630if the piece isn't present. Then the extensible characters have the form
10631$TR^kMR^kB$ from top to bottom, for some |k>=0|, unless $M$ is absent;
10632in the latter case we can have $TR^kB$ for both even and odd values of~|k|.
10633The width of the extensible character is the width of $R$; and the
10634height-plus-depth is the sum of the individual height-plus-depths of the
10635components used, since the pieces are butted together in a vertical list.
10636
10637@d ext_top(#)==#.b0 {|top| piece in a recipe}
10638@d ext_mid(#)==#.b1 {|mid| piece in a recipe}
10639@d ext_bot(#)==#.b2 {|bot| piece in a recipe}
10640@d ext_rep(#)==#.b3 {|rep| piece in a recipe}
10641
10642@ The final portion of a \.{TFM} file is the |param| array, which is another
10643sequence of |fix_word| values.
10644
10645\yskip\hang|param[1]=slant| is the amount of italic slant, which is used
10646to help position accents. For example, |slant=.25| means that when you go
10647up one unit, you also go .25 units to the right. The |slant| is a pure
10648number; it's the only |fix_word| other than the design size itself that is
10649not scaled by the design size.
10650
10651\hang|param[2]=space| is the normal spacing between words in text.
10652Note that character |" "| in the font need not have anything to do with
10653blank spaces.
10654
10655\hang|param[3]=space_stretch| is the amount of glue stretching between words.
10656
10657\hang|param[4]=space_shrink| is the amount of glue shrinking between words.
10658
10659\hang|param[5]=x_height| is the size of one ex in the font; it is also
10660the height of letters for which accents don't have to be raised or lowered.
10661
10662\hang|param[6]=quad| is the size of one em in the font.
10663
10664\hang|param[7]=extra_space| is the amount added to |param[2]| at the
10665ends of sentences.
10666
10667\yskip\noindent
10668If fewer than seven parameters are present, \TeX\ sets the missing parameters
10669to zero. Fonts used for math symbols are required to have
10670additional parameter information, which is explained later.
10671
10672@d slant_code=1
10673@d space_code=2
10674@d space_stretch_code=3
10675@d space_shrink_code=4
10676@d x_height_code=5
10677@d quad_code=6
10678@d extra_space_code=7
10679
10680@ So that is what \.{TFM} files hold. Since \TeX\ has to absorb such information
10681about lots of fonts, it stores most of the data in a large array called
10682|font_info|. Each item of |font_info| is a |memory_word|; the |fix_word|
10683data gets converted into |scaled| entries, while everything else goes into
10684words of type |four_quarters|.
10685
10686When the user defines \.{\\font\\f}, say, \TeX\ assigns an internal number
10687to the user's font~\.{\\f}. Adding this number to |font_id_base| gives the
10688|eqtb| location of a ``frozen'' control sequence that will always select
10689the font.
10690
10691@<Types...@>=
10692@!internal_font_number=font_base..font_max; {|font| in a |char_node|}
10693@!font_index=0..font_mem_size; {index into |font_info|}
10694
10695@ Here now is the (rather formidable) array of font arrays.
10696
10697@d non_char==qi(256) {a |halfword| code that can't match a real character}
10698@d non_address=0 {a spurious |bchar_label|}
10699
10700@<Glob...@>=
10701@!font_info:array[font_index] of memory_word;
10702  {the big collection of font data}
10703@!fmem_ptr:font_index; {first unused word of |font_info|}
10704@!font_ptr:internal_font_number; {largest internal font number in use}
10705@!font_check:array[internal_font_number] of four_quarters; {check sum}
10706@!font_size:array[internal_font_number] of scaled; {``at'' size}
10707@!font_dsize:array[internal_font_number] of scaled; {``design'' size}
10708@!font_params:array[internal_font_number] of font_index; {how many font
10709  parameters are present}
10710@!font_name:array[internal_font_number] of str_number; {name of the font}
10711@!font_area:array[internal_font_number] of str_number; {area of the font}
10712@!font_bc:array[internal_font_number] of eight_bits;
10713  {beginning (smallest) character code}
10714@!font_ec:array[internal_font_number] of eight_bits;
10715  {ending (largest) character code}
10716@!font_glue:array[internal_font_number] of pointer;
10717  {glue specification for interword space, |null| if not allocated}
10718@!font_used:array[internal_font_number] of boolean;
10719  {has a character from this font actually appeared in the output?}
10720@!hyphen_char:array[internal_font_number] of integer;
10721  {current \.{\\hyphenchar} values}
10722@!skew_char:array[internal_font_number] of integer;
10723  {current \.{\\skewchar} values}
10724@!bchar_label:array[internal_font_number] of font_index;
10725  {start of |lig_kern| program for left boundary character,
10726  |non_address| if there is none}
10727@!font_bchar:array[internal_font_number] of min_quarterword..non_char;
10728  {right boundary character, |non_char| if there is none}
10729@!font_false_bchar:array[internal_font_number] of min_quarterword..non_char;
10730  {|font_bchar| if it doesn't exist in the font, otherwise |non_char|}
10731
10732@ Besides the arrays just enumerated, we have directory arrays that make it
10733easy to get at the individual entries in |font_info|. For example, the
10734|char_info| data for character |c| in font |f| will be in
10735|font_info[char_base[f]+c].qqqq|; and if |w| is the |width_index|
10736part of this word (the |b0| field), the width of the character is
10737|font_info[width_base[f]+w].sc|. (These formulas assume that
10738|min_quarterword| has already been added to |c| and to |w|, since \TeX\
10739stores its quarterwords that way.)
10740
10741@<Glob...@>=
10742@!char_base:array[internal_font_number] of integer;
10743  {base addresses for |char_info|}
10744@!width_base:array[internal_font_number] of integer;
10745  {base addresses for widths}
10746@!height_base:array[internal_font_number] of integer;
10747  {base addresses for heights}
10748@!depth_base:array[internal_font_number] of integer;
10749  {base addresses for depths}
10750@!italic_base:array[internal_font_number] of integer;
10751  {base addresses for italic corrections}
10752@!lig_kern_base:array[internal_font_number] of integer;
10753  {base addresses for ligature/kerning programs}
10754@!kern_base:array[internal_font_number] of integer;
10755  {base addresses for kerns}
10756@!exten_base:array[internal_font_number] of integer;
10757  {base addresses for extensible recipes}
10758@!param_base:array[internal_font_number] of integer;
10759  {base addresses for font parameters}
10760
10761@ @<Set init...@>=
10762for k:=font_base to font_max do font_used[k]:=false;
10763
10764@ \TeX\ always knows at least one font, namely the null font. It has no
10765characters, and its seven parameters are all equal to zero.
10766
10767@<Initialize table...@>=
10768font_ptr:=null_font; fmem_ptr:=7;
10769font_name[null_font]:="nullfont"; font_area[null_font]:="";
10770hyphen_char[null_font]:="-"; skew_char[null_font]:=-1;
10771bchar_label[null_font]:=non_address;
10772font_bchar[null_font]:=non_char; font_false_bchar[null_font]:=non_char;
10773font_bc[null_font]:=1; font_ec[null_font]:=0;
10774font_size[null_font]:=0; font_dsize[null_font]:=0;
10775char_base[null_font]:=0; width_base[null_font]:=0;
10776height_base[null_font]:=0; depth_base[null_font]:=0;
10777italic_base[null_font]:=0; lig_kern_base[null_font]:=0;
10778kern_base[null_font]:=0; exten_base[null_font]:=0;
10779font_glue[null_font]:=null; font_params[null_font]:=7;
10780param_base[null_font]:=-1;
10781for k:=0 to 6 do font_info[k].sc:=0;
10782
10783@ @<Put each...@>=
10784primitive("nullfont",set_font,null_font);
10785@!@:null_font_}{\.{\\nullfont} primitive@>
10786text(frozen_null_font):="nullfont"; eqtb[frozen_null_font]:=eqtb[cur_val];
10787
10788@ Of course we want to define macros that suppress the detail of how font
10789information is actually packed, so that we don't have to write things like
10790$$\hbox{|font_info[width_base[f]+font_info[char_base[f]+c].qqqq.b0].sc|}$$
10791too often. The \.{WEB} definitions here make |char_info(f)(c)| the
10792|four_quarters| word of font information corresponding to character
10793|c| of font |f|. If |q| is such a word, |char_width(f)(q)| will be
10794the character's width; hence the long formula above is at least
10795abbreviated to
10796$$\hbox{|char_width(f)(char_info(f)(c))|.}$$
10797Usually, of course, we will fetch |q| first and look at several of its
10798fields at the same time.
10799
10800The italic correction of a character will be denoted by
10801|char_italic(f)(q)|, so it is analogous to |char_width|.  But we will get
10802at the height and depth in a slightly different way, since we usually want
10803to compute both height and depth if we want either one.  The value of
10804|height_depth(q)| will be the 8-bit quantity
10805$$b=|height_index|\times16+|depth_index|,$$ and if |b| is such a byte we
10806will write |char_height(f)(b)| and |char_depth(f)(b)| for the height and
10807depth of the character |c| for which |q=char_info(f)(c)|. Got that?
10808
10809The tag field will be called |char_tag(q)|; the remainder byte will be
10810called |rem_byte(q)|, using a macro that we have already defined above.
10811
10812Access to a character's |width|, |height|, |depth|, and |tag| fields is
10813part of \TeX's inner loop, so we want these macros to produce code that is
10814as fast as possible under the circumstances.
10815@^inner loop@>
10816
10817@d char_info_end(#)==#].qqqq
10818@d char_info(#)==font_info[char_base[#]+char_info_end
10819@d char_width_end(#)==#.b0].sc
10820@d char_width(#)==font_info[width_base[#]+char_width_end
10821@d char_exists(#)==(#.b0>min_quarterword)
10822@d char_italic_end(#)==(qo(#.b2)) div 4].sc
10823@d char_italic(#)==font_info[italic_base[#]+char_italic_end
10824@d height_depth(#)==qo(#.b1)
10825@d char_height_end(#)==(#) div 16].sc
10826@d char_height(#)==font_info[height_base[#]+char_height_end
10827@d char_depth_end(#)==(#) mod 16].sc
10828@d char_depth(#)==font_info[depth_base[#]+char_depth_end
10829@d char_tag(#)==((qo(#.b2)) mod 4)
10830
10831@ The global variable |null_character| is set up to be a word of
10832|char_info| for a character that doesn't exist. Such a word provides a
10833convenient way to deal with erroneous situations.
10834
10835@<Glob...@>=
10836@!null_character:four_quarters; {nonexistent character information}
10837
10838@ @<Set init...@>=
10839null_character.b0:=min_quarterword; null_character.b1:=min_quarterword;
10840null_character.b2:=min_quarterword; null_character.b3:=min_quarterword;
10841
10842@ Here are some macros that help process ligatures and kerns.
10843We write |char_kern(f)(j)| to find the amount of kerning specified by
10844kerning command~|j| in font~|f|. If |j| is the |char_info| for a character
10845with a ligature/kern program, the first instruction of that program is either
10846|i=font_info[lig_kern_start(f)(j)]| or |font_info[lig_kern_restart(f)(i)]|,
10847depending on whether or not |skip_byte(i)<=stop_flag|.
10848
10849The constant |kern_base_offset| should be simplified, for \PASCAL\ compilers
10850that do not do local optimization.
10851@^system dependencies@>
10852
10853@d char_kern_end(#)==256*op_byte(#)+rem_byte(#)].sc
10854@d char_kern(#)==font_info[kern_base[#]+char_kern_end
10855@d kern_base_offset==256*(128+min_quarterword)
10856@d lig_kern_start(#)==lig_kern_base[#]+rem_byte {beginning of lig/kern program}
10857@d lig_kern_restart_end(#)==256*op_byte(#)+rem_byte(#)+32768-kern_base_offset
10858@d lig_kern_restart(#)==lig_kern_base[#]+lig_kern_restart_end
10859
10860@ Font parameters are referred to as |slant(f)|, |space(f)|, etc.
10861
10862@d param_end(#)==param_base[#]].sc
10863@d param(#)==font_info[#+param_end
10864@d slant==param(slant_code) {slant to the right, per unit distance upward}
10865@d space==param(space_code) {normal space between words}
10866@d space_stretch==param(space_stretch_code) {stretch between words}
10867@d space_shrink==param(space_shrink_code) {shrink between words}
10868@d x_height==param(x_height_code) {one ex}
10869@d quad==param(quad_code) {one em}
10870@d extra_space==param(extra_space_code) {additional space at end of sentence}
10871
10872@<The em width for |cur_font|@>=quad(cur_font)
10873
10874@ @<The x-height for |cur_font|@>=x_height(cur_font)
10875
10876@ \TeX\ checks the information of a \.{TFM} file for validity as the
10877file is being read in, so that no further checks will be needed when
10878typesetting is going on. The somewhat tedious subroutine that does this
10879is called |read_font_info|. It has four parameters: the user font
10880identifier~|u|, the file name and area strings |nom| and |aire|, and the
10881``at'' size~|s|. If |s|~is negative, it's the negative of a scale factor
10882to be applied to the design size; |s=-1000| is the normal case.
10883Otherwise |s| will be substituted for the design size; in this
10884case, |s| must be positive and less than $2048\rm\,pt$
10885(i.e., it must be less than $2^{27}$ when considered as an integer).
10886
10887The subroutine opens and closes a global file variable called |tfm_file|.
10888It returns the value of the internal font number that was just loaded.
10889If an error is detected, an error message is issued and no font
10890information is stored; |null_font| is returned in this case.
10891
10892@d bad_tfm=11 {label for |read_font_info|}
10893@d abort==goto bad_tfm {do this when the \.{TFM} data is wrong}
10894
10895@p function read_font_info(@!u:pointer;@!nom,@!aire:str_number;
10896  @!s:scaled):internal_font_number; {input a \.{TFM} file}
10897label done,bad_tfm,not_found;
10898var k:font_index; {index into |font_info|}
10899@!file_opened:boolean; {was |tfm_file| successfully opened?}
10900@!lf,@!lh,@!bc,@!ec,@!nw,@!nh,@!nd,@!ni,@!nl,@!nk,@!ne,@!np:halfword;
10901  {sizes of subfiles}
10902@!f:internal_font_number; {the new font's number}
10903@!g:internal_font_number; {the number to return}
10904@!a,@!b,@!c,@!d:eight_bits; {byte variables}
10905@!qw:four_quarters;@!sw:scaled; {accumulators}
10906@!bch_label:integer; {left boundary start location, or infinity}
10907@!bchar:0..256; {right boundary character, or 256}
10908@!z:scaled; {the design size or the ``at'' size}
10909@!alpha:integer;@!beta:1..16;
10910  {auxiliary quantities used in fixed-point multiplication}
10911begin g:=null_font;@/
10912@<Read and check the font data; |abort| if the \.{TFM} file is
10913  malformed; if there's no room for this font, say so and |goto
10914  done|; otherwise |incr(font_ptr)| and |goto done|@>;
10915bad_tfm: @<Report that the font won't be loaded@>;
10916done: if file_opened then b_close(tfm_file);
10917read_font_info:=g;
10918end;
10919
10920@ There are programs called \.{TFtoPL} and \.{PLtoTF} that convert
10921between the \.{TFM} format and a symbolic property-list format
10922that can be easily edited. These programs contain extensive
10923diagnostic information, so \TeX\ does not have to bother giving
10924precise details about why it rejects a particular \.{TFM} file.
10925@.TFtoPL@> @.PLtoTF@>
10926
10927@d start_font_error_message==print_err("Font "); sprint_cs(u);
10928  print_char("="); print_file_name(nom,aire,"");
10929  if s>=0 then
10930    begin print(" at "); print_scaled(s); print("pt");
10931    end
10932  else if s<>-1000 then
10933    begin print(" scaled "); print_int(-s);
10934    end
10935
10936@<Report that the font won't be loaded@>=
10937start_font_error_message;
10938@.Font x=xx not loadable...@>
10939if file_opened then print(" not loadable: Bad metric (TFM) file")
10940else print(" not loadable: Metric (TFM) file not found");
10941help5("I wasn't able to read the size data for this font,")@/
10942("so I will ignore the font specification.")@/
10943("[Wizards can fix TFM files using TFtoPL/PLtoTF.]")@/
10944("You might try inserting a different font spec;")@/
10945("e.g., type `I\font<same font id>=<substitute font name>'.");
10946error
10947
10948@ @<Read and check...@>=
10949@<Open |tfm_file| for input@>;
10950@<Read the {\.{TFM}} size fields@>;
10951@<Use size fields to allocate font information@>;
10952@<Read the {\.{TFM}} header@>;
10953@<Read character data@>;
10954@<Read box dimensions@>;
10955@<Read ligature/kern program@>;
10956@<Read extensible character recipes@>;
10957@<Read font parameters@>;
10958@<Make final adjustments and |goto done|@>
10959
10960@ @<Open |tfm_file| for input@>=
10961file_opened:=false;
10962if aire="" then pack_file_name(nom,TEX_font_area,".tfm")
10963else pack_file_name(nom,aire,".tfm");
10964if not b_open_in(tfm_file) then abort;
10965file_opened:=true
10966
10967@ Note: A malformed \.{TFM} file might be shorter than it claims to be;
10968thus |eof(tfm_file)| might be true when |read_font_info| refers to
10969|tfm_file^| or when it says |get(tfm_file)|. If such circumstances
10970cause system error messages, you will have to defeat them somehow,
10971for example by defining |fget| to be `\ignorespaces|begin get(tfm_file);|
10972|if eof(tfm_file) then abort; end|\unskip'.
10973@^system dependencies@>
10974
10975@d fget==get(tfm_file)
10976@d fbyte==tfm_file^
10977@d read_sixteen(#)==begin #:=fbyte;
10978  if #>127 then abort;
10979  fget; #:=#*@'400+fbyte;
10980  end
10981@d store_four_quarters(#)==begin fget; a:=fbyte; qw.b0:=qi(a);
10982  fget; b:=fbyte; qw.b1:=qi(b);
10983  fget; c:=fbyte; qw.b2:=qi(c);
10984  fget; d:=fbyte; qw.b3:=qi(d);
10985  #:=qw;
10986  end
10987
10988@ @<Read the {\.{TFM}} size fields@>=
10989begin read_sixteen(lf);
10990fget; read_sixteen(lh);
10991fget; read_sixteen(bc);
10992fget; read_sixteen(ec);
10993if (bc>ec+1)or(ec>255) then abort;
10994if bc>255 then {|bc=256| and |ec=255|}
10995  begin bc:=1; ec:=0;
10996  end;
10997fget; read_sixteen(nw);
10998fget; read_sixteen(nh);
10999fget; read_sixteen(nd);
11000fget; read_sixteen(ni);
11001fget; read_sixteen(nl);
11002fget; read_sixteen(nk);
11003fget; read_sixteen(ne);
11004fget; read_sixteen(np);
11005if lf<>6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np then abort;
11006if (nw=0)or(nh=0)or(nd=0)or(ni=0) then abort;
11007end
11008
11009@ The preliminary settings of the index-offset variables |char_base|,
11010|width_base|, |lig_kern_base|, |kern_base|, and |exten_base| will be
11011corrected later by subtracting |min_quarterword| from them; and we will
11012subtract 1 from |param_base| too. It's best to forget about such anomalies
11013until later.
11014
11015@<Use size fields to allocate font information@>=
11016lf:=lf-6-lh; {|lf| words should be loaded into |font_info|}
11017if np<7 then lf:=lf+7-np; {at least seven parameters will appear}
11018if (font_ptr=font_max)or(fmem_ptr+lf>font_mem_size) then
11019  @<Apologize for not loading the font, |goto done|@>;
11020f:=font_ptr+1;
11021char_base[f]:=fmem_ptr-bc;
11022width_base[f]:=char_base[f]+ec+1;
11023height_base[f]:=width_base[f]+nw;
11024depth_base[f]:=height_base[f]+nh;
11025italic_base[f]:=depth_base[f]+nd;
11026lig_kern_base[f]:=italic_base[f]+ni;
11027kern_base[f]:=lig_kern_base[f]+nl-kern_base_offset;
11028exten_base[f]:=kern_base[f]+kern_base_offset+nk;
11029param_base[f]:=exten_base[f]+ne
11030
11031@ @<Apologize for not loading...@>=
11032begin start_font_error_message;
11033print(" not loaded: Not enough room left");
11034@.Font x=xx not loaded...@>
11035help4("I'm afraid I won't be able to make use of this font,")@/
11036("because my memory for character-size data is too small.")@/
11037("If you're really stuck, ask a wizard to enlarge me.")@/
11038("Or maybe try `I\font<same font id>=<name of loaded font>'.");
11039error; goto done;
11040end
11041
11042@ Only the first two words of the header are needed by \TeX82.
11043
11044@<Read the {\.{TFM}} header@>=
11045begin if lh<2 then abort;
11046store_four_quarters(font_check[f]);
11047fget; read_sixteen(z); {this rejects a negative design size}
11048fget; z:=z*@'400+fbyte; fget; z:=(z*@'20)+(fbyte div@'20);
11049if z<unity then abort;
11050while lh>2 do
11051  begin fget;fget;fget;fget;decr(lh); {ignore the rest of the header}
11052  end;
11053font_dsize[f]:=z;
11054if s<>-1000 then
11055  if s>=0 then z:=s
11056  else z:=xn_over_d(z,-s,1000);
11057font_size[f]:=z;
11058end
11059
11060@ @<Read character data@>=
11061for k:=fmem_ptr to width_base[f]-1 do
11062  begin store_four_quarters(font_info[k].qqqq);
11063  if (a>=nw)or(b div @'20>=nh)or(b mod @'20>=nd)or
11064    (c div 4>=ni) then abort;
11065  case c mod 4 of
11066  lig_tag: if d>=nl then abort;
11067  ext_tag: if d>=ne then abort;
11068  list_tag: @<Check for charlist cycle@>;
11069  othercases do_nothing {|no_tag|}
11070  endcases;
11071  end
11072
11073@ We want to make sure that there is no cycle of characters linked together
11074by |list_tag| entries, since such a cycle would get \TeX\ into an endless
11075loop. If such a cycle exists, the routine here detects it when processing
11076the largest character code in the cycle.
11077
11078@d check_byte_range(#)==begin if (#<bc)or(#>ec) then abort@+end
11079@d current_character_being_worked_on==k+bc-fmem_ptr
11080
11081@<Check for charlist cycle@>=
11082begin check_byte_range(d);
11083while d<current_character_being_worked_on do
11084  begin qw:=char_info(f)(d);
11085  {N.B.: not |qi(d)|, since |char_base[f]| hasn't been adjusted yet}
11086  if char_tag(qw)<>list_tag then goto not_found;
11087  d:=qo(rem_byte(qw)); {next character on the list}
11088  end;
11089if d=current_character_being_worked_on then abort; {yes, there's a cycle}
11090not_found:end
11091
11092@ A |fix_word| whose four bytes are $(a,b,c,d)$ from left to right represents
11093the number
11094$$x=\left\{\vcenter{\halign{$#$,\hfil\qquad&if $#$\hfil\cr
11095b\cdot2^{-4}+c\cdot2^{-12}+d\cdot2^{-20}&a=0;\cr
11096-16+b\cdot2^{-4}+c\cdot2^{-12}+d\cdot2^{-20}&a=255.\cr}}\right.$$
11097(No other choices of |a| are allowed, since the magnitude of a number in
11098design-size units must be less than 16.)  We want to multiply this
11099quantity by the integer~|z|, which is known to be less than $2^{27}$.
11100If $|z|<2^{23}$, the individual multiplications $b\cdot z$,
11101$c\cdot z$, $d\cdot z$ cannot overflow; otherwise we will divide |z| by 2,
111024, 8, or 16, to obtain a multiplier less than $2^{23}$, and we can
11103compensate for this later. If |z| has thereby been replaced by
11104$|z|^\prime=|z|/2^e$, let $\beta=2^{4-e}$; we shall compute
11105$$\lfloor(b+c\cdot2^{-8}+d\cdot2^{-16})\,z^\prime/\beta\rfloor$$
11106if $a=0$, or the same quantity minus $\alpha=2^{4+e}z^\prime$ if $a=255$.
11107This calculation must be done exactly, in order to guarantee portability
11108of \TeX\ between computers.
11109
11110@d store_scaled(#)==begin fget; a:=fbyte; fget; b:=fbyte;
11111  fget; c:=fbyte; fget; d:=fbyte;@/
11112  sw:=(((((d*z)div@'400)+(c*z))div@'400)+(b*z))div beta;
11113  if a=0 then #:=sw@+else if a=255 then #:=sw-alpha@+else abort;
11114  end
11115
11116@<Read box dimensions@>=
11117begin @<Replace |z| by $|z|^\prime$ and compute $\alpha,\beta$@>;
11118for k:=width_base[f] to lig_kern_base[f]-1 do
11119  store_scaled(font_info[k].sc);
11120if font_info[width_base[f]].sc<>0 then abort; {\\{width}[0] must be zero}
11121if font_info[height_base[f]].sc<>0 then abort; {\\{height}[0] must be zero}
11122if font_info[depth_base[f]].sc<>0 then abort; {\\{depth}[0] must be zero}
11123if font_info[italic_base[f]].sc<>0 then abort; {\\{italic}[0] must be zero}
11124end
11125
11126@ @<Replace |z|...@>=
11127begin alpha:=16;
11128while z>=@'40000000 do
11129  begin z:=z div 2; alpha:=alpha+alpha;
11130  end;
11131beta:=256 div alpha; alpha:=alpha*z;
11132end
11133
11134@ @d check_existence(#)==@t@>@;@/
11135  begin check_byte_range(#);
11136  qw:=char_info(f)(#); {N.B.: not |qi(#)|}
11137  if not char_exists(qw) then abort;
11138  end
11139
11140@<Read ligature/kern program@>=
11141bch_label:=@'77777; bchar:=256;
11142if nl>0 then
11143  begin for k:=lig_kern_base[f] to kern_base[f]+kern_base_offset-1 do
11144    begin store_four_quarters(font_info[k].qqqq);
11145    if a>128 then
11146      begin if 256*c+d>=nl then abort;
11147      if a=255 then if k=lig_kern_base[f] then bchar:=b;
11148      end
11149    else begin if b<>bchar then check_existence(b);
11150      if c<128 then check_existence(d) {check ligature}
11151      else if 256*(c-128)+d>=nk then abort; {check kern}
11152      if a<128 then if k-lig_kern_base[f]+a+1>=nl then abort;
11153      end;
11154    end;
11155  if a=255 then bch_label:=256*c+d;
11156  end;
11157for k:=kern_base[f]+kern_base_offset to exten_base[f]-1 do
11158  store_scaled(font_info[k].sc);
11159
11160@ @<Read extensible character recipes@>=
11161for k:=exten_base[f] to param_base[f]-1 do
11162  begin store_four_quarters(font_info[k].qqqq);
11163  if a<>0 then check_existence(a);
11164  if b<>0 then check_existence(b);
11165  if c<>0 then check_existence(c);
11166  check_existence(d);
11167  end
11168
11169@ We check to see that the \.{TFM} file doesn't end prematurely; but
11170no error message is given for files having more than |lf| words.
11171
11172@<Read font parameters@>=
11173begin for k:=1 to np do
11174  if k=1 then {the |slant| parameter is a pure number}
11175    begin fget; sw:=fbyte; if sw>127 then sw:=sw-256;
11176    fget; sw:=sw*@'400+fbyte; fget; sw:=sw*@'400+fbyte;
11177    fget; font_info[param_base[f]].sc:=
11178      (sw*@'20)+(fbyte div@'20);
11179    end
11180  else store_scaled(font_info[param_base[f]+k-1].sc);
11181if eof(tfm_file) then abort;
11182for k:=np+1 to 7 do font_info[param_base[f]+k-1].sc:=0;
11183end
11184
11185@ Now to wrap it up, we have checked all the necessary things about the \.{TFM}
11186file, and all we need to do is put the finishing touches on the data for
11187the new font.
11188
11189@d adjust(#)==#[f]:=qo(#[f])
11190  {correct for the excess |min_quarterword| that was added}
11191
11192@<Make final adjustments...@>=
11193if np>=7 then font_params[f]:=np@+else font_params[f]:=7;
11194hyphen_char[f]:=default_hyphen_char; skew_char[f]:=default_skew_char;
11195if bch_label<nl then bchar_label[f]:=bch_label+lig_kern_base[f]
11196else bchar_label[f]:=non_address;
11197font_bchar[f]:=qi(bchar);
11198font_false_bchar[f]:=qi(bchar);
11199if bchar<=ec then if bchar>=bc then
11200  begin qw:=char_info(f)(bchar); {N.B.: not |qi(bchar)|}
11201  if char_exists(qw) then font_false_bchar[f]:=non_char;
11202  end;
11203font_name[f]:=nom;
11204font_area[f]:=aire;
11205font_bc[f]:=bc; font_ec[f]:=ec; font_glue[f]:=null;
11206adjust(char_base); adjust(width_base); adjust(lig_kern_base);
11207adjust(kern_base); adjust(exten_base);
11208decr(param_base[f]);
11209fmem_ptr:=fmem_ptr+lf; font_ptr:=f; g:=f; goto done
11210
11211@ Before we forget about the format of these tables, let's deal with two
11212of \TeX's basic scanning routines related to font information.
11213
11214@<Declare procedures that scan font-related stuff@>=
11215procedure scan_font_ident;
11216var f:internal_font_number;
11217@!m:halfword;
11218begin @<Get the next non-blank non-call...@>;
11219if cur_cmd=def_font then f:=cur_font
11220else if cur_cmd=set_font then f:=cur_chr
11221else if cur_cmd=def_family then
11222  begin m:=cur_chr; scan_four_bit_int; f:=equiv(m+cur_val);
11223  end
11224else  begin print_err("Missing font identifier");
11225@.Missing font identifier@>
11226  help2("I was looking for a control sequence whose")@/
11227  ("current meaning has been defined by \font.");
11228  back_error; f:=null_font;
11229  end;
11230cur_val:=f;
11231end;
11232
11233@ The following routine is used to implement `\.{\\fontdimen} |n| |f|'.
11234The boolean parameter |writing| is set |true| if the calling program
11235intends to change the parameter value.
11236
11237@<Declare procedures that scan font-related stuff@>=
11238procedure find_font_dimen(@!writing:boolean);
11239  {sets |cur_val| to |font_info| location}
11240var f:internal_font_number;
11241@!n:integer; {the parameter number}
11242begin scan_int; n:=cur_val; scan_font_ident; f:=cur_val;
11243if n<=0 then cur_val:=fmem_ptr
11244else  begin if writing and(n<=space_shrink_code)and@|
11245    (n>=space_code)and(font_glue[f]<>null) then
11246    begin delete_glue_ref(font_glue[f]);
11247    font_glue[f]:=null;
11248    end;
11249  if n>font_params[f] then
11250    if f<font_ptr then cur_val:=fmem_ptr
11251    else @<Increase the number of parameters in the last font@>
11252  else cur_val:=n+param_base[f];
11253  end;
11254@<Issue an error message if |cur_val=fmem_ptr|@>;
11255end;
11256
11257@ @<Issue an error message if |cur_val=fmem_ptr|@>=
11258if cur_val=fmem_ptr then
11259  begin print_err("Font "); print_esc(font_id_text(f));
11260  print(" has only "); print_int(font_params[f]);
11261  print(" fontdimen parameters");
11262@.Font x has only...@>
11263  help2("To increase the number of font parameters, you must")@/
11264    ("use \fontdimen immediately after the \font is loaded.");
11265  error;
11266  end
11267
11268@ @<Increase the number of parameters...@>=
11269begin repeat if fmem_ptr=font_mem_size then
11270  overflow("font memory",font_mem_size);
11271@:TeX capacity exceeded font memory}{\quad font memory@>
11272font_info[fmem_ptr].sc:=0; incr(fmem_ptr); incr(font_params[f]);
11273until n=font_params[f];
11274cur_val:=fmem_ptr-1; {this equals |param_base[f]+font_params[f]|}
11275end
11276
11277@ When \TeX\ wants to typeset a character that doesn't exist, the
11278character node is not created; thus the output routine can assume
11279that characters exist when it sees them. The following procedure
11280prints a warning message unless the user has suppressed it.
11281
11282@p procedure char_warning(@!f:internal_font_number;@!c:eight_bits);
11283begin if tracing_lost_chars>0 then
11284  begin begin_diagnostic;
11285  print_nl("Missing character: There is no ");
11286@.Missing character@>
11287  print_ASCII(c); print(" in font ");
11288  slow_print(font_name[f]); print_char("!"); end_diagnostic(false);
11289  end;
11290end;
11291
11292@ Here is a function that returns a pointer to a character node for a
11293given character in a given font. If that character doesn't exist,
11294|null| is returned instead.
11295
11296@p function new_character(@!f:internal_font_number;@!c:eight_bits):pointer;
11297label exit;
11298var p:pointer; {newly allocated node}
11299begin if font_bc[f]<=c then if font_ec[f]>=c then
11300  if char_exists(char_info(f)(qi(c))) then
11301    begin p:=get_avail; font(p):=f; character(p):=qi(c);
11302    new_character:=p; return;
11303    end;
11304char_warning(f,c);
11305new_character:=null;
11306exit:end;
11307
11308@* \[31] Device-independent file format.
11309The most important output produced by a run of \TeX\ is the ``device
11310independent'' (\.{DVI}) file that specifies where characters and rules
11311are to appear on printed pages. The form of these files was designed by
11312David R. Fuchs in 1979. Almost any reasonable typesetting device can be
11313@^Fuchs, David Raymond@>
11314@:DVI_files}{\.{DVI} files@>
11315driven by a program that takes \.{DVI} files as input, and dozens of such
11316\.{DVI}-to-whatever programs have been written. Thus, it is possible to
11317print the output of \TeX\ on many different kinds of equipment, using \TeX\
11318as a device-independent ``front end.''
11319
11320A \.{DVI} file is a stream of 8-bit bytes, which may be regarded as a
11321series of commands in a machine-like language. The first byte of each command
11322is the operation code, and this code is followed by zero or more bytes
11323that provide parameters to the command. The parameters themselves may consist
11324of several consecutive bytes; for example, the `|set_rule|' command has two
11325parameters, each of which is four bytes long. Parameters are usually
11326regarded as nonnegative integers; but four-byte-long parameters,
11327and shorter parameters that denote distances, can be
11328either positive or negative. Such parameters are given in two's complement
11329notation. For example, a two-byte-long distance parameter has a value between
11330$-2^{15}$ and $2^{15}-1$. As in \.{TFM} files, numbers that occupy
11331more than one byte position appear in BigEndian order.
11332
11333A \.{DVI} file consists of a ``preamble,'' followed by a sequence of one
11334or more ``pages,'' followed by a ``postamble.'' The preamble is simply a
11335|pre| command, with its parameters that define the dimensions used in the
11336file; this must come first.  Each ``page'' consists of a |bop| command,
11337followed by any number of other commands that tell where characters are to
11338be placed on a physical page, followed by an |eop| command. The pages
11339appear in the order that \TeX\ generated them. If we ignore |nop| commands
11340and \\{fnt\_def} commands (which are allowed between any two commands in
11341the file), each |eop| command is immediately followed by a |bop| command,
11342or by a |post| command; in the latter case, there are no more pages in the
11343file, and the remaining bytes form the postamble.  Further details about
11344the postamble will be explained later.
11345
11346Some parameters in \.{DVI} commands are ``pointers.'' These are four-byte
11347quantities that give the location number of some other byte in the file;
11348the first byte is number~0, then comes number~1, and so on. For example,
11349one of the parameters of a |bop| command points to the previous |bop|;
11350this makes it feasible to read the pages in backwards order, in case the
11351results are being directed to a device that stacks its output face up.
11352Suppose the preamble of a \.{DVI} file occupies bytes 0 to 99. Now if the
11353first page occupies bytes 100 to 999, say, and if the second
11354page occupies bytes 1000 to 1999, then the |bop| that starts in byte 1000
11355points to 100 and the |bop| that starts in byte 2000 points to 1000. (The
11356very first |bop|, i.e., the one starting in byte 100, has a pointer of~$-1$.)
11357
11358@ The \.{DVI} format is intended to be both compact and easily interpreted
11359by a machine. Compactness is achieved by making most of the information
11360implicit instead of explicit. When a \.{DVI}-reading program reads the
11361commands for a page, it keeps track of several quantities: (a)~The current
11362font |f| is an integer; this value is changed only
11363by \\{fnt} and \\{fnt\_num} commands. (b)~The current position on the page
11364is given by two numbers called the horizontal and vertical coordinates,
11365|h| and |v|. Both coordinates are zero at the upper left corner of the page;
11366moving to the right corresponds to increasing the horizontal coordinate, and
11367moving down corresponds to increasing the vertical coordinate. Thus, the
11368coordinates are essentially Cartesian, except that vertical directions are
11369flipped; the Cartesian version of |(h,v)| would be |(h,-v)|.  (c)~The
11370current spacing amounts are given by four numbers |w|, |x|, |y|, and |z|,
11371where |w| and~|x| are used for horizontal spacing and where |y| and~|z|
11372are used for vertical spacing. (d)~There is a stack containing
11373|(h,v,w,x,y,z)| values; the \.{DVI} commands |push| and |pop| are used to
11374change the current level of operation. Note that the current font~|f| is
11375not pushed and popped; the stack contains only information about
11376positioning.
11377
11378The values of |h|, |v|, |w|, |x|, |y|, and |z| are signed integers having up
11379to 32 bits, including the sign. Since they represent physical distances,
11380there is a small unit of measurement such that increasing |h| by~1 means
11381moving a certain tiny distance to the right. The actual unit of
11382measurement is variable, as explained below; \TeX\ sets things up so that
11383its \.{DVI} output is in sp units, i.e., scaled points, in agreement with
11384all the |scaled| dimensions in \TeX's data structures.
11385
11386@ Here is a list of all the commands that may appear in a \.{DVI} file. Each
11387command is specified by its symbolic name (e.g., |bop|), its opcode byte
11388(e.g., 139), and its parameters (if any). The parameters are followed
11389by a bracketed number telling how many bytes they occupy; for example,
11390`|p[4]|' means that parameter |p| is four bytes long.
11391
11392\yskip\hang|set_char_0| 0. Typeset character number~0 from font~|f|
11393such that the reference point of the character is at |(h,v)|. Then
11394increase |h| by the width of that character. Note that a character may
11395have zero or negative width, so one cannot be sure that |h| will advance
11396after this command; but |h| usually does increase.
11397
11398\yskip\hang\\{set\_char\_1} through \\{set\_char\_127} (opcodes 1 to 127).
11399Do the operations of |set_char_0|; but use the character whose number
11400matches the opcode, instead of character~0.
11401
11402\yskip\hang|set1| 128 |c[1]|. Same as |set_char_0|, except that character
11403number~|c| is typeset. \TeX82 uses this command for characters in the
11404range |128<=c<256|.
11405
11406\yskip\hang|@!set2| 129 |c[2]|. Same as |set1|, except that |c|~is two
11407bytes long, so it is in the range |0<=c<65536|. \TeX82 never uses this
11408command, but it should come in handy for extensions of \TeX\ that deal
11409with oriental languages.
11410@^oriental characters@>@^Chinese characters@>@^Japanese characters@>
11411
11412\yskip\hang|@!set3| 130 |c[3]|. Same as |set1|, except that |c|~is three
11413bytes long, so it can be as large as $2^{24}-1$. Not even the Chinese
11414language has this many characters, but this command might prove useful
11415in some yet unforeseen extension.
11416
11417\yskip\hang|@!set4| 131 |c[4]|. Same as |set1|, except that |c|~is four
11418bytes long. Imagine that.
11419
11420\yskip\hang|set_rule| 132 |a[4]| |b[4]|. Typeset a solid black rectangle
11421of height~|a| and width~|b|, with its bottom left corner at |(h,v)|. Then
11422set |h:=h+b|. If either |a<=0| or |b<=0|, nothing should be typeset. Note
11423that if |b<0|, the value of |h| will decrease even though nothing else happens.
11424See below for details about how to typeset rules so that consistency with
11425\MF\ is guaranteed.
11426
11427\yskip\hang|@!put1| 133 |c[1]|. Typeset character number~|c| from font~|f|
11428such that the reference point of the character is at |(h,v)|. (The `put'
11429commands are exactly like the `set' commands, except that they simply put out a
11430character or a rule without moving the reference point afterwards.)
11431
11432\yskip\hang|@!put2| 134 |c[2]|. Same as |set2|, except that |h| is not changed.
11433
11434\yskip\hang|@!put3| 135 |c[3]|. Same as |set3|, except that |h| is not changed.
11435
11436\yskip\hang|@!put4| 136 |c[4]|. Same as |set4|, except that |h| is not changed.
11437
11438\yskip\hang|put_rule| 137 |a[4]| |b[4]|. Same as |set_rule|, except that
11439|h| is not changed.
11440
11441\yskip\hang|nop| 138. No operation, do nothing. Any number of |nop|'s
11442may occur between \.{DVI} commands, but a |nop| cannot be inserted between
11443a command and its parameters or between two parameters.
11444
11445\yskip\hang|bop| 139 $c_0[4]$ $c_1[4]$ $\ldots$ $c_9[4]$ $p[4]$. Beginning
11446of a page: Set |(h,v,w,x,y,z):=(0,0,0,0,0,0)| and set the stack empty. Set
11447the current font |f| to an undefined value.  The ten $c_i$ parameters hold
11448the values of \.{\\count0} $\ldots$ \.{\\count9} in \TeX\ at the time
11449\.{\\shipout} was invoked for this page; they can be used to identify
11450pages, if a user wants to print only part of a \.{DVI} file. The parameter
11451|p| points to the previous |bop| in the file; the first
11452|bop| has $p=-1$.
11453
11454\yskip\hang|eop| 140.  End of page: Print what you have read since the
11455previous |bop|. At this point the stack should be empty. (The \.{DVI}-reading
11456programs that drive most output devices will have kept a buffer of the
11457material that appears on the page that has just ended. This material is
11458largely, but not entirely, in order by |v| coordinate and (for fixed |v|) by
11459|h|~coordinate; so it usually needs to be sorted into some order that is
11460appropriate for the device in question.)
11461
11462\yskip\hang|push| 141. Push the current values of |(h,v,w,x,y,z)| onto the
11463top of the stack; do not change any of these values. Note that |f| is
11464not pushed.
11465
11466\yskip\hang|pop| 142. Pop the top six values off of the stack and assign
11467them respectively to |(h,v,w,x,y,z)|. The number of pops should never
11468exceed the number of pushes, since it would be highly embarrassing if the
11469stack were empty at the time of a |pop| command.
11470
11471\yskip\hang|right1| 143 |b[1]|. Set |h:=h+b|, i.e., move right |b| units.
11472The parameter is a signed number in two's complement notation, |-128<=b<128|;
11473if |b<0|, the reference point moves left.
11474
11475\yskip\hang|right2| 144 |b[2]|. Same as |right1|, except that |b| is a
11476two-byte quantity in the range |-32768<=b<32768|.
11477
11478\yskip\hang|right3| 145 |b[3]|. Same as |right1|, except that |b| is a
11479three-byte quantity in the range |@t$-2^{23}$@><=b<@t$2^{23}$@>|.
11480
11481\yskip\hang|right4| 146 |b[4]|. Same as |right1|, except that |b| is a
11482four-byte quantity in the range |@t$-2^{31}$@><=b<@t$2^{31}$@>|.
11483
11484\yskip\hang|w0| 147. Set |h:=h+w|; i.e., move right |w| units. With luck,
11485this parameterless command will usually suffice, because the same kind of motion
11486will occur several times in succession; the following commands explain how
11487|w| gets particular values.
11488
11489\yskip\hang|w1| 148 |b[1]|. Set |w:=b| and |h:=h+b|. The value of |b| is a
11490signed quantity in two's complement notation, |-128<=b<128|. This command
11491changes the current |w|~spacing and moves right by |b|.
11492
11493\yskip\hang|@!w2| 149 |b[2]|. Same as |w1|, but |b| is two bytes long,
11494|-32768<=b<32768|.
11495
11496\yskip\hang|@!w3| 150 |b[3]|. Same as |w1|, but |b| is three bytes long,
11497|@t$-2^{23}$@><=b<@t$2^{23}$@>|.
11498
11499\yskip\hang|@!w4| 151 |b[4]|. Same as |w1|, but |b| is four bytes long,
11500|@t$-2^{31}$@><=b<@t$2^{31}$@>|.
11501
11502\yskip\hang|x0| 152. Set |h:=h+x|; i.e., move right |x| units. The `|x|'
11503commands are like the `|w|' commands except that they involve |x| instead
11504of |w|.
11505
11506\yskip\hang|x1| 153 |b[1]|. Set |x:=b| and |h:=h+b|. The value of |b| is a
11507signed quantity in two's complement notation, |-128<=b<128|. This command
11508changes the current |x|~spacing and moves right by |b|.
11509
11510\yskip\hang|@!x2| 154 |b[2]|. Same as |x1|, but |b| is two bytes long,
11511|-32768<=b<32768|.
11512
11513\yskip\hang|@!x3| 155 |b[3]|. Same as |x1|, but |b| is three bytes long,
11514|@t$-2^{23}$@><=b<@t$2^{23}$@>|.
11515
11516\yskip\hang|@!x4| 156 |b[4]|. Same as |x1|, but |b| is four bytes long,
11517|@t$-2^{31}$@><=b<@t$2^{31}$@>|.
11518
11519\yskip\hang|down1| 157 |a[1]|. Set |v:=v+a|, i.e., move down |a| units.
11520The parameter is a signed number in two's complement notation, |-128<=a<128|;
11521if |a<0|, the reference point moves up.
11522
11523\yskip\hang|@!down2| 158 |a[2]|. Same as |down1|, except that |a| is a
11524two-byte quantity in the range |-32768<=a<32768|.
11525
11526\yskip\hang|@!down3| 159 |a[3]|. Same as |down1|, except that |a| is a
11527three-byte quantity in the range |@t$-2^{23}$@><=a<@t$2^{23}$@>|.
11528
11529\yskip\hang|@!down4| 160 |a[4]|. Same as |down1|, except that |a| is a
11530four-byte quantity in the range |@t$-2^{31}$@><=a<@t$2^{31}$@>|.
11531
11532\yskip\hang|y0| 161. Set |v:=v+y|; i.e., move down |y| units. With luck,
11533this parameterless command will usually suffice, because the same kind of motion
11534will occur several times in succession; the following commands explain how
11535|y| gets particular values.
11536
11537\yskip\hang|y1| 162 |a[1]|. Set |y:=a| and |v:=v+a|. The value of |a| is a
11538signed quantity in two's complement notation, |-128<=a<128|. This command
11539changes the current |y|~spacing and moves down by |a|.
11540
11541\yskip\hang|@!y2| 163 |a[2]|. Same as |y1|, but |a| is two bytes long,
11542|-32768<=a<32768|.
11543
11544\yskip\hang|@!y3| 164 |a[3]|. Same as |y1|, but |a| is three bytes long,
11545|@t$-2^{23}$@><=a<@t$2^{23}$@>|.
11546
11547\yskip\hang|@!y4| 165 |a[4]|. Same as |y1|, but |a| is four bytes long,
11548|@t$-2^{31}$@><=a<@t$2^{31}$@>|.
11549
11550\yskip\hang|z0| 166. Set |v:=v+z|; i.e., move down |z| units. The `|z|' commands
11551are like the `|y|' commands except that they involve |z| instead of |y|.
11552
11553\yskip\hang|z1| 167 |a[1]|. Set |z:=a| and |v:=v+a|. The value of |a| is a
11554signed quantity in two's complement notation, |-128<=a<128|. This command
11555changes the current |z|~spacing and moves down by |a|.
11556
11557\yskip\hang|@!z2| 168 |a[2]|. Same as |z1|, but |a| is two bytes long,
11558|-32768<=a<32768|.
11559
11560\yskip\hang|@!z3| 169 |a[3]|. Same as |z1|, but |a| is three bytes long,
11561|@t$-2^{23}$@><=a<@t$2^{23}$@>|.
11562
11563\yskip\hang|@!z4| 170 |a[4]|. Same as |z1|, but |a| is four bytes long,
11564|@t$-2^{31}$@><=a<@t$2^{31}$@>|.
11565
11566\yskip\hang|fnt_num_0| 171. Set |f:=0|. Font 0 must previously have been
11567defined by a \\{fnt\_def} instruction, as explained below.
11568
11569\yskip\hang\\{fnt\_num\_1} through \\{fnt\_num\_63} (opcodes 172 to 234). Set
11570|f:=1|, \dots, \hbox{|f:=63|}, respectively.
11571
11572\yskip\hang|fnt1| 235 |k[1]|. Set |f:=k|. \TeX82 uses this command for font
11573numbers in the range |64<=k<256|.
11574
11575\yskip\hang|@!fnt2| 236 |k[2]|. Same as |fnt1|, except that |k|~is two
11576bytes long, so it is in the range |0<=k<65536|. \TeX82 never generates this
11577command, but large font numbers may prove useful for specifications of
11578color or texture, or they may be used for special fonts that have fixed
11579numbers in some external coding scheme.
11580
11581\yskip\hang|@!fnt3| 237 |k[3]|. Same as |fnt1|, except that |k|~is three
11582bytes long, so it can be as large as $2^{24}-1$.
11583
11584\yskip\hang|@!fnt4| 238 |k[4]|. Same as |fnt1|, except that |k|~is four
11585bytes long; this is for the really big font numbers (and for the negative ones).
11586
11587\yskip\hang|xxx1| 239 |k[1]| |x[k]|. This command is undefined in
11588general; it functions as a $(k+2)$-byte |nop| unless special \.{DVI}-reading
11589programs are being used. \TeX82 generates |xxx1| when a short enough
11590\.{\\special} appears, setting |k| to the number of bytes being sent. It
11591is recommended that |x| be a string having the form of a keyword followed
11592by possible parameters relevant to that keyword.
11593
11594\yskip\hang|@!xxx2| 240 |k[2]| |x[k]|. Like |xxx1|, but |0<=k<65536|.
11595
11596\yskip\hang|@!xxx3| 241 |k[3]| |x[k]|. Like |xxx1|, but |0<=k<@t$2^{24}$@>|.
11597
11598\yskip\hang|xxx4| 242 |k[4]| |x[k]|. Like |xxx1|, but |k| can be ridiculously
11599large. \TeX82 uses |xxx4| when sending a string of length 256 or more.
11600
11601\yskip\hang|fnt_def1| 243 |k[1]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|.
11602Define font |k|, where |0<=k<256|; font definitions will be explained shortly.
11603
11604\yskip\hang|@!fnt_def2| 244 |k[2]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|.
11605Define font |k|, where |0<=k<65536|.
11606
11607\yskip\hang|@!fnt_def3| 245 |k[3]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|.
11608Define font |k|, where |0<=k<@t$2^{24}$@>|.
11609
11610\yskip\hang|@!fnt_def4| 246 |k[4]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|.
11611Define font |k|, where |@t$-2^{31}$@><=k<@t$2^{31}$@>|.
11612
11613\yskip\hang|pre| 247 |i[1]| |num[4]| |den[4]| |mag[4]| |k[1]| |x[k]|.
11614Beginning of the preamble; this must come at the very beginning of the
11615file. Parameters |i|, |num|, |den|, |mag|, |k|, and |x| are explained below.
11616
11617\yskip\hang|post| 248. Beginning of the postamble, see below.
11618
11619\yskip\hang|post_post| 249. Ending of the postamble, see below.
11620
11621\yskip\noindent Commands 250--255 are undefined at the present time.
11622
11623@ @d set_char_0=0 {typeset character 0 and move right}
11624@d set1=128 {typeset a character and move right}
11625@d set_rule=132 {typeset a rule and move right}
11626@d put_rule=137 {typeset a rule}
11627@d nop=138 {no operation}
11628@d bop=139 {beginning of page}
11629@d eop=140 {ending of page}
11630@d push=141 {save the current positions}
11631@d pop=142 {restore previous positions}
11632@d right1=143 {move right}
11633@d w0=147 {move right by |w|}
11634@d w1=148 {move right and set |w|}
11635@d x0=152 {move right by |x|}
11636@d x1=153 {move right and set |x|}
11637@d down1=157 {move down}
11638@d y0=161 {move down by |y|}
11639@d y1=162 {move down and set |y|}
11640@d z0=166 {move down by |z|}
11641@d z1=167 {move down and set |z|}
11642@d fnt_num_0=171 {set current font to 0}
11643@d fnt1=235 {set current font}
11644@d xxx1=239 {extension to \.{DVI} primitives}
11645@d xxx4=242 {potentially long extension to \.{DVI} primitives}
11646@d fnt_def1=243 {define the meaning of a font number}
11647@d pre=247 {preamble}
11648@d post=248 {postamble beginning}
11649@d post_post=249 {postamble ending}
11650
11651@ The preamble contains basic information about the file as a whole. As
11652stated above, there are six parameters:
11653$$\hbox{|@!i[1]| |@!num[4]| |@!den[4]| |@!mag[4]| |@!k[1]| |@!x[k]|.}$$
11654The |i| byte identifies \.{DVI} format; currently this byte is always set
11655to~2. (The value |i=3| is currently used for an extended format that
11656allows a mixture of right-to-left and left-to-right typesetting.
11657Some day we will set |i=4|, when \.{DVI} format makes another
11658incompatible change---perhaps in the year 2048.)
11659
11660The next two parameters, |num| and |den|, are positive integers that define
11661the units of measurement; they are the numerator and denominator of a
11662fraction by which all dimensions in the \.{DVI} file could be multiplied
11663in order to get lengths in units of $10^{-7}$ meters. Since $\rm 7227{pt} =
11664254{cm}$, and since \TeX\ works with scaled points where there are $2^{16}$
11665sp in a point, \TeX\ sets
11666$|num|/|den|=(254\cdot10^5)/(7227\cdot2^{16})=25400000/473628672$.
11667@^sp@>
11668
11669The |mag| parameter is what \TeX\ calls \.{\\mag}, i.e., 1000 times the
11670desired magnification. The actual fraction by which dimensions are
11671multiplied is therefore $|mag|\cdot|num|/1000|den|$. Note that if a \TeX\
11672source document does not call for any `\.{true}' dimensions, and if you
11673change it only by specifying a different \.{\\mag} setting, the \.{DVI}
11674file that \TeX\ creates will be completely unchanged except for the value
11675of |mag| in the preamble and postamble. (Fancy \.{DVI}-reading programs allow
11676users to override the |mag|~setting when a \.{DVI} file is being printed.)
11677
11678Finally, |k| and |x| allow the \.{DVI} writer to include a comment, which is not
11679interpreted further. The length of comment |x| is |k|, where |0<=k<256|.
11680
11681@d id_byte=2 {identifies the kind of \.{DVI} files described here}
11682
11683@ Font definitions for a given font number |k| contain further parameters
11684$$\hbox{|c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|.}$$
11685The four-byte value |c| is the check sum that \TeX\ found in the \.{TFM}
11686file for this font; |c| should match the check sum of the font found by
11687programs that read this \.{DVI} file.
11688@^check sum@>
11689
11690Parameter |s| contains a fixed-point scale factor that is applied to
11691the character widths in font |k|; font dimensions in \.{TFM} files and
11692other font files are relative to this quantity, which is called the
11693``at size'' elsewhere in this documentation. The value of |s| is
11694always positive and less than $2^{27}$. It is given in the same units
11695as the other \.{DVI} dimensions, i.e., in sp when \TeX82 has made the
11696file.  Parameter |d| is similar to |s|; it is the ``design size,'' and
11697(like~|s|) it is given in \.{DVI} units. Thus, font |k| is to be used
11698at $|mag|\cdot s/1000d$ times its normal size.
11699
11700The remaining part of a font definition gives the external name of the font,
11701which is an ASCII string of length |a+l|. The number |a| is the length
11702of the ``area'' or directory, and |l| is the length of the font name itself;
11703the standard local system font area is supposed to be used when |a=0|.
11704The |n| field contains the area in its first |a| bytes.
11705
11706Font definitions must appear before the first use of a particular font number.
11707Once font |k| is defined, it must not be defined again; however, we
11708shall see below that font definitions appear in the postamble as well as
11709in the pages, so in this sense each font number is defined exactly twice,
11710if at all. Like |nop| commands, font definitions can
11711appear before the first |bop|, or between an |eop| and a |bop|.
11712
11713@ Sometimes it is desirable to make horizontal or vertical rules line up
11714precisely with certain features in characters of a font. It is possible to
11715guarantee the correct matching between \.{DVI} output and the characters
11716generated by \MF\ by adhering to the following principles: (1)~The \MF\
11717characters should be positioned so that a bottom edge or left edge that is
11718supposed to line up with the bottom or left edge of a rule appears at the
11719reference point, i.e., in row~0 and column~0 of the \MF\ raster. This
11720ensures that the position of the rule will not be rounded differently when
11721the pixel size is not a perfect multiple of the units of measurement in
11722the \.{DVI} file. (2)~A typeset rule of height $a>0$ and width $b>0$
11723should be equivalent to a \MF-generated character having black pixels in
11724precisely those raster positions whose \MF\ coordinates satisfy
11725|0<=x<@t$\alpha$@>b| and |0<=y<@t$\alpha$@>a|, where $\alpha$ is the number
11726of pixels per \.{DVI} unit.
11727@:METAFONT}{\MF@>
11728@^alignment of rules with characters@>
11729@^rules aligning with characters@>
11730
11731@ The last page in a \.{DVI} file is followed by `|post|'; this command
11732introduces the postamble, which summarizes important facts that \TeX\ has
11733accumulated about the file, making it possible to print subsets of the data
11734with reasonable efficiency. The postamble has the form
11735$$\vbox{\halign{\hbox{#\hfil}\cr
11736  |post| |p[4]| |num[4]| |den[4]| |mag[4]| |l[4]| |u[4]| |s[2]| |t[2]|\cr
11737  $\langle\,$font definitions$\,\rangle$\cr
11738  |post_post| |q[4]| |i[1]| 223's$[{\G}4]$\cr}}$$
11739Here |p| is a pointer to the final |bop| in the file. The next three
11740parameters, |num|, |den|, and |mag|, are duplicates of the quantities that
11741appeared in the preamble.
11742
11743Parameters |l| and |u| give respectively the height-plus-depth of the tallest
11744page and the width of the widest page, in the same units as other dimensions
11745of the file. These numbers might be used by a \.{DVI}-reading program to
11746position individual ``pages'' on large sheets of film or paper; however,
11747the standard convention for output on normal size paper is to position each
11748page so that the upper left-hand corner is exactly one inch from the left
11749and the top. Experience has shown that it is unwise to design \.{DVI}-to-printer
11750software that attempts cleverly to center the output; a fixed position of
11751the upper left corner is easiest for users to understand and to work with.
11752Therefore |l| and~|u| are often ignored.
11753
11754Parameter |s| is the maximum stack depth (i.e., the largest excess of
11755|push| commands over |pop| commands) needed to process this file. Then
11756comes |t|, the total number of pages (|bop| commands) present.
11757
11758The postamble continues with font definitions, which are any number of
11759\\{fnt\_def} commands as described above, possibly interspersed with |nop|
11760commands. Each font number that is used in the \.{DVI} file must be defined
11761exactly twice: Once before it is first selected by a \\{fnt} command, and once
11762in the postamble.
11763
11764@ The last part of the postamble, following the |post_post| byte that
11765signifies the end of the font definitions, contains |q|, a pointer to the
11766|post| command that started the postamble.  An identification byte, |i|,
11767comes next; this currently equals~2, as in the preamble.
11768
11769The |i| byte is followed by four or more bytes that are all equal to
11770the decimal number 223 (i.e., @'337 in octal). \TeX\ puts out four to seven of
11771these trailing bytes, until the total length of the file is a multiple of
11772four bytes, since this works out best on machines that pack four bytes per
11773word; but any number of 223's is allowed, as long as there are at least four
11774of them. In effect, 223 is a sort of signature that is added at the very end.
11775@^Fuchs, David Raymond@>
11776
11777This curious way to finish off a \.{DVI} file makes it feasible for
11778\.{DVI}-reading programs to find the postamble first, on most computers,
11779even though \TeX\ wants to write the postamble last. Most operating
11780systems permit random access to individual words or bytes of a file, so
11781the \.{DVI} reader can start at the end and skip backwards over the 223's
11782until finding the identification byte. Then it can back up four bytes, read
11783|q|, and move to byte |q| of the file. This byte should, of course,
11784contain the value 248 (|post|); now the postamble can be read, so the
11785\.{DVI} reader can discover all the information needed for typesetting the
11786pages. Note that it is also possible to skip through the \.{DVI} file at
11787reasonably high speed to locate a particular page, if that proves
11788desirable. This saves a lot of time, since \.{DVI} files used in production
11789jobs tend to be large.
11790
11791Unfortunately, however, standard \PASCAL\ does not include the ability to
11792@^system dependencies@>
11793access a random position in a file, or even to determine the length of a file.
11794Almost all systems nowadays provide the necessary capabilities, so \.{DVI}
11795format has been designed to work most efficiently with modern operating systems.
11796But if \.{DVI} files have to be processed under the restrictions of standard
11797\PASCAL, one can simply read them from front to back, since the necessary
11798header information is present in the preamble and in the font definitions.
11799(The |l| and |u| and |s| and |t| parameters, which appear only in the
11800postamble, are ``frills'' that are handy but not absolutely necessary.)
11801
11802@* \[32] Shipping pages out.
11803After considering \TeX's eyes and stomach, we come now to the bowels.
11804@^bowels@>
11805
11806The |ship_out| procedure is given a pointer to a box; its mission is
11807to describe that box in \.{DVI} form, outputting a ``page'' to |dvi_file|.
11808The \.{DVI} coordinates $(h,v)=(0,0)$ should correspond to the upper left
11809corner of the box being shipped.
11810
11811Since boxes can be inside of boxes inside of boxes, the main work of
11812|ship_out| is done by two mutually recursive routines, |hlist_out|
11813and |vlist_out|, which traverse the hlists and vlists inside of horizontal
11814and vertical boxes.
11815
11816As individual pages are being processed, we need to accumulate
11817information about the entire set of pages, since such statistics must be
11818reported in the postamble. The global variables |total_pages|, |max_v|,
11819|max_h|, |max_push|, and |last_bop| are used to record this information.
11820
11821The variable |doing_leaders| is |true| while leaders are being output.
11822The variable |dead_cycles| contains the number of times an output routine
11823has been initiated since the last |ship_out|.
11824
11825A few additional global variables are also defined here for use in
11826|vlist_out| and |hlist_out|. They could have been local variables, but
11827that would waste stack space when boxes are deeply nested, since the
11828values of these variables are not needed during recursive calls.
11829@^recursion@>
11830
11831@<Glob...@>=
11832@!total_pages:integer; {the number of pages that have been shipped out}
11833@!max_v:scaled; {maximum height-plus-depth of pages shipped so far}
11834@!max_h:scaled; {maximum width of pages shipped so far}
11835@!max_push:integer; {deepest nesting of |push| commands encountered so far}
11836@!last_bop:integer; {location of previous |bop| in the \.{DVI} output}
11837@!dead_cycles:integer; {recent outputs that didn't ship anything out}
11838@!doing_leaders:boolean; {are we inside a leader box?}
11839@#
11840@!c,@!f:quarterword; {character and font in current |char_node|}
11841@!rule_ht,@!rule_dp,@!rule_wd:scaled; {size of current rule being output}
11842@!g:pointer; {current glue specification}
11843@!lq,@!lr:integer; {quantities used in calculations for leaders}
11844
11845@ @<Set init...@>=
11846total_pages:=0; max_v:=0; max_h:=0; max_push:=0; last_bop:=-1;
11847doing_leaders:=false; dead_cycles:=0; cur_s:=-1;
11848
11849@ The \.{DVI} bytes are output to a buffer instead of being written directly
11850to the output file. This makes it possible to reduce the overhead of
11851subroutine calls, thereby measurably speeding up the computation, since
11852output of \.{DVI} bytes is part of \TeX's inner loop. And it has another
11853advantage as well, since we can change instructions in the buffer in order to
11854make the output more compact. For example, a `|down2|' command can be
11855changed to a `|y2|', thereby making a subsequent `|y0|' command possible,
11856saving two bytes.
11857
11858The output buffer is divided into two parts of equal size; the bytes found
11859in |dvi_buf[0..half_buf-1]| constitute the first half, and those in
11860|dvi_buf[half_buf..dvi_buf_size-1]| constitute the second. The global
11861variable |dvi_ptr| points to the position that will receive the next
11862output byte. When |dvi_ptr| reaches |dvi_limit|, which is always equal
11863to one of the two values |half_buf| or |dvi_buf_size|, the half buffer that
11864is about to be invaded next is sent to the output and |dvi_limit| is
11865changed to its other value. Thus, there is always at least a half buffer's
11866worth of information present, except at the very beginning of the job.
11867
11868Bytes of the \.{DVI} file are numbered sequentially starting with 0;
11869the next byte to be generated will be number |dvi_offset+dvi_ptr|.
11870A byte is present in the buffer only if its number is |>=dvi_gone|.
11871
11872@<Types...@>=
11873@!dvi_index=0..dvi_buf_size; {an index into the output buffer}
11874
11875@ Some systems may find it more efficient to make |dvi_buf| a |packed|
11876array, since output of four bytes at once may be facilitated.
11877@^system dependencies@>
11878
11879@<Glob...@>=
11880@!dvi_buf:array[dvi_index] of eight_bits; {buffer for \.{DVI} output}
11881@!half_buf:dvi_index; {half of |dvi_buf_size|}
11882@!dvi_limit:dvi_index; {end of the current half buffer}
11883@!dvi_ptr:dvi_index; {the next available buffer address}
11884@!dvi_offset:integer; {|dvi_buf_size| times the number of times the
11885  output buffer has been fully emptied}
11886@!dvi_gone:integer; {the number of bytes already output to |dvi_file|}
11887
11888@ Initially the buffer is all in one piece; we will output half of it only
11889after it first fills up.
11890
11891@<Set init...@>=
11892half_buf:=dvi_buf_size div 2; dvi_limit:=dvi_buf_size; dvi_ptr:=0;
11893dvi_offset:=0; dvi_gone:=0;
11894
11895@ The actual output of |dvi_buf[a..b]| to |dvi_file| is performed by calling
11896|write_dvi(a,b)|. For best results, this procedure should be optimized to
11897run as fast as possible on each particular system, since it is part of
11898\TeX's inner loop. It is safe to assume that |a| and |b+1| will both be
11899multiples of 4 when |write_dvi(a,b)| is called; therefore it is possible on
11900many machines to use efficient methods to pack four bytes per word and to
11901output an array of words with one system call.
11902@^system dependencies@>
11903@^inner loop@>
11904@^defecation@>
11905
11906@p procedure write_dvi(@!a,@!b:dvi_index);
11907var k:dvi_index;
11908begin for k:=a to b do write(dvi_file,dvi_buf[k]);
11909end;
11910
11911@ To put a byte in the buffer without paying the cost of invoking a procedure
11912each time, we use the macro |dvi_out|.
11913
11914@d dvi_out(#)==@+begin dvi_buf[dvi_ptr]:=#; incr(dvi_ptr);
11915  if dvi_ptr=dvi_limit then dvi_swap;
11916  end
11917
11918@p procedure dvi_swap; {outputs half of the buffer}
11919begin if dvi_limit=dvi_buf_size then
11920  begin write_dvi(0,half_buf-1); dvi_limit:=half_buf;
11921  dvi_offset:=dvi_offset+dvi_buf_size; dvi_ptr:=0;
11922  end
11923else  begin write_dvi(half_buf,dvi_buf_size-1); dvi_limit:=dvi_buf_size;
11924  end;
11925dvi_gone:=dvi_gone+half_buf;
11926end;
11927
11928@ Here is how we clean out the buffer when \TeX\ is all through; |dvi_ptr|
11929will be a multiple of~4.
11930
11931@<Empty the last bytes out of |dvi_buf|@>=
11932if dvi_limit=half_buf then write_dvi(half_buf,dvi_buf_size-1);
11933if dvi_ptr>0 then write_dvi(0,dvi_ptr-1)
11934
11935@ The |dvi_four| procedure outputs four bytes in two's complement notation,
11936without risking arithmetic overflow.
11937
11938@p procedure dvi_four(@!x:integer);
11939begin if x>=0 then dvi_out(x div @'100000000)
11940else  begin x:=x+@'10000000000;
11941  x:=x+@'10000000000;
11942  dvi_out((x div @'100000000) + 128);
11943  end;
11944x:=x mod @'100000000; dvi_out(x div @'200000);
11945x:=x mod @'200000; dvi_out(x div @'400);
11946dvi_out(x mod @'400);
11947end;
11948
11949@ A mild optimization of the output is performed by the |dvi_pop|
11950routine, which issues a |pop| unless it is possible to cancel a
11951`|push| |pop|' pair. The parameter to |dvi_pop| is the byte address
11952following the old |push| that matches the new |pop|.
11953
11954@p procedure dvi_pop(@!l:integer);
11955begin if (l=dvi_offset+dvi_ptr)and(dvi_ptr>0) then decr(dvi_ptr)
11956else dvi_out(pop);
11957end;
11958
11959@ Here's a procedure that outputs a font definition. Since \TeX82 uses at
11960most 256 different fonts per job, |fnt_def1| is always used as the command code.
11961
11962@p procedure dvi_font_def(@!f:internal_font_number);
11963var k:pool_pointer; {index into |str_pool|}
11964begin dvi_out(fnt_def1);
11965dvi_out(f-font_base-1);@/
11966dvi_out(qo(font_check[f].b0));
11967dvi_out(qo(font_check[f].b1));
11968dvi_out(qo(font_check[f].b2));
11969dvi_out(qo(font_check[f].b3));@/
11970dvi_four(font_size[f]);
11971dvi_four(font_dsize[f]);@/
11972dvi_out(length(font_area[f]));
11973dvi_out(length(font_name[f]));
11974@<Output the font name whose internal number is |f|@>;
11975end;
11976
11977@ @<Output the font name whose internal number is |f|@>=
11978for k:=str_start[font_area[f]] to str_start[font_area[f]+1]-1 do
11979  dvi_out(so(str_pool[k]));
11980for k:=str_start[font_name[f]] to str_start[font_name[f]+1]-1 do
11981  dvi_out(so(str_pool[k]))
11982
11983@ Versions of \TeX\ intended for small computers might well choose to omit
11984the ideas in the next few parts of this program, since it is not really
11985necessary to optimize the \.{DVI} code by making use of the |w0|, |x0|,
11986|y0|, and |z0| commands. Furthermore, the algorithm that we are about to
11987describe does not pretend to give an optimum reduction in the length
11988of the \.{DVI} code; after all, speed is more important than compactness.
11989But the method is surprisingly effective, and it takes comparatively little
11990time.
11991
11992We can best understand the basic idea by first considering a simpler problem
11993that has the same essential characteristics. Given a sequence of digits,
11994say $3\,1\,4\,1\,5\,9\,2\,6\,5\,3\,5\,8\,9$, we want to assign subscripts
11995$d$, $y$, or $z$ to each digit so as to maximize the number of ``$y$-hits''
11996and ``$z$-hits''; a $y$-hit is an instance of two appearances of the same
11997digit with the subscript $y$, where no $y$'s intervene between the two
11998appearances, and a $z$-hit is defined similarly. For example, the sequence
11999above could be decorated with subscripts as follows:
12000$$3_z\,1_y\,4_d\,1_y\,5_y\,9_d\,2_d\,6_d\,5_y\,3_z\,5_y\,8_d\,9_d.$$
12001There are three $y$-hits ($1_y\ldots1_y$ and $5_y\ldots5_y\ldots5_y$) and
12002one $z$-hit ($3_z\ldots3_z$); there are no $d$-hits, since the two appearances
12003of $9_d$ have $d$'s between them, but we don't count $d$-hits so it doesn't
12004matter how many there are. These subscripts are analogous to the \.{DVI}
12005commands called \\{down}, $y$, and $z$, and the digits are analogous to
12006different amounts of vertical motion; a $y$-hit or $z$-hit corresponds to
12007the opportunity to use the one-byte commands |y0| or |z0| in a \.{DVI} file.
12008
12009\TeX's method of assigning subscripts works like this: Append a new digit,
12010say $\delta$, to the right of the sequence. Now look back through the
12011sequence until one of the following things happens: (a)~You see
12012$\delta_y$ or $\delta_z$, and this was the first time you encountered a
12013$y$ or $z$ subscript, respectively.  Then assign $y$ or $z$ to the new
12014$\delta$; you have scored a hit. (b)~You see $\delta_d$, and no $y$
12015subscripts have been encountered so far during this search.  Then change
12016the previous $\delta_d$ to $\delta_y$ (this corresponds to changing a
12017command in the output buffer), and assign $y$ to the new $\delta$; it's
12018another hit.  (c)~You see $\delta_d$, and a $y$ subscript has been seen
12019but not a $z$.  Change the previous $\delta_d$ to $\delta_z$ and assign
12020$z$ to the new $\delta$. (d)~You encounter both $y$ and $z$ subscripts
12021before encountering a suitable $\delta$, or you scan all the way to the
12022front of the sequence. Assign $d$ to the new $\delta$; this assignment may
12023be changed later.
12024
12025The subscripts $3_z\,1_y\,4_d\ldots\,$ in the example above were, in fact,
12026produced by this procedure, as the reader can verify. (Go ahead and try it.)
12027
12028@ In order to implement such an idea, \TeX\ maintains a stack of pointers
12029to the \\{down}, $y$, and $z$ commands that have been generated for the
12030current page. And there is a similar stack for \\{right}, |w|, and |x|
12031commands. These stacks are called the down stack and right stack, and their
12032top elements are maintained in the variables |down_ptr| and |right_ptr|.
12033
12034Each entry in these stacks contains four fields: The |width| field is
12035the amount of motion down or to the right; the |location| field is the
12036byte number of the \.{DVI} command in question (including the appropriate
12037|dvi_offset|); the |link| field points to the next item below this one
12038on the stack; and the |info| field encodes the options for possible change
12039in the \.{DVI} command.
12040
12041@d movement_node_size=3 {number of words per entry in the down and right stacks}
12042@d location(#)==mem[#+2].int {\.{DVI} byte number for a movement command}
12043
12044@<Glob...@>=
12045@!down_ptr,@!right_ptr:pointer; {heads of the down and right stacks}
12046
12047@ @<Set init...@>=
12048down_ptr:=null; right_ptr:=null;
12049
12050@ Here is a subroutine that produces a \.{DVI} command for some specified
12051downward or rightward motion. It has two parameters: |w| is the amount
12052of motion, and |o| is either |down1| or |right1|. We use the fact that
12053the command codes have convenient arithmetic properties: |y1-down1=w1-right1|
12054and |z1-down1=x1-right1|.
12055
12056@p procedure movement(@!w:scaled;@!o:eight_bits);
12057label exit,found,not_found,2,1;
12058var mstate:small_number; {have we seen a |y| or |z|?}
12059@!p,@!q:pointer; {current and top nodes on the stack}
12060@!k:integer; {index into |dvi_buf|, modulo |dvi_buf_size|}
12061begin q:=get_node(movement_node_size); {new node for the top of the stack}
12062width(q):=w; location(q):=dvi_offset+dvi_ptr;
12063if o=down1 then
12064  begin link(q):=down_ptr; down_ptr:=q;
12065  end
12066else  begin link(q):=right_ptr; right_ptr:=q;
12067  end;
12068@<Look at the other stack entries until deciding what sort of \.{DVI} command
12069  to generate; |goto found| if node |p| is a ``hit''@>;
12070@<Generate a |down| or |right| command for |w| and |return|@>;
12071found: @<Generate a |y0| or |z0| command in order to reuse a previous
12072  appearance of~|w|@>;
12073exit:end;
12074
12075@ The |info| fields in the entries of the down stack or the right stack
12076have six possible settings: |y_here| or |z_here| mean that the \.{DVI}
12077command refers to |y| or |z|, respectively (or to |w| or |x|, in the
12078case of horizontal motion); |yz_OK| means that the \.{DVI} command is
12079\\{down} (or \\{right}) but can be changed to either |y| or |z| (or
12080to either |w| or |x|); |y_OK| means that it is \\{down} and can be changed
12081to |y| but not |z|; |z_OK| is similar; and |d_fixed| means it must stay
12082\\{down}.
12083
12084The four settings |yz_OK|, |y_OK|, |z_OK|, |d_fixed| would not need to
12085be distinguished from each other if we were simply solving the
12086digit-subscripting problem mentioned above. But in \TeX's case there is
12087a complication because of the nested structure of |push| and |pop|
12088commands. Suppose we add parentheses to the digit-subscripting problem,
12089redefining hits so that $\delta_y\ldots \delta_y$ is a hit if all $y$'s between
12090the $\delta$'s are enclosed in properly nested parentheses, and if the
12091parenthesis level of the right-hand $\delta_y$ is deeper than or equal to
12092that of the left-hand one. Thus, `(' and `)' correspond to `|push|'
12093and `|pop|'. Now if we want to assign a subscript to the final 1 in the
12094sequence
12095$$2_y\,7_d\,1_d\,(\,8_z\,2_y\,8_z\,)\,1$$
12096we cannot change the previous $1_d$ to $1_y$, since that would invalidate
12097the $2_y\ldots2_y$ hit. But we can change it to $1_z$, scoring a hit
12098since the intervening $8_z$'s are enclosed in parentheses.
12099
12100The program below removes movement nodes that are introduced after a |push|,
12101before it outputs the corresponding |pop|.
12102
12103@d y_here=1 {|info| when the movement entry points to a |y| command}
12104@d z_here=2 {|info| when the movement entry points to a |z| command}
12105@d yz_OK=3 {|info| corresponding to an unconstrained \\{down} command}
12106@d y_OK=4 {|info| corresponding to a \\{down} that can't become a |z|}
12107@d z_OK=5 {|info| corresponding to a \\{down} that can't become a |y|}
12108@d d_fixed=6 {|info| corresponding to a \\{down} that can't change}
12109
12110@ When the |movement| procedure gets to the label |found|, the value of
12111|info(p)| will be either |y_here| or |z_here|. If it is, say, |y_here|,
12112the procedure generates a |y0| command (or a |w0| command), and marks
12113all |info| fields between |q| and |p| so that |y| is not OK in that range.
12114
12115@<Generate a |y0| or |z0| command...@>=
12116info(q):=info(p);
12117if info(q)=y_here then
12118  begin dvi_out(o+y0-down1); {|y0| or |w0|}
12119  while link(q)<>p do
12120    begin q:=link(q);
12121    case info(q) of
12122    yz_OK: info(q):=z_OK;
12123    y_OK: info(q):=d_fixed;
12124    othercases do_nothing
12125    endcases;
12126    end;
12127  end
12128else  begin dvi_out(o+z0-down1); {|z0| or |x0|}
12129  while link(q)<>p do
12130    begin q:=link(q);
12131    case info(q) of
12132    yz_OK: info(q):=y_OK;
12133    z_OK: info(q):=d_fixed;
12134    othercases do_nothing
12135    endcases;
12136    end;
12137  end
12138
12139@ @<Generate a |down| or |right|...@>=
12140info(q):=yz_OK;
12141if abs(w)>=@'40000000 then
12142  begin dvi_out(o+3); {|down4| or |right4|}
12143  dvi_four(w); return;
12144  end;
12145if abs(w)>=@'100000 then
12146  begin dvi_out(o+2); {|down3| or |right3|}
12147  if w<0 then w:=w+@'100000000;
12148  dvi_out(w div @'200000); w:=w mod @'200000; goto 2;
12149  end;
12150if abs(w)>=@'200 then
12151  begin dvi_out(o+1); {|down2| or |right2|}
12152  if w<0 then w:=w+@'200000;
12153  goto 2;
12154  end;
12155dvi_out(o); {|down1| or |right1|}
12156if w<0 then w:=w+@'400;
12157goto 1;
121582: dvi_out(w div @'400);
121591: dvi_out(w mod @'400); return
12160
12161@ As we search through the stack, we are in one of three states,
12162|y_seen|, |z_seen|, or |none_seen|, depending on whether we have
12163encountered |y_here| or |z_here| nodes. These states are encoded as
12164multiples of 6, so that they can be added to the |info| fields for quick
12165decision-making.
12166@^inner loop@>
12167
12168@d none_seen=0 {no |y_here| or |z_here| nodes have been encountered yet}
12169@d y_seen=6 {we have seen |y_here| but not |z_here|}
12170@d z_seen=12 {we have seen |z_here| but not |y_here|}
12171
12172@<Look at the other stack entries until deciding...@>=
12173p:=link(q); mstate:=none_seen;
12174while p<>null do
12175  begin if width(p)=w then @<Consider a node with matching width;
12176    |goto found| if it's a hit@>
12177  else  case mstate+info(p) of
12178    none_seen+y_here: mstate:=y_seen;
12179    none_seen+z_here: mstate:=z_seen;
12180    y_seen+z_here,z_seen+y_here: goto not_found;
12181    othercases do_nothing
12182    endcases;
12183  p:=link(p);
12184  end;
12185not_found:
12186
12187@ We might find a valid hit in a |y| or |z| byte that is already gone
12188from the buffer. But we can't change bytes that are gone forever; ``the
12189moving finger writes, $\ldots\,\,$.''
12190
12191@<Consider a node with matching width...@>=
12192case mstate+info(p) of
12193none_seen+yz_OK,none_seen+y_OK,z_seen+yz_OK,z_seen+y_OK:@t@>@;@/
12194  if location(p)<dvi_gone then goto not_found
12195  else @<Change buffered instruction to |y| or |w| and |goto found|@>;
12196none_seen+z_OK,y_seen+yz_OK,y_seen+z_OK:@t@>@;@/
12197  if location(p)<dvi_gone then goto not_found
12198  else @<Change buffered instruction to |z| or |x| and |goto found|@>;
12199none_seen+y_here,none_seen+z_here,y_seen+z_here,z_seen+y_here: goto found;
12200othercases do_nothing
12201endcases
12202
12203@ @<Change buffered instruction to |y| or |w| and |goto found|@>=
12204begin k:=location(p)-dvi_offset;
12205if k<0 then k:=k+dvi_buf_size;
12206dvi_buf[k]:=dvi_buf[k]+y1-down1;
12207info(p):=y_here; goto found;
12208end
12209
12210@ @<Change buffered instruction to |z| or |x| and |goto found|@>=
12211begin k:=location(p)-dvi_offset;
12212if k<0 then k:=k+dvi_buf_size;
12213dvi_buf[k]:=dvi_buf[k]+z1-down1;
12214info(p):=z_here; goto found;
12215end
12216
12217@ In case you are wondering when all the movement nodes are removed from
12218\TeX's memory, the answer is that they are recycled just before
12219|hlist_out| and |vlist_out| finish outputting a box. This restores the
12220down and right stacks to the state they were in before the box was output,
12221except that some |info|'s may have become more restrictive.
12222
12223@p procedure prune_movements(@!l:integer);
12224  {delete movement nodes with |location>=l|}
12225label done,exit;
12226var p:pointer; {node being deleted}
12227begin while down_ptr<>null do
12228  begin if location(down_ptr)<l then goto done;
12229  p:=down_ptr; down_ptr:=link(p); free_node(p,movement_node_size);
12230  end;
12231done: while right_ptr<>null do
12232  begin if location(right_ptr)<l then return;
12233  p:=right_ptr; right_ptr:=link(p); free_node(p,movement_node_size);
12234  end;
12235exit:end;
12236
12237@ The actual distances by which we want to move might be computed as the
12238sum of several separate movements. For example, there might be several
12239glue nodes in succession, or we might want to move right by the width of
12240some box plus some amount of glue. More importantly, the baselineskip
12241distances are computed in terms of glue together with the depth and
12242height of adjacent boxes, and we want the \.{DVI} file to lump these
12243three quantities together into a single motion.
12244
12245Therefore, \TeX\ maintains two pairs of global variables: |dvi_h| and |dvi_v|
12246are the |h| and |v| coordinates corresponding to the commands actually
12247output to the \.{DVI} file, while |cur_h| and |cur_v| are the coordinates
12248corresponding to the current state of the output routines. Coordinate
12249changes will accumulate in |cur_h| and |cur_v| without being reflected
12250in the output, until such a change becomes necessary or desirable; we
12251can call the |movement| procedure whenever we want to make |dvi_h=cur_h|
12252or |dvi_v=cur_v|.
12253
12254The current font reflected in the \.{DVI} output is called |dvi_f|;
12255there is no need for a `\\{cur\_f}' variable.
12256
12257The depth of nesting of |hlist_out| and |vlist_out| is called |cur_s|;
12258this is essentially the depth of |push| commands in the \.{DVI} output.
12259
12260@d synch_h==if cur_h<>dvi_h then
12261    begin movement(cur_h-dvi_h,right1); dvi_h:=cur_h;
12262    end
12263@d synch_v==if cur_v<>dvi_v then
12264    begin movement(cur_v-dvi_v,down1); dvi_v:=cur_v;
12265    end
12266
12267@<Glob...@>=
12268@!dvi_h,@!dvi_v:scaled; {a \.{DVI} reader program thinks we are here}
12269@!cur_h,@!cur_v:scaled; {\TeX\ thinks we are here}
12270@!dvi_f:internal_font_number; {the current font}
12271@!cur_s:integer; {current depth of output box nesting, initially $-1$}
12272
12273@ @<Initialize variables as |ship_out| begins@>=
12274dvi_h:=0; dvi_v:=0; cur_h:=h_offset; dvi_f:=null_font;
12275ensure_dvi_open;
12276if total_pages=0 then
12277  begin dvi_out(pre); dvi_out(id_byte); {output the preamble}
12278@^preamble of \.{DVI} file@>
12279  dvi_four(25400000); dvi_four(473628672); {conversion ratio for sp}
12280  prepare_mag; dvi_four(mag); {magnification factor is frozen}
12281  old_setting:=selector; selector:=new_string;
12282  print(" TeX output "); print_int(year); print_char(".");
12283  print_two(month); print_char("."); print_two(day);
12284  print_char(":"); print_two(time div 60);
12285  print_two(time mod 60);
12286  selector:=old_setting; dvi_out(cur_length);
12287  for s:=str_start[str_ptr] to pool_ptr-1 do dvi_out(so(str_pool[s]));
12288  pool_ptr:=str_start[str_ptr]; {flush the current string}
12289  end
12290
12291@ When |hlist_out| is called, its duty is to output the box represented
12292by the |hlist_node| pointed to by |temp_ptr|. The reference point of that
12293box has coordinates |(cur_h,cur_v)|.
12294
12295Similarly, when |vlist_out| is called, its duty is to output the box represented
12296by the |vlist_node| pointed to by |temp_ptr|. The reference point of that
12297box has coordinates |(cur_h,cur_v)|.
12298@^recursion@>
12299
12300@p procedure@?vlist_out; forward; {|hlist_out| and |vlist_out| are mutually
12301  recursive}
12302
12303@ The recursive procedures |hlist_out| and |vlist_out| each have local variables
12304|save_h| and |save_v| to hold the values of |dvi_h| and |dvi_v| just before
12305entering a new level of recursion.  In effect, the values of |save_h| and
12306|save_v| on \TeX's run-time stack correspond to the values of |h| and |v|
12307that a \.{DVI}-reading program will push onto its coordinate stack.
12308
12309@d move_past=13 {go to this label when advancing past glue or a rule}
12310@d fin_rule=14 {go to this label to finish processing a rule}
12311@d next_p=15 {go to this label when finished with node |p|}
12312
12313@p @t\4@>@<Declare procedures needed in |hlist_out|, |vlist_out|@>@t@>@/
12314procedure hlist_out; {output an |hlist_node| box}
12315label reswitch, move_past, fin_rule, next_p;
12316var base_line: scaled; {the baseline coordinate for this box}
12317@!left_edge: scaled; {the left coordinate for this box}
12318@!save_h,@!save_v: scaled; {what |dvi_h| and |dvi_v| should pop to}
12319@!this_box: pointer; {pointer to containing box}
12320@!g_order: glue_ord; {applicable order of infinity for glue}
12321@!g_sign: normal..shrinking; {selects type of glue}
12322@!p:pointer; {current position in the hlist}
12323@!save_loc:integer; {\.{DVI} byte location upon entry}
12324@!leader_box:pointer; {the leader box being replicated}
12325@!leader_wd:scaled; {width of leader box being replicated}
12326@!lx:scaled; {extra space between leader boxes}
12327@!outer_doing_leaders:boolean; {were we doing leaders?}
12328@!edge:scaled; {left edge of sub-box, or right edge of leader space}
12329@!glue_temp:real; {glue value before rounding}
12330@!cur_glue:real; {glue seen so far}
12331@!cur_g:scaled; {rounded equivalent of |cur_glue| times the glue ratio}
12332begin cur_g:=0; cur_glue:=float_constant(0);
12333this_box:=temp_ptr; g_order:=glue_order(this_box);
12334g_sign:=glue_sign(this_box); p:=list_ptr(this_box);
12335incr(cur_s);
12336if cur_s>0 then dvi_out(push);
12337if cur_s>max_push then max_push:=cur_s;
12338save_loc:=dvi_offset+dvi_ptr; base_line:=cur_v; left_edge:=cur_h;
12339while p<>null do @<Output node |p| for |hlist_out| and move to the next node,
12340  maintaining the condition |cur_v=base_line|@>;
12341prune_movements(save_loc);
12342if cur_s>0 then dvi_pop(save_loc);
12343decr(cur_s);
12344end;
12345
12346@ We ought to give special care to the efficiency of one part of |hlist_out|,
12347since it belongs to \TeX's inner loop. When a |char_node| is encountered,
12348we save a little time by processing several nodes in succession until
12349reaching a non-|char_node|. The program uses the fact that |set_char_0=0|.
12350@^inner loop@>
12351
12352@<Output node |p| for |hlist_out|...@>=
12353reswitch: if is_char_node(p) then
12354  begin synch_h; synch_v;
12355  repeat f:=font(p); c:=character(p);
12356  if f<>dvi_f then @<Change font |dvi_f| to |f|@>;
12357  if c>=qi(128) then dvi_out(set1);
12358  dvi_out(qo(c));@/
12359  cur_h:=cur_h+char_width(f)(char_info(f)(c));
12360  p:=link(p);
12361  until not is_char_node(p);
12362  dvi_h:=cur_h;
12363  end
12364else @<Output the non-|char_node| |p| for |hlist_out|
12365    and move to the next node@>
12366
12367@ @<Change font |dvi_f| to |f|@>=
12368begin if not font_used[f] then
12369  begin dvi_font_def(f); font_used[f]:=true;
12370  end;
12371if f<=64+font_base then dvi_out(f-font_base-1+fnt_num_0)
12372else  begin dvi_out(fnt1); dvi_out(f-font_base-1);
12373  end;
12374dvi_f:=f;
12375end
12376
12377@ @<Output the non-|char_node| |p| for |hlist_out|...@>=
12378begin case type(p) of
12379hlist_node,vlist_node:@<Output a box in an hlist@>;
12380rule_node: begin rule_ht:=height(p); rule_dp:=depth(p); rule_wd:=width(p);
12381  goto fin_rule;
12382  end;
12383whatsit_node: @<Output the whatsit node |p| in an hlist@>;
12384glue_node: @<Move right or output leaders@>;
12385kern_node,math_node:cur_h:=cur_h+width(p);
12386ligature_node: @<Make node |p| look like a |char_node| and |goto reswitch|@>;
12387othercases do_nothing
12388endcases;@/
12389goto next_p;
12390fin_rule: @<Output a rule in an hlist@>;
12391move_past: cur_h:=cur_h+rule_wd;
12392next_p:p:=link(p);
12393end
12394
12395@ @<Output a box in an hlist@>=
12396if list_ptr(p)=null then cur_h:=cur_h+width(p)
12397else  begin save_h:=dvi_h; save_v:=dvi_v;
12398  cur_v:=base_line+shift_amount(p); {shift the box down}
12399  temp_ptr:=p; edge:=cur_h;
12400  if type(p)=vlist_node then vlist_out@+else hlist_out;
12401  dvi_h:=save_h; dvi_v:=save_v;
12402  cur_h:=edge+width(p); cur_v:=base_line;
12403  end
12404
12405@ @<Output a rule in an hlist@>=
12406if is_running(rule_ht) then rule_ht:=height(this_box);
12407if is_running(rule_dp) then rule_dp:=depth(this_box);
12408rule_ht:=rule_ht+rule_dp; {this is the rule thickness}
12409if (rule_ht>0)and(rule_wd>0) then {we don't output empty rules}
12410  begin synch_h; cur_v:=base_line+rule_dp; synch_v;
12411  dvi_out(set_rule); dvi_four(rule_ht); dvi_four(rule_wd);
12412  cur_v:=base_line; dvi_h:=dvi_h+rule_wd;
12413  end
12414
12415@ @d billion==float_constant(1000000000)
12416@d vet_glue(#)== glue_temp:=#;
12417  if glue_temp>billion then
12418           glue_temp:=billion
12419  else if glue_temp<-billion then
12420           glue_temp:=-billion
12421
12422@<Move right or output leaders@>=
12423begin g:=glue_ptr(p); rule_wd:=width(g)-cur_g;
12424if g_sign<>normal then
12425  begin if g_sign=stretching then
12426    begin if stretch_order(g)=g_order then
12427      begin cur_glue:=cur_glue+stretch(g);
12428      vet_glue(float(glue_set(this_box))*cur_glue);
12429@^real multiplication@>
12430      cur_g:=round(glue_temp);
12431      end;
12432    end
12433  else if shrink_order(g)=g_order then
12434      begin cur_glue:=cur_glue-shrink(g);
12435      vet_glue(float(glue_set(this_box))*cur_glue);
12436      cur_g:=round(glue_temp);
12437      end;
12438  end;
12439rule_wd:=rule_wd+cur_g;
12440if subtype(p)>=a_leaders then
12441  @<Output leaders in an hlist, |goto fin_rule| if a rule
12442    or to |next_p| if done@>;
12443goto move_past;
12444end
12445
12446@ @<Output leaders in an hlist...@>=
12447begin leader_box:=leader_ptr(p);
12448if type(leader_box)=rule_node then
12449  begin rule_ht:=height(leader_box); rule_dp:=depth(leader_box);
12450  goto fin_rule;
12451  end;
12452leader_wd:=width(leader_box);
12453if (leader_wd>0)and(rule_wd>0) then
12454  begin rule_wd:=rule_wd+10; {compensate for floating-point rounding}
12455  edge:=cur_h+rule_wd; lx:=0;
12456  @<Let |cur_h| be the position of the first box, and set |leader_wd+lx|
12457    to the spacing between corresponding parts of boxes@>;
12458  while cur_h+leader_wd<=edge do
12459    @<Output a leader box at |cur_h|,
12460      then advance |cur_h| by |leader_wd+lx|@>;
12461  cur_h:=edge-10; goto next_p;
12462  end;
12463end
12464
12465@ The calculations related to leaders require a bit of care. First, in the
12466case of |a_leaders| (aligned leaders), we want to move |cur_h| to
12467|left_edge| plus the smallest multiple of |leader_wd| for which the result
12468is not less than the current value of |cur_h|; i.e., |cur_h| should become
12469$|left_edge|+|leader_wd|\times\lceil
12470(|cur_h|-|left_edge|)/|leader_wd|\rceil$.  The program here should work in
12471all cases even though some implementations of \PASCAL\ give nonstandard
12472results for the |div| operation when |cur_h| is less than |left_edge|.
12473
12474In the case of |c_leaders| (centered leaders), we want to increase |cur_h|
12475by half of the excess space not occupied by the leaders; and in the
12476case of |x_leaders| (expanded leaders) we increase |cur_h|
12477by $1/(q+1)$ of this excess space, where $q$ is the number of times the
12478leader box will be replicated. Slight inaccuracies in the division might
12479accumulate; half of this rounding error is placed at each end of the leaders.
12480
12481@<Let |cur_h| be the position of the first box, ...@>=
12482if subtype(p)=a_leaders then
12483  begin save_h:=cur_h;
12484  cur_h:=left_edge+leader_wd*((cur_h-left_edge)@!div leader_wd);
12485  if cur_h<save_h then cur_h:=cur_h+leader_wd;
12486  end
12487else  begin lq:=rule_wd div leader_wd; {the number of box copies}
12488  lr:=rule_wd mod leader_wd; {the remaining space}
12489  if subtype(p)=c_leaders then cur_h:=cur_h+(lr div 2)
12490  else  begin lx:=lr div (lq+1);
12491    cur_h:=cur_h+((lr-(lq-1)*lx) div 2);
12492    end;
12493  end
12494
12495@ The `\\{synch}' operations here are intended to decrease the number of
12496bytes needed to specify horizontal and vertical motion in the \.{DVI} output.
12497
12498@<Output a leader box at |cur_h|, ...@>=
12499begin cur_v:=base_line+shift_amount(leader_box); synch_v; save_v:=dvi_v;@/
12500synch_h; save_h:=dvi_h; temp_ptr:=leader_box;
12501outer_doing_leaders:=doing_leaders; doing_leaders:=true;
12502if type(leader_box)=vlist_node then vlist_out@+else hlist_out;
12503doing_leaders:=outer_doing_leaders;
12504dvi_v:=save_v; dvi_h:=save_h; cur_v:=base_line;
12505cur_h:=save_h+leader_wd+lx;
12506end
12507
12508@ The |vlist_out| routine is similar to |hlist_out|, but a bit simpler.
12509
12510@p procedure vlist_out; {output a |vlist_node| box}
12511label move_past, fin_rule, next_p;
12512var left_edge: scaled; {the left coordinate for this box}
12513@!top_edge: scaled; {the top coordinate for this box}
12514@!save_h,@!save_v: scaled; {what |dvi_h| and |dvi_v| should pop to}
12515@!this_box: pointer; {pointer to containing box}
12516@!g_order: glue_ord; {applicable order of infinity for glue}
12517@!g_sign: normal..shrinking; {selects type of glue}
12518@!p:pointer; {current position in the vlist}
12519@!save_loc:integer; {\.{DVI} byte location upon entry}
12520@!leader_box:pointer; {the leader box being replicated}
12521@!leader_ht:scaled; {height of leader box being replicated}
12522@!lx:scaled; {extra space between leader boxes}
12523@!outer_doing_leaders:boolean; {were we doing leaders?}
12524@!edge:scaled; {bottom boundary of leader space}
12525@!glue_temp:real; {glue value before rounding}
12526@!cur_glue:real; {glue seen so far}
12527@!cur_g:scaled; {rounded equivalent of |cur_glue| times the glue ratio}
12528begin cur_g:=0; cur_glue:=float_constant(0);
12529this_box:=temp_ptr; g_order:=glue_order(this_box);
12530g_sign:=glue_sign(this_box); p:=list_ptr(this_box);
12531incr(cur_s);
12532if cur_s>0 then dvi_out(push);
12533if cur_s>max_push then max_push:=cur_s;
12534save_loc:=dvi_offset+dvi_ptr; left_edge:=cur_h; cur_v:=cur_v-height(this_box);
12535top_edge:=cur_v;
12536while p<>null do @<Output node |p| for |vlist_out| and move to the next node,
12537  maintaining the condition |cur_h=left_edge|@>;
12538prune_movements(save_loc);
12539if cur_s>0 then dvi_pop(save_loc);
12540decr(cur_s);
12541end;
12542
12543@ @<Output node |p| for |vlist_out|...@>=
12544begin if is_char_node(p) then confusion("vlistout")
12545@:this can't happen vlistout}{\quad vlistout@>
12546else @<Output the non-|char_node| |p| for |vlist_out|@>;
12547next_p:p:=link(p);
12548end
12549
12550@ @<Output the non-|char_node| |p| for |vlist_out|@>=
12551begin case type(p) of
12552hlist_node,vlist_node:@<Output a box in a vlist@>;
12553rule_node: begin rule_ht:=height(p); rule_dp:=depth(p); rule_wd:=width(p);
12554  goto fin_rule;
12555  end;
12556whatsit_node: @<Output the whatsit node |p| in a vlist@>;
12557glue_node: @<Move down or output leaders@>;
12558kern_node:cur_v:=cur_v+width(p);
12559othercases do_nothing
12560endcases;@/
12561goto next_p;
12562fin_rule: @<Output a rule in a vlist, |goto next_p|@>;
12563move_past: cur_v:=cur_v+rule_ht;
12564end
12565
12566@ The |synch_v| here allows the \.{DVI} output to use one-byte commands
12567for adjusting |v| in most cases, since the baselineskip distance will
12568usually be constant.
12569
12570@<Output a box in a vlist@>=
12571if list_ptr(p)=null then cur_v:=cur_v+height(p)+depth(p)
12572else  begin cur_v:=cur_v+height(p); synch_v;
12573  save_h:=dvi_h; save_v:=dvi_v;
12574  cur_h:=left_edge+shift_amount(p); {shift the box right}
12575  temp_ptr:=p;
12576  if type(p)=vlist_node then vlist_out@+else hlist_out;
12577  dvi_h:=save_h; dvi_v:=save_v;
12578  cur_v:=save_v+depth(p); cur_h:=left_edge;
12579  end
12580
12581@ @<Output a rule in a vlist...@>=
12582if is_running(rule_wd) then rule_wd:=width(this_box);
12583rule_ht:=rule_ht+rule_dp; {this is the rule thickness}
12584cur_v:=cur_v+rule_ht;
12585if (rule_ht>0)and(rule_wd>0) then {we don't output empty rules}
12586  begin synch_h; synch_v;
12587  dvi_out(put_rule); dvi_four(rule_ht); dvi_four(rule_wd);
12588  end;
12589goto next_p
12590
12591@ @<Move down or output leaders@>=
12592begin g:=glue_ptr(p); rule_ht:=width(g)-cur_g;
12593if g_sign<>normal then
12594  begin if g_sign=stretching then
12595    begin if stretch_order(g)=g_order then
12596      begin cur_glue:=cur_glue+stretch(g);
12597      vet_glue(float(glue_set(this_box))*cur_glue);
12598@^real multiplication@>
12599      cur_g:=round(glue_temp);
12600      end;
12601    end
12602  else if shrink_order(g)=g_order then
12603      begin cur_glue:=cur_glue-shrink(g);
12604      vet_glue(float(glue_set(this_box))*cur_glue);
12605      cur_g:=round(glue_temp);
12606      end;
12607  end;
12608rule_ht:=rule_ht+cur_g;
12609if subtype(p)>=a_leaders then
12610  @<Output leaders in a vlist, |goto fin_rule| if a rule
12611    or to |next_p| if done@>;
12612goto move_past;
12613end
12614
12615@ @<Output leaders in a vlist...@>=
12616begin leader_box:=leader_ptr(p);
12617if type(leader_box)=rule_node then
12618  begin rule_wd:=width(leader_box); rule_dp:=0;
12619  goto fin_rule;
12620  end;
12621leader_ht:=height(leader_box)+depth(leader_box);
12622if (leader_ht>0)and(rule_ht>0) then
12623  begin rule_ht:=rule_ht+10; {compensate for floating-point rounding}
12624  edge:=cur_v+rule_ht; lx:=0;
12625  @<Let |cur_v| be the position of the first box, and set |leader_ht+lx|
12626    to the spacing between corresponding parts of boxes@>;
12627  while cur_v+leader_ht<=edge do
12628    @<Output a leader box at |cur_v|,
12629      then advance |cur_v| by |leader_ht+lx|@>;
12630  cur_v:=edge-10; goto next_p;
12631  end;
12632end
12633
12634@ @<Let |cur_v| be the position of the first box, ...@>=
12635if subtype(p)=a_leaders then
12636  begin save_v:=cur_v;
12637  cur_v:=top_edge+leader_ht*((cur_v-top_edge)@!div leader_ht);
12638  if cur_v<save_v then cur_v:=cur_v+leader_ht;
12639  end
12640else  begin lq:=rule_ht div leader_ht; {the number of box copies}
12641  lr:=rule_ht mod leader_ht; {the remaining space}
12642  if subtype(p)=c_leaders then cur_v:=cur_v+(lr div 2)
12643  else  begin lx:=lr div (lq+1);
12644    cur_v:=cur_v+((lr-(lq-1)*lx) div 2);
12645    end;
12646  end
12647
12648@ When we reach this part of the program, |cur_v| indicates the top of a
12649leader box, not its baseline.
12650
12651@<Output a leader box at |cur_v|, ...@>=
12652begin cur_h:=left_edge+shift_amount(leader_box); synch_h; save_h:=dvi_h;@/
12653cur_v:=cur_v+height(leader_box); synch_v; save_v:=dvi_v;
12654temp_ptr:=leader_box;
12655outer_doing_leaders:=doing_leaders; doing_leaders:=true;
12656if type(leader_box)=vlist_node then vlist_out@+else hlist_out;
12657doing_leaders:=outer_doing_leaders;
12658dvi_v:=save_v; dvi_h:=save_h; cur_h:=left_edge;
12659cur_v:=save_v-height(leader_box)+leader_ht+lx;
12660end
12661
12662@ The |hlist_out| and |vlist_out| procedures are now complete, so we are
12663ready for the |ship_out| routine that gets them started in the first place.
12664
12665@p procedure ship_out(@!p:pointer); {output the box |p|}
12666label done;
12667var page_loc:integer; {location of the current |bop|}
12668@!j,@!k:0..9; {indices to first ten count registers}
12669@!s:pool_pointer; {index into |str_pool|}
12670@!old_setting:0..max_selector; {saved |selector| setting}
12671begin if tracing_output>0 then
12672  begin print_nl(""); print_ln;
12673  print("Completed box being shipped out");
12674@.Completed box...@>
12675  end;
12676if term_offset>max_print_line-9 then print_ln
12677else if (term_offset>0)or(file_offset>0) then print_char(" ");
12678print_char("["); j:=9;
12679while (count(j)=0)and(j>0) do decr(j);
12680for k:=0 to j do
12681  begin print_int(count(k));
12682  if k<j then print_char(".");
12683  end;
12684update_terminal;
12685if tracing_output>0 then
12686  begin print_char("]");
12687  begin_diagnostic; show_box(p); end_diagnostic(true);
12688  end;
12689@<Ship box |p| out@>;
12690if tracing_output<=0 then print_char("]");
12691dead_cycles:=0;
12692update_terminal; {progress report}
12693@<Flush the box from memory, showing statistics if requested@>;
12694end;
12695
12696@ @<Flush the box from memory, showing statistics if requested@>=
12697@!stat if tracing_stats>1 then
12698  begin print_nl("Memory usage before: ");
12699@.Memory usage...@>
12700  print_int(var_used); print_char("&");
12701  print_int(dyn_used); print_char(";");
12702  end;
12703tats@/
12704flush_node_list(p);
12705@!stat if tracing_stats>1 then
12706  begin print(" after: ");
12707  print_int(var_used); print_char("&");
12708  print_int(dyn_used); print("; still untouched: ");
12709  print_int(hi_mem_min-lo_mem_max-1); print_ln;
12710  end;
12711tats
12712
12713@ @<Ship box |p| out@>=
12714@<Update the values of |max_h| and |max_v|; but if the page is too large,
12715  |goto done|@>;
12716@<Initialize variables as |ship_out| begins@>;
12717page_loc:=dvi_offset+dvi_ptr;
12718dvi_out(bop);
12719for k:=0 to 9 do dvi_four(count(k));
12720dvi_four(last_bop); last_bop:=page_loc;
12721cur_v:=height(p)+v_offset; temp_ptr:=p;
12722if type(p)=vlist_node then vlist_out@+else hlist_out;
12723dvi_out(eop); incr(total_pages); cur_s:=-1;
12724done:
12725
12726@ Sometimes the user will generate a huge page because other error messages
12727are being ignored. Such pages are not output to the \.{dvi} file, since they
12728may confuse the printing software.
12729
12730@<Update the values of |max_h| and |max_v|; but if the page is too large...@>=
12731if (height(p)>max_dimen)or@|(depth(p)>max_dimen)or@|
12732   (height(p)+depth(p)+v_offset>max_dimen)or@|
12733   (width(p)+h_offset>max_dimen) then
12734  begin print_err("Huge page cannot be shipped out");
12735@.Huge page...@>
12736  help2("The page just created is more than 18 feet tall or")@/
12737   ("more than 18 feet wide, so I suspect something went wrong.");
12738  error;
12739  if tracing_output<=0 then
12740    begin begin_diagnostic;
12741    print_nl("The following box has been deleted:");
12742@.The following...deleted@>
12743    show_box(p);
12744    end_diagnostic(true);
12745    end;
12746  goto done;
12747  end;
12748if height(p)+depth(p)+v_offset>max_v then max_v:=height(p)+depth(p)+v_offset;
12749if width(p)+h_offset>max_h then max_h:=width(p)+h_offset
12750
12751@ At the end of the program, we must finish things off by writing the
12752post\-amble. If |total_pages=0|, the \.{DVI} file was never opened.
12753If |total_pages>=65536|, the \.{DVI} file will lie. And if
12754|max_push>=65536|, the user deserves whatever chaos might ensue.
12755
12756An integer variable |k| will be declared for use by this routine.
12757
12758@<Finish the \.{DVI} file@>=
12759while cur_s>-1 do
12760  begin if cur_s>0 then dvi_out(pop)
12761  else  begin dvi_out(eop); incr(total_pages);
12762    end;
12763  decr(cur_s);
12764  end;
12765if total_pages=0 then print_nl("No pages of output.")
12766@.No pages of output@>
12767else  begin dvi_out(post); {beginning of the postamble}
12768  dvi_four(last_bop); last_bop:=dvi_offset+dvi_ptr-5; {|post| location}
12769  dvi_four(25400000); dvi_four(473628672); {conversion ratio for sp}
12770  prepare_mag; dvi_four(mag); {magnification factor}
12771  dvi_four(max_v); dvi_four(max_h);@/
12772  dvi_out(max_push div 256); dvi_out(max_push mod 256);@/
12773  dvi_out((total_pages div 256) mod 256); dvi_out(total_pages mod 256);@/
12774  @<Output the font definitions for all fonts that were used@>;
12775  dvi_out(post_post); dvi_four(last_bop); dvi_out(id_byte);@/
12776  k:=4+((dvi_buf_size-dvi_ptr) mod 4); {the number of 223's}
12777  while k>0 do
12778    begin dvi_out(223); decr(k);
12779    end;
12780  @<Empty the last bytes out of |dvi_buf|@>;
12781  print_nl("Output written on "); slow_print(output_file_name);
12782@.Output written on x@>
12783  print(" ("); print_int(total_pages); print(" page");
12784  if total_pages<>1 then print_char("s");
12785  print(", "); print_int(dvi_offset+dvi_ptr); print(" bytes).");
12786  b_close(dvi_file);
12787  end
12788
12789@ @<Output the font definitions...@>=
12790while font_ptr>font_base do
12791  begin if font_used[font_ptr] then dvi_font_def(font_ptr);
12792  decr(font_ptr);
12793  end
12794
12795@* \[33] Packaging.
12796We're essentially done with the parts of \TeX\ that are concerned with
12797the input (|get_next|) and the output (|ship_out|). So it's time to
12798get heavily into the remaining part, which does the real work of typesetting.
12799
12800After lists are constructed, \TeX\ wraps them up and puts them into boxes.
12801Two major subroutines are given the responsibility for this task: |hpack|
12802applies to horizontal lists (hlists) and |vpack| applies to vertical lists
12803(vlists). The main duty of |hpack| and |vpack| is to compute the dimensions
12804of the resulting boxes, and to adjust the glue if one of those dimensions
12805is pre-specified. The computed sizes normally enclose all of the material
12806inside the new box; but some items may stick out if negative glue is used,
12807if the box is overfull, or if a \.{\\vbox} includes other boxes that have
12808been shifted left.
12809
12810The subroutine call |hpack(p,w,m)| returns a pointer to an |hlist_node|
12811for a box containing the hlist that starts at |p|. Parameter |w| specifies
12812a width; and parameter |m| is either `|exactly|' or `|additional|'.  Thus,
12813|hpack(p,w,exactly)| produces a box whose width is exactly |w|, while
12814|hpack(p,w,additional)| yields a box whose width is the natural width plus
12815|w|.  It is convenient to define a macro called `|natural|' to cover the
12816most common case, so that we can say |hpack(p,natural)| to get a box that
12817has the natural width of list |p|.
12818
12819Similarly, |vpack(p,w,m)| returns a pointer to a |vlist_node| for a
12820box containing the vlist that starts at |p|. In this case |w| represents
12821a height instead of a width; the parameter |m| is interpreted as in |hpack|.
12822
12823@d exactly=0 {a box dimension is pre-specified}
12824@d additional=1 {a box dimension is increased from the natural one}
12825@d natural==0,additional {shorthand for parameters to |hpack| and |vpack|}
12826
12827@ The parameters to |hpack| and |vpack| correspond to \TeX's primitives
12828like `\.{\\hbox} \.{to} \.{300pt}', `\.{\\hbox} \.{spread} \.{10pt}'; note
12829that `\.{\\hbox}' with no dimension following it is equivalent to
12830`\.{\\hbox} \.{spread} \.{0pt}'.  The |scan_spec| subroutine scans such
12831constructions in the user's input, including the mandatory left brace that
12832follows them, and it puts the specification onto |save_stack| so that the
12833desired box can later be obtained by executing the following code:
12834$$\vbox{\halign{#\hfil\cr
12835|save_ptr:=save_ptr-2;|\cr
12836|hpack(p,saved(1),saved(0)).|\cr}}$$
12837Special care is necessary to ensure that the special |save_stack| codes
12838are placed just below the new group code, because scanning can change
12839|save_stack| when \.{\\csname} appears.
12840
12841@p procedure scan_spec(@!c:group_code;@!three_codes:boolean);
12842  {scans a box specification and left brace}
12843label found;
12844var @!s:integer; {temporarily saved value}
12845@!spec_code:exactly..additional;
12846begin if three_codes then s:=saved(0);
12847if scan_keyword("to") then spec_code:=exactly
12848@.to@>
12849else if scan_keyword("spread") then spec_code:=additional
12850@.spread@>
12851else  begin spec_code:=additional; cur_val:=0;
12852  goto found;
12853  end;
12854scan_normal_dimen;
12855found: if three_codes then
12856  begin saved(0):=s; incr(save_ptr);
12857  end;
12858saved(0):=spec_code; saved(1):=cur_val; save_ptr:=save_ptr+2;
12859new_save_level(c); scan_left_brace;
12860end;
12861
12862@ To figure out the glue setting, |hpack| and |vpack| determine how much
12863stretchability and shrinkability are present, considering all four orders
12864of infinity. The highest order of infinity that has a nonzero coefficient
12865is then used as if no other orders were present.
12866
12867For example, suppose that the given list contains six glue nodes with
12868the respective stretchabilities 3pt, 8fill, 5fil, 6pt, $-3$fil, $-8$fill.
12869Then the total is essentially 2fil; and if a total additional space of 6pt
12870is to be achieved by stretching, the actual amounts of stretch will be
128710pt, 0pt, 15pt, 0pt, $-9$pt, and 0pt, since only `fil' glue will be
12872considered. (The `fill' glue is therefore not really stretching infinitely
12873with respect to `fil'; nobody would actually want that to happen.)
12874
12875The arrays |total_stretch| and |total_shrink| are used to determine how much
12876glue of each kind is present. A global variable |last_badness| is used
12877to implement \.{\\badness}.
12878
12879@<Glob...@>=
12880@!total_stretch, @!total_shrink: array[glue_ord] of scaled;
12881  {glue found by |hpack| or |vpack|}
12882@!last_badness:integer; {badness of the most recently packaged box}
12883
12884@ If the global variable |adjust_tail| is non-null, the |hpack| routine
12885also removes all occurrences of |ins_node|, |mark_node|, and |adjust_node|
12886items and appends the resulting material onto the list that ends at
12887location |adjust_tail|.
12888
12889@< Glob...@>=
12890@!adjust_tail:pointer; {tail of adjustment list}
12891
12892@ @<Set init...@>=adjust_tail:=null; last_badness:=0;
12893
12894@ Here now is |hpack|, which contains few if any surprises.
12895
12896@p function hpack(@!p:pointer;@!w:scaled;@!m:small_number):pointer;
12897label reswitch, common_ending, exit;
12898var r:pointer; {the box node that will be returned}
12899@!q:pointer; {trails behind |p|}
12900@!h,@!d,@!x:scaled; {height, depth, and natural width}
12901@!s:scaled; {shift amount}
12902@!g:pointer; {points to a glue specification}
12903@!o:glue_ord; {order of infinity}
12904@!f:internal_font_number; {the font in a |char_node|}
12905@!i:four_quarters; {font information about a |char_node|}
12906@!hd:eight_bits; {height and depth indices for a character}
12907begin last_badness:=0; r:=get_node(box_node_size); type(r):=hlist_node;
12908subtype(r):=min_quarterword; shift_amount(r):=0;
12909q:=r+list_offset; link(q):=p;@/
12910h:=0; @<Clear dimensions to zero@>;
12911while p<>null do @<Examine node |p| in the hlist, taking account of its effect
12912  on the dimensions of the new box, or moving it to the adjustment list;
12913  then advance |p| to the next node@>;
12914if adjust_tail<>null then link(adjust_tail):=null;
12915height(r):=h; depth(r):=d;@/
12916@<Determine the value of |width(r)| and the appropriate glue setting;
12917  then |return| or |goto common_ending|@>;
12918common_ending: @<Finish issuing a diagnostic message
12919      for an overfull or underfull hbox@>;
12920exit: hpack:=r;
12921end;
12922
12923@ @<Clear dimensions to zero@>=
12924d:=0; x:=0;
12925total_stretch[normal]:=0; total_shrink[normal]:=0;
12926total_stretch[fil]:=0; total_shrink[fil]:=0;
12927total_stretch[fill]:=0; total_shrink[fill]:=0;
12928total_stretch[filll]:=0; total_shrink[filll]:=0
12929
12930@ @<Examine node |p| in the hlist, taking account of its effect...@>=
12931@^inner loop@>
12932begin reswitch: while is_char_node(p) do
12933  @<Incorporate character dimensions into the dimensions of
12934    the hbox that will contain~it, then move to the next node@>;
12935if p<>null then
12936  begin case type(p) of
12937  hlist_node,vlist_node,rule_node,unset_node:
12938    @<Incorporate box dimensions into the dimensions of
12939      the hbox that will contain~it@>;
12940  ins_node,mark_node,adjust_node: if adjust_tail<>null then
12941    @<Transfer node |p| to the adjustment list@>;
12942  whatsit_node:@<Incorporate a whatsit node into an hbox@>;
12943  glue_node:@<Incorporate glue into the horizontal totals@>;
12944  kern_node,math_node: x:=x+width(p);
12945  ligature_node: @<Make node |p| look like a |char_node|
12946    and |goto reswitch|@>;
12947  othercases do_nothing
12948  endcases;@/
12949  p:=link(p);
12950  end;
12951end
12952
12953
12954@ @<Make node |p| look like a |char_node| and |goto reswitch|@>=
12955begin mem[lig_trick]:=mem[lig_char(p)]; link(lig_trick):=link(p);
12956p:=lig_trick; goto reswitch;
12957end
12958
12959@ The code here implicitly uses the fact that running dimensions are
12960indicated by |null_flag|, which will be ignored in the calculations
12961because it is a highly negative number.
12962
12963@<Incorporate box dimensions into the dimensions of the hbox...@>=
12964begin x:=x+width(p);
12965if type(p)>=rule_node then s:=0 @+else s:=shift_amount(p);
12966if height(p)-s>h then h:=height(p)-s;
12967if depth(p)+s>d then d:=depth(p)+s;
12968end
12969
12970@ The following code is part of \TeX's inner loop; i.e., adding another
12971character of text to the user's input will cause each of these instructions
12972to be exercised one more time.
12973@^inner loop@>
12974
12975@<Incorporate character dimensions into the dimensions of the hbox...@>=
12976begin f:=font(p); i:=char_info(f)(character(p)); hd:=height_depth(i);
12977x:=x+char_width(f)(i);@/
12978s:=char_height(f)(hd);@+if s>h then h:=s;
12979s:=char_depth(f)(hd);@+if s>d then d:=s;
12980p:=link(p);
12981end
12982
12983@ Although node |q| is not necessarily the immediate predecessor of node |p|,
12984it always points to some node in the list preceding |p|. Thus, we can delete
12985nodes by moving |q| when necessary. The algorithm takes linear time, and the
12986extra computation does not intrude on the inner loop unless it is necessary
12987to make a deletion.
12988@^inner loop@>
12989
12990@<Transfer node |p| to the adjustment list@>=
12991begin while link(q)<>p do q:=link(q);
12992if type(p)=adjust_node then
12993  begin link(adjust_tail):=adjust_ptr(p);
12994  while link(adjust_tail)<>null do adjust_tail:=link(adjust_tail);
12995  p:=link(p); free_node(link(q),small_node_size);
12996  end
12997else  begin link(adjust_tail):=p; adjust_tail:=p; p:=link(p);
12998  end;
12999link(q):=p; p:=q;
13000end
13001
13002@ @<Incorporate glue into the horizontal totals@>=
13003begin g:=glue_ptr(p); x:=x+width(g);@/
13004o:=stretch_order(g); total_stretch[o]:=total_stretch[o]+stretch(g);
13005o:=shrink_order(g); total_shrink[o]:=total_shrink[o]+shrink(g);
13006if subtype(p)>=a_leaders then
13007  begin g:=leader_ptr(p);
13008  if height(g)>h then h:=height(g);
13009  if depth(g)>d then d:=depth(g);
13010  end;
13011end
13012
13013@ When we get to the present part of the program, |x| is the natural width
13014of the box being packaged.
13015
13016@<Determine the value of |width(r)| and the appropriate glue setting...@>=
13017if m=additional then w:=x+w;
13018width(r):=w; x:=w-x; {now |x| is the excess to be made up}
13019if x=0 then
13020  begin glue_sign(r):=normal; glue_order(r):=normal;
13021  set_glue_ratio_zero(glue_set(r));
13022  return;
13023  end
13024else if x>0 then @<Determine horizontal glue stretch setting, then |return|
13025    or \hbox{|goto common_ending|}@>
13026else @<Determine horizontal glue shrink setting, then |return|
13027    or \hbox{|goto common_ending|}@>
13028
13029@ @<Determine horizontal glue stretch setting...@>=
13030begin @<Determine the stretch order@>;
13031glue_order(r):=o; glue_sign(r):=stretching;
13032if total_stretch[o]<>0 then glue_set(r):=unfloat(x/total_stretch[o])
13033@^real division@>
13034else  begin glue_sign(r):=normal;
13035  set_glue_ratio_zero(glue_set(r)); {there's nothing to stretch}
13036  end;
13037if o=normal then if list_ptr(r)<>null then
13038  @<Report an underfull hbox and |goto common_ending|, if this box
13039    is sufficiently bad@>;
13040return;
13041end
13042
13043@ @<Determine the stretch order@>=
13044if total_stretch[filll]<>0 then o:=filll
13045else if total_stretch[fill]<>0 then o:=fill
13046else if total_stretch[fil]<>0 then o:=fil
13047else o:=normal
13048
13049@ @<Report an underfull hbox and |goto common_ending|, if...@>=
13050begin last_badness:=badness(x,total_stretch[normal]);
13051if last_badness>hbadness then
13052  begin print_ln;
13053  if last_badness>100 then print_nl("Underfull")@+else print_nl("Loose");
13054  print(" \hbox (badness "); print_int(last_badness);
13055@.Underfull \\hbox...@>
13056@.Loose \\hbox...@>
13057  goto common_ending;
13058  end;
13059end
13060
13061@ In order to provide a decent indication of where an overfull or underfull
13062box originated, we use a global variable |pack_begin_line| that is
13063set nonzero only when |hpack| is being called by the paragraph builder
13064or the alignment finishing routine.
13065
13066@<Glob...@>=
13067@!pack_begin_line:integer; {source file line where the current paragraph
13068  or alignment began; a negative value denotes alignment}
13069
13070@ @<Set init...@>=
13071pack_begin_line:=0;
13072
13073@ @<Finish issuing a diagnostic message for an overfull or underfull hbox@>=
13074if output_active then print(") has occurred while \output is active")
13075else  begin if pack_begin_line<>0 then
13076    begin if pack_begin_line>0 then print(") in paragraph at lines ")
13077    else print(") in alignment at lines ");
13078    print_int(abs(pack_begin_line));
13079    print("--");
13080    end
13081  else print(") detected at line ");
13082  print_int(line);
13083  end;
13084print_ln;@/
13085font_in_short_display:=null_font; short_display(list_ptr(r)); print_ln;@/
13086begin_diagnostic; show_box(r); end_diagnostic(true)
13087
13088@ @<Determine horizontal glue shrink setting...@>=
13089begin @<Determine the shrink order@>;
13090glue_order(r):=o; glue_sign(r):=shrinking;
13091if total_shrink[o]<>0 then glue_set(r):=unfloat((-x)/total_shrink[o])
13092@^real division@>
13093else  begin glue_sign(r):=normal;
13094  set_glue_ratio_zero(glue_set(r)); {there's nothing to shrink}
13095  end;
13096if (total_shrink[o]<-x)and(o=normal)and(list_ptr(r)<>null) then
13097  begin last_badness:=1000000;
13098  set_glue_ratio_one(glue_set(r)); {use the maximum shrinkage}
13099  @<Report an overfull hbox and |goto common_ending|, if this box
13100    is sufficiently bad@>;
13101  end
13102else if o=normal then if list_ptr(r)<>null then
13103  @<Report a tight hbox and |goto common_ending|, if this box
13104    is sufficiently bad@>;
13105return;
13106end
13107
13108@ @<Determine the shrink order@>=
13109if total_shrink[filll]<>0 then o:=filll
13110else if total_shrink[fill]<>0 then o:=fill
13111else if total_shrink[fil]<>0 then o:=fil
13112else o:=normal
13113
13114@ @<Report an overfull hbox and |goto common_ending|, if...@>=
13115if (-x-total_shrink[normal]>hfuzz)or(hbadness<100) then
13116  begin if (overfull_rule>0)and(-x-total_shrink[normal]>hfuzz) then
13117    begin while link(q)<>null do q:=link(q);
13118    link(q):=new_rule;
13119    width(link(q)):=overfull_rule;
13120    end;
13121  print_ln; print_nl("Overfull \hbox (");
13122@.Overfull \\hbox...@>
13123  print_scaled(-x-total_shrink[normal]); print("pt too wide");
13124  goto common_ending;
13125  end
13126
13127@ @<Report a tight hbox and |goto common_ending|, if...@>=
13128begin last_badness:=badness(-x,total_shrink[normal]);
13129if last_badness>hbadness then
13130  begin print_ln; print_nl("Tight \hbox (badness "); print_int(last_badness);
13131@.Tight \\hbox...@>
13132  goto common_ending;
13133  end;
13134end
13135
13136@ The |vpack| subroutine is actually a special case of a slightly more
13137general routine called |vpackage|, which has four parameters. The fourth
13138parameter, which is |max_dimen| in the case of |vpack|, specifies the
13139maximum depth of the page box that is constructed. The depth is first
13140computed by the normal rules; if it exceeds this limit, the reference
13141point is simply moved down until the limiting depth is attained.
13142
13143@d vpack(#)==vpackage(#,max_dimen) {special case of unconstrained depth}
13144
13145@p function vpackage(@!p:pointer;@!h:scaled;@!m:small_number;@!l:scaled):
13146  pointer;
13147label common_ending, exit;
13148var r:pointer; {the box node that will be returned}
13149@!w,@!d,@!x:scaled; {width, depth, and natural height}
13150@!s:scaled; {shift amount}
13151@!g:pointer; {points to a glue specification}
13152@!o:glue_ord; {order of infinity}
13153begin last_badness:=0; r:=get_node(box_node_size); type(r):=vlist_node;
13154subtype(r):=min_quarterword; shift_amount(r):=0;
13155list_ptr(r):=p;@/
13156w:=0; @<Clear dimensions to zero@>;
13157while p<>null do @<Examine node |p| in the vlist, taking account of its effect
13158  on the dimensions of the new box; then advance |p| to the next node@>;
13159width(r):=w;
13160if d>l then
13161  begin x:=x+d-l; depth(r):=l;
13162  end
13163else depth(r):=d;
13164@<Determine the value of |height(r)| and the appropriate glue setting;
13165  then |return| or |goto common_ending|@>;
13166common_ending: @<Finish issuing a diagnostic message
13167      for an overfull or underfull vbox@>;
13168exit: vpackage:=r;
13169end;
13170
13171@ @<Examine node |p| in the vlist, taking account of its effect...@>=
13172begin if is_char_node(p) then confusion("vpack")
13173@:this can't happen vpack}{\quad vpack@>
13174else  case type(p) of
13175  hlist_node,vlist_node,rule_node,unset_node:
13176    @<Incorporate box dimensions into the dimensions of
13177      the vbox that will contain~it@>;
13178  whatsit_node:@<Incorporate a whatsit node into a vbox@>;
13179  glue_node: @<Incorporate glue into the vertical totals@>;
13180  kern_node: begin x:=x+d+width(p); d:=0;
13181    end;
13182  othercases do_nothing
13183  endcases;
13184p:=link(p);
13185end
13186
13187@ @<Incorporate box dimensions into the dimensions of the vbox...@>=
13188begin x:=x+d+height(p); d:=depth(p);
13189if type(p)>=rule_node then s:=0 @+else s:=shift_amount(p);
13190if width(p)+s>w then w:=width(p)+s;
13191end
13192
13193@ @<Incorporate glue into the vertical totals@>=
13194begin x:=x+d; d:=0;@/
13195g:=glue_ptr(p); x:=x+width(g);@/
13196o:=stretch_order(g); total_stretch[o]:=total_stretch[o]+stretch(g);
13197o:=shrink_order(g); total_shrink[o]:=total_shrink[o]+shrink(g);
13198if subtype(p)>=a_leaders then
13199  begin g:=leader_ptr(p);
13200  if width(g)>w then w:=width(g);
13201  end;
13202end
13203
13204@ When we get to the present part of the program, |x| is the natural height
13205of the box being packaged.
13206
13207@<Determine the value of |height(r)| and the appropriate glue setting...@>=
13208if m=additional then h:=x+h;
13209height(r):=h; x:=h-x; {now |x| is the excess to be made up}
13210if x=0 then
13211  begin glue_sign(r):=normal; glue_order(r):=normal;
13212  set_glue_ratio_zero(glue_set(r));
13213  return;
13214  end
13215else if x>0 then @<Determine vertical glue stretch setting, then |return|
13216    or \hbox{|goto common_ending|}@>
13217else @<Determine vertical glue shrink setting, then |return|
13218    or \hbox{|goto common_ending|}@>
13219
13220@ @<Determine vertical glue stretch setting...@>=
13221begin @<Determine the stretch order@>;
13222glue_order(r):=o; glue_sign(r):=stretching;
13223if total_stretch[o]<>0 then glue_set(r):=unfloat(x/total_stretch[o])
13224@^real division@>
13225else  begin glue_sign(r):=normal;
13226  set_glue_ratio_zero(glue_set(r)); {there's nothing to stretch}
13227  end;
13228if o=normal then if list_ptr(r)<>null then
13229  @<Report an underfull vbox and |goto common_ending|, if this box
13230    is sufficiently bad@>;
13231return;
13232end
13233
13234@ @<Report an underfull vbox and |goto common_ending|, if...@>=
13235begin last_badness:=badness(x,total_stretch[normal]);
13236if last_badness>vbadness then
13237  begin print_ln;
13238  if last_badness>100 then print_nl("Underfull")@+else print_nl("Loose");
13239  print(" \vbox (badness "); print_int(last_badness);
13240@.Underfull \\vbox...@>
13241@.Loose \\vbox...@>
13242  goto common_ending;
13243  end;
13244end
13245
13246@ @<Finish issuing a diagnostic message for an overfull or underfull vbox@>=
13247if output_active then print(") has occurred while \output is active")
13248else  begin if pack_begin_line<>0 then {it's actually negative}
13249    begin print(") in alignment at lines ");
13250    print_int(abs(pack_begin_line));
13251    print("--");
13252    end
13253  else print(") detected at line ");
13254  print_int(line);
13255  print_ln;@/
13256  end;
13257begin_diagnostic; show_box(r); end_diagnostic(true)
13258
13259@ @<Determine vertical glue shrink setting...@>=
13260begin @<Determine the shrink order@>;
13261glue_order(r):=o; glue_sign(r):=shrinking;
13262if total_shrink[o]<>0 then glue_set(r):=unfloat((-x)/total_shrink[o])
13263@^real division@>
13264else  begin glue_sign(r):=normal;
13265  set_glue_ratio_zero(glue_set(r)); {there's nothing to shrink}
13266  end;
13267if (total_shrink[o]<-x)and(o=normal)and(list_ptr(r)<>null) then
13268  begin last_badness:=1000000;
13269  set_glue_ratio_one(glue_set(r)); {use the maximum shrinkage}
13270  @<Report an overfull vbox and |goto common_ending|, if this box
13271    is sufficiently bad@>;
13272  end
13273else if o=normal then if list_ptr(r)<>null then
13274  @<Report a tight vbox and |goto common_ending|, if this box
13275    is sufficiently bad@>;
13276return;
13277end
13278
13279@ @<Report an overfull vbox and |goto common_ending|, if...@>=
13280if (-x-total_shrink[normal]>vfuzz)or(vbadness<100) then
13281  begin print_ln; print_nl("Overfull \vbox (");
13282@.Overfull \\vbox...@>
13283  print_scaled(-x-total_shrink[normal]); print("pt too high");
13284  goto common_ending;
13285  end
13286
13287@ @<Report a tight vbox and |goto common_ending|, if...@>=
13288begin last_badness:=badness(-x,total_shrink[normal]);
13289if last_badness>vbadness then
13290  begin print_ln; print_nl("Tight \vbox (badness "); print_int(last_badness);
13291@.Tight \\vbox...@>
13292  goto common_ending;
13293  end;
13294end
13295
13296@ When a box is being appended to the current vertical list, the
13297baselineskip calculation is handled by the |append_to_vlist| routine.
13298
13299@p procedure append_to_vlist(@!b:pointer);
13300var d:scaled; {deficiency of space between baselines}
13301@!p:pointer; {a new glue node}
13302begin if prev_depth>ignore_depth then
13303  begin d:=width(baseline_skip)-prev_depth-height(b);
13304  if d<line_skip_limit then p:=new_param_glue(line_skip_code)
13305  else  begin p:=new_skip_param(baseline_skip_code);
13306    width(temp_ptr):=d; {|temp_ptr=glue_ptr(p)|}
13307    end;
13308  link(tail):=p; tail:=p;
13309  end;
13310link(tail):=b; tail:=b; prev_depth:=depth(b);
13311end;
13312
13313@* \[34] Data structures for math mode.
13314When \TeX\ reads a formula that is enclosed between \.\$'s, it constructs an
13315{\sl mlist}, which is essentially a tree structure representing that
13316formula.  An mlist is a linear sequence of items, but we can regard it as
13317a tree structure because mlists can appear within mlists. For example, many
13318of the entries can be subscripted or superscripted, and such ``scripts''
13319are mlists in their own right.
13320
13321An entire formula is parsed into such a tree before any of the actual
13322typesetting is done, because the current style of type is usually not
13323known until the formula has been fully scanned. For example, when the
13324formula `\.{\$a+b \\over c+d\$}' is being read, there is no way to tell
13325that `\.{a+b}' will be in script size until `\.{\\over}' has appeared.
13326
13327During the scanning process, each element of the mlist being built is
13328classified as a relation, a binary operator, an open parenthesis, etc.,
13329or as a construct like `\.{\\sqrt}' that must be built up. This classification
13330appears in the mlist data structure.
13331
13332After a formula has been fully scanned, the mlist is converted to an hlist
13333so that it can be incorporated into the surrounding text. This conversion is
13334controlled by a recursive procedure that decides all of the appropriate
13335styles by a ``top-down'' process starting at the outermost level and working
13336in towards the subformulas. The formula is ultimately pasted together using
13337combinations of horizontal and vertical boxes, with glue and penalty nodes
13338inserted as necessary.
13339
13340An mlist is represented internally as a linked list consisting chiefly
13341of ``noads'' (pronounced ``no-adds''), to distinguish them from the somewhat
13342similar ``nodes'' in hlists and vlists. Certain kinds of ordinary nodes are
13343allowed to appear in mlists together with the noads; \TeX\ tells the difference
13344by means of the |type| field, since a noad's |type| is always greater than
13345that of a node. An mlist does not contain character nodes, hlist nodes, vlist
13346nodes, math nodes, ligature nodes,
13347or unset nodes; in particular, each mlist item appears in the
13348variable-size part of |mem|, so the |type| field is always present.
13349
13350@ Each noad is four or more words long. The first word contains the |type|
13351and |subtype| and |link| fields that are already so familiar to us; the
13352second, third, and fourth words are called the noad's |nucleus|, |subscr|,
13353and |supscr| fields.
13354
13355Consider, for example, the simple formula `\.{\$x\^2\$}', which would be
13356parsed into an mlist containing a single element called an |ord_noad|.
13357The |nucleus| of this noad is a representation of `\.x', the |subscr| is
13358empty, and the |supscr| is a representation of `\.2'.
13359
13360The |nucleus|, |subscr|, and |supscr| fields are further broken into
13361subfields. If |p| points to a noad, and if |q| is one of its principal
13362fields (e.g., |q=subscr(p)|), there are several possibilities for the
13363subfields, depending on the |math_type| of |q|.
13364
13365\yskip\hang|math_type(q)=math_char| means that |fam(q)| refers to one of
13366the sixteen font families, and |character(q)| is the number of a character
13367within a font of that family, as in a character node.
13368
13369\yskip\hang|math_type(q)=math_text_char| is similar, but the character is
13370unsubscripted and unsuperscripted and it is followed immediately by another
13371character from the same font. (This |math_type| setting appears only
13372briefly during the processing; it is used to suppress unwanted italic
13373corrections.)
13374
13375\yskip\hang|math_type(q)=empty| indicates a field with no value (the
13376corresponding attribute of noad |p| is not present).
13377
13378\yskip\hang|math_type(q)=sub_box| means that |info(q)| points to a box
13379node (either an |hlist_node| or a |vlist_node|) that should be used as the
13380value of the field.  The |shift_amount| in the subsidiary box node is the
13381amount by which that box will be shifted downward.
13382
13383\yskip\hang|math_type(q)=sub_mlist| means that |info(q)| points to
13384an mlist; the mlist must be converted to an hlist in order to obtain
13385the value of this field.
13386
13387\yskip\noindent In the latter case, we might have |info(q)=null|. This
13388is not the same as |math_type(q)=empty|; for example, `\.{\$P\_\{\}\$}'
13389and `\.{\$P\$}' produce different results (the former will not have the
13390``italic correction'' added to the width of |P|, but the ``script skip''
13391will be added).
13392
13393The definitions of subfields given here are evidently wasteful of space,
13394since a halfword is being used for the |math_type| although only three
13395bits would be needed. However, there are hardly ever many noads present at
13396once, since they are soon converted to nodes that take up even more space,
13397so we can afford to represent them in whatever way simplifies the
13398programming.
13399
13400@d noad_size=4 {number of words in a normal noad}
13401@d nucleus(#)==#+1 {the |nucleus| field of a noad}
13402@d supscr(#)==#+2 {the |supscr| field of a noad}
13403@d subscr(#)==#+3 {the |subscr| field of a noad}
13404@d math_type==link {a |halfword| in |mem|}
13405@d fam==font {a |quarterword| in |mem|}
13406@d math_char=1 {|math_type| when the attribute is simple}
13407@d sub_box=2 {|math_type| when the attribute is a box}
13408@d sub_mlist=3 {|math_type| when the attribute is a formula}
13409@d math_text_char=4 {|math_type| when italic correction is dubious}
13410
13411@ Each portion of a formula is classified as Ord, Op, Bin, Rel, Ope,
13412Clo, Pun, or Inn, for purposes of spacing and line breaking. An
13413|ord_noad|, |op_noad|, |bin_noad|, |rel_noad|, |open_noad|, |close_noad|,
13414|punct_noad|, or |inner_noad| is used to represent portions of the various
13415types. For example, an `\.=' sign in a formula leads to the creation of a
13416|rel_noad| whose |nucleus| field is a representation of an equals sign
13417(usually |fam=0|, |character=@'75|).  A formula preceded by \.{\\mathrel}
13418also results in a |rel_noad|.  When a |rel_noad| is followed by an
13419|op_noad|, say, and possibly separated by one or more ordinary nodes (not
13420noads), \TeX\ will insert a penalty node (with the current |rel_penalty|)
13421just after the formula that corresponds to the |rel_noad|, unless there
13422already was a penalty immediately following; and a ``thick space'' will be
13423inserted just before the formula that corresponds to the |op_noad|.
13424
13425A noad of type |ord_noad|, |op_noad|, \dots, |inner_noad| usually
13426has a |subtype=normal|. The only exception is that an |op_noad| might
13427have |subtype=limits| or |no_limits|, if the normal positioning of
13428limits has been overridden for this operator.
13429
13430@d ord_noad=unset_node+3 {|type| of a noad classified Ord}
13431@d op_noad=ord_noad+1 {|type| of a noad classified Op}
13432@d bin_noad=ord_noad+2 {|type| of a noad classified Bin}
13433@d rel_noad=ord_noad+3 {|type| of a noad classified Rel}
13434@d open_noad=ord_noad+4 {|type| of a noad classified Ope}
13435@d close_noad=ord_noad+5 {|type| of a noad classified Clo}
13436@d punct_noad=ord_noad+6 {|type| of a noad classified Pun}
13437@d inner_noad=ord_noad+7 {|type| of a noad classified Inn}
13438@d limits=1 {|subtype| of |op_noad| whose scripts are to be above, below}
13439@d no_limits=2 {|subtype| of |op_noad| whose scripts are to be normal}
13440
13441@ A |radical_noad| is five words long; the fifth word is the |left_delimiter|
13442field, which usually represents a square root sign.
13443
13444A |fraction_noad| is six words long; it has a |right_delimiter| field
13445as well as a |left_delimiter|.
13446
13447Delimiter fields are of type |four_quarters|, and they have four subfields
13448called |small_fam|, |small_char|, |large_fam|, |large_char|. These subfields
13449represent variable-size delimiters by giving the ``small'' and ``large''
13450starting characters, as explained in Chapter~17 of {\sl The \TeX book}.
13451@:TeXbook}{\sl The \TeX book@>
13452
13453A |fraction_noad| is actually quite different from all other noads. Not
13454only does it have six words, it has |thickness|, |denominator|, and
13455|numerator| fields instead of |nucleus|, |subscr|, and |supscr|. The
13456|thickness| is a scaled value that tells how thick to make a fraction
13457rule; however, the special value |default_code| is used to stand for the
13458|default_rule_thickness| of the current size. The |numerator| and
13459|denominator| point to mlists that define a fraction; we always have
13460$$\hbox{|math_type(numerator)=math_type(denominator)=sub_mlist|}.$$ The
13461|left_delimiter| and |right_delimiter| fields specify delimiters that will
13462be placed at the left and right of the fraction. In this way, a
13463|fraction_noad| is able to represent all of \TeX's operators \.{\\over},
13464\.{\\atop}, \.{\\above}, \.{\\overwithdelims}, \.{\\atopwithdelims}, and
13465 \.{\\abovewithdelims}.
13466
13467@d left_delimiter(#)==#+4 {first delimiter field of a noad}
13468@d right_delimiter(#)==#+5 {second delimiter field of a fraction noad}
13469@d radical_noad=inner_noad+1 {|type| of a noad for square roots}
13470@d radical_noad_size=5 {number of |mem| words in a radical noad}
13471@d fraction_noad=radical_noad+1 {|type| of a noad for generalized fractions}
13472@d fraction_noad_size=6 {number of |mem| words in a fraction noad}
13473@d small_fam(#)==mem[#].qqqq.b0 {|fam| for ``small'' delimiter}
13474@d small_char(#)==mem[#].qqqq.b1 {|character| for ``small'' delimiter}
13475@d large_fam(#)==mem[#].qqqq.b2 {|fam| for ``large'' delimiter}
13476@d large_char(#)==mem[#].qqqq.b3 {|character| for ``large'' delimiter}
13477@d thickness==width {|thickness| field in a fraction noad}
13478@d default_code==@'10000000000 {denotes |default_rule_thickness|}
13479@d numerator==supscr {|numerator| field in a fraction noad}
13480@d denominator==subscr {|denominator| field in a fraction noad}
13481
13482@ The global variable |empty_field| is set up for initialization of empty
13483fields in new noads. Similarly, |null_delimiter| is for the initialization
13484of delimiter fields.
13485
13486@<Glob...@>=
13487@!empty_field:two_halves;
13488@!null_delimiter:four_quarters;
13489
13490@ @<Set init...@>=
13491empty_field.rh:=empty; empty_field.lh:=null;@/
13492null_delimiter.b0:=0; null_delimiter.b1:=min_quarterword;@/
13493null_delimiter.b2:=0; null_delimiter.b3:=min_quarterword;
13494
13495@ The |new_noad| function creates an |ord_noad| that is completely null.
13496
13497@p function new_noad:pointer;
13498var p:pointer;
13499begin p:=get_node(noad_size);
13500type(p):=ord_noad; subtype(p):=normal;
13501mem[nucleus(p)].hh:=empty_field;
13502mem[subscr(p)].hh:=empty_field;
13503mem[supscr(p)].hh:=empty_field;
13504new_noad:=p;
13505end;
13506
13507@ A few more kinds of noads will complete the set: An |under_noad| has its
13508nucleus underlined; an |over_noad| has it overlined. An |accent_noad| places
13509an accent over its nucleus; the accent character appears as
13510|fam(accent_chr(p))| and |character(accent_chr(p))|. A |vcenter_noad|
13511centers its nucleus vertically with respect to the axis of the formula;
13512in such noads we always have |math_type(nucleus(p))=sub_box|.
13513
13514And finally, we have |left_noad| and |right_noad| types, to implement
13515\TeX's \.{\\left} and \.{\\right}. The |nucleus| of such noads is
13516replaced by a |delimiter| field; thus, for example, `\.{\\left(}' produces
13517a |left_noad| such that |delimiter(p)| holds the family and character
13518codes for all left parentheses. A |left_noad| never appears in an mlist
13519except as the first element, and a |right_noad| never appears in an mlist
13520except as the last element; furthermore, we either have both a |left_noad|
13521and a |right_noad|, or neither one is present. The |subscr| and |supscr|
13522fields are always |empty| in a |left_noad| and a |right_noad|.
13523
13524@d under_noad=fraction_noad+1 {|type| of a noad for underlining}
13525@d over_noad=under_noad+1 {|type| of a noad for overlining}
13526@d accent_noad=over_noad+1 {|type| of a noad for accented subformulas}
13527@d accent_noad_size=5 {number of |mem| words in an accent noad}
13528@d accent_chr(#)==#+4 {the |accent_chr| field of an accent noad}
13529@d vcenter_noad=accent_noad+1 {|type| of a noad for \.{\\vcenter}}
13530@d left_noad=vcenter_noad+1 {|type| of a noad for \.{\\left}}
13531@d right_noad=left_noad+1 {|type| of a noad for \.{\\right}}
13532@d delimiter==nucleus {|delimiter| field in left and right noads}
13533@d scripts_allowed(#)==(type(#)>=ord_noad)and(type(#)<left_noad)
13534
13535@ Math formulas can also contain instructions like \.{\\textstyle} that
13536override \TeX's normal style rules. A |style_node| is inserted into the
13537data structure to record such instructions; it is three words long, so it
13538is considered a node instead of a noad. The |subtype| is either |display_style|
13539or |text_style| or |script_style| or |script_script_style|. The
13540second and third words of a |style_node| are not used, but they are
13541present because a |choice_node| is converted to a |style_node|.
13542
13543\TeX\ uses even numbers 0, 2, 4, 6 to encode the basic styles
13544|display_style|, \dots, |script_script_style|, and adds~1 to get the
13545``cramped'' versions of these styles. This gives a numerical order that
13546is backwards from the convention of Appendix~G in {\sl The \TeX book\/};
13547i.e., a smaller style has a larger numerical value.
13548@:TeXbook}{\sl The \TeX book@>
13549
13550@d style_node=unset_node+1 {|type| of a style node}
13551@d style_node_size=3 {number of words in a style node}
13552@d display_style=0 {|subtype| for \.{\\displaystyle}}
13553@d text_style=2 {|subtype| for \.{\\textstyle}}
13554@d script_style=4 {|subtype| for \.{\\scriptstyle}}
13555@d script_script_style=6 {|subtype| for \.{\\scriptscriptstyle}}
13556@d cramped=1 {add this to an uncramped style if you want to cramp it}
13557
13558@p function new_style(@!s:small_number):pointer; {create a style node}
13559var p:pointer; {the new node}
13560begin p:=get_node(style_node_size); type(p):=style_node;
13561subtype(p):=s; width(p):=0; depth(p):=0; {the |width| and |depth| are not used}
13562new_style:=p;
13563end;
13564
13565@ Finally, the \.{\\mathchoice} primitive creates a |choice_node|, which
13566has special subfields |display_mlist|, |text_mlist|, |script_mlist|,
13567and |script_script_mlist| pointing to the mlists for each style.
13568
13569@d choice_node=unset_node+2 {|type| of a choice node}
13570@d display_mlist(#)==info(#+1) {mlist to be used in display style}
13571@d text_mlist(#)==link(#+1) {mlist to be used in text style}
13572@d script_mlist(#)==info(#+2) {mlist to be used in script style}
13573@d script_script_mlist(#)==link(#+2) {mlist to be used in scriptscript style}
13574
13575@p function new_choice:pointer; {create a choice node}
13576var p:pointer; {the new node}
13577begin p:=get_node(style_node_size); type(p):=choice_node;
13578subtype(p):=0; {the |subtype| is not used}
13579display_mlist(p):=null; text_mlist(p):=null; script_mlist(p):=null;
13580script_script_mlist(p):=null;
13581new_choice:=p;
13582end;
13583
13584@ Let's consider now the previously unwritten part of |show_node_list|
13585that displays the things that can only be present in mlists; this
13586program illustrates how to access the data structures just defined.
13587
13588In the context of the following program, |p| points to a node or noad that
13589should be displayed, and the current string contains the ``recursion history''
13590that leads to this point. The recursion history consists of a dot for each
13591outer level in which |p| is subsidiary to some node, or in which |p| is
13592subsidiary to the |nucleus| field of some noad; the dot is replaced by
13593`\.\_' or `\.\^' or `\./' or `\.\\' if |p| is descended from the |subscr|
13594or |supscr| or |denominator| or |numerator| fields of noads. For example,
13595the current string would be `\.{.\^.\_/}' if |p| points to the |ord_noad| for
13596|x| in the (ridiculous) formula
13597`\.{\$\\sqrt\{a\^\{\\mathinner\{b\_\{c\\over x+y\}\}\}\}\$}'.
13598
13599@<Cases of |show_node_list| that arise...@>=
13600style_node:print_style(subtype(p));
13601choice_node:@<Display choice node |p|@>;
13602ord_noad,op_noad,bin_noad,rel_noad,open_noad,close_noad,punct_noad,inner_noad,
13603  radical_noad,over_noad,under_noad,vcenter_noad,accent_noad,
13604  left_noad,right_noad:@<Display normal noad |p|@>;
13605fraction_noad:@<Display fraction noad |p|@>;
13606
13607@ Here are some simple routines used in the display of noads.
13608
13609@<Declare procedures needed for displaying the elements of mlists@>=
13610procedure print_fam_and_char(@!p:pointer); {prints family and character}
13611begin print_esc("fam"); print_int(fam(p)); print_char(" ");
13612print_ASCII(qo(character(p)));
13613end;
13614@#
13615procedure print_delimiter(@!p:pointer); {prints a delimiter as 24-bit hex value}
13616var a:integer; {accumulator}
13617begin a:=small_fam(p)*256+qo(small_char(p));
13618a:=a*@"1000+large_fam(p)*256+qo(large_char(p));
13619if a<0 then print_int(a) {this should never happen}
13620else print_hex(a);
13621end;
13622
13623@ The next subroutine will descend to another level of recursion when a
13624subsidiary mlist needs to be displayed. The parameter |c| indicates what
13625character is to become part of the recursion history. An empty mlist is
13626distinguished from a field with |math_type(p)=empty|, because these are
13627not equivalent (as explained above).
13628@^recursion@>
13629
13630@<Declare procedures needed for displaying...@>=
13631procedure@?show_info; forward;@t\2@>@?{|show_node_list(info(temp_ptr))|}
13632procedure print_subsidiary_data(@!p:pointer;@!c:ASCII_code);
13633  {display a noad field}
13634begin if cur_length>=depth_threshold then
13635  begin if math_type(p)<>empty then print(" []");
13636  end
13637else  begin append_char(c); {include |c| in the recursion history}
13638  temp_ptr:=p; {prepare for |show_info| if recursion is needed}
13639  case math_type(p) of
13640  math_char: begin print_ln; print_current_string; print_fam_and_char(p);
13641    end;
13642  sub_box: show_info; {recursive call}
13643  sub_mlist: if info(p)=null then
13644      begin print_ln; print_current_string; print("{}");
13645      end
13646    else show_info; {recursive call}
13647  othercases do_nothing {|empty|}
13648  endcases;@/
13649  flush_char; {remove |c| from the recursion history}
13650  end;
13651end;
13652
13653@ The inelegant introduction of |show_info| in the code above seems better
13654than the alternative of using \PASCAL's strange |forward| declaration for a
13655procedure with parameters. The \PASCAL\ convention about dropping parameters
13656from a post-|forward| procedure is, frankly, so intolerable to the author
13657of \TeX\ that he would rather stoop to communication via a global temporary
13658variable. (A similar stoopidity occurred with respect to |hlist_out| and
13659|vlist_out| above, and it will occur with respect to |mlist_to_hlist| below.)
13660@^Knuth, Donald Ervin@>
13661@:PASCAL}{\PASCAL@>
13662
13663@p procedure show_info; {the reader will kindly forgive this}
13664begin show_node_list(info(temp_ptr));
13665end;
13666
13667@ @<Declare procedures needed for displaying...@>=
13668procedure print_style(@!c:integer);
13669begin case c div 2 of
136700: print_esc("displaystyle"); {|display_style=0|}
136711: print_esc("textstyle"); {|text_style=2|}
136722: print_esc("scriptstyle"); {|script_style=4|}
136733: print_esc("scriptscriptstyle"); {|script_script_style=6|}
13674othercases print("Unknown style!")
13675endcases;
13676end;
13677
13678@ @<Display choice node |p|@>=
13679begin print_esc("mathchoice");
13680append_char("D"); show_node_list(display_mlist(p)); flush_char;
13681append_char("T"); show_node_list(text_mlist(p)); flush_char;
13682append_char("S"); show_node_list(script_mlist(p)); flush_char;
13683append_char("s"); show_node_list(script_script_mlist(p)); flush_char;
13684end
13685
13686@ @<Display normal noad |p|@>=
13687begin case type(p) of
13688ord_noad: print_esc("mathord");
13689op_noad: print_esc("mathop");
13690bin_noad: print_esc("mathbin");
13691rel_noad: print_esc("mathrel");
13692open_noad: print_esc("mathopen");
13693close_noad: print_esc("mathclose");
13694punct_noad: print_esc("mathpunct");
13695inner_noad: print_esc("mathinner");
13696over_noad: print_esc("overline");
13697under_noad: print_esc("underline");
13698vcenter_noad: print_esc("vcenter");
13699radical_noad: begin print_esc("radical"); print_delimiter(left_delimiter(p));
13700  end;
13701accent_noad: begin print_esc("accent"); print_fam_and_char(accent_chr(p));
13702  end;
13703left_noad: begin print_esc("left"); print_delimiter(delimiter(p));
13704  end;
13705right_noad: begin print_esc("right"); print_delimiter(delimiter(p));
13706  end;
13707end;
13708if subtype(p)<>normal then
13709  if subtype(p)=limits then print_esc("limits")
13710  else print_esc("nolimits");
13711if type(p)<left_noad then print_subsidiary_data(nucleus(p),".");
13712print_subsidiary_data(supscr(p),"^");
13713print_subsidiary_data(subscr(p),"_");
13714end
13715
13716@ @<Display fraction noad |p|@>=
13717begin print_esc("fraction, thickness ");
13718if thickness(p)=default_code then print("= default")
13719else print_scaled(thickness(p));
13720if (small_fam(left_delimiter(p))<>0)or@+
13721  (small_char(left_delimiter(p))<>min_quarterword)or@|
13722  (large_fam(left_delimiter(p))<>0)or@|
13723  (large_char(left_delimiter(p))<>min_quarterword) then
13724  begin print(", left-delimiter "); print_delimiter(left_delimiter(p));
13725  end;
13726if (small_fam(right_delimiter(p))<>0)or@|
13727  (small_char(right_delimiter(p))<>min_quarterword)or@|
13728  (large_fam(right_delimiter(p))<>0)or@|
13729  (large_char(right_delimiter(p))<>min_quarterword) then
13730  begin print(", right-delimiter "); print_delimiter(right_delimiter(p));
13731  end;
13732print_subsidiary_data(numerator(p),"\");
13733print_subsidiary_data(denominator(p),"/");
13734end
13735
13736@ That which can be displayed can also be destroyed.
13737
13738@<Cases of |flush_node_list| that arise...@>=
13739style_node: begin free_node(p,style_node_size); goto done;
13740  end;
13741choice_node:begin flush_node_list(display_mlist(p));
13742  flush_node_list(text_mlist(p));
13743  flush_node_list(script_mlist(p));
13744  flush_node_list(script_script_mlist(p));
13745  free_node(p,style_node_size); goto done;
13746  end;
13747ord_noad,op_noad,bin_noad,rel_noad,open_noad,close_noad,punct_noad,inner_noad,
13748  radical_noad,over_noad,under_noad,vcenter_noad,accent_noad:@t@>@;@/
13749  begin if math_type(nucleus(p))>=sub_box then
13750    flush_node_list(info(nucleus(p)));
13751  if math_type(supscr(p))>=sub_box then
13752    flush_node_list(info(supscr(p)));
13753  if math_type(subscr(p))>=sub_box then
13754    flush_node_list(info(subscr(p)));
13755  if type(p)=radical_noad then free_node(p,radical_noad_size)
13756  else if type(p)=accent_noad then free_node(p,accent_noad_size)
13757  else free_node(p,noad_size);
13758  goto done;
13759  end;
13760left_noad,right_noad: begin free_node(p,noad_size); goto done;
13761  end;
13762fraction_noad: begin flush_node_list(info(numerator(p)));
13763  flush_node_list(info(denominator(p)));
13764  free_node(p,fraction_noad_size); goto done;
13765  end;
13766
13767@* \[35] Subroutines for math mode.
13768In order to convert mlists to hlists, i.e., noads to nodes, we need several
13769subroutines that are conveniently dealt with now.
13770
13771Let us first introduce the macros that make it easy to get at the parameters and
13772other font information. A size code, which is a multiple of 16, is added to a
13773family number to get an index into the table of internal font numbers
13774for each combination of family and size.  (Be alert: Size codes get
13775larger as the type gets smaller.)
13776
13777@d text_size=0 {size code for the largest size in a family}
13778@d script_size=16 {size code for the medium size in a family}
13779@d script_script_size=32 {size code for the smallest size in a family}
13780
13781@<Basic printing procedures@>=
13782procedure print_size(@!s:integer);
13783begin if s=text_size then print_esc("textfont")
13784else if s=script_size then print_esc("scriptfont")
13785else print_esc("scriptscriptfont");
13786end;
13787
13788@ Before an mlist is converted to an hlist, \TeX\ makes sure that
13789the fonts in family~2 have enough parameters to be math-symbol
13790fonts, and that the fonts in family~3 have enough parameters to be
13791math-extension fonts. The math-symbol parameters are referred to by using the
13792following macros, which take a size code as their parameter; for example,
13793|num1(cur_size)| gives the value of the |num1| parameter for the current size.
13794@^parameters for symbols@>
13795@^font parameters@>
13796
13797@d mathsy_end(#)==fam_fnt(2+#)]].sc
13798@d mathsy(#)==font_info[#+param_base[mathsy_end
13799@d math_x_height==mathsy(5) {height of `\.x'}
13800@d math_quad==mathsy(6) {\.{18mu}}
13801@d num1==mathsy(8) {numerator shift-up in display styles}
13802@d num2==mathsy(9) {numerator shift-up in non-display, non-\.{\\atop}}
13803@d num3==mathsy(10) {numerator shift-up in non-display \.{\\atop}}
13804@d denom1==mathsy(11) {denominator shift-down in display styles}
13805@d denom2==mathsy(12) {denominator shift-down in non-display styles}
13806@d sup1==mathsy(13) {superscript shift-up in uncramped display style}
13807@d sup2==mathsy(14) {superscript shift-up in uncramped non-display}
13808@d sup3==mathsy(15) {superscript shift-up in cramped styles}
13809@d sub1==mathsy(16) {subscript shift-down if superscript is absent}
13810@d sub2==mathsy(17) {subscript shift-down if superscript is present}
13811@d sup_drop==mathsy(18) {superscript baseline below top of large box}
13812@d sub_drop==mathsy(19) {subscript baseline below bottom of large box}
13813@d delim1==mathsy(20) {size of \.{\\atopwithdelims} delimiters
13814  in display styles}
13815@d delim2==mathsy(21) {size of \.{\\atopwithdelims} delimiters in non-displays}
13816@d axis_height==mathsy(22) {height of fraction lines above the baseline}
13817@d total_mathsy_params=22
13818
13819@ The math-extension parameters have similar macros, but the size code is
13820omitted (since it is always |cur_size| when we refer to such parameters).
13821@^parameters for symbols@>
13822@^font parameters@>
13823
13824@d mathex(#)==font_info[#+param_base[fam_fnt(3+cur_size)]].sc
13825@d default_rule_thickness==mathex(8) {thickness of \.{\\over} bars}
13826@d big_op_spacing1==mathex(9) {minimum clearance above a displayed op}
13827@d big_op_spacing2==mathex(10) {minimum clearance below a displayed op}
13828@d big_op_spacing3==mathex(11) {minimum baselineskip above displayed op}
13829@d big_op_spacing4==mathex(12) {minimum baselineskip below displayed op}
13830@d big_op_spacing5==mathex(13) {padding above and below displayed limits}
13831@d total_mathex_params=13
13832
13833@ We also need to compute the change in style between mlists and their
13834subsidiaries. The following macros define the subsidiary style for
13835an overlined nucleus (|cramped_style|), for a subscript or a superscript
13836(|sub_style| or |sup_style|), or for a numerator or denominator (|num_style|
13837or |denom_style|).
13838
13839@d cramped_style(#)==2*(# div 2)+cramped {cramp the style}
13840@d sub_style(#)==2*(# div 4)+script_style+cramped {smaller and cramped}
13841@d sup_style(#)==2*(# div 4)+script_style+(# mod 2) {smaller}
13842@d num_style(#)==#+2-2*(# div 6) {smaller unless already script-script}
13843@d denom_style(#)==2*(# div 2)+cramped+2-2*(# div 6) {smaller, cramped}
13844
13845@ When the style changes, the following piece of program computes associated
13846information:
13847
13848@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>=
13849begin if cur_style<script_style then cur_size:=text_size
13850else cur_size:=16*((cur_style-text_style) div 2);
13851cur_mu:=x_over_n(math_quad(cur_size),18);
13852end
13853
13854@ Here is a function that returns a pointer to a rule node having a given
13855thickness |t|. The rule will extend horizontally to the boundary of the vlist
13856that eventually contains it.
13857
13858@p function fraction_rule(@!t:scaled):pointer;
13859  {construct the bar for a fraction}
13860var p:pointer; {the new node}
13861begin p:=new_rule; height(p):=t; depth(p):=0; fraction_rule:=p;
13862end;
13863
13864@ The |overbar| function returns a pointer to a vlist box that consists of
13865a given box |b|, above which has been placed a kern of height |k| under a
13866fraction rule of thickness |t| under additional space of height |t|.
13867
13868@p function overbar(@!b:pointer;@!k,@!t:scaled):pointer;
13869var p,@!q:pointer; {nodes being constructed}
13870begin p:=new_kern(k); link(p):=b; q:=fraction_rule(t); link(q):=p;
13871p:=new_kern(t); link(p):=q; overbar:=vpack(p,natural);
13872end;
13873
13874@ The |var_delimiter| function, which finds or constructs a sufficiently
13875large delimiter, is the most interesting of the auxiliary functions that
13876currently concern us. Given a pointer |d| to a delimiter field in some noad,
13877together with a size code |s| and a vertical distance |v|, this function
13878returns a pointer to a box that contains the smallest variant of |d| whose
13879height plus depth is |v| or more. (And if no variant is large enough, it
13880returns the largest available variant.) In particular, this routine will
13881construct arbitrarily large delimiters from extensible components, if
13882|d| leads to such characters.
13883
13884The value returned is a box whose |shift_amount| has been set so that
13885the box is vertically centered with respect to the axis in the given size.
13886If a built-up symbol is returned, the height of the box before shifting
13887will be the height of its topmost component.
13888
13889@p@t\4@>@<Declare subprocedures for |var_delimiter|@>
13890function var_delimiter(@!d:pointer;@!s:small_number;@!v:scaled):pointer;
13891label found,continue;
13892var b:pointer; {the box that will be constructed}
13893@!f,@!g: internal_font_number; {best-so-far and tentative font codes}
13894@!c,@!x,@!y: quarterword; {best-so-far and tentative character codes}
13895@!m,@!n: integer; {the number of extensible pieces}
13896@!u: scaled; {height-plus-depth of a tentative character}
13897@!w: scaled; {largest height-plus-depth so far}
13898@!q: four_quarters; {character info}
13899@!hd: eight_bits; {height-depth byte}
13900@!r: four_quarters; {extensible pieces}
13901@!z: small_number; {runs through font family members}
13902@!large_attempt: boolean; {are we trying the ``large'' variant?}
13903begin f:=null_font; w:=0; large_attempt:=false;
13904z:=small_fam(d); x:=small_char(d);
13905loop@+  begin @<Look at the variants of |(z,x)|; set |f| and |c| whenever
13906    a better character is found; |goto found| as soon as a
13907    large enough variant is encountered@>;
13908  if large_attempt then goto found; {there were none large enough}
13909  large_attempt:=true; z:=large_fam(d); x:=large_char(d);
13910  end;
13911found: if f<>null_font then
13912  @<Make variable |b| point to a box for |(f,c)|@>
13913else  begin b:=new_null_box;
13914  width(b):=null_delimiter_space; {use this width if no delimiter was found}
13915  end;
13916shift_amount(b):=half(height(b)-depth(b)) - axis_height(s);
13917var_delimiter:=b;
13918end;
13919
13920@ The search process is complicated slightly by the facts that some of the
13921characters might not be present in some of the fonts, and they might not
13922be probed in increasing order of height.
13923
13924@<Look at the variants of |(z,x)|; set |f| and |c|...@>=
13925if (z<>0)or(x<>min_quarterword) then
13926  begin z:=z+s+16;
13927  repeat z:=z-16; g:=fam_fnt(z);
13928  if g<>null_font then
13929    @<Look at the list of characters starting with |x| in
13930      font |g|; set |f| and |c| whenever
13931      a better character is found; |goto found| as soon as a
13932      large enough variant is encountered@>;
13933  until z<16;
13934  end
13935
13936@ @<Look at the list of characters starting with |x|...@>=
13937begin y:=x;
13938if (qo(y)>=font_bc[g])and(qo(y)<=font_ec[g]) then
13939  begin continue: q:=char_info(g)(y);
13940  if char_exists(q) then
13941    begin if char_tag(q)=ext_tag then
13942      begin f:=g; c:=y; goto found;
13943      end;
13944    hd:=height_depth(q);
13945    u:=char_height(g)(hd)+char_depth(g)(hd);
13946    if u>w then
13947      begin f:=g; c:=y; w:=u;
13948      if u>=v then goto found;
13949      end;
13950    if char_tag(q)=list_tag then
13951      begin y:=rem_byte(q); goto continue;
13952      end;
13953    end;
13954  end;
13955end
13956
13957@ Here is a subroutine that creates a new box, whose list contains a
13958single character, and whose width includes the italic correction for
13959that character. The height or depth of the box will be negative, if
13960the height or depth of the character is negative; thus, this routine
13961may deliver a slightly different result than |hpack| would produce.
13962
13963@<Declare subprocedures for |var_delimiter|@>=
13964function char_box(@!f:internal_font_number;@!c:quarterword):pointer;
13965var q:four_quarters;
13966@!hd:eight_bits; {|height_depth| byte}
13967@!b,@!p:pointer; {the new box and its character node}
13968begin q:=char_info(f)(c); hd:=height_depth(q);
13969b:=new_null_box; width(b):=char_width(f)(q)+char_italic(f)(q);
13970height(b):=char_height(f)(hd); depth(b):=char_depth(f)(hd);
13971p:=get_avail; character(p):=c; font(p):=f; list_ptr(b):=p; char_box:=b;
13972end;
13973
13974@ When the following code is executed, |char_tag(q)| will be equal to
13975|ext_tag| if and only if a built-up symbol is supposed to be returned.
13976
13977@<Make variable |b| point to a box for |(f,c)|@>=
13978if char_tag(q)=ext_tag then
13979  @<Construct an extensible character in a new box |b|,
13980    using recipe |rem_byte(q)| and font |f|@>
13981else b:=char_box(f,c)
13982
13983@ When we build an extensible character, it's handy to have the
13984following subroutine, which puts a given character on top
13985of the characters already in box |b|:
13986
13987@<Declare subprocedures for |var_delimiter|@>=
13988procedure stack_into_box(@!b:pointer;@!f:internal_font_number;
13989  @!c:quarterword);
13990var p:pointer; {new node placed into |b|}
13991begin p:=char_box(f,c); link(p):=list_ptr(b); list_ptr(b):=p;
13992height(b):=height(p);
13993end;
13994
13995@ Another handy subroutine computes the height plus depth of
13996a given character:
13997
13998@<Declare subprocedures for |var_delimiter|@>=
13999function height_plus_depth(@!f:internal_font_number;@!c:quarterword):scaled;
14000var q:four_quarters;
14001@!hd:eight_bits; {|height_depth| byte}
14002begin q:=char_info(f)(c); hd:=height_depth(q);
14003height_plus_depth:=char_height(f)(hd)+char_depth(f)(hd);
14004end;
14005
14006@ @<Construct an extensible...@>=
14007begin b:=new_null_box;
14008type(b):=vlist_node;
14009r:=font_info[exten_base[f]+rem_byte(q)].qqqq;@/
14010@<Compute the minimum suitable height, |w|, and the corresponding
14011  number of extension steps, |n|; also set |width(b)|@>;
14012c:=ext_bot(r);
14013if c<>min_quarterword then stack_into_box(b,f,c);
14014c:=ext_rep(r);
14015for m:=1 to n do stack_into_box(b,f,c);
14016c:=ext_mid(r);
14017if c<>min_quarterword then
14018  begin stack_into_box(b,f,c); c:=ext_rep(r);
14019  for m:=1 to n do stack_into_box(b,f,c);
14020  end;
14021c:=ext_top(r);
14022if c<>min_quarterword then stack_into_box(b,f,c);
14023depth(b):=w-height(b);
14024end
14025
14026@ The width of an extensible character is the width of the repeatable
14027module. If this module does not have positive height plus depth,
14028we don't use any copies of it, otherwise we use as few as possible
14029(in groups of two if there is a middle part).
14030
14031@<Compute the minimum suitable height, |w|, and...@>=
14032c:=ext_rep(r); u:=height_plus_depth(f,c);
14033w:=0; q:=char_info(f)(c); width(b):=char_width(f)(q)+char_italic(f)(q);@/
14034c:=ext_bot(r);@+if c<>min_quarterword then w:=w+height_plus_depth(f,c);
14035c:=ext_mid(r);@+if c<>min_quarterword then w:=w+height_plus_depth(f,c);
14036c:=ext_top(r);@+if c<>min_quarterword then w:=w+height_plus_depth(f,c);
14037n:=0;
14038if u>0 then while w<v do
14039  begin w:=w+u; incr(n);
14040  if ext_mid(r)<>min_quarterword then w:=w+u;
14041  end
14042
14043@ The next subroutine is much simpler; it is used for numerators and
14044denominators of fractions as well as for displayed operators and
14045their limits above and below. It takes a given box~|b| and
14046changes it so that the new box is centered in a box of width~|w|.
14047The centering is done by putting \.{\\hss} glue at the left and right
14048of the list inside |b|, then packaging the new box; thus, the
14049actual box might not really be centered, if it already contains
14050infinite glue.
14051
14052The given box might contain a single character whose italic correction
14053has been added to the width of the box; in this case a compensating
14054kern is inserted.
14055
14056@p function rebox(@!b:pointer;@!w:scaled):pointer;
14057var p:pointer; {temporary register for list manipulation}
14058@!f:internal_font_number; {font in a one-character box}
14059@!v:scaled; {width of a character without italic correction}
14060begin if (width(b)<>w)and(list_ptr(b)<>null) then
14061  begin if type(b)=vlist_node then b:=hpack(b,natural);
14062  p:=list_ptr(b);
14063  if (is_char_node(p))and(link(p)=null) then
14064    begin f:=font(p); v:=char_width(f)(char_info(f)(character(p)));
14065    if v<>width(b) then link(p):=new_kern(width(b)-v);
14066    end;
14067  free_node(b,box_node_size);
14068  b:=new_glue(ss_glue); link(b):=p;
14069  while link(p)<>null do p:=link(p);
14070  link(p):=new_glue(ss_glue);
14071  rebox:=hpack(b,w,exactly);
14072  end
14073else  begin width(b):=w; rebox:=b;
14074  end;
14075end;
14076
14077@ Here is a subroutine that creates a new glue specification from another
14078one that is expressed in `\.{mu}', given the value of the math unit.
14079
14080@d mu_mult(#)==nx_plus_y(n,#,xn_over_d(#,f,@'200000))
14081
14082@p function math_glue(@!g:pointer;@!m:scaled):pointer;
14083var p:pointer; {the new glue specification}
14084@!n:integer; {integer part of |m|}
14085@!f:scaled; {fraction part of |m|}
14086begin n:=x_over_n(m,@'200000); f:=remainder;@/
14087if f<0 then
14088  begin decr(n); f:=f+@'200000;
14089  end;
14090p:=get_node(glue_spec_size);
14091width(p):=mu_mult(width(g)); {convert \.{mu} to \.{pt}}
14092stretch_order(p):=stretch_order(g);
14093if stretch_order(p)=normal then stretch(p):=mu_mult(stretch(g))
14094else stretch(p):=stretch(g);
14095shrink_order(p):=shrink_order(g);
14096if shrink_order(p)=normal then shrink(p):=mu_mult(shrink(g))
14097else shrink(p):=shrink(g);
14098math_glue:=p;
14099end;
14100
14101@ The |math_kern| subroutine removes |mu_glue| from a kern node, given
14102the value of the math unit.
14103
14104@p procedure math_kern(@!p:pointer;@!m:scaled);
14105var @!n:integer; {integer part of |m|}
14106@!f:scaled; {fraction part of |m|}
14107begin if subtype(p)=mu_glue then
14108  begin n:=x_over_n(m,@'200000); f:=remainder;@/
14109  if f<0 then
14110    begin decr(n); f:=f+@'200000;
14111    end;
14112  width(p):=mu_mult(width(p)); subtype(p):=explicit;
14113  end;
14114end;
14115
14116@ Sometimes it is necessary to destroy an mlist. The following
14117subroutine empties the current list, assuming that |abs(mode)=mmode|.
14118
14119@p procedure flush_math;
14120begin flush_node_list(link(head)); flush_node_list(incompleat_noad);
14121link(head):=null; tail:=head; incompleat_noad:=null;
14122end;
14123
14124@* \[36] Typesetting math formulas.
14125\TeX's most important routine for dealing with formulas is called
14126|mlist_to_hlist|.  After a formula has been scanned and represented as an
14127mlist, this routine converts it to an hlist that can be placed into a box
14128or incorporated into the text of a paragraph. There are three implicit
14129parameters, passed in global variables: |cur_mlist| points to the first
14130node or noad in the given mlist (and it might be |null|); |cur_style| is a
14131style code; and |mlist_penalties| is |true| if penalty nodes for potential
14132line breaks are to be inserted into the resulting hlist. After
14133|mlist_to_hlist| has acted, |link(temp_head)| points to the translated hlist.
14134
14135Since mlists can be inside mlists, the procedure is recursive. And since this
14136is not part of \TeX's inner loop, the program has been written in a manner
14137that stresses compactness over efficiency.
14138@^recursion@>
14139
14140@<Glob...@>=
14141@!cur_mlist:pointer; {beginning of mlist to be translated}
14142@!cur_style:small_number; {style code at current place in the list}
14143@!cur_size:small_number; {size code corresponding to |cur_style|}
14144@!cur_mu:scaled; {the math unit width corresponding to |cur_size|}
14145@!mlist_penalties:boolean; {should |mlist_to_hlist| insert penalties?}
14146
14147@ The recursion in |mlist_to_hlist| is due primarily to a subroutine
14148called |clean_box| that puts a given noad field into a box using a given
14149math style; |mlist_to_hlist| can call |clean_box|, which can call
14150|mlist_to_hlist|.
14151@^recursion@>
14152
14153The box returned by |clean_box| is ``clean'' in the
14154sense that its |shift_amount| is zero.
14155
14156@p procedure@?mlist_to_hlist; forward;@t\2@>@/
14157function clean_box(@!p:pointer;@!s:small_number):pointer;
14158label found;
14159var q:pointer; {beginning of a list to be boxed}
14160@!save_style:small_number; {|cur_style| to be restored}
14161@!x:pointer; {box to be returned}
14162@!r:pointer; {temporary pointer}
14163begin case math_type(p) of
14164math_char: begin cur_mlist:=new_noad; mem[nucleus(cur_mlist)]:=mem[p];
14165  end;
14166sub_box: begin q:=info(p); goto found;
14167  end;
14168sub_mlist: cur_mlist:=info(p);
14169othercases begin q:=new_null_box; goto found;
14170  end
14171endcases;@/
14172save_style:=cur_style; cur_style:=s; mlist_penalties:=false;@/
14173mlist_to_hlist; q:=link(temp_head); {recursive call}
14174cur_style:=save_style; {restore the style}
14175@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>;
14176found: if is_char_node(q)or(q=null) then x:=hpack(q,natural)
14177  else if (link(q)=null)and(type(q)<=vlist_node)and(shift_amount(q)=0) then
14178    x:=q {it's already clean}
14179  else x:=hpack(q,natural);
14180@<Simplify a trivial box@>;
14181clean_box:=x;
14182end;
14183
14184@ Here we save memory space in a common case.
14185
14186@<Simplify a trivial box@>=
14187q:=list_ptr(x);
14188if is_char_node(q) then
14189  begin r:=link(q);
14190  if r<>null then if link(r)=null then if not is_char_node(r) then
14191   if type(r)=kern_node then {unneeded italic correction}
14192    begin free_node(r,small_node_size); link(q):=null;
14193    end;
14194  end
14195
14196@ It is convenient to have a procedure that converts a |math_char|
14197field to an ``unpacked'' form. The |fetch| routine sets |cur_f|, |cur_c|,
14198and |cur_i| to the font code, character code, and character information bytes of
14199a given noad field. It also takes care of issuing error messages for
14200nonexistent characters; in such cases, |char_exists(cur_i)| will be |false|
14201after |fetch| has acted, and the field will also have been reset to |empty|.
14202
14203@p procedure fetch(@!a:pointer); {unpack the |math_char| field |a|}
14204begin cur_c:=character(a); cur_f:=fam_fnt(fam(a)+cur_size);
14205if cur_f=null_font then
14206  @<Complain about an undefined family and set |cur_i| null@>
14207else  begin if (qo(cur_c)>=font_bc[cur_f])and(qo(cur_c)<=font_ec[cur_f]) then
14208    cur_i:=char_info(cur_f)(cur_c)
14209  else cur_i:=null_character;
14210  if not(char_exists(cur_i)) then
14211    begin char_warning(cur_f,qo(cur_c));
14212    math_type(a):=empty;
14213    end;
14214  end;
14215end;
14216
14217@ @<Complain about an undefined family...@>=
14218begin print_err(""); print_size(cur_size); print_char(" ");
14219print_int(fam(a)); print(" is undefined (character ");
14220print_ASCII(qo(cur_c)); print_char(")");
14221help4("Somewhere in the math formula just ended, you used the")@/
14222("stated character from an undefined font family. For example,")@/
14223("plain TeX doesn't allow \it or \sl in subscripts. Proceed,")@/
14224("and I'll try to forget that I needed that character.");
14225error; cur_i:=null_character; math_type(a):=empty;
14226end
14227
14228@ The outputs of |fetch| are placed in global variables.
14229
14230@<Glob...@>=
14231@!cur_f:internal_font_number; {the |font| field of a |math_char|}
14232@!cur_c:quarterword; {the |character| field of a |math_char|}
14233@!cur_i:four_quarters; {the |char_info| of a |math_char|,
14234  or a lig/kern instruction}
14235
14236@ We need to do a lot of different things, so |mlist_to_hlist| makes two
14237passes over the given mlist.
14238
14239The first pass does most of the processing: It removes ``mu'' spacing from
14240glue, it recursively evaluates all subsidiary mlists so that only the
14241top-level mlist remains to be handled, it puts fractions and square roots
14242and such things into boxes, it attaches subscripts and superscripts, and
14243it computes the overall height and depth of the top-level mlist so that
14244the size of delimiters for a |left_noad| and a |right_noad| will be known.
14245The hlist resulting from each noad is recorded in that noad's |new_hlist|
14246field, an integer field that replaces the |nucleus| or |thickness|.
14247@^recursion@>
14248
14249The second pass eliminates all noads and inserts the correct glue and
14250penalties between nodes.
14251
14252@d new_hlist(#)==mem[nucleus(#)].int {the translation of an mlist}
14253
14254@ Here is the overall plan of |mlist_to_hlist|, and the list of its
14255local variables.
14256
14257@d done_with_noad=80 {go here when a noad has been fully translated}
14258@d done_with_node=81 {go here when a node has been fully converted}
14259@d check_dimensions=82 {go here to update |max_h| and |max_d|}
14260@d delete_q=83 {go here to delete |q| and move to the next node}
14261
14262@p@t\4@>@<Declare math construction procedures@>
14263procedure mlist_to_hlist;
14264label reswitch, check_dimensions, done_with_noad, done_with_node, delete_q,
14265  done;
14266var mlist:pointer; {beginning of the given list}
14267@!penalties:boolean; {should penalty nodes be inserted?}
14268@!style:small_number; {the given style}
14269@!save_style:small_number; {holds |cur_style| during recursion}
14270@!q:pointer; {runs through the mlist}
14271@!r:pointer; {the most recent noad preceding |q|}
14272@!r_type:small_number; {the |type| of noad |r|, or |op_noad| if |r=null|}
14273@!t:small_number; {the effective |type| of noad |q| during the second pass}
14274@!p,@!x,@!y,@!z: pointer; {temporary registers for list construction}
14275@!pen:integer; {a penalty to be inserted}
14276@!s:small_number; {the size of a noad to be deleted}
14277@!max_h,@!max_d:scaled; {maximum height and depth of the list translated so far}
14278@!delta:scaled; {offset between subscript and superscript}
14279begin mlist:=cur_mlist; penalties:=mlist_penalties;
14280style:=cur_style; {tuck global parameters away as local variables}
14281q:=mlist; r:=null; r_type:=op_noad; max_h:=0; max_d:=0;
14282@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>;
14283while q<>null do @<Process node-or-noad |q| as much as possible in preparation
14284    for the second pass of |mlist_to_hlist|, then move to the next
14285    item in the mlist@>;
14286@<Convert \(a)a final |bin_noad| to an |ord_noad|@>;
14287@<Make a second pass over the mlist, removing all noads and inserting the
14288  proper spacing and penalties@>;
14289end;
14290
14291@ We use the fact that no character nodes appear in an mlist, hence
14292the field |type(q)| is always present.
14293
14294@<Process node-or-noad...@>=
14295begin @<Do first-pass processing based on |type(q)|; |goto done_with_noad|
14296  if a noad has been fully processed, |goto check_dimensions| if it
14297  has been translated into |new_hlist(q)|, or |goto done_with_node|
14298  if a node has been fully processed@>;
14299check_dimensions: z:=hpack(new_hlist(q),natural);
14300if height(z)>max_h then max_h:=height(z);
14301if depth(z)>max_d then max_d:=depth(z);
14302free_node(z,box_node_size);
14303done_with_noad: r:=q; r_type:=type(r);
14304done_with_node: q:=link(q);
14305end
14306
14307@ One of the things we must do on the first pass is change a |bin_noad| to
14308an |ord_noad| if the |bin_noad| is not in the context of a binary operator.
14309The values of |r| and |r_type| make this fairly easy.
14310
14311@<Do first-pass processing...@>=
14312reswitch: delta:=0;
14313case type(q) of
14314bin_noad: case r_type of
14315  bin_noad,op_noad,rel_noad,open_noad,punct_noad,left_noad:
14316    begin type(q):=ord_noad; goto reswitch;
14317    end;
14318  othercases do_nothing
14319  endcases;
14320rel_noad,close_noad,punct_noad,right_noad: begin@t@>@;@/
14321  @<Convert \(a)a final |bin_noad| to an |ord_noad|@>;
14322  if type(q)=right_noad then goto done_with_noad;
14323  end;
14324@t\4@>@<Cases for noads that can follow a |bin_noad|@>@;
14325@t\4@>@<Cases for nodes that can appear in an mlist, after which we
14326  |goto done_with_node|@>@;
14327othercases confusion("mlist1")
14328@:this can't happen mlist1}{\quad mlist1@>
14329endcases;@/
14330@<Convert \(n)|nucleus(q)| to an hlist and attach the sub/superscripts@>
14331
14332@ @<Convert \(a)a final |bin_noad| to an |ord_noad|@>=
14333if r_type=bin_noad then type(r):=ord_noad
14334
14335@ @<Cases for nodes that can appear in an mlist...@>=
14336style_node: begin cur_style:=subtype(q);
14337  @<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>;
14338  goto done_with_node;
14339  end;
14340choice_node: @<Change this node to a style node followed by the correct choice,
14341   then |goto done_with_node|@>;
14342ins_node,mark_node,adjust_node,
14343  whatsit_node,penalty_node,disc_node: goto done_with_node;
14344rule_node: begin if height(q)>max_h then max_h:=height(q);
14345  if depth(q)>max_d then max_d:=depth(q); goto done_with_node;
14346  end;
14347glue_node: begin @<Convert \(m)math glue to ordinary glue@>;
14348  goto done_with_node;
14349  end;
14350kern_node: begin math_kern(q,cur_mu); goto done_with_node;
14351  end;
14352
14353@ @d choose_mlist(#)==begin p:=#(q); #(q):=null;@+end
14354
14355@<Change this node to a style node...@>=
14356begin case cur_style div 2 of
143570: choose_mlist(display_mlist); {|display_style=0|}
143581: choose_mlist(text_mlist); {|text_style=2|}
143592: choose_mlist(script_mlist); {|script_style=4|}
143603: choose_mlist(script_script_mlist); {|script_script_style=6|}
14361end; {there are no other cases}
14362flush_node_list(display_mlist(q));
14363flush_node_list(text_mlist(q));
14364flush_node_list(script_mlist(q));
14365flush_node_list(script_script_mlist(q));@/
14366type(q):=style_node; subtype(q):=cur_style; width(q):=0; depth(q):=0;
14367if p<>null then
14368  begin z:=link(q); link(q):=p;
14369  while link(p)<>null do p:=link(p);
14370  link(p):=z;
14371  end;
14372goto done_with_node;
14373end
14374
14375@ Conditional math glue (`\.{\\nonscript}') results in a |glue_node|
14376pointing to |zero_glue|, with |subtype(q)=cond_math_glue|; in such a case
14377the node following will be eliminated if it is a glue or kern node and if the
14378current size is different from |text_size|. Unconditional math glue
14379(`\.{\\muskip}') is converted to normal glue by multiplying the dimensions
14380by |cur_mu|.
14381@!@:non_script_}{\.{\\nonscript} primitive@>
14382
14383@<Convert \(m)math glue to ordinary glue@>=
14384if subtype(q)=mu_glue then
14385  begin x:=glue_ptr(q);
14386  y:=math_glue(x,cur_mu); delete_glue_ref(x); glue_ptr(q):=y;
14387  subtype(q):=normal;
14388  end
14389else if (cur_size<>text_size)and(subtype(q)=cond_math_glue) then
14390  begin p:=link(q);
14391  if p<>null then if (type(p)=glue_node)or(type(p)=kern_node) then
14392    begin link(q):=link(p); link(p):=null; flush_node_list(p);
14393    end;
14394  end
14395
14396@ @<Cases for noads that can follow a |bin_noad|@>=
14397left_noad: goto done_with_noad;
14398fraction_noad: begin make_fraction(q); goto check_dimensions;
14399  end;
14400op_noad: begin delta:=make_op(q);
14401  if subtype(q)=limits then goto check_dimensions;
14402  end;
14403ord_noad: make_ord(q);
14404open_noad,inner_noad: do_nothing;
14405radical_noad: make_radical(q);
14406over_noad: make_over(q);
14407under_noad: make_under(q);
14408accent_noad: make_math_accent(q);
14409vcenter_noad: make_vcenter(q);
14410
14411@ Most of the actual construction work of |mlist_to_hlist| is done
14412by procedures with names
14413like |make_fraction|, |make_radical|, etc. To illustrate
14414the general setup of such procedures, let's begin with a couple of
14415simple ones.
14416
14417@<Declare math...@>=
14418procedure make_over(@!q:pointer);
14419begin info(nucleus(q)):=@|
14420  overbar(clean_box(nucleus(q),cramped_style(cur_style)),@|
14421  3*default_rule_thickness,default_rule_thickness);
14422math_type(nucleus(q)):=sub_box;
14423end;
14424
14425@ @<Declare math...@>=
14426procedure make_under(@!q:pointer);
14427var p,@!x,@!y: pointer; {temporary registers for box construction}
14428@!delta:scaled; {overall height plus depth}
14429begin x:=clean_box(nucleus(q),cur_style);
14430p:=new_kern(3*default_rule_thickness); link(x):=p;
14431link(p):=fraction_rule(default_rule_thickness);
14432y:=vpack(x,natural);
14433delta:=height(y)+depth(y)+default_rule_thickness;
14434height(y):=height(x); depth(y):=delta-height(y);
14435info(nucleus(q)):=y; math_type(nucleus(q)):=sub_box;
14436end;
14437
14438@ @<Declare math...@>=
14439procedure make_vcenter(@!q:pointer);
14440var v:pointer; {the box that should be centered vertically}
14441@!delta:scaled; {its height plus depth}
14442begin v:=info(nucleus(q));
14443if type(v)<>vlist_node then confusion("vcenter");
14444@:this can't happen vcenter}{\quad vcenter@>
14445delta:=height(v)+depth(v);
14446height(v):=axis_height(cur_size)+half(delta);
14447depth(v):=delta-height(v);
14448end;
14449
14450@ According to the rules in the \.{DVI} file specifications, we ensure alignment
14451@^square roots@>
14452between a square root sign and the rule above its nucleus by assuming that the
14453baseline of the square-root symbol is the same as the bottom of the rule. The
14454height of the square-root symbol will be the thickness of the rule, and the
14455depth of the square-root symbol should exceed or equal the height-plus-depth
14456of the nucleus plus a certain minimum clearance~|clr|. The symbol will be
14457placed so that the actual clearance is |clr| plus half the excess.
14458
14459@<Declare math...@>=
14460procedure make_radical(@!q:pointer);
14461var x,@!y:pointer; {temporary registers for box construction}
14462@!delta,@!clr:scaled; {dimensions involved in the calculation}
14463begin x:=clean_box(nucleus(q),cramped_style(cur_style));
14464if cur_style<text_style then {display style}
14465  clr:=default_rule_thickness+(abs(math_x_height(cur_size)) div 4)
14466else  begin clr:=default_rule_thickness; clr:=clr + (abs(clr) div 4);
14467  end;
14468y:=var_delimiter(left_delimiter(q),cur_size,height(x)+depth(x)+clr+
14469  default_rule_thickness);
14470delta:=depth(y)-(height(x)+depth(x)+clr);
14471if delta>0 then clr:=clr+half(delta); {increase the actual clearance}
14472shift_amount(y):=-(height(x)+clr);
14473link(y):=overbar(x,clr,height(y));
14474info(nucleus(q)):=hpack(y,natural); math_type(nucleus(q)):=sub_box;
14475end;
14476
14477@ Slants are not considered when placing accents in math mode. The accenter is
14478centered over the accentee, and the accent width is treated as zero with
14479respect to the size of the final box.
14480
14481@<Declare math...@>=
14482procedure make_math_accent(@!q:pointer);
14483label done,done1;
14484var p,@!x,@!y:pointer; {temporary registers for box construction}
14485@!a:integer; {address of lig/kern instruction}
14486@!c:quarterword; {accent character}
14487@!f:internal_font_number; {its font}
14488@!i:four_quarters; {its |char_info|}
14489@!s:scaled; {amount to skew the accent to the right}
14490@!h:scaled; {height of character being accented}
14491@!delta:scaled; {space to remove between accent and accentee}
14492@!w:scaled; {width of the accentee, not including sub/superscripts}
14493begin fetch(accent_chr(q));
14494if char_exists(cur_i) then
14495  begin i:=cur_i; c:=cur_c; f:=cur_f;@/
14496  @<Compute the amount of skew@>;
14497  x:=clean_box(nucleus(q),cramped_style(cur_style)); w:=width(x); h:=height(x);
14498  @<Switch to a larger accent if available and appropriate@>;
14499  if h<x_height(f) then delta:=h@+else delta:=x_height(f);
14500  if (math_type(supscr(q))<>empty)or(math_type(subscr(q))<>empty) then
14501    if math_type(nucleus(q))=math_char then
14502      @<Swap the subscript and superscript into box |x|@>;
14503  y:=char_box(f,c);
14504  shift_amount(y):=s+half(w-width(y));
14505  width(y):=0; p:=new_kern(-delta); link(p):=x; link(y):=p;
14506  y:=vpack(y,natural); width(y):=width(x);
14507  if height(y)<h then @<Make the height of box |y| equal to |h|@>;
14508  info(nucleus(q)):=y;
14509  math_type(nucleus(q)):=sub_box;
14510  end;
14511end;
14512
14513@ @<Make the height of box |y|...@>=
14514begin p:=new_kern(h-height(y)); link(p):=list_ptr(y); list_ptr(y):=p;
14515height(y):=h;
14516end
14517
14518@ @<Switch to a larger accent if available and appropriate@>=
14519loop@+  begin if char_tag(i)<>list_tag then goto done;
14520  y:=rem_byte(i);
14521  i:=char_info(f)(y);
14522  if not char_exists(i) then goto done;
14523  if char_width(f)(i)>w then goto done;
14524  c:=y;
14525  end;
14526done:
14527
14528@ @<Compute the amount of skew@>=
14529s:=0;
14530if math_type(nucleus(q))=math_char then
14531  begin fetch(nucleus(q));
14532  if char_tag(cur_i)=lig_tag then
14533    begin a:=lig_kern_start(cur_f)(cur_i);
14534    cur_i:=font_info[a].qqqq;
14535    if skip_byte(cur_i)>stop_flag then
14536      begin a:=lig_kern_restart(cur_f)(cur_i);
14537      cur_i:=font_info[a].qqqq;
14538      end;
14539    loop@+ begin if qo(next_char(cur_i))=skew_char[cur_f] then
14540        begin if op_byte(cur_i)>=kern_flag then
14541          if skip_byte(cur_i)<=stop_flag then s:=char_kern(cur_f)(cur_i);
14542        goto done1;
14543        end;
14544      if skip_byte(cur_i)>=stop_flag then goto done1;
14545      a:=a+qo(skip_byte(cur_i))+1;
14546      cur_i:=font_info[a].qqqq;
14547      end;
14548    end;
14549  end;
14550done1:
14551
14552@ @<Swap the subscript and superscript into box |x|@>=
14553begin flush_node_list(x); x:=new_noad;
14554mem[nucleus(x)]:=mem[nucleus(q)];
14555mem[supscr(x)]:=mem[supscr(q)];
14556mem[subscr(x)]:=mem[subscr(q)];@/
14557mem[supscr(q)].hh:=empty_field;
14558mem[subscr(q)].hh:=empty_field;@/
14559math_type(nucleus(q)):=sub_mlist; info(nucleus(q)):=x;
14560x:=clean_box(nucleus(q),cur_style); delta:=delta+height(x)-h; h:=height(x);
14561end
14562
14563@ The |make_fraction| procedure is a bit different because it sets
14564|new_hlist(q)| directly rather than making a sub-box.
14565
14566@<Declare math...@>=
14567procedure make_fraction(@!q:pointer);
14568var p,@!v,@!x,@!y,@!z:pointer; {temporary registers for box construction}
14569@!delta,@!delta1,@!delta2,@!shift_up,@!shift_down,@!clr:scaled;
14570  {dimensions for box calculations}
14571begin if thickness(q)=default_code then thickness(q):=default_rule_thickness;
14572@<Create equal-width boxes |x| and |z| for the numerator and denominator,
14573  and compute the default amounts |shift_up| and |shift_down| by which they
14574  are displaced from the baseline@>;
14575if thickness(q)=0 then @<Adjust \(s)|shift_up| and |shift_down| for the case
14576  of no fraction line@>
14577else @<Adjust \(s)|shift_up| and |shift_down| for the case of a fraction line@>;
14578@<Construct a vlist box for the fraction, according to |shift_up| and
14579  |shift_down|@>;
14580@<Put the \(f)fraction into a box with its delimiters, and make |new_hlist(q)|
14581  point to it@>;
14582end;
14583
14584@ @<Create equal-width boxes |x| and |z| for the numerator and denom...@>=
14585x:=clean_box(numerator(q),num_style(cur_style));
14586z:=clean_box(denominator(q),denom_style(cur_style));
14587if width(x)<width(z) then x:=rebox(x,width(z))
14588else z:=rebox(z,width(x));
14589if cur_style<text_style then {display style}
14590  begin shift_up:=num1(cur_size); shift_down:=denom1(cur_size);
14591  end
14592else  begin shift_down:=denom2(cur_size);
14593  if thickness(q)<>0 then shift_up:=num2(cur_size)
14594  else shift_up:=num3(cur_size);
14595  end
14596
14597@ The numerator and denominator must be separated by a certain minimum
14598clearance, called |clr| in the following program. The difference between
14599|clr| and the actual clearance is |2delta|.
14600
14601@<Adjust \(s)|shift_up| and |shift_down| for the case of no fraction line@>=
14602begin if cur_style<text_style then clr:=7*default_rule_thickness
14603else clr:=3*default_rule_thickness;
14604delta:=half(clr-((shift_up-depth(x))-(height(z)-shift_down)));
14605if delta>0 then
14606  begin shift_up:=shift_up+delta;
14607  shift_down:=shift_down+delta;
14608  end;
14609end
14610
14611@ In the case of a fraction line, the minimum clearance depends on the actual
14612thickness of the line.
14613
14614@<Adjust \(s)|shift_up| and |shift_down| for the case of a fraction line@>=
14615begin if cur_style<text_style then clr:=3*thickness(q)
14616else clr:=thickness(q);
14617delta:=half(thickness(q));
14618delta1:=clr-((shift_up-depth(x))-(axis_height(cur_size)+delta));
14619delta2:=clr-((axis_height(cur_size)-delta)-(height(z)-shift_down));
14620if delta1>0 then shift_up:=shift_up+delta1;
14621if delta2>0 then shift_down:=shift_down+delta2;
14622end
14623
14624@ @<Construct a vlist box for the fraction...@>=
14625v:=new_null_box; type(v):=vlist_node;
14626height(v):=shift_up+height(x); depth(v):=depth(z)+shift_down;
14627width(v):=width(x); {this also equals |width(z)|}
14628if thickness(q)=0 then
14629  begin p:=new_kern((shift_up-depth(x))-(height(z)-shift_down));
14630  link(p):=z;
14631  end
14632else  begin y:=fraction_rule(thickness(q));@/
14633  p:=new_kern((axis_height(cur_size)-delta)-@|(height(z)-shift_down));@/
14634  link(y):=p; link(p):=z;@/
14635  p:=new_kern((shift_up-depth(x))-(axis_height(cur_size)+delta));
14636  link(p):=y;
14637  end;
14638link(x):=p; list_ptr(v):=x
14639
14640@ @<Put the \(f)fraction into a box with its delimiters...@>=
14641if cur_style<text_style then delta:=delim1(cur_size)
14642else delta:=delim2(cur_size);
14643x:=var_delimiter(left_delimiter(q), cur_size, delta); link(x):=v;@/
14644z:=var_delimiter(right_delimiter(q), cur_size, delta); link(v):=z;@/
14645new_hlist(q):=hpack(x,natural)
14646
14647@ If the nucleus of an |op_noad| is a single character, it is to be
14648centered vertically with respect to the axis, after first being enlarged
14649(via a character list in the font) if we are in display style.  The normal
14650convention for placing displayed limits is to put them above and below the
14651operator in display style.
14652
14653The italic correction is removed from the character if there is a subscript
14654and the limits are not being displayed. The |make_op|
14655routine returns the value that should be used as an offset between
14656subscript and superscript.
14657
14658After |make_op| has acted, |subtype(q)| will be |limits| if and only if
14659the limits have been set above and below the operator. In that case,
14660|new_hlist(q)| will already contain the desired final box.
14661
14662@<Declare math...@>=
14663function make_op(@!q:pointer):scaled;
14664var delta:scaled; {offset between subscript and superscript}
14665@!p,@!v,@!x,@!y,@!z:pointer; {temporary registers for box construction}
14666@!c:quarterword;@+@!i:four_quarters; {registers for character examination}
14667@!shift_up,@!shift_down:scaled; {dimensions for box calculation}
14668begin if (subtype(q)=normal)and(cur_style<text_style) then
14669  subtype(q):=limits;
14670if math_type(nucleus(q))=math_char then
14671  begin fetch(nucleus(q));
14672  if (cur_style<text_style)and(char_tag(cur_i)=list_tag) then {make it larger}
14673    begin c:=rem_byte(cur_i); i:=char_info(cur_f)(c);
14674    if char_exists(i) then
14675      begin cur_c:=c; cur_i:=i; character(nucleus(q)):=c;
14676      end;
14677    end;
14678  delta:=char_italic(cur_f)(cur_i); x:=clean_box(nucleus(q),cur_style);
14679  if (math_type(subscr(q))<>empty)and(subtype(q)<>limits) then
14680    width(x):=width(x)-delta; {remove italic correction}
14681  shift_amount(x):=half(height(x)-depth(x)) - axis_height(cur_size);
14682    {center vertically}
14683  math_type(nucleus(q)):=sub_box; info(nucleus(q)):=x;
14684  end
14685else delta:=0;
14686if subtype(q)=limits then
14687  @<Construct a box with limits above and below it, skewed by |delta|@>;
14688make_op:=delta;
14689end;
14690
14691@ The following program builds a vlist box |v| for displayed limits. The
14692width of the box is not affected by the fact that the limits may be skewed.
14693
14694@<Construct a box with limits above and below it...@>=
14695begin x:=clean_box(supscr(q),sup_style(cur_style));
14696y:=clean_box(nucleus(q),cur_style);
14697z:=clean_box(subscr(q),sub_style(cur_style));
14698v:=new_null_box; type(v):=vlist_node; width(v):=width(y);
14699if width(x)>width(v) then width(v):=width(x);
14700if width(z)>width(v) then width(v):=width(z);
14701x:=rebox(x,width(v)); y:=rebox(y,width(v)); z:=rebox(z,width(v));@/
14702shift_amount(x):=half(delta); shift_amount(z):=-shift_amount(x);
14703height(v):=height(y); depth(v):=depth(y);
14704@<Attach the limits to |y| and adjust |height(v)|, |depth(v)| to
14705  account for their presence@>;
14706new_hlist(q):=v;
14707end
14708
14709@ We use |shift_up| and |shift_down| in the following program for the
14710amount of glue between the displayed operator |y| and its limits |x| and
14711|z|. The vlist inside box |v| will consist of |x| followed by |y| followed
14712by |z|, with kern nodes for the spaces between and around them.
14713
14714@<Attach the limits to |y| and adjust |height(v)|, |depth(v)|...@>=
14715if math_type(supscr(q))=empty then
14716  begin free_node(x,box_node_size); list_ptr(v):=y;
14717  end
14718else  begin shift_up:=big_op_spacing3-depth(x);
14719  if shift_up<big_op_spacing1 then shift_up:=big_op_spacing1;
14720  p:=new_kern(shift_up); link(p):=y; link(x):=p;@/
14721  p:=new_kern(big_op_spacing5); link(p):=x; list_ptr(v):=p;
14722  height(v):=height(v)+big_op_spacing5+height(x)+depth(x)+shift_up;
14723  end;
14724if math_type(subscr(q))=empty then free_node(z,box_node_size)
14725else  begin shift_down:=big_op_spacing4-height(z);
14726  if shift_down<big_op_spacing2 then shift_down:=big_op_spacing2;
14727  p:=new_kern(shift_down); link(y):=p; link(p):=z;@/
14728  p:=new_kern(big_op_spacing5); link(z):=p;
14729  depth(v):=depth(v)+big_op_spacing5+height(z)+depth(z)+shift_down;
14730  end
14731
14732@ A ligature found in a math formula does not create a |ligature_node|, because
14733there is no question of hyphenation afterwards; the ligature will simply be
14734stored in an ordinary |char_node|, after residing in an |ord_noad|.
14735
14736The |math_type| is converted to |math_text_char| here if we would not want to
14737apply an italic correction to the current character unless it belongs
14738to a math font (i.e., a font with |space=0|).
14739
14740No boundary characters enter into these ligatures.
14741
14742@<Declare math...@>=
14743procedure make_ord(@!q:pointer);
14744label restart,exit;
14745var a:integer; {address of lig/kern instruction}
14746@!p,@!r:pointer; {temporary registers for list manipulation}
14747begin restart:@t@>@;@/
14748if math_type(subscr(q))=empty then if math_type(supscr(q))=empty then
14749 if math_type(nucleus(q))=math_char then
14750  begin p:=link(q);
14751  if p<>null then if (type(p)>=ord_noad)and(type(p)<=punct_noad) then
14752    if math_type(nucleus(p))=math_char then
14753    if fam(nucleus(p))=fam(nucleus(q)) then
14754      begin math_type(nucleus(q)):=math_text_char;
14755      fetch(nucleus(q));
14756      if char_tag(cur_i)=lig_tag then
14757        begin a:=lig_kern_start(cur_f)(cur_i);
14758        cur_c:=character(nucleus(p));
14759        cur_i:=font_info[a].qqqq;
14760        if skip_byte(cur_i)>stop_flag then
14761          begin a:=lig_kern_restart(cur_f)(cur_i);
14762          cur_i:=font_info[a].qqqq;
14763          end;
14764        loop@+ begin @<If instruction |cur_i| is a kern with |cur_c|, attach
14765            the kern after~|q|; or if it is a ligature with |cur_c|, combine
14766            noads |q| and~|p| appropriately; then |return| if the cursor has
14767            moved past a noad, or |goto restart|@>;
14768          if skip_byte(cur_i)>=stop_flag then return;
14769          a:=a+qo(skip_byte(cur_i))+1;
14770          cur_i:=font_info[a].qqqq;
14771          end;
14772        end;
14773      end;
14774  end;
14775exit:end;
14776
14777@ Note that a ligature between an |ord_noad| and another kind of noad
14778is replaced by an |ord_noad|, when the two noads collapse into one.
14779But we could make a parenthesis (say) change shape when it follows
14780certain letters. Presumably a font designer will define such
14781ligatures only when this convention makes sense.
14782
14783\chardef\?='174 % vertical line to indicate character retention
14784
14785@<If instruction |cur_i| is a kern with |cur_c|, ...@>=
14786if next_char(cur_i)=cur_c then if skip_byte(cur_i)<=stop_flag then
14787  if op_byte(cur_i)>=kern_flag then
14788    begin p:=new_kern(char_kern(cur_f)(cur_i));
14789    link(p):=link(q); link(q):=p; return;
14790    end
14791  else  begin check_interrupt; {allow a way out of infinite ligature loop}
14792    case op_byte(cur_i) of
14793  qi(1),qi(5): character(nucleus(q)):=rem_byte(cur_i); {\.{=:\?}, \.{=:\?>}}
14794  qi(2),qi(6): character(nucleus(p)):=rem_byte(cur_i); {\.{\?=:}, \.{\?=:>}}
14795  qi(3),qi(7),qi(11):begin r:=new_noad; {\.{\?=:\?}, \.{\?=:\?>}, \.{\?=:\?>>}}
14796      character(nucleus(r)):=rem_byte(cur_i);
14797      fam(nucleus(r)):=fam(nucleus(q));@/
14798      link(q):=r; link(r):=p;
14799      if op_byte(cur_i)<qi(11) then math_type(nucleus(r)):=math_char
14800      else math_type(nucleus(r)):=math_text_char; {prevent combination}
14801      end;
14802    othercases begin link(q):=link(p);
14803      character(nucleus(q)):=rem_byte(cur_i); {\.{=:}}
14804      mem[subscr(q)]:=mem[subscr(p)]; mem[supscr(q)]:=mem[supscr(p)];@/
14805      free_node(p,noad_size);
14806      end
14807    endcases;
14808    if op_byte(cur_i)>qi(3) then return;
14809    math_type(nucleus(q)):=math_char; goto restart;
14810    end
14811
14812@ When we get to the following part of the program, we have ``fallen through''
14813from cases that did not lead to |check_dimensions| or |done_with_noad| or
14814|done_with_node|. Thus, |q|~points to a noad whose nucleus may need to be
14815converted to an hlist, and whose subscripts and superscripts need to be
14816appended if they are present.
14817
14818If |nucleus(q)| is not a |math_char|, the variable |delta| is the amount
14819by which a superscript should be moved right with respect to a subscript
14820when both are present.
14821@^subscripts@>
14822@^superscripts@>
14823
14824@<Convert \(n)|nucleus(q)| to an hlist and attach the sub/superscripts@>=
14825case math_type(nucleus(q)) of
14826math_char, math_text_char:
14827  @<Create a character node |p| for |nucleus(q)|, possibly followed
14828  by a kern node for the italic correction, and set |delta| to the
14829  italic correction if a subscript is present@>;
14830empty: p:=null;
14831sub_box: p:=info(nucleus(q));
14832sub_mlist: begin cur_mlist:=info(nucleus(q)); save_style:=cur_style;
14833  mlist_penalties:=false; mlist_to_hlist; {recursive call}
14834@^recursion@>
14835  cur_style:=save_style; @<Set up the values...@>;
14836  p:=hpack(link(temp_head),natural);
14837  end;
14838othercases confusion("mlist2")
14839@:this can't happen mlist2}{\quad mlist2@>
14840endcases;@/
14841new_hlist(q):=p;
14842if (math_type(subscr(q))=empty)and(math_type(supscr(q))=empty) then
14843  goto check_dimensions;
14844make_scripts(q,delta)
14845
14846@ @<Create a character node |p| for |nucleus(q)|...@>=
14847begin fetch(nucleus(q));
14848if char_exists(cur_i) then
14849  begin delta:=char_italic(cur_f)(cur_i); p:=new_character(cur_f,qo(cur_c));
14850  if (math_type(nucleus(q))=math_text_char)and(space(cur_f)<>0) then
14851    delta:=0; {no italic correction in mid-word of text font}
14852  if (math_type(subscr(q))=empty)and(delta<>0) then
14853    begin link(p):=new_kern(delta); delta:=0;
14854    end;
14855  end
14856else p:=null;
14857end
14858
14859@ The purpose of |make_scripts(q,delta)| is to attach the subscript and/or
14860superscript of noad |q| to the list that starts at |new_hlist(q)|,
14861given that the subscript and superscript aren't both empty. The superscript
14862will appear to the right of the subscript by a given distance |delta|.
14863
14864We set |shift_down| and |shift_up| to the minimum amounts to shift the
14865baseline of subscripts and superscripts based on the given nucleus.
14866
14867@<Declare math...@>=
14868procedure make_scripts(@!q:pointer;@!delta:scaled);
14869var p,@!x,@!y,@!z:pointer; {temporary registers for box construction}
14870@!shift_up,@!shift_down,@!clr:scaled; {dimensions in the calculation}
14871@!t:small_number; {subsidiary size code}
14872begin p:=new_hlist(q);
14873if is_char_node(p) then
14874  begin shift_up:=0; shift_down:=0;
14875  end
14876else  begin z:=hpack(p,natural);
14877  if cur_style<script_style then t:=script_size@+else t:=script_script_size;
14878  shift_up:=height(z)-sup_drop(t);
14879  shift_down:=depth(z)+sub_drop(t);
14880  free_node(z,box_node_size);
14881  end;
14882if math_type(supscr(q))=empty then
14883  @<Construct a subscript box |x| when there is no superscript@>
14884else  begin @<Construct a superscript box |x|@>;
14885  if math_type(subscr(q))=empty then shift_amount(x):=-shift_up
14886  else @<Construct a sub/superscript combination box |x|, with the
14887    superscript offset by |delta|@>;
14888  end;
14889if new_hlist(q)=null then new_hlist(q):=x
14890else  begin p:=new_hlist(q);
14891  while link(p)<>null do p:=link(p);
14892  link(p):=x;
14893  end;
14894end;
14895
14896@ When there is a subscript without a superscript, the top of the subscript
14897should not exceed the baseline plus four-fifths of the x-height.
14898
14899@<Construct a subscript box |x| when there is no superscript@>=
14900begin x:=clean_box(subscr(q),sub_style(cur_style));
14901width(x):=width(x)+script_space;
14902if shift_down<sub1(cur_size) then shift_down:=sub1(cur_size);
14903clr:=height(x)-(abs(math_x_height(cur_size)*4) div 5);
14904if shift_down<clr then shift_down:=clr;
14905shift_amount(x):=shift_down;
14906end
14907
14908@ The bottom of a superscript should never descend below the baseline plus
14909one-fourth of the x-height.
14910
14911@<Construct a superscript box |x|@>=
14912begin x:=clean_box(supscr(q),sup_style(cur_style));
14913width(x):=width(x)+script_space;
14914if odd(cur_style) then clr:=sup3(cur_size)
14915else if cur_style<text_style then clr:=sup1(cur_size)
14916else clr:=sup2(cur_size);
14917if shift_up<clr then shift_up:=clr;
14918clr:=depth(x)+(abs(math_x_height(cur_size)) div 4);
14919if shift_up<clr then shift_up:=clr;
14920end
14921
14922@ When both subscript and superscript are present, the subscript must be
14923separated from the superscript by at least four times |default_rule_thickness|.
14924If this condition would be violated, the subscript moves down, after which
14925both subscript and superscript move up so that the bottom of the superscript
14926is at least as high as the baseline plus four-fifths of the x-height.
14927
14928@<Construct a sub/superscript combination box |x|...@>=
14929begin y:=clean_box(subscr(q),sub_style(cur_style));
14930width(y):=width(y)+script_space;
14931if shift_down<sub2(cur_size) then shift_down:=sub2(cur_size);
14932clr:=4*default_rule_thickness-
14933  ((shift_up-depth(x))-(height(y)-shift_down));
14934if clr>0 then
14935  begin shift_down:=shift_down+clr;
14936  clr:=(abs(math_x_height(cur_size)*4) div 5)-(shift_up-depth(x));
14937  if clr>0 then
14938    begin shift_up:=shift_up+clr;
14939    shift_down:=shift_down-clr;
14940    end;
14941  end;
14942shift_amount(x):=delta; {superscript is |delta| to the right of the subscript}
14943p:=new_kern((shift_up-depth(x))-(height(y)-shift_down)); link(x):=p; link(p):=y;
14944x:=vpack(x,natural); shift_amount(x):=shift_down;
14945end
14946
14947@ We have now tied up all the loose ends of the first pass of |mlist_to_hlist|.
14948The second pass simply goes through and hooks everything together with the
14949proper glue and penalties. It also handles the |left_noad| and |right_noad| that
14950might be present, since |max_h| and |max_d| are now known. Variable |p| points
14951to a node at the current end of the final hlist.
14952
14953@<Make a second pass over the mlist, ...@>=
14954p:=temp_head; link(p):=null; q:=mlist; r_type:=0; cur_style:=style;
14955@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>;
14956while q<>null do
14957  begin @<If node |q| is a style node, change the style and |goto delete_q|;
14958    otherwise if it is not a noad, put it into the hlist,
14959    advance |q|, and |goto done|; otherwise set |s| to the size
14960    of noad |q|, set |t| to the associated type (|ord_noad..
14961    inner_noad|), and set |pen| to the associated penalty@>;
14962  @<Append inter-element spacing based on |r_type| and |t|@>;
14963  @<Append any |new_hlist| entries for |q|, and any appropriate penalties@>;
14964  r_type:=t;
14965  delete_q: r:=q; q:=link(q); free_node(r,s);
14966  done: end
14967
14968@ Just before doing the big |case| switch in the second pass, the program
14969sets up default values so that most of the branches are short.
14970
14971@<If node |q| is a style node, change the style...@>=
14972t:=ord_noad; s:=noad_size; pen:=inf_penalty;
14973case type(q) of
14974op_noad,open_noad,close_noad,punct_noad,inner_noad: t:=type(q);
14975bin_noad: begin t:=bin_noad; pen:=bin_op_penalty;
14976  end;
14977rel_noad: begin t:=rel_noad; pen:=rel_penalty;
14978  end;
14979ord_noad,vcenter_noad,over_noad,under_noad: do_nothing;
14980radical_noad: s:=radical_noad_size;
14981accent_noad: s:=accent_noad_size;
14982fraction_noad: begin t:=inner_noad; s:=fraction_noad_size;
14983  end;
14984left_noad,right_noad: t:=make_left_right(q,style,max_d,max_h);
14985style_node: @<Change the current style and |goto delete_q|@>;
14986whatsit_node,penalty_node,rule_node,disc_node,adjust_node,ins_node,mark_node,
14987 glue_node,kern_node:@t@>@;@/
14988  begin link(p):=q; p:=q; q:=link(q); link(p):=null; goto done;
14989  end;
14990othercases confusion("mlist3")
14991@:this can't happen mlist3}{\quad mlist3@>
14992endcases
14993
14994@ The |make_left_right| function constructs a left or right delimiter of
14995the required size and returns the value |open_noad| or |close_noad|. The
14996|right_noad| and |left_noad| will both be based on the original |style|,
14997so they will have consistent sizes.
14998
14999We use the fact that |right_noad-left_noad=close_noad-open_noad|.
15000
15001@<Declare math...@>=
15002function make_left_right(@!q:pointer;@!style:small_number;
15003  @!max_d,@!max_h:scaled):small_number;
15004var delta,@!delta1,@!delta2:scaled; {dimensions used in the calculation}
15005begin if style<script_style then cur_size:=text_size
15006else cur_size:=16*((style-text_style) div 2);
15007delta2:=max_d+axis_height(cur_size);
15008delta1:=max_h+max_d-delta2;
15009if delta2>delta1 then delta1:=delta2; {|delta1| is max distance from axis}
15010delta:=(delta1 div 500)*delimiter_factor;
15011delta2:=delta1+delta1-delimiter_shortfall;
15012if delta<delta2 then delta:=delta2;
15013new_hlist(q):=var_delimiter(delimiter(q),cur_size,delta);
15014make_left_right:=type(q)-(left_noad-open_noad); {|open_noad| or |close_noad|}
15015end;
15016
15017@ @<Change the current style and |goto delete_q|@>=
15018begin cur_style:=subtype(q); s:=style_node_size;
15019@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>;
15020goto delete_q;
15021end
15022
15023@ The inter-element spacing in math formulas depends on a $8\times8$ table that
15024\TeX\ preloads as a 64-digit string. The elements of this string have the
15025following significance:
15026$$\vbox{\halign{#\hfil\cr
15027\.0 means no space;\cr
15028\.1 means a conditional thin space (\.{\\nonscript\\mskip\\thinmuskip});\cr
15029\.2 means a thin space (\.{\\mskip\\thinmuskip});\cr
15030\.3 means a conditional medium space
15031  (\.{\\nonscript\\mskip\\medmuskip});\cr
15032\.4 means a conditional thick space
15033  (\.{\\nonscript\\mskip\\thickmuskip});\cr
15034\.* means an impossible case.\cr}}$$
15035This is all pretty cryptic, but {\sl The \TeX book\/} explains what is
15036supposed to happen, and the string makes it happen.
15037@:TeXbook}{\sl The \TeX book@>
15038
15039A global variable |magic_offset| is computed so that if |a| and |b| are
15040in the range |ord_noad..inner_noad|, then |str_pool[a*8+b+magic_offset]|
15041is the digit for spacing between noad types |a| and |b|.
15042
15043If \PASCAL\ had provided a good way to preload constant arrays, this part of
15044the program would not have been so strange.
15045@:PASCAL}{\PASCAL@>
15046
15047@d math_spacing=@;@/
15048@t\hskip-35pt@>
15049"0234000122*4000133**3**344*0400400*000000234000111*1111112341011"
15050@t$ \hskip-35pt$@>
15051
15052@<Glob...@>=
15053@!magic_offset:integer; {used to find inter-element spacing}
15054
15055@ @<Compute the magic offset@>=
15056magic_offset:=str_start[math_spacing]-9*ord_noad
15057
15058@ @<Append inter-element spacing based on |r_type| and |t|@>=
15059if r_type>0 then {not the first noad}
15060  begin case so(str_pool[r_type*8+t+magic_offset]) of
15061  "0": x:=0;
15062  "1": if cur_style<script_style then x:=thin_mu_skip_code@+else x:=0;
15063  "2": x:=thin_mu_skip_code;
15064  "3": if cur_style<script_style then x:=med_mu_skip_code@+else x:=0;
15065  "4": if cur_style<script_style then x:=thick_mu_skip_code@+else x:=0;
15066  othercases confusion("mlist4")
15067@:this can't happen mlist4}{\quad mlist4@>
15068  endcases;
15069  if x<>0 then
15070    begin y:=math_glue(glue_par(x),cur_mu);
15071    z:=new_glue(y); glue_ref_count(y):=null; link(p):=z; p:=z;@/
15072    subtype(z):=x+1; {store a symbolic subtype}
15073    end;
15074  end
15075
15076@ We insert a penalty node after the hlist entries of noad |q| if |pen|
15077is not an ``infinite'' penalty, and if the node immediately following |q|
15078is not a penalty node or a |rel_noad| or absent entirely.
15079
15080@<Append any |new_hlist| entries for |q|, and any appropriate penalties@>=
15081if new_hlist(q)<>null then
15082  begin link(p):=new_hlist(q);
15083  repeat p:=link(p);
15084  until link(p)=null;
15085  end;
15086if penalties then if link(q)<>null then if pen<inf_penalty then
15087  begin r_type:=type(link(q));
15088  if r_type<>penalty_node then if r_type<>rel_noad then
15089    begin z:=new_penalty(pen); link(p):=z; p:=z;
15090    end;
15091  end
15092
15093@* \[37] Alignment.
15094It's sort of a miracle whenever \.{\\halign} and \.{\\valign} work, because
15095they cut across so many of the control structures of \TeX.
15096
15097Therefore the
15098present page is probably not the best place for a beginner to start reading
15099this program; it is better to master everything else first.
15100
15101Let us focus our thoughts on an example of what the input might be, in order
15102to get some idea about how the alignment miracle happens. The example doesn't
15103do anything useful, but it is sufficiently general to indicate all of the
15104special cases that must be dealt with; please do not be disturbed by its
15105apparent complexity and meaninglessness.
15106$$\vbox{\halign{\.{#}\hfil\cr
15107{}\\tabskip 2pt plus 3pt\cr
15108{}\\halign to 300pt\{u1\#v1\&\cr
15109\hskip 50pt\\tabskip 1pt plus 1fil u2\#v2\&\cr
15110\hskip 50pt u3\#v3\\cr\cr
15111\hskip 25pt a1\&\\omit a2\&\\vrule\\cr\cr
15112\hskip 25pt \\noalign\{\\vskip 3pt\}\cr
15113\hskip 25pt b1\\span b2\\cr\cr
15114\hskip 25pt \\omit\&c2\\span\\omit\\cr\}\cr}}$$
15115Here's what happens:
15116
15117\yskip
15118(0) When `\.{\\halign to 300pt\{}' is scanned, the |scan_spec| routine
15119places the 300pt dimension onto the |save_stack|, and an |align_group|
15120code is placed above it. This will make it possible to complete the alignment
15121when the matching `\.\}' is found.
15122
15123(1) The preamble is scanned next. Macros in the preamble are not expanded,
15124@^preamble@>
15125except as part of a tabskip specification. For example, if \.{u2} had been
15126a macro in the preamble above, it would have been expanded, since \TeX\
15127must look for `\.{minus...}' as part of the tabskip glue. A ``preamble list''
15128is constructed based on the user's preamble; in our case it contains the
15129following seven items:
15130$$\vbox{\halign{\.{#}\hfil\qquad&(#)\hfil\cr
15131{}\\glue 2pt plus 3pt&the tabskip preceding column 1\cr
15132{}\\alignrecord, width $-\infty$&preamble info for column 1\cr
15133{}\\glue 2pt plus 3pt&the tabskip between columns 1 and 2\cr
15134{}\\alignrecord, width $-\infty$&preamble info for column 2\cr
15135{}\\glue 1pt plus 1fil&the tabskip between columns 2 and 3\cr
15136{}\\alignrecord, width $-\infty$&preamble info for column 3\cr
15137{}\\glue 1pt plus 1fil&the tabskip following column 3\cr}}$$
15138These ``alignrecord'' entries have the same size as an |unset_node|,
15139since they will later be converted into such nodes. However, at the
15140moment they have no |type| or |subtype| fields; they have |info| fields
15141instead, and these |info| fields are initially set to the value |end_span|,
15142for reasons explained below. Furthermore, the alignrecord nodes have no
15143|height| or |depth| fields; these are renamed |u_part| and |v_part|,
15144and they point to token lists for the templates of the alignment.
15145For example, the |u_part| field in the first alignrecord points to the
15146token list `\.{u1}', i.e., the template preceding the `\.\#' for column~1.
15147
15148(2) \TeX\ now looks at what follows the \.{\\cr} that ended the preamble.
15149It is not `\.{\\noalign}' or `\.{\\omit}', so this input is put back to
15150be read again, and the template `\.{u1}' is fed to the scanner. Just
15151before reading `\.{u1}', \TeX\ goes into restricted horizontal mode.
15152Just after reading `\.{u1}', \TeX\ will see `\.{a1}', and then (when the
15153{\.\&} is sensed) \TeX\ will see `\.{v1}'. Then \TeX\ scans an |endv|
15154token, indicating the end of a column. At this point an |unset_node| is
15155created, containing the contents of the current hlist (i.e., `\.{u1a1v1}').
15156The natural width of this unset node replaces the |width| field of the
15157alignrecord for column~1; in general, the alignrecords will record the
15158maximum natural width that has occurred so far in a given column.
15159
15160(3) Since `\.{\\omit}' follows the `\.\&', the templates for column~2
15161are now bypassed. Again \TeX\ goes into restricted horizontal mode and
15162makes an |unset_node| from the resulting hlist; but this time the
15163hlist contains simply `\.{a2}'. The natural width of the new unset box
15164is remembered in the |width| field of the alignrecord for column~2.
15165
15166(4) A third |unset_node| is created for column 3, using essentially the
15167mechanism that worked for column~1; this unset box contains `\.{u3\\vrule
15168v3}'. The vertical rule in this case has running dimensions that will later
15169extend to the height and depth of the whole first row, since each |unset_node|
15170in a row will eventually inherit the height and depth of its enclosing box.
15171
15172(5) The first row has now ended; it is made into a single unset box
15173comprising the following seven items:
15174$$\vbox{\halign{\hbox to 325pt{\qquad\.{#}\hfil}\cr
15175{}\\glue 2pt plus 3pt\cr
15176{}\\unsetbox for 1 column: u1a1v1\cr
15177{}\\glue 2pt plus 3pt\cr
15178{}\\unsetbox for 1 column: a2\cr
15179{}\\glue 1pt plus 1fil\cr
15180{}\\unsetbox for 1 column: u3\\vrule v3\cr
15181{}\\glue 1pt plus 1fil\cr}}$$
15182The width of this unset row is unimportant, but it has the correct height
15183and depth, so the correct baselineskip glue will be computed as the row
15184is inserted into a vertical list.
15185
15186(6) Since `\.{\\noalign}' follows the current \.{\\cr}, \TeX\ appends
15187additional material (in this case \.{\\vskip 3pt}) to the vertical list.
15188While processing this material, \TeX\ will be in internal vertical
15189mode, and |no_align_group| will be on |save_stack|.
15190
15191(7) The next row produces an unset box that looks like this:
15192$$\vbox{\halign{\hbox to 325pt{\qquad\.{#}\hfil}\cr
15193{}\\glue 2pt plus 3pt\cr
15194{}\\unsetbox for 2 columns: u1b1v1u2b2v2\cr
15195{}\\glue 1pt plus 1fil\cr
15196{}\\unsetbox for 1 column: {\rm(empty)}\cr
15197{}\\glue 1pt plus 1fil\cr}}$$
15198The natural width of the unset box that spans columns 1~and~2 is stored
15199in a ``span node,'' which we will explain later; the |info| field of the
15200alignrecord for column~1 now points to the new span node, and the |info|
15201of the span node points to |end_span|.
15202
15203(8) The final row produces the unset box
15204$$\vbox{\halign{\hbox to 325pt{\qquad\.{#}\hfil}\cr
15205{}\\glue 2pt plus 3pt\cr
15206{}\\unsetbox for 1 column: {\rm(empty)}\cr
15207{}\\glue 2pt plus 3pt\cr
15208{}\\unsetbox for 2 columns: u2c2v2\cr
15209{}\\glue 1pt plus 1fil\cr}}$$
15210A new span node is attached to the alignrecord for column 2.
15211
15212(9) The last step is to compute the true column widths and to change all the
15213unset boxes to hboxes, appending the whole works to the vertical list that
15214encloses the \.{\\halign}. The rules for deciding on the final widths of
15215each unset column box will be explained below.
15216
15217\yskip\noindent
15218Note that as \.{\\halign} is being processed, we fearlessly give up control
15219to the rest of \TeX. At critical junctures, an alignment routine is
15220called upon to step in and do some little action, but most of the time
15221these routines just lurk in the background. It's something like
15222post-hypnotic suggestion.
15223
15224@ We have mentioned that alignrecords contain no |height| or |depth| fields.
15225Their |glue_sign| and |glue_order| are pre-empted as well, since it
15226is necessary to store information about what to do when a template ends.
15227This information is called the |extra_info| field.
15228
15229@d u_part(#)==mem[#+height_offset].int {pointer to \<u_j> token list}
15230@d v_part(#)==mem[#+depth_offset].int {pointer to \<v_j> token list}
15231@d extra_info(#)==info(#+list_offset) {info to remember during template}
15232
15233@ Alignments can occur within alignments, so a small stack is used to access
15234the alignrecord information. At each level we have a |preamble| pointer,
15235indicating the beginning of the preamble list; a |cur_align| pointer,
15236indicating the current position in the preamble list; a |cur_span| pointer,
15237indicating the value of |cur_align| at the beginning of a sequence of
15238spanned columns; a |cur_loop| pointer, indicating the tabskip glue before
15239an alignrecord that should be copied next if the current list is extended;
15240and the |align_state| variable, which indicates the nesting of braces so
15241that \.{\\cr} and \.{\\span} and tab marks are properly intercepted.
15242There also are pointers |cur_head| and |cur_tail| to the head and tail
15243of a list of adjustments being moved out from horizontal mode to
15244vertical~mode.
15245
15246The current values of these seven quantities appear in global variables;
15247when they have to be pushed down, they are stored in 5-word nodes, and
15248|align_ptr| points to the topmost such node.
15249
15250@d preamble==link(align_head) {the current preamble list}
15251@d align_stack_node_size=5 {number of |mem| words to save alignment states}
15252
15253@<Glob...@>=
15254@!cur_align:pointer; {current position in preamble list}
15255@!cur_span:pointer; {start of currently spanned columns in preamble list}
15256@!cur_loop:pointer; {place to copy when extending a periodic preamble}
15257@!align_ptr:pointer; {most recently pushed-down alignment stack node}
15258@!cur_head,@!cur_tail:pointer; {adjustment list pointers}
15259
15260@ The |align_state| and |preamble| variables are initialized elsewhere.
15261
15262@<Set init...@>=
15263align_ptr:=null; cur_align:=null; cur_span:=null; cur_loop:=null;
15264cur_head:=null; cur_tail:=null;
15265
15266@ Alignment stack maintenance is handled by a pair of trivial routines
15267called |push_alignment| and |pop_alignment|.
15268
15269@p procedure push_alignment;
15270var p:pointer; {the new alignment stack node}
15271begin p:=get_node(align_stack_node_size);
15272link(p):=align_ptr; info(p):=cur_align;
15273llink(p):=preamble; rlink(p):=cur_span;
15274mem[p+2].int:=cur_loop; mem[p+3].int:=align_state;
15275info(p+4):=cur_head; link(p+4):=cur_tail;
15276align_ptr:=p;
15277cur_head:=get_avail;
15278end;
15279@#
15280procedure pop_alignment;
15281var p:pointer; {the top alignment stack node}
15282begin free_avail(cur_head);
15283p:=align_ptr;
15284cur_tail:=link(p+4); cur_head:=info(p+4);
15285align_state:=mem[p+3].int; cur_loop:=mem[p+2].int;
15286cur_span:=rlink(p); preamble:=llink(p);
15287cur_align:=info(p); align_ptr:=link(p);
15288free_node(p,align_stack_node_size);
15289end;
15290
15291@ \TeX\ has eight procedures that govern alignments: |init_align| and
15292|fin_align| are used at the very beginning and the very end; |init_row| and
15293|fin_row| are used at the beginning and end of individual rows; |init_span|
15294is used at the beginning of a sequence of spanned columns (possibly involving
15295only one column); |init_col| and |fin_col| are used at the beginning and
15296end of individual columns; and |align_peek| is used after \.{\\cr} to see
15297whether the next item is \.{\\noalign}.
15298
15299We shall consider these routines in the order they are first used during
15300the course of a complete \.{\\halign}, namely |init_align|, |align_peek|,
15301|init_row|, |init_span|, |init_col|, |fin_col|, |fin_row|, |fin_align|.
15302
15303@ When \.{\\halign} or \.{\\valign} has been scanned in an appropriate
15304mode, \TeX\ calls |init_align|, whose task is to get everything off to a
15305good start. This mostly involves scanning the preamble and putting its
15306information into the preamble list.
15307@^preamble@>
15308
15309@p @t\4@>@<Declare the procedure called |get_preamble_token|@>@t@>@/
15310procedure@?align_peek; forward;@t\2@>@/
15311procedure@?normal_paragraph; forward;@t\2@>@/
15312procedure init_align;
15313label done, done1, done2, continue;
15314var save_cs_ptr:pointer; {|warning_index| value for error messages}
15315@!p:pointer; {for short-term temporary use}
15316begin save_cs_ptr:=cur_cs; {\.{\\halign} or \.{\\valign}, usually}
15317push_alignment; align_state:=-1000000; {enter a new alignment level}
15318@<Check for improper alignment in displayed math@>;
15319push_nest; {enter a new semantic level}
15320@<Change current mode to |-vmode| for \.{\\halign}, |-hmode| for \.{\\valign}@>;
15321scan_spec(align_group,false);@/
15322@<Scan the preamble and record it in the |preamble| list@>;
15323new_save_level(align_group);
15324if every_cr<>null then begin_token_list(every_cr,every_cr_text);
15325align_peek; {look for \.{\\noalign} or \.{\\omit}}
15326end;
15327
15328@ In vertical modes, |prev_depth| already has the correct value. But
15329if we are in |mmode| (displayed formula mode), we reach out to the
15330enclosing vertical mode for the |prev_depth| value that produces the
15331correct baseline calculations.
15332
15333@<Change current mode...@>=
15334if mode=mmode then
15335  begin mode:=-vmode; prev_depth:=nest[nest_ptr-2].aux_field.sc;
15336  end
15337else if mode>0 then negate(mode)
15338
15339@ When \.{\\halign} is used as a displayed formula, there should be
15340no other pieces of mlists present.
15341
15342@<Check for improper alignment in displayed math@>=
15343if (mode=mmode)and((tail<>head)or(incompleat_noad<>null)) then
15344  begin print_err("Improper "); print_esc("halign"); print(" inside $$'s");
15345@.Improper \\halign...@>
15346  help3("Displays can use special alignments (like \eqalignno)")@/
15347  ("only if nothing but the alignment itself is between $$'s.")@/
15348  ("So I've deleted the formulas that preceded this alignment.");
15349  error; flush_math;
15350  end
15351
15352@ @<Scan the preamble and record it in the |preamble| list@>=
15353preamble:=null; cur_align:=align_head; cur_loop:=null; scanner_status:=aligning;
15354warning_index:=save_cs_ptr; align_state:=-1000000;
15355  {at this point, |cur_cmd=left_brace|}
15356loop@+  begin @<Append the current tabskip glue to the preamble list@>;
15357  if cur_cmd=car_ret then goto done; {\.{\\cr} ends the preamble}
15358  @<Scan preamble text until |cur_cmd| is |tab_mark| or |car_ret|,
15359    looking for changes in the tabskip glue; append an
15360    alignrecord to the preamble list@>;
15361  end;
15362done: scanner_status:=normal
15363
15364@ @<Append the current tabskip glue to the preamble list@>=
15365link(cur_align):=new_param_glue(tab_skip_code);
15366cur_align:=link(cur_align)
15367
15368@ @<Scan preamble text until |cur_cmd| is |tab_mark| or |car_ret|...@>=
15369@<Scan the template \<u_j>, putting the resulting token list in |hold_head|@>;
15370link(cur_align):=new_null_box; cur_align:=link(cur_align); {a new alignrecord}
15371info(cur_align):=end_span; width(cur_align):=null_flag;
15372u_part(cur_align):=link(hold_head);
15373@<Scan the template \<v_j>, putting the resulting token list in |hold_head|@>;
15374v_part(cur_align):=link(hold_head)
15375
15376@ We enter `\.{\\span}' into |eqtb| with |tab_mark| as its command code,
15377and with |span_code| as the command modifier. This makes \TeX\ interpret it
15378essentially the same as an alignment delimiter like `\.\&', yet it is
15379recognizably different when we need to distinguish it from a normal delimiter.
15380It also turns out to be useful to give a special |cr_code| to `\.{\\cr}',
15381and an even larger |cr_cr_code| to `\.{\\crcr}'.
15382
15383The end of a template is represented by two ``frozen'' control sequences
15384called \.{\\endtemplate}. The first has the command code |end_template|, which
15385is |>outer_call|, so it will not easily disappear in the presence of errors.
15386The |get_x_token| routine converts the first into the second, which has |endv|
15387as its command code.
15388
15389@d span_code=256 {distinct from any character}
15390@d cr_code=257 {distinct from |span_code| and from any character}
15391@d cr_cr_code=cr_code+1 {this distinguishes \.{\\crcr} from \.{\\cr}}
15392@d end_template_token==cs_token_flag+frozen_end_template
15393
15394@<Put each of \TeX's primitives into the hash table@>=
15395primitive("span",tab_mark,span_code);@/
15396@!@:span_}{\.{\\span} primitive@>
15397primitive("cr",car_ret,cr_code);
15398@!@:cr_}{\.{\\cr} primitive@>
15399text(frozen_cr):="cr"; eqtb[frozen_cr]:=eqtb[cur_val];@/
15400primitive("crcr",car_ret,cr_cr_code);
15401@!@:cr_cr_}{\.{\\crcr} primitive@>
15402text(frozen_end_template):="endtemplate"; text(frozen_endv):="endtemplate";
15403eq_type(frozen_endv):=endv; equiv(frozen_endv):=null_list;
15404eq_level(frozen_endv):=level_one;@/
15405eqtb[frozen_end_template]:=eqtb[frozen_endv];
15406eq_type(frozen_end_template):=end_template;
15407
15408@ @<Cases of |print_cmd_chr|...@>=
15409tab_mark: if chr_code=span_code then print_esc("span")
15410  else chr_cmd("alignment tab character ");
15411car_ret: if chr_code=cr_code then print_esc("cr")
15412  else print_esc("crcr");
15413
15414@ The preamble is copied directly, except that \.{\\tabskip} causes a change
15415to the tabskip glue, thereby possibly expanding macros that immediately
15416follow it. An appearance of \.{\\span} also causes such an expansion.
15417
15418Note that if the preamble contains `\.{\\global\\tabskip}', the `\.{\\global}'
15419token survives in the preamble and the `\.{\\tabskip}' defines new
15420tabskip glue (locally).
15421
15422@<Declare the procedure called |get_preamble_token|@>=
15423procedure get_preamble_token;
15424label restart;
15425begin restart: get_token;
15426while (cur_chr=span_code)and(cur_cmd=tab_mark) do
15427  begin get_token; {this token will be expanded once}
15428  if cur_cmd>max_command then
15429    begin expand; get_token;
15430    end;
15431  end;
15432if cur_cmd=endv then
15433  fatal_error("(interwoven alignment preambles are not allowed)");
15434@.interwoven alignment preambles...@>
15435if (cur_cmd=assign_glue)and(cur_chr=glue_base+tab_skip_code) then
15436  begin scan_optional_equals; scan_glue(glue_val);
15437  if global_defs>0 then geq_define(glue_base+tab_skip_code,glue_ref,cur_val)
15438  else eq_define(glue_base+tab_skip_code,glue_ref,cur_val);
15439  goto restart;
15440  end;
15441end;
15442
15443@ Spaces are eliminated from the beginning of a template.
15444
15445@<Scan the template \<u_j>...@>=
15446p:=hold_head; link(p):=null;
15447loop@+  begin get_preamble_token;
15448  if cur_cmd=mac_param then goto done1;
15449  if (cur_cmd<=car_ret)and(cur_cmd>=tab_mark)and(align_state=-1000000) then
15450   if (p=hold_head)and(cur_loop=null)and(cur_cmd=tab_mark)
15451    then cur_loop:=cur_align
15452   else  begin print_err("Missing # inserted in alignment preamble");
15453@.Missing \# inserted...@>
15454    help3("There should be exactly one # between &'s, when an")@/
15455    ("\halign or \valign is being set up. In this case you had")@/
15456    ("none, so I've put one in; maybe that will work.");
15457    back_error; goto done1;
15458    end
15459  else if (cur_cmd<>spacer)or(p<>hold_head) then
15460    begin link(p):=get_avail; p:=link(p); info(p):=cur_tok;
15461    end;
15462  end;
15463done1:
15464
15465@ @<Scan the template \<v_j>...@>=
15466p:=hold_head; link(p):=null;
15467loop@+  begin continue: get_preamble_token;
15468  if (cur_cmd<=car_ret)and(cur_cmd>=tab_mark)and(align_state=-1000000) then
15469    goto done2;
15470  if cur_cmd=mac_param then
15471    begin print_err("Only one # is allowed per tab");
15472@.Only one \# is allowed...@>
15473    help3("There should be exactly one # between &'s, when an")@/
15474    ("\halign or \valign is being set up. In this case you had")@/
15475    ("more than one, so I'm ignoring all but the first.");
15476    error; goto continue;
15477    end;
15478  link(p):=get_avail; p:=link(p); info(p):=cur_tok;
15479  end;
15480done2: link(p):=get_avail; p:=link(p);
15481info(p):=end_template_token {put \.{\\endtemplate} at the end}
15482
15483@ The tricky part about alignments is getting the templates into the
15484scanner at the right time, and recovering control when a row or column
15485is finished.
15486
15487We usually begin a row after each \.{\\cr} has been sensed, unless that
15488\.{\\cr} is followed by \.{\\noalign} or by the right brace that terminates
15489the alignment. The |align_peek| routine is used to look ahead and do
15490the right thing; it either gets a new row started, or gets a \.{\\noalign}
15491started, or finishes off the alignment.
15492
15493@<Declare the procedure called |align_peek|@>=
15494procedure align_peek;
15495label restart;
15496begin restart: align_state:=1000000; @<Get the next non-blank non-call token@>;
15497if cur_cmd=no_align then
15498  begin scan_left_brace; new_save_level(no_align_group);
15499  if mode=-vmode then normal_paragraph;
15500  end
15501else if cur_cmd=right_brace then fin_align
15502else if (cur_cmd=car_ret)and(cur_chr=cr_cr_code) then
15503  goto restart {ignore \.{\\crcr}}
15504else  begin init_row; {start a new row}
15505  init_col; {start a new column and replace what we peeked at}
15506  end;
15507end;
15508
15509@ To start a row (i.e., a `row' that rhymes with `dough' but not with `bough'),
15510we enter a new semantic level, copy the first tabskip glue, and change
15511from internal vertical mode to restricted horizontal mode or vice versa.
15512The |space_factor| and |prev_depth| are not used on this semantic level,
15513but we clear them to zero just to be tidy.
15514
15515@p @t\4@>@<Declare the procedure called |init_span|@>@t@>@/
15516procedure init_row;
15517begin push_nest; mode:=(-hmode-vmode)-mode;
15518if mode=-hmode then space_factor:=0 @+else prev_depth:=0;
15519tail_append(new_glue(glue_ptr(preamble)));
15520subtype(tail):=tab_skip_code+1;@/
15521cur_align:=link(preamble); cur_tail:=cur_head; init_span(cur_align);
15522end;
15523
15524@ The parameter to |init_span| is a pointer to the alignrecord where the
15525next column or group of columns will begin. A new semantic level is
15526entered, so that the columns will generate a list for subsequent packaging.
15527
15528@<Declare the procedure called |init_span|@>=
15529procedure init_span(@!p:pointer);
15530begin push_nest;
15531if mode=-hmode then space_factor:=1000
15532else  begin prev_depth:=ignore_depth; normal_paragraph;
15533  end;
15534cur_span:=p;
15535end;
15536
15537@ When a column begins, we assume that |cur_cmd| is either |omit| or else
15538the current token should be put back into the input until the \<u_j>
15539template has been scanned.  (Note that |cur_cmd| might be |tab_mark| or
15540|car_ret|.)  We also assume that |align_state| is approximately 1000000 at
15541this time.  We remain in the same mode, and start the template if it is
15542called for.
15543
15544@p procedure init_col;
15545begin extra_info(cur_align):=cur_cmd;
15546if cur_cmd=omit then align_state:=0
15547else  begin back_input; begin_token_list(u_part(cur_align),u_template);
15548  end; {now |align_state=1000000|}
15549end;
15550
15551@ The scanner sets |align_state| to zero when the \<u_j> template ends. When
15552a subsequent \.{\\cr} or \.{\\span} or tab mark occurs with |align_state=0|,
15553the scanner activates the following code, which fires up the \<v_j> template.
15554We need to remember the |cur_chr|, which is either |cr_cr_code|, |cr_code|,
15555|span_code|, or a character code, depending on how the column text has ended.
15556
15557This part of the program had better not be activated when the preamble
15558to another alignment is being scanned, or when no alignment preamble is active.
15559
15560@<Insert the \(v)\<v_j>...@>=
15561begin if (scanner_status=aligning) or (cur_align=null) then
15562  fatal_error("(interwoven alignment preambles are not allowed)");
15563@.interwoven alignment preambles...@>
15564cur_cmd:=extra_info(cur_align); extra_info(cur_align):=cur_chr;
15565if cur_cmd=omit then begin_token_list(omit_template,v_template)
15566else begin_token_list(v_part(cur_align),v_template);
15567align_state:=1000000; goto restart;
15568end
15569
15570@ The token list |omit_template| just referred to is a constant token
15571list that contains the special control sequence \.{\\endtemplate} only.
15572
15573@<Initialize the special...@>=
15574info(omit_template):=end_template_token; {|link(omit_template)=null|}
15575
15576@ When the |endv| command at the end of a \<v_j> template comes through the
15577scanner, things really start to happen; and it is the |fin_col| routine
15578that makes them happen. This routine returns |true| if a row as well as a
15579column has been finished.
15580
15581@p function fin_col:boolean;
15582label exit;
15583var p:pointer; {the alignrecord after the current one}
15584@!q,@!r:pointer; {temporary pointers for list manipulation}
15585@!s:pointer; {a new span node}
15586@!u:pointer; {a new unset box}
15587@!w:scaled; {natural width}
15588@!o:glue_ord; {order of infinity}
15589@!n:halfword; {span counter}
15590begin if cur_align=null then confusion("endv");
15591q:=link(cur_align);@+if q=null then confusion("endv");
15592@:this can't happen endv}{\quad endv@>
15593if align_state<500000 then
15594  fatal_error("(interwoven alignment preambles are not allowed)");
15595@.interwoven alignment preambles...@>
15596p:=link(q);
15597@<If the preamble list has been traversed, check that the row has ended@>;
15598if extra_info(cur_align)<>span_code then
15599  begin unsave; new_save_level(align_group);@/
15600  @<Package an unset box for the current column and record its width@>;
15601  @<Copy the tabskip glue between columns@>;
15602  if extra_info(cur_align)>=cr_code then
15603    begin fin_col:=true; return;
15604    end;
15605  init_span(p);
15606  end;
15607align_state:=1000000; @<Get the next non-blank non-call token@>;
15608cur_align:=p;
15609init_col; fin_col:=false;
15610exit: end;
15611
15612@ @<If the preamble list has been traversed, check that the row has ended@>=
15613if (p=null)and(extra_info(cur_align)<cr_code) then
15614 if cur_loop<>null then @<Lengthen the preamble periodically@>
15615 else  begin print_err("Extra alignment tab has been changed to ");
15616@.Extra alignment tab...@>
15617  print_esc("cr");
15618  help3("You have given more \span or & marks than there were")@/
15619  ("in the preamble to the \halign or \valign now in progress.")@/
15620  ("So I'll assume that you meant to type \cr instead.");
15621  extra_info(cur_align):=cr_code; error;
15622  end
15623
15624@ @<Lengthen the preamble...@>=
15625begin link(q):=new_null_box; p:=link(q); {a new alignrecord}
15626info(p):=end_span; width(p):=null_flag; cur_loop:=link(cur_loop);
15627@<Copy the templates from node |cur_loop| into node |p|@>;
15628cur_loop:=link(cur_loop);
15629link(p):=new_glue(glue_ptr(cur_loop));
15630end
15631
15632@ @<Copy the templates from node |cur_loop| into node |p|@>=
15633q:=hold_head; r:=u_part(cur_loop);
15634while r<>null do
15635  begin link(q):=get_avail; q:=link(q); info(q):=info(r); r:=link(r);
15636  end;
15637link(q):=null; u_part(p):=link(hold_head);
15638q:=hold_head; r:=v_part(cur_loop);
15639while r<>null do
15640  begin link(q):=get_avail; q:=link(q); info(q):=info(r); r:=link(r);
15641  end;
15642link(q):=null; v_part(p):=link(hold_head)
15643
15644@ @<Copy the tabskip glue...@>=
15645tail_append(new_glue(glue_ptr(link(cur_align))));
15646subtype(tail):=tab_skip_code+1
15647
15648@ @<Package an unset...@>=
15649begin if mode=-hmode then
15650  begin adjust_tail:=cur_tail; u:=hpack(link(head),natural); w:=width(u);
15651  cur_tail:=adjust_tail; adjust_tail:=null;
15652  end
15653else  begin u:=vpackage(link(head),natural,0); w:=height(u);
15654  end;
15655n:=min_quarterword; {this represents a span count of 1}
15656if cur_span<>cur_align then @<Update width entry for spanned columns@>
15657else if w>width(cur_align) then width(cur_align):=w;
15658type(u):=unset_node; span_count(u):=n;@/
15659@<Determine the stretch order@>;
15660glue_order(u):=o; glue_stretch(u):=total_stretch[o];@/
15661@<Determine the shrink order@>;
15662glue_sign(u):=o; glue_shrink(u):=total_shrink[o];@/
15663pop_nest; link(tail):=u; tail:=u;
15664end
15665
15666@ A span node is a 2-word record containing |width|, |info|, and |link|
15667fields. The |link| field is not really a link, it indicates the number of
15668spanned columns; the |info| field points to a span node for the same
15669starting column, having a greater extent of spanning, or to |end_span|,
15670which has the largest possible |link| field; the |width| field holds the
15671largest natural width corresponding to a particular set of spanned columns.
15672
15673A list of the maximum widths so far, for spanned columns starting at a
15674given column, begins with the |info| field of the alignrecord for that
15675column.
15676
15677@d span_node_size=2 {number of |mem| words for a span node}
15678
15679@<Initialize the special list heads...@>=
15680link(end_span):=max_quarterword+1; info(end_span):=null;
15681
15682@ @<Update width entry for spanned columns@>=
15683begin q:=cur_span;
15684repeat incr(n); q:=link(link(q));
15685until q=cur_align;
15686if n>max_quarterword then confusion("256 spans"); {this can happen, but won't}
15687@^system dependencies@>
15688@:this can't happen 256 spans}{\quad 256 spans@>
15689q:=cur_span; while link(info(q))<n do q:=info(q);
15690if link(info(q))>n then
15691  begin s:=get_node(span_node_size); info(s):=info(q); link(s):=n;
15692  info(q):=s; width(s):=w;
15693  end
15694else if width(info(q))<w then width(info(q)):=w;
15695end
15696
15697@ At the end of a row, we append an unset box to the current vlist (for
15698\.{\\halign}) or the current hlist (for \.{\\valign}). This unset box
15699contains the unset boxes for the columns, separated by the tabskip glue.
15700Everything will be set later.
15701
15702@p procedure fin_row;
15703var p:pointer; {the new unset box}
15704begin if mode=-hmode then
15705  begin p:=hpack(link(head),natural);
15706  pop_nest; append_to_vlist(p);
15707  if cur_head<>cur_tail then
15708    begin link(tail):=link(cur_head); tail:=cur_tail;
15709    end;
15710  end
15711else  begin p:=vpack(link(head),natural); pop_nest;
15712  link(tail):=p; tail:=p; space_factor:=1000;
15713  end;
15714type(p):=unset_node; glue_stretch(p):=0;
15715if every_cr<>null then begin_token_list(every_cr,every_cr_text);
15716align_peek;
15717end; {note that |glue_shrink(p)=0| since |glue_shrink==shift_amount|}
15718
15719@ Finally, we will reach the end of the alignment, and we can breathe a
15720sigh of relief that memory hasn't overflowed. All the unset boxes will now be
15721set so that the columns line up, taking due account of spanned columns.
15722
15723@p procedure@?do_assignments; forward;@t\2@>@/
15724procedure@?resume_after_display; forward;@t\2@>@/
15725procedure@?build_page; forward;@t\2@>@/
15726procedure fin_align;
15727var @!p,@!q,@!r,@!s,@!u,@!v: pointer; {registers for the list operations}
15728@!t,@!w:scaled; {width of column}
15729@!o:scaled; {shift offset for unset boxes}
15730@!n:halfword; {matching span amount}
15731@!rule_save:scaled; {temporary storage for |overfull_rule|}
15732@!aux_save:memory_word; {temporary storage for |aux|}
15733begin if cur_group<>align_group then confusion("align1");
15734@:this can't happen align}{\quad align@>
15735unsave; {that |align_group| was for individual entries}
15736if cur_group<>align_group then confusion("align0");
15737unsave; {that |align_group| was for the whole alignment}
15738if nest[nest_ptr-1].mode_field=mmode then o:=display_indent
15739  else o:=0;
15740@<Go through the preamble list, determining the column widths and
15741  changing the alignrecords to dummy unset boxes@>;
15742@<Package the preamble list, to determine the actual tabskip glue amounts,
15743  and let |p| point to this prototype box@>;
15744@<Set the glue in all the unset boxes of the current list@>;
15745flush_node_list(p); pop_alignment;
15746@<Insert the \(c)current list into its environment@>;
15747end;@/
15748@t\4@>@<Declare the procedure called |align_peek|@>
15749
15750@ It's time now to dismantle the preamble list and to compute the column
15751widths. Let $w_{ij}$ be the maximum of the natural widths of all entries
15752that span columns $i$ through $j$, inclusive. The alignrecord for column~$i$
15753contains $w_{ii}$ in its |width| field, and there is also a linked list of
15754the nonzero $w_{ij}$ for increasing $j$, accessible via the |info| field;
15755these span nodes contain the value $j-i+|min_quarterword|$ in their
15756|link| fields. The values of $w_{ii}$ were initialized to |null_flag|, which
15757we regard as $-\infty$.
15758
15759The final column widths are defined by the formula
15760$$w_j=\max_{1\L i\L j}\biggl( w_{ij}-\sum_{i\L k<j}(t_k+w_k)\biggr),$$
15761where $t_k$ is the natural width of the tabskip glue between columns
15762$k$ and~$k+1$. However, if $w_{ij}=-\infty$ for all |i| in the range
15763|1<=i<=j| (i.e., if every entry that involved column~|j| also involved
15764column~|j+1|), we let $w_j=0$, and we zero out the tabskip glue after
15765column~|j|.
15766
15767\TeX\ computes these values by using the following scheme: First $w_1=w_{11}$.
15768Then replace $w_{2j}$ by $\max(w_{2j},w_{1j}-t_1-w_1)$, for all $j>1$.
15769Then $w_2=w_{22}$. Then replace $w_{3j}$ by $\max(w_{3j},w_{2j}-t_2-w_2)$
15770for all $j>2$; and so on. If any $w_j$ turns out to be $-\infty$, its
15771value is changed to zero and so is the next tabskip.
15772
15773@<Go through the preamble list,...@>=
15774q:=link(preamble);
15775repeat flush_list(u_part(q)); flush_list(v_part(q));
15776p:=link(link(q));
15777if width(q)=null_flag then
15778  @<Nullify |width(q)| and the tabskip glue following this column@>;
15779if info(q)<>end_span then
15780  @<Merge the widths in the span nodes of |q| with those of |p|,
15781    destroying the span nodes of |q|@>;
15782type(q):=unset_node; span_count(q):=min_quarterword; height(q):=0;
15783depth(q):=0; glue_order(q):=normal; glue_sign(q):=normal;
15784glue_stretch(q):=0; glue_shrink(q):=0; q:=p;
15785until q=null
15786
15787@ @<Nullify |width(q)| and the tabskip glue following this column@>=
15788begin width(q):=0; r:=link(q); s:=glue_ptr(r);
15789if s<>zero_glue then
15790  begin add_glue_ref(zero_glue); delete_glue_ref(s);
15791  glue_ptr(r):=zero_glue;
15792  end;
15793end
15794
15795@ Merging of two span-node lists is a typical exercise in the manipulation of
15796linearly linked data structures. The essential invariant in the following
15797|repeat| loop is that we want to dispense with node |r|, in |q|'s list,
15798and |u| is its successor; all nodes of |p|'s list up to and including |s|
15799have been processed, and the successor of |s| matches |r| or precedes |r|
15800or follows |r|, according as |link(r)=n| or |link(r)>n| or |link(r)<n|.
15801
15802@<Merge the widths...@>=
15803begin t:=width(q)+width(glue_ptr(link(q)));
15804r:=info(q); s:=end_span; info(s):=p; n:=min_quarterword+1;
15805repeat width(r):=width(r)-t; u:=info(r);
15806while link(r)>n do
15807  begin s:=info(s); n:=link(info(s))+1;
15808  end;
15809if link(r)<n then
15810  begin info(r):=info(s); info(s):=r; decr(link(r)); s:=r;
15811  end
15812else  begin if width(r)>width(info(s)) then width(info(s)):=width(r);
15813  free_node(r,span_node_size);
15814  end;
15815r:=u;
15816until r=end_span;
15817end
15818
15819@ Now the preamble list has been converted to a list of alternating unset
15820boxes and tabskip glue, where the box widths are equal to the final
15821column sizes. In case of \.{\\valign}, we change the widths to heights,
15822so that a correct error message will be produced if the alignment is
15823overfull or underfull.
15824
15825@<Package the preamble list...@>=
15826save_ptr:=save_ptr-2; pack_begin_line:=-mode_line;
15827if mode=-vmode then
15828  begin rule_save:=overfull_rule;
15829  overfull_rule:=0; {prevent rule from being packaged}
15830  p:=hpack(preamble,saved(1),saved(0)); overfull_rule:=rule_save;
15831  end
15832else  begin q:=link(preamble);
15833  repeat height(q):=width(q); width(q):=0; q:=link(link(q));
15834  until q=null;
15835  p:=vpack(preamble,saved(1),saved(0));
15836  q:=link(preamble);
15837  repeat width(q):=height(q); height(q):=0; q:=link(link(q));
15838  until q=null;
15839  end;
15840pack_begin_line:=0
15841
15842@ @<Set the glue in all the unset...@>=
15843q:=link(head); s:=head;
15844while q<>null do
15845  begin if not is_char_node(q) then
15846    if type(q)=unset_node then
15847      @<Set the unset box |q| and the unset boxes in it@>
15848    else if type(q)=rule_node then
15849      @<Make the running dimensions in rule |q| extend to the
15850        boundaries of the alignment@>;
15851  s:=q; q:=link(q);
15852  end
15853
15854@ @<Make the running dimensions in rule |q| extend...@>=
15855begin if is_running(width(q)) then width(q):=width(p);
15856if is_running(height(q)) then height(q):=height(p);
15857if is_running(depth(q)) then depth(q):=depth(p);
15858if o<>0 then
15859  begin r:=link(q); link(q):=null; q:=hpack(q,natural);
15860  shift_amount(q):=o; link(q):=r; link(s):=q;
15861  end;
15862end
15863
15864@ The unset box |q| represents a row that contains one or more unset boxes,
15865depending on how soon \.{\\cr} occurred in that row.
15866
15867@<Set the unset box |q| and the unset boxes in it@>=
15868begin if mode=-vmode then
15869  begin type(q):=hlist_node; width(q):=width(p);
15870  end
15871else  begin type(q):=vlist_node; height(q):=height(p);
15872  end;
15873glue_order(q):=glue_order(p); glue_sign(q):=glue_sign(p);
15874glue_set(q):=glue_set(p); shift_amount(q):=o;
15875r:=link(list_ptr(q)); s:=link(list_ptr(p));
15876repeat @<Set the glue in node |r| and change it from an unset node@>;
15877r:=link(link(r)); s:=link(link(s));
15878until r=null;
15879end
15880
15881@ A box made from spanned columns will be followed by tabskip glue nodes and
15882by empty boxes as if there were no spanning. This permits perfect alignment
15883of subsequent entries, and it prevents values that depend on floating point
15884arithmetic from entering into the dimensions of any boxes.
15885
15886@<Set the glue in node |r|...@>=
15887n:=span_count(r); t:=width(s); w:=t; u:=hold_head;
15888while n>min_quarterword do
15889  begin decr(n);
15890  @<Append tabskip glue and an empty box to list |u|,
15891    and update |s| and |t| as the prototype nodes are passed@>;
15892  end;
15893if mode=-vmode then
15894  @<Make the unset node |r| into an |hlist_node| of width |w|,
15895    setting the glue as if the width were |t|@>
15896else @<Make the unset node |r| into a |vlist_node| of height |w|,
15897    setting the glue as if the height were |t|@>;
15898shift_amount(r):=0;
15899if u<>hold_head then {append blank boxes to account for spanned nodes}
15900  begin link(u):=link(r); link(r):=link(hold_head); r:=u;
15901  end
15902
15903@ @<Append tabskip glue and an empty box to list |u|...@>=
15904s:=link(s); v:=glue_ptr(s); link(u):=new_glue(v); u:=link(u);
15905subtype(u):=tab_skip_code+1; t:=t+width(v);
15906if glue_sign(p)=stretching then
15907  begin if stretch_order(v)=glue_order(p) then
15908    t:=t+round(float(glue_set(p))*stretch(v));
15909@^real multiplication@>
15910  end
15911else if glue_sign(p)=shrinking then
15912  begin if shrink_order(v)=glue_order(p) then
15913    t:=t-round(float(glue_set(p))*shrink(v));
15914  end;
15915s:=link(s); link(u):=new_null_box; u:=link(u); t:=t+width(s);
15916if mode=-vmode then width(u):=width(s)@+else
15917  begin type(u):=vlist_node; height(u):=width(s);
15918  end
15919
15920@ @<Make the unset node |r| into an |hlist_node| of width |w|...@>=
15921begin height(r):=height(q); depth(r):=depth(q);
15922if t=width(r) then
15923  begin glue_sign(r):=normal; glue_order(r):=normal;
15924  set_glue_ratio_zero(glue_set(r));
15925  end
15926else if t>width(r) then
15927  begin glue_sign(r):=stretching;
15928  if glue_stretch(r)=0 then set_glue_ratio_zero(glue_set(r))
15929  else glue_set(r):=unfloat((t-width(r))/glue_stretch(r));
15930@^real division@>
15931  end
15932else  begin glue_order(r):=glue_sign(r); glue_sign(r):=shrinking;
15933  if glue_shrink(r)=0 then set_glue_ratio_zero(glue_set(r))
15934  else if (glue_order(r)=normal)and(width(r)-t>glue_shrink(r)) then
15935    set_glue_ratio_one(glue_set(r))
15936  else glue_set(r):=unfloat((width(r)-t)/glue_shrink(r));
15937  end;
15938width(r):=w; type(r):=hlist_node;
15939end
15940
15941@ @<Make the unset node |r| into a |vlist_node| of height |w|...@>=
15942begin width(r):=width(q);
15943if t=height(r) then
15944  begin glue_sign(r):=normal; glue_order(r):=normal;
15945  set_glue_ratio_zero(glue_set(r));
15946  end
15947else if t>height(r) then
15948  begin glue_sign(r):=stretching;
15949  if glue_stretch(r)=0 then set_glue_ratio_zero(glue_set(r))
15950  else glue_set(r):=unfloat((t-height(r))/glue_stretch(r));
15951@^real division@>
15952  end
15953else  begin glue_order(r):=glue_sign(r); glue_sign(r):=shrinking;
15954  if glue_shrink(r)=0 then set_glue_ratio_zero(glue_set(r))
15955  else if (glue_order(r)=normal)and(height(r)-t>glue_shrink(r)) then
15956    set_glue_ratio_one(glue_set(r))
15957  else glue_set(r):=unfloat((height(r)-t)/glue_shrink(r));
15958  end;
15959height(r):=w; type(r):=vlist_node;
15960end
15961
15962@ We now have a completed alignment, in the list that starts at |head|
15963and ends at |tail|. This list will be merged with the one that encloses
15964it. (In case the enclosing mode is |mmode|, for displayed formulas,
15965we will need to insert glue before and after the display; that part of the
15966program will be deferred until we're more familiar with such operations.)
15967
15968In restricted horizontal mode, the |clang| part of |aux| is undefined;
15969an over-cautious \PASCAL\ runtime system may complain about this.
15970@^dirty \PASCAL@>
15971
15972@<Insert the \(c)current list into its environment@>=
15973aux_save:=aux; p:=link(head); q:=tail; pop_nest;
15974if mode=mmode then @<Finish an alignment in a display@>
15975else  begin aux:=aux_save; link(tail):=p;
15976  if p<>null then tail:=q;
15977  if mode=vmode then build_page;
15978  end
15979
15980@* \[38] Breaking paragraphs into lines.
15981We come now to what is probably the most interesting algorithm of \TeX:
15982the mechanism for choosing the ``best possible'' breakpoints that yield
15983the individual lines of a paragraph. \TeX's line-breaking algorithm takes
15984a given horizontal list and converts it to a sequence of boxes that are
15985appended to the current vertical list. In the course of doing this, it
15986creates a special data structure containing three kinds of records that are
15987not used elsewhere in \TeX. Such nodes are created while a paragraph is
15988being processed, and they are destroyed afterwards; thus, the other parts
15989of \TeX\ do not need to know anything about how line-breaking is done.
15990
15991The method used here is based on an approach devised by Michael F. Plass and
15992@^Plass, Michael Frederick@>
15993@^Knuth, Donald Ervin@>
15994the author in 1977, subsequently generalized and improved by the same two
15995people in 1980. A detailed discussion appears in {\sl SOFTWARE---Practice
15996\AM\ Experience \bf11} (1981), 1119--1184, where it is shown that the
15997line-breaking problem can be regarded as a special case of the problem of
15998computing the shortest path in an acyclic network. The cited paper includes
15999numerous examples and describes the history of line breaking as it has been
16000practiced by printers through the ages. The present implementation adds two
16001new ideas to the algorithm of 1980: Memory space requirements are considerably
16002reduced by using smaller records for inactive nodes than for active ones,
16003and arithmetic overflow is avoided by using ``delta distances'' instead of
16004keeping track of the total distance from the beginning of the paragraph to the
16005current point.
16006
16007@ The |line_break| procedure should be invoked only in horizontal mode; it
16008leaves that mode and places its output into the current vlist of the
16009enclosing vertical mode (or internal vertical mode).
16010There is one explicit parameter:  |final_widow_penalty| is the amount of
16011additional penalty to be inserted before the final line of the paragraph.
16012
16013There are also a number of implicit parameters: The hlist to be broken
16014starts at |link(head)|, and it is nonempty. The value of |prev_graf| in the
16015enclosing semantic level tells where the paragraph should begin in the
16016sequence of line numbers, in case hanging indentation or \.{\\parshape}
16017is in use; |prev_graf| is zero unless this paragraph is being continued
16018after a displayed formula.  Other implicit parameters, such as the
16019|par_shape_ptr| and various penalties to use for hyphenation, etc., appear
16020in |eqtb|.
16021
16022After |line_break| has acted, it will have updated the current vlist and the
16023value of |prev_graf|. Furthermore, the global variable |just_box| will
16024point to the final box created by |line_break|, so that the width of this
16025line can be ascertained when it is necessary to decide whether to use
16026|above_display_skip| or |above_display_short_skip| before a displayed formula.
16027
16028@<Glob...@>=
16029@!just_box:pointer; {the |hlist_node| for the last line of the new paragraph}
16030
16031@ Since |line_break| is a rather lengthy procedure---sort of a small world unto
16032itself---we must build it up little by little, somewhat more cautiously
16033than we have done with the simpler procedures of \TeX. Here is the
16034general outline.
16035
16036@p@t\4@>@<Declare subprocedures for |line_break|@>
16037procedure line_break(@!final_widow_penalty:integer);
16038label done,done1,done2,done3,done4,done5,continue;
16039var @<Local variables for line breaking@>@;
16040begin pack_begin_line:=mode_line; {this is for over/underfull box messages}
16041@<Get ready to start line breaking@>;
16042@<Find optimal breakpoints@>;
16043@<Break the paragraph at the chosen breakpoints, justify the resulting lines
16044to the correct widths, and append them to the current vertical list@>;
16045@<Clean up the memory by removing the break nodes@>;
16046pack_begin_line:=0;
16047end;
16048
16049@ The first task is to move the list from |head| to |temp_head| and go
16050into the enclosing semantic level. We also append the \.{\\parfillskip}
16051glue to the end of the paragraph, removing a space (or other glue node) if
16052it was there, since spaces usually precede blank lines and instances of
16053`\.{\$\$}'. The |par_fill_skip| is preceded by an infinite penalty, so
16054it will never be considered as a potential breakpoint.
16055
16056This code assumes that a |glue_node| and a |penalty_node| occupy the
16057same number of |mem|~words.
16058@^data structure assumptions@>
16059
16060@<Get ready to start...@>=
16061link(temp_head):=link(head);
16062if is_char_node(tail) then tail_append(new_penalty(inf_penalty))
16063else if type(tail)<>glue_node then tail_append(new_penalty(inf_penalty))
16064else  begin type(tail):=penalty_node; delete_glue_ref(glue_ptr(tail));
16065  flush_node_list(leader_ptr(tail)); penalty(tail):=inf_penalty;
16066  end;
16067link(tail):=new_param_glue(par_fill_skip_code);
16068init_cur_lang:=prev_graf mod @'200000;
16069init_l_hyf:=prev_graf div @'20000000;
16070init_r_hyf:=(prev_graf div @'200000) mod @'100;
16071pop_nest;
16072
16073@ When looking for optimal line breaks, \TeX\ creates a ``break node'' for
16074each break that is {\sl feasible}, in the sense that there is a way to end
16075a line at the given place without requiring any line to stretch more than
16076a given tolerance. A break node is characterized by three things: the position
16077of the break (which is a pointer to a |glue_node|, |math_node|, |penalty_node|,
16078or |disc_node|); the ordinal number of the line that will follow this
16079breakpoint; and the fitness classification of the line that has just
16080ended, i.e., |tight_fit|, |decent_fit|, |loose_fit|, or |very_loose_fit|.
16081
16082@d tight_fit=3 {fitness classification for lines shrinking 0.5 to 1.0 of their
16083  shrinkability}
16084@d loose_fit=1 {fitness classification for lines stretching 0.5 to 1.0 of their
16085  stretchability}
16086@d very_loose_fit=0 {fitness classification for lines stretching more than
16087  their stretchability}
16088@d decent_fit=2 {fitness classification for all other lines}
16089
16090@ The algorithm essentially determines the best possible way to achieve
16091each feasible combination of position, line, and fitness. Thus, it answers
16092questions like, ``What is the best way to break the opening part of the
16093paragraph so that the fourth line is a tight line ending at such-and-such
16094a place?'' However, the fact that all lines are to be the same length
16095after a certain point makes it possible to regard all sufficiently large
16096line numbers as equivalent, when the looseness parameter is zero, and this
16097makes it possible for the algorithm to save space and time.
16098
16099An ``active node'' and a ``passive node'' are created in |mem| for each
16100feasible breakpoint that needs to be considered. Active nodes are three
16101words long and passive nodes are two words long. We need active nodes only
16102for breakpoints near the place in the paragraph that is currently being
16103examined, so they are recycled within a comparatively short time after
16104they are created.
16105
16106@ An active node for a given breakpoint contains six fields:
16107
16108\yskip\hang|link| points to the next node in the list of active nodes; the
16109last active node has |link=last_active|.
16110
16111\yskip\hang|break_node| points to the passive node associated with this
16112breakpoint.
16113
16114\yskip\hang|line_number| is the number of the line that follows this
16115breakpoint.
16116
16117\yskip\hang|fitness| is the fitness classification of the line ending at this
16118breakpoint.
16119
16120\yskip\hang|type| is either |hyphenated| or |unhyphenated|, depending on
16121whether this breakpoint is a |disc_node|.
16122
16123\yskip\hang|total_demerits| is the minimum possible sum of demerits over all
16124lines leading from the beginning of the paragraph to this breakpoint.
16125
16126\yskip\noindent
16127The value of |link(active)| points to the first active node on a linked list
16128of all currently active nodes. This list is in order by |line_number|,
16129except that nodes with |line_number>easy_line| may be in any order relative
16130to each other.
16131
16132@d active_node_size=3 {number of words in active nodes}
16133@d fitness==subtype {|very_loose_fit..tight_fit| on final line for this break}
16134@d break_node==rlink {pointer to the corresponding passive node}
16135@d line_number==llink {line that begins at this breakpoint}
16136@d total_demerits(#)==mem[#+2].int {the quantity that \TeX\ minimizes}
16137@d unhyphenated=0 {the |type| of a normal active break node}
16138@d hyphenated=1 {the |type| of an active node that breaks at a |disc_node|}
16139@d last_active==active {the active list ends where it begins}
16140
16141@ @<Initialize the special list heads...@>=
16142type(last_active):=hyphenated; line_number(last_active):=max_halfword;
16143subtype(last_active):=0; {the |subtype| is never examined by the algorithm}
16144
16145@ The passive node for a given breakpoint contains only four fields:
16146
16147\yskip\hang|link| points to the passive node created just before this one,
16148if any, otherwise it is |null|.
16149
16150\yskip\hang|cur_break| points to the position of this breakpoint in the
16151horizontal list for the paragraph being broken.
16152
16153\yskip\hang|prev_break| points to the passive node that should precede this
16154one in an optimal path to this breakpoint.
16155
16156\yskip\hang|serial| is equal to |n| if this passive node is the |n|th
16157one created during the current pass. (This field is used only when
16158printing out detailed statistics about the line-breaking calculations.)
16159
16160\yskip\noindent
16161There is a global variable called |passive| that points to the most
16162recently created passive node. Another global variable, |printed_node|,
16163is used to help print out the paragraph when detailed information about
16164the line-breaking computation is being displayed.
16165
16166@d passive_node_size=2 {number of words in passive nodes}
16167@d cur_break==rlink {in passive node, points to position of this breakpoint}
16168@d prev_break==llink {points to passive node that should precede this one}
16169@d serial==info {serial number for symbolic identification}
16170
16171@<Glob...@>=
16172@!passive:pointer; {most recent node on passive list}
16173@!printed_node:pointer; {most recent node that has been printed}
16174@!pass_number:halfword; {the number of passive nodes allocated on this pass}
16175
16176@ The active list also contains ``delta'' nodes that help the algorithm
16177compute the badness of individual lines. Such nodes appear only between two
16178active nodes, and they have |type=delta_node|. If |p| and |r| are active nodes
16179and if |q| is a delta node between them, so that |link(p)=q| and |link(q)=r|,
16180then |q| tells the space difference between lines in the horizontal list that
16181start after breakpoint |p| and lines that start after breakpoint |r|. In
16182other words, if we know the length of the line that starts after |p| and
16183ends at our current position, then the corresponding length of the line that
16184starts after |r| is obtained by adding the amounts in node~|q|. A delta node
16185contains six scaled numbers, since it must record the net change in glue
16186stretchability with respect to all orders of infinity. The natural width
16187difference appears in |mem[q+1].sc|; the stretch differences in units of
16188pt, fil, fill, and filll appear in |mem[q+2..q+5].sc|; and the shrink difference
16189appears in |mem[q+6].sc|. The |subtype| field of a delta node is not used.
16190
16191@d delta_node_size=7 {number of words in a delta node}
16192@d delta_node=2 {|type| field in a delta node}
16193
16194@ As the algorithm runs, it maintains a set of six delta-like registers
16195for the length of the line following the first active breakpoint to the
16196current position in the given hlist. When it makes a pass through the
16197active list, it also maintains a similar set of six registers for the
16198length following the active breakpoint of current interest. A third set
16199holds the length of an empty line (namely, the sum of \.{\\leftskip} and
16200\.{\\rightskip}); and a fourth set is used to create new delta nodes.
16201
16202When we pass a delta node we want to do operations like
16203$$\hbox{\ignorespaces|for
16204k:=1 to 6 do cur_active_width[k]:=cur_active_width[k]+mem[q+k].sc|};$$ and we
16205want to do this without the overhead of |for| loops. The |do_all_six|
16206macro makes such six-tuples convenient.
16207
16208@d do_all_six(#)==#(1);#(2);#(3);#(4);#(5);#(6)
16209
16210@<Glob...@>=
16211@!active_width:array[1..6] of scaled;
16212  {distance from first active node to~|cur_p|}
16213@!cur_active_width:array[1..6] of scaled; {distance from current active node}
16214@!background:array[1..6] of scaled; {length of an ``empty'' line}
16215@!break_width:array[1..6] of scaled; {length being computed after current break}
16216
16217@ Let's state the principles of the delta nodes more precisely and concisely,
16218so that the following programs will be less obscure. For each legal
16219breakpoint~|p| in the paragraph, we define two quantities $\alpha(p)$ and
16220$\beta(p)$ such that the length of material in a line from breakpoint~|p|
16221to breakpoint~|q| is $\gamma+\beta(q)-\alpha(p)$, for some fixed $\gamma$.
16222Intuitively, $\alpha(p)$ and $\beta(q)$ are the total length of material from
16223the beginning of the paragraph to a point ``after'' a break at |p| and to a
16224point ``before'' a break at |q|; and $\gamma$ is the width of an empty line,
16225namely the length contributed by \.{\\leftskip} and \.{\\rightskip}.
16226
16227Suppose, for example, that the paragraph consists entirely of alternating
16228boxes and glue skips; let the boxes have widths $x_1\ldots x_n$ and
16229let the skips have widths $y_1\ldots y_n$, so that the paragraph can be
16230represented by $x_1y_1\ldots x_ny_n$. Let $p_i$ be the legal breakpoint
16231at $y_i$; then $\alpha(p_i)=x_1+y_1+\cdots+x_i+y_i$, and $\beta(p_i)=
16232x_1+y_1+\cdots+x_i$. To check this, note that the length of material from
16233$p_2$ to $p_5$, say, is $\gamma+x_3+y_3+x_4+y_4+x_5=\gamma+\beta(p_5)
16234-\alpha(p_2)$.
16235
16236The quantities $\alpha$, $\beta$, $\gamma$ involve glue stretchability and
16237shrinkability as well as a natural width. If we were to compute $\alpha(p)$
16238and $\beta(p)$ for each |p|, we would need multiple precision arithmetic, and
16239the multiprecise numbers would have to be kept in the active nodes.
16240\TeX\ avoids this problem by working entirely with relative differences
16241or ``deltas.'' Suppose, for example, that the active list contains
16242$a_1\,\delta_1\,a_2\,\delta_2\,a_3$, where the |a|'s are active breakpoints
16243and the $\delta$'s are delta nodes. Then $\delta_1=\alpha(a_1)-\alpha(a_2)$
16244and $\delta_2=\alpha(a_2)-\alpha(a_3)$. If the line breaking algorithm is
16245currently positioned at some other breakpoint |p|, the |active_width| array
16246contains the value $\gamma+\beta(p)-\alpha(a_1)$. If we are scanning through
16247the list of active nodes and considering a tentative line that runs from
16248$a_2$ to~|p|, say, the |cur_active_width| array will contain the value
16249$\gamma+\beta(p)-\alpha(a_2)$. Thus, when we move from $a_2$ to $a_3$,
16250we want to add $\alpha(a_2)-\alpha(a_3)$ to |cur_active_width|; and this
16251is just $\delta_2$, which appears in the active list between $a_2$ and
16252$a_3$. The |background| array contains $\gamma$. The |break_width| array
16253will be used to calculate values of new delta nodes when the active
16254list is being updated.
16255
16256@ Glue nodes in a horizontal list that is being paragraphed are not supposed to
16257include ``infinite'' shrinkability; that is why the algorithm maintains
16258four registers for stretching but only one for shrinking. If the user tries to
16259introduce infinite shrinkability, the shrinkability will be reset to finite
16260and an error message will be issued. A boolean variable |no_shrink_error_yet|
16261prevents this error message from appearing more than once per paragraph.
16262
16263@d check_shrinkage(#)==if (shrink_order(#)<>normal)and(shrink(#)<>0) then
16264  begin #:=finite_shrink(#);
16265  end
16266
16267@<Glob...@>=
16268@!no_shrink_error_yet:boolean; {have we complained about infinite shrinkage?}
16269
16270@ @<Declare subprocedures for |line_break|@>=
16271function finite_shrink(@!p:pointer):pointer; {recovers from infinite shrinkage}
16272var q:pointer; {new glue specification}
16273begin if no_shrink_error_yet then
16274  begin no_shrink_error_yet:=false;
16275  print_err("Infinite glue shrinkage found in a paragraph");
16276@.Infinite glue shrinkage...@>
16277  help5("The paragraph just ended includes some glue that has")@/
16278  ("infinite shrinkability, e.g., `\hskip 0pt minus 1fil'.")@/
16279  ("Such glue doesn't belong there---it allows a paragraph")@/
16280  ("of any length to fit on one line. But it's safe to proceed,")@/
16281  ("since the offensive shrinkability has been made finite.");
16282  error;
16283  end;
16284q:=new_spec(p); shrink_order(q):=normal;
16285delete_glue_ref(p); finite_shrink:=q;
16286end;
16287
16288@ @<Get ready to start...@>=
16289no_shrink_error_yet:=true;@/
16290check_shrinkage(left_skip); check_shrinkage(right_skip);@/
16291q:=left_skip; r:=right_skip; background[1]:=width(q)+width(r);@/
16292background[2]:=0; background[3]:=0; background[4]:=0; background[5]:=0;@/
16293background[2+stretch_order(q)]:=stretch(q);@/
16294background[2+stretch_order(r)]:=@|background[2+stretch_order(r)]+stretch(r);@/
16295background[6]:=shrink(q)+shrink(r);
16296
16297@ A pointer variable |cur_p| runs through the given horizontal list as we look
16298for breakpoints. This variable is global, since it is used both by |line_break|
16299and by its subprocedure |try_break|.
16300
16301Another global variable called |threshold| is used to determine the feasibility
16302of individual lines: Breakpoints are feasible if there is a way to reach
16303them without creating lines whose badness exceeds |threshold|.  (The
16304badness is compared to |threshold| before penalties are added, so that
16305penalty values do not affect the feasibility of breakpoints, except that
16306no break is allowed when the penalty is 10000 or more.) If |threshold|
16307is 10000 or more, all legal breaks are considered feasible, since the
16308|badness| function specified above never returns a value greater than~10000.
16309
16310Up to three passes might be made through the paragraph in an attempt to find at
16311least one set of feasible breakpoints. On the first pass, we have
16312|threshold=pretolerance| and |second_pass=final_pass=false|.
16313If this pass fails to find a
16314feasible solution, |threshold| is set to |tolerance|, |second_pass| is set
16315|true|, and an attempt is made to hyphenate as many words as possible.
16316If that fails too, we add |emergency_stretch| to the background
16317stretchability and set |final_pass=true|.
16318
16319@<Glob...@>=
16320@!cur_p:pointer; {the current breakpoint under consideration}
16321@!second_pass:boolean; {is this our second attempt to break this paragraph?}
16322@!final_pass:boolean; {is this our final attempt to break this paragraph?}
16323@!threshold:integer; {maximum badness on feasible lines}
16324
16325@ The heart of the line-breaking procedure is `|try_break|', a subroutine
16326that tests if the current breakpoint |cur_p| is feasible, by running
16327through the active list to see what lines of text can be made from active
16328nodes to~|cur_p|.  If feasible breaks are possible, new break nodes are
16329created.  If |cur_p| is too far from an active node, that node is
16330deactivated.
16331
16332The parameter |pi| to |try_break| is the penalty associated
16333with a break at |cur_p|; we have |pi=eject_penalty| if the break is forced,
16334and |pi=inf_penalty| if the break is illegal.
16335
16336The other parameter, |break_type|, is set to |hyphenated| or |unhyphenated|,
16337depending on whether or not the current break is at a |disc_node|. The
16338end of a paragraph is also regarded as `|hyphenated|'; this case is
16339distinguishable by the condition |cur_p=null|.
16340
16341@d copy_to_cur_active(#)==cur_active_width[#]:=active_width[#]
16342@d deactivate=60 {go here when node |r| should be deactivated}
16343
16344@<Declare subprocedures for |line_break|@>=
16345procedure try_break(@!pi:integer;@!break_type:small_number);
16346label exit,done,done1,continue,deactivate;
16347var r:pointer; {runs through the active list}
16348@!prev_r:pointer; {stays a step behind |r|}
16349@!old_l:halfword; {maximum line number in current equivalence class of lines}
16350@!no_break_yet:boolean; {have we found a feasible break at |cur_p|?}
16351@<Other local variables for |try_break|@>@;
16352begin @<Make sure that |pi| is in the proper range@>;
16353no_break_yet:=true; prev_r:=active; old_l:=0;
16354do_all_six(copy_to_cur_active);
16355loop@+  begin continue: r:=link(prev_r);
16356  @<If node |r| is of type |delta_node|, update |cur_active_width|,
16357    set |prev_r| and |prev_prev_r|, then |goto continue|@>;
16358  @<If a line number class has ended, create new active nodes for
16359    the best feasible breaks in that class; then |return|
16360    if |r=last_active|, otherwise compute the new |line_width|@>;
16361  @<Consider the demerits for a line from |r| to |cur_p|;
16362    deactivate node |r| if it should no longer be active;
16363    then |goto continue| if a line from |r| to |cur_p| is infeasible,
16364    otherwise record a new feasible break@>;
16365  end;
16366exit: @!stat @<Update the value of |printed_node| for
16367  symbolic displays@>@+tats@;
16368end;
16369
16370@ @<Other local variables for |try_break|@>=
16371@!prev_prev_r:pointer; {a step behind |prev_r|, if |type(prev_r)=delta_node|}
16372@!s:pointer; {runs through nodes ahead of |cur_p|}
16373@!q:pointer; {points to a new node being created}
16374@!v:pointer; {points to a glue specification or a node ahead of |cur_p|}
16375@!t:integer; {node count, if |cur_p| is a discretionary node}
16376@!f:internal_font_number; {used in character width calculation}
16377@!l:halfword; {line number of current active node}
16378@!node_r_stays_active:boolean; {should node |r| remain in the active list?}
16379@!line_width:scaled; {the current line will be justified to this width}
16380@!fit_class:very_loose_fit..tight_fit; {possible fitness class of test line}
16381@!b:halfword; {badness of test line}
16382@!d:integer; {demerits of test line}
16383@!artificial_demerits:boolean; {has |d| been forced to zero?}
16384@!save_link:pointer; {temporarily holds value of |link(cur_p)|}
16385@!shortfall:scaled; {used in badness calculations}
16386
16387@ @<Make sure that |pi| is in the proper range@>=
16388if abs(pi)>=inf_penalty then
16389  if pi>0 then return {this breakpoint is inhibited by infinite penalty}
16390  else pi:=eject_penalty {this breakpoint will be forced}
16391
16392@ The following code uses the fact that |type(last_active)<>delta_node|.
16393
16394@d update_width(#)==@|
16395  cur_active_width[#]:=cur_active_width[#]+mem[r+#].sc
16396
16397@<If node |r|...@>=
16398@^inner loop@>
16399if type(r)=delta_node then
16400  begin do_all_six(update_width);
16401  prev_prev_r:=prev_r; prev_r:=r; goto continue;
16402  end
16403
16404@ As we consider various ways to end a line at |cur_p|, in a given line number
16405class, we keep track of the best total demerits known, in an array with
16406one entry for each of the fitness classifications. For example,
16407|minimal_demerits[tight_fit]| contains the fewest total demerits of feasible
16408line breaks ending at |cur_p| with a |tight_fit| line; |best_place[tight_fit]|
16409points to the passive node for the break before~|cur_p| that achieves such
16410an optimum; and |best_pl_line[tight_fit]| is the |line_number| field in the
16411active node corresponding to |best_place[tight_fit]|. When no feasible break
16412sequence is known, the |minimal_demerits| entries will be equal to
16413|awful_bad|, which is $2^{30}-1$. Another variable, |minimum_demerits|,
16414keeps track of the smallest value in the |minimal_demerits| array.
16415
16416@d awful_bad==@'7777777777 {more than a billion demerits}
16417
16418@<Global...@>=
16419@!minimal_demerits:array[very_loose_fit..tight_fit] of integer; {best total
16420  demerits known for current line class and position, given the fitness}
16421@!minimum_demerits:integer; {best total demerits known for current line class
16422  and position}
16423@!best_place:array[very_loose_fit..tight_fit] of pointer; {how to achieve
16424  |minimal_demerits|}
16425@!best_pl_line:array[very_loose_fit..tight_fit] of halfword; {corresponding
16426  line number}
16427
16428@ @<Get ready to start...@>=
16429minimum_demerits:=awful_bad;
16430minimal_demerits[tight_fit]:=awful_bad;
16431minimal_demerits[decent_fit]:=awful_bad;
16432minimal_demerits[loose_fit]:=awful_bad;
16433minimal_demerits[very_loose_fit]:=awful_bad;
16434
16435@ The first part of the following code is part of \TeX's inner loop, so
16436we don't want to waste any time. The current active node, namely node |r|,
16437contains the line number that will be considered next. At the end of the
16438list we have arranged the data structure so that |r=last_active| and
16439|line_number(last_active)>old_l|.
16440@^inner loop@>
16441
16442@<If a line number class...@>=
16443begin l:=line_number(r);
16444if l>old_l then
16445  begin {now we are no longer in the inner loop}
16446  if (minimum_demerits<awful_bad)and@|
16447      ((old_l<>easy_line)or(r=last_active)) then
16448    @<Create new active nodes for the best feasible breaks
16449      just found@>;
16450  if r=last_active then return;
16451  @<Compute the new line width@>;
16452  end;
16453end
16454
16455@ It is not necessary to create new active nodes having |minimal_demerits|
16456greater than
16457|minimum_demerits+abs(adj_demerits)|, since such active nodes will never
16458be chosen in the final paragraph breaks. This observation allows us to
16459omit a substantial number of feasible breakpoints from further consideration.
16460
16461@<Create new active nodes...@>=
16462begin if no_break_yet then @<Compute the values of |break_width|@>;
16463@<Insert a delta node to prepare for breaks at |cur_p|@>;
16464if abs(adj_demerits)>=awful_bad-minimum_demerits then
16465  minimum_demerits:=awful_bad-1
16466else minimum_demerits:=minimum_demerits+abs(adj_demerits);
16467for fit_class:=very_loose_fit to tight_fit do
16468  begin if minimal_demerits[fit_class]<=minimum_demerits then
16469    @<Insert a new active node
16470      from |best_place[fit_class]| to |cur_p|@>;
16471  minimal_demerits[fit_class]:=awful_bad;
16472  end;
16473minimum_demerits:=awful_bad;
16474@<Insert a delta node to prepare for the next active node@>;
16475end
16476
16477@ When we insert a new active node for a break at |cur_p|, suppose this
16478new node is to be placed just before active node |a|; then we essentially
16479want to insert `$\delta\,|cur_p|\,\delta^\prime$' before |a|, where
16480$\delta=\alpha(a)-\alpha(|cur_p|)$ and $\delta^\prime=\alpha(|cur_p|)-\alpha(a)$
16481in the notation explained above.  The |cur_active_width| array now holds
16482$\gamma+\beta(|cur_p|)-\alpha(a)$; so $\delta$ can be obtained by
16483subtracting |cur_active_width| from the quantity $\gamma+\beta(|cur_p|)-
16484\alpha(|cur_p|)$. The latter quantity can be regarded as the length of a
16485line ``from |cur_p| to |cur_p|''; we call it the |break_width| at |cur_p|.
16486
16487The |break_width| is usually negative, since it consists of the background
16488(which is normally zero) minus the width of nodes following~|cur_p| that are
16489eliminated after a break. If, for example, node |cur_p| is a glue node, the
16490width of this glue is subtracted from the background; and we also look
16491ahead to eliminate all subsequent glue and penalty and kern and math
16492nodes, subtracting their widths as well.
16493
16494Kern nodes do not disappear at a line break unless they are |explicit|.
16495
16496@d set_break_width_to_background(#)==break_width[#]:=background[#]
16497
16498@<Compute the values of |break...@>=
16499begin no_break_yet:=false; do_all_six(set_break_width_to_background);
16500s:=cur_p;
16501if break_type>unhyphenated then if cur_p<>null then
16502  @<Compute the discretionary |break_width| values@>;
16503while s<>null do
16504  begin if is_char_node(s) then goto done;
16505  case type(s) of
16506  glue_node:@<Subtract glue from |break_width|@>;
16507  penalty_node: do_nothing;
16508  math_node: break_width[1]:=break_width[1]-width(s);
16509  kern_node: if subtype(s)<>explicit then goto done
16510    else break_width[1]:=break_width[1]-width(s);
16511  othercases goto done
16512  endcases;@/
16513  s:=link(s);
16514  end;
16515done: end
16516
16517@ @<Subtract glue from |break...@>=
16518begin v:=glue_ptr(s); break_width[1]:=break_width[1]-width(v);
16519break_width[2+stretch_order(v)]:=break_width[2+stretch_order(v)]-stretch(v);
16520break_width[6]:=break_width[6]-shrink(v);
16521end
16522
16523@ When |cur_p| is a discretionary break, the length of a line ``from |cur_p| to
16524|cur_p|'' has to be defined properly so that the other calculations work out.
16525Suppose that the pre-break text at |cur_p| has length $l_0$, the post-break
16526text has length $l_1$, and the replacement text has length |l|. Suppose
16527also that |q| is the node following the replacement text. Then length of a
16528line from |cur_p| to |q| will be computed as $\gamma+\beta(q)-\alpha(|cur_p|)$,
16529where $\beta(q)=\beta(|cur_p|)-l_0+l$. The actual length will be the background
16530plus $l_1$, so the length from |cur_p| to |cur_p| should be $\gamma+l_0+l_1-l$.
16531If the post-break text of the discretionary is empty, a break may also
16532discard~|q|; in that unusual case we subtract the length of~|q| and any
16533other nodes that will be discarded after the discretionary break.
16534
16535The value of $l_0$ need not be computed, since |line_break| will put
16536it into the global variable |disc_width| before calling |try_break|.
16537
16538@<Glob...@>=
16539@!disc_width:scaled; {the length of discretionary material preceding a break}
16540
16541@ @<Compute the discretionary |break...@>=
16542begin t:=replace_count(cur_p); v:=cur_p; s:=post_break(cur_p);
16543while t>0 do
16544  begin decr(t); v:=link(v);
16545  @<Subtract the width of node |v| from |break_width|@>;
16546  end;
16547while s<>null do
16548  begin @<Add the width of node |s| to |break_width|@>;
16549  s:=link(s);
16550  end;
16551break_width[1]:=break_width[1]+disc_width;
16552if post_break(cur_p)=null then s:=link(v);
16553          {nodes may be discardable after the break}
16554end
16555
16556@ Replacement texts and discretionary texts are supposed to contain
16557only character nodes, kern nodes, ligature nodes, and box or rule nodes.
16558
16559@<Subtract the width of node |v|...@>=
16560if is_char_node(v) then
16561  begin f:=font(v);
16562  break_width[1]:=break_width[1]-char_width(f)(char_info(f)(character(v)));
16563  end
16564else  case type(v) of
16565  ligature_node: begin f:=font(lig_char(v));@/
16566    break_width[1]:=@|break_width[1]-
16567      char_width(f)(char_info(f)(character(lig_char(v))));
16568    end;
16569  hlist_node,vlist_node,rule_node,kern_node:
16570    break_width[1]:=break_width[1]-width(v);
16571  othercases confusion("disc1")
16572@:this can't happen disc1}{\quad disc1@>
16573  endcases
16574
16575@ @<Add the width of node |s| to |b...@>=
16576if is_char_node(s) then
16577  begin f:=font(s);
16578  break_width[1]:=@|break_width[1]+char_width(f)(char_info(f)(character(s)));
16579  end
16580else  case type(s) of
16581  ligature_node: begin f:=font(lig_char(s));
16582    break_width[1]:=break_width[1]+
16583      char_width(f)(char_info(f)(character(lig_char(s))));
16584    end;
16585  hlist_node,vlist_node,rule_node,kern_node:
16586    break_width[1]:=break_width[1]+width(s);
16587  othercases confusion("disc2")
16588@:this can't happen disc2}{\quad disc2@>
16589  endcases
16590
16591@ We use the fact that |type(active)<>delta_node|.
16592
16593@d convert_to_break_width(#)==@|
16594  mem[prev_r+#].sc:=@|@t\hskip10pt@>mem[prev_r+#].sc
16595  -cur_active_width[#]+break_width[#]
16596@d store_break_width(#)==active_width[#]:=break_width[#]
16597@d new_delta_to_break_width(#)==@|
16598  mem[q+#].sc:=break_width[#]-cur_active_width[#]
16599
16600@<Insert a delta node to prepare for breaks at |cur_p|@>=
16601if type(prev_r)=delta_node then {modify an existing delta node}
16602  begin do_all_six(convert_to_break_width);
16603  end
16604else if prev_r=active then {no delta node needed at the beginning}
16605  begin do_all_six(store_break_width);
16606  end
16607else  begin q:=get_node(delta_node_size); link(q):=r; type(q):=delta_node;@/
16608  subtype(q):=0; {the |subtype| is not used}
16609  do_all_six(new_delta_to_break_width);
16610  link(prev_r):=q; prev_prev_r:=prev_r; prev_r:=q;
16611  end
16612
16613@ When the following code is performed, we will have just inserted at
16614least one active node before |r|, so |type(prev_r)<>delta_node|.
16615
16616@d new_delta_from_break_width(#)==@|mem[q+#].sc:=
16617    cur_active_width[#]-break_width[#]
16618
16619@<Insert a delta node to prepare for the next active node@>=
16620if r<>last_active then
16621  begin q:=get_node(delta_node_size); link(q):=r; type(q):=delta_node;@/
16622  subtype(q):=0; {the |subtype| is not used}
16623  do_all_six(new_delta_from_break_width);
16624  link(prev_r):=q; prev_prev_r:=prev_r; prev_r:=q;
16625  end
16626
16627@ When we create an active node, we also create the corresponding
16628passive node.
16629
16630@<Insert a new active node from |best_place[fit_class]| to |cur_p|@>=
16631begin q:=get_node(passive_node_size);
16632link(q):=passive; passive:=q; cur_break(q):=cur_p;
16633@!stat incr(pass_number); serial(q):=pass_number;@+tats@;@/
16634prev_break(q):=best_place[fit_class];@/
16635q:=get_node(active_node_size); break_node(q):=passive;
16636line_number(q):=best_pl_line[fit_class]+1;
16637fitness(q):=fit_class; type(q):=break_type;
16638total_demerits(q):=minimal_demerits[fit_class];
16639link(q):=r; link(prev_r):=q; prev_r:=q;
16640@!stat if tracing_paragraphs>0 then
16641  @<Print a symbolic description of the new break node@>;
16642tats@;@/
16643end
16644
16645@ @<Print a symbolic description of the new break node@>=
16646begin print_nl("@@@@"); print_int(serial(passive));
16647@.\AT!\AT!@>
16648print(": line "); print_int(line_number(q)-1);
16649print_char("."); print_int(fit_class);
16650if break_type=hyphenated then print_char("-");
16651print(" t="); print_int(total_demerits(q));
16652print(" -> @@@@");
16653if prev_break(passive)=null then print_char("0")
16654else print_int(serial(prev_break(passive)));
16655end
16656
16657@ The length of lines depends on whether the user has specified
16658\.{\\parshape} or \.{\\hangindent}. If |par_shape_ptr| is not null, it
16659points to a $(2n+1)$-word record in |mem|, where the |info| in the first
16660word contains the value of |n|, and the other $2n$ words contain the left
16661margins and line lengths for the first |n| lines of the paragraph; the
16662specifications for line |n| apply to all subsequent lines. If
16663|par_shape_ptr=null|, the shape of the paragraph depends on the value of
16664|n=hang_after|; if |n>=0|, hanging indentation takes place on lines |n+1|,
16665|n+2|, \dots, otherwise it takes place on lines 1, \dots, $\vert
16666n\vert$. When hanging indentation is active, the left margin is
16667|hang_indent|, if |hang_indent>=0|, else it is 0; the line length is
16668$|hsize|-\vert|hang_indent|\vert$. The normal setting is
16669|par_shape_ptr=null|, |hang_after=1|, and |hang_indent=0|.
16670Note that if |hang_indent=0|, the value of |hang_after| is irrelevant.
16671@^length of lines@> @^hanging indentation@>
16672
16673@<Glob...@>=
16674@!easy_line:halfword; {line numbers |>easy_line| are equivalent in break nodes}
16675@!last_special_line:halfword; {line numbers |>last_special_line| all have
16676  the same width}
16677@!first_width:scaled; {the width of all lines |<=last_special_line|, if
16678  no \.{\\parshape} has been specified}
16679@!second_width:scaled; {the width of all lines |>last_special_line|}
16680@!first_indent:scaled; {left margin to go with |first_width|}
16681@!second_indent:scaled; {left margin to go with |second_width|}
16682
16683@ We compute the values of |easy_line| and the other local variables relating
16684to line length when the |line_break| procedure is initializing itself.
16685
16686@<Get ready to start...@>=
16687if par_shape_ptr=null then
16688  if hang_indent=0 then
16689    begin last_special_line:=0; second_width:=hsize;
16690    second_indent:=0;
16691    end
16692  else @<Set line length parameters in preparation for hanging indentation@>
16693else  begin last_special_line:=info(par_shape_ptr)-1;
16694  second_width:=mem[par_shape_ptr+2*(last_special_line+1)].sc;
16695  second_indent:=mem[par_shape_ptr+2*last_special_line+1].sc;
16696  end;
16697if looseness=0 then easy_line:=last_special_line
16698else easy_line:=max_halfword
16699
16700@ @<Set line length parameters in preparation for hanging indentation@>=
16701begin last_special_line:=abs(hang_after);
16702if hang_after<0 then
16703  begin first_width:=hsize-abs(hang_indent);
16704  if hang_indent>=0 then first_indent:=hang_indent
16705  else first_indent:=0;
16706  second_width:=hsize; second_indent:=0;
16707  end
16708else  begin first_width:=hsize; first_indent:=0;
16709  second_width:=hsize-abs(hang_indent);
16710  if hang_indent>=0 then second_indent:=hang_indent
16711  else second_indent:=0;
16712  end;
16713end
16714
16715@ When we come to the following code, we have just encountered the first
16716active node~|r| whose |line_number| field contains |l|. Thus we want to
16717compute the length of the $l\mskip1mu$th line of the current paragraph. Furthermore,
16718we want to set |old_l| to the last number in the class of line numbers
16719equivalent to~|l|.
16720
16721@<Compute the new line width@>=
16722if l>easy_line then
16723  begin line_width:=second_width; old_l:=max_halfword-1;
16724  end
16725else  begin old_l:=l;
16726  if l>last_special_line then line_width:=second_width
16727  else if par_shape_ptr=null then line_width:=first_width
16728  else line_width:=mem[par_shape_ptr+2*l@,].sc;
16729  end
16730
16731@ The remaining part of |try_break| deals with the calculation of
16732demerits for a break from |r| to |cur_p|.
16733
16734The first thing to do is calculate the badness, |b|. This value will always
16735be between zero and |inf_bad+1|; the latter value occurs only in the
16736case of lines from |r| to |cur_p| that cannot shrink enough to fit the necessary
16737width. In such cases, node |r| will be deactivated.
16738We also deactivate node~|r| when a break at~|cur_p| is forced, since future
16739breaks must go through a forced break.
16740
16741@<Consider the demerits for a line from |r| to |cur_p|...@>=
16742begin artificial_demerits:=false;@/
16743@^inner loop@>
16744shortfall:=line_width-cur_active_width[1]; {we're this much too short}
16745if shortfall>0 then
16746  @<Set the value of |b| to the badness for stretching the line,
16747    and compute the corresponding |fit_class|@>
16748else @<Set the value of |b| to the badness for shrinking the line,
16749    and compute the corresponding |fit_class|@>;
16750if (b>inf_bad)or(pi=eject_penalty) then
16751  @<Prepare to deactivate node~|r|, and |goto deactivate| unless
16752    there is a reason to consider lines of text from |r| to |cur_p|@>
16753else  begin prev_r:=r;
16754  if b>threshold then goto continue;
16755  node_r_stays_active:=true;
16756  end;
16757@<Record a new feasible break@>;
16758if node_r_stays_active then goto continue; {|prev_r| has been set to |r|}
16759deactivate: @<Deactivate node |r|@>;
16760end
16761
16762@ When a line must stretch, the available stretchability can be found in the
16763subarray |cur_active_width[2..5]|, in units of points, fil, fill, and filll.
16764
16765The present section is part of \TeX's inner loop, and it is most often performed
16766when the badness is infinite; therefore it is worth while to make a quick
16767test for large width excess and small stretchability, before calling the
16768|badness| subroutine.
16769@^inner loop@>
16770
16771@<Set the value of |b| to the badness for stretching...@>=
16772if (cur_active_width[3]<>0)or(cur_active_width[4]<>0)or@|
16773  (cur_active_width[5]<>0) then
16774  begin b:=0; fit_class:=decent_fit; {infinite stretch}
16775  end
16776else  begin if shortfall>7230584 then if cur_active_width[2]<1663497 then
16777    begin b:=inf_bad; fit_class:=very_loose_fit; goto done1;
16778    end;
16779  b:=badness(shortfall,cur_active_width[2]);
16780  if b>12 then
16781    if b>99 then fit_class:=very_loose_fit
16782    else fit_class:=loose_fit
16783  else fit_class:=decent_fit;
16784  done1:
16785  end
16786
16787@ Shrinkability is never infinite in a paragraph;
16788we can shrink the line from |r| to |cur_p| by at most |cur_active_width[6]|.
16789
16790@<Set the value of |b| to the badness for shrinking...@>=
16791begin if -shortfall>cur_active_width[6] then b:=inf_bad+1
16792else b:=badness(-shortfall,cur_active_width[6]);
16793if b>12 then fit_class:=tight_fit@+else fit_class:=decent_fit;
16794end
16795
16796@ During the final pass, we dare not lose all active nodes, lest we lose
16797touch with the line breaks already found. The code shown here makes sure
16798that such a catastrophe does not happen, by permitting overfull boxes as
16799a last resort. This particular part of \TeX\ was a source of several subtle
16800bugs before the correct program logic was finally discovered; readers
16801who seek to ``improve'' \TeX\ should therefore think thrice before daring
16802to make any changes here.
16803@^overfull boxes@>
16804
16805@<Prepare to deactivate node~|r|, and |goto deactivate| unless...@>=
16806begin if final_pass and (minimum_demerits=awful_bad) and@|
16807   (link(r)=last_active) and
16808   (prev_r=active) then
16809  artificial_demerits:=true {set demerits zero, this break is forced}
16810else if b>threshold then goto deactivate;
16811node_r_stays_active:=false;
16812end
16813
16814@ When we get to this part of the code, the line from |r| to |cur_p| is
16815feasible, its badness is~|b|, and its fitness classification is |fit_class|.
16816We don't want to make an active node for this break yet, but we will
16817compute the total demerits and record them in the |minimal_demerits| array,
16818if such a break is the current champion among all ways to get to |cur_p|
16819in a given line-number class and fitness class.
16820
16821@<Record a new feasible break@>=
16822if artificial_demerits then d:=0
16823else @<Compute the demerits, |d|, from |r| to |cur_p|@>;
16824@!stat if tracing_paragraphs>0 then
16825  @<Print a symbolic description of this feasible break@>;
16826tats@;@/
16827d:=d+total_demerits(r); {this is the minimum total demerits
16828  from the beginning to |cur_p| via |r|}
16829if d<=minimal_demerits[fit_class] then
16830  begin minimal_demerits[fit_class]:=d;
16831  best_place[fit_class]:=break_node(r); best_pl_line[fit_class]:=l;
16832  if d<minimum_demerits then minimum_demerits:=d;
16833  end
16834
16835@ @<Print a symbolic description of this feasible break@>=
16836begin if printed_node<>cur_p then
16837  @<Print the list between |printed_node| and |cur_p|,
16838    then set |printed_node:=cur_p|@>;
16839print_nl("@@");
16840@.\AT!@>
16841if cur_p=null then print_esc("par")
16842else if type(cur_p)<>glue_node then
16843  begin if type(cur_p)=penalty_node then print_esc("penalty")
16844  else if type(cur_p)=disc_node then print_esc("discretionary")
16845  else if type(cur_p)=kern_node then print_esc("kern")
16846  else print_esc("math");
16847  end;
16848print(" via @@@@");
16849if break_node(r)=null then print_char("0")
16850else print_int(serial(break_node(r)));
16851print(" b=");
16852if b>inf_bad then print_char("*")@+else print_int(b);
16853@.*\relax@>
16854print(" p="); print_int(pi); print(" d=");
16855if artificial_demerits then print_char("*")@+else print_int(d);
16856end
16857
16858@ @<Print the list between |printed_node| and |cur_p|...@>=
16859begin print_nl("");
16860if cur_p=null then short_display(link(printed_node))
16861else  begin save_link:=link(cur_p);
16862  link(cur_p):=null; print_nl(""); short_display(link(printed_node));
16863  link(cur_p):=save_link;
16864  end;
16865printed_node:=cur_p;
16866end
16867
16868@ When the data for a discretionary break is being displayed, we will have
16869printed the |pre_break| and |post_break| lists; we want to skip over the
16870third list, so that the discretionary data will not appear twice.  The
16871following code is performed at the very end of |try_break|.
16872
16873@<Update the value of |printed_node|...@>=
16874if cur_p=printed_node then if cur_p<>null then if type(cur_p)=disc_node then
16875  begin t:=replace_count(cur_p);
16876  while t>0 do
16877    begin decr(t); printed_node:=link(printed_node);
16878    end;
16879  end
16880
16881@ @<Compute the demerits, |d|, from |r| to |cur_p|@>=
16882begin d:=line_penalty+b;
16883if abs(d)>=10000 then d:=100000000@+else d:=d*d;
16884if pi<>0 then
16885  if pi>0 then d:=d+pi*pi
16886  else if pi>eject_penalty then d:=d-pi*pi;
16887if (break_type=hyphenated)and(type(r)=hyphenated) then
16888  if cur_p<>null then d:=d+double_hyphen_demerits
16889  else d:=d+final_hyphen_demerits;
16890if abs(fit_class-fitness(r))>1 then d:=d+adj_demerits;
16891end
16892
16893@ When an active node disappears, we must delete an adjacent delta node if the
16894active node was at the beginning or the end of the active list, or if it
16895was surrounded by delta nodes. We also must preserve the property that
16896|cur_active_width| represents the length of material from |link(prev_r)|
16897to~|cur_p|.
16898
16899@d combine_two_deltas(#)==@|mem[prev_r+#].sc:=mem[prev_r+#].sc+mem[r+#].sc
16900@d downdate_width(#)==@|cur_active_width[#]:=cur_active_width[#]-
16901  mem[prev_r+#].sc
16902
16903@<Deactivate node |r|@>=
16904link(prev_r):=link(r); free_node(r,active_node_size);
16905if prev_r=active then @<Update the active widths, since the first active
16906  node has been deleted@>
16907else if type(prev_r)=delta_node then
16908  begin r:=link(prev_r);
16909  if r=last_active then
16910    begin do_all_six(downdate_width);
16911    link(prev_prev_r):=last_active;
16912    free_node(prev_r,delta_node_size); prev_r:=prev_prev_r;
16913    end
16914  else if type(r)=delta_node then
16915    begin do_all_six(update_width);
16916    do_all_six(combine_two_deltas);
16917    link(prev_r):=link(r); free_node(r,delta_node_size);
16918    end;
16919  end
16920
16921@ The following code uses the fact that |type(last_active)<>delta_node|. If the
16922active list has just become empty, we do not need to update the
16923|active_width| array, since it will be initialized when an active
16924node is next inserted.
16925
16926@d update_active(#)==active_width[#]:=active_width[#]+mem[r+#].sc
16927
16928@<Update the active widths,...@>=
16929begin r:=link(active);
16930if type(r)=delta_node then
16931  begin do_all_six(update_active);
16932  do_all_six(copy_to_cur_active);
16933  link(active):=link(r); free_node(r,delta_node_size);
16934  end;
16935end
16936
16937@* \[39] Breaking paragraphs into lines, continued.
16938So far we have gotten a little way into the |line_break| routine, having
16939covered its important |try_break| subroutine. Now let's consider the
16940rest of the process.
16941
16942The main loop of |line_break| traverses the given hlist,
16943starting at |link(temp_head)|, and calls |try_break| at each legal
16944breakpoint. A variable called |auto_breaking| is set to true except
16945within math formulas, since glue nodes are not legal breakpoints when
16946they appear in formulas.
16947
16948The current node of interest in the hlist is pointed to by |cur_p|. Another
16949variable, |prev_p|, is usually one step behind |cur_p|, but the real
16950meaning of |prev_p| is this: If |type(cur_p)=glue_node| then |cur_p| is a legal
16951breakpoint if and only if |auto_breaking| is true and |prev_p| does not
16952point to a glue node, penalty node, explicit kern node, or math node.
16953
16954The following declarations provide for a few other local variables that are
16955used in special calculations.
16956
16957@<Local variables for line breaking@>=
16958@!auto_breaking:boolean; {is node |cur_p| outside a formula?}
16959@!prev_p:pointer; {helps to determine when glue nodes are breakpoints}
16960@!q,@!r,@!s,@!prev_s:pointer; {miscellaneous nodes of temporary interest}
16961@!f:internal_font_number; {used when calculating character widths}
16962
16963@ The `\ignorespaces|loop|\unskip' in the following code is performed at most
16964thrice per call of |line_break|, since it is actually a pass over the
16965entire paragraph.
16966
16967@<Find optimal breakpoints@>=
16968threshold:=pretolerance;
16969if threshold>=0 then
16970  begin @!stat if tracing_paragraphs>0 then
16971    begin begin_diagnostic; print_nl("@@firstpass");@+end;@;@+tats@;@/
16972  second_pass:=false; final_pass:=false;
16973  end
16974else  begin threshold:=tolerance; second_pass:=true;
16975  final_pass:=(emergency_stretch<=0);
16976  @!stat if tracing_paragraphs>0 then begin_diagnostic;@+tats@;
16977  end;
16978loop@+  begin if threshold>inf_bad then threshold:=inf_bad;
16979  if second_pass then @<Initialize for hyphenating a paragraph@>;
16980  @<Create an active breakpoint representing the beginning of the paragraph@>;
16981  cur_p:=link(temp_head); auto_breaking:=true;@/
16982  prev_p:=cur_p; {glue at beginning is not a legal breakpoint}
16983  while (cur_p<>null)and(link(active)<>last_active) do
16984    @<Call |try_break| if |cur_p| is a legal breakpoint;
16985    on the second pass, also try to hyphenate the next
16986    word, if |cur_p| is a glue node;
16987    then advance |cur_p| to the next node of the paragraph
16988    that could possibly be a legal breakpoint@>;
16989  if cur_p=null then
16990    @<Try the final line break at the end of the paragraph,
16991    and |goto done| if the desired breakpoints have been found@>;
16992  @<Clean up the memory by removing the break nodes@>;
16993  if not second_pass then
16994    begin@!stat if tracing_paragraphs>0 then print_nl("@@secondpass");@;@+tats@/
16995    threshold:=tolerance; second_pass:=true; final_pass:=(emergency_stretch<=0);
16996    end {if at first you don't succeed, \dots}
16997  else begin @!stat if tracing_paragraphs>0 then
16998      print_nl("@@emergencypass");@;@+tats@/
16999    background[2]:=background[2]+emergency_stretch; final_pass:=true;
17000    end;
17001  end;
17002done: @!stat if tracing_paragraphs>0 then
17003  begin end_diagnostic(true); normalize_selector;
17004  end;@+tats@/
17005
17006@ The active node that represents the starting point does not need a
17007corresponding passive node.
17008
17009@d store_background(#)==active_width[#]:=background[#]
17010
17011@<Create an active breakpoint representing the beginning of the paragraph@>=
17012q:=get_node(active_node_size);
17013type(q):=unhyphenated; fitness(q):=decent_fit;
17014link(q):=last_active; break_node(q):=null;
17015line_number(q):=prev_graf+1; total_demerits(q):=0; link(active):=q;
17016do_all_six(store_background);@/
17017passive:=null; printed_node:=temp_head; pass_number:=0;
17018font_in_short_display:=null_font
17019
17020@ @<Clean...@>=
17021q:=link(active);
17022while q<>last_active do
17023  begin cur_p:=link(q);
17024  if type(q)=delta_node then free_node(q,delta_node_size)
17025  else free_node(q,active_node_size);
17026  q:=cur_p;
17027  end;
17028q:=passive;
17029while q<>null do
17030  begin cur_p:=link(q);
17031  free_node(q,passive_node_size);
17032  q:=cur_p;
17033  end
17034
17035@ Here is the main switch in the |line_break| routine, where legal breaks
17036are determined. As we move through the hlist, we need to keep the |active_width|
17037array up to date, so that the badness of individual lines is readily calculated
17038by |try_break|. It is convenient to use the short name |act_width| for
17039the component of active width that represents real width as opposed to glue.
17040
17041@d act_width==active_width[1] {length from first active node to current node}
17042@d kern_break==begin if not is_char_node(link(cur_p)) and auto_breaking then
17043    if type(link(cur_p))=glue_node then try_break(0,unhyphenated);
17044  act_width:=act_width+width(cur_p);
17045  end
17046
17047@<Call |try_break| if |cur_p| is a legal breakpoint...@>=
17048begin if is_char_node(cur_p) then
17049  @<Advance \(c)|cur_p| to the node following the present
17050    string of characters@>;
17051case type(cur_p) of
17052hlist_node,vlist_node,rule_node: act_width:=act_width+width(cur_p);
17053whatsit_node: @<Advance \(p)past a whatsit node in the \(l)|line_break| loop@>;
17054glue_node: begin @<If node |cur_p| is a legal breakpoint, call |try_break|;
17055  then update the active widths by including the glue in |glue_ptr(cur_p)|@>;
17056  if second_pass and auto_breaking then
17057    @<Try to hyphenate the following word@>;
17058  end;
17059kern_node: if subtype(cur_p)=explicit then kern_break
17060  else act_width:=act_width+width(cur_p);
17061ligature_node: begin f:=font(lig_char(cur_p));
17062  act_width:=act_width+char_width(f)(char_info(f)(character(lig_char(cur_p))));
17063  end;
17064disc_node: @<Try to break after a discretionary fragment, then |goto done5|@>;
17065math_node: begin auto_breaking:=(subtype(cur_p)=after); kern_break;
17066  end;
17067penalty_node: try_break(penalty(cur_p),unhyphenated);
17068mark_node,ins_node,adjust_node: do_nothing;
17069othercases confusion("paragraph")
17070@:this can't happen paragraph}{\quad paragraph@>
17071endcases;@/
17072prev_p:=cur_p; cur_p:=link(cur_p);
17073done5:end
17074
17075@ The code that passes over the characters of words in a paragraph is
17076part of \TeX's inner loop, so it has been streamlined for speed. We use
17077the fact that `\.{\\parfillskip}' glue appears at the end of each paragraph;
17078it is therefore unnecessary to check if |link(cur_p)=null| when |cur_p| is a
17079character node.
17080@^inner loop@>
17081
17082@<Advance \(c)|cur_p| to the node following the present string...@>=
17083begin prev_p:=cur_p;
17084repeat f:=font(cur_p);
17085act_width:=act_width+char_width(f)(char_info(f)(character(cur_p)));
17086cur_p:=link(cur_p);
17087until not is_char_node(cur_p);
17088end
17089
17090@ When node |cur_p| is a glue node, we look at |prev_p| to see whether or not
17091a breakpoint is legal at |cur_p|, as explained above.
17092
17093@<If node |cur_p| is a legal breakpoint, call...@>=
17094if auto_breaking then
17095  begin if is_char_node(prev_p) then try_break(0,unhyphenated)
17096  else if precedes_break(prev_p) then try_break(0,unhyphenated)
17097  else if (type(prev_p)=kern_node)and(subtype(prev_p)<>explicit) then
17098    try_break(0,unhyphenated);
17099  end;
17100check_shrinkage(glue_ptr(cur_p)); q:=glue_ptr(cur_p);
17101act_width:=act_width+width(q);@|
17102active_width[2+stretch_order(q)]:=@|
17103  active_width[2+stretch_order(q)]+stretch(q);@/
17104active_width[6]:=active_width[6]+shrink(q)
17105
17106@ The following code knows that discretionary texts contain
17107only character nodes, kern nodes, box nodes, rule nodes, and ligature nodes.
17108
17109@<Try to break after a discretionary fragment...@>=
17110begin s:=pre_break(cur_p); disc_width:=0;
17111if s=null then try_break(ex_hyphen_penalty,hyphenated)
17112else  begin repeat @<Add the width of node |s| to |disc_width|@>;
17113    s:=link(s);
17114  until s=null;
17115  act_width:=act_width+disc_width;
17116  try_break(hyphen_penalty,hyphenated);
17117  act_width:=act_width-disc_width;
17118  end;
17119r:=replace_count(cur_p); s:=link(cur_p);
17120while r>0 do
17121  begin @<Add the width of node |s| to |act_width|@>;
17122  decr(r); s:=link(s);
17123  end;
17124prev_p:=cur_p; cur_p:=s; goto done5;
17125end
17126
17127@ @<Add the width of node |s| to |disc_width|@>=
17128if is_char_node(s) then
17129  begin f:=font(s);
17130  disc_width:=disc_width+char_width(f)(char_info(f)(character(s)));
17131  end
17132else  case type(s) of
17133  ligature_node: begin f:=font(lig_char(s));
17134    disc_width:=disc_width+
17135      char_width(f)(char_info(f)(character(lig_char(s))));
17136    end;
17137  hlist_node,vlist_node,rule_node,kern_node:
17138    disc_width:=disc_width+width(s);
17139  othercases confusion("disc3")
17140@:this can't happen disc3}{\quad disc3@>
17141  endcases
17142
17143@ @<Add the width of node |s| to |act_width|@>=
17144if is_char_node(s) then
17145  begin f:=font(s);
17146  act_width:=act_width+char_width(f)(char_info(f)(character(s)));
17147  end
17148else  case type(s) of
17149  ligature_node: begin f:=font(lig_char(s));
17150    act_width:=act_width+
17151      char_width(f)(char_info(f)(character(lig_char(s))));
17152    end;
17153  hlist_node,vlist_node,rule_node,kern_node:
17154    act_width:=act_width+width(s);
17155  othercases confusion("disc4")
17156@:this can't happen disc4}{\quad disc4@>
17157  endcases
17158
17159@ The forced line break at the paragraph's end will reduce the list of
17160breakpoints so that all active nodes represent breaks at |cur_p=null|.
17161On the first pass, we insist on finding an active node that has the
17162correct ``looseness.'' On the final pass, there will be at least one active
17163node, and we will match the desired looseness as well as we can.
17164
17165The global variable |best_bet| will be set to the active node for the best
17166way to break the paragraph, and a few other variables are used to
17167help determine what is best.
17168
17169@<Glob...@>=
17170@!best_bet:pointer; {use this passive node and its predecessors}
17171@!fewest_demerits:integer; {the demerits associated with |best_bet|}
17172@!best_line:halfword; {line number following the last line of the new paragraph}
17173@!actual_looseness:integer; {the difference between |line_number(best_bet)|
17174  and the optimum |best_line|}
17175@!line_diff:integer; {the difference between the current line number and
17176  the optimum |best_line|}
17177
17178@ @<Try the final line break at the end of the paragraph...@>=
17179begin try_break(eject_penalty,hyphenated);
17180if link(active)<>last_active then
17181  begin @<Find an active node with fewest demerits@>;
17182  if looseness=0 then goto done;
17183  @<Find the best active node for the desired looseness@>;
17184  if (actual_looseness=looseness)or final_pass then goto done;
17185  end;
17186end
17187
17188@ @<Find an active node...@>=
17189r:=link(active); fewest_demerits:=awful_bad;
17190repeat if type(r)<>delta_node then if total_demerits(r)<fewest_demerits then
17191  begin fewest_demerits:=total_demerits(r); best_bet:=r;
17192  end;
17193r:=link(r);
17194until r=last_active;
17195best_line:=line_number(best_bet)
17196
17197@ The adjustment for a desired looseness is a slightly more complicated
17198version of the loop just considered. Note that if a paragraph is broken
17199into segments by displayed equations, each segment will be subject to the
17200looseness calculation, independently of the other segments.
17201
17202@<Find the best active node...@>=
17203begin r:=link(active); actual_looseness:=0;
17204repeat if type(r)<>delta_node then
17205  begin line_diff:=line_number(r)-best_line;
17206  if ((line_diff<actual_looseness)and(looseness<=line_diff))or@|
17207  ((line_diff>actual_looseness)and(looseness>=line_diff)) then
17208    begin best_bet:=r; actual_looseness:=line_diff;
17209    fewest_demerits:=total_demerits(r);
17210    end
17211  else if (line_diff=actual_looseness)and@|
17212    (total_demerits(r)<fewest_demerits) then
17213    begin best_bet:=r; fewest_demerits:=total_demerits(r);
17214    end;
17215  end;
17216r:=link(r);
17217until r=last_active;
17218best_line:=line_number(best_bet);
17219end
17220
17221@ Once the best sequence of breakpoints has been found (hurray), we call on the
17222procedure |post_line_break| to finish the remainder of the work.
17223(By introducing this subprocedure, we are able to keep |line_break|
17224from getting extremely long.)
17225
17226@<Break the paragraph at the chosen...@>=
17227post_line_break(final_widow_penalty)
17228
17229@ The total number of lines that will be set by |post_line_break|
17230is |best_line-prev_graf-1|. The last breakpoint is specified by
17231|break_node(best_bet)|, and this passive node points to the other breakpoints
17232via the |prev_break| links. The finishing-up phase starts by linking the
17233relevant passive nodes in forward order, changing |prev_break| to
17234|next_break|. (The |next_break| fields actually reside in the same memory
17235space as the |prev_break| fields did, but we give them a new name because
17236of their new significance.) Then the lines are justified, one by one.
17237
17238@d next_break==prev_break {new name for |prev_break| after links are reversed}
17239
17240@<Declare subprocedures for |line_break|@>=
17241procedure post_line_break(@!final_widow_penalty:integer);
17242label done,done1;
17243var q,@!r,@!s:pointer; {temporary registers for list manipulation}
17244@!disc_break:boolean; {was the current break at a discretionary node?}
17245@!post_disc_break:boolean; {and did it have a nonempty post-break part?}
17246@!cur_width:scaled; {width of line number |cur_line|}
17247@!cur_indent:scaled; {left margin of line number |cur_line|}
17248@!t:quarterword; {used for replacement counts in discretionary nodes}
17249@!pen:integer; {use when calculating penalties between lines}
17250@!cur_line: halfword; {the current line number being justified}
17251begin @<Reverse the links of the relevant passive nodes, setting |cur_p| to the
17252  first breakpoint@>;
17253cur_line:=prev_graf+1;
17254repeat @<Justify the line ending at breakpoint |cur_p|, and append it to the
17255  current vertical list, together with associated penalties and other
17256  insertions@>;
17257incr(cur_line); cur_p:=next_break(cur_p);
17258if cur_p<>null then if not post_disc_break then
17259  @<Prune unwanted nodes at the beginning of the next line@>;
17260until cur_p=null;
17261if (cur_line<>best_line)or(link(temp_head)<>null) then
17262  confusion("line breaking");
17263@:this can't happen line breaking}{\quad line breaking@>
17264prev_graf:=best_line-1;
17265end;
17266
17267@ The job of reversing links in a list is conveniently regarded as the job
17268of taking items off one stack and putting them on another. In this case we
17269take them off a stack pointed to by |q| and having |prev_break| fields;
17270we put them on a stack pointed to by |cur_p| and having |next_break| fields.
17271Node |r| is the passive node being moved from stack to stack.
17272
17273@<Reverse the links of the relevant passive nodes...@>=
17274q:=break_node(best_bet); cur_p:=null;
17275repeat r:=q; q:=prev_break(q); next_break(r):=cur_p; cur_p:=r;
17276until q=null
17277
17278@ Glue and penalty and kern and math nodes are deleted at the beginning of
17279a line, except in the anomalous case that the node to be deleted is actually
17280one of the chosen breakpoints. Otherwise
17281the pruning done here is designed to match
17282the lookahead computation in |try_break|, where the |break_width| values
17283are computed for non-discretionary breakpoints.
17284
17285@<Prune unwanted nodes at the beginning of the next line@>=
17286begin r:=temp_head;
17287loop@+  begin q:=link(r);
17288  if q=cur_break(cur_p) then goto done1;
17289    {|cur_break(cur_p)| is the next breakpoint}
17290  {now |q| cannot be |null|}
17291  if is_char_node(q) then goto done1;
17292  if non_discardable(q) then goto done1;
17293  if type(q)=kern_node then if subtype(q)<>explicit then goto done1;
17294  r:=q; {now |type(q)=glue_node|, |kern_node|, |math_node| or |penalty_node|}
17295  end;
17296done1: if r<>temp_head then
17297  begin link(r):=null; flush_node_list(link(temp_head));
17298  link(temp_head):=q;
17299  end;
17300end
17301
17302@ The current line to be justified appears in a horizontal list starting
17303at |link(temp_head)| and ending at |cur_break(cur_p)|. If |cur_break(cur_p)| is
17304a glue node, we reset the glue to equal the |right_skip| glue; otherwise
17305we append the |right_skip| glue at the right. If |cur_break(cur_p)| is a
17306discretionary node, we modify the list so that the discretionary break
17307is compulsory, and we set |disc_break| to |true|. We also append
17308the |left_skip| glue at the left of the line, unless it is zero.
17309
17310@<Justify the line ending at breakpoint |cur_p|, and append it...@>=
17311@<Modify the end of the line to reflect the nature of the break and to include
17312  \.{\\rightskip}; also set the proper value of |disc_break|@>;
17313@<Put the \(l)\.{\\leftskip} glue at the left and detach this line@>;
17314@<Call the packaging subroutine, setting |just_box| to the justified box@>;
17315@<Append the new box to the current vertical list, followed by the list of
17316  special nodes taken out of the box by the packager@>;
17317@<Append a penalty node, if a nonzero penalty is appropriate@>
17318
17319@ At the end of the following code, |q| will point to the final node on the
17320list about to be justified.
17321
17322@<Modify the end of the line...@>=
17323q:=cur_break(cur_p); disc_break:=false; post_disc_break:=false;
17324if q<>null then {|q| cannot be a |char_node|}
17325  if type(q)=glue_node then
17326    begin delete_glue_ref(glue_ptr(q));
17327    glue_ptr(q):=right_skip;
17328    subtype(q):=right_skip_code+1; add_glue_ref(right_skip);
17329    goto done;
17330    end
17331  else  begin if type(q)=disc_node then
17332      @<Change discretionary to compulsory and set
17333        |disc_break:=true|@>
17334    else if (type(q)=math_node)or(type(q)=kern_node) then width(q):=0;
17335    end
17336else  begin q:=temp_head;
17337  while link(q)<>null do q:=link(q);
17338  end;
17339@<Put the \(r)\.{\\rightskip} glue after node |q|@>;
17340done:
17341
17342@ @<Change discretionary to compulsory...@>=
17343begin t:=replace_count(q);
17344@<Destroy the |t| nodes following |q|, and
17345   make |r| point to the following node@>;
17346if post_break(q)<>null then @<Transplant the post-break list@>;
17347if pre_break(q)<>null then @<Transplant the pre-break list@>;
17348link(q):=r; disc_break:=true;
17349end
17350
17351@ @<Destroy the |t| nodes following |q|...@>=
17352if t=0 then r:=link(q)
17353else  begin r:=q;
17354  while t>1 do
17355    begin r:=link(r); decr(t);
17356    end;
17357  s:=link(r);
17358  r:=link(s); link(s):=null;
17359  flush_node_list(link(q)); replace_count(q):=0;
17360  end
17361
17362@ We move the post-break list from inside node |q| to the main list by
17363re\-attaching it just before the present node |r|, then resetting |r|.
17364
17365@<Transplant the post-break list@>=
17366begin s:=post_break(q);
17367while link(s)<>null do s:=link(s);
17368link(s):=r; r:=post_break(q); post_break(q):=null; post_disc_break:=true;
17369end
17370
17371@ We move the pre-break list from inside node |q| to the main list by
17372re\-attaching it just after the present node |q|, then resetting |q|.
17373
17374@<Transplant the pre-break list@>=
17375begin s:=pre_break(q); link(q):=s;
17376while link(s)<>null do s:=link(s);
17377pre_break(q):=null; q:=s;
17378end
17379
17380@ @<Put the \(r)\.{\\rightskip} glue after node |q|@>=
17381r:=new_param_glue(right_skip_code); link(r):=link(q); link(q):=r; q:=r
17382
17383@ The following code begins with |q| at the end of the list to be
17384justified. It ends with |q| at the beginning of that list, and with
17385|link(temp_head)| pointing to the remainder of the paragraph, if any.
17386
17387@<Put the \(l)\.{\\leftskip} glue at the left...@>=
17388r:=link(q); link(q):=null; q:=link(temp_head); link(temp_head):=r;
17389if left_skip<>zero_glue then
17390  begin r:=new_param_glue(left_skip_code);
17391  link(r):=q; q:=r;
17392  end
17393
17394@ @<Append the new box to the current vertical list...@>=
17395append_to_vlist(just_box);
17396if adjust_head<>adjust_tail then
17397  begin link(tail):=link(adjust_head); tail:=adjust_tail;
17398   end;
17399adjust_tail:=null
17400
17401@ Now |q| points to the hlist that represents the current line of the
17402paragraph. We need to compute the appropriate line width, pack the
17403line into a box of this size, and shift the box by the appropriate
17404amount of indentation.
17405
17406@<Call the packaging subroutine...@>=
17407if cur_line>last_special_line then
17408  begin cur_width:=second_width; cur_indent:=second_indent;
17409  end
17410else if par_shape_ptr=null then
17411  begin cur_width:=first_width; cur_indent:=first_indent;
17412  end
17413else  begin cur_width:=mem[par_shape_ptr+2*cur_line].sc;
17414  cur_indent:=mem[par_shape_ptr+2*cur_line-1].sc;
17415  end;
17416adjust_tail:=adjust_head; just_box:=hpack(q,cur_width,exactly);
17417shift_amount(just_box):=cur_indent
17418
17419@ Penalties between the lines of a paragraph come from club and widow lines,
17420from the |inter_line_penalty| parameter, and from lines that end at
17421discretionary breaks.  Breaking between lines of a two-line paragraph gets
17422both club-line and widow-line penalties. The local variable |pen| will
17423be set to the sum of all relevant penalties for the current line, except
17424that the final line is never penalized.
17425
17426@<Append a penalty node, if a nonzero penalty is appropriate@>=
17427if cur_line+1<>best_line then
17428  begin pen:=inter_line_penalty;
17429  if cur_line=prev_graf+1 then pen:=pen+club_penalty;
17430  if cur_line+2=best_line then pen:=pen+final_widow_penalty;
17431  if disc_break then pen:=pen+broken_penalty;
17432  if pen<>0 then
17433    begin r:=new_penalty(pen);
17434    link(tail):=r; tail:=r;
17435    end;
17436  end
17437
17438@* \[40] Pre-hyphenation.
17439When the line-breaking routine is unable to find a feasible sequence of
17440breakpoints, it makes a second pass over the paragraph, attempting to
17441hyphenate the hyphenatable words. The goal of hyphenation is to insert
17442discretionary material into the paragraph so that there are more
17443potential places to break.
17444
17445The general rules for hyphenation are somewhat complex and technical,
17446because we want to be able to hyphenate words that are preceded or
17447followed by punctuation marks, and because we want the rules to work
17448for languages other than English. We also must contend with the fact
17449that hyphens might radically alter the ligature and kerning structure
17450of a word.
17451
17452A sequence of characters will be considered for hyphenation only if it
17453belongs to a ``potentially hyphenatable part'' of the current paragraph.
17454This is a sequence of nodes $p_0p_1\ldots p_m$ where $p_0$ is a glue node,
17455$p_1\ldots p_{m-1}$ are either character or ligature or whatsit or
17456implicit kern nodes, and $p_m$ is a glue or penalty or insertion or adjust
17457or mark or whatsit or explicit kern node.  (Therefore hyphenation is
17458disabled by boxes, math formulas, and discretionary nodes already inserted
17459by the user.) The ligature nodes among $p_1\ldots p_{m-1}$ are effectively
17460expanded into the original non-ligature characters; the kern nodes and
17461whatsits are ignored. Each character |c| is now classified as either a
17462nonletter (if |lc_code(c)=0|), a lowercase letter (if
17463|lc_code(c)=c|), or an uppercase letter (otherwise); an uppercase letter
17464is treated as if it were |lc_code(c)| for purposes of hyphenation. The
17465characters generated by $p_1\ldots p_{m-1}$ may begin with nonletters; let
17466$c_1$ be the first letter that is not in the middle of a ligature. Whatsit
17467nodes preceding $c_1$ are ignored; a whatsit found after $c_1$ will be the
17468terminating node $p_m$. All characters that do not have the same font as
17469$c_1$ will be treated as nonletters. The |hyphen_char| for that font
17470must be between 0 and 255, otherwise hyphenation will not be attempted.
17471\TeX\ looks ahead for as many consecutive letters $c_1\ldots c_n$ as
17472possible; however, |n| must be less than 64, so a character that would
17473otherwise be $c_{64}$ is effectively not a letter. Furthermore $c_n$ must
17474not be in the middle of a ligature.  In this way we obtain a string of
17475letters $c_1\ldots c_n$ that are generated by nodes $p_a\ldots p_b$, where
17476|1<=a<=b+1<=m|. If |n>=l_hyf+r_hyf|, this string qualifies for hyphenation;
17477however, |uc_hyph| must be positive, if $c_1$ is uppercase.
17478
17479The hyphenation process takes place in three stages. First, the candidate
17480sequence $c_1\ldots c_n$ is found; then potential positions for hyphens
17481are determined by referring to hyphenation tables; and finally, the nodes
17482$p_a\ldots p_b$ are replaced by a new sequence of nodes that includes the
17483discretionary breaks found.
17484
17485Fortunately, we do not have to do all this calculation very often, because
17486of the way it has been taken out of \TeX's inner loop. For example, when
17487the second edition of the author's 700-page book {\sl Seminumerical
17488Algorithms} was typeset by \TeX, only about 1.2 hyphenations needed to be
17489@^Knuth, Donald Ervin@>
17490tried per paragraph, since the line breaking algorithm needed to use two
17491passes on only about 5 per cent of the paragraphs.
17492
17493@<Initialize for hyphenating...@>=
17494begin @!init if trie_not_ready then init_trie;@+tini@;@/
17495cur_lang:=init_cur_lang; l_hyf:=init_l_hyf; r_hyf:=init_r_hyf;
17496end
17497
17498@ The letters $c_1\ldots c_n$ that are candidates for hyphenation are placed
17499into an array called |hc|; the number |n| is placed into |hn|; pointers to
17500nodes $p_{a-1}$ and~$p_b$ in the description above are placed into variables
17501|ha| and |hb|; and the font number is placed into |hf|.
17502
17503@<Glob...@>=
17504@!hc:array[0..65] of 0..256; {word to be hyphenated}
17505@!hn:small_number; {the number of positions occupied in |hc|}
17506@!ha,@!hb:pointer; {nodes |ha..hb| should be replaced by the hyphenated result}
17507@!hf:internal_font_number; {font number of the letters in |hc|}
17508@!hu:array[0..63] of 0..256; {like |hc|, before conversion to lowercase}
17509@!hyf_char:integer; {hyphen character of the relevant font}
17510@!cur_lang,@!init_cur_lang:ASCII_code; {current hyphenation table of interest}
17511@!l_hyf,@!r_hyf,@!init_l_hyf,@!init_r_hyf:integer; {limits on fragment sizes}
17512@!hyf_bchar:halfword; {boundary character after $c_n$}
17513
17514@ Hyphenation routines need a few more local variables.
17515
17516@<Local variables for line...@>=
17517@!j:small_number; {an index into |hc| or |hu|}
17518@!c:0..255; {character being considered for hyphenation}
17519
17520@ When the following code is activated, the |line_break| procedure is in its
17521second pass, and |cur_p| points to a glue node.
17522
17523@<Try to hyphenate...@>=
17524begin prev_s:=cur_p; s:=link(prev_s);
17525if s<>null then
17526  begin @<Skip to node |ha|, or |goto done1| if no hyphenation
17527    should be attempted@>;
17528  if l_hyf+r_hyf>63 then goto done1;
17529  @<Skip to node |hb|, putting letters into |hu| and |hc|@>;
17530  @<Check that the nodes following |hb| permit hyphenation and that at least
17531    |l_hyf+r_hyf| letters have been found, otherwise |goto done1|@>;
17532  hyphenate;
17533  end;
17534done1: end
17535
17536@ @<Declare subprocedures for |line_break|@>=
17537@t\4@>@<Declare the function called |reconstitute|@>
17538procedure hyphenate;
17539label common_ending,done,found,found1,found2,not_found,exit;
17540var @<Local variables for hyphenation@>@;
17541begin @<Find hyphen locations for the word in |hc|, or |return|@>;
17542@<If no hyphens were found, |return|@>;
17543@<Replace nodes |ha..hb| by a sequence of nodes that includes
17544  the discretionary hyphens@>;
17545exit:end;
17546
17547@ The first thing we need to do is find the node |ha| just before the
17548first letter.
17549
17550@<Skip to node |ha|, or |goto done1|...@>=
17551loop@+  begin if is_char_node(s) then
17552    begin c:=qo(character(s)); hf:=font(s);
17553    end
17554  else if type(s)=ligature_node then
17555    if lig_ptr(s)=null then goto continue
17556    else begin q:=lig_ptr(s); c:=qo(character(q)); hf:=font(q);
17557      end
17558  else if (type(s)=kern_node)and(subtype(s)=normal) then goto continue
17559  else if type(s)=whatsit_node then
17560    begin @<Advance \(p)past a whatsit node in the \(p)pre-hyphenation loop@>;
17561    goto continue;
17562    end
17563  else goto done1;
17564  if lc_code(c)<>0 then
17565    if (lc_code(c)=c)or(uc_hyph>0) then goto done2
17566    else goto done1;
17567continue: prev_s:=s; s:=link(prev_s);
17568  end;
17569done2: hyf_char:=hyphen_char[hf];
17570if hyf_char<0 then goto done1;
17571if hyf_char>255 then goto done1;
17572ha:=prev_s
17573
17574@ The word to be hyphenated is now moved to the |hu| and |hc| arrays.
17575
17576@<Skip to node |hb|, putting letters...@>=
17577hn:=0;
17578loop@+  begin if is_char_node(s) then
17579    begin if font(s)<>hf then goto done3;
17580    hyf_bchar:=character(s); c:=qo(hyf_bchar);
17581    if lc_code(c)=0 then goto done3;
17582    if hn=63 then goto done3;
17583    hb:=s; incr(hn); hu[hn]:=c; hc[hn]:=lc_code(c); hyf_bchar:=non_char;
17584    end
17585  else if type(s)=ligature_node then
17586    @<Move the characters of a ligature node to |hu| and |hc|;
17587      but |goto done3| if they are not all letters@>
17588  else if (type(s)=kern_node)and(subtype(s)=normal) then
17589    begin hb:=s;
17590    hyf_bchar:=font_bchar[hf];
17591    end
17592  else goto done3;
17593  s:=link(s);
17594  end;
17595done3:
17596
17597@ We let |j| be the index of the character being stored when a ligature node
17598is being expanded, since we do not want to advance |hn| until we are sure
17599that the entire ligature consists of letters. Note that it is possible
17600to get to |done3| with |hn=0| and |hb| not set to any value.
17601
17602@<Move the characters of a ligature node to |hu| and |hc|...@>=
17603begin if font(lig_char(s))<>hf then goto done3;
17604j:=hn; q:=lig_ptr(s);@+if q>null then hyf_bchar:=character(q);
17605while q>null do
17606  begin c:=qo(character(q));
17607  if lc_code(c)=0 then goto done3;
17608  if j=63 then goto done3;
17609  incr(j); hu[j]:=c; hc[j]:=lc_code(c);@/
17610  q:=link(q);
17611  end;
17612hb:=s; hn:=j;
17613if odd(subtype(s)) then hyf_bchar:=font_bchar[hf]@+else hyf_bchar:=non_char;
17614end
17615
17616@ @<Check that the nodes following |hb| permit hyphenation...@>=
17617if hn<l_hyf+r_hyf then goto done1; {|l_hyf| and |r_hyf| are |>=1|}
17618loop@+  begin if not(is_char_node(s)) then
17619    case type(s) of
17620    ligature_node: do_nothing;
17621    kern_node: if subtype(s)<>normal then goto done4;
17622    whatsit_node,glue_node,penalty_node,ins_node,adjust_node,mark_node:
17623      goto done4;
17624    othercases goto done1
17625    endcases;
17626  s:=link(s);
17627  end;
17628done4:
17629
17630@* \[41] Post-hyphenation.
17631If a hyphen may be inserted between |hc[j]| and |hc[j+1]|, the hyphenation
17632procedure will set |hyf[j]| to some small odd number. But before we look
17633at \TeX's hyphenation procedure, which is independent of the rest of the
17634line-breaking algorithm, let us consider what we will do with the hyphens
17635it finds, since it is better to work on this part of the program before
17636forgetting what |ha| and |hb|, etc., are all about.
17637
17638@<Glob...@>=
17639@!hyf:array [0..64] of 0..9; {odd values indicate discretionary hyphens}
17640@!init_list:pointer; {list of punctuation characters preceding the word}
17641@!init_lig:boolean; {does |init_list| represent a ligature?}
17642@!init_lft:boolean; {if so, did the ligature involve a left boundary?}
17643
17644@ @<Local variables for hyphenation@>=
17645@!i,@!j,@!l:0..65; {indices into |hc| or |hu|}
17646@!q,@!r,@!s:pointer; {temporary registers for list manipulation}
17647@!bchar:halfword; {right boundary character of hyphenated word, or |non_char|}
17648
17649@ \TeX\ will never insert a hyphen that has fewer than
17650\.{\\lefthyphenmin} letters before it or fewer than
17651\.{\\righthyphenmin} after it; hence, a short word has
17652comparatively little chance of being hyphenated. If no hyphens have
17653been found, we can save time by not having to make any changes to the
17654paragraph.
17655
17656@<If no hyphens were found, |return|@>=
17657for j:=l_hyf to hn-r_hyf do if odd(hyf[j]) then goto found1;
17658return;
17659found1:
17660
17661@ If hyphens are in fact going to be inserted, \TeX\ first deletes the
17662subsequence of nodes between |ha| and~|hb|. An attempt is made to
17663preserve the effect that implicit boundary characters and punctuation marks
17664had on ligatures inside the hyphenated word, by storing a left boundary or
17665preceding character in |hu[0]| and by storing a possible right boundary
17666in |bchar|. We set |j:=0| if |hu[0]| is to be part of the reconstruction;
17667otherwise |j:=1|.
17668The variable |s| will point to the tail of the current hlist, and
17669|q| will point to the node following |hb|, so that
17670things can be hooked up after we reconstitute the hyphenated word.
17671
17672@<Replace nodes |ha..hb| by a sequence of nodes...@>=
17673q:=link(hb); link(hb):=null; r:=link(ha); link(ha):=null; bchar:=hyf_bchar;
17674if is_char_node(ha) then
17675  if font(ha)<>hf then goto found2
17676  else begin init_list:=ha; init_lig:=false; hu[0]:=qo(character(ha));
17677    end
17678else if type(ha)=ligature_node then
17679  if font(lig_char(ha))<>hf then goto found2
17680  else begin init_list:=lig_ptr(ha); init_lig:=true; init_lft:=(subtype(ha)>1);
17681    hu[0]:=qo(character(lig_char(ha)));
17682    if init_list=null then if init_lft then
17683      begin hu[0]:=256; init_lig:=false;
17684      end; {in this case a ligature will be reconstructed from scratch}
17685    free_node(ha,small_node_size);
17686    end
17687else begin {no punctuation found; look for left boundary}
17688  if not is_char_node(r) then if type(r)=ligature_node then
17689   if subtype(r)>1 then goto found2;
17690  j:=1; s:=ha; init_list:=null; goto common_ending;
17691  end;
17692s:=cur_p; {we have |cur_p<>ha| because |type(cur_p)=glue_node|}
17693while link(s)<>ha do s:=link(s);
17694j:=0; goto common_ending;
17695found2: s:=ha; j:=0; hu[0]:=256; init_lig:=false; init_list:=null;
17696common_ending: flush_node_list(r);
17697@<Reconstitute nodes for the hyphenated word, inserting discretionary hyphens@>;
17698flush_list(init_list)
17699
17700@ We must now face the fact that the battle is not over, even though the
17701{\def\!{\kern-1pt}%
17702hyphens have been found: The process of reconstituting a word can be nontrivial
17703because ligatures might change when a hyphen is present. {\sl The \TeX book\/}
17704discusses the difficulties of the word ``difficult'', and
17705the discretionary material surrounding a
17706hyphen can be considerably more complex than that. Suppose
17707\.{abcdef} is a word in a font for which the only ligatures are \.{b\!c},
17708\.{c\!d}, \.{d\!e}, and \.{e\!f}. If this word permits hyphenation
17709between \.b and \.c, the two patterns with and without hyphenation are
17710$\.a\,\.b\,\.-\,\.{c\!d}\,\.{e\!f}$ and $\.a\,\.{b\!c}\,\.{d\!e}\,\.f$.
17711Thus the insertion of a hyphen might cause effects to ripple arbitrarily
17712far into the rest of the word. A further complication arises if additional
17713hyphens appear together with such rippling, e.g., if the word in the
17714example just given could also be hyphenated between \.c and \.d; \TeX\
17715avoids this by simply ignoring the additional hyphens in such weird cases.}
17716
17717Still further complications arise in the presence of ligatures that do not
17718delete the original characters. When punctuation precedes the word being
17719hyphenated, \TeX's method is not perfect under all possible scenarios,
17720because punctuation marks and letters can propagate information back and forth.
17721For example, suppose the original pre-hyphenation pair
17722\.{*a} changes to \.{*y} via a \.{\?=:} ligature, which changes to \.{xy}
17723via a \.{=:\?} ligature; if $p_{a-1}=\.x$ and $p_a=\.y$, the reconstitution
17724procedure isn't smart enough to obtain \.{xy} again. In such cases the
17725font designer should include a ligature that goes from \.{xa} to \.{xy}.
17726
17727@ The processing is facilitated by a subroutine called |reconstitute|. Given
17728a string of characters $x_j\ldots x_n$, there is a smallest index $m\ge j$
17729such that the ``translation'' of $x_j\ldots x_n$ by ligatures and kerning
17730has the form $y_1\ldots y_t$ followed by the translation of $x_{m+1}\ldots x_n$,
17731where $y_1\ldots y_t$ is some nonempty sequence of character, ligature, and
17732kern nodes. We call $x_j\ldots x_m$ a ``cut prefix'' of $x_j\ldots x_n$.
17733For example, if $x_1x_2x_3=\.{fly}$, and if the font contains `fl' as a
17734ligature and a kern between `fl' and `y', then $m=2$, $t=2$, and $y_1$ will
17735be a ligature node for `fl' followed by an appropriate kern node~$y_2$.
17736In the most common case, $x_j$~forms no ligature with $x_{j+1}$ and we
17737simply have $m=j$, $y_1=x_j$. If $m<n$ we can repeat the procedure on
17738$x_{m+1}\ldots x_n$ until the entire translation has been found.
17739
17740The |reconstitute| function returns the integer $m$ and puts the nodes
17741$y_1\ldots y_t$ into a linked list starting at |link(hold_head)|,
17742getting the input $x_j\ldots x_n$ from the |hu| array. If $x_j=256$,
17743we consider $x_j$ to be an implicit left boundary character; in this
17744case |j| must be strictly less than~|n|. There is a
17745parameter |bchar|, which is either 256 or an implicit right boundary character
17746assumed to be present just following~$x_n$. (The value |hu[n+1]| is never
17747explicitly examined, but the algorithm imagines that |bchar| is there.)
17748
17749If there exists an index |k| in the range $j\le k\le m$ such that |hyf[k]|
17750is odd and such that the result of |reconstitute| would have been different
17751if $x_{k+1}$ had been |hchar|, then |reconstitute| sets |hyphen_passed|
17752to the smallest such~|k|. Otherwise it sets |hyphen_passed| to zero.
17753
17754A special convention is used in the case |j=0|: Then we assume that the
17755translation of |hu[0]| appears in a special list of charnodes starting at
17756|init_list|; moreover, if |init_lig| is |true|, then |hu[0]| will be
17757a ligature character, involving a left boundary if |init_lft| is |true|.
17758This facility is provided for cases when a hyphenated
17759word is preceded by punctuation (like single or double quotes) that might
17760affect the translation of the beginning of the word.
17761
17762@<Glob...@>=
17763@!hyphen_passed:small_number; {first hyphen in a ligature, if any}
17764
17765@ @<Declare the function called |reconstitute|@>=
17766function reconstitute(@!j,@!n:small_number;@!bchar,@!hchar:halfword):
17767  small_number;
17768label continue,done;
17769var @!p:pointer; {temporary register for list manipulation}
17770@!t:pointer; {a node being appended to}
17771@!q:four_quarters; {character information or a lig/kern instruction}
17772@!cur_rh:halfword; {hyphen character for ligature testing}
17773@!test_char:halfword; {hyphen or other character for ligature testing}
17774@!w:scaled; {amount of kerning}
17775@!k:font_index; {position of current lig/kern instruction}
17776begin hyphen_passed:=0; t:=hold_head; w:=0; link(hold_head):=null;
17777 {at this point |ligature_present=lft_hit=rt_hit=false|}
17778@<Set up data structures with the cursor following position |j|@>;
17779continue:@<If there's a ligature or kern at the cursor position, update the data
17780  structures, possibly advancing~|j|; continue until the cursor moves@>;
17781@<Append a ligature and/or kern to the translation;
17782  |goto continue| if the stack of inserted ligatures is nonempty@>;
17783reconstitute:=j;
17784end;
17785
17786@ The reconstitution procedure shares many of the global data structures
17787by which \TeX\ has processed the words before they were hyphenated.
17788There is an implied ``cursor'' between characters |cur_l| and |cur_r|;
17789these characters will be tested for possible ligature activity. If
17790|ligature_present| then |cur_l| is a ligature character formed from the
17791original characters following |cur_q| in the current translation list.
17792There is a ``ligature stack'' between the cursor and character |j+1|,
17793consisting of pseudo-ligature nodes linked together by their |link| fields.
17794This stack is normally empty unless a ligature command has created a new
17795character that will need to be processed later. A pseudo-ligature is
17796a special node having a |character| field that represents a potential
17797ligature and a |lig_ptr| field that points to a |char_node| or is |null|.
17798We have
17799$$|cur_r|=\cases{|character(lig_stack)|,&if |lig_stack>null|;\cr
17800  |qi(hu[j+1])|,&if |lig_stack=null| and |j<n|;\cr
17801  bchar,&if |lig_stack=null| and |j=n|.\cr}$$
17802
17803@<Glob...@>=
17804@!cur_l,@!cur_r:halfword; {characters before and after the cursor}
17805@!cur_q:pointer; {where a ligature should be detached}
17806@!lig_stack:pointer; {unfinished business to the right of the cursor}
17807@!ligature_present:boolean; {should a ligature node be made for |cur_l|?}
17808@!lft_hit,@!rt_hit:boolean; {did we hit a ligature with a boundary character?}
17809
17810@ @d append_charnode_to_t(#)== begin link(t):=get_avail; t:=link(t);
17811    font(t):=hf; character(t):=#;
17812    end
17813@d set_cur_r==begin if j<n then cur_r:=qi(hu[j+1])@+else cur_r:=bchar;
17814    if odd(hyf[j]) then cur_rh:=hchar@+else cur_rh:=non_char;
17815    end
17816
17817@<Set up data structures with the cursor following position |j|@>=
17818cur_l:=qi(hu[j]); cur_q:=t;
17819if j=0 then
17820  begin ligature_present:=init_lig; p:=init_list;
17821  if ligature_present then lft_hit:=init_lft;
17822  while p>null do
17823    begin append_charnode_to_t(character(p)); p:=link(p);
17824    end;
17825  end
17826else if cur_l<non_char then append_charnode_to_t(cur_l);
17827lig_stack:=null; set_cur_r
17828
17829@ We may want to look at the lig/kern program twice, once for a hyphen
17830and once for a normal letter. (The hyphen might appear after the letter
17831in the program, so we'd better not try to look for both at once.)
17832
17833@<If there's a ligature or kern at the cursor position, update...@>=
17834if cur_l=non_char then
17835  begin k:=bchar_label[hf];
17836  if k=non_address then goto done@+else q:=font_info[k].qqqq;
17837  end
17838else begin q:=char_info(hf)(cur_l);
17839  if char_tag(q)<>lig_tag then goto done;
17840  k:=lig_kern_start(hf)(q); q:=font_info[k].qqqq;
17841  if skip_byte(q)>stop_flag then
17842    begin k:=lig_kern_restart(hf)(q); q:=font_info[k].qqqq;
17843    end;
17844  end; {now |k| is the starting address of the lig/kern program}
17845if cur_rh<non_char then test_char:=cur_rh@+else test_char:=cur_r;
17846loop@+begin if next_char(q)=test_char then if skip_byte(q)<=stop_flag then
17847    if cur_rh<non_char then
17848      begin hyphen_passed:=j; hchar:=non_char; cur_rh:=non_char;
17849      goto continue;
17850      end
17851    else begin if hchar<non_char then if odd(hyf[j]) then
17852        begin hyphen_passed:=j; hchar:=non_char;
17853        end;
17854      if op_byte(q)<kern_flag then
17855      @<Carry out a ligature replacement, updating the cursor structure
17856        and possibly advancing~|j|; |goto continue| if the cursor doesn't
17857        advance, otherwise |goto done|@>;
17858      w:=char_kern(hf)(q); goto done; {this kern will be inserted below}
17859     end;
17860  if skip_byte(q)>=stop_flag then
17861    if cur_rh=non_char then goto done
17862    else begin cur_rh:=non_char; goto continue;
17863      end;
17864  k:=k+qo(skip_byte(q))+1; q:=font_info[k].qqqq;
17865  end;
17866done:
17867
17868@ @d wrap_lig(#)==if ligature_present then
17869    begin p:=new_ligature(hf,cur_l,link(cur_q));
17870    if lft_hit then
17871      begin subtype(p):=2; lft_hit:=false;
17872      end;
17873    if # then if lig_stack=null then
17874      begin incr(subtype(p)); rt_hit:=false;
17875      end;
17876    link(cur_q):=p; t:=p; ligature_present:=false;
17877    end
17878@d pop_lig_stack==begin if lig_ptr(lig_stack)>null then
17879    begin link(t):=lig_ptr(lig_stack); {this is a charnode for |hu[j+1]|}
17880    t:=link(t); incr(j);
17881    end;
17882  p:=lig_stack; lig_stack:=link(p); free_node(p,small_node_size);
17883  if lig_stack=null then set_cur_r@+else cur_r:=character(lig_stack);
17884  end {if |lig_stack| isn't |null| we have |cur_rh=non_char|}
17885
17886@<Append a ligature and/or kern to the translation...@>=
17887wrap_lig(rt_hit);
17888if w<>0 then
17889  begin link(t):=new_kern(w); t:=link(t); w:=0;
17890  end;
17891if lig_stack>null then
17892  begin cur_q:=t; cur_l:=character(lig_stack); ligature_present:=true;
17893  pop_lig_stack; goto continue;
17894  end
17895
17896@ @<Carry out a ligature replacement, updating the cursor structure...@>=
17897begin if cur_l=non_char then lft_hit:=true;
17898if j=n then if lig_stack=null then rt_hit:=true;
17899check_interrupt; {allow a way out in case there's an infinite ligature loop}
17900case op_byte(q) of
17901qi(1),qi(5):begin cur_l:=rem_byte(q); {\.{=:\?}, \.{=:\?>}}
17902  ligature_present:=true;
17903  end;
17904qi(2),qi(6):begin cur_r:=rem_byte(q); {\.{\?=:}, \.{\?=:>}}
17905  if lig_stack>null then character(lig_stack):=cur_r
17906  else begin lig_stack:=new_lig_item(cur_r);
17907    if j=n then bchar:=non_char
17908    else begin p:=get_avail; lig_ptr(lig_stack):=p;
17909      character(p):=qi(hu[j+1]); font(p):=hf;
17910      end;
17911    end;
17912  end;
17913qi(3):begin cur_r:=rem_byte(q); {\.{\?=:\?}}
17914  p:=lig_stack; lig_stack:=new_lig_item(cur_r); link(lig_stack):=p;
17915  end;
17916qi(7),qi(11):begin wrap_lig(false); {\.{\?=:\?>}, \.{\?=:\?>>}}
17917  cur_q:=t; cur_l:=rem_byte(q); ligature_present:=true;
17918  end;
17919othercases begin cur_l:=rem_byte(q); ligature_present:=true; {\.{=:}}
17920  if lig_stack>null then pop_lig_stack
17921  else if j=n then goto done
17922  else begin append_charnode_to_t(cur_r); incr(j); set_cur_r;
17923    end;
17924  end
17925endcases;
17926if op_byte(q)>qi(4) then if op_byte(q)<>qi(7) then goto done;
17927goto continue;
17928end
17929
17930@ Okay, we're ready to insert the potential hyphenations that were found.
17931When the following program is executed, we want to append the word
17932|hu[1..hn]| after node |ha|, and node |q| should be appended to the result.
17933During this process, the variable |i| will be a temporary
17934index into |hu|; the variable |j| will be an index to our current position
17935in |hu|; the variable |l| will be the counterpart of |j|, in a discretionary
17936branch; the variable |r| will point to new nodes being created; and
17937we need a few new local variables:
17938
17939@<Local variables for hyph...@>=
17940@!major_tail,@!minor_tail:pointer; {the end of lists in the main and
17941  discretionary branches being reconstructed}
17942@!c:ASCII_code; {character temporarily replaced by a hyphen}
17943@!c_loc:0..63; {where that character came from}
17944@!r_count:integer; {replacement count for discretionary}
17945@!hyf_node:pointer; {the hyphen, if it exists}
17946
17947@ When the following code is performed, |hyf[0]| and |hyf[hn]| will be zero.
17948
17949@<Reconstitute nodes for the hyphenated word...@>=
17950repeat l:=j; j:=reconstitute(j,hn,bchar,qi(hyf_char))+1;
17951if hyphen_passed=0 then
17952  begin link(s):=link(hold_head);
17953  while link(s)>null do s:=link(s);
17954  if odd(hyf[j-1]) then
17955    begin l:=j; hyphen_passed:=j-1; link(hold_head):=null;
17956    end;
17957  end;
17958if hyphen_passed>0 then
17959  @<Create and append a discretionary node as an alternative to the
17960    unhyphenated word, and continue to develop both branches until they
17961    become equivalent@>;
17962until j>hn;
17963link(s):=q
17964
17965@ In this repeat loop we will insert another discretionary if |hyf[j-1]| is
17966odd, when both branches of the previous discretionary end at position |j-1|.
17967Strictly speaking, we aren't justified in doing this, because we don't know
17968that a hyphen after |j-1| is truly independent of those branches. But in almost
17969all applications we would rather not lose a potentially valuable hyphenation
17970point. (Consider the word `difficult', where the letter `c' is in position |j|.)
17971
17972@d advance_major_tail==begin major_tail:=link(major_tail); incr(r_count);
17973    end
17974
17975@<Create and append a discretionary node as an alternative...@>=
17976repeat r:=get_node(small_node_size);
17977link(r):=link(hold_head); type(r):=disc_node;
17978major_tail:=r; r_count:=0;
17979while link(major_tail)>null do advance_major_tail;
17980i:=hyphen_passed; hyf[i]:=0;
17981@<Put the \(c)characters |hu[l..i]| and a hyphen into |pre_break(r)|@>;
17982@<Put the \(c)characters |hu[i+1..@,]| into |post_break(r)|, appending to this
17983  list and to |major_tail| until synchronization has been achieved@>;
17984@<Move pointer |s| to the end of the current list, and set |replace_count(r)|
17985  appropriately@>;
17986hyphen_passed:=j-1; link(hold_head):=null;
17987until not odd(hyf[j-1])
17988
17989@ The new hyphen might combine with the previous character via ligature
17990or kern. At this point we have |l-1<=i<j| and |i<hn|.
17991
17992@<Put the \(c)characters |hu[l..i]| and a hyphen into |pre_break(r)|@>=
17993minor_tail:=null; pre_break(r):=null; hyf_node:=new_character(hf,hyf_char);
17994if hyf_node<>null then
17995  begin incr(i); c:=hu[i]; hu[i]:=hyf_char; free_avail(hyf_node);
17996  end;
17997while l<=i do
17998  begin l:=reconstitute(l,i,font_bchar[hf],non_char)+1;
17999  if link(hold_head)>null then
18000    begin if minor_tail=null then pre_break(r):=link(hold_head)
18001    else link(minor_tail):=link(hold_head);
18002    minor_tail:=link(hold_head);
18003    while link(minor_tail)>null do minor_tail:=link(minor_tail);
18004    end;
18005  end;
18006if hyf_node<>null then
18007  begin hu[i]:=c; {restore the character in the hyphen position}
18008  l:=i; decr(i);
18009  end
18010
18011@ The synchronization algorithm begins with |l=i+1<=j|.
18012
18013@<Put the \(c)characters |hu[i+1..@,]| into |post_break(r)|...@>=
18014minor_tail:=null; post_break(r):=null; c_loc:=0;
18015if bchar_label[hf]<>non_address then {put left boundary at beginning of new line}
18016  begin decr(l); c:=hu[l]; c_loc:=l; hu[l]:=256;
18017  end;
18018while l<j do
18019  begin repeat l:=reconstitute(l,hn,bchar,non_char)+1;
18020  if c_loc>0 then
18021    begin hu[c_loc]:=c; c_loc:=0;
18022    end;
18023  if link(hold_head)>null then
18024    begin if minor_tail=null then post_break(r):=link(hold_head)
18025    else link(minor_tail):=link(hold_head);
18026    minor_tail:=link(hold_head);
18027    while link(minor_tail)>null do minor_tail:=link(minor_tail);
18028    end;
18029  until l>=j;
18030  while l>j do
18031    @<Append characters of |hu[j..@,]| to |major_tail|, advancing~|j|@>;
18032  end
18033
18034@ @<Append characters of |hu[j..@,]|...@>=
18035begin j:=reconstitute(j,hn,bchar,non_char)+1;
18036link(major_tail):=link(hold_head);
18037while link(major_tail)>null do advance_major_tail;
18038end
18039
18040@ Ligature insertion can cause a word to grow exponentially in size. Therefore
18041we must test the size of |r_count| here, even though the hyphenated text
18042was at most 63 characters long.
18043
18044@<Move pointer |s| to the end of the current list...@>=
18045if r_count>127 then {we have to forget the discretionary hyphen}
18046  begin link(s):=link(r); link(r):=null; flush_node_list(r);
18047  end
18048else begin link(s):=r; replace_count(r):=r_count;
18049  end;
18050s:=major_tail
18051
18052@* \[42] Hyphenation.
18053When a word |hc[1..hn]| has been set up to contain a candidate for hyphenation,
18054\TeX\ first looks to see if it is in the user's exception dictionary. If not,
18055hyphens are inserted based on patterns that appear within the given word,
18056using an algorithm due to Frank~M. Liang.
18057@^Liang, Franklin Mark@>
18058
18059Let's consider Liang's method first, since it is much more interesting than the
18060exception-lookup routine.  The algorithm begins by setting |hyf[j]| to zero
18061for all |j|, and invalid characters are inserted into |hc[0]|
18062and |hc[hn+1]| to serve as delimiters. Then a reasonably fast method is
18063used to see which of a given set of patterns occurs in the word
18064|hc[0..(hn+1)]|. Each pattern $p_1\ldots p_k$ of length |k| has an associated
18065sequence of |k+1| numbers $n_0\ldots n_k$; and if the pattern occurs in
18066|hc[(j+1)..(j+k)]|, \TeX\ will set |hyf[j+i]:=@tmax@>(hyf[j+i],@t$n_i$@>)| for
18067|0<=i<=k|. After this has been done for each pattern that occurs, a
18068discretionary hyphen will be inserted between |hc[j]| and |hc[j+1]| when
18069|hyf[j]| is odd, as we have already seen.
18070
18071The set of patterns $p_1\ldots p_k$ and associated numbers $n_0\ldots n_k$
18072depends, of course, on the language whose words are being hyphenated, and
18073on the degree of hyphenation that is desired. A method for finding
18074appropriate |p|'s and |n|'s, from a given dictionary of words and acceptable
18075hyphenations, is discussed in Liang's Ph.D. thesis (Stanford University,
180761983); \TeX\ simply starts with the patterns and works from there.
18077
18078@ The patterns are stored in a compact table that is also efficient for
18079retrieval, using a variant of ``trie memory'' [cf.\ {\sl The Art of
18080Computer Programming \bf3} (1973), 481--505]. We can find each pattern
18081$p_1\ldots p_k$ by letting $z_0$ be one greater than the relevant language
18082index and then, for |1<=i<=k|,
18083setting |@t$z_i$@>:=trie_link@t$(z_{i-1})+p_i$@>|; the pattern will be
18084identified by the number $z_k$. Since all the pattern information is
18085packed together into a single |trie_link| array, it is necessary to
18086prevent confusion between the data from inequivalent patterns, so another
18087table is provided such that |trie_char@t$(z_i)=p_i$@>| for all |i|. There
18088is also a table |trie_op|$(z_k)$ to identify the numbers $n_0\ldots n_k$
18089associated with $p_1\ldots p_k$.
18090
18091Comparatively few different number sequences $n_0\ldots n_k$ actually occur,
18092since most of the |n|'s are generally zero. Therefore the number sequences
18093are encoded in such a way that |trie_op|$(z_k)$ is only one byte long.
18094If |trie_op(@t$z_k$@>)<>min_quarterword|, when $p_1\ldots p_k$ has matched
18095the letters in |hc[(l-k+1)..l@,]| of language |t|,
18096we perform all of the required operations
18097for this pattern by carrying out the following little program: Set
18098|v:=trie_op(@t$z_k$@>)|. Then set |v:=v+op_start[t]|,
18099|hyf[l-hyf_distance[v]]:=@tmax@>(hyf[l-hyf_distance[v]], hyf_num[v])|,
18100and |v:=hyf_next[v]|; repeat, if necessary, until |v=min_quarterword|.
18101
18102@<Types...@>=
18103@!trie_pointer=0..trie_size; {an index into |trie|}
18104
18105@ @d trie_link(#)==trie[#].rh {``downward'' link in a trie}
18106@d trie_char(#)==trie[#].b1 {character matched at this trie location}
18107@d trie_op(#)==trie[#].b0 {program for hyphenation at this trie location}
18108
18109@<Glob...@>=
18110@!trie:array[trie_pointer] of two_halves; {|trie_link|, |trie_char|, |trie_op|}
18111@!hyf_distance:array[1..trie_op_size] of small_number; {position |k-j| of $n_j$}
18112@!hyf_num:array[1..trie_op_size] of small_number; {value of $n_j$}
18113@!hyf_next:array[1..trie_op_size] of quarterword; {continuation code}
18114@!op_start:array[ASCII_code] of 0..trie_op_size; {offset for current language}
18115
18116@ @<Local variables for hyph...@>=
18117@!z:trie_pointer; {an index into |trie|}
18118@!v:integer; {an index into |hyf_distance|, etc.}
18119
18120@ Assuming that these auxiliary tables have been set up properly, the
18121hyphenation algorithm is quite short. In the following code we set |hc[hn+2]|
18122to the impossible value 256, in order to guarantee that |hc[hn+3]| will
18123never be fetched.
18124
18125@<Find hyphen locations for the word in |hc|...@>=
18126for j:=0 to hn do hyf[j]:=0;
18127@<Look for the word |hc[1..hn]| in the exception table, and |goto found| (with
18128  |hyf| containing the hyphens) if an entry is found@>;
18129if trie_char(cur_lang+1)<>qi(cur_lang) then return; {no patterns for |cur_lang|}
18130hc[0]:=0; hc[hn+1]:=0; hc[hn+2]:=256; {insert delimiters}
18131for j:=0 to hn-r_hyf+1 do
18132  begin z:=trie_link(cur_lang+1)+hc[j]; l:=j;
18133  while hc[l]=qo(trie_char(z)) do
18134    begin if trie_op(z)<>min_quarterword then
18135      @<Store \(m)maximum values in the |hyf| table@>;
18136    incr(l); z:=trie_link(z)+hc[l];
18137    end;
18138  end;
18139found: for j:=0 to l_hyf-1 do hyf[j]:=0;
18140for j:=0 to r_hyf-1 do hyf[hn-j]:=0
18141
18142@ @<Store \(m)maximum values in the |hyf| table@>=
18143begin v:=trie_op(z);
18144repeat v:=v+op_start[cur_lang]; i:=l-hyf_distance[v];
18145if hyf_num[v]>hyf[i] then hyf[i]:=hyf_num[v];
18146v:=hyf_next[v];
18147until v=min_quarterword;
18148end
18149
18150@ The exception table that is built by \TeX's \.{\\hyphenation} primitive is
18151organized as an ordered hash table [cf.\ Amble and Knuth, {\sl The Computer
18152@^Amble, Ole@> @^Knuth, Donald Ervin@>
18153Journal\/ \bf17} (1974), 135--142] using linear probing. If $\alpha$ and
18154$\beta$ are words, we will say that $\alpha<\beta$ if $\vert\alpha\vert<
18155\vert\beta\vert$ or if $\vert\alpha\vert=\vert\beta\vert$ and
18156$\alpha$ is lexicographically smaller than $\beta$. (The notation $\vert
18157\alpha\vert$ stands for the length of $\alpha$.) The idea of ordered hashing
18158is to arrange the table so that a given word $\alpha$ can be sought by computing
18159a hash address $h=h(\alpha)$ and then looking in table positions |h|, |h-1|,
18160\dots, until encountering the first word $\L\alpha$. If this word is
18161different from $\alpha$, we can conclude that $\alpha$ is not in the table.
18162
18163The words in the table point to lists in |mem| that specify hyphen positions
18164in their |info| fields. The list for $c_1\ldots c_n$ contains the number |k| if
18165the word $c_1\ldots c_n$ has a discretionary hyphen between $c_k$ and
18166$c_{k+1}$.
18167
18168@<Types...@>=
18169@!hyph_pointer=0..hyph_size; {an index into the ordered hash table}
18170
18171@ @<Glob...@>=
18172@!hyph_word:array[hyph_pointer] of str_number; {exception words}
18173@!hyph_list:array[hyph_pointer] of pointer; {lists of hyphen positions}
18174@!hyph_count:hyph_pointer; {the number of words in the exception dictionary}
18175
18176@ @<Local variables for init...@>=
18177@!z:hyph_pointer; {runs through the exception dictionary}
18178
18179@ @<Set init...@>=
18180for z:=0 to hyph_size do
18181  begin hyph_word[z]:=0; hyph_list[z]:=null;
18182  end;
18183hyph_count:=0;
18184
18185@ The algorithm for exception lookup is quite simple, as soon as we have
18186a few more local variables to work with.
18187
18188@<Local variables for hyph...@>=
18189@!h:hyph_pointer; {an index into |hyph_word| and |hyph_list|}
18190@!k:str_number; {an index into |str_start|}
18191@!u:pool_pointer; {an index into |str_pool|}
18192
18193@ First we compute the hash code |h|, then we search until we either
18194find the word or we don't. Words from different languages are kept
18195separate by appending the language code to the string.
18196
18197@<Look for the word |hc[1...@>=
18198h:=hc[1]; incr(hn); hc[hn]:=cur_lang;
18199for j:=2 to hn do h:=(h+h+hc[j]) mod hyph_size;
18200loop@+  begin @<If the string |hyph_word[h]| is less than \(hc)|hc[1..hn]|,
18201    |goto not_found|; but if the two strings are equal,
18202    set |hyf| to the hyphen positions and |goto found|@>;
18203  if h>0 then decr(h)@+else h:=hyph_size;
18204  end;
18205not_found: decr(hn)
18206
18207@ @<If the string |hyph_word[h]| is less than \(hc)...@>=
18208k:=hyph_word[h]; if k=0 then goto not_found;
18209if length(k)<hn then goto not_found;
18210if length(k)=hn then
18211  begin j:=1; u:=str_start[k];
18212  repeat if so(str_pool[u])<hc[j] then goto not_found;
18213  if so(str_pool[u])>hc[j] then goto done;
18214  incr(j); incr(u);
18215  until j>hn;
18216  @<Insert hyphens as specified in |hyph_list[h]|@>;
18217  decr(hn); goto found;
18218  end;
18219done:
18220
18221@ @<Insert hyphens as specified...@>=
18222s:=hyph_list[h];
18223while s<>null do
18224  begin hyf[info(s)]:=1; s:=link(s);
18225  end
18226
18227@ @<Search |hyph_list| for pointers to |p|@>=
18228for q:=0 to hyph_size do
18229  begin if hyph_list[q]=p then
18230    begin print_nl("HYPH("); print_int(q); print_char(")");
18231    end;
18232  end
18233
18234@ We have now completed the hyphenation routine, so the |line_break| procedure
18235is finished at last. Since the hyphenation exception table is fresh in our
18236minds, it's a good time to deal with the routine that adds new entries to it.
18237
18238When \TeX\ has scanned `\.{\\hyphenation}', it calls on a procedure named
18239|new_hyph_exceptions| to do the right thing.
18240
18241@d set_cur_lang==if language<=0 then cur_lang:=0
18242  else if language>255 then cur_lang:=0
18243  else cur_lang:=language
18244
18245@p procedure new_hyph_exceptions; {enters new exceptions}
18246label reswitch, exit, found, not_found;
18247var n:0..64; {length of current word; not always a |small_number|}
18248@!j:0..64; {an index into |hc|}
18249@!h:hyph_pointer; {an index into |hyph_word| and |hyph_list|}
18250@!k:str_number; {an index into |str_start|}
18251@!p:pointer; {head of a list of hyphen positions}
18252@!q:pointer; {used when creating a new node for list |p|}
18253@!s,@!t:str_number; {strings being compared or stored}
18254@!u,@!v:pool_pointer; {indices into |str_pool|}
18255begin scan_left_brace; {a left brace must follow \.{\\hyphenation}}
18256set_cur_lang;
18257@<Enter as many hyphenation exceptions as are listed,
18258until coming to a right brace; then |return|@>;
18259exit:end;
18260
18261@ @<Enter as many...@>=
18262n:=0; p:=null;
18263loop@+  begin get_x_token;
18264  reswitch: case cur_cmd of
18265  letter,other_char,char_given:@<Append a new letter or hyphen@>;
18266  char_num: begin scan_char_num; cur_chr:=cur_val; cur_cmd:=char_given;
18267    goto reswitch;
18268    end;
18269  spacer,right_brace: begin if n>1 then @<Enter a hyphenation exception@>;
18270    if cur_cmd=right_brace then return;
18271    n:=0; p:=null;
18272    end;
18273  othercases @<Give improper \.{\\hyphenation} error@>
18274  endcases;
18275  end
18276
18277@ @<Give improper \.{\\hyph...@>=
18278begin print_err("Improper "); print_esc("hyphenation");
18279@.Improper \\hyphenation...@>
18280  print(" will be flushed");
18281help2("Hyphenation exceptions must contain only letters")@/
18282  ("and hyphens. But continue; I'll forgive and forget.");
18283error;
18284end
18285
18286@ @<Append a new letter or hyphen@>=
18287if cur_chr="-" then @<Append the value |n| to list |p|@>
18288else  begin if lc_code(cur_chr)=0 then
18289    begin print_err("Not a letter");
18290@.Not a letter@>
18291    help2("Letters in \hyphenation words must have \lccode>0.")@/
18292      ("Proceed; I'll ignore the character I just read.");
18293    error;
18294    end
18295  else if n<63 then
18296    begin incr(n); hc[n]:=lc_code(cur_chr);
18297    end;
18298  end
18299
18300@ @<Append the value |n| to list |p|@>=
18301begin if n<63 then
18302  begin q:=get_avail; link(q):=p; info(q):=n; p:=q;
18303  end;
18304end
18305
18306@ @<Enter a hyphenation exception@>=
18307begin incr(n); hc[n]:=cur_lang; str_room(n); h:=0;
18308for j:=1 to n do
18309  begin h:=(h+h+hc[j]) mod hyph_size;
18310  append_char(hc[j]);
18311  end;
18312s:=make_string;
18313@<Insert the \(p)pair |(s,p)| into the exception table@>;
18314end
18315
18316@ @<Insert the \(p)pair |(s,p)|...@>=
18317if hyph_count=hyph_size then overflow("exception dictionary",hyph_size);
18318@:TeX capacity exceeded exception dictionary}{\quad exception dictionary@>
18319incr(hyph_count);
18320while hyph_word[h]<>0 do
18321  begin @<If the string |hyph_word[h]| is less than \(or)or equal to
18322  |s|, interchange |(hyph_word[h],hyph_list[h])| with |(s,p)|@>;
18323  if h>0 then decr(h)@+else h:=hyph_size;
18324  end;
18325hyph_word[h]:=s; hyph_list[h]:=p
18326
18327@ @<If the string |hyph_word[h]| is less than \(or)...@>=
18328k:=hyph_word[h];
18329if length(k)<length(s) then goto found;
18330if length(k)>length(s) then goto not_found;
18331u:=str_start[k]; v:=str_start[s];
18332repeat if str_pool[u]<str_pool[v] then goto found;
18333if str_pool[u]>str_pool[v] then goto not_found;
18334incr(u); incr(v);
18335until u=str_start[k+1];
18336found:q:=hyph_list[h]; hyph_list[h]:=p; p:=q;@/
18337t:=hyph_word[h]; hyph_word[h]:=s; s:=t;
18338not_found:
18339
18340@* \[43] Initializing the hyphenation tables.
18341The trie for \TeX's hyphenation algorithm is built from a sequence of
18342patterns following a \.{\\patterns} specification. Such a specification
18343is allowed only in \.{INITEX}, since the extra memory for auxiliary tables
18344and for the initialization program itself would only clutter up the
18345production version of \TeX\ with a lot of deadwood.
18346
18347The first step is to build a trie that is linked, instead of packed
18348into sequential storage, so that insertions are readily made.
18349After all patterns have been processed, \.{INITEX}
18350compresses the linked trie by identifying common subtries. Finally the
18351trie is packed into the efficient sequential form that the hyphenation
18352algorithm actually uses.
18353
18354@<Declare subprocedures for |line_break|@>=
18355@!init @<Declare procedures for preprocessing hyphenation patterns@>@;
18356tini
18357
18358@ Before we discuss trie building in detail, let's consider the simpler
18359problem of creating the |hyf_distance|, |hyf_num|, and |hyf_next| arrays.
18360
18361Suppose, for example, that \TeX\ reads the pattern `\.{ab2cde1}'. This is
18362a pattern of length 5, with $n_0\ldots n_5=0\,0\,2\,0\,0\,1$ in the
18363notation above. We want the corresponding |trie_op| code |v| to have
18364|hyf_distance[v]=3|, |hyf_num[v]=2|, and |hyf_next[v]=@t$v^\prime$@>|,
18365where the auxiliary |trie_op| code $v^\prime$ has
18366|hyf_distance[@t$v^\prime$@>]=0|, |hyf_num[@t$v^\prime$@>]=1|, and
18367|hyf_next[@t$v^\prime$@>]=min_quarterword|.
18368
18369\TeX\ computes an appropriate value |v| with the |new_trie_op| subroutine
18370below, by setting
18371$$\hbox{|@t$v^\prime$@>:=new_trie_op(0,1,min_quarterword)|,\qquad
18372|v:=new_trie_op(3,2,@t$v^\prime$@>)|.}$$
18373This subroutine looks up its three
18374parameters in a special hash table, assigning a new value only if these
18375three have not appeared before for the current language.
18376
18377The hash table is called |trie_op_hash|, and the number of entries it contains
18378is |trie_op_ptr|.
18379
18380@<Glob...@>=
18381@!init@! trie_op_hash:array[-trie_op_size..trie_op_size] of 0..trie_op_size;
18382  {trie op codes for quadruples}
18383@!trie_used:array[ASCII_code] of quarterword;
18384  {largest opcode used so far for this language}
18385@!trie_op_lang:array[1..trie_op_size] of ASCII_code;
18386  {language part of a hashed quadruple}
18387@!trie_op_val:array[1..trie_op_size] of quarterword;
18388  {opcode corresponding to a hashed quadruple}
18389@!trie_op_ptr:0..trie_op_size; {number of stored ops so far}
18390tini
18391
18392@ It's tempting to remove the |overflow| stops in the following procedure;
18393|new_trie_op| could return |min_quarterword| (thereby simply ignoring
18394part of a hyphenation pattern) instead of aborting the job. However, that would
18395lead to different hyphenation results on different installations of \TeX\
18396using the same patterns. The |overflow| stops are necessary for portability
18397of patterns.
18398
18399@<Declare procedures for preprocessing hyph...@>=
18400function new_trie_op(@!d,@!n:small_number;@!v:quarterword):quarterword;
18401label exit;
18402var h:-trie_op_size..trie_op_size; {trial hash location}
18403@!u:quarterword; {trial op code}
18404@!l:0..trie_op_size; {pointer to stored data}
18405begin h:=abs(n+313*d+361*v+1009*cur_lang) mod (trie_op_size+trie_op_size)
18406  - trie_op_size;
18407loop@+  begin l:=trie_op_hash[h];
18408  if l=0 then {empty position found for a new op}
18409    begin if trie_op_ptr=trie_op_size then
18410      overflow("pattern memory ops",trie_op_size);
18411    u:=trie_used[cur_lang];
18412    if u=max_quarterword then
18413      overflow("pattern memory ops per language",
18414        max_quarterword-min_quarterword);
18415    incr(trie_op_ptr); incr(u); trie_used[cur_lang]:=u;
18416    hyf_distance[trie_op_ptr]:=d;
18417    hyf_num[trie_op_ptr]:=n; hyf_next[trie_op_ptr]:=v;
18418    trie_op_lang[trie_op_ptr]:=cur_lang; trie_op_hash[h]:=trie_op_ptr;
18419    trie_op_val[trie_op_ptr]:=u; new_trie_op:=u; return;
18420    end;
18421  if (hyf_distance[l]=d)and(hyf_num[l]=n)and(hyf_next[l]=v)
18422   and(trie_op_lang[l]=cur_lang) then
18423    begin new_trie_op:=trie_op_val[l]; return;
18424    end;
18425  if h>-trie_op_size then decr(h)@+else h:=trie_op_size;
18426  end;
18427exit:end;
18428
18429@ After |new_trie_op| has compressed the necessary opcode information,
18430plenty of information is available to unscramble the data into the
18431final form needed by our hyphenation algorithm.
18432
18433@<Sort \(t)the hyphenation op tables into proper order@>=
18434op_start[0]:=-min_quarterword;
18435for j:=1 to 255 do op_start[j]:=op_start[j-1]+qo(trie_used[j-1]);
18436for j:=1 to trie_op_ptr do
18437  trie_op_hash[j]:=op_start[trie_op_lang[j]]+trie_op_val[j]; {destination}
18438for j:=1 to trie_op_ptr do while trie_op_hash[j]>j do
18439  begin k:=trie_op_hash[j];@/
18440  t:=hyf_distance[k]; hyf_distance[k]:=hyf_distance[j]; hyf_distance[j]:=t;@/
18441  t:=hyf_num[k]; hyf_num[k]:=hyf_num[j]; hyf_num[j]:=t;@/
18442  t:=hyf_next[k]; hyf_next[k]:=hyf_next[j]; hyf_next[j]:=t;@/
18443  trie_op_hash[j]:=trie_op_hash[k]; trie_op_hash[k]:=k;
18444  end
18445
18446@ Before we forget how to initialize the data structures that have been
18447mentioned so far, let's write down the code that gets them started.
18448
18449@<Initialize table entries...@>=
18450for k:=-trie_op_size to trie_op_size do trie_op_hash[k]:=0;
18451for k:=0 to 255 do trie_used[k]:=min_quarterword;
18452trie_op_ptr:=0;
18453
18454@ The linked trie that is used to preprocess hyphenation patterns appears
18455in several global arrays. Each node represents an instruction of the form
18456``if you see character |c|, then perform operation |o|, move to the
18457next character, and go to node |l|; otherwise go to node |r|.''
18458The four quantities |c|, |o|, |l|, and |r| are stored in four arrays
18459|trie_c|, |trie_o|, |trie_l|, and |trie_r|. The root of the trie
18460is |trie_l[0]|, and the number of nodes is |trie_ptr|. Null trie
18461pointers are represented by zero. To initialize the trie, we simply
18462set |trie_l[0]| and |trie_ptr| to zero. We also set |trie_c[0]| to some
18463arbitrary value, since the algorithm may access it.
18464
18465The algorithms maintain the condition
18466$$\hbox{|trie_c[trie_r[z]]>trie_c[z]|\qquad
18467whenever |z<>0| and |trie_r[z]<>0|};$$ in other words, sibling nodes are
18468ordered by their |c| fields.
18469
18470@d trie_root==trie_l[0] {root of the linked trie}
18471
18472@<Glob...@>=
18473@!init @!trie_c:packed array[trie_pointer] of packed_ASCII_code;
18474  {characters to match}
18475@t\hskip10pt@>@!trie_o:packed array[trie_pointer] of quarterword;
18476  {operations to perform}
18477@t\hskip10pt@>@!trie_l:packed array[trie_pointer] of trie_pointer;
18478  {left subtrie links}
18479@t\hskip10pt@>@!trie_r:packed array[trie_pointer] of trie_pointer;
18480  {right subtrie links}
18481@t\hskip10pt@>@!trie_ptr:trie_pointer; {the number of nodes in the trie}
18482@t\hskip10pt@>@!trie_hash:packed array[trie_pointer] of trie_pointer;
18483  {used to identify equivalent subtries}
18484tini
18485
18486@ Let us suppose that a linked trie has already been constructed.
18487Experience shows that we can often reduce its size by recognizing common
18488subtries; therefore another hash table is introduced for this purpose,
18489somewhat similar to |trie_op_hash|. The new hash table will be
18490initialized to zero.
18491
18492The function |trie_node(p)| returns |p| if |p| is distinct from other nodes
18493that it has seen, otherwise it returns the number of the first equivalent
18494node that it has seen.
18495
18496Notice that we might make subtries equivalent even if they correspond to
18497patterns for different languages, in which the trie ops might mean quite
18498different things. That's perfectly all right.
18499
18500@<Declare procedures for preprocessing hyph...@>=
18501function trie_node(@!p:trie_pointer):trie_pointer; {converts
18502  to a canonical form}
18503label exit;
18504var h:trie_pointer; {trial hash location}
18505@!q:trie_pointer; {trial trie node}
18506begin h:=abs(trie_c[p]+1009*trie_o[p]+@|
18507    2718*trie_l[p]+3142*trie_r[p]) mod trie_size;
18508loop@+  begin q:=trie_hash[h];
18509  if q=0 then
18510    begin trie_hash[h]:=p; trie_node:=p; return;
18511    end;
18512  if (trie_c[q]=trie_c[p])and(trie_o[q]=trie_o[p])and@|
18513    (trie_l[q]=trie_l[p])and(trie_r[q]=trie_r[p]) then
18514    begin trie_node:=q; return;
18515    end;
18516  if h>0 then decr(h)@+else h:=trie_size;
18517  end;
18518exit:end;
18519
18520@ A neat recursive procedure is now able to compress a trie by
18521traversing it and applying |trie_node| to its nodes in ``bottom up''
18522fashion. We will compress the entire trie by clearing |trie_hash| to
18523zero and then saying `|trie_root:=compress_trie(trie_root)|'.
18524@^recursion@>
18525
18526@<Declare procedures for preprocessing hyph...@>=
18527function compress_trie(@!p:trie_pointer):trie_pointer;
18528begin if p=0 then compress_trie:=0
18529else  begin trie_l[p]:=compress_trie(trie_l[p]);
18530  trie_r[p]:=compress_trie(trie_r[p]);
18531  compress_trie:=trie_node(p);
18532  end;
18533end;
18534
18535@ The compressed trie will be packed into the |trie| array using a
18536``top-down first-fit'' procedure. This is a little tricky, so the reader
18537should pay close attention: The |trie_hash| array is cleared to zero
18538again and renamed |trie_ref| for this phase of the operation; later on,
18539|trie_ref[p]| will be nonzero only if the linked trie node |p| is the
18540smallest character
18541in a family and if the characters |c| of that family have been allocated to
18542locations |trie_ref[p]+c| in the |trie| array. Locations of |trie| that
18543are in use will have |trie_link=0|, while the unused holes in |trie|
18544will be doubly linked with |trie_link| pointing to the next larger vacant
18545location and |trie_back| pointing to the next smaller one. This double
18546linking will have been carried out only as far as |trie_max|, where
18547|trie_max| is the largest index of |trie| that will be needed.
18548To save time at the low end of the trie, we maintain array entries
18549|trie_min[c]| pointing to the smallest hole that is greater than~|c|.
18550Another array |trie_taken| tells whether or not a given location is
18551equal to |trie_ref[p]| for some |p|; this array is used to ensure that
18552distinct nodes in the compressed trie will have distinct |trie_ref|
18553entries.
18554
18555@d trie_ref==trie_hash {where linked trie families go into |trie|}
18556@d trie_back(#)==trie[#].lh {backward links in |trie| holes}
18557
18558@<Glob...@>=
18559@!init@!trie_taken:packed array[1..trie_size] of boolean;
18560  {does a family start here?}
18561@t\hskip10pt@>@!trie_min:array[ASCII_code] of trie_pointer;
18562  {the first possible slot for each character}
18563@t\hskip10pt@>@!trie_max:trie_pointer; {largest location used in |trie|}
18564@t\hskip10pt@>@!trie_not_ready:boolean; {is the trie still in linked form?}
18565tini
18566
18567@ Each time \.{\\patterns} appears, it contributes further patterns to
18568the future trie, which will be built only when hyphenation is attempted or
18569when a format file is dumped. The boolean variable |trie_not_ready|
18570will change to |false| when the trie is compressed; this will disable
18571further patterns.
18572
18573@<Initialize table entries...@>=
18574trie_not_ready:=true; trie_root:=0; trie_c[0]:=si(0); trie_ptr:=0;
18575
18576@ Here is how the trie-compression data structures are initialized.
18577If storage is tight, it would be possible to overlap |trie_op_hash|,
18578|trie_op_lang|, and |trie_op_val| with |trie|, |trie_hash|, and |trie_taken|,
18579because we finish with the former just before we need the latter.
18580
18581@<Get ready to compress the trie@>=
18582@<Sort \(t)the hyphenation...@>;
18583for p:=0 to trie_size do trie_hash[p]:=0;
18584trie_root:=compress_trie(trie_root); {identify equivalent subtries}
18585for p:=0 to trie_ptr do trie_ref[p]:=0;
18586for p:=0 to 255 do trie_min[p]:=p+1;
18587trie_link(0):=1; trie_max:=0
18588
18589@ The |first_fit| procedure finds the smallest hole |z| in |trie| such that
18590a trie family starting at a given node |p| will fit into vacant positions
18591starting at |z|. If |c=trie_c[p]|, this means that location |z-c| must
18592not already be taken by some other family, and that |z-c+@t$c^\prime$@>|
18593must be vacant for all characters $c^\prime$ in the family. The procedure
18594sets |trie_ref[p]| to |z-c| when the first fit has been found.
18595
18596@<Declare procedures for preprocessing hyph...@>=
18597procedure first_fit(@!p:trie_pointer); {packs a family into |trie|}
18598label not_found,found;
18599var h:trie_pointer; {candidate for |trie_ref[p]|}
18600@!z:trie_pointer; {runs through holes}
18601@!q:trie_pointer; {runs through the family starting at |p|}
18602@!c:ASCII_code; {smallest character in the family}
18603@!l,@!r:trie_pointer; {left and right neighbors}
18604@!ll:1..256; {upper limit of |trie_min| updating}
18605begin c:=so(trie_c[p]);
18606z:=trie_min[c]; {get the first conceivably good hole}
18607loop@+  begin h:=z-c;@/
18608  @<Ensure that |trie_max>=h+256|@>;
18609  if trie_taken[h] then goto not_found;
18610  @<If all characters of the family fit relative to |h|, then
18611    |goto found|,\30\ otherwise |goto not_found|@>;
18612  not_found: z:=trie_link(z); {move to the next hole}
18613  end;
18614found: @<Pack the family into |trie| relative to |h|@>;
18615end;
18616
18617@ By making sure that |trie_max| is at least |h+256|, we can be sure that
18618|trie_max>z|, since |h=z-c|. It follows that location |trie_max| will
18619never be occupied in |trie|, and we will have |trie_max>=trie_link(z)|.
18620
18621@<Ensure that |trie_max>=h+256|@>=
18622if trie_max<h+256 then
18623  begin if trie_size<=h+256 then overflow("pattern memory",trie_size);
18624@:TeX capacity exceeded pattern memory}{\quad pattern memory@>
18625  repeat incr(trie_max); trie_taken[trie_max]:=false;
18626  trie_link(trie_max):=trie_max+1; trie_back(trie_max):=trie_max-1;
18627  until trie_max=h+256;
18628  end
18629
18630@ @<If all characters of the family fit relative to |h|...@>=
18631q:=trie_r[p];
18632while q>0 do
18633  begin if trie_link(h+so(trie_c[q]))=0 then goto not_found;
18634  q:=trie_r[q];
18635  end;
18636goto found
18637
18638@ @<Pack the family into |trie| relative to |h|@>=
18639trie_taken[h]:=true; trie_ref[p]:=h; q:=p;
18640repeat z:=h+so(trie_c[q]); l:=trie_back(z); r:=trie_link(z);
18641trie_back(r):=l; trie_link(l):=r; trie_link(z):=0;
18642if l<256 then
18643  begin if z<256 then ll:=z @+else ll:=256;
18644  repeat trie_min[l]:=r; incr(l);
18645  until l=ll;
18646  end;
18647q:=trie_r[q];
18648until q=0
18649
18650@ To pack the entire linked trie, we use the following recursive procedure.
18651@^recursion@>
18652
18653@<Declare procedures for preprocessing hyph...@>=
18654procedure trie_pack(@!p:trie_pointer); {pack subtries of a family}
18655var q:trie_pointer; {a local variable that need not be saved on recursive calls}
18656begin repeat q:=trie_l[p];
18657if (q>0)and(trie_ref[q]=0) then
18658  begin first_fit(q); trie_pack(q);
18659  end;
18660p:=trie_r[p];
18661until p=0;
18662end;
18663
18664@ When the whole trie has been allocated into the sequential table, we
18665must go through it once again so that |trie| contains the correct
18666information. Null pointers in the linked trie will be represented by the
18667value~0, which properly implements an ``empty'' family.
18668
18669@<Move the data into |trie|@>=
18670h.rh:=0; h.b0:=min_quarterword; h.b1:=min_quarterword; {|trie_link:=0|,
18671  |trie_op:=min_quarterword|, |trie_char:=qi(0)|}
18672if trie_root=0 then {no patterns were given}
18673  begin for r:=0 to 256 do trie[r]:=h;
18674  trie_max:=256;
18675  end
18676else begin trie_fix(trie_root); {this fixes the non-holes in |trie|}
18677  r:=0; {now we will zero out all the holes}
18678  repeat s:=trie_link(r); trie[r]:=h; r:=s;
18679  until r>trie_max;
18680  end;
18681trie_char(0):=qi("?"); {make |trie_char(c)<>c| for all |c|}
18682
18683@ The fixing-up procedure is, of course, recursive. Since the linked trie
18684usually has overlapping subtries, the same data may be moved several
18685times; but that causes no harm, and at most as much work is done as it
18686took to build the uncompressed trie.
18687@^recursion@>
18688
18689@<Declare procedures for preprocessing hyph...@>=
18690procedure trie_fix(@!p:trie_pointer); {moves |p| and its siblings into |trie|}
18691var q:trie_pointer; {a local variable that need not be saved on recursive calls}
18692@!c:ASCII_code; {another one that need not be saved}
18693@!z:trie_pointer; {|trie| reference; this local variable must be saved}
18694begin z:=trie_ref[p];
18695repeat q:=trie_l[p]; c:=so(trie_c[p]);
18696trie_link(z+c):=trie_ref[q]; trie_char(z+c):=qi(c); trie_op(z+c):=trie_o[p];
18697if q>0 then trie_fix(q);
18698p:=trie_r[p];
18699until p=0;
18700end;
18701
18702@ Now let's go back to the easier problem, of building the linked
18703trie.  When \.{INITEX} has scanned the `\.{\\patterns}' control
18704sequence, it calls on |new_patterns| to do the right thing.
18705
18706@<Declare procedures for preprocessing hyph...@>=
18707procedure new_patterns; {initializes the hyphenation pattern data}
18708label done, done1;
18709var k,@!l:0..64; {indices into |hc| and |hyf|;
18710                  not always in |small_number| range}
18711@!digit_sensed:boolean; {should the next digit be treated as a letter?}
18712@!v:quarterword; {trie op code}
18713@!p,@!q:trie_pointer; {nodes of trie traversed during insertion}
18714@!first_child:boolean; {is |p=trie_l[q]|?}
18715@!c:ASCII_code; {character being inserted}
18716begin if trie_not_ready then
18717  begin set_cur_lang; scan_left_brace; {a left brace must follow \.{\\patterns}}
18718  @<Enter all of the patterns into a linked trie, until coming to a right
18719  brace@>;
18720  end
18721else begin print_err("Too late for "); print_esc("patterns");
18722  help1("All patterns must be given before typesetting begins.");
18723  error; link(garbage):=scan_toks(false,false); flush_list(def_ref);
18724  end;
18725end;
18726
18727@ Novices are not supposed to be using \.{\\patterns}, so the error
18728messages are terse. (Note that all error messages appear in \TeX's string
18729pool, even if they are used only by \.{INITEX}.)
18730
18731@<Enter all of the patterns into a linked trie...@>=
18732k:=0; hyf[0]:=0; digit_sensed:=false;
18733loop@+  begin get_x_token;
18734  case cur_cmd of
18735  letter,other_char:@<Append a new letter or a hyphen level@>;
18736  spacer,right_brace: begin if k>0 then
18737      @<Insert a new pattern into the linked trie@>;
18738    if cur_cmd=right_brace then goto done;
18739    k:=0; hyf[0]:=0; digit_sensed:=false;
18740    end;
18741  othercases begin print_err("Bad "); print_esc("patterns");
18742@.Bad \\patterns@>
18743    help1("(See Appendix H.)"); error;
18744    end
18745  endcases;
18746  end;
18747done:
18748
18749@ @<Append a new letter or a hyphen level@>=
18750if digit_sensed or(cur_chr<"0")or(cur_chr>"9") then
18751  begin if cur_chr="." then cur_chr:=0 {edge-of-word delimiter}
18752  else  begin cur_chr:=lc_code(cur_chr);
18753    if cur_chr=0 then
18754      begin print_err("Nonletter");
18755@.Nonletter@>
18756      help1("(See Appendix H.)"); error;
18757      end;
18758    end;
18759  if k<63 then
18760    begin incr(k); hc[k]:=cur_chr; hyf[k]:=0; digit_sensed:=false;
18761    end;
18762  end
18763else if k<63 then
18764  begin hyf[k]:=cur_chr-"0"; digit_sensed:=true;
18765  end
18766
18767@ When the following code comes into play, the pattern $p_1\ldots p_k$
18768appears in |hc[1..k]|, and the corresponding sequence of numbers $n_0\ldots
18769n_k$ appears in |hyf[0..k]|.
18770
18771@<Insert a new pattern into the linked trie@>=
18772begin @<Compute the trie op code, |v|, and set |l:=0|@>;
18773q:=0; hc[0]:=cur_lang;
18774while l<=k do
18775  begin c:=hc[l]; incr(l); p:=trie_l[q]; first_child:=true;
18776  while (p>0)and(c>so(trie_c[p])) do
18777    begin q:=p; p:=trie_r[q]; first_child:=false;
18778    end;
18779  if (p=0)or(c<so(trie_c[p])) then
18780    @<Insert a new trie node between |q| and |p|, and
18781      make |p| point to it@>;
18782  q:=p; {now node |q| represents $p_1\ldots p_{l-1}$}
18783  end;
18784if trie_o[q]<>min_quarterword then
18785  begin print_err("Duplicate pattern");
18786@.Duplicate pattern@>
18787  help1("(See Appendix H.)"); error;
18788  end;
18789trie_o[q]:=v;
18790end
18791
18792@ @<Insert a new trie node between |q| and |p|...@>=
18793begin if trie_ptr=trie_size then overflow("pattern memory",trie_size);
18794@:TeX capacity exceeded pattern memory}{\quad pattern memory@>
18795incr(trie_ptr); trie_r[trie_ptr]:=p; p:=trie_ptr; trie_l[p]:=0;
18796if first_child then trie_l[q]:=p@+else trie_r[q]:=p;
18797trie_c[p]:=si(c); trie_o[p]:=min_quarterword;
18798end
18799
18800@ @<Compute the trie op code, |v|...@>=
18801if hc[1]=0 then hyf[0]:=0;
18802if hc[k]=0 then hyf[k]:=0;
18803l:=k; v:=min_quarterword;
18804loop@+  begin if hyf[l]<>0 then v:=new_trie_op(k-l,hyf[l],v);
18805  if l>0 then decr(l)@+else goto done1;
18806  end;
18807done1:
18808
18809@ Finally we put everything together: Here is how the trie gets to its
18810final, efficient form.
18811The following packing routine is rigged so that the root of the linked
18812tree gets mapped into location 1 of |trie|, as required by the hyphenation
18813algorithm. This happens because the first call of |first_fit| will
18814``take'' location~1.
18815
18816@<Declare procedures for preprocessing hyphenation patterns@>=
18817procedure init_trie;
18818var @!p:trie_pointer; {pointer for initialization}
18819@!j,@!k,@!t:integer; {all-purpose registers for initialization}
18820@!r,@!s:trie_pointer; {used to clean up the packed |trie|}
18821@!h:two_halves; {template used to zero out |trie|'s holes}
18822begin @<Get ready to compress the trie@>;
18823if trie_root<>0 then
18824  begin first_fit(trie_root); trie_pack(trie_root);
18825  end;
18826@<Move the data into |trie|@>;
18827trie_not_ready:=false;
18828end;
18829
18830@* \[44] Breaking vertical lists into pages.
18831The |vsplit| procedure, which implements \TeX's \.{\\vsplit} operation,
18832is considerably simpler than |line_break| because it doesn't have to
18833worry about hyphenation, and because its mission is to discover a single
18834break instead of an optimum sequence of breakpoints.  But before we get
18835into the details of |vsplit|, we need to consider a few more basic things.
18836
18837@ A subroutine called |prune_page_top| takes a pointer to a vlist and
18838returns a pointer to a modified vlist in which all glue, kern, and penalty nodes
18839have been deleted before the first box or rule node. However, the first
18840box or rule is actually preceded by a newly created glue node designed so that
18841the topmost baseline will be at distance |split_top_skip| from the top,
18842whenever this is possible without backspacing.
18843
18844In this routine and those that follow, we make use of the fact that a
18845vertical list contains no character nodes, hence the |type| field exists
18846for each node in the list.
18847@^data structure assumptions@>
18848
18849@p function prune_page_top(@!p:pointer):pointer; {adjust top after page break}
18850var prev_p:pointer; {lags one step behind |p|}
18851@!q:pointer; {temporary variable for list manipulation}
18852begin prev_p:=temp_head; link(temp_head):=p;
18853while p<>null do
18854  case type(p) of
18855  hlist_node,vlist_node,rule_node:@<Insert glue for |split_top_skip|
18856    and set~|p:=null|@>;
18857  whatsit_node,mark_node,ins_node: begin prev_p:=p; p:=link(prev_p);
18858    end;
18859  glue_node,kern_node,penalty_node: begin q:=p; p:=link(q); link(q):=null;
18860    link(prev_p):=p; flush_node_list(q);
18861    end;
18862  othercases confusion("pruning")
18863@:this can't happen pruning}{\quad pruning@>
18864  endcases;
18865prune_page_top:=link(temp_head);
18866end;
18867
18868@ @<Insert glue for |split_top_skip|...@>=
18869begin q:=new_skip_param(split_top_skip_code); link(prev_p):=q; link(q):=p;
18870  {now |temp_ptr=glue_ptr(q)|}
18871if width(temp_ptr)>height(p) then width(temp_ptr):=width(temp_ptr)-height(p)
18872else width(temp_ptr):=0;
18873p:=null;
18874end
18875
18876@ The next subroutine finds the best place to break a given vertical list
18877so as to obtain a box of height~|h|, with maximum depth~|d|.
18878A pointer to the beginning of the vertical list is given,
18879and a pointer to the optimum breakpoint is returned. The list is effectively
18880followed by a forced break, i.e., a penalty node with the |eject_penalty|;
18881if the best break occurs at this artificial node, the value |null| is returned.
18882
18883An array of six |scaled| distances is used to keep track of the height
18884from the beginning of the list to the current place, just as in |line_break|.
18885In fact, we use one of the same arrays, only changing its name to reflect
18886its new significance.
18887
18888@d active_height==active_width {new name for the six distance variables}
18889@d cur_height==active_height[1] {the natural height}
18890@d set_height_zero(#)==active_height[#]:=0 {initialize the height to zero}
18891@#
18892@d update_heights=90 {go here to record glue in the |active_height| table}
18893
18894@p function vert_break(@!p:pointer; @!h,@!d:scaled):pointer;
18895  {finds optimum page break}
18896label done,not_found,update_heights;
18897var prev_p:pointer; {if |p| is a glue node, |type(prev_p)| determines
18898  whether |p| is a legal breakpoint}
18899@!q,@!r:pointer; {glue specifications}
18900@!pi:integer; {penalty value}
18901@!b:integer; {badness at a trial breakpoint}
18902@!least_cost:integer; {the smallest badness plus penalties found so far}
18903@!best_place:pointer; {the most recent break that leads to |least_cost|}
18904@!prev_dp:scaled; {depth of previous box in the list}
18905@!t:small_number; {|type| of the node following a kern}
18906begin prev_p:=p; {an initial glue node is not a legal breakpoint}
18907least_cost:=awful_bad; do_all_six(set_height_zero); prev_dp:=0;
18908loop@+  begin @<If node |p| is a legal breakpoint, check if this break is
18909    the best known, and |goto done| if |p| is null or
18910    if the page-so-far is already too full to accept more stuff@>;
18911  prev_p:=p; p:=link(prev_p);
18912  end;
18913done: vert_break:=best_place;
18914end;
18915
18916@ A global variable |best_height_plus_depth| will be set to the natural size
18917of the box that corresponds to the optimum breakpoint found by |vert_break|.
18918(This value is used by the insertion-splitting algorithm of the page builder.)
18919
18920@<Glob...@>=
18921@!best_height_plus_depth:scaled; {height of the best box, without stretching or
18922  shrinking}
18923
18924@ A subtle point to be noted here is that the maximum depth~|d| might be
18925negative, so |cur_height| and |prev_dp| might need to be corrected even
18926after a glue or kern node.
18927
18928@<If node |p| is a legal breakpoint, check...@>=
18929if p=null then pi:=eject_penalty
18930else  @<Use node |p| to update the current height and depth measurements;
18931    if this node is not a legal breakpoint, |goto not_found|
18932    or |update_heights|,
18933    otherwise set |pi| to the associated penalty at the break@>;
18934@<Check if node |p| is a new champion breakpoint; then \(go)|goto done|
18935  if |p| is a forced break or if the page-so-far is already too full@>;
18936if (type(p)<glue_node)or(type(p)>kern_node) then goto not_found;
18937update_heights: @<Update the current height and depth measurements with
18938  respect to a glue or kern node~|p|@>;
18939not_found: if prev_dp>d then
18940    begin cur_height:=cur_height+prev_dp-d;
18941    prev_dp:=d;
18942    end;
18943
18944@ @<Use node |p| to update the current height and depth measurements...@>=
18945case type(p) of
18946hlist_node,vlist_node,rule_node: begin@t@>@;@/
18947  cur_height:=cur_height+prev_dp+height(p); prev_dp:=depth(p);
18948  goto not_found;
18949  end;
18950whatsit_node:@<Process whatsit |p| in |vert_break| loop, |goto not_found|@>;
18951glue_node: if precedes_break(prev_p) then pi:=0
18952  else goto update_heights;
18953kern_node: begin if link(p)=null then t:=penalty_node
18954  else t:=type(link(p));
18955  if t=glue_node then pi:=0@+else goto update_heights;
18956  end;
18957penalty_node: pi:=penalty(p);
18958mark_node,ins_node: goto not_found;
18959othercases confusion("vertbreak")
18960@:this can't happen vertbreak}{\quad vertbreak@>
18961endcases
18962
18963@ @d deplorable==100000 {more than |inf_bad|, but less than |awful_bad|}
18964
18965@<Check if node |p| is a new champion breakpoint; then \(go)...@>=
18966if pi<inf_penalty then
18967  begin @<Compute the badness, |b|, using |awful_bad|
18968    if the box is too full@>;
18969  if b<awful_bad then
18970    if pi<=eject_penalty then b:=pi
18971    else if b<inf_bad then b:=b+pi
18972      else b:=deplorable;
18973  if b<=least_cost then
18974    begin best_place:=p; least_cost:=b;
18975    best_height_plus_depth:=cur_height+prev_dp;
18976    end;
18977  if (b=awful_bad)or(pi<=eject_penalty) then goto done;
18978  end
18979
18980@ @<Compute the badness, |b|, using |awful_bad| if the box is too full@>=
18981if cur_height<h then
18982  if (active_height[3]<>0) or (active_height[4]<>0) or
18983    (active_height[5]<>0) then b:=0
18984  else b:=badness(h-cur_height,active_height[2])
18985else if cur_height-h>active_height[6] then b:=awful_bad
18986else b:=badness(cur_height-h,active_height[6])
18987
18988@ Vertical lists that are subject to the |vert_break| procedure should not
18989contain infinite shrinkability, since that would permit any amount of
18990information to ``fit'' on one page.
18991
18992@<Update the current height and depth measurements with...@>=
18993if type(p)=kern_node then q:=p
18994else  begin q:=glue_ptr(p);
18995  active_height[2+stretch_order(q)]:=@|
18996    active_height[2+stretch_order(q)]+stretch(q);@/
18997  active_height[6]:=active_height[6]+shrink(q);
18998  if (shrink_order(q)<>normal)and(shrink(q)<>0) then
18999    begin@t@>@;@/
19000    print_err("Infinite glue shrinkage found in box being split");@/
19001@.Infinite glue shrinkage...@>
19002    help4("The box you are \vsplitting contains some infinitely")@/
19003      ("shrinkable glue, e.g., `\vss' or `\vskip 0pt minus 1fil'.")@/
19004      ("Such glue doesn't belong there; but you can safely proceed,")@/
19005      ("since the offensive shrinkability has been made finite.");
19006    error; r:=new_spec(q); shrink_order(r):=normal; delete_glue_ref(q);
19007    glue_ptr(p):=r; q:=r;
19008    end;
19009  end;
19010cur_height:=cur_height+prev_dp+width(q); prev_dp:=0
19011
19012@ Now we are ready to consider |vsplit| itself. Most of
19013its work is accomplished by the two subroutines that we have just considered.
19014
19015Given the number of a vlist box |n|, and given a desired page height |h|,
19016the |vsplit| function finds the best initial segment of the vlist and
19017returns a box for a page of height~|h|. The remainder of the vlist, if
19018any, replaces the original box, after removing glue and penalties and
19019adjusting for |split_top_skip|. Mark nodes in the split-off box are used to
19020set the values of |split_first_mark| and |split_bot_mark|; we use the
19021fact that |split_first_mark=null| if and only if |split_bot_mark=null|.
19022
19023The original box becomes ``void'' if and only if it has been entirely
19024extracted.  The extracted box is ``void'' if and only if the original
19025box was void (or if it was, erroneously, an hlist box).
19026
19027@p function vsplit(@!n:eight_bits; @!h:scaled):pointer;
19028  {extracts a page of height |h| from box |n|}
19029label exit,done;
19030var v:pointer; {the box to be split}
19031p:pointer; {runs through the vlist}
19032q:pointer; {points to where the break occurs}
19033begin v:=box(n);
19034if split_first_mark<>null then
19035  begin delete_token_ref(split_first_mark); split_first_mark:=null;
19036  delete_token_ref(split_bot_mark); split_bot_mark:=null;
19037  end;
19038@<Dispense with trivial cases of void or bad boxes@>;
19039q:=vert_break(list_ptr(v),h,split_max_depth);
19040@<Look at all the marks in nodes before the break, and set the final
19041  link to |null| at the break@>;
19042q:=prune_page_top(q); p:=list_ptr(v); free_node(v,box_node_size);
19043if q=null then box(n):=null {the |eq_level| of the box stays the same}
19044else box(n):=vpack(q,natural);
19045vsplit:=vpackage(p,h,exactly,split_max_depth);
19046exit: end;
19047
19048@ @<Dispense with trivial cases of void or bad boxes@>=
19049if v=null then
19050  begin vsplit:=null; return;
19051  end;
19052if type(v)<>vlist_node then
19053  begin print_err(""); print_esc("vsplit"); print(" needs a ");
19054  print_esc("vbox");
19055@:vsplit_}{\.{\\vsplit needs a \\vbox}@>
19056  help2("The box you are trying to split is an \hbox.")@/
19057  ("I can't split such a box, so I'll leave it alone.");
19058  error; vsplit:=null; return;
19059  end
19060
19061@ It's possible that the box begins with a penalty node that is the
19062``best'' break, so we must be careful to handle this special case correctly.
19063
19064@<Look at all the marks...@>=
19065p:=list_ptr(v);
19066if p=q then list_ptr(v):=null
19067else loop@+begin if type(p)=mark_node then
19068    if split_first_mark=null then
19069      begin split_first_mark:=mark_ptr(p);
19070      split_bot_mark:=split_first_mark;
19071      token_ref_count(split_first_mark):=@|
19072        token_ref_count(split_first_mark)+2;
19073      end
19074    else  begin delete_token_ref(split_bot_mark);
19075      split_bot_mark:=mark_ptr(p);
19076      add_token_ref(split_bot_mark);
19077      end;
19078  if link(p)=q then
19079    begin link(p):=null; goto done;
19080    end;
19081  p:=link(p);
19082  end;
19083done:
19084
19085@* \[45] The page builder.
19086When \TeX\ appends new material to its main vlist in vertical mode, it uses
19087a method something like |vsplit| to decide where a page ends, except that
19088the calculations are done ``on line'' as new items come in.
19089The main complication in this process is that insertions must be put
19090into their boxes and removed from the vlist, in a more-or-less optimum manner.
19091
19092We shall use the term ``current page'' for that part of the main vlist that
19093is being considered as a candidate for being broken off and sent to the
19094user's output routine. The current page starts at |link(page_head)|, and
19095it ends at |page_tail|.  We have |page_head=page_tail| if this list is empty.
19096@^current page@>
19097
19098Utter chaos would reign if the user kept changing page specifications
19099while a page is being constructed, so the page builder keeps the pertinent
19100specifications frozen as soon as the page receives its first box or
19101insertion.  The global variable |page_contents| is |empty| when the
19102current page contains only mark nodes and content-less whatsit nodes; it
19103is |inserts_only| if the page contains only insertion nodes in addition to
19104marks and whatsits.  Glue nodes, kern nodes, and penalty nodes are
19105discarded until a box or rule node appears, at which time |page_contents|
19106changes to |box_there|.  As soon as |page_contents| becomes non-|empty|,
19107the current |vsize| and |max_depth| are squirreled away into |page_goal|
19108and |page_max_depth|; the latter values will be used until the page has
19109been forwarded to the user's output routine. The \.{\\topskip} adjustment
19110is made when |page_contents| changes to |box_there|.
19111
19112Although |page_goal| starts out equal to |vsize|, it is decreased by the
19113scaled natural height-plus-depth of the insertions considered so far, and by
19114the \.{\\skip} corrections for those insertions. Therefore it represents
19115the size into which the non-inserted material should fit, assuming that
19116all insertions in the current page have been made.
19117
19118The global variables |best_page_break| and |least_page_cost| correspond
19119respectively to the local variables |best_place| and |least_cost| in the
19120|vert_break| routine that we have already studied; i.e., they record the
19121location and value of the best place currently known for breaking the
19122current page. The value of |page_goal| at the time of the best break is
19123stored in |best_size|.
19124
19125@d inserts_only=1
19126  {|page_contents| when an insert node has been contributed, but no boxes}
19127@d box_there=2 {|page_contents| when a box or rule has been contributed}
19128
19129@<Glob...@>=
19130@!page_tail:pointer; {the final node on the current page}
19131@!page_contents:empty..box_there; {what is on the current page so far?}
19132@!page_max_depth:scaled; {maximum box depth on page being built}
19133@!best_page_break:pointer; {break here to get the best page known so far}
19134@!least_page_cost:integer; {the score for this currently best page}
19135@!best_size:scaled; {its |page_goal|}
19136
19137@ The page builder has another data structure to keep track of insertions.
19138This is a list of four-word nodes, starting and ending at |page_ins_head|.
19139That is, the first element of the list is node |r@t$_1$@>=link(page_ins_head)|;
19140node $r_j$ is followed by |r@t$_{j+1}$@>=link(r@t$_j$@>)|; and if there are
19141|n| items we have |r@t$_{n+1}$@>=page_ins_head|. The |subtype| field of
19142each node in this list refers to an insertion number; for example, `\.{\\insert
19143250}' would correspond to a node whose |subtype| is |qi(250)|
19144(the same as the |subtype| field of the relevant |ins_node|). These |subtype|
19145fields are in increasing order, and |subtype(page_ins_head)=
19146qi(255)|, so |page_ins_head| serves as a convenient sentinel
19147at the end of the list. A record is present for each insertion number that
19148appears in the current page.
19149
19150The |type| field in these nodes distinguishes two possibilities that
19151might occur as we look ahead before deciding on the optimum page break.
19152If |type(r)=inserting|, then |height(r)| contains the total of the
19153height-plus-depth dimensions of the box and all its inserts seen so far.
19154If |type(r)=split_up|, then no more insertions will be made into this box,
19155because at least one previous insertion was too big to fit on the current
19156page; |broken_ptr(r)| points to the node where that insertion will be
19157split, if \TeX\ decides to split it, |broken_ins(r)| points to the
19158insertion node that was tentatively split, and |height(r)| includes also the
19159natural height plus depth of the part that would be split off.
19160
19161In both cases, |last_ins_ptr(r)| points to the last |ins_node|
19162encountered for box |qo(subtype(r))| that would be at least partially
19163inserted on the next page; and |best_ins_ptr(r)| points to the last
19164such |ins_node| that should actually be inserted, to get the page with
19165minimum badness among all page breaks considered so far. We have
19166|best_ins_ptr(r)=null| if and only if no insertion for this box should
19167be made to produce this optimum page.
19168
19169The data structure definitions here use the fact that the |@!height| field
19170appears in the fourth word of a box node.
19171@^data structure assumptions@>
19172
19173@d page_ins_node_size=4 {number of words for a page insertion node}
19174@d inserting=0 {an insertion class that has not yet overflowed}
19175@d split_up=1 {an overflowed insertion class}
19176@d broken_ptr(#)==link(#+1)
19177  {an insertion for this class will break here if anywhere}
19178@d broken_ins(#)==info(#+1) {this insertion might break at |broken_ptr|}
19179@d last_ins_ptr(#)==link(#+2) {the most recent insertion for this |subtype|}
19180@d best_ins_ptr(#)==info(#+2) {the optimum most recent insertion}
19181
19182@<Initialize the special list heads...@>=
19183subtype(page_ins_head):=qi(255);
19184type(page_ins_head):=split_up; link(page_ins_head):=page_ins_head;
19185
19186@ An array |page_so_far| records the heights and depths of everything
19187on the current page. This array contains six |scaled| numbers, like the
19188similar arrays already considered in |line_break| and |vert_break|; and it
19189also contains |page_goal| and |page_depth|, since these values are
19190all accessible to the user via |set_page_dimen| commands. The
19191value of |page_so_far[1]| is also called |page_total|.  The stretch
19192and shrink components of the \.{\\skip} corrections for each insertion are
19193included in |page_so_far|, but the natural space components of these
19194corrections are not, since they have been subtracted from |page_goal|.
19195
19196The variable |page_depth| records the depth of the current page; it has been
19197adjusted so that it is at most |page_max_depth|. The variable
19198|last_glue| points to the glue specification of the most recent node
19199contributed from the contribution list, if this was a glue node; otherwise
19200|last_glue=max_halfword|. (If the contribution list is nonempty,
19201however, the value of |last_glue| is not necessarily accurate.)
19202The variables |last_penalty| and |last_kern| are similar.  And
19203finally, |insert_penalties| holds the sum of the penalties associated with
19204all split and floating insertions.
19205
19206@d page_goal==page_so_far[0] {desired height of information on page being built}
19207@d page_total==page_so_far[1] {height of the current page}
19208@d page_shrink==page_so_far[6] {shrinkability of the current page}
19209@d page_depth==page_so_far[7] {depth of the current page}
19210
19211@<Glob...@>=
19212@!page_so_far:array [0..7] of scaled; {height and glue of the current page}
19213@!last_glue:pointer; {used to implement \.{\\lastskip}}
19214@!last_penalty:integer; {used to implement \.{\\lastpenalty}}
19215@!last_kern:scaled; {used to implement \.{\\lastkern}}
19216@!insert_penalties:integer; {sum of the penalties for held-over insertions}
19217
19218@ @<Put each...@>=
19219primitive("pagegoal",set_page_dimen,0);
19220@!@:page_goal_}{\.{\\pagegoal} primitive@>
19221primitive("pagetotal",set_page_dimen,1);
19222@!@:page_total_}{\.{\\pagetotal} primitive@>
19223primitive("pagestretch",set_page_dimen,2);
19224@!@:page_stretch_}{\.{\\pagestretch} primitive@>
19225primitive("pagefilstretch",set_page_dimen,3);
19226@!@:page_fil_stretch_}{\.{\\pagefilstretch} primitive@>
19227primitive("pagefillstretch",set_page_dimen,4);
19228@!@:page_fill_stretch_}{\.{\\pagefillstretch} primitive@>
19229primitive("pagefilllstretch",set_page_dimen,5);
19230@!@:page_filll_stretch_}{\.{\\pagefilllstretch} primitive@>
19231primitive("pageshrink",set_page_dimen,6);
19232@!@:page_shrink_}{\.{\\pageshrink} primitive@>
19233primitive("pagedepth",set_page_dimen,7);
19234@!@:page_depth_}{\.{\\pagedepth} primitive@>
19235
19236@ @<Cases of |print_cmd_chr|...@>=
19237set_page_dimen: case chr_code of
192380: print_esc("pagegoal");
192391: print_esc("pagetotal");
192402: print_esc("pagestretch");
192413: print_esc("pagefilstretch");
192424: print_esc("pagefillstretch");
192435: print_esc("pagefilllstretch");
192446: print_esc("pageshrink");
19245othercases print_esc("pagedepth")
19246endcases;
19247
19248@ @d print_plus_end(#)==print(#);@+end
19249@d print_plus(#)==if page_so_far[#]<>0 then
19250  begin print(" plus "); print_scaled(page_so_far[#]); print_plus_end
19251
19252@p procedure print_totals;
19253begin print_scaled(page_total);
19254print_plus(2)("");
19255print_plus(3)("fil");
19256print_plus(4)("fill");
19257print_plus(5)("filll");
19258if page_shrink<>0 then
19259  begin print(" minus "); print_scaled(page_shrink);
19260  end;
19261end;
19262
19263@ @<Show the status of the current page@>=
19264if page_head<>page_tail then
19265  begin print_nl("### current page:");
19266  if output_active then print(" (held over for next output)");
19267@.held over for next output@>
19268  show_box(link(page_head));
19269  if page_contents>empty then
19270    begin print_nl("total height "); print_totals;
19271@:total_height}{\.{total height}@>
19272    print_nl(" goal height "); print_scaled(page_goal);
19273@.goal height@>
19274    r:=link(page_ins_head);
19275    while r<>page_ins_head do
19276      begin print_ln; print_esc("insert"); t:=qo(subtype(r));
19277      print_int(t); print(" adds ");
19278      if count(t)=1000 then t:=height(r)
19279      else t:=x_over_n(height(r),1000)*count(t);
19280      print_scaled(t);
19281      if type(r)=split_up then
19282        begin q:=page_head; t:=0;
19283        repeat q:=link(q);
19284        if (type(q)=ins_node)and(subtype(q)=subtype(r)) then incr(t);
19285        until q=broken_ins(r);
19286        print(", #"); print_int(t); print(" might split");
19287        end;
19288      r:=link(r);
19289      end;
19290    end;
19291  end
19292
19293@ Here is a procedure that is called when the |page_contents| is changing
19294from |empty| to |inserts_only| or |box_there|.
19295
19296@d set_page_so_far_zero(#)==page_so_far[#]:=0
19297
19298@p procedure freeze_page_specs(@!s:small_number);
19299begin page_contents:=s;
19300page_goal:=vsize; page_max_depth:=max_depth;
19301page_depth:=0; do_all_six(set_page_so_far_zero);
19302least_page_cost:=awful_bad;
19303@!stat if tracing_pages>0 then
19304  begin begin_diagnostic;
19305  print_nl("%% goal height="); print_scaled(page_goal);
19306@.goal height@>
19307  print(", max depth="); print_scaled(page_max_depth);
19308  end_diagnostic(false);
19309  end;@;@+tats@;@/
19310end;
19311
19312@ Pages are built by appending nodes to the current list in \TeX's
19313vertical mode, which is at the outermost level of the semantic nest. This
19314vlist is split into two parts; the ``current page'' that we have been
19315talking so much about already, and the ``contribution list'' that receives
19316new nodes as they are created.  The current page contains everything that
19317the page builder has accounted for in its data structures, as described
19318above, while the contribution list contains other things that have been
19319generated by other parts of \TeX\ but have not yet been
19320seen by the page builder.
19321The contribution list starts at |link(contrib_head)|, and it ends at the
19322current node in \TeX's vertical mode.
19323
19324When \TeX\ has appended new material in vertical mode, it calls the procedure
19325|build_page|, which tries to catch up by moving nodes from the contribution
19326list to the current page. This procedure will succeed in its goal of
19327emptying the contribution list, unless a page break is discovered, i.e.,
19328unless the current page has grown to the point where the optimum next
19329page break has been determined. In the latter case, the nodes after the
19330optimum break will go back onto the contribution list, and control will
19331effectively pass to the user's output routine.
19332
19333We make |type(page_head)=glue_node|, so that an initial glue node on
19334the current page will not be considered a valid breakpoint.
19335
19336@<Initialize the special list...@>=
19337type(page_head):=glue_node; subtype(page_head):=normal;
19338
19339@ The global variable |output_active| is true during the time the
19340user's output routine is driving \TeX.
19341
19342@<Glob...@>=
19343@!output_active:boolean; {are we in the midst of an output routine?}
19344
19345@ @<Set init...@>=
19346output_active:=false; insert_penalties:=0;
19347
19348@ The page builder is ready to start a fresh page if we initialize
19349the following state variables. (However, the page insertion list is initialized
19350elsewhere.)
19351
19352@<Start a new current page@>=
19353page_contents:=empty; page_tail:=page_head; link(page_head):=null;@/
19354last_glue:=max_halfword; last_penalty:=0; last_kern:=0;
19355page_depth:=0; page_max_depth:=0
19356
19357@ At certain times box 255 is supposed to be void (i.e., |null|),
19358or an insertion box is supposed to be ready to accept a vertical list.
19359If not, an error message is printed, and the following subroutine
19360flushes the unwanted contents, reporting them to the user.
19361
19362@p procedure box_error(@!n:eight_bits);
19363begin error; begin_diagnostic;
19364print_nl("The following box has been deleted:");
19365@.The following...deleted@>
19366show_box(box(n)); end_diagnostic(true);
19367flush_node_list(box(n)); box(n):=null;
19368end;
19369
19370@ The following procedure guarantees that a given box register
19371does not contain an \.{\\hbox}.
19372
19373@p procedure ensure_vbox(@!n:eight_bits);
19374var p:pointer; {the box register contents}
19375begin p:=box(n);
19376if p<>null then if type(p)=hlist_node then
19377  begin print_err("Insertions can only be added to a vbox");
19378@.Insertions can only...@>
19379  help3("Tut tut: You're trying to \insert into a")@/
19380    ("\box register that now contains an \hbox.")@/
19381    ("Proceed, and I'll discard its present contents.");
19382  box_error(n);
19383  end;
19384end;
19385
19386@ \TeX\ is not always in vertical mode at the time |build_page|
19387is called; the current mode reflects what \TeX\ should return to, after
19388the contribution list has been emptied. A call on |build_page| should
19389be immediately followed by `|goto big_switch|', which is \TeX's central
19390control point.
19391
19392@d contribute=80 {go here to link a node into the current page}
19393
19394@p @t\4@>@<Declare the procedure called |fire_up|@>@;@/
19395procedure build_page; {append contributions to the current page}
19396label exit,done,done1,continue,contribute,update_heights;
19397var p:pointer; {the node being appended}
19398@!q,@!r:pointer; {nodes being examined}
19399@!b,@!c:integer; {badness and cost of current page}
19400@!pi:integer; {penalty to be added to the badness}
19401@!n:min_quarterword..255; {insertion box number}
19402@!delta,@!h,@!w:scaled; {sizes used for insertion calculations}
19403begin if (link(contrib_head)=null)or output_active then return;
19404repeat continue: p:=link(contrib_head);@/
19405@<Update the values of |last_glue|, |last_penalty|, and |last_kern|@>;
19406@<Move node |p| to the current page; if it is time for a page break,
19407  put the nodes following the break back onto the contribution list,
19408  and |return| to the user's output routine if there is one@>;
19409until link(contrib_head)=null;
19410@<Make the contribution list empty by setting its tail to |contrib_head|@>;
19411exit:end;
19412
19413@ @d contrib_tail==nest[0].tail_field {tail of the contribution list}
19414
19415@<Make the contribution list empty...@>=
19416if nest_ptr=0 then tail:=contrib_head {vertical mode}
19417else contrib_tail:=contrib_head {other modes}
19418
19419@ @<Update the values of |last_glue|...@>=
19420if last_glue<>max_halfword then delete_glue_ref(last_glue);
19421last_penalty:=0; last_kern:=0;
19422if type(p)=glue_node then
19423  begin last_glue:=glue_ptr(p); add_glue_ref(last_glue);
19424  end
19425else  begin last_glue:=max_halfword;
19426  if type(p)=penalty_node then last_penalty:=penalty(p)
19427  else if type(p)=kern_node then last_kern:=width(p);
19428  end
19429
19430@ The code here is an example of a many-way switch into routines that
19431merge together in different places. Some people call this unstructured
19432programming, but the author doesn't see much wrong with it, as long as
19433@^Knuth, Donald Ervin@>
19434the various labels have a well-understood meaning.
19435
19436@<Move node |p| to the current page; ...@>=
19437@<If the current page is empty and node |p| is to be deleted, |goto done1|;
19438  otherwise use node |p| to update the state of the current page;
19439  if this node is an insertion, |goto contribute|; otherwise if this node
19440  is not a legal breakpoint, |goto contribute| or |update_heights|;
19441  otherwise set |pi| to the penalty associated with this breakpoint@>;
19442@<Check if node |p| is a new champion breakpoint; then \(if)if it is time for
19443  a page break, prepare for output, and either fire up the user's
19444  output routine and |return| or ship out the page and |goto done|@>;
19445if (type(p)<glue_node)or(type(p)>kern_node) then goto contribute;
19446update_heights:@<Update the current page measurements with respect to the
19447  glue or kern specified by node~|p|@>;
19448contribute: @<Make sure that |page_max_depth| is not exceeded@>;
19449@<Link node |p| into the current page and |goto done|@>;
19450done1:@<Recycle node |p|@>;
19451done:
19452
19453@ @<Link node |p| into the current page and |goto done|@>=
19454link(page_tail):=p; page_tail:=p;
19455link(contrib_head):=link(p); link(p):=null; goto done
19456
19457@ @<Recycle node |p|@>=
19458link(contrib_head):=link(p); link(p):=null; flush_node_list(p)
19459
19460@ The title of this section is already so long, it seems best to avoid
19461making it more accurate but still longer, by mentioning the fact that a
19462kern node at the end of the contribution list will not be contributed until
19463we know its successor.
19464
19465@<If the current page is empty...@>=
19466case type(p) of
19467hlist_node,vlist_node,rule_node: if page_contents<box_there then
19468    @<Initialize the current page, insert the \.{\\topskip} glue
19469      ahead of |p|, and |goto continue|@>
19470  else @<Prepare to move a box or rule node to the current page,
19471    then |goto contribute|@>;
19472whatsit_node: @<Prepare to move whatsit |p| to the current page,
19473  then |goto contribute|@>;
19474glue_node: if page_contents<box_there then goto done1
19475  else if precedes_break(page_tail) then pi:=0
19476  else goto update_heights;
19477kern_node: if page_contents<box_there then goto done1
19478  else if link(p)=null then return
19479  else if type(link(p))=glue_node then pi:=0
19480  else goto update_heights;
19481penalty_node: if page_contents<box_there then goto done1@+else pi:=penalty(p);
19482mark_node: goto contribute;
19483ins_node: @<Append an insertion to the current page and |goto contribute|@>;
19484othercases confusion("page")
19485@:this can't happen page}{\quad page@>
19486endcases
19487
19488@ @<Initialize the current page, insert the \.{\\topskip} glue...@>=
19489begin if page_contents=empty then freeze_page_specs(box_there)
19490else page_contents:=box_there;
19491q:=new_skip_param(top_skip_code); {now |temp_ptr=glue_ptr(q)|}
19492if width(temp_ptr)>height(p) then width(temp_ptr):=width(temp_ptr)-height(p)
19493else width(temp_ptr):=0;
19494link(q):=p; link(contrib_head):=q; goto continue;
19495end
19496
19497@ @<Prepare to move a box or rule node to the current page...@>=
19498begin page_total:=page_total+page_depth+height(p);
19499page_depth:=depth(p);
19500goto contribute;
19501end
19502
19503@ @<Make sure that |page_max_depth| is not exceeded@>=
19504if page_depth>page_max_depth then
19505  begin page_total:=@|
19506    page_total+page_depth-page_max_depth;@/
19507  page_depth:=page_max_depth;
19508  end;
19509
19510@ @<Update the current page measurements with respect to the glue...@>=
19511if type(p)=kern_node then q:=p
19512else begin q:=glue_ptr(p);
19513  page_so_far[2+stretch_order(q)]:=@|
19514    page_so_far[2+stretch_order(q)]+stretch(q);@/
19515  page_shrink:=page_shrink+shrink(q);
19516  if (shrink_order(q)<>normal)and(shrink(q)<>0) then
19517    begin@t@>@;@/
19518    print_err("Infinite glue shrinkage found on current page");@/
19519@.Infinite glue shrinkage...@>
19520    help4("The page about to be output contains some infinitely")@/
19521      ("shrinkable glue, e.g., `\vss' or `\vskip 0pt minus 1fil'.")@/
19522      ("Such glue doesn't belong there; but you can safely proceed,")@/
19523      ("since the offensive shrinkability has been made finite.");
19524    error;
19525    r:=new_spec(q); shrink_order(r):=normal; delete_glue_ref(q);
19526    glue_ptr(p):=r; q:=r;
19527    end;
19528  end;
19529page_total:=page_total+page_depth+width(q); page_depth:=0
19530
19531@ @<Check if node |p| is a new champion breakpoint; then \(if)...@>=
19532if pi<inf_penalty then
19533  begin @<Compute the badness, |b|, of the current page,
19534    using |awful_bad| if the box is too full@>;
19535  if b<awful_bad then
19536    if pi<=eject_penalty then c:=pi
19537    else  if b<inf_bad then c:=b+pi+insert_penalties
19538      else c:=deplorable
19539  else c:=b;
19540  if insert_penalties>=10000 then c:=awful_bad;
19541  @!stat if tracing_pages>0 then @<Display the page break cost@>;@+tats@;@/
19542  if c<=least_page_cost then
19543    begin best_page_break:=p; best_size:=page_goal;
19544    least_page_cost:=c;
19545    r:=link(page_ins_head);
19546    while r<>page_ins_head do
19547      begin best_ins_ptr(r):=last_ins_ptr(r);
19548      r:=link(r);
19549      end;
19550    end;
19551  if (c=awful_bad)or(pi<=eject_penalty) then
19552    begin fire_up(p); {output the current page at the best place}
19553    if output_active then return; {user's output routine will act}
19554    goto done; {the page has been shipped out by default output routine}
19555    end;
19556  end
19557
19558@ @<Display the page break cost@>=
19559begin begin_diagnostic; print_nl("%");
19560print(" t="); print_totals;@/
19561print(" g="); print_scaled(page_goal);@/
19562print(" b=");
19563if b=awful_bad then print_char("*")@+else print_int(b);
19564@.*\relax@>
19565print(" p="); print_int(pi);
19566print(" c=");
19567if c=awful_bad then print_char("*")@+else print_int(c);
19568if c<=least_page_cost then print_char("#");
19569end_diagnostic(false);
19570end
19571
19572@ @<Compute the badness, |b|, of the current page...@>=
19573if page_total<page_goal then
19574  if (page_so_far[3]<>0) or (page_so_far[4]<>0) or@|
19575    (page_so_far[5]<>0) then b:=0
19576  else b:=badness(page_goal-page_total,page_so_far[2])
19577else if page_total-page_goal>page_shrink then b:=awful_bad
19578else b:=badness(page_total-page_goal,page_shrink)
19579
19580@ @<Append an insertion to the current page and |goto contribute|@>=
19581begin if page_contents=empty then freeze_page_specs(inserts_only);
19582n:=subtype(p); r:=page_ins_head;
19583while n>=subtype(link(r)) do r:=link(r);
19584n:=qo(n);
19585if subtype(r)<>qi(n) then
19586  @<Create a page insertion node with |subtype(r)=qi(n)|, and
19587    include the glue correction for box |n| in the
19588    current page state@>;
19589if type(r)=split_up then insert_penalties:=insert_penalties+float_cost(p)
19590else  begin last_ins_ptr(r):=p;
19591  delta:=page_goal-page_total-page_depth+page_shrink;
19592    {this much room is left if we shrink the maximum}
19593  if count(n)=1000 then h:=height(p)
19594  else h:=x_over_n(height(p),1000)*count(n); {this much room is needed}
19595  if ((h<=0)or(h<=delta))and(height(p)+height(r)<=dimen(n)) then
19596    begin page_goal:=page_goal-h; height(r):=height(r)+height(p);
19597    end
19598  else @<Find the best way to split the insertion, and change
19599    |type(r)| to |split_up|@>;
19600  end;
19601goto contribute;
19602end
19603
19604@ We take note of the value of \.{\\skip} |n| and the height plus depth
19605of \.{\\box}~|n| only when the first \.{\\insert}~|n| node is
19606encountered for a new page. A user who changes the contents of \.{\\box}~|n|
19607after that first \.{\\insert}~|n| had better be either extremely careful
19608or extremely lucky, or both.
19609
19610@<Create a page insertion node...@>=
19611begin q:=get_node(page_ins_node_size); link(q):=link(r); link(r):=q; r:=q;
19612subtype(r):=qi(n); type(r):=inserting; ensure_vbox(n);
19613if box(n)=null then height(r):=0
19614else height(r):=height(box(n))+depth(box(n));
19615best_ins_ptr(r):=null;@/
19616q:=skip(n);
19617if count(n)=1000 then h:=height(r)
19618else h:=x_over_n(height(r),1000)*count(n);
19619page_goal:=page_goal-h-width(q);@/
19620page_so_far[2+stretch_order(q)]:=@|page_so_far[2+stretch_order(q)]+stretch(q);@/
19621page_shrink:=page_shrink+shrink(q);
19622if (shrink_order(q)<>normal)and(shrink(q)<>0) then
19623  begin print_err("Infinite glue shrinkage inserted from "); print_esc("skip");
19624@.Infinite glue shrinkage...@>
19625  print_int(n);
19626  help3("The correction glue for page breaking with insertions")@/
19627    ("must have finite shrinkability. But you may proceed,")@/
19628    ("since the offensive shrinkability has been made finite.");
19629  error;
19630  end;
19631end
19632
19633@ Here is the code that will split a long footnote between pages, in an
19634emergency. The current situation deserves to be recapitulated: Node |p|
19635is an insertion into box |n|; the insertion will not fit, in its entirety,
19636either because it would make the total contents of box |n| greater than
19637\.{\\dimen} |n|, or because it would make the incremental amount of growth
19638|h| greater than the available space |delta|, or both. (This amount |h| has
19639been weighted by the insertion scaling factor, i.e., by \.{\\count} |n|
19640over 1000.) Now we will choose the best way to break the vlist of the
19641insertion, using the same criteria as in the \.{\\vsplit} operation.
19642
19643@<Find the best way to split the insertion...@>=
19644begin if count(n)<=0 then w:=max_dimen
19645else  begin w:=page_goal-page_total-page_depth;
19646  if count(n)<>1000 then w:=x_over_n(w,count(n))*1000;
19647  end;
19648if w>dimen(n)-height(r) then w:=dimen(n)-height(r);
19649q:=vert_break(ins_ptr(p),w,depth(p));
19650height(r):=height(r)+best_height_plus_depth;
19651@!stat if tracing_pages>0 then @<Display the insertion split cost@>;@+tats@;@/
19652if count(n)<>1000 then
19653  best_height_plus_depth:=x_over_n(best_height_plus_depth,1000)*count(n);
19654page_goal:=page_goal-best_height_plus_depth;
19655type(r):=split_up; broken_ptr(r):=q; broken_ins(r):=p;
19656if q=null then insert_penalties:=insert_penalties+eject_penalty
19657else if type(q)=penalty_node then insert_penalties:=insert_penalties+penalty(q);
19658end
19659
19660@ @<Display the insertion split cost@>=
19661begin begin_diagnostic; print_nl("% split"); print_int(n);
19662@.split@>
19663print(" to "); print_scaled(w);
19664print_char(","); print_scaled(best_height_plus_depth);@/
19665print(" p=");
19666if q=null then print_int(eject_penalty)
19667else if type(q)=penalty_node then print_int(penalty(q))
19668else print_char("0");
19669end_diagnostic(false);
19670end
19671
19672@ When the page builder has looked at as much material as could appear before
19673the next page break, it makes its decision. The break that gave minimum
19674badness will be used to put a completed ``page'' into box 255, with insertions
19675appended to their other boxes.
19676
19677We also set the values of |top_mark|, |first_mark|, and |bot_mark|. The
19678program uses the fact that |bot_mark<>null| implies |first_mark<>null|;
19679it also knows that |bot_mark=null| implies |top_mark=first_mark=null|.
19680
19681The |fire_up| subroutine prepares to output the current page at the best
19682place; then it fires up the user's output routine, if there is one,
19683or it simply ships out the page. There is one parameter, |c|, which represents
19684the node that was being contributed to the page when the decision to
19685force an output was made.
19686
19687@<Declare the procedure called |fire_up|@>=
19688procedure fire_up(@!c:pointer);
19689label exit;
19690var p,@!q,@!r,@!s:pointer; {nodes being examined and/or changed}
19691@!prev_p:pointer; {predecessor of |p|}
19692@!n:min_quarterword..255; {insertion box number}
19693@!wait:boolean; {should the present insertion be held over?}
19694@!save_vbadness:integer; {saved value of |vbadness|}
19695@!save_vfuzz: scaled; {saved value of |vfuzz|}
19696@!save_split_top_skip: pointer; {saved value of |split_top_skip|}
19697begin @<Set the value of |output_penalty|@>;
19698if bot_mark<>null then
19699  begin if top_mark<>null then delete_token_ref(top_mark);
19700  top_mark:=bot_mark; add_token_ref(top_mark);
19701  delete_token_ref(first_mark); first_mark:=null;
19702  end;
19703@<Put the \(o)optimal current page into box 255, update |first_mark| and
19704  |bot_mark|, append insertions to their boxes, and put the
19705  remaining nodes back on the contribution list@>;
19706if (top_mark<>null)and(first_mark=null) then
19707  begin first_mark:=top_mark; add_token_ref(top_mark);
19708  end;
19709if output_routine<>null then
19710  if dead_cycles>=max_dead_cycles then
19711    @<Explain that too many dead cycles have occurred in a row@>
19712  else @<Fire up the user's output routine and |return|@>;
19713@<Perform the default output routine@>;
19714exit:end;
19715
19716@ @<Set the value of |output_penalty|@>=
19717if type(best_page_break)=penalty_node then
19718  begin geq_word_define(int_base+output_penalty_code,penalty(best_page_break));
19719  penalty(best_page_break):=inf_penalty;
19720  end
19721else geq_word_define(int_base+output_penalty_code,inf_penalty)
19722
19723@ As the page is finally being prepared for output,
19724pointer |p| runs through the vlist, with |prev_p| trailing behind;
19725pointer |q| is the tail of a list of insertions that
19726are being held over for a subsequent page.
19727
19728@<Put the \(o)optimal current page into box 255...@>=
19729if c=best_page_break then best_page_break:=null; {|c| not yet linked in}
19730@<Ensure that box 255 is empty before output@>;
19731insert_penalties:=0; {this will count the number of insertions held over}
19732save_split_top_skip:=split_top_skip;
19733if holding_inserts<=0 then
19734  @<Prepare all the boxes involved in insertions to act as queues@>;
19735q:=hold_head; link(q):=null; prev_p:=page_head; p:=link(prev_p);
19736while p<>best_page_break do
19737  begin if type(p)=ins_node then
19738    begin if holding_inserts<=0 then
19739       @<Either insert the material specified by node |p| into the
19740         appropriate box, or hold it for the next page;
19741         also delete node |p| from the current page@>;
19742    end
19743  else if type(p)=mark_node then @<Update the values of
19744    |first_mark| and |bot_mark|@>;
19745  prev_p:=p; p:=link(prev_p);
19746  end;
19747split_top_skip:=save_split_top_skip;
19748@<Break the current page at node |p|, put it in box~255,
19749  and put the remaining nodes on the contribution list@>;
19750@<Delete \(t)the page-insertion nodes@>
19751
19752@ @<Ensure that box 255 is empty before output@>=
19753if box(255)<>null then
19754  begin print_err(""); print_esc("box"); print("255 is not void");
19755@:box255}{\.{\\box255 is not void}@>
19756  help2("You shouldn't use \box255 except in \output routines.")@/
19757    ("Proceed, and I'll discard its present contents.");
19758  box_error(255);
19759  end
19760
19761@ @<Update the values of |first_mark| and |bot_mark|@>=
19762begin if first_mark=null then
19763  begin first_mark:=mark_ptr(p);
19764  add_token_ref(first_mark);
19765  end;
19766if bot_mark<>null then delete_token_ref(bot_mark);
19767bot_mark:=mark_ptr(p); add_token_ref(bot_mark);
19768end
19769
19770@ When the following code is executed, the current page runs from node
19771|link(page_head)| to node |prev_p|, and the nodes from |p| to |page_tail|
19772are to be placed back at the front of the contribution list. Furthermore
19773the heldover insertions appear in a list from |link(hold_head)| to |q|; we
19774will put them into the current page list for safekeeping while the user's
19775output routine is active.  We might have |q=hold_head|; and |p=null| if
19776and only if |prev_p=page_tail|. Error messages are suppressed within
19777|vpackage|, since the box might appear to be overfull or underfull simply
19778because the stretch and shrink from the \.{\\skip} registers for inserts
19779are not actually present in the box.
19780
19781@<Break the current page at node |p|, put it...@>=
19782if p<>null then
19783  begin if link(contrib_head)=null then
19784    if nest_ptr=0 then tail:=page_tail
19785    else contrib_tail:=page_tail;
19786  link(page_tail):=link(contrib_head);
19787  link(contrib_head):=p;
19788  link(prev_p):=null;
19789  end;
19790save_vbadness:=vbadness; vbadness:=inf_bad;
19791save_vfuzz:=vfuzz; vfuzz:=max_dimen; {inhibit error messages}
19792box(255):=vpackage(link(page_head),best_size,exactly,page_max_depth);
19793vbadness:=save_vbadness; vfuzz:=save_vfuzz;
19794if last_glue<>max_halfword then delete_glue_ref(last_glue);
19795@<Start a new current page@>; {this sets |last_glue:=max_halfword|}
19796if q<>hold_head then
19797  begin link(page_head):=link(hold_head); page_tail:=q;
19798  end
19799
19800@ If many insertions are supposed to go into the same box, we want to know
19801the position of the last node in that box, so that we don't need to waste time
19802when linking further information into it. The |last_ins_ptr| fields of the
19803page insertion nodes are therefore used for this purpose during the
19804packaging phase.
19805
19806@<Prepare all the boxes involved in insertions to act as queues@>=
19807begin r:=link(page_ins_head);
19808while r<>page_ins_head do
19809  begin if best_ins_ptr(r)<>null then
19810    begin n:=qo(subtype(r)); ensure_vbox(n);
19811    if box(n)=null then box(n):=new_null_box;
19812    p:=box(n)+list_offset;
19813    while link(p)<>null do p:=link(p);
19814    last_ins_ptr(r):=p;
19815    end;
19816  r:=link(r);
19817  end;
19818end
19819
19820@ @<Delete \(t)the page-insertion nodes@>=
19821r:=link(page_ins_head);
19822while r<>page_ins_head do
19823  begin q:=link(r); free_node(r,page_ins_node_size); r:=q;
19824  end;
19825link(page_ins_head):=page_ins_head
19826
19827@ We will set |best_ins_ptr:=null| and package the box corresponding to
19828insertion node~|r|, just after making the final insertion into that box.
19829If this final insertion is `|split_up|', the remainder after splitting
19830and pruning (if any) will be carried over to the next page.
19831
19832@<Either insert the material specified by node |p| into...@>=
19833begin r:=link(page_ins_head);
19834while subtype(r)<>subtype(p) do r:=link(r);
19835if best_ins_ptr(r)=null then wait:=true
19836else  begin wait:=false; s:=last_ins_ptr(r); link(s):=ins_ptr(p);
19837  if best_ins_ptr(r)=p then
19838    @<Wrap up the box specified by node |r|, splitting node |p| if
19839    called for; set |wait:=true| if node |p| holds a remainder after
19840    splitting@>
19841  else  begin while link(s)<>null do s:=link(s);
19842    last_ins_ptr(r):=s;
19843    end;
19844  end;
19845@<Either append the insertion node |p| after node |q|, and remove it
19846  from the current page, or delete |node(p)|@>;
19847end
19848
19849@ @<Wrap up the box specified by node |r|, splitting node |p| if...@>=
19850begin if type(r)=split_up then
19851  if (broken_ins(r)=p)and(broken_ptr(r)<>null) then
19852    begin while link(s)<>broken_ptr(r) do s:=link(s);
19853    link(s):=null;
19854    split_top_skip:=split_top_ptr(p);
19855    ins_ptr(p):=prune_page_top(broken_ptr(r));
19856    if ins_ptr(p)<>null then
19857      begin temp_ptr:=vpack(ins_ptr(p),natural);
19858      height(p):=height(temp_ptr)+depth(temp_ptr);
19859      free_node(temp_ptr,box_node_size); wait:=true;
19860      end;
19861    end;
19862best_ins_ptr(r):=null;
19863n:=qo(subtype(r));
19864temp_ptr:=list_ptr(box(n));
19865free_node(box(n),box_node_size);
19866box(n):=vpack(temp_ptr,natural);
19867end
19868
19869@ @<Either append the insertion node |p|...@>=
19870link(prev_p):=link(p); link(p):=null;
19871if wait then
19872  begin link(q):=p; q:=p; incr(insert_penalties);
19873  end
19874else  begin delete_glue_ref(split_top_ptr(p));
19875  free_node(p,ins_node_size);
19876  end;
19877p:=prev_p
19878
19879@ The list of heldover insertions, running from |link(page_head)| to
19880|page_tail|, must be moved to the contribution list when the user has
19881specified no output routine.
19882
19883@<Perform the default output routine@>=
19884begin if link(page_head)<>null then
19885  begin if link(contrib_head)=null then
19886    if nest_ptr=0 then tail:=page_tail@+else contrib_tail:=page_tail
19887  else link(page_tail):=link(contrib_head);
19888  link(contrib_head):=link(page_head);
19889  link(page_head):=null; page_tail:=page_head;
19890  end;
19891ship_out(box(255)); box(255):=null;
19892end
19893
19894@ @<Explain that too many dead cycles have occurred in a row@>=
19895begin print_err("Output loop---"); print_int(dead_cycles);
19896@.Output loop...@>
19897print(" consecutive dead cycles");
19898help3("I've concluded that your \output is awry; it never does a")@/
19899("\shipout, so I'm shipping \box255 out myself. Next time")@/
19900("increase \maxdeadcycles if you want me to be more patient!"); error;
19901end
19902
19903@ @<Fire up the user's output routine and |return|@>=
19904begin output_active:=true;
19905incr(dead_cycles);
19906push_nest; mode:=-vmode; prev_depth:=ignore_depth; mode_line:=-line;
19907begin_token_list(output_routine,output_text);
19908new_save_level(output_group); normal_paragraph;
19909scan_left_brace;
19910return;
19911end
19912
19913@ When the user's output routine finishes, it has constructed a vlist
19914in internal vertical mode, and \TeX\ will do the following:
19915
19916@<Resume the page builder after an output routine has come to an end@>=
19917begin if (loc<>null) or
19918 ((token_type<>output_text)and(token_type<>backed_up)) then
19919  @<Recover from an unbalanced output routine@>;
19920end_token_list; {conserve stack space in case more outputs are triggered}
19921end_graf; unsave; output_active:=false; insert_penalties:=0;@/
19922@<Ensure that box 255 is empty after output@>;
19923if tail<>head then {current list goes after heldover insertions}
19924  begin link(page_tail):=link(head);
19925  page_tail:=tail;
19926  end;
19927if link(page_head)<>null then {and both go before heldover contributions}
19928  begin if link(contrib_head)=null then contrib_tail:=page_tail;
19929  link(page_tail):=link(contrib_head);
19930  link(contrib_head):=link(page_head);
19931  link(page_head):=null; page_tail:=page_head;
19932  end;
19933pop_nest; build_page;
19934end
19935
19936@ @<Recover from an unbalanced output routine@>=
19937begin print_err("Unbalanced output routine");
19938@.Unbalanced output routine@>
19939help2("Your sneaky output routine has problematic {'s and/or }'s.")@/
19940("I can't handle that very well; good luck."); error;
19941repeat get_token;
19942until loc=null;
19943end {loops forever if reading from a file, since |null=min_halfword<=0|}
19944
19945@ @<Ensure that box 255 is empty after output@>=
19946if box(255)<>null then
19947  begin print_err("Output routine didn't use all of ");
19948  print_esc("box"); print_int(255);
19949@.Output routine didn't use...@>
19950  help3("Your \output commands should empty \box255,")@/
19951    ("e.g., by saying `\shipout\box255'.")@/
19952    ("Proceed; I'll discard its present contents.");
19953  box_error(255);
19954  end
19955
19956@* \[46] The chief executive.
19957We come now to the |main_control| routine, which contains the master
19958switch that causes all the various pieces of \TeX\ to do their things,
19959in the right order.
19960
19961In a sense, this is the grand climax of the program: It applies all the
19962tools that we have worked so hard to construct. In another sense, this is
19963the messiest part of the program: It necessarily refers to other pieces
19964of code all over the place, so that a person can't fully understand what is
19965going on without paging back and forth to be reminded of conventions that
19966are defined elsewhere. We are now at the hub of the web, the central nervous
19967system that touches most of the other parts and ties them together.
19968@^brain@>
19969
19970The structure of |main_control| itself is quite simple. There's a label
19971called |big_switch|, at which point the next token of input is fetched
19972using |get_x_token|. Then the program branches at high speed into one of
19973about 100 possible directions, based on the value of the current
19974mode and the newly fetched command code; the sum |abs(mode)+cur_cmd|
19975indicates what to do next. For example, the case `|vmode+letter|' arises
19976when a letter occurs in vertical mode (or internal vertical mode); this
19977case leads to instructions that initialize a new paragraph and enter
19978horizontal mode.
19979
19980The big |case| statement that contains this multiway switch has been labeled
19981|reswitch|, so that the program can |goto reswitch| when the next token
19982has already been fetched. Most of the cases are quite short; they call
19983an ``action procedure'' that does the work for that case, and then they
19984either |goto reswitch| or they ``fall through'' to the end of the |case|
19985statement, which returns control back to |big_switch|. Thus, |main_control|
19986is not an extremely large procedure, in spite of the multiplicity of things
19987it must do; it is small enough to be handled by \PASCAL\ compilers that put
19988severe restrictions on procedure size.
19989@!@^action procedure@>
19990
19991One case is singled out for special treatment, because it accounts for most
19992of \TeX's activities in typical applications. The process of reading simple
19993text and converting it into |char_node| records, while looking for ligatures
19994and kerns, is part of \TeX's ``inner loop''; the whole program runs
19995efficiently when its inner loop is fast, so this part has been written
19996with particular care.
19997
19998@ We shall concentrate first on the inner loop of |main_control|, deferring
19999consideration of the other cases until later.
20000@^inner loop@>
20001
20002@d big_switch=60 {go here to branch on the next token of input}
20003@d main_loop=70 {go here to typeset a string of consecutive characters}
20004@d main_loop_wrapup=80 {go here to finish a character or ligature}
20005@d main_loop_move=90 {go here to advance the ligature cursor}
20006@d main_loop_move_lig=95 {same, when advancing past a generated ligature}
20007@d main_loop_lookahead=100 {go here to bring in another character, if any}
20008@d main_lig_loop=110 {go here to check for ligatures or kerning}
20009@d append_normal_space=120 {go here to append a normal space between words}
20010
20011@p @t\4@>@<Declare action procedures for use by |main_control|@>@;
20012@t\4@>@<Declare the procedure called |handle_right_brace|@>@;
20013procedure main_control; {governs \TeX's activities}
20014label big_switch,reswitch,main_loop,main_loop_wrapup,
20015  main_loop_move,main_loop_move+1,main_loop_move+2,main_loop_move_lig,
20016  main_loop_lookahead,main_loop_lookahead+1,
20017  main_lig_loop,main_lig_loop+1,main_lig_loop+2,
20018  append_normal_space,exit;
20019var@!t:integer; {general-purpose temporary variable}
20020begin if every_job<>null then begin_token_list(every_job,every_job_text);
20021big_switch: get_x_token;@/
20022reswitch: @<Give diagnostic information, if requested@>;
20023case abs(mode)+cur_cmd of
20024hmode+letter,hmode+other_char,hmode+char_given: goto main_loop;
20025hmode+char_num: begin scan_char_num; cur_chr:=cur_val; goto main_loop;@+end;
20026hmode+no_boundary: begin get_x_token;
20027  if (cur_cmd=letter)or(cur_cmd=other_char)or(cur_cmd=char_given)or
20028   (cur_cmd=char_num) then cancel_boundary:=true;
20029  goto reswitch;
20030  end;
20031hmode+spacer: if space_factor=1000 then goto append_normal_space
20032  else app_space;
20033hmode+ex_space,mmode+ex_space: goto append_normal_space;
20034@t\4@>@<Cases of |main_control| that are not part of the inner loop@>@;
20035end; {of the big |case| statement}
20036goto big_switch;
20037main_loop:@<Append character |cur_chr| and the following characters (if~any)
20038  to the current hlist in the current font; |goto reswitch| when
20039  a non-character has been fetched@>;
20040append_normal_space:@<Append a normal inter-word space to the current list,
20041  then |goto big_switch|@>;
20042exit:end;
20043
20044@ When a new token has just been fetched at |big_switch|, we have an
20045ideal place to monitor \TeX's activity.
20046@^debugging@>
20047
20048@<Give diagnostic information, if requested@>=
20049if interrupt<>0 then if OK_to_interrupt then
20050  begin back_input; check_interrupt; goto big_switch;
20051  end;
20052@!debug if panicking then check_mem(false);@+@;@+gubed
20053if tracing_commands>0 then show_cur_cmd_chr
20054
20055@ The following part of the program was first written in a structured
20056manner, according to the philosophy that ``premature optimization is
20057the root of all evil.'' Then it was rearranged into pieces of
20058spaghetti so that the most common actions could proceed with little or
20059no redundancy.
20060
20061The original unoptimized form of this algorithm resembles the
20062|reconstitute| procedure, which was described earlier in connection with
20063hyphenation. Again we have an implied ``cursor'' between characters
20064|cur_l| and |cur_r|. The main difference is that the |lig_stack| can now
20065contain a charnode as well as pseudo-ligatures; that stack is now
20066usually nonempty, because the next character of input (if any) has been
20067appended to it. In |main_control| we have
20068$$|cur_r|=\cases{|character(lig_stack)|,&if |lig_stack>null|;\cr
20069  |font_bchar[cur_font]|,&otherwise;\cr}$$
20070except when |character(lig_stack)=font_false_bchar[cur_font]|.
20071Several additional global variables are needed.
20072
20073@<Glob...@>=
20074@!main_f:internal_font_number; {the current font}
20075@!main_i:four_quarters; {character information bytes for |cur_l|}
20076@!main_j:four_quarters; {ligature/kern command}
20077@!main_k:font_index; {index into |font_info|}
20078@!main_p:pointer; {temporary register for list manipulation}
20079@!main_s:integer; {space factor value}
20080@!bchar:halfword; {right boundary character of current font, or |non_char|}
20081@!false_bchar:halfword; {nonexistent character matching |bchar|, or |non_char|}
20082@!cancel_boundary:boolean; {should the left boundary be ignored?}
20083@!ins_disc:boolean; {should we insert a discretionary node?}
20084
20085@ The boolean variables of the main loop are normally false, and always reset
20086to false before the loop is left. That saves us the extra work of initializing
20087each time.
20088
20089@<Set init...@>=
20090ligature_present:=false; cancel_boundary:=false; lft_hit:=false; rt_hit:=false;
20091ins_disc:=false;
20092
20093@ We leave the |space_factor| unchanged if |sf_code(cur_chr)=0|; otherwise we
20094set it equal to |sf_code(cur_chr)|, except that it should never change
20095from a value less than 1000 to a value exceeding 1000. The most common
20096case is |sf_code(cur_chr)=1000|, so we want that case to be fast.
20097
20098The overall structure of the main loop is presented here. Some program labels
20099are inside the individual sections.
20100@^inner loop@>
20101
20102@d adjust_space_factor==@t@>@;@/
20103  main_s:=sf_code(cur_chr);
20104  if main_s=1000 then space_factor:=1000
20105  else if main_s<1000 then
20106    begin if main_s>0 then space_factor:=main_s;
20107    end
20108  else if space_factor<1000 then space_factor:=1000
20109  else space_factor:=main_s
20110
20111@<Append character |cur_chr|...@>=
20112adjust_space_factor;@/
20113main_f:=cur_font;
20114bchar:=font_bchar[main_f]; false_bchar:=font_false_bchar[main_f];
20115if mode>0 then if language<>clang then fix_language;
20116fast_get_avail(lig_stack); font(lig_stack):=main_f; cur_l:=qi(cur_chr);
20117character(lig_stack):=cur_l;@/
20118cur_q:=tail;
20119if cancel_boundary then
20120  begin cancel_boundary:=false; main_k:=non_address;
20121  end
20122else main_k:=bchar_label[main_f];
20123if main_k=non_address then goto main_loop_move+2; {no left boundary processing}
20124cur_r:=cur_l; cur_l:=non_char;
20125goto main_lig_loop+1; {begin with cursor after left boundary}
20126@#
20127main_loop_wrapup:@<Make a ligature node, if |ligature_present|;
20128  insert a null discretionary, if appropriate@>;
20129main_loop_move:@<If the cursor is immediately followed by the right boundary,
20130  |goto reswitch|; if it's followed by an invalid character, |goto big_switch|;
20131  otherwise move the cursor one step to the right and |goto main_lig_loop|@>;
20132main_loop_lookahead:@<Look ahead for another character, or leave |lig_stack|
20133  empty if there's none there@>;
20134main_lig_loop:@<If there's a ligature/kern command relevant to |cur_l| and
20135  |cur_r|, adjust the text appropriately; exit to |main_loop_wrapup|@>;
20136main_loop_move_lig:@<Move the cursor past a pseudo-ligature, then
20137  |goto main_loop_lookahead| or |main_lig_loop|@>
20138
20139@ If |link(cur_q)| is nonnull when |wrapup| is invoked, |cur_q| points to
20140the list of characters that were consumed while building the ligature
20141character~|cur_l|.
20142
20143A discretionary break is not inserted for an explicit hyphen when we are in
20144restricted horizontal mode. In particular, this avoids putting discretionary
20145nodes inside of other discretionaries.
20146@^inner loop@>
20147
20148@d pack_lig(#)== {the parameter is either |rt_hit| or |false|}
20149  begin main_p:=new_ligature(main_f,cur_l,link(cur_q));
20150  if lft_hit then
20151    begin subtype(main_p):=2; lft_hit:=false;
20152    end;
20153  if # then if lig_stack=null then
20154    begin incr(subtype(main_p)); rt_hit:=false;
20155    end;
20156  link(cur_q):=main_p; tail:=main_p; ligature_present:=false;
20157  end
20158
20159@d wrapup(#)==if cur_l<non_char then
20160  begin if link(cur_q)>null then
20161    if character(tail)=qi(hyphen_char[main_f]) then ins_disc:=true;
20162  if ligature_present then pack_lig(#);
20163  if ins_disc then
20164    begin ins_disc:=false;
20165    if mode>0 then tail_append(new_disc);
20166    end;
20167  end
20168
20169@<Make a ligature node, if |ligature_present|;...@>=
20170wrapup(rt_hit)
20171
20172@ @<If the cursor is immediately followed by the right boundary...@>=
20173@^inner loop@>
20174if lig_stack=null then goto reswitch;
20175cur_q:=tail; cur_l:=character(lig_stack);
20176main_loop_move+1:if not is_char_node(lig_stack) then goto main_loop_move_lig;
20177main_loop_move+2:if(cur_chr<font_bc[main_f])or(cur_chr>font_ec[main_f]) then
20178  begin char_warning(main_f,cur_chr); free_avail(lig_stack); goto big_switch;
20179  end;
20180main_i:=char_info(main_f)(cur_l);
20181if not char_exists(main_i) then
20182  begin char_warning(main_f,cur_chr); free_avail(lig_stack); goto big_switch;
20183  end;
20184link(tail):=lig_stack; tail:=lig_stack {|main_loop_lookahead| is next}
20185
20186@ Here we are at |main_loop_move_lig|.
20187When we begin this code we have |cur_q=tail| and |cur_l=character(lig_stack)|.
20188
20189@<Move the cursor past a pseudo-ligature...@>=
20190main_p:=lig_ptr(lig_stack);
20191if main_p>null then tail_append(main_p); {append a single character}
20192temp_ptr:=lig_stack; lig_stack:=link(temp_ptr);
20193free_node(temp_ptr,small_node_size);
20194main_i:=char_info(main_f)(cur_l); ligature_present:=true;
20195if lig_stack=null then
20196  if main_p>null then goto main_loop_lookahead
20197  else cur_r:=bchar
20198else cur_r:=character(lig_stack);
20199goto main_lig_loop
20200
20201@ The result of \.{\\char} can participate in a ligature or kern, so we must
20202look ahead for it.
20203
20204@<Look ahead for another character...@>=
20205get_next; {set only |cur_cmd| and |cur_chr|, for speed}
20206if cur_cmd=letter then goto main_loop_lookahead+1;
20207if cur_cmd=other_char then goto main_loop_lookahead+1;
20208if cur_cmd=char_given then goto main_loop_lookahead+1;
20209x_token; {now expand and set |cur_cmd|, |cur_chr|, |cur_tok|}
20210if cur_cmd=letter then goto main_loop_lookahead+1;
20211if cur_cmd=other_char then goto main_loop_lookahead+1;
20212if cur_cmd=char_given then goto main_loop_lookahead+1;
20213if cur_cmd=char_num then
20214  begin scan_char_num; cur_chr:=cur_val; goto main_loop_lookahead+1;
20215  end;
20216if cur_cmd=no_boundary then bchar:=non_char;
20217cur_r:=bchar; lig_stack:=null; goto main_lig_loop;
20218main_loop_lookahead+1: adjust_space_factor;
20219fast_get_avail(lig_stack); font(lig_stack):=main_f;
20220cur_r:=qi(cur_chr); character(lig_stack):=cur_r;
20221if cur_r=false_bchar then cur_r:=non_char {this prevents spurious ligatures}
20222
20223@ Even though comparatively few characters have a lig/kern program, several
20224of the instructions here count as part of \TeX's inner loop, since a
20225@^inner loop@>
20226potentially long sequential search must be performed. For example, tests with
20227Computer Modern Roman showed that about 40 per cent of all characters
20228actually encountered in practice had a lig/kern program, and that about four
20229lig/kern commands were investigated for every such character.
20230
20231At the beginning of this code we have |main_i=char_info(main_f)(cur_l)|.
20232
20233@<If there's a ligature/kern command...@>=
20234if char_tag(main_i)<>lig_tag then goto main_loop_wrapup;
20235if cur_r=non_char then goto main_loop_wrapup;
20236main_k:=lig_kern_start(main_f)(main_i); main_j:=font_info[main_k].qqqq;
20237if skip_byte(main_j)<=stop_flag then goto main_lig_loop+2;
20238main_k:=lig_kern_restart(main_f)(main_j);
20239main_lig_loop+1:main_j:=font_info[main_k].qqqq;
20240main_lig_loop+2:if next_char(main_j)=cur_r then
20241 if skip_byte(main_j)<=stop_flag then
20242  @<Do ligature or kern command, returning to |main_lig_loop|
20243  or |main_loop_wrapup| or |main_loop_move|@>;
20244if skip_byte(main_j)=qi(0) then incr(main_k)
20245else begin if skip_byte(main_j)>=stop_flag then goto main_loop_wrapup;
20246  main_k:=main_k+qo(skip_byte(main_j))+1;
20247  end;
20248goto main_lig_loop+1
20249
20250@ When a ligature or kern instruction matches a character, we know from
20251|read_font_info| that the character exists in the font, even though we
20252haven't verified its existence in the normal way.
20253
20254This section could be made into a subroutine, if the code inside
20255|main_control| needs to be shortened.
20256
20257\chardef\?='174 % vertical line to indicate character retention
20258
20259@<Do ligature or kern command...@>=
20260begin if op_byte(main_j)>=kern_flag then
20261  begin wrapup(rt_hit);
20262  tail_append(new_kern(char_kern(main_f)(main_j))); goto main_loop_move;
20263  end;
20264if cur_l=non_char then lft_hit:=true
20265else if lig_stack=null then rt_hit:=true;
20266check_interrupt; {allow a way out in case there's an infinite ligature loop}
20267case op_byte(main_j) of
20268qi(1),qi(5):begin cur_l:=rem_byte(main_j); {\.{=:\?}, \.{=:\?>}}
20269  main_i:=char_info(main_f)(cur_l); ligature_present:=true;
20270  end;
20271qi(2),qi(6):begin cur_r:=rem_byte(main_j); {\.{\?=:}, \.{\?=:>}}
20272  if lig_stack=null then {right boundary character is being consumed}
20273    begin lig_stack:=new_lig_item(cur_r); bchar:=non_char;
20274    end
20275  else if is_char_node(lig_stack) then {|link(lig_stack)=null|}
20276    begin main_p:=lig_stack; lig_stack:=new_lig_item(cur_r);
20277    lig_ptr(lig_stack):=main_p;
20278    end
20279  else character(lig_stack):=cur_r;
20280  end;
20281qi(3):begin cur_r:=rem_byte(main_j); {\.{\?=:\?}}
20282  main_p:=lig_stack; lig_stack:=new_lig_item(cur_r);
20283  link(lig_stack):=main_p;
20284  end;
20285qi(7),qi(11):begin wrapup(false); {\.{\?=:\?>}, \.{\?=:\?>>}}
20286  cur_q:=tail; cur_l:=rem_byte(main_j);
20287  main_i:=char_info(main_f)(cur_l); ligature_present:=true;
20288  end;
20289othercases begin cur_l:=rem_byte(main_j); ligature_present:=true; {\.{=:}}
20290  if lig_stack=null then goto main_loop_wrapup
20291  else goto main_loop_move+1;
20292  end
20293endcases;
20294if op_byte(main_j)>qi(4) then
20295  if op_byte(main_j)<>qi(7) then goto main_loop_wrapup;
20296if cur_l<non_char then goto main_lig_loop;
20297main_k:=bchar_label[main_f]; goto main_lig_loop+1;
20298end
20299
20300@ The occurrence of blank spaces is almost part of \TeX's inner loop,
20301@^inner loop@>
20302since we usually encounter about one space for every five non-blank characters.
20303Therefore |main_control| gives second-highest priority to ordinary spaces.
20304
20305When a glue parameter like \.{\\spaceskip} is set to `\.{0pt}', we will
20306see to it later that the corresponding glue specification is precisely
20307|zero_glue|, not merely a pointer to some specification that happens
20308to be full of zeroes. Therefore it is simple to test whether a glue parameter
20309is zero or~not.
20310
20311@<Append a normal inter-word space...@>=
20312if space_skip=zero_glue then
20313  begin @<Find the glue specification, |main_p|, for
20314    text spaces in the current font@>;
20315  temp_ptr:=new_glue(main_p);
20316  end
20317else temp_ptr:=new_param_glue(space_skip_code);
20318link(tail):=temp_ptr; tail:=temp_ptr;
20319goto big_switch
20320
20321@ Having |font_glue| allocated for each text font saves both time and memory.
20322If any of the three spacing parameters are subsequently changed by the
20323use of \.{\\fontdimen}, the |find_font_dimen| procedure deallocates the
20324|font_glue| specification allocated here.
20325
20326@<Find the glue specification...@>=
20327begin main_p:=font_glue[cur_font];
20328if main_p=null then
20329  begin main_p:=new_spec(zero_glue); main_k:=param_base[cur_font]+space_code;
20330  width(main_p):=font_info[main_k].sc; {that's |space(cur_font)|}
20331  stretch(main_p):=font_info[main_k+1].sc; {and |space_stretch(cur_font)|}
20332  shrink(main_p):=font_info[main_k+2].sc; {and |space_shrink(cur_font)|}
20333  font_glue[cur_font]:=main_p;
20334  end;
20335end
20336
20337@ @<Declare act...@>=
20338procedure app_space; {handle spaces when |space_factor<>1000|}
20339var@!q:pointer; {glue node}
20340begin if (space_factor>=2000)and(xspace_skip<>zero_glue) then
20341  q:=new_param_glue(xspace_skip_code)
20342else  begin if space_skip<>zero_glue then main_p:=space_skip
20343  else @<Find the glue specification...@>;
20344  main_p:=new_spec(main_p);
20345  @<Modify the glue specification in |main_p| according to the space factor@>;
20346  q:=new_glue(main_p); glue_ref_count(main_p):=null;
20347  end;
20348link(tail):=q; tail:=q;
20349end;
20350
20351@ @<Modify the glue specification in |main_p| according to the space factor@>=
20352if space_factor>=2000 then width(main_p):=width(main_p)+extra_space(cur_font);
20353stretch(main_p):=xn_over_d(stretch(main_p),space_factor,1000);
20354shrink(main_p):=xn_over_d(shrink(main_p),1000,space_factor)
20355
20356@ Whew---that covers the main loop. We can now proceed at a leisurely
20357pace through the other combinations of possibilities.
20358
20359@d any_mode(#)==vmode+#,hmode+#,mmode+# {for mode-independent commands}
20360
20361@<Cases of |main_control| that are not part of the inner loop@>=
20362any_mode(relax),vmode+spacer,mmode+spacer,mmode+no_boundary:do_nothing;
20363any_mode(ignore_spaces): begin @<Get the next non-blank non-call...@>;
20364  goto reswitch;
20365  end;
20366vmode+stop: if its_all_over then return; {this is the only way out}
20367@t\4@>@<Forbidden cases detected in |main_control|@>@+@,any_mode(mac_param):
20368  report_illegal_case;
20369@<Math-only cases in non-math modes, or vice versa@>: insert_dollar_sign;
20370@t\4@>@<Cases of |main_control| that build boxes and lists@>@;
20371@t\4@>@<Cases of |main_control| that don't depend on |mode|@>@;
20372@t\4@>@<Cases of |main_control| that are for extensions to \TeX@>@;
20373
20374@ Here is a list of cases where the user has probably gotten into or out of math
20375mode by mistake. \TeX\ will insert a dollar sign and rescan the current token.
20376
20377@d non_math(#)==vmode+#,hmode+#
20378
20379@<Math-only cases in non-math modes...@>=
20380non_math(sup_mark), non_math(sub_mark), non_math(math_char_num),
20381non_math(math_given), non_math(math_comp), non_math(delim_num),
20382non_math(left_right), non_math(above), non_math(radical),
20383non_math(math_style), non_math(math_choice), non_math(vcenter),
20384non_math(non_script), non_math(mkern), non_math(limit_switch),
20385non_math(mskip), non_math(math_accent),
20386mmode+endv, mmode+par_end, mmode+stop, mmode+vskip, mmode+un_vbox,
20387mmode+valign, mmode+hrule
20388
20389@ @<Declare action...@>=
20390procedure insert_dollar_sign;
20391begin back_input; cur_tok:=math_shift_token+"$";
20392print_err("Missing $ inserted");
20393@.Missing \$ inserted@>
20394help2("I've inserted a begin-math/end-math symbol since I think")@/
20395("you left one out. Proceed, with fingers crossed."); ins_error;
20396end;
20397
20398@ When erroneous situations arise, \TeX\ usually issues an error message
20399specific to the particular error. For example, `\.{\\noalign}' should
20400not appear in any mode, since it is recognized by the |align_peek| routine
20401in all of its legitimate appearances; a special error message is given
20402when `\.{\\noalign}' occurs elsewhere. But sometimes the most appropriate
20403error message is simply that the user is not allowed to do what he or she
20404has attempted. For example, `\.{\\moveleft}' is allowed only in vertical mode,
20405and `\.{\\lower}' only in non-vertical modes.  Such cases are enumerated
20406here and in the other sections referred to under `See also \dots.'
20407
20408@<Forbidden cases...@>=
20409vmode+vmove,hmode+hmove,mmode+hmove,any_mode(last_item),
20410
20411@ The `|you_cant|' procedure prints a line saying that the current command
20412is illegal in the current mode; it identifies these things symbolically.
20413
20414@<Declare action...@>=
20415procedure you_cant;
20416begin print_err("You can't use `");
20417@.You can't use x in y mode@>
20418print_cmd_chr(cur_cmd,cur_chr);
20419print("' in "); print_mode(mode);
20420end;
20421
20422@ @<Declare act...@>=
20423procedure report_illegal_case;
20424begin you_cant;
20425help4("Sorry, but I'm not programmed to handle this case;")@/
20426("I'll just pretend that you didn't ask for it.")@/
20427("If you're in the wrong mode, you might be able to")@/
20428("return to the right one by typing `I}' or `I$' or `I\par'.");@/
20429error;
20430end;
20431
20432@ Some operations are allowed only in privileged modes, i.e., in cases
20433that |mode>0|. The |privileged| function is used to detect violations
20434of this rule; it issues an error message and returns |false| if the
20435current |mode| is negative.
20436
20437@<Declare act...@>=
20438function privileged:boolean;
20439begin if mode>0 then privileged:=true
20440else  begin report_illegal_case; privileged:=false;
20441  end;
20442end;
20443
20444@ Either \.{\\dump} or \.{\\end} will cause |main_control| to enter the
20445endgame, since both of them have `|stop|' as their command code.
20446
20447@<Put each...@>=
20448primitive("end",stop,0);@/
20449@!@:end_}{\.{\\end} primitive@>
20450primitive("dump",stop,1);@/
20451@!@:dump_}{\.{\\dump} primitive@>
20452
20453@ @<Cases of |print_cmd_chr|...@>=
20454stop:if chr_code=1 then print_esc("dump")@+else print_esc("end");
20455
20456@ We don't want to leave |main_control| immediately when a |stop| command
20457is sensed, because it may be necessary to invoke an \.{\\output} routine
20458several times before things really grind to a halt. (The output routine
20459might even say `\.{\\gdef\\end\{...\}}', to prolong the life of the job.)
20460Therefore |its_all_over| is |true| only when the current page
20461and contribution list are empty, and when the last output was not a
20462``dead cycle.''
20463
20464@<Declare act...@>=
20465function its_all_over:boolean; {do this when \.{\\end} or \.{\\dump} occurs}
20466label exit;
20467begin if privileged then
20468  begin if (page_head=page_tail)and(head=tail)and(dead_cycles=0) then
20469    begin its_all_over:=true; return;
20470    end;
20471  back_input; {we will try to end again after ejecting residual material}
20472  tail_append(new_null_box);
20473  width(tail):=hsize;
20474  tail_append(new_glue(fill_glue));
20475  tail_append(new_penalty(-@'10000000000));@/
20476  build_page; {append \.{\\hbox to \\hsize\{\}\\vfill\\penalty-'10000000000}}
20477  end;
20478its_all_over:=false;
20479exit:end;
20480
20481@* \[47] Building boxes and lists.
20482The most important parts of |main_control| are concerned with \TeX's
20483chief mission of box-making. We need to control the activities that put
20484entries on vlists and hlists, as well as the activities that convert
20485those lists into boxes. All of the necessary machinery has already been
20486developed; it remains for us to ``push the buttons'' at the right times.
20487
20488@ As an introduction to these routines, let's consider one of the simplest
20489cases: What happens when `\.{\\hrule}' occurs in vertical mode, or
20490`\.{\\vrule}' in horizontal mode or math mode? The code in |main_control|
20491is short, since the |scan_rule_spec| routine already does most of what is
20492required; thus, there is no need for a special action procedure.
20493
20494Note that baselineskip calculations are disabled after a rule in vertical
20495mode, by setting |prev_depth:=ignore_depth|.
20496
20497@<Cases of |main_control| that build...@>=
20498vmode+hrule,hmode+vrule,mmode+vrule: begin tail_append(scan_rule_spec);
20499  if abs(mode)=vmode then prev_depth:=ignore_depth
20500  else if abs(mode)=hmode then space_factor:=1000;
20501  end;
20502
20503@ The processing of things like \.{\\hskip} and \.{\\vskip} is slightly
20504more complicated. But the code in |main_control| is very short, since
20505it simply calls on the action routine |append_glue|. Similarly, \.{\\kern}
20506activates |append_kern|.
20507
20508@<Cases of |main_control| that build...@>=
20509vmode+vskip,hmode+hskip,mmode+hskip,mmode+mskip: append_glue;
20510any_mode(kern),mmode+mkern: append_kern;
20511
20512@ The |hskip| and |vskip| command codes are used for control sequences
20513like \.{\\hss} and \.{\\vfil} as well as for \.{\\hskip} and \.{\\vskip}.
20514The difference is in the value of |cur_chr|.
20515
20516@d fil_code=0 {identifies \.{\\hfil} and \.{\\vfil}}
20517@d fill_code=1 {identifies \.{\\hfill} and \.{\\vfill}}
20518@d ss_code=2 {identifies \.{\\hss} and \.{\\vss}}
20519@d fil_neg_code=3 {identifies \.{\\hfilneg} and \.{\\vfilneg}}
20520@d skip_code=4 {identifies \.{\\hskip} and \.{\\vskip}}
20521@d mskip_code=5 {identifies \.{\\mskip}}
20522
20523@<Put each...@>=
20524primitive("hskip",hskip,skip_code);@/
20525@!@:hskip_}{\.{\\hskip} primitive@>
20526primitive("hfil",hskip,fil_code);
20527@!@:hfil_}{\.{\\hfil} primitive@>
20528primitive("hfill",hskip,fill_code);@/
20529@!@:hfill_}{\.{\\hfill} primitive@>
20530primitive("hss",hskip,ss_code);
20531@!@:hss_}{\.{\\hss} primitive@>
20532primitive("hfilneg",hskip,fil_neg_code);@/
20533@!@:hfil_neg_}{\.{\\hfilneg} primitive@>
20534primitive("vskip",vskip,skip_code);@/
20535@!@:vskip_}{\.{\\vskip} primitive@>
20536primitive("vfil",vskip,fil_code);
20537@!@:vfil_}{\.{\\vfil} primitive@>
20538primitive("vfill",vskip,fill_code);@/
20539@!@:vfill_}{\.{\\vfill} primitive@>
20540primitive("vss",vskip,ss_code);
20541@!@:vss_}{\.{\\vss} primitive@>
20542primitive("vfilneg",vskip,fil_neg_code);@/
20543@!@:vfil_neg_}{\.{\\vfilneg} primitive@>
20544primitive("mskip",mskip,mskip_code);@/
20545@!@:mskip_}{\.{\\mskip} primitive@>
20546primitive("kern",kern,explicit);
20547@!@:kern_}{\.{\\kern} primitive@>
20548primitive("mkern",mkern,mu_glue);@/
20549@!@:mkern_}{\.{\\mkern} primitive@>
20550
20551@ @<Cases of |print_cmd_chr|...@>=
20552hskip: case chr_code of
20553  skip_code:print_esc("hskip");
20554  fil_code:print_esc("hfil");
20555  fill_code:print_esc("hfill");
20556  ss_code:print_esc("hss");
20557  othercases print_esc("hfilneg")
20558  endcases;
20559vskip: case chr_code of
20560  skip_code:print_esc("vskip");
20561  fil_code:print_esc("vfil");
20562  fill_code:print_esc("vfill");
20563  ss_code:print_esc("vss");
20564  othercases print_esc("vfilneg")
20565  endcases;
20566mskip: print_esc("mskip");
20567kern: print_esc("kern");
20568mkern: print_esc("mkern");
20569
20570@ All the work relating to glue creation has been relegated to the
20571following subroutine. It does not call |build_page|, because it is
20572used in at least one place where that would be a mistake.
20573
20574@<Declare action...@>=
20575procedure append_glue;
20576var s:small_number; {modifier of skip command}
20577begin s:=cur_chr;
20578case s of
20579fil_code: cur_val:=fil_glue;
20580fill_code: cur_val:=fill_glue;
20581ss_code: cur_val:=ss_glue;
20582fil_neg_code: cur_val:=fil_neg_glue;
20583skip_code: scan_glue(glue_val);
20584mskip_code: scan_glue(mu_val);
20585end; {now |cur_val| points to the glue specification}
20586tail_append(new_glue(cur_val));
20587if s>=skip_code then
20588  begin decr(glue_ref_count(cur_val));
20589  if s>skip_code then subtype(tail):=mu_glue;
20590  end;
20591end;
20592
20593@ @<Declare act...@>=
20594procedure append_kern;
20595var s:quarterword; {|subtype| of the kern node}
20596begin s:=cur_chr; scan_dimen(s=mu_glue,false,false);
20597tail_append(new_kern(cur_val)); subtype(tail):=s;
20598end;
20599
20600@ Many of the actions related to box-making are triggered by the appearance
20601of braces in the input. For example, when the user says `\.{\\hbox}
20602\.{to} \.{100pt\{$\langle\,\hbox{hlist}\,\rangle$\}}' in vertical mode,
20603the information about the box size (100pt, |exactly|) is put onto |save_stack|
20604with a level boundary word just above it, and |cur_group:=adjusted_hbox_group|;
20605\TeX\ enters restricted horizontal mode to process the hlist. The right
20606brace eventually causes |save_stack| to be restored to its former state,
20607at which time the information about the box size (100pt, |exactly|) is
20608available once again; a box is packaged and we leave restricted horizontal
20609mode, appending the new box to the current list of the enclosing mode
20610(in this case to the current list of vertical mode), followed by any
20611vertical adjustments that were removed from the box by |hpack|.
20612
20613The next few sections of the program are therefore concerned with the
20614treatment of left and right curly braces.
20615
20616@ If a left brace occurs in the middle of a page or paragraph, it simply
20617introduces a new level of grouping, and the matching right brace will not have
20618such a drastic effect. Such grouping affects neither the mode nor the
20619current list.
20620
20621@<Cases of |main_control| that build...@>=
20622non_math(left_brace): new_save_level(simple_group);
20623any_mode(begin_group): new_save_level(semi_simple_group);
20624any_mode(end_group): if cur_group=semi_simple_group then unsave
20625  else off_save;
20626
20627@ We have to deal with errors in which braces and such things are not
20628properly nested. Sometimes the user makes an error of commission by
20629inserting an extra symbol, but sometimes the user makes an error of omission.
20630\TeX\ can't always tell one from the other, so it makes a guess and tries
20631to avoid getting into a loop.
20632
20633The |off_save| routine is called when the current group code is wrong. It tries
20634to insert something into the user's input that will help clean off
20635the top level.
20636
20637@<Declare act...@>=
20638procedure off_save;
20639var p:pointer; {inserted token}
20640begin if cur_group=bottom_level then
20641  @<Drop current token and complain that it was unmatched@>
20642else  begin back_input; p:=get_avail; link(temp_head):=p;
20643  print_err("Missing ");
20644  @<Prepare to insert a token that matches |cur_group|,
20645    and print what it is@>;
20646  print(" inserted"); ins_list(link(temp_head));
20647  help5("I've inserted something that you may have forgotten.")@/
20648  ("(See the <inserted text> above.)")@/
20649  ("With luck, this will get me unwedged. But if you")@/
20650  ("really didn't forget anything, try typing `2' now; then")@/
20651  ("my insertion and my current dilemma will both disappear.");
20652  error;
20653  end;
20654end;
20655
20656@ At this point, |link(temp_head)=p|, a pointer to an empty one-word node.
20657
20658@<Prepare to insert a token that matches |cur_group|...@>=
20659case cur_group of
20660semi_simple_group: begin info(p):=cs_token_flag+frozen_end_group;
20661  print_esc("endgroup");
20662@.Missing \\endgroup inserted@>
20663  end;
20664math_shift_group: begin info(p):=math_shift_token+"$"; print_char("$");
20665@.Missing \$ inserted@>
20666  end;
20667math_left_group: begin info(p):=cs_token_flag+frozen_right; link(p):=get_avail;
20668  p:=link(p); info(p):=other_token+"."; print_esc("right.");
20669@.Missing \\right\hbox{.} inserted@>
20670@^null delimiter@>
20671  end;
20672othercases begin info(p):=right_brace_token+"}"; print_char("}");
20673@.Missing \} inserted@>
20674  end
20675endcases
20676
20677@ @<Drop current token and complain that it was unmatched@>=
20678begin print_err("Extra "); print_cmd_chr(cur_cmd,cur_chr);
20679@.Extra x@>
20680help1("Things are pretty mixed up, but I think the worst is over.");@/
20681error;
20682end
20683
20684@ The routine for a |right_brace| character branches into many subcases,
20685since a variety of things may happen, depending on |cur_group|. Some
20686types of groups are not supposed to be ended by a right brace; error
20687messages are given in hopes of pinpointing the problem. Most branches
20688of this routine will be filled in later, when we are ready to understand
20689them; meanwhile, we must prepare ourselves to deal with such errors.
20690
20691@<Cases of |main_control| that build...@>=
20692any_mode(right_brace): handle_right_brace;
20693
20694@ @<Declare the procedure called |handle_right_brace|@>=
20695procedure handle_right_brace;
20696var p,@!q:pointer; {for short-term use}
20697@!d:scaled; {holds |split_max_depth| in |insert_group|}
20698@!f:integer; {holds |floating_penalty| in |insert_group|}
20699begin case cur_group of
20700simple_group: unsave;
20701bottom_level: begin print_err("Too many }'s");
20702@.Too many \}'s@>
20703  help2("You've closed more groups than you opened.")@/
20704  ("Such booboos are generally harmless, so keep going."); error;
20705  end;
20706semi_simple_group,math_shift_group,math_left_group: extra_right_brace;
20707@t\4@>@<Cases of |handle_right_brace| where a |right_brace| triggers
20708  a delayed action@>@;
20709othercases confusion("rightbrace")
20710@:this can't happen rightbrace}{\quad rightbrace@>
20711endcases;
20712end;
20713
20714@ @<Declare act...@>=
20715procedure extra_right_brace;
20716begin print_err("Extra }, or forgotten ");
20717@.Extra \}, or forgotten x@>
20718case cur_group of
20719semi_simple_group: print_esc("endgroup");
20720math_shift_group: print_char("$");
20721math_left_group: print_esc("right");
20722end;@/
20723help5("I've deleted a group-closing symbol because it seems to be")@/
20724("spurious, as in `$x}$'. But perhaps the } is legitimate and")@/
20725("you forgot something else, as in `\hbox{$x}'. In such cases")@/
20726("the way to recover is to insert both the forgotten and the")@/
20727("deleted material, e.g., by typing `I$}'."); error;
20728incr(align_state);
20729end;
20730
20731@ Here is where we clear the parameters that are supposed to revert to their
20732default values after every paragraph and when internal vertical mode is entered.
20733
20734@<Declare act...@>=
20735procedure normal_paragraph;
20736begin if looseness<>0 then eq_word_define(int_base+looseness_code,0);
20737if hang_indent<>0 then eq_word_define(dimen_base+hang_indent_code,0);
20738if hang_after<>1 then eq_word_define(int_base+hang_after_code,1);
20739if par_shape_ptr<>null then eq_define(par_shape_loc,shape_ref,null);
20740end;
20741
20742@ Now let's turn to the question of how \.{\\hbox} is treated. We actually
20743need to consider also a slightly larger context, since constructions like
20744`\.{\\setbox3=}\penalty0\.{\\hbox...}' and
20745`\.{\\leaders}\penalty0\.{\\hbox...}' and
20746`\.{\\lower3.8pt\\hbox...}'
20747are supposed to invoke quite
20748different actions after the box has been packaged. Conversely,
20749constructions like `\.{\\setbox3=}' can be followed by a variety of
20750different kinds of boxes, and we would like to encode such things in an
20751efficient way.
20752
20753In other words, there are two problems: to represent the context of a box,
20754and to represent its type.
20755
20756The first problem is solved by putting a ``context code'' on the |save_stack|,
20757just below the two entries that give the dimensions produced by |scan_spec|.
20758The context code is either a (signed) shift amount, or it is a large
20759integer |>=box_flag|, where |box_flag=@t$2^{30}$@>|. Codes |box_flag| through
20760|box_flag+255| represent `\.{\\setbox0}' through `\.{\\setbox255}';
20761codes |box_flag+256| through |box_flag+511| represent `\.{\\global\\setbox0}'
20762through `\.{\\global\\setbox255}';
20763code |box_flag+512| represents `\.{\\shipout}'; and codes |box_flag+513|
20764through |box_flag+515| represent `\.{\\leaders}', `\.{\\cleaders}',
20765and `\.{\\xleaders}'.
20766
20767The second problem is solved by giving the command code |make_box| to all
20768control sequences that produce a box, and by using the following |chr_code|
20769values to distinguish between them: |box_code|, |copy_code|, |last_box_code|,
20770|vsplit_code|, |vtop_code|, |vtop_code+vmode|, and |vtop_code+hmode|, where
20771the latter two are used to denote \.{\\vbox} and \.{\\hbox}, respectively.
20772
20773@d box_flag==@'10000000000 {context code for `\.{\\setbox0}'}
20774@d ship_out_flag==box_flag+512 {context code for `\.{\\shipout}'}
20775@d leader_flag==box_flag+513 {context code for `\.{\\leaders}'}
20776@d box_code=0 {|chr_code| for `\.{\\box}'}
20777@d copy_code=1 {|chr_code| for `\.{\\copy}'}
20778@d last_box_code=2 {|chr_code| for `\.{\\lastbox}'}
20779@d vsplit_code=3 {|chr_code| for `\.{\\vsplit}'}
20780@d vtop_code=4 {|chr_code| for `\.{\\vtop}'}
20781
20782@<Put each...@>=
20783primitive("moveleft",hmove,1);
20784@!@:move_left_}{\.{\\moveleft} primitive@>
20785primitive("moveright",hmove,0);@/
20786@!@:move_right_}{\.{\\moveright} primitive@>
20787primitive("raise",vmove,1);
20788@!@:raise_}{\.{\\raise} primitive@>
20789primitive("lower",vmove,0);
20790@!@:lower_}{\.{\\lower} primitive@>
20791@#
20792primitive("box",make_box,box_code);
20793@!@:box_}{\.{\\box} primitive@>
20794primitive("copy",make_box,copy_code);
20795@!@:copy_}{\.{\\copy} primitive@>
20796primitive("lastbox",make_box,last_box_code);
20797@!@:last_box_}{\.{\\lastbox} primitive@>
20798primitive("vsplit",make_box,vsplit_code);
20799@!@:vsplit_}{\.{\\vsplit} primitive@>
20800primitive("vtop",make_box,vtop_code);@/
20801@!@:vtop_}{\.{\\vtop} primitive@>
20802primitive("vbox",make_box,vtop_code+vmode);
20803@!@:vbox_}{\.{\\vbox} primitive@>
20804primitive("hbox",make_box,vtop_code+hmode);@/
20805@!@:hbox_}{\.{\\hbox} primitive@>
20806primitive("shipout",leader_ship,a_leaders-1); {|ship_out_flag=leader_flag-1|}
20807@!@:ship_out_}{\.{\\shipout} primitive@>
20808primitive("leaders",leader_ship,a_leaders);
20809@!@:leaders_}{\.{\\leaders} primitive@>
20810primitive("cleaders",leader_ship,c_leaders);
20811@!@:c_leaders_}{\.{\\cleaders} primitive@>
20812primitive("xleaders",leader_ship,x_leaders);
20813@!@:x_leaders_}{\.{\\xleaders} primitive@>
20814
20815@ @<Cases of |print_cmd_chr|...@>=
20816hmove: if chr_code=1 then print_esc("moveleft")@+else print_esc("moveright");
20817vmove: if chr_code=1 then print_esc("raise")@+else print_esc("lower");
20818make_box: case chr_code of
20819  box_code: print_esc("box");
20820  copy_code: print_esc("copy");
20821  last_box_code: print_esc("lastbox");
20822  vsplit_code: print_esc("vsplit");
20823  vtop_code: print_esc("vtop");
20824  vtop_code+vmode: print_esc("vbox");
20825  othercases print_esc("hbox")
20826  endcases;
20827leader_ship: if chr_code=a_leaders then print_esc("leaders")
20828  else if chr_code=c_leaders then print_esc("cleaders")
20829  else if chr_code=x_leaders then print_esc("xleaders")
20830  else print_esc("shipout");
20831
20832@ Constructions that require a box are started by calling |scan_box| with
20833a specified context code. The |scan_box| routine verifies
20834that a |make_box| command comes next and then it calls |begin_box|.
20835
20836@<Cases of |main_control| that build...@>=
20837vmode+hmove,hmode+vmove,mmode+vmove: begin t:=cur_chr;
20838  scan_normal_dimen;
20839  if t=0 then scan_box(cur_val)@+else scan_box(-cur_val);
20840  end;
20841any_mode(leader_ship): scan_box(leader_flag-a_leaders+cur_chr);
20842any_mode(make_box): begin_box(0);
20843
20844@ The global variable |cur_box| will point to a newly made box. If the box
20845is void, we will have |cur_box=null|. Otherwise we will have
20846|type(cur_box)=hlist_node| or |vlist_node| or |rule_node|; the |rule_node|
20847case can occur only with leaders.
20848
20849@<Glob...@>=
20850@!cur_box:pointer; {box to be placed into its context}
20851
20852@ The |box_end| procedure does the right thing with |cur_box|, if
20853|box_context| represents the context as explained above.
20854
20855@<Declare act...@>=
20856procedure box_end(@!box_context:integer);
20857var p:pointer; {|ord_noad| for new box in math mode}
20858begin if box_context<box_flag then @<Append box |cur_box| to the current list,
20859    shifted by |box_context|@>
20860else if box_context<ship_out_flag then @<Store \(c)|cur_box| in a box register@>
20861else if cur_box<>null then
20862  if box_context>ship_out_flag then @<Append a new leader node that
20863      uses |cur_box|@>
20864  else ship_out(cur_box);
20865end;
20866
20867@ The global variable |adjust_tail| will be non-null if and only if the
20868current box might include adjustments that should be appended to the
20869current vertical list.
20870
20871@<Append box |cur_box| to the current...@>=
20872begin if cur_box<>null then
20873  begin shift_amount(cur_box):=box_context;
20874  if abs(mode)=vmode then
20875    begin append_to_vlist(cur_box);
20876    if adjust_tail<>null then
20877      begin if adjust_head<>adjust_tail then
20878        begin link(tail):=link(adjust_head); tail:=adjust_tail;
20879        end;
20880      adjust_tail:=null;
20881      end;
20882    if mode>0 then build_page;
20883    end
20884  else  begin if abs(mode)=hmode then space_factor:=1000
20885    else  begin p:=new_noad;
20886      math_type(nucleus(p)):=sub_box;
20887      info(nucleus(p)):=cur_box; cur_box:=p;
20888      end;
20889    link(tail):=cur_box; tail:=cur_box;
20890    end;
20891  end;
20892end
20893
20894@ @<Store \(c)|cur_box| in a box register@>=
20895if box_context<box_flag+256 then
20896  eq_define(box_base-box_flag+box_context,box_ref,cur_box)
20897else geq_define(box_base-box_flag-256+box_context,box_ref,cur_box)
20898
20899@ @<Append a new leader node ...@>=
20900begin @<Get the next non-blank non-relax...@>;
20901if ((cur_cmd=hskip)and(abs(mode)<>vmode))or@|
20902   ((cur_cmd=vskip)and(abs(mode)=vmode)) then
20903  begin append_glue; subtype(tail):=box_context-(leader_flag-a_leaders);
20904  leader_ptr(tail):=cur_box;
20905  end
20906else  begin print_err("Leaders not followed by proper glue");
20907@.Leaders not followed by...@>
20908  help3("You should say `\leaders <box or rule><hskip or vskip>'.")@/
20909  ("I found the <box or rule>, but there's no suitable")@/
20910  ("<hskip or vskip>, so I'm ignoring these leaders."); back_error;
20911  flush_node_list(cur_box);
20912  end;
20913end
20914
20915@ Now that we can see what eventually happens to boxes, we can consider
20916the first steps in their creation. The |begin_box| routine is called when
20917|box_context| is a context specification, |cur_chr| specifies the type of
20918box desired, and |cur_cmd=make_box|.
20919
20920@<Declare act...@>=
20921procedure begin_box(@!box_context:integer);
20922label exit, done;
20923var @!p,@!q:pointer; {run through the current list}
20924@!m:quarterword; {the length of a replacement list}
20925@!k:halfword; {0 or |vmode| or |hmode|}
20926@!n:eight_bits; {a box number}
20927begin case cur_chr of
20928box_code: begin scan_eight_bit_int; cur_box:=box(cur_val);
20929  box(cur_val):=null; {the box becomes void, at the same level}
20930  end;
20931copy_code: begin scan_eight_bit_int; cur_box:=copy_node_list(box(cur_val));
20932  end;
20933last_box_code: @<If the current list ends with a box node, delete it from
20934  the list and make |cur_box| point to it; otherwise set |cur_box:=null|@>;
20935vsplit_code: @<Split off part of a vertical box, make |cur_box| point to it@>;
20936othercases @<Initiate the construction of an hbox or vbox, then |return|@>
20937endcases;@/
20938box_end(box_context); {in simple cases, we use the box immediately}
20939exit:end;
20940
20941@ Note that the condition |not is_char_node(tail)| implies that |head<>tail|,
20942since |head| is a one-word node.
20943
20944@<If the current list ends with a box node, delete it...@>=
20945begin cur_box:=null;
20946if abs(mode)=mmode then
20947  begin you_cant; help1("Sorry; this \lastbox will be void."); error;
20948  end
20949else if (mode=vmode)and(head=tail) then
20950  begin you_cant;
20951  help2("Sorry...I usually can't take things from the current page.")@/
20952    ("This \lastbox will therefore be void."); error;
20953  end
20954else  begin if not is_char_node(tail) then
20955    if (type(tail)=hlist_node)or(type(tail)=vlist_node) then
20956      @<Remove the last box, unless it's part of a discretionary@>;
20957  end;
20958end
20959
20960@ @<Remove the last box...@>=
20961begin q:=head;
20962repeat p:=q;
20963if not is_char_node(q) then if type(q)=disc_node then
20964  begin for m:=1 to replace_count(q) do p:=link(p);
20965  if p=tail then goto done;
20966  end;
20967q:=link(p);
20968until q=tail;
20969cur_box:=tail; shift_amount(cur_box):=0;
20970tail:=p; link(p):=null;
20971done:end
20972
20973@ Here we deal with things like `\.{\\vsplit 13 to 100pt}'.
20974
20975@<Split off part of a vertical box, make |cur_box| point to it@>=
20976begin scan_eight_bit_int; n:=cur_val;
20977if not scan_keyword("to") then
20978@.to@>
20979  begin print_err("Missing `to' inserted");
20980@.Missing `to' inserted@>
20981  help2("I'm working on `\vsplit<box number> to <dimen>';")@/
20982  ("will look for the <dimen> next."); error;
20983  end;
20984scan_normal_dimen;
20985cur_box:=vsplit(n,cur_val);
20986end
20987
20988@ Here is where we enter restricted horizontal mode or internal vertical
20989mode, in order to make a box.
20990
20991@<Initiate the construction of an hbox or vbox, then |return|@>=
20992begin k:=cur_chr-vtop_code; saved(0):=box_context;
20993if k=hmode then
20994  if (box_context<box_flag)and(abs(mode)=vmode) then
20995    scan_spec(adjusted_hbox_group,true)
20996  else scan_spec(hbox_group,true)
20997else  begin if k=vmode then scan_spec(vbox_group,true)
20998  else  begin scan_spec(vtop_group,true); k:=vmode;
20999    end;
21000  normal_paragraph;
21001  end;
21002push_nest; mode:=-k;
21003if k=vmode then
21004  begin prev_depth:=ignore_depth;
21005  if every_vbox<>null then begin_token_list(every_vbox,every_vbox_text);
21006  end
21007else  begin space_factor:=1000;
21008  if every_hbox<>null then begin_token_list(every_hbox,every_hbox_text);
21009  end;
21010return;
21011end
21012
21013@ @<Declare act...@>=
21014procedure scan_box(@!box_context:integer);
21015  {the next input should specify a box or perhaps a rule}
21016begin @<Get the next non-blank non-relax...@>;
21017if cur_cmd=make_box then begin_box(box_context)
21018else if (box_context>=leader_flag)and((cur_cmd=hrule)or(cur_cmd=vrule)) then
21019  begin cur_box:=scan_rule_spec; box_end(box_context);
21020  end
21021else  begin@t@>@;@/
21022  print_err("A <box> was supposed to be here");@/
21023@.A <box> was supposed to...@>
21024  help3("I was expecting to see \hbox or \vbox or \copy or \box or")@/
21025  ("something like that. So you might find something missing in")@/
21026  ("your output. But keep trying; you can fix this later."); back_error;
21027  end;
21028end;
21029
21030@ When the right brace occurs at the end of an \.{\\hbox} or \.{\\vbox} or
21031\.{\\vtop} construction, the |package| routine comes into action. We might
21032also have to finish a paragraph that hasn't ended.
21033
21034@<Cases of |handle...@>=
21035hbox_group: package(0);
21036adjusted_hbox_group: begin adjust_tail:=adjust_head; package(0);
21037  end;
21038vbox_group: begin end_graf; package(0);
21039  end;
21040vtop_group: begin end_graf; package(vtop_code);
21041  end;
21042
21043@ @<Declare action...@>=
21044procedure package(@!c:small_number);
21045var h:scaled; {height of box}
21046@!p:pointer; {first node in a box}
21047@!d:scaled; {max depth}
21048begin d:=box_max_depth; unsave; save_ptr:=save_ptr-3;
21049if mode=-hmode then cur_box:=hpack(link(head),saved(2),saved(1))
21050else  begin cur_box:=vpackage(link(head),saved(2),saved(1),d);
21051  if c=vtop_code then @<Readjust the height and depth of |cur_box|,
21052    for \.{\\vtop}@>;
21053  end;
21054pop_nest; box_end(saved(0));
21055end;
21056
21057@ The height of a `\.{\\vtop}' box is inherited from the first item on its list,
21058if that item is an |hlist_node|, |vlist_node|, or |rule_node|; otherwise
21059the \.{\\vtop} height is zero.
21060
21061
21062@<Readjust the height...@>=
21063begin h:=0; p:=list_ptr(cur_box);
21064if p<>null then if type(p)<=rule_node then h:=height(p);
21065depth(cur_box):=depth(cur_box)-h+height(cur_box); height(cur_box):=h;
21066end
21067
21068@ A paragraph begins when horizontal-mode material occurs in vertical mode,
21069or when the paragraph is explicitly started by `\.{\\indent}' or
21070`\.{\\noindent}'.
21071
21072@<Put each...@>=
21073primitive("indent",start_par,1);
21074@!@:indent_}{\.{\\indent} primitive@>
21075primitive("noindent",start_par,0);
21076@!@:no_indent_}{\.{\\noindent} primitive@>
21077
21078@ @<Cases of |print_cmd_chr|...@>=
21079start_par: if chr_code=0 then print_esc("noindent")@+ else print_esc("indent");
21080
21081@ @<Cases of |main_control| that build...@>=
21082vmode+start_par: new_graf(cur_chr>0);
21083vmode+letter,vmode+other_char,vmode+char_num,vmode+char_given,
21084   vmode+math_shift,vmode+un_hbox,vmode+vrule,
21085   vmode+accent,vmode+discretionary,vmode+hskip,vmode+valign,
21086   vmode+ex_space,vmode+no_boundary:@t@>@;@/
21087  begin back_input; new_graf(true);
21088  end;
21089
21090@ @<Declare act...@>=
21091function norm_min(@!h:integer):small_number;
21092begin if h<=0 then norm_min:=1@+else if h>=63 then norm_min:=63@+
21093else norm_min:=h;
21094end;
21095@#
21096procedure new_graf(@!indented:boolean);
21097begin prev_graf:=0;
21098if (mode=vmode)or(head<>tail) then
21099  tail_append(new_param_glue(par_skip_code));
21100push_nest; mode:=hmode; space_factor:=1000; set_cur_lang; clang:=cur_lang;
21101prev_graf:=(norm_min(left_hyphen_min)*@'100+norm_min(right_hyphen_min))
21102             *@'200000+cur_lang;
21103if indented then
21104  begin tail:=new_null_box; link(head):=tail; width(tail):=par_indent;@+
21105  end;
21106if every_par<>null then begin_token_list(every_par,every_par_text);
21107if nest_ptr=1 then build_page; {put |par_skip| glue on current page}
21108end;
21109
21110@ @<Cases of |main_control| that build...@>=
21111hmode+start_par,mmode+start_par: indent_in_hmode;
21112
21113@ @<Declare act...@>=
21114procedure indent_in_hmode;
21115var p,@!q:pointer;
21116begin if cur_chr>0 then {\.{\\indent}}
21117  begin p:=new_null_box; width(p):=par_indent;
21118  if abs(mode)=hmode then space_factor:=1000
21119  else  begin q:=new_noad; math_type(nucleus(q)):=sub_box;
21120    info(nucleus(q)):=p; p:=q;
21121    end;
21122  tail_append(p);
21123  end;
21124end;
21125
21126@ A paragraph ends when a |par_end| command is sensed, or when we are in
21127horizontal mode when reaching the right brace of vertical-mode routines
21128like \.{\\vbox}, \.{\\insert}, or \.{\\output}.
21129
21130@<Cases of |main_control| that build...@>=
21131vmode+par_end: begin normal_paragraph;
21132  if mode>0 then build_page;
21133  end;
21134hmode+par_end: begin if align_state<0 then off_save; {this tries to
21135    recover from an alignment that didn't end properly}
21136  end_graf; {this takes us to the enclosing mode, if |mode>0|}
21137  if mode=vmode then build_page;
21138  end;
21139hmode+stop,hmode+vskip,hmode+hrule,hmode+un_vbox,hmode+halign: head_for_vmode;
21140
21141@ @<Declare act...@>=
21142procedure head_for_vmode;
21143begin if mode<0 then
21144  if cur_cmd<>hrule then off_save
21145  else  begin print_err("You can't use `");
21146    print_esc("hrule"); print("' here except with leaders");
21147@.You can't use \\hrule...@>
21148    help2("To put a horizontal rule in an hbox or an alignment,")@/
21149      ("you should use \leaders or \hrulefill (see The TeXbook).");
21150    error;
21151    end
21152else  begin back_input; cur_tok:=par_token; back_input; token_type:=inserted;
21153  end;
21154end;
21155
21156@ @<Declare act...@>=
21157procedure end_graf;
21158begin if mode=hmode then
21159  begin if head=tail then pop_nest {null paragraphs are ignored}
21160  else line_break(widow_penalty);
21161  normal_paragraph;
21162  error_count:=0;
21163  end;
21164end;
21165
21166@ Insertion and adjustment and mark nodes are constructed by the following
21167pieces of the program.
21168
21169@<Cases of |main_control| that build...@>=
21170any_mode(insert),hmode+vadjust,mmode+vadjust: begin_insert_or_adjust;
21171any_mode(mark): make_mark;
21172
21173@ @<Forbidden...@>=
21174vmode+vadjust,
21175
21176@ @<Declare act...@>=
21177procedure begin_insert_or_adjust;
21178begin if cur_cmd=vadjust then cur_val:=255
21179else  begin scan_eight_bit_int;
21180  if cur_val=255 then
21181    begin print_err("You can't "); print_esc("insert"); print_int(255);
21182@.You can't \\insert255@>
21183    help1("I'm changing to \insert0; box 255 is special.");
21184    error; cur_val:=0;
21185    end;
21186  end;
21187saved(0):=cur_val; incr(save_ptr);
21188new_save_level(insert_group); scan_left_brace; normal_paragraph;
21189push_nest; mode:=-vmode; prev_depth:=ignore_depth;
21190end;
21191
21192@ @<Cases of |handle...@>=
21193insert_group: begin end_graf; q:=split_top_skip; add_glue_ref(q);
21194  d:=split_max_depth; f:=floating_penalty; unsave; decr(save_ptr);
21195  {now |saved(0)| is the insertion number, or 255 for |vadjust|}
21196  p:=vpack(link(head),natural); pop_nest;
21197  if saved(0)<255 then
21198    begin tail_append(get_node(ins_node_size));
21199    type(tail):=ins_node; subtype(tail):=qi(saved(0));
21200    height(tail):=height(p)+depth(p); ins_ptr(tail):=list_ptr(p);
21201    split_top_ptr(tail):=q; depth(tail):=d; float_cost(tail):=f;
21202    end
21203  else  begin tail_append(get_node(small_node_size));
21204    type(tail):=adjust_node;@/
21205    subtype(tail):=0; {the |subtype| is not used}
21206    adjust_ptr(tail):=list_ptr(p); delete_glue_ref(q);
21207    end;
21208  free_node(p,box_node_size);
21209  if nest_ptr=0 then build_page;
21210  end;
21211output_group: @<Resume the page builder...@>;
21212
21213@ @<Declare act...@>=
21214procedure make_mark;
21215var p:pointer; {new node}
21216begin p:=scan_toks(false,true); p:=get_node(small_node_size);
21217type(p):=mark_node; subtype(p):=0; {the |subtype| is not used}
21218mark_ptr(p):=def_ref; link(tail):=p; tail:=p;
21219end;
21220
21221@ Penalty nodes get into a list via the |break_penalty| command.
21222@^penalties@>
21223
21224@<Cases of |main_control| that build...@>=
21225any_mode(break_penalty): append_penalty;
21226
21227@ @<Declare action...@>=
21228procedure append_penalty;
21229begin scan_int; tail_append(new_penalty(cur_val));
21230if mode=vmode then build_page;
21231end;
21232
21233@ The |remove_item| command removes a penalty, kern, or glue node if it
21234appears at the tail of the current list, using a brute-force linear scan.
21235Like \.{\\lastbox}, this command is not allowed in vertical mode (except
21236internal vertical mode), since the current list in vertical mode is sent
21237to the page builder.  But if we happen to be able to implement it in
21238vertical mode, we do.
21239
21240@<Cases of |main_control| that build...@>=
21241any_mode(remove_item): delete_last;
21242
21243@ When |delete_last| is called, |cur_chr| is the |type| of node that
21244will be deleted, if present.
21245
21246@<Declare action...@>=
21247procedure delete_last;
21248label exit;
21249var @!p,@!q:pointer; {run through the current list}
21250@!m:quarterword; {the length of a replacement list}
21251begin if (mode=vmode)and(tail=head) then
21252  @<Apologize for inability to do the operation now,
21253    unless \.{\\unskip} follows non-glue@>
21254else  begin if not is_char_node(tail) then if type(tail)=cur_chr then
21255    begin q:=head;
21256    repeat p:=q;
21257    if not is_char_node(q) then if type(q)=disc_node then
21258      begin for m:=1 to replace_count(q) do p:=link(p);
21259      if p=tail then return;
21260      end;
21261    q:=link(p);
21262    until q=tail;
21263    link(p):=null; flush_node_list(tail); tail:=p;
21264    end;
21265  end;
21266exit:end;
21267
21268@ @<Apologize for inability to do the operation...@>=
21269begin if (cur_chr<>glue_node)or(last_glue<>max_halfword) then
21270  begin you_cant;
21271  help2("Sorry...I usually can't take things from the current page.")@/
21272    ("Try `I\vskip-\lastskip' instead.");
21273  if cur_chr=kern_node then help_line[0]:=
21274    ("Try `I\kern-\lastkern' instead.")
21275  else if cur_chr<>glue_node then help_line[0]:=@|
21276    ("Perhaps you can make the output routine do it.");
21277  error;
21278  end;
21279end
21280
21281@ @<Put each...@>=
21282primitive("unpenalty",remove_item,penalty_node);@/
21283@!@:un_penalty_}{\.{\\unpenalty} primitive@>
21284primitive("unkern",remove_item,kern_node);@/
21285@!@:un_kern_}{\.{\\unkern} primitive@>
21286primitive("unskip",remove_item,glue_node);@/
21287@!@:un_skip_}{\.{\\unskip} primitive@>
21288primitive("unhbox",un_hbox,box_code);@/
21289@!@:un_hbox_}{\.{\\unhbox} primitive@>
21290primitive("unhcopy",un_hbox,copy_code);@/
21291@!@:un_hcopy_}{\.{\\unhcopy} primitive@>
21292primitive("unvbox",un_vbox,box_code);@/
21293@!@:un_vbox_}{\.{\\unvbox} primitive@>
21294primitive("unvcopy",un_vbox,copy_code);@/
21295@!@:un_vcopy_}{\.{\\unvcopy} primitive@>
21296
21297@ @<Cases of |print_cmd_chr|...@>=
21298remove_item: if chr_code=glue_node then print_esc("unskip")
21299  else if chr_code=kern_node then print_esc("unkern")
21300  else print_esc("unpenalty");
21301un_hbox: if chr_code=copy_code then print_esc("unhcopy")
21302  else print_esc("unhbox");
21303un_vbox: if chr_code=copy_code then print_esc("unvcopy")
21304  else print_esc("unvbox");
21305
21306@ The |un_hbox| and |un_vbox| commands unwrap one of the 256 current boxes.
21307
21308@<Cases of |main_control| that build...@>=
21309vmode+un_vbox,hmode+un_hbox,mmode+un_hbox: unpackage;
21310
21311@ @<Declare act...@>=
21312procedure unpackage;
21313label exit;
21314var p:pointer; {the box}
21315@!c:box_code..copy_code; {should we copy?}
21316begin c:=cur_chr; scan_eight_bit_int; p:=box(cur_val);
21317if p=null then return;
21318if (abs(mode)=mmode)or((abs(mode)=vmode)and(type(p)<>vlist_node))or@|
21319   ((abs(mode)=hmode)and(type(p)<>hlist_node)) then
21320  begin print_err("Incompatible list can't be unboxed");
21321@.Incompatible list...@>
21322  help3("Sorry, Pandora. (You sneaky devil.)")@/
21323  ("I refuse to unbox an \hbox in vertical mode or vice versa.")@/
21324  ("And I can't open any boxes in math mode.");@/
21325  error; return;
21326  end;
21327if c=copy_code then link(tail):=copy_node_list(list_ptr(p))
21328else  begin link(tail):=list_ptr(p); box(cur_val):=null;
21329  free_node(p,box_node_size);
21330  end;
21331while link(tail)<>null do tail:=link(tail);
21332exit:end;
21333
21334@ @<Forbidden...@>=vmode+ital_corr,
21335
21336@ Italic corrections are converted to kern nodes when the |ital_corr| command
21337follows a character. In math mode the same effect is achieved by appending
21338a kern of zero here, since italic corrections are supplied later.
21339
21340@<Cases of |main_control| that build...@>=
21341hmode+ital_corr: append_italic_correction;
21342mmode+ital_corr: tail_append(new_kern(0));
21343
21344@ @<Declare act...@>=
21345procedure append_italic_correction;
21346label exit;
21347var p:pointer; {|char_node| at the tail of the current list}
21348@!f:internal_font_number; {the font in the |char_node|}
21349begin if tail<>head then
21350  begin if is_char_node(tail) then p:=tail
21351  else if type(tail)=ligature_node then p:=lig_char(tail)
21352  else return;
21353  f:=font(p);
21354  tail_append(new_kern(char_italic(f)(char_info(f)(character(p)))));
21355  subtype(tail):=explicit;
21356  end;
21357exit:end;
21358
21359@ Discretionary nodes are easy in the common case `\.{\\-}', but in the
21360general case we must process three braces full of items.
21361
21362@<Put each...@>=
21363primitive("-",discretionary,1);
21364@!@:Single-character primitives -}{\quad\.{\\-}@>
21365primitive("discretionary",discretionary,0);
21366@!@:discretionary_}{\.{\\discretionary} primitive@>
21367
21368@ @<Cases of |print_cmd_chr|...@>=
21369discretionary: if chr_code=1 then
21370  print_esc("-")@+else print_esc("discretionary");
21371
21372@ @<Cases of |main_control| that build...@>=
21373hmode+discretionary,mmode+discretionary: append_discretionary;
21374
21375@ The space factor does not change when we append a discretionary node,
21376but it starts out as 1000 in the subsidiary lists.
21377
21378@<Declare act...@>=
21379procedure append_discretionary;
21380var c:integer; {hyphen character}
21381begin tail_append(new_disc);
21382if cur_chr=1 then
21383  begin c:=hyphen_char[cur_font];
21384  if c>=0 then if c<256 then pre_break(tail):=new_character(cur_font,c);
21385  end
21386else  begin incr(save_ptr); saved(-1):=0; new_save_level(disc_group);
21387  scan_left_brace; push_nest; mode:=-hmode; space_factor:=1000;
21388  end;
21389end;
21390
21391@ The three discretionary lists are constructed somewhat as if they were
21392hboxes. A~subroutine called |build_discretionary| handles the transitions.
21393(This is sort of fun.)
21394
21395@<Cases of |handle...@>=
21396disc_group: build_discretionary;
21397
21398@ @<Declare act...@>=
21399procedure build_discretionary;
21400label done,exit;
21401var p,@!q:pointer; {for link manipulation}
21402@!n:integer; {length of discretionary list}
21403begin unsave;
21404@<Prune the current list, if necessary, until it contains only
21405  |char_node|, |kern_node|, |hlist_node|, |vlist_node|, |rule_node|,
21406  and |ligature_node| items; set |n| to the length of the list,
21407  and set |q| to the list's tail@>;
21408p:=link(head); pop_nest;
21409case saved(-1) of
214100:pre_break(tail):=p;
214111:post_break(tail):=p;
214122:@<Attach list |p| to the current list, and record its length;
21413  then finish up and |return|@>;
21414end; {there are no other cases}
21415incr(saved(-1)); new_save_level(disc_group); scan_left_brace;
21416push_nest; mode:=-hmode; space_factor:=1000;
21417exit:end;
21418
21419@ @<Attach list |p| to the current...@>=
21420begin if (n>0)and(abs(mode)=mmode) then
21421  begin print_err("Illegal math "); print_esc("discretionary");
21422@.Illegal math \\disc...@>
21423  help2("Sorry: The third part of a discretionary break must be")@/
21424  ("empty, in math formulas. I had to delete your third part.");
21425  flush_node_list(p); n:=0; error;
21426  end
21427else link(tail):=p;
21428if n<=max_quarterword then replace_count(tail):=n
21429else  begin print_err("Discretionary list is too long");
21430@.Discretionary list is too long@>
21431  help2("Wow---I never thought anybody would tweak me here.")@/
21432  ("You can't seriously need such a huge discretionary list?");
21433  error;
21434  end;
21435if n>0 then tail:=q;
21436decr(save_ptr); return;
21437end
21438
21439@ During this loop, |p=link(q)| and there are |n| items preceding |p|.
21440
21441@<Prune the current list, if necessary...@>=
21442q:=head; p:=link(q); n:=0;
21443while p<>null do
21444  begin if not is_char_node(p) then if type(p)>rule_node then
21445    if type(p)<>kern_node then if type(p)<>ligature_node then
21446      begin print_err("Improper discretionary list");
21447@.Improper discretionary list@>
21448      help1("Discretionary lists must contain only boxes and kerns.");@/
21449      error;
21450      begin_diagnostic;
21451      print_nl("The following discretionary sublist has been deleted:");
21452@.The following...deleted@>
21453      show_box(p);
21454      end_diagnostic(true);
21455      flush_node_list(p); link(q):=null; goto done;
21456      end;
21457  q:=p; p:=link(q); incr(n);
21458  end;
21459done:
21460
21461@ We need only one more thing to complete the horizontal mode routines, namely
21462the \.{\\accent} primitive.
21463
21464@<Cases of |main_control| that build...@>=
21465hmode+accent: make_accent;
21466
21467@ The positioning of accents is straightforward but tedious. Given an accent
21468of width |a|, designed for characters of height |x| and slant |s|;
21469and given a character of width |w|, height |h|, and slant |t|: We will shift
21470the accent down by |x-h|, and we will insert kern nodes that have the effect of
21471centering the accent over the character and shifting the accent to the
21472right by $\delta={1\over2}(w-a)+h\cdot t-x\cdot s$.  If either character is
21473absent from the font, we will simply use the other, without shifting.
21474
21475@<Declare act...@>=
21476procedure make_accent;
21477var s,@!t: real; {amount of slant}
21478@!p,@!q,@!r:pointer; {character, box, and kern nodes}
21479@!f:internal_font_number; {relevant font}
21480@!a,@!h,@!x,@!w,@!delta:scaled; {heights and widths, as explained above}
21481@!i:four_quarters; {character information}
21482begin scan_char_num; f:=cur_font; p:=new_character(f,cur_val);
21483if p<>null then
21484  begin x:=x_height(f); s:=slant(f)/float_constant(65536);
21485@^real division@>
21486  a:=char_width(f)(char_info(f)(character(p)));@/
21487  do_assignments;@/
21488  @<Create a character node |q| for the next character,
21489    but set |q:=null| if problems arise@>;
21490  if q<>null then @<Append the accent with appropriate kerns,
21491      then set |p:=q|@>;
21492  link(tail):=p; tail:=p; space_factor:=1000;
21493  end;
21494end;
21495
21496@ @<Create a character node |q| for the next...@>=
21497q:=null; f:=cur_font;
21498if (cur_cmd=letter)or(cur_cmd=other_char)or(cur_cmd=char_given) then
21499  q:=new_character(f,cur_chr)
21500else if cur_cmd=char_num then
21501  begin scan_char_num; q:=new_character(f,cur_val);
21502  end
21503else back_input
21504
21505@ The kern nodes appended here must be distinguished from other kerns, lest
21506they be wiped away by the hyphenation algorithm or by a previous line break.
21507
21508The two kerns are computed with (machine-dependent) |real| arithmetic, but
21509their sum is machine-independent; the net effect is machine-independent,
21510because the user cannot remove these nodes nor access them via \.{\\lastkern}.
21511
21512@<Append the accent with appropriate kerns...@>=
21513begin t:=slant(f)/float_constant(65536);
21514@^real division@>
21515i:=char_info(f)(character(q));
21516w:=char_width(f)(i); h:=char_height(f)(height_depth(i));
21517if h<>x then {the accent must be shifted up or down}
21518  begin p:=hpack(p,natural); shift_amount(p):=x-h;
21519  end;
21520delta:=round((w-a)/float_constant(2)+h*t-x*s);
21521@^real multiplication@>
21522@^real addition@>
21523r:=new_kern(delta); subtype(r):=acc_kern; link(tail):=r; link(r):=p;
21524tail:=new_kern(-a-delta); subtype(tail):=acc_kern; link(p):=tail; p:=q;
21525end
21526
21527@ When `\.{\\cr}' or `\.{\\span}' or a tab mark comes through the scanner
21528into |main_control|, it might be that the user has foolishly inserted
21529one of them into something that has nothing to do with alignment. But it is
21530far more likely that a left brace or right brace has been omitted, since
21531|get_next| takes actions appropriate to alignment only when `\.{\\cr}'
21532or `\.{\\span}' or tab marks occur with |align_state=0|. The following
21533program attempts to make an appropriate recovery.
21534
21535@<Cases of |main_control| that build...@>=
21536any_mode(car_ret), any_mode(tab_mark): align_error;
21537any_mode(no_align): no_align_error;
21538any_mode(omit): omit_error;
21539
21540@ @<Declare act...@>=
21541procedure align_error;
21542begin if abs(align_state)>2 then
21543  @<Express consternation over the fact that no alignment is in progress@>
21544else  begin back_input;
21545  if align_state<0 then
21546    begin print_err("Missing { inserted");
21547@.Missing \{ inserted@>
21548    incr(align_state); cur_tok:=left_brace_token+"{";
21549    end
21550  else  begin print_err("Missing } inserted");
21551@.Missing \} inserted@>
21552    decr(align_state); cur_tok:=right_brace_token+"}";
21553    end;
21554  help3("I've put in what seems to be necessary to fix")@/
21555    ("the current column of the current alignment.")@/
21556    ("Try to go on, since this might almost work."); ins_error;
21557  end;
21558end;
21559
21560@ @<Express consternation...@>=
21561begin print_err("Misplaced "); print_cmd_chr(cur_cmd,cur_chr);
21562@.Misplaced \&@>
21563@.Misplaced \\span@>
21564@.Misplaced \\cr@>
21565if cur_tok=tab_token+"&" then
21566  begin help6("I can't figure out why you would want to use a tab mark")@/
21567  ("here. If you just want an ampersand, the remedy is")@/
21568  ("simple: Just type `I\&' now. But if some right brace")@/
21569  ("up above has ended a previous alignment prematurely,")@/
21570  ("you're probably due for more error messages, and you")@/
21571  ("might try typing `S' now just to see what is salvageable.");
21572  end
21573else  begin help5("I can't figure out why you would want to use a tab mark")@/
21574  ("or \cr or \span just now. If something like a right brace")@/
21575  ("up above has ended a previous alignment prematurely,")@/
21576  ("you're probably due for more error messages, and you")@/
21577  ("might try typing `S' now just to see what is salvageable.");
21578  end;
21579error;
21580end
21581
21582@ The help messages here contain a little white lie, since \.{\\noalign}
21583and \.{\\omit} are allowed also after `\.{\\noalign\{...\}}'.
21584
21585@<Declare act...@>=
21586procedure no_align_error;
21587begin print_err("Misplaced "); print_esc("noalign");
21588@.Misplaced \\noalign@>
21589help2("I expect to see \noalign only after the \cr of")@/
21590  ("an alignment. Proceed, and I'll ignore this case."); error;
21591end;
21592procedure omit_error;
21593begin print_err("Misplaced "); print_esc("omit");
21594@.Misplaced \\omit@>
21595help2("I expect to see \omit only after tab marks or the \cr of")@/
21596  ("an alignment. Proceed, and I'll ignore this case."); error;
21597end;
21598
21599@ We've now covered most of the abuses of \.{\\halign} and \.{\\valign}.
21600Let's take a look at what happens when they are used correctly.
21601
21602@<Cases of |main_control| that build...@>=
21603vmode+halign,hmode+valign:init_align;
21604mmode+halign: if privileged then
21605  if cur_group=math_shift_group then init_align
21606  else off_save;
21607vmode+endv,hmode+endv: do_endv;
21608
21609@ An |align_group| code is supposed to remain on the |save_stack|
21610during an entire alignment, until |fin_align| removes it.
21611
21612A devious user might force an |endv| command to occur just about anywhere;
21613we must defeat such hacks.
21614
21615@<Declare act...@>=
21616procedure do_endv;
21617begin base_ptr:=input_ptr; input_stack[base_ptr]:=cur_input;
21618while (input_stack[base_ptr].index_field<>v_template) and
21619      (input_stack[base_ptr].loc_field=null) and
21620      (input_stack[base_ptr].state_field=token_list) do decr(base_ptr);
21621if (input_stack[base_ptr].index_field<>v_template) or
21622      (input_stack[base_ptr].loc_field<>null) or
21623      (input_stack[base_ptr].state_field<>token_list) then
21624  fatal_error("(interwoven alignment preambles are not allowed)");
21625@.interwoven alignment preambles...@>
21626 if cur_group=align_group then
21627  begin end_graf;
21628  if fin_col then fin_row;
21629  end
21630else off_save;
21631end;
21632
21633@ @<Cases of |handle_right_brace|...@>=
21634align_group: begin back_input; cur_tok:=cs_token_flag+frozen_cr;
21635  print_err("Missing "); print_esc("cr"); print(" inserted");
21636@.Missing \\cr inserted@>
21637  help1("I'm guessing that you meant to end an alignment here.");
21638  ins_error;
21639  end;
21640
21641@ @<Cases of |handle_right_brace|...@>=
21642no_align_group: begin end_graf; unsave; align_peek;
21643  end;
21644
21645@ Finally, \.{\\endcsname} is not supposed to get through to |main_control|.
21646
21647@<Cases of |main_control| that build...@>=
21648any_mode(end_cs_name): cs_error;
21649
21650@ @<Declare act...@>=
21651procedure cs_error;
21652begin print_err("Extra "); print_esc("endcsname");
21653@.Extra \\endcsname@>
21654help1("I'm ignoring this, since I wasn't doing a \csname.");
21655error;
21656end;
21657
21658@* \[48] Building math lists.
21659The routines that \TeX\ uses to create mlists are similar to those we have
21660just seen for the generation of hlists and vlists. But it is necessary to
21661make ``noads'' as well as nodes, so the reader should review the
21662discussion of math mode data structures before trying to make sense out of
21663the following program.
21664
21665Here is a little routine that needs to be done whenever a subformula
21666is about to be processed. The parameter is a code like |math_group|.
21667
21668@<Declare act...@>=
21669procedure push_math(@!c:group_code);
21670begin push_nest; mode:=-mmode; incompleat_noad:=null; new_save_level(c);
21671end;
21672
21673@ We get into math mode from horizontal mode when a `\.\$' (i.e., a
21674|math_shift| character) is scanned. We must check to see whether this
21675`\.\$' is immediately followed by another, in case display math mode is
21676called for.
21677
21678@<Cases of |main_control| that build...@>=
21679hmode+math_shift:init_math;
21680
21681@ @<Declare act...@>=
21682procedure init_math;
21683label reswitch,found,not_found,done;
21684var w:scaled; {new or partial |pre_display_size|}
21685@!l:scaled; {new |display_width|}
21686@!s:scaled; {new |display_indent|}
21687@!p:pointer; {current node when calculating |pre_display_size|}
21688@!q:pointer; {glue specification when calculating |pre_display_size|}
21689@!f:internal_font_number; {font in current |char_node|}
21690@!n:integer; {scope of paragraph shape specification}
21691@!v:scaled; {|w| plus possible glue amount}
21692@!d:scaled; {increment to |v|}
21693begin get_token; {|get_x_token| would fail on \.{\\ifmmode}\thinspace!}
21694if (cur_cmd=math_shift)and(mode>0) then @<Go into display math mode@>
21695else  begin back_input; @<Go into ordinary math mode@>;
21696  end;
21697end;
21698
21699@ @<Go into ordinary math mode@>=
21700begin push_math(math_shift_group); eq_word_define(int_base+cur_fam_code,-1);
21701if every_math<>null then begin_token_list(every_math,every_math_text);
21702end
21703
21704@ We get into ordinary math mode from display math mode when `\.{\\eqno}' or
21705`\.{\\leqno}' appears. In such cases |cur_chr| will be 0 or~1, respectively;
21706the value of |cur_chr| is placed onto |save_stack| for safe keeping.
21707
21708@<Cases of |main_control| that build...@>=
21709mmode+eq_no: if privileged then
21710  if cur_group=math_shift_group then start_eq_no
21711  else off_save;
21712
21713@ @<Put each...@>=
21714primitive("eqno",eq_no,0);
21715@!@:eq_no_}{\.{\\eqno} primitive@>
21716primitive("leqno",eq_no,1);
21717@!@:leq_no_}{\.{\\leqno} primitive@>
21718
21719@ When \TeX\ is in display math mode, |cur_group=math_shift_group|,
21720so it is not necessary for the |start_eq_no| procedure to test for
21721this condition.
21722
21723@<Declare act...@>=
21724procedure start_eq_no;
21725begin saved(0):=cur_chr; incr(save_ptr);
21726@<Go into ordinary math mode@>;
21727end;
21728
21729@ @<Cases of |print_cmd_chr|...@>=
21730eq_no:if chr_code=1 then print_esc("leqno")@+else print_esc("eqno");
21731
21732@ @<Forbidden...@>=non_math(eq_no),
21733
21734@ When we enter display math mode, we need to call |line_break| to
21735process the partial paragraph that has just been interrupted by the
21736display. Then we can set the proper values of |display_width| and
21737|display_indent| and |pre_display_size|.
21738
21739@<Go into display math mode@>=
21740begin if head=tail then {`\.{\\noindent\$\$}' or `\.{\$\${ }\$\$}'}
21741  begin pop_nest; w:=-max_dimen;
21742  end
21743else  begin line_break(display_widow_penalty);@/
21744  @<Calculate the natural width, |w|, by which the characters of the
21745    final line extend to the right of the reference point,
21746    plus two ems; or set |w:=max_dimen| if the non-blank information
21747    on that line is affected by stretching or shrinking@>;
21748  end;
21749{now we are in vertical mode, working on the list that will contain the display}
21750@<Calculate the length, |l|, and the shift amount, |s|, of the display lines@>;
21751push_math(math_shift_group); mode:=mmode;
21752eq_word_define(int_base+cur_fam_code,-1);@/
21753eq_word_define(dimen_base+pre_display_size_code,w);
21754eq_word_define(dimen_base+display_width_code,l);
21755eq_word_define(dimen_base+display_indent_code,s);
21756if every_display<>null then begin_token_list(every_display,every_display_text);
21757if nest_ptr=1 then build_page;
21758end
21759
21760@ @<Calculate the natural width, |w|, by which...@>=
21761v:=shift_amount(just_box)+2*quad(cur_font); w:=-max_dimen;
21762p:=list_ptr(just_box);
21763while p<>null do
21764  begin @<Let |d| be the natural width of node |p|;
21765    if the node is ``visible,'' |goto found|;
21766    if the node is glue that stretches or shrinks, set |v:=max_dimen|@>;
21767  if v<max_dimen then v:=v+d;
21768  goto not_found;
21769  found: if v<max_dimen then
21770    begin v:=v+d; w:=v;
21771    end
21772  else  begin w:=max_dimen; goto done;
21773    end;
21774  not_found: p:=link(p);
21775  end;
21776done:
21777
21778@ @<Let |d| be the natural width of node |p|...@>=
21779reswitch: if is_char_node(p) then
21780  begin f:=font(p); d:=char_width(f)(char_info(f)(character(p)));
21781  goto found;
21782  end;
21783case type(p) of
21784hlist_node,vlist_node,rule_node: begin d:=width(p); goto found;
21785  end;
21786ligature_node:@<Make node |p| look like a |char_node|...@>;
21787kern_node,math_node: d:=width(p);
21788glue_node:@<Let |d| be the natural width of this glue; if stretching
21789  or shrinking, set |v:=max_dimen|; |goto found| in the case of leaders@>;
21790whatsit_node: @<Let |d| be the width of the whatsit |p|@>;
21791othercases d:=0
21792endcases
21793
21794@ We need to be careful that |w|, |v|, and |d| do not depend on any |glue_set|
21795values, since such values are subject to system-dependent rounding.
21796System-dependent numbers are not allowed to infiltrate parameters like
21797|pre_display_size|, since \TeX82 is supposed to make the same decisions on all
21798machines.
21799
21800@<Let |d| be the natural width of this glue...@>=
21801begin q:=glue_ptr(p); d:=width(q);
21802if glue_sign(just_box)=stretching then
21803  begin if (glue_order(just_box)=stretch_order(q))and@|
21804     (stretch(q)<>0) then
21805    v:=max_dimen;
21806  end
21807else if glue_sign(just_box)=shrinking then
21808  begin if (glue_order(just_box)=shrink_order(q))and@|
21809     (shrink(q)<>0) then
21810    v:=max_dimen;
21811  end;
21812if subtype(p)>=a_leaders then goto found;
21813end
21814
21815@ A displayed equation is considered to be three lines long, so we
21816calculate the length and offset of line number |prev_graf+2|.
21817
21818@<Calculate the length, |l|, ...@>=
21819if par_shape_ptr=null then
21820  if (hang_indent<>0)and@|
21821   (((hang_after>=0)and(prev_graf+2>hang_after))or@|
21822    (prev_graf+1<-hang_after)) then
21823    begin l:=hsize-abs(hang_indent);
21824    if hang_indent>0 then s:=hang_indent@+else s:=0;
21825    end
21826  else  begin l:=hsize; s:=0;
21827    end
21828else  begin n:=info(par_shape_ptr);
21829  if prev_graf+2>=n then p:=par_shape_ptr+2*n
21830  else p:=par_shape_ptr+2*(prev_graf+2);
21831  s:=mem[p-1].sc; l:=mem[p].sc;
21832  end
21833
21834@ Subformulas of math formulas cause a new level of math mode to be entered,
21835on the semantic nest as well as the save stack. These subformulas arise in
21836several ways: (1)~A left brace by itself indicates the beginning of a
21837subformula that will be put into a box, thereby freezing its glue and
21838preventing line breaks. (2)~A subscript or superscript is treated as a
21839subformula if it is not a single character; the same applies to
21840the nucleus of things like \.{\\underline}. (3)~The \.{\\left} primitive
21841initiates a subformula that will be terminated by a matching \.{\\right}.
21842The group codes placed on |save_stack| in these three cases are
21843|math_group|, |math_group|, and |math_left_group|, respectively.
21844
21845Here is the code that handles case (1); the other cases are not quite as
21846trivial, so we shall consider them later.
21847
21848@<Cases of |main_control| that build...@>=
21849mmode+left_brace: begin tail_append(new_noad);
21850  back_input; scan_math(nucleus(tail));
21851  end;
21852
21853@ Recall that the |nucleus|, |subscr|, and |supscr| fields in a noad are
21854broken down into subfields called |math_type| and either |info| or
21855|(fam,character)|. The job of |scan_math| is to figure out what to place
21856in one of these principal fields; it looks at the subformula that
21857comes next in the input, and places an encoding of that subformula
21858into a given word of |mem|.
21859
21860@d fam_in_range==((cur_fam>=0)and(cur_fam<16))
21861
21862@<Declare act...@>=
21863procedure scan_math(@!p:pointer);
21864label restart,reswitch,exit;
21865var c:integer; {math character code}
21866begin restart:@<Get the next non-blank non-relax...@>;
21867reswitch:case cur_cmd of
21868letter,other_char,char_given: begin c:=ho(math_code(cur_chr));
21869    if c=@'100000 then
21870      begin @<Treat |cur_chr| as an active character@>;
21871      goto restart;
21872      end;
21873    end;
21874char_num: begin scan_char_num; cur_chr:=cur_val; cur_cmd:=char_given;
21875  goto reswitch;
21876  end;
21877math_char_num: begin scan_fifteen_bit_int; c:=cur_val;
21878  end;
21879math_given: c:=cur_chr;
21880delim_num: begin scan_twenty_seven_bit_int; c:=cur_val div @'10000;
21881  end;
21882othercases @<Scan a subformula enclosed in braces and |return|@>
21883endcases;@/
21884math_type(p):=math_char; character(p):=qi(c mod 256);
21885if (c>=var_code)and fam_in_range then fam(p):=cur_fam
21886else fam(p):=(c div 256) mod 16;
21887exit:end;
21888
21889@ An active character that is an |outer_call| is allowed here.
21890
21891@<Treat |cur_chr|...@>=
21892begin cur_cs:=cur_chr+active_base;
21893cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs);
21894x_token; back_input;
21895end
21896
21897@ The pointer |p| is placed on |save_stack| while a complex subformula
21898is being scanned.
21899
21900@<Scan a subformula...@>=
21901begin back_input; scan_left_brace;@/
21902saved(0):=p; incr(save_ptr); push_math(math_group); return;
21903end
21904
21905@ The simplest math formula is, of course, `\.{\${ }\$}', when no noads are
21906generated. The next simplest cases involve a single character, e.g.,
21907`\.{\$x\$}'. Even though such cases may not seem to be very interesting,
21908the reader can perhaps understand how happy the author was when `\.{\$x\$}'
21909was first properly typeset by \TeX. The code in this section was used.
21910@^Knuth, Donald Ervin@>
21911
21912@<Cases of |main_control| that build...@>=
21913mmode+letter,mmode+other_char,mmode+char_given:
21914  set_math_char(ho(math_code(cur_chr)));
21915mmode+char_num: begin scan_char_num; cur_chr:=cur_val;
21916  set_math_char(ho(math_code(cur_chr)));
21917  end;
21918mmode+math_char_num: begin scan_fifteen_bit_int; set_math_char(cur_val);
21919  end;
21920mmode+math_given: set_math_char(cur_chr);
21921mmode+delim_num: begin scan_twenty_seven_bit_int;
21922  set_math_char(cur_val div @'10000);
21923  end;
21924
21925@ The |set_math_char| procedure creates a new noad appropriate to a given
21926math code, and appends it to the current mlist. However, if the math code
21927is sufficiently large, the |cur_chr| is treated as an active character and
21928nothing is appended.
21929
21930@<Declare act...@>=
21931procedure set_math_char(@!c:integer);
21932var p:pointer; {the new noad}
21933begin if c>=@'100000 then
21934  @<Treat |cur_chr|...@>
21935else  begin p:=new_noad; math_type(nucleus(p)):=math_char;
21936  character(nucleus(p)):=qi(c mod 256);
21937  fam(nucleus(p)):=(c div 256) mod 16;
21938  if c>=var_code then
21939    begin if fam_in_range then fam(nucleus(p)):=cur_fam;
21940    type(p):=ord_noad;
21941    end
21942  else  type(p):=ord_noad+(c div @'10000);
21943  link(tail):=p; tail:=p;
21944  end;
21945end;
21946
21947@ Primitive math operators like \.{\\mathop} and \.{\\underline} are given
21948the command code |math_comp|, supplemented by the noad type that they
21949generate.
21950
21951@<Put each...@>=
21952primitive("mathord",math_comp,ord_noad);
21953@!@:math_ord_}{\.{\\mathord} primitive@>
21954primitive("mathop",math_comp,op_noad);
21955@!@:math_op_}{\.{\\mathop} primitive@>
21956primitive("mathbin",math_comp,bin_noad);
21957@!@:math_bin_}{\.{\\mathbin} primitive@>
21958primitive("mathrel",math_comp,rel_noad);
21959@!@:math_rel_}{\.{\\mathrel} primitive@>
21960primitive("mathopen",math_comp,open_noad);
21961@!@:math_open_}{\.{\\mathopen} primitive@>
21962primitive("mathclose",math_comp,close_noad);
21963@!@:math_close_}{\.{\\mathclose} primitive@>
21964primitive("mathpunct",math_comp,punct_noad);
21965@!@:math_punct_}{\.{\\mathpunct} primitive@>
21966primitive("mathinner",math_comp,inner_noad);
21967@!@:math_inner_}{\.{\\mathinner} primitive@>
21968primitive("underline",math_comp,under_noad);
21969@!@:underline_}{\.{\\underline} primitive@>
21970primitive("overline",math_comp,over_noad);@/
21971@!@:overline_}{\.{\\overline} primitive@>
21972primitive("displaylimits",limit_switch,normal);
21973@!@:display_limits_}{\.{\\displaylimits} primitive@>
21974primitive("limits",limit_switch,limits);
21975@!@:limits_}{\.{\\limits} primitive@>
21976primitive("nolimits",limit_switch,no_limits);
21977@!@:no_limits_}{\.{\\nolimits} primitive@>
21978
21979@ @<Cases of |print_cmd_chr|...@>=
21980math_comp: case chr_code of
21981  ord_noad: print_esc("mathord");
21982  op_noad: print_esc("mathop");
21983  bin_noad: print_esc("mathbin");
21984  rel_noad: print_esc("mathrel");
21985  open_noad: print_esc("mathopen");
21986  close_noad: print_esc("mathclose");
21987  punct_noad: print_esc("mathpunct");
21988  inner_noad: print_esc("mathinner");
21989  under_noad: print_esc("underline");
21990  othercases print_esc("overline")
21991  endcases;
21992limit_switch: if chr_code=limits then print_esc("limits")
21993  else if chr_code=no_limits then print_esc("nolimits")
21994  else print_esc("displaylimits");
21995
21996@ @<Cases of |main_control| that build...@>=
21997mmode+math_comp: begin tail_append(new_noad);
21998  type(tail):=cur_chr; scan_math(nucleus(tail));
21999  end;
22000mmode+limit_switch: math_limit_switch;
22001
22002@ @<Declare act...@>=
22003procedure math_limit_switch;
22004label exit;
22005begin if head<>tail then if type(tail)=op_noad then
22006  begin subtype(tail):=cur_chr; return;
22007  end;
22008print_err("Limit controls must follow a math operator");
22009@.Limit controls must follow...@>
22010help1("I'm ignoring this misplaced \limits or \nolimits command."); error;
22011exit:end;
22012
22013@ Delimiter fields of noads are filled in by the |scan_delimiter| routine.
22014The first parameter of this procedure is the |mem| address where the
22015delimiter is to be placed; the second tells if this delimiter follows
22016\.{\\radical} or not.
22017
22018@<Declare act...@>=
22019procedure scan_delimiter(@!p:pointer;@!r:boolean);
22020begin if r then scan_twenty_seven_bit_int
22021else  begin @<Get the next non-blank non-relax...@>;
22022  case cur_cmd of
22023  letter,other_char: cur_val:=del_code(cur_chr);
22024  delim_num: scan_twenty_seven_bit_int;
22025  othercases cur_val:=-1
22026  endcases;
22027  end;
22028if cur_val<0 then @<Report that an invalid delimiter code is being changed
22029   to null; set~|cur_val:=0|@>;
22030small_fam(p):=(cur_val div @'4000000) mod 16;
22031small_char(p):=qi((cur_val div @'10000) mod 256);
22032large_fam(p):=(cur_val div 256) mod 16;
22033large_char(p):=qi(cur_val mod 256);
22034end;
22035
22036@ @<Report that an invalid delimiter...@>=
22037begin print_err("Missing delimiter (. inserted)");
22038@.Missing delimiter...@>
22039help6("I was expecting to see something like `(' or `\{' or")@/
22040  ("`\}' here. If you typed, e.g., `{' instead of `\{', you")@/
22041  ("should probably delete the `{' by typing `1' now, so that")@/
22042  ("braces don't get unbalanced. Otherwise just proceed.")@/
22043  ("Acceptable delimiters are characters whose \delcode is")@/
22044  ("nonnegative, or you can use `\delimiter <delimiter code>'.");
22045back_error; cur_val:=0;
22046end
22047
22048@ @<Cases of |main_control| that build...@>=
22049mmode+radical:math_radical;
22050
22051@ @<Declare act...@>=
22052procedure math_radical;
22053begin tail_append(get_node(radical_noad_size));
22054type(tail):=radical_noad; subtype(tail):=normal;
22055mem[nucleus(tail)].hh:=empty_field;
22056mem[subscr(tail)].hh:=empty_field;
22057mem[supscr(tail)].hh:=empty_field;
22058scan_delimiter(left_delimiter(tail),true); scan_math(nucleus(tail));
22059end;
22060
22061@ @<Cases of |main_control| that build...@>=
22062mmode+accent,mmode+math_accent:math_ac;
22063
22064@ @<Declare act...@>=
22065procedure math_ac;
22066begin if cur_cmd=accent then
22067  @<Complain that the user should have said \.{\\mathaccent}@>;
22068tail_append(get_node(accent_noad_size));
22069type(tail):=accent_noad; subtype(tail):=normal;
22070mem[nucleus(tail)].hh:=empty_field;
22071mem[subscr(tail)].hh:=empty_field;
22072mem[supscr(tail)].hh:=empty_field;
22073math_type(accent_chr(tail)):=math_char;
22074scan_fifteen_bit_int;
22075character(accent_chr(tail)):=qi(cur_val mod 256);
22076if (cur_val>=var_code)and fam_in_range then fam(accent_chr(tail)):=cur_fam
22077else fam(accent_chr(tail)):=(cur_val div 256) mod 16;
22078scan_math(nucleus(tail));
22079end;
22080
22081@ @<Complain that the user should have said \.{\\mathaccent}@>=
22082begin print_err("Please use "); print_esc("mathaccent");
22083print(" for accents in math mode");
22084@.Please use \\mathaccent...@>
22085help2("I'm changing \accent to \mathaccent here; wish me luck.")@/
22086  ("(Accents are not the same in formulas as they are in text.)");
22087error;
22088end
22089
22090@ @<Cases of |main_control| that build...@>=
22091mmode+vcenter: begin scan_spec(vcenter_group,false); normal_paragraph;
22092  push_nest; mode:=-vmode; prev_depth:=ignore_depth;
22093  if every_vbox<>null then begin_token_list(every_vbox,every_vbox_text);
22094  end;
22095
22096@ @<Cases of |handle...@>=
22097vcenter_group: begin end_graf; unsave; save_ptr:=save_ptr-2;
22098  p:=vpack(link(head),saved(1),saved(0)); pop_nest;
22099  tail_append(new_noad); type(tail):=vcenter_noad;
22100  math_type(nucleus(tail)):=sub_box; info(nucleus(tail)):=p;
22101  end;
22102
22103@ The routine that inserts a |style_node| holds no surprises.
22104
22105@<Put each...@>=
22106primitive("displaystyle",math_style,display_style);
22107@!@:display_style_}{\.{\\displaystyle} primitive@>
22108primitive("textstyle",math_style,text_style);
22109@!@:text_style_}{\.{\\textstyle} primitive@>
22110primitive("scriptstyle",math_style,script_style);
22111@!@:script_style_}{\.{\\scriptstyle} primitive@>
22112primitive("scriptscriptstyle",math_style,script_script_style);
22113@!@:script_script_style_}{\.{\\scriptscriptstyle} primitive@>
22114
22115@ @<Cases of |print_cmd_chr|...@>=
22116math_style: print_style(chr_code);
22117
22118@ @<Cases of |main_control| that build...@>=
22119mmode+math_style: tail_append(new_style(cur_chr));
22120mmode+non_script: begin tail_append(new_glue(zero_glue));
22121  subtype(tail):=cond_math_glue;
22122  end;
22123mmode+math_choice: append_choices;
22124
22125@ The routine that scans the four mlists of a \.{\\mathchoice} is very
22126much like the routine that builds discretionary nodes.
22127
22128@<Declare act...@>=
22129procedure append_choices;
22130begin tail_append(new_choice); incr(save_ptr); saved(-1):=0;
22131push_math(math_choice_group); scan_left_brace;
22132end;
22133
22134@ @<Cases of |handle_right_brace|...@>=
22135math_choice_group: build_choices;
22136
22137@ @<Declare act...@>=
22138@t\4@>@<Declare the function called |fin_mlist|@>@t@>@;@/
22139procedure build_choices;
22140label exit;
22141var p:pointer; {the current mlist}
22142begin unsave; p:=fin_mlist(null);
22143case saved(-1) of
221440:display_mlist(tail):=p;
221451:text_mlist(tail):=p;
221462:script_mlist(tail):=p;
221473:begin script_script_mlist(tail):=p; decr(save_ptr); return;
22148  end;
22149end; {there are no other cases}
22150incr(saved(-1)); push_math(math_choice_group); scan_left_brace;
22151exit:end;
22152
22153@ Subscripts and superscripts are attached to the previous nucleus by the
22154@^superscripts@>@^subscripts@>
22155action procedure called |sub_sup|. We use the facts that |sub_mark=sup_mark+1|
22156and |subscr(p)=supscr(p)+1|.
22157
22158@<Cases of |main_control| that build...@>=
22159mmode+sub_mark,mmode+sup_mark: sub_sup;
22160
22161@ @<Declare act...@>=
22162procedure sub_sup;
22163var t:small_number; {type of previous sub/superscript}
22164@!p:pointer; {field to be filled by |scan_math|}
22165begin t:=empty; p:=null;
22166if tail<>head then if scripts_allowed(tail) then
22167  begin p:=supscr(tail)+cur_cmd-sup_mark; {|supscr| or |subscr|}
22168  t:=math_type(p);
22169  end;
22170if (p=null)or(t<>empty) then @<Insert a dummy noad to be sub/superscripted@>;
22171scan_math(p);
22172end;
22173
22174@ @<Insert a dummy...@>=
22175begin tail_append(new_noad);
22176p:=supscr(tail)+cur_cmd-sup_mark; {|supscr| or |subscr|}
22177if t<>empty then
22178  begin if cur_cmd=sup_mark then
22179    begin print_err("Double superscript");
22180@.Double superscript@>
22181    help1("I treat `x^1^2' essentially like `x^1{}^2'.");
22182    end
22183  else  begin print_err("Double subscript");
22184@.Double subscript@>
22185    help1("I treat `x_1_2' essentially like `x_1{}_2'.");
22186    end;
22187  error;
22188  end;
22189end
22190
22191@ An operation like `\.{\\over}' causes the current mlist to go into a
22192state of suspended animation: |incompleat_noad| points to a |fraction_noad|
22193that contains the mlist-so-far as its numerator, while the denominator
22194is yet to come. Finally when the mlist is finished, the denominator will
22195go into the incompleat fraction noad, and that noad will become the
22196whole formula, unless it is surrounded by `\.{\\left}' and `\.{\\right}'
22197delimiters.
22198
22199@d above_code=0 { `\.{\\above}' }
22200@d over_code=1 { `\.{\\over}' }
22201@d atop_code=2 { `\.{\\atop}' }
22202@d delimited_code=3 { `\.{\\abovewithdelims}', etc.}
22203
22204@<Put each...@>=
22205primitive("above",above,above_code);@/
22206@!@:above_}{\.{\\above} primitive@>
22207primitive("over",above,over_code);@/
22208@!@:over_}{\.{\\over} primitive@>
22209primitive("atop",above,atop_code);@/
22210@!@:atop_}{\.{\\atop} primitive@>
22211primitive("abovewithdelims",above,delimited_code+above_code);@/
22212@!@:above_with_delims_}{\.{\\abovewithdelims} primitive@>
22213primitive("overwithdelims",above,delimited_code+over_code);@/
22214@!@:over_with_delims_}{\.{\\overwithdelims} primitive@>
22215primitive("atopwithdelims",above,delimited_code+atop_code);
22216@!@:atop_with_delims_}{\.{\\atopwithdelims} primitive@>
22217
22218@ @<Cases of |print_cmd_chr|...@>=
22219above: case chr_code of
22220  over_code:print_esc("over");
22221  atop_code:print_esc("atop");
22222  delimited_code+above_code:print_esc("abovewithdelims");
22223  delimited_code+over_code:print_esc("overwithdelims");
22224  delimited_code+atop_code:print_esc("atopwithdelims");
22225  othercases print_esc("above")
22226  endcases;
22227
22228@ @<Cases of |main_control| that build...@>=
22229mmode+above: math_fraction;
22230
22231@ @<Declare act...@>=
22232procedure math_fraction;
22233var c:small_number; {the type of generalized fraction we are scanning}
22234begin c:=cur_chr;
22235if incompleat_noad<>null then
22236  @<Ignore the fraction operation and complain about this ambiguous case@>
22237else  begin incompleat_noad:=get_node(fraction_noad_size);
22238  type(incompleat_noad):=fraction_noad;
22239  subtype(incompleat_noad):=normal;
22240  math_type(numerator(incompleat_noad)):=sub_mlist;
22241  info(numerator(incompleat_noad)):=link(head);
22242  mem[denominator(incompleat_noad)].hh:=empty_field;
22243  mem[left_delimiter(incompleat_noad)].qqqq:=null_delimiter;
22244  mem[right_delimiter(incompleat_noad)].qqqq:=null_delimiter;@/
22245  link(head):=null; tail:=head;
22246  @<Use code |c| to distinguish between generalized fractions@>;
22247  end;
22248end;
22249
22250@ @<Use code |c|...@>=
22251if c>=delimited_code then
22252  begin scan_delimiter(left_delimiter(incompleat_noad),false);
22253  scan_delimiter(right_delimiter(incompleat_noad),false);
22254  end;
22255case c mod delimited_code of
22256above_code: begin scan_normal_dimen;
22257  thickness(incompleat_noad):=cur_val;
22258  end;
22259over_code: thickness(incompleat_noad):=default_code;
22260atop_code: thickness(incompleat_noad):=0;
22261end {there are no other cases}
22262
22263@ @<Ignore the fraction...@>=
22264begin if c>=delimited_code then
22265  begin scan_delimiter(garbage,false); scan_delimiter(garbage,false);
22266  end;
22267if c mod delimited_code=above_code then scan_normal_dimen;
22268print_err("Ambiguous; you need another { and }");
22269@.Ambiguous...@>
22270help3("I'm ignoring this fraction specification, since I don't")@/
22271  ("know whether a construction like `x \over y \over z'")@/
22272  ("means `{x \over y} \over z' or `x \over {y \over z}'.");
22273error;
22274end
22275
22276@ At the end of a math formula or subformula, the |fin_mlist| routine is
22277called upon to return a pointer to the newly completed mlist, and to
22278pop the nest back to the enclosing semantic level. The parameter to
22279|fin_mlist|, if not null, points to a |right_noad| that ends the
22280current mlist; this |right_noad| has not yet been appended.
22281
22282@<Declare the function called |fin_mlist|@>=
22283function fin_mlist(@!p:pointer):pointer;
22284var q:pointer; {the mlist to return}
22285begin if incompleat_noad<>null then @<Compleat the incompleat noad@>
22286else  begin link(tail):=p; q:=link(head);
22287  end;
22288pop_nest; fin_mlist:=q;
22289end;
22290
22291@ @<Compleat...@>=
22292begin math_type(denominator(incompleat_noad)):=sub_mlist;
22293info(denominator(incompleat_noad)):=link(head);
22294if p=null then q:=incompleat_noad
22295else  begin q:=info(numerator(incompleat_noad));
22296  if type(q)<>left_noad then confusion("right");
22297@:this can't happen right}{\quad right@>
22298  info(numerator(incompleat_noad)):=link(q);
22299  link(q):=incompleat_noad; link(incompleat_noad):=p;
22300  end;
22301end
22302
22303@ Now at last we're ready to see what happens when a right brace occurs
22304in a math formula. Two special cases are simplified here: Braces are effectively
22305removed when they surround a single Ord without sub/superscripts, or when they
22306surround an accent that is the nucleus of an Ord atom.
22307
22308@<Cases of |handle...@>=
22309math_group: begin unsave; decr(save_ptr);@/
22310  math_type(saved(0)):=sub_mlist; p:=fin_mlist(null); info(saved(0)):=p;
22311  if p<>null then if link(p)=null then
22312   if type(p)=ord_noad then
22313    begin if math_type(subscr(p))=empty then
22314     if math_type(supscr(p))=empty then
22315      begin mem[saved(0)].hh:=mem[nucleus(p)].hh;
22316      free_node(p,noad_size);
22317      end;
22318    end
22319  else if type(p)=accent_noad then if saved(0)=nucleus(tail) then
22320   if type(tail)=ord_noad then @<Replace the tail of the list by |p|@>;
22321  end;
22322
22323@ @<Replace the tail...@>=
22324begin q:=head; while link(q)<>tail do q:=link(q);
22325link(q):=p; free_node(tail,noad_size); tail:=p;
22326end
22327
22328@ We have dealt with all constructions of math mode except `\.{\\left}' and
22329`\.{\\right}', so the picture is completed by the following sections of
22330the program.
22331
22332@<Put each...@>=
22333primitive("left",left_right,left_noad);
22334@!@:left_}{\.{\\left} primitive@>
22335primitive("right",left_right,right_noad);
22336@!@:right_}{\.{\\right} primitive@>
22337text(frozen_right):="right"; eqtb[frozen_right]:=eqtb[cur_val];
22338
22339@ @<Cases of |print_cmd_chr|...@>=
22340left_right: if chr_code=left_noad then print_esc("left")
22341else print_esc("right");
22342
22343@ @<Cases of |main_control| that build...@>=
22344mmode+left_right: math_left_right;
22345
22346@ @<Declare act...@>=
22347procedure math_left_right;
22348var t:small_number; {|left_noad| or |right_noad|}
22349@!p:pointer; {new noad}
22350begin t:=cur_chr;
22351if (t=right_noad)and(cur_group<>math_left_group) then
22352  @<Try to recover from mismatched \.{\\right}@>
22353else  begin p:=new_noad; type(p):=t;
22354  scan_delimiter(delimiter(p),false);
22355  if t=left_noad then
22356    begin push_math(math_left_group); link(head):=p; tail:=p;
22357    end
22358  else  begin p:=fin_mlist(p); unsave; {end of |math_left_group|}
22359    tail_append(new_noad); type(tail):=inner_noad;
22360    math_type(nucleus(tail)):=sub_mlist;
22361    info(nucleus(tail)):=p;
22362    end;
22363  end;
22364end;
22365
22366@ @<Try to recover from mismatch...@>=
22367begin if cur_group=math_shift_group then
22368  begin scan_delimiter(garbage,false);
22369  print_err("Extra "); print_esc("right");
22370@.Extra \\right.@>
22371  help1("I'm ignoring a \right that had no matching \left.");
22372  error;
22373  end
22374else off_save;
22375end
22376
22377@ Here is the only way out of math mode.
22378
22379@<Cases of |main_control| that build...@>=
22380mmode+math_shift: if cur_group=math_shift_group then after_math
22381  else off_save;
22382
22383@ @<Declare act...@>=
22384procedure after_math;
22385var l:boolean; {`\.{\\leqno}' instead of `\.{\\eqno}'}
22386@!danger:boolean; {not enough symbol fonts are present}
22387@!m:integer; {|mmode| or |-mmode|}
22388@!p:pointer; {the formula}
22389@!a:pointer; {box containing equation number}
22390@<Local variables for finishing a displayed formula@>@;
22391begin danger:=false;
22392@<Check that the necessary fonts for math symbols are present;
22393  if not, flush the current math lists and set |danger:=true|@>;
22394m:=mode; l:=false; p:=fin_mlist(null); {this pops the nest}
22395if mode=-m then {end of equation number}
22396  begin @<Check that another \.\$ follows@>;
22397  cur_mlist:=p; cur_style:=text_style; mlist_penalties:=false;
22398  mlist_to_hlist; a:=hpack(link(temp_head),natural);
22399  unsave; decr(save_ptr); {now |cur_group=math_shift_group|}
22400  if saved(0)=1 then l:=true;
22401  danger:=false;
22402  @<Check that the necessary fonts for math symbols are present;
22403    if not, flush the current math lists and set |danger:=true|@>;
22404  m:=mode; p:=fin_mlist(null);
22405  end
22406else a:=null;
22407if m<0 then @<Finish math in text@>
22408else  begin if a=null then @<Check that another \.\$ follows@>;
22409  @<Finish displayed math@>;
22410  end;
22411end;
22412
22413@ @<Check that the necessary fonts...@>=
22414if (font_params[fam_fnt(2+text_size)]<total_mathsy_params)or@|
22415   (font_params[fam_fnt(2+script_size)]<total_mathsy_params)or@|
22416   (font_params[fam_fnt(2+script_script_size)]<total_mathsy_params) then
22417  begin print_err("Math formula deleted: Insufficient symbol fonts");@/
22418@.Math formula deleted...@>
22419  help3("Sorry, but I can't typeset math unless \textfont 2")@/
22420    ("and \scriptfont 2 and \scriptscriptfont 2 have all")@/
22421    ("the \fontdimen values needed in math symbol fonts.");
22422  error; flush_math; danger:=true;
22423  end
22424else if (font_params[fam_fnt(3+text_size)]<total_mathex_params)or@|
22425   (font_params[fam_fnt(3+script_size)]<total_mathex_params)or@|
22426   (font_params[fam_fnt(3+script_script_size)]<total_mathex_params) then
22427  begin print_err("Math formula deleted: Insufficient extension fonts");@/
22428  help3("Sorry, but I can't typeset math unless \textfont 3")@/
22429    ("and \scriptfont 3 and \scriptscriptfont 3 have all")@/
22430    ("the \fontdimen values needed in math extension fonts.");
22431  error; flush_math; danger:=true;
22432  end
22433
22434@ The |unsave| is done after everything else here; hence an appearance of
22435`\.{\\mathsurround}' inside of `\.{\$...\$}' affects the spacing at these
22436particular \.\$'s. This is consistent with the conventions of
22437`\.{\$\$...\$\$}', since `\.{\\abovedisplayskip}' inside a display affects the
22438space above that display.
22439
22440@<Finish math in text@>=
22441begin tail_append(new_math(math_surround,before));
22442cur_mlist:=p; cur_style:=text_style; mlist_penalties:=(mode>0); mlist_to_hlist;
22443link(tail):=link(temp_head);
22444while link(tail)<>null do tail:=link(tail);
22445tail_append(new_math(math_surround,after));
22446space_factor:=1000; unsave;
22447end
22448
22449@ \TeX\ gets to the following part of the program when the first `\.\$' ending
22450a display has been scanned.
22451
22452@<Check that another \.\$ follows@>=
22453begin get_x_token;
22454if cur_cmd<>math_shift then
22455  begin print_err("Display math should end with $$");
22456@.Display math...with \$\$@>
22457  help2("The `$' that I just saw supposedly matches a previous `$$'.")@/
22458    ("So I shall assume that you typed `$$' both times.");
22459  back_error;
22460  end;
22461end
22462
22463@ We have saved the worst for last: The fussiest part of math mode processing
22464occurs when a displayed formula is being centered and placed with an optional
22465equation number.
22466
22467@<Local variables for finishing...@>=
22468@!b:pointer; {box containing the equation}
22469@!w:scaled; {width of the equation}
22470@!z:scaled; {width of the line}
22471@!e:scaled; {width of equation number}
22472@!q:scaled; {width of equation number plus space to separate from equation}
22473@!d:scaled; {displacement of equation in the line}
22474@!s:scaled; {move the line right this much}
22475@!g1,@!g2:small_number; {glue parameter codes for before and after}
22476@!r:pointer; {kern node used to position the display}
22477@!t:pointer; {tail of adjustment list}
22478
22479@ At this time |p| points to the mlist for the formula; |a| is either
22480|null| or it points to a box containing the equation number; and we are in
22481vertical mode (or internal vertical mode).
22482
22483@<Finish displayed math@>=
22484cur_mlist:=p; cur_style:=display_style; mlist_penalties:=false;
22485mlist_to_hlist; p:=link(temp_head);@/
22486adjust_tail:=adjust_head; b:=hpack(p,natural); p:=list_ptr(b);
22487t:=adjust_tail; adjust_tail:=null;@/
22488w:=width(b); z:=display_width; s:=display_indent;
22489if (a=null)or danger then
22490  begin e:=0; q:=0;
22491  end
22492else  begin e:=width(a); q:=e+math_quad(text_size);
22493  end;
22494if w+q>z then
22495  @<Squeeze the equation as much as possible; if there is an equation
22496    number that should go on a separate line by itself,
22497    set~|e:=0|@>;
22498@<Determine the displacement, |d|, of the left edge of the equation, with
22499  respect to the line size |z|, assuming that |l=false|@>;
22500@<Append the glue or equation number preceding the display@>;
22501@<Append the display and perhaps also the equation number@>;
22502@<Append the glue or equation number following the display@>;
22503resume_after_display
22504
22505@ @<Declare act...@>=
22506procedure resume_after_display;
22507begin if cur_group<>math_shift_group then confusion("display");
22508@:this can't happen display}{\quad display@>
22509unsave; prev_graf:=prev_graf+3;
22510push_nest; mode:=hmode; space_factor:=1000; set_cur_lang; clang:=cur_lang;
22511prev_graf:=(norm_min(left_hyphen_min)*@'100+norm_min(right_hyphen_min))
22512             *@'200000+cur_lang;
22513@<Scan an optional space@>;
22514if nest_ptr=1 then build_page;
22515end;
22516
22517@ The user can force the equation number to go on a separate line
22518by causing its width to be zero.
22519
22520@<Squeeze the equation as much as possible...@>=
22521begin if (e<>0)and((w-total_shrink[normal]+q<=z)or@|
22522   (total_shrink[fil]<>0)or(total_shrink[fill]<>0)or
22523   (total_shrink[filll]<>0)) then
22524  begin free_node(b,box_node_size);
22525  b:=hpack(p,z-q,exactly);
22526  end
22527else  begin e:=0;
22528  if w>z then
22529    begin free_node(b,box_node_size);
22530    b:=hpack(p,z,exactly);
22531    end;
22532  end;
22533w:=width(b);
22534end
22535
22536@ We try first to center the display without regard to the existence of
22537the equation number. If that would make it too close (where ``too close''
22538means that the space between display and equation number is less than the
22539width of the equation number), we either center it in the remaining space
22540or move it as far from the equation number as possible. The latter alternative
22541is taken only if the display begins with glue, since we assume that the
22542user put glue there to control the spacing precisely.
22543
22544@<Determine the displacement, |d|, of the left edge of the equation...@>=
22545d:=half(z-w);
22546if (e>0)and(d<2*e) then {too close}
22547  begin d:=half(z-w-e);
22548  if p<>null then if not is_char_node(p) then if type(p)=glue_node then d:=0;
22549  end
22550
22551@ If the equation number is set on a line by itself, either before or
22552after the formula, we append an infinite penalty so that no page break will
22553separate the display from its number; and we use the same size and
22554displacement for all three potential lines of the display, even though
22555`\.{\\parshape}' may specify them differently.
22556
22557@<Append the glue or equation number preceding the display@>=
22558tail_append(new_penalty(pre_display_penalty));@/
22559if (d+s<=pre_display_size)or l then {not enough clearance}
22560  begin g1:=above_display_skip_code; g2:=below_display_skip_code;
22561  end
22562else  begin g1:=above_display_short_skip_code;
22563  g2:=below_display_short_skip_code;
22564  end;
22565if l and(e=0) then {it follows that |type(a)=hlist_node|}
22566  begin shift_amount(a):=s; append_to_vlist(a);
22567  tail_append(new_penalty(inf_penalty));
22568  end
22569else tail_append(new_param_glue(g1))
22570
22571@ @<Append the display and perhaps also the equation number@>=
22572if e<>0 then
22573  begin r:=new_kern(z-w-e-d);
22574  if l then
22575    begin link(a):=r; link(r):=b; b:=a; d:=0;
22576    end
22577  else  begin link(b):=r; link(r):=a;
22578    end;
22579  b:=hpack(b,natural);
22580  end;
22581shift_amount(b):=s+d; append_to_vlist(b)
22582
22583@ @<Append the glue or equation number following the display@>=
22584if (a<>null)and(e=0)and not l then
22585  begin tail_append(new_penalty(inf_penalty));
22586  shift_amount(a):=s+z-width(a);
22587  append_to_vlist(a);
22588  g2:=0;
22589  end;
22590if t<>adjust_head then {migrating material comes after equation number}
22591  begin link(tail):=link(adjust_head); tail:=t;
22592  end;
22593tail_append(new_penalty(post_display_penalty));
22594if g2>0 then tail_append(new_param_glue(g2))
22595
22596@ When \.{\\halign} appears in a display, the alignment routines operate
22597essentially as they do in vertical mode. Then the following program is
22598activated, with |p| and |q| pointing to the beginning and end of the
22599resulting list, and with |aux_save| holding the |prev_depth| value.
22600
22601@<Finish an alignment in a display@>=
22602begin do_assignments;
22603if cur_cmd<>math_shift then @<Pontificate about improper alignment in display@>
22604else @<Check that another \.\$ follows@>;
22605pop_nest;
22606tail_append(new_penalty(pre_display_penalty));
22607tail_append(new_param_glue(above_display_skip_code));
22608link(tail):=p;
22609if p<>null then tail:=q;
22610tail_append(new_penalty(post_display_penalty));
22611tail_append(new_param_glue(below_display_skip_code));
22612prev_depth:=aux_save.sc; resume_after_display;
22613end
22614
22615@ @<Pontificate...@>=
22616begin print_err("Missing $$ inserted");
22617@.Missing {\$\$} inserted@>
22618help2("Displays can use special alignments (like \eqalignno)")@/
22619  ("only if nothing but the alignment itself is between $$'s.");
22620back_error;
22621end
22622
22623@* \[49] Mode-independent processing.
22624The long |main_control| procedure has now been fully specified, except for
22625certain activities that are independent of the current mode. These activities
22626do not change the current vlist or hlist or mlist; if they change anything,
22627it is the value of a parameter or the meaning of a control sequence.
22628
22629Assignments to values in |eqtb| can be global or local. Furthermore, a
22630control sequence can be defined to be `\.{\\long}' or `\.{\\outer}', and
22631it might or might not be expanded. The prefixes `\.{\\global}', `\.{\\long}',
22632and `\.{\\outer}' can occur in any order. Therefore we assign binary numeric
22633codes, making it possible to accumulate the union of all specified prefixes
22634by adding the corresponding codes.  (\PASCAL's |set| operations could also
22635have been used.)
22636
22637@<Put each...@>=
22638primitive("long",prefix,1);
22639@!@:long_}{\.{\\long} primitive@>
22640primitive("outer",prefix,2);
22641@!@:outer_}{\.{\\outer} primitive@>
22642primitive("global",prefix,4);
22643@!@:global_}{\.{\\global} primitive@>
22644primitive("def",def,0);
22645@!@:def_}{\.{\\def} primitive@>
22646primitive("gdef",def,1);
22647@!@:gdef_}{\.{\\gdef} primitive@>
22648primitive("edef",def,2);
22649@!@:edef_}{\.{\\edef} primitive@>
22650primitive("xdef",def,3);
22651@!@:xdef_}{\.{\\xdef} primitive@>
22652
22653@ @<Cases of |print_cmd_chr|...@>=
22654prefix: if chr_code=1 then print_esc("long")
22655  else if chr_code=2 then print_esc("outer")
22656  else print_esc("global");
22657def: if chr_code=0 then print_esc("def")
22658  else if chr_code=1 then print_esc("gdef")
22659  else if chr_code=2 then print_esc("edef")
22660  else print_esc("xdef");
22661
22662@ Every prefix, and every command code that might or might not be prefixed,
22663calls the action procedure |prefixed_command|. This routine accumulates
22664a sequence of prefixes until coming to a non-prefix, then it carries out
22665the command.
22666
22667@<Cases of |main_control| that don't...@>=
22668any_mode(toks_register),
22669any_mode(assign_toks),
22670any_mode(assign_int),
22671any_mode(assign_dimen),
22672any_mode(assign_glue),
22673any_mode(assign_mu_glue),
22674any_mode(assign_font_dimen),
22675any_mode(assign_font_int),
22676any_mode(set_aux),
22677any_mode(set_prev_graf),
22678any_mode(set_page_dimen),
22679any_mode(set_page_int),
22680any_mode(set_box_dimen),
22681any_mode(set_shape),
22682any_mode(def_code),
22683any_mode(def_family),
22684any_mode(set_font),
22685any_mode(def_font),
22686any_mode(register),
22687any_mode(advance),
22688any_mode(multiply),
22689any_mode(divide),
22690any_mode(prefix),
22691any_mode(let),
22692any_mode(shorthand_def),
22693any_mode(read_to_cs),
22694any_mode(def),
22695any_mode(set_box),
22696any_mode(hyph_data),
22697any_mode(set_interaction):prefixed_command;
22698
22699@ If the user says, e.g., `\.{\\global\\global}', the redundancy is
22700silently accepted.
22701
22702@<Declare act...@>=
22703@t\4@>@<Declare subprocedures for |prefixed_command|@>@t@>@;@/
22704procedure prefixed_command;
22705label done,exit;
22706var a:small_number; {accumulated prefix codes so far}
22707@!f:internal_font_number; {identifies a font}
22708@!j:halfword; {index into a \.{\\parshape} specification}
22709@!k:font_index; {index into |font_info|}
22710@!p,@!q:pointer; {for temporary short-term use}
22711@!n:integer; {ditto}
22712@!e:boolean; {should a definition be expanded? or was \.{\\let} not done?}
22713begin a:=0;
22714while cur_cmd=prefix do
22715  begin if not odd(a div cur_chr) then a:=a+cur_chr;
22716  @<Get the next non-blank non-relax...@>;
22717  if cur_cmd<=max_non_prefixed_command then
22718    @<Discard erroneous prefixes and |return|@>;
22719  end;
22720@<Discard the prefixes \.{\\long} and \.{\\outer} if they are irrelevant@>;
22721@<Adjust \(f)for the setting of \.{\\globaldefs}@>;
22722case cur_cmd of
22723@t\4@>@<Assignments@>@;
22724othercases confusion("prefix")
22725@:this can't happen prefix}{\quad prefix@>
22726endcases;
22727done: @<Insert a token saved by \.{\\afterassignment}, if any@>;
22728exit:end;
22729
22730@ @<Discard erroneous...@>=
22731begin print_err("You can't use a prefix with `");
22732@.You can't use a prefix with x@>
22733print_cmd_chr(cur_cmd,cur_chr); print_char("'");
22734help1("I'll pretend you didn't say \long or \outer or \global.");
22735back_error; return;
22736end
22737
22738@ @<Discard the prefixes...@>=
22739if (cur_cmd<>def)and(a mod 4<>0) then
22740  begin print_err("You can't use `"); print_esc("long"); print("' or `");
22741  print_esc("outer"); print("' with `");
22742@.You can't use \\long...@>
22743  print_cmd_chr(cur_cmd,cur_chr); print_char("'");
22744  help1("I'll pretend you didn't say \long or \outer here.");
22745  error;
22746  end
22747
22748@ The previous routine does not have to adjust |a| so that |a mod 4=0|,
22749since the following routines test for the \.{\\global} prefix as follows.
22750
22751@d global==(a>=4)
22752@d define(#)==if global then geq_define(#)@+else eq_define(#)
22753@d word_define(#)==if global then geq_word_define(#)@+else eq_word_define(#)
22754
22755@<Adjust \(f)for the setting of \.{\\globaldefs}@>=
22756if global_defs<>0 then
22757  if global_defs<0 then
22758    begin if global then a:=a-4;
22759    end
22760  else  begin if not global then a:=a+4;
22761    end
22762
22763@ When a control sequence is to be defined, by \.{\\def} or \.{\\let} or
22764something similar, the |get_r_token| routine will substitute a special
22765control sequence for a token that is not redefinable.
22766
22767@<Declare subprocedures for |prefixed_command|@>=
22768procedure get_r_token;
22769label restart;
22770begin restart: repeat get_token;
22771until cur_tok<>space_token;
22772if (cur_cs=0)or(cur_cs>frozen_control_sequence) then
22773  begin print_err("Missing control sequence inserted");
22774@.Missing control...@>
22775  help5("Please don't say `\def cs{...}', say `\def\cs{...}'.")@/
22776  ("I've inserted an inaccessible control sequence so that your")@/
22777  ("definition will be completed without mixing me up too badly.")@/
22778  ("You can recover graciously from this error, if you're")@/
22779  ("careful; see exercise 27.2 in The TeXbook.");
22780@:TeXbook}{\sl The \TeX book@>
22781  if cur_cs=0 then back_input;
22782  cur_tok:=cs_token_flag+frozen_protection; ins_error; goto restart;
22783  end;
22784end;
22785
22786@ @<Initialize table entries...@>=
22787text(frozen_protection):="inaccessible";
22788
22789@ Here's an example of the way many of the following routines operate.
22790(Unfortunately, they aren't all as simple as this.)
22791
22792@<Assignments@>=
22793set_font: define(cur_font_loc,data,cur_chr);
22794
22795@ When a |def| command has been scanned,
22796|cur_chr| is odd if the definition is supposed to be global, and
22797|cur_chr>=2| if the definition is supposed to be expanded.
22798
22799@<Assignments@>=
22800def: begin if odd(cur_chr)and not global and(global_defs>=0) then a:=a+4;
22801  e:=(cur_chr>=2); get_r_token; p:=cur_cs;
22802  q:=scan_toks(true,e); define(p,call+(a mod 4),def_ref);
22803  end;
22804
22805@ Both \.{\\let} and \.{\\futurelet} share the command code |let|.
22806
22807@<Put each...@>=
22808primitive("let",let,normal);@/
22809@!@:let_}{\.{\\let} primitive@>
22810primitive("futurelet",let,normal+1);@/
22811@!@:future_let_}{\.{\\futurelet} primitive@>
22812
22813@ @<Cases of |print_cmd_chr|...@>=
22814let: if chr_code<>normal then print_esc("futurelet")@+else print_esc("let");
22815
22816@ @<Assignments@>=
22817let:  begin n:=cur_chr;
22818  get_r_token; p:=cur_cs;
22819  if n=normal then
22820    begin repeat get_token;
22821    until cur_cmd<>spacer;
22822    if cur_tok=other_token+"=" then
22823      begin get_token;
22824      if cur_cmd=spacer then get_token;
22825      end;
22826    end
22827  else  begin get_token; q:=cur_tok; get_token; back_input;
22828    cur_tok:=q; back_input; {look ahead, then back up}
22829    end; {note that |back_input| doesn't affect |cur_cmd|, |cur_chr|}
22830  if cur_cmd>=call then add_token_ref(cur_chr);
22831  define(p,cur_cmd,cur_chr);
22832  end;
22833
22834@ A \.{\\chardef} creates a control sequence whose |cmd| is |char_given|;
22835a \.{\\mathchardef} creates a control sequence whose |cmd| is |math_given|;
22836and the corresponding |chr| is the character code or math code. A \.{\\countdef}
22837or \.{\\dimendef} or \.{\\skipdef} or \.{\\muskipdef} creates a control
22838sequence whose |cmd| is |assign_int| or \dots\ or |assign_mu_glue|, and the
22839corresponding |chr| is the |eqtb| location of the internal register in question.
22840
22841@d char_def_code=0 {|shorthand_def| for \.{\\chardef}}
22842@d math_char_def_code=1 {|shorthand_def| for \.{\\mathchardef}}
22843@d count_def_code=2 {|shorthand_def| for \.{\\countdef}}
22844@d dimen_def_code=3 {|shorthand_def| for \.{\\dimendef}}
22845@d skip_def_code=4 {|shorthand_def| for \.{\\skipdef}}
22846@d mu_skip_def_code=5 {|shorthand_def| for \.{\\muskipdef}}
22847@d toks_def_code=6 {|shorthand_def| for \.{\\toksdef}}
22848
22849@<Put each...@>=
22850primitive("chardef",shorthand_def,char_def_code);@/
22851@!@:char_def_}{\.{\\chardef} primitive@>
22852primitive("mathchardef",shorthand_def,math_char_def_code);@/
22853@!@:math_char_def_}{\.{\\mathchardef} primitive@>
22854primitive("countdef",shorthand_def,count_def_code);@/
22855@!@:count_def_}{\.{\\countdef} primitive@>
22856primitive("dimendef",shorthand_def,dimen_def_code);@/
22857@!@:dimen_def_}{\.{\\dimendef} primitive@>
22858primitive("skipdef",shorthand_def,skip_def_code);@/
22859@!@:skip_def_}{\.{\\skipdef} primitive@>
22860primitive("muskipdef",shorthand_def,mu_skip_def_code);@/
22861@!@:mu_skip_def_}{\.{\\muskipdef} primitive@>
22862primitive("toksdef",shorthand_def,toks_def_code);@/
22863@!@:toks_def_}{\.{\\toksdef} primitive@>
22864
22865@ @<Cases of |print_cmd_chr|...@>=
22866shorthand_def: case chr_code of
22867  char_def_code: print_esc("chardef");
22868  math_char_def_code: print_esc("mathchardef");
22869  count_def_code: print_esc("countdef");
22870  dimen_def_code: print_esc("dimendef");
22871  skip_def_code: print_esc("skipdef");
22872  mu_skip_def_code: print_esc("muskipdef");
22873  othercases print_esc("toksdef")
22874  endcases;
22875char_given: begin print_esc("char"); print_hex(chr_code);
22876  end;
22877math_given: begin print_esc("mathchar"); print_hex(chr_code);
22878  end;
22879
22880@ We temporarily define |p| to be |relax|, so that an occurrence of |p|
22881while scanning the definition will simply stop the scanning instead of
22882producing an ``undefined control sequence'' error or expanding the
22883previous meaning.  This allows, for instance, `\.{\\chardef\\foo=123\\foo}'.
22884
22885@<Assignments@>=
22886shorthand_def: begin n:=cur_chr; get_r_token; p:=cur_cs; define(p,relax,256);
22887  scan_optional_equals;
22888  case n of
22889  char_def_code: begin scan_char_num; define(p,char_given,cur_val);
22890    end;
22891  math_char_def_code: begin scan_fifteen_bit_int; define(p,math_given,cur_val);
22892    end;
22893  othercases begin scan_eight_bit_int;
22894    case n of
22895    count_def_code: define(p,assign_int,count_base+cur_val);
22896    dimen_def_code: define(p,assign_dimen,scaled_base+cur_val);
22897    skip_def_code: define(p,assign_glue,skip_base+cur_val);
22898    mu_skip_def_code: define(p,assign_mu_glue,mu_skip_base+cur_val);
22899    toks_def_code: define(p,assign_toks,toks_base+cur_val);
22900    end; {there are no other cases}
22901    end
22902  endcases;
22903  end;
22904
22905@ @<Assignments@>=
22906read_to_cs: begin scan_int; n:=cur_val;
22907  if not scan_keyword("to") then
22908@.to@>
22909    begin print_err("Missing `to' inserted");
22910@.Missing `to'...@>
22911    help2("You should have said `\read<number> to \cs'.")@/
22912    ("I'm going to look for the \cs now."); error;
22913    end;
22914  get_r_token;
22915  p:=cur_cs; read_toks(n,p); define(p,call,cur_val);
22916  end;
22917
22918@ The token-list parameters, \.{\\output} and \.{\\everypar}, etc., receive
22919their values in the following way. (For safety's sake, we place an
22920enclosing pair of braces around an \.{\\output} list.)
22921
22922@<Assignments@>=
22923toks_register,assign_toks: begin q:=cur_cs;
22924  if cur_cmd=toks_register then
22925    begin scan_eight_bit_int; p:=toks_base+cur_val;
22926    end
22927  else p:=cur_chr; {|p=every_par_loc| or |output_routine_loc| or \dots}
22928  scan_optional_equals;
22929  @<Get the next non-blank non-relax non-call token@>;
22930  if cur_cmd<>left_brace then @<If the right-hand side is a token parameter
22931      or token register, finish the assignment and |goto done|@>;
22932  back_input; cur_cs:=q; q:=scan_toks(false,false);
22933  if link(def_ref)=null then {empty list: revert to the default}
22934    begin define(p,undefined_cs,null); free_avail(def_ref);
22935    end
22936  else  begin if p=output_routine_loc then {enclose in curlies}
22937      begin link(q):=get_avail; q:=link(q);
22938      info(q):=right_brace_token+"}";
22939      q:=get_avail; info(q):=left_brace_token+"{";
22940      link(q):=link(def_ref); link(def_ref):=q;
22941      end;
22942    define(p,call,def_ref);
22943    end;
22944  end;
22945
22946@ @<If the right-hand side is a token parameter...@>=
22947begin if cur_cmd=toks_register then
22948  begin scan_eight_bit_int; cur_cmd:=assign_toks; cur_chr:=toks_base+cur_val;
22949  end;
22950if cur_cmd=assign_toks then
22951  begin q:=equiv(cur_chr);
22952  if q=null then define(p,undefined_cs,null)
22953  else  begin add_token_ref(q); define(p,call,q);
22954    end;
22955  goto done;
22956  end;
22957end
22958
22959@ Similar routines are used to assign values to the numeric parameters.
22960
22961@<Assignments@>=
22962assign_int: begin p:=cur_chr; scan_optional_equals; scan_int;
22963  word_define(p,cur_val);
22964  end;
22965assign_dimen: begin p:=cur_chr; scan_optional_equals;
22966  scan_normal_dimen; word_define(p,cur_val);
22967  end;
22968assign_glue,assign_mu_glue: begin p:=cur_chr; n:=cur_cmd; scan_optional_equals;
22969  if n=assign_mu_glue then scan_glue(mu_val)@+else scan_glue(glue_val);
22970  trap_zero_glue;
22971  define(p,glue_ref,cur_val);
22972  end;
22973
22974@ When a glue register or parameter becomes zero, it will always point to
22975|zero_glue| because of the following procedure. (Exception: The tabskip
22976glue isn't trapped while preambles are being scanned.)
22977
22978@<Declare subprocedures for |prefixed_command|@>=
22979procedure trap_zero_glue;
22980begin if (width(cur_val)=0)and(stretch(cur_val)=0)and(shrink(cur_val)=0) then
22981  begin add_glue_ref(zero_glue);
22982  delete_glue_ref(cur_val); cur_val:=zero_glue;
22983  end;
22984end;
22985
22986@ The various character code tables are changed by the |def_code| commands,
22987and the font families are declared by |def_family|.
22988
22989@<Put each...@>=
22990primitive("catcode",def_code,cat_code_base);
22991@!@:cat_code_}{\.{\\catcode} primitive@>
22992primitive("mathcode",def_code,math_code_base);
22993@!@:math_code_}{\.{\\mathcode} primitive@>
22994primitive("lccode",def_code,lc_code_base);
22995@!@:lc_code_}{\.{\\lccode} primitive@>
22996primitive("uccode",def_code,uc_code_base);
22997@!@:uc_code_}{\.{\\uccode} primitive@>
22998primitive("sfcode",def_code,sf_code_base);
22999@!@:sf_code_}{\.{\\sfcode} primitive@>
23000primitive("delcode",def_code,del_code_base);
23001@!@:del_code_}{\.{\\delcode} primitive@>
23002primitive("textfont",def_family,math_font_base);
23003@!@:text_font_}{\.{\\textfont} primitive@>
23004primitive("scriptfont",def_family,math_font_base+script_size);
23005@!@:script_font_}{\.{\\scriptfont} primitive@>
23006primitive("scriptscriptfont",def_family,math_font_base+script_script_size);
23007@!@:script_script_font_}{\.{\\scriptscriptfont} primitive@>
23008
23009@ @<Cases of |print_cmd_chr|...@>=
23010def_code: if chr_code=cat_code_base then print_esc("catcode")
23011  else if chr_code=math_code_base then print_esc("mathcode")
23012  else if chr_code=lc_code_base then print_esc("lccode")
23013  else if chr_code=uc_code_base then print_esc("uccode")
23014  else if chr_code=sf_code_base then print_esc("sfcode")
23015  else print_esc("delcode");
23016def_family: print_size(chr_code-math_font_base);
23017
23018@ The different types of code values have different legal ranges; the
23019following program is careful to check each case properly.
23020
23021@<Assignments@>=
23022def_code: begin @<Let |n| be the largest legal code value, based on |cur_chr|@>;
23023  p:=cur_chr; scan_char_num; p:=p+cur_val; scan_optional_equals;
23024  scan_int;
23025  if ((cur_val<0)and(p<del_code_base))or(cur_val>n) then
23026    begin print_err("Invalid code ("); print_int(cur_val);
23027@.Invalid code@>
23028    if p<del_code_base then print("), should be in the range 0..")
23029    else print("), should be at most ");
23030    print_int(n);
23031    help1("I'm going to use 0 instead of that illegal code value.");@/
23032    error; cur_val:=0;
23033    end;
23034  if p<math_code_base then define(p,data,cur_val)
23035  else if p<del_code_base then define(p,data,hi(cur_val))
23036  else word_define(p,cur_val);
23037  end;
23038
23039@ @<Let |n| be the largest...@>=
23040if cur_chr=cat_code_base then n:=max_char_code
23041else if cur_chr=math_code_base then n:=@'100000
23042else if cur_chr=sf_code_base then n:=@'77777
23043else if cur_chr=del_code_base then n:=@'77777777
23044else n:=255
23045
23046@ @<Assignments@>=
23047def_family: begin p:=cur_chr; scan_four_bit_int; p:=p+cur_val;
23048  scan_optional_equals; scan_font_ident; define(p,data,cur_val);
23049  end;
23050
23051@ Next we consider changes to \TeX's numeric registers.
23052
23053@<Assignments@>=
23054register,advance,multiply,divide: do_register_command(a);
23055
23056@ We use the fact that |register<advance<multiply<divide|.
23057
23058@<Declare subprocedures for |prefixed_command|@>=
23059procedure do_register_command(@!a:small_number);
23060label found,exit;
23061var l,@!q,@!r,@!s:pointer; {for list manipulation}
23062@!p:int_val..mu_val; {type of register involved}
23063begin q:=cur_cmd;
23064@<Compute the register location |l| and its type |p|; but |return| if invalid@>;
23065if q=register then scan_optional_equals
23066else if scan_keyword("by") then do_nothing; {optional `\.{by}'}
23067@.by@>
23068arith_error:=false;
23069if q<multiply then @<Compute result of |register| or
23070    |advance|, put it in |cur_val|@>
23071else @<Compute result of |multiply| or |divide|, put it in |cur_val|@>;
23072if arith_error then
23073  begin print_err("Arithmetic overflow");
23074@.Arithmetic overflow@>
23075  help2("I can't carry out that multiplication or division,")@/
23076    ("since the result is out of range.");
23077  if p>=glue_val then delete_glue_ref(cur_val);
23078  error; return;
23079  end;
23080if p<glue_val then word_define(l,cur_val)
23081else  begin trap_zero_glue; define(l,glue_ref,cur_val);
23082  end;
23083exit: end;
23084
23085@ Here we use the fact that the consecutive codes |int_val..mu_val| and
23086|assign_int..assign_mu_glue| correspond to each other nicely.
23087
23088@<Compute the register location |l| and its type |p|...@>=
23089begin if q<>register then
23090  begin get_x_token;
23091  if (cur_cmd>=assign_int)and(cur_cmd<=assign_mu_glue) then
23092    begin l:=cur_chr; p:=cur_cmd-assign_int; goto found;
23093    end;
23094  if cur_cmd<>register then
23095    begin print_err("You can't use `"); print_cmd_chr(cur_cmd,cur_chr);
23096@.You can't use x after ...@>
23097    print("' after "); print_cmd_chr(q,0);
23098    help1("I'm forgetting what you said and not changing anything.");
23099    error; return;
23100    end;
23101  end;
23102p:=cur_chr; scan_eight_bit_int;
23103case p of
23104int_val: l:=cur_val+count_base;
23105dimen_val: l:=cur_val+scaled_base;
23106glue_val: l:=cur_val+skip_base;
23107mu_val: l:=cur_val+mu_skip_base;
23108end; {there are no other cases}
23109end;
23110found:
23111
23112@ @<Compute result of |register| or |advance|...@>=
23113if p<glue_val then
23114  begin if p=int_val then scan_int@+else scan_normal_dimen;
23115  if q=advance then cur_val:=cur_val+eqtb[l].int;
23116  end
23117else  begin scan_glue(p);
23118  if q=advance then @<Compute the sum of two glue specs@>;
23119  end
23120
23121@ @<Compute the sum of two glue specs@>=
23122begin q:=new_spec(cur_val); r:=equiv(l);
23123delete_glue_ref(cur_val);
23124width(q):=width(q)+width(r);
23125if stretch(q)=0 then stretch_order(q):=normal;
23126if stretch_order(q)=stretch_order(r) then stretch(q):=stretch(q)+stretch(r)
23127else if (stretch_order(q)<stretch_order(r))and(stretch(r)<>0) then
23128  begin stretch(q):=stretch(r); stretch_order(q):=stretch_order(r);
23129  end;
23130if shrink(q)=0 then shrink_order(q):=normal;
23131if shrink_order(q)=shrink_order(r) then shrink(q):=shrink(q)+shrink(r)
23132else if (shrink_order(q)<shrink_order(r))and(shrink(r)<>0) then
23133  begin shrink(q):=shrink(r); shrink_order(q):=shrink_order(r);
23134  end;
23135cur_val:=q;
23136end
23137
23138@ @<Compute result of |multiply| or |divide|...@>=
23139begin scan_int;
23140if p<glue_val then
23141  if q=multiply then
23142    if p=int_val then cur_val:=mult_integers(eqtb[l].int,cur_val)
23143    else cur_val:=nx_plus_y(eqtb[l].int,cur_val,0)
23144  else cur_val:=x_over_n(eqtb[l].int,cur_val)
23145else  begin s:=equiv(l); r:=new_spec(s);
23146  if q=multiply then
23147    begin width(r):=nx_plus_y(width(s),cur_val,0);
23148    stretch(r):=nx_plus_y(stretch(s),cur_val,0);
23149    shrink(r):=nx_plus_y(shrink(s),cur_val,0);
23150    end
23151  else  begin width(r):=x_over_n(width(s),cur_val);
23152    stretch(r):=x_over_n(stretch(s),cur_val);
23153    shrink(r):=x_over_n(shrink(s),cur_val);
23154    end;
23155  cur_val:=r;
23156  end;
23157end
23158
23159@ The processing of boxes is somewhat different, because we may need
23160to scan and create an entire box before we actually change the value of the old
23161one.
23162
23163@<Assignments@>=
23164set_box: begin scan_eight_bit_int;
23165  if global then n:=256+cur_val@+else n:=cur_val;
23166  scan_optional_equals;
23167  if set_box_allowed then scan_box(box_flag+n)
23168  else begin print_err("Improper "); print_esc("setbox");
23169@.Improper \\setbox@>
23170    help2("Sorry, \setbox is not allowed after \halign in a display,")@/
23171    ("or between \accent and an accented character."); error;
23172    end;
23173  end;
23174
23175@ The |space_factor| or |prev_depth| settings are changed when a |set_aux|
23176command is sensed. Similarly, |prev_graf| is changed in the presence of
23177|set_prev_graf|, and |dead_cycles| or |insert_penalties| in the presence of
23178|set_page_int|. These definitions are always global.
23179
23180When some dimension of a box register is changed, the change isn't exactly
23181global; but \TeX\ does not look at the \.{\\global} switch.
23182
23183@<Assignments@>=
23184set_aux:alter_aux;
23185set_prev_graf:alter_prev_graf;
23186set_page_dimen:alter_page_so_far;
23187set_page_int:alter_integer;
23188set_box_dimen:alter_box_dimen;
23189
23190@ @<Declare subprocedures for |prefixed_command|@>=
23191procedure alter_aux;
23192var c:halfword; {|hmode| or |vmode|}
23193begin if cur_chr<>abs(mode) then report_illegal_case
23194else  begin c:=cur_chr; scan_optional_equals;
23195  if c=vmode then
23196    begin scan_normal_dimen; prev_depth:=cur_val;
23197    end
23198  else  begin scan_int;
23199    if (cur_val<=0)or(cur_val>32767) then
23200      begin print_err("Bad space factor");
23201@.Bad space factor@>
23202      help1("I allow only values in the range 1..32767 here.");
23203      int_error(cur_val);
23204      end
23205    else space_factor:=cur_val;
23206    end;
23207  end;
23208end;
23209
23210@ @<Declare subprocedures for |prefixed_command|@>=
23211procedure alter_prev_graf;
23212var p:0..nest_size; {index into |nest|}
23213begin nest[nest_ptr]:=cur_list; p:=nest_ptr;
23214while abs(nest[p].mode_field)<>vmode do decr(p);
23215scan_optional_equals; scan_int;
23216if cur_val<0 then
23217  begin print_err("Bad "); print_esc("prevgraf");
23218@.Bad \\prevgraf@>
23219  help1("I allow only nonnegative values here.");
23220  int_error(cur_val);
23221  end
23222else  begin nest[p].pg_field:=cur_val; cur_list:=nest[nest_ptr];
23223  end;
23224end;
23225
23226@ @<Declare subprocedures for |prefixed_command|@>=
23227procedure alter_page_so_far;
23228var c:0..7; {index into |page_so_far|}
23229begin c:=cur_chr; scan_optional_equals; scan_normal_dimen;
23230page_so_far[c]:=cur_val;
23231end;
23232
23233@ @<Declare subprocedures for |prefixed_command|@>=
23234procedure alter_integer;
23235var c:0..1; {0 for \.{\\deadcycles}, 1 for \.{\\insertpenalties}}
23236begin c:=cur_chr; scan_optional_equals; scan_int;
23237if c=0 then dead_cycles:=cur_val
23238else insert_penalties:=cur_val;
23239end;
23240
23241@ @<Declare subprocedures for |prefixed_command|@>=
23242procedure alter_box_dimen;
23243var c:small_number; {|width_offset| or |height_offset| or |depth_offset|}
23244@!b:eight_bits; {box number}
23245begin c:=cur_chr; scan_eight_bit_int; b:=cur_val; scan_optional_equals;
23246scan_normal_dimen;
23247if box(b)<>null then mem[box(b)+c].sc:=cur_val;
23248end;
23249
23250@ Paragraph shapes are set up in the obvious way.
23251
23252@<Assignments@>=
23253set_shape: begin scan_optional_equals; scan_int; n:=cur_val;
23254  if n<=0 then p:=null
23255  else  begin p:=get_node(2*n+1); info(p):=n;
23256    for j:=1 to n do
23257      begin scan_normal_dimen;
23258      mem[p+2*j-1].sc:=cur_val; {indentation}
23259      scan_normal_dimen;
23260      mem[p+2*j].sc:=cur_val; {width}
23261      end;
23262    end;
23263  define(par_shape_loc,shape_ref,p);
23264  end;
23265
23266@ Here's something that isn't quite so obvious. It guarantees that
23267|info(par_shape_ptr)| can hold any positive~|n| for which |get_node(2*n+1)|
23268doesn't overflow the memory capacity.
23269
23270@<Check the ``constant''...@>=
23271if 2*max_halfword<mem_top-mem_min then bad:=41;
23272
23273@ New hyphenation data is loaded by the |hyph_data| command.
23274
23275@<Put each...@>=
23276primitive("hyphenation",hyph_data,0);
23277@!@:hyphenation_}{\.{\\hyphenation} primitive@>
23278primitive("patterns",hyph_data,1);
23279@!@:patterns_}{\.{\\patterns} primitive@>
23280
23281@ @<Cases of |print_cmd_chr|...@>=
23282hyph_data: if chr_code=1 then print_esc("patterns")
23283  else print_esc("hyphenation");
23284
23285@ @<Assignments@>=
23286hyph_data: if cur_chr=1 then
23287    begin @!init new_patterns; goto done;@;@+tini@/
23288    print_err("Patterns can be loaded only by INITEX");
23289@.Patterns can be...@>
23290    help0; error;
23291    repeat get_token; until cur_cmd=right_brace; {flush the patterns}
23292    return;
23293    end
23294  else  begin new_hyph_exceptions; goto done;
23295    end;
23296
23297@ All of \TeX's parameters are kept in |eqtb| except the font information,
23298the interaction mode, and the hyphenation tables; these are strictly global.
23299
23300@<Assignments@>=
23301assign_font_dimen: begin find_font_dimen(true); k:=cur_val;
23302  scan_optional_equals; scan_normal_dimen; font_info[k].sc:=cur_val;
23303  end;
23304assign_font_int: begin n:=cur_chr; scan_font_ident; f:=cur_val;
23305  scan_optional_equals; scan_int;
23306  if n=0 then hyphen_char[f]:=cur_val@+else skew_char[f]:=cur_val;
23307  end;
23308
23309@ @<Put each...@>=
23310primitive("hyphenchar",assign_font_int,0);
23311@!@:hyphen_char_}{\.{\\hyphenchar} primitive@>
23312primitive("skewchar",assign_font_int,1);
23313@!@:skew_char_}{\.{\\skewchar} primitive@>
23314
23315@ @<Cases of |print_cmd_chr|...@>=
23316assign_font_int: if chr_code=0 then print_esc("hyphenchar")
23317  else print_esc("skewchar");
23318
23319@ Here is where the information for a new font gets loaded.
23320
23321@<Assignments@>=
23322def_font: new_font(a);
23323
23324@ @<Declare subprocedures for |prefixed_command|@>=
23325procedure new_font(@!a:small_number);
23326label common_ending;
23327var u:pointer; {user's font identifier}
23328@!s:scaled; {stated ``at'' size, or negative of scaled magnification}
23329@!f:internal_font_number; {runs through existing fonts}
23330@!t:str_number; {name for the frozen font identifier}
23331@!old_setting:0..max_selector; {holds |selector| setting}
23332@!flushable_string:str_number; {string not yet referenced}
23333begin if job_name=0 then open_log_file;
23334  {avoid confusing \.{texput} with the font name}
23335@.texput@>
23336get_r_token; u:=cur_cs;
23337if u>=hash_base then t:=text(u)
23338else if u>=single_base then
23339  if u=null_cs then t:="FONT"@+else t:=u-single_base
23340else  begin old_setting:=selector; selector:=new_string;
23341  print("FONT"); print(u-active_base); selector:=old_setting;
23342@.FONTx@>
23343  str_room(1); t:=make_string;
23344  end;
23345define(u,set_font,null_font); scan_optional_equals; scan_file_name;
23346@<Scan the font size specification@>;
23347@<If this font has already been loaded, set |f| to the internal
23348  font number and |goto common_ending|@>;
23349f:=read_font_info(u,cur_name,cur_area,s);
23350common_ending: equiv(u):=f; eqtb[font_id_base+f]:=eqtb[u]; font_id_text(f):=t;
23351end;
23352
23353@ @<Scan the font size specification@>=
23354name_in_progress:=true; {this keeps |cur_name| from being changed}
23355if scan_keyword("at") then @<Put the \(p)(positive) `at' size into |s|@>
23356@.at@>
23357else if scan_keyword("scaled") then
23358@.scaled@>
23359  begin scan_int; s:=-cur_val;
23360  if (cur_val<=0)or(cur_val>32768) then
23361    begin print_err("Illegal magnification has been changed to 1000");@/
23362@.Illegal magnification...@>
23363    help1("The magnification ratio must be between 1 and 32768.");
23364    int_error(cur_val); s:=-1000;
23365    end;
23366  end
23367else s:=-1000;
23368name_in_progress:=false
23369
23370@ @<Put the \(p)(positive) `at' size into |s|@>=
23371begin scan_normal_dimen; s:=cur_val;
23372if (s<=0)or(s>=@'1000000000) then
23373  begin print_err("Improper `at' size (");
23374  print_scaled(s); print("pt), replaced by 10pt");
23375@.Improper `at' size...@>
23376  help2("I can only handle fonts at positive sizes that are")@/
23377  ("less than 2048pt, so I've changed what you said to 10pt.");
23378  error; s:=10*unity;
23379  end;
23380end
23381
23382@ When the user gives a new identifier to a font that was previously loaded,
23383the new name becomes the font identifier of record. Font names `\.{xyz}' and
23384`\.{XYZ}' are considered to be different.
23385
23386@<If this font has already been loaded...@>=
23387flushable_string:=str_ptr-1;
23388for f:=font_base+1 to font_ptr do
23389  if str_eq_str(font_name[f],cur_name)and str_eq_str(font_area[f],cur_area) then
23390    begin if cur_name=flushable_string then
23391      begin flush_string; cur_name:=font_name[f];
23392      end;
23393    if s>0 then
23394      begin if s=font_size[f] then goto common_ending;
23395      end
23396    else if font_size[f]=xn_over_d(font_dsize[f],-s,1000) then
23397      goto common_ending;
23398    end
23399
23400@ @<Cases of |print_cmd_chr|...@>=
23401set_font:begin print("select font "); slow_print(font_name[chr_code]);
23402  if font_size[chr_code]<>font_dsize[chr_code] then
23403    begin print(" at "); print_scaled(font_size[chr_code]);
23404    print("pt");
23405    end;
23406  end;
23407
23408@ @<Put each...@>=
23409primitive("batchmode",set_interaction,batch_mode);
23410@!@:batch_mode_}{\.{\\batchmode} primitive@>
23411primitive("nonstopmode",set_interaction,nonstop_mode);
23412@!@:nonstop_mode_}{\.{\\nonstopmode} primitive@>
23413primitive("scrollmode",set_interaction,scroll_mode);
23414@!@:scroll_mode_}{\.{\\scrollmode} primitive@>
23415primitive("errorstopmode",set_interaction,error_stop_mode);
23416@!@:error_stop_mode_}{\.{\\errorstopmode} primitive@>
23417
23418@ @<Cases of |print_cmd_chr|...@>=
23419set_interaction: case chr_code of
23420  batch_mode: print_esc("batchmode");
23421  nonstop_mode: print_esc("nonstopmode");
23422  scroll_mode: print_esc("scrollmode");
23423  othercases print_esc("errorstopmode")
23424  endcases;
23425
23426@ @<Assignments@>=
23427set_interaction: new_interaction;
23428
23429@ @<Declare subprocedures for |prefixed_command|@>=
23430procedure new_interaction;
23431begin print_ln;
23432interaction:=cur_chr;
23433@<Initialize the print |selector| based on |interaction|@>;
23434if log_opened then selector:=selector+2;
23435end;
23436
23437@ The \.{\\afterassignment} command puts a token into the global
23438variable |after_token|. This global variable is examined just after
23439every assignment has been performed.
23440
23441@<Glob...@>=
23442@!after_token:halfword; {zero, or a saved token}
23443
23444@ @<Set init...@>=
23445after_token:=0;
23446
23447@ @<Cases of |main_control| that don't...@>=
23448any_mode(after_assignment):begin get_token; after_token:=cur_tok;
23449  end;
23450
23451@ @<Insert a token saved by \.{\\afterassignment}, if any@>=
23452if after_token<>0 then
23453  begin cur_tok:=after_token; back_input; after_token:=0;
23454  end
23455
23456@ Here is a procedure that might be called `Get the next non-blank non-relax
23457non-call non-assignment token'.
23458
23459@<Declare act...@>=
23460procedure do_assignments;
23461label exit;
23462begin loop begin @<Get the next non-blank non-relax...@>;
23463  if cur_cmd<=max_non_prefixed_command then return;
23464  set_box_allowed:=false; prefixed_command; set_box_allowed:=true;
23465  end;
23466exit:end;
23467
23468@ @<Cases of |main_control| that don't...@>=
23469any_mode(after_group):begin get_token; save_for_after(cur_tok);
23470  end;
23471
23472@ Files for \.{\\read} are opened and closed by the |in_stream| command.
23473
23474@<Put each...@>=
23475primitive("openin",in_stream,1);
23476@!@:open_in_}{\.{\\openin} primitive@>
23477primitive("closein",in_stream,0);
23478@!@:close_in_}{\.{\\closein} primitive@>
23479
23480@ @<Cases of |print_cmd_chr|...@>=
23481in_stream: if chr_code=0 then print_esc("closein")
23482  else print_esc("openin");
23483
23484@ @<Cases of |main_control| that don't...@>=
23485any_mode(in_stream): open_or_close_in;
23486
23487@ @<Declare act...@>=
23488procedure open_or_close_in;
23489var c:0..1; {1 for \.{\\openin}, 0 for \.{\\closein}}
23490@!n:0..15; {stream number}
23491begin c:=cur_chr; scan_four_bit_int; n:=cur_val;
23492if read_open[n]<>closed then
23493  begin a_close(read_file[n]); read_open[n]:=closed;
23494  end;
23495if c<>0 then
23496  begin scan_optional_equals; scan_file_name;
23497  if cur_ext="" then cur_ext:=".tex";
23498  pack_cur_name;
23499  if a_open_in(read_file[n]) then read_open[n]:=just_open;
23500  end;
23501end;
23502
23503@ The user can issue messages to the terminal, regardless of the
23504current mode.
23505
23506@<Cases of |main_control| that don't...@>=
23507any_mode(message):issue_message;
23508
23509@ @<Put each...@>=
23510primitive("message",message,0);
23511@!@:message_}{\.{\\message} primitive@>
23512primitive("errmessage",message,1);
23513@!@:err_message_}{\.{\\errmessage} primitive@>
23514
23515@ @<Cases of |print_cmd_chr|...@>=
23516message: if chr_code=0 then print_esc("message")
23517  else print_esc("errmessage");
23518
23519@ @<Declare act...@>=
23520procedure issue_message;
23521var old_setting:0..max_selector; {holds |selector| setting}
23522@!c:0..1; {identifies \.{\\message} and \.{\\errmessage}}
23523@!s:str_number; {the message}
23524begin c:=cur_chr; link(garbage):=scan_toks(false,true);
23525old_setting:=selector; selector:=new_string;
23526token_show(def_ref); selector:=old_setting;
23527flush_list(def_ref);
23528str_room(1); s:=make_string;
23529if c=0 then @<Print string |s| on the terminal@>
23530else @<Print string |s| as an error message@>;
23531flush_string;
23532end;
23533
23534@ @<Print string |s| on the terminal@>=
23535begin if term_offset+length(s)>max_print_line-2 then print_ln
23536else if (term_offset>0)or(file_offset>0) then print_char(" ");
23537slow_print(s); update_terminal;
23538end
23539
23540@ If \.{\\errmessage} occurs often in |scroll_mode|, without user-defined
23541\.{\\errhelp}, we don't want to give a long help message each time. So we
23542give a verbose explanation only once.
23543
23544@<Glob...@>=
23545@!long_help_seen:boolean; {has the long \.{\\errmessage} help been used?}
23546
23547@ @<Set init...@>=long_help_seen:=false;
23548
23549@ @<Print string |s| as an error message@>=
23550begin print_err(""); slow_print(s);
23551if err_help<>null then use_err_help:=true
23552else if long_help_seen then help1("(That was another \errmessage.)")
23553else  begin if interaction<error_stop_mode then long_help_seen:=true;
23554  help4("This error message was generated by an \errmessage")@/
23555  ("command, so I can't give any explicit help.")@/
23556  ("Pretend that you're Hercule Poirot: Examine all clues,")@/
23557@^Poirot, Hercule@>
23558  ("and deduce the truth by order and method.");
23559  end;
23560error; use_err_help:=false;
23561end
23562
23563@ The |error| routine calls on |give_err_help| if help is requested from
23564the |err_help| parameter.
23565
23566@p procedure give_err_help;
23567begin token_show(err_help);
23568end;
23569
23570@ The \.{\\uppercase} and \.{\\lowercase} commands are implemented by
23571building a token list and then changing the cases of the letters in it.
23572
23573@<Cases of |main_control| that don't...@>=
23574any_mode(case_shift):shift_case;
23575
23576@ @<Put each...@>=
23577primitive("lowercase",case_shift,lc_code_base);
23578@!@:lowercase_}{\.{\\lowercase} primitive@>
23579primitive("uppercase",case_shift,uc_code_base);
23580@!@:uppercase_}{\.{\\uppercase} primitive@>
23581
23582@ @<Cases of |print_cmd_chr|...@>=
23583case_shift:if chr_code=lc_code_base then print_esc("lowercase")
23584  else print_esc("uppercase");
23585
23586@ @<Declare act...@>=
23587procedure shift_case;
23588var b:pointer; {|lc_code_base| or |uc_code_base|}
23589@!p:pointer; {runs through the token list}
23590@!t:halfword; {token}
23591@!c:eight_bits; {character code}
23592begin b:=cur_chr; p:=scan_toks(false,false); p:=link(def_ref);
23593while p<>null do
23594  begin @<Change the case of the token in |p|, if a change is appropriate@>;
23595  p:=link(p);
23596  end;
23597back_list(link(def_ref)); free_avail(def_ref); {omit reference count}
23598end;
23599
23600@ When the case of a |chr_code| changes, we don't change the |cmd|.
23601We also change active characters, using the fact that
23602|cs_token_flag+active_base| is a multiple of~256.
23603@^data structure assumptions@>
23604
23605@<Change the case of the token in |p|, if a change is appropriate@>=
23606t:=info(p);
23607if t<cs_token_flag+single_base then
23608  begin c:=t mod 256;
23609  if equiv(b+c)<>0 then info(p):=t-c+equiv(b+c);
23610  end
23611
23612@ We come finally to the last pieces missing from |main_control|, namely the
23613`\.{\\show}' commands that are useful when debugging.
23614
23615@<Cases of |main_control| that don't...@>=
23616any_mode(xray): show_whatever;
23617
23618@ @d show_code=0 { \.{\\show} }
23619@d show_box_code=1 { \.{\\showbox} }
23620@d show_the_code=2 { \.{\\showthe} }
23621@d show_lists=3 { \.{\\showlists} }
23622
23623@<Put each...@>=
23624primitive("show",xray,show_code);
23625@!@:show_}{\.{\\show} primitive@>
23626primitive("showbox",xray,show_box_code);
23627@!@:show_box_}{\.{\\showbox} primitive@>
23628primitive("showthe",xray,show_the_code);
23629@!@:show_the_}{\.{\\showthe} primitive@>
23630primitive("showlists",xray,show_lists);
23631@!@:show_lists_}{\.{\\showlists} primitive@>
23632
23633@ @<Cases of |print_cmd_chr|...@>=
23634xray: case chr_code of
23635  show_box_code:print_esc("showbox");
23636  show_the_code:print_esc("showthe");
23637  show_lists:print_esc("showlists");
23638  othercases print_esc("show")
23639  endcases;
23640
23641@ @<Declare act...@>=
23642procedure show_whatever;
23643label common_ending;
23644var p:pointer; {tail of a token list to show}
23645begin case cur_chr of
23646show_lists: begin begin_diagnostic; show_activities;
23647  end;
23648show_box_code: @<Show the current contents of a box@>;
23649show_code: @<Show the current meaning of a token, then |goto common_ending|@>;
23650othercases @<Show the current value of some parameter or register,
23651  then |goto common_ending|@>
23652endcases;@/
23653@<Complete a potentially long \.{\\show} command@>;
23654common_ending: if interaction<error_stop_mode then
23655  begin help0; decr(error_count);
23656  end
23657else if tracing_online>0 then
23658  begin@t@>@;@/
23659  help3("This isn't an error message; I'm just \showing something.")@/
23660  ("Type `I\show...' to show more (e.g., \show\cs,")@/
23661  ("\showthe\count10, \showbox255, \showlists).");
23662  end
23663else  begin@t@>@;@/
23664  help5("This isn't an error message; I'm just \showing something.")@/
23665  ("Type `I\show...' to show more (e.g., \show\cs,")@/
23666  ("\showthe\count10, \showbox255, \showlists).")@/
23667  ("And type `I\tracingonline=1\show...' to show boxes and")@/
23668  ("lists on your terminal as well as in the transcript file.");
23669  end;
23670error;
23671end;
23672
23673@ @<Show the current meaning of a token...@>=
23674begin get_token;
23675if interaction=error_stop_mode then wake_up_terminal;
23676print_nl("> ");
23677if cur_cs<>0 then
23678  begin sprint_cs(cur_cs); print_char("=");
23679  end;
23680print_meaning; goto common_ending;
23681end
23682
23683@ @<Cases of |print_cmd_chr|...@>=
23684undefined_cs: print("undefined");
23685call: print("macro");
23686long_call: print_esc("long macro");
23687outer_call: print_esc("outer macro");
23688long_outer_call: begin print_esc("long"); print_esc("outer macro");
23689  end;
23690end_template: print_esc("outer endtemplate");
23691
23692@ @<Show the current contents of a box@>=
23693begin scan_eight_bit_int; begin_diagnostic;
23694print_nl("> \box"); print_int(cur_val); print_char("=");
23695if box(cur_val)=null then print("void")
23696else show_box(box(cur_val));
23697end
23698
23699@ @<Show the current value of some parameter...@>=
23700begin p:=the_toks;
23701if interaction=error_stop_mode then wake_up_terminal;
23702print_nl("> "); token_show(temp_head);
23703flush_list(link(temp_head)); goto common_ending;
23704end
23705
23706@ @<Complete a potentially long \.{\\show} command@>=
23707end_diagnostic(true); print_err("OK");
23708@.OK@>
23709if selector=term_and_log then if tracing_online<=0 then
23710  begin selector:=term_only; print(" (see the transcript file)");
23711  selector:=term_and_log;
23712  end
23713
23714@* \[50] Dumping and undumping the tables.
23715After \.{INITEX} has seen a collection of fonts and macros, it
23716can write all the necessary information on an auxiliary file so
23717that production versions of \TeX\ are able to initialize their
23718memory at high speed. The present section of the program takes
23719care of such output and input. We shall consider simultaneously
23720the processes of storing and restoring,
23721so that the inverse relation between them is clear.
23722@.INITEX@>
23723
23724The global variable |format_ident| is a string that is printed right
23725after the |banner| line when \TeX\ is ready to start. For \.{INITEX} this
23726string says simply `\.{(INITEX)}'; for other versions of \TeX\ it says,
23727for example, `\.{(preloaded format=plain 1982.11.19)}', showing the year,
23728month, and day that the format file was created. We have |format_ident=0|
23729before \TeX's tables are loaded.
23730
23731@<Glob...@>=
23732@!format_ident:str_number;
23733
23734@ @<Set init...@>=
23735format_ident:=0;
23736
23737@ @<Initialize table entries...@>=
23738format_ident:=" (INITEX)";
23739
23740@ @<Declare act...@>=
23741@!init procedure store_fmt_file;
23742label found1,found2,done1,done2;
23743var j,@!k,@!l:integer; {all-purpose indices}
23744@!p,@!q: pointer; {all-purpose pointers}
23745@!x: integer; {something to dump}
23746@!w: four_quarters; {four ASCII codes}
23747begin @<If dumping is not allowed, abort@>;
23748@<Create the |format_ident|, open the format file,
23749  and inform the user that dumping has begun@>;
23750@<Dump constants for consistency check@>;
23751@<Dump the string pool@>;
23752@<Dump the dynamic memory@>;
23753@<Dump the table of equivalents@>;
23754@<Dump the font information@>;
23755@<Dump the hyphenation tables@>;
23756@<Dump a couple more things and the closing check word@>;
23757@<Close the format file@>;
23758end;
23759tini
23760
23761@ Corresponding to the procedure that dumps a format file, we have a function
23762that reads one in. The function returns |false| if the dumped format is
23763incompatible with the present \TeX\ table sizes, etc.
23764
23765@d bad_fmt=6666 {go here if the format file is unacceptable}
23766@d too_small(#)==begin wake_up_terminal;
23767  wterm_ln('---! Must increase the ',#);
23768@.Must increase the x@>
23769  goto bad_fmt;
23770  end
23771
23772@p @t\4@>@<Declare the function called |open_fmt_file|@>@;
23773function load_fmt_file:boolean;
23774label bad_fmt,exit;
23775var j,@!k:integer; {all-purpose indices}
23776@!p,@!q: pointer; {all-purpose pointers}
23777@!x: integer; {something undumped}
23778@!w: four_quarters; {four ASCII codes}
23779begin @<Undump constants for consistency check@>;
23780@<Undump the string pool@>;
23781@<Undump the dynamic memory@>;
23782@<Undump the table of equivalents@>;
23783@<Undump the font information@>;
23784@<Undump the hyphenation tables@>;
23785@<Undump a couple more things and the closing check word@>;
23786load_fmt_file:=true; return; {it worked!}
23787bad_fmt: wake_up_terminal;
23788  wterm_ln('(Fatal format file error; I''m stymied)');
23789@.Fatal format file error@>
23790load_fmt_file:=false;
23791exit:end;
23792
23793@ The user is not allowed to dump a format file unless |save_ptr=0|.
23794This condition implies that |cur_level=level_one|, hence
23795the |xeq_level| array is constant and it need not be dumped.
23796
23797@<If dumping is not allowed, abort@>=
23798if save_ptr<>0 then
23799  begin print_err("You can't dump inside a group");
23800@.You can't dump...@>
23801  help1("`{...\dump}' is a no-no."); succumb;
23802  end
23803
23804@ Format files consist of |memory_word| items, and we use the following
23805macros to dump words of different types:
23806
23807@d dump_wd(#)==begin fmt_file^:=#; put(fmt_file);@+end
23808@d dump_int(#)==begin fmt_file^.int:=#; put(fmt_file);@+end
23809@d dump_hh(#)==begin fmt_file^.hh:=#; put(fmt_file);@+end
23810@d dump_qqqq(#)==begin fmt_file^.qqqq:=#; put(fmt_file);@+end
23811
23812@<Glob...@>=
23813@!fmt_file:word_file; {for input or output of format information}
23814
23815@ The inverse macros are slightly more complicated, since we need to check
23816the range of the values we are reading in. We say `|undump(a)(b)(x)|' to
23817read an integer value |x| that is supposed to be in the range |a<=x<=b|.
23818
23819@d undump_wd(#)==begin get(fmt_file); #:=fmt_file^;@+end
23820@d undump_int(#)==begin get(fmt_file); #:=fmt_file^.int;@+end
23821@d undump_hh(#)==begin get(fmt_file); #:=fmt_file^.hh;@+end
23822@d undump_qqqq(#)==begin get(fmt_file); #:=fmt_file^.qqqq;@+end
23823@d undump_end_end(#)==#:=x;@+end
23824@d undump_end(#)==(x>#) then goto bad_fmt@+else undump_end_end
23825@d undump(#)==begin undump_int(x); if (x<#) or undump_end
23826@d undump_size_end_end(#)==too_small(#)@+else undump_end_end
23827@d undump_size_end(#)==if x># then undump_size_end_end
23828@d undump_size(#)==begin undump_int(x);
23829  if x<# then goto bad_fmt; undump_size_end
23830
23831@ The next few sections of the program should make it clear how we use the
23832dump/undump macros.
23833
23834@<Dump constants for consistency check@>=
23835dump_int(@$);@/
23836dump_int(mem_bot);@/
23837dump_int(mem_top);@/
23838dump_int(eqtb_size);@/
23839dump_int(hash_prime);@/
23840dump_int(hyph_size)
23841
23842@ Sections of a \.{WEB} program that are ``commented out'' still contribute
23843strings to the string pool; therefore \.{INITEX} and \TeX\ will have
23844the same strings. (And it is, of course, a good thing that they do.)
23845@.WEB@>
23846@^string pool@>
23847
23848@<Undump constants for consistency check@>=
23849x:=fmt_file^.int;
23850if x<>@$ then goto bad_fmt; {check that strings are the same}
23851undump_int(x);
23852if x<>mem_bot then goto bad_fmt;
23853undump_int(x);
23854if x<>mem_top then goto bad_fmt;
23855undump_int(x);
23856if x<>eqtb_size then goto bad_fmt;
23857undump_int(x);
23858if x<>hash_prime then goto bad_fmt;
23859undump_int(x);
23860if x<>hyph_size then goto bad_fmt
23861
23862@ @d dump_four_ASCII==
23863  w.b0:=qi(so(str_pool[k])); w.b1:=qi(so(str_pool[k+1]));
23864  w.b2:=qi(so(str_pool[k+2])); w.b3:=qi(so(str_pool[k+3]));
23865  dump_qqqq(w)
23866
23867@<Dump the string pool@>=
23868dump_int(pool_ptr);
23869dump_int(str_ptr);
23870for k:=0 to str_ptr do dump_int(str_start[k]);
23871k:=0;
23872while k+4<pool_ptr do
23873  begin dump_four_ASCII; k:=k+4;
23874  end;
23875k:=pool_ptr-4; dump_four_ASCII;
23876print_ln; print_int(str_ptr); print(" strings of total length ");
23877print_int(pool_ptr)
23878
23879@ @d undump_four_ASCII==
23880  undump_qqqq(w);
23881  str_pool[k]:=si(qo(w.b0)); str_pool[k+1]:=si(qo(w.b1));
23882  str_pool[k+2]:=si(qo(w.b2)); str_pool[k+3]:=si(qo(w.b3))
23883
23884@<Undump the string pool@>=
23885undump_size(0)(pool_size)('string pool size')(pool_ptr);
23886undump_size(0)(max_strings)('max strings')(str_ptr);
23887for k:=0 to str_ptr do undump(0)(pool_ptr)(str_start[k]);
23888k:=0;
23889while k+4<pool_ptr do
23890  begin undump_four_ASCII; k:=k+4;
23891  end;
23892k:=pool_ptr-4; undump_four_ASCII;
23893init_str_ptr:=str_ptr; init_pool_ptr:=pool_ptr
23894
23895@ By sorting the list of available spaces in the variable-size portion of
23896|mem|, we are usually able to get by without having to dump very much
23897of the dynamic memory.
23898
23899We recompute |var_used| and |dyn_used|, so that \.{INITEX} dumps valid
23900information even when it has not been gathering statistics.
23901
23902@<Dump the dynamic memory@>=
23903sort_avail; var_used:=0;
23904dump_int(lo_mem_max); dump_int(rover);
23905p:=mem_bot; q:=rover; x:=0;
23906repeat for k:=p to q+1 do dump_wd(mem[k]);
23907x:=x+q+2-p; var_used:=var_used+q-p;
23908p:=q+node_size(q); q:=rlink(q);
23909until q=rover;
23910var_used:=var_used+lo_mem_max-p; dyn_used:=mem_end+1-hi_mem_min;@/
23911for k:=p to lo_mem_max do dump_wd(mem[k]);
23912x:=x+lo_mem_max+1-p;
23913dump_int(hi_mem_min); dump_int(avail);
23914for k:=hi_mem_min to mem_end do dump_wd(mem[k]);
23915x:=x+mem_end+1-hi_mem_min;
23916p:=avail;
23917while p<>null do
23918  begin decr(dyn_used); p:=link(p);
23919  end;
23920dump_int(var_used); dump_int(dyn_used);
23921print_ln; print_int(x);
23922print(" memory locations dumped; current usage is ");
23923print_int(var_used); print_char("&"); print_int(dyn_used)
23924
23925@ @<Undump the dynamic memory@>=
23926undump(lo_mem_stat_max+1000)(hi_mem_stat_min-1)(lo_mem_max);
23927undump(lo_mem_stat_max+1)(lo_mem_max)(rover);
23928p:=mem_bot; q:=rover;
23929repeat for k:=p to q+1 do undump_wd(mem[k]);
23930p:=q+node_size(q);
23931if (p>lo_mem_max)or((q>=rlink(q))and(rlink(q)<>rover)) then goto bad_fmt;
23932q:=rlink(q);
23933until q=rover;
23934for k:=p to lo_mem_max do undump_wd(mem[k]);
23935if mem_min<mem_bot-2 then {make more low memory available}
23936  begin p:=llink(rover); q:=mem_min+1;
23937  link(mem_min):=null; info(mem_min):=null; {we don't use the bottom word}
23938  rlink(p):=q; llink(rover):=q;@/
23939  rlink(q):=rover; llink(q):=p; link(q):=empty_flag;
23940  node_size(q):=mem_bot-q;
23941  end;
23942undump(lo_mem_max+1)(hi_mem_stat_min)(hi_mem_min);
23943undump(null)(mem_top)(avail); mem_end:=mem_top;
23944for k:=hi_mem_min to mem_end do undump_wd(mem[k]);
23945undump_int(var_used); undump_int(dyn_used)
23946
23947@ @<Dump the table of equivalents@>=
23948@<Dump regions 1 to 4 of |eqtb|@>;
23949@<Dump regions 5 and 6 of |eqtb|@>;
23950dump_int(par_loc); dump_int(write_loc);@/
23951@<Dump the hash table@>
23952
23953@ @<Undump the table of equivalents@>=
23954@<Undump regions 1 to 6 of |eqtb|@>;
23955undump(hash_base)(frozen_control_sequence)(par_loc);
23956par_token:=cs_token_flag+par_loc;@/
23957undump(hash_base)(frozen_control_sequence)(write_loc);@/
23958@<Undump the hash table@>
23959
23960@ The table of equivalents usually contains repeated information, so we dump it
23961in compressed form: The sequence of $n+2$ values $(n,x_1,\ldots,x_n,m)$ in the
23962format file represents $n+m$ consecutive entries of |eqtb|, with |m| extra
23963copies of $x_n$, namely $(x_1,\ldots,x_n,x_n,\ldots,x_n)$.
23964
23965@<Dump regions 1 to 4 of |eqtb|@>=
23966k:=active_base;
23967repeat j:=k;
23968while j<int_base-1 do
23969  begin if (equiv(j)=equiv(j+1))and(eq_type(j)=eq_type(j+1))and@|
23970    (eq_level(j)=eq_level(j+1)) then goto found1;
23971  incr(j);
23972  end;
23973l:=int_base; goto done1; {|j=int_base-1|}
23974found1: incr(j); l:=j;
23975while j<int_base-1 do
23976  begin if (equiv(j)<>equiv(j+1))or(eq_type(j)<>eq_type(j+1))or@|
23977    (eq_level(j)<>eq_level(j+1)) then goto done1;
23978  incr(j);
23979  end;
23980done1:dump_int(l-k);
23981while k<l do
23982  begin dump_wd(eqtb[k]); incr(k);
23983  end;
23984k:=j+1; dump_int(k-l);
23985until k=int_base
23986
23987@ @<Dump regions 5 and 6 of |eqtb|@>=
23988repeat j:=k;
23989while j<eqtb_size do
23990  begin if eqtb[j].int=eqtb[j+1].int then goto found2;
23991  incr(j);
23992  end;
23993l:=eqtb_size+1; goto done2; {|j=eqtb_size|}
23994found2: incr(j); l:=j;
23995while j<eqtb_size do
23996  begin if eqtb[j].int<>eqtb[j+1].int then goto done2;
23997  incr(j);
23998  end;
23999done2:dump_int(l-k);
24000while k<l do
24001  begin dump_wd(eqtb[k]); incr(k);
24002  end;
24003k:=j+1; dump_int(k-l);
24004until k>eqtb_size
24005
24006@ @<Undump regions 1 to 6 of |eqtb|@>=
24007k:=active_base;
24008repeat undump_int(x);
24009if (x<1)or(k+x>eqtb_size+1) then goto bad_fmt;
24010for j:=k to k+x-1 do undump_wd(eqtb[j]);
24011k:=k+x;
24012undump_int(x);
24013if (x<0)or(k+x>eqtb_size+1) then goto bad_fmt;
24014for j:=k to k+x-1 do eqtb[j]:=eqtb[k-1];
24015k:=k+x;
24016until k>eqtb_size
24017
24018@ A different scheme is used to compress the hash table, since its lower
24019region is usually sparse. When |text(p)<>0| for |p<=hash_used|, we output
24020two words, |p| and |hash[p]|. The hash table is, of course, densely packed
24021for |p>=hash_used|, so the remaining entries are output in a~block.
24022
24023@<Dump the hash table@>=
24024dump_int(hash_used); cs_count:=frozen_control_sequence-1-hash_used;
24025for p:=hash_base to hash_used do if text(p)<>0 then
24026  begin dump_int(p); dump_hh(hash[p]); incr(cs_count);
24027  end;
24028for p:=hash_used+1 to undefined_control_sequence-1 do dump_hh(hash[p]);
24029dump_int(cs_count);@/
24030print_ln; print_int(cs_count); print(" multiletter control sequences")
24031
24032@ @<Undump the hash table@>=
24033undump(hash_base)(frozen_control_sequence)(hash_used); p:=hash_base-1;
24034repeat undump(p+1)(hash_used)(p); undump_hh(hash[p]);
24035until p=hash_used;
24036for p:=hash_used+1 to undefined_control_sequence-1 do undump_hh(hash[p]);
24037undump_int(cs_count)
24038
24039@ @<Dump the font information@>=
24040dump_int(fmem_ptr);
24041for k:=0 to fmem_ptr-1 do dump_wd(font_info[k]);
24042dump_int(font_ptr);
24043for k:=null_font to font_ptr do
24044  @<Dump the array info for internal font number |k|@>;
24045print_ln; print_int(fmem_ptr-7); print(" words of font info for ");
24046print_int(font_ptr-font_base); print(" preloaded font");
24047if font_ptr<>font_base+1 then print_char("s")
24048
24049@ @<Undump the font information@>=
24050undump_size(7)(font_mem_size)('font mem size')(fmem_ptr);
24051for k:=0 to fmem_ptr-1 do undump_wd(font_info[k]);
24052undump_size(font_base)(font_max)('font max')(font_ptr);
24053for k:=null_font to font_ptr do
24054  @<Undump the array info for internal font number |k|@>
24055
24056@ @<Dump the array info for internal font number |k|@>=
24057begin dump_qqqq(font_check[k]);
24058dump_int(font_size[k]);
24059dump_int(font_dsize[k]);
24060dump_int(font_params[k]);@/
24061dump_int(hyphen_char[k]);
24062dump_int(skew_char[k]);@/
24063dump_int(font_name[k]);
24064dump_int(font_area[k]);@/
24065dump_int(font_bc[k]);
24066dump_int(font_ec[k]);@/
24067dump_int(char_base[k]);
24068dump_int(width_base[k]);
24069dump_int(height_base[k]);@/
24070dump_int(depth_base[k]);
24071dump_int(italic_base[k]);
24072dump_int(lig_kern_base[k]);@/
24073dump_int(kern_base[k]);
24074dump_int(exten_base[k]);
24075dump_int(param_base[k]);@/
24076dump_int(font_glue[k]);@/
24077dump_int(bchar_label[k]);
24078dump_int(font_bchar[k]);
24079dump_int(font_false_bchar[k]);@/
24080print_nl("\font"); print_esc(font_id_text(k)); print_char("=");
24081print_file_name(font_name[k],font_area[k],"");
24082if font_size[k]<>font_dsize[k] then
24083  begin print(" at "); print_scaled(font_size[k]); print("pt");
24084  end;
24085end
24086
24087@ @<Undump the array info for internal font number |k|@>=
24088begin undump_qqqq(font_check[k]);@/
24089undump_int(font_size[k]);
24090undump_int(font_dsize[k]);
24091undump(min_halfword)(max_halfword)(font_params[k]);@/
24092undump_int(hyphen_char[k]);
24093undump_int(skew_char[k]);@/
24094undump(0)(str_ptr)(font_name[k]);
24095undump(0)(str_ptr)(font_area[k]);@/
24096undump(0)(255)(font_bc[k]);
24097undump(0)(255)(font_ec[k]);@/
24098undump_int(char_base[k]);
24099undump_int(width_base[k]);
24100undump_int(height_base[k]);@/
24101undump_int(depth_base[k]);
24102undump_int(italic_base[k]);
24103undump_int(lig_kern_base[k]);@/
24104undump_int(kern_base[k]);
24105undump_int(exten_base[k]);
24106undump_int(param_base[k]);@/
24107undump(min_halfword)(lo_mem_max)(font_glue[k]);@/
24108undump(0)(fmem_ptr-1)(bchar_label[k]);
24109undump(min_quarterword)(non_char)(font_bchar[k]);
24110undump(min_quarterword)(non_char)(font_false_bchar[k]);
24111end
24112
24113@ @<Dump the hyphenation tables@>=
24114dump_int(hyph_count);
24115for k:=0 to hyph_size do if hyph_word[k]<>0 then
24116  begin dump_int(k); dump_int(hyph_word[k]); dump_int(hyph_list[k]);
24117  end;
24118print_ln; print_int(hyph_count); print(" hyphenation exception");
24119if hyph_count<>1 then print_char("s");
24120if trie_not_ready then init_trie;
24121dump_int(trie_max);
24122for k:=0 to trie_max do dump_hh(trie[k]);
24123dump_int(trie_op_ptr);
24124for k:=1 to trie_op_ptr do
24125  begin dump_int(hyf_distance[k]);
24126  dump_int(hyf_num[k]);
24127  dump_int(hyf_next[k]);
24128  end;
24129print_nl("Hyphenation trie of length "); print_int(trie_max);
24130@.Hyphenation trie...@>
24131print(" has "); print_int(trie_op_ptr); print(" op");
24132if trie_op_ptr<>1 then print_char("s");
24133print(" out of "); print_int(trie_op_size);
24134for k:=255 downto 0 do if trie_used[k]>min_quarterword then
24135  begin print_nl("  "); print_int(qo(trie_used[k]));
24136  print(" for language "); print_int(k);
24137  dump_int(k); dump_int(qo(trie_used[k]));
24138  end
24139
24140@ Only ``nonempty'' parts of |op_start| need to be restored.
24141
24142@<Undump the hyphenation tables@>=
24143undump(0)(hyph_size)(hyph_count);
24144for k:=1 to hyph_count do
24145  begin undump(0)(hyph_size)(j);
24146  undump(0)(str_ptr)(hyph_word[j]);
24147  undump(min_halfword)(max_halfword)(hyph_list[j]);
24148  end;
24149undump_size(0)(trie_size)('trie size')(j); @+init trie_max:=j;@+tini
24150for k:=0 to j do undump_hh(trie[k]);
24151undump_size(0)(trie_op_size)('trie op size')(j); @+init trie_op_ptr:=j;@+tini
24152for k:=1 to j do
24153  begin undump(0)(63)(hyf_distance[k]); {a |small_number|}
24154  undump(0)(63)(hyf_num[k]);
24155  undump(min_quarterword)(max_quarterword)(hyf_next[k]);
24156  end;
24157init for k:=0 to 255 do trie_used[k]:=min_quarterword;@+tini@;@/
24158k:=256;
24159while j>0 do
24160  begin undump(0)(k-1)(k); undump(1)(j)(x);@+init trie_used[k]:=qi(x);@+tini@;@/
24161  j:=j-x; op_start[k]:=qo(j);
24162  end;
24163@!init trie_not_ready:=false @+tini
24164
24165@ We have already printed a lot of statistics, so we set |tracing_stats:=0|
24166to prevent them from appearing again.
24167
24168@<Dump a couple more things and the closing check word@>=
24169dump_int(interaction); dump_int(format_ident); dump_int(69069);
24170tracing_stats:=0
24171
24172@ @<Undump a couple more things and the closing check word@>=
24173undump(batch_mode)(error_stop_mode)(interaction);
24174undump(0)(str_ptr)(format_ident);
24175undump_int(x);
24176if (x<>69069)or eof(fmt_file) then goto bad_fmt
24177
24178@ @<Create the |format_ident|...@>=
24179selector:=new_string;
24180print(" (preloaded format="); print(job_name); print_char(" ");
24181print_int(year); print_char(".");
24182print_int(month); print_char("."); print_int(day); print_char(")");
24183if interaction=batch_mode then selector:=log_only
24184else selector:=term_and_log;
24185str_room(1);
24186format_ident:=make_string;
24187pack_job_name(format_extension);
24188while not w_open_out(fmt_file) do
24189  prompt_file_name("format file name",format_extension);
24190print_nl("Beginning to dump on file ");
24191@.Beginning to dump...@>
24192slow_print(w_make_name_string(fmt_file)); flush_string;
24193print_nl(""); slow_print(format_ident)
24194
24195@ @<Close the format file@>=
24196w_close(fmt_file)
24197
24198@* \[51] The main program.
24199This is it: the part of \TeX\ that executes all those procedures we have
24200written.
24201
24202Well---almost. Let's leave space for a few more routines that we may
24203have forgotten.
24204
24205@p @<Last-minute procedures@>
24206
24207@ We have noted that there are two versions of \TeX82. One, called \.{INITEX},
24208@.INITEX@>
24209has to be run first; it initializes everything from scratch, without
24210reading a format file, and it has the capability of dumping a format file.
24211The other one is called `\.{VIRTEX}'; it is a ``virgin'' program that needs
24212@.VIRTEX@>
24213to input a format file in order to get started. \.{VIRTEX} typically has
24214more memory capacity than \.{INITEX}, because it does not need the space
24215consumed by the auxiliary hyphenation tables and the numerous calls on
24216|primitive|, etc.
24217
24218The \.{VIRTEX} program cannot read a format file instantaneously, of course;
24219the best implementations therefore allow for production versions of \TeX\ that
24220not only avoid the loading routine for \PASCAL\ object code, they also have
24221a format file pre-loaded. This is impossible to do if we stick to standard
24222\PASCAL; but there is a simple way to fool many systems into avoiding the
24223initialization, as follows:\quad(1)~We declare a global integer variable
24224called |ready_already|. The probability is negligible that this
24225variable holds any particular value like 314159 when \.{VIRTEX} is first
24226loaded.\quad(2)~After we have read in a format file and initialized
24227everything, we set |ready_already:=314159|.\quad(3)~Soon \.{VIRTEX}
24228will print `\.*', waiting for more input; and at this point we
24229interrupt the program and save its core image in some form that the
24230operating system can reload speedily.\quad(4)~When that core image is
24231activated, the program starts again at the beginning; but now
24232|ready_already=314159| and all the other global variables have
24233their initial values too. The former chastity has vanished!
24234
24235In other words, if we allow ourselves to test the condition
24236|ready_already=314159|, before |ready_already| has been
24237assigned a value, we can avoid the lengthy initialization. Dirty tricks
24238rarely pay off so handsomely.
24239@^dirty \PASCAL@>
24240@^system dependencies@>
24241
24242On systems that allow such preloading, the standard program called \.{TeX}
24243should be the one that has \.{plain} format preloaded, since that agrees
24244with {\sl The \TeX book}. Other versions, e.g., \.{AmSTeX}, should also
24245@:TeXbook}{\sl The \TeX book@>
24246@.AmSTeX@>
24247@.plain@>
24248be provided for commonly used formats.
24249
24250@<Glob...@>=
24251@!ready_already:integer; {a sacrifice of purity for economy}
24252
24253@ Now this is really it: \TeX\ starts and ends here.
24254
24255The initial test involving |ready_already| should be deleted if the
24256\PASCAL\ runtime system is smart enough to detect such a ``mistake.''
24257@^system dependencies@>
24258
24259@p begin @!{|start_here|}
24260history:=fatal_error_stop; {in case we quit during initialization}
24261t_open_out; {open the terminal for output}
24262if ready_already=314159 then goto start_of_TEX;
24263@<Check the ``constant'' values...@>@;
24264if bad>0 then
24265  begin wterm_ln('Ouch---my internal constants have been clobbered!',
24266    '---case ',bad:1);
24267@.Ouch...clobbered@>
24268  goto final_end;
24269  end;
24270initialize; {set global variables to their starting values}
24271@!init if not get_strings_started then goto final_end;
24272init_prim; {call |primitive| for each primitive}
24273init_str_ptr:=str_ptr; init_pool_ptr:=pool_ptr; fix_date_and_time;
24274tini@/
24275ready_already:=314159;
24276start_of_TEX: @<Initialize the output routines@>;
24277@<Get the first line of input and prepare to start@>;
24278history:=spotless; {ready to go!}
24279main_control; {come to life}
24280final_cleanup; {prepare for death}
24281end_of_TEX: close_files_and_terminate;
24282final_end: ready_already:=0;
24283end.
24284
24285@ Here we do whatever is needed to complete \TeX's job gracefully on the
24286local operating system. The code here might come into play after a fatal
24287error; it must therefore consist entirely of ``safe'' operations that
24288cannot produce error messages. For example, it would be a mistake to call
24289|str_room| or |make_string| at this time, because a call on |overflow|
24290might lead to an infinite loop.
24291@^system dependencies@>
24292
24293Actually there's one way to get error messages, via |prepare_mag|;
24294but that can't cause infinite recursion.
24295@^recursion@>
24296
24297This program doesn't bother to close the input files that may still be open.
24298
24299@<Last-minute...@>=
24300procedure close_files_and_terminate;
24301var k:integer; {all-purpose index}
24302begin @<Finish the extensions@>;
24303@!stat if tracing_stats>0 then @<Output statistics about this job@>;@;@+tats@/
24304wake_up_terminal; @<Finish the \.{DVI} file@>;
24305if log_opened then
24306  begin wlog_cr; a_close(log_file); selector:=selector-2;
24307  if selector=term_only then
24308    begin print_nl("Transcript written on ");
24309@.Transcript written...@>
24310    slow_print(log_name); print_char(".");
24311    end;
24312  end;
24313end;
24314
24315@ The present section goes directly to the log file instead of using
24316|print| commands, because there's no need for these strings to take
24317up |str_pool| memory when a non-{\bf stat} version of \TeX\ is being used.
24318
24319@<Output statistics...@>=
24320if log_opened then
24321  begin wlog_ln(' ');
24322  wlog_ln('Here is how much of TeX''s memory',' you used:');
24323@.Here is how much...@>
24324  wlog(' ',str_ptr-init_str_ptr:1,' string');
24325  if str_ptr<>init_str_ptr+1 then wlog('s');
24326  wlog_ln(' out of ', max_strings-init_str_ptr:1);@/
24327  wlog_ln(' ',pool_ptr-init_pool_ptr:1,' string characters out of ',
24328    pool_size-init_pool_ptr:1);@/
24329  wlog_ln(' ',lo_mem_max-mem_min+mem_end-hi_mem_min+2:1,@|
24330    ' words of memory out of ',mem_end+1-mem_min:1);@/
24331  wlog_ln(' ',cs_count:1,' multiletter control sequences out of ',
24332    hash_size:1);@/
24333  wlog(' ',fmem_ptr:1,' words of font info for ',
24334    font_ptr-font_base:1,' font');
24335  if font_ptr<>font_base+1 then wlog('s');
24336  wlog_ln(', out of ',font_mem_size:1,' for ',font_max-font_base:1);@/
24337  wlog(' ',hyph_count:1,' hyphenation exception');
24338  if hyph_count<>1 then wlog('s');
24339  wlog_ln(' out of ',hyph_size:1);@/
24340  wlog_ln(' ',max_in_stack:1,'i,',max_nest_stack:1,'n,',@|
24341    max_param_stack:1,'p,',@|
24342    max_buf_stack+1:1,'b,',@|
24343    max_save_stack+6:1,'s stack positions out of ',@|
24344    stack_size:1,'i,',
24345    nest_size:1,'n,',
24346    param_size:1,'p,',
24347    buf_size:1,'b,',
24348    save_size:1,'s');
24349  end
24350
24351@ We get to the |final_cleanup| routine when \.{\\end} or \.{\\dump} has
24352been scanned and |its_all_over|\kern-2pt.
24353
24354@<Last-minute...@>=
24355procedure final_cleanup;
24356label exit;
24357var c:small_number; {0 for \.{\\end}, 1 for \.{\\dump}}
24358begin c:=cur_chr;
24359if job_name=0 then open_log_file;
24360while input_ptr>0 do
24361  if state=token_list then end_token_list@+else end_file_reading;
24362while open_parens>0 do
24363  begin print(" )"); decr(open_parens);
24364  end;
24365if cur_level>level_one then
24366  begin print_nl("("); print_esc("end occurred ");
24367  print("inside a group at level ");
24368@:end_}{\.{(\\end occurred...)}@>
24369  print_int(cur_level-level_one); print_char(")");
24370  end;
24371while cond_ptr<>null do
24372  begin print_nl("("); print_esc("end occurred ");
24373  print("when "); print_cmd_chr(if_test,cur_if);
24374  if if_line<>0 then
24375    begin print(" on line "); print_int(if_line);
24376    end;
24377  print(" was incomplete)");
24378  if_line:=if_line_field(cond_ptr);
24379  cur_if:=subtype(cond_ptr); temp_ptr:=cond_ptr;
24380  cond_ptr:=link(cond_ptr); free_node(temp_ptr,if_node_size);
24381  end;
24382if history<>spotless then
24383 if ((history=warning_issued)or(interaction<error_stop_mode)) then
24384  if selector=term_and_log then
24385  begin selector:=term_only;
24386  print_nl("(see the transcript file for additional information)");
24387@.see the transcript file...@>
24388  selector:=term_and_log;
24389  end;
24390if c=1 then
24391  begin @!init for c:=top_mark_code to split_bot_mark_code do
24392    if cur_mark[c]<>null then delete_token_ref(cur_mark[c]);
24393  if last_glue<>max_halfword then delete_glue_ref(last_glue);
24394  store_fmt_file; return;@+tini@/
24395  print_nl("(\dump is performed only by INITEX)"); return;
24396@:dump_}{\.{\\dump...only by INITEX}@>
24397  end;
24398exit:end;
24399
24400@ @<Last-minute...@>=
24401@!init procedure init_prim; {initialize all the primitives}
24402begin no_new_control_sequence:=false;
24403@<Put each...@>;
24404no_new_control_sequence:=true;
24405end;
24406tini
24407
24408@ When we begin the following code, \TeX's tables may still contain garbage;
24409the strings might not even be present. Thus we must proceed cautiously to get
24410bootstrapped in.
24411
24412But when we finish this part of the program, \TeX\ is ready to call on the
24413|main_control| routine to do its work.
24414
24415@<Get the first line...@>=
24416begin @<Initialize the input routines@>;
24417if (format_ident=0)or(buffer[loc]="&") then
24418  begin if format_ident<>0 then initialize; {erase preloaded format}
24419  if not open_fmt_file then goto final_end;
24420  if not load_fmt_file then
24421    begin w_close(fmt_file); goto final_end;
24422    end;
24423  w_close(fmt_file);
24424  while (loc<limit)and(buffer[loc]=" ") do incr(loc);
24425  end;
24426if end_line_char_inactive then decr(limit)
24427else  buffer[limit]:=end_line_char;
24428fix_date_and_time;@/
24429@<Compute the magic offset@>;
24430@<Initialize the print |selector|...@>;
24431if (loc<limit)and(cat_code(buffer[loc])<>escape) then start_input;
24432  {\.{\\input} assumed}
24433end
24434
24435@* \[52] Debugging.
24436Once \TeX\ is working, you should be able to diagnose most errors with
24437the \.{\\show} commands and other diagnostic features. But for the initial
24438stages of debugging, and for the revelation of really deep mysteries, you
24439can compile \TeX\ with a few more aids, including the \PASCAL\ runtime
24440checks and its debugger. An additional routine called |debug_help|
24441will also come into play when you type `\.D' after an error message;
24442|debug_help| also occurs just before a fatal error causes \TeX\ to succumb.
24443@^debugging@>
24444@^system dependencies@>
24445
24446The interface to |debug_help| is primitive, but it is good enough when used
24447with a \PASCAL\ debugger that allows you to set breakpoints and to read
24448variables and change their values. After getting the prompt `\.{debug \#}', you
24449type either a negative number (this exits |debug_help|), or zero (this
24450goes to a location where you can set a breakpoint, thereby entering into
24451dialog with the \PASCAL\ debugger), or a positive number |m| followed by
24452an argument |n|. The meaning of |m| and |n| will be clear from the
24453program below. (If |m=13|, there is an additional argument, |l|.)
24454@.debug \#@>
24455
24456@d breakpoint=888 {place where a breakpoint is desirable}
24457
24458@<Last-minute...@>=
24459@!debug procedure debug_help; {routine to display various things}
24460label breakpoint,exit;
24461var k,@!l,@!m,@!n:integer;
24462begin loop begin wake_up_terminal;
24463  print_nl("debug # (-1 to exit):"); update_terminal;
24464@.debug \#@>
24465  read(term_in,m);
24466  if m<0 then return
24467  else if m=0 then
24468    begin goto breakpoint;@\ {go to every label at least once}
24469    breakpoint: m:=0; @{'BREAKPOINT'@}@\
24470    end
24471  else  begin read(term_in,n);
24472    case m of
24473    @t\4@>@<Numbered cases for |debug_help|@>@;
24474    othercases print("?")
24475    endcases;
24476    end;
24477  end;
24478exit:end;
24479gubed
24480
24481@ @<Numbered cases...@>=
244821: print_word(mem[n]); {display |mem[n]| in all forms}
244832: print_int(info(n));
244843: print_int(link(n));
244854: print_word(eqtb[n]);
244865: print_word(font_info[n]);
244876: print_word(save_stack[n]);
244887: show_box(n);
24489  {show a box, abbreviated by |show_box_depth| and |show_box_breadth|}
244908: begin breadth_max:=10000; depth_threshold:=pool_size-pool_ptr-10;
24491  show_node_list(n); {show a box in its entirety}
24492  end;
244939: show_token_list(n,null,1000);
2449410: slow_print(n);
2449511: check_mem(n>0); {check wellformedness; print new busy locations if |n>0|}
2449612: search_mem(n); {look for pointers to |n|}
2449713: begin read(term_in,l); print_cmd_chr(n,l);
24498  end;
2449914: for k:=0 to n do print(buffer[k]);
2450015: begin font_in_short_display:=null_font; short_display(n);
24501  end;
2450216: panicking:=not panicking;
24503
24504@* \[53] Extensions.
24505The program above includes a bunch of ``hooks'' that allow further
24506capabilities to be added without upsetting \TeX's basic structure.
24507Most of these hooks are concerned with ``whatsit'' nodes, which are
24508intended to be used for special purposes; whenever a new extension to
24509\TeX\ involves a new kind of whatsit node, a corresponding change needs
24510to be made to the routines below that deal with such nodes,
24511but it will usually be unnecessary to make many changes to the
24512other parts of this program.
24513
24514In order to demonstrate how extensions can be made, we shall treat
24515`\.{\\write}', `\.{\\openout}', `\.{\\closeout}', `\.{\\immediate}',
24516`\.{\\special}', and `\.{\\setlanguage}' as if they were extensions.
24517These commands are actually primitives of \TeX, and they should
24518appear in all implementations of the system; but let's try to imagine
24519that they aren't. Then the program below illustrates how a person
24520could add them.
24521
24522Sometimes, of course, an extension will require changes to \TeX\ itself;
24523no system of hooks could be complete enough for all conceivable extensions.
24524The features associated with `\.{\\write}' are almost all confined to the
24525following paragraphs, but there are small parts of the |print_ln| and
24526|print_char| procedures that were introduced specifically to \.{\\write}
24527characters. Furthermore one of the token lists recognized by the scanner
24528is a |write_text|; and there are a few other miscellaneous places where we
24529have already provided for some aspect of \.{\\write}.  The goal of a \TeX\
24530extender should be to minimize alterations to the standard parts of the
24531program, and to avoid them completely if possible. He or she should also
24532be quite sure that there's no easy way to accomplish the desired goals
24533with the standard features that \TeX\ already has. ``Think thrice before
24534extending,'' because that may save a lot of work, and it will also keep
24535incompatible extensions of \TeX\ from proliferating.
24536@^system dependencies@>
24537@^extensions to \TeX@>
24538
24539@ First let's consider the format of whatsit nodes that are used to represent
24540the data associated with \.{\\write} and its relatives. Recall that a whatsit
24541has |type=whatsit_node|, and the |subtype| is supposed to distinguish
24542different kinds of whatsits. Each node occupies two or more words; the
24543exact number is immaterial, as long as it is readily determined from the
24544|subtype| or other data.
24545
24546We shall introduce five |subtype| values here, corresponding to the
24547control sequences \.{\\openout}, \.{\\write}, \.{\\closeout}, \.{\\special}, and
24548\.{\\setlanguage}. The second word of I/O whatsits has a |write_stream| field
24549that identifies the write-stream number (0 to 15, or 16 for out-of-range and
24550positive, or 17 for out-of-range and negative).
24551In the case of \.{\\write} and \.{\\special}, there is also a field that
24552points to the reference count of a token list that should be sent. In the
24553case of \.{\\openout}, we need three words and three auxiliary subfields
24554to hold the string numbers for name, area, and extension.
24555
24556@d write_node_size=2 {number of words in a write/whatsit node}
24557@d open_node_size=3 {number of words in an open/whatsit node}
24558@d open_node=0 {|subtype| in whatsits that represent files to \.{\\openout}}
24559@d write_node=1 {|subtype| in whatsits that represent things to \.{\\write}}
24560@d close_node=2 {|subtype| in whatsits that represent streams to \.{\\closeout}}
24561@d special_node=3 {|subtype| in whatsits that represent \.{\\special} things}
24562@d language_node=4 {|subtype| in whatsits that change the current language}
24563@d what_lang(#)==link(#+1) {language number, in the range |0..255|}
24564@d what_lhm(#)==type(#+1) {minimum left fragment, in the range |1..63|}
24565@d what_rhm(#)==subtype(#+1) {minimum right fragment, in the range |1..63|}
24566@d write_tokens(#) == link(#+1) {reference count of token list to write}
24567@d write_stream(#) == info(#+1) {stream number (0 to 17)}
24568@d open_name(#) == link(#+1) {string number of file name to open}
24569@d open_area(#) == info(#+2) {string number of file area for |open_name|}
24570@d open_ext(#) == link(#+2) {string number of file extension for |open_name|}
24571
24572@ The sixteen possible \.{\\write} streams are represented by the |write_file|
24573array. The |j|th file is open if and only if |write_open[j]=true|. The last
24574two streams are special; |write_open[16]| represents a stream number
24575greater than 15, while |write_open[17]| represents a negative stream number,
24576and both of these variables are always |false|.
24577
24578@<Glob...@>=
24579@!write_file:array[0..15] of alpha_file;
24580@!write_open:array[0..17] of boolean;
24581
24582@ @<Set init...@>=
24583for k:=0 to 17 do write_open[k]:=false;
24584
24585@ Extensions might introduce new command codes; but it's best to use
24586|extension| with a modifier, whenever possible, so that |main_control|
24587stays the same.
24588
24589@d immediate_code=4 {command modifier for \.{\\immediate}}
24590@d set_language_code=5 {command modifier for \.{\\setlanguage}}
24591
24592@<Put each...@>=
24593primitive("openout",extension,open_node);@/
24594@!@:open_out_}{\.{\\openout} primitive@>
24595primitive("write",extension,write_node); write_loc:=cur_val;@/
24596@!@:write_}{\.{\\write} primitive@>
24597primitive("closeout",extension,close_node);@/
24598@!@:close_out_}{\.{\\closeout} primitive@>
24599primitive("special",extension,special_node);@/
24600@!@:special_}{\.{\\special} primitive@>
24601primitive("immediate",extension,immediate_code);@/
24602@!@:immediate_}{\.{\\immediate} primitive@>
24603primitive("setlanguage",extension,set_language_code);@/
24604@!@:set_language_}{\.{\\setlanguage} primitive@>
24605
24606@ The variable |write_loc| just introduced is used to provide an
24607appropriate error message in case of ``runaway'' write texts.
24608
24609@<Glob...@>=
24610@!write_loc:pointer; {|eqtb| address of \.{\\write}}
24611
24612@ @<Cases of |print_cmd_chr|...@>=
24613extension: case chr_code of
24614  open_node:print_esc("openout");
24615  write_node:print_esc("write");
24616  close_node:print_esc("closeout");
24617  special_node:print_esc("special");
24618  immediate_code:print_esc("immediate");
24619  set_language_code:print_esc("setlanguage");
24620  othercases print("[unknown extension!]")
24621  endcases;
24622
24623@ When an |extension| command occurs in |main_control|, in any mode,
24624the |do_extension| routine is called.
24625
24626@<Cases of |main_control| that are for extensions...@>=
24627any_mode(extension):do_extension;
24628
24629@ @<Declare act...@>=
24630@t\4@>@<Declare procedures needed in |do_extension|@>@;
24631procedure do_extension;
24632var i,@!j,@!k:integer; {all-purpose integers}
24633@!p,@!q,@!r:pointer; {all-purpose pointers}
24634begin case cur_chr of
24635open_node:@<Implement \.{\\openout}@>;
24636write_node:@<Implement \.{\\write}@>;
24637close_node:@<Implement \.{\\closeout}@>;
24638special_node:@<Implement \.{\\special}@>;
24639immediate_code:@<Implement \.{\\immediate}@>;
24640set_language_code:@<Implement \.{\\setlanguage}@>;
24641othercases confusion("ext1")
24642@:this can't happen ext1}{\quad ext1@>
24643endcases;
24644end;
24645
24646@ Here is a subroutine that creates a whatsit node having a given |subtype|
24647and a given number of words. It initializes only the first word of the whatsit,
24648and appends it to the current list.
24649
24650@<Declare procedures needed in |do_extension|@>=
24651procedure new_whatsit(@!s:small_number;@!w:small_number);
24652var p:pointer; {the new node}
24653begin p:=get_node(w); type(p):=whatsit_node; subtype(p):=s;
24654link(tail):=p; tail:=p;
24655end;
24656
24657@ The next subroutine uses |cur_chr| to decide what sort of whatsit is
24658involved, and also inserts a |write_stream| number.
24659
24660@<Declare procedures needed in |do_ext...@>=
24661procedure new_write_whatsit(@!w:small_number);
24662begin new_whatsit(cur_chr,w);
24663if w<>write_node_size then scan_four_bit_int
24664else  begin scan_int;
24665  if cur_val<0 then cur_val:=17
24666  else if cur_val>15 then cur_val:=16;
24667  end;
24668write_stream(tail):=cur_val;
24669end;
24670
24671@ @<Implement \.{\\openout}@>=
24672begin new_write_whatsit(open_node_size);
24673scan_optional_equals; scan_file_name;@/
24674open_name(tail):=cur_name; open_area(tail):=cur_area; open_ext(tail):=cur_ext;
24675end
24676
24677@ When `\.{\\write 12\{...\}}' appears, we scan the token list `\.{\{...\}}'
24678without expanding its macros; the macros will be expanded later when this
24679token list is rescanned.
24680
24681@<Implement \.{\\write}@>=
24682begin k:=cur_cs; new_write_whatsit(write_node_size);@/
24683cur_cs:=k; p:=scan_toks(false,false); write_tokens(tail):=def_ref;
24684end
24685
24686@ @<Implement \.{\\closeout}@>=
24687begin new_write_whatsit(write_node_size); write_tokens(tail):=null;
24688end
24689
24690@ When `\.{\\special\{...\}}' appears, we expand the macros in the token
24691list as in \.{\\xdef} and \.{\\mark}.
24692
24693@<Implement \.{\\special}@>=
24694begin new_whatsit(special_node,write_node_size); write_stream(tail):=null;
24695p:=scan_toks(false,true); write_tokens(tail):=def_ref;
24696end
24697
24698@ Each new type of node that appears in our data structure must be capable
24699of being displayed, copied, destroyed, and so on. The routines that we
24700need for write-oriented whatsits are somewhat like those for mark nodes;
24701other extensions might, of course, involve more subtlety here.
24702
24703@<Basic printing...@>=
24704procedure print_write_whatsit(@!s:str_number;@!p:pointer);
24705begin print_esc(s);
24706if write_stream(p)<16 then print_int(write_stream(p))
24707else if write_stream(p)=16 then print_char("*")
24708@.*\relax@>
24709else print_char("-");
24710end;
24711
24712@ @<Display the whatsit...@>=
24713case subtype(p) of
24714open_node:begin print_write_whatsit("openout",p);
24715  print_char("="); print_file_name(open_name(p),open_area(p),open_ext(p));
24716  end;
24717write_node:begin print_write_whatsit("write",p);
24718  print_mark(write_tokens(p));
24719  end;
24720close_node:print_write_whatsit("closeout",p);
24721special_node:begin print_esc("special");
24722  print_mark(write_tokens(p));
24723  end;
24724language_node:begin print_esc("setlanguage");
24725  print_int(what_lang(p)); print(" (hyphenmin ");
24726  print_int(what_lhm(p)); print_char(",");
24727  print_int(what_rhm(p)); print_char(")");
24728  end;
24729othercases print("whatsit?")
24730endcases
24731
24732@ @<Make a partial copy of the whatsit...@>=
24733case subtype(p) of
24734open_node: begin r:=get_node(open_node_size); words:=open_node_size;
24735  end;
24736write_node,special_node: begin r:=get_node(write_node_size);
24737  add_token_ref(write_tokens(p)); words:=write_node_size;
24738  end;
24739close_node,language_node: begin r:=get_node(small_node_size);
24740  words:=small_node_size;
24741  end;
24742othercases confusion("ext2")
24743@:this can't happen ext2}{\quad ext2@>
24744endcases
24745
24746@ @<Wipe out the whatsit...@>=
24747begin case subtype(p) of
24748open_node: free_node(p,open_node_size);
24749write_node,special_node: begin delete_token_ref(write_tokens(p));
24750  free_node(p,write_node_size); goto done;
24751  end;
24752close_node,language_node: free_node(p,small_node_size);
24753othercases confusion("ext3")
24754@:this can't happen ext3}{\quad ext3@>
24755endcases;@/
24756goto done;
24757end
24758
24759@ @<Incorporate a whatsit node into a vbox@>=do_nothing
24760
24761@ @<Incorporate a whatsit node into an hbox@>=do_nothing
24762
24763@ @<Let |d| be the width of the whatsit |p|@>=d:=0
24764
24765@ @d adv_past(#)==@+if subtype(#)=language_node then
24766    begin cur_lang:=what_lang(#); l_hyf:=what_lhm(#); r_hyf:=what_rhm(#);@+end
24767
24768@<Advance \(p)past a whatsit node in the \(l)|line_break| loop@>=@+
24769adv_past(cur_p)
24770
24771@ @<Advance \(p)past a whatsit node in the \(p)pre-hyphenation loop@>=@+
24772adv_past(s)
24773
24774@ @<Prepare to move whatsit |p| to the current page, then |goto contribute|@>=
24775goto contribute
24776
24777@ @<Process whatsit |p| in |vert_break| loop, |goto not_found|@>=
24778goto not_found
24779
24780@ @<Output the whatsit node |p| in a vlist@>=
24781out_what(p)
24782
24783@ @<Output the whatsit node |p| in an hlist@>=
24784out_what(p)
24785
24786@ After all this preliminary shuffling, we come finally to the routines
24787that actually send out the requested data. Let's do \.{\\special} first
24788(it's easier).
24789
24790@<Declare procedures needed in |hlist_out|, |vlist_out|@>=
24791procedure special_out(@!p:pointer);
24792var old_setting:0..max_selector; {holds print |selector|}
24793@!k:pool_pointer; {index into |str_pool|}
24794begin synch_h; synch_v;@/
24795old_setting:=selector; selector:=new_string;
24796show_token_list(link(write_tokens(p)),null,pool_size-pool_ptr);
24797selector:=old_setting;
24798str_room(1);
24799if cur_length<256 then
24800  begin dvi_out(xxx1); dvi_out(cur_length);
24801  end
24802else  begin dvi_out(xxx4); dvi_four(cur_length);
24803  end;
24804for k:=str_start[str_ptr] to pool_ptr-1 do dvi_out(so(str_pool[k]));
24805pool_ptr:=str_start[str_ptr]; {erase the string}
24806end;
24807
24808@ To write a token list, we must run it through \TeX's scanner, expanding
24809macros and \.{\\the} and \.{\\number}, etc. This might cause runaways,
24810if a delimited macro parameter isn't matched, and runaways would be
24811extremely confusing since we are calling on \TeX's scanner in the middle
24812of a \.{\\shipout} command. Therefore we will put a dummy control sequence as
24813a ``stopper,'' right after the token list. This control sequence is
24814artificially defined to be \.{\\outer}.
24815@:end_write_}{\.{\\endwrite}@>
24816
24817@<Initialize table...@>=
24818text(end_write):="endwrite"; eq_level(end_write):=level_one;
24819eq_type(end_write):=outer_call; equiv(end_write):=null;
24820
24821@ @<Declare procedures needed in |hlist_out|, |vlist_out|@>=
24822procedure write_out(@!p:pointer);
24823var old_setting:0..max_selector; {holds print |selector|}
24824@!old_mode:integer; {saved |mode|}
24825@!j:small_number; {write stream number}
24826@!q,@!r:pointer; {temporary variables for list manipulation}
24827begin @<Expand macros in the token list
24828  and make |link(def_ref)| point to the result@>;
24829old_setting:=selector; j:=write_stream(p);
24830if write_open[j] then selector:=j
24831else  begin {write to the terminal if file isn't open}
24832  if (j=17)and(selector=term_and_log) then selector:=log_only;
24833  print_nl("");
24834  end;
24835token_show(def_ref); print_ln;
24836flush_list(def_ref); selector:=old_setting;
24837end;
24838
24839@ The final line of this routine is slightly subtle; at least, the author
24840didn't think about it until getting burnt! There is a used-up token list
24841@^Knuth, Donald Ervin@>
24842on the stack, namely the one that contained |end_write_token|. (We
24843insert this artificial `\.{\\endwrite}' to prevent runaways, as explained
24844above.) If it were not removed, and if there were numerous writes on a
24845single page, the stack would overflow.
24846
24847@d end_write_token==cs_token_flag+end_write
24848
24849@<Expand macros in the token list and...@>=
24850q:=get_avail; info(q):=right_brace_token+"}";@/
24851r:=get_avail; link(q):=r; info(r):=end_write_token; ins_list(q);@/
24852begin_token_list(write_tokens(p),write_text);@/
24853q:=get_avail; info(q):=left_brace_token+"{"; ins_list(q);
24854{now we're ready to scan
24855  `\.\{$\langle\,$token list$\,\rangle$\.{\} \\endwrite}'}
24856old_mode:=mode; mode:=0;
24857  {disable \.{\\prevdepth}, \.{\\spacefactor}, \.{\\lastskip}, \.{\\prevgraf}}
24858cur_cs:=write_loc; q:=scan_toks(false,true); {expand macros, etc.}
24859get_token;@+if cur_tok<>end_write_token then
24860  @<Recover from an unbalanced write command@>;
24861mode:=old_mode;
24862end_token_list {conserve stack space}
24863
24864@ @<Recover from an unbalanced write command@>=
24865begin print_err("Unbalanced write command");
24866@.Unbalanced write...@>
24867help2("On this page there's a \write with fewer real {'s than }'s.")@/
24868("I can't handle that very well; good luck."); error;
24869repeat get_token;
24870until cur_tok=end_write_token;
24871end
24872
24873@ The |out_what| procedure takes care of outputting whatsit nodes for
24874|vlist_out| and |hlist_out|\kern-.3pt.
24875
24876@<Declare procedures needed in |hlist_out|, |vlist_out|@>=
24877procedure out_what(@!p:pointer);
24878var j:small_number; {write stream number}
24879begin case subtype(p) of
24880open_node,write_node,close_node:@<Do some work that has been queued up
24881  for \.{\\write}@>;
24882special_node:special_out(p);
24883language_node:do_nothing;
24884othercases confusion("ext4")
24885@:this can't happen ext4}{\quad ext4@>
24886endcases;
24887end;
24888
24889@ We don't implement \.{\\write} inside of leaders. (The reason is that
24890the number of times a leader box appears might be different in different
24891implementations, due to machine-dependent rounding in the glue calculations.)
24892@^leaders@>
24893
24894@<Do some work that has been queued up...@>=
24895if not doing_leaders then
24896  begin j:=write_stream(p);
24897  if subtype(p)=write_node then write_out(p)
24898  else  begin if write_open[j] then a_close(write_file[j]);
24899    if subtype(p)=close_node then write_open[j]:=false
24900    else if j<16 then
24901      begin cur_name:=open_name(p); cur_area:=open_area(p);
24902      cur_ext:=open_ext(p);
24903      if cur_ext="" then cur_ext:=".tex";
24904      pack_cur_name;
24905      while not a_open_out(write_file[j]) do
24906        prompt_file_name("output file name",".tex");
24907      write_open[j]:=true;
24908      end;
24909    end;
24910  end
24911
24912@ The presence of `\.{\\immediate}' causes the |do_extension| procedure
24913to descend to one level of recursion. Nothing happens unless \.{\\immediate}
24914is followed by `\.{\\openout}', `\.{\\write}', or `\.{\\closeout}'.
24915@^recursion@>
24916
24917@<Implement \.{\\immediate}@>=
24918begin get_x_token;
24919if (cur_cmd=extension)and(cur_chr<=close_node) then
24920  begin p:=tail; do_extension; {append a whatsit node}
24921  out_what(tail); {do the action immediately}
24922  flush_node_list(tail); tail:=p; link(p):=null;
24923  end
24924else back_input;
24925end
24926
24927@ The \.{\\language} extension is somewhat different.
24928We need a subroutine that comes into play when a character of
24929a non-|clang| language is being appended to the current paragraph.
24930
24931@<Declare action...@>=
24932procedure fix_language;
24933var @!l:ASCII_code; {the new current language}
24934begin if language<=0 then l:=0
24935else if language>255 then l:=0
24936else l:=language;
24937if l<>clang then
24938  begin new_whatsit(language_node,small_node_size);
24939  what_lang(tail):=l; clang:=l;@/
24940  what_lhm(tail):=norm_min(left_hyphen_min);
24941  what_rhm(tail):=norm_min(right_hyphen_min);
24942  end;
24943end;
24944
24945@ @<Implement \.{\\setlanguage}@>=
24946if abs(mode)<>hmode then report_illegal_case
24947else begin new_whatsit(language_node,small_node_size);
24948  scan_int;
24949  if cur_val<=0 then clang:=0
24950  else if cur_val>255 then clang:=0
24951  else clang:=cur_val;
24952  what_lang(tail):=clang;
24953  what_lhm(tail):=norm_min(left_hyphen_min);
24954  what_rhm(tail):=norm_min(right_hyphen_min);
24955  end
24956
24957@ @<Finish the extensions@>=
24958for k:=0 to 15 do if write_open[k] then a_close(write_file[k])
24959
24960@* \[54] System-dependent changes.
24961This section should be replaced, if necessary, by any special
24962modifications of the program
24963that are necessary to make \TeX\ work at a particular installation.
24964It is usually best to design your change file so that all changes to
24965previous sections preserve the section numbering; then everybody's version
24966will be consistent with the published program. More extensive changes,
24967which introduce new sections, can be inserted here; then only the index
24968itself will get a new section number.
24969@^system dependencies@>
24970
24971@* \[55] Index.
24972Here is where you can find all uses of each identifier in the program,
24973with underlined entries pointing to where the identifier was defined.
24974If the identifier is only one letter long, however, you get to see only
24975the underlined entries. {\sl All references are to section numbers instead of
24976page numbers.}
24977
24978This index also lists error messages and other aspects of the program
24979that you might want to look up some day. For example, the entry
24980for ``system dependencies'' lists all sections that should receive
24981special attention from people who are installing \TeX\ in a new
24982operating environment. A list of various things that can't happen appears
24983under ``this can't happen''. Approximately 40 sections are listed under
24984``inner loop''; these account for about 60\pct! of \TeX's running time,
24985exclusive of input and output.
24986