1% Copyright (c) 1994-2008 by SIL International
2% Copyright (c) 2009-2012 by Jonathan Kew
3% Copyright (c) 2010-2012 by Han Han The Thanh
4% Copyright (c) 2012-2013 by Khaled Hosny
5
6% SIL Author(s): Jonathan Kew
7
8% Part of the XeTeX typesetting system
9
10% Permission is hereby granted, free of charge, to any person obtaining a copy
11% of this software and associated documentation files (the "Software"), to deal
12% in the Software without restriction, including without limitation the rights
13% to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
14% copies of the Software, and to permit persons to whom the Software is
15% furnished to do so, subject to the following conditions:
16
17% The above copyright notice and this permission notice shall be included in all
18% copies or substantial portions of the Software.
19
20% THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
21% IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
22% FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
23% COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
24% IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
25% CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
26
27% Except as contained in this notice, the name of the copyright holders shall
28% not be used in advertising or otherwise to promote the sale, use or other
29% dealings in this Software without prior written authorization from the
30% copyright holders.
31
32% e-TeX is copyright (C) 1999-2012 by P. Breitenlohner (1994,98 by the NTS
33% team); all rights are reserved. Copying of this file is authorized only if
34% (1) you are P. Breitenlohner, or if (2) you make absolutely no changes to
35% your copy. (Programs such as TIE allow the application of several change
36% files to tex.web; the master files tex.web and etex.ch should stay intact.)
37
38% See etex_gen.tex for hints on how to install this program.
39% And see etripman.tex for details about how to validate it.
40
41% e-TeX and NTS are trademarks of the NTS group.
42% TeX is a trademark of the American Mathematical Society.
43% METAFONT is a trademark of Addison-Wesley Publishing Company.
44
45% This program is directly derived from Donald E. Knuth's TeX;
46% the change history which follows and the reward offered for finders of
47% bugs refer specifically to TeX; they should not be taken as referring
48% to e-TeX, although the change history is relevant in that it
49% demonstrates the evolutionary path followed.  This program is not TeX;
50% that name is reserved strictly for the program which is the creation
51% and sole responsibility of Professor Knuth.
52
53% Version 0 was released in September 1982 after it passed a variety of tests.
54% Version 1 was released in November 1983 after thorough testing.
55% Version 1.1 fixed ``disappearing font identifiers'' et alia (July 1984).
56% Version 1.2 allowed `0' in response to an error, et alia (October 1984).
57% Version 1.3 made memory allocation more flexible and local (November 1984).
58% Version 1.4 fixed accents right after line breaks, et alia (April 1985).
59% Version 1.5 fixed \the\toks after other expansion in \edefs (August 1985).
60% Version 2.0 (almost identical to 1.5) corresponds to "Volume B" (April 1986).
61% Version 2.1 corrected anomalies in discretionary breaks (January 1987).
62% Version 2.2 corrected "(Please type...)" with null \endlinechar (April 1987).
63% Version 2.3 avoided incomplete page in premature termination (August 1987).
64% Version 2.4 fixed \noaligned rules in indented displays (August 1987).
65% Version 2.5 saved cur_order when expanding tokens (September 1987).
66% Version 2.6 added 10sp slop when shipping leaders (November 1987).
67% Version 2.7 improved rounding of negative-width characters (November 1987).
68% Version 2.8 fixed weird bug if no \patterns are used (December 1987).
69% Version 2.9 made \csname\endcsname's "relax" local (December 1987).
70% Version 2.91 fixed \outer\def\a0{}\a\a bug (April 1988).
71% Version 2.92 fixed \patterns, also file names with complex macros (May 1988).
72% Version 2.93 fixed negative halving in allocator when mem_min<0 (June 1988).
73% Version 2.94 kept open_log_file from calling fatal_error (November 1988).
74% Version 2.95 solved that problem a better way (December 1988).
75% Version 2.96 corrected bug in "Infinite shrinkage" recovery (January 1989).
76% Version 2.97 corrected blunder in creating 2.95 (February 1989).
77% Version 2.98 omitted save_for_after at outer level (March 1989).
78% Version 2.99 caught $$\begingroup\halign..$$ (June 1989).
79% Version 2.991 caught .5\ifdim.6... (June 1989).
80% Version 2.992 introduced major changes for 8-bit extensions (September 1989).
81% Version 2.993 fixed a save_stack synchronization bug et alia (December 1989).
82% Version 3.0 fixed unusual displays; was more \output robust (March 1990).
83% Version 3.1 fixed nullfont, disabled \write{\the\prevgraf} (September 1990).
84% Version 3.14 fixed unprintable font names and corrected typos (March 1991).
85% Version 3.141 more of same; reconstituted ligatures better (March 1992).
86% Version 3.1415 preserved nonexplicit kerns, tidied up (February 1993).
87% Version 3.14159 allowed fontmemsize to change; bulletproofing (March 1995).
88% Version 3.141592 fixed \xleaders, glueset, weird alignments (December 2002).
89% Version 3.1415926 was a general cleanup with minor fixes (February 2008).
90% Version 3.14159265 was similar (January 2014).
91
92% A reward of $327.68 will be paid to the first finder of any remaining bug.
93
94% A preliminary version of TeX--XeT was released in April 1992.
95% TeX--XeT version 1.0 was released in June 1992,
96%   version 1.1 prevented arith overflow in glue computation (Oct 1992).
97% A preliminary e-TeX version 0.95 was operational in March 1994.
98% Version 1.0beta was released in May 1995.
99% Version 1.01beta fixed bugs in just_copy and every_eof (December 1995).
100% Version 1.02beta allowed 256 mark classes (March 1996).
101% Version 1.1 changed \group{type,level} -> \currentgroup{type,level},
102%             first public release (October 1996).
103% Version 2.0 development was started in March 1997;
104%             fixed a ligature-\beginR bug in January 1998;
105%             was released in March 1998.
106% Version 2.1 fixed a \marks bug (when min_halfword<>0) (January 1999).
107% Version 2.2 development was started in Feb 2003; released in Oct 2004.
108%             fixed a bug in sparse array handling (0=>null), Jun 2002;
109%             fixed a bug in \lastnodetype (cur_val=>cur_val_level)
110%                 reported by Hartmut Henkel <hartmut_henkel@@gmx.de>,
111%                 fix by Fabrice Popineau <Fabrice.Popineau@@supelec.fr>,
112%                 Jan 2004;
113%             another bug in sparse array handling (cur_ptr=>cur_chr)
114%                 reported by Taco Hoekwater <taco@@elvenkind.com>, Jul 2004;
115%             fixed a sparse array reference count bug (\let,\futurelet),
116%                 fix by Bernd Raichle <berd@@dante.de>, Aug 2004;
117%             reorganized handling of banner, additional token list and
118%                 integer parameters, and similar in order to reduce the
119%                 interference between eTeX, pdfTeX, and web2c change files.
120%             adapted to tex.web 3.141592, revised glue rounding for mixed
121%                 direction typesetting;
122%             fixed a bug in the revised glue rounding code, detected by
123%                 Tigran Aivazian <tigran@@aivazian.fsnet.co.uk>, Oct 2004.
124% Version 2.3 development was started in Feb 2008; released in Apr 2011.
125%             fixed a bug in hyph_code handling (\savinghyphcodes)
126%                 reported by Vladimir Volovich <vvv@@vsu.ru>, Feb 2008.
127%             fixed the error messages for improper use of \protected,
128%                 reported by Heiko Oberdiek
129%                 <heiko.oberdiek@@googlemail.com>, May 2010.
130%             some rearrangements to reduce interferences between
131%                 e-TeX and pTeX, in part suggested by Hironori Kitagawa
132%                 <h_kitagawa2001@@yahoo.co.jp>, Mar 2011.
133% Version 2.4 fixed an uninitialized line number bug, released in May 2012.
134% Version 2.5 development was started in Aug 2012; released in Feb 2013.
135%             better tracing of font definitions, reported by
136%                 Bruno Le Floch <blflatex@@gmail.com>, Jul 2012.
137% Version 2.6 development was started in Mar 2013; released in ??? 201?.
138%             enable hyphenation of text between \beginL and \endL or
139%                 between \beginR and \endR, problem reported by
140%                 Vafa Khalighi <vafalgk@@gmail.com>, Nov 2013.
141%             better handling of right-to-left text -- to be done.
142
143% Although considerable effort has been expended to make the e-TeX program
144% correct and reliable, no warranty is implied; the author disclaims any
145% obligation or liability for damages, including but not limited to
146% special, indirect, or consequential damages arising out of or in
147% connection with the use or performance of this software. This work has
148% been a ``labor of love'' and the author hopes that users enjoy it.
149
150% Here is TeX material that gets inserted after \input webmac
151\input xewebmac
152\def\hang{\hangindent 3em\noindent\ignorespaces}
153\def\hangg#1 {\hang\hbox{#1 }}
154\def\textindent#1{\hangindent2.5em\noindent\hbox to2.5em{\hss#1 }\ignorespaces}
155\font\ninerm=cmr9
156\let\mc=\ninerm % medium caps for names like SAIL
157
158\def\mirror#1{%
159  \phantom{#1}%
160  \rlap{\special{pdf:btrans xscale -1}#1\special{pdf:etrans}}%
161}
162\def\XeTeX{X\kern-.125em\lower.5ex\hbox{\mirror{E}}\kern-.1667em\TeX}
163
164\def\eTeX{$\varepsilon$-\TeX}
165\font\revrm=xbmc10 % for right-to-left text
166% to generate xbmc10 (i.e., reflected cmbx10) use a file
167% xbmc10.mf containing:
168%+++++++++++++++++++++++++++++++++++++++++++++++++
169%     if unknown cmbase: input cmbase fi
170%     extra_endchar := extra_endchar &
171%       "currentpicture:=currentpicture " &
172%       "reflectedabout((.5[l,r],0),(.5[l,r],1));";
173%     input cmbx10
174%+++++++++++++++++++++++++++++++++++++++++++++++++
175\def\TeXeT{\TeX-{\revrm\beginR\TeX\endR}} % for TeX-XeT
176\def\PASCAL{Pascal}
177\def\ph{\hbox{Pascal-H}}
178\def\pct!{{\char`\%}} % percent sign in ordinary text
179\def\grp{\.{\char'173...\char'175}}
180\font\logo=logo10 % font used for the METAFONT logo
181\def\MF{{\logo META}\-{\logo FONT}}
182\def\<#1>{$\langle#1\rangle$}
183\def\section{\mathhexbox278}
184
185\def\(#1){} % this is used to make section names sort themselves better
186\def\9#1{} % this is used for sort keys in the index via @@:sort key}{entry@@>
187
188\outer\def\N#1. \[#2]#3.{\MN#1.\vfil\eject % begin starred section
189  \def\rhead{PART #2:\uppercase{#3}} % define running headline
190  \message{*\modno} % progress report
191  \edef\next{\write\cont{\Z{\?#2]#3}{\modno}{\the\pageno}}}\next
192  \ifon\startsection{\bf\ignorespaces#3.\quad}\ignorespaces}
193\let\?=\relax % we want to be able to \write a \?
194
195\def\title{\XeTeX}
196% system dependent redefinitions of \title should come later
197% and should use:
198%    \toks0=\expandafter{\title}
199%    \edef\title{...\the\toks0...}
200\def\topofcontents{\hsize 5.5in
201  \vglue 0pt plus 1fil minus 1.5in
202  \def\?##1]{\hbox to 1in{\hfil##1.\ }}
203  }
204\def\botofcontents{\vskip 0pt plus 1fil minus 1.5in}
205\pageno=3
206\def\glob{13} % this should be the section number of "<Global...>"
207\def\gglob{20, 26} % this should be the next two sections of "<Global...>"
208
209@* \[1] Introduction.
210This is \XeTeX, a program derived from and extending the capabilities of
211\TeX, a document compiler intended to produce typesetting of high
212quality.
213The \PASCAL\ program that follows is the definition of \TeX82, a standard
214@:PASCAL}{\PASCAL@>
215@!@:TeX82}{\TeX82@>
216version of \TeX\ that is designed to be highly portable so that identical output
217will be obtainable on a great variety of computers.
218
219The main purpose of the following program is to explain the algorithms of \TeX\
220as clearly as possible. As a result, the program will not necessarily be very
221efficient when a particular \PASCAL\ compiler has translated it into a
222particular machine language. However, the program has been written so that it
223can be tuned to run efficiently in a wide variety of operating environments
224by making comparatively few changes. Such flexibility is possible because
225the documentation that follows is written in the \.{WEB} language, which is
226at a higher level than \PASCAL; the preprocessing step that converts \.{WEB}
227to \PASCAL\ is able to introduce most of the necessary refinements.
228Semi-automatic translation to other languages is also feasible, because the
229program below does not make extensive use of features that are peculiar to
230\PASCAL.
231
232A large piece of software like \TeX\ has inherent complexity that cannot
233be reduced below a certain level of difficulty, although each individual
234part is fairly simple by itself. The \.{WEB} language is intended to make
235the algorithms as readable as possible, by reflecting the way the
236individual program pieces fit together and by providing the
237cross-references that connect different parts. Detailed comments about
238what is going on, and about why things were done in certain ways, have
239been liberally sprinkled throughout the program.  These comments explain
240features of the implementation, but they rarely attempt to explain the
241\TeX\ language itself, since the reader is supposed to be familiar with
242{\sl The \TeX book}.
243@.WEB@>
244@:TeXbook}{\sl The \TeX book@>
245
246@ The present implementation has a long ancestry, beginning in the summer
247of~1977, when Michael~F. Plass and Frank~M. Liang designed and coded
248a prototype
249@^Plass, Michael Frederick@>
250@^Liang, Franklin Mark@>
251@^Knuth, Donald Ervin@>
252based on some specifications that the author had made in May of that year.
253This original proto\TeX\ included macro definitions and elementary
254manipulations on boxes and glue, but it did not have line-breaking,
255page-breaking, mathematical formulas, alignment routines, error recovery,
256or the present semantic nest; furthermore,
257it used character lists instead of token lists, so that a control sequence
258like \.{\\halign} was represented by a list of seven characters. A
259complete version of \TeX\ was designed and coded by the author in late
2601977 and early 1978; that program, like its prototype, was written in the
261{\mc SAIL} language, for which an excellent debugging system was
262available. Preliminary plans to convert the {\mc SAIL} code into a form
263somewhat like the present ``web'' were developed by Luis Trabb~Pardo and
264@^Trabb Pardo, Luis Isidoro@>
265the author at the beginning of 1979, and a complete implementation was
266created by Ignacio~A. Zabala in 1979 and 1980. The \TeX82 program, which
267@^Zabala Salelles, Ignacio Andr\'es@>
268was written by the author during the latter part of 1981 and the early
269part of 1982, also incorporates ideas from the 1979 implementation of
270@^Guibas, Leonidas Ioannis@>
271@^Sedgewick, Robert@>
272@^Wyatt, Douglas Kirk@>
273\TeX\ in {\mc MESA} that was written by Leonidas Guibas, Robert Sedgewick,
274and Douglas Wyatt at the Xerox Palo Alto Research Center.  Several hundred
275refinements were introduced into \TeX82 based on the experiences gained with
276the original implementations, so that essentially every part of the system
277has been substantially improved. After the appearance of ``Version 0'' in
278September 1982, this program benefited greatly from the comments of
279many other people, notably David~R. Fuchs and Howard~W. Trickey.
280A final revision in September 1989 extended the input character set to
281eight-bit codes and introduced the ability to hyphenate words from
282different languages, based on some ideas of Michael~J. Ferguson.
283@^Fuchs, David Raymond@>
284@^Trickey, Howard Wellington@>
285@^Ferguson, Michael John@>
286
287No doubt there still is plenty of room for improvement, but the author
288is firmly committed to keeping \TeX82 ``frozen'' from now on; stability
289and reliability are to be its main virtues.
290
291On the other hand, the \.{WEB} description can be extended without changing
292the core of \TeX82 itself, and the program has been designed so that such
293extensions are not extremely difficult to make.
294The |banner| string defined here should be changed whenever \TeX\
295undergoes any modifications, so that it will be clear which version of
296\TeX\ might be the guilty party when a problem arises.
297@^extensions to \TeX@>
298@^system dependencies@>
299
300This program contains code for various features extending \TeX,
301therefore this program is called `\XeTeX' and not
302`\TeX'; the official name `\TeX' by itself is reserved
303for software systems that are fully compatible with each other.
304A special test suite called the ``\.{TRIP} test'' is available for
305helping to determine whether a particular implementation deserves to be
306known as `\TeX' [cf.~Stanford Computer Science report CS1027,
307November 1984].
308
309A similar test suite called the ``\.{e-TRIP} test'' is available for
310helping to determine whether a particular implementation deserves to be
311known as `\eTeX'.
312
313@d eTeX_version=2 { \.{\\eTeXversion} }
314@d eTeX_revision==".6" { \.{\\eTeXrevision} }
315@d eTeX_version_string=='-2.6' {current \eTeX\ version}
316@#
317@d XeTeX_version=0 { \.{\\XeTeXversion} }
318@d XeTeX_revision==".99992" { \.{\\XeTeXrevision} }
319@d XeTeX_version_string=='-0.99992' {current \XeTeX\ version}
320@#
321@d XeTeX_banner=='This is XeTeX, Version 3.14159265',eTeX_version_string,XeTeX_version_string
322  {printed when \XeTeX\ starts}
323@#
324@d banner=='This is TeX, Version 3.14159265' {printed when \TeX\ starts}
325@#
326@d TEX==XETEX {change program name into |XETEX|}
327@#
328@d TeXXeT_code=0 {the \TeXXeT\ feature is optional}
329@#
330@d XeTeX_dash_break_code=1 {non-zero to enable breaks after en- and em-dashes}
331@#
332@d XeTeX_upwards_code=2 {non-zero if the main vertical list is being built upwards}
333@d XeTeX_use_glyph_metrics_code=3 {non-zero to use exact glyph height/depth}
334@d XeTeX_inter_char_tokens_code=4 {non-zero to enable \.{\\XeTeXinterchartokens} insertion}
335@#
336@d XeTeX_input_normalization_code=5 {normalization mode:, 1 for NFC, 2 for NFD, else none}
337@#
338@d XeTeX_default_input_mode_code=6 {input mode for newly opened files}
339@d XeTeX_input_mode_auto=0
340@d XeTeX_input_mode_utf8=1
341@d XeTeX_input_mode_utf16be=2
342@d XeTeX_input_mode_utf16le=3
343@d XeTeX_input_mode_raw=4
344@d XeTeX_input_mode_icu_mapping=5
345@#
346@d XeTeX_default_input_encoding_code=7 {|str_number| of encoding name if |mode =| ICU}
347@#
348@d XeTeX_tracing_fonts_code=8 {non-zero to log native fonts used}
349@d eTeX_states=9 {number of \eTeX\ state variables in |eqtb|}
350
351@ Different \PASCAL s have slightly different conventions, and the present
352@!@:PASCAL H}{\ph@>
353program expresses \TeX\ in terms of the \PASCAL\ that was
354available to the author in 1982. Constructions that apply to
355this particular compiler, which we shall call \ph, should help the
356reader see how to make an appropriate interface for other systems
357if necessary. (\ph\ is Charles Hedrick's modification of a compiler
358@^Hedrick, Charles Locke@>
359for the DECsystem-10 that was originally developed at the University of
360Hamburg; cf.\ {\sl SOFTWARE---Practice \AM\ Experience \bf6} (1976),
36129--42. The \TeX\ program below is intended to be adaptable, without
362extensive changes, to most other versions of \PASCAL, so it does not fully
363use the admirable features of \ph. Indeed, a conscious effort has been
364made here to avoid using several idiosyncratic features of standard
365\PASCAL\ itself, so that most of the code can be translated mechanically
366into other high-level languages. For example, the `\&{with}' and `\\{new}'
367features are not used, nor are pointer types, set types, or enumerated
368scalar types; there are no `\&{var}' parameters, except in the case of files
369--- \eTeX, however, does use `\&{var}' parameters for the |reverse| function;
370there are no tag fields on variant records; there are no assignments
371|real:=integer|; no procedures are declared local to other procedures.)
372
373The portions of this program that involve system-dependent code, where
374changes might be necessary because of differences between \PASCAL\ compilers
375and/or differences between
376operating systems, can be identified by looking at the sections whose
377numbers are listed under `system dependencies' in the index. Furthermore,
378the index entries for `dirty \PASCAL' list all places where the restrictions
379of \PASCAL\ have not been followed perfectly, for one reason or another.
380@!@^system dependencies@>
381@!@^dirty \PASCAL@>
382
383Incidentally, \PASCAL's standard |round| function can be problematical,
384because it disagrees with the IEEE floating-point standard.
385Many implementors have
386therefore chosen to substitute their own home-grown rounding procedure.
387
388@ The program begins with a normal \PASCAL\ program heading, whose
389components will be filled in later, using the conventions of \.{WEB}.
390@.WEB@>
391For example, the portion of the program called `\X\glob:Global
392variables\X' below will be replaced by a sequence of variable declarations
393that starts in $\section\glob$ of this documentation. In this way, we are able
394to define each individual global variable when we are prepared to
395understand what it means; we do not have to define all of the globals at
396once.  Cross references in $\section\glob$, where it says ``See also
397sections \gglob, \dots,'' also make it possible to look at the set of
398all global variables, if desired.  Similar remarks apply to the other
399portions of the program heading.
400
401Actually the heading shown here is not quite normal: The |program| line
402does not mention any |output| file, because \ph\ would ask the \TeX\ user
403to specify a file name if |output| were specified here.
404@:PASCAL H}{\ph@>
405@^system dependencies@>
406
407@d mtype==t@&y@&p@&e {this is a \.{WEB} coding trick:}
408@f mtype==type {`\&{mtype}' will be equivalent to `\&{type}'}
409@f type==true {but `|type|' will not be treated as a reserved word}
410
411@p @t\4@>@<Compiler directives@>@/
412program TEX; {all file names are defined dynamically}
413label @<Labels in the outer block@>@/
414const @<Constants in the outer block@>@/
415mtype @<Types in the outer block@>@/
416var @<Global variables@>@/
417@#
418procedure initialize; {this procedure gets things started properly}
419  var @<Local variables for initialization@>@/
420  begin @<Initialize whatever \TeX\ might access@>@;
421  end;@#
422@t\4@>@<Basic printing procedures@>@/
423@t\4@>@<Error handling procedures@>@/
424
425@ The overall \TeX\ program begins with the heading just shown, after which
426comes a bunch of procedure declarations and function declarations.
427Finally we will get to the main program, which begins with the
428comment `|start_here|'. If you want to skip down to the
429main program now, you can look up `|start_here|' in the index.
430But the author suggests that the best way to understand this program
431is to follow pretty much the order of \TeX's components as they appear in the
432\.{WEB} description you are now reading, since the present ordering is
433intended to combine the advantages of the ``bottom up'' and ``top down''
434approaches to the problem of understanding a somewhat complicated system.
435
436@ Three labels must be declared in the main program, so we give them
437symbolic names.
438
439@d start_of_TEX=1 {go here when \TeX's variables are initialized}
440@d end_of_TEX=9998 {go here to close files and terminate gracefully}
441@d final_end=9999 {this label marks the ending of the program}
442
443@<Labels in the out...@>=
444start_of_TEX@t\hskip-2pt@>, end_of_TEX@t\hskip-2pt@>,@,final_end;
445  {key control points}
446
447@ Some of the code below is intended to be used only when diagnosing the
448strange behavior that sometimes occurs when \TeX\ is being installed or
449when system wizards are fooling around with \TeX\ without quite knowing
450what they are doing. Such code will not normally be compiled; it is
451delimited by the codewords `$|debug|\ldots|gubed|$', with apologies
452to people who wish to preserve the purity of English.
453
454Similarly, there is some conditional code delimited by
455`$|stat|\ldots|tats|$' that is intended for use when statistics are to be
456kept about \TeX's memory usage.  The |stat| $\ldots$ |tats| code also
457implements diagnostic information for \.{\\tracingparagraphs} and
458\.{\\tracingpages}.
459@^debugging@>
460
461@d debug==@{ {change this to `$\\{debug}\equiv\null$' when debugging}
462@d gubed==@t@>@} {change this to `$\\{gubed}\equiv\null$' when debugging}
463@f debug==begin
464@f gubed==end
465@#
466@d stat==@{ {change this to `$\\{stat}\equiv\null$' when gathering
467  usage statistics}
468@d tats==@t@>@} {change this to `$\\{tats}\equiv\null$' when gathering
469  usage statistics}
470@f stat==begin
471@f tats==end
472
473@ This program has two important variations: (1) There is a long and slow
474version called \.{INITEX}, which does the extra calculations needed to
475@.INITEX@>
476initialize \TeX's internal tables; and (2)~there is a shorter and faster
477production version, which cuts the initialization to a bare minimum.
478Parts of the program that are needed in (1) but not in (2) are delimited by
479the codewords `$|init|\ldots|tini|$'.
480
481@d init== {change this to `$\\{init}\equiv\.{@@\{}$' in the production version}
482@d tini== {change this to `$\\{tini}\equiv\.{@@\}}$' in the production version}
483@f init==begin
484@f tini==end
485
486@<Initialize whatever...@>=
487@<Set initial values of key variables@>@/
488@!init @<Initialize table entries (done by \.{INITEX} only)@>@;@+tini
489
490@ If the first character of a \PASCAL\ comment is a dollar sign,
491\ph\ treats the comment as a list of ``compiler directives'' that will
492affect the translation of this program into machine language.  The
493directives shown below specify full checking and inclusion of the \PASCAL\
494debugger when \TeX\ is being debugged, but they cause range checking and other
495redundant code to be eliminated when the production system is being generated.
496Arithmetic overflow will be detected in all cases.
497@:PASCAL H}{\ph@>
498@^system dependencies@>
499@^overflow in arithmetic@>
500
501@<Compiler directives@>=
502@{@&$C-,A+,D-@} {no range check, catch arithmetic overflow, no debug overhead}
503@!debug @{@&$C+,D+@}@+ gubed {but turn everything on when debugging}
504
505@ This \TeX\ implementation conforms to the rules of the {\sl Pascal User
506@:PASCAL}{\PASCAL@>
507@^system dependencies@>
508Manual} published by Jensen and Wirth in 1975, except where system-dependent
509@^Wirth, Niklaus@>
510@^Jensen, Kathleen@>
511code is necessary to make a useful system program, and except in another
512respect where such conformity would unnecessarily obscure the meaning
513and clutter up the code: We assume that |case| statements may include a
514default case that applies if no matching label is found. Thus, we shall use
515constructions like
516$$\vbox{\halign{\ignorespaces#\hfil\cr
517|case x of|\cr
5181: $\langle\,$code for $x=1\,\rangle$;\cr
5193: $\langle\,$code for $x=3\,\rangle$;\cr
520|othercases| $\langle\,$code for |x<>1| and |x<>3|$\,\rangle$\cr
521|endcases|\cr}}$$
522since most \PASCAL\ compilers have plugged this hole in the language by
523incorporating some sort of default mechanism. For example, the \ph\
524compiler allows `|others|:' as a default label, and other \PASCAL s allow
525syntaxes like `\&{else}' or `\&{otherwise}' or `\\{otherwise}:', etc. The
526definitions of |othercases| and |endcases| should be changed to agree with
527local conventions.  Note that no semicolon appears before |endcases| in
528this program, so the definition of |endcases| should include a semicolon
529if the compiler wants one. (Of course, if no default mechanism is
530available, the |case| statements of \TeX\ will have to be laboriously
531extended by listing all remaining cases. People who are stuck with such
532\PASCAL s have, in fact, done this, successfully but not happily!)
533@:PASCAL H}{\ph@>
534
535@d othercases == others: {default for cases not listed explicitly}
536@d endcases == @+end {follows the default case in an extended |case| statement}
537@f othercases == else
538@f endcases == end
539
540@ The following parameters can be changed at compile time to extend or
541reduce \TeX's capacity. They may have different values in \.{INITEX} and
542in production versions of \TeX.
543@.INITEX@>
544@^system dependencies@>
545
546@<Constants...@>=
547@!mem_max=30000; {greatest index in \TeX's internal |mem| array;
548  must be strictly less than |max_halfword|;
549  must be equal to |mem_top| in \.{INITEX}, otherwise |>=mem_top|}
550@!mem_min=0; {smallest index in \TeX's internal |mem| array;
551  must be |min_halfword| or more;
552  must be equal to |mem_bot| in \.{INITEX}, otherwise |<=mem_bot|}
553@!buf_size=500; {maximum number of characters simultaneously present in
554  current lines of open files and in control sequences between
555  \.{\\csname} and \.{\\endcsname}; must not exceed |max_halfword|}
556@!error_line=72; {width of context lines on terminal error messages}
557@!half_error_line=42; {width of first lines of contexts in terminal
558  error messages; should be between 30 and |error_line-15|}
559@!max_print_line=79; {width of longest text lines output; should be at least 60}
560@!stack_size=200; {maximum number of simultaneous input sources}
561@!max_in_open=6; {maximum number of input files and error insertions that
562  can be going on simultaneously}
563@!font_max=75; {maximum internal font number; must not exceed |max_quarterword|
564  and must be at most |font_base+256|}
565@!font_mem_size=20000; {number of words of |font_info| for all fonts}
566@!param_size=60; {maximum number of simultaneous macro parameters}
567@!nest_size=40; {maximum number of semantic levels simultaneously active}
568@!max_strings=3000; {maximum number of strings; must not exceed |max_halfword|}
569@!string_vacancies=8000; {the minimum number of characters that should be
570  available for the user's control sequences and font names,
571  after \TeX's own error messages are stored}
572@!pool_size=32000; {maximum number of characters in strings, including all
573  error messages and help texts, and the names of all fonts and
574  control sequences; must exceed |string_vacancies| by the total
575  length of \TeX's own strings, which is currently about 23000}
576@!save_size=600; {space for saving values outside of current group; must be
577  at most |max_halfword|}
578@!trie_size=8000; {space for hyphenation patterns; should be larger for
579  \.{INITEX} than it is in production versions of \TeX}
580@!trie_op_size=500; {space for ``opcodes'' in the hyphenation patterns}
581@!dvi_buf_size=800; {size of the output buffer; must be a multiple of 8}
582@!file_name_size=40; {file names shouldn't be longer than this}
583@!pool_name='TeXformats:TEX.POOL                     ';
584  {string of length |file_name_size|; tells where the string pool appears}
585@.TeXformats@>
586
587@ Like the preceding parameters, the following quantities can be changed
588at compile time to extend or reduce \TeX's capacity. But if they are changed,
589it is necessary to rerun the initialization program \.{INITEX}
590@.INITEX@>
591to generate new tables for the production \TeX\ program.
592One can't simply make helter-skelter changes to the following constants,
593since certain rather complex initialization
594numbers are computed from them. They are defined here using
595\.{WEB} macros, instead of being put into \PASCAL's |const| list, in order to
596emphasize this distinction.
597
598@d mem_bot=0 {smallest index in the |mem| array dumped by \.{INITEX};
599  must not be less than |mem_min|}
600@d mem_top==30000 {largest index in the |mem| array dumped by \.{INITEX};
601  must be substantially larger than |mem_bot|
602  and not greater than |mem_max|}
603@d font_base=0 {smallest internal font number; must not be less
604  than |min_quarterword|}
605@d hash_size=2100 {maximum number of control sequences; it should be at most
606  about |(mem_max-mem_min)/10|}
607@d hash_prime=1777 {a prime number equal to about 85\pct! of |hash_size|}
608@d hyph_size=307 {another prime; the number of \.{\\hyphenation} exceptions}
609@d biggest_char=65535 {the largest allowed character number;
610   must be |<=max_quarterword|, this refers to UTF16 codepoints that we store in strings, etc;
611 actual character codes can exceed this range, up to |biggest_usv|}
612@d too_big_char=65536 {|biggest_char+1|}
613@d special_char=65537 {|biggest_char+2|}
614@d number_chars=65536 {|biggest_char+1|}
615@d biggest_usv=@"10FFFF {the largest Unicode Scalar Value}
616@d too_big_usv=@"110000 {|biggest_usv+1|}
617@d number_usvs=@"110000 {|biggest_usv+1|}
618@d biggest_reg=255 {the largest allowed register number;
619   must be |<=max_quarterword|}
620@d number_regs=256 {|biggest_reg+1|}
621@d font_biggest=255 {the real biggest font}
622@d number_fonts=font_biggest-font_base+2
623@d number_math_families=256
624@d number_math_fonts=number_math_families+number_math_families+number_math_families
625@d math_font_biggest=number_math_fonts-1
626@d text_size=0 {size code for the largest size in a family}
627@d script_size=number_math_families {size code for the medium size in a family}
628@d script_script_size=number_math_families+number_math_families {size code for the smallest size in a family}
629@d biggest_lang=255 {the largest hyphenation language}
630@d too_big_lang=256 {|biggest_lang+1|}
631@^system dependencies@>
632
633@ In case somebody has inadvertently made bad settings of the ``constants,''
634\TeX\ checks them using a global variable called |bad|.
635
636This is the first of many sections of \TeX\ where global variables are
637defined.
638
639@<Glob...@>=
640@!bad:integer; {is some ``constant'' wrong?}
641
642@ Later on we will say `\ignorespaces|if mem_max>=max_halfword then bad:=14|',
643or something similar. (We can't do that until |max_halfword| has been defined.)
644
645@<Check the ``constant'' values for consistency@>=
646bad:=0;
647if (half_error_line<30)or(half_error_line>error_line-15) then bad:=1;
648if max_print_line<60 then bad:=2;
649if dvi_buf_size mod 8<>0 then bad:=3;
650if mem_bot+1100>mem_top then bad:=4;
651if hash_prime>hash_size then bad:=5;
652if max_in_open>=128 then bad:=6;
653if mem_top<256+11 then bad:=7; {we will want |null_list>255|}
654
655@ Labels are given symbolic names by the following definitions, so that
656occasional |goto| statements will be meaningful. We insert the label
657`|exit|' just before the `\ignorespaces|end|\unskip' of a procedure in
658which we have used the `|return|' statement defined below; the label
659`|restart|' is occasionally used at the very beginning of a procedure; and
660the label `|reswitch|' is occasionally used just prior to a |case|
661statement in which some cases change the conditions and we wish to branch
662to the newly applicable case.  Loops that are set up with the |loop|
663construction defined below are commonly exited by going to `|done|' or to
664`|found|' or to `|not_found|', and they are sometimes repeated by going to
665`|continue|'.  If two or more parts of a subroutine start differently but
666end up the same, the shared code may be gathered together at
667`|common_ending|'.
668
669Incidentally, this program never declares a label that isn't actually used,
670because some fussy \PASCAL\ compilers will complain about redundant labels.
671
672@d exit=10 {go here to leave a procedure}
673@d restart=20 {go here to start a procedure again}
674@d reswitch=21 {go here to start a case statement again}
675@d continue=22 {go here to resume a loop}
676@d done=30 {go here to exit a loop}
677@d done1=31 {like |done|, when there is more than one loop}
678@d done2=32 {for exiting the second loop in a long block}
679@d done3=33 {for exiting the third loop in a very long block}
680@d done4=34 {for exiting the fourth loop in an extremely long block}
681@d done5=35 {for exiting the fifth loop in an immense block}
682@d done6=36 {for exiting the sixth loop in a block}
683@d found=40 {go here when you've found it}
684@d found1=41 {like |found|, when there's more than one per routine}
685@d found2=42 {like |found|, when there's more than two per routine}
686@d not_found=45 {go here when you've found nothing}
687@d not_found1=46 {like |not_found|, when there's more than one}
688@d not_found2=47 {like |not_found|, when there's more than two}
689@d not_found3=48 {like |not_found|, when there's more than three}
690@d not_found4=49 {like |not_found|, when there's more than four}
691@d common_ending=50 {go here when you want to merge with another branch}
692
693@ Here are some macros for common programming idioms.
694
695@d incr(#) == #:=#+1 {increase a variable by unity}
696@d decr(#) == #:=#-1 {decrease a variable by unity}
697@d negate(#) == #:=-# {change the sign of a variable}
698@d loop == @+ while true do@+ {repeat over and over until a |goto| happens}
699@f loop == xclause
700  {\.{WEB}'s |xclause| acts like `\ignorespaces|while true do|\unskip'}
701@d do_nothing == {empty statement}
702@d return == goto exit {terminate a procedure call}
703@f return == nil
704@d empty=0 {symbolic name for a null constant}
705
706@* \[2] The character set.
707In order to make \TeX\ readily portable to a wide variety of
708computers, all of its input text is converted to an internal eight-bit
709code that includes standard ASCII, the ``American Standard Code for
710Information Interchange.''  This conversion is done immediately when each
711character is read in. Conversely, characters are converted from ASCII to
712the user's external representation just before they are output to a
713text file.
714
715Such an internal code is relevant to users of \TeX\ primarily because it
716governs the positions of characters in the fonts. For example, the
717character `\.A' has ASCII code $65=@'101$, and when \TeX\ typesets
718this letter it specifies character number 65 in the current font.
719If that font actually has `\.A' in a different position, \TeX\ doesn't
720know what the real position is; the program that does the actual printing from
721\TeX's device-independent files is responsible for converting from ASCII to
722a particular font encoding.
723@^ASCII code@>
724
725\TeX's internal code also defines the value of constants
726that begin with a reverse apostrophe; and it provides an index to the
727\.{\\catcode}, \.{\\mathcode}, \.{\\uccode}, \.{\\lccode}, and \.{\\delcode}
728tables.
729
730@ Characters of text that have been converted to \TeX's internal form
731are said to be of type |ASCII_code|, which is a subrange of the integers.
732For xetex, we rename |ASCII_code| as |UTF16_code|. But we also have a
733new type |UTF8_code|, used when we construct filenames to pass to the
734system libraries.
735
736@d ASCII_code==UTF16_code
737@d packed_ASCII_code==packed_UTF16_code
738
739@<Types...@>=
740@!ASCII_code=0..biggest_char; {16-bit numbers}
741@!UTF8_code=0..255; {8-bit numbers}
742@!UnicodeScalar=0..biggest_usv; {Unicode scalars}
743
744@ The original \PASCAL\ compiler was designed in the late 60s, when six-bit
745character sets were common, so it did not make provision for lowercase
746letters. Nowadays, of course, we need to deal with both capital and small
747letters in a convenient way, especially in a program for typesetting;
748so the present specification of \TeX\ has been written under the assumption
749that the \PASCAL\ compiler and run-time system permit the use of text files
750with more than 64 distinguishable characters. More precisely, we assume that
751the character set contains at least the letters and symbols associated
752with ASCII codes @'40 through @'176; all of these characters are now
753available on most computer terminals.
754
755Since we are dealing with more characters than were present in the first
756\PASCAL\ compilers, we have to decide what to call the associated data
757type. Some \PASCAL s use the original name |char| for the
758characters in text files, even though there now are more than 64 such
759characters, while other \PASCAL s consider |char| to be a 64-element
760subrange of a larger data type that has some other name.
761
762In order to accommodate this difference, we shall use the name |text_char|
763to stand for the data type of the characters that are converted to and
764from |ASCII_code| when they are input and output. We shall also assume
765that |text_char| consists of the elements |chr(first_text_char)| through
766|chr(last_text_char)|, inclusive. The following definitions should be
767adjusted if necessary.
768@^system dependencies@>
769
770@d text_char == char {the data type of characters in text files}
771@d first_text_char=0 {ordinal number of the smallest element of |text_char|}
772@d last_text_char=biggest_char {ordinal number of the largest element of |text_char|}
773
774@<Local variables for init...@>=
775@!i:integer;
776
777@ The \TeX\ processor converts between ASCII code and
778the user's external character set by means of arrays |xord| and |xchr|
779that are analogous to \PASCAL's |ord| and |chr| functions.
780
781@<Glob...@>=
782@!xchr: array [ASCII_code] of text_char;
783  {specifies conversion of output characters}
784
785@ Since we are assuming that our \PASCAL\ system is able to read and
786write the visible characters of standard ASCII (although not
787necessarily using the ASCII codes to represent them), the following
788assignment statements initialize the standard part of the |xchr| array
789properly, without needing any system-dependent changes. On the other
790hand, it is possible to implement \TeX\ with less complete character
791sets, and in such cases it will be necessary to change something here.
792@^system dependencies@>
793
794@ Some of the ASCII codes without visible characters have been given symbolic
795names in this program because they are used with a special meaning.
796
797@d null_code=@'0 {ASCII code that might disappear}
798@d carriage_return=@'15 {ASCII code used at end of line}
799@d invalid_code=@'177 {ASCII code that many systems prohibit in text files}
800
801@ The ASCII code is ``standard'' only to a certain extent, since many
802computer installations have found it advantageous to have ready access
803to more than 94 printing characters. Appendix~C of {\sl The \TeX book\/}
804gives a complete specification of the intended correspondence between
805characters and \TeX's internal representation.
806@:TeXbook}{\sl The \TeX book@>
807
808If \TeX\ is being used
809on a garden-variety \PASCAL\ for which only standard ASCII
810codes will appear in the input and output files, it doesn't really matter
811what codes are specified in |xchr[0..@'37]|, but the safest policy is to
812blank everything out by using the code shown below.
813
814However, other settings of |xchr| will make \TeX\ more friendly on
815computers that have an extended character set, so that users can type things
816like `\.^^Z' instead of `\.{\\ne}'. People with extended character sets can
817assign codes arbitrarily, giving an |xchr| equivalent to whatever
818characters the users of \TeX\ are allowed to have in their input files.
819It is best to make the codes correspond to the intended interpretations as
820shown in Appendix~C whenever possible; but this is not necessary. For
821example, in countries with an alphabet of more than 26 letters, it is
822usually best to map the additional letters into codes less than~@'40.
823To get the most ``permissive'' character set, change |' '| on the
824right of these assignment statements to |chr(i)|.
825@^character set dependencies@>
826@^system dependencies@>
827
828@<Set init...@>=
829for i:=0 to @'37 do xchr[i]:=' ';
830for i:=@'177 to @'377 do xchr[i]:=' ';
831
832@ The following system-independent code makes the |xord| array contain a
833suitable inverse to the information in |xchr|. Note that if |xchr[i]=xchr[j]|
834where |i<j<@'177|, the value of |xord[xchr[i]]| will turn out to be
835|j| or more; hence, standard ASCII code numbers will be used instead of
836codes below @'40 in case there is a coincidence.
837
838@<Set init...@>=
839for i:=0 to @'176 do xord[xchr[i]]:=i;
840
841@* \[3] Input and output.
842The bane of portability is the fact that different operating systems treat
843input and output quite differently, perhaps because computer scientists
844have not given sufficient attention to this problem. People have felt somehow
845that input and output are not part of ``real'' programming. Well, it is true
846that some kinds of programming are more fun than others. With existing
847input/output conventions being so diverse and so messy, the only sources of
848joy in such parts of the code are the rare occasions when one can find a
849way to make the program a little less bad than it might have been. We have
850two choices, either to attack I/O now and get it over with, or to postpone
851I/O until near the end. Neither prospect is very attractive, so let's
852get it over with.
853
854The basic operations we need to do are (1)~inputting and outputting of
855text, to or from a file or the user's terminal; (2)~inputting and
856outputting of eight-bit bytes, to or from a file; (3)~instructing the
857operating system to initiate (``open'') or to terminate (``close'') input or
858output from a specified file; (4)~testing whether the end of an input
859file has been reached.
860
861\TeX\ needs to deal with two kinds of files.
862We shall use the term |alpha_file| for a file that contains textual data,
863and the term |byte_file| for a file that contains eight-bit binary information.
864These two types turn out to be the same on many computers, but
865sometimes there is a significant distinction, so we shall be careful to
866distinguish between them. Standard protocols for transferring
867such files from computer to computer, via high-speed networks, are
868now becoming available to more and more communities of users.
869
870The program actually makes use also of a third kind of file, called a
871|word_file|, when dumping and reloading base information for its own
872initialization.  We shall define a word file later; but it will be possible
873for us to specify simple operations on word files before they are defined.
874
875@<Types...@>=
876@!eight_bits=0..255; {unsigned one-byte quantity}
877@!alpha_file=packed file of text_char; {files that contain textual data}
878@!byte_file=packed file of eight_bits; {files that contain binary data}
879
880@ Most of what we need to do with respect to input and output can be handled
881by the I/O facilities that are standard in \PASCAL, i.e., the routines
882called |get|, |put|, |eof|, and so on. But
883standard \PASCAL\ does not allow file variables to be associated with file
884names that are determined at run time, so it cannot be used to implement
885\TeX; some sort of extension to \PASCAL's ordinary |reset| and |rewrite|
886is crucial for our purposes. We shall assume that |name_of_file| is a variable
887of an appropriate type such that the \PASCAL\ run-time system being used to
888implement \TeX\ can open a file whose external name is specified by
889|name_of_file|.
890@^system dependencies@>
891
892@<Glob...@>=
893@!name_of_file:packed array[1..file_name_size] of char;@;@/
894  {on some systems this may be a \&{record} variable}
895@!name_of_file16:array[1..file_name_size] of UTF16_code;@;@/
896  {but sometimes we need a UTF16 version of the name}
897@!name_length:0..file_name_size;@/{this many characters are actually
898  relevant in |name_of_file| (the rest are blank)}
899@!name_length16:0..file_name_size;
900
901@ The \ph\ compiler with which the present version of \TeX\ was prepared has
902extended the rules of \PASCAL\ in a very convenient way. To open file~|f|,
903we can write
904$$\vbox{\halign{#\hfil\qquad&#\hfil\cr
905|reset(f,@t\\{name}@>,'/O')|&for input;\cr
906|rewrite(f,@t\\{name}@>,'/O')|&for output.\cr}}$$
907The `\\{name}' parameter, which is of type `{\bf packed array
908$[\langle\\{any}\rangle]$ of \\{char}}', stands for the name of
909the external file that is being opened for input or output.
910Blank spaces that might appear in \\{name} are ignored.
911
912The `\.{/O}' parameter tells the operating system not to issue its own
913error messages if something goes wrong. If a file of the specified name
914cannot be found, or if such a file cannot be opened for some other reason
915(e.g., someone may already be trying to write the same file), we will have
916|@!erstat(f)<>0| after an unsuccessful |reset| or |rewrite|.  This allows
917\TeX\ to undertake appropriate corrective action.
918@:PASCAL H}{\ph@>
919@^system dependencies@>
920
921\TeX's file-opening procedures return |false| if no file identified by
922|name_of_file| could be opened.
923
924@d reset_OK(#)==erstat(#)=0
925@d rewrite_OK(#)==erstat(#)=0
926
927@p function a_open_in(var f:alpha_file):boolean;
928  {open a text file for input}
929begin reset(f,name_of_file,'/O'); a_open_in:=reset_OK(f);
930end;
931@#
932function a_open_out(var f:alpha_file):boolean;
933  {open a text file for output}
934begin rewrite(f,name_of_file,'/O'); a_open_out:=rewrite_OK(f);
935end;
936@#
937function b_open_in(var f:byte_file):boolean;
938  {open a binary file for input}
939begin reset(f,name_of_file,'/O'); b_open_in:=reset_OK(f);
940end;
941@#
942function b_open_out(var f:byte_file):boolean;
943  {open a binary file for output}
944begin rewrite(f,name_of_file,'/O'); b_open_out:=rewrite_OK(f);
945end;
946@#
947function w_open_in(var f:word_file):boolean;
948  {open a word file for input}
949begin reset(f,name_of_file,'/O'); w_open_in:=reset_OK(f);
950end;
951@#
952function w_open_out(var f:word_file):boolean;
953  {open a word file for output}
954begin rewrite(f,name_of_file,'/O'); w_open_out:=rewrite_OK(f);
955end;
956
957@ Files can be closed with the \ph\ routine `|close(f)|', which
958@:PASCAL H}{\ph@>
959@^system dependencies@>
960should be used when all input or output with respect to |f| has been completed.
961This makes |f| available to be opened again, if desired; and if |f| was used for
962output, the |close| operation makes the corresponding external file appear
963on the user's area, ready to be read.
964
965These procedures should not generate error messages if a file is
966being closed before it has been successfully opened.
967
968@p procedure a_close(var f:alpha_file); {close a text file}
969begin close(f);
970end;
971@#
972procedure b_close(var f:byte_file); {close a binary file}
973begin close(f);
974end;
975@#
976procedure w_close(var f:word_file); {close a word file}
977begin close(f);
978end;
979
980@ Binary input and output are done with \PASCAL's ordinary |get| and |put|
981procedures, so we don't have to make any other special arrangements for
982binary~I/O. Text output is also easy to do with standard \PASCAL\ routines.
983The treatment of text input is more difficult, however, because
984of the necessary translation to |ASCII_code| values.
985\TeX's conventions should be efficient, and they should
986blend nicely with the user's operating environment.
987
988@ Input from text files is read one line at a time, using a routine called
989|input_ln|. This function is defined in terms of global variables called
990|buffer|, |first|, and |last| that will be described in detail later; for
991now, it suffices for us to know that |buffer| is an array of |ASCII_code|
992values, and that |first| and |last| are indices into this array
993representing the beginning and ending of a line of text.
994
995@<Glob...@>=
996@!buffer:array[0..buf_size] of ASCII_code; {lines of characters being read}
997@!first:0..buf_size; {the first unused position in |buffer|}
998@!last:0..buf_size; {end of the line just input to |buffer|}
999@!max_buf_stack:0..buf_size; {largest index used in |buffer|}
1000
1001@ The |input_ln| function brings the next line of input from the specified
1002file into available positions of the buffer array and returns the value
1003|true|, unless the file has already been entirely read, in which case it
1004returns |false| and sets |last:=first|.  In general, the |ASCII_code|
1005numbers that represent the next line of the file are input into
1006|buffer[first]|, |buffer[first+1]|, \dots, |buffer[last-1]|; and the
1007global variable |last| is set equal to |first| plus the length of the
1008line. Trailing blanks are removed from the line; thus, either |last=first|
1009(in which case the line was entirely blank) or |buffer[last-1]<>" "|.
1010
1011An overflow error is given, however, if the normal actions of |input_ln|
1012would make |last>=buf_size|; this is done so that other parts of \TeX\
1013can safely look at the contents of |buffer[last+1]| without overstepping
1014the bounds of the |buffer| array. Upon entry to |input_ln|, the condition
1015|first<buf_size| will always hold, so that there is always room for an
1016``empty'' line.
1017
1018The variable |max_buf_stack|, which is used to keep track of how large
1019the |buf_size| parameter must be to accommodate the present job, is
1020also kept up to date by |input_ln|.
1021
1022If the |bypass_eoln| parameter is |true|, |input_ln| will do a |get|
1023before looking at the first character of the line; this skips over
1024an |eoln| that was in |f^|. The procedure does not do a |get| when it
1025reaches the end of the line; therefore it can be used to acquire input
1026from the user's terminal as well as from ordinary text files.
1027
1028Standard \PASCAL\ says that a file should have |eoln| immediately
1029before |eof|, but \TeX\ needs only a weaker restriction: If |eof|
1030occurs in the middle of a line, the system function |eoln| should return
1031a |true| result (even though |f^| will be undefined).
1032
1033Since the inner loop of |input_ln| is part of \TeX's ``inner loop''---each
1034character of input comes in at this place---it is wise to reduce system
1035overhead by making use of special routines that read in an entire array
1036of characters at once, if such routines are available. The following
1037code uses standard \PASCAL\ to illustrate what needs to be done, but
1038finer tuning is often possible at well-developed \PASCAL\ sites.
1039@^inner loop@>
1040
1041@p function input_ln(var f:alpha_file;@!bypass_eoln:boolean):boolean;
1042  {inputs the next line or returns |false|}
1043var last_nonblank:0..buf_size; {|last| with trailing blanks removed}
1044begin if bypass_eoln then if not eof(f) then get(f);
1045  {input the first character of the line into |f^|}
1046last:=first; {cf.\ Matthew 19\thinspace:\thinspace30}
1047if eof(f) then input_ln:=false
1048else  begin last_nonblank:=first;
1049  while not eoln(f) do
1050    begin if last>=max_buf_stack then
1051      begin max_buf_stack:=last+1;
1052      if max_buf_stack=buf_size then
1053        @<Report overflow of the input buffer, and abort@>;
1054      end;
1055    buffer[last]:=xord[f^]; get(f); incr(last);
1056    if buffer[last-1]<>" " then last_nonblank:=last;
1057    end;
1058  last:=last_nonblank; input_ln:=true;
1059  end;
1060end;
1061
1062@ The user's terminal acts essentially like other files of text, except
1063that it is used both for input and for output. When the terminal is
1064considered an input file, the file variable is called |term_in|, and when it
1065is considered an output file the file variable is |term_out|.
1066@^system dependencies@>
1067
1068@<Glob...@>=
1069@!term_in:alpha_file; {the terminal as an input file}
1070@!term_out:alpha_file; {the terminal as an output file}
1071
1072@ Here is how to open the terminal files
1073in \ph. The `\.{/I}' switch suppresses the first |get|.
1074@:PASCAL H}{\ph@>
1075@^system dependencies@>
1076
1077@d t_open_in==reset(term_in,'TTY:','/O/I') {open the terminal for text input}
1078@d t_open_out==rewrite(term_out,'TTY:','/O') {open the terminal for text output}
1079
1080@ Sometimes it is necessary to synchronize the input/output mixture that
1081happens on the user's terminal, and three system-dependent
1082procedures are used for this
1083purpose. The first of these, |update_terminal|, is called when we want
1084to make sure that everything we have output to the terminal so far has
1085actually left the computer's internal buffers and been sent.
1086The second, |clear_terminal|, is called when we wish to cancel any
1087input that the user may have typed ahead (since we are about to
1088issue an unexpected error message). The third, |wake_up_terminal|,
1089is supposed to revive the terminal if the user has disabled it by
1090some instruction to the operating system.  The following macros show how
1091these operations can be specified in \ph:
1092@:PASCAL H}{\ph@>
1093@^system dependencies@>
1094
1095@d update_terminal == break(term_out) {empty the terminal output buffer}
1096@d clear_terminal == break_in(term_in,true) {clear the terminal input buffer}
1097@d wake_up_terminal == do_nothing {cancel the user's cancellation of output}
1098
1099@ We need a special routine to read the first line of \TeX\ input from
1100the user's terminal. This line is different because it is read before we
1101have opened the transcript file; there is sort of a ``chicken and
1102egg'' problem here. If the user types `\.{\\input paper}' on the first
1103line, or if some macro invoked by that line does such an \.{\\input},
1104the transcript file will be named `\.{paper.log}'; but if no \.{\\input}
1105commands are performed during the first line of terminal input, the transcript
1106file will acquire its default name `\.{texput.log}'. (The transcript file
1107will not contain error messages generated by the first line before the
1108first \.{\\input} command.)
1109@.texput@>
1110
1111The first line is even more special if we are lucky enough to have an operating
1112system that treats \TeX\ differently from a run-of-the-mill \PASCAL\ object
1113program. It's nice to let the user start running a \TeX\ job by typing
1114a command line like `\.{tex paper}'; in such a case, \TeX\ will operate
1115as if the first line of input were `\.{paper}', i.e., the first line will
1116consist of the remainder of the command line, after the part that invoked
1117\TeX.
1118
1119The first line is special also because it may be read before \TeX\ has
1120input a format file. In such cases, normal error messages cannot yet
1121be given. The following code uses concepts that will be explained later.
1122(If the \PASCAL\ compiler does not support non-local |@!goto|\unskip, the
1123@^system dependencies@>
1124statement `|goto final_end|' should be replaced by something that
1125quietly terminates the program.)
1126
1127@<Report overflow of the input buffer, and abort@>=
1128if format_ident=0 then
1129  begin write_ln(term_out,'Buffer size exceeded!'); goto final_end;
1130@.Buffer size exceeded@>
1131  end
1132else begin cur_input.loc_field:=first; cur_input.limit_field:=last-1;
1133  overflow("buffer size",buf_size);
1134@:TeX capacity exceeded buffer size}{\quad buffer size@>
1135  end
1136
1137@ Different systems have different ways to get started. But regardless of
1138what conventions are adopted, the routine that initializes the terminal
1139should satisfy the following specifications:
1140
1141\yskip\textindent{1)}It should open file |term_in| for input from the
1142  terminal. (The file |term_out| will already be open for output to the
1143  terminal.)
1144
1145\textindent{2)}If the user has given a command line, this line should be
1146  considered the first line of terminal input. Otherwise the
1147  user should be prompted with `\.{**}', and the first line of input
1148  should be whatever is typed in response.
1149
1150\textindent{3)}The first line of input, which might or might not be a
1151  command line, should appear in locations |first| to |last-1| of the
1152  |buffer| array.
1153
1154\textindent{4)}The global variable |loc| should be set so that the
1155  character to be read next by \TeX\ is in |buffer[loc]|. This
1156  character should not be blank, and we should have |loc<last|.
1157
1158\yskip\noindent(It may be necessary to prompt the user several times
1159before a non-blank line comes in. The prompt is `\.{**}' instead of the
1160later `\.*' because the meaning is slightly different: `\.{\\input}' need
1161not be typed immediately after~`\.{**}'.)
1162
1163@d loc==cur_input.loc_field {location of first unread character in |buffer|}
1164
1165@ The following program does the required initialization
1166without retrieving a possible command line.
1167It should be clear how to modify this routine to deal with command lines,
1168if the system permits them.
1169@^system dependencies@>
1170
1171@p function init_terminal:boolean; {gets the terminal input started}
1172label exit;
1173begin t_open_in;
1174loop@+begin wake_up_terminal; write(term_out,'**'); update_terminal;
1175@.**@>
1176  if not input_ln(term_in,true) then {this shouldn't happen}
1177    begin write_ln(term_out);
1178    write(term_out,'! End of file on the terminal... why?');
1179@.End of file on the terminal@>
1180    init_terminal:=false; return;
1181    end;
1182  loc:=first;
1183  while (loc<last)and(buffer[loc]=" ") do incr(loc);
1184  if loc<last then
1185    begin init_terminal:=true;
1186    return; {return unless the line was all blank}
1187    end;
1188  write_ln(term_out,'Please type the name of your input file.');
1189  end;
1190exit:end;
1191
1192@* \[4] String handling.
1193Control sequence names and diagnostic messages are variable-length strings
1194of eight-bit characters. Since \PASCAL\ does not have a well-developed string
1195mechanism, \TeX\ does all of its string processing by homegrown methods.
1196
1197Elaborate facilities for dynamic strings are not needed, so all of the
1198necessary operations can be handled with a simple data structure.
1199The array |str_pool| contains all of the (eight-bit) ASCII codes in all
1200of the strings, and the array |str_start| contains indices of the starting
1201points of each string. Strings are referred to by integer numbers, so that
1202string number |s| comprises the characters |str_pool[j]| for
1203|str_start_macro[s]<=j<str_start_macro[s+1]|. Additional integer variables
1204|pool_ptr| and |str_ptr| indicate the number of entries used so far
1205in |str_pool| and |str_start|, respectively; locations
1206|str_pool[pool_ptr]| and |str_start_macro[str_ptr]| are
1207ready for the next string to be allocated.
1208
1209String numbers 0 to 255 are reserved for strings that correspond to single
1210ASCII characters. This is in accordance with the conventions of \.{WEB},
1211@.WEB@>
1212which converts single-character strings into the ASCII code number of the
1213single character involved, while it converts other strings into integers
1214and builds a string pool file. Thus, when the string constant \.{"."} appears
1215in the program below, \.{WEB} converts it into the integer 46, which is the
1216ASCII code for a period, while \.{WEB} will convert a string like \.{"hello"}
1217into some integer greater than~255. String number 46 will presumably be the
1218single character `\..'; but some ASCII codes have no standard visible
1219representation, and \TeX\ sometimes needs to be able to print an arbitrary
1220ASCII character, so the first 256 strings are used to specify exactly what
1221should be printed for each of the 256 possibilities.
1222
1223Elements of the |str_pool| array must be ASCII codes that can actually
1224be printed; i.e., they must have an |xchr| equivalent in the local
1225character set. (This restriction applies only to preloaded strings,
1226not to those generated dynamically by the user.)
1227
1228Some \PASCAL\ compilers won't pack integers into a single byte unless the
1229integers lie in the range |-128..127|. To accommodate such systems
1230we access the string pool only via macros that can easily be redefined.
1231@^system dependencies@>
1232
1233@d si(#) == # {convert from |ASCII_code| to |packed_ASCII_code|}
1234@d so(#) == # {convert from |packed_ASCII_code| to |ASCII_code|}
1235@d str_start_macro(#) == str_start[(#) - too_big_char]
1236
1237@<Types...@>=
1238@!pool_pointer = 0..pool_size; {for variables that point into |str_pool|}
1239@!str_number = 0..max_strings; {for variables that point into |str_start|}
1240@!packed_ASCII_code = 0..biggest_char; {elements of |str_pool| array}
1241
1242@ @<Glob...@>=
1243@!str_pool:packed array[pool_pointer] of packed_ASCII_code; {the characters}
1244@!str_start : array[str_number] of pool_pointer; {the starting pointers}
1245@!pool_ptr : pool_pointer; {first unused position in |str_pool|}
1246@!str_ptr : str_number; {number of the current string being created}
1247@!init_pool_ptr : pool_pointer; {the starting value of |pool_ptr|}
1248@!init_str_ptr : str_number; {the starting value of |str_ptr|}
1249
1250@ Several of the elementary string operations are performed using \.{WEB}
1251macros instead of \PASCAL\ procedures, because many of the
1252operations are done quite frequently and we want to avoid the
1253overhead of procedure calls. For example, here is
1254a simple macro that computes the length of a string.
1255@.WEB@>
1256
1257@p function length(s:str_number):integer;
1258   {the number of characters in string number |s|}
1259begin if (s>=@"10000) then length:=str_start_macro(s+1)-str_start_macro(s)
1260else if (s>=@"20) and (s<@"7F) then length:=1
1261else if (s<=@"7F) then length:=3
1262else if (s<@"100) then length:=4
1263else length:=8
1264end;
1265
1266@ The length of the current string is called |cur_length|:
1267
1268@d cur_length == (pool_ptr - str_start_macro(str_ptr))
1269
1270@ Strings are created by appending character codes to |str_pool|.
1271The |append_char| macro, defined here, does not check to see if the
1272value of |pool_ptr| has gotten too high; this test is supposed to be
1273made before |append_char| is used. There is also a |flush_char|
1274macro, which erases the last character appended.
1275
1276To test if there is room to append |l| more characters to |str_pool|,
1277we shall write |str_room(l)|, which aborts \TeX\ and gives an
1278apologetic error message if there isn't enough room.
1279
1280@d append_char(#) == {put |ASCII_code| \# at the end of |str_pool|}
1281begin str_pool[pool_ptr]:=si(#); incr(pool_ptr);
1282end
1283@d flush_char == decr(pool_ptr) {forget the last character in the pool}
1284@d str_room(#) == {make sure that the pool hasn't overflowed}
1285  begin if pool_ptr+# > pool_size then
1286  overflow("pool size",pool_size-init_pool_ptr);
1287@:TeX capacity exceeded pool size}{\quad pool size@>
1288  end
1289
1290@ Once a sequence of characters has been appended to |str_pool|, it
1291officially becomes a string when the function |make_string| is called.
1292This function returns the identification number of the new string as its
1293value.
1294
1295@p function make_string : str_number; {current string enters the pool}
1296begin if str_ptr=max_strings then
1297  overflow("number of strings",max_strings-init_str_ptr);
1298@:TeX capacity exceeded number of strings}{\quad number of strings@>
1299incr(str_ptr); str_start_macro(str_ptr):=pool_ptr;
1300make_string:=str_ptr-1;
1301end;
1302
1303@ To destroy the most recently made string, we say |flush_string|.
1304
1305@d flush_string==begin decr(str_ptr); pool_ptr:=str_start_macro(str_ptr);
1306  end
1307
1308@p procedure append_str(@!s:str_number); { append an existing string to the current string }
1309var i: integer;
1310    j: pool_pointer;
1311begin
1312  i:=length(s);
1313  str_room(i);
1314  j:=str_start_macro(s);
1315  while (i > 0) do begin
1316    append_char(str_pool[j]);
1317    incr(j); decr(i);
1318    end;
1319end;
1320
1321@ The following subroutine compares string |s| with another string of the
1322same length that appears in |buffer| starting at position |k|;
1323the result is |true| if and only if the strings are equal.
1324Empirical tests indicate that |str_eq_buf| is used in such a way that
1325it tends to return |true| about 80 percent of the time.
1326
1327@p function str_eq_buf(@!s:str_number;@!k:integer):boolean;
1328  {test equality of strings}
1329label not_found; {loop exit}
1330var j: pool_pointer; {running index}
1331@!result: boolean; {result of comparison}
1332begin j:=str_start_macro(s);
1333while j<str_start_macro(s+1) do
1334  begin
1335    if buffer[k]>=@"10000 then
1336      if so(str_pool[j])<>@"D800+(buffer[k]-@"10000)div@"400 then
1337        begin result:=false; goto not_found;
1338        end
1339      else if so(str_pool[j+1])<>@"DC00+(buffer[k]-@"10000)mod@"400 then
1340        begin result:=false; goto not_found;
1341        end
1342      else incr(j)
1343    else if so(str_pool[j])<>buffer[k] then
1344    begin result:=false; goto not_found;
1345    end;
1346  incr(j); incr(k);
1347  end;
1348result:=true;
1349not_found: str_eq_buf:=result;
1350end;
1351
1352@ Here is a similar routine, but it compares two strings in the string pool,
1353and it does not assume that they have the same length.
1354
1355@p function str_eq_str(@!s,@!t:str_number):boolean;
1356  {test equality of strings}
1357label not_found; {loop exit}
1358var j,@!k: pool_pointer; {running indices}
1359@!result: boolean; {result of comparison}
1360begin result:=false;
1361if length(s)<>length(t) then goto not_found;
1362if (length(s)=1) then begin
1363  if s<65536 then begin
1364    if t<65536 then begin
1365      if s<>t then goto not_found;
1366      end
1367    else begin
1368      if s<>str_pool[str_start_macro(t)] then goto not_found;
1369      end;
1370    end
1371  else begin
1372    if t<65536 then begin
1373      if str_pool[str_start_macro(s)]<>t then goto not_found;
1374      end
1375    else begin
1376      if str_pool[str_start_macro(s)]<>str_pool[str_start_macro(t)] then
1377        goto not_found;
1378      end;
1379    end;
1380  end
1381else begin
1382  j:=str_start_macro(s); k:=str_start_macro(t);
1383  while j<str_start_macro(s+1) do
1384    begin if str_pool[j]<>str_pool[k] then goto not_found;
1385    incr(j); incr(k);
1386    end;
1387  end;
1388result:=true;
1389not_found: str_eq_str:=result;
1390end;
1391
1392@ The initial values of |str_pool|, |str_start|, |pool_ptr|,
1393and |str_ptr| are computed by the \.{INITEX} program, based in part
1394on the information that \.{WEB} has output while processing \TeX.
1395@.INITEX@>
1396@^string pool@>
1397
1398@p @!init function get_strings_started:boolean; {initializes the string pool,
1399  but returns |false| if something goes wrong}
1400label done,exit;
1401var
1402@!m,@!n:text_char; {characters input from |pool_file|}
1403@!g:str_number; {garbage}
1404@!a:integer; {accumulator for check sum}
1405@!c:boolean; {check sum has been checked}
1406begin pool_ptr:=0; str_ptr:=0; str_start[0]:=0;
1407@<Make the first 256 strings@>;
1408@<Read the other strings from the \.{TEX.POOL} file and return |true|,
1409  or give an error message and return |false|@>;
1410exit:end;
1411tini
1412
1413@ The first 65536 strings will consist of a single character only.
1414But we don't actually make them; they're simulated on the fly.
1415
1416@<Make the first 256...@>=
1417begin
1418str_ptr:=too_big_char;
1419str_start_macro(str_ptr):=pool_ptr;
1420end
1421
1422@ The first 128 strings will contain 95 standard ASCII characters, and the
1423other 33 characters will be printed in three-symbol form like `\.{\^\^A}'
1424unless a system-dependent change is made here. Installations that have
1425an extended character set, where for example |xchr[@'32]=@t\.{\'^^Z\'}@>|,
1426would like string @'32 to be the single character @'32 instead of the
1427three characters @'136, @'136, @'132 (\.{\^\^Z}). On the other hand,
1428even people with an extended character set will want to represent string
1429@'15 by \.{\^\^M}, since @'15 is |carriage_return|; the idea is to
1430produce visible strings instead of tabs or line-feeds or carriage-returns
1431or bell-rings or characters that are treated anomalously in text files.
1432
1433Unprintable characters of codes 128--255 are, similarly, rendered
1434\.{\^\^80}--\.{\^\^ff}.
1435
1436The boolean expression defined here should be |true| unless \TeX\
1437internal code number~|k| corresponds to a non-troublesome visible
1438symbol in the local character set.  An appropriate formula for the
1439extended character set recommended in {\sl The \TeX book\/} would, for
1440example, be `|k in [0,@'10..@'12,@'14,@'15,@'33,@'177..@'377]|'.
1441If character |k| cannot be printed, and |k<@'200|, then character |k+@'100| or
1442|k-@'100| must be printable; moreover, ASCII codes |[@'41..@'46,
1443@'60..@'71, @'136, @'141..@'146, @'160..@'171]| must be printable.
1444Thus, at least 81 printable characters are needed.
1445@:TeXbook}{\sl The \TeX book@>
1446@^character set dependencies@>
1447@^system dependencies@>
1448
1449@ When the \.{WEB} system program called \.{TANGLE} processes the \.{TEX.WEB}
1450description that you are now reading, it outputs the \PASCAL\ program
1451\.{TEX.PAS} and also a string pool file called \.{TEX.POOL}. The \.{INITEX}
1452@.WEB@>@.INITEX@>
1453program reads the latter file, where each string appears as a two-digit decimal
1454length followed by the string itself, and the information is recorded in
1455\TeX's string memory.
1456
1457@<Glob...@>=
1458@!init @!pool_file:alpha_file; {the string-pool file output by \.{TANGLE}}
1459tini
1460
1461@ @d bad_pool(#)==begin wake_up_terminal; write_ln(term_out,#);
1462  a_close(pool_file); get_strings_started:=false; return;
1463  end
1464@<Read the other strings...@>=
1465name_of_file:=pool_name; {we needn't set |name_length|}
1466if a_open_in(pool_file) then
1467  begin c:=false;
1468  repeat @<Read one string, but return |false| if the
1469    string memory space is getting too tight for comfort@>;
1470  until c;
1471  a_close(pool_file); get_strings_started:=true;
1472  end
1473else  bad_pool('! I can''t read TEX.POOL.')
1474@.I can't read TEX.POOL@>
1475
1476@ @<Read one string...@>=
1477begin if eof(pool_file) then bad_pool('! TEX.POOL has no check sum.');
1478@.TEX.POOL has no check sum@>
1479read(pool_file,m,n); {read two digits of string length}
1480if m='*' then @<Check the pool check sum@>
1481else  begin if (xord[m]<"0")or(xord[m]>"9")or@|
1482      (xord[n]<"0")or(xord[n]>"9") then
1483    bad_pool('! TEX.POOL line doesn''t begin with two digits.');
1484@.TEX.POOL line doesn't...@>
1485  l:=xord[m]*10+xord[n]-"0"*11; {compute the length}
1486  if pool_ptr+l+string_vacancies>pool_size then
1487    bad_pool('! You have to increase POOLSIZE.');
1488@.You have to increase POOLSIZE@>
1489  for k:=1 to l do
1490    begin if eoln(pool_file) then m:=' '@+else read(pool_file,m);
1491    append_char(xord[m]);
1492    end;
1493  read_ln(pool_file); g:=make_string;
1494  end;
1495end
1496
1497@ The \.{WEB} operation \.{@@\$} denotes the value that should be at the
1498end of this \.{TEX.POOL} file; any other value means that the wrong pool
1499file has been loaded.
1500@^check sum@>
1501
1502@<Check the pool check sum@>=
1503begin a:=0; k:=1;
1504loop@+  begin if (xord[n]<"0")or(xord[n]>"9") then
1505  bad_pool('! TEX.POOL check sum doesn''t have nine digits.');
1506@.TEX.POOL check sum...@>
1507  a:=10*a+xord[n]-"0";
1508  if k=9 then goto done;
1509  incr(k); read(pool_file,n);
1510  end;
1511done: if a<>@$ then bad_pool('! TEX.POOL doesn''t match; TANGLE me again.');
1512@.TEX.POOL doesn't match@>
1513c:=true;
1514end
1515
1516@* \[5] On-line and off-line printing.
1517Messages that are sent to a user's terminal and to the transcript-log file
1518are produced by several `|print|' procedures. These procedures will
1519direct their output to a variety of places, based on the setting of
1520the global variable |selector|, which has the following possible
1521values:
1522
1523\yskip
1524\hang |term_and_log|, the normal setting, prints on the terminal and on the
1525  transcript file.
1526
1527\hang |log_only|, prints only on the transcript file.
1528
1529\hang |term_only|, prints only on the terminal.
1530
1531\hang |no_print|, doesn't print at all. This is used only in rare cases
1532  before the transcript file is open.
1533
1534\hang |pseudo|, puts output into a cyclic buffer that is used
1535  by the |show_context| routine; when we get to that routine we shall discuss
1536  the reasoning behind this curious mode.
1537
1538\hang |new_string|, appends the output to the current string in the
1539  string pool.
1540
1541\hang 0 to 15, prints on one of the sixteen files for \.{\\write} output.
1542
1543\yskip
1544\noindent The symbolic names `|term_and_log|', etc., have been assigned
1545numeric codes that satisfy the convenient relations |no_print+1=term_only|,
1546|no_print+2=log_only|, |term_only+2=log_only+1=term_and_log|.
1547
1548Three additional global variables, |tally| and |term_offset| and
1549|file_offset|, record the number of characters that have been printed
1550since they were most recently cleared to zero. We use |tally| to record
1551the length of (possibly very long) stretches of printing; |term_offset|
1552and |file_offset|, on the other hand, keep track of how many characters
1553have appeared so far on the current line that has been output to the
1554terminal or to the transcript file, respectively.
1555
1556@d no_print=16 {|selector| setting that makes data disappear}
1557@d term_only=17 {printing is destined for the terminal only}
1558@d log_only=18 {printing is destined for the transcript file only}
1559@d term_and_log=19 {normal |selector| setting}
1560@d pseudo=20 {special |selector| setting for |show_context|}
1561@d new_string=21 {printing is deflected to the string pool}
1562@d max_selector=21 {highest selector setting}
1563
1564@<Glob...@>=
1565@!log_file : alpha_file; {transcript of \TeX\ session}
1566@!selector : 0..max_selector; {where to print a message}
1567@!dig : array[0..22] of 0..15; {digits in a number being output}
1568@!tally : integer; {the number of characters recently printed}
1569@!term_offset : 0..max_print_line;
1570  {the number of characters on the current terminal line}
1571@!file_offset : 0..max_print_line;
1572  {the number of characters on the current file line}
1573@!trick_buf:array[0..error_line] of ASCII_code; {circular buffer for
1574  pseudoprinting}
1575@!trick_count: integer; {threshold for pseudoprinting, explained later}
1576@!first_count: integer; {another variable for pseudoprinting}
1577
1578@ @<Initialize the output routines@>=
1579selector:=term_only; tally:=0; term_offset:=0; file_offset:=0;
1580
1581@ Macro abbreviations for output to the terminal and to the log file are
1582defined here for convenience. Some systems need special conventions
1583for terminal output, and it is possible to adhere to those conventions
1584by changing |wterm|, |wterm_ln|, and |wterm_cr| in this section.
1585@^system dependencies@>
1586
1587@d wterm(#)==write(term_out,#)
1588@d wterm_ln(#)==write_ln(term_out,#)
1589@d wterm_cr==write_ln(term_out)
1590@d wlog(#)==write(log_file,#)
1591@d wlog_ln(#)==write_ln(log_file,#)
1592@d wlog_cr==write_ln(log_file)
1593
1594@ To end a line of text output, we call |print_ln|.
1595
1596@<Basic print...@>=
1597procedure print_ln; {prints an end-of-line}
1598begin case selector of
1599term_and_log: begin wterm_cr; wlog_cr;
1600  term_offset:=0; file_offset:=0;
1601  end;
1602log_only: begin wlog_cr; file_offset:=0;
1603  end;
1604term_only: begin wterm_cr; term_offset:=0;
1605  end;
1606no_print,pseudo,new_string: do_nothing;
1607othercases write_ln(write_file[selector])
1608endcases;@/
1609end; {|tally| is not affected}
1610
1611@ The |print_raw_char| procedure sends one character to the desired destination,
1612using the |xchr| array to map it into an external character compatible with
1613|input_ln|. All printing comes through |print_ln|, |print_char| or
1614|print_visible_char|. When printing a multi-byte character, the boolean
1615parameter |incr_offset| is set |false| except for the very last byte, to avoid
1616calling |print_ln| in the middle of such character.
1617
1618@<Basic printing...@>=
1619procedure print_raw_char(@!s:ASCII_code;@!incr_offset:boolean); {prints a single character}
1620label exit; {label is not used but nonetheless kept (for other changes?)}
1621begin
1622case selector of
1623term_and_log: begin wterm(xchr[s]); wlog(xchr[s]);
1624  if incr_offset then
1625    begin incr(term_offset); incr(file_offset);
1626    end;
1627  if term_offset=max_print_line then
1628    begin wterm_cr; term_offset:=0;
1629    end;
1630  if file_offset=max_print_line then
1631    begin wlog_cr; file_offset:=0;
1632    end;
1633  end;
1634log_only: begin wlog(xchr[s]);
1635  if incr_offset then incr(file_offset);
1636  if file_offset=max_print_line then print_ln;
1637  end;
1638term_only: begin wterm(xchr[s]);
1639  if incr_offset then incr(term_offset);
1640  if term_offset=max_print_line then print_ln;
1641  end;
1642no_print: do_nothing;
1643pseudo: if tally<trick_count then trick_buf[tally mod error_line]:=s;
1644new_string: begin if pool_ptr<pool_size then append_char(s);
1645  end; {we drop characters if the string space is full}
1646othercases write(write_file[selector],xchr[s])
1647endcases;@/
1648incr(tally);
1649exit:end;
1650
1651@ The |print_char| procedure sends one character to the desired destination.
1652Control sequence names, file names and string constructed with
1653\.{\\string} might contain |ASCII_code| values that can't
1654be printed using |print_raw_char|.  These characters will be printed
1655in three- or four-symbol form like `\.{\^\^A}' or `\.{\^\^e4}',
1656unless the -8bit option is enabled.
1657Output that goes to the terminal and/or log file is treated differently
1658when it comes to determining whether a character is printable.
1659
1660@d print_visible_char(#)==print_raw_char(#, true)
1661@d print_lc_hex(#)==l:=#;
1662  if l<10 then print_visible_char(l+"0")@+else print_visible_char(l-10+"a")
1663
1664@<Basic printing...@>=
1665procedure print_char(@!s:integer); {prints a single character}
1666label exit;
1667var l: small_number;
1668begin if (selector>pseudo) and (not doing_special) then
1669  {``printing'' to a new string, encode as UTF-16 rather than UTF-8}
1670  begin if s>=@"10000 then
1671   begin print_visible_char(@"D800 + (s - @"10000) div @"400);
1672   print_visible_char(@"DC00 + (s - @"10000) mod @"400);
1673   end else print_visible_char(s);
1674   return;
1675  end;
1676if @<Character |s| is the current new-line character@> then
1677 if selector<pseudo then
1678  begin print_ln; return;
1679  end;
1680 if (s < 32) and (eight_bit_p = 0) and (not doing_special) then {control char: \.{\^\^X}}
1681  begin print_visible_char("^"); print_visible_char("^"); print_visible_char(s+64);
1682  end
1683 else if s < 127 then { printable ASCII }
1684  print_visible_char(s)
1685 else if (s = 127) then { DEL }
1686  begin if (eight_bit_p = 0) and (not doing_special) then
1687   begin print_visible_char("^"); print_visible_char("^"); print_visible_char("?")
1688   end else
1689    print_visible_char(s)
1690  end
1691 else if (s < @"A0) and (eight_bit_p = 0) and (not doing_special) then {C1 controls: \.{\^\^xx}}
1692  begin print_visible_char("^"); print_visible_char("^");
1693  print_lc_hex((s mod @"100) div @"10); print_lc_hex(s mod @"10);
1694  end
1695 else begin
1696  { |char >= 128|: encode as UTF8 }
1697  if s<@"800 then begin
1698   print_raw_char(@"C0 + s div @"40, false);
1699   print_raw_char(@"80 + s mod @"40, true);
1700   end
1701  else if s<@"10000 then begin
1702   print_raw_char(@"E0 + (s div @"1000), false);
1703   print_raw_char(@"80 + (s mod @"1000) div @"40, false);
1704   print_raw_char(@"80 + (s mod @"40), true);
1705   end
1706  else begin
1707   print_raw_char(@"F0 + (s div @"40000), false);
1708   print_raw_char(@"80 + (s mod @"40000) div @"1000, false);
1709   print_raw_char(@"80 + (s mod @"1000) div @"40, false);
1710   print_raw_char(@"80 + (s mod @"40), true);
1711   end
1712 end;
1713exit:end;
1714
1715@ @d native_room(#)==while native_text_size <= native_len+# do begin
1716    native_text_size:=native_text_size+128;
1717    native_text:=xrealloc(native_text, native_text_size * sizeof(UTF16_code));
1718  end
1719@d append_native(#)==begin native_text[native_len]:=#; incr(native_len); end
1720
1721@ @<Glob...@>=
1722@!doing_special: boolean;
1723@!native_text: ^UTF16_code; { buffer for collecting native-font strings }
1724@!native_text_size: integer; { size of buffer }
1725@!native_len: integer;
1726@!save_native_len: integer;
1727
1728@ @<Set init...@>=
1729doing_special:=false;
1730native_text_size:=128;
1731native_text:=xmalloc(native_text_size * sizeof(UTF16_code));
1732
1733@ An entire string is output by calling |print|. Note that if we are outputting
1734the single standard ASCII character \.c, we could call |print("c")|, since
1735|"c"=99| is the number of a single-character string, as explained above. But
1736|print_char("c")| is quicker, so \TeX\ goes directly to the |print_char|
1737routine when it knows that this is safe. (The present implementation
1738assumes that it is always safe to print a visible ASCII character.)
1739@^system dependencies@>
1740
1741@<Basic print...@>=
1742procedure print(@!s:integer); {prints string |s|}
1743label exit;
1744var j:pool_pointer; {current character code position}
1745@!nl:integer; {new-line character to restore}
1746begin if s>=str_ptr then s:="???" {this can't happen}
1747@.???@>
1748else if s<biggest_char then
1749  if s<0 then s:="???" {can't happen}
1750  else begin if selector>pseudo then
1751      begin print_char(s); return; {internal strings are not expanded}
1752      end;
1753    if (@<Character |s| is the current new-line character@>) then
1754      if selector<pseudo then
1755        begin print_ln; return;
1756        end;
1757    nl:=new_line_char;
1758    new_line_char:=-1;
1759    print_char(s);
1760    new_line_char:=nl;
1761    return;
1762    end;
1763j:=str_start_macro(s);
1764while j<str_start_macro(s+1) do begin
1765  if (so(str_pool[j])>=@"D800) and (so(str_pool[j])<=@"DBFF)
1766    and (j+1<str_start_macro(s+1))
1767    and (so(str_pool[j+1])>=@"DC00) and (so(str_pool[j+1])<=@"DFFF) then
1768    begin print_char(@"10000 + (so(str_pool[j])-@"D800) * @"400
1769                     + so(str_pool[j+1])-@"DC00); j:=j+2;
1770    end
1771  else begin print_char(so(str_pool[j])); incr(j);
1772  end;
1773end;
1774exit:end;
1775
1776@ Old versions of \TeX\ needed a procedure called |slow_print| whose function
1777is now subsumed by |print| and the new functionality of |print_char| and
1778|print_visible_char|.  We retain the old name |slow_print| here as a
1779possible aid to future software arch\ae ologists.
1780
1781@d slow_print == print
1782
1783@ Here is the very first thing that \TeX\ prints: a headline that identifies
1784the version number and format package. The |term_offset| variable is temporarily
1785incorrect, but the discrepancy is not serious since we assume that the banner
1786and format identifier together will occupy at most |max_print_line|
1787character positions.
1788
1789@<Initialize the output...@>=
1790wterm(banner);
1791if format_ident=0 then wterm_ln(' (no format preloaded)')
1792else  begin slow_print(format_ident); print_ln;
1793  end;
1794update_terminal;
1795
1796@ The procedure |print_nl| is like |print|, but it makes sure that the
1797string appears at the beginning of a new line.
1798
1799@<Basic print...@>=
1800procedure print_nl(@!s:str_number); {prints string |s| at beginning of line}
1801begin if ((term_offset>0)and(odd(selector)))or@|
1802  ((file_offset>0)and(selector>=log_only)) then print_ln;
1803print(s);
1804end;
1805
1806@ The procedure |print_esc| prints a string that is preceded by
1807the user's escape character (which is usually a backslash).
1808
1809@<Basic print...@>=
1810procedure print_esc(@!s:str_number); {prints escape character, then |s|}
1811var c:integer; {the escape character code}
1812begin  @<Set variable |c| to the current escape character@>;
1813if c>=0 then if c<=biggest_usv then print_char(c);
1814slow_print(s);
1815end;
1816
1817@ An array of digits in the range |0..15| is printed by |print_the_digs|.
1818
1819@<Basic print...@>=
1820procedure print_the_digs(@!k:eight_bits);
1821  {prints |dig[k-1]|$\,\ldots\,$|dig[0]|}
1822begin while k>0 do
1823  begin decr(k);
1824  if dig[k]<10 then print_char("0"+dig[k])
1825  else print_char("A"-10+dig[k]);
1826  end;
1827end;
1828
1829@ The following procedure, which prints out the decimal representation of a
1830given integer |n|, has been written carefully so that it works properly
1831if |n=0| or if |(-n)| would cause overflow. It does not apply |mod| or |div|
1832to negative arguments, since such operations are not implemented consistently
1833by all \PASCAL\ compilers.
1834
1835@<Basic print...@>=
1836procedure print_int(@!n:integer); {prints an integer in decimal form}
1837var k:0..23; {index to current digit; we assume that $|n|<10^{23}$}
1838@!m:integer; {used to negate |n| in possibly dangerous cases}
1839begin k:=0;
1840if n<0 then
1841  begin print_char("-");
1842  if n>-100000000 then negate(n)
1843  else  begin m:=-1-n; n:=m div 10; m:=(m mod 10)+1; k:=1;
1844    if m<10 then dig[0]:=m
1845    else  begin dig[0]:=0; incr(n);
1846      end;
1847    end;
1848  end;
1849repeat dig[k]:=n mod 10; n:=n div 10; incr(k);
1850until n=0;
1851print_the_digs(k);
1852end;
1853
1854@ Here is a trivial procedure to print two digits; it is usually called with
1855a parameter in the range |0<=n<=99|.
1856
1857@p procedure print_two(@!n:integer); {prints two least significant digits}
1858begin n:=abs(n) mod 100; print_char("0"+(n div 10));
1859print_char("0"+(n mod 10));
1860end;
1861
1862@ Hexadecimal printing of nonnegative integers is accomplished by |print_hex|.
1863
1864@p procedure print_hex(@!n:integer);
1865  {prints a positive integer in hexadecimal form}
1866var k:0..22; {index to current digit; we assume that $0\L n<16^{22}$}
1867begin k:=0; print_char("""");
1868repeat dig[k]:=n mod 16; n:=n div 16; incr(k);
1869until n=0;
1870print_the_digs(k);
1871end;
1872
1873@ Old versions of \TeX\ needed a procedure called |print_ASCII| whose function
1874is now subsumed by |print|. We retain the old name here as a possible aid to
1875future software arch\ae ologists.
1876
1877@d print_ASCII == print
1878
1879@ Roman numerals are produced by the |print_roman_int| routine.  Readers
1880who like puzzles might enjoy trying to figure out how this tricky code
1881works; therefore no explanation will be given. Notice that 1990 yields
1882\.{mcmxc}, not \.{mxm}.
1883
1884@p procedure print_roman_int(@!n:integer);
1885label exit;
1886var j,@!k: pool_pointer; {mysterious indices into |str_pool|}
1887@!u,@!v: nonnegative_integer; {mysterious numbers}
1888begin j:=str_start_macro("m2d5c2l5x2v5i"); v:=1000;
1889loop@+  begin while n>=v do
1890    begin print_char(so(str_pool[j])); n:=n-v;
1891    end;
1892  if n<=0 then return; {nonpositive input produces no output}
1893  k:=j+2; u:=v div (so(str_pool[k-1])-"0");
1894  if str_pool[k-1]=si("2") then
1895    begin k:=k+2; u:=u div (so(str_pool[k-1])-"0");
1896    end;
1897  if n+u>=v then
1898    begin print_char(so(str_pool[k])); n:=n+u;
1899    end
1900  else  begin j:=j+2; v:=v div (so(str_pool[j-1])-"0");
1901    end;
1902  end;
1903exit:end;
1904
1905@ The |print| subroutine will not print a string that is still being
1906created. The following procedure will.
1907
1908@p procedure print_current_string; {prints a yet-unmade string}
1909var j:pool_pointer; {points to current character code}
1910begin j:=str_start_macro(str_ptr);
1911while j<pool_ptr do
1912  begin print_char(so(str_pool[j])); incr(j);
1913  end;
1914end;
1915
1916@ Here is a procedure that asks the user to type a line of input,
1917assuming that the |selector| setting is either |term_only| or |term_and_log|.
1918The input is placed into locations |first| through |last-1| of the
1919|buffer| array, and echoed on the transcript file if appropriate.
1920
1921This procedure is never called when |interaction<scroll_mode|.
1922
1923@d prompt_input(#)==begin wake_up_terminal; print(#); term_input;
1924    end {prints a string and gets a line of input}
1925
1926@p procedure term_input; {gets a line from the terminal}
1927var k:0..buf_size; {index into |buffer|}
1928begin update_terminal; {now the user sees the prompt for sure}
1929if not input_ln(term_in,true) then fatal_error("End of file on the terminal!");
1930@.End of file on the terminal@>
1931term_offset:=0; {the user's line ended with \<\rm return>}
1932decr(selector); {prepare to echo the input}
1933if last<>first then for k:=first to last-1 do print(buffer[k]);
1934print_ln; incr(selector); {restore previous status}
1935end;
1936
1937@* \[6] Reporting errors.
1938When something anomalous is detected, \TeX\ typically does something like this:
1939$$\vbox{\halign{#\hfil\cr
1940|print_err("Something anomalous has been detected");|\cr
1941|help3("This is the first line of my offer to help.")|\cr
1942|("This is the second line. I'm trying to")|\cr
1943|("explain the best way for you to proceed.");|\cr
1944|error;|\cr}}$$
1945A two-line help message would be given using |help2|, etc.; these informal
1946helps should use simple vocabulary that complements the words used in the
1947official error message that was printed. (Outside the U.S.A., the help
1948messages should preferably be translated into the local vernacular. Each
1949line of help is at most 60 characters long, in the present implementation,
1950so that |max_print_line| will not be exceeded.)
1951
1952The |print_err| procedure supplies a `\.!' before the official message,
1953and makes sure that the terminal is awake if a stop is going to occur.
1954The |error| procedure supplies a `\..' after the official message, then it
1955shows the location of the error; and if |interaction=error_stop_mode|,
1956it also enters into a dialog with the user, during which time the help
1957message may be printed.
1958@^system dependencies@>
1959
1960@ The global variable |interaction| has four settings, representing increasing
1961amounts of user interaction:
1962
1963@d batch_mode=0 {omits all stops and omits terminal output}
1964@d nonstop_mode=1 {omits all stops}
1965@d scroll_mode=2 {omits error stops}
1966@d error_stop_mode=3 {stops at every opportunity to interact}
1967@d print_err(#)==begin if interaction=error_stop_mode then wake_up_terminal;
1968  print_nl("! "); print(#);
1969  end
1970
1971@<Glob...@>=
1972@!interaction:batch_mode..error_stop_mode; {current level of interaction}
1973
1974@ @<Set init...@>=interaction:=error_stop_mode;
1975
1976@ \TeX\ is careful not to call |error| when the print |selector| setting
1977might be unusual. The only possible values of |selector| at the time of
1978error messages are
1979
1980\yskip\hang|no_print| (when |interaction=batch_mode|
1981  and |log_file| not yet open);
1982
1983\hang|term_only| (when |interaction>batch_mode| and |log_file| not yet open);
1984
1985\hang|log_only| (when |interaction=batch_mode| and |log_file| is open);
1986
1987\hang|term_and_log| (when |interaction>batch_mode| and |log_file| is open).
1988
1989@<Initialize the print |selector| based on |interaction|@>=
1990if interaction=batch_mode then selector:=no_print@+else selector:=term_only
1991
1992@ A global variable |deletions_allowed| is set |false| if the |get_next|
1993routine is active when |error| is called; this ensures that |get_next|
1994and related routines like |get_token| will never be called recursively.
1995A similar interlock is provided by |set_box_allowed|.
1996@^recursion@>
1997
1998The global variable |history| records the worst level of error that
1999has been detected. It has four possible values: |spotless|, |warning_issued|,
2000|error_message_issued|, and |fatal_error_stop|.
2001
2002Another global variable, |error_count|, is increased by one when an
2003|error| occurs without an interactive dialog, and it is reset to zero at
2004the end of every paragraph.  If |error_count| reaches 100, \TeX\ decides
2005that there is no point in continuing further.
2006
2007@d spotless=0 {|history| value when nothing has been amiss yet}
2008@d warning_issued=1 {|history| value when |begin_diagnostic| has been called}
2009@d error_message_issued=2 {|history| value when |error| has been called}
2010@d fatal_error_stop=3 {|history| value when termination was premature}
2011
2012@<Glob...@>=
2013@!deletions_allowed:boolean; {is it safe for |error| to call |get_token|?}
2014@!set_box_allowed:boolean; {is it safe to do a \.{\\setbox} assignment?}
2015@!history:spotless..fatal_error_stop; {has the source input been clean so far?}
2016@!error_count:-1..100; {the number of scrolled errors since the
2017  last paragraph ended}
2018
2019@ The value of |history| is initially |fatal_error_stop|, but it will
2020be changed to |spotless| if \TeX\ survives the initialization process.
2021
2022@<Set init...@>=
2023deletions_allowed:=true; set_box_allowed:=true;
2024error_count:=0; {|history| is initialized elsewhere}
2025
2026@ Since errors can be detected almost anywhere in \TeX, we want to declare the
2027error procedures near the beginning of the program. But the error procedures
2028in turn use some other procedures, which need to be declared |forward|
2029before we get to |error| itself.
2030
2031It is possible for |error| to be called recursively if some error arises
2032when |get_token| is being used to delete a token, and/or if some fatal error
2033occurs while \TeX\ is trying to fix a non-fatal one. But such recursion
2034@^recursion@>
2035is never more than two levels deep.
2036
2037@<Error handling...@>=
2038procedure@?normalize_selector; forward;@t\2@>@/
2039procedure@?get_token; forward;@t\2@>@/
2040procedure@?term_input; forward;@t\2@>@/
2041procedure@?show_context; forward;@t\2@>@/
2042procedure@?begin_file_reading; forward;@t\2@>@/
2043procedure@?open_log_file; forward;@t\2@>@/
2044procedure@?close_files_and_terminate; forward;@t\2@>@/
2045procedure@?clear_for_error_prompt; forward;@t\2@>@/
2046procedure@?give_err_help; forward;@t\2@>@/
2047@t\4\hskip-\fontdimen2\font@>@;@+@!debug@+procedure@?debug_help;
2048  forward;@;@+gubed
2049
2050@ Individual lines of help are recorded in the array |help_line|, which
2051contains entries in positions |0..(help_ptr-1)|. They should be printed
2052in reverse order, i.e., with |help_line[0]| appearing last.
2053
2054@d hlp1(#)==help_line[0]:=#;@+end
2055@d hlp2(#)==help_line[1]:=#; hlp1
2056@d hlp3(#)==help_line[2]:=#; hlp2
2057@d hlp4(#)==help_line[3]:=#; hlp3
2058@d hlp5(#)==help_line[4]:=#; hlp4
2059@d hlp6(#)==help_line[5]:=#; hlp5
2060@d help0==help_ptr:=0 {sometimes there might be no help}
2061@d help1==@+begin help_ptr:=1; hlp1 {use this with one help line}
2062@d help2==@+begin help_ptr:=2; hlp2 {use this with two help lines}
2063@d help3==@+begin help_ptr:=3; hlp3 {use this with three help lines}
2064@d help4==@+begin help_ptr:=4; hlp4 {use this with four help lines}
2065@d help5==@+begin help_ptr:=5; hlp5 {use this with five help lines}
2066@d help6==@+begin help_ptr:=6; hlp6 {use this with six help lines}
2067
2068@<Glob...@>=
2069@!help_line:array[0..5] of str_number; {helps for the next |error|}
2070@!help_ptr:0..6; {the number of help lines present}
2071@!use_err_help:boolean; {should the |err_help| list be shown?}
2072
2073@ @<Set init...@>=
2074help_ptr:=0; use_err_help:=false;
2075
2076@ The |jump_out| procedure just cuts across all active procedure levels and
2077goes to |end_of_TEX|. This is the only nontrivial |@!goto| statement in the
2078whole program. It is used when there is no recovery from a particular error.
2079
2080Some \PASCAL\ compilers do not implement non-local |goto| statements.
2081@^system dependencies@>
2082In such cases the body of |jump_out| should simply be
2083`|close_files_and_terminate|;\thinspace' followed by a call on some system
2084procedure that quietly terminates the program.
2085
2086@<Error hand...@>=
2087procedure jump_out;
2088begin goto end_of_TEX;
2089end;
2090
2091@ Here now is the general |error| routine.
2092
2093@<Error hand...@>=
2094procedure error; {completes the job of error reporting}
2095label continue,exit;
2096var c:ASCII_code; {what the user types}
2097@!s1,@!s2,@!s3,@!s4:integer;
2098  {used to save global variables when deleting tokens}
2099begin if history<error_message_issued then history:=error_message_issued;
2100print_char("."); show_context;
2101if interaction=error_stop_mode then @<Get user's advice and |return|@>;
2102incr(error_count);
2103if error_count=100 then
2104  begin print_nl("(That makes 100 errors; please try again.)");
2105@.That makes 100 errors...@>
2106  history:=fatal_error_stop; jump_out;
2107  end;
2108@<Put help message on the transcript file@>;
2109exit:end;
2110
2111@ @<Get user's advice...@>=
2112loop@+begin continue: clear_for_error_prompt; prompt_input("? ");
2113@.?\relax@>
2114  if last=first then return;
2115  c:=buffer[first];
2116  if c>="a" then c:=c+"A"-"a"; {convert to uppercase}
2117  @<Interpret code |c| and |return| if done@>;
2118  end
2119
2120@ It is desirable to provide an `\.E' option here that gives the user
2121an easy way to return from \TeX\ to the system editor, with the offending
2122line ready to be edited. But such an extension requires some system
2123wizardry, so the present implementation simply types out the name of the
2124file that should be
2125edited and the relevant line number.
2126@^system dependencies@>
2127
2128There is a secret `\.D' option available when the debugging routines haven't
2129been commented~out.
2130@^debugging@>
2131
2132@<Interpret code |c| and |return| if done@>=
2133case c of
2134"0","1","2","3","4","5","6","7","8","9": if deletions_allowed then
2135  @<Delete \(c)|c-"0"| tokens and |goto continue|@>;
2136@t\4\4@>@;@+@!debug "D": begin debug_help; goto continue;@+end;@+gubed@/
2137"E": if base_ptr>0 then
2138  begin print_nl("You want to edit file ");
2139@.You want to edit file x@>
2140  slow_print(input_stack[base_ptr].name_field);
2141  print(" at line "); print_int(line);
2142  interaction:=scroll_mode; jump_out;
2143  end;
2144"H": @<Print the help information and |goto continue|@>;
2145"I":@<Introduce new material from the terminal and |return|@>;
2146"Q","R","S":@<Change the interaction level and |return|@>;
2147"X":begin interaction:=scroll_mode; jump_out;
2148  end;
2149othercases do_nothing
2150endcases;@/
2151@<Print the menu of available options@>
2152
2153@ @<Print the menu...@>=
2154begin print("Type <return> to proceed, S to scroll future error messages,");@/
2155@.Type <return> to proceed...@>
2156print_nl("R to run without stopping, Q to run quietly,");@/
2157print_nl("I to insert something, ");
2158if base_ptr>0 then print("E to edit your file,");
2159if deletions_allowed then
2160  print_nl("1 or ... or 9 to ignore the next 1 to 9 tokens of input,");
2161print_nl("H for help, X to quit.");
2162end
2163
2164@ Here the author of \TeX\ apologizes for making use of the numerical
2165relation between |"Q"|, |"R"|, |"S"|, and the desired interaction settings
2166|batch_mode|, |nonstop_mode|, |scroll_mode|.
2167@^Knuth, Donald Ervin@>
2168
2169@<Change the interaction...@>=
2170begin error_count:=0; interaction:=batch_mode+c-"Q";
2171print("OK, entering ");
2172case c of
2173"Q":begin print_esc("batchmode"); decr(selector);
2174  end;
2175"R":print_esc("nonstopmode");
2176"S":print_esc("scrollmode");
2177end; {there are no other cases}
2178print("..."); print_ln; update_terminal; return;
2179end
2180
2181@ When the following code is executed, |buffer[(first+1)..(last-1)]| may
2182contain the material inserted by the user; otherwise another prompt will
2183be given. In order to understand this part of the program fully, you need
2184to be familiar with \TeX's input stacks.
2185
2186@<Introduce new material...@>=
2187begin begin_file_reading; {enter a new syntactic level for terminal input}
2188{now |state=mid_line|, so an initial blank space will count as a blank}
2189if last>first+1 then
2190  begin loc:=first+1; buffer[first]:=" ";
2191  end
2192else  begin prompt_input("insert>"); loc:=first;
2193@.insert>@>
2194  end;
2195first:=last;
2196cur_input.limit_field:=last-1; {no |end_line_char| ends this line}
2197return;
2198end
2199
2200@ We allow deletion of up to 99 tokens at a time.
2201
2202@<Delete \(c)|c-"0"| tokens...@>=
2203begin s1:=cur_tok; s2:=cur_cmd; s3:=cur_chr; s4:=align_state;
2204align_state:=1000000; OK_to_interrupt:=false;
2205if (last>first+1) and (buffer[first+1]>="0")and(buffer[first+1]<="9") then
2206  c:=c*10+buffer[first+1]-"0"*11
2207else c:=c-"0";
2208while c>0 do
2209  begin get_token; {one-level recursive call of |error| is possible}
2210  decr(c);
2211  end;
2212cur_tok:=s1; cur_cmd:=s2; cur_chr:=s3; align_state:=s4; OK_to_interrupt:=true;
2213help2("I have just deleted some text, as you asked.")@/
2214("You can now delete more, or insert, or whatever.");
2215show_context; goto continue;
2216end
2217
2218@ @<Print the help info...@>=
2219begin if use_err_help then
2220  begin give_err_help; use_err_help:=false;
2221  end
2222else  begin if help_ptr=0 then
2223    help2("Sorry, I don't know how to help in this situation.")@/
2224    @t\kern1em@>("Maybe you should try asking a human?");
2225  repeat decr(help_ptr); print(help_line[help_ptr]); print_ln;
2226  until help_ptr=0;
2227  end;
2228help4("Sorry, I already gave what help I could...")@/
2229  ("Maybe you should try asking a human?")@/
2230  ("An error might have occurred before I noticed any problems.")@/
2231  ("``If all else fails, read the instructions.''");@/
2232goto continue;
2233end
2234
2235@ @<Put help message on the transcript file@>=
2236if interaction>batch_mode then decr(selector); {avoid terminal output}
2237if use_err_help then
2238  begin print_ln; give_err_help;
2239  end
2240else while help_ptr>0 do
2241  begin decr(help_ptr); print_nl(help_line[help_ptr]);
2242  end;
2243print_ln;
2244if interaction>batch_mode then incr(selector); {re-enable terminal output}
2245print_ln
2246
2247@ A dozen or so error messages end with a parenthesized integer, so we
2248save a teeny bit of program space by declaring the following procedure:
2249
2250@p procedure int_error(@!n:integer);
2251begin print(" ("); print_int(n); print_char(")"); error;
2252end;
2253
2254@ In anomalous cases, the print selector might be in an unknown state;
2255the following subroutine is called to fix things just enough to keep
2256running a bit longer.
2257
2258@p procedure normalize_selector;
2259begin if log_opened then selector:=term_and_log
2260else selector:=term_only;
2261if job_name=0 then open_log_file;
2262if interaction=batch_mode then decr(selector);
2263end;
2264
2265@ The following procedure prints \TeX's last words before dying.
2266
2267@d succumb==begin if interaction=error_stop_mode then
2268    interaction:=scroll_mode; {no more interaction}
2269  if log_opened then error;
2270  @!debug if interaction>batch_mode then debug_help;@+gubed@;@/
2271  history:=fatal_error_stop; jump_out; {irrecoverable error}
2272  end
2273
2274@<Error hand...@>=
2275procedure fatal_error(@!s:str_number); {prints |s|, and that's it}
2276begin normalize_selector;@/
2277print_err("Emergency stop"); help1(s); succumb;
2278@.Emergency stop@>
2279end;
2280
2281@ Here is the most dreaded error message.
2282
2283@<Error hand...@>=
2284procedure overflow(@!s:str_number;@!n:integer); {stop due to finiteness}
2285begin normalize_selector;
2286print_err("TeX capacity exceeded, sorry [");
2287@.TeX capacity exceeded ...@>
2288print(s); print_char("="); print_int(n); print_char("]");
2289help2("If you really absolutely need more capacity,")@/
2290  ("you can ask a wizard to enlarge me.");
2291succumb;
2292end;
2293
2294@ The program might sometime run completely amok, at which point there is
2295no choice but to stop. If no previous error has been detected, that's bad
2296news; a message is printed that is really intended for the \TeX\
2297maintenance person instead of the user (unless the user has been
2298particularly diabolical).  The index entries for `this can't happen' may
2299help to pinpoint the problem.
2300@^dry rot@>
2301
2302@<Error hand...@>=
2303procedure confusion(@!s:str_number);
2304  {consistency check violated; |s| tells where}
2305begin normalize_selector;
2306if history<error_message_issued then
2307  begin print_err("This can't happen ("); print(s); print_char(")");
2308@.This can't happen@>
2309  help1("I'm broken. Please show this to someone who can fix can fix");
2310  end
2311else  begin print_err("I can't go on meeting you like this");
2312@.I can't go on...@>
2313  help2("One of your faux pas seems to have wounded me deeply...")@/
2314    ("in fact, I'm barely conscious. Please fix it and try again.");
2315  end;
2316succumb;
2317end;
2318
2319@ Users occasionally want to interrupt \TeX\ while it's running.
2320If the \PASCAL\ runtime system allows this, one can implement
2321a routine that sets the global variable |interrupt| to some nonzero value
2322when such an interrupt is signalled. Otherwise there is probably at least
2323a way to make |interrupt| nonzero using the \PASCAL\ debugger.
2324@^system dependencies@>
2325@^debugging@>
2326
2327@d check_interrupt==begin if interrupt<>0 then pause_for_instructions;
2328  end
2329
2330@<Global...@>=
2331@!interrupt:integer; {should \TeX\ pause for instructions?}
2332@!OK_to_interrupt:boolean; {should interrupts be observed?}
2333
2334@ @<Set init...@>=
2335interrupt:=0; OK_to_interrupt:=true;
2336
2337@ When an interrupt has been detected, the program goes into its
2338highest interaction level and lets the user have nearly the full flexibility of
2339the |error| routine.  \TeX\ checks for interrupts only at times when it is
2340safe to do this.
2341
2342@p procedure pause_for_instructions;
2343begin if OK_to_interrupt then
2344  begin interaction:=error_stop_mode;
2345  if (selector=log_only)or(selector=no_print) then
2346    incr(selector);
2347  print_err("Interruption");
2348@.Interruption@>
2349  help3("You rang?")@/
2350  ("Try to insert some instructions for me (e.g.,`I\showlists'),")@/
2351  ("unless you just want to quit by typing `X'.");
2352  deletions_allowed:=false; error; deletions_allowed:=true;
2353  interrupt:=0;
2354  end;
2355end;
2356
2357@* \[7] Arithmetic with scaled dimensions.
2358The principal computations performed by \TeX\ are done entirely in terms of
2359integers less than $2^{31}$ in magnitude; and divisions are done only when both
2360dividend and divisor are nonnegative. Thus, the arithmetic specified in this
2361program can be carried out in exactly the same way on a wide variety of
2362computers, including some small ones. Why? Because the arithmetic
2363calculations need to be spelled out precisely in order to guarantee that
2364\TeX\ will produce identical output on different machines. If some
2365quantities were rounded differently in different implementations, we would
2366find that line breaks and even page breaks might occur in different places.
2367Hence the arithmetic of \TeX\ has been designed with care, and systems that
2368claim to be implementations of \TeX82 should follow precisely the
2369@:TeX82}{\TeX82@>
2370calculations as they appear in the present program.
2371
2372(Actually there are three places where \TeX\ uses |div| with a possibly negative
2373numerator. These are harmless; see |div| in the index. Also if the user
2374sets the \.{\\time} or the \.{\\year} to a negative value, some diagnostic
2375information will involve negative-numerator division. The same remarks
2376apply for |mod| as well as for |div|.)
2377
2378@ Here is a routine that calculates half of an integer, using an
2379unambiguous convention with respect to signed odd numbers.
2380
2381@p function half(@!x:integer):integer;
2382begin if odd(x) then half:=(x+1) div 2
2383else half:=x @!div 2;
2384end;
2385
2386@ Fixed-point arithmetic is done on {\sl scaled integers\/} that are multiples
2387of $2^{-16}$. In other words, a binary point is assumed to be sixteen bit
2388positions from the right end of a binary computer word.
2389
2390@d unity == @'200000 {$2^{16}$, represents 1.00000}
2391@d two == @'400000 {$2^{17}$, represents 2.00000}
2392
2393@<Types...@>=
2394@!scaled = integer; {this type is used for scaled integers}
2395@!nonnegative_integer=0..@'17777777777; {$0\L x<2^{31}$}
2396@!small_number=0..63; {this type is self-explanatory}
2397
2398@ The following function is used to create a scaled integer from a given decimal
2399fraction $(.d_0d_1\ldots d_{k-1})$, where |0<=k<=17|. The digit $d_i$ is
2400given in |dig[i]|, and the calculation produces a correctly rounded result.
2401
2402@p function round_decimals(@!k:small_number) : scaled;
2403  {converts a decimal fraction}
2404var a:integer; {the accumulator}
2405begin a:=0;
2406while k>0 do
2407  begin decr(k); a:=(a+dig[k]*two) div 10;
2408  end;
2409round_decimals:=(a+1) div 2;
2410end;
2411
2412@ Conversely, here is a procedure analogous to |print_int|. If the output
2413of this procedure is subsequently read by \TeX\ and converted by the
2414|round_decimals| routine above, it turns out that the original value will
2415be reproduced exactly; the ``simplest'' such decimal number is output,
2416but there is always at least one digit following the decimal point.
2417
2418The invariant relation in the \&{repeat} loop is that a sequence of
2419decimal digits yet to be printed will yield the original number if and only if
2420they form a fraction~$f$ in the range $s-\delta\L10\cdot2^{16}f<s$.
2421We can stop if and only if $f=0$ satisfies this condition; the loop will
2422terminate before $s$ can possibly become zero.
2423
2424@p procedure print_scaled(@!s:scaled); {prints scaled real, rounded to five
2425  digits}
2426var delta:scaled; {amount of allowable inaccuracy}
2427begin if s<0 then
2428  begin print_char("-"); negate(s); {print the sign, if negative}
2429  end;
2430print_int(s div unity); {print the integer part}
2431print_char(".");
2432s:=10*(s mod unity)+5; delta:=10;
2433repeat if delta>unity then s:=s+@'100000-50000; {round the last digit}
2434print_char("0"+(s div unity)); s:=10*(s mod unity); delta:=delta*10;
2435until s<=delta;
2436end;
2437
2438@ Physical sizes that a \TeX\ user specifies for portions of documents are
2439represented internally as scaled points. Thus, if we define an `sp' (scaled
2440@^sp@>
2441point) as a unit equal to $2^{-16}$ printer's points, every dimension
2442inside of \TeX\ is an integer number of sp. There are exactly
24434,736,286.72 sp per inch.  Users are not allowed to specify dimensions
2444larger than $2^{30}-1$ sp, which is a distance of about 18.892 feet (5.7583
2445meters); two such quantities can be added without overflow on a 32-bit
2446computer.
2447
2448The present implementation of \TeX\ does not check for overflow when
2449@^overflow in arithmetic@>
2450dimensions are added or subtracted. This could be done by inserting a
2451few dozen tests of the form `\ignorespaces|if x>=@'10000000000 then
2452@t\\{report\_overflow}@>|', but the chance of overflow is so remote that
2453such tests do not seem worthwhile.
2454
2455\TeX\ needs to do only a few arithmetic operations on scaled quantities,
2456other than addition and subtraction, and the following subroutines do most of
2457the work. A single computation might use several subroutine calls, and it is
2458desirable to avoid producing multiple error messages in case of arithmetic
2459overflow; so the routines set the global variable |arith_error| to |true|
2460instead of reporting errors directly to the user. Another global variable,
2461|remainder|, holds the remainder after a division.
2462
2463@<Glob...@>=
2464@!arith_error:boolean; {has arithmetic overflow occurred recently?}
2465@!remainder:scaled; {amount subtracted to get an exact division}
2466
2467@ The first arithmetical subroutine we need computes $nx+y$, where |x|
2468and~|y| are |scaled| and |n| is an integer. We will also use it to
2469multiply integers.
2470
2471@d nx_plus_y(#)==mult_and_add(#,@'7777777777)
2472@d mult_integers(#)==mult_and_add(#,0,@'17777777777)
2473
2474@p function mult_and_add(@!n:integer;@!x,@!y,@!max_answer:scaled):scaled;
2475begin if n<0 then
2476  begin negate(x); negate(n);
2477  end;
2478if n=0 then mult_and_add:=y
2479else if ((x<=(max_answer-y) div n)and(-x<=(max_answer+y) div n)) then
2480  mult_and_add:=n*x+y
2481else  begin arith_error:=true; mult_and_add:=0;
2482  end;
2483end;
2484
2485@ We also need to divide scaled dimensions by integers.
2486
2487@p function x_over_n(@!x:scaled;@!n:integer):scaled;
2488var negative:boolean; {should |remainder| be negated?}
2489begin negative:=false;
2490if n=0 then
2491  begin arith_error:=true; x_over_n:=0; remainder:=x;
2492  end
2493else  begin if n<0 then
2494    begin negate(x); negate(n); negative:=true;
2495    end;
2496  if x>=0 then
2497    begin x_over_n:=x div n; remainder:=x mod n;
2498    end
2499  else  begin x_over_n:=-((-x) div n); remainder:=-((-x) mod n);
2500    end;
2501  end;
2502if negative then negate(remainder);
2503end;
2504
2505@ Then comes the multiplication of a scaled number by a fraction |n/d|,
2506where |n| and |d| are nonnegative integers |<=@t$2^{16}$@>| and |d| is
2507positive. It would be too dangerous to multiply by~|n| and then divide
2508by~|d|, in separate operations, since overflow might well occur; and it
2509would be too inaccurate to divide by |d| and then multiply by |n|. Hence
2510this subroutine simulates 1.5-precision arithmetic.
2511
2512@p function xn_over_d(@!x:scaled; @!n,@!d:integer):scaled;
2513var positive:boolean; {was |x>=0|?}
2514@!t,@!u,@!v:nonnegative_integer; {intermediate quantities}
2515begin if x>=0 then positive:=true
2516else  begin negate(x); positive:=false;
2517  end;
2518t:=(x mod @'100000)*n;
2519u:=(x div @'100000)*n+(t div @'100000);
2520v:=(u mod d)*@'100000 + (t mod @'100000);
2521if u div d>=@'100000 then arith_error:=true
2522else u:=@'100000*(u div d) + (v div d);
2523if positive then
2524  begin xn_over_d:=u; remainder:=v mod d;
2525  end
2526else  begin xn_over_d:=-u; remainder:=-(v mod d);
2527  end;
2528end;
2529
2530@ The next subroutine is used to compute the ``badness'' of glue, when a
2531total~|t| is supposed to be made from amounts that sum to~|s|.  According
2532to {\sl The \TeX book}, the badness of this situation is $100(t/s)^3$;
2533however, badness is simply a heuristic, so we need not squeeze out the
2534last drop of accuracy when computing it. All we really want is an
2535approximation that has similar properties.
2536@:TeXbook}{\sl The \TeX book@>
2537
2538The actual method used to compute the badness is easier to read from the
2539program than to describe in words. It produces an integer value that is a
2540reasonably close approximation to $100(t/s)^3$, and all implementations
2541of \TeX\ should use precisely this method. Any badness of $2^{13}$ or more is
2542treated as infinitely bad, and represented by 10000.
2543
2544It is not difficult to prove that $$\hbox{|badness(t+1,s)>=badness(t,s)
2545>=badness(t,s+1)|}.$$ The badness function defined here is capable of
2546computing at most 1095 distinct values, but that is plenty.
2547
2548@d inf_bad = 10000 {infinitely bad value}
2549
2550@p function badness(@!t,@!s:scaled):halfword; {compute badness, given |t>=0|}
2551var r:integer; {approximation to $\alpha t/s$, where $\alpha^3\approx
2552  100\cdot2^{18}$}
2553begin if t=0 then badness:=0
2554else if s<=0 then badness:=inf_bad
2555else  begin if t<=7230584 then  r:=(t*297) div s {$297^3=99.94\times2^{18}$}
2556  else if s>=1663497 then r:=t div (s div 297)
2557  else r:=t;
2558  if r>1290 then badness:=inf_bad {$1290^3<2^{31}<1291^3$}
2559  else badness:=(r*r*r+@'400000) div @'1000000;
2560  end; {that was $r^3/2^{18}$, rounded to the nearest integer}
2561end;
2562
2563@ When \TeX\ ``packages'' a list into a box, it needs to calculate the
2564proportionality ratio by which the glue inside the box should stretch
2565or shrink. This calculation does not affect \TeX's decision making,
2566so the precise details of rounding, etc., in the glue calculation are not
2567of critical importance for the consistency of results on different computers.
2568
2569We shall use the type |glue_ratio| for such proportionality ratios.
2570A glue ratio should take the same amount of memory as an
2571|integer| (usually 32 bits) if it is to blend smoothly with \TeX's
2572other data structures. Thus |glue_ratio| should be equivalent to
2573|short_real| in some implementations of \PASCAL. Alternatively,
2574it is possible to deal with glue ratios using nothing but fixed-point
2575arithmetic; see {\sl TUGboat \bf3},1 (March 1982), 10--27. (But the
2576routines cited there must be modified to allow negative glue ratios.)
2577@^system dependencies@>
2578
2579@d set_glue_ratio_zero(#) == #:=0.0 {store the representation of zero ratio}
2580@d set_glue_ratio_one(#) == #:=1.0 {store the representation of unit ratio}
2581@d float(#) == # {convert from |glue_ratio| to type |real|}
2582@d unfloat(#) == # {convert from |real| to type |glue_ratio|}
2583@d float_constant(#) == #.0 {convert |integer| constant to |real|}
2584
2585@<Types...@>=
2586@!glue_ratio=real; {one-word representation of a glue expansion factor}
2587
2588@* \[8] Packed data.
2589In order to make efficient use of storage space, \TeX\ bases its major data
2590structures on a |memory_word|, which contains either a (signed) integer,
2591possibly scaled, or a (signed) |glue_ratio|, or a small number of
2592fields that are one half or one quarter of the size used for storing
2593integers.
2594
2595If |x| is a variable of type |memory_word|, it contains up to four
2596fields that can be referred to as follows:
2597$$\vbox{\halign{\hfil#&#\hfil&#\hfil\cr
2598|x|&.|int|&(an |integer|)\cr
2599|x|&.|sc|\qquad&(a |scaled| integer)\cr
2600|x|&.|gr|&(a |glue_ratio|)\cr
2601|x.hh.lh|, |x.hh|&.|rh|&(two halfword fields)\cr
2602|x.hh.b0|, |x.hh.b1|, |x.hh|&.|rh|&(two quarterword fields, one halfword
2603  field)\cr
2604|x.qqqq.b0|, |x.qqqq.b1|, |x.qqqq|&.|b2|, |x.qqqq.b3|\hskip-100pt
2605  &\qquad\qquad\qquad(four quarterword fields)\cr}}$$
2606This is somewhat cumbersome to write, and not very readable either, but
2607macros will be used to make the notation shorter and more transparent.
2608The \PASCAL\ code below gives a formal definition of |memory_word| and
2609its subsidiary types, using packed variant records. \TeX\ makes no
2610assumptions about the relative positions of the fields within a word.
2611
2612Since we are assuming 32-bit integers, a halfword must contain at least
261316 bits, and a quarterword must contain at least 8 bits.
2614@^system dependencies@>
2615But it doesn't hurt to have more bits; for example, with enough 36-bit
2616words you might be able to have |mem_max| as large as 262142, which is
2617eight times as much memory as anybody had during the first four years of
2618\TeX's existence.
2619
2620N.B.: Valuable memory space will be dreadfully wasted unless \TeX\ is compiled
2621by a \PASCAL\ that packs all of the |memory_word| variants into
2622the space of a single integer. This means, for example, that |glue_ratio|
2623words should be |short_real| instead of |real| on some computers. Some
2624\PASCAL\ compilers will pack an integer whose subrange is `|0..255|' into
2625an eight-bit field, but others insist on allocating space for an additional
2626sign bit; on such systems you can get 256 values into a quarterword only
2627if the subrange is `|-128..127|'.
2628
2629The present implementation tries to accommodate as many variations as possible,
2630so it makes few assumptions. If integers having the subrange
2631`|min_quarterword..max_quarterword|' can be packed into a quarterword,
2632and if integers having the subrange `|min_halfword..max_halfword|'
2633can be packed into a halfword, everything should work satisfactorily.
2634
2635It is usually most efficient to have |min_quarterword=min_halfword=0|,
2636so one should try to achieve this unless it causes a severe problem.
2637The values defined here are recommended for most 32-bit computers.
2638
2639@d min_quarterword=0 {smallest allowable value in a |quarterword|}
2640@d max_quarterword=@"FFFF {largest allowable value in a |quarterword|}
2641@d min_halfword==-@"FFFFFFF {smallest allowable value in a |halfword|}
2642@d max_halfword==@"3FFFFFFF {largest allowable value in a |halfword|}
2643
2644@ Here are the inequalities that the quarterword and halfword values
2645must satisfy (or rather, the inequalities that they mustn't satisfy):
2646
2647@<Check the ``constant''...@>=
2648init if (mem_min<>mem_bot)or(mem_max<>mem_top) then bad:=10;@+tini@;@/
2649if (mem_min>mem_bot)or(mem_max<mem_top) then bad:=10;
2650if (min_quarterword>0)or(max_quarterword<@"7FFF) then bad:=11;
2651if (min_halfword>0)or(max_halfword<@"3FFFFFFF) then bad:=12;
2652if (min_quarterword<min_halfword)or@|
2653  (max_quarterword>max_halfword) then bad:=13;
2654if (mem_min<min_halfword)or(mem_max>=max_halfword)or@|
2655  (mem_bot-mem_min>max_halfword+1) then bad:=14;
2656if (font_base<min_quarterword)or(font_max>max_quarterword) then bad:=15;
2657if font_max>font_base+256 then bad:=16;
2658if (save_size>max_halfword)or(max_strings>max_halfword) then bad:=17;
2659if buf_size>max_halfword then bad:=18;
2660if max_quarterword-min_quarterword<@"FFFF then bad:=19;
2661
2662@ The operation of adding or subtracting |min_quarterword| occurs quite
2663frequently in \TeX, so it is convenient to abbreviate this operation
2664by using the macros |qi| and |qo| for input and output to and from
2665quarterword format.
2666
2667The inner loop of \TeX\ will run faster with respect to compilers
2668that don't optimize expressions like `|x+0|' and `|x-0|', if these
2669macros are simplified in the obvious way when |min_quarterword=0|.
2670@^inner loop@>@^system dependencies@>
2671
2672@d qi(#)==#+min_quarterword
2673  {to put an |eight_bits| item into a quarterword}
2674@d qo(#)==#-min_quarterword
2675  {to take an |eight_bits| item out of a quarterword}
2676@d hi(#)==#+min_halfword
2677  {to put a sixteen-bit item into a halfword}
2678@d ho(#)==#-min_halfword
2679  {to take a sixteen-bit item from a halfword}
2680
2681@ The reader should study the following definitions closely:
2682@^system dependencies@>
2683
2684@d sc==int {|scaled| data is equivalent to |integer|}
2685
2686@<Types...@>=
2687@!quarterword = min_quarterword..max_quarterword; {1/4 of a word}
2688@!halfword=min_halfword..max_halfword; {1/2 of a word}
2689@!two_choices = 1..2; {used when there are two variants in a record}
2690@!four_choices = 1..4; {used when there are four variants in a record}
2691@!two_halves = packed record@;@/
2692  @!rh:halfword;
2693  case two_choices of
2694  1: (@!lh:halfword);
2695  2: (@!b0:quarterword; @!b1:quarterword);
2696  end;
2697@!four_quarters = packed record@;@/
2698  @!b0:quarterword;
2699  @!b1:quarterword;
2700  @!b2:quarterword;
2701  @!b3:quarterword;
2702  end;
2703@!memory_word = record@;@/
2704  case four_choices of
2705  1: (@!int:integer);
2706  2: (@!gr:glue_ratio);
2707  3: (@!hh:two_halves);
2708  4: (@!qqqq:four_quarters);
2709  end;
2710@!word_file = gzFile;
2711
2712@ When debugging, we may want to print a |memory_word| without knowing
2713what type it is; so we print it in all modes.
2714@^dirty \PASCAL@>@^debugging@>
2715
2716@p @!debug procedure print_word(@!w:memory_word);
2717  {prints |w| in all ways}
2718begin print_int(w.int); print_char(" ");@/
2719print_scaled(w.sc); print_char(" ");@/
2720print_scaled(round(unity*float(w.gr))); print_ln;@/
2721@^real multiplication@>
2722print_int(w.hh.lh); print_char("="); print_int(w.hh.b0); print_char(":");
2723print_int(w.hh.b1); print_char(";"); print_int(w.hh.rh); print_char(" ");@/
2724print_int(w.qqqq.b0); print_char(":"); print_int(w.qqqq.b1); print_char(":");
2725print_int(w.qqqq.b2); print_char(":"); print_int(w.qqqq.b3);
2726end;
2727gubed
2728
2729@* \[9] Dynamic memory allocation.
2730The \TeX\ system does nearly all of its own memory allocation, so that it
2731can readily be transported into environments that do not have automatic
2732facilities for strings, garbage collection, etc., and so that it can be in
2733control of what error messages the user receives. The dynamic storage
2734requirements of \TeX\ are handled by providing a large array |mem| in
2735which consecutive blocks of words are used as nodes by the \TeX\ routines.
2736
2737Pointer variables are indices into this array, or into another array
2738called |eqtb| that will be explained later. A pointer variable might
2739also be a special flag that lies outside the bounds of |mem|, so we
2740allow pointers to assume any |halfword| value. The minimum halfword
2741value represents a null pointer. \TeX\ does not assume that |mem[null]| exists.
2742
2743@d pointer==halfword {a flag or a location in |mem| or |eqtb|}
2744@d null==min_halfword {the null pointer}
2745
2746@<Glob...@>=
2747@!temp_ptr:pointer; {a pointer variable for occasional emergency use}
2748
2749@ The |mem| array is divided into two regions that are allocated separately,
2750but the dividing line between these two regions is not fixed; they grow
2751together until finding their ``natural'' size in a particular job.
2752Locations less than or equal to |lo_mem_max| are used for storing
2753variable-length records consisting of two or more words each. This region
2754is maintained using an algorithm similar to the one described in exercise
27552.5--19 of {\sl The Art of Computer Programming}. However, no size field
2756appears in the allocated nodes; the program is responsible for knowing the
2757relevant size when a node is freed. Locations greater than or equal to
2758|hi_mem_min| are used for storing one-word records; a conventional
2759\.{AVAIL} stack is used for allocation in this region.
2760
2761Locations of |mem| between |mem_bot| and |mem_top| may be dumped as part
2762of preloaded format files, by the \.{INITEX} preprocessor.
2763@.INITEX@>
2764Production versions of \TeX\ may extend the memory at both ends in order to
2765provide more space; locations between |mem_min| and |mem_bot| are always
2766used for variable-size nodes, and locations between |mem_top| and |mem_max|
2767are always used for single-word nodes.
2768
2769The key pointers that govern |mem| allocation have a prescribed order:
2770$$\advance\thickmuskip-2mu
2771\hbox{|null<=mem_min<=mem_bot<lo_mem_max<
2772  hi_mem_min<mem_top<=mem_end<=mem_max|.}$$
2773
2774Empirical tests show that the present implementation of \TeX\ tends to
2775spend about 9\pct! of its running time allocating nodes, and about 6\pct!
2776deallocating them after their use.
2777
2778@<Glob...@>=
2779@!mem : array[mem_min..mem_max] of memory_word; {the big dynamic storage area}
2780@!lo_mem_max : pointer; {the largest location of variable-size memory in use}
2781@!hi_mem_min : pointer; {the smallest location of one-word memory in use}
2782
2783@ In order to study the memory requirements of particular applications, it
2784is possible to prepare a version of \TeX\ that keeps track of current and
2785maximum memory usage. When code between the delimiters |@!stat| $\ldots$
2786|tats| is not ``commented out,'' \TeX\ will run a bit slower but it will
2787report these statistics when |tracing_stats| is sufficiently large.
2788
2789@<Glob...@>=
2790@!var_used, @!dyn_used : integer; {how much memory is in use}
2791
2792@ Let's consider the one-word memory region first, since it's the
2793simplest. The pointer variable |mem_end| holds the highest-numbered location
2794of |mem| that has ever been used. The free locations of |mem| that
2795occur between |hi_mem_min| and |mem_end|, inclusive, are of type
2796|two_halves|, and we write |info(p)| and |link(p)| for the |lh|
2797and |rh| fields of |mem[p]| when it is of this type. The single-word
2798free locations form a linked list
2799$$|avail|,\;\hbox{|link(avail)|},\;\hbox{|link(link(avail))|},\;\ldots$$
2800terminated by |null|.
2801
2802@d link(#) == mem[#].hh.rh {the |link| field of a memory word}
2803@d info(#) == mem[#].hh.lh {the |info| field of a memory word}
2804
2805@<Glob...@>=
2806@!avail : pointer; {head of the list of available one-word nodes}
2807@!mem_end : pointer; {the last one-word node used in |mem|}
2808
2809@ If memory is exhausted, it might mean that the user has forgotten
2810a right brace. We will define some procedures later that try to help
2811pinpoint the trouble.
2812
2813@p @<Declare the procedure called |show_token_list|@>@/
2814@<Declare the procedure called |runaway|@>
2815
2816@ The function |get_avail| returns a pointer to a new one-word node whose
2817|link| field is null. However, \TeX\ will halt if there is no more room left.
2818@^inner loop@>
2819
2820If the available-space list is empty, i.e., if |avail=null|,
2821we try first to increase |mem_end|. If that cannot be done, i.e., if
2822|mem_end=mem_max|, we try to decrease |hi_mem_min|. If that cannot be
2823done, i.e., if |hi_mem_min=lo_mem_max+1|, we have to quit.
2824
2825@p function get_avail : pointer; {single-word node allocation}
2826var p:pointer; {the new node being got}
2827begin p:=avail; {get top location in the |avail| stack}
2828if p<>null then avail:=link(avail) {and pop it off}
2829else if mem_end<mem_max then {or go into virgin territory}
2830  begin incr(mem_end); p:=mem_end;
2831  end
2832else   begin decr(hi_mem_min); p:=hi_mem_min;
2833  if hi_mem_min<=lo_mem_max then
2834    begin runaway; {if memory is exhausted, display possible runaway text}
2835    overflow("main memory size",mem_max+1-mem_min);
2836      {quit; all one-word nodes are busy}
2837@:TeX capacity exceeded main memory size}{\quad main memory size@>
2838    end;
2839  end;
2840link(p):=null; {provide an oft-desired initialization of the new node}
2841@!stat incr(dyn_used);@+tats@;{maintain statistics}
2842get_avail:=p;
2843end;
2844
2845@ Conversely, a one-word node is recycled by calling |free_avail|.
2846This routine is part of \TeX's ``inner loop,'' so we want it to be fast.
2847@^inner loop@>
2848
2849@d free_avail(#)== {single-word node liberation}
2850  begin link(#):=avail; avail:=#;
2851  @!stat decr(dyn_used);@+tats@/
2852  end
2853
2854@ There's also a |fast_get_avail| routine, which saves the procedure-call
2855overhead at the expense of extra programming. This routine is used in
2856the places that would otherwise account for the most calls of |get_avail|.
2857@^inner loop@>
2858
2859@d fast_get_avail(#)==@t@>@;@/
2860  begin #:=avail; {avoid |get_avail| if possible, to save time}
2861  if #=null then #:=get_avail
2862  else  begin avail:=link(#); link(#):=null;
2863    @!stat incr(dyn_used);@+tats@/
2864    end;
2865  end
2866
2867@ The procedure |flush_list(p)| frees an entire linked list of
2868one-word nodes that starts at position |p|.
2869@^inner loop@>
2870
2871@p procedure flush_list(@!p:pointer); {makes list of single-word nodes
2872  available}
2873var @!q,@!r:pointer; {list traversers}
2874begin if p<>null then
2875  begin r:=p;
2876  repeat q:=r; r:=link(r); @!stat decr(dyn_used);@+tats@/
2877  until r=null; {now |q| is the last node on the list}
2878  link(q):=avail; avail:=p;
2879  end;
2880end;
2881
2882@ The available-space list that keeps track of the variable-size portion
2883of |mem| is a nonempty, doubly-linked circular list of empty nodes,
2884pointed to by the roving pointer |rover|.
2885
2886Each empty node has size 2 or more; the first word contains the special
2887value |max_halfword| in its |link| field and the size in its |info| field;
2888the second word contains the two pointers for double linking.
2889
2890Each nonempty node also has size 2 or more. Its first word is of type
2891|two_halves|\kern-1pt, and its |link| field is never equal to |max_halfword|.
2892Otherwise there is complete flexibility with respect to the contents
2893of its other fields and its other words.
2894
2895(We require |mem_max<max_halfword| because terrible things can happen
2896when |max_halfword| appears in the |link| field of a nonempty node.)
2897
2898@d empty_flag == max_halfword {the |link| of an empty variable-size node}
2899@d is_empty(#) == (link(#)=empty_flag) {tests for empty node}
2900@d node_size == info {the size field in empty variable-size nodes}
2901@d llink(#) == info(#+1) {left link in doubly-linked list of empty nodes}
2902@d rlink(#) == link(#+1) {right link in doubly-linked list of empty nodes}
2903
2904@<Glob...@>=
2905@!rover : pointer; {points to some node in the list of empties}
2906
2907@ A call to |get_node| with argument |s| returns a pointer to a new node
2908of size~|s|, which must be 2~or more. The |link| field of the first word
2909of this new node is set to null. An overflow stop occurs if no suitable
2910space exists.
2911
2912If |get_node| is called with $s=2^{30}$, it simply merges adjacent free
2913areas and returns the value |max_halfword|.
2914
2915@p function get_node(@!s:integer):pointer; {variable-size node allocation}
2916label found,exit,restart;
2917var p:pointer; {the node currently under inspection}
2918@!q:pointer; {the node physically after node |p|}
2919@!r:integer; {the newly allocated node, or a candidate for this honor}
2920@!t:integer; {temporary register}
2921begin restart: p:=rover; {start at some free node in the ring}
2922repeat @<Try to allocate within node |p| and its physical successors,
2923  and |goto found| if allocation was possible@>;
2924@^inner loop@>
2925p:=rlink(p); {move to the next node in the ring}
2926until p=rover; {repeat until the whole list has been traversed}
2927if s=@'10000000000 then
2928  begin get_node:=max_halfword; return;
2929  end;
2930if lo_mem_max+2<hi_mem_min then if lo_mem_max+2<=mem_bot+max_halfword then
2931  @<Grow more variable-size memory and |goto restart|@>;
2932overflow("main memory size",mem_max+1-mem_min);
2933  {sorry, nothing satisfactory is left}
2934@:TeX capacity exceeded main memory size}{\quad main memory size@>
2935found: link(r):=null; {this node is now nonempty}
2936@!stat var_used:=var_used+s; {maintain usage statistics}
2937tats@;@/
2938get_node:=r;
2939exit:end;
2940
2941@ The lower part of |mem| grows by 1000 words at a time, unless
2942we are very close to going under. When it grows, we simply link
2943a new node into the available-space list. This method of controlled
2944growth helps to keep the |mem| usage consecutive when \TeX\ is
2945implemented on ``virtual memory'' systems.
2946@^virtual memory@>
2947
2948@<Grow more variable-size memory and |goto restart|@>=
2949begin if hi_mem_min-lo_mem_max>=1998 then t:=lo_mem_max+1000
2950else t:=lo_mem_max+1+(hi_mem_min-lo_mem_max) div 2;
2951  {|lo_mem_max+2<=t<hi_mem_min|}
2952p:=llink(rover); q:=lo_mem_max; rlink(p):=q; llink(rover):=q;@/
2953if t>mem_bot+max_halfword then t:=mem_bot+max_halfword;
2954rlink(q):=rover; llink(q):=p; link(q):=empty_flag; node_size(q):=t-lo_mem_max;@/
2955lo_mem_max:=t; link(lo_mem_max):=null; info(lo_mem_max):=null;
2956rover:=q; goto restart;
2957end
2958
2959@ Empirical tests show that the routine in this section performs a
2960node-merging operation about 0.75 times per allocation, on the average,
2961after which it finds that |r>p+1| about 95\pct! of the time.
2962
2963@<Try to allocate...@>=
2964q:=p+node_size(p); {find the physical successor}
2965@^inner loop@>
2966while is_empty(q) do {merge node |p| with node |q|}
2967  begin t:=rlink(q);
2968  if q=rover then rover:=t;
2969  llink(t):=llink(q); rlink(llink(q)):=t;@/
2970  q:=q+node_size(q);
2971  end;
2972r:=q-s;
2973if r>p+1 then @<Allocate from the top of node |p| and |goto found|@>;
2974if r=p then if rlink(p)<>p then
2975  @<Allocate entire node |p| and |goto found|@>;
2976node_size(p):=q-p {reset the size in case it grew}
2977
2978@ @<Allocate from the top...@>=
2979begin node_size(p):=r-p; {store the remaining size}
2980@^inner loop@>
2981rover:=p; {start searching here next time}
2982goto found;
2983end
2984
2985@ Here we delete node |p| from the ring, and let |rover| rove around.
2986
2987@<Allocate entire...@>=
2988begin rover:=rlink(p); t:=llink(p);
2989llink(rover):=t; rlink(t):=rover;
2990goto found;
2991end
2992
2993@ Conversely, when some variable-size node |p| of size |s| is no longer needed,
2994the operation |free_node(p,s)| will make its words available, by inserting
2995|p| as a new empty node just before where |rover| now points.
2996@^inner loop@>
2997
2998@p procedure free_node(@!p:pointer; @!s:halfword); {variable-size node
2999  liberation}
3000var q:pointer; {|llink(rover)|}
3001begin node_size(p):=s; link(p):=empty_flag;
3002q:=llink(rover); llink(p):=q; rlink(p):=rover; {set both links}
3003llink(rover):=p; rlink(q):=p; {insert |p| into the ring}
3004@!stat var_used:=var_used-s;@+tats@;{maintain statistics}
3005end;
3006
3007@ Just before \.{INITEX} writes out the memory, it sorts the doubly linked
3008available space list. The list is probably very short at such times, so a
3009simple insertion sort is used. The smallest available location will be
3010pointed to by |rover|, the next-smallest by |rlink(rover)|, etc.
3011
3012@p @!init procedure sort_avail; {sorts the available variable-size nodes
3013  by location}
3014var p,@!q,@!r: pointer; {indices into |mem|}
3015@!old_rover:pointer; {initial |rover| setting}
3016begin p:=get_node(@'10000000000); {merge adjacent free areas}
3017p:=rlink(rover); rlink(rover):=max_halfword; old_rover:=rover;
3018while p<>old_rover do @<Sort \(p)|p| into the list starting at |rover|
3019  and advance |p| to |rlink(p)|@>;
3020p:=rover;
3021while rlink(p)<>max_halfword do
3022  begin llink(rlink(p)):=p; p:=rlink(p);
3023  end;
3024rlink(p):=rover; llink(rover):=p;
3025end;
3026tini
3027
3028@ The following |while| loop is guaranteed to
3029terminate, since the list that starts at
3030|rover| ends with |max_halfword| during the sorting procedure.
3031
3032@<Sort \(p)|p|...@>=
3033if p<rover then
3034  begin q:=p; p:=rlink(q); rlink(q):=rover; rover:=q;
3035  end
3036else  begin q:=rover;
3037  while rlink(q)<p do q:=rlink(q);
3038  r:=rlink(p); rlink(p):=rlink(q); rlink(q):=p; p:=r;
3039  end
3040
3041@* \[10] Data structures for boxes and their friends.
3042From the computer's standpoint, \TeX's chief mission is to create
3043horizontal and vertical lists. We shall now investigate how the elements
3044of these lists are represented internally as nodes in the dynamic memory.
3045
3046A horizontal or vertical list is linked together by |link| fields in
3047the first word of each node. Individual nodes represent boxes, glue,
3048penalties, or special things like discretionary hyphens; because of this
3049variety, some nodes are longer than others, and we must distinguish different
3050kinds of nodes. We do this by putting a `|type|' field in the first word,
3051together with the link and an optional `|subtype|'.
3052
3053@d type(#) == mem[#].hh.b0 {identifies what kind of node this is}
3054@d subtype(#) == mem[#].hh.b1 {secondary identification in some cases}
3055
3056@ A |@!char_node|, which represents a single character, is the most important
3057kind of node because it accounts for the vast majority of all boxes.
3058Special precautions are therefore taken to ensure that a |char_node| does
3059not take up much memory space. Every such node is one word long, and in fact
3060it is identifiable by this property, since other kinds of nodes have at least
3061two words, and they appear in |mem| locations less than |hi_mem_min|.
3062This makes it possible to omit the |type| field in a |char_node|, leaving
3063us room for two bytes that identify a |font| and a |character| within
3064that font.
3065
3066Note that the format of a |char_node| allows for up to 256 different
3067fonts and up to 256 characters per font; but most implementations will
3068probably limit the total number of fonts to fewer than 75 per job,
3069and most fonts will stick to characters whose codes are
3070less than 128 (since higher codes
3071are more difficult to access on most keyboards).
3072
3073Extensions of \TeX\ intended for oriental languages will need even more
3074than $256\times256$ possible characters, when we consider different sizes
3075@^oriental characters@>@^Chinese characters@>@^Japanese characters@>
3076and styles of type.  It is suggested that Chinese and Japanese fonts be
3077handled by representing such characters in two consecutive |char_node|
3078entries: The first of these has |font=font_base|, and its |link| points
3079to the second;
3080the second identifies the font and the character dimensions.
3081The saving feature about oriental characters is that most of them have
3082the same box dimensions. The |character| field of the first |char_node|
3083is a ``\\{charext}'' that distinguishes between graphic symbols whose
3084dimensions are identical for typesetting purposes. (See the \MF\ manual.)
3085Such an extension of \TeX\ would not be difficult; further details are
3086left to the reader.
3087
3088In order to make sure that the |character| code fits in a quarterword,
3089\TeX\ adds the quantity |min_quarterword| to the actual code.
3090
3091Character nodes appear only in horizontal lists, never in vertical lists.
3092
3093@d is_char_node(#) == (#>=hi_mem_min)
3094  {does the argument point to a |char_node|?}
3095@d font == type {the font code in a |char_node|}
3096@d character == subtype {the character code in a |char_node|}
3097
3098@ An |hlist_node| stands for a box that was made from a horizontal list.
3099Each |hlist_node| is seven words long, and contains the following fields
3100(in addition to the mandatory |type| and |link|, which we shall not
3101mention explicitly when discussing the other node types): The |height| and
3102|width| and |depth| are scaled integers denoting the dimensions of the
3103box.  There is also a |shift_amount| field, a scaled integer indicating
3104how much this box should be lowered (if it appears in a horizontal list),
3105or how much it should be moved to the right (if it appears in a vertical
3106list). There is a |list_ptr| field, which points to the beginning of the
3107list from which this box was fabricated; if |list_ptr| is |null|, the box
3108is empty. Finally, there are three fields that represent the setting of
3109the glue:  |glue_set(p)| is a word of type |glue_ratio| that represents
3110the proportionality constant for glue setting; |glue_sign(p)| is
3111|stretching| or |shrinking| or |normal| depending on whether or not the
3112glue should stretch or shrink or remain rigid; and |glue_order(p)|
3113specifies the order of infinity to which glue setting applies (|normal|,
3114|fil|, |fill|, or |filll|). The |subtype| field is not used in \TeX.
3115In \eTeX\ the |subtype| field records the box direction mode |box_lr|.
3116
3117@d hlist_node=0 {|type| of hlist nodes}
3118@d box_node_size=7 {number of words to allocate for a box node}
3119@d width_offset=1 {position of |width| field in a box node}
3120@d depth_offset=2 {position of |depth| field in a box node}
3121@d height_offset=3 {position of |height| field in a box node}
3122@d width(#) == mem[#+width_offset].sc {width of the box, in sp}
3123@d depth(#) == mem[#+depth_offset].sc {depth of the box, in sp}
3124@d height(#) == mem[#+height_offset].sc {height of the box, in sp}
3125@d shift_amount(#) == mem[#+4].sc {repositioning distance, in sp}
3126@d list_offset=5 {position of |list_ptr| field in a box node}
3127@d list_ptr(#) == link(#+list_offset) {beginning of the list inside the box}
3128@d glue_order(#) == subtype(#+list_offset) {applicable order of infinity}
3129@d glue_sign(#) == type(#+list_offset) {stretching or shrinking}
3130@d normal=0 {the most common case when several cases are named}
3131@d stretching = 1 {glue setting applies to the stretch components}
3132@d shrinking = 2 {glue setting applies to the shrink components}
3133@d glue_offset = 6 {position of |glue_set| in a box node}
3134@d glue_set(#) == mem[#+glue_offset].gr
3135  {a word of type |glue_ratio| for glue setting}
3136
3137@ The |new_null_box| function returns a pointer to an |hlist_node| in
3138which all subfields have the values corresponding to `\.{\\hbox\{\}}'.
3139The |subtype| field is set to |min_quarterword|, since that's the desired
3140|span_count| value if this |hlist_node| is changed to an |unset_node|.
3141
3142@p function new_null_box:pointer; {creates a new box node}
3143var p:pointer; {the new node}
3144begin p:=get_node(box_node_size); type(p):=hlist_node;
3145subtype(p):=min_quarterword;
3146width(p):=0; depth(p):=0; height(p):=0; shift_amount(p):=0; list_ptr(p):=null;
3147glue_sign(p):=normal; glue_order(p):=normal; set_glue_ratio_zero(glue_set(p));
3148new_null_box:=p;
3149end;
3150
3151@ A |vlist_node| is like an |hlist_node| in all respects except that it
3152contains a vertical list.
3153
3154@d vlist_node=1 {|type| of vlist nodes}
3155
3156@ A |rule_node| stands for a solid black rectangle; it has |width|,
3157|depth|, and |height| fields just as in an |hlist_node|. However, if
3158any of these dimensions is $-2^{30}$, the actual value will be determined
3159by running the rule up to the boundary of the innermost enclosing box.
3160This is called a ``running dimension.'' The |width| is never running in
3161an hlist; the |height| and |depth| are never running in a~vlist.
3162
3163@d rule_node=2 {|type| of rule nodes}
3164@d rule_node_size=4 {number of words to allocate for a rule node}
3165@d null_flag==-@'10000000000 {$-2^{30}$, signifies a missing item}
3166@d is_running(#) == (#=null_flag) {tests for a running dimension}
3167
3168@ A new rule node is delivered by the |new_rule| function. It
3169makes all the dimensions ``running,'' so you have to change the
3170ones that are not allowed to run.
3171
3172@p function new_rule:pointer;
3173var p:pointer; {the new node}
3174begin p:=get_node(rule_node_size); type(p):=rule_node;
3175subtype(p):=0; {the |subtype| is not used}
3176width(p):=null_flag; depth(p):=null_flag; height(p):=null_flag;
3177new_rule:=p;
3178end;
3179
3180@ Insertions are represented by |ins_node| records, where the |subtype|
3181indicates the corresponding box number. For example, `\.{\\insert 250}'
3182leads to an |ins_node| whose |subtype| is |250+min_quarterword|.
3183The |height| field of an |ins_node| is slightly misnamed; it actually holds
3184the natural height plus depth of the vertical list being inserted.
3185The |depth| field holds the |split_max_depth| to be used in case this
3186insertion is split, and the |split_top_ptr| points to the corresponding
3187|split_top_skip|. The |float_cost| field holds the |floating_penalty| that
3188will be used if this insertion floats to a subsequent page after a
3189split insertion of the same class.  There is one more field, the
3190|ins_ptr|, which points to the beginning of the vlist for the insertion.
3191
3192@d ins_node=3 {|type| of insertion nodes}
3193@d ins_node_size=5 {number of words to allocate for an insertion}
3194@d float_cost(#)==mem[#+1].int {the |floating_penalty| to be used}
3195@d ins_ptr(#)==info(#+4) {the vertical list to be inserted}
3196@d split_top_ptr(#)==link(#+4) {the |split_top_skip| to be used}
3197
3198@ A |mark_node| has a |mark_ptr| field that points to the reference count
3199of a token list that contains the user's \.{\\mark} text.
3200In addition there is a |mark_class| field that contains the mark class.
3201
3202@d mark_node=4 {|type| of a mark node}
3203@d small_node_size=2 {number of words to allocate for most node types}
3204@d mark_ptr(#)==link(#+1) {head of the token list for a mark}
3205@d mark_class(#)==info(#+1) {the mark class}
3206
3207@ An |adjust_node|, which occurs only in horizontal lists,
3208specifies material that will be moved out into the surrounding
3209vertical list; i.e., it is used to implement \TeX's `\.{\\vadjust}'
3210operation.  The |adjust_ptr| field points to the vlist containing this
3211material.
3212
3213@d adjust_node=5 {|type| of an adjust node}
3214@d adjust_pre == subtype  {|if subtype <>0| it is pre-adjustment}
3215@#{|append_list| is used to append a list to |tail|}
3216@d append_list(#) == begin link(tail):=link(#); append_list_end
3217@d append_list_end(#) == tail:=#; end
3218@d adjust_ptr(#)==mem[#+1].int
3219  {vertical list to be moved out of horizontal list}
3220
3221@ A |ligature_node|, which occurs only in horizontal lists, specifies
3222a character that was fabricated from the interaction of two or more
3223actual characters.  The second word of the node, which is called the
3224|lig_char| word, contains |font| and |character| fields just as in a
3225|char_node|. The characters that generated the ligature have not been
3226forgotten, since they are needed for diagnostic messages and for
3227hyphenation; the |lig_ptr| field points to a linked list of character
3228nodes for all original characters that have been deleted. (This list
3229might be empty if the characters that generated the ligature were
3230retained in other nodes.)
3231
3232The |subtype| field is 0, plus 2 and/or 1 if the original source of the
3233ligature included implicit left and/or right boundaries.
3234
3235@d ligature_node=6 {|type| of a ligature node}
3236@d lig_char(#)==#+1 {the word where the ligature is to be found}
3237@d lig_ptr(#)==link(lig_char(#)) {the list of characters}
3238
3239@ The |new_ligature| function creates a ligature node having given
3240contents of the |font|, |character|, and |lig_ptr| fields. We also have
3241a |new_lig_item| function, which returns a two-word node having a given
3242|character| field. Such nodes are used for temporary processing as ligatures
3243are being created.
3244
3245@p function new_ligature(@!f,@!c:quarterword; @!q:pointer):pointer;
3246var p:pointer; {the new node}
3247begin p:=get_node(small_node_size); type(p):=ligature_node;
3248font(lig_char(p)):=f; character(lig_char(p)):=c; lig_ptr(p):=q;
3249subtype(p):=0; new_ligature:=p;
3250end;
3251@#
3252function new_lig_item(@!c:quarterword):pointer;
3253var p:pointer; {the new node}
3254begin p:=get_node(small_node_size); character(p):=c; lig_ptr(p):=null;
3255new_lig_item:=p;
3256end;
3257
3258@ A |disc_node|, which occurs only in horizontal lists, specifies a
3259``dis\-cretion\-ary'' line break. If such a break occurs at node |p|, the text
3260that starts at |pre_break(p)| will precede the break, the text that starts at
3261|post_break(p)| will follow the break, and text that appears in the next
3262|replace_count(p)| nodes will be ignored. For example, an ordinary
3263discretionary hyphen, indicated by `\.{\\-}', yields a |disc_node| with
3264|pre_break| pointing to a |char_node| containing a hyphen, |post_break=null|,
3265and |replace_count=0|. All three of the discretionary texts must be
3266lists that consist entirely of character, kern, box, rule, and ligature nodes.
3267
3268If |pre_break(p)=null|, the |ex_hyphen_penalty| will be charged for this
3269break.  Otherwise the |hyphen_penalty| will be charged.  The texts will
3270actually be substituted into the list by the line-breaking algorithm if it
3271decides to make the break, and the discretionary node will disappear at
3272that time; thus, the output routine sees only discretionaries that were
3273not chosen.
3274
3275@d disc_node=7 {|type| of a discretionary node}
3276@d replace_count==subtype {how many subsequent nodes to replace}
3277@d pre_break==llink {text that precedes a discretionary break}
3278@d post_break==rlink {text that follows a discretionary break}
3279
3280@p function new_disc:pointer; {creates an empty |disc_node|}
3281var p:pointer; {the new node}
3282begin p:=get_node(small_node_size); type(p):=disc_node;
3283replace_count(p):=0; pre_break(p):=null; post_break(p):=null;
3284new_disc:=p;
3285end;
3286
3287@ A |whatsit_node| is a wild card reserved for extensions to \TeX. The
3288|subtype| field in its first word says what `\\{whatsit}' it is, and
3289implicitly determines the node size (which must be 2 or more) and the
3290format of the remaining words. When a |whatsit_node| is encountered
3291in a list, special actions are invoked; knowledgeable people who are
3292careful not to mess up the rest of \TeX\ are able to make \TeX\ do new
3293things by adding code at the end of the program. For example, there
3294might be a `\TeX nicolor' extension to specify different colors of ink,
3295@^extensions to \TeX@>
3296and the whatsit node might contain the desired parameters.
3297
3298The present implementation of \TeX\ treats the features associated with
3299`\.{\\write}' and `\.{\\special}' as if they were extensions, in order to
3300illustrate how such routines might be coded. We shall defer further
3301discussion of extensions until the end of this program.
3302
3303@d whatsit_node=8 {|type| of special extension nodes}
3304
3305@ To support ``native'' fonts, we build |native_word_node|s, which are variable
3306size whatsits.  These have the same |width|, |depth|, and |height| fields as a
3307|box_node|, at offsets 1-3, and then a word containing a size field for the
3308node, a font number, a length, and a glyph count.  Then there is a field
3309containing a C pointer to a glyph info array; this and the glyph count are set
3310by |set_native_metrics|.  Copying and freeing of these nodes needs to take
3311account of this!  This is followed by |2*length| bytes, for the actual
3312characters of the string (in UTF-16).
3313
3314So |native_node_size|, which does not include any space for the actual text, is
33156.
3316
33170-3 whatsits subtypes are used for open, write, close, special; 4 is language;
3318pdf\TeX\ uses up through 30-something, so we use subtypes starting from 40.
3319
3320There are also |glyph_node|s; these are like |native_word_node|s in having
3321|width|, |depth|, and |height| fields, but then they contain a glyph ID rather
3322than size and length fields, and there's no subsidiary C pointer.
3323
3324@d native_word_node=40 {|subtype| of whatsits that hold |native_font| words}
3325@d glyph_node=41 {|subtype| in whatsits that hold glyph numbers}
3326
3327@d native_node_size=6 {size of a |native_word| node (plus the actual chars) -- see also \.{xetex.h}}
3328@d glyph_node_size=5
3329@d native_size(#)==mem[#+4].qqqq.b0
3330@d native_font(#)==mem[#+4].qqqq.b1
3331@d native_length(#)==mem[#+4].qqqq.b2
3332@d native_glyph_count(#)==mem[#+4].qqqq.b3
3333@d native_glyph_info_ptr(#)==mem[#+5].ptr
3334@d native_glyph_info_size=10 {number of bytes of info per glyph: 16-bit glyph ID, 32-bit x and y coords}
3335@d native_glyph==native_length {in |glyph_node|s, we store the glyph number here}
3336
3337@d free_native_glyph_info(#) ==
3338  begin
3339    if native_glyph_info_ptr(#) <> null_ptr then begin
3340      libc_free(native_glyph_info_ptr(#));
3341      native_glyph_info_ptr(#):=null_ptr;
3342      native_glyph_count(#):=0;
3343    end
3344  end
3345
3346@p procedure copy_native_glyph_info(src:pointer; dest:pointer);
3347var glyph_count:integer;
3348begin
3349  if native_glyph_info_ptr(src) <> null_ptr then begin
3350    glyph_count:=native_glyph_count(src);
3351    native_glyph_info_ptr(dest):=xmalloc_array(char, glyph_count * native_glyph_info_size);
3352    memcpy(native_glyph_info_ptr(dest), native_glyph_info_ptr(src), glyph_count * native_glyph_info_size);
3353    native_glyph_count(dest):=glyph_count;
3354  end
3355end;
3356
3357@ Picture files are handled with nodes that include fields for the transform associated
3358with the picture, and a pathname for the picture file itself.
3359They also have
3360the |width|, |depth|, and |height| fields of a |box_node| at offsets 1-3. (|depth| will
3361always be zero, as it happens.)
3362
3363So |pic_node_size|, which does not include any space for the picture file pathname, is 7.
3364
3365A |pdf_node| is just like |pic_node|, but generate a different \.{XDV} file code.
3366
3367@d pic_node=42 {|subtype| in whatsits that hold picture file references}
3368@d pdf_node=43 {|subtype| in whatsits that hold PDF page references}
3369@#
3370@d pic_node_size=8 {must sync with \.{xetex.h}}
3371@d pic_path_length(#)==mem[#+4].hh.b0
3372@d pic_page(#)==mem[#+4].hh.b1
3373@d pic_transform1(#)==mem[#+5].hh.lh
3374@d pic_transform2(#)==mem[#+5].hh.rh
3375@d pic_transform3(#)==mem[#+6].hh.lh
3376@d pic_transform4(#)==mem[#+6].hh.rh
3377@d pic_transform5(#)==mem[#+7].hh.lh
3378@d pic_transform6(#)==mem[#+7].hh.rh
3379
3380@ A |math_node|, which occurs only in horizontal lists, appears before and
3381after mathematical formulas. The |subtype| field is |before| before the
3382formula and |after| after it. There is a |width| field, which represents
3383the amount of surrounding space inserted by \.{\\mathsurround}.
3384
3385In addition a |math_node| with |subtype>after| and |width=0| will be
3386(ab)used to record a regular |math_node| reinserted after being
3387discarded at a line break or one of the text direction primitives (
3388\.{\\beginL}, \.{\\endL}, \.{\\beginR}, and \.{\\endR} ).
3389
3390@d math_node=9 {|type| of a math node}
3391@d before=0 {|subtype| for math node that introduces a formula}
3392@d after=1 {|subtype| for math node that winds up a formula}
3393@#
3394@d M_code=2
3395@d begin_M_code=M_code+before {|subtype| for \.{\\beginM} node}
3396@d end_M_code=M_code+after {|subtype| for \.{\\endM} node}
3397@d L_code=4
3398@d begin_L_code=L_code+begin_M_code {|subtype| for \.{\\beginL} node}
3399@d end_L_code=L_code+end_M_code {|subtype| for \.{\\endL} node}
3400@d R_code=L_code+L_code
3401@d begin_R_code=R_code+begin_M_code {|subtype| for \.{\\beginR} node}
3402@d end_R_code=R_code+end_M_code {|subtype| for \.{\\endR} node}
3403@#
3404@d end_LR(#)==odd(subtype(#))
3405@d end_LR_type(#)==(L_code*(subtype(#) div L_code)+end_M_code)
3406@d begin_LR_type(#)==(#-after+before)
3407
3408@p function new_math(@!w:scaled;@!s:small_number):pointer;
3409var p:pointer; {the new node}
3410begin p:=get_node(small_node_size); type(p):=math_node;
3411subtype(p):=s; width(p):=w; new_math:=p;
3412end;
3413
3414@ \TeX\ makes use of the fact that |hlist_node|, |vlist_node|,
3415|rule_node|, |ins_node|, |mark_node|, |adjust_node|, |ligature_node|,
3416|disc_node|, |whatsit_node|, and |math_node| are at the low end of the
3417type codes, by permitting a break at glue in a list if and only if the
3418|type| of the previous node is less than |math_node|. Furthermore, a
3419node is discarded after a break if its type is |math_node| or~more.
3420
3421@d precedes_break(#)==(type(#)<math_node)
3422@d non_discardable(#)==(type(#)<math_node)
3423
3424@ A |glue_node| represents glue in a list. However, it is really only
3425a pointer to a separate glue specification, since \TeX\ makes use of the
3426fact that many essentially identical nodes of glue are usually present.
3427If |p| points to a |glue_node|, |glue_ptr(p)| points to
3428another packet of words that specify the stretch and shrink components, etc.
3429
3430Glue nodes also serve to represent leaders; the |subtype| is used to
3431distinguish between ordinary glue (which is called |normal|) and the three
3432kinds of leaders (which are called |a_leaders|, |c_leaders|, and |x_leaders|).
3433The |leader_ptr| field points to a rule node or to a box node containing the
3434leaders; it is set to |null| in ordinary glue nodes.
3435
3436Many kinds of glue are computed from \TeX's ``skip'' parameters, and
3437it is helpful to know which parameter has led to a particular glue node.
3438Therefore the |subtype| is set to indicate the source of glue, whenever
3439it originated as a parameter. We will be defining symbolic names for the
3440parameter numbers later (e.g., |line_skip_code=0|, |baseline_skip_code=1|,
3441etc.); it suffices for now to say that the |subtype| of parametric glue
3442will be the same as the parameter number, plus~one.
3443
3444In math formulas there are two more possibilities for the |subtype| in a
3445glue node: |mu_glue| denotes an \.{\\mskip} (where the units are scaled \.{mu}
3446instead of scaled \.{pt}); and |cond_math_glue| denotes the `\.{\\nonscript}'
3447feature that cancels the glue node immediately following if it appears
3448in a subscript.
3449
3450@d glue_node=10 {|type| of node that points to a glue specification}
3451@d cond_math_glue=98 {special |subtype| to suppress glue in the next node}
3452@d mu_glue=99 {|subtype| for math glue}
3453@d a_leaders=100 {|subtype| for aligned leaders}
3454@d c_leaders=101 {|subtype| for centered leaders}
3455@d x_leaders=102 {|subtype| for expanded leaders}
3456@d glue_ptr==llink {pointer to a glue specification}
3457@d leader_ptr==rlink {pointer to box or rule node for leaders}
3458
3459@ A glue specification has a halfword reference count in its first word,
3460@^reference counts@>
3461representing |null| plus the number of glue nodes that point to it (less one).
3462Note that the reference count appears in the same position as
3463the |link| field in list nodes; this is the field that is initialized
3464to |null| when a node is allocated, and it is also the field that is flagged
3465by |empty_flag| in empty nodes.
3466
3467Glue specifications also contain three |scaled| fields, for the |width|,
3468|stretch|, and |shrink| dimensions. Finally, there are two one-byte
3469fields called |stretch_order| and |shrink_order|; these contain the
3470orders of infinity (|normal|, |fil|, |fill|, or |filll|)
3471corresponding to the stretch and shrink values.
3472
3473@d glue_spec_size=4 {number of words to allocate for a glue specification}
3474@d glue_ref_count(#) == link(#) {reference count of a glue specification}
3475@d stretch(#) == mem[#+2].sc {the stretchability of this glob of glue}
3476@d shrink(#) == mem[#+3].sc {the shrinkability of this glob of glue}
3477@d stretch_order == type {order of infinity for stretching}
3478@d shrink_order == subtype {order of infinity for shrinking}
3479@d fil=1 {first-order infinity}
3480@d fill=2 {second-order infinity}
3481@d filll=3 {third-order infinity}
3482
3483@<Types...@>=
3484@!glue_ord=normal..filll; {infinity to the 0, 1, 2, or 3 power}
3485
3486@ Here is a function that returns a pointer to a copy of a glue spec.
3487The reference count in the copy is |null|, because there is assumed
3488to be exactly one reference to the new specification.
3489
3490@p function new_spec(@!p:pointer):pointer; {duplicates a glue specification}
3491var q:pointer; {the new spec}
3492begin q:=get_node(glue_spec_size);@/
3493mem[q]:=mem[p]; glue_ref_count(q):=null;@/
3494width(q):=width(p); stretch(q):=stretch(p); shrink(q):=shrink(p);
3495new_spec:=q;
3496end;
3497
3498@ And here's a function that creates a glue node for a given parameter
3499identified by its code number; for example,
3500|new_param_glue(line_skip_code)| returns a pointer to a glue node for the
3501current \.{\\lineskip}.
3502
3503@p function new_param_glue(@!n:small_number):pointer;
3504var p:pointer; {the new node}
3505@!q:pointer; {the glue specification}
3506begin p:=get_node(small_node_size); type(p):=glue_node; subtype(p):=n+1;
3507leader_ptr(p):=null;@/
3508q:=@<Current |mem| equivalent of glue parameter number |n|@>@t@>;
3509glue_ptr(p):=q; incr(glue_ref_count(q));
3510new_param_glue:=p;
3511end;
3512
3513@ Glue nodes that are more or less anonymous are created by |new_glue|,
3514whose argument points to a glue specification.
3515
3516@p function new_glue(@!q:pointer):pointer;
3517var p:pointer; {the new node}
3518begin p:=get_node(small_node_size); type(p):=glue_node; subtype(p):=normal;
3519leader_ptr(p):=null; glue_ptr(p):=q; incr(glue_ref_count(q));
3520new_glue:=p;
3521end;
3522
3523@ Still another subroutine is needed: This one is sort of a combination
3524of |new_param_glue| and |new_glue|. It creates a glue node for one of
3525the current glue parameters, but it makes a fresh copy of the glue
3526specification, since that specification will probably be subject to change,
3527while the parameter will stay put. The global variable |temp_ptr| is
3528set to the address of the new spec.
3529
3530@p function new_skip_param(@!n:small_number):pointer;
3531var p:pointer; {the new node}
3532begin temp_ptr:=new_spec(@<Current |mem| equivalent of glue parameter...@>);
3533p:=new_glue(temp_ptr); glue_ref_count(temp_ptr):=null; subtype(p):=n+1;
3534new_skip_param:=p;
3535end;
3536
3537@ A |kern_node| has a |width| field to specify a (normally negative)
3538amount of spacing. This spacing correction appears in horizontal lists
3539between letters like A and V when the font designer said that it looks
3540better to move them closer together or further apart. A kern node can
3541also appear in a vertical list, when its `|width|' denotes additional
3542spacing in the vertical direction. The |subtype| is either |normal| (for
3543kerns inserted from font information or math mode calculations) or |explicit|
3544(for kerns inserted from \.{\\kern} and \.{\\/} commands) or |acc_kern|
3545(for kerns inserted from non-math accents) or |mu_glue| (for kerns
3546inserted from \.{\\mkern} specifications in math formulas).
3547
3548@d kern_node=11 {|type| of a kern node}
3549@d explicit=1 {|subtype| of kern nodes from \.{\\kern} and \.{\\/}}
3550@d acc_kern=2 {|subtype| of kern nodes from accents}
3551
3552@# {memory structure for marginal kerns}
3553@d margin_kern_node = 40
3554@d margin_kern_node_size = 3
3555@d margin_char(#) == info(# + 2)    {unused for now; relevant for font expansion}
3556
3557@# {|subtype| of marginal kerns}
3558@d left_side == 0
3559@d right_side == 1
3560
3561@# {base for lp/rp codes starts from 2:
3562    0 for |hyphen_char|,
3563    1 for |skew_char|}
3564@d lp_code_base == 2
3565@d rp_code_base == 3
3566
3567@d max_hlist_stack = 512 {maximum fill level for |hlist_stack|}
3568{maybe good if larger than |2 * max_quarterword|, so that box nesting level would overflow first}
3569
3570@ The |new_kern| function creates a kern node having a given width.
3571
3572@p function new_kern(@!w:scaled):pointer;
3573var p:pointer; {the new node}
3574begin p:=get_node(small_node_size); type(p):=kern_node;
3575subtype(p):=normal;
3576width(p):=w;
3577new_kern:=p;
3578end;
3579
3580@ @<Glob...@>=
3581@!last_leftmost_char: pointer;
3582@!last_rightmost_char: pointer;
3583@!hlist_stack:array[0..max_hlist_stack] of pointer; {stack for |find_protchar_left()| and |find_protchar_right()|}
3584@!hlist_stack_level:0..max_hlist_stack; {fill level for |hlist_stack|}
3585@!first_p: pointer; {to access the first node of the paragraph}
3586@!global_prev_p: pointer; {to access |prev_p| in |line_break|; should be kept in sync with |prev_p| by |update_prev_p|}
3587
3588@ A |penalty_node| specifies the penalty associated with line or page
3589breaking, in its |penalty| field. This field is a fullword integer, but
3590the full range of integer values is not used: Any penalty |>=10000| is
3591treated as infinity, and no break will be allowed for such high values.
3592Similarly, any penalty |<=-10000| is treated as negative infinity, and a
3593break will be forced.
3594
3595@d penalty_node=12 {|type| of a penalty node}
3596@d inf_penalty=inf_bad {``infinite'' penalty value}
3597@d eject_penalty=-inf_penalty {``negatively infinite'' penalty value}
3598@d penalty(#) == mem[#+1].int {the added cost of breaking a list here}
3599
3600@ Anyone who has been reading the last few sections of the program will
3601be able to guess what comes next.
3602
3603@p function new_penalty(@!m:integer):pointer;
3604var p:pointer; {the new node}
3605begin p:=get_node(small_node_size); type(p):=penalty_node;
3606subtype(p):=0; {the |subtype| is not used}
3607penalty(p):=m; new_penalty:=p;
3608end;
3609
3610@ You might think that we have introduced enough node types by now. Well,
3611almost, but there is one more: An |unset_node| has nearly the same format
3612as an |hlist_node| or |vlist_node|; it is used for entries in \.{\\halign}
3613or \.{\\valign} that are not yet in their final form, since the box
3614dimensions are their ``natural'' sizes before any glue adjustment has been
3615made. The |glue_set| word is not present; instead, we have a |glue_stretch|
3616field, which contains the total stretch of order |glue_order| that is
3617present in the hlist or vlist being boxed.
3618Similarly, the |shift_amount| field is replaced by a |glue_shrink| field,
3619containing the total shrink of order |glue_sign| that is present.
3620The |subtype| field is called |span_count|; an unset box typically
3621contains the data for |qo(span_count)+1| columns.
3622Unset nodes will be changed to box nodes when alignment is completed.
3623
3624@d unset_node=13 {|type| for an unset node}
3625@d glue_stretch(#)==mem[#+glue_offset].sc {total stretch in an unset node}
3626@d glue_shrink==shift_amount {total shrink in an unset node}
3627@d span_count==subtype {indicates the number of spanned columns}
3628
3629@ In fact, there are still more types coming. When we get to math formula
3630processing we will see that a |style_node| has |type=14|; and a number
3631of larger type codes will also be defined, for use in math mode only.
3632
3633@ Warning: If any changes are made to these data structure layouts, such as
3634changing any of the node sizes or even reordering the words of nodes,
3635the |copy_node_list| procedure and the memory initialization code
3636below may have to be changed. Such potentially dangerous parts of the
3637program are listed in the index under `data structure assumptions'.
3638@!@^data structure assumptions@>
3639However, other references to the nodes are made symbolically in terms of
3640the \.{WEB} macro definitions above, so that format changes will leave
3641\TeX's other algorithms intact.
3642@^system dependencies@>
3643
3644@* \[11] Memory layout.
3645Some areas of |mem| are dedicated to fixed usage, since static allocation is
3646more efficient than dynamic allocation when we can get away with it. For
3647example, locations |mem_bot| to |mem_bot+3| are always used to store the
3648specification for glue that is `\.{0pt plus 0pt minus 0pt}'. The
3649following macro definitions accomplish the static allocation by giving
3650symbolic names to the fixed positions. Static variable-size nodes appear
3651in locations |mem_bot| through |lo_mem_stat_max|, and static single-word nodes
3652appear in locations |hi_mem_stat_min| through |mem_top|, inclusive. It is
3653harmless to let |lig_trick| and |garbage| share the same location of |mem|.
3654
3655@d zero_glue==mem_bot {specification for \.{0pt plus 0pt minus 0pt}}
3656@d fil_glue==zero_glue+glue_spec_size {\.{0pt plus 1fil minus 0pt}}
3657@d fill_glue==fil_glue+glue_spec_size {\.{0pt plus 1fill minus 0pt}}
3658@d ss_glue==fill_glue+glue_spec_size {\.{0pt plus 1fil minus 1fil}}
3659@d fil_neg_glue==ss_glue+glue_spec_size {\.{0pt plus -1fil minus 0pt}}
3660@d lo_mem_stat_max==fil_neg_glue+glue_spec_size-1 {largest statically
3661  allocated word in the variable-size |mem|}
3662@#
3663@d page_ins_head==mem_top {list of insertion data for current page}
3664@d contrib_head==mem_top-1 {vlist of items not yet on current page}
3665@d page_head==mem_top-2 {vlist for current page}
3666@d temp_head==mem_top-3 {head of a temporary list of some kind}
3667@d hold_head==mem_top-4 {head of a temporary list of another kind}
3668@d adjust_head==mem_top-5 {head of adjustment list returned by |hpack|}
3669@d active==mem_top-7 {head of active list in |line_break|, needs two words}
3670@d align_head==mem_top-8 {head of preamble list for alignments}
3671@d end_span==mem_top-9 {tail of spanned-width lists}
3672@d omit_template==mem_top-10 {a constant token list}
3673@d null_list==mem_top-11 {permanently empty list}
3674@d lig_trick==mem_top-12 {a ligature masquerading as a |char_node|}
3675@d garbage==mem_top-12 {used for scrap information}
3676@d backup_head==mem_top-13 {head of token list built by |scan_keyword|}
3677@d pre_adjust_head==mem_top-14  {head of pre-adjustment list returned by |hpack|}
3678@d hi_mem_stat_min==mem_top-14 {smallest statically allocated word in
3679  the one-word |mem|}
3680@d hi_mem_stat_usage=15 {the number of one-word nodes always present}
3681
3682@ The following code gets |mem| off to a good start, when \TeX\ is
3683initializing itself the slow~way.
3684
3685@<Local variables for init...@>=
3686@!k:integer; {index into |mem|, |eqtb|, etc.}
3687
3688@ @<Initialize table entries...@>=
3689for k:=mem_bot+1 to lo_mem_stat_max do mem[k].sc:=0;
3690  {all glue dimensions are zeroed}
3691@^data structure assumptions@>
3692k:=mem_bot;@+while k<=lo_mem_stat_max do
3693    {set first words of glue specifications}
3694  begin glue_ref_count(k):=null+1;
3695  stretch_order(k):=normal; shrink_order(k):=normal;
3696  k:=k+glue_spec_size;
3697  end;
3698stretch(fil_glue):=unity; stretch_order(fil_glue):=fil;@/
3699stretch(fill_glue):=unity; stretch_order(fill_glue):=fill;@/
3700stretch(ss_glue):=unity; stretch_order(ss_glue):=fil;@/
3701shrink(ss_glue):=unity; shrink_order(ss_glue):=fil;@/
3702stretch(fil_neg_glue):=-unity; stretch_order(fil_neg_glue):=fil;@/
3703rover:=lo_mem_stat_max+1;
3704link(rover):=empty_flag; {now initialize the dynamic memory}
3705node_size(rover):=1000; {which is a 1000-word available node}
3706llink(rover):=rover; rlink(rover):=rover;@/
3707lo_mem_max:=rover+1000; link(lo_mem_max):=null; info(lo_mem_max):=null;@/
3708for k:=hi_mem_stat_min to mem_top do
3709  mem[k]:=mem[lo_mem_max]; {clear list heads}
3710@<Initialize the special list heads and constant nodes@>;
3711avail:=null; mem_end:=mem_top;
3712hi_mem_min:=hi_mem_stat_min; {initialize the one-word memory}
3713var_used:=lo_mem_stat_max+1-mem_bot; dyn_used:=hi_mem_stat_usage;
3714  {initialize statistics}
3715
3716@ If \TeX\ is extended improperly, the |mem| array might get screwed up.
3717For example, some pointers might be wrong, or some ``dead'' nodes might not
3718have been freed when the last reference to them disappeared. Procedures
3719|check_mem| and |search_mem| are available to help diagnose such
3720problems. These procedures make use of two arrays called |free| and
3721|was_free| that are present only if \TeX's debugging routines have
3722been included. (You may want to decrease the size of |mem| while you
3723@^debugging@>
3724are debugging.)
3725
3726@<Glob...@>=
3727@!debug @!free: packed array [mem_min..mem_max] of boolean; {free cells}
3728@t\hskip10pt@>@!was_free: packed array [mem_min..mem_max] of boolean;
3729  {previously free cells}
3730@t\hskip10pt@>@!was_mem_end,@!was_lo_max,@!was_hi_min: pointer;
3731  {previous |mem_end|, |lo_mem_max|, and |hi_mem_min|}
3732@t\hskip10pt@>@!panicking:boolean; {do we want to check memory constantly?}
3733gubed
3734
3735@ @<Set initial...@>=
3736@!debug was_mem_end:=mem_min; {indicate that everything was previously free}
3737was_lo_max:=mem_min; was_hi_min:=mem_max;
3738panicking:=false;
3739gubed
3740
3741@ Procedure |check_mem| makes sure that the available space lists of
3742|mem| are well formed, and it optionally prints out all locations
3743that are reserved now but were free the last time this procedure was called.
3744
3745@p @!debug procedure check_mem(@!print_locs : boolean);
3746label done1,done2; {loop exits}
3747var p,@!q:pointer; {current locations of interest in |mem|}
3748@!clobbered:boolean; {is something amiss?}
3749begin for p:=mem_min to lo_mem_max do free[p]:=false; {you can probably
3750  do this faster}
3751for p:=hi_mem_min to mem_end do free[p]:=false; {ditto}
3752@<Check single-word |avail| list@>;
3753@<Check variable-size |avail| list@>;
3754@<Check flags of unavailable nodes@>;
3755if print_locs then @<Print newly busy locations@>;
3756for p:=mem_min to lo_mem_max do was_free[p]:=free[p];
3757for p:=hi_mem_min to mem_end do was_free[p]:=free[p];
3758  {|was_free:=free| might be faster}
3759was_mem_end:=mem_end; was_lo_max:=lo_mem_max; was_hi_min:=hi_mem_min;
3760end;
3761gubed
3762
3763@ @<Check single-word...@>=
3764p:=avail; q:=null; clobbered:=false;
3765while p<>null do
3766  begin if (p>mem_end)or(p<hi_mem_min) then clobbered:=true
3767  else if free[p] then clobbered:=true;
3768  if clobbered then
3769    begin print_nl("AVAIL list clobbered at ");
3770@.AVAIL list clobbered...@>
3771    print_int(q); goto done1;
3772    end;
3773  free[p]:=true; q:=p; p:=link(q);
3774  end;
3775done1:
3776
3777@ @<Check variable-size...@>=
3778p:=rover; q:=null; clobbered:=false;
3779repeat if (p>=lo_mem_max)or(p<mem_min) then clobbered:=true
3780  else if (rlink(p)>=lo_mem_max)or(rlink(p)<mem_min) then clobbered:=true
3781  else if  not(is_empty(p))or(node_size(p)<2)or@|
3782   (p+node_size(p)>lo_mem_max)or@| (llink(rlink(p))<>p) then clobbered:=true;
3783  if clobbered then
3784  begin print_nl("Double-AVAIL list clobbered at ");
3785  print_int(q); goto done2;
3786  end;
3787for q:=p to p+node_size(p)-1 do {mark all locations free}
3788  begin if free[q] then
3789    begin print_nl("Doubly free location at ");
3790@.Doubly free location...@>
3791    print_int(q); goto done2;
3792    end;
3793  free[q]:=true;
3794  end;
3795q:=p; p:=rlink(p);
3796until p=rover;
3797done2:
3798
3799@ @<Check flags...@>=
3800p:=mem_min;
3801while p<=lo_mem_max do {node |p| should not be empty}
3802  begin if is_empty(p) then
3803    begin print_nl("Bad flag at "); print_int(p);
3804@.Bad flag...@>
3805    end;
3806  while (p<=lo_mem_max) and not free[p] do incr(p);
3807  while (p<=lo_mem_max) and free[p] do incr(p);
3808  end
3809
3810@ @<Print newly busy...@>=
3811begin print_nl("New busy locs:");
3812for p:=mem_min to lo_mem_max do
3813  if not free[p] and ((p>was_lo_max) or was_free[p]) then
3814    begin print_char(" "); print_int(p);
3815    end;
3816for p:=hi_mem_min to mem_end do
3817  if not free[p] and
3818   ((p<was_hi_min) or (p>was_mem_end) or was_free[p]) then
3819    begin print_char(" "); print_int(p);
3820    end;
3821end
3822
3823@ The |search_mem| procedure attempts to answer the question ``Who points
3824to node~|p|?'' In doing so, it fetches |link| and |info| fields of |mem|
3825that might not be of type |two_halves|. Strictly speaking, this is
3826@^dirty \PASCAL@>
3827undefined in \PASCAL, and it can lead to ``false drops'' (words that seem to
3828point to |p| purely by coincidence). But for debugging purposes, we want
3829to rule out the places that do {\sl not\/} point to |p|, so a few false
3830drops are tolerable.
3831
3832@p @!debug procedure search_mem(@!p:pointer); {look for pointers to |p|}
3833var q:integer; {current position being searched}
3834begin for q:=mem_min to lo_mem_max do
3835  begin if link(q)=p then
3836    begin print_nl("LINK("); print_int(q); print_char(")");
3837    end;
3838  if info(q)=p then
3839    begin print_nl("INFO("); print_int(q); print_char(")");
3840    end;
3841  end;
3842for q:=hi_mem_min to mem_end do
3843  begin if link(q)=p then
3844    begin print_nl("LINK("); print_int(q); print_char(")");
3845    end;
3846  if info(q)=p then
3847    begin print_nl("INFO("); print_int(q); print_char(")");
3848    end;
3849  end;
3850@<Search |eqtb| for equivalents equal to |p|@>;
3851@<Search |save_stack| for equivalents that point to |p|@>;
3852@<Search |hyph_list| for pointers to |p|@>;
3853end;
3854gubed
3855
3856@ Some stuff for character protrusion.
3857
3858@p
3859procedure pdf_error(t, p: str_number);
3860begin
3861  normalize_selector;
3862  print_err("Error");
3863  if t <> 0 then begin
3864    print(" (");
3865    print(t);
3866    print(")");
3867  end;
3868  print(": "); print(p);
3869  succumb;
3870end;
3871
3872function prev_rightmost(s, e: pointer): pointer;
3873{finds the node preceding the rightmost node |e|; |s| is some node
3874before |e|}
3875var p: pointer;
3876begin
3877  prev_rightmost:=null;
3878  p:=s;
3879  if p = null then
3880    return;
3881  while link(p) <> e do begin
3882    p:=link(p);
3883    if p = null then
3884      return;
3885  end;
3886  prev_rightmost:=p;
3887end;
3888
3889function round_xn_over_d(@!x:scaled; @!n,@!d:integer):scaled;
3890var positive:boolean; {was |x>=0|?}
3891@!t,@!u,@!v:nonnegative_integer; {intermediate quantities}
3892begin if x>=0 then positive:=true
3893else  begin negate(x); positive:=false;
3894  end;
3895t:=(x mod @'100000)*n;
3896u:=(x div @'100000)*n+(t div @'100000);
3897v:=(u mod d)*@'100000 + (t mod @'100000);
3898if u div d>=@'100000 then arith_error:=true
3899else u:=@'100000*(u div d) + (v div d);
3900v:=v mod d;
3901if 2*v >= d then
3902    incr(u);
3903if positive then
3904    round_xn_over_d:=u
3905else
3906    round_xn_over_d:=-u;
3907end;
3908
3909@* \[12] Displaying boxes.
3910We can reinforce our knowledge of the data structures just introduced
3911by considering two procedures that display a list in symbolic form.
3912The first of these, called |short_display|, is used in ``overfull box''
3913messages to give the top-level description of a list. The other one,
3914called |show_node_list|, prints a detailed description of exactly what
3915is in the data structure.
3916
3917The philosophy of |short_display| is to ignore the fine points about exactly
3918what is inside boxes, except that ligatures and discretionary breaks are
3919expanded. As a result, |short_display| is a recursive procedure, but the
3920recursion is never more than one level deep.
3921@^recursion@>
3922
3923A global variable |font_in_short_display| keeps track of the font code that
3924is assumed to be present when |short_display| begins; deviations from this
3925font will be printed.
3926
3927@<Glob...@>=
3928@!font_in_short_display:integer; {an internal font number}
3929
3930@ Boxes, rules, inserts, whatsits, marks, and things in general that are
3931sort of ``complicated'' are indicated only by printing `\.{[]}'.
3932
3933@p procedure short_display(@!p:integer); {prints highlights of list |p|}
3934var n:integer; {for replacement counts}
3935begin while p>mem_min do
3936  begin if is_char_node(p) then
3937    begin if p<=mem_end then
3938      begin if font(p)<>font_in_short_display then
3939        begin if (font(p)<font_base)or(font(p)>font_max) then
3940          print_char("*")
3941@.*\relax@>
3942        else @<Print the font identifier for |font(p)|@>;
3943        print_char(" "); font_in_short_display:=font(p);
3944        end;
3945      print_ASCII(qo(character(p)));
3946      end;
3947    end
3948  else @<Print a short indication of the contents of node |p|@>;
3949  p:=link(p);
3950  end;
3951end;
3952
3953@ @<Print a short indication of the contents of node |p|@>=
3954case type(p) of
3955hlist_node,vlist_node,ins_node,mark_node,adjust_node,
3956  unset_node: print("[]");
3957whatsit_node:
3958  case subtype(p) of
3959    native_word_node: begin
3960      if native_font(p)<>font_in_short_display then begin
3961        print_esc(font_id_text(native_font(p)));
3962        print_char(" ");
3963        font_in_short_display:=native_font(p);
3964      end;
3965      print_native_word(p);
3966    end;
3967    othercases print("[]")
3968  endcases;
3969rule_node: print_char("|");
3970glue_node: if glue_ptr(p)<>zero_glue then print_char(" ");
3971math_node: if subtype(p)>=L_code then print("[]")
3972  else print_char("$");
3973ligature_node: short_display(lig_ptr(p));
3974disc_node: begin short_display(pre_break(p));
3975  short_display(post_break(p));@/
3976  n:=replace_count(p);
3977  while n>0 do
3978    begin if link(p)<>null then p:=link(p);
3979    decr(n);
3980    end;
3981  end;
3982othercases do_nothing
3983endcases
3984
3985@ The |show_node_list| routine requires some auxiliary subroutines: one to
3986print a font-and-character combination, one to print a token list without
3987its reference count, and one to print a rule dimension.
3988
3989@p procedure print_font_and_char(@!p:integer); {prints |char_node| data}
3990begin if p>mem_end then print_esc("CLOBBERED.")
3991else  begin if (font(p)<font_base)or(font(p)>font_max) then print_char("*")
3992@.*\relax@>
3993  else @<Print the font identifier for |font(p)|@>;
3994  print_char(" "); print_ASCII(qo(character(p)));
3995  end;
3996end;
3997@#
3998procedure print_mark(@!p:integer); {prints token list data in braces}
3999begin print_char("{");
4000if (p<hi_mem_min)or(p>mem_end) then print_esc("CLOBBERED.")
4001else show_token_list(link(p),null,max_print_line-10);
4002print_char("}");
4003end;
4004@#
4005procedure print_rule_dimen(@!d:scaled); {prints dimension in rule node}
4006begin if is_running(d) then print_char("*") else print_scaled(d);
4007@.*\relax@>
4008end;
4009
4010@ Then there is a subroutine that prints glue stretch and shrink, possibly
4011followed by the name of finite units:
4012
4013@p procedure print_glue(@!d:scaled;@!order:integer;@!s:str_number);
4014  {prints a glue component}
4015begin print_scaled(d);
4016if (order<normal)or(order>filll) then print("foul")
4017else if order>normal then
4018  begin print("fil");
4019  while order>fil do
4020    begin print_char("l"); decr(order);
4021    end;
4022  end
4023else if s<>0 then print(s);
4024end;
4025
4026@ The next subroutine prints a whole glue specification.
4027
4028@p procedure print_spec(@!p:integer;@!s:str_number);
4029  {prints a glue specification}
4030begin if (p<mem_min)or(p>=lo_mem_max) then print_char("*")
4031@.*\relax@>
4032else  begin print_scaled(width(p));
4033  if s<>0 then print(s);
4034  if stretch(p)<>0 then
4035    begin print(" plus "); print_glue(stretch(p),stretch_order(p),s);
4036    end;
4037  if shrink(p)<>0 then
4038    begin print(" minus "); print_glue(shrink(p),shrink_order(p),s);
4039    end;
4040  end;
4041end;
4042
4043@ We also need to declare some procedures that appear later in this
4044documentation.
4045
4046@p @<Declare procedures needed for displaying the elements of mlists@>@;
4047@<Declare the procedure called |print_skip_param|@>
4048
4049@ Since boxes can be inside of boxes, |show_node_list| is inherently recursive,
4050@^recursion@>
4051up to a given maximum number of levels.  The history of nesting is indicated
4052by the current string, which will be printed at the beginning of each line;
4053the length of this string, namely |cur_length|, is the depth of nesting.
4054
4055Recursive calls on |show_node_list| therefore use the following pattern:
4056
4057@d node_list_display(#)==
4058  begin append_char("."); show_node_list(#); flush_char;
4059  end {|str_room| need not be checked; see |show_box| below}
4060
4061@ A global variable called |depth_threshold| is used to record the maximum
4062depth of nesting for which |show_node_list| will show information.  If we
4063have |depth_threshold=0|, for example, only the top level information will
4064be given and no sublists will be traversed. Another global variable, called
4065|breadth_max|, tells the maximum number of items to show at each level;
4066|breadth_max| had better be positive, or you won't see anything.
4067
4068@<Glob...@>=
4069@!depth_threshold : integer; {maximum nesting depth in box displays}
4070@!breadth_max : integer; {maximum number of items shown at the same list level}
4071
4072@ Now we are ready for |show_node_list| itself. This procedure has been
4073written to be ``extra robust'' in the sense that it should not crash or get
4074into a loop even if the data structures have been messed up by bugs in
4075the rest of the program. You can safely call its parent routine
4076|show_box(p)| for arbitrary values of |p| when you are debugging \TeX.
4077However, in the presence of bad data, the procedure may
4078@^dirty \PASCAL@>@^debugging@>
4079fetch a |memory_word| whose variant is different from the way it was stored;
4080for example, it might try to read |mem[p].hh| when |mem[p]|
4081contains a scaled integer, if |p| is a pointer that has been
4082clobbered or chosen at random.
4083
4084@p procedure show_node_list(@!p:integer); {prints a node list symbolically}
4085label exit;
4086var n:integer; {the number of items already printed at this level}
4087i:integer; {temp index for printing chars of picfile paths}
4088@!g:real; {a glue ratio, as a floating point number}
4089begin if cur_length>depth_threshold then
4090  begin if p>null then print(" []");
4091    {indicate that there's been some truncation}
4092  return;
4093  end;
4094n:=0;
4095while p>mem_min do
4096  begin print_ln; print_current_string; {display the nesting history}
4097  if p>mem_end then {pointer out of range}
4098    begin print("Bad link, display aborted."); return;
4099@.Bad link...@>
4100    end;
4101  incr(n); if n>breadth_max then {time to stop}
4102    begin print("etc."); return;
4103@.etc@>
4104    end;
4105  @<Display node |p|@>;
4106  p:=link(p);
4107  end;
4108exit:
4109end;
4110
4111@ @<Display node |p|@>=
4112if is_char_node(p) then print_font_and_char(p)
4113else  case type(p) of
4114  hlist_node,vlist_node,unset_node: @<Display box |p|@>;
4115  rule_node: @<Display rule |p|@>;
4116  ins_node: @<Display insertion |p|@>;
4117  whatsit_node: @<Display the whatsit node |p|@>;
4118  glue_node: @<Display glue |p|@>;
4119  kern_node: @<Display kern |p|@>;
4120  margin_kern_node: begin
4121    print_esc("kern");
4122    print_scaled(width(p));
4123    if subtype(p) = left_side then
4124      print(" (left margin)")
4125    else
4126      print(" (right margin)");
4127    end;
4128  math_node: @<Display math node |p|@>;
4129  ligature_node: @<Display ligature |p|@>;
4130  penalty_node: @<Display penalty |p|@>;
4131  disc_node: @<Display discretionary |p|@>;
4132  mark_node: @<Display mark |p|@>;
4133  adjust_node: @<Display adjustment |p|@>;
4134  @t\4@>@<Cases of |show_node_list| that arise in mlists only@>@;
4135  othercases print("Unknown node type!")
4136  endcases
4137
4138@ @<Display box |p|@>=
4139begin if type(p)=hlist_node then print_esc("h")
4140else if type(p)=vlist_node then print_esc("v")
4141else print_esc("unset");
4142print("box("); print_scaled(height(p)); print_char("+");
4143print_scaled(depth(p)); print(")x"); print_scaled(width(p));
4144if type(p)=unset_node then
4145  @<Display special fields of the unset node |p|@>
4146else  begin @<Display the value of |glue_set(p)|@>;
4147  if shift_amount(p)<>0 then
4148    begin print(", shifted "); print_scaled(shift_amount(p));
4149    end;
4150  if eTeX_ex then @<Display if this box is never to be reversed@>;
4151  end;
4152node_list_display(list_ptr(p)); {recursive call}
4153end
4154
4155@ @<Display special fields of the unset node |p|@>=
4156begin if span_count(p)<>min_quarterword then
4157  begin print(" ("); print_int(qo(span_count(p))+1);
4158  print(" columns)");
4159  end;
4160if glue_stretch(p)<>0 then
4161  begin print(", stretch "); print_glue(glue_stretch(p),glue_order(p),0);
4162  end;
4163if glue_shrink(p)<>0 then
4164  begin print(", shrink "); print_glue(glue_shrink(p),glue_sign(p),0);
4165  end;
4166end
4167
4168@ The code will have to change in this place if |glue_ratio| is
4169a structured type instead of an ordinary |real|. Note that this routine
4170should avoid arithmetic errors even if the |glue_set| field holds an
4171arbitrary random value. The following code assumes that a properly
4172formed nonzero |real| number has absolute value $2^{20}$ or more when
4173it is regarded as an integer; this precaution was adequate to prevent
4174floating point underflow on the author's computer.
4175@^system dependencies@>
4176@^dirty \PASCAL@>
4177
4178@<Display the value of |glue_set(p)|@>=
4179g:=float(glue_set(p));
4180if (g<>float_constant(0))and(glue_sign(p)<>normal) then
4181  begin print(", glue set ");
4182  if glue_sign(p)=shrinking then print("- ");
4183  if abs(mem[p+glue_offset].int)<@'4000000 then print("?.?")
4184  else if abs(g)>float_constant(20000) then
4185    begin if g>float_constant(0) then print_char(">")
4186    else print("< -");
4187    print_glue(20000*unity,glue_order(p),0);
4188    end
4189  else print_glue(round(unity*g),glue_order(p),0);
4190@^real multiplication@>
4191  end
4192
4193@ @<Display rule |p|@>=
4194begin print_esc("rule("); print_rule_dimen(height(p)); print_char("+");
4195print_rule_dimen(depth(p)); print(")x"); print_rule_dimen(width(p));
4196end
4197
4198@ @<Display insertion |p|@>=
4199begin print_esc("insert"); print_int(qo(subtype(p)));
4200print(", natural size "); print_scaled(height(p));
4201print("; split("); print_spec(split_top_ptr(p),0);
4202print_char(","); print_scaled(depth(p));
4203print("); float cost "); print_int(float_cost(p));
4204node_list_display(ins_ptr(p)); {recursive call}
4205end
4206
4207@ @<Display glue |p|@>=
4208if subtype(p)>=a_leaders then @<Display leaders |p|@>
4209else  begin print_esc("glue");
4210  if subtype(p)<>normal then
4211    begin print_char("(");
4212    if subtype(p)<cond_math_glue then
4213      print_skip_param(subtype(p)-1)
4214    else if subtype(p)=cond_math_glue then print_esc("nonscript")
4215    else print_esc("mskip");
4216    print_char(")");
4217    end;
4218  if subtype(p)<>cond_math_glue then
4219    begin print_char(" ");
4220    if subtype(p)<cond_math_glue then print_spec(glue_ptr(p),0)
4221    else print_spec(glue_ptr(p),"mu");
4222    end;
4223  end
4224
4225@ @<Display leaders |p|@>=
4226begin print_esc("");
4227if subtype(p)=c_leaders then print_char("c")
4228else if subtype(p)=x_leaders then print_char("x");
4229print("leaders "); print_spec(glue_ptr(p),0);
4230node_list_display(leader_ptr(p)); {recursive call}
4231end
4232
4233@ An ``explicit'' kern value is indicated implicitly by an explicit space.
4234
4235@<Display kern |p|@>=
4236if subtype(p)<>mu_glue then
4237  begin print_esc("kern");
4238  if subtype(p)<>normal then print_char(" ");
4239  print_scaled(width(p));
4240  if subtype(p)=acc_kern then print(" (for accent)");
4241@.for accent@>
4242  end
4243else  begin print_esc("mkern"); print_scaled(width(p)); print("mu");
4244  end
4245
4246@ @<Display math node |p|@>=
4247if subtype(p)>after then
4248  begin if end_LR(p) then print_esc("end")
4249  else print_esc("begin");
4250  if subtype(p)>R_code then print_char("R")
4251  else if subtype(p)>L_code then print_char("L")
4252  else print_char("M");
4253  end else
4254begin print_esc("math");
4255if subtype(p)=before then print("on")
4256else print("off");
4257if width(p)<>0 then
4258  begin print(", surrounded "); print_scaled(width(p));
4259  end;
4260end
4261
4262@ @<Display ligature |p|@>=
4263begin print_font_and_char(lig_char(p)); print(" (ligature ");
4264if subtype(p)>1 then print_char("|");
4265font_in_short_display:=font(lig_char(p)); short_display(lig_ptr(p));
4266if odd(subtype(p)) then print_char("|");
4267print_char(")");
4268end
4269
4270@ @<Display penalty |p|@>=
4271begin print_esc("penalty "); print_int(penalty(p));
4272end
4273
4274@ The |post_break| list of a discretionary node is indicated by a prefixed
4275`\.{\char'174}' instead of the `\..' before the |pre_break| list.
4276
4277@<Display discretionary |p|@>=
4278begin print_esc("discretionary");
4279if replace_count(p)>0 then
4280  begin print(" replacing "); print_int(replace_count(p));
4281  end;
4282node_list_display(pre_break(p)); {recursive call}
4283append_char("|"); show_node_list(post_break(p)); flush_char; {recursive call}
4284end
4285
4286@ @<Display mark |p|@>=
4287begin print_esc("mark");
4288if mark_class(p)<>0 then
4289  begin print_char("s"); print_int(mark_class(p));
4290  end;
4291print_mark(mark_ptr(p));
4292end
4293
4294@ @<Display adjustment |p|@>=
4295begin print_esc("vadjust"); if adjust_pre(p) <> 0 then print(" pre ");
4296node_list_display(adjust_ptr(p)); {recursive call}
4297end
4298
4299@ The recursive machinery is started by calling |show_box|.
4300@^recursion@>
4301
4302@p procedure show_box(@!p:pointer);
4303begin @<Assign the values |depth_threshold:=show_box_depth| and
4304  |breadth_max:=show_box_breadth|@>;
4305if breadth_max<=0 then breadth_max:=5;
4306if pool_ptr+depth_threshold>=pool_size then
4307  depth_threshold:=pool_size-pool_ptr-1;
4308  {now there's enough room for prefix string}
4309show_node_list(p); {the show starts at |p|}
4310print_ln;
4311end;
4312
4313procedure short_display_n(@!p, m:integer); {prints highlights of list |p|}
4314begin
4315  breadth_max:=m;
4316  depth_threshold:=pool_size-pool_ptr-1;
4317  show_node_list(p); {the show starts at |p|}
4318end;
4319
4320@* \[13] Destroying boxes.
4321When we are done with a node list, we are obliged to return it to free
4322storage, including all of its sublists. The recursive procedure
4323|flush_node_list| does this for us.
4324
4325@ First, however, we shall consider two non-recursive procedures that do
4326simpler tasks. The first of these, |delete_token_ref|, is called when
4327a pointer to a token list's reference count is being removed. This means
4328that the token list should disappear if the reference count was |null|,
4329otherwise the count should be decreased by one.
4330@^reference counts@>
4331
4332@d token_ref_count(#) == info(#) {reference count preceding a token list}
4333
4334@p procedure delete_token_ref(@!p:pointer); {|p| points to the reference count
4335  of a token list that is losing one reference}
4336begin if token_ref_count(p)=null then flush_list(p)
4337else decr(token_ref_count(p));
4338end;
4339
4340@ Similarly, |delete_glue_ref| is called when a pointer to a glue
4341specification is being withdrawn.
4342@^reference counts@>
4343@d fast_delete_glue_ref(#)==@t@>@;@/
4344  begin if glue_ref_count(#)=null then free_node(#,glue_spec_size)
4345  else decr(glue_ref_count(#));
4346  end
4347
4348@p procedure delete_glue_ref(@!p:pointer); {|p| points to a glue specification}
4349fast_delete_glue_ref(p);
4350
4351@ Now we are ready to delete any node list, recursively.
4352In practice, the nodes deleted are usually charnodes (about 2/3 of the time),
4353and they are glue nodes in about half of the remaining cases.
4354@^recursion@>
4355
4356@p procedure flush_node_list(@!p:pointer); {erase list of nodes starting at |p|}
4357label done; {go here when node |p| has been freed}
4358var q:pointer; {successor to node |p|}
4359begin while p<>null do
4360@^inner loop@>
4361  begin q:=link(p);
4362  if is_char_node(p) then free_avail(p)
4363  else  begin case type(p) of
4364    hlist_node,vlist_node,unset_node: begin flush_node_list(list_ptr(p));
4365      free_node(p,box_node_size); goto done;
4366      end;
4367    rule_node: begin free_node(p,rule_node_size); goto done;
4368      end;
4369    ins_node: begin flush_node_list(ins_ptr(p));
4370      delete_glue_ref(split_top_ptr(p));
4371      free_node(p,ins_node_size); goto done;
4372      end;
4373    whatsit_node: @<Wipe out the whatsit node |p| and |goto done|@>;
4374    glue_node: begin fast_delete_glue_ref(glue_ptr(p));
4375      if leader_ptr(p)<>null then flush_node_list(leader_ptr(p));
4376      end;
4377    kern_node,math_node,penalty_node: do_nothing;
4378    margin_kern_node: begin
4379        free_node(p, margin_kern_node_size);
4380        goto done;
4381      end;
4382    ligature_node: flush_node_list(lig_ptr(p));
4383    mark_node: delete_token_ref(mark_ptr(p));
4384    disc_node: begin flush_node_list(pre_break(p));
4385      flush_node_list(post_break(p));
4386      end;
4387    adjust_node: flush_node_list(adjust_ptr(p));
4388    @t\4@>@<Cases of |flush_node_list| that arise in mlists only@>@;
4389    othercases confusion("flushing")
4390@:this can't happen flushing}{\quad flushing@>
4391    endcases;@/
4392    free_node(p,small_node_size);
4393    done:end;
4394  p:=q;
4395  end;
4396end;
4397
4398@* \[14] Copying boxes.
4399Another recursive operation that acts on boxes is sometimes needed: The
4400procedure |copy_node_list| returns a pointer to another node list that has
4401the same structure and meaning as the original. Note that since glue
4402specifications and token lists have reference counts, we need not make
4403copies of them. Reference counts can never get too large to fit in a
4404halfword, since each pointer to a node is in a different memory address,
4405and the total number of memory addresses fits in a halfword.
4406@^recursion@>
4407@^reference counts@>
4408
4409(Well, there actually are also references from outside |mem|; if the
4410|save_stack| is made arbitrarily large, it would theoretically be possible
4411to break \TeX\ by overflowing a reference count. But who would want to do that?)
4412
4413@d add_token_ref(#)==incr(token_ref_count(#)) {new reference to a token list}
4414@d add_glue_ref(#)==incr(glue_ref_count(#)) {new reference to a glue spec}
4415
4416@ The copying procedure copies words en masse without bothering
4417to look at their individual fields. If the node format changes---for
4418example, if the size is altered, or if some link field is moved to another
4419relative position---then this code may need to be changed too.
4420@^data structure assumptions@>
4421
4422@p function copy_node_list(@!p:pointer):pointer; {makes a duplicate of the
4423  node list that starts at |p| and returns a pointer to the new list}
4424var h:pointer; {temporary head of copied list}
4425@!q:pointer; {previous position in new list}
4426@!r:pointer; {current node being fabricated for new list}
4427@!words:0..5; {number of words remaining to be copied}
4428begin h:=get_avail; q:=h;
4429while p<>null do
4430  begin @<Make a copy of node |p| in node |r|@>;
4431  link(q):=r; q:=r; p:=link(p);
4432  end;
4433link(q):=null; q:=link(h); free_avail(h);
4434copy_node_list:=q;
4435end;
4436
4437@ @<Make a copy of node |p|...@>=
4438words:=1; {this setting occurs in more branches than any other}
4439if is_char_node(p) then r:=get_avail
4440else @<Case statement to copy different types and set |words| to the number
4441  of initial words not yet copied@>;
4442while words>0 do
4443  begin decr(words); mem[r+words]:=mem[p+words];
4444  end
4445
4446@ @<Case statement to copy...@>=
4447case type(p) of
4448hlist_node,vlist_node,unset_node: begin r:=get_node(box_node_size);
4449  mem[r+6]:=mem[p+6]; mem[r+5]:=mem[p+5]; {copy the last two words}
4450  list_ptr(r):=copy_node_list(list_ptr(p)); {this affects |mem[r+5]|}
4451  words:=5;
4452  end;
4453rule_node: begin r:=get_node(rule_node_size); words:=rule_node_size;
4454  end;
4455ins_node: begin r:=get_node(ins_node_size); mem[r+4]:=mem[p+4];
4456  add_glue_ref(split_top_ptr(p));
4457  ins_ptr(r):=copy_node_list(ins_ptr(p)); {this affects |mem[r+4]|}
4458  words:=ins_node_size-1;
4459  end;
4460whatsit_node:@<Make a partial copy of the whatsit node |p| and make |r|
4461  point to it; set |words| to the number of initial words not yet copied@>;
4462glue_node: begin r:=get_node(small_node_size); add_glue_ref(glue_ptr(p));
4463  glue_ptr(r):=glue_ptr(p); leader_ptr(r):=copy_node_list(leader_ptr(p));
4464  end;
4465kern_node,math_node,penalty_node: begin r:=get_node(small_node_size);
4466  words:=small_node_size;
4467  end;
4468margin_kern_node: begin
4469    r:=get_node(margin_kern_node_size);
4470    words:=margin_kern_node_size;
4471  end;
4472ligature_node: begin r:=get_node(small_node_size);
4473  mem[lig_char(r)]:=mem[lig_char(p)]; {copy |font| and |character|}
4474  lig_ptr(r):=copy_node_list(lig_ptr(p));
4475  end;
4476disc_node: begin r:=get_node(small_node_size);
4477  pre_break(r):=copy_node_list(pre_break(p));
4478  post_break(r):=copy_node_list(post_break(p));
4479  end;
4480mark_node: begin r:=get_node(small_node_size); add_token_ref(mark_ptr(p));
4481  words:=small_node_size;
4482  end;
4483adjust_node: begin r:=get_node(small_node_size);
4484  adjust_ptr(r):=copy_node_list(adjust_ptr(p));
4485  end; {|words=1=small_node_size-1|}
4486othercases confusion("copying")
4487@:this can't happen copying}{\quad copying@>
4488endcases
4489
4490@* \[15] The command codes.
4491Before we can go any further, we need to define symbolic names for the internal
4492code numbers that represent the various commands obeyed by \TeX. These codes
4493are somewhat arbitrary, but not completely so. For example, the command
4494codes for character types are fixed by the language, since a user says,
4495e.g., `\.{\\catcode \`\\\${} = 3}' to make \.{\char'44} a math delimiter,
4496and the command code |math_shift| is equal to~3. Some other codes have
4497been made adjacent so that |case| statements in the program need not consider
4498cases that are widely spaced, or so that |case| statements can be replaced
4499by |if| statements.
4500
4501At any rate, here is the list, for future reference. First come the
4502``catcode'' commands, several of which share their numeric codes with
4503ordinary commands when the catcode cannot emerge from \TeX's scanning routine.
4504
4505@d escape=0 {escape delimiter (called \.\\ in {\sl The \TeX book\/})}
4506@:TeXbook}{\sl The \TeX book@>
4507@d relax=0 {do nothing ( \.{\\relax} )}
4508@d left_brace=1 {beginning of a group ( \.\{ )}
4509@d right_brace=2 {ending of a group ( \.\} )}
4510@d math_shift=3 {mathematics shift character ( \.\$ )}
4511@d tab_mark=4 {alignment delimiter ( \.\&, \.{\\span} )}
4512@d car_ret=5 {end of line ( |carriage_return|, \.{\\cr}, \.{\\crcr} )}
4513@d out_param=5 {output a macro parameter}
4514@d mac_param=6 {macro parameter symbol ( \.\# )}
4515@d sup_mark=7 {superscript ( \.{\char'136} )}
4516@d sub_mark=8 {subscript ( \.{\char'137} )}
4517@d ignore=9 {characters to ignore ( \.{\^\^@@} )}
4518@d endv=9 {end of \<v_j> list in alignment template}
4519@d spacer=10 {characters equivalent to blank space ( \.{\ } )}
4520@d letter=11 {characters regarded as letters ( \.{A..Z}, \.{a..z} )}
4521@d other_char=12 {none of the special character types}
4522@d active_char=13 {characters that invoke macros ( \.{\char`\~} )}
4523@d par_end=13 {end of paragraph ( \.{\\par} )}
4524@d match=13 {match a macro parameter}
4525@d comment=14 {characters that introduce comments ( \.\% )}
4526@d end_match=14 {end of parameters to macro}
4527@d stop=14 {end of job ( \.{\\end}, \.{\\dump} )}
4528@d invalid_char=15 {characters that shouldn't appear ( \.{\^\^?} )}
4529@d delim_num=15 {specify delimiter numerically ( \.{\\delimiter} )}
4530@d max_char_code=15 {largest catcode for individual characters}
4531
4532@ Next are the ordinary run-of-the-mill command codes.  Codes that are
4533|min_internal| or more represent internal quantities that might be
4534expanded by `\.{\\the}'.
4535
4536@d char_num=16 {character specified numerically ( \.{\\char} )}
4537@d math_char_num=17 {explicit math code ( \.{\\mathchar} )}
4538@d mark=18 {mark definition ( \.{\\mark} )}
4539@d xray=19 {peek inside of \TeX\ ( \.{\\show}, \.{\\showbox}, etc.~)}
4540@d make_box=20 {make a box ( \.{\\box}, \.{\\copy}, \.{\\hbox}, etc.~)}
4541@d hmove=21 {horizontal motion ( \.{\\moveleft}, \.{\\moveright} )}
4542@d vmove=22 {vertical motion ( \.{\\raise}, \.{\\lower} )}
4543@d un_hbox=23 {unglue a box ( \.{\\unhbox}, \.{\\unhcopy} )}
4544@d un_vbox=24 {unglue a box ( \.{\\unvbox}, \.{\\unvcopy} )}
4545  {( or \.{\\pagediscards}, \.{\\splitdiscards} )}
4546@d remove_item=25 {nullify last item ( \.{\\unpenalty},
4547  \.{\\unkern}, \.{\\unskip} )}
4548@d hskip=26 {horizontal glue ( \.{\\hskip}, \.{\\hfil}, etc.~)}
4549@d vskip=27 {vertical glue ( \.{\\vskip}, \.{\\vfil}, etc.~)}
4550@d mskip=28 {math glue ( \.{\\mskip} )}
4551@d kern=29 {fixed space ( \.{\\kern})}
4552@d mkern=30 {math kern ( \.{\\mkern} )}
4553@d leader_ship=31 {use a box ( \.{\\shipout}, \.{\\leaders}, etc.~)}
4554@d halign=32 {horizontal table alignment ( \.{\\halign} )}
4555@d valign=33 {vertical table alignment ( \.{\\valign} )}
4556  {or text direction directives ( \.{\\beginL}, etc.~)}
4557@d no_align=34 {temporary escape from alignment ( \.{\\noalign} )}
4558@d vrule=35 {vertical rule ( \.{\\vrule} )}
4559@d hrule=36 {horizontal rule ( \.{\\hrule} )}
4560@d insert=37 {vlist inserted in box ( \.{\\insert} )}
4561@d vadjust=38 {vlist inserted in enclosing paragraph ( \.{\\vadjust} )}
4562@d ignore_spaces=39 {gobble |spacer| tokens ( \.{\\ignorespaces} )}
4563@d after_assignment=40 {save till assignment is done ( \.{\\afterassignment} )}
4564@d after_group=41 {save till group is done ( \.{\\aftergroup} )}
4565@d break_penalty=42 {additional badness ( \.{\\penalty} )}
4566@d start_par=43 {begin paragraph ( \.{\\indent}, \.{\\noindent} )}
4567@d ital_corr=44 {italic correction ( \.{\\/} )}
4568@d accent=45 {attach accent in text ( \.{\\accent} )}
4569@d math_accent=46 {attach accent in math ( \.{\\mathaccent} )}
4570@d discretionary=47 {discretionary texts ( \.{\\-}, \.{\\discretionary} )}
4571@d eq_no=48 {equation number ( \.{\\eqno}, \.{\\leqno} )}
4572@d left_right=49 {variable delimiter ( \.{\\left}, \.{\\right} )}
4573  {( or \.{\\middle} )}
4574@d math_comp=50 {component of formula ( \.{\\mathbin}, etc.~)}
4575@d limit_switch=51 {diddle limit conventions ( \.{\\displaylimits}, etc.~)}
4576@d above=52 {generalized fraction ( \.{\\above}, \.{\\atop}, etc.~)}
4577@d math_style=53 {style specification ( \.{\\displaystyle}, etc.~)}
4578@d math_choice=54 {choice specification ( \.{\\mathchoice} )}
4579@d non_script=55 {conditional math glue ( \.{\\nonscript} )}
4580@d vcenter=56 {vertically center a vbox ( \.{\\vcenter} )}
4581@d case_shift=57 {force specific case ( \.{\\lowercase}, \.{\\uppercase}~)}
4582@d message=58 {send to user ( \.{\\message}, \.{\\errmessage} )}
4583@d extension=59 {extensions to \TeX\ ( \.{\\write}, \.{\\special}, etc.~)}
4584@d in_stream=60 {files for reading ( \.{\\openin}, \.{\\closein} )}
4585@d begin_group=61 {begin local grouping ( \.{\\begingroup} )}
4586@d end_group=62 {end local grouping ( \.{\\endgroup} )}
4587@d omit=63 {omit alignment template ( \.{\\omit} )}
4588@d ex_space=64 {explicit space ( \.{\\\ } )}
4589@d no_boundary=65 {suppress boundary ligatures ( \.{\\noboundary} )}
4590@d radical=66 {square root and similar signs ( \.{\\radical} )}
4591@d end_cs_name=67 {end control sequence ( \.{\\endcsname} )}
4592@d min_internal=68 {the smallest code that can follow \.{\\the}}
4593@d char_given=68 {character code defined by \.{\\chardef}}
4594@d math_given=69 {math code defined by \.{\\mathchardef}}
4595@d XeTeX_math_given=70 {extended math code defined by \.{\\Umathchardef}}
4596@d last_item=71 {most recent item ( \.{\\lastpenalty},
4597  \.{\\lastkern}, \.{\\lastskip} )}
4598@d max_non_prefixed_command=71 {largest command code that can't be \.{\\global}}
4599
4600@ The next codes are special; they all relate to mode-independent
4601assignment of values to \TeX's internal registers or tables.
4602Codes that are |max_internal| or less represent internal quantities
4603that might be expanded by `\.{\\the}'.
4604
4605@d toks_register=72 {token list register ( \.{\\toks} )}
4606@d assign_toks=73 {special token list ( \.{\\output}, \.{\\everypar}, etc.~)}
4607@d assign_int=74 {user-defined integer ( \.{\\tolerance}, \.{\\day}, etc.~)}
4608@d assign_dimen=75 {user-defined length ( \.{\\hsize}, etc.~)}
4609@d assign_glue=76 {user-defined glue ( \.{\\baselineskip}, etc.~)}
4610@d assign_mu_glue=77 {user-defined muglue ( \.{\\thinmuskip}, etc.~)}
4611@d assign_font_dimen=78 {user-defined font dimension ( \.{\\fontdimen} )}
4612@d assign_font_int=79 {user-defined font integer ( \.{\\hyphenchar},
4613  \.{\\skewchar} )}
4614@d set_aux=80 {specify state info ( \.{\\spacefactor}, \.{\\prevdepth} )}
4615@d set_prev_graf=81 {specify state info ( \.{\\prevgraf} )}
4616@d set_page_dimen=82 {specify state info ( \.{\\pagegoal}, etc.~)}
4617@d set_page_int=83 {specify state info ( \.{\\deadcycles},
4618  \.{\\insertpenalties} )}
4619  {( or \.{\\interactionmode} )}
4620@d set_box_dimen=84 {change dimension of box ( \.{\\wd}, \.{\\ht}, \.{\\dp} )}
4621@d set_shape=85 {specify fancy paragraph shape ( \.{\\parshape} )}
4622  {(or \.{\\interlinepenalties}, etc.~)}
4623@d def_code=86 {define a character code ( \.{\\catcode}, etc.~)}
4624@d XeTeX_def_code=87 {\.{\\Umathcode}, \.{\\Udelcode}}
4625@d def_family=88 {declare math fonts ( \.{\\textfont}, etc.~)}
4626@d set_font=89 {set current font ( font identifiers )}
4627@d def_font=90 {define a font file ( \.{\\font} )}
4628@d register=91 {internal register ( \.{\\count}, \.{\\dimen}, etc.~)}
4629@d max_internal=91 {the largest code that can follow \.{\\the}}
4630@d advance=92 {advance a register or parameter ( \.{\\advance} )}
4631@d multiply=93 {multiply a register or parameter ( \.{\\multiply} )}
4632@d divide=94 {divide a register or parameter ( \.{\\divide} )}
4633@d prefix=95 {qualify a definition ( \.{\\global}, \.{\\long}, \.{\\outer} )}
4634  {( or \.{\\protected} )}
4635@d let=96 {assign a command code ( \.{\\let}, \.{\\futurelet} )}
4636@d shorthand_def=97 {code definition ( \.{\\chardef}, \.{\\countdef}, etc.~)}
4637@d read_to_cs=98 {read into a control sequence ( \.{\\read} )}
4638  {( or \.{\\readline} )}
4639@d def=99 {macro definition ( \.{\\def}, \.{\\gdef}, \.{\\xdef}, \.{\\edef} )}
4640@d set_box=100 {set a box ( \.{\\setbox} )}
4641@d hyph_data=101 {hyphenation data ( \.{\\hyphenation}, \.{\\patterns} )}
4642@d set_interaction=102 {define level of interaction ( \.{\\batchmode}, etc.~)}
4643@d max_command=102 {the largest command code seen at |big_switch|}
4644
4645@ The remaining command codes are extra special, since they cannot get through
4646\TeX's scanner to the main control routine. They have been given values higher
4647than |max_command| so that their special nature is easily discernible.
4648The ``expandable'' commands come first.
4649
4650@d undefined_cs=max_command+1 {initial state of most |eq_type| fields}
4651@d expand_after=max_command+2 {special expansion ( \.{\\expandafter} )}
4652@d no_expand=max_command+3 {special nonexpansion ( \.{\\noexpand} )}
4653@d input=max_command+4 {input a source file ( \.{\\input}, \.{\\endinput} )}
4654  {( or \.{\\scantokens} )}
4655@d if_test=max_command+5 {conditional text ( \.{\\if}, \.{\\ifcase}, etc.~)}
4656@d fi_or_else=max_command+6 {delimiters for conditionals ( \.{\\else}, etc.~)}
4657@d cs_name=max_command+7 {make a control sequence from tokens ( \.{\\csname} )}
4658@d convert=max_command+8 {convert to text ( \.{\\number}, \.{\\string}, etc.~)}
4659@d the=max_command+9 {expand an internal quantity ( \.{\\the} )}
4660  {( or \.{\\unexpanded}, \.{\\detokenize} )}
4661@d top_bot_mark=max_command+10 {inserted mark ( \.{\\topmark}, etc.~)}
4662@d call=max_command+11 {non-long, non-outer control sequence}
4663@d long_call=max_command+12 {long, non-outer control sequence}
4664@d outer_call=max_command+13 {non-long, outer control sequence}
4665@d long_outer_call=max_command+14 {long, outer control sequence}
4666@d end_template=max_command+15 {end of an alignment template}
4667@d dont_expand=max_command+16 {the following token was marked by \.{\\noexpand}}
4668@d glue_ref=max_command+17 {the equivalent points to a glue specification}
4669@d shape_ref=max_command+18 {the equivalent points to a parshape specification}
4670@d box_ref=max_command+19 {the equivalent points to a box node, or is |null|}
4671@d data=max_command+20 {the equivalent is simply a halfword number}
4672
4673@* \[16] The semantic nest.
4674\TeX\ is typically in the midst of building many lists at once. For example,
4675when a math formula is being processed, \TeX\ is in math mode and
4676working on an mlist; this formula has temporarily interrupted \TeX\ from
4677being in horizontal mode and building the hlist of a paragraph; and this
4678paragraph has temporarily interrupted \TeX\ from being in vertical mode
4679and building the vlist for the next page of a document. Similarly, when a
4680\.{\\vbox} occurs inside of an \.{\\hbox}, \TeX\ is temporarily
4681interrupted from working in restricted horizontal mode, and it enters
4682internal vertical mode.  The ``semantic nest'' is a stack that
4683keeps track of what lists and modes are currently suspended.
4684
4685At each level of processing we are in one of six modes:
4686
4687\yskip\hang|vmode| stands for vertical mode (the page builder);
4688
4689\hang|hmode| stands for horizontal mode (the paragraph builder);
4690
4691\hang|mmode| stands for displayed formula mode;
4692
4693\hang|-vmode| stands for internal vertical mode (e.g., in a \.{\\vbox});
4694
4695\hang|-hmode| stands for restricted horizontal mode (e.g., in an \.{\\hbox});
4696
4697\hang|-mmode| stands for math formula mode (not displayed).
4698
4699\yskip\noindent The mode is temporarily set to zero while processing \.{\\write}
4700texts in the |ship_out| routine.
4701
4702Numeric values are assigned to |vmode|, |hmode|, and |mmode| so that
4703\TeX's ``big semantic switch'' can select the appropriate thing to
4704do by computing the value |abs(mode)+cur_cmd|, where |mode| is the current
4705mode and |cur_cmd| is the current command code.
4706
4707@d vmode=1 {vertical mode}
4708@d hmode=vmode+max_command+1 {horizontal mode}
4709@d mmode=hmode+max_command+1 {math mode}
4710
4711@p procedure print_mode(@!m:integer); {prints the mode represented by |m|}
4712begin if m>0 then
4713  case m div (max_command+1) of
4714  0:print("vertical");
4715  1:print("horizontal");
4716  2:print("display math");
4717  end
4718else if m=0 then print("no")
4719else  case (-m) div (max_command+1) of
4720  0:print("internal vertical");
4721  1:print("restricted horizontal");
4722  2:print("math");
4723  end;
4724print(" mode");
4725end;
4726
4727@ The state of affairs at any semantic level can be represented by
4728five values:
4729
4730\yskip\hang|mode| is the number representing the semantic mode, as
4731just explained.
4732
4733\yskip\hang|head| is a |pointer| to a list head for the list being built;
4734|link(head)| therefore points to the first element of the list, or
4735to |null| if the list is empty.
4736
4737\yskip\hang|tail| is a |pointer| to the final node of the list being
4738built; thus, |tail=head| if and only if the list is empty.
4739
4740\yskip\hang|prev_graf| is the number of lines of the current paragraph that
4741have already been put into the present vertical list.
4742
4743\yskip\hang|aux| is an auxiliary |memory_word| that gives further information
4744that is needed to characterize the situation.
4745
4746\yskip\noindent
4747In vertical mode, |aux| is also known as |prev_depth|; it is the scaled
4748value representing the depth of the previous box, for use in baseline
4749calculations, or it is |<=-1000|pt if the next box on the vertical list is to
4750be exempt from baseline calculations.  In horizontal mode, |aux| is also
4751known as |space_factor| and |clang|; it holds the current space factor used in
4752spacing calculations, and the current language used for hyphenation.
4753(The value of |clang| is undefined in restricted horizontal mode.)
4754In math mode, |aux| is also known as |incompleat_noad|; if
4755not |null|, it points to a record that represents the numerator of a
4756generalized fraction for which the denominator is currently being formed
4757in the current list.
4758
4759There is also a sixth quantity, |mode_line|, which correlates
4760the semantic nest with the user's input; |mode_line| contains the source
4761line number at which the current level of nesting was entered. The negative
4762of this line number is the |mode_line| at the level of the
4763user's output routine.
4764
4765A seventh quantity, |eTeX_aux|, is used by the extended features \eTeX.
4766In vertical modes it is known as |LR_save| and holds the LR stack when a
4767paragraph is interrupted by a displayed formula.  In display math mode
4768it is known as |LR_box| and holds a pointer to a prototype box for the
4769display.  In math mode it is known as |delim_ptr| and points to the most
4770recent |left_noad| or |middle_noad| of a |math_left_group|.
4771
4772In horizontal mode, the |prev_graf| field is used for initial language data.
4773
4774The semantic nest is an array called |nest| that holds the |mode|, |head|,
4775|tail|, |prev_graf|, |aux|, and |mode_line| values for all semantic levels
4776below the currently active one. Information about the currently active
4777level is kept in the global quantities |mode|, |head|, |tail|, |prev_graf|,
4778|aux|, and |mode_line|, which live in a \PASCAL\ record that is ready to
4779be pushed onto |nest| if necessary.
4780
4781@d ignore_depth==-65536000 {|prev_depth| value that is ignored}
4782
4783@<Types...@>=
4784@!list_state_record=record@!mode_field:-mmode..mmode;@+
4785  @!head_field,@!tail_field: pointer;
4786  @!eTeX_aux_field: pointer;
4787  @!pg_field,@!ml_field: integer;@+
4788  @!aux_field: memory_word;
4789  end;
4790
4791@ @d mode==cur_list.mode_field {current mode}
4792@d head==cur_list.head_field {header node of current list}
4793@d tail==cur_list.tail_field {final node on current list}
4794@d eTeX_aux==cur_list.eTeX_aux_field {auxiliary data for \eTeX}
4795@d LR_save==eTeX_aux {LR stack when a paragraph is interrupted}
4796@d LR_box==eTeX_aux {prototype box for display}
4797@d delim_ptr==eTeX_aux {most recent left or right noad of a math left group}
4798@d prev_graf==cur_list.pg_field {number of paragraph lines accumulated}
4799@d aux==cur_list.aux_field {auxiliary data about the current list}
4800@d prev_depth==aux.sc {the name of |aux| in vertical mode}
4801@d space_factor==aux.hh.lh {part of |aux| in horizontal mode}
4802@d clang==aux.hh.rh {the other part of |aux| in horizontal mode}
4803@d incompleat_noad==aux.int {the name of |aux| in math mode}
4804@d mode_line==cur_list.ml_field {source file line number at beginning of list}
4805
4806@<Glob...@>=
4807@!nest:array[0..nest_size] of list_state_record;
4808@!nest_ptr:0..nest_size; {first unused location of |nest|}
4809@!max_nest_stack:0..nest_size; {maximum of |nest_ptr| when pushing}
4810@!cur_list:list_state_record; {the ``top'' semantic state}
4811@!shown_mode:-mmode..mmode; {most recent mode shown by \.{\\tracingcommands}}
4812
4813@ Here is a common way to make the current list grow:
4814
4815@d tail_append(#)==begin link(tail):=#; tail:=link(tail);
4816  end
4817
4818@ We will see later that the vertical list at the bottom semantic level is split
4819into two parts; the ``current page'' runs from |page_head| to |page_tail|,
4820and the ``contribution list'' runs from |contrib_head| to |tail| of
4821semantic level zero. The idea is that contributions are first formed in
4822vertical mode, then ``contributed'' to the current page (during which time
4823the page-breaking decisions are made). For now, we don't need to know
4824any more details about the page-building process.
4825
4826@<Set init...@>=
4827nest_ptr:=0; max_nest_stack:=0;
4828mode:=vmode; head:=contrib_head; tail:=contrib_head;
4829eTeX_aux:=null;
4830prev_depth:=ignore_depth; mode_line:=0;
4831prev_graf:=0; shown_mode:=0;
4832@<Start a new current page@>;
4833
4834@ When \TeX's work on one level is interrupted, the state is saved by
4835calling |push_nest|. This routine changes |head| and |tail| so that
4836a new (empty) list is begun; it does not change |mode| or |aux|.
4837
4838@p procedure push_nest; {enter a new semantic level, save the old}
4839begin if nest_ptr>max_nest_stack then
4840  begin max_nest_stack:=nest_ptr;
4841  if nest_ptr=nest_size then overflow("semantic nest size",nest_size);
4842@:TeX capacity exceeded semantic nest size}{\quad semantic nest size@>
4843  end;
4844nest[nest_ptr]:=cur_list; {stack the record}
4845incr(nest_ptr); head:=get_avail; tail:=head; prev_graf:=0; mode_line:=line;
4846eTeX_aux:=null;
4847end;
4848
4849@ Conversely, when \TeX\ is finished on the current level, the former
4850state is restored by calling |pop_nest|. This routine will never be
4851called at the lowest semantic level, nor will it be called unless |head|
4852is a node that should be returned to free memory.
4853
4854@p procedure pop_nest; {leave a semantic level, re-enter the old}
4855begin free_avail(head); decr(nest_ptr); cur_list:=nest[nest_ptr];
4856end;
4857
4858@ Here is a procedure that displays what \TeX\ is working on, at all levels.
4859
4860@p procedure@?print_totals; forward;@t\2@>
4861procedure show_activities;
4862var p:0..nest_size; {index into |nest|}
4863@!m:-mmode..mmode; {mode}
4864@!a:memory_word; {auxiliary}
4865@!q,@!r:pointer; {for showing the current page}
4866@!t:integer; {ditto}
4867begin nest[nest_ptr]:=cur_list; {put the top level into the array}
4868print_nl(""); print_ln;
4869for p:=nest_ptr downto 0 do
4870  begin m:=nest[p].mode_field; a:=nest[p].aux_field;
4871  print_nl("### "); print_mode(m);
4872  print(" entered at line "); print_int(abs(nest[p].ml_field));
4873  if m=hmode then if nest[p].pg_field <> @'40600000 then
4874    begin print(" (language"); print_int(nest[p].pg_field mod @'200000);
4875    print(":hyphenmin"); print_int(nest[p].pg_field div @'20000000);
4876    print_char(","); print_int((nest[p].pg_field div @'200000) mod @'100);
4877    print_char(")");
4878    end;
4879  if nest[p].ml_field<0 then print(" (\output routine)");
4880  if p=0 then
4881    begin @<Show the status of the current page@>;
4882    if link(contrib_head)<>null then
4883      print_nl("### recent contributions:");
4884    end;
4885  show_box(link(nest[p].head_field));
4886  @<Show the auxiliary field, |a|@>;
4887  end;
4888end;
4889
4890@ @<Show the auxiliary...@>=
4891case abs(m) div (max_command+1) of
48920: begin print_nl("prevdepth ");
4893  if a.sc<=ignore_depth then print("ignored")
4894  else print_scaled(a.sc);
4895  if nest[p].pg_field<>0 then
4896    begin print(", prevgraf ");
4897    print_int(nest[p].pg_field); print(" line");
4898    if nest[p].pg_field<>1 then print_char("s");
4899    end;
4900  end;
49011: begin print_nl("spacefactor "); print_int(a.hh.lh);
4902  if m>0 then@+ if a.hh.rh>0 then
4903    begin print(", current language "); print_int(a.hh.rh);@+
4904    end;
4905  end;
49062: if a.int<>null then
4907  begin print("this will be denominator of:"); show_box(a.int);@+
4908  end;
4909end {there are no other cases}
4910
4911@* \[17] The table of equivalents.
4912Now that we have studied the data structures for \TeX's semantic routines,
4913we ought to consider the data structures used by its syntactic routines. In
4914other words, our next concern will be
4915the tables that \TeX\ looks at when it is scanning
4916what the user has written.
4917
4918The biggest and most important such table is called |eqtb|. It holds the
4919current ``equivalents'' of things; i.e., it explains what things mean
4920or what their current values are, for all quantities that are subject to
4921the nesting structure provided by \TeX's grouping mechanism. There are six
4922parts to |eqtb|:
4923
4924\yskip\hangg 1) |eqtb[active_base..(hash_base-1)]| holds the current
4925equivalents of single-character control sequences.
4926
4927\yskip\hangg 2) |eqtb[hash_base..(glue_base-1)]| holds the current
4928equivalents of multiletter control sequences.
4929
4930\yskip\hangg 3) |eqtb[glue_base..(local_base-1)]| holds the current
4931equivalents of glue parameters like the current baselineskip.
4932
4933\yskip\hangg 4) |eqtb[local_base..(int_base-1)]| holds the current
4934equivalents of local halfword quantities like the current box registers,
4935the current ``catcodes,'' the current font, and a pointer to the current
4936paragraph shape.
4937
4938\yskip\hangg 5) |eqtb[int_base..(dimen_base-1)]| holds the current
4939equivalents of fullword integer parameters like the current hyphenation
4940penalty.
4941
4942\yskip\hangg 6) |eqtb[dimen_base..eqtb_size]| holds the current equivalents
4943of fullword dimension parameters like the current hsize or amount of
4944hanging indentation.
4945
4946\yskip\noindent Note that, for example, the current amount of
4947baselineskip glue is determined by the setting of a particular location
4948in region~3 of |eqtb|, while the current meaning of the control sequence
4949`\.{\\baselineskip}' (which might have been changed by \.{\\def} or
4950\.{\\let}) appears in region~2.
4951
4952@ Each entry in |eqtb| is a |memory_word|. Most of these words are of type
4953|two_halves|, and subdivided into three fields:
4954
4955\yskip\hangg 1) The |eq_level| (a quarterword) is the level of grouping at
4956which this equivalent was defined. If the level is |level_zero|, the
4957equivalent has never been defined; |level_one| refers to the outer level
4958(outside of all groups), and this level is also used for global
4959definitions that never go away. Higher levels are for equivalents that
4960will disappear at the end of their group.  @^global definitions@>
4961
4962\yskip\hangg 2) The |eq_type| (another quarterword) specifies what kind of
4963entry this is. There are many types, since each \TeX\ primitive like
4964\.{\\hbox}, \.{\\def}, etc., has its own special code. The list of
4965command codes above includes all possible settings of the |eq_type| field.
4966
4967\yskip\hangg 3) The |equiv| (a halfword) is the current equivalent value.
4968This may be a font number, a pointer into |mem|, or a variety of other
4969things.
4970
4971@d eq_level_field(#)==#.hh.b1
4972@d eq_type_field(#)==#.hh.b0
4973@d equiv_field(#)==#.hh.rh
4974@d eq_level(#)==eq_level_field(eqtb[#]) {level of definition}
4975@d eq_type(#)==eq_type_field(eqtb[#]) {command code for equivalent}
4976@d equiv(#)==equiv_field(eqtb[#]) {equivalent value}
4977@d level_zero=min_quarterword {level for undefined quantities}
4978@d level_one=level_zero+1 {outermost level for defined quantities}
4979
4980@ Many locations in |eqtb| have symbolic names. The purpose of the next
4981paragraphs is to define these names, and to set up the initial values of the
4982equivalents.
4983
4984In the first region we have |number_chars| equivalents for ``active characters''
4985that act as control sequences, followed by |number_chars| equivalents for
4986single-character control sequences.
4987
4988Then comes region~2, which corresponds to the hash table that we will
4989define later.  The maximum address in this region is used for a dummy
4990control sequence that is perpetually undefined. There also are several
4991locations for control sequences that are perpetually defined
4992(since they are used in error recovery).
4993
4994@d active_base=1 {beginning of region 1, for active character equivalents}
4995@d single_base=active_base+number_usvs
4996   {equivalents of one-character control sequences}
4997@d null_cs=single_base+number_usvs {equivalent of \.{\\csname\\endcsname}}
4998@d hash_base=null_cs+1 {beginning of region 2, for the hash table}
4999@d frozen_control_sequence=hash_base+hash_size {for error recovery}
5000@d frozen_protection=frozen_control_sequence {inaccessible but definable}
5001@d frozen_cr=frozen_control_sequence+1 {permanent `\.{\\cr}'}
5002@d frozen_end_group=frozen_control_sequence+2 {permanent `\.{\\endgroup}'}
5003@d frozen_right=frozen_control_sequence+3 {permanent `\.{\\right}'}
5004@d frozen_fi=frozen_control_sequence+4 {permanent `\.{\\fi}'}
5005@d frozen_end_template=frozen_control_sequence+5 {permanent `\.{\\endtemplate}'}
5006@d frozen_endv=frozen_control_sequence+6 {second permanent `\.{\\endtemplate}'}
5007@d frozen_relax=frozen_control_sequence+7 {permanent `\.{\\relax}'}
5008@d end_write=frozen_control_sequence+8 {permanent `\.{\\endwrite}'}
5009@d frozen_dont_expand=frozen_control_sequence+9
5010  {permanent `\.{\\notexpanded:}'}
5011@d frozen_null_font=frozen_control_sequence+10
5012  {permanent `\.{\\nullfont}'}
5013@d font_id_base=frozen_null_font-font_base
5014  {begins table of 257 permanent font identifiers}
5015@d undefined_control_sequence=frozen_null_font+257 {dummy location}
5016@d glue_base=undefined_control_sequence+1 {beginning of region 3}
5017
5018@<Initialize table entries...@>=
5019eq_type(undefined_control_sequence):=undefined_cs;
5020equiv(undefined_control_sequence):=null;
5021eq_level(undefined_control_sequence):=level_zero;
5022for k:=active_base to undefined_control_sequence-1 do
5023  eqtb[k]:=eqtb[undefined_control_sequence];
5024
5025@ Here is a routine that displays the current meaning of an |eqtb| entry
5026in region 1 or~2. (Similar routines for the other regions will appear
5027below.)
5028
5029@<Show equivalent |n|, in region 1 or 2@>=
5030begin sprint_cs(n); print_char("="); print_cmd_chr(eq_type(n),equiv(n));
5031if eq_type(n)>=call then
5032  begin print_char(":"); show_token_list(link(equiv(n)),null,32);
5033  end;
5034end
5035
5036@ Region 3 of |eqtb| contains the |number_regs| \.{\\skip} registers,
5037as well as the glue parameters defined here. It is important that the
5038``muskip'' parameters have larger numbers than the others.
5039
5040@d line_skip_code=0 {interline glue if |baseline_skip| is infeasible}
5041@d baseline_skip_code=1 {desired glue between baselines}
5042@d par_skip_code=2 {extra glue just above a paragraph}
5043@d above_display_skip_code=3 {extra glue just above displayed math}
5044@d below_display_skip_code=4 {extra glue just below displayed math}
5045@d above_display_short_skip_code=5
5046  {glue above displayed math following short lines}
5047@d below_display_short_skip_code=6
5048  {glue below displayed math following short lines}
5049@d left_skip_code=7 {glue at left of justified lines}
5050@d right_skip_code=8 {glue at right of justified lines}
5051@d top_skip_code=9 {glue at top of main pages}
5052@d split_top_skip_code=10 {glue at top of split pages}
5053@d tab_skip_code=11 {glue between aligned entries}
5054@d space_skip_code=12 {glue between words (if not |zero_glue|)}
5055@d xspace_skip_code=13 {glue after sentences (if not |zero_glue|)}
5056@d par_fill_skip_code=14 {glue on last line of paragraph}
5057@d XeTeX_linebreak_skip_code=15 {glue introduced at potential linebreak location}
5058@d thin_mu_skip_code=16 {thin space in math formula}
5059@d med_mu_skip_code=17 {medium space in math formula}
5060@d thick_mu_skip_code=18 {thick space in math formula}
5061@d glue_pars=19 {total number of glue parameters}
5062@d skip_base=glue_base+glue_pars {table of |number_regs| ``skip'' registers}
5063@d mu_skip_base=skip_base+number_regs
5064   {table of |number_regs| ``muskip'' registers}
5065@d local_base=mu_skip_base+number_regs {beginning of region 4}
5066@#
5067@d skip(#)==equiv(skip_base+#) {|mem| location of glue specification}
5068@d mu_skip(#)==equiv(mu_skip_base+#) {|mem| location of math glue spec}
5069@d glue_par(#)==equiv(glue_base+#) {|mem| location of glue specification}
5070@d line_skip==glue_par(line_skip_code)
5071@d baseline_skip==glue_par(baseline_skip_code)
5072@d par_skip==glue_par(par_skip_code)
5073@d above_display_skip==glue_par(above_display_skip_code)
5074@d below_display_skip==glue_par(below_display_skip_code)
5075@d above_display_short_skip==glue_par(above_display_short_skip_code)
5076@d below_display_short_skip==glue_par(below_display_short_skip_code)
5077@d left_skip==glue_par(left_skip_code)
5078@d right_skip==glue_par(right_skip_code)
5079@d top_skip==glue_par(top_skip_code)
5080@d split_top_skip==glue_par(split_top_skip_code)
5081@d tab_skip==glue_par(tab_skip_code)
5082@d space_skip==glue_par(space_skip_code)
5083@d xspace_skip==glue_par(xspace_skip_code)
5084@d par_fill_skip==glue_par(par_fill_skip_code)
5085@d XeTeX_linebreak_skip==glue_par(XeTeX_linebreak_skip_code)
5086@d thin_mu_skip==glue_par(thin_mu_skip_code)
5087@d med_mu_skip==glue_par(med_mu_skip_code)
5088@d thick_mu_skip==glue_par(thick_mu_skip_code)
5089
5090@<Current |mem| equivalent of glue parameter number |n|@>=glue_par(n)
5091
5092@ Sometimes we need to convert \TeX's internal code numbers into symbolic
5093form. The |print_skip_param| routine gives the symbolic name of a glue
5094parameter.
5095
5096@<Declare the procedure called |print_skip_param|@>=
5097procedure print_skip_param(@!n:integer);
5098begin case n of
5099line_skip_code: print_esc("lineskip");
5100baseline_skip_code: print_esc("baselineskip");
5101par_skip_code: print_esc("parskip");
5102above_display_skip_code: print_esc("abovedisplayskip");
5103below_display_skip_code: print_esc("belowdisplayskip");
5104above_display_short_skip_code: print_esc("abovedisplayshortskip");
5105below_display_short_skip_code: print_esc("belowdisplayshortskip");
5106left_skip_code: print_esc("leftskip");
5107right_skip_code: print_esc("rightskip");
5108top_skip_code: print_esc("topskip");
5109split_top_skip_code: print_esc("splittopskip");
5110tab_skip_code: print_esc("tabskip");
5111space_skip_code: print_esc("spaceskip");
5112xspace_skip_code: print_esc("xspaceskip");
5113par_fill_skip_code: print_esc("parfillskip");
5114XeTeX_linebreak_skip_code: print_esc("XeTeXlinebreakskip");
5115thin_mu_skip_code: print_esc("thinmuskip");
5116med_mu_skip_code: print_esc("medmuskip");
5117thick_mu_skip_code: print_esc("thickmuskip");
5118othercases print("[unknown glue parameter!]")
5119endcases;
5120end;
5121
5122@ The symbolic names for glue parameters are put into \TeX's hash table
5123by using the routine called |primitive|, defined below. Let us enter them
5124now, so that we don't have to list all those parameter names anywhere else.
5125
5126@<Put each of \TeX's primitives into the hash table@>=
5127primitive("lineskip",assign_glue,glue_base+line_skip_code);@/
5128@!@:line_skip_}{\.{\\lineskip} primitive@>
5129primitive("baselineskip",assign_glue,glue_base+baseline_skip_code);@/
5130@!@:baseline_skip_}{\.{\\baselineskip} primitive@>
5131primitive("parskip",assign_glue,glue_base+par_skip_code);@/
5132@!@:par_skip_}{\.{\\parskip} primitive@>
5133primitive("abovedisplayskip",assign_glue,glue_base+above_display_skip_code);@/
5134@!@:above_display_skip_}{\.{\\abovedisplayskip} primitive@>
5135primitive("belowdisplayskip",assign_glue,glue_base+below_display_skip_code);@/
5136@!@:below_display_skip_}{\.{\\belowdisplayskip} primitive@>
5137primitive("abovedisplayshortskip",
5138  assign_glue,glue_base+above_display_short_skip_code);@/
5139@!@:above_display_short_skip_}{\.{\\abovedisplayshortskip} primitive@>
5140primitive("belowdisplayshortskip",
5141  assign_glue,glue_base+below_display_short_skip_code);@/
5142@!@:below_display_short_skip_}{\.{\\belowdisplayshortskip} primitive@>
5143primitive("leftskip",assign_glue,glue_base+left_skip_code);@/
5144@!@:left_skip_}{\.{\\leftskip} primitive@>
5145primitive("rightskip",assign_glue,glue_base+right_skip_code);@/
5146@!@:right_skip_}{\.{\\rightskip} primitive@>
5147primitive("topskip",assign_glue,glue_base+top_skip_code);@/
5148@!@:top_skip_}{\.{\\topskip} primitive@>
5149primitive("splittopskip",assign_glue,glue_base+split_top_skip_code);@/
5150@!@:split_top_skip_}{\.{\\splittopskip} primitive@>
5151primitive("tabskip",assign_glue,glue_base+tab_skip_code);@/
5152@!@:tab_skip_}{\.{\\tabskip} primitive@>
5153primitive("spaceskip",assign_glue,glue_base+space_skip_code);@/
5154@!@:space_skip_}{\.{\\spaceskip} primitive@>
5155primitive("xspaceskip",assign_glue,glue_base+xspace_skip_code);@/
5156@!@:xspace_skip_}{\.{\\xspaceskip} primitive@>
5157primitive("parfillskip",assign_glue,glue_base+par_fill_skip_code);@/
5158@!@:par_fill_skip_}{\.{\\parfillskip} primitive@>
5159primitive("XeTeXlinebreakskip",assign_glue,glue_base+XeTeX_linebreak_skip_code);@/
5160@!@:XeTeX_linebreak_skip_}{\.{\\XeTeXlinebreakskip} primitive@>
5161primitive("thinmuskip",assign_mu_glue,glue_base+thin_mu_skip_code);@/
5162@!@:thin_mu_skip_}{\.{\\thinmuskip} primitive@>
5163primitive("medmuskip",assign_mu_glue,glue_base+med_mu_skip_code);@/
5164@!@:med_mu_skip_}{\.{\\medmuskip} primitive@>
5165primitive("thickmuskip",assign_mu_glue,glue_base+thick_mu_skip_code);@/
5166@!@:thick_mu_skip_}{\.{\\thickmuskip} primitive@>
5167
5168@ @<Cases of |print_cmd_chr| for symbolic printing of primitives@>=
5169assign_glue,assign_mu_glue: if chr_code<skip_base then
5170    print_skip_param(chr_code-glue_base)
5171  else if chr_code<mu_skip_base then
5172    begin print_esc("skip"); print_int(chr_code-skip_base);
5173    end
5174  else  begin print_esc("muskip"); print_int(chr_code-mu_skip_base);
5175    end;
5176
5177@ All glue parameters and registers are initially `\.{0pt plus0pt minus0pt}'.
5178
5179@<Initialize table entries...@>=
5180equiv(glue_base):=zero_glue; eq_level(glue_base):=level_one;
5181eq_type(glue_base):=glue_ref;
5182for k:=glue_base+1 to local_base-1 do eqtb[k]:=eqtb[glue_base];
5183glue_ref_count(zero_glue):=glue_ref_count(zero_glue)+local_base-glue_base;
5184
5185@ @<Show equivalent |n|, in region 3@>=
5186if n<skip_base then
5187  begin print_skip_param(n-glue_base); print_char("=");
5188  if n<glue_base+thin_mu_skip_code then print_spec(equiv(n),"pt")
5189  else print_spec(equiv(n),"mu");
5190  end
5191else if n<mu_skip_base then
5192  begin print_esc("skip"); print_int(n-skip_base); print_char("=");
5193  print_spec(equiv(n),"pt");
5194  end
5195else  begin print_esc("muskip"); print_int(n-mu_skip_base); print_char("=");
5196  print_spec(equiv(n),"mu");
5197  end
5198
5199@ Region 4 of |eqtb| contains the local quantities defined here. The
5200bulk of this region is taken up by five tables that are indexed by eight-bit
5201characters; these tables are important to both the syntactic and semantic
5202portions of \TeX. There are also a bunch of special things like font and
5203token parameters, as well as the tables of \.{\\toks} and \.{\\box}
5204registers.
5205
5206@d par_shape_loc=local_base {specifies paragraph shape}
5207@d output_routine_loc=local_base+1 {points to token list for \.{\\output}}
5208@d every_par_loc=local_base+2 {points to token list for \.{\\everypar}}
5209@d every_math_loc=local_base+3 {points to token list for \.{\\everymath}}
5210@d every_display_loc=local_base+4 {points to token list for \.{\\everydisplay}}
5211@d every_hbox_loc=local_base+5 {points to token list for \.{\\everyhbox}}
5212@d every_vbox_loc=local_base+6 {points to token list for \.{\\everyvbox}}
5213@d every_job_loc=local_base+7 {points to token list for \.{\\everyjob}}
5214@d every_cr_loc=local_base+8 {points to token list for \.{\\everycr}}
5215@d err_help_loc=local_base+9 {points to token list for \.{\\errhelp}}
5216@d tex_toks=local_base+10 {end of \TeX's token list parameters}
5217@#
5218@d etex_toks_base=tex_toks {base for \eTeX's token list parameters}
5219@d every_eof_loc=etex_toks_base {points to token list for \.{\\everyeof}}
5220@d XeTeX_inter_char_loc=every_eof_loc+1 {not really used, but serves as a flag}
5221@d etex_toks=XeTeX_inter_char_loc+1 {end of \eTeX's token list parameters}
5222@#
5223@d toks_base=etex_toks {table of |number_regs| token list registers}
5224@#
5225@d etex_pen_base=toks_base+number_regs {start of table of \eTeX's penalties}
5226@d inter_line_penalties_loc=etex_pen_base {additional penalties between lines}
5227@d club_penalties_loc=etex_pen_base+1 {penalties for creating club lines}
5228@d widow_penalties_loc=etex_pen_base+2 {penalties for creating widow lines}
5229@d display_widow_penalties_loc=etex_pen_base+3 {ditto, just before a display}
5230@d etex_pens=etex_pen_base+4 {end of table of \eTeX's penalties}
5231@#
5232@d box_base=etex_pens {table of |number_regs| box registers}
5233@d cur_font_loc=box_base+number_regs {internal font number outside math mode}
5234@d math_font_base=cur_font_loc+1 {table of |number_math_fonts| math font numbers}
5235@d cat_code_base=math_font_base+number_math_fonts
5236  {table of |number_usvs| command codes (the ``catcodes'')}
5237@d lc_code_base=cat_code_base+number_usvs {table of |number_usvs| lowercase mappings}
5238@d uc_code_base=lc_code_base+number_usvs {table of |number_usvs| uppercase mappings}
5239@d sf_code_base=uc_code_base+number_usvs {table of |number_usvs| spacefactor mappings}
5240@d math_code_base=sf_code_base+number_usvs {table of |number_usvs| math mode mappings}
5241@d int_base=math_code_base+number_usvs {beginning of region 5}
5242@#
5243@d par_shape_ptr==equiv(par_shape_loc)
5244@d output_routine==equiv(output_routine_loc)
5245@d every_par==equiv(every_par_loc)
5246@d every_math==equiv(every_math_loc)
5247@d every_display==equiv(every_display_loc)
5248@d every_hbox==equiv(every_hbox_loc)
5249@d every_vbox==equiv(every_vbox_loc)
5250@d every_job==equiv(every_job_loc)
5251@d every_cr==equiv(every_cr_loc)
5252@d err_help==equiv(err_help_loc)
5253@d toks(#)==equiv(toks_base+#)
5254@d box(#)==equiv(box_base+#)
5255@d cur_font==equiv(cur_font_loc)
5256@d fam_fnt(#)==equiv(math_font_base+#)
5257@d cat_code(#)==equiv(cat_code_base+#)
5258@d lc_code(#)==equiv(lc_code_base+#)
5259@d uc_code(#)==equiv(uc_code_base+#)
5260@d sf_code(#)==equiv(sf_code_base+#)
5261@d math_code(#)==equiv(math_code_base+#)
5262  {Note: |math_code(c)| is the true math code plus |min_halfword|}
5263
5264@<Put each...@>=
5265primitive("output",assign_toks,output_routine_loc);
5266@!@:output_}{\.{\\output} primitive@>
5267primitive("everypar",assign_toks,every_par_loc);
5268@!@:every_par_}{\.{\\everypar} primitive@>
5269primitive("everymath",assign_toks,every_math_loc);
5270@!@:every_math_}{\.{\\everymath} primitive@>
5271primitive("everydisplay",assign_toks,every_display_loc);
5272@!@:every_display_}{\.{\\everydisplay} primitive@>
5273primitive("everyhbox",assign_toks,every_hbox_loc);
5274@!@:every_hbox_}{\.{\\everyhbox} primitive@>
5275primitive("everyvbox",assign_toks,every_vbox_loc);
5276@!@:every_vbox_}{\.{\\everyvbox} primitive@>
5277primitive("everyjob",assign_toks,every_job_loc);
5278@!@:every_job_}{\.{\\everyjob} primitive@>
5279primitive("everycr",assign_toks,every_cr_loc);
5280@!@:every_cr_}{\.{\\everycr} primitive@>
5281primitive("errhelp",assign_toks,err_help_loc);
5282@!@:err_help_}{\.{\\errhelp} primitive@>
5283
5284@ @<Cases of |print_cmd_chr|...@>=
5285assign_toks: if chr_code>=toks_base then
5286  begin print_esc("toks"); print_int(chr_code-toks_base);
5287  end
5288else  case chr_code of
5289  output_routine_loc: print_esc("output");
5290  every_par_loc: print_esc("everypar");
5291  every_math_loc: print_esc("everymath");
5292  every_display_loc: print_esc("everydisplay");
5293  every_hbox_loc: print_esc("everyhbox");
5294  every_vbox_loc: print_esc("everyvbox");
5295  every_job_loc: print_esc("everyjob");
5296  every_cr_loc: print_esc("everycr");
5297  @/@<Cases of |assign_toks| for |print_cmd_chr|@>@/
5298  othercases print_esc("errhelp")
5299  endcases;
5300
5301@ We initialize most things to null or undefined values. An undefined font
5302is represented by the internal code |font_base|.
5303
5304However, the character code tables are given initial values based on the
5305conventional interpretation of ASCII code. These initial values should
5306not be changed when \TeX\ is adapted for use with non-English languages;
5307all changes to the initialization conventions should be made in format
5308packages, not in \TeX\ itself, so that global interchange of formats is
5309possible.
5310
5311@d null_font==font_base
5312@d var_fam_class = 7
5313@d active_math_char = @"1FFFFF
5314@d is_active_math_char(#) == math_char_field(#) = active_math_char
5315@d is_var_family(#) == math_class_field(#) = 7
5316
5317@<Initialize table entries...@>=
5318par_shape_ptr:=null; eq_type(par_shape_loc):=shape_ref;
5319eq_level(par_shape_loc):=level_one;@/
5320for k:=etex_pen_base to etex_pens-1 do
5321  eqtb[k]:=eqtb[par_shape_loc];
5322for k:=output_routine_loc to toks_base+number_regs-1 do
5323  eqtb[k]:=eqtb[undefined_control_sequence];
5324box(0):=null; eq_type(box_base):=box_ref; eq_level(box_base):=level_one;
5325for k:=box_base+1 to box_base+number_regs-1 do eqtb[k]:=eqtb[box_base];
5326cur_font:=null_font; eq_type(cur_font_loc):=data;
5327eq_level(cur_font_loc):=level_one;@/
5328for k:=math_font_base to math_font_base+number_math_fonts-1 do eqtb[k]:=eqtb[cur_font_loc];
5329equiv(cat_code_base):=0; eq_type(cat_code_base):=data;
5330eq_level(cat_code_base):=level_one;@/
5331for k:=cat_code_base+1 to int_base-1 do eqtb[k]:=eqtb[cat_code_base];
5332for k:=0 to number_usvs-1 do
5333  begin cat_code(k):=other_char; math_code(k):=hi(k); sf_code(k):=1000;
5334  end;
5335cat_code(carriage_return):=car_ret; cat_code(" "):=spacer;
5336cat_code("\"):=escape; cat_code("%"):=comment;
5337cat_code(invalid_code):=invalid_char; cat_code(null_code):=ignore;
5338for k:="0" to "9" do math_code(k):=hi(k + set_class_field(var_fam_class));
5339for k:="A" to "Z" do
5340  begin cat_code(k):=letter; cat_code(k+"a"-"A"):=letter;@/
5341  math_code(k):=hi(k + set_family_field(1) + set_class_field(var_fam_class));
5342  math_code(k+"a"-"A"):=hi(k+"a"-"A" + set_family_field(1) + set_class_field(var_fam_class));@/
5343  lc_code(k):=k+"a"-"A"; lc_code(k+"a"-"A"):=k+"a"-"A";@/
5344  uc_code(k):=k; uc_code(k+"a"-"A"):=k;@/
5345  sf_code(k):=999;
5346  end;
5347
5348@ @<Show equivalent |n|, in region 4@>=
5349if (n=par_shape_loc) or ((n>=etex_pen_base) and (n<etex_pens)) then
5350  begin print_cmd_chr(set_shape,n); print_char("=");
5351  if equiv(n)=null then print_char("0")
5352  else if n>par_shape_loc then
5353    begin print_int(penalty(equiv(n))); print_char(" ");
5354    print_int(penalty(equiv(n)+1));
5355    if penalty(equiv(n))>1 then print_esc("ETC.");
5356    end
5357  else print_int(info(par_shape_ptr));
5358  end
5359else if n<toks_base then
5360  begin print_cmd_chr(assign_toks,n); print_char("=");
5361  if equiv(n)<>null then show_token_list(link(equiv(n)),null,32);
5362  end
5363else if n<box_base then
5364  begin print_esc("toks"); print_int(n-toks_base); print_char("=");
5365  if equiv(n)<>null then show_token_list(link(equiv(n)),null,32);
5366  end
5367else if n<cur_font_loc then
5368  begin print_esc("box"); print_int(n-box_base); print_char("=");
5369  if equiv(n)=null then print("void")
5370  else  begin depth_threshold:=0; breadth_max:=1; show_node_list(equiv(n));
5371    end;
5372  end
5373else if n<cat_code_base then @<Show the font identifier in |eqtb[n]|@>
5374else @<Show the halfword code in |eqtb[n]|@>
5375
5376@ @<Show the font identifier in |eqtb[n]|@>=
5377begin if n=cur_font_loc then print("current font")
5378else if n<math_font_base+script_size then
5379  begin print_esc("textfont"); print_int(n-math_font_base);
5380  end
5381else if n<math_font_base+script_script_size then
5382  begin print_esc("scriptfont"); print_int(n-math_font_base-script_size);
5383  end
5384else  begin print_esc("scriptscriptfont");
5385  print_int(n-math_font_base-script_script_size);
5386  end;
5387print_char("=");@/
5388print_esc(hash[font_id_base+equiv(n)].rh);
5389  {that's |font_id_text(equiv(n))|}
5390end
5391
5392@ @<Show the halfword code in |eqtb[n]|@>=
5393if n<math_code_base then
5394  begin if n<lc_code_base then
5395    begin print_esc("catcode"); print_int(n-cat_code_base);
5396    end
5397  else if n<uc_code_base then
5398    begin print_esc("lccode"); print_int(n-lc_code_base);
5399    end
5400  else if n<sf_code_base then
5401    begin print_esc("uccode"); print_int(n-uc_code_base);
5402    end
5403  else  begin print_esc("sfcode"); print_int(n-sf_code_base);
5404    end;
5405  print_char("="); print_int(equiv(n));
5406  end
5407else  begin print_esc("mathcode"); print_int(n-math_code_base);
5408  print_char("="); print_int(ho(equiv(n)));
5409  end
5410
5411@ Region 5 of |eqtb| contains the integer parameters and registers defined
5412here, as well as the |del_code| table. The latter table differs from the
5413|cat_code..math_code| tables that precede it, since delimiter codes are
5414fullword integers while the other kinds of codes occupy at most a
5415halfword. This is what makes region~5 different from region~4. We will
5416store the |eq_level| information in an auxiliary array of quarterwords
5417that will be defined later.
5418
5419@d pretolerance_code=0 {badness tolerance before hyphenation}
5420@d tolerance_code=1 {badness tolerance after hyphenation}
5421@d line_penalty_code=2 {added to the badness of every line}
5422@d hyphen_penalty_code=3 {penalty for break after discretionary hyphen}
5423@d ex_hyphen_penalty_code=4 {penalty for break after explicit hyphen}
5424@d club_penalty_code=5 {penalty for creating a club line}
5425@d widow_penalty_code=6 {penalty for creating a widow line}
5426@d display_widow_penalty_code=7 {ditto, just before a display}
5427@d broken_penalty_code=8 {penalty for breaking a page at a broken line}
5428@d bin_op_penalty_code=9 {penalty for breaking after a binary operation}
5429@d rel_penalty_code=10 {penalty for breaking after a relation}
5430@d pre_display_penalty_code=11
5431  {penalty for breaking just before a displayed formula}
5432@d post_display_penalty_code=12
5433  {penalty for breaking just after a displayed formula}
5434@d inter_line_penalty_code=13 {additional penalty between lines}
5435@d double_hyphen_demerits_code=14 {demerits for double hyphen break}
5436@d final_hyphen_demerits_code=15 {demerits for final hyphen break}
5437@d adj_demerits_code=16 {demerits for adjacent incompatible lines}
5438@d mag_code=17 {magnification ratio}
5439@d delimiter_factor_code=18 {ratio for variable-size delimiters}
5440@d looseness_code=19 {change in number of lines for a paragraph}
5441@d time_code=20 {current time of day}
5442@d day_code=21 {current day of the month}
5443@d month_code=22 {current month of the year}
5444@d year_code=23 {current year of our Lord}
5445@d show_box_breadth_code=24 {nodes per level in |show_box|}
5446@d show_box_depth_code=25 {maximum level in |show_box|}
5447@d hbadness_code=26 {hboxes exceeding this badness will be shown by |hpack|}
5448@d vbadness_code=27 {vboxes exceeding this badness will be shown by |vpack|}
5449@d pausing_code=28 {pause after each line is read from a file}
5450@d tracing_online_code=29 {show diagnostic output on terminal}
5451@d tracing_macros_code=30 {show macros as they are being expanded}
5452@d tracing_stats_code=31 {show memory usage if \TeX\ knows it}
5453@d tracing_paragraphs_code=32 {show line-break calculations}
5454@d tracing_pages_code=33 {show page-break calculations}
5455@d tracing_output_code=34 {show boxes when they are shipped out}
5456@d tracing_lost_chars_code=35 {show characters that aren't in the font}
5457@d tracing_commands_code=36 {show command codes at |big_switch|}
5458@d tracing_restores_code=37 {show equivalents when they are restored}
5459@d uc_hyph_code=38 {hyphenate words beginning with a capital letter}
5460@d output_penalty_code=39 {penalty found at current page break}
5461@d max_dead_cycles_code=40 {bound on consecutive dead cycles of output}
5462@d hang_after_code=41 {hanging indentation changes after this many lines}
5463@d floating_penalty_code=42 {penalty for insertions heldover after a split}
5464@d global_defs_code=43 {override \.{\\global} specifications}
5465@d cur_fam_code=44 {current family}
5466@d escape_char_code=45 {escape character for token output}
5467@d default_hyphen_char_code=46 {value of \.{\\hyphenchar} when a font is loaded}
5468@d default_skew_char_code=47 {value of \.{\\skewchar} when a font is loaded}
5469@d end_line_char_code=48 {character placed at the right end of the buffer}
5470@d new_line_char_code=49 {character that prints as |print_ln|}
5471@d language_code=50 {current hyphenation table}
5472@d left_hyphen_min_code=51 {minimum left hyphenation fragment size}
5473@d right_hyphen_min_code=52 {minimum right hyphenation fragment size}
5474@d holding_inserts_code=53 {do not remove insertion nodes from \.{\\box255}}
5475@d error_context_lines_code=54 {maximum intermediate line pairs shown}
5476@d tex_int_pars=55 {total number of \TeX's integer parameters}
5477@#
5478@d etex_int_base=tex_int_pars {base for \eTeX's integer parameters}
5479@d tracing_assigns_code=etex_int_base {show assignments}
5480@d tracing_groups_code=etex_int_base+1 {show save/restore groups}
5481@d tracing_ifs_code=etex_int_base+2 {show conditionals}
5482@d tracing_scan_tokens_code=etex_int_base+3 {show pseudo file open and close}
5483@d tracing_nesting_code=etex_int_base+4 {show incomplete groups and ifs within files}
5484@d pre_display_direction_code=etex_int_base+5 {text direction preceding a display}
5485@d last_line_fit_code=etex_int_base+6 {adjustment for last line of paragraph}
5486@d saving_vdiscards_code=etex_int_base+7 {save items discarded from vlists}
5487@d saving_hyph_codes_code=etex_int_base+8 {save hyphenation codes for languages}
5488@d suppress_fontnotfound_error_code=etex_int_base+9 {suppress errors for missing fonts}
5489@d XeTeX_linebreak_locale_code=etex_int_base+10 {string number of locale to use for linebreak locations}
5490@d XeTeX_linebreak_penalty_code=etex_int_base+11 {penalty to use at locale-dependent linebreak locations}
5491@d XeTeX_protrude_chars_code=etex_int_base+12 {protrude chars at left/right edge of paragraphs}
5492@d eTeX_state_code=etex_int_base+13 {\eTeX\ state variables}
5493@d etex_int_pars=eTeX_state_code+eTeX_states {total number of \eTeX's integer parameters}
5494@#
5495@d int_pars=etex_int_pars {total number of integer parameters}
5496@d count_base=int_base+int_pars {|number_regs| user \.{\\count} registers}
5497@d del_code_base=count_base+number_regs {|number_usvs| delimiter code mappings}
5498@d dimen_base=del_code_base+number_usvs {beginning of region 6}
5499@#
5500@d del_code(#)==eqtb[del_code_base+#].int
5501@d count(#)==eqtb[count_base+#].int
5502@d int_par(#)==eqtb[int_base+#].int {an integer parameter}
5503@d pretolerance==int_par(pretolerance_code)
5504@d tolerance==int_par(tolerance_code)
5505@d line_penalty==int_par(line_penalty_code)
5506@d hyphen_penalty==int_par(hyphen_penalty_code)
5507@d ex_hyphen_penalty==int_par(ex_hyphen_penalty_code)
5508@d club_penalty==int_par(club_penalty_code)
5509@d widow_penalty==int_par(widow_penalty_code)
5510@d display_widow_penalty==int_par(display_widow_penalty_code)
5511@d broken_penalty==int_par(broken_penalty_code)
5512@d bin_op_penalty==int_par(bin_op_penalty_code)
5513@d rel_penalty==int_par(rel_penalty_code)
5514@d pre_display_penalty==int_par(pre_display_penalty_code)
5515@d post_display_penalty==int_par(post_display_penalty_code)
5516@d inter_line_penalty==int_par(inter_line_penalty_code)
5517@d double_hyphen_demerits==int_par(double_hyphen_demerits_code)
5518@d final_hyphen_demerits==int_par(final_hyphen_demerits_code)
5519@d adj_demerits==int_par(adj_demerits_code)
5520@d mag==int_par(mag_code)
5521@d delimiter_factor==int_par(delimiter_factor_code)
5522@d looseness==int_par(looseness_code)
5523@d time==int_par(time_code)
5524@d day==int_par(day_code)
5525@d month==int_par(month_code)
5526@d year==int_par(year_code)
5527@d show_box_breadth==int_par(show_box_breadth_code)
5528@d show_box_depth==int_par(show_box_depth_code)
5529@d hbadness==int_par(hbadness_code)
5530@d vbadness==int_par(vbadness_code)
5531@d pausing==int_par(pausing_code)
5532@d tracing_online==int_par(tracing_online_code)
5533@d tracing_macros==int_par(tracing_macros_code)
5534@d tracing_stats==int_par(tracing_stats_code)
5535@d tracing_paragraphs==int_par(tracing_paragraphs_code)
5536@d tracing_pages==int_par(tracing_pages_code)
5537@d tracing_output==int_par(tracing_output_code)
5538@d tracing_lost_chars==int_par(tracing_lost_chars_code)
5539@d tracing_commands==int_par(tracing_commands_code)
5540@d tracing_restores==int_par(tracing_restores_code)
5541@d uc_hyph==int_par(uc_hyph_code)
5542@d output_penalty==int_par(output_penalty_code)
5543@d max_dead_cycles==int_par(max_dead_cycles_code)
5544@d hang_after==int_par(hang_after_code)
5545@d floating_penalty==int_par(floating_penalty_code)
5546@d global_defs==int_par(global_defs_code)
5547@d cur_fam==int_par(cur_fam_code)
5548@d escape_char==int_par(escape_char_code)
5549@d default_hyphen_char==int_par(default_hyphen_char_code)
5550@d default_skew_char==int_par(default_skew_char_code)
5551@d end_line_char==int_par(end_line_char_code)
5552@d new_line_char==int_par(new_line_char_code)
5553@d language==int_par(language_code)
5554@d left_hyphen_min==int_par(left_hyphen_min_code)
5555@d right_hyphen_min==int_par(right_hyphen_min_code)
5556@d holding_inserts==int_par(holding_inserts_code)
5557@d error_context_lines==int_par(error_context_lines_code)
5558@#
5559@d tracing_assigns==int_par(tracing_assigns_code)
5560@d tracing_groups==int_par(tracing_groups_code)
5561@d tracing_ifs==int_par(tracing_ifs_code)
5562@d tracing_scan_tokens==int_par(tracing_scan_tokens_code)
5563@d tracing_nesting==int_par(tracing_nesting_code)
5564@d pre_display_direction==int_par(pre_display_direction_code)
5565@d last_line_fit==int_par(last_line_fit_code)
5566@d saving_vdiscards==int_par(saving_vdiscards_code)
5567@d saving_hyph_codes==int_par(saving_hyph_codes_code)
5568@d suppress_fontnotfound_error==int_par(suppress_fontnotfound_error_code)
5569@d XeTeX_linebreak_locale==int_par(XeTeX_linebreak_locale_code)
5570@d XeTeX_linebreak_penalty==int_par(XeTeX_linebreak_penalty_code)
5571@d XeTeX_protrude_chars==int_par(XeTeX_protrude_chars_code)
5572
5573@<Assign the values |depth_threshold:=show_box_depth|...@>=
5574depth_threshold:=show_box_depth;
5575breadth_max:=show_box_breadth
5576
5577@ We can print the symbolic name of an integer parameter as follows.
5578
5579@p procedure print_param(@!n:integer);
5580begin case n of
5581pretolerance_code:print_esc("pretolerance");
5582tolerance_code:print_esc("tolerance");
5583line_penalty_code:print_esc("linepenalty");
5584hyphen_penalty_code:print_esc("hyphenpenalty");
5585ex_hyphen_penalty_code:print_esc("exhyphenpenalty");
5586club_penalty_code:print_esc("clubpenalty");
5587widow_penalty_code:print_esc("widowpenalty");
5588display_widow_penalty_code:print_esc("displaywidowpenalty");
5589broken_penalty_code:print_esc("brokenpenalty");
5590bin_op_penalty_code:print_esc("binoppenalty");
5591rel_penalty_code:print_esc("relpenalty");
5592pre_display_penalty_code:print_esc("predisplaypenalty");
5593post_display_penalty_code:print_esc("postdisplaypenalty");
5594inter_line_penalty_code:print_esc("interlinepenalty");
5595double_hyphen_demerits_code:print_esc("doublehyphendemerits");
5596final_hyphen_demerits_code:print_esc("finalhyphendemerits");
5597adj_demerits_code:print_esc("adjdemerits");
5598mag_code:print_esc("mag");
5599delimiter_factor_code:print_esc("delimiterfactor");
5600looseness_code:print_esc("looseness");
5601time_code:print_esc("time");
5602day_code:print_esc("day");
5603month_code:print_esc("month");
5604year_code:print_esc("year");
5605show_box_breadth_code:print_esc("showboxbreadth");
5606show_box_depth_code:print_esc("showboxdepth");
5607hbadness_code:print_esc("hbadness");
5608vbadness_code:print_esc("vbadness");
5609pausing_code:print_esc("pausing");
5610tracing_online_code:print_esc("tracingonline");
5611tracing_macros_code:print_esc("tracingmacros");
5612tracing_stats_code:print_esc("tracingstats");
5613tracing_paragraphs_code:print_esc("tracingparagraphs");
5614tracing_pages_code:print_esc("tracingpages");
5615tracing_output_code:print_esc("tracingoutput");
5616tracing_lost_chars_code:print_esc("tracinglostchars");
5617tracing_commands_code:print_esc("tracingcommands");
5618tracing_restores_code:print_esc("tracingrestores");
5619uc_hyph_code:print_esc("uchyph");
5620output_penalty_code:print_esc("outputpenalty");
5621max_dead_cycles_code:print_esc("maxdeadcycles");
5622hang_after_code:print_esc("hangafter");
5623floating_penalty_code:print_esc("floatingpenalty");
5624global_defs_code:print_esc("globaldefs");
5625cur_fam_code:print_esc("fam");
5626escape_char_code:print_esc("escapechar");
5627default_hyphen_char_code:print_esc("defaulthyphenchar");
5628default_skew_char_code:print_esc("defaultskewchar");
5629end_line_char_code:print_esc("endlinechar");
5630new_line_char_code:print_esc("newlinechar");
5631language_code:print_esc("language");
5632left_hyphen_min_code:print_esc("lefthyphenmin");
5633right_hyphen_min_code:print_esc("righthyphenmin");
5634holding_inserts_code:print_esc("holdinginserts");
5635error_context_lines_code:print_esc("errorcontextlines");
5636XeTeX_linebreak_penalty_code:print_esc("XeTeXlinebreakpenalty");
5637XeTeX_protrude_chars_code:print_esc("XeTeXprotrudechars");
5638@/@<Cases for |print_param|@>@/
5639othercases print("[unknown integer parameter!]")
5640endcases;
5641end;
5642
5643@ The integer parameter names must be entered into the hash table.
5644
5645@<Put each...@>=
5646primitive("pretolerance",assign_int,int_base+pretolerance_code);@/
5647@!@:pretolerance_}{\.{\\pretolerance} primitive@>
5648primitive("tolerance",assign_int,int_base+tolerance_code);@/
5649@!@:tolerance_}{\.{\\tolerance} primitive@>
5650primitive("linepenalty",assign_int,int_base+line_penalty_code);@/
5651@!@:line_penalty_}{\.{\\linepenalty} primitive@>
5652primitive("hyphenpenalty",assign_int,int_base+hyphen_penalty_code);@/
5653@!@:hyphen_penalty_}{\.{\\hyphenpenalty} primitive@>
5654primitive("exhyphenpenalty",assign_int,int_base+ex_hyphen_penalty_code);@/
5655@!@:ex_hyphen_penalty_}{\.{\\exhyphenpenalty} primitive@>
5656primitive("clubpenalty",assign_int,int_base+club_penalty_code);@/
5657@!@:club_penalty_}{\.{\\clubpenalty} primitive@>
5658primitive("widowpenalty",assign_int,int_base+widow_penalty_code);@/
5659@!@:widow_penalty_}{\.{\\widowpenalty} primitive@>
5660primitive("displaywidowpenalty",
5661  assign_int,int_base+display_widow_penalty_code);@/
5662@!@:display_widow_penalty_}{\.{\\displaywidowpenalty} primitive@>
5663primitive("brokenpenalty",assign_int,int_base+broken_penalty_code);@/
5664@!@:broken_penalty_}{\.{\\brokenpenalty} primitive@>
5665primitive("binoppenalty",assign_int,int_base+bin_op_penalty_code);@/
5666@!@:bin_op_penalty_}{\.{\\binoppenalty} primitive@>
5667primitive("relpenalty",assign_int,int_base+rel_penalty_code);@/
5668@!@:rel_penalty_}{\.{\\relpenalty} primitive@>
5669primitive("predisplaypenalty",assign_int,int_base+pre_display_penalty_code);@/
5670@!@:pre_display_penalty_}{\.{\\predisplaypenalty} primitive@>
5671primitive("postdisplaypenalty",assign_int,int_base+post_display_penalty_code);@/
5672@!@:post_display_penalty_}{\.{\\postdisplaypenalty} primitive@>
5673primitive("interlinepenalty",assign_int,int_base+inter_line_penalty_code);@/
5674@!@:inter_line_penalty_}{\.{\\interlinepenalty} primitive@>
5675primitive("doublehyphendemerits",
5676  assign_int,int_base+double_hyphen_demerits_code);@/
5677@!@:double_hyphen_demerits_}{\.{\\doublehyphendemerits} primitive@>
5678primitive("finalhyphendemerits",
5679  assign_int,int_base+final_hyphen_demerits_code);@/
5680@!@:final_hyphen_demerits_}{\.{\\finalhyphendemerits} primitive@>
5681primitive("adjdemerits",assign_int,int_base+adj_demerits_code);@/
5682@!@:adj_demerits_}{\.{\\adjdemerits} primitive@>
5683primitive("mag",assign_int,int_base+mag_code);@/
5684@!@:mag_}{\.{\\mag} primitive@>
5685primitive("delimiterfactor",assign_int,int_base+delimiter_factor_code);@/
5686@!@:delimiter_factor_}{\.{\\delimiterfactor} primitive@>
5687primitive("looseness",assign_int,int_base+looseness_code);@/
5688@!@:looseness_}{\.{\\looseness} primitive@>
5689primitive("time",assign_int,int_base+time_code);@/
5690@!@:time_}{\.{\\time} primitive@>
5691primitive("day",assign_int,int_base+day_code);@/
5692@!@:day_}{\.{\\day} primitive@>
5693primitive("month",assign_int,int_base+month_code);@/
5694@!@:month_}{\.{\\month} primitive@>
5695primitive("year",assign_int,int_base+year_code);@/
5696@!@:year_}{\.{\\year} primitive@>
5697primitive("showboxbreadth",assign_int,int_base+show_box_breadth_code);@/
5698@!@:show_box_breadth_}{\.{\\showboxbreadth} primitive@>
5699primitive("showboxdepth",assign_int,int_base+show_box_depth_code);@/
5700@!@:show_box_depth_}{\.{\\showboxdepth} primitive@>
5701primitive("hbadness",assign_int,int_base+hbadness_code);@/
5702@!@:hbadness_}{\.{\\hbadness} primitive@>
5703primitive("vbadness",assign_int,int_base+vbadness_code);@/
5704@!@:vbadness_}{\.{\\vbadness} primitive@>
5705primitive("pausing",assign_int,int_base+pausing_code);@/
5706@!@:pausing_}{\.{\\pausing} primitive@>
5707primitive("tracingonline",assign_int,int_base+tracing_online_code);@/
5708@!@:tracing_online_}{\.{\\tracingonline} primitive@>
5709primitive("tracingmacros",assign_int,int_base+tracing_macros_code);@/
5710@!@:tracing_macros_}{\.{\\tracingmacros} primitive@>
5711primitive("tracingstats",assign_int,int_base+tracing_stats_code);@/
5712@!@:tracing_stats_}{\.{\\tracingstats} primitive@>
5713primitive("tracingparagraphs",assign_int,int_base+tracing_paragraphs_code);@/
5714@!@:tracing_paragraphs_}{\.{\\tracingparagraphs} primitive@>
5715primitive("tracingpages",assign_int,int_base+tracing_pages_code);@/
5716@!@:tracing_pages_}{\.{\\tracingpages} primitive@>
5717primitive("tracingoutput",assign_int,int_base+tracing_output_code);@/
5718@!@:tracing_output_}{\.{\\tracingoutput} primitive@>
5719primitive("tracinglostchars",assign_int,int_base+tracing_lost_chars_code);@/
5720@!@:tracing_lost_chars_}{\.{\\tracinglostchars} primitive@>
5721primitive("tracingcommands",assign_int,int_base+tracing_commands_code);@/
5722@!@:tracing_commands_}{\.{\\tracingcommands} primitive@>
5723primitive("tracingrestores",assign_int,int_base+tracing_restores_code);@/
5724@!@:tracing_restores_}{\.{\\tracingrestores} primitive@>
5725primitive("uchyph",assign_int,int_base+uc_hyph_code);@/
5726@!@:uc_hyph_}{\.{\\uchyph} primitive@>
5727primitive("outputpenalty",assign_int,int_base+output_penalty_code);@/
5728@!@:output_penalty_}{\.{\\outputpenalty} primitive@>
5729primitive("maxdeadcycles",assign_int,int_base+max_dead_cycles_code);@/
5730@!@:max_dead_cycles_}{\.{\\maxdeadcycles} primitive@>
5731primitive("hangafter",assign_int,int_base+hang_after_code);@/
5732@!@:hang_after_}{\.{\\hangafter} primitive@>
5733primitive("floatingpenalty",assign_int,int_base+floating_penalty_code);@/
5734@!@:floating_penalty_}{\.{\\floatingpenalty} primitive@>
5735primitive("globaldefs",assign_int,int_base+global_defs_code);@/
5736@!@:global_defs_}{\.{\\globaldefs} primitive@>
5737primitive("fam",assign_int,int_base+cur_fam_code);@/
5738@!@:fam_}{\.{\\fam} primitive@>
5739primitive("escapechar",assign_int,int_base+escape_char_code);@/
5740@!@:escape_char_}{\.{\\escapechar} primitive@>
5741primitive("defaulthyphenchar",assign_int,int_base+default_hyphen_char_code);@/
5742@!@:default_hyphen_char_}{\.{\\defaulthyphenchar} primitive@>
5743primitive("defaultskewchar",assign_int,int_base+default_skew_char_code);@/
5744@!@:default_skew_char_}{\.{\\defaultskewchar} primitive@>
5745primitive("endlinechar",assign_int,int_base+end_line_char_code);@/
5746@!@:end_line_char_}{\.{\\endlinechar} primitive@>
5747primitive("newlinechar",assign_int,int_base+new_line_char_code);@/
5748@!@:new_line_char_}{\.{\\newlinechar} primitive@>
5749primitive("language",assign_int,int_base+language_code);@/
5750@!@:language_}{\.{\\language} primitive@>
5751primitive("lefthyphenmin",assign_int,int_base+left_hyphen_min_code);@/
5752@!@:left_hyphen_min_}{\.{\\lefthyphenmin} primitive@>
5753primitive("righthyphenmin",assign_int,int_base+right_hyphen_min_code);@/
5754@!@:right_hyphen_min_}{\.{\\righthyphenmin} primitive@>
5755primitive("holdinginserts",assign_int,int_base+holding_inserts_code);@/
5756@!@:holding_inserts_}{\.{\\holdinginserts} primitive@>
5757primitive("errorcontextlines",assign_int,int_base+error_context_lines_code);@/
5758@!@:error_context_lines_}{\.{\\errorcontextlines} primitive@>
5759primitive("XeTeXlinebreakpenalty",assign_int,int_base+XeTeX_linebreak_penalty_code);@/
5760@!@:XeTeX_linebreak_penalty_}{\.{\\XeTeXlinebreakpenalty} primitive@>
5761primitive("XeTeXprotrudechars",assign_int,int_base+XeTeX_protrude_chars_code);@/
5762@!@:XeTeX_protrude_chars_}{\.{\\XeTeXprotrudechars} primitive@>
5763
5764@ @<Cases of |print_cmd_chr|...@>=
5765assign_int: if chr_code<count_base then print_param(chr_code-int_base)
5766  else  begin print_esc("count"); print_int(chr_code-count_base);
5767    end;
5768
5769@ The integer parameters should really be initialized by a macro package;
5770the following initialization does the minimum to keep \TeX\ from
5771complete failure.
5772@^null delimiter@>
5773
5774@<Initialize table entries...@>=
5775for k:=int_base to del_code_base-1 do eqtb[k].int:=0;
5776mag:=1000; tolerance:=10000; hang_after:=1; max_dead_cycles:=25;
5777escape_char:="\"; end_line_char:=carriage_return;
5778for k:=0 to number_chars-1 do del_code(k):=-1;
5779del_code("."):=0; {this null delimiter is used in error recovery}
5780
5781@ The following procedure, which is called just before \TeX\ initializes its
5782input and output, establishes the initial values of the date and time.
5783@^system dependencies@>
5784Since standard \PASCAL\ cannot provide such information, something special
5785is needed. The program here simply specifies July 4, 1776, at noon; but
5786users probably want a better approximation to the truth.
5787
5788@p procedure fix_date_and_time;
5789begin time:=12*60; {minutes since midnight}
5790day:=4; {fourth day of the month}
5791month:=7; {seventh month of the year}
5792year:=1776; {Anno Domini}
5793end;
5794
5795@ @<Show equivalent |n|, in region 5@>=
5796begin if n<count_base then print_param(n-int_base)
5797else if  n<del_code_base then
5798  begin print_esc("count"); print_int(n-count_base);
5799  end
5800else  begin print_esc("delcode"); print_int(n-del_code_base);
5801  end;
5802print_char("="); print_int(eqtb[n].int);
5803end
5804
5805@ @<Set variable |c| to the current escape character@>=c:=escape_char
5806
5807@ @<Character |s| is the current new-line character@>=s=new_line_char
5808
5809@ \TeX\ is occasionally supposed to print diagnostic information that
5810goes only into the transcript file, unless |tracing_online| is positive.
5811Here are two routines that adjust the destination of print commands:
5812
5813@p procedure begin_diagnostic; {prepare to do some tracing}
5814begin old_setting:=selector;
5815if (tracing_online<=0)and(selector=term_and_log) then
5816  begin decr(selector);
5817  if history=spotless then history:=warning_issued;
5818  end;
5819end;
5820@#
5821procedure end_diagnostic(@!blank_line:boolean);
5822  {restore proper conditions after tracing}
5823begin print_nl("");
5824if blank_line then print_ln;
5825selector:=old_setting;
5826end;
5827
5828@ Of course we had better declare another global variable, if the previous
5829routines are going to work.
5830
5831@<Glob...@>=
5832@!old_setting:0..max_selector;
5833
5834@ The final region of |eqtb| contains the dimension parameters defined
5835here, and the |number_regs| \.{\\dimen} registers.
5836
5837@d par_indent_code=0 {indentation of paragraphs}
5838@d math_surround_code=1 {space around math in text}
5839@d line_skip_limit_code=2 {threshold for |line_skip| instead of |baseline_skip|}
5840@d hsize_code=3 {line width in horizontal mode}
5841@d vsize_code=4 {page height in vertical mode}
5842@d max_depth_code=5 {maximum depth of boxes on main pages}
5843@d split_max_depth_code=6 {maximum depth of boxes on split pages}
5844@d box_max_depth_code=7 {maximum depth of explicit vboxes}
5845@d hfuzz_code=8 {tolerance for overfull hbox messages}
5846@d vfuzz_code=9 {tolerance for overfull vbox messages}
5847@d delimiter_shortfall_code=10 {maximum amount uncovered by variable delimiters}
5848@d null_delimiter_space_code=11 {blank space in null delimiters}
5849@d script_space_code=12 {extra space after subscript or superscript}
5850@d pre_display_size_code=13 {length of text preceding a display}
5851@d display_width_code=14 {length of line for displayed equation}
5852@d display_indent_code=15 {indentation of line for displayed equation}
5853@d overfull_rule_code=16 {width of rule that identifies overfull hboxes}
5854@d hang_indent_code=17 {amount of hanging indentation}
5855@d h_offset_code=18 {amount of horizontal offset when shipping pages out}
5856@d v_offset_code=19 {amount of vertical offset when shipping pages out}
5857@d emergency_stretch_code=20 {reduces badnesses on final pass of line-breaking}
5858@d pdf_page_width_code=21 {page width of the PDF output}
5859@d pdf_page_height_code=22 {page height of the PDF output}
5860@d dimen_pars=23 {total number of dimension parameters}
5861@d scaled_base=dimen_base+dimen_pars
5862  {table of |number_regs| user-defined \.{\\dimen} registers}
5863@d eqtb_size=scaled_base+biggest_reg {largest subscript of |eqtb|}
5864@#
5865@d dimen(#)==eqtb[scaled_base+#].sc
5866@d dimen_par(#)==eqtb[dimen_base+#].sc {a scaled quantity}
5867@d par_indent==dimen_par(par_indent_code)
5868@d math_surround==dimen_par(math_surround_code)
5869@d line_skip_limit==dimen_par(line_skip_limit_code)
5870@d hsize==dimen_par(hsize_code)
5871@d vsize==dimen_par(vsize_code)
5872@d max_depth==dimen_par(max_depth_code)
5873@d split_max_depth==dimen_par(split_max_depth_code)
5874@d box_max_depth==dimen_par(box_max_depth_code)
5875@d hfuzz==dimen_par(hfuzz_code)
5876@d vfuzz==dimen_par(vfuzz_code)
5877@d delimiter_shortfall==dimen_par(delimiter_shortfall_code)
5878@d null_delimiter_space==dimen_par(null_delimiter_space_code)
5879@d script_space==dimen_par(script_space_code)
5880@d pre_display_size==dimen_par(pre_display_size_code)
5881@d display_width==dimen_par(display_width_code)
5882@d display_indent==dimen_par(display_indent_code)
5883@d overfull_rule==dimen_par(overfull_rule_code)
5884@d hang_indent==dimen_par(hang_indent_code)
5885@d h_offset==dimen_par(h_offset_code)
5886@d v_offset==dimen_par(v_offset_code)
5887@d emergency_stretch==dimen_par(emergency_stretch_code)
5888@d pdf_page_width    == dimen_par(pdf_page_width_code)
5889@d pdf_page_height   == dimen_par(pdf_page_height_code)
5890
5891@p procedure print_length_param(@!n:integer);
5892begin case n of
5893par_indent_code:print_esc("parindent");
5894math_surround_code:print_esc("mathsurround");
5895line_skip_limit_code:print_esc("lineskiplimit");
5896hsize_code:print_esc("hsize");
5897vsize_code:print_esc("vsize");
5898max_depth_code:print_esc("maxdepth");
5899split_max_depth_code:print_esc("splitmaxdepth");
5900box_max_depth_code:print_esc("boxmaxdepth");
5901hfuzz_code:print_esc("hfuzz");
5902vfuzz_code:print_esc("vfuzz");
5903delimiter_shortfall_code:print_esc("delimitershortfall");
5904null_delimiter_space_code:print_esc("nulldelimiterspace");
5905script_space_code:print_esc("scriptspace");
5906pre_display_size_code:print_esc("predisplaysize");
5907display_width_code:print_esc("displaywidth");
5908display_indent_code:print_esc("displayindent");
5909overfull_rule_code:print_esc("overfullrule");
5910hang_indent_code:print_esc("hangindent");
5911h_offset_code:print_esc("hoffset");
5912v_offset_code:print_esc("voffset");
5913emergency_stretch_code:print_esc("emergencystretch");
5914pdf_page_width_code:    print_esc("pdfpagewidth");
5915pdf_page_height_code:   print_esc("pdfpageheight");
5916othercases print("[unknown dimen parameter!]")
5917endcases;
5918end;
5919
5920@ @<Put each...@>=
5921primitive("parindent",assign_dimen,dimen_base+par_indent_code);@/
5922@!@:par_indent_}{\.{\\parindent} primitive@>
5923primitive("mathsurround",assign_dimen,dimen_base+math_surround_code);@/
5924@!@:math_surround_}{\.{\\mathsurround} primitive@>
5925primitive("lineskiplimit",assign_dimen,dimen_base+line_skip_limit_code);@/
5926@!@:line_skip_limit_}{\.{\\lineskiplimit} primitive@>
5927primitive("hsize",assign_dimen,dimen_base+hsize_code);@/
5928@!@:hsize_}{\.{\\hsize} primitive@>
5929primitive("vsize",assign_dimen,dimen_base+vsize_code);@/
5930@!@:vsize_}{\.{\\vsize} primitive@>
5931primitive("maxdepth",assign_dimen,dimen_base+max_depth_code);@/
5932@!@:max_depth_}{\.{\\maxdepth} primitive@>
5933primitive("splitmaxdepth",assign_dimen,dimen_base+split_max_depth_code);@/
5934@!@:split_max_depth_}{\.{\\splitmaxdepth} primitive@>
5935primitive("boxmaxdepth",assign_dimen,dimen_base+box_max_depth_code);@/
5936@!@:box_max_depth_}{\.{\\boxmaxdepth} primitive@>
5937primitive("hfuzz",assign_dimen,dimen_base+hfuzz_code);@/
5938@!@:hfuzz_}{\.{\\hfuzz} primitive@>
5939primitive("vfuzz",assign_dimen,dimen_base+vfuzz_code);@/
5940@!@:vfuzz_}{\.{\\vfuzz} primitive@>
5941primitive("delimitershortfall",
5942  assign_dimen,dimen_base+delimiter_shortfall_code);@/
5943@!@:delimiter_shortfall_}{\.{\\delimitershortfall} primitive@>
5944primitive("nulldelimiterspace",
5945  assign_dimen,dimen_base+null_delimiter_space_code);@/
5946@!@:null_delimiter_space_}{\.{\\nulldelimiterspace} primitive@>
5947primitive("scriptspace",assign_dimen,dimen_base+script_space_code);@/
5948@!@:script_space_}{\.{\\scriptspace} primitive@>
5949primitive("predisplaysize",assign_dimen,dimen_base+pre_display_size_code);@/
5950@!@:pre_display_size_}{\.{\\predisplaysize} primitive@>
5951primitive("displaywidth",assign_dimen,dimen_base+display_width_code);@/
5952@!@:display_width_}{\.{\\displaywidth} primitive@>
5953primitive("displayindent",assign_dimen,dimen_base+display_indent_code);@/
5954@!@:display_indent_}{\.{\\displayindent} primitive@>
5955primitive("overfullrule",assign_dimen,dimen_base+overfull_rule_code);@/
5956@!@:overfull_rule_}{\.{\\overfullrule} primitive@>
5957primitive("hangindent",assign_dimen,dimen_base+hang_indent_code);@/
5958@!@:hang_indent_}{\.{\\hangindent} primitive@>
5959primitive("hoffset",assign_dimen,dimen_base+h_offset_code);@/
5960@!@:h_offset_}{\.{\\hoffset} primitive@>
5961primitive("voffset",assign_dimen,dimen_base+v_offset_code);@/
5962@!@:v_offset_}{\.{\\voffset} primitive@>
5963primitive("emergencystretch",assign_dimen,dimen_base+emergency_stretch_code);@/
5964@!@:emergency_stretch_}{\.{\\emergencystretch} primitive@>
5965
5966primitive("pdfpagewidth",assign_dimen,dimen_base+pdf_page_width_code);@/
5967@!@:pdf_page_width_}{\.{\\pdfpagewidth} primitive@>
5968primitive("pdfpageheight",assign_dimen,dimen_base+pdf_page_height_code);@/
5969@!@:pdf_page_height_}{\.{\\pdfpageheight} primitive@>
5970
5971@ @<Cases of |print_cmd_chr|...@>=
5972assign_dimen: if chr_code<scaled_base then
5973    print_length_param(chr_code-dimen_base)
5974  else  begin print_esc("dimen"); print_int(chr_code-scaled_base);
5975    end;
5976
5977@ @<Initialize table entries...@>=
5978for k:=dimen_base to eqtb_size do eqtb[k].sc:=0;
5979
5980@ @<Show equivalent |n|, in region 6@>=
5981begin if n<scaled_base then print_length_param(n-dimen_base)
5982else  begin print_esc("dimen"); print_int(n-scaled_base);
5983  end;
5984print_char("="); print_scaled(eqtb[n].sc); print("pt");
5985end
5986
5987@ Here is a procedure that displays the contents of |eqtb[n]|
5988symbolically.
5989
5990@p@t\4@>@<Declare the procedure called |print_cmd_chr|@>@;@/
5991@!stat procedure show_eqtb(@!n:pointer);
5992begin if n<active_base then print_char("?") {this can't happen}
5993else if n<glue_base then @<Show equivalent |n|, in region 1 or 2@>
5994else if n<local_base then @<Show equivalent |n|, in region 3@>
5995else if n<int_base then @<Show equivalent |n|, in region 4@>
5996else if n<dimen_base then @<Show equivalent |n|, in region 5@>
5997else if n<=eqtb_size then @<Show equivalent |n|, in region 6@>
5998else print_char("?"); {this can't happen either}
5999end;
6000tats
6001
6002@ The last two regions of |eqtb| have fullword values instead of the
6003three fields |eq_level|, |eq_type|, and |equiv|. An |eq_type| is unnecessary,
6004but \TeX\ needs to store the |eq_level| information in another array
6005called |xeq_level|.
6006
6007@<Glob...@>=
6008@!eqtb:array[active_base..eqtb_size] of memory_word;
6009@!xeq_level:array[int_base..eqtb_size] of quarterword;
6010
6011@ @<Set init...@>=
6012for k:=int_base to eqtb_size do xeq_level[k]:=level_one;
6013
6014@ When the debugging routine |search_mem| is looking for pointers having a
6015given value, it is interested only in regions 1 to~3 of~|eqtb|, and in the
6016first part of region~4.
6017
6018@<Search |eqtb| for equivalents equal to |p|@>=
6019for q:=active_base to box_base+biggest_reg do
6020  begin if equiv(q)=p then
6021    begin print_nl("EQUIV("); print_int(q); print_char(")");
6022    end;
6023  end
6024
6025@* \[18] The hash table.
6026Control sequences are stored and retrieved by means of a fairly standard hash
6027table algorithm called the method of ``coalescing lists'' (cf.\ Algorithm 6.4C
6028in {\sl The Art of Computer Programming\/}). Once a control sequence enters the
6029table, it is never removed, because there are complicated situations
6030involving \.{\\gdef} where the removal of a control sequence at the end of
6031a group would be a mistake preventable only by the introduction of a
6032complicated reference-count mechanism.
6033
6034The actual sequence of letters forming a control sequence identifier is
6035stored in the |str_pool| array together with all the other strings. An
6036auxiliary array |hash| consists of items with two halfword fields per
6037word. The first of these, called |next(p)|, points to the next identifier
6038belonging to the same coalesced list as the identifier corresponding to~|p|;
6039and the other, called |text(p)|, points to the |str_start| entry for
6040|p|'s identifier. If position~|p| of the hash table is empty, we have
6041|text(p)=0|; if position |p| is either empty or the end of a coalesced
6042hash list, we have |next(p)=0|. An auxiliary pointer variable called
6043|hash_used| is maintained in such a way that all locations |p>=hash_used|
6044are nonempty. The global variable |cs_count| tells how many multiletter
6045control sequences have been defined, if statistics are being kept.
6046
6047A global boolean variable called |no_new_control_sequence| is set to
6048|true| during the time that new hash table entries are forbidden.
6049
6050@d next(#) == hash[#].lh {link for coalesced lists}
6051@d text(#) == hash[#].rh {string number for control sequence name}
6052@d hash_is_full == (hash_used=hash_base) {test if all positions are occupied}
6053@d font_id_text(#) == text(font_id_base+#) {a frozen font identifier's name}
6054
6055@<Glob...@>=
6056@!hash: array[hash_base..undefined_control_sequence-1] of two_halves;
6057  {the hash table}
6058@!hash_used:pointer; {allocation pointer for |hash|}
6059@!no_new_control_sequence:boolean; {are new identifiers legal?}
6060@!cs_count:integer; {total number of known identifiers}
6061
6062@ Primitive support needs a few extra variables and definitions
6063
6064@d prim_size=480 {maximum number of primitives }
6065@d prim_prime=409 {about 85\pct! of |primitive_size|}
6066@d prim_base=1
6067@d prim_next(#) == prim[#].lh {link for coalesced lists}
6068@d prim_text(#) == prim[#].rh {string number for control sequence name}
6069@d prim_is_full == (prim_used=prim_base) {test if all positions are occupied}
6070@d prim_eq_level_field(#)==#.hh.b1
6071@d prim_eq_type_field(#)==#.hh.b0
6072@d prim_equiv_field(#)==#.hh.rh
6073@d prim_eq_level(#)==prim_eq_level_field(prim_eqtb[#]) {level of definition}
6074@d prim_eq_type(#)==prim_eq_type_field(prim_eqtb[#]) {command code for equivalent}
6075@d prim_equiv(#)==prim_equiv_field(prim_eqtb[#]) {equivalent value}
6076@d undefined_primitive=0
6077
6078@<Glob...@>=
6079@!prim: array [0..prim_size] of two_halves;  {the primitives table}
6080@!prim_used:pointer; {allocation pointer for |prim|}
6081@!prim_eqtb:array[0..prim_size] of memory_word;
6082
6083@ @<Set init...@>=
6084no_new_control_sequence:=true; {new identifiers are usually forbidden}
6085prim_next(0):=0; prim_text(0):=0;
6086for k:=1 to prim_size do prim[k]:=prim[0];
6087prim_eq_level(0):=level_zero;
6088prim_eq_type(0):=undefined_cs;
6089prim_equiv(0):=null;
6090for k:=1 to prim_size do prim_eqtb[k]:=prim_eqtb[0];
6091next(hash_base):=0; text(hash_base):=0;
6092for k:=hash_base+1 to undefined_control_sequence-1 do hash[k]:=hash[hash_base];
6093
6094@ @<Initialize table entries...@>=
6095prim_used:=prim_size; {nothing is used}
6096hash_used:=frozen_control_sequence; {nothing is used}
6097cs_count:=0;
6098eq_type(frozen_dont_expand):=dont_expand;
6099text(frozen_dont_expand):="notexpanded:";
6100@.notexpanded:@>
6101
6102eq_type(frozen_primitive):=ignore_spaces;
6103equiv(frozen_primitive):=1;
6104eq_level(frozen_primitive):=level_one;
6105text(frozen_primitive):="primitive";
6106
6107@ Here is the subroutine that searches the hash table for an identifier
6108that matches a given string of length |l>0| appearing in |buffer[j..
6109(j+l-1)]|. If the identifier is found, the corresponding hash table address
6110is returned. Otherwise, if the global variable |no_new_control_sequence|
6111is |true|, the dummy address |undefined_control_sequence| is returned.
6112Otherwise the identifier is inserted into the hash table and its location
6113is returned.
6114
6115@p function id_lookup(@!j,@!l:integer):pointer; {search the hash table}
6116label found; {go here if you found it}
6117var h:integer; {hash code}
6118@!d:integer; {number of characters in incomplete current string}
6119@!p:pointer; {index in |hash| array}
6120@!k:pointer; {index in |buffer| array}
6121@!ll:integer; {length in UTF16 code units}
6122begin @<Compute the hash code |h|@>;
6123p:=h+hash_base; {we start searching here; note that |0<=h<hash_prime|}
6124ll:=l; for d:=0 to l-1 do if buffer[j+d]>=@"10000 then incr(ll);
6125loop@+begin if text(p)>0 then if length(text(p))=ll then
6126    if str_eq_buf(text(p),j) then goto found;
6127  if next(p)=0 then
6128    begin if no_new_control_sequence then
6129      p:=undefined_control_sequence
6130    else @<Insert a new control sequence after |p|, then make
6131      |p| point to it@>;
6132    goto found;
6133    end;
6134  p:=next(p);
6135  end;
6136found: id_lookup:=p;
6137end;
6138
6139@ @<Insert a new control...@>=
6140begin if text(p)>0 then
6141  begin repeat if hash_is_full then overflow("hash size",hash_size);
6142@:TeX capacity exceeded hash size}{\quad hash size@>
6143  decr(hash_used);
6144  until text(hash_used)=0; {search for an empty location in |hash|}
6145  next(p):=hash_used; p:=hash_used;
6146  end;
6147str_room(ll); d:=cur_length;
6148while pool_ptr>str_start_macro(str_ptr) do
6149  begin decr(pool_ptr); str_pool[pool_ptr+l]:=str_pool[pool_ptr];
6150  end; {move current string up to make room for another}
6151for k:=j to j+l-1 do begin
6152  if buffer[k]<@"10000 then append_char(buffer[k])
6153  else begin
6154    append_char(@"D800+(buffer[k]-@"10000)div@"400);
6155    append_char(@"DC00+(buffer[k]-@"10000)mod@"400);
6156  end
6157end;
6158text(p):=make_string; pool_ptr:=pool_ptr+d;
6159@!stat incr(cs_count);@+tats@;@/
6160end
6161
6162@ The value of |hash_prime| should be roughly 85\pct! of |hash_size|, and it
6163should be a prime number.  The theory of hashing tells us to expect fewer
6164than two table probes, on the average, when the search is successful.
6165[See J.~S. Vitter, {\sl Journal of the ACM\/ \bf30} (1983), 231--258.]
6166@^Vitter, Jeffrey Scott@>
6167
6168@<Compute the hash code |h|@>=
6169h:=0;
6170for k:=j to j+l-1 do
6171  begin h:=h+h+buffer[k];
6172  while h>=hash_prime do h:=h-hash_prime;
6173  end
6174
6175@ Here is the subroutine that searches the primitive table for an identifier
6176
6177@p function prim_lookup(@!s:str_number):pointer; {search the primitives table}
6178label found; {go here if you found it}
6179var h:integer; {hash code}
6180@!p:pointer; {index in |hash| array}
6181@!k:pointer; {index in string pool}
6182@!j,@!l:integer;
6183begin
6184if s<256 then begin
6185  p:=s;
6186  if (p<0) or (prim_eq_level(p)<>level_one) then
6187    p:=undefined_primitive;
6188end
6189else begin
6190  j:=str_start_macro(s);
6191  if s = str_ptr then l:=cur_length else l:=length(s);
6192  @<Compute the primitive code |h|@>;
6193  p:=h+prim_base; {we start searching here; note that |0<=h<hash_prime|}
6194  loop@+begin if prim_text(p)>0 then if length(prim_text(p))=l then
6195    if str_eq_str(prim_text(p),s) then goto found;
6196    if prim_next(p)=0 then
6197      begin if no_new_control_sequence then
6198        p:=undefined_primitive
6199      else @<Insert a new primitive after |p|, then make
6200        |p| point to it@>;
6201      goto found;
6202      end;
6203    p:=prim_next(p);
6204    end;
6205  end;
6206found: prim_lookup:=p;
6207end;
6208
6209@ @<Insert a new primitive...@>=
6210begin if prim_text(p)>0 then
6211  begin repeat if prim_is_full then overflow("primitive size",prim_size);
6212@:TeX capacity exceeded primitive size}{\quad primitive size@>
6213  decr(prim_used);
6214  until prim_text(prim_used)=0; {search for an empty location in |prim|}
6215  prim_next(p):=prim_used; p:=prim_used;
6216  end;
6217prim_text(p):=s;
6218end
6219
6220@ The value of |prim_prime| should be roughly 85\pct! of
6221|prim_size|, and it should be a prime number.
6222
6223@<Compute the primitive code |h|@>=
6224h:=str_pool[j];
6225for k:=j+1 to j+l-1 do
6226  begin h:=h+h+str_pool[k];
6227  while h>=prim_prime do h:=h-prim_prime;
6228  end
6229
6230@ Single-character control sequences do not need to be looked up in a hash
6231table, since we can use the character code itself as a direct address.
6232The procedure |print_cs| prints the name of a control sequence, given
6233a pointer to its address in |eqtb|. A space is printed after the name
6234unless it is a single nonletter or an active character. This procedure
6235might be invoked with invalid data, so it is ``extra robust.'' The
6236individual characters must be printed one at a time using |print|, since
6237they may be unprintable.
6238
6239@<Basic printing...@>=
6240procedure print_cs(@!p:integer); {prints a purported control sequence}
6241begin if p<hash_base then {single character}
6242  if p>=single_base then
6243    if p=null_cs then
6244      begin print_esc("csname"); print_esc("endcsname"); print_char(" ");
6245      end
6246    else  begin print_esc(p-single_base);
6247      if cat_code(p-single_base)=letter then print_char(" ");
6248      end
6249  else if p<active_base then print_esc("IMPOSSIBLE.")
6250@.IMPOSSIBLE@>
6251  else print_char(p-active_base)
6252else if p>=undefined_control_sequence then print_esc("IMPOSSIBLE.")
6253else if (text(p)<0)or(text(p)>=str_ptr) then print_esc("NONEXISTENT.")
6254@.NONEXISTENT@>
6255else  begin print_esc(text(p));
6256  print_char(" ");
6257  end;
6258end;
6259
6260@ Here is a similar procedure; it avoids the error checks, and it never
6261prints a space after the control sequence.
6262
6263@<Basic printing procedures@>=
6264procedure sprint_cs(@!p:pointer); {prints a control sequence}
6265begin if p<hash_base then
6266  if p<single_base then print_char(p-active_base)
6267  else  if p<null_cs then print_esc(p-single_base)
6268    else  begin print_esc("csname"); print_esc("endcsname");
6269      end
6270else print_esc(text(p));
6271end;
6272
6273@ We need to put \TeX's ``primitive'' control sequences into the hash
6274table, together with their command code (which will be the |eq_type|)
6275and an operand (which will be the |equiv|). The |primitive| procedure
6276does this, in a way that no \TeX\ user can. The global value |cur_val|
6277contains the new |eqtb| pointer after |primitive| has acted.
6278
6279@p @!init procedure primitive(@!s:str_number;@!c:quarterword;@!o:halfword);
6280var k:pool_pointer; {index into |str_pool|}
6281@!j:0..buf_size; {index into |buffer|}
6282@!l:small_number; {length of the string}
6283@!prim_val:integer; {needed to fill |prim_eqtb|}
6284begin if s<256 then begin
6285  cur_val:=s+single_base;
6286  prim_val:=s;
6287end
6288else  begin k:=str_start_macro(s); l:=str_start_macro(s+1)-k;
6289    {we will move |s| into the (possibly non-empty) |buffer|}
6290  if first+l>buf_size+1 then
6291      overflow("buffer size",buf_size);
6292@:TeX capacity exceeded buffer size}{\quad buffer size@>
6293  for j:=0 to l-1 do buffer[first+j]:=so(str_pool[k+j]);
6294  cur_val:=id_lookup(first,l); {|no_new_control_sequence| is |false|}
6295  flush_string; text(cur_val):=s; {we don't want to have the string twice}
6296  prim_val:=prim_lookup(s);
6297  end;
6298eq_level(cur_val):=level_one; eq_type(cur_val):=c; equiv(cur_val):=o;
6299prim_eq_level(prim_val):=level_one;
6300prim_eq_type(prim_val):=c;
6301prim_equiv(prim_val):=o;
6302end;
6303tini
6304
6305@ Many of \TeX's primitives need no |equiv|, since they are identifiable
6306by their |eq_type| alone. These primitives are loaded into the hash table
6307as follows:
6308
6309@<Put each of \TeX's primitives into the hash table@>=
6310primitive(" ",ex_space,0);@/
6311@!@:Single-character primitives /}{\quad\.{\\\ }@>
6312primitive("/",ital_corr,0);@/
6313@!@:Single-character primitives /}{\quad\.{\\/}@>
6314primitive("accent",accent,0);@/
6315@!@:accent_}{\.{\\accent} primitive@>
6316primitive("advance",advance,0);@/
6317@!@:advance_}{\.{\\advance} primitive@>
6318primitive("afterassignment",after_assignment,0);@/
6319@!@:after_assignment_}{\.{\\afterassignment} primitive@>
6320primitive("aftergroup",after_group,0);@/
6321@!@:after_group_}{\.{\\aftergroup} primitive@>
6322primitive("begingroup",begin_group,0);@/
6323@!@:begin_group_}{\.{\\begingroup} primitive@>
6324primitive("char",char_num,0);@/
6325@!@:char_}{\.{\\char} primitive@>
6326primitive("csname",cs_name,0);@/
6327@!@:cs_name_}{\.{\\csname} primitive@>
6328primitive("delimiter",delim_num,0);@/
6329@!@:delimiter_}{\.{\\delimiter} primitive@>
6330primitive("XeTeXdelimiter",delim_num,1);@/
6331primitive("Udelimiter",delim_num,1);@/
6332@!@:U_delimiter_}{\.{\\Udelimiter} primitive@>
6333primitive("divide",divide,0);@/
6334@!@:divide_}{\.{\\divide} primitive@>
6335primitive("endcsname",end_cs_name,0);@/
6336@!@:end_cs_name_}{\.{\\endcsname} primitive@>
6337primitive("endgroup",end_group,0);
6338@!@:end_group_}{\.{\\endgroup} primitive@>
6339text(frozen_end_group):="endgroup"; eqtb[frozen_end_group]:=eqtb[cur_val];@/
6340primitive("expandafter",expand_after,0);@/
6341@!@:expand_after_}{\.{\\expandafter} primitive@>
6342primitive("font",def_font,0);@/
6343@!@:font_}{\.{\\font} primitive@>
6344primitive("fontdimen",assign_font_dimen,0);@/
6345@!@:font_dimen_}{\.{\\fontdimen} primitive@>
6346primitive("halign",halign,0);@/
6347@!@:halign_}{\.{\\halign} primitive@>
6348primitive("hrule",hrule,0);@/
6349@!@:hrule_}{\.{\\hrule} primitive@>
6350primitive("ignorespaces",ignore_spaces,0);@/
6351@!@:ignore_spaces_}{\.{\\ignorespaces} primitive@>
6352primitive("insert",insert,0);@/
6353@!@:insert_}{\.{\\insert} primitive@>
6354primitive("mark",mark,0);@/
6355@!@:mark_}{\.{\\mark} primitive@>
6356primitive("mathaccent",math_accent,0);@/
6357@!@:math_accent_}{\.{\\mathaccent} primitive@>
6358primitive("XeTeXmathaccent",math_accent,1);@/
6359primitive("Umathaccent",math_accent,1);@/
6360@!@:U_math_accent_}{\.{\\Umathaccent} primitive@>
6361primitive("mathchar",math_char_num,0);@/
6362@!@:math_char_}{\.{\\mathchar} primitive@>
6363primitive("XeTeXmathcharnum",math_char_num,1);@/
6364primitive("Umathcharnum",math_char_num,1);@/
6365@!@:U_math_char_num_}{\.{\\Umathcharnum} primitive@>
6366primitive("XeTeXmathchar",math_char_num,2);@/
6367primitive("Umathchar",math_char_num,2);@/
6368@!@:U_math_char_}{\.{\\Umathchar} primitive@>
6369primitive("mathchoice",math_choice,0);@/
6370@!@:math_choice_}{\.{\\mathchoice} primitive@>
6371primitive("multiply",multiply,0);@/
6372@!@:multiply_}{\.{\\multiply} primitive@>
6373primitive("noalign",no_align,0);@/
6374@!@:no_align_}{\.{\\noalign} primitive@>
6375primitive("noboundary",no_boundary,0);@/
6376@!@:no_boundary_}{\.{\\noboundary} primitive@>
6377primitive("noexpand",no_expand,0);@/
6378@!@:no_expand_}{\.{\\noexpand} primitive@>
6379primitive("primitive",no_expand,1);@/
6380@!@:primitive_}{\.{\\primitive} primitive@>
6381primitive("nonscript",non_script,0);@/
6382@!@:non_script_}{\.{\\nonscript} primitive@>
6383primitive("omit",omit,0);@/
6384@!@:omit_}{\.{\\omit} primitive@>
6385primitive("parshape",set_shape,par_shape_loc);@/
6386@!@:par_shape_}{\.{\\parshape} primitive@>
6387primitive("penalty",break_penalty,0);@/
6388@!@:penalty_}{\.{\\penalty} primitive@>
6389primitive("prevgraf",set_prev_graf,0);@/
6390@!@:prev_graf_}{\.{\\prevgraf} primitive@>
6391primitive("radical",radical,0);@/
6392@!@:radical_}{\.{\\radical} primitive@>
6393primitive("XeTeXradical",radical,1);@/
6394primitive("Uradical",radical,1);@/
6395@!@:U_radical_}{\.{\\Uradical} primitive@>
6396primitive("read",read_to_cs,0);@/
6397@!@:read_}{\.{\\read} primitive@>
6398primitive("relax",relax,too_big_usv); {cf.\ |scan_file_name|}
6399@!@:relax_}{\.{\\relax} primitive@>
6400text(frozen_relax):="relax"; eqtb[frozen_relax]:=eqtb[cur_val];@/
6401primitive("setbox",set_box,0);@/
6402@!@:set_box_}{\.{\\setbox} primitive@>
6403primitive("the",the,0);@/
6404@!@:the_}{\.{\\the} primitive@>
6405primitive("toks",toks_register,mem_bot);@/
6406@!@:toks_}{\.{\\toks} primitive@>
6407primitive("vadjust",vadjust,0);@/
6408@!@:vadjust_}{\.{\\vadjust} primitive@>
6409primitive("valign",valign,0);@/
6410@!@:valign_}{\.{\\valign} primitive@>
6411primitive("vcenter",vcenter,0);@/
6412@!@:vcenter_}{\.{\\vcenter} primitive@>
6413primitive("vrule",vrule,0);@/
6414@!@:vrule_}{\.{\\vrule} primitive@>
6415
6416@ Each primitive has a corresponding inverse, so that it is possible to
6417display the cryptic numeric contents of |eqtb| in symbolic form.
6418Every call of |primitive| in this program is therefore accompanied by some
6419straightforward code that forms part of the |print_cmd_chr| routine
6420below.
6421
6422@<Cases of |print_cmd_chr|...@>=
6423accent: print_esc("accent");
6424advance: print_esc("advance");
6425after_assignment: print_esc("afterassignment");
6426after_group: print_esc("aftergroup");
6427assign_font_dimen: print_esc("fontdimen");
6428begin_group: print_esc("begingroup");
6429break_penalty: print_esc("penalty");
6430char_num: print_esc("char");
6431cs_name: print_esc("csname");
6432def_font: print_esc("font");
6433delim_num: if chr_code=1 then print_esc("Udelimiter")
6434  else print_esc("delimiter");
6435divide: print_esc("divide");
6436end_cs_name: print_esc("endcsname");
6437end_group: print_esc("endgroup");
6438ex_space: print_esc(" ");
6439expand_after: if chr_code=0 then print_esc("expandafter")
6440  @<Cases of |expandafter| for |print_cmd_chr|@>;
6441halign: print_esc("halign");
6442hrule: print_esc("hrule");
6443ignore_spaces: if chr_code=0 then print_esc("ignorespaces") else print_esc("primitive");
6444insert: print_esc("insert");
6445ital_corr: print_esc("/");
6446mark: begin print_esc("mark");
6447  if chr_code>0 then print_char("s");
6448  end;
6449math_accent: if chr_code=1 then print_esc("Umathaccent")
6450  else print_esc("mathaccent");
6451math_char_num: if chr_code=2 then print_esc("Umathchar")
6452  else if chr_code=1 then print_esc("Umathcharnum")
6453  else print_esc("mathchar");
6454math_choice: print_esc("mathchoice");
6455multiply: print_esc("multiply");
6456no_align: print_esc("noalign");
6457no_boundary:print_esc("noboundary");
6458no_expand: if chr_code=0 then print_esc("noexpand")
6459   else print_esc("primitive");
6460non_script: print_esc("nonscript");
6461omit: print_esc("omit");
6462radical: if chr_code=1 then print_esc("Uradical") else print_esc("radical");
6463read_to_cs: if chr_code=0 then print_esc("read")
6464  @<Cases of |read| for |print_cmd_chr|@>;
6465relax: print_esc("relax");
6466set_box: print_esc("setbox");
6467set_prev_graf: print_esc("prevgraf");
6468set_shape: case chr_code of
6469  par_shape_loc: print_esc("parshape");
6470  @<Cases of |set_shape| for |print_cmd_chr|@>@;@/
6471  end; {there are no other cases}
6472the: if chr_code=0 then print_esc("the")
6473  @<Cases of |the| for |print_cmd_chr|@>;
6474toks_register: @<Cases of |toks_register| for |print_cmd_chr|@>;
6475vadjust: print_esc("vadjust");
6476valign: if chr_code=0 then print_esc("valign")@/
6477  @<Cases of |valign| for |print_cmd_chr|@>;
6478vcenter: print_esc("vcenter");
6479vrule: print_esc("vrule");
6480
6481@ We will deal with the other primitives later, at some point in the program
6482where their |eq_type| and |equiv| values are more meaningful.  For example,
6483the primitives for math mode will be loaded when we consider the routines
6484that deal with formulas. It is easy to find where each particular
6485primitive was treated by looking in the index at the end; for example, the
6486section where |"radical"| entered |eqtb| is listed under `\.{\\radical}
6487primitive'. (Primitives consisting of a single nonalphabetic character,
6488@!like `\.{\\/}', are listed under `Single-character primitives'.)
6489@!@^Single-character primitives@>
6490
6491Meanwhile, this is a convenient place to catch up on something we were unable
6492to do before the hash table was defined:
6493
6494@<Print the font identifier for |font(p)|@>=
6495print_esc(font_id_text(font(p)))
6496
6497@* \[19] Saving and restoring equivalents.
6498The nested structure provided by `$\.{\char'173}\ldots\.{\char'175}$' groups
6499in \TeX\ means that |eqtb| entries valid in outer groups should be saved
6500and restored later if they are overridden inside the braces. When a new |eqtb|
6501value is being assigned, the program therefore checks to see if the previous
6502entry belongs to an outer level. In such a case, the old value is placed
6503on the |save_stack| just before the new value enters |eqtb|. At the
6504end of a grouping level, i.e., when the right brace is sensed, the
6505|save_stack| is used to restore the outer values, and the inner ones are
6506destroyed.
6507
6508Entries on the |save_stack| are of type |memory_word|. The top item on
6509this stack is |save_stack[p]|, where |p=save_ptr-1|; it contains three
6510fields called |save_type|, |save_level|, and |save_index|, and it is
6511interpreted in one of five ways:
6512
6513\yskip\hangg 1) If |save_type(p)=restore_old_value|, then
6514|save_index(p)| is a location in |eqtb| whose current value should
6515be destroyed at the end of the current group and replaced by |save_stack[p-1]|.
6516Furthermore if |save_index(p)>=int_base|, then |save_level(p)|
6517should replace the corresponding entry in |xeq_level|.
6518
6519\yskip\hangg 2) If |save_type(p)=restore_zero|, then |save_index(p)|
6520is a location in |eqtb| whose current value should be destroyed at the end
6521of the current group, when it should be
6522replaced by the current value of |eqtb[undefined_control_sequence]|.
6523
6524\yskip\hangg 3) If |save_type(p)=insert_token|, then |save_index(p)|
6525is a token that should be inserted into \TeX's input when the current
6526group ends.
6527
6528\yskip\hangg 4) If |save_type(p)=level_boundary|, then |save_level(p)|
6529is a code explaining what kind of group we were previously in, and
6530|save_index(p)| points to the level boundary word at the bottom of
6531the entries for that group.
6532Furthermore, in extended \eTeX\ mode, |save_stack[p-1]| contains the
6533source line number at which the current level of grouping was entered.
6534
6535\yskip\hang 5) If |save_type(p)=restore_sa|, then |sa_chain| points to a
6536chain of sparse array entries to be restored at the end of the current
6537group. Furthermore |save_index(p)| and |save_level(p)| should replace
6538the values of |sa_chain| and |sa_level| respectively.
6539
6540@d save_type(#)==save_stack[#].hh.b0 {classifies a |save_stack| entry}
6541@d save_level(#)==save_stack[#].hh.b1
6542  {saved level for regions 5 and 6, or group code}
6543@d save_index(#)==save_stack[#].hh.rh
6544  {|eqtb| location or token or |save_stack| location}
6545@d restore_old_value=0 {|save_type| when a value should be restored later}
6546@d restore_zero=1 {|save_type| when an undefined entry should be restored}
6547@d insert_token=2 {|save_type| when a token is being saved for later use}
6548@d level_boundary=3 {|save_type| corresponding to beginning of group}
6549@d restore_sa=4 {|save_type| when sparse array entries should be restored}
6550
6551@p@t\4@>@<Declare \eTeX\ procedures for tracing and input@>
6552
6553@ Here are the group codes that are used to discriminate between different
6554kinds of groups. They allow \TeX\ to decide what special actions, if any,
6555should be performed when a group ends.
6556\def\grp{\.{\char'173...\char'175}}
6557
6558Some groups are not supposed to be ended by right braces. For example,
6559the `\.\$' that begins a math formula causes a |math_shift_group| to
6560be started, and this should be terminated by a matching `\.\$'. Similarly,
6561a group that starts with \.{\\left} should end with \.{\\right}, and
6562one that starts with \.{\\begingroup} should end with \.{\\endgroup}.
6563
6564@d bottom_level=0 {group code for the outside world}
6565@d simple_group=1 {group code for local structure only}
6566@d hbox_group=2 {code for `\.{\\hbox}\grp'}
6567@d adjusted_hbox_group=3 {code for `\.{\\hbox}\grp' in vertical mode}
6568@d vbox_group=4 {code for `\.{\\vbox}\grp'}
6569@d vtop_group=5 {code for `\.{\\vtop}\grp'}
6570@d align_group=6 {code for `\.{\\halign}\grp', `\.{\\valign}\grp'}
6571@d no_align_group=7 {code for `\.{\\noalign}\grp'}
6572@d output_group=8 {code for output routine}
6573@d math_group=9 {code for, e.g., `\.{\char'136}\grp'}
6574@d disc_group=10 {code for `\.{\\discretionary}\grp\grp\grp'}
6575@d insert_group=11 {code for `\.{\\insert}\grp', `\.{\\vadjust}\grp'}
6576@d vcenter_group=12 {code for `\.{\\vcenter}\grp'}
6577@d math_choice_group=13 {code for `\.{\\mathchoice}\grp\grp\grp\grp'}
6578@d semi_simple_group=14 {code for `\.{\\begingroup...\\endgroup}'}
6579@d math_shift_group=15 {code for `\.{\$...\$}'}
6580@d math_left_group=16 {code for `\.{\\left...\\right}'}
6581@d max_group_code=16
6582
6583@<Types...@>=
6584@!group_code=0..max_group_code; {|save_level| for a level boundary}
6585
6586@ The global variable |cur_group| keeps track of what sort of group we are
6587currently in. Another global variable, |cur_boundary|, points to the
6588topmost |level_boundary| word.  And |cur_level| is the current depth of
6589nesting. The routines are designed to preserve the condition that no entry
6590in the |save_stack| or in |eqtb| ever has a level greater than |cur_level|.
6591
6592@ @<Glob...@>=
6593@!save_stack : array[0..save_size] of memory_word;
6594@!save_ptr : 0..save_size; {first unused entry on |save_stack|}
6595@!max_save_stack:0..save_size; {maximum usage of save stack}
6596@!cur_level: quarterword; {current nesting level for groups}
6597@!cur_group: group_code; {current group type}
6598@!cur_boundary: 0..save_size; {where the current level begins}
6599
6600@ At this time it might be a good idea for the reader to review the introduction
6601to |eqtb| that was given above just before the long lists of parameter names.
6602Recall that the ``outer level'' of the program is |level_one|, since
6603undefined control sequences are assumed to be ``defined'' at |level_zero|.
6604
6605@<Set init...@>=
6606save_ptr:=0; cur_level:=level_one; cur_group:=bottom_level; cur_boundary:=0;
6607max_save_stack:=0;
6608
6609@ The following macro is used to test if there is room for up to seven more
6610entries on |save_stack|. By making a conservative test like this, we can
6611get by with testing for overflow in only a few places.
6612
6613@d check_full_save_stack==if save_ptr>max_save_stack then
6614  begin max_save_stack:=save_ptr;
6615  if max_save_stack>save_size-7 then overflow("save size",save_size);
6616@:TeX capacity exceeded save size}{\quad save size@>
6617  end
6618
6619@ Procedure |new_save_level| is called when a group begins. The
6620argument is a group identification code like `|hbox_group|'. After
6621calling this routine, it is safe to put five more entries on |save_stack|.
6622
6623In some cases integer-valued items are placed onto the
6624|save_stack| just below a |level_boundary| word, because this is a
6625convenient place to keep information that is supposed to ``pop up'' just
6626when the group has finished.
6627For example, when `\.{\\hbox to 100pt}\grp' is being treated, the 100pt
6628dimension is stored on |save_stack| just before |new_save_level| is
6629called.
6630
6631We use the notation |saved(k)| to stand for an integer item that
6632appears in location |save_ptr+k| of the save stack.
6633
6634@d saved(#)==save_stack[save_ptr+#].int
6635
6636@p procedure new_save_level(@!c:group_code); {begin a new level of grouping}
6637begin check_full_save_stack;
6638if eTeX_ex then
6639  begin saved(0):=line; incr(save_ptr);
6640  end;
6641save_type(save_ptr):=level_boundary; save_level(save_ptr):=cur_group;
6642save_index(save_ptr):=cur_boundary;
6643if cur_level=max_quarterword then overflow("grouping levels",
6644@:TeX capacity exceeded grouping levels}{\quad grouping levels@>
6645  max_quarterword-min_quarterword);
6646  {quit if |(cur_level+1)| is too big to be stored in |eqtb|}
6647cur_boundary:=save_ptr; cur_group:=c;
6648@!stat if tracing_groups>0 then group_trace(false);@+tats@;@/
6649incr(cur_level); incr(save_ptr);
6650end;
6651
6652@ Just before an entry of |eqtb| is changed, the following procedure should
6653be called to update the other data structures properly. It is important
6654to keep in mind that reference counts in |mem| include references from
6655within |save_stack|, so these counts must be handled carefully.
6656@^reference counts@>
6657
6658@p procedure eq_destroy(@!w:memory_word); {gets ready to forget |w|}
6659var q:pointer; {|equiv| field of |w|}
6660begin case eq_type_field(w) of
6661call,long_call,outer_call,long_outer_call: delete_token_ref(equiv_field(w));
6662glue_ref: delete_glue_ref(equiv_field(w));
6663shape_ref: begin q:=equiv_field(w); {we need to free a \.{\\parshape} block}
6664  if q<>null then free_node(q,info(q)+info(q)+1);
6665  end; {such a block is |2n+1| words long, where |n=info(q)|}
6666box_ref: flush_node_list(equiv_field(w));
6667@/@<Cases for |eq_destroy|@>@/
6668othercases do_nothing
6669endcases;
6670end;
6671
6672@ To save a value of |eqtb[p]| that was established at level |l|, we
6673can use the following subroutine.
6674
6675@p procedure eq_save(@!p:pointer;@!l:quarterword); {saves |eqtb[p]|}
6676begin check_full_save_stack;
6677if l=level_zero then save_type(save_ptr):=restore_zero
6678else  begin save_stack[save_ptr]:=eqtb[p]; incr(save_ptr);
6679  save_type(save_ptr):=restore_old_value;
6680  end;
6681save_level(save_ptr):=l; save_index(save_ptr):=p; incr(save_ptr);
6682end;
6683
6684@ The procedure |eq_define| defines an |eqtb| entry having specified
6685|eq_type| and |equiv| fields, and saves the former value if appropriate.
6686This procedure is used only for entries in the first four regions of |eqtb|,
6687i.e., only for entries that have |eq_type| and |equiv| fields.
6688After calling this routine, it is safe to put four more entries on
6689|save_stack|, provided that there was room for four more entries before
6690the call, since |eq_save| makes the necessary test.
6691
6692@d assign_trace(#)==@!stat if tracing_assigns>0 then restore_trace(#);
6693  tats
6694
6695@p procedure eq_define(@!p:pointer;@!t:quarterword;@!e:halfword);
6696  {new data for |eqtb|}
6697label exit;
6698begin if eTeX_ex and(eq_type(p)=t)and(equiv(p)=e) then
6699  begin assign_trace(p,"reassigning")@;@/
6700  eq_destroy(eqtb[p]); return;
6701  end;
6702assign_trace(p,"changing")@;@/
6703if eq_level(p)=cur_level then eq_destroy(eqtb[p])
6704else if cur_level>level_one then eq_save(p,eq_level(p));
6705eq_level(p):=cur_level; eq_type(p):=t; equiv(p):=e;
6706assign_trace(p,"into")@;@/
6707exit:end;
6708
6709@ The counterpart of |eq_define| for the remaining (fullword) positions in
6710|eqtb| is called |eq_word_define|. Since |xeq_level[p]>=level_one| for all
6711|p|, a `|restore_zero|' will never be used in this case.
6712
6713@p procedure eq_word_define(@!p:pointer;@!w:integer);
6714label exit;
6715begin if eTeX_ex and(eqtb[p].int=w) then
6716  begin assign_trace(p,"reassigning")@;@/
6717  return;
6718  end;
6719assign_trace(p,"changing")@;@/
6720if xeq_level[p]<>cur_level then
6721  begin eq_save(p,xeq_level[p]); xeq_level[p]:=cur_level;
6722  end;
6723eqtb[p].int:=w;
6724assign_trace(p,"into")@;@/
6725exit:end;
6726
6727@ The |eq_define| and |eq_word_define| routines take care of local definitions.
6728@^global definitions@>
6729Global definitions are done in almost the same way, but there is no need
6730to save old values, and the new value is associated with |level_one|.
6731
6732@p procedure geq_define(@!p:pointer;@!t:quarterword;@!e:halfword);
6733  {global |eq_define|}
6734begin assign_trace(p,"globally changing")@;@/
6735begin eq_destroy(eqtb[p]);
6736eq_level(p):=level_one; eq_type(p):=t; equiv(p):=e;
6737end;
6738assign_trace(p,"into")@;@/
6739end;
6740@#
6741procedure geq_word_define(@!p:pointer;@!w:integer); {global |eq_word_define|}
6742begin assign_trace(p,"globally changing")@;@/
6743begin eqtb[p].int:=w; xeq_level[p]:=level_one;
6744end;
6745assign_trace(p,"into")@;@/
6746end;
6747
6748@ Subroutine |save_for_after| puts a token on the stack for save-keeping.
6749
6750@p procedure save_for_after(@!t:halfword);
6751begin if cur_level>level_one then
6752  begin check_full_save_stack;
6753  save_type(save_ptr):=insert_token; save_level(save_ptr):=level_zero;
6754  save_index(save_ptr):=t; incr(save_ptr);
6755  end;
6756end;
6757
6758@ The |unsave| routine goes the other way, taking items off of |save_stack|.
6759This routine takes care of restoration when a level ends; everything
6760belonging to the topmost group is cleared off of the save stack.
6761
6762@p
6763procedure@?back_input; forward; @t\2@>
6764procedure unsave; {pops the top level off the save stack}
6765label done;
6766var p:pointer; {position to be restored}
6767@!l:quarterword; {saved level, if in fullword regions of |eqtb|}
6768@!t:halfword; {saved value of |cur_tok|}
6769@!a:boolean; {have we already processed an \.{\\aftergroup} ?}
6770begin a:=false;
6771if cur_level>level_one then
6772  begin decr(cur_level);
6773  @<Clear off top level from |save_stack|@>;
6774  end
6775else confusion("curlevel"); {|unsave| is not used when |cur_group=bottom_level|}
6776@:this can't happen curlevel}{\quad curlevel@>
6777end;
6778
6779@ @<Clear off...@>=
6780loop@+begin decr(save_ptr);
6781  if save_type(save_ptr)=level_boundary then goto done;
6782  p:=save_index(save_ptr);
6783  if save_type(save_ptr)=insert_token then
6784    @<Insert token |p| into \TeX's input@>
6785  else if save_type(save_ptr)=restore_sa then
6786    begin sa_restore; sa_chain:=p; sa_level:=save_level(save_ptr);
6787    end
6788  else  begin if save_type(save_ptr)=restore_old_value then
6789      begin l:=save_level(save_ptr); decr(save_ptr);
6790      end
6791    else save_stack[save_ptr]:=eqtb[undefined_control_sequence];
6792    @<Store \(s)|save_stack[save_ptr]| in |eqtb[p]|, unless
6793      |eqtb[p]| holds a global value@>;
6794    end;
6795  end;
6796done: @!stat if tracing_groups>0 then group_trace(true);@+tats@;@/
6797if grp_stack[in_open]=cur_boundary then group_warning;
6798  {groups possibly not properly nested with files}
6799cur_group:=save_level(save_ptr); cur_boundary:=save_index(save_ptr);
6800if eTeX_ex then decr(save_ptr)
6801
6802@ A global definition, which sets the level to |level_one|,
6803@^global definitions@>
6804will not be undone by |unsave|. If at least one global definition of
6805|eqtb[p]| has been carried out within the group that just ended, the
6806last such definition will therefore survive.
6807
6808@<Store \(s)|save...@>=
6809if p<int_base then
6810  if eq_level(p)=level_one then
6811    begin eq_destroy(save_stack[save_ptr]); {destroy the saved value}
6812    @!stat if tracing_restores>0 then restore_trace(p,"retaining");@+tats@;@/
6813    end
6814  else  begin eq_destroy(eqtb[p]); {destroy the current value}
6815    eqtb[p]:=save_stack[save_ptr]; {restore the saved value}
6816    @!stat if tracing_restores>0 then restore_trace(p,"restoring");@+tats@;@/
6817    end
6818else if xeq_level[p]<>level_one then
6819  begin eqtb[p]:=save_stack[save_ptr]; xeq_level[p]:=l;
6820  @!stat if tracing_restores>0 then restore_trace(p,"restoring");@+tats@;@/
6821  end
6822else  begin
6823  @!stat if tracing_restores>0 then restore_trace(p,"retaining");@+tats@;@/
6824  end
6825
6826@ @<Declare \eTeX\ procedures for tr...@>=
6827@!stat procedure restore_trace(@!p:pointer;@!s:str_number);
6828  {|eqtb[p]| has just been restored or retained}
6829begin begin_diagnostic; print_char("{"); print(s); print_char(" ");
6830show_eqtb(p); print_char("}");
6831end_diagnostic(false);
6832end;
6833tats
6834
6835@ When looking for possible pointers to a memory location, it is helpful
6836to look for references from |eqtb| that might be waiting on the
6837save stack. Of course, we might find spurious pointers too; but this
6838routine is merely an aid when debugging, and at such times we are
6839grateful for any scraps of information, even if they prove to be irrelevant.
6840@^dirty \PASCAL@>
6841
6842@<Search |save_stack| for equivalents that point to |p|@>=
6843if save_ptr>0 then for q:=0 to save_ptr-1 do
6844  begin if equiv_field(save_stack[q])=p then
6845    begin print_nl("SAVE("); print_int(q); print_char(")");
6846    end;
6847  end
6848
6849@ Most of the parameters kept in |eqtb| can be changed freely, but there's
6850an exception:  The magnification should not be used with two different
6851values during any \TeX\ job, since a single magnification is applied to an
6852entire run. The global variable |mag_set| is set to the current magnification
6853whenever it becomes necessary to ``freeze'' it at a particular value.
6854
6855@<Glob...@>=
6856@!mag_set:integer; {if nonzero, this magnification should be used henceforth}
6857
6858@ @<Set init...@>=
6859mag_set:=0;
6860
6861@ The |prepare_mag| subroutine is called whenever \TeX\ wants to use |mag|
6862for magnification.
6863
6864@p procedure prepare_mag;
6865begin if (mag_set>0)and(mag<>mag_set) then
6866  begin print_err("Incompatible magnification ("); print_int(mag);
6867@.Incompatible magnification@>
6868  print(");"); print_nl(" the previous value will be retained");
6869  help2("I can handle only one magnification ratio per job. So I've")@/
6870  ("reverted to the magnification you used earlier on this run.");@/
6871  int_error(mag_set);
6872  geq_word_define(int_base+mag_code,mag_set); {|mag:=mag_set|}
6873  end;
6874if (mag<=0)or(mag>32768) then
6875  begin print_err("Illegal magnification has been changed to 1000");@/
6876@.Illegal magnification...@>
6877  help1("The magnification ratio must be between 1 and 32768.");
6878  int_error(mag); geq_word_define(int_base+mag_code,1000);
6879  end;
6880mag_set:=mag;
6881end;
6882
6883@* \[20] Token lists.
6884A \TeX\ token is either a character or a control sequence, and it is
6885@^token@>
6886represented internally in one of two ways: (1)~A character whose ASCII
6887code number is |c| and whose command code is |m| is represented as the
6888number $2^{21}m+c$; the command code is in the range |1<=m<=14|. (2)~A control
6889sequence whose |eqtb| address is |p| is represented as the number
6890|cs_token_flag+p|. Here |cs_token_flag=@t$2^{25}-1$@>| is larger than
6891$2^{21}m+c$, yet it is small enough that |cs_token_flag+p< max_halfword|;
6892thus, a token fits comfortably in a halfword.
6893
6894A token |t| represents a |left_brace| command if and only if
6895|t<left_brace_limit|; it represents a |right_brace| command if and only if
6896we have |left_brace_limit<=t<right_brace_limit|; and it represents a |match| or
6897|end_match| command if and only if |match_token<=t<=end_match_token|.
6898The following definitions take care of these token-oriented constants
6899and a few others.
6900
6901@d cs_token_flag=@"1FFFFFF {amount added to the |eqtb| location in a
6902  token that stands for a control sequence; is a multiple of~|@"10000|, less~1}
6903@d max_char_val=@"200000 {to separate char and command code}
6904@d left_brace_token=@"200000 {$2^{21}\cdot|left_brace|$}
6905@d left_brace_limit=@"400000 {$2^{21}\cdot(|left_brace|+1)$}
6906@d right_brace_token=@"400000 {$2^{21}\cdot|right_brace|$}
6907@d right_brace_limit=@"600000 {$2^{21}\cdot(|right_brace|+1)$}
6908@d math_shift_token=@"600000 {$2^{21}\cdot|math_shift|$}
6909@d tab_token=@"800000 {$2^{21}\cdot|tab_mark|$}
6910@d out_param_token=@"A00000 {$2^{21}\cdot|out_param|$}
6911@d space_token=@"1400020 {$2^{21}\cdot|spacer|+|" "|$}
6912@d letter_token=@"1600000 {$2^{21}\cdot|letter|$}
6913@d other_token=@"1800000 {$2^{21}\cdot|other_char|$}
6914@d match_token=@"1A00000 {$2^{21}\cdot|match|$}
6915@d end_match_token=@"1C00000 {$2^{21}\cdot|end_match|$}
6916@#
6917@d protected_token=end_match_token+1 {$2^{21}\cdot|end_match|+1$}
6918
6919@ @<Check the ``constant''...@>=
6920if cs_token_flag+undefined_control_sequence>max_halfword then bad:=21;
6921
6922@ A token list is a singly linked list of one-word nodes in |mem|, where
6923each word contains a token and a link. Macro definitions, output-routine
6924definitions, marks, \.{\\write} texts, and a few other things
6925are remembered by \TeX\ in the form
6926of token lists, usually preceded by a node with a reference count in its
6927|token_ref_count| field. The token stored in location |p| is called
6928|info(p)|.
6929
6930Three special commands appear in the token lists of macro definitions.
6931When |m=match|, it means that \TeX\ should scan a parameter
6932for the current macro; when |m=end_match|, it means that parameter
6933matching should end and \TeX\ should start reading the macro text; and
6934when |m=out_param|, it means that \TeX\ should insert parameter
6935number |c| into the text at this point.
6936
6937The enclosing \.{\char'173} and \.{\char'175} characters of a macro
6938definition are omitted, but the final right brace of an output routine
6939is included at the end of its token list.
6940
6941Here is an example macro definition that illustrates these conventions.
6942After \TeX\ processes the text
6943$$\.{\\def\\mac a\#1\#2 \\b \{\#1\\-a \#\#1\#2 \#2\}}$$
6944the definition of \.{\\mac} is represented as a token list containing
6945$$\def\,{\hskip2pt}
6946\vbox{\halign{\hfil#\hfil\cr
6947(reference count), |letter|\,\.a, |match|\,\#, |match|\,\#, |spacer|\,\.\ ,
6948\.{\\b}, |end_match|,\cr
6949|out_param|\,1, \.{\\-}, |letter|\,\.a, |spacer|\,\.\ , |mac_param|\,\#,
6950|other_char|\,\.1,\cr
6951|out_param|\,2, |spacer|\,\.\ , |out_param|\,2.\cr}}$$
6952The procedure |scan_toks| builds such token lists, and |macro_call|
6953does the parameter matching.
6954@^reference counts@>
6955
6956Examples such as
6957$$\.{\\def\\m\{\\def\\m\{a\}\ b\}}$$
6958explain why reference counts would be needed even if \TeX\ had no \.{\\let}
6959operation: When the token list for \.{\\m} is being read, the redefinition of
6960\.{\\m} changes the |eqtb| entry before the token list has been fully
6961consumed, so we dare not simply destroy a token list when its
6962control sequence is being redefined.
6963
6964If the parameter-matching part of a definition ends with `\.{\#\{}',
6965the corresponding token list will have `\.\{' just before the `|end_match|'
6966and also at the very end. The first `\.\{' is used to delimit the parameter; the
6967second one keeps the first from disappearing.
6968
6969@ The procedure |show_token_list|, which prints a symbolic form of
6970the token list that starts at a given node |p|, illustrates these
6971conventions. The token list being displayed should not begin with a reference
6972count. However, the procedure is intended to be robust, so that if the
6973memory links are awry or if |p| is not really a pointer to a token list,
6974nothing catastrophic will happen.
6975
6976An additional parameter |q| is also given; this parameter is either null
6977or it points to a node in the token list where a certain magic computation
6978takes place that will be explained later. (Basically, |q| is non-null when
6979we are printing the two-line context information at the time of an error
6980message; |q| marks the place corresponding to where the second line
6981should begin.)
6982
6983For example, if |p| points to the node containing the first \.a in the
6984token list above, then |show_token_list| will print the string
6985$$\hbox{`\.{a\#1\#2\ \\b\ ->\#1\\-a\ \#\#1\#2\ \#2}';}$$
6986and if |q| points to the node containing the second \.a,
6987the magic computation will be performed just before the second \.a is printed.
6988
6989The generation will stop, and `\.{\\ETC.}' will be printed, if the length
6990of printing exceeds a given limit~|l|. Anomalous entries are printed in the
6991form of control sequences that are not followed by a blank space, e.g.,
6992`\.{\\BAD.}'; this cannot be confused with actual control sequences because
6993a real control sequence named \.{BAD} would come out `\.{\\BAD\ }'.
6994
6995@<Declare the procedure called |show_token_list|@>=
6996procedure show_token_list(@!p,@!q:integer;@!l:integer);
6997label exit;
6998var m,@!c:integer; {pieces of a token}
6999@!match_chr:integer; {character used in a `|match|'}
7000@!n:ASCII_code; {the highest parameter number, as an ASCII digit}
7001begin match_chr:="#"; n:="0"; tally:=0;
7002while (p<>null) and (tally<l) do
7003  begin if p=q then @<Do magic computation@>;
7004  @<Display token |p|, and |return| if there are problems@>;
7005  p:=link(p);
7006  end;
7007if p<>null then print_esc("ETC.");
7008@.ETC@>
7009exit:
7010end;
7011
7012@ @<Display token |p|...@>=
7013if (p<hi_mem_min) or (p>mem_end) then
7014  begin print_esc("CLOBBERED."); return;
7015@.CLOBBERED@>
7016  end;
7017if info(p)>=cs_token_flag then print_cs(info(p)-cs_token_flag)
7018else  begin m:=info(p) div max_char_val; c:=info(p) mod max_char_val;
7019  if info(p)<0 then print_esc("BAD.")
7020@.BAD@>
7021  else @<Display the token $(|m|,|c|)$@>;
7022  end
7023
7024@ The procedure usually ``learns'' the character code used for macro
7025parameters by seeing one in a |match| command before it runs into any
7026|out_param| commands.
7027
7028@<Display the token ...@>=
7029case m of
7030left_brace,right_brace,math_shift,tab_mark,sup_mark,sub_mark,spacer,
7031  letter,other_char: print_char(c);
7032mac_param: begin print_char(c); print_char(c);
7033  end;
7034out_param: begin print_char(match_chr);
7035  if c<=9 then print_char(c+"0")
7036  else  begin print_char("!"); return;
7037    end;
7038  end;
7039match: begin match_chr:=c; print_char(c); incr(n); print_char(n);
7040  if n>"9" then return;
7041  end;
7042end_match: if c=0 then print("->");
7043@.->@>
7044othercases print_esc("BAD.")
7045@.BAD@>
7046endcases
7047
7048@ Here's the way we sometimes want to display a token list, given a pointer
7049to its reference count; the pointer may be null.
7050
7051@p procedure token_show(@!p:pointer);
7052begin if p<>null then show_token_list(link(p),null,10000000);
7053end;
7054
7055@ The |print_meaning| subroutine displays |cur_cmd| and |cur_chr| in
7056symbolic form, including the expansion of a macro or mark.
7057
7058@p procedure print_meaning;
7059begin print_cmd_chr(cur_cmd,cur_chr);
7060if cur_cmd>=call then
7061  begin print_char(":"); print_ln; token_show(cur_chr);
7062  end
7063else if (cur_cmd=top_bot_mark)and(cur_chr<marks_code) then
7064  begin print_char(":"); print_ln;
7065  token_show(cur_mark[cur_chr]);
7066  end;
7067end;
7068
7069@* \[21] Introduction to the syntactic routines.
7070Let's pause a moment now and try to look at the Big Picture.
7071The \TeX\ program consists of three main parts: syntactic routines,
7072semantic routines, and output routines. The chief purpose of the
7073syntactic routines is to deliver the user's input to the semantic routines,
7074one token at a time. The semantic routines act as an interpreter
7075responding to these tokens, which may be regarded as commands. And the
7076output routines are periodically called on to convert box-and-glue
7077lists into a compact set of instructions that will be sent
7078to a typesetter. We have discussed the basic data structures and utility
7079routines of \TeX, so we are good and ready to plunge into the real activity by
7080considering the syntactic routines.
7081
7082Our current goal is to come to grips with the |get_next| procedure,
7083which is the keystone of \TeX's input mechanism. Each call of |get_next|
7084sets the value of three variables |cur_cmd|, |cur_chr|, and |cur_cs|,
7085representing the next input token.
7086$$\vbox{\halign{#\hfil\cr
7087  \hbox{|cur_cmd| denotes a command code from the long list of codes
7088   given above;}\cr
7089  \hbox{|cur_chr| denotes a character code or other modifier of the command
7090   code;}\cr
7091  \hbox{|cur_cs| is the |eqtb| location of the current control sequence,}\cr
7092  \hbox{\qquad if the current token was a control sequence,
7093   otherwise it's zero.}\cr}}$$
7094Underlying this external behavior of |get_next| is all the machinery
7095necessary to convert from character files to tokens. At a given time we
7096may be only partially finished with the reading of several files (for
7097which \.{\\input} was specified), and partially finished with the expansion
7098of some user-defined macros and/or some macro parameters, and partially
7099finished with the generation of some text in a template for \.{\\halign},
7100and so on. When reading a character file, special characters must be
7101classified as math delimiters, etc.; comments and extra blank spaces must
7102be removed, paragraphs must be recognized, and control sequences must be
7103found in the hash table. Furthermore there are occasions in which the
7104scanning routines have looked ahead for a word like `\.{plus}' but only
7105part of that word was found, hence a few characters must be put back
7106into the input and scanned again.
7107
7108To handle these situations, which might all be present simultaneously,
7109\TeX\ uses various stacks that hold information about the incomplete
7110activities, and there is a finite state control for each level of the
7111input mechanism. These stacks record the current state of an implicitly
7112recursive process, but the |get_next| procedure is not recursive.
7113Therefore it will not be difficult to translate these algorithms into
7114low-level languages that do not support recursion.
7115
7116@<Glob...@>=
7117@!cur_cmd: eight_bits; {current command set by |get_next|}
7118@!cur_chr: halfword; {operand of current command}
7119@!cur_cs: pointer; {control sequence found here, zero if none found}
7120@!cur_tok: halfword; {packed representative of |cur_cmd| and |cur_chr|}
7121
7122@ The |print_cmd_chr| routine prints a symbolic interpretation of a
7123command code and its modifier. This is used in certain `\.{You can\'t}'
7124error messages, and in the implementation of diagnostic routines like
7125\.{\\show}.
7126
7127The body of |print_cmd_chr| is a rather tedious listing of print
7128commands, and most of it is essentially an inverse to the |primitive|
7129routine that enters a \TeX\ primitive into |eqtb|. Therefore much of
7130this procedure appears elsewhere in the program,
7131together with the corresponding |primitive| calls.
7132
7133@d chr_cmd(#)==begin print(#);
7134   if chr_code < @"10000 then print_ASCII(chr_code)
7135   else print_char(chr_code); {non-Plane 0 Unicodes can't be sent through |print_ASCII|}
7136  end
7137
7138@<Declare the procedure called |print_cmd_chr|@>=
7139procedure print_cmd_chr(@!cmd:quarterword;@!chr_code:halfword);
7140var n:integer; {temp variable}
7141@!font_name_str:str_number; {local vars for \.{\\fontname} quoting extension}
7142@!quote_char:UTF16_code;
7143begin case cmd of
7144left_brace: chr_cmd("begin-group character ");
7145right_brace: chr_cmd("end-group character ");
7146math_shift: chr_cmd("math shift character ");
7147mac_param: chr_cmd("macro parameter character ");
7148sup_mark: chr_cmd("superscript character ");
7149sub_mark: chr_cmd("subscript character ");
7150endv: print("end of alignment template");
7151spacer: chr_cmd("blank space ");
7152letter: chr_cmd("the letter ");
7153other_char: chr_cmd("the character ");
7154@t\4@>@<Cases of |print_cmd_chr| for symbolic printing of primitives@>@/
7155othercases print("[unknown command code!]")
7156endcases;
7157end;
7158
7159@ Here is a procedure that displays the current command.
7160
7161@p procedure show_cur_cmd_chr;
7162var n:integer; {level of \.{\\if...\\fi} nesting}
7163@!l:integer; {line where \.{\\if} started}
7164@!p:pointer;
7165begin begin_diagnostic; print_nl("{");
7166if mode<>shown_mode then
7167  begin print_mode(mode); print(": "); shown_mode:=mode;
7168  end;
7169print_cmd_chr(cur_cmd,cur_chr);
7170if tracing_ifs>0 then
7171  if cur_cmd>=if_test then if cur_cmd<=fi_or_else then
7172    begin print(": ");
7173    if cur_cmd=fi_or_else then
7174      begin print_cmd_chr(if_test,cur_if); print_char(" ");
7175      n:=0; l:=if_line;
7176      end
7177    else  begin n:=1; l:=line;
7178      end;
7179    p:=cond_ptr;
7180    while p<>null do
7181      begin incr(n); p:=link(p);
7182      end;
7183    print("(level "); print_int(n); print_char(")"); print_if_line(l);
7184    end;
7185print_char("}");
7186end_diagnostic(false);
7187end;
7188
7189@* \[22] Input stacks and states.
7190This implementation of
7191\TeX\ uses two different conventions for representing sequential stacks.
7192@^stack conventions@>@^conventions for representing stacks@>
7193
7194\yskip\hangg 1) If there is frequent access to the top entry, and if the
7195stack is essentially never empty, then the top entry is kept in a global
7196variable (even better would be a machine register), and the other entries
7197appear in the array $\\{stack}[0\to(\\{ptr}-1)]$. For example, the
7198semantic stack described above is handled this way, and so is the input
7199stack that we are about to study.
7200
7201\yskip\hangg 2) If there is infrequent top access, the entire stack contents
7202are in the array $\\{stack}[0\to(\\{ptr}-1)]$. For example, the |save_stack|
7203is treated this way, as we have seen.
7204
7205\yskip\noindent
7206The state of \TeX's input mechanism appears in the input stack, whose
7207entries are records with six fields, called |state|, |index|, |start|, |loc|,
7208|limit|, and |name|. This stack is maintained with
7209convention~(1), so it is declared in the following way:
7210
7211@<Types...@>=
7212@!in_state_record = record
7213  @!state_field, @!index_field: quarterword;
7214  @!start_field,@!loc_field, @!limit_field, @!name_field: halfword;
7215  end;
7216
7217@ @<Glob...@>=
7218@!input_stack : array[0..stack_size] of in_state_record;
7219@!input_ptr : 0..stack_size; {first unused location of |input_stack|}
7220@!max_in_stack: 0..stack_size; {largest value of |input_ptr| when pushing}
7221@!cur_input : in_state_record;
7222  {the ``top'' input state, according to convention (1)}
7223
7224@ We've already defined the special variable |loc==cur_input.loc_field|
7225in our discussion of basic input-output routines. The other components of
7226|cur_input| are defined in the same way:
7227
7228@d state==cur_input.state_field {current scanner state}
7229@d index==cur_input.index_field {reference for buffer information}
7230@d start==cur_input.start_field {starting position in |buffer|}
7231@d limit==cur_input.limit_field {end of current line in |buffer|}
7232@d name==cur_input.name_field {name of the current file}
7233
7234@ Let's look more closely now at the control variables
7235(|state|,~|index|,~|start|,~|loc|,~|limit|,~|name|),
7236assuming that \TeX\ is reading a line of characters that have been input
7237from some file or from the user's terminal. There is an array called
7238|buffer| that acts as a stack of all lines of characters that are
7239currently being read from files, including all lines on subsidiary
7240levels of the input stack that are not yet completed. \TeX\ will return to
7241the other lines when it is finished with the present input file.
7242
7243(Incidentally, on a machine with byte-oriented addressing, it might be
7244appropriate to combine |buffer| with the |str_pool| array,
7245letting the buffer entries grow downward from the top of the string pool
7246and checking that these two tables don't bump into each other.)
7247
7248The line we are currently working on begins in position |start| of the
7249buffer; the next character we are about to read is |buffer[loc]|; and
7250|limit| is the location of the last character present.  If |loc>limit|,
7251the line has been completely read. Usually |buffer[limit]| is the
7252|end_line_char|, denoting the end of a line, but this is not
7253true if the current line is an insertion that was entered on the user's
7254terminal in response to an error message.
7255
7256The |name| variable is a string number that designates the name of
7257the current file, if we are reading a text file. It is zero if we
7258are reading from the terminal; it is |n+1| if we are reading from
7259input stream |n|, where |0<=n<=16|. (Input stream 16 stands for
7260an invalid stream number; in such cases the input is actually from
7261the terminal, under control of the procedure |read_toks|.)
7262Finally |18<=name<=19| indicates that we are reading a pseudo file
7263created by the \.{\\scantokens} command.
7264
7265The |state| variable has one of three values, when we are scanning such
7266files:
7267$$\baselineskip 15pt\vbox{\halign{#\hfil\cr
72681) |state=mid_line| is the normal state.\cr
72692) |state=skip_blanks| is like |mid_line|, but blanks are ignored.\cr
72703) |state=new_line| is the state at the beginning of a line.\cr}}$$
7271These state values are assigned numeric codes so that if we add the state
7272code to the next character's command code, we get distinct values. For
7273example, `|mid_line+spacer|' stands for the case that a blank
7274space character occurs in the middle of a line when it is not being
7275ignored; after this case is processed, the next value of |state| will
7276be |skip_blanks|.
7277
7278@d mid_line=1 {|state| code when scanning a line of characters}
7279@d skip_blanks=2+max_char_code {|state| code when ignoring blanks}
7280@d new_line=3+max_char_code+max_char_code {|state| code at start of line}
7281
7282@ Additional information about the current line is available via the
7283|index| variable, which counts how many lines of characters are present
7284in the buffer below the current level. We have |index=0| when reading
7285from the terminal and prompting the user for each line; then if the user types,
7286e.g., `\.{\\input paper}', we will have |index=1| while reading
7287the file \.{paper.tex}. However, it does not follow that |index| is the
7288same as the input stack pointer, since many of the levels on the input
7289stack may come from token lists. For example, the instruction `\.{\\input
7290paper}' might occur in a token list.
7291
7292The global variable |in_open| is equal to the |index|
7293value of the highest non-token-list level. Thus, the number of partially read
7294lines in the buffer is |in_open+1|, and we have |in_open=index|
7295when we are not reading a token list.
7296
7297If we are not currently reading from the terminal, or from an input
7298stream, we are reading from the file variable |input_file[index]|. We use
7299the notation |terminal_input| as a convenient abbreviation for |name=0|,
7300and |cur_file| as an abbreviation for |input_file[index]|.
7301
7302The global variable |line| contains the line number in the topmost
7303open file, for use in error messages. If we are not reading from
7304the terminal, |line_stack[index]| holds the line number for the
7305enclosing level, so that |line| can be restored when the current
7306file has been read. Line numbers should never be negative, since the
7307negative of the current line number is used to identify the user's output
7308routine in the |mode_line| field of the semantic nest entries.
7309
7310If more information about the input state is needed, it can be
7311included in small arrays like those shown here. For example,
7312the current page or segment number in the input file might be
7313put into a variable |@!page|, maintained for enclosing levels in
7314`\ignorespaces|@!page_stack:array[1..max_in_open] of integer|\unskip'
7315by analogy with |line_stack|.
7316@^system dependencies@>
7317
7318@d terminal_input==(name=0) {are we reading from the terminal?}
7319@d cur_file==input_file[index] {the current |alpha_file| variable}
7320
7321@<Glob...@>=
7322@!in_open : 0..max_in_open; {the number of lines in the buffer, less one}
7323@!open_parens : 0..max_in_open; {the number of open text files}
7324@!input_file : array[1..max_in_open] of alpha_file;
7325@!line : integer; {current line number in the current source file}
7326@!line_stack : array[1..max_in_open] of integer;
7327
7328@ Users of \TeX\ sometimes forget to balance left and right braces properly,
7329and one of the ways \TeX\ tries to spot such errors is by considering an
7330input file as broken into subfiles by control sequences that
7331are declared to be \.{\\outer}.
7332
7333A variable called |scanner_status| tells \TeX\ whether or not to complain
7334when a subfile ends. This variable has six possible values:
7335
7336\yskip\hang|normal|, means that a subfile can safely end here without incident.
7337
7338\yskip\hang|skipping|, means that a subfile can safely end here, but not a file,
7339because we're reading past some conditional text that was not selected.
7340
7341\yskip\hang|defining|, means that a subfile shouldn't end now because a
7342macro is being defined.
7343
7344\yskip\hang|matching|, means that a subfile shouldn't end now because a
7345macro is being used and we are searching for the end of its arguments.
7346
7347\yskip\hang|aligning|, means that a subfile shouldn't end now because we are
7348not finished with the preamble of an \.{\\halign} or \.{\\valign}.
7349
7350\yskip\hang|absorbing|, means that a subfile shouldn't end now because we are
7351reading a balanced token list for \.{\\message}, \.{\\write}, etc.
7352
7353\yskip\noindent
7354If the |scanner_status| is not |normal|, the variable |warning_index| points
7355to the |eqtb| location for the relevant control sequence name to print
7356in an error message.
7357
7358@d skipping=1 {|scanner_status| when passing conditional text}
7359@d defining=2 {|scanner_status| when reading a macro definition}
7360@d matching=3 {|scanner_status| when reading macro arguments}
7361@d aligning=4 {|scanner_status| when reading an alignment preamble}
7362@d absorbing=5 {|scanner_status| when reading a balanced text}
7363
7364@<Glob...@>=
7365@!scanner_status : normal..absorbing; {can a subfile end now?}
7366@!warning_index : pointer; {identifier relevant to non-|normal| scanner status}
7367@!def_ref : pointer; {reference count of token list being defined}
7368
7369@ Here is a procedure that uses |scanner_status| to print a warning message
7370when a subfile has ended, and at certain other crucial times:
7371
7372@<Declare the procedure called |runaway|@>=
7373procedure runaway;
7374var p:pointer; {head of runaway list}
7375begin if scanner_status>skipping then
7376  begin print_nl("Runaway ");
7377@.Runaway...@>
7378  case scanner_status of
7379  defining: begin print("definition"); p:=def_ref;
7380    end;
7381  matching: begin print("argument"); p:=temp_head;
7382    end;
7383  aligning: begin print("preamble"); p:=hold_head;
7384    end;
7385  absorbing: begin print("text"); p:=def_ref;
7386    end;
7387  end; {there are no other cases}
7388  print_char("?");print_ln; show_token_list(link(p),null,error_line-10);
7389  end;
7390end;
7391
7392@ However, all this discussion about input state really applies only to the
7393case that we are inputting from a file. There is another important case,
7394namely when we are currently getting input from a token list. In this case
7395|state=token_list|, and the conventions about the other state variables
7396are different:
7397
7398\yskip\hang|loc| is a pointer to the current node in the token list, i.e.,
7399the node that will be read next. If |loc=null|, the token list has been
7400fully read.
7401
7402\yskip\hang|start| points to the first node of the token list; this node
7403may or may not contain a reference count, depending on the type of token
7404list involved.
7405
7406\yskip\hang|token_type|, which takes the place of |index| in the
7407discussion above, is a code number that explains what kind of token list
7408is being scanned.
7409
7410\yskip\hang|name| points to the |eqtb| address of the control sequence
7411being expanded, if the current token list is a macro.
7412
7413\yskip\hang|param_start|, which takes the place of |limit|, tells where
7414the parameters of the current macro begin in the |param_stack|, if the
7415current token list is a macro.
7416
7417\yskip\noindent The |token_type| can take several values, depending on
7418where the current token list came from:
7419
7420\yskip\hang|parameter|, if a parameter is being scanned;
7421
7422\hang|u_template|, if the \<u_j> part of an alignment
7423template is being scanned;
7424
7425\hang|v_template|, if the \<v_j> part of an alignment
7426template is being scanned;
7427
7428\hang|backed_up|, if the token list being scanned has been inserted as
7429`to be read again'.
7430
7431\hang|inserted|, if the token list being scanned has been inserted as
7432the text expansion of a \.{\\count} or similar variable;
7433
7434\hang|macro|, if a user-defined control sequence is being scanned;
7435
7436\hang|output_text|, if an \.{\\output} routine is being scanned;
7437
7438\hang|every_par_text|, if the text of \.{\\everypar} is being scanned;
7439
7440\hang|every_math_text|, if the text of \.{\\everymath} is being scanned;
7441
7442\hang|every_display_text|, if the text of \.{\\everydisplay} is being scanned;
7443
7444\hang|every_hbox_text|, if the text of \.{\\everyhbox} is being scanned;
7445
7446\hang|every_vbox_text|, if the text of \.{\\everyvbox} is being scanned;
7447
7448\hang|every_job_text|, if the text of \.{\\everyjob} is being scanned;
7449
7450\hang|every_cr_text|, if the text of \.{\\everycr} is being scanned;
7451
7452\hang|mark_text|, if the text of a \.{\\mark} is being scanned;
7453
7454\hang|write_text|, if the text of a \.{\\write} is being scanned.
7455
7456\yskip\noindent
7457The codes for |output_text|, |every_par_text|, etc., are equal to a constant
7458plus the corresponding codes for token list parameters |output_routine_loc|,
7459|every_par_loc|, etc.  The token list begins with a reference count if and
7460only if |token_type>=macro|.
7461@^reference counts@>
7462
7463Since \eTeX's additional token list parameters precede |toks_base|, the
7464corresponding token types must precede |write_text|.
7465
7466@d token_list=0 {|state| code when scanning a token list}
7467@d token_type==index {type of current token list}
7468@d param_start==limit {base of macro parameters in |param_stack|}
7469@d parameter=0 {|token_type| code for parameter}
7470@d u_template=1 {|token_type| code for \<u_j> template}
7471@d v_template=2 {|token_type| code for \<v_j> template}
7472@d backed_up=3 {|token_type| code for text to be reread}
7473@d backed_up_char=4 {special code for backed-up char from \\XeTeXinterchartoks hook}
7474@d inserted=5 {|token_type| code for inserted texts}
7475@d macro=6 {|token_type| code for defined control sequences}
7476@d output_text=7 {|token_type| code for output routines}
7477@d every_par_text=8 {|token_type| code for \.{\\everypar}}
7478@d every_math_text=9 {|token_type| code for \.{\\everymath}}
7479@d every_display_text=10 {|token_type| code for \.{\\everydisplay}}
7480@d every_hbox_text=11 {|token_type| code for \.{\\everyhbox}}
7481@d every_vbox_text=12 {|token_type| code for \.{\\everyvbox}}
7482@d every_job_text=13 {|token_type| code for \.{\\everyjob}}
7483@d every_cr_text=14 {|token_type| code for \.{\\everycr}}
7484@d mark_text=15 {|token_type| code for \.{\\topmark}, etc.}
7485@#
7486@d eTeX_text_offset=output_routine_loc-output_text
7487@d every_eof_text=every_eof_loc-eTeX_text_offset
7488  {|token_type| code for \.{\\everyeof}}
7489@#
7490@d inter_char_text=XeTeX_inter_char_loc-eTeX_text_offset
7491  {|token_type| code for \.{\\XeTeXinterchartoks}}
7492@#
7493@d write_text=toks_base-eTeX_text_offset {|token_type| code for \.{\\write}}
7494
7495@ The |param_stack| is an auxiliary array used to hold pointers to the token
7496lists for parameters at the current level and subsidiary levels of input.
7497This stack is maintained with convention (2), and it grows at a different
7498rate from the others.
7499
7500@<Glob...@>=
7501@!param_stack:array [0..param_size] of pointer;
7502  {token list pointers for parameters}
7503@!param_ptr:0..param_size; {first unused entry in |param_stack|}
7504@!max_param_stack:integer;
7505  {largest value of |param_ptr|, will be |<=param_size+9|}
7506
7507@ The input routines must also interact with the processing of
7508\.{\\halign} and \.{\\valign}, since the appearance of tab marks and
7509\.{\\cr} in certain places is supposed to trigger the beginning of special
7510\<v_j> template text in the scanner. This magic is accomplished by an
7511|align_state| variable that is increased by~1 when a `\.{\char'173}' is
7512scanned and decreased by~1 when a `\.{\char'175}' is scanned. The |align_state|
7513is nonzero during the \<u_j> template, after which it is set to zero; the
7514\<v_j> template begins when a tab mark or \.{\\cr} occurs at a time that
7515|align_state=0|.
7516
7517@<Glob...@>=
7518@!align_state:integer; {group level with respect to current alignment}
7519
7520@ Thus, the ``current input state'' can be very complicated indeed; there
7521can be many levels and each level can arise in a variety of ways. The
7522|show_context| procedure, which is used by \TeX's error-reporting routine to
7523print out the current input state on all levels down to the most recent
7524line of characters from an input file, illustrates most of these conventions.
7525The global variable |base_ptr| contains the lowest level that was
7526displayed by this procedure.
7527
7528@<Glob...@>=
7529@!base_ptr:0..stack_size; {shallowest level shown by |show_context|}
7530
7531@ The status at each level is indicated by printing two lines, where the first
7532line indicates what was read so far and the second line shows what remains
7533to be read. The context is cropped, if necessary, so that the first line
7534contains at most |half_error_line| characters, and the second contains
7535at most |error_line|. Non-current input levels whose |token_type| is
7536`|backed_up|' are shown only if they have not been fully read.
7537
7538@p procedure show_context; {prints where the scanner is}
7539label done;
7540var old_setting:0..max_selector; {saved |selector| setting}
7541@!nn:integer; {number of contexts shown so far, less one}
7542@!bottom_line:boolean; {have we reached the final context to be shown?}
7543@<Local variables for formatting calculations@>@/
7544begin base_ptr:=input_ptr; input_stack[base_ptr]:=cur_input;
7545  {store current state}
7546nn:=-1; bottom_line:=false;
7547loop@+begin cur_input:=input_stack[base_ptr]; {enter into the context}
7548  if (state<>token_list) then
7549    if (name>19) or (base_ptr=0) then bottom_line:=true;
7550  if (base_ptr=input_ptr)or bottom_line or(nn<error_context_lines) then
7551    @<Display the current context@>
7552  else if nn=error_context_lines then
7553    begin print_nl("..."); incr(nn); {omitted if |error_context_lines<0|}
7554    end;
7555  if bottom_line then goto done;
7556  decr(base_ptr);
7557  end;
7558done: cur_input:=input_stack[input_ptr]; {restore original state}
7559end;
7560
7561@ @<Display the current context@>=
7562begin if (base_ptr=input_ptr) or (state<>token_list) or
7563   (token_type<>backed_up) or (loc<>null) then
7564    {we omit backed-up token lists that have already been read}
7565  begin tally:=0; {get ready to count characters}
7566  old_setting:=selector;
7567  if state<>token_list then
7568    begin @<Print location of current line@>;
7569    @<Pseudoprint the line@>;
7570    end
7571  else  begin @<Print type of token list@>;
7572    @<Pseudoprint the token list@>;
7573    end;
7574  selector:=old_setting; {stop pseudoprinting}
7575  @<Print two lines using the tricky pseudoprinted information@>;
7576  incr(nn);
7577  end;
7578end
7579
7580@ This routine should be changed, if necessary, to give the best possible
7581indication of where the current line resides in the input file.
7582For example, on some systems it is best to print both a page and line number.
7583@^system dependencies@>
7584
7585@<Print location of current line@>=
7586if name<=17 then
7587  if terminal_input then
7588    if base_ptr=0 then print_nl("<*>") else print_nl("<insert> ")
7589  else  begin print_nl("<read ");
7590    if name=17 then print_char("*")@+else print_int(name-1);
7591@.*\relax@>
7592    print_char(">");
7593    end
7594else  begin print_nl("l.");
7595  if index=in_open then print_int(line)
7596  else print_int(line_stack[index+1]); {input from a pseudo file}
7597  end;
7598print_char(" ")
7599
7600@ @<Print type of token list@>=
7601case token_type of
7602parameter: print_nl("<argument> ");
7603u_template,v_template: print_nl("<template> ");
7604backed_up,backed_up_char: if loc=null then print_nl("<recently read> ")
7605  else print_nl("<to be read again> ");
7606inserted: print_nl("<inserted text> ");
7607macro: begin print_ln; print_cs(name);
7608  end;
7609output_text: print_nl("<output> ");
7610every_par_text: print_nl("<everypar> ");
7611every_math_text: print_nl("<everymath> ");
7612every_display_text: print_nl("<everydisplay> ");
7613every_hbox_text: print_nl("<everyhbox> ");
7614every_vbox_text: print_nl("<everyvbox> ");
7615every_job_text: print_nl("<everyjob> ");
7616every_cr_text: print_nl("<everycr> ");
7617mark_text: print_nl("<mark> ");
7618every_eof_text: print_nl("<everyeof> ");
7619inter_char_text: print_nl("<XeTeXinterchartoks> ");
7620write_text: print_nl("<write> ");
7621othercases print_nl("?") {this should never happen}
7622endcases
7623
7624@ Here it is necessary to explain a little trick. We don't want to store a long
7625string that corresponds to a token list, because that string might take up
7626lots of memory; and we are printing during a time when an error message is
7627being given, so we dare not do anything that might overflow one of \TeX's
7628tables. So `pseudoprinting' is the answer: We enter a mode of printing
7629that stores characters into a buffer of length |error_line|, where character
7630$k+1$ is placed into \hbox{|trick_buf[k mod error_line]|} if
7631|k<trick_count|, otherwise character |k| is dropped. Initially we set
7632|tally:=0| and |trick_count:=1000000|; then when we reach the
7633point where transition from line 1 to line 2 should occur, we
7634set |first_count:=tally| and |trick_count:=@tmax@>(error_line,
7635tally+1+error_line-half_error_line)|. At the end of the
7636pseudoprinting, the values of |first_count|, |tally|, and
7637|trick_count| give us all the information we need to print the two lines,
7638and all of the necessary text is in |trick_buf|.
7639
7640Namely, let |l| be the length of the descriptive information that appears
7641on the first line. The length of the context information gathered for that
7642line is |k=first_count|, and the length of the context information
7643gathered for line~2 is $m=\min(|tally|, |trick_count|)-k$. If |l+k<=h|,
7644where |h=half_error_line|, we print |trick_buf[0..k-1]| after the
7645descriptive information on line~1, and set |n:=l+k|; here |n| is the
7646length of line~1. If $l+k>h$, some cropping is necessary, so we set |n:=h|
7647and print `\.{...}' followed by
7648$$\hbox{|trick_buf[(l+k-h+3)..k-1]|,}$$
7649where subscripts of |trick_buf| are circular modulo |error_line|. The
7650second line consists of |n|~spaces followed by |trick_buf[k..(k+m-1)]|,
7651unless |n+m>error_line|; in the latter case, further cropping is done.
7652This is easier to program than to explain.
7653
7654@<Local variables for formatting...@>=
7655@!i:0..buf_size; {index into |buffer|}
7656@!j:0..buf_size; {end of current line in |buffer|}
7657@!l:0..half_error_line; {length of descriptive information on line 1}
7658@!m:integer; {context information gathered for line 2}
7659@!n:0..error_line; {length of line 1}
7660@!p: integer; {starting or ending place in |trick_buf|}
7661@!q: integer; {temporary index}
7662
7663@ The following code sets up the print routines so that they will gather
7664the desired information.
7665
7666@d begin_pseudoprint==
7667  begin l:=tally; tally:=0; selector:=pseudo;
7668  trick_count:=1000000;
7669  end
7670@d set_trick_count==
7671  begin first_count:=tally;
7672  trick_count:=tally+1+error_line-half_error_line;
7673  if trick_count<error_line then trick_count:=error_line;
7674  end
7675
7676@ And the following code uses the information after it has been gathered.
7677
7678@<Print two lines using the tricky pseudoprinted information@>=
7679if trick_count=1000000 then set_trick_count;
7680  {|set_trick_count| must be performed}
7681if tally<trick_count then m:=tally-first_count
7682else m:=trick_count-first_count; {context on line 2}
7683if l+first_count<=half_error_line then
7684  begin p:=0; n:=l+first_count;
7685  end
7686else  begin print("..."); p:=l+first_count-half_error_line+3;
7687  n:=half_error_line;
7688  end;
7689for q:=p to first_count-1 do print_visible_char(trick_buf[q mod error_line]);
7690print_ln;
7691for q:=1 to n do print_visible_char(" "); {print |n| spaces to begin line~2}
7692if m+n<=error_line then p:=first_count+m else p:=first_count+(error_line-n-3);
7693for q:=first_count to p-1 do print_visible_char(trick_buf[q mod error_line]);
7694if m+n>error_line then print("...")
7695
7696@ But the trick is distracting us from our current goal, which is to
7697understand the input state. So let's concentrate on the data structures that
7698are being pseudoprinted as we finish up the |show_context| procedure.
7699
7700@<Pseudoprint the line@>=
7701begin_pseudoprint;
7702if buffer[limit]=end_line_char then j:=limit
7703else j:=limit+1; {determine the effective end of the line}
7704if j>0 then for i:=start to j-1 do
7705  begin if i=loc then set_trick_count;
7706  print_char(buffer[i]);
7707  end
7708
7709@ @<Pseudoprint the token list@>=
7710begin_pseudoprint;
7711if token_type<macro then show_token_list(start,loc,100000)
7712else show_token_list(link(start),loc,100000) {avoid reference count}
7713
7714@ Here is the missing piece of |show_token_list| that is activated when the
7715token beginning line~2 is about to be shown:
7716
7717@<Do magic computation@>=set_trick_count
7718
7719@* \[23] Maintaining the input stacks.
7720The following subroutines change the input status in commonly needed ways.
7721
7722First comes |push_input|, which stores the current state and creates a
7723new level (having, initially, the same properties as the old).
7724
7725@d push_input==@t@> {enter a new input level, save the old}
7726  begin if input_ptr>max_in_stack then
7727    begin max_in_stack:=input_ptr;
7728    if input_ptr=stack_size then overflow("input stack size",stack_size);
7729@:TeX capacity exceeded input stack size}{\quad input stack size@>
7730    end;
7731  input_stack[input_ptr]:=cur_input; {stack the record}
7732  incr(input_ptr);
7733  end
7734
7735@ And of course what goes up must come down.
7736
7737@d pop_input==@t@> {leave an input level, re-enter the old}
7738  begin decr(input_ptr); cur_input:=input_stack[input_ptr];
7739  end
7740
7741@ Here is a procedure that starts a new level of token-list input, given
7742a token list |p| and its type |t|. If |t=macro|, the calling routine should
7743set |name| and |loc|.
7744
7745@d back_list(#)==begin_token_list(#,backed_up) {backs up a simple token list}
7746@d ins_list(#)==begin_token_list(#,inserted) {inserts a simple token list}
7747
7748@p procedure begin_token_list(@!p:pointer;@!t:quarterword);
7749begin push_input; state:=token_list; start:=p; token_type:=t;
7750if t>=macro then {the token list starts with a reference count}
7751  begin add_token_ref(p);
7752  if t=macro then param_start:=param_ptr
7753  else  begin loc:=link(p);
7754    if tracing_macros>1 then
7755      begin begin_diagnostic; print_nl("");
7756      case t of
7757      mark_text:print_esc("mark");
7758      write_text:print_esc("write");
7759      othercases print_cmd_chr(assign_toks,t-output_text+output_routine_loc)
7760      endcases;@/
7761      print("->"); token_show(p); end_diagnostic(false);
7762      end;
7763    end;
7764  end
7765else loc:=p;
7766end;
7767
7768@ When a token list has been fully scanned, the following computations
7769should be done as we leave that level of input. The |token_type| tends
7770to be equal to either |backed_up| or |inserted| about 2/3 of the time.
7771@^inner loop@>
7772
7773@p procedure end_token_list; {leave a token-list input level}
7774begin if token_type>=backed_up then {token list to be deleted}
7775  begin if token_type<=inserted then flush_list(start)
7776  else  begin delete_token_ref(start); {update reference count}
7777    if token_type=macro then {parameters must be flushed}
7778      while param_ptr>param_start do
7779        begin decr(param_ptr);
7780        flush_list(param_stack[param_ptr]);
7781        end;
7782    end;
7783  end
7784else if token_type=u_template then
7785  if align_state>500000 then align_state:=0
7786  else fatal_error("(interwoven alignment preambles are not allowed)");
7787@.interwoven alignment preambles...@>
7788pop_input;
7789check_interrupt;
7790end;
7791
7792@ Sometimes \TeX\ has read too far and wants to ``unscan'' what it has
7793seen. The |back_input| procedure takes care of this by putting the token
7794just scanned back into the input stream, ready to be read again. This
7795procedure can be used only if |cur_tok| represents the token to be
7796replaced. Some applications of \TeX\ use this procedure a lot,
7797so it has been slightly optimized for speed.
7798@^inner loop@>
7799
7800@p procedure back_input; {undoes one token of input}
7801var p:pointer; {a token list of length one}
7802begin while (state=token_list)and(loc=null)and(token_type<>v_template) do
7803  end_token_list; {conserve stack space}
7804p:=get_avail; info(p):=cur_tok;
7805if cur_tok<right_brace_limit then
7806  if cur_tok<left_brace_limit then decr(align_state)
7807  else incr(align_state);
7808push_input; state:=token_list; start:=p; token_type:=backed_up;
7809loc:=p; {that was |back_list(p)|, without procedure overhead}
7810end;
7811
7812@ @<Insert token |p| into \TeX's input@>=
7813begin t:=cur_tok; cur_tok:=p;
7814if a then
7815  begin p:=get_avail; info(p):=cur_tok; link(p):=loc; loc:=p; start:=p;
7816  if cur_tok<right_brace_limit then
7817    if cur_tok<left_brace_limit then decr(align_state)
7818    else incr(align_state);
7819  end
7820else  begin back_input; a:=eTeX_ex;
7821  end;
7822cur_tok:=t;
7823end
7824
7825@ The |back_error| routine is used when we want to replace an offending token
7826just before issuing an error message. This routine, like |back_input|,
7827requires that |cur_tok| has been set. We disable interrupts during the
7828call of |back_input| so that the help message won't be lost.
7829
7830@p procedure back_error; {back up one token and call |error|}
7831begin OK_to_interrupt:=false; back_input; OK_to_interrupt:=true; error;
7832end;
7833@#
7834procedure ins_error; {back up one inserted token and call |error|}
7835begin OK_to_interrupt:=false; back_input; token_type:=inserted;
7836OK_to_interrupt:=true; error;
7837end;
7838
7839@ The |begin_file_reading| procedure starts a new level of input for lines
7840of characters to be read from a file, or as an insertion from the
7841terminal. It does not take care of opening the file, nor does it set |loc|
7842or |limit| or |line|.
7843@^system dependencies@>
7844
7845@p procedure begin_file_reading;
7846begin if in_open=max_in_open then overflow("text input levels",max_in_open);
7847@:TeX capacity exceeded text input levels}{\quad text input levels@>
7848if first=buf_size then overflow("buffer size",buf_size);
7849@:TeX capacity exceeded buffer size}{\quad buffer size@>
7850incr(in_open); push_input; index:=in_open;
7851eof_seen[index]:=false;
7852grp_stack[index]:=cur_boundary; if_stack[index]:=cond_ptr;
7853line_stack[index]:=line; start:=first; state:=mid_line;
7854name:=0; {|terminal_input| is now |true|}
7855end;
7856
7857@ Conversely, the variables must be downdated when such a level of input
7858is finished:
7859
7860@p procedure end_file_reading;
7861begin first:=start; line:=line_stack[index];
7862if (name=18)or(name=19) then pseudo_close else
7863if name>17 then u_close(cur_file); {forget it}
7864pop_input; decr(in_open);
7865end;
7866
7867@ In order to keep the stack from overflowing during a long sequence of
7868inserted `\.{\\show}' commands, the following routine removes completed
7869error-inserted lines from memory.
7870
7871@p procedure clear_for_error_prompt;
7872begin while (state<>token_list)and terminal_input and@|
7873  (input_ptr>0)and(loc>limit) do end_file_reading;
7874print_ln; clear_terminal;
7875end;
7876
7877@ To get \TeX's whole input mechanism going, we perform the following
7878actions.
7879
7880@<Initialize the input routines@>=
7881begin input_ptr:=0; max_in_stack:=0;
7882in_open:=0; open_parens:=0; max_buf_stack:=0;
7883grp_stack[0]:=0; if_stack[0]:=null;
7884param_ptr:=0; max_param_stack:=0;
7885first:=buf_size; repeat buffer[first]:=0; decr(first); until first=0;
7886scanner_status:=normal; warning_index:=null; first:=1;
7887state:=new_line; start:=1; index:=0; line:=0; name:=0;
7888force_eof:=false;
7889align_state:=1000000;@/
7890if not init_terminal then goto final_end;
7891limit:=last; first:=last+1; {|init_terminal| has set |loc| and |last|}
7892end
7893
7894@* \[24] Getting the next token.
7895The heart of \TeX's input mechanism is the |get_next| procedure, which
7896we shall develop in the next few sections of the program. Perhaps we
7897shouldn't actually call it the ``heart,'' however, because it really acts
7898as \TeX's eyes and mouth, reading the source files and gobbling them up.
7899And it also helps \TeX\ to regurgitate stored token lists that are to be
7900processed again.
7901@^eyes and mouth@>
7902
7903The main duty of |get_next| is to input one token and to set |cur_cmd|
7904and |cur_chr| to that token's command code and modifier. Furthermore, if
7905the input token is a control sequence, the |eqtb| location of that control
7906sequence is stored in |cur_cs|; otherwise |cur_cs| is set to zero.
7907
7908Underlying this simple description is a certain amount of complexity
7909because of all the cases that need to be handled.
7910However, the inner loop of |get_next| is reasonably short and fast.
7911
7912When |get_next| is asked to get the next token of a \.{\\read} line,
7913it sets |cur_cmd=cur_chr=cur_cs=0| in the case that no more tokens
7914appear on that line. (There might not be any tokens at all, if the
7915|end_line_char| has |ignore| as its catcode.)
7916
7917@ The value of |par_loc| is the |eqtb| address of `\.{\\par}'. This quantity
7918is needed because a blank line of input is supposed to be exactly equivalent
7919to the appearance of \.{\\par}; we must set |cur_cs:=par_loc|
7920when detecting a blank line.
7921
7922@<Glob...@>=
7923@!par_loc:pointer; {location of `\.{\\par}' in |eqtb|}
7924@!par_token:halfword; {token representing `\.{\\par}'}
7925
7926@ @<Put each...@>=
7927primitive("par",par_end,too_big_usv); {cf.\ |scan_file_name|}
7928@!@:par_}{\.{\\par} primitive@>
7929par_loc:=cur_val; par_token:=cs_token_flag+par_loc;
7930
7931@ @<Cases of |print_cmd_chr|...@>=
7932par_end:print_esc("par");
7933
7934@ Before getting into |get_next|, let's consider the subroutine that
7935is called when an `\.{\\outer}' control sequence has been scanned or
7936when the end of a file has been reached. These two cases are distinguished
7937by |cur_cs|, which is zero at the end of a file.
7938
7939@p procedure check_outer_validity;
7940var p:pointer; {points to inserted token list}
7941@!q:pointer; {auxiliary pointer}
7942begin if scanner_status<>normal then
7943  begin deletions_allowed:=false;
7944  @<Back up an outer control sequence so that it can be reread@>;
7945  if scanner_status>skipping then
7946    @<Tell the user what has run away and try to recover@>
7947  else  begin print_err("Incomplete "); print_cmd_chr(if_test,cur_if);
7948@.Incomplete \\if...@>
7949    print("; all text was ignored after line "); print_int(skip_line);
7950    help3("A forbidden control sequence occurred in skipped text.")@/
7951    ("This kind of error happens when you say `\if...' and forget")@/
7952    ("the matching `\fi'. I've inserted a `\fi'; this might work.");
7953    if cur_cs<>0 then cur_cs:=0
7954    else help_line[2]:=@|
7955      "The file ended while I was skipping conditional text.";
7956    cur_tok:=cs_token_flag+frozen_fi; ins_error;
7957    end;
7958  deletions_allowed:=true;
7959  end;
7960end;
7961
7962@ An outer control sequence that occurs in a \.{\\read} will not be reread,
7963since the error recovery for \.{\\read} is not very powerful.
7964
7965@<Back up an outer control sequence so that it can be reread@>=
7966if cur_cs<>0 then
7967  begin if (state=token_list)or(name<1)or(name>17) then
7968    begin p:=get_avail; info(p):=cs_token_flag+cur_cs;
7969    back_list(p); {prepare to read the control sequence again}
7970    end;
7971  cur_cmd:=spacer; cur_chr:=" "; {replace it by a space}
7972  end
7973
7974@ @<Tell the user what has run away...@>=
7975begin runaway; {print a definition, argument, or preamble}
7976if cur_cs=0 then print_err("File ended")
7977@.File ended while scanning...@>
7978else  begin cur_cs:=0; print_err("Forbidden control sequence found");
7979@.Forbidden control sequence...@>
7980  end;
7981print(" while scanning ");
7982@<Print either `\.{definition}' or `\.{use}' or `\.{preamble}' or `\.{text}',
7983  and insert tokens that should lead to recovery@>;
7984print(" of "); sprint_cs(warning_index);
7985help4("I suspect you have forgotten a `}', causing me")@/
7986("to read past where you wanted me to stop.")@/
7987("I'll try to recover; but if the error is serious,")@/
7988("you'd better type `E' or `X' now and fix your file.");@/
7989error;
7990end
7991
7992@ The recovery procedure can't be fully understood without knowing more
7993about the \TeX\ routines that should be aborted, but we can sketch the
7994ideas here:  For a runaway definition we will insert a right brace; for a
7995runaway preamble, we will insert a special \.{\\cr} token and a right
7996brace; and for a runaway argument, we will set |long_state| to
7997|outer_call| and insert \.{\\par}.
7998
7999@<Print either `\.{definition}' or ...@>=
8000p:=get_avail;
8001case scanner_status of
8002defining:begin print("definition"); info(p):=right_brace_token+"}";
8003  end;
8004matching:begin print("use"); info(p):=par_token; long_state:=outer_call;
8005  end;
8006aligning:begin print("preamble"); info(p):=right_brace_token+"}"; q:=p;
8007  p:=get_avail; link(p):=q; info(p):=cs_token_flag+frozen_cr;
8008  align_state:=-1000000;
8009  end;
8010absorbing:begin print("text"); info(p):=right_brace_token+"}";
8011  end;
8012end; {there are no other cases}
8013ins_list(p)
8014
8015@ We need to mention a procedure here that may be called by |get_next|.
8016
8017@p procedure@?firm_up_the_line; forward;
8018
8019@ Now we're ready to take the plunge into |get_next| itself. Parts of
8020this routine are executed more often than any other instructions of \TeX.
8021@^mastication@>@^inner loop@>
8022
8023@d switch=25 {a label in |get_next|}
8024@d start_cs=26 {another}
8025@d not_exp=27
8026
8027@p procedure get_next; {sets |cur_cmd|, |cur_chr|, |cur_cs| to next token}
8028label restart, {go here to get the next input token}
8029  switch, {go here to eat the next character from a file}
8030  reswitch, {go here to digest it again}
8031  start_cs, {go here to start looking for a control sequence}
8032  found, {go here when a control sequence has been found}
8033  not_exp, {go here when ^^ turned out not to start an expanded code}
8034  exit; {go here when the next input token has been got}
8035var k:0..buf_size; {an index into |buffer|}
8036@!t:halfword; {a token}
8037@!cat:0..max_char_code; {|cat_code(cur_chr)|, usually}
8038@!c:UnicodeScalar; {constituent of a possible expanded code}
8039@!lower:UTF16_code; {lower surrogate of a possible UTF-16 compound}
8040@!d:small_number; {number of excess characters in an expanded code}
8041@!sup_count:small_number; {number of identical |sup_mark| characters}
8042begin restart: cur_cs:=0;
8043if state<>token_list then
8044@<Input from external file, |goto restart| if no input found@>
8045else @<Input from token list, |goto restart| if end of list or
8046  if a parameter needs to be expanded@>;
8047@<If an alignment entry has just ended, take appropriate action@>;
8048exit:end;
8049
8050@ An alignment entry ends when a tab or \.{\\cr} occurs, provided that the
8051current level of braces is the same as the level that was present at the
8052beginning of that alignment entry; i.e., provided that |align_state| has
8053returned to the value it had after the \<u_j> template for that entry.
8054@^inner loop@>
8055
8056@<If an alignment entry has just ended, take appropriate action@>=
8057if cur_cmd<=car_ret then if cur_cmd>=tab_mark then if align_state=0 then
8058  @<Insert the \(v)\<v_j> template and |goto restart|@>
8059
8060@ @<Input from external file, |goto restart| if no input found@>=
8061@^inner loop@>
8062begin switch: if loc<=limit then {current line not yet finished}
8063  begin cur_chr:=buffer[loc]; incr(loc);
8064  if (cur_chr >= @"D800) and (cur_chr < @"DC00)
8065  and (loc <= limit) and (buffer[loc] >= @"DC00) and (buffer[loc] < @"E000) then
8066    begin
8067      lower := buffer[loc] - @"DC00;
8068      incr(loc);
8069      cur_chr := @"10000 + (cur_chr - @"D800) * 1024 + lower;
8070    end;
8071  reswitch: cur_cmd:=cat_code(cur_chr);
8072  @<Change state if necessary, and |goto switch| if the
8073    current character should be ignored,
8074    or |goto reswitch| if the current character
8075    changes to another@>;
8076  end
8077else  begin state:=new_line;@/
8078  @<Move to next line of file,
8079    or |goto restart| if there is no next line,
8080    or |return| if a \.{\\read} line has finished@>;
8081  check_interrupt;
8082  goto switch;
8083  end;
8084end
8085
8086@ The following 48-way switch accomplishes the scanning quickly, assuming
8087that a decent \PASCAL\ compiler has translated the code. Note that the numeric
8088values for |mid_line|, |skip_blanks|, and |new_line| are spaced
8089apart from each other by |max_char_code+1|, so we can add a character's
8090command code to the state to get a single number that characterizes both.
8091
8092@d any_state_plus(#) == mid_line+#,skip_blanks+#,new_line+#
8093
8094@<Change state if necessary...@>=
8095case state+cur_cmd of
8096@<Cases where character is ignored@>: goto switch;
8097any_state_plus(escape): @<Scan a control sequence
8098  and set |state:=skip_blanks| or |mid_line|@>;
8099any_state_plus(active_char): @<Process an active-character control sequence
8100  and set |state:=mid_line|@>;
8101any_state_plus(sup_mark): @<If this |sup_mark| starts an expanded character
8102  like~\.{\^\^A} or~\.{\^\^df}, then |goto reswitch|,
8103  otherwise set |state:=mid_line|@>;
8104any_state_plus(invalid_char): @<Decry the invalid character and
8105  |goto restart|@>;
8106@t\4@>@<Handle situations involving spaces, braces, changes of state@>@;
8107othercases do_nothing
8108endcases
8109
8110@ @<Cases where character is ignored@>=
8111any_state_plus(ignore),skip_blanks+spacer,new_line+spacer
8112
8113@ We go to |restart| instead of to |switch|, because |state| might equal
8114|token_list| after the error has been dealt with
8115(cf.\ |clear_for_error_prompt|).
8116
8117@<Decry the invalid...@>=
8118begin print_err("Text line contains an invalid character");
8119@.Text line contains...@>
8120help2("A funny symbol that I can't read has just been input.")@/
8121("Continue, and I'll forget that it ever happened.");@/
8122deletions_allowed:=false; error; deletions_allowed:=true;
8123goto restart;
8124end
8125
8126@ @d add_delims_to(#)==#+math_shift,#+tab_mark,#+mac_param,
8127  #+sub_mark,#+letter,#+other_char
8128
8129@<Handle situations involving spaces, braces, changes of state@>=
8130mid_line+spacer:@<Enter |skip_blanks| state, emit a space@>;
8131mid_line+car_ret:@<Finish line, emit a space@>;
8132skip_blanks+car_ret,any_state_plus(comment):
8133  @<Finish line, |goto switch|@>;
8134new_line+car_ret:@<Finish line, emit a \.{\\par}@>;
8135mid_line+left_brace: incr(align_state);
8136skip_blanks+left_brace,new_line+left_brace: begin
8137  state:=mid_line; incr(align_state);
8138  end;
8139mid_line+right_brace: decr(align_state);
8140skip_blanks+right_brace,new_line+right_brace: begin
8141  state:=mid_line; decr(align_state);
8142  end;
8143add_delims_to(skip_blanks),add_delims_to(new_line): state:=mid_line;
8144
8145@ When a character of type |spacer| gets through, its character code is
8146changed to $\.{"\ "}=@'40$. This means that the ASCII codes for tab and space,
8147and for the space inserted at the end of a line, will
8148be treated alike when macro parameters are being matched. We do this
8149since such characters are indistinguishable on most computer terminal displays.
8150
8151@<Finish line, emit a space@>=
8152begin loc:=limit+1; cur_cmd:=spacer; cur_chr:=" ";
8153end
8154
8155@ The following code is performed only when |cur_cmd=spacer|.
8156
8157@<Enter |skip_blanks| state, emit a space@>=
8158begin state:=skip_blanks; cur_chr:=" ";
8159end
8160
8161@ @<Finish line, |goto switch|@>=
8162begin loc:=limit+1; goto switch;
8163end
8164
8165@ @<Finish line, emit a \.{\\par}@>=
8166begin loc:=limit+1; cur_cs:=par_loc; cur_cmd:=eq_type(cur_cs);
8167cur_chr:=equiv(cur_cs);
8168if cur_cmd>=outer_call then check_outer_validity;
8169end
8170
8171@ Notice that a code like \.{\^\^8} becomes \.x if not followed by a hex digit.
8172
8173@d is_hex(#)==(((#>="0")and(#<="9"))or((#>="a")and(#<="f")))
8174@d hex_to_cur_chr==
8175  if c<="9" then cur_chr:=c-"0" @+else cur_chr:=c-"a"+10;
8176  if cc<="9" then cur_chr:=16*cur_chr+cc-"0"
8177  else cur_chr:=16*cur_chr+cc-"a"+10
8178@d long_hex_to_cur_chr==
8179  if c<="9" then cur_chr:=c-"0" @+else cur_chr:=c-"a"+10;
8180  if cc<="9" then cur_chr:=16*cur_chr+cc-"0"
8181  else cur_chr:=16*cur_chr+cc-"a"+10;
8182  if ccc<="9" then cur_chr:=16*cur_chr+ccc-"0"
8183  else cur_chr:=16*cur_chr+ccc-"a"+10;
8184  if cccc<="9" then cur_chr:=16*cur_chr+cccc-"0"
8185  else cur_chr:=16*cur_chr+cccc-"a"+10
8186
8187@<If this |sup_mark| starts an expanded character...@>=
8188begin if cur_chr=buffer[loc] then if loc<limit then
8189  begin sup_count:=2;
8190  {we have |^^| and another char; check how many |^|s we have altogether, up to a max of 6}
8191  while (sup_count<6) and (loc+2*sup_count-2<=limit) and (cur_chr=buffer[loc+sup_count-1]) do
8192    incr(sup_count);
8193  {check whether we have enough hex chars for the number of |^|s}
8194  for d:=1 to sup_count do
8195    if not is_hex(buffer[loc+sup_count-2+d]) then {found a non-hex char, so do single |^^X| style}
8196      begin c:=buffer[loc+1];
8197      if c<@'200 then
8198        begin loc:=loc+2;
8199        if c<@'100 then cur_chr:=c+@'100 @+else cur_chr:=c-@'100;
8200        goto reswitch;
8201        end;
8202      goto not_exp;
8203      end;
8204  {there were the right number of hex chars, so convert them}
8205  cur_chr:=0;
8206  for d:=1 to sup_count do
8207    begin c:=buffer[loc+sup_count-2+d];
8208    if c<="9" then cur_chr:=16*cur_chr+c-"0"
8209    else cur_chr:=16*cur_chr+c-"a"+10;
8210    end;
8211  {check the resulting value is within the valid range}
8212  if cur_chr>biggest_usv then
8213    begin cur_chr:=buffer[loc];
8214    goto not_exp;
8215    end;
8216  loc:=loc+2*sup_count-1;
8217  goto reswitch;
8218  end;
8219not_exp:
8220state:=mid_line;
8221end
8222
8223@ @<Process an active-character...@>=
8224begin cur_cs:=cur_chr+active_base;
8225cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs); state:=mid_line;
8226if cur_cmd>=outer_call then check_outer_validity;
8227end
8228
8229@ Control sequence names are scanned only when they appear in some line of
8230a file; once they have been scanned the first time, their |eqtb| location
8231serves as a unique identification, so \TeX\ doesn't need to refer to the
8232original name any more except when it prints the equivalent in symbolic form.
8233
8234The program that scans a control sequence has been written carefully
8235in order to avoid the blowups that might otherwise occur if a malicious
8236user tried something like `\.{\\catcode\'15=0}'. The algorithm might
8237look at |buffer[limit+1]|, but it never looks at |buffer[limit+2]|.
8238
8239If expanded characters like `\.{\^\^A}' or `\.{\^\^df}'
8240appear in or just following
8241a control sequence name, they are converted to single characters in the
8242buffer and the process is repeated, slowly but surely.
8243
8244@<Scan a control...@>=
8245begin if loc>limit then cur_cs:=null_cs {|state| is irrelevant in this case}
8246else  begin start_cs: k:=loc; cur_chr:=buffer[k]; cat:=cat_code(cur_chr);
8247  incr(k);
8248  if cat=letter then state:=skip_blanks
8249  else if cat=spacer then state:=skip_blanks
8250  else state:=mid_line;
8251  if (cat=letter)and(k<=limit) then
8252    @<Scan ahead in the buffer until finding a nonletter;
8253    if an expanded code is encountered, reduce it
8254    and |goto start_cs|; otherwise if a multiletter control
8255    sequence is found, adjust |cur_cs| and |loc|, and
8256    |goto found|@>
8257  else @<If an expanded code is present, reduce it and |goto start_cs|@>;
8258  {At this point, we have a single-character cs name in the buffer.
8259   But if the character code is > @"FFFF, we treat it like a multiletter name
8260   for string purposes, because we use UTF-16 in the string pool.}
8261  if buffer[loc]>@"FFFF then begin
8262    cur_cs:=id_lookup(loc,1); incr(loc); goto found;
8263    end;
8264  cur_cs:=single_base+buffer[loc]; incr(loc);
8265  end;
8266found: cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs);
8267if cur_cmd>=outer_call then check_outer_validity;
8268end
8269
8270@ Whenever we reach the following piece of code, we will have
8271|cur_chr=buffer[k-1]| and |k<=limit+1| and |cat=cat_code(cur_chr)|. If an
8272expanded code like \.{\^\^A} or \.{\^\^df} appears in |buffer[(k-1)..(k+1)]|
8273or |buffer[(k-1)..(k+2)]|, we
8274will store the corresponding code in |buffer[k-1]| and shift the rest of
8275the buffer left two or three places.
8276
8277@<If an expanded...@>=
8278begin if (cat=sup_mark) and (buffer[k]=cur_chr) and (k<limit) then
8279  begin sup_count:=2;
8280  {we have |^^| and another char; check how many |^|s we have altogether, up to a max of 6}
8281  while (sup_count<6) and (k+2*sup_count-2<=limit) and (buffer[k+sup_count-1]=cur_chr) do
8282    incr(sup_count);
8283  {check whether we have enough hex chars for the number of |^|s}
8284  for d:=1 to sup_count do
8285    if not is_hex(buffer[k+sup_count-2+d]) then {found a non-hex char, so do single |^^X| style}
8286      begin c:=buffer[k+1];
8287      if c<@'200 then
8288        begin if c<@'100 then buffer[k-1]:=c+@'100 @+else buffer[k-1]:=c-@'100;
8289        d:=2; limit:=limit-d;
8290        while k<=limit do
8291          begin buffer[k]:=buffer[k+d]; incr(k);
8292          end;
8293        goto start_cs;
8294        end
8295      else sup_count:=0;
8296      end;
8297  if sup_count>0 then {there were the right number of hex chars, so convert them}
8298    begin cur_chr:=0;
8299    for d:=1 to sup_count do
8300      begin c:=buffer[k+sup_count-2+d];
8301      if c<="9" then cur_chr:=16*cur_chr+c-"0"
8302      else cur_chr:=16*cur_chr+c-"a"+10;
8303      end;
8304    {check the resulting value is within the valid range}
8305    if cur_chr>biggest_usv then cur_chr:=buffer[k]
8306    else  begin buffer[k-1]:=cur_chr;
8307      d:=2*sup_count-1;
8308      {shift the rest of the buffer left by |d| chars}
8309      limit:=limit-d;
8310      while k<=limit do
8311        begin buffer[k]:=buffer[k+d]; incr(k);
8312        end;
8313      goto start_cs;
8314      end
8315    end
8316  end
8317end
8318
8319@ @<Scan ahead in the buffer...@>=
8320begin repeat cur_chr:=buffer[k]; cat:=cat_code(cur_chr); incr(k);
8321until (cat<>letter)or(k>limit);
8322@<If an expanded...@>;
8323if cat<>letter then decr(k);
8324  {now |k| points to first nonletter}
8325if k>loc+1 then {multiletter control sequence has been scanned}
8326  begin cur_cs:=id_lookup(loc,k-loc); loc:=k; goto found;
8327  end;
8328end
8329
8330@ Let's consider now what happens when |get_next| is looking at a token list.
8331
8332@<Input from token list, |goto restart| if end of list or
8333  if a parameter needs to be expanded@>=
8334if loc<>null then {list not exhausted}
8335@^inner loop@>
8336  begin t:=info(loc); loc:=link(loc); {move to next}
8337  if t>=cs_token_flag then {a control sequence token}
8338    begin cur_cs:=t-cs_token_flag;
8339    cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs);
8340    if cur_cmd>=outer_call then
8341      if cur_cmd=dont_expand then
8342        @<Get the next token, suppressing expansion@>
8343      else check_outer_validity;
8344    end
8345  else  begin cur_cmd:=t div max_char_val; cur_chr:=t mod max_char_val;
8346    case cur_cmd of
8347    left_brace: incr(align_state);
8348    right_brace: decr(align_state);
8349    out_param: @<Insert macro parameter and |goto restart|@>;
8350    othercases do_nothing
8351    endcases;
8352    end;
8353  end
8354else  begin {we are done with this token list}
8355  end_token_list; goto restart; {resume previous level}
8356  end
8357
8358@ The present point in the program is reached only when the |expand|
8359routine has inserted a special marker into the input. In this special
8360case, |info(loc)| is known to be a control sequence token, and |link(loc)=null|.
8361
8362@d no_expand_flag=special_char {this characterizes a special variant of |relax|}
8363
8364@<Get the next token, suppressing expansion@>=
8365begin cur_cs:=info(loc)-cs_token_flag; loc:=null;@/
8366cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs);
8367if cur_cmd>max_command then
8368  begin cur_cmd:=relax; cur_chr:=no_expand_flag;
8369  end;
8370end
8371
8372@ @<Insert macro parameter...@>=
8373begin begin_token_list(param_stack[param_start+cur_chr-1],parameter);
8374goto restart;
8375end
8376
8377@ All of the easy branches of |get_next| have now been taken care of.
8378There is one more branch.
8379
8380@d end_line_char_inactive == (end_line_char<0)or(end_line_char>255)
8381
8382@<Move to next line of file, or |goto restart|...@>=
8383if name>17 then @<Read next line of file into |buffer|, or
8384  |goto restart| if the file has ended@>
8385else  begin if not terminal_input then {\.{\\read} line has ended}
8386    begin cur_cmd:=0; cur_chr:=0; return;
8387    end;
8388  if input_ptr>0 then {text was inserted during error recovery}
8389    begin end_file_reading; goto restart; {resume previous level}
8390    end;
8391  if selector<log_only then open_log_file;
8392  if interaction>nonstop_mode then
8393    begin if end_line_char_inactive then incr(limit);
8394    if limit=start then {previous line was empty}
8395      print_nl("(Please type a command or say `\end')");
8396@.Please type...@>
8397    print_ln; first:=start;
8398    prompt_input("*"); {input on-line into |buffer|}
8399@.*\relax@>
8400    limit:=last;
8401    if end_line_char_inactive then decr(limit)
8402    else  buffer[limit]:=end_line_char;
8403    first:=limit+1;
8404    loc:=start;
8405    end
8406  else fatal_error("*** (job aborted, no legal \end found)");
8407@.job aborted@>
8408    {nonstop mode, which is intended for overnight batch processing,
8409    never waits for on-line input}
8410  end
8411
8412@ The global variable |force_eof| is normally |false|; it is set |true|
8413by an \.{\\endinput} command.
8414
8415@<Glob...@>=
8416@!force_eof:boolean; {should the next \.{\\input} be aborted early?}
8417
8418@ @<Read next line of file into |buffer|, or
8419  |goto restart| if the file has ended@>=
8420begin incr(line); first:=start;
8421if not force_eof then
8422  if name<=19 then
8423    begin if pseudo_input then {not end of file}
8424      firm_up_the_line {this sets |limit|}
8425    else if (every_eof<>null)and not eof_seen[index] then
8426      begin limit:=first-1; eof_seen[index]:=true; {fake one empty line}
8427      begin_token_list(every_eof,every_eof_text); goto restart;
8428      end
8429    else force_eof:=true;
8430    end
8431  else
8432  begin if input_ln(cur_file,true) then {not end of file}
8433    firm_up_the_line {this sets |limit|}
8434  else if (every_eof<>null)and not eof_seen[index] then
8435    begin limit:=first-1; eof_seen[index]:=true; {fake one empty line}
8436    begin_token_list(every_eof,every_eof_text); goto restart;
8437    end
8438  else force_eof:=true;
8439  end;
8440if force_eof then
8441  begin if tracing_nesting>0 then
8442    if (grp_stack[in_open]<>cur_boundary)or@|
8443        (if_stack[in_open]<>cond_ptr) then file_warning;
8444    {give warning for some unfinished groups and/or conditionals}
8445  if name>=19 then
8446  begin print_char(")"); decr(open_parens);
8447  update_terminal; {show user that file has been read}
8448  end;
8449  force_eof:=false;
8450  end_file_reading; {resume previous level}
8451  check_outer_validity; goto restart;
8452  end;
8453if end_line_char_inactive then decr(limit)
8454else  buffer[limit]:=end_line_char;
8455first:=limit+1; loc:=start; {ready to read}
8456end
8457
8458@ If the user has set the |pausing| parameter to some positive value,
8459and if nonstop mode has not been selected, each line of input is displayed
8460on the terminal and the transcript file, followed by `\.{=>}'.
8461\TeX\ waits for a response. If the response is simply |carriage_return|, the
8462line is accepted as it stands, otherwise the line typed is
8463used instead of the line in the file.
8464
8465@p procedure firm_up_the_line;
8466var k:0..buf_size; {an index into |buffer|}
8467begin limit:=last;
8468if pausing>0 then if interaction>nonstop_mode then
8469  begin wake_up_terminal; print_ln;
8470  if start<limit then for k:=start to limit-1 do print(buffer[k]);
8471  first:=limit; prompt_input("=>"); {wait for user response}
8472@.=>@>
8473  if last>first then
8474    begin for k:=first to last-1 do {move line down in buffer}
8475      buffer[k+start-first]:=buffer[k];
8476    limit:=start+last-first;
8477    end;
8478  end;
8479end;
8480
8481@ Since |get_next| is used so frequently in \TeX, it is convenient
8482to define three related procedures that do a little more:
8483
8484\yskip\hang|get_token| not only sets |cur_cmd| and |cur_chr|, it
8485also sets |cur_tok|, a packed halfword version of the current token.
8486
8487\yskip\hang|get_x_token|, meaning ``get an expanded token,'' is like
8488|get_token|, but if the current token turns out to be a user-defined
8489control sequence (i.e., a macro call), or a conditional,
8490or something like \.{\\topmark} or \.{\\expandafter} or \.{\\csname},
8491it is eliminated from the input by beginning the expansion of the macro
8492or the evaluation of the conditional.
8493
8494\yskip\hang|x_token| is like |get_x_token| except that it assumes that
8495|get_next| has already been called.
8496
8497\yskip\noindent
8498In fact, these three procedures account for almost every use of |get_next|.
8499
8500@ No new control sequences will be defined except during a call of
8501|get_token|, or when \.{\\csname} compresses a token list, because
8502|no_new_control_sequence| is always |true| at other times.
8503
8504@p procedure get_token; {sets |cur_cmd|, |cur_chr|, |cur_tok|}
8505begin no_new_control_sequence:=false; get_next; no_new_control_sequence:=true;
8506@^inner loop@>
8507if cur_cs=0 then cur_tok:=(cur_cmd*max_char_val)+cur_chr
8508else cur_tok:=cs_token_flag+cur_cs;
8509end;
8510
8511@* \[25] Expanding the next token.
8512Only a dozen or so command codes |>max_command| can possibly be returned by
8513|get_next|; in increasing order, they are |undefined_cs|, |expand_after|,
8514|no_expand|, |input|, |if_test|, |fi_or_else|, |cs_name|, |convert|, |the|,
8515|top_bot_mark|, |call|, |long_call|, |outer_call|, |long_outer_call|, and
8516|end_template|.{\emergencystretch=40pt\par}
8517
8518The |expand| subroutine is used when |cur_cmd>max_command|. It removes a
8519``call'' or a conditional or one of the other special operations just
8520listed.  It follows that |expand| might invoke itself recursively. In all
8521cases, |expand| destroys the current token, but it sets things up so that
8522the next |get_next| will deliver the appropriate next token. The value of
8523|cur_tok| need not be known when |expand| is called.
8524
8525Since several of the basic scanning routines communicate via global variables,
8526their values are saved as local variables of |expand| so that
8527recursive calls don't invalidate them.
8528@^recursion@>
8529
8530@p@t\4@>@<Declare the procedure called |macro_call|@>@;@/
8531@t\4@>@<Declare the procedure called |insert_relax|@>@;@/
8532@t\4@>@<Declare \eTeX\ procedures for expanding@>@;@/
8533procedure@?pass_text; forward;@t\2@>
8534procedure@?start_input; forward;@t\2@>
8535procedure@?conditional; forward;@t\2@>
8536procedure@?get_x_token; forward;@t\2@>
8537procedure@?conv_toks; forward;@t\2@>
8538procedure@?ins_the_toks; forward;@t\2@>
8539procedure expand;
8540label reswitch;
8541var t:halfword; {token that is being ``expanded after''}
8542@!b:boolean; {keep track of nested csnames}
8543@!p,@!q,@!r:pointer; {for list manipulation}
8544@!j:0..buf_size; {index into |buffer|}
8545@!cv_backup:integer; {to save the global quantity |cur_val|}
8546@!cvl_backup,@!radix_backup,@!co_backup:small_number;
8547  {to save |cur_val_level|, etc.}
8548@!backup_backup:pointer; {to save |link(backup_head)|}
8549@!save_scanner_status:small_number; {temporary storage of |scanner_status|}
8550begin cv_backup:=cur_val; cvl_backup:=cur_val_level; radix_backup:=radix;
8551co_backup:=cur_order; backup_backup:=link(backup_head);
8552reswitch:
8553if cur_cmd<call then @<Expand a nonmacro@>
8554else if cur_cmd<end_template then macro_call
8555else @<Insert a token containing |frozen_endv|@>;
8556cur_val:=cv_backup; cur_val_level:=cvl_backup; radix:=radix_backup;
8557cur_order:=co_backup; link(backup_head):=backup_backup;
8558end;
8559
8560@ @<Glob...@>=
8561@!is_in_csname: boolean;
8562
8563@ @<Set init...@>=
8564is_in_csname:=false;
8565
8566@ @<Expand a nonmacro@>=
8567begin if tracing_commands>1 then show_cur_cmd_chr;
8568case cur_cmd of
8569top_bot_mark:@<Insert the \(a)appropriate mark text into the scanner@>;
8570expand_after:if cur_chr=0 then @<Expand the token after the next token@>
8571  else @<Negate a boolean conditional and |goto reswitch|@>;
8572no_expand: if cur_chr=0 then @<Suppress expansion of the next token@>
8573  else @<Implement \.{\\primitive}@>;
8574cs_name:@<Manufacture a control sequence name@>;
8575convert:conv_toks; {this procedure is discussed in Part 27 below}
8576the:ins_the_toks; {this procedure is discussed in Part 27 below}
8577if_test:conditional; {this procedure is discussed in Part 28 below}
8578fi_or_else:@<Terminate the current conditional and skip to \.{\\fi}@>;
8579input:@<Initiate or terminate input from a file@>;
8580othercases @<Complain about an undefined macro@>
8581endcases;
8582end
8583
8584@ It takes only a little shuffling to do what \TeX\ calls \.{\\expandafter}.
8585
8586@<Expand the token after...@>=
8587begin get_token; t:=cur_tok; get_token;
8588if cur_cmd>max_command then expand@+else back_input;
8589cur_tok:=t; back_input;
8590end
8591
8592@ The implementation of \.{\\noexpand} is a bit trickier, because it is
8593necessary to insert a special `|dont_expand|' marker into \TeX's reading
8594mechanism.  This special marker is processed by |get_next|, but it does
8595not slow down the inner loop.
8596
8597Since \.{\\outer} macros might arise here, we must also
8598clear the |scanner_status| temporarily.
8599
8600@<Suppress expansion...@>=
8601begin save_scanner_status:=scanner_status; scanner_status:=normal;
8602get_token; scanner_status:=save_scanner_status; t:=cur_tok;
8603back_input; {now |start| and |loc| point to the backed-up token |t|}
8604if t>=cs_token_flag then
8605  begin p:=get_avail; info(p):=cs_token_flag+frozen_dont_expand;
8606  link(p):=loc; start:=p; loc:=p;
8607  end;
8608end
8609
8610@ The \.{\\primitive} handling. If the primitive meaning of the next
8611token is an expandable command, it suffices to replace the current
8612token with the primitive one and restart |expand|/
8613
8614Otherwise, the token we just read has to be pushed back, as well
8615as a token matching the internal form of \.{\\primitive}, that is
8616sneaked in as an alternate form of |ignore_spaces|.
8617@!@:primitive_}{\.{\\primitive} primitive (internalized)@>
8618
8619Simply pushing back a token that matches the correct internal command
8620does not work, because approach would not survive roundtripping to a
8621temporary file.
8622
8623@<Implement \.{\\primitive}@>=
8624begin save_scanner_status:=scanner_status; scanner_status:=normal;
8625get_token; scanner_status:=save_scanner_status;
8626if cur_cs < hash_base then
8627  cur_cs:=prim_lookup(cur_cs-257)
8628else
8629  cur_cs:=prim_lookup(text(cur_cs));
8630if cur_cs<>undefined_primitive then begin
8631  t:=prim_eq_type(cur_cs);
8632  if t>max_command then begin
8633    cur_cmd:=t;
8634    cur_chr:=prim_equiv(cur_cs);
8635    cur_tok:=(cur_cmd*max_char_val)+cur_chr;
8636    cur_cs :=0;
8637    goto reswitch;
8638    end
8639  else begin
8640    back_input; { now |loc| and |start| point to a one-item list }
8641    p:=get_avail; info(p):=cs_token_flag+frozen_primitive;
8642    link(p):=loc; loc:=p; start:=p;
8643    end;
8644  end;
8645end
8646
8647@ @<Complain about an undefined macro@>=
8648begin print_err("Undefined control sequence");
8649@.Undefined control sequence@>
8650help5("The control sequence at the end of the top line")@/
8651("of your error message was never \def'ed. If you have")@/
8652("misspelled it (e.g., `\hobx'), type `I' and the correct")@/
8653("spelling (e.g., `I\hbox'). Otherwise just continue,")@/
8654("and I'll forget about whatever was undefined.");
8655error;
8656end
8657
8658@ The |expand| procedure and some other routines that construct token
8659lists find it convenient to use the following macros, which are valid only if
8660the variables |p| and |q| are reserved for token-list building.
8661
8662@d store_new_token(#)==begin q:=get_avail; link(p):=q; info(q):=#;
8663  p:=q; {|link(p)| is |null|}
8664  end
8665@d fast_store_new_token(#)==begin fast_get_avail(q); link(p):=q; info(q):=#;
8666  p:=q; {|link(p)| is |null|}
8667  end
8668
8669@ @<Manufacture a control...@>=
8670begin r:=get_avail; p:=r; {head of the list of characters}
8671b:=is_in_csname; is_in_csname:=true;
8672repeat get_x_token;
8673if cur_cs=0 then store_new_token(cur_tok);
8674until cur_cs<>0;
8675if cur_cmd<>end_cs_name then @<Complain about missing \.{\\endcsname}@>;
8676is_in_csname:=b;
8677@<Look up the characters of list |r| in the hash table, and set |cur_cs|@>;
8678flush_list(r);
8679if eq_type(cur_cs)=undefined_cs then
8680  begin eq_define(cur_cs,relax,too_big_usv);
8681        {N.B.: The |save_stack| might change}
8682  end; {the control sequence will now match `\.{\\relax}'}
8683cur_tok:=cur_cs+cs_token_flag; back_input;
8684end
8685
8686@ @<Complain about missing \.{\\endcsname}@>=
8687begin print_err("Missing "); print_esc("endcsname"); print(" inserted");
8688@.Missing \\endcsname...@>
8689help2("The control sequence marked <to be read again> should")@/
8690  ("not appear between \csname and \endcsname.");
8691back_error;
8692end
8693
8694@ @<Look up the characters of list |r| in the hash table...@>=
8695j:=first; p:=link(r);
8696while p<>null do
8697  begin if j>=max_buf_stack then
8698    begin max_buf_stack:=j+1;
8699    if max_buf_stack=buf_size then
8700      overflow("buffer size",buf_size);
8701@:TeX capacity exceeded buffer size}{\quad buffer size@>
8702    end;
8703  buffer[j]:=info(p) mod max_char_val; incr(j); p:=link(p);
8704  end;
8705if (j>first+1) or (buffer[first]>@"FFFF) then
8706  begin no_new_control_sequence:=false; cur_cs:=id_lookup(first,j-first);
8707  no_new_control_sequence:=true;
8708  end
8709else if j=first then cur_cs:=null_cs {the list is empty}
8710else cur_cs:=single_base+buffer[first] {the list has length one}
8711
8712@ An |end_template| command is effectively changed to an |endv| command
8713by the following code. (The reason for this is discussed below; the
8714|frozen_end_template| at the end of the template has passed the
8715|check_outer_validity| test, so its mission of error detection has been
8716accomplished.)
8717
8718@<Insert a token containing |frozen_endv|@>=
8719begin cur_tok:=cs_token_flag+frozen_endv; back_input;
8720end
8721
8722@ The processing of \.{\\input} involves the |start_input| subroutine,
8723which will be declared later; the processing of \.{\\endinput} is trivial.
8724
8725@<Put each...@>=
8726primitive("input",input,0);@/
8727@!@:input_}{\.{\\input} primitive@>
8728primitive("endinput",input,1);@/
8729@!@:end_input_}{\.{\\endinput} primitive@>
8730
8731@ @<Cases of |print_cmd_chr|...@>=
8732input: if chr_code=0 then print_esc("input")
8733  @/@<Cases of |input| for |print_cmd_chr|@>@/
8734  else print_esc("endinput");
8735
8736@ @<Initiate or terminate input...@>=
8737if cur_chr=1 then force_eof:=true
8738@/@<Cases for |input|@>@/
8739else if name_in_progress then insert_relax
8740else start_input
8741
8742@ Sometimes the expansion looks too far ahead, so we want to insert
8743a harmless \.{\\relax} into the user's input.
8744
8745@<Declare the procedure called |insert_relax|@>=
8746procedure insert_relax;
8747begin cur_tok:=cs_token_flag+cur_cs; back_input;
8748cur_tok:=cs_token_flag+frozen_relax; back_input; token_type:=inserted;
8749end;
8750
8751@ Here is a recursive procedure that is \TeX's usual way to get the
8752next token of input. It has been slightly optimized to take account of
8753common cases.
8754
8755@p procedure get_x_token; {sets |cur_cmd|, |cur_chr|, |cur_tok|,
8756  and expands macros}
8757label restart,done;
8758begin restart: get_next;
8759@^inner loop@>
8760if cur_cmd<=max_command then goto done;
8761if cur_cmd>=call then
8762  if cur_cmd<end_template then macro_call
8763  else  begin cur_cs:=frozen_endv; cur_cmd:=endv;
8764    goto done; {|cur_chr=null_list|}
8765    end
8766else expand;
8767goto restart;
8768done: if cur_cs=0 then cur_tok:=(cur_cmd*max_char_val)+cur_chr
8769else cur_tok:=cs_token_flag+cur_cs;
8770end;
8771
8772@ The |get_x_token| procedure is equivalent to two consecutive
8773procedure calls: |get_next; x_token|.
8774
8775@p procedure x_token; {|get_x_token| without the initial |get_next|}
8776begin while cur_cmd>max_command do
8777  begin expand;
8778  get_next;
8779  end;
8780if cur_cs=0 then cur_tok:=(cur_cmd*max_char_val)+cur_chr
8781else cur_tok:=cs_token_flag+cur_cs;
8782end;
8783
8784@ A control sequence that has been \.{\\def}'ed by the user is expanded by
8785\TeX's |macro_call| procedure.
8786
8787Before we get into the details of |macro_call|, however, let's consider the
8788treatment of primitives like \.{\\topmark}, since they are essentially
8789macros without parameters. The token lists for such marks are kept in a
8790global array of five pointers; we refer to the individual entries of this
8791array by symbolic names |top_mark|, etc. The value of |top_mark| is either
8792|null| or a pointer to the reference count of a token list.
8793
8794@d marks_code==5 {add this for \.{\\topmarks} etc.}
8795@#
8796@d top_mark_code=0 {the mark in effect at the previous page break}
8797@d first_mark_code=1 {the first mark between |top_mark| and |bot_mark|}
8798@d bot_mark_code=2 {the mark in effect at the current page break}
8799@d split_first_mark_code=3 {the first mark found by \.{\\vsplit}}
8800@d split_bot_mark_code=4 {the last mark found by \.{\\vsplit}}
8801@d top_mark==cur_mark[top_mark_code]
8802@d first_mark==cur_mark[first_mark_code]
8803@d bot_mark==cur_mark[bot_mark_code]
8804@d split_first_mark==cur_mark[split_first_mark_code]
8805@d split_bot_mark==cur_mark[split_bot_mark_code]
8806
8807@<Glob...@>=
8808@!cur_mark:array[top_mark_code..split_bot_mark_code] of pointer;
8809  {token lists for marks}
8810
8811@ @<Set init...@>=
8812top_mark:=null; first_mark:=null; bot_mark:=null;
8813split_first_mark:=null; split_bot_mark:=null;
8814
8815@ @<Put each...@>=
8816primitive("topmark",top_bot_mark,top_mark_code);
8817@!@:top_mark_}{\.{\\topmark} primitive@>
8818primitive("firstmark",top_bot_mark,first_mark_code);
8819@!@:first_mark_}{\.{\\firstmark} primitive@>
8820primitive("botmark",top_bot_mark,bot_mark_code);
8821@!@:bot_mark_}{\.{\\botmark} primitive@>
8822primitive("splitfirstmark",top_bot_mark,split_first_mark_code);
8823@!@:split_first_mark_}{\.{\\splitfirstmark} primitive@>
8824primitive("splitbotmark",top_bot_mark,split_bot_mark_code);
8825@!@:split_bot_mark_}{\.{\\splitbotmark} primitive@>
8826
8827@ @<Cases of |print_cmd_chr|...@>=
8828top_bot_mark: begin case (chr_code mod marks_code) of
8829  first_mark_code: print_esc("firstmark");
8830  bot_mark_code: print_esc("botmark");
8831  split_first_mark_code: print_esc("splitfirstmark");
8832  split_bot_mark_code: print_esc("splitbotmark");
8833  othercases print_esc("topmark")
8834  endcases;
8835  if chr_code>=marks_code then print_char("s");
8836  end;
8837
8838@ The following code is activated when |cur_cmd=top_bot_mark| and
8839when |cur_chr| is a code like |top_mark_code|.
8840
8841@<Insert the \(a)appropriate mark text into the scanner@>=
8842begin t:=cur_chr mod marks_code;
8843if cur_chr>=marks_code then scan_register_num@+else cur_val:=0;
8844if cur_val=0 then cur_ptr:=cur_mark[t]
8845else @<Compute the mark pointer for mark type |t| and class |cur_val|@>;
8846if cur_ptr<>null then begin_token_list(cur_ptr,mark_text);
8847end
8848
8849@ Now let's consider |macro_call| itself, which is invoked when \TeX\ is
8850scanning a control sequence whose |cur_cmd| is either |call|, |long_call|,
8851|outer_call|, or |long_outer_call|.  The control sequence definition
8852appears in the token list whose reference count is in location |cur_chr|
8853of |mem|.
8854
8855The global variable |long_state| will be set to |call| or to |long_call|,
8856depending on whether or not the control sequence disallows \.{\\par}
8857in its parameters. The |get_next| routine will set |long_state| to
8858|outer_call| and emit \.{\\par}, if a file ends or if an \.{\\outer}
8859control sequence occurs in the midst of an argument.
8860
8861@<Glob...@>=
8862@!long_state:call..long_outer_call; {governs the acceptance of \.{\\par}}
8863
8864@ The parameters, if any, must be scanned before the macro is expanded.
8865Parameters are token lists without reference counts. They are placed on
8866an auxiliary stack called |pstack| while they are being scanned, since
8867the |param_stack| may be losing entries during the matching process.
8868(Note that |param_stack| can't be gaining entries, since |macro_call| is
8869the only routine that puts anything onto |param_stack|, and it
8870is not recursive.)
8871
8872@<Glob...@>=
8873@!pstack:array[0..8] of pointer; {arguments supplied to a macro}
8874
8875@ After parameter scanning is complete, the parameters are moved to the
8876|param_stack|. Then the macro body is fed to the scanner; in other words,
8877|macro_call| places the defined text of the control sequence at the
8878top of\/ \TeX's input stack, so that |get_next| will proceed to read it
8879next.
8880
8881The global variable |cur_cs| contains the |eqtb| address of the control sequence
8882being expanded, when |macro_call| begins. If this control sequence has not been
8883declared \.{\\long}, i.e., if its command code in the |eq_type| field is
8884not |long_call| or |long_outer_call|, its parameters are not allowed to contain
8885the control sequence \.{\\par}. If an illegal \.{\\par} appears, the macro
8886call is aborted, and the \.{\\par} will be rescanned.
8887
8888@<Declare the procedure called |macro_call|@>=
8889procedure macro_call; {invokes a user-defined control sequence}
8890label exit, continue, done, done1, found;
8891var r:pointer; {current node in the macro's token list}
8892@!p:pointer; {current node in parameter token list being built}
8893@!q:pointer; {new node being put into the token list}
8894@!s:pointer; {backup pointer for parameter matching}
8895@!t:pointer; {cycle pointer for backup recovery}
8896@!u,@!v:pointer; {auxiliary pointers for backup recovery}
8897@!rbrace_ptr:pointer; {one step before the last |right_brace| token}
8898@!n:small_number; {the number of parameters scanned}
8899@!unbalance:halfword; {unmatched left braces in current parameter}
8900@!m:halfword; {the number of tokens or groups (usually)}
8901@!ref_count:pointer; {start of the token list}
8902@!save_scanner_status:small_number; {|scanner_status| upon entry}
8903@!save_warning_index:pointer; {|warning_index| upon entry}
8904@!match_chr:ASCII_code; {character used in parameter}
8905begin save_scanner_status:=scanner_status; save_warning_index:=warning_index;
8906warning_index:=cur_cs; ref_count:=cur_chr; r:=link(ref_count); n:=0;
8907if tracing_macros>0 then @<Show the text of the macro being expanded@>;
8908if info(r)=protected_token then r:=link(r);
8909if info(r)<>end_match_token then
8910  @<Scan the parameters and make |link(r)| point to the macro body; but
8911    |return| if an illegal \.{\\par} is detected@>;
8912@<Feed the macro body and its parameters to the scanner@>;
8913exit:scanner_status:=save_scanner_status; warning_index:=save_warning_index;
8914end;
8915
8916@ Before we put a new token list on the input stack, it is wise to clean off
8917all token lists that have recently been depleted. Then a user macro that ends
8918with a call to itself will not require unbounded stack space.
8919
8920@<Feed the macro body and its parameters to the scanner@>=
8921while (state=token_list)and(loc=null)and(token_type<>v_template) do
8922  end_token_list; {conserve stack space}
8923begin_token_list(ref_count,macro); name:=warning_index; loc:=link(r);
8924if n>0 then
8925  begin if param_ptr+n>max_param_stack then
8926    begin max_param_stack:=param_ptr+n;
8927    if max_param_stack>param_size then
8928      overflow("parameter stack size",param_size);
8929@:TeX capacity exceeded parameter stack size}{\quad parameter stack size@>
8930    end;
8931  for m:=0 to n-1 do param_stack[param_ptr+m]:=pstack[m];
8932  param_ptr:=param_ptr+n;
8933  end
8934
8935@ At this point, the reader will find it advisable to review the explanation
8936of token list format that was presented earlier, since many aspects of that
8937format are of importance chiefly in the |macro_call| routine.
8938
8939The token list might begin with a string of compulsory tokens before the
8940first |match| or |end_match|. In that case the macro name is supposed to be
8941followed by those tokens; the following program will set |s=null| to
8942represent this restriction. Otherwise |s| will be set to the first token of
8943a string that will delimit the next parameter.
8944
8945@<Scan the parameters and make |link(r)| point to the macro body...@>=
8946begin scanner_status:=matching; unbalance:=0;
8947long_state:=eq_type(cur_cs);
8948if long_state>=outer_call then long_state:=long_state-2;
8949repeat link(temp_head):=null;
8950if (info(r)>=end_match_token)or(info(r)<match_token) then s:=null
8951else  begin match_chr:=info(r)-match_token; s:=link(r); r:=s;
8952  p:=temp_head; m:=0;
8953  end;
8954@<Scan a parameter until its delimiter string has been found; or, if |s=null|,
8955  simply scan the delimiter string@>;@/
8956{now |info(r)| is a token whose command code is either |match| or |end_match|}
8957until info(r)=end_match_token;
8958end
8959
8960@ If |info(r)| is a |match| or |end_match| command, it cannot be equal to
8961any token found by |get_token|. Therefore an undelimited parameter---i.e.,
8962a |match| that is immediately followed by |match| or |end_match|---will
8963always fail the test `|cur_tok=info(r)|' in the following algorithm.
8964
8965@<Scan a parameter until its delimiter string has been found; or, ...@>=
8966continue: get_token; {set |cur_tok| to the next token of input}
8967if cur_tok=info(r) then
8968  @<Advance \(r)|r|; |goto found| if the parameter delimiter has been
8969    fully matched, otherwise |goto continue|@>;
8970@<Contribute the recently matched tokens to the current parameter, and
8971  |goto continue| if a partial match is still in effect;
8972  but abort if |s=null|@>;
8973if cur_tok=par_token then if long_state<>long_call then
8974  @<Report a runaway argument and abort@>;
8975if cur_tok<right_brace_limit then
8976  if cur_tok<left_brace_limit then
8977    @<Contribute an entire group to the current parameter@>
8978  else @<Report an extra right brace and |goto continue|@>
8979else @<Store the current token, but |goto continue| if it is
8980   a blank space that would become an undelimited parameter@>;
8981incr(m);
8982if info(r)>end_match_token then goto continue;
8983if info(r)<match_token then goto continue;
8984found: if s<>null then @<Tidy up the parameter just scanned, and tuck it away@>
8985
8986@ @<Store the current token, but |goto continue| if it is...@>=
8987begin if cur_tok=space_token then
8988  if info(r)<=end_match_token then
8989    if info(r)>=match_token then goto continue;
8990store_new_token(cur_tok);
8991end
8992
8993@ A slightly subtle point arises here: When the parameter delimiter ends
8994with `\.{\#\{}', the token list will have a left brace both before and
8995after the |end_match|\kern-.4pt. Only one of these should affect the
8996|align_state|, but both will be scanned, so we must make a correction.
8997
8998@<Advance \(r)|r|; |goto found| if the parameter delimiter has been fully...@>=
8999begin r:=link(r);
9000if (info(r)>=match_token)and(info(r)<=end_match_token) then
9001  begin if cur_tok<left_brace_limit then decr(align_state);
9002  goto found;
9003  end
9004else goto continue;
9005end
9006
9007@ @<Report an extra right brace and |goto continue|@>=
9008begin back_input; print_err("Argument of "); sprint_cs(warning_index);
9009@.Argument of \\x has...@>
9010print(" has an extra }");
9011help6("I've run across a `}' that doesn't seem to match anything.")@/
9012  ("For example, `\def\a#1{...}' and `\a}' would produce")@/
9013  ("this error. If you simply proceed now, the `\par' that")@/
9014  ("I've just inserted will cause me to report a runaway")@/
9015  ("argument that might be the root of the problem. But if")@/
9016  ("your `}' was spurious, just type `2' and it will go away.");
9017incr(align_state); long_state:=call; cur_tok:=par_token; ins_error;
9018goto continue;
9019end {a white lie; the \.{\\par} won't always trigger a runaway}
9020
9021@ If |long_state=outer_call|, a runaway argument has already been reported.
9022
9023@<Report a runaway argument and abort@>=
9024begin if long_state=call then
9025  begin runaway; print_err("Paragraph ended before ");
9026@.Paragraph ended before...@>
9027  sprint_cs(warning_index); print(" was complete");
9028  help3("I suspect you've forgotten a `}', causing me to apply this")@/
9029    ("control sequence to too much text. How can we recover?")@/
9030    ("My plan is to forget the whole thing and hope for the best.");
9031  back_error;
9032  end;
9033pstack[n]:=link(temp_head); align_state:=align_state-unbalance;
9034for m:=0 to n do flush_list(pstack[m]);
9035return;
9036end
9037
9038@ When the following code becomes active, we have matched tokens from |s| to
9039the predecessor of |r|, and we have found that |cur_tok<>info(r)|. An
9040interesting situation now presents itself: If the parameter is to be
9041delimited by a string such as `\.{ab}', and if we have scanned `\.{aa}',
9042we want to contribute one `\.a' to the current parameter and resume
9043looking for a `\.b'. The program must account for such partial matches and
9044for others that can be quite complex.  But most of the time we have |s=r|
9045and nothing needs to be done.
9046
9047Incidentally, it is possible for \.{\\par} tokens to sneak in to certain
9048parameters of non-\.{\\long} macros. For example, consider a case like
9049`\.{\\def\\a\#1\\par!\{...\}}' where the first \.{\\par} is not followed
9050by an exclamation point. In such situations it does not seem appropriate
9051to prohibit the \.{\\par}, so \TeX\ keeps quiet about this bending of
9052the rules.
9053
9054@<Contribute the recently matched tokens to the current parameter...@>=
9055if s<>r then
9056  if s=null then @<Report an improper use of the macro and abort@>
9057  else  begin t:=s;
9058    repeat store_new_token(info(t)); incr(m); u:=link(t); v:=s;
9059    loop@+  begin if u=r then
9060        if cur_tok<>info(v) then goto done
9061        else  begin r:=link(v); goto continue;
9062          end;
9063      if info(u)<>info(v) then goto done;
9064      u:=link(u); v:=link(v);
9065      end;
9066    done: t:=link(t);
9067    until t=r;
9068    r:=s; {at this point, no tokens are recently matched}
9069    end
9070
9071@ @<Report an improper use...@>=
9072begin print_err("Use of "); sprint_cs(warning_index);
9073@.Use of x doesn't match...@>
9074print(" doesn't match its definition");
9075help4("If you say, e.g., `\def\a1{...}', then you must always")@/
9076  ("put `1' after `\a', since control sequence names are")@/
9077  ("made up of letters only. The macro here has not been")@/
9078  ("followed by the required stuff, so I'm ignoring it.");
9079error; return;
9080end
9081
9082@ @<Contribute an entire group to the current parameter@>=
9083begin unbalance:=1;
9084@^inner loop@>
9085loop@+  begin fast_store_new_token(cur_tok); get_token;
9086  if cur_tok=par_token then if long_state<>long_call then
9087    @<Report a runaway argument and abort@>;
9088  if cur_tok<right_brace_limit then
9089    if cur_tok<left_brace_limit then incr(unbalance)
9090    else  begin decr(unbalance);
9091      if unbalance=0 then goto done1;
9092      end;
9093  end;
9094done1: rbrace_ptr:=p; store_new_token(cur_tok);
9095end
9096
9097@ If the parameter consists of a single group enclosed in braces, we must
9098strip off the enclosing braces. That's why |rbrace_ptr| was introduced.
9099
9100@<Tidy up the parameter just scanned, and tuck it away@>=
9101begin if (m=1)and(info(p)<right_brace_limit)and(p<>temp_head) then
9102  begin link(rbrace_ptr):=null; free_avail(p);
9103  p:=link(temp_head); pstack[n]:=link(p); free_avail(p);
9104  end
9105else pstack[n]:=link(temp_head);
9106incr(n);
9107if tracing_macros>0 then
9108  begin begin_diagnostic; print_nl(match_chr); print_int(n);
9109  print("<-"); show_token_list(pstack[n-1],null,1000);
9110  end_diagnostic(false);
9111  end;
9112end
9113
9114@ @<Show the text of the macro being expanded@>=
9115begin begin_diagnostic; print_ln; print_cs(warning_index);
9116token_show(ref_count); end_diagnostic(false);
9117end
9118
9119@* \[26] Basic scanning subroutines.
9120Let's turn now to some procedures that \TeX\ calls upon frequently to digest
9121certain kinds of patterns in the input. Most of these are quite simple;
9122some are quite elaborate. Almost all of the routines call |get_x_token|,
9123which can cause them to be invoked recursively.
9124@^stomach@>
9125@^recursion@>
9126
9127@ The |scan_left_brace| routine is called when a left brace is supposed to be
9128the next non-blank token. (The term ``left brace'' means, more precisely,
9129a character whose catcode is |left_brace|.) \TeX\ allows \.{\\relax} to
9130appear before the |left_brace|.
9131
9132@p procedure scan_left_brace; {reads a mandatory |left_brace|}
9133begin @<Get the next non-blank non-relax non-call token@>;
9134if cur_cmd<>left_brace then
9135  begin print_err("Missing { inserted");
9136@.Missing \{ inserted@>
9137  help4("A left brace was mandatory here, so I've put one in.")@/
9138    ("You might want to delete and/or insert some corrections")@/
9139    ("so that I will find a matching right brace soon.")@/
9140    ("(If you're confused by all this, try typing `I}' now.)");
9141  back_error; cur_tok:=left_brace_token+"{"; cur_cmd:=left_brace;
9142  cur_chr:="{"; incr(align_state);
9143  end;
9144end;
9145
9146@ @<Get the next non-blank non-relax non-call token@>=
9147repeat get_x_token;
9148until (cur_cmd<>spacer)and(cur_cmd<>relax)
9149
9150@ The |scan_optional_equals| routine looks for an optional `\.=' sign preceded
9151by optional spaces; `\.{\\relax}' is not ignored here.
9152
9153@p procedure scan_optional_equals;
9154begin  @<Get the next non-blank non-call token@>;
9155if cur_tok<>other_token+"=" then back_input;
9156end;
9157
9158@ @<Get the next non-blank non-call token@>=
9159repeat get_x_token;
9160until cur_cmd<>spacer
9161
9162@ In case you are getting bored, here is a slightly less trivial routine:
9163Given a string of lowercase letters, like `\.{pt}' or `\.{plus}' or
9164`\.{width}', the |scan_keyword| routine checks to see whether the next
9165tokens of input match this string. The match must be exact, except that
9166uppercase letters will match their lowercase counterparts; uppercase
9167equivalents are determined by subtracting |"a"-"A"|, rather than using the
9168|uc_code| table, since \TeX\ uses this routine only for its own limited
9169set of keywords.
9170
9171If a match is found, the characters are effectively removed from the input
9172and |true| is returned. Otherwise |false| is returned, and the input
9173is left essentially unchanged (except for the fact that some macros
9174may have been expanded, etc.).
9175@^inner loop@>
9176
9177@p function scan_keyword(@!s:str_number):boolean; {look for a given string}
9178label exit;
9179var p:pointer; {tail of the backup list}
9180@!q:pointer; {new node being added to the token list via |store_new_token|}
9181@!k:pool_pointer; {index into |str_pool|}
9182begin p:=backup_head; link(p):=null;
9183if s<too_big_char then begin
9184  while true do
9185    begin get_x_token; {recursion is possible here}
9186@^recursion@>
9187    if (cur_cs=0)and@|
9188       ((cur_chr=s)or(cur_chr=s-"a"+"A")) then
9189      begin store_new_token(cur_tok);
9190      flush_list(link(backup_head)); scan_keyword:=true; return;
9191      end
9192    else if (cur_cmd<>spacer)or(p<>backup_head) then
9193      begin back_input;
9194      if p<>backup_head then back_list(link(backup_head));
9195      scan_keyword:=false; return;
9196      end;
9197    end;
9198  end;
9199k:=str_start_macro(s);
9200while k<str_start_macro(s+1) do
9201  begin get_x_token; {recursion is possible here}
9202@^recursion@>
9203  if (cur_cs=0)and@|
9204   ((cur_chr=so(str_pool[k]))or(cur_chr=so(str_pool[k])-"a"+"A")) then
9205    begin store_new_token(cur_tok); incr(k);
9206    end
9207  else if (cur_cmd<>spacer)or(p<>backup_head) then
9208    begin back_input;
9209    if p<>backup_head then back_list(link(backup_head));
9210    scan_keyword:=false; return;
9211    end;
9212  end;
9213flush_list(link(backup_head)); scan_keyword:=true;
9214exit:end;
9215
9216@ Here is a procedure that sounds an alarm when mu and non-mu units
9217are being switched.
9218
9219@p procedure mu_error;
9220begin print_err("Incompatible glue units");
9221@.Incompatible glue units@>
9222help1("I'm going to assume that 1mu=1pt when they're mixed.");
9223error;
9224end;
9225
9226@ The next routine `|scan_something_internal|' is used to fetch internal
9227numeric quantities like `\.{\\hsize}', and also to handle the `\.{\\the}'
9228when expanding constructions like `\.{\\the\\toks0}' and
9229`\.{\\the\\baselineskip}'. Soon we will be considering the |scan_int|
9230procedure, which calls |scan_something_internal|; on the other hand,
9231|scan_something_internal| also calls |scan_int|, for constructions like
9232`\.{\\catcode\`\\\$}' or `\.{\\fontdimen} \.3 \.{\\ff}'. So we
9233have to declare |scan_int| as a |forward| procedure. A few other
9234procedures are also declared at this point.
9235
9236@p procedure@?scan_int; forward; {scans an integer value}
9237@t\4\4@>@<Declare procedures that scan restricted classes of integers@>@;
9238@t\4\4@>@<Declare \eTeX\ procedures for scanning@>@;
9239@t\4\4@>@<Declare procedures that scan font-related stuff@>
9240
9241@ \TeX\ doesn't know exactly what to expect when |scan_something_internal|
9242begins.  For example, an integer or dimension or glue value could occur
9243immediately after `\.{\\hskip}'; and one can even say \.{\\the} with
9244respect to token lists in constructions like
9245`\.{\\xdef\\o\{\\the\\output\}}'.  On the other hand, only integers are
9246allowed after a construction like `\.{\\count}'. To handle the various
9247possibilities, |scan_something_internal| has a |level| parameter, which
9248tells the ``highest'' kind of quantity that |scan_something_internal| is
9249allowed to produce. Six levels are distinguished, namely |int_val|,
9250|dimen_val|, |glue_val|, |mu_val|, |ident_val|, and |tok_val|.
9251
9252The output of |scan_something_internal| (and of the other routines
9253|scan_int|, |scan_dimen|, and |scan_glue| below) is put into the global
9254variable |cur_val|, and its level is put into |cur_val_level|. The highest
9255values of |cur_val_level| are special: |mu_val| is used only when
9256|cur_val| points to something in a ``muskip'' register, or to one of the
9257three parameters \.{\\thinmuskip}, \.{\\medmuskip}, \.{\\thickmuskip};
9258|ident_val| is used only when |cur_val| points to a font identifier;
9259|tok_val| is used only when |cur_val| points to |null| or to the reference
9260count of a token list. The last two cases are allowed only when
9261|scan_something_internal| is called with |level=tok_val|.
9262
9263If the output is glue, |cur_val| will point to a glue specification, and
9264the reference count of that glue will have been updated to reflect this
9265reference; if the output is a nonempty token list, |cur_val| will point to
9266its reference count, but in this case the count will not have been updated.
9267Otherwise |cur_val| will contain the integer or scaled value in question.
9268
9269@d int_val=0 {integer values}
9270@d dimen_val=1 {dimension values}
9271@d glue_val=2 {glue specifications}
9272@d mu_val=3 {math glue specifications}
9273@d ident_val=4 {font identifier}
9274@d tok_val=5 {token lists}
9275@d inter_char_val=6 {inter-character (class) token lists}
9276
9277@<Glob...@>=
9278@!cur_val:integer; {value returned by numeric scanners}
9279@!cur_val1:integer; {value returned by numeric scanners}
9280@!cur_val_level:int_val..tok_val; {the ``level'' of this value}
9281
9282@ The hash table is initialized with `\.{\\count}', `\.{\\dimen}', `\.{\\skip}',
9283and `\.{\\muskip}' all having |register| as their command code; they are
9284distinguished by the |chr_code|, which is either |int_val|, |dimen_val|,
9285|glue_val|, or |mu_val| more than |mem_bot| (dynamic variable-size nodes
9286cannot have these values)
9287
9288@<Put each...@>=
9289primitive("count",register,mem_bot+int_val);
9290@!@:count_}{\.{\\count} primitive@>
9291primitive("dimen",register,mem_bot+dimen_val);
9292@!@:dimen_}{\.{\\dimen} primitive@>
9293primitive("skip",register,mem_bot+glue_val);
9294@!@:skip_}{\.{\\skip} primitive@>
9295primitive("muskip",register,mem_bot+mu_val);
9296@!@:mu_skip_}{\.{\\muskip} primitive@>
9297
9298@ @<Cases of |print_cmd_chr|...@>=
9299register: @<Cases of |register| for |print_cmd_chr|@>;
9300
9301@ OK, we're ready for |scan_something_internal| itself. A second parameter,
9302|negative|, is set |true| if the value that is found should be negated.
9303It is assumed that |cur_cmd| and |cur_chr| represent the first token of
9304the internal quantity to be scanned; an error will be signalled if
9305|cur_cmd<min_internal| or |cur_cmd>max_internal|.
9306
9307@d scanned_result_end(#)==cur_val_level:=#;@+end
9308@d scanned_result(#)==@+begin cur_val:=#;scanned_result_end
9309
9310@p procedure scan_something_internal(@!level:small_number;@!negative:boolean);
9311  {fetch an internal parameter}
9312label exit;
9313var m:halfword; {|chr_code| part of the operand token}
9314    n, k, kk: integer; {accumulators}
9315@!q,@!r:pointer; {general purpose indices}
9316@!tx:pointer; {effective tail node}
9317@!i:four_quarters; {character info}
9318@!p:0..nest_size; {index into |nest|}
9319begin m:=cur_chr;
9320case cur_cmd of
9321def_code: @<Fetch a character code from some table@>;
9322XeTeX_def_code:
9323  begin
9324    scan_usv_num;
9325    if m=sf_code_base then begin
9326      scanned_result(ho(sf_code(cur_val) div @"10000))(int_val)
9327    end
9328    else if m=math_code_base then begin
9329      scanned_result(ho(math_code(cur_val)))(int_val)
9330    end
9331    else if m=math_code_base+1 then begin
9332      print_err("Can't use \Umathcode as a number (try \Umathcodenum)");
9333      help2("\Umathcode is for setting a mathcode from separate values;")@/
9334      ("use \Umathcodenum to access them as single values."); error;
9335      scanned_result(0)(int_val)
9336    end
9337    else if m=del_code_base then begin
9338      scanned_result(ho(del_code(cur_val)))(int_val)
9339    end else begin
9340      print_err("Can't use \Udelcode as a number (try \Udelcodenum)");
9341      help2("\Udelcode is for setting a delcode from separate values;")@/
9342      ("use \Udelcodenum to access them as single values."); error;
9343      scanned_result(0)(int_val);
9344    end;
9345  end;
9346toks_register,assign_toks,def_family,set_font,def_font: @<Fetch a token list or
9347  font identifier, provided that |level=tok_val|@>;
9348assign_int: scanned_result(eqtb[m].int)(int_val);
9349assign_dimen: scanned_result(eqtb[m].sc)(dimen_val);
9350assign_glue: scanned_result(equiv(m))(glue_val);
9351assign_mu_glue: scanned_result(equiv(m))(mu_val);
9352set_aux: @<Fetch the |space_factor| or the |prev_depth|@>;
9353set_prev_graf: @<Fetch the |prev_graf|@>;
9354set_page_int:@<Fetch the |dead_cycles| or the |insert_penalties|@>;
9355set_page_dimen: @<Fetch something on the |page_so_far|@>;
9356set_shape: @<Fetch the |par_shape| size@>;
9357set_box_dimen: @<Fetch a box dimension@>;
9358char_given,math_given: scanned_result(cur_chr)(int_val);
9359assign_font_dimen: @<Fetch a font dimension@>;
9360assign_font_int: @<Fetch a font integer@>;
9361register: @<Fetch a register@>;
9362last_item: @<Fetch an item in the current node, if appropriate@>;
9363othercases @<Complain that \.{\\the} can't do this; give zero result@>
9364endcases;@/
9365while cur_val_level>level do @<Convert \(c)|cur_val| to a lower level@>;
9366@<Fix the reference count, if any, and negate |cur_val| if |negative|@>;
9367exit:end;
9368
9369@ @<Fetch a character code from some table@>=
9370begin scan_usv_num;
9371if m=math_code_base then begin
9372  cur_val1:=ho(math_code(cur_val));
9373  if is_active_math_char(cur_val1) then
9374    cur_val1:=@"8000
9375  else if (math_class_field(cur_val1)>7) or
9376     (math_fam_field(cur_val1)>15) or
9377     (math_char_field(cur_val1)>255) then
9378    begin print_err("Extended mathchar used as mathchar");
9379@.Bad mathchar@>
9380    help2("A mathchar number must be between 0 and ""7FFF.")@/
9381      ("I changed this one to zero."); int_error(cur_val1); cur_val1:=0;
9382    end;
9383  cur_val1:=(math_class_field(cur_val1)*@"1000) +
9384            (math_fam_field(cur_val1)*@"100) +
9385            math_char_field(cur_val1);
9386  scanned_result(cur_val1)(int_val)
9387  end
9388else if m=del_code_base then begin
9389  cur_val1:=del_code(cur_val);
9390  if cur_val1>=@"40000000 then begin
9391    print_err("Extended delcode used as delcode");
9392@.Bad delcode@>
9393    help2("A delimiter code must be between 0 and ""7FFFFFF.")@/
9394      ("I changed this one to zero."); error;
9395    scanned_result(0)(int_val);
9396  end else begin
9397    scanned_result(cur_val1)(int_val);
9398  end
9399end
9400else if m<sf_code_base then scanned_result(equiv(m+cur_val))(int_val)
9401else if m<math_code_base then scanned_result(equiv(m+cur_val) mod @"10000)(int_val)
9402else scanned_result(eqtb[m+cur_val].int)(int_val);
9403end
9404
9405@ @<Fetch a token list...@>=
9406if level<>tok_val then
9407  begin print_err("Missing number, treated as zero");
9408@.Missing number...@>
9409  help3("A number should have been here; I inserted `0'.")@/
9410    ("(If you can't figure out why I needed to see a number,")@/
9411    ("look up `weird error' in the index to The TeXbook.)");
9412@:TeXbook}{\sl The \TeX book@>
9413  back_error; scanned_result(0)(dimen_val);
9414  end
9415else if cur_cmd<=assign_toks then
9416  begin if cur_cmd<assign_toks then {|cur_cmd=toks_register|}
9417    if m=mem_bot then
9418      begin scan_register_num;
9419      if cur_val<256 then cur_val:=equiv(toks_base+cur_val)
9420      else  begin find_sa_element(tok_val,cur_val,false);
9421        if cur_ptr=null then cur_val:=null
9422        else cur_val:=sa_ptr(cur_ptr);
9423        end;
9424      end
9425    else cur_val:=sa_ptr(m)
9426  else if cur_chr=XeTeX_inter_char_loc then begin
9427    scan_eight_bit_int; cur_ptr:=cur_val;
9428    scan_eight_bit_int;
9429    find_sa_element(inter_char_val, cur_ptr * @"100 + cur_val, false);
9430    if cur_ptr=null then cur_val:=null
9431    else cur_val:=sa_ptr(cur_ptr);
9432  end else cur_val:=equiv(m);
9433  cur_val_level:=tok_val;
9434  end
9435else  begin back_input; scan_font_ident;
9436  scanned_result(font_id_base+cur_val)(ident_val);
9437  end
9438
9439@ Users refer to `\.{\\the\\spacefactor}' only in horizontal
9440mode, and to `\.{\\the\\prevdepth}' only in vertical mode; so we put the
9441associated mode in the modifier part of the |set_aux| command.
9442The |set_page_int| command has modifier 0 or 1, for `\.{\\deadcycles}' and
9443`\.{\\insertpenalties}', respectively. The |set_box_dimen| command is
9444modified by either |width_offset|, |height_offset|, or |depth_offset|.
9445And the |last_item| command is modified by either |int_val|, |dimen_val|,
9446|glue_val|, |input_line_no_code|, or |badness_code|.
9447\eTeX\ inserts |last_node_type_code| after |glue_val| and adds
9448the codes for its extensions: |eTeX_version_code|, \dots\ .
9449
9450@d last_node_type_code=glue_val+1 {code for \.{\\lastnodetype}}
9451@d input_line_no_code=glue_val+2 {code for \.{\\inputlineno}}
9452@d badness_code=input_line_no_code+1 {code for \.{\\badness}}
9453@#
9454@d eTeX_int=badness_code+1 {first of \eTeX\ codes for integers}
9455@#
9456@d XeTeX_int=eTeX_int+8 {first of \XeTeX\ codes for integers}
9457@#
9458@d XeTeX_version_code=XeTeX_int {code for \.{\\XeTeXversion}}
9459
9460@d XeTeX_count_glyphs_code=XeTeX_int+1
9461
9462@d XeTeX_count_variations_code=XeTeX_int+2 {Deprecated}
9463@d XeTeX_variation_code=XeTeX_int+3 {Deprecated}
9464@d XeTeX_find_variation_by_name_code=XeTeX_int+4 {Deprecated}
9465@d XeTeX_variation_min_code=XeTeX_int+5 {Deprecated}
9466@d XeTeX_variation_max_code=XeTeX_int+6 {Deprecated}
9467@d XeTeX_variation_default_code=XeTeX_int+7 {Deprecated}
9468
9469@d XeTeX_count_features_code=XeTeX_int+8
9470@d XeTeX_feature_code_code=XeTeX_int+9
9471@d XeTeX_find_feature_by_name_code=XeTeX_int+10
9472@d XeTeX_is_exclusive_feature_code=XeTeX_int+11
9473@d XeTeX_count_selectors_code=XeTeX_int+12
9474@d XeTeX_selector_code_code=XeTeX_int+13
9475@d XeTeX_find_selector_by_name_code=XeTeX_int+14
9476@d XeTeX_is_default_selector_code=XeTeX_int+15
9477
9478@d XeTeX_OT_count_scripts_code=XeTeX_int+16
9479@d XeTeX_OT_count_languages_code=XeTeX_int+17
9480@d XeTeX_OT_count_features_code=XeTeX_int+18
9481@d XeTeX_OT_script_code=XeTeX_int+19
9482@d XeTeX_OT_language_code=XeTeX_int+20
9483@d XeTeX_OT_feature_code=XeTeX_int+21
9484
9485@d XeTeX_map_char_to_glyph_code=XeTeX_int+22
9486@d XeTeX_glyph_index_code=XeTeX_int+23
9487@d XeTeX_font_type_code=XeTeX_int+24
9488
9489@d XeTeX_first_char_code=XeTeX_int+25
9490@d XeTeX_last_char_code=XeTeX_int+26
9491
9492@d pdf_last_x_pos_code        = XeTeX_int+27
9493@d pdf_last_y_pos_code        = XeTeX_int+28
9494@d pdf_strcmp_code            = XeTeX_int+29
9495@d pdf_shell_escape_code      = XeTeX_int+30
9496
9497@d XeTeX_pdf_page_count_code  = XeTeX_int+31
9498
9499@#
9500@d XeTeX_dim=XeTeX_int+32 {first of \XeTeX\ codes for dimensions}
9501
9502@d XeTeX_glyph_bounds_code = XeTeX_dim
9503
9504@#
9505@d eTeX_dim=XeTeX_dim+1 {first of \eTeX\ codes for dimensions}
9506 {changed for \XeTeX\ to make room for \XeTeX\ integers and dimens}
9507@d eTeX_glue=eTeX_dim+9 {first of \eTeX\ codes for glue}
9508@d eTeX_mu=eTeX_glue+1 {first of \eTeX\ codes for muglue}
9509@d eTeX_expr=eTeX_mu+1 {first of \eTeX\ codes for expressions}
9510
9511@<Put each...@>=
9512primitive("spacefactor",set_aux,hmode);
9513@!@:space_factor_}{\.{\\spacefactor} primitive@>
9514primitive("prevdepth",set_aux,vmode);@/
9515@!@:prev_depth_}{\.{\\prevdepth} primitive@>
9516primitive("deadcycles",set_page_int,0);
9517@!@:dead_cycles_}{\.{\\deadcycles} primitive@>
9518primitive("insertpenalties",set_page_int,1);
9519@!@:insert_penalties_}{\.{\\insertpenalties} primitive@>
9520primitive("wd",set_box_dimen,width_offset);
9521@!@:wd_}{\.{\\wd} primitive@>
9522primitive("ht",set_box_dimen,height_offset);
9523@!@:ht_}{\.{\\ht} primitive@>
9524primitive("dp",set_box_dimen,depth_offset);
9525@!@:dp_}{\.{\\dp} primitive@>
9526primitive("lastpenalty",last_item,int_val);
9527@!@:last_penalty_}{\.{\\lastpenalty} primitive@>
9528primitive("lastkern",last_item,dimen_val);
9529@!@:last_kern_}{\.{\\lastkern} primitive@>
9530primitive("lastskip",last_item,glue_val);
9531@!@:last_skip_}{\.{\\lastskip} primitive@>
9532primitive("inputlineno",last_item,input_line_no_code);
9533@!@:input_line_no_}{\.{\\inputlineno} primitive@>
9534primitive("badness",last_item,badness_code);
9535@!@:badness_}{\.{\\badness} primitive@>
9536
9537@ @<Cases of |print_cmd_chr|...@>=
9538set_aux: if chr_code=vmode then print_esc("prevdepth")
9539@+else print_esc("spacefactor");
9540set_page_int: if chr_code=0 then print_esc("deadcycles")
9541@/@<Cases of |set_page_int| for |print_cmd_chr|@>@/
9542@+else print_esc("insertpenalties");
9543set_box_dimen: if chr_code=width_offset then print_esc("wd")
9544else if chr_code=height_offset then print_esc("ht")
9545else print_esc("dp");
9546last_item: case chr_code of
9547  int_val: print_esc("lastpenalty");
9548  dimen_val: print_esc("lastkern");
9549  glue_val: print_esc("lastskip");
9550  input_line_no_code: print_esc("inputlineno");
9551  pdf_shell_escape_code: print_esc("shellescape");
9552  @/@<Cases of |last_item| for |print_cmd_chr|@>@/
9553  othercases print_esc("badness")
9554  endcases;
9555
9556@ @<Fetch the |space_factor| or the |prev_depth|@>=
9557if abs(mode)<>m then
9558  begin print_err("Improper "); print_cmd_chr(set_aux,m);
9559@.Improper \\spacefactor@>
9560@.Improper \\prevdepth@>
9561  help4("You can refer to \spacefactor only in horizontal mode;")@/
9562    ("you can refer to \prevdepth only in vertical mode; and")@/
9563    ("neither of these is meaningful inside \write. So")@/
9564    ("I'm forgetting what you said and using zero instead.");
9565  error;
9566  if level<>tok_val then scanned_result(0)(dimen_val)
9567  else scanned_result(0)(int_val);
9568  end
9569else if m=vmode then scanned_result(prev_depth)(dimen_val)
9570else scanned_result(space_factor)(int_val)
9571
9572@ @<Fetch the |dead_cycles| or the |insert_penalties|@>=
9573begin if m=0 then cur_val:=dead_cycles
9574@/@<Cases for `Fetch the |dead_cycles| or the |insert_penalties|'@>@/
9575else cur_val:=insert_penalties;
9576cur_val_level:=int_val;
9577end
9578
9579@ @<Fetch a box dimension@>=
9580begin scan_register_num; fetch_box(q);
9581if q=null then cur_val:=0 @+else cur_val:=mem[q+m].sc;
9582cur_val_level:=dimen_val;
9583end
9584
9585@ Inside an \.{\\output} routine, a user may wish to look at the page totals
9586that were present at the moment when output was triggered.
9587
9588@d max_dimen==@'7777777777 {$2^{30}-1$}
9589
9590@<Fetch something on the |page_so_far|@>=
9591begin if (page_contents=empty) and (not output_active) then
9592  if m=0 then cur_val:=max_dimen@+else cur_val:=0
9593else cur_val:=page_so_far[m];
9594cur_val_level:=dimen_val;
9595end
9596
9597@ @<Fetch the |prev_graf|@>=
9598if mode=0 then scanned_result(0)(int_val) {|prev_graf=0| within \.{\\write}}
9599else begin nest[nest_ptr]:=cur_list; p:=nest_ptr;
9600  while abs(nest[p].mode_field)<>vmode do decr(p);
9601  scanned_result(nest[p].pg_field)(int_val);
9602  end
9603
9604@ @<Fetch the |par_shape| size@>=
9605begin if m>par_shape_loc then @<Fetch a penalties array element@>
9606else if par_shape_ptr=null then cur_val:=0
9607else cur_val:=info(par_shape_ptr);
9608cur_val_level:=int_val;
9609end
9610
9611@ Here is where \.{\\lastpenalty}, \.{\\lastkern}, \.{\\lastskip}, and
9612\.{\\lastnodetype} are
9613implemented. The reference count for \.{\\lastskip} will be updated later.
9614
9615We also handle \.{\\inputlineno} and \.{\\badness} here, because they are
9616legal in similar contexts.
9617
9618The macro |find_effective_tail_eTeX| sets |tx| to the last non-\.{\\endM}
9619node of the current list.
9620
9621@d find_effective_tail_eTeX==
9622tx:=tail;
9623if not is_char_node(tx) then
9624  if (type(tx)=math_node)and(subtype(tx)=end_M_code) then
9625    begin r:=head;
9626    repeat q:=r; r:=link(q);
9627    until r=tx;
9628    tx:=q;
9629    end
9630@#
9631@d find_effective_tail==find_effective_tail_eTeX
9632
9633@<Fetch an item in the current node...@>=
9634if m>=input_line_no_code then
9635 if m>=eTeX_glue then @<Process an expression and |return|@>@;
9636 else if m>=XeTeX_dim then
9637  begin case m of
9638  @/@<Cases for fetching a dimension value@>@/
9639  end; {there are no other cases}
9640  cur_val_level:=dimen_val;
9641  end
9642 else begin case m of
9643  input_line_no_code: cur_val:=line;
9644  badness_code: cur_val:=last_badness;
9645  pdf_shell_escape_code:
9646    begin
9647    if shellenabledp then begin
9648      if restrictedshell then cur_val:=2
9649      else cur_val:=1;
9650    end
9651    else cur_val:=0;
9652    end;
9653  @/@<Cases for fetching an integer value@>@/
9654  end; {there are no other cases}
9655  cur_val_level:=int_val;
9656  end
9657else begin if cur_chr=glue_val then cur_val:=zero_glue@+else cur_val:=0;
9658  find_effective_tail;
9659  if cur_chr=last_node_type_code then
9660    begin cur_val_level:=int_val;
9661    if (tx=head)or(mode=0) then cur_val:=-1;
9662    end
9663  else cur_val_level:=cur_chr;
9664  if not is_char_node(tx)and(mode<>0) then
9665    case cur_chr of
9666    int_val: if type(tx)=penalty_node then cur_val:=penalty(tx);
9667    dimen_val: if type(tx)=kern_node then cur_val:=width(tx);
9668    glue_val: if type(tx)=glue_node then
9669      begin cur_val:=glue_ptr(tx);
9670      if subtype(tx)=mu_glue then cur_val_level:=mu_val;
9671      end;
9672    last_node_type_code: if type(tx)<=unset_node then cur_val:=type(tx)+1
9673      else cur_val:=unset_node+2;
9674    end {there are no other cases}
9675  else if (mode=vmode)and(tx=head) then
9676    case cur_chr of
9677    int_val: cur_val:=last_penalty;
9678    dimen_val: cur_val:=last_kern;
9679    glue_val: if last_glue<>max_halfword then cur_val:=last_glue;
9680    last_node_type_code: cur_val:=last_node_type;
9681    end; {there are no other cases}
9682  end
9683
9684@ @<Fetch a font dimension@>=
9685begin find_font_dimen(false); font_info[fmem_ptr].sc:=0;
9686scanned_result(font_info[cur_val].sc)(dimen_val);
9687end
9688
9689@ @<Fetch a font integer@>=
9690begin scan_font_ident;
9691if m=0 then scanned_result(hyphen_char[cur_val])(int_val)
9692else if m=1 then scanned_result(skew_char[cur_val])(int_val)
9693else begin
9694  n:=cur_val;
9695  if is_native_font(n) then scan_glyph_number(n)
9696  else scan_char_num;
9697  k:=cur_val;
9698  case m of
9699  lp_code_base: scanned_result(get_cp_code(n, k, left_side))(int_val);
9700  rp_code_base: scanned_result(get_cp_code(n, k, right_side))(int_val);
9701  end;
9702end;
9703end
9704
9705@ @<Fetch a register@>=
9706begin if (m<mem_bot)or(m>lo_mem_stat_max) then
9707  begin cur_val_level:=sa_type(m);
9708  if cur_val_level<glue_val then cur_val:=sa_int(m)
9709  else cur_val:=sa_ptr(m);
9710  end
9711else  begin scan_register_num; cur_val_level:=m-mem_bot;
9712  if cur_val>255 then
9713    begin find_sa_element(cur_val_level,cur_val,false);
9714    if cur_ptr=null then
9715      if cur_val_level<glue_val then cur_val:=0
9716      else cur_val:=zero_glue
9717    else if cur_val_level<glue_val then cur_val:=sa_int(cur_ptr)
9718    else cur_val:=sa_ptr(cur_ptr);
9719    end
9720  else
9721  case cur_val_level of
9722int_val:cur_val:=count(cur_val);
9723dimen_val:cur_val:=dimen(cur_val);
9724glue_val: cur_val:=skip(cur_val);
9725mu_val: cur_val:=mu_skip(cur_val);
9726end; {there are no other cases}
9727  end;
9728end
9729
9730@ @<Complain that \.{\\the} can't do this; give zero result@>=
9731begin print_err("You can't use `"); print_cmd_chr(cur_cmd,cur_chr);
9732@.You can't use x after ...@>
9733print("' after "); print_esc("the");
9734help1("I'm forgetting what you said and using zero instead.");
9735error;
9736if level<>tok_val then scanned_result(0)(dimen_val)
9737else scanned_result(0)(int_val);
9738end
9739
9740@ When a |glue_val| changes to a |dimen_val|, we use the width component
9741of the glue; there is no need to decrease the reference count, since it
9742has not yet been increased.  When a |dimen_val| changes to an |int_val|,
9743we use scaled points so that the value doesn't actually change. And when a
9744|mu_val| changes to a |glue_val|, the value doesn't change either.
9745
9746@<Convert \(c)|cur_val| to a lower level@>=
9747begin if cur_val_level=glue_val then cur_val:=width(cur_val)
9748else if cur_val_level=mu_val then mu_error;
9749decr(cur_val_level);
9750end
9751
9752@ If |cur_val| points to a glue specification at this point, the reference
9753count for the glue does not yet include the reference by |cur_val|.
9754If |negative| is |true|, |cur_val_level| is known to be |<=mu_val|.
9755
9756@<Fix the reference count, if any, ...@>=
9757if negative then
9758  if cur_val_level>=glue_val then
9759    begin cur_val:=new_spec(cur_val);
9760    @<Negate all three glue components of |cur_val|@>;
9761    end
9762  else negate(cur_val)
9763else if (cur_val_level>=glue_val)and(cur_val_level<=mu_val) then
9764  add_glue_ref(cur_val)
9765
9766@ @<Negate all three...@>=
9767begin negate(width(cur_val));
9768negate(stretch(cur_val));
9769negate(shrink(cur_val));
9770end
9771
9772@ Our next goal is to write the |scan_int| procedure, which scans anything that
9773\TeX\ treats as an integer. But first we might as well look at some simple
9774applications of |scan_int| that have already been made inside of
9775|scan_something_internal|.
9776
9777@ @<Declare procedures that scan restricted classes of integers@>=
9778
9779procedure scan_glyph_number(f: internal_font_number);
9780{ scan a glyph ID for native font |f|, identified by Unicode value or name or glyph number }
9781begin
9782  if scan_keyword("/") then {set cp value by glyph name}
9783  begin
9784    scan_and_pack_name; {result is in |nameoffile|}
9785    scanned_result(map_glyph_to_index(f))(int_val);
9786  end else if scan_keyword("u") then {set cp value by unicode}
9787  begin
9788    scan_char_num;
9789    scanned_result(map_char_to_glyph(f,cur_val))(int_val);
9790  end else
9791    scan_int;
9792end;
9793
9794procedure scan_char_class;
9795begin scan_int;
9796if (cur_val<0)or(cur_val>256) then
9797  begin print_err("Bad character class");
9798@.Bad character code@>
9799  help2("A character class must be between 0 and 256.")@/
9800    ("I changed this one to zero."); int_error(cur_val); cur_val:=0;
9801  end;
9802end;
9803
9804procedure scan_eight_bit_int;
9805begin scan_int;
9806if (cur_val<0)or(cur_val>255) then
9807  begin print_err("Bad register code");
9808@.Bad register code@>
9809  help2("A register code or char class must be between 0 and 255.")@/
9810    ("I changed this one to zero."); int_error(cur_val); cur_val:=0;
9811  end;
9812end;
9813
9814@ @<Declare procedures that scan restricted classes of integers@>=
9815procedure scan_usv_num;
9816begin scan_int;
9817if (cur_val<0)or(cur_val>biggest_usv) then
9818  begin print_err("Bad character code");
9819@.Bad character code@>
9820  help2("A Unicode scalar value must be between 0 and ""10FFFF.")@/
9821    ("I changed this one to zero."); int_error(cur_val); cur_val:=0;
9822  end;
9823end;
9824
9825procedure scan_char_num;
9826begin scan_int;
9827if (cur_val<0)or(cur_val>biggest_char) then
9828  begin print_err("Bad character code");
9829@.Bad character code@>
9830  help2("A character number must be between 0 and 65535.")@/
9831    ("I changed this one to zero."); int_error(cur_val); cur_val:=0;
9832  end;
9833end;
9834
9835@ While we're at it, we might as well deal with similar routines that
9836will be needed later.
9837
9838@<Declare procedures that scan restricted classes of integers@>=
9839procedure scan_xetex_math_char_int;
9840begin scan_int;
9841  if is_active_math_char(cur_val) then begin
9842    if cur_val <> active_math_char then begin
9843      print_err("Bad active XeTeX math code");
9844      help2("Since I ignore class and family for active math chars,")@/
9845      ("I changed this one to ""1FFFFF."); int_error(cur_val);
9846      cur_val:=active_math_char;
9847    end
9848  end else if math_char_field(cur_val)>biggest_usv then begin
9849    print_err("Bad XeTeX math character code");
9850    help2("Since I expected a character number between 0 and ""10FFFF,")@/
9851    ("I changed this one to zero."); int_error(cur_val); cur_val:=0;
9852  end;
9853end;
9854
9855procedure scan_math_class_int;
9856begin scan_int;
9857if (cur_val<0)or(cur_val>7) then
9858  begin print_err("Bad math class");
9859@.Bad number@>
9860  help2("Since I expected to read a number between 0 and 7,")@/
9861    ("I changed this one to zero."); int_error(cur_val); cur_val:=0;
9862  end;
9863end;
9864
9865procedure scan_math_fam_int;
9866begin scan_int;
9867if (cur_val<0)or(cur_val>number_math_families-1) then
9868  begin print_err("Bad math family");
9869@.Bad number@>
9870  help2("Since I expected to read a number between 0 and 255,")@/
9871    ("I changed this one to zero."); int_error(cur_val); cur_val:=0;
9872  end;
9873end;
9874
9875procedure scan_four_bit_int;
9876begin scan_int;
9877if (cur_val<0)or(cur_val>15) then
9878  begin print_err("Bad number");
9879@.Bad number@>
9880  help2("Since I expected to read a number between 0 and 15,")@/
9881    ("I changed this one to zero."); int_error(cur_val); cur_val:=0;
9882  end;
9883end;
9884
9885@ @<Declare procedures that scan restricted classes of integers@>=
9886procedure scan_fifteen_bit_int;
9887begin scan_int;
9888if (cur_val<0)or(cur_val>@'77777) then
9889  begin print_err("Bad mathchar");
9890@.Bad mathchar@>
9891  help2("A mathchar number must be between 0 and 32767.")@/
9892    ("I changed this one to zero."); int_error(cur_val); cur_val:=0;
9893  end;
9894end;
9895
9896@ @<Declare procedures that scan restricted classes of integers@>=
9897procedure scan_delimiter_int;
9898begin scan_int;
9899if (cur_val<0)or(cur_val>@'777777777) then
9900  begin print_err("Bad delimiter code");
9901@.Bad delimiter code@>
9902  help2("A numeric delimiter code must be between 0 and 2^{27}-1.")@/
9903    ("I changed this one to zero."); int_error(cur_val); cur_val:=0;
9904  end;
9905end;
9906
9907@ An integer number can be preceded by any number of spaces and `\.+' or
9908`\.-' signs. Then comes either a decimal constant (i.e., radix 10), an
9909octal constant (i.e., radix 8, preceded by~\.\'), a hexadecimal constant
9910(radix 16, preceded by~\."), an alphabetic constant (preceded by~\.\`), or
9911an internal variable. After scanning is complete,
9912|cur_val| will contain the answer, which must be at most
9913$2^{31}-1=2147483647$ in absolute value. The value of |radix| is set to
991410, 8, or 16 in the cases of decimal, octal, or hexadecimal constants,
9915otherwise |radix| is set to zero. An optional space follows a constant.
9916
9917@d octal_token=other_token+"'" {apostrophe, indicates an octal constant}
9918@d hex_token=other_token+"""" {double quote, indicates a hex constant}
9919@d alpha_token=other_token+"`" {reverse apostrophe, precedes alpha constants}
9920@d point_token=other_token+"." {decimal point}
9921@d continental_point_token=other_token+"," {decimal point, Eurostyle}
9922
9923@<Glob...@>=
9924@!radix:small_number; {|scan_int| sets this to 8, 10, 16, or zero}
9925
9926@ We initialize the following global variables just in case |expand|
9927comes into action before any of the basic scanning routines has assigned
9928them a value.
9929
9930@<Set init...@>=
9931cur_val:=0; cur_val_level:=int_val; radix:=0; cur_order:=normal;
9932
9933@ The |scan_int| routine is used also to scan the integer part of a
9934fraction; for example, the `\.3' in `\.{3.14159}' will be found by
9935|scan_int|. The |scan_dimen| routine assumes that |cur_tok=point_token|
9936after the integer part of such a fraction has been scanned by |scan_int|,
9937and that the decimal point has been backed up to be scanned again.
9938
9939@p procedure scan_int; {sets |cur_val| to an integer}
9940label done;
9941var negative:boolean; {should the answer be negated?}
9942@!m:integer; {|@t$2^{31}$@> div radix|, the threshold of danger}
9943@!d:small_number; {the digit just scanned}
9944@!vacuous:boolean; {have no digits appeared?}
9945@!OK_so_far:boolean; {has an error message been issued?}
9946begin radix:=0; OK_so_far:=true;@/
9947@<Get the next non-blank non-sign token; set |negative| appropriately@>;
9948if cur_tok=alpha_token then @<Scan an alphabetic character code into |cur_val|@>
9949else if (cur_cmd>=min_internal)and(cur_cmd<=max_internal) then
9950  scan_something_internal(int_val,false)
9951else @<Scan a numeric constant@>;
9952if negative then negate(cur_val);
9953end;
9954
9955@ @<Get the next non-blank non-sign token...@>=
9956negative:=false;
9957repeat @<Get the next non-blank non-call token@>;
9958if cur_tok=other_token+"-" then
9959  begin negative:=not negative; cur_tok:=other_token+"+";
9960  end;
9961until cur_tok<>other_token+"+"
9962
9963@ A space is ignored after an alphabetic character constant, so that
9964such constants behave like numeric ones.
9965
9966@<Scan an alphabetic character code into |cur_val|@>=
9967begin get_token; {suppress macro expansion}
9968if cur_tok<cs_token_flag then
9969  begin cur_val:=cur_chr;
9970  if cur_cmd<=right_brace then
9971    if cur_cmd=right_brace then incr(align_state)
9972    else decr(align_state);
9973  end
9974else if cur_tok<cs_token_flag+single_base then
9975  cur_val:=cur_tok-cs_token_flag-active_base
9976else cur_val:=cur_tok-cs_token_flag-single_base;
9977if cur_val>biggest_usv then
9978  begin print_err("Improper alphabetic constant");
9979@.Improper alphabetic constant@>
9980  help2("A one-character control sequence belongs after a ` mark.")@/
9981    ("So I'm essentially inserting \0 here.");
9982  cur_val:="0"; back_error;
9983  end
9984else @<Scan an optional space@>;
9985end
9986
9987@ @<Scan an optional space@>=
9988begin get_x_token; if cur_cmd<>spacer then back_input;
9989end
9990
9991@ @<Scan a numeric constant@>=
9992begin radix:=10; m:=214748364;
9993if cur_tok=octal_token then
9994  begin radix:=8; m:=@'2000000000; get_x_token;
9995  end
9996else if cur_tok=hex_token then
9997  begin radix:=16; m:=@'1000000000; get_x_token;
9998  end;
9999vacuous:=true; cur_val:=0;@/
10000@<Accumulate the constant until |cur_tok| is not a suitable digit@>;
10001if vacuous then @<Express astonishment that no number was here@>
10002else if cur_cmd<>spacer then back_input;
10003end
10004
10005@ @d infinity==@'17777777777 {the largest positive value that \TeX\ knows}
10006@d zero_token=other_token+"0" {zero, the smallest digit}
10007@d A_token=letter_token+"A" {the smallest special hex digit}
10008@d other_A_token=other_token+"A" {special hex digit of type |other_char|}
10009
10010@<Accumulate the constant...@>=
10011loop@+  begin if (cur_tok<zero_token+radix)and(cur_tok>=zero_token)and
10012    (cur_tok<=zero_token+9) then d:=cur_tok-zero_token
10013  else if radix=16 then
10014    if (cur_tok<=A_token+5)and(cur_tok>=A_token) then d:=cur_tok-A_token+10
10015    else if (cur_tok<=other_A_token+5)and(cur_tok>=other_A_token) then
10016      d:=cur_tok-other_A_token+10
10017    else goto done
10018  else goto done;
10019  vacuous:=false;
10020  if (cur_val>=m)and((cur_val>m)or(d>7)or(radix<>10)) then
10021    begin if OK_so_far then
10022      begin print_err("Number too big");
10023@.Number too big@>
10024      help2("I can only go up to 2147483647='17777777777=""7FFFFFFF,")@/
10025        ("so I'm using that number instead of yours.");
10026      error; cur_val:=infinity; OK_so_far:=false;
10027      end;
10028    end
10029  else cur_val:=cur_val*radix+d;
10030  get_x_token;
10031  end;
10032done:
10033
10034@ @<Express astonishment...@>=
10035begin print_err("Missing number, treated as zero");
10036@.Missing number...@>
10037help3("A number should have been here; I inserted `0'.")@/
10038  ("(If you can't figure out why I needed to see a number,")@/
10039  ("look up `weird error' in the index to The TeXbook.)");
10040@:TeXbook}{\sl The \TeX book@>
10041back_error;
10042end
10043
10044@ The |scan_dimen| routine is similar to |scan_int|, but it sets |cur_val| to
10045a |scaled| value, i.e., an integral number of sp. One of its main tasks
10046is therefore to interpret the abbreviations for various kinds of units and
10047to convert measurements to scaled points.
10048
10049There are three parameters: |mu| is |true| if the finite units must be
10050`\.{mu}', while |mu| is |false| if `\.{mu}' units are disallowed;
10051|inf| is |true| if the infinite units `\.{fil}', `\.{fill}', `\.{filll}'
10052are permitted; and |shortcut| is |true| if |cur_val| already contains
10053an integer and only the units need to be considered.
10054
10055The order of infinity that was found in the case of infinite glue is returned
10056in the global variable |cur_order|.
10057
10058@<Glob...@>=
10059@!cur_order:glue_ord; {order of infinity found by |scan_dimen|}
10060
10061@ Constructions like `\.{-\'77 pt}' are legal dimensions, so |scan_dimen|
10062may begin with |scan_int|. This explains why it is convenient to use
10063|scan_int| also for the integer part of a decimal fraction.
10064
10065Several branches of |scan_dimen| work with |cur_val| as an integer and
10066with an auxiliary fraction |f|, so that the actual quantity of interest is
10067$|cur_val|+|f|/2^{16}$. At the end of the routine, this ``unpacked''
10068representation is put into the single word |cur_val|, which suddenly
10069switches significance from |integer| to |scaled|.
10070
10071@d attach_fraction=88 {go here to pack |cur_val| and |f| into |cur_val|}
10072@d attach_sign=89 {go here when |cur_val| is correct except perhaps for sign}
10073@d scan_normal_dimen==scan_dimen(false,false,false)
10074
10075@p procedure xetex_scan_dimen(@!mu,@!inf,@!shortcut,@!requires_units:boolean);
10076  {sets |cur_val| to a dimension}
10077label done, done1, done2, found, not_found, attach_fraction, attach_sign;
10078var negative:boolean; {should the answer be negated?}
10079@!f:integer; {numerator of a fraction whose denominator is $2^{16}$}
10080@<Local variables for dimension calculations@>@;
10081begin f:=0; arith_error:=false; cur_order:=normal; negative:=false;
10082if not shortcut then
10083  begin @<Get the next non-blank non-sign...@>;
10084  if (cur_cmd>=min_internal)and(cur_cmd<=max_internal) then
10085    @<Fetch an internal dimension and |goto attach_sign|,
10086      or fetch an internal integer@>
10087  else  begin back_input;
10088    if cur_tok=continental_point_token then cur_tok:=point_token;
10089    if cur_tok<>point_token then scan_int
10090    else  begin radix:=10; cur_val:=0;
10091      end;
10092    if cur_tok=continental_point_token then cur_tok:=point_token;
10093    if (radix=10)and(cur_tok=point_token) then @<Scan decimal fraction@>;
10094    end;
10095  end;
10096if cur_val<0 then {in this case |f=0|}
10097  begin negative:=not negative; negate(cur_val);
10098  end;
10099if requires_units then begin
10100@<Scan units and set |cur_val| to $x\cdot(|cur_val|+f/2^{16})$, where there
10101  are |x| sp per unit; |goto attach_sign| if the units are internal@>;
10102@<Scan an optional space@>;
10103end else begin
10104 if cur_val>=@'40000 then arith_error:=true
10105 else cur_val:=cur_val*unity+f;
10106end;
10107attach_sign: if arith_error or(abs(cur_val)>=@'10000000000) then
10108  @<Report that this dimension is out of range@>;
10109if negative then negate(cur_val);
10110end;
10111
10112procedure scan_dimen(@!mu,@!inf,@!shortcut:boolean);
10113begin
10114  xetex_scan_dimen(mu,inf,shortcut,true);
10115end;
10116
10117@ For XeTeX, we have an additional version |scan_decimal|, like |scan_dimen|
10118but without any scanning of units.
10119
10120@p procedure scan_decimal;
10121  {sets |cur_val| to a quantity expressed as a decimal fraction}
10122begin
10123 xetex_scan_dimen(false, false, false, false);
10124end;
10125
10126@ @<Fetch an internal dimension and |goto attach_sign|...@>=
10127if mu then
10128  begin scan_something_internal(mu_val,false);
10129  @<Coerce glue to a dimension@>;
10130  if cur_val_level=mu_val then goto attach_sign;
10131  if cur_val_level<>int_val then mu_error;
10132  end
10133else  begin scan_something_internal(dimen_val,false);
10134  if cur_val_level=dimen_val then goto attach_sign;
10135  end
10136
10137@ @<Local variables for dimension calculations@>=
10138@!num,@!denom:1..65536; {conversion ratio for the scanned units}
10139@!k,@!kk:small_number; {number of digits in a decimal fraction}
10140@!p,@!q:pointer; {top of decimal digit stack}
10141@!v:scaled; {an internal dimension}
10142@!save_cur_val:integer; {temporary storage of |cur_val|}
10143
10144@ The following code is executed when |scan_something_internal| was
10145called asking for |mu_val|, when we really wanted a ``mudimen'' instead
10146of ``muglue.''
10147
10148@<Coerce glue to a dimension@>=
10149if cur_val_level>=glue_val then
10150  begin v:=width(cur_val); delete_glue_ref(cur_val); cur_val:=v;
10151  end
10152
10153@ When the following code is executed, we have |cur_tok=point_token|, but this
10154token has been backed up using |back_input|; we must first discard it.
10155
10156It turns out that a decimal point all by itself is equivalent to `\.{0.0}'.
10157Let's hope people don't use that fact.
10158
10159@<Scan decimal fraction@>=
10160begin k:=0; p:=null; get_token; {|point_token| is being re-scanned}
10161loop@+  begin get_x_token;
10162  if (cur_tok>zero_token+9)or(cur_tok<zero_token) then goto done1;
10163  if k<17 then {digits for |k>=17| cannot affect the result}
10164    begin q:=get_avail; link(q):=p; info(q):=cur_tok-zero_token;
10165    p:=q; incr(k);
10166    end;
10167  end;
10168done1: for kk:=k downto 1 do
10169  begin dig[kk-1]:=info(p); q:=p; p:=link(p); free_avail(q);
10170  end;
10171f:=round_decimals(k);
10172if cur_cmd<>spacer then back_input;
10173end
10174
10175@ Now comes the harder part: At this point in the program, |cur_val| is a
10176nonnegative integer and $f/2^{16}$ is a nonnegative fraction less than 1;
10177we want to multiply the sum of these two quantities by the appropriate
10178factor, based on the specified units, in order to produce a |scaled|
10179result, and we want to do the calculation with fixed point arithmetic that
10180does not overflow.
10181
10182@<Scan units and set |cur_val| to $x\cdot(|cur_val|+f/2^{16})$...@>=
10183if inf then @<Scan for \(f)\.{fil} units; |goto attach_fraction| if found@>;
10184@<Scan for \(u)units that are internal dimensions;
10185  |goto attach_sign| with |cur_val| set if found@>;
10186if mu then @<Scan for \(m)\.{mu} units and |goto attach_fraction|@>;
10187if scan_keyword("true") then @<Adjust \(f)for the magnification ratio@>;
10188@.true@>
10189if scan_keyword("pt") then goto attach_fraction; {the easy case}
10190@.pt@>
10191@<Scan for \(a)all other units and adjust |cur_val| and |f| accordingly;
10192  |goto done| in the case of scaled points@>;
10193attach_fraction: if cur_val>=@'40000 then arith_error:=true
10194else cur_val:=cur_val*unity+f;
10195done:
10196
10197@ A specification like `\.{filllll}' or `\.{fill L L L}' will lead to two
10198error messages (one for each additional keyword \.{"l"}).
10199
10200@<Scan for \(f)\.{fil} units...@>=
10201if scan_keyword("fil") then
10202@.fil@>
10203  begin cur_order:=fil;
10204  while scan_keyword("l") do
10205    begin if cur_order=filll then
10206      begin print_err("Illegal unit of measure (");
10207@.Illegal unit of measure@>
10208      print("replaced by filll)");
10209      help1("I dddon't go any higher than filll."); error;
10210      end
10211    else incr(cur_order);
10212    end;
10213  goto attach_fraction;
10214  end
10215
10216@ @<Scan for \(u)units that are internal dimensions...@>=
10217save_cur_val:=cur_val;
10218@<Get the next non-blank non-call...@>;
10219if (cur_cmd<min_internal)or(cur_cmd>max_internal) then back_input
10220else  begin if mu then
10221    begin scan_something_internal(mu_val,false); @<Coerce glue...@>;
10222    if cur_val_level<>mu_val then mu_error;
10223    end
10224  else scan_something_internal(dimen_val,false);
10225  v:=cur_val; goto found;
10226  end;
10227if mu then goto not_found;
10228if scan_keyword("em") then v:=(@<The em width for |cur_font|@>)
10229@.em@>
10230else if scan_keyword("ex") then v:=(@<The x-height for |cur_font|@>)
10231@.ex@>
10232else goto not_found;
10233@<Scan an optional space@>;
10234found:cur_val:=nx_plus_y(save_cur_val,v,xn_over_d(v,f,@'200000));
10235goto attach_sign;
10236not_found:
10237
10238@ @<Scan for \(m)\.{mu} units and |goto attach_fraction|@>=
10239if scan_keyword("mu") then goto attach_fraction
10240@.mu@>
10241else  begin print_err("Illegal unit of measure ("); print("mu inserted)");
10242@.Illegal unit of measure@>
10243  help4("The unit of measurement in math glue must be mu.")@/
10244    ("To recover gracefully from this error, it's best to")@/
10245    ("delete the erroneous units; e.g., type `2' to delete")@/
10246    ("two letters. (See Chapter 27 of The TeXbook.)");
10247@:TeXbook}{\sl The \TeX book@>
10248  error; goto attach_fraction;
10249  end
10250
10251@ @<Adjust \(f)for the magnification ratio@>=
10252begin prepare_mag;
10253if mag<>1000 then
10254  begin cur_val:=xn_over_d(cur_val,1000,mag);
10255  f:=(1000*f+@'200000*remainder) div mag;
10256  cur_val:=cur_val+(f div @'200000); f:=f mod @'200000;
10257  end;
10258end
10259
10260@ The necessary conversion factors can all be specified exactly as
10261fractions whose numerator and denominator sum to 32768 or less.
10262According to the definitions here, $\rm2660\,dd\approx1000.33297\,mm$;
10263this agrees well with the value $\rm1000.333\,mm$ cited by Bosshard
10264@^Bosshard, Hans Rudolf@>
10265in {\sl Technische Grundlagen zur Satzherstellung\/} (Bern, 1980).
10266
10267@d set_conversion_end(#)== denom:=#; end
10268@d set_conversion(#)==@+begin num:=#; set_conversion_end
10269
10270@<Scan for \(a)all other units and adjust |cur_val| and |f|...@>=
10271if scan_keyword("in") then set_conversion(7227)(100)
10272@.in@>
10273else if scan_keyword("pc") then set_conversion(12)(1)
10274@.pc@>
10275else if scan_keyword("cm") then set_conversion(7227)(254)
10276@.cm@>
10277else if scan_keyword("mm") then set_conversion(7227)(2540)
10278@.mm@>
10279else if scan_keyword("bp") then set_conversion(7227)(7200)
10280@.bp@>
10281else if scan_keyword("dd") then set_conversion(1238)(1157)
10282@.dd@>
10283else if scan_keyword("cc") then set_conversion(14856)(1157)
10284@.cc@>
10285else if scan_keyword("sp") then goto done
10286@.sp@>
10287else @<Complain about unknown unit and |goto done2|@>;
10288cur_val:=xn_over_d(cur_val,num,denom);
10289f:=(num*f+@'200000*remainder) div denom;@/
10290cur_val:=cur_val+(f div @'200000); f:=f mod @'200000;
10291done2:
10292
10293@ @<Complain about unknown unit...@>=
10294begin print_err("Illegal unit of measure ("); print("pt inserted)");
10295@.Illegal unit of measure@>
10296help6("Dimensions can be in units of em, ex, in, pt, pc,")@/
10297  ("cm, mm, dd, cc, bp, or sp; but yours is a new one!")@/
10298  ("I'll assume that you meant to say pt, for printer's points.")@/
10299  ("To recover gracefully from this error, it's best to")@/
10300  ("delete the erroneous units; e.g., type `2' to delete")@/
10301  ("two letters. (See Chapter 27 of The TeXbook.)");
10302@:TeXbook}{\sl The \TeX book@>
10303error; goto done2;
10304end
10305
10306
10307@ @<Report that this dimension is out of range@>=
10308begin print_err("Dimension too large");
10309@.Dimension too large@>
10310help2("I can't work with sizes bigger than about 19 feet.")@/
10311  ("Continue and I'll use the largest value I can.");@/
10312error; cur_val:=max_dimen; arith_error:=false;
10313end
10314
10315@ The final member of \TeX's value-scanning trio is |scan_glue|, which
10316makes |cur_val| point to a glue specification. The reference count of that
10317glue spec will take account of the fact that |cur_val| is pointing to~it.
10318
10319The |level| parameter should be either |glue_val| or |mu_val|.
10320
10321Since |scan_dimen| was so much more complex than |scan_int|, we might expect
10322|scan_glue| to be even worse. But fortunately, it is very simple, since
10323most of the work has already been done.
10324
10325@p procedure scan_glue(@!level:small_number);
10326  {sets |cur_val| to a glue spec pointer}
10327label exit;
10328var negative:boolean; {should the answer be negated?}
10329@!q:pointer; {new glue specification}
10330@!mu:boolean; {does |level=mu_val|?}
10331begin mu:=(level=mu_val); @<Get the next non-blank non-sign...@>;
10332if (cur_cmd>=min_internal)and(cur_cmd<=max_internal) then
10333  begin scan_something_internal(level,negative);
10334  if cur_val_level>=glue_val then
10335    begin if cur_val_level<>level then mu_error;
10336    return;
10337    end;
10338  if cur_val_level=int_val then scan_dimen(mu,false,true)
10339  else if level=mu_val then mu_error;
10340  end
10341else  begin back_input; scan_dimen(mu,false,false);
10342  if negative then negate(cur_val);
10343  end;
10344@<Create a new glue specification whose width is |cur_val|; scan for its
10345  stretch and shrink components@>;
10346exit:end;
10347@#
10348@<Declare procedures needed for expressions@>@;
10349
10350@ @<Create a new glue specification whose width is |cur_val|...@>=
10351q:=new_spec(zero_glue); width(q):=cur_val;
10352if scan_keyword("plus") then
10353@.plus@>
10354  begin scan_dimen(mu,true,false);
10355  stretch(q):=cur_val; stretch_order(q):=cur_order;
10356  end;
10357if scan_keyword("minus") then
10358@.minus@>
10359  begin scan_dimen(mu,true,false);
10360  shrink(q):=cur_val; shrink_order(q):=cur_order;
10361  end;
10362cur_val:=q
10363
10364@ Here's a similar procedure that returns a pointer to a rule node. This
10365routine is called just after \TeX\ has seen \.{\\hrule} or \.{\\vrule};
10366therefore |cur_cmd| will be either |hrule| or |vrule|. The idea is to store
10367the default rule dimensions in the node, then to override them if
10368`\.{height}' or `\.{width}' or `\.{depth}' specifications are
10369found (in any order).
10370
10371@d default_rule=26214 {0.4\thinspace pt}
10372
10373@p function scan_rule_spec:pointer;
10374label reswitch;
10375var q:pointer; {the rule node being created}
10376begin q:=new_rule; {|width|, |depth|, and |height| all equal |null_flag| now}
10377if cur_cmd=vrule then width(q):=default_rule
10378else  begin height(q):=default_rule; depth(q):=0;
10379  end;
10380reswitch: if scan_keyword("width") then
10381@.width@>
10382  begin scan_normal_dimen; width(q):=cur_val; goto reswitch;
10383  end;
10384if scan_keyword("height") then
10385@.height@>
10386  begin scan_normal_dimen; height(q):=cur_val; goto reswitch;
10387  end;
10388if scan_keyword("depth") then
10389@.depth@>
10390  begin scan_normal_dimen; depth(q):=cur_val; goto reswitch;
10391  end;
10392scan_rule_spec:=q;
10393end;
10394
10395@* \[27] Building token lists.
10396The token lists for macros and for other things like \.{\\mark} and \.{\\output}
10397and \.{\\write} are produced by a procedure called |scan_toks|.
10398
10399Before we get into the details of |scan_toks|, let's consider a much
10400simpler task, that of converting the current string into a token list.
10401The |str_toks| function does this; it classifies spaces as type |spacer|
10402and everything else as type |other_char|.
10403
10404The token list created by |str_toks| begins at |link(temp_head)| and ends
10405at the value |p| that is returned. (If |p=temp_head|, the list is empty.)
10406
10407The |str_toks_cat| function is the same, except that the catcode |cat| is
10408stamped on all the characters, unless zero is passed in which case it
10409chooses |spacer| or |other_char| automatically.
10410
10411@p @t\4@>@<Declare \eTeX\ procedures for token lists@>@;@/
10412function str_toks_cat(@!b:pool_pointer;@!cat:small_number):pointer;
10413  {changes the string |str_pool[b..pool_ptr]| to a token list}
10414var p:pointer; {tail of the token list}
10415@!q:pointer; {new node being added to the token list via |store_new_token|}
10416@!t:halfword; {token being appended}
10417@!k:pool_pointer; {index into |str_pool|}
10418begin str_room(1);
10419p:=temp_head; link(p):=null; k:=b;
10420while k<pool_ptr do
10421  begin t:=so(str_pool[k]);
10422  if (t=" ") and (cat=0) then t:=space_token
10423  else begin if (t >= @"D800) and (t <= @"DBFF) and (k+1 < pool_ptr)
10424             and (so(str_pool[k+1]) >= @"DC00) and (so(str_pool[k+1]) <= @"DFFF) then
10425    begin
10426      incr(k);
10427      t := @"10000 + (t - @"D800) * @"400 + (so(str_pool[k]) - @"DC00);
10428    end;
10429    if cat=0 then t := other_token + t
10430    else t := max_char_val * cat + t;
10431  end;
10432  fast_store_new_token(t);
10433  incr(k);
10434  end;
10435pool_ptr:=b; str_toks_cat:=p;
10436end;
10437
10438function str_toks(@!b:pool_pointer):pointer;
10439begin str_toks:=str_toks_cat(b,0); end;
10440
10441@ The main reason for wanting |str_toks| is the next function,
10442|the_toks|, which has similar input/output characteristics.
10443
10444This procedure is supposed to scan something like `\.{\\skip\\count12}',
10445i.e., whatever can follow `\.{\\the}', and it constructs a token list
10446containing something like `\.{-3.0pt minus 0.5fill}'.
10447
10448@p function the_toks:pointer;
10449label exit;
10450var old_setting:0..max_selector; {holds |selector| setting}
10451@!p,@!q,@!r:pointer; {used for copying a token list}
10452@!b:pool_pointer; {base of temporary string}
10453@!c:small_number; {value of |cur_chr|}
10454begin @<Handle \.{\\unexpanded} or \.{\\detokenize} and |return|@>;@/
10455get_x_token; scan_something_internal(tok_val,false);
10456if cur_val_level>=ident_val then @<Copy the token list@>
10457else begin old_setting:=selector; selector:=new_string; b:=pool_ptr;
10458  case cur_val_level of
10459  int_val:print_int(cur_val);
10460  dimen_val:begin print_scaled(cur_val); print("pt");
10461    end;
10462  glue_val: begin print_spec(cur_val,"pt"); delete_glue_ref(cur_val);
10463    end;
10464  mu_val: begin print_spec(cur_val,"mu"); delete_glue_ref(cur_val);
10465    end;
10466  end; {there are no other cases}
10467  selector:=old_setting; the_toks:=str_toks(b);
10468  end;
10469exit:end;
10470
10471@ @<Copy the token list@>=
10472begin p:=temp_head; link(p):=null;
10473if cur_val_level=ident_val then store_new_token(cs_token_flag+cur_val)
10474else if cur_val<>null then
10475  begin r:=link(cur_val); {do not copy the reference count}
10476  while r<>null do
10477    begin fast_store_new_token(info(r)); r:=link(r);
10478    end;
10479  end;
10480the_toks:=p;
10481end
10482
10483@ Here's part of the |expand| subroutine that we are now ready to complete:
10484
10485@p procedure ins_the_toks;
10486begin link(garbage):=the_toks; ins_list(link(temp_head));
10487end;
10488
10489@ The primitives \.{\\number}, \.{\\romannumeral}, \.{\\string}, \.{\\meaning},
10490\.{\\fontname}, and \.{\\jobname} are defined as follows.
10491
10492\eTeX\ adds \.{\\eTeXrevision} such that |job_name_code| remains last.
10493
10494@d number_code=0 {command code for \.{\\number}}
10495@d roman_numeral_code=1 {command code for \.{\\romannumeral}}
10496@d string_code=2 {command code for \.{\\string}}
10497@d meaning_code=3 {command code for \.{\\meaning}}
10498@d font_name_code=4 {command code for \.{\\fontname}}
10499@d etex_convert_base=5 {base for \eTeX's command codes}
10500@d eTeX_revision_code=etex_convert_base {command code for \.{\\eTeXrevision}}
10501
10502@d XeTeX_revision_code=6
10503@d XeTeX_variation_name_code=7
10504@d XeTeX_feature_name_code=8
10505@d XeTeX_selector_name_code=9
10506@d XeTeX_glyph_name_code=10
10507
10508@d left_margin_kern_code=11
10509@d right_margin_kern_code=12
10510
10511@d XeTeX_Uchar_code = 13
10512@d XeTeX_Ucharcat_code = 14
10513
10514@d etex_convert_codes=XeTeX_Ucharcat_code+1 {end of \eTeX's command codes}
10515@d job_name_code=etex_convert_codes {command code for \.{\\jobname}}
10516
10517@<Put each...@>=
10518primitive("number",convert,number_code);@/
10519@!@:number_}{\.{\\number} primitive@>
10520primitive("romannumeral",convert,roman_numeral_code);@/
10521@!@:roman_numeral_}{\.{\\romannumeral} primitive@>
10522primitive("string",convert,string_code);@/
10523@!@:string_}{\.{\\string} primitive@>
10524primitive("meaning",convert,meaning_code);@/
10525@!@:meaning_}{\.{\\meaning} primitive@>
10526primitive("fontname",convert,font_name_code);@/
10527@!@:font_name_}{\.{\\fontname} primitive@>
10528primitive("jobname",convert,job_name_code);@/
10529@!@:job_name_}{\.{\\jobname} primitive@>
10530primitive("leftmarginkern",convert,left_margin_kern_code);@/
10531@!@:left_margin_kern_}{\.{\\leftmarginkern} primitive@>
10532primitive("rightmarginkern",convert,right_margin_kern_code);@/
10533@!@:right_margin_kern_}{\.{\\rightmarginkern} primitive@>
10534
10535primitive("Uchar",convert,XeTeX_Uchar_code);@/
10536@!@:XeTeX_Uchar_}{\.{\\Uchar} primitive@>
10537primitive("Ucharcat",convert,XeTeX_Ucharcat_code);@/
10538@!@:XeTeX_Ucharcat_}{\.{\\Ucharcat} primitive@>
10539
10540@ @<Cases of |print_cmd_chr|...@>=
10541convert: case chr_code of
10542  number_code: print_esc("number");
10543  roman_numeral_code: print_esc("romannumeral");
10544  string_code: print_esc("string");
10545  meaning_code: print_esc("meaning");
10546  font_name_code:  print_esc("fontname");
10547  pdf_strcmp_code: print_esc("strcmp");
10548  left_margin_kern_code:    print_esc("leftmarginkern");
10549  right_margin_kern_code:   print_esc("rightmarginkern");
10550  @/@<Cases of |convert| for |print_cmd_chr|@>@/
10551  othercases print_esc("jobname")
10552  endcases;
10553
10554@ The procedure |conv_toks| uses |str_toks| to insert the token list
10555for |convert| functions into the scanner; `\.{\\outer}' control sequences
10556are allowed to follow `\.{\\string}' and `\.{\\meaning}'.
10557
10558The extra temp string |u| is needed because |pdf_scan_ext_toks| incorporates
10559any pending string in its output. In order to save such a pending string,
10560we have to create a temporary string that is destroyed immediately after.
10561
10562@d save_cur_string==if str_start_macro(str_ptr)<pool_ptr then u:=make_string else u:=0
10563@d restore_cur_string==if u<>0 then decr(str_ptr)
10564
10565@p procedure conv_toks;
10566var old_setting:0..max_selector; {holds |selector| setting}
10567@!save_warning_index, @!save_def_ref:pointer;
10568@!u: str_number;
10569@!c:small_number; {desired type of conversion}
10570@!save_scanner_status:small_number; {|scanner_status| upon entry}
10571@!b:pool_pointer; {base of temporary string}
10572@!fnt,@!arg1,@!arg2:integer; {args for \XeTeX\ extensions}
10573@!font_name_str:str_number; {local vars for \.{\\fontname} quoting extension}
10574@!i:small_number;
10575@!quote_char:UTF16_code;
10576@!cat:small_number; {desired catcode, or 0 for automatic |spacer|/|other_char| selection}
10577@!saved_chr:UnicodeScalar;
10578p, q: pointer;
10579begin cat:=0; c:=cur_chr; @<Scan the argument for command |c|@>;
10580old_setting:=selector; selector:=new_string; b:=pool_ptr;
10581@<Print the result of command |c|@>;
10582selector:=old_setting; link(garbage):=str_toks_cat(b,cat); ins_list(link(temp_head));
10583end;
10584
10585@ Not all catcode values are allowed by \.{\\Ucharcat}:
10586@d illegal_Ucharcat_catcode(#)==(#<left_brace)or(#>other_char)or(#=out_param)or(#=ignore)
10587
10588@<Scan the argument for command |c|@>=
10589case c of
10590number_code,roman_numeral_code: scan_int;
10591string_code, meaning_code: begin save_scanner_status:=scanner_status;
10592  scanner_status:=normal; get_token; scanner_status:=save_scanner_status;
10593  end;
10594font_name_code: scan_font_ident;
10595XeTeX_Uchar_code: scan_usv_num;
10596XeTeX_Ucharcat_code:
10597  begin
10598    scan_usv_num;
10599    saved_chr:=cur_val;
10600    scan_int;
10601    if illegal_Ucharcat_catcode(cur_val) then
10602      begin print_err("Invalid code ("); print_int(cur_val);
10603@.Invalid code@>
10604      print("), should be in the ranges 1..4, 6..8, 10..12");
10605      help1("I'm going to use 12 instead of that illegal code value.");@/
10606      error; cat:=12;
10607    end else
10608     cat:=cur_val;
10609    cur_val:=saved_chr;
10610  end;
10611@/@<Cases of `Scan the argument for command |c|'@>@/
10612job_name_code: if job_name=0 then open_log_file;
10613end {there are no other cases}
10614
10615@ @<Print the result of command |c|@>=
10616case c of
10617number_code: print_int(cur_val);
10618roman_numeral_code: print_roman_int(cur_val);
10619string_code:if cur_cs<>0 then sprint_cs(cur_cs)
10620  else print_char(cur_chr);
10621meaning_code: print_meaning;
10622font_name_code: begin
10623  font_name_str:=font_name[cur_val];
10624  if is_native_font(cur_val) then begin
10625    quote_char:="""";
10626    for i:=0 to length(font_name_str) - 1 do
10627     if str_pool[str_start_macro(font_name_str) + i] = """" then quote_char:="'";
10628    print_char(quote_char);
10629    print(font_name_str);
10630    print_char(quote_char);
10631  end else
10632    print(font_name_str);
10633  if font_size[cur_val]<>font_dsize[cur_val] then
10634    begin print(" at "); print_scaled(font_size[cur_val]);
10635    print("pt");
10636    end;
10637  end;
10638XeTeX_Uchar_code,
10639XeTeX_Ucharcat_code: print_char(cur_val);
10640@/@<Cases of `Print the result of command |c|'@>@/
10641job_name_code: print_file_name(job_name, 0, 0);
10642end {there are no other cases}
10643
10644@ Now we can't postpone the difficulties any longer; we must bravely tackle
10645|scan_toks|. This function returns a pointer to the tail of a new token
10646list, and it also makes |def_ref| point to the reference count at the
10647head of that list.
10648
10649There are two boolean parameters, |macro_def| and |xpand|. If |macro_def|
10650is true, the goal is to create the token list for a macro definition;
10651otherwise the goal is to create the token list for some other \TeX\
10652primitive: \.{\\mark}, \.{\\output}, \.{\\everypar}, \.{\\lowercase},
10653\.{\\uppercase}, \.{\\message}, \.{\\errmessage}, \.{\\write}, or
10654\.{\\special}. In the latter cases a left brace must be scanned next; this
10655left brace will not be part of the token list, nor will the matching right
10656brace that comes at the end. If |xpand| is false, the token list will
10657simply be copied from the input using |get_token|. Otherwise all expandable
10658tokens will be expanded until unexpandable tokens are left, except that
10659the results of expanding `\.{\\the}' are not expanded further.
10660If both |macro_def| and |xpand| are true, the expansion applies
10661only to the macro body (i.e., to the material following the first
10662|left_brace| character).
10663
10664The value of |cur_cs| when |scan_toks| begins should be the |eqtb|
10665address of the control sequence to display in ``runaway'' error
10666messages.
10667
10668@p function scan_toks(@!macro_def,@!xpand:boolean):pointer;
10669label found,done,done1,done2;
10670var t:halfword; {token representing the highest parameter number}
10671@!s:halfword; {saved token}
10672@!p:pointer; {tail of the token list being built}
10673@!q:pointer; {new node being added to the token list via |store_new_token|}
10674@!unbalance:halfword; {number of unmatched left braces}
10675@!hash_brace:halfword; {possible `\.{\#\{}' token}
10676begin if macro_def then scanner_status:=defining
10677@+else scanner_status:=absorbing;
10678warning_index:=cur_cs; def_ref:=get_avail; token_ref_count(def_ref):=null;
10679p:=def_ref; hash_brace:=0; t:=zero_token;
10680if macro_def then @<Scan and build the parameter part of the macro definition@>
10681else scan_left_brace; {remove the compulsory left brace}
10682@<Scan and build the body of the token list; |goto found| when finished@>;
10683found: scanner_status:=normal;
10684if hash_brace<>0 then store_new_token(hash_brace);
10685scan_toks:=p;
10686end;
10687
10688@ @<Scan and build the parameter part...@>=
10689begin loop begin get_token; {set |cur_cmd|, |cur_chr|, |cur_tok|}
10690  if cur_tok<right_brace_limit then goto done1;
10691  if cur_cmd=mac_param then
10692    @<If the next character is a parameter number, make |cur_tok|
10693      a |match| token; but if it is a left brace, store
10694      `|left_brace|, |end_match|', set |hash_brace|, and |goto done|@>;
10695  store_new_token(cur_tok);
10696  end;
10697done1: store_new_token(end_match_token);
10698if cur_cmd=right_brace then
10699  @<Express shock at the missing left brace; |goto found|@>;
10700done: end
10701
10702@ @<Express shock...@>=
10703begin print_err("Missing { inserted"); incr(align_state);
10704@.Missing \{ inserted@>
10705help2("Where was the left brace? You said something like `\def\a}',")@/
10706  ("which I'm going to interpret as `\def\a{}'."); error; goto found;
10707end
10708
10709@ @<If the next character is a parameter number...@>=
10710begin s:=match_token+cur_chr; get_token;
10711if cur_cmd=left_brace then
10712  begin hash_brace:=cur_tok;
10713  store_new_token(cur_tok); store_new_token(end_match_token);
10714  goto done;
10715  end;
10716if t=zero_token+9 then
10717  begin print_err("You already have nine parameters");
10718@.You already have nine...@>
10719  help1("I'm going to ignore the # sign you just used."); error;
10720  end
10721else  begin incr(t);
10722  if cur_tok<>t then
10723    begin print_err("Parameters must be numbered consecutively");
10724@.Parameters...consecutively@>
10725    help2("I've inserted the digit you should have used after the #.")@/
10726      ("Type `1' to delete what you did use."); back_error;
10727    end;
10728  cur_tok:=s;
10729  end;
10730end
10731
10732@ @<Scan and build the body of the token list; |goto found| when finished@>=
10733unbalance:=1;
10734loop@+  begin if xpand then @<Expand the next part of the input@>
10735  else get_token;
10736  if cur_tok<right_brace_limit then
10737    if cur_cmd<right_brace then incr(unbalance)
10738    else  begin decr(unbalance);
10739      if unbalance=0 then goto found;
10740      end
10741  else if cur_cmd=mac_param then
10742    if macro_def then @<Look for parameter number or \.{\#\#}@>;
10743  store_new_token(cur_tok);
10744  end
10745
10746@ Here we insert an entire token list created by |the_toks| without
10747expanding it further.
10748
10749@<Expand the next part of the input@>=
10750begin loop begin get_next;
10751  if cur_cmd>=call then
10752    if info(link(cur_chr))=protected_token then
10753      begin cur_cmd:=relax; cur_chr:=no_expand_flag;
10754      end;
10755  if cur_cmd<=max_command then goto done2;
10756  if cur_cmd<>the then expand
10757  else  begin q:=the_toks;
10758    if link(temp_head)<>null then
10759      begin link(p):=link(temp_head); p:=q;
10760      end;
10761    end;
10762  end;
10763done2: x_token
10764end
10765
10766@ @<Look for parameter number...@>=
10767begin s:=cur_tok;
10768if xpand then get_x_token else get_token;
10769if cur_cmd<>mac_param then
10770  if (cur_tok<=zero_token)or(cur_tok>t) then
10771    begin print_err("Illegal parameter number in definition of ");
10772@.Illegal parameter number...@>
10773    sprint_cs(warning_index);
10774    help3("You meant to type ## instead of #, right?")@/
10775    ("Or maybe a } was forgotten somewhere earlier, and things")@/
10776    ("are all screwed up? I'm going to assume that you meant ##.");
10777    back_error; cur_tok:=s;
10778    end
10779  else cur_tok:=out_param_token-"0"+cur_chr;
10780end
10781
10782@ Another way to create a token list is via the \.{\\read} command. The
10783sixteen files potentially usable for reading appear in the following
10784global variables. The value of |read_open[n]| will be |closed| if
10785stream number |n| has not been opened or if it has been fully read;
10786|just_open| if an \.{\\openin} but not a \.{\\read} has been done;
10787and |normal| if it is open and ready to read the next line.
10788
10789@d closed=2 {not open, or at end of file}
10790@d just_open=1 {newly opened, first line not yet read}
10791
10792@<Glob...@>=
10793@!read_file:array[0..15] of unicode_file; {used for \.{\\read}}
10794@!read_open:array[0..16] of normal..closed; {state of |read_file[n]|}
10795
10796@ @<Set init...@>=
10797for k:=0 to 16 do read_open[k]:=closed;
10798
10799@ The |read_toks| procedure constructs a token list like that for any
10800macro definition, and makes |cur_val| point to it. Parameter |r| points
10801to the control sequence that will receive this token list.
10802
10803@p procedure read_toks(@!n:integer;@!r:pointer;@!j:halfword);
10804label done;
10805var p:pointer; {tail of the token list}
10806@!q:pointer; {new node being added to the token list via |store_new_token|}
10807@!s:integer; {saved value of |align_state|}
10808@!m:small_number; {stream number}
10809begin scanner_status:=defining; warning_index:=r;
10810def_ref:=get_avail; token_ref_count(def_ref):=null;
10811p:=def_ref; {the reference count}
10812store_new_token(end_match_token);
10813if (n<0)or(n>15) then m:=16@+else m:=n;
10814s:=align_state; align_state:=1000000; {disable tab marks, etc.}
10815repeat @<Input and store tokens from the next line of the file@>;
10816until align_state=1000000;
10817cur_val:=def_ref; scanner_status:=normal; align_state:=s;
10818end;
10819
10820@ @<Input and store tokens from the next line of the file@>=
10821begin_file_reading; name:=m+1;
10822if read_open[m]=closed then @<Input for \.{\\read} from the terminal@>
10823else if read_open[m]=just_open then @<Input the first line of |read_file[m]|@>
10824else @<Input the next line of |read_file[m]|@>;
10825limit:=last;
10826if end_line_char_inactive then decr(limit)
10827else  buffer[limit]:=end_line_char;
10828first:=limit+1; loc:=start; state:=new_line;@/
10829@<Handle \.{\\readline} and |goto done|@>;@/
10830loop@+  begin get_token;
10831  if cur_tok=0 then goto done;
10832    {|cur_cmd=cur_chr=0| will occur at the end of the line}
10833  if align_state<1000000 then {unmatched `\.\}' aborts the line}
10834    begin repeat get_token; until cur_tok=0;
10835    align_state:=1000000; goto done;
10836    end;
10837  store_new_token(cur_tok);
10838  end;
10839done: end_file_reading
10840
10841@ Here we input on-line into the |buffer| array, prompting the user explicitly
10842if |n>=0|.  The value of |n| is set negative so that additional prompts
10843will not be given in the case of multi-line input.
10844
10845@<Input for \.{\\read} from the terminal@>=
10846if interaction>nonstop_mode then
10847  if n<0 then prompt_input("")
10848  else  begin wake_up_terminal;
10849    print_ln; sprint_cs(r); prompt_input("="); n:=-1;
10850    end
10851else fatal_error("*** (cannot \read from terminal in nonstop modes)")
10852@.cannot \\read@>
10853
10854@ The first line of a file must be treated specially, since |input_ln|
10855must be told not to start with |get|.
10856@^system dependencies@>
10857
10858@<Input the first line of |read_file[m]|@>=
10859if input_ln(read_file[m],false) then read_open[m]:=normal
10860else  begin u_close(read_file[m]); read_open[m]:=closed;
10861  end
10862
10863@ An empty line is appended at the end of a |read_file|.
10864@^empty line at end of file@>
10865
10866@<Input the next line of |read_file[m]|@>=
10867begin if not input_ln(read_file[m],true) then
10868  begin u_close(read_file[m]); read_open[m]:=closed;
10869  if align_state<>1000000 then
10870    begin runaway;
10871    print_err("File ended within "); print_esc("read");
10872@.File ended within \\read@>
10873    help1("This \read has unbalanced braces.");
10874    align_state:=1000000; error;
10875    end;
10876  end;
10877end
10878
10879@* \[28] Conditional processing.
10880We consider now the way \TeX\ handles various kinds of \.{\\if} commands.
10881
10882@d unless_code=32 {amount added for `\.{\\unless}' prefix}
10883@#
10884@d if_char_code=0 { `\.{\\if}' }
10885@d if_cat_code=1 { `\.{\\ifcat}' }
10886@d if_int_code=2 { `\.{\\ifnum}' }
10887@d if_dim_code=3 { `\.{\\ifdim}' }
10888@d if_odd_code=4 { `\.{\\ifodd}' }
10889@d if_vmode_code=5 { `\.{\\ifvmode}' }
10890@d if_hmode_code=6 { `\.{\\ifhmode}' }
10891@d if_mmode_code=7 { `\.{\\ifmmode}' }
10892@d if_inner_code=8 { `\.{\\ifinner}' }
10893@d if_void_code=9 { `\.{\\ifvoid}' }
10894@d if_hbox_code=10 { `\.{\\ifhbox}' }
10895@d if_vbox_code=11 { `\.{\\ifvbox}' }
10896@d ifx_code=12 { `\.{\\ifx}' }
10897@d if_eof_code=13 { `\.{\\ifeof}' }
10898@d if_true_code=14 { `\.{\\iftrue}' }
10899@d if_false_code=15 { `\.{\\iffalse}' }
10900@d if_case_code=16 { `\.{\\ifcase}' }
10901@d if_primitive_code=21 { `\.{\\ifprimitive}' }
10902
10903@<Put each...@>=
10904primitive("if",if_test,if_char_code);
10905@!@:if_char_}{\.{\\if} primitive@>
10906primitive("ifcat",if_test,if_cat_code);
10907@!@:if_cat_code_}{\.{\\ifcat} primitive@>
10908primitive("ifnum",if_test,if_int_code);
10909@!@:if_int_}{\.{\\ifnum} primitive@>
10910primitive("ifdim",if_test,if_dim_code);
10911@!@:if_dim_}{\.{\\ifdim} primitive@>
10912primitive("ifodd",if_test,if_odd_code);
10913@!@:if_odd_}{\.{\\ifodd} primitive@>
10914primitive("ifvmode",if_test,if_vmode_code);
10915@!@:if_vmode_}{\.{\\ifvmode} primitive@>
10916primitive("ifhmode",if_test,if_hmode_code);
10917@!@:if_hmode_}{\.{\\ifhmode} primitive@>
10918primitive("ifmmode",if_test,if_mmode_code);
10919@!@:if_mmode_}{\.{\\ifmmode} primitive@>
10920primitive("ifinner",if_test,if_inner_code);
10921@!@:if_inner_}{\.{\\ifinner} primitive@>
10922primitive("ifvoid",if_test,if_void_code);
10923@!@:if_void_}{\.{\\ifvoid} primitive@>
10924primitive("ifhbox",if_test,if_hbox_code);
10925@!@:if_hbox_}{\.{\\ifhbox} primitive@>
10926primitive("ifvbox",if_test,if_vbox_code);
10927@!@:if_vbox_}{\.{\\ifvbox} primitive@>
10928primitive("ifx",if_test,ifx_code);
10929@!@:ifx_}{\.{\\ifx} primitive@>
10930primitive("ifeof",if_test,if_eof_code);
10931@!@:if_eof_}{\.{\\ifeof} primitive@>
10932primitive("iftrue",if_test,if_true_code);
10933@!@:if_true_}{\.{\\iftrue} primitive@>
10934primitive("iffalse",if_test,if_false_code);
10935@!@:if_false_}{\.{\\iffalse} primitive@>
10936primitive("ifcase",if_test,if_case_code);
10937@!@:if_case_}{\.{\\ifcase} primitive@>
10938primitive("ifprimitive",if_test,if_primitive_code);
10939@!@:if_primitive_}{\.{\\ifprimitive} primitive@>
10940
10941@ @<Cases of |print_cmd_chr|...@>=
10942if_test: begin if chr_code>=unless_code then print_esc("unless");
10943case chr_code mod unless_code of
10944  if_cat_code:print_esc("ifcat");
10945  if_int_code:print_esc("ifnum");
10946  if_dim_code:print_esc("ifdim");
10947  if_odd_code:print_esc("ifodd");
10948  if_vmode_code:print_esc("ifvmode");
10949  if_hmode_code:print_esc("ifhmode");
10950  if_mmode_code:print_esc("ifmmode");
10951  if_inner_code:print_esc("ifinner");
10952  if_void_code:print_esc("ifvoid");
10953  if_hbox_code:print_esc("ifhbox");
10954  if_vbox_code:print_esc("ifvbox");
10955  ifx_code:print_esc("ifx");
10956  if_eof_code:print_esc("ifeof");
10957  if_true_code:print_esc("iftrue");
10958  if_false_code:print_esc("iffalse");
10959  if_case_code:print_esc("ifcase");
10960  if_primitive_code:print_esc("ifprimitive");
10961  @/@<Cases of |if_test| for |print_cmd_chr|@>@/
10962  othercases print_esc("if")
10963  endcases;
10964end;
10965
10966@ Conditions can be inside conditions, and this nesting has a stack
10967that is independent of the |save_stack|.
10968
10969Four global variables represent the top of the condition stack:
10970|cond_ptr| points to pushed-down entries, if any; |if_limit| specifies
10971the largest code of a |fi_or_else| command that is syntactically legal;
10972|cur_if| is the name of the current type of conditional; and |if_line|
10973is the line number at which it began.
10974
10975If no conditions are currently in progress, the condition stack has the
10976special state |cond_ptr=null|, |if_limit=normal|, |cur_if=0|, |if_line=0|.
10977Otherwise |cond_ptr| points to a two-word node; the |type|, |subtype|, and
10978|link| fields of the first word contain |if_limit|, |cur_if|, and
10979|cond_ptr| at the next level, and the second word contains the
10980corresponding |if_line|.
10981
10982@d if_node_size=2 {number of words in stack entry for conditionals}
10983@d if_line_field(#)==mem[#+1].int
10984@d if_code=1 {code for \.{\\if...} being evaluated}
10985@d fi_code=2 {code for \.{\\fi}}
10986@d else_code=3 {code for \.{\\else}}
10987@d or_code=4 {code for \.{\\or}}
10988
10989@<Glob...@>=
10990@!cond_ptr:pointer; {top of the condition stack}
10991@!if_limit:normal..or_code; {upper bound on |fi_or_else| codes}
10992@!cur_if:small_number; {type of conditional being worked on}
10993@!if_line:integer; {line where that conditional began}
10994
10995@ @<Set init...@>=
10996cond_ptr:=null; if_limit:=normal; cur_if:=0; if_line:=0;
10997
10998@ @<Put each...@>=
10999primitive("fi",fi_or_else,fi_code);
11000@!@:fi_}{\.{\\fi} primitive@>
11001text(frozen_fi):="fi"; eqtb[frozen_fi]:=eqtb[cur_val];
11002primitive("or",fi_or_else,or_code);
11003@!@:or_}{\.{\\or} primitive@>
11004primitive("else",fi_or_else,else_code);
11005@!@:else_}{\.{\\else} primitive@>
11006
11007@ @<Cases of |print_cmd_chr|...@>=
11008fi_or_else: if chr_code=fi_code then print_esc("fi")
11009  else if chr_code=or_code then print_esc("or")
11010  else print_esc("else");
11011
11012@ When we skip conditional text, we keep track of the line number
11013where skipping began, for use in error messages.
11014
11015@<Glob...@>=
11016@!skip_line:integer; {skipping began here}
11017
11018@ Here is a procedure that ignores text until coming to an \.{\\or},
11019\.{\\else}, or \.{\\fi} at level zero of $\.{\\if}\ldots\.{\\fi}$
11020nesting. After it has acted, |cur_chr| will indicate the token that
11021was found, but |cur_tok| will not be set (because this makes the
11022procedure run faster).
11023
11024@p procedure pass_text;
11025label done;
11026var l:integer; {level of $\.{\\if}\ldots\.{\\fi}$ nesting}
11027@!save_scanner_status:small_number; {|scanner_status| upon entry}
11028begin save_scanner_status:=scanner_status; scanner_status:=skipping; l:=0;
11029skip_line:=line;
11030loop@+  begin get_next;
11031  if cur_cmd=fi_or_else then
11032    begin if l=0 then goto done;
11033    if cur_chr=fi_code then decr(l);
11034    end
11035  else if cur_cmd=if_test then incr(l);
11036  end;
11037done: scanner_status:=save_scanner_status;
11038if tracing_ifs>0 then show_cur_cmd_chr;
11039end;
11040
11041@ When we begin to process a new \.{\\if}, we set |if_limit:=if_code|; then
11042if\/ \.{\\or} or \.{\\else} or \.{\\fi} occurs before the current \.{\\if}
11043condition has been evaluated, \.{\\relax} will be inserted.
11044For example, a sequence of commands like `\.{\\ifvoid1\\else...\\fi}'
11045would otherwise require something after the `\.1'.
11046
11047@<Push the condition stack@>=
11048begin p:=get_node(if_node_size); link(p):=cond_ptr; type(p):=if_limit;
11049subtype(p):=cur_if; if_line_field(p):=if_line;
11050cond_ptr:=p; cur_if:=cur_chr; if_limit:=if_code; if_line:=line;
11051end
11052
11053@ @<Pop the condition stack@>=
11054begin if if_stack[in_open]=cond_ptr then if_warning;
11055  {conditionals possibly not properly nested with files}
11056p:=cond_ptr; if_line:=if_line_field(p);
11057cur_if:=subtype(p); if_limit:=type(p); cond_ptr:=link(p);
11058free_node(p,if_node_size);
11059end
11060
11061@ Here's a procedure that changes the |if_limit| code corresponding to
11062a given value of |cond_ptr|.
11063
11064@p procedure change_if_limit(@!l:small_number;@!p:pointer);
11065label exit;
11066var q:pointer;
11067begin if p=cond_ptr then if_limit:=l {that's the easy case}
11068else  begin q:=cond_ptr;
11069  loop@+  begin if q=null then confusion("if");
11070@:this can't happen if}{\quad if@>
11071    if link(q)=p then
11072      begin type(q):=l; return;
11073      end;
11074    q:=link(q);
11075    end;
11076  end;
11077exit:end;
11078
11079@ A condition is started when the |expand| procedure encounters
11080an |if_test| command; in that case |expand| reduces to |conditional|,
11081which is a recursive procedure.
11082@^recursion@>
11083
11084@p procedure conditional;
11085label exit,common_ending;
11086var b:boolean; {is the condition true?}
11087@!e:boolean; {keep track of nested csnames}
11088@!r:"<"..">"; {relation to be evaluated}
11089@!m,@!n:integer; {to be tested against the second operand}
11090@!p,@!q:pointer; {for traversing token lists in \.{\\ifx} tests}
11091@!save_scanner_status:small_number; {|scanner_status| upon entry}
11092@!save_cond_ptr:pointer; {|cond_ptr| corresponding to this conditional}
11093@!this_if:small_number; {type of this conditional}
11094@!is_unless:boolean; {was this if preceded by `\.{\\unless}' ?}
11095begin if tracing_ifs>0 then if tracing_commands<=1 then show_cur_cmd_chr;
11096@<Push the condition stack@>;@+save_cond_ptr:=cond_ptr;
11097is_unless:=(cur_chr>=unless_code); this_if:=cur_chr mod unless_code;@/
11098@<Either process \.{\\ifcase} or set |b| to the value of a boolean condition@>;
11099if is_unless then b:=not b;
11100if tracing_commands>1 then @<Display the value of |b|@>;
11101if b then
11102  begin change_if_limit(else_code,save_cond_ptr);
11103  return; {wait for \.{\\else} or \.{\\fi}}
11104  end;
11105@<Skip to \.{\\else} or \.{\\fi}, then |goto common_ending|@>;
11106common_ending: if cur_chr=fi_code then @<Pop the condition stack@>
11107else if_limit:=fi_code; {wait for \.{\\fi}}
11108exit:end;
11109
11110@ In a construction like `\.{\\if\\iftrue abc\\else d\\fi}', the first
11111\.{\\else} that we come to after learning that the \.{\\if} is false is
11112not the \.{\\else} we're looking for. Hence the following curious
11113logic is needed.
11114
11115@ @<Skip to \.{\\else} or \.{\\fi}...@>=
11116loop@+  begin pass_text;
11117  if cond_ptr=save_cond_ptr then
11118    begin if cur_chr<>or_code then goto common_ending;
11119    print_err("Extra "); print_esc("or");
11120@.Extra \\or@>
11121    help1("I'm ignoring this; it doesn't match any \if.");
11122    error;
11123    end
11124  else if cur_chr=fi_code then @<Pop the condition stack@>;
11125  end
11126
11127@ @<Either process \.{\\ifcase} or set |b|...@>=
11128case this_if of
11129if_char_code, if_cat_code: @<Test if two characters match@>;
11130if_int_code, if_dim_code: @<Test relation between integers or dimensions@>;
11131if_odd_code: @<Test if an integer is odd@>;
11132if_vmode_code: b:=(abs(mode)=vmode);
11133if_hmode_code: b:=(abs(mode)=hmode);
11134if_mmode_code: b:=(abs(mode)=mmode);
11135if_inner_code: b:=(mode<0);
11136if_void_code, if_hbox_code, if_vbox_code: @<Test box register status@>;
11137ifx_code: @<Test if two tokens match@>;
11138if_eof_code: begin scan_four_bit_int; b:=(read_open[cur_val]=closed);
11139  end;
11140if_true_code: b:=true;
11141if_false_code: b:=false;
11142@/@<Cases for |conditional|@>@/
11143if_case_code: @<Select the appropriate case
11144  and |return| or |goto common_ending|@>;
11145if_primitive_code: begin
11146  save_scanner_status:=scanner_status;
11147  scanner_status:=normal;
11148  get_next;
11149  scanner_status:=save_scanner_status;
11150  if cur_cs < hash_base then
11151    m:=prim_lookup(cur_cs-257)
11152  else
11153    m:=prim_lookup(text(cur_cs));
11154  b :=((cur_cmd<>undefined_cs) and
11155       (m<>undefined_primitive) and
11156       (cur_cmd=prim_eq_type(m)) and
11157       (cur_chr=prim_equiv(m)));
11158  end;
11159end {there are no other cases}
11160
11161@ @<Display the value of |b|@>=
11162begin begin_diagnostic;
11163if b then print("{true}")@+else print("{false}");
11164end_diagnostic(false);
11165end
11166
11167@ Here we use the fact that |"<"|, |"="|, and |">"| are consecutive ASCII
11168codes.
11169@^ASCII code@>
11170
11171@<Test relation between integers or dimensions@>=
11172begin if this_if=if_int_code then scan_int@+else scan_normal_dimen;
11173n:=cur_val; @<Get the next non-blank non-call...@>;
11174if (cur_tok>=other_token+"<")and(cur_tok<=other_token+">") then
11175  r:=cur_tok-other_token
11176else  begin print_err("Missing = inserted for ");
11177@.Missing = inserted@>
11178  print_cmd_chr(if_test,this_if);
11179  help1("I was expecting to see `<', `=', or `>'. Didn't.");
11180  back_error; r:="=";
11181  end;
11182if this_if=if_int_code then scan_int@+else scan_normal_dimen;
11183case r of
11184"<": b:=(n<cur_val);
11185"=": b:=(n=cur_val);
11186">": b:=(n>cur_val);
11187end;
11188end
11189
11190@ @<Test if an integer is odd@>=
11191begin scan_int; b:=odd(cur_val);
11192end
11193
11194@ @<Test box register status@>=
11195begin scan_register_num; fetch_box(p);
11196if this_if=if_void_code then b:=(p=null)
11197else if p=null then b:=false
11198else if this_if=if_hbox_code then b:=(type(p)=hlist_node)
11199else b:=(type(p)=vlist_node);
11200end
11201
11202@ An active character will be treated as category 13 following
11203\.{\\if\\noexpand} or following \.{\\ifcat\\noexpand}. We use the fact that
11204active characters have the smallest tokens, among all control sequences.
11205
11206@d get_x_token_or_active_char==@t@>@;
11207  begin get_x_token;
11208  if cur_cmd=relax then if cur_chr=no_expand_flag then
11209    begin cur_cmd:=active_char;
11210    cur_chr:=cur_tok-cs_token_flag-active_base;
11211    end;
11212  end
11213
11214@<Test if two characters match@>=
11215begin get_x_token_or_active_char;
11216if (cur_cmd>active_char)or(cur_chr>biggest_usv) then {not a character}
11217  begin m:=relax; n:=too_big_usv;
11218  end
11219else  begin m:=cur_cmd; n:=cur_chr;
11220  end;
11221get_x_token_or_active_char;
11222if (cur_cmd>active_char)or(cur_chr>biggest_usv) then
11223  begin cur_cmd:=relax; cur_chr:=too_big_usv;
11224  end;
11225if this_if=if_char_code then b:=(n=cur_chr)@+else b:=(m=cur_cmd);
11226end
11227
11228@ Note that `\.{\\ifx}' will declare two macros different if one is \\{long}
11229or \\{outer} and the other isn't, even though the texts of the macros are
11230the same.
11231
11232We need to reset |scanner_status|, since \.{\\outer} control sequences
11233are allowed, but we might be scanning a macro definition or preamble.
11234
11235@<Test if two tokens match@>=
11236begin save_scanner_status:=scanner_status; scanner_status:=normal;
11237get_next; n:=cur_cs; p:=cur_cmd; q:=cur_chr;
11238get_next; if cur_cmd<>p then b:=false
11239else if cur_cmd<call then b:=(cur_chr=q)
11240else @<Test if two macro texts match@>;
11241scanner_status:=save_scanner_status;
11242end
11243
11244@ Note also that `\.{\\ifx}' decides that macros \.{\\a} and \.{\\b} are
11245different in examples like this:
11246$$\vbox{\halign{\.{#}\hfil&\qquad\.{#}\hfil\cr
11247  {}\\def\\a\{\\c\}&
11248  {}\\def\\c\{\}\cr
11249  {}\\def\\b\{\\d\}&
11250  {}\\def\\d\{\}\cr}}$$
11251
11252@<Test if two macro texts match@>=
11253begin p:=link(cur_chr); q:=link(equiv(n)); {omit reference counts}
11254if p=q then b:=true
11255else begin while (p<>null)and(q<>null) do
11256    if info(p)<>info(q) then p:=null
11257    else  begin p:=link(p); q:=link(q);
11258      end;
11259  b:=((p=null)and(q=null));
11260  end;
11261end
11262
11263@ @<Select the appropriate case and |return| or |goto common_ending|@>=
11264begin scan_int; n:=cur_val; {|n| is the number of cases to pass}
11265if tracing_commands>1 then
11266  begin begin_diagnostic; print("{case "); print_int(n); print_char("}");
11267  end_diagnostic(false);
11268  end;
11269while n<>0 do
11270  begin pass_text;
11271  if cond_ptr=save_cond_ptr then
11272    if cur_chr=or_code then decr(n)
11273    else goto common_ending
11274  else if cur_chr=fi_code then @<Pop the condition stack@>;
11275  end;
11276change_if_limit(or_code,save_cond_ptr);
11277return; {wait for \.{\\or}, \.{\\else}, or \.{\\fi}}
11278end
11279
11280@ The processing of conditionals is complete except for the following
11281code, which is actually part of |expand|. It comes into play when
11282\.{\\or}, \.{\\else}, or \.{\\fi} is scanned.
11283
11284@<Terminate the current conditional and skip to \.{\\fi}@>=
11285begin if tracing_ifs>0 then if tracing_commands<=1 then show_cur_cmd_chr;
11286if cur_chr>if_limit then
11287  if if_limit=if_code then insert_relax {condition not yet evaluated}
11288  else  begin print_err("Extra "); print_cmd_chr(fi_or_else,cur_chr);
11289@.Extra \\or@>
11290@.Extra \\else@>
11291@.Extra \\fi@>
11292    help1("I'm ignoring this; it doesn't match any \if.");
11293    error;
11294    end
11295else  begin while cur_chr<>fi_code do pass_text; {skip to \.{\\fi}}
11296  @<Pop the condition stack@>;
11297  end;
11298end
11299
11300@* \[29] File names.
11301It's time now to fret about file names.  Besides the fact that different
11302operating systems treat files in different ways, we must cope with the
11303fact that completely different naming conventions are used by different
11304groups of people. The following programs show what is required for one
11305particular operating system; similar routines for other systems are not
11306difficult to devise.
11307@^fingers@>
11308@^system dependencies@>
11309
11310\TeX\ assumes that a file name has three parts: the name proper; its
11311``extension''; and a ``file area'' where it is found in an external file
11312system.  The extension of an input file or a write file is assumed to be
11313`\.{.tex}' unless otherwise specified; it is `\.{.log}' on the
11314transcript file that records each run of \TeX; it is `\.{.tfm}' on the font
11315metric files that describe characters in the fonts \TeX\ uses; it is
11316`\.{.dvi}' on the output files that specify typesetting information; and it
11317is `\.{.fmt}' on the format files written by \.{INITEX} to initialize \TeX.
11318The file area can be arbitrary on input files, but files are usually
11319output to the user's current area.  If an input file cannot be
11320found on the specified area, \TeX\ will look for it on a special system
11321area; this special area is intended for commonly used input files like
11322\.{webmac.tex}.
11323
11324Simple uses of \TeX\ refer only to file names that have no explicit
11325extension or area. For example, a person usually says `\.{\\input} \.{paper}'
11326or `\.{\\font\\tenrm} \.= \.{helvetica}' instead of `\.{\\input}
11327\.{paper.new}' or `\.{\\font\\tenrm} \.= \.{<csd.knuth>test}'. Simple file
11328names are best, because they make the \TeX\ source files portable;
11329whenever a file name consists entirely of letters and digits, it should be
11330treated in the same way by all implementations of \TeX. However, users
11331need the ability to refer to other files in their environment, especially
11332when responding to error messages concerning unopenable files; therefore
11333we want to let them use the syntax that appears in their favorite
11334operating system.
11335
11336@ In order to isolate the system-dependent aspects of file names, the
11337@^system dependencies@>
11338system-independent parts of \TeX\ are expressed in terms
11339of three system-dependent
11340procedures called |begin_name|, |more_name|, and |end_name|. In
11341essence, if the user-specified characters of the file name are $c_1\ldots c_n$,
11342the system-independent driver program does the operations
11343$$|begin_name|;\,|more_name|(c_1);\,\ldots\,;\,|more_name|(c_n);
11344\,|end_name|.$$
11345These three procedures communicate with each other via global variables.
11346Afterwards the file name will appear in the string pool as three strings
11347called |cur_name|\penalty10000\hskip-.05em,
11348|cur_area|, and |cur_ext|; the latter two are null (i.e.,
11349|""|), unless they were explicitly specified by the user.
11350
11351Actually the situation is slightly more complicated, because \TeX\ needs
11352to know when the file name ends. The |more_name| routine is a function
11353(with side effects) that returns |true| on the calls |more_name|$(c_1)$,
11354\dots, |more_name|$(c_{n-1})$. The final call |more_name|$(c_n)$
11355returns |false|; or, it returns |true| and the token following $c_n$ is
11356something like `\.{\\hbox}' (i.e., not a character). In other words,
11357|more_name| is supposed to return |true| unless it is sure that the
11358file name has been completely scanned; and |end_name| is supposed to be able
11359to finish the assembly of |cur_name|, |cur_area|, and |cur_ext| regardless of
11360whether $|more_name|(c_n)$ returned |true| or |false|.
11361
11362@<Glob...@>=
11363@!cur_name:str_number; {name of file just scanned}
11364@!cur_area:str_number; {file area just scanned, or \.{""}}
11365@!cur_ext:str_number; {file extension just scanned, or \.{""}}
11366
11367@ The file names we shall deal with for illustrative purposes have the
11368following structure:  If the name contains `\.>' or `\.:', the file area
11369consists of all characters up to and including the final such character;
11370otherwise the file area is null.  If the remaining file name contains
11371`\..', the file extension consists of all such characters from the first
11372remaining `\..' to the end, otherwise the file extension is null.
11373@^system dependencies@>
11374
11375We can scan such file names easily by using two global variables that keep track
11376of the occurrences of area and extension delimiters:
11377
11378@<Glob...@>=
11379@!area_delimiter:pool_pointer; {the most recent `\.>' or `\.:', if any}
11380@!ext_delimiter:pool_pointer; {the relevant `\..', if any}
11381@!file_name_quote_char:UTF16_code;
11382
11383@ Input files that can't be found in the user's area may appear in a standard
11384system area called |TEX_area|. Font metric files whose areas are not given
11385explicitly are assumed to appear in a standard system area called
11386|TEX_font_area|.  These system area names will, of course, vary from place
11387to place.
11388@^system dependencies@>
11389
11390@d TEX_area=="TeXinputs:"
11391@.TeXinputs@>
11392@d TEX_font_area=="TeXfonts:"
11393@.TeXfonts@>
11394
11395@ Here now is the first of the system-dependent routines for file name scanning.
11396@^system dependencies@>
11397
11398@p procedure begin_name;
11399begin area_delimiter:=0; ext_delimiter:=0;
11400file_name_quote_char:=0;
11401end;
11402
11403@ And here's the second. The string pool might change as the file name is
11404being scanned, since a new \.{\\csname} might be entered; therefore we keep
11405|area_delimiter| and |ext_delimiter| relative to the beginning of the current
11406string, instead of assigning an absolute address like |pool_ptr| to them.
11407@^system dependencies@>
11408
11409@p function more_name(@!c:ASCII_code):boolean;
11410begin if c=" " then more_name:=false
11411else  begin str_room(1); append_char(c); {contribute |c| to the current string}
11412  if (c=">")or(c=":") then
11413    begin area_delimiter:=cur_length; ext_delimiter:=0;
11414    end
11415  else if (c=".")and(ext_delimiter=0) then ext_delimiter:=cur_length;
11416  more_name:=true;
11417  end;
11418end;
11419
11420@ The third.
11421@^system dependencies@>
11422
11423@p procedure end_name;
11424begin if str_ptr+3>max_strings then
11425  overflow("number of strings",max_strings-init_str_ptr);
11426@:TeX capacity exceeded number of strings}{\quad number of strings@>
11427if area_delimiter=0 then cur_area:=""
11428else  begin cur_area:=str_ptr;
11429  str_start_macro(str_ptr+1):=str_start_macro(str_ptr)+area_delimiter; incr(str_ptr);
11430  end;
11431if ext_delimiter=0 then
11432  begin cur_ext:=""; cur_name:=make_string;
11433  end
11434else  begin cur_name:=str_ptr;
11435  str_start_macro(str_ptr+1):=str_start_macro(str_ptr)+ext_delimiter-area_delimiter-1;
11436  incr(str_ptr); cur_ext:=make_string;
11437  end;
11438end;
11439
11440@ Conversely, here is a routine that takes three strings and prints a file
11441name that might have produced them. (The routine is system dependent, because
11442some operating systems put the file area last instead of first.)
11443@^system dependencies@>
11444
11445@<Basic printing...@>=
11446procedure print_file_name(@!n,@!a,@!e:integer);
11447begin slow_print(a); slow_print(n); slow_print(e);
11448end;
11449
11450@ Another system-dependent routine is needed to convert three internal
11451\TeX\ strings
11452into the |name_of_file| value that is used to open files. The present code
11453allows both lowercase and uppercase letters in the file name.
11454@^system dependencies@>
11455
11456@d append_to_name(#)==begin c:=#; incr(k);
11457  if k<=file_name_size then name_of_file[k]:=xchr[c];
11458  end
11459
11460@p procedure pack_file_name(@!n,@!a,@!e:str_number);
11461var k:integer; {number of positions filled in |name_of_file|}
11462@!c: ASCII_code; {character being packed}
11463@!j:pool_pointer; {index into |str_pool|}
11464begin k:=0;
11465for j:=str_start_macro(a) to str_start_macro(a+1)-1 do append_to_name(so(str_pool[j]));
11466for j:=str_start_macro(n) to str_start_macro(n+1)-1 do append_to_name(so(str_pool[j]));
11467for j:=str_start_macro(e) to str_start_macro(e+1)-1 do append_to_name(so(str_pool[j]));
11468if k<=file_name_size then name_length:=k@+else name_length:=file_name_size;
11469for k:=name_length+1 to file_name_size do name_of_file[k]:=' ';
11470end;
11471
11472@ A messier routine is also needed, since format file names must be scanned
11473before \TeX's string mechanism has been initialized. We shall use the
11474global variable |TEX_format_default| to supply the text for default system areas
11475and extensions related to format files.
11476@^system dependencies@>
11477
11478@d format_default_length=20 {length of the |TEX_format_default| string}
11479@d format_area_length=11 {length of its area part}
11480@d format_ext_length=4 {length of its `\.{.fmt}' part}
11481@d format_extension=".fmt" {the extension, as a \.{WEB} constant}
11482
11483@<Glob...@>=
11484@!TEX_format_default:packed array[1..format_default_length] of char;
11485
11486@ @<Set init...@>=
11487TEX_format_default:='TeXformats:plain.fmt';
11488@.TeXformats@>
11489@.plain@>
11490@^system dependencies@>
11491
11492@ @<Check the ``constant'' values for consistency@>=
11493if format_default_length>file_name_size then bad:=31;
11494
11495@ Here is the messy routine that was just mentioned. It sets |name_of_file|
11496from the first |n| characters of |TEX_format_default|, followed by
11497|buffer[a..b]|, followed by the last |format_ext_length| characters of
11498|TEX_format_default|.
11499
11500We dare not give error messages here, since \TeX\ calls this routine before
11501the |error| routine is ready to roll. Instead, we simply drop excess characters,
11502since the error will be detected in another way when a strange file name
11503isn't found.
11504@^system dependencies@>
11505
11506@p procedure pack_buffered_name(@!n:small_number;@!a,@!b:integer);
11507var k:integer; {number of positions filled in |name_of_file|}
11508@!c: ASCII_code; {character being packed}
11509@!j:integer; {index into |buffer| or |TEX_format_default|}
11510begin if n+b-a+1+format_ext_length>file_name_size then
11511  b:=a+file_name_size-n-1-format_ext_length;
11512k:=0;
11513for j:=1 to n do append_to_name(xord[TEX_format_default[j]]);
11514for j:=a to b do append_to_name(buffer[j]);
11515for j:=format_default_length-format_ext_length+1 to format_default_length do
11516  append_to_name(xord[TEX_format_default[j]]);
11517if k<=file_name_size then name_length:=k@+else name_length:=file_name_size;
11518for k:=name_length+1 to file_name_size do name_of_file[k]:=' ';
11519end;
11520
11521@ Here is the only place we use |pack_buffered_name|. This part of the program
11522becomes active when a ``virgin'' \TeX\ is trying to get going, just after
11523the preliminary initialization, or when the user is substituting another
11524format file by typing `\.\&' after the initial `\.{**}' prompt.  The buffer
11525contains the first line of input in |buffer[loc..(last-1)]|, where
11526|loc<last| and |buffer[loc]<>" "|.
11527
11528@<Declare the function called |open_fmt_file|@>=
11529function open_fmt_file:boolean;
11530label found,exit;
11531var j:0..buf_size; {the first space after the format file name}
11532begin j:=loc;
11533if buffer[loc]="&" then
11534  begin incr(loc); j:=loc; buffer[last]:=" ";
11535  while buffer[j]<>" " do incr(j);
11536  pack_buffered_name(0,loc,j-1); {try first without the system file area}
11537  if w_open_in(fmt_file) then goto found;
11538  pack_buffered_name(format_area_length,loc,j-1);
11539    {now try the system format file area}
11540  if w_open_in(fmt_file) then goto found;
11541  wake_up_terminal;
11542  wterm_ln('Sorry, I can''t find that format;',' will try PLAIN.');
11543@.Sorry, I can't find...@>
11544  update_terminal;
11545  end;
11546  {now pull out all the stops: try for the system \.{plain} file}
11547pack_buffered_name(format_default_length-format_ext_length,1,0);
11548if not w_open_in(fmt_file) then
11549  begin wake_up_terminal;
11550  wterm_ln('I can''t find the PLAIN format file!');
11551@.I can't find PLAIN...@>
11552@.plain@>
11553  open_fmt_file:=false; return;
11554  end;
11555found:loc:=j; open_fmt_file:=true;
11556exit:end;
11557
11558@ Operating systems often make it possible to determine the exact name (and
11559possible version number) of a file that has been opened. The following routine,
11560which simply makes a \TeX\ string from the value of |name_of_file|, should
11561ideally be changed to deduce the full name of file~|f|, which is the file
11562most recently opened, if it is possible to do this in a \PASCAL\ program.
11563@^system dependencies@>
11564
11565This routine might be called after string memory has overflowed, hence
11566we dare not use `|str_room|'.
11567
11568@p function make_name_string:str_number;
11569var k:0..file_name_size; {index into |name_of_file|}
11570begin if (pool_ptr+name_length>pool_size)or(str_ptr=max_strings)or
11571 (cur_length>0) then
11572  make_name_string:="?"
11573else  begin
11574  make_utf16_name;
11575  for k:=0 to name_length16-1 do append_char(name_of_file16[k]);
11576  make_name_string:=make_string;
11577  end;
11578end;
11579function u_make_name_string(var f:unicode_file):str_number;
11580begin u_make_name_string:=make_name_string;
11581end;
11582function a_make_name_string(var f:alpha_file):str_number;
11583begin a_make_name_string:=make_name_string;
11584end;
11585function b_make_name_string(var f:byte_file):str_number;
11586begin b_make_name_string:=make_name_string;
11587end;
11588function w_make_name_string(var f:word_file):str_number;
11589begin w_make_name_string:=make_name_string;
11590end;
11591
11592@ Now let's consider the ``driver''
11593routines by which \TeX\ deals with file names
11594in a system-independent manner.  First comes a procedure that looks for a
11595file name in the input by calling |get_x_token| for the information.
11596
11597@p procedure scan_file_name;
11598label done;
11599begin name_in_progress:=true; begin_name;
11600@<Get the next non-blank non-call...@>;
11601loop@+begin if (cur_cmd>other_char)or(cur_chr>biggest_char) then
11602    {not a character}
11603    begin back_input; goto done;
11604    end;
11605  if not more_name(cur_chr) then goto done;
11606  get_x_token;
11607  end;
11608done: end_name; name_in_progress:=false;
11609end;
11610
11611@ The global variable |name_in_progress| is used to prevent recursive
11612use of |scan_file_name|, since the |begin_name| and other procedures
11613communicate via global variables. Recursion would arise only by
11614devious tricks like `\.{\\input\\input f}'; such attempts at sabotage
11615must be thwarted. Furthermore, |name_in_progress| prevents \.{\\input}
11616@^recursion@>
11617from being initiated when a font size specification is being scanned.
11618
11619Another global variable, |job_name|, contains the file name that was first
11620\.{\\input} by the user. This name is extended by `\.{.log}' and `\.{.dvi}'
11621and `\.{.fmt}' in the names of \TeX's output files.
11622
11623@<Glob...@>=
11624@!name_in_progress:boolean; {is a file name being scanned?}
11625@!job_name:str_number; {principal file name}
11626@!log_opened:boolean; {has the transcript file been opened?}
11627
11628@ Initially |job_name=0|; it becomes nonzero as soon as the true name is known.
11629We have |job_name=0| if and only if the `\.{log}' file has not been opened,
11630except of course for a short time just after |job_name| has become nonzero.
11631
11632@<Initialize the output...@>=
11633job_name:=0; name_in_progress:=false; log_opened:=false;
11634
11635@ Here is a routine that manufactures the output file names, assuming that
11636|job_name<>0|. It ignores and changes the current settings of |cur_area|
11637and |cur_ext|.
11638
11639@d pack_cur_name==pack_file_name(cur_name,cur_area,cur_ext)
11640
11641@p procedure pack_job_name(@!s:str_number); {|s = ".log"|, |output_file_extension|, or
11642  |format_extension|}
11643begin cur_area:=""; cur_ext:=s;
11644cur_name:=job_name; pack_cur_name;
11645end;
11646
11647@ If some trouble arises when \TeX\ tries to open a file, the following
11648routine calls upon the user to supply another file name. Parameter~|s|
11649is used in the error message to identify the type of file; parameter~|e|
11650is the default extension if none is given. Upon exit from the routine,
11651variables |cur_name|, |cur_area|, |cur_ext|, and |name_of_file| are
11652ready for another attempt at file opening.
11653
11654@p procedure prompt_file_name(@!s,@!e:str_number);
11655label done;
11656var k:0..buf_size; {index into |buffer|}
11657begin if interaction=scroll_mode then wake_up_terminal;
11658if s="input file name" then print_err("I can't find file `")
11659@.I can't find file x@>
11660else print_err("I can't write on file `");
11661@.I can't write on file x@>
11662print_file_name(cur_name,cur_area,cur_ext); print("'.");
11663if e=".tex" then show_context;
11664print_nl("Please type another "); print(s);
11665@.Please type...@>
11666if interaction<scroll_mode then
11667  fatal_error("*** (job aborted, file error in nonstop mode)");
11668@.job aborted, file error...@>
11669clear_terminal; prompt_input(": "); @<Scan file name in the buffer@>;
11670if cur_ext="" then cur_ext:=e;
11671pack_cur_name;
11672end;
11673
11674@ @<Scan file name in the buffer@>=
11675begin begin_name; k:=first;
11676while (buffer[k]=" ")and(k<last) do incr(k);
11677loop@+  begin if k=last then goto done;
11678  if not more_name(buffer[k]) then goto done;
11679  incr(k);
11680  end;
11681done:end_name;
11682end
11683
11684@ Here's an example of how these conventions are used. Whenever it is time to
11685ship out a box of stuff, we shall use the macro |ensure_dvi_open|.
11686
11687@d ensure_dvi_open==if output_file_name=0 then
11688  begin if job_name=0 then open_log_file;
11689  pack_job_name(output_file_extension);
11690  while not dvi_open_out(dvi_file) do
11691    prompt_file_name("file name for output",output_file_extension);
11692  output_file_name:=b_make_name_string(dvi_file);
11693  end
11694
11695@<Glob...@>=
11696@!output_file_extension: str_number;
11697@!no_pdf_output: boolean;
11698@!dvi_file: byte_file; {the device-independent output goes here}
11699@!output_file_name: str_number; {full name of the output file}
11700@!log_name:str_number; {full name of the log file}
11701
11702@ @<Initialize the output...@>=
11703  output_file_name:=0;
11704  if no_pdf_output then output_file_extension:=".xdv"
11705  else output_file_extension:=".pdf";
11706
11707@ The |open_log_file| routine is used to open the transcript file and to help
11708it catch up to what has previously been printed on the terminal.
11709
11710@p procedure open_log_file;
11711var old_setting:0..max_selector; {previous |selector| setting}
11712@!k:0..buf_size; {index into |months| and |buffer|}
11713@!l:0..buf_size; {end of first input line}
11714@!months:packed array [1..36] of char; {abbreviations of month names}
11715begin old_setting:=selector;
11716if job_name=0 then job_name:="texput";
11717@.texput@>
11718pack_job_name(".log");
11719while not a_open_out(log_file) do @<Try to get a different log file name@>;
11720log_name:=a_make_name_string(log_file);
11721selector:=log_only; log_opened:=true;
11722@<Print the banner line, including the date and time@>;
11723input_stack[input_ptr]:=cur_input; {make sure bottom level is in memory}
11724print_nl("**");
11725@.**@>
11726l:=input_stack[0].limit_field; {last position of first line}
11727if buffer[l]=end_line_char then decr(l);
11728for k:=1 to l do print(buffer[k]);
11729print_ln; {now the transcript file contains the first line of input}
11730selector:=old_setting+2; {|log_only| or |term_and_log|}
11731end;
11732
11733@ Sometimes |open_log_file| is called at awkward moments when \TeX\ is
11734unable to print error messages or even to |show_context|.
11735The |prompt_file_name| routine can result in a |fatal_error|, but the |error|
11736routine will not be invoked because |log_opened| will be false.
11737
11738The normal idea of |batch_mode| is that nothing at all should be written
11739on the terminal. However, in the unusual case that
11740no log file could be opened, we make an exception and allow
11741an explanatory message to be seen.
11742
11743Incidentally, the program always refers to the log file as a `\.{transcript
11744file}', because some systems cannot use the extension `\.{.log}' for
11745this file.
11746
11747@<Try to get a different log file name@>=
11748begin selector:=term_only;
11749prompt_file_name("transcript file name",".log");
11750end
11751
11752@ @<Print the banner...@>=
11753begin wlog(banner);
11754slow_print(format_ident); print("  ");
11755print_int(day); print_char(" ");
11756months:='JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC';
11757for k:=3*month-2 to 3*month do wlog(months[k]);
11758print_char(" "); print_int(year); print_char(" ");
11759print_two(time div 60); print_char(":"); print_two(time mod 60);
11760if eTeX_ex then
11761  begin; wlog_cr; wlog('entering extended mode');
11762  end;
11763end
11764
11765@ Let's turn now to the procedure that is used to initiate file reading
11766when an `\.{\\input}' command is being processed.
11767
11768@p procedure start_input; {\TeX\ will \.{\\input} something}
11769label done;
11770begin scan_file_name; {set |cur_name| to desired file name}
11771if cur_ext="" then cur_ext:=".tex";
11772pack_cur_name;
11773loop@+  begin begin_file_reading; {set up |cur_file| and new level of input}
11774  if a_open_in(cur_file) then goto done;
11775  if cur_area="" then
11776    begin pack_file_name(cur_name,TEX_area,cur_ext);
11777    if a_open_in(cur_file) then goto done;
11778    end;
11779  end_file_reading; {remove the level that didn't work}
11780  prompt_file_name("input file name",".tex");
11781  end;
11782done: name:=a_make_name_string(cur_file);
11783if job_name=0 then
11784  begin job_name:=cur_name; open_log_file;
11785  end; {|open_log_file| doesn't |show_context|, so |limit|
11786    and |loc| needn't be set to meaningful values yet}
11787if term_offset+length(name)>max_print_line-2 then print_ln
11788else if (term_offset>0)or(file_offset>0) then print_char(" ");
11789print_char("("); incr(open_parens); slow_print(name); update_terminal;
11790state:=new_line;
11791if name=str_ptr-1 then {we can conserve string pool space now}
11792  begin flush_string; name:=cur_name;
11793  end;
11794@<Read the first line of the new file@>;
11795end;
11796
11797@ Here we have to remember to tell the |input_ln| routine not to
11798start with a |get|. If the file is empty, it is considered to
11799contain a single blank line.
11800@^system dependencies@>
11801@^empty line at end of file@>
11802
11803@<Read the first line...@>=
11804begin line:=1;
11805if input_ln(cur_file,false) then do_nothing;
11806firm_up_the_line;
11807if end_line_char_inactive then decr(limit)
11808else  buffer[limit]:=end_line_char;
11809first:=limit+1; loc:=start;
11810end
11811
11812@* \[30] Font metric data.
11813\TeX\ gets its knowledge about fonts from font metric files, also called
11814\.{TFM} files; the `\.T' in `\.{TFM}' stands for \TeX,
11815but other programs know about them too.
11816@:TFM files}{\.{TFM} files@>
11817@^font metric files@>
11818
11819The information in a \.{TFM} file appears in a sequence of 8-bit bytes.
11820Since the number of bytes is always a multiple of 4, we could
11821also regard the file as a sequence of 32-bit words, but \TeX\ uses the
11822byte interpretation. The format of \.{TFM} files was designed by
11823Lyle Ramshaw in 1980. The intent is to convey a lot of different kinds
11824@^Ramshaw, Lyle Harold@>
11825of information in a compact but useful form.
11826
11827@<Glob...@>=
11828@!tfm_file:byte_file;
11829
11830@ The first 24 bytes (6 words) of a \.{TFM} file contain twelve 16-bit
11831integers that give the lengths of the various subsequent portions
11832of the file. These twelve integers are, in order:
11833$$\vbox{\halign{\hfil#&$\null=\null$#\hfil\cr
11834|lf|&length of the entire file, in words;\cr
11835|lh|&length of the header data, in words;\cr
11836|bc|&smallest character code in the font;\cr
11837|ec|&largest character code in the font;\cr
11838|nw|&number of words in the width table;\cr
11839|nh|&number of words in the height table;\cr
11840|nd|&number of words in the depth table;\cr
11841|ni|&number of words in the italic correction table;\cr
11842|nl|&number of words in the lig/kern table;\cr
11843|nk|&number of words in the kern table;\cr
11844|ne|&number of words in the extensible character table;\cr
11845|np|&number of font parameter words.\cr}}$$
11846They are all nonnegative and less than $2^{15}$. We must have |bc-1<=ec<=255|,
11847and
11848$$\hbox{|lf=6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np|.}$$
11849Note that a font may contain as many as 256 characters (if |bc=0| and |ec=255|),
11850and as few as 0 characters (if |bc=ec+1|).
11851
11852Incidentally, when two or more 8-bit bytes are combined to form an integer of
1185316 or more bits, the most significant bytes appear first in the file.
11854This is called BigEndian order.
11855@!@^BigEndian order@>
11856
11857@ The rest of the \.{TFM} file may be regarded as a sequence of ten data
11858arrays having the informal specification
11859$$\def\arr$[#1]#2${\&{array} $[#1]$ \&{of} #2}
11860\vbox{\halign{\hfil\\{#}&$\,:\,$\arr#\hfil\cr
11861header&|[0..lh-1]@t\\{stuff}@>|\cr
11862char\_info&|[bc..ec]char_info_word|\cr
11863width&|[0..nw-1]fix_word|\cr
11864height&|[0..nh-1]fix_word|\cr
11865depth&|[0..nd-1]fix_word|\cr
11866italic&|[0..ni-1]fix_word|\cr
11867lig\_kern&|[0..nl-1]lig_kern_command|\cr
11868kern&|[0..nk-1]fix_word|\cr
11869exten&|[0..ne-1]extensible_recipe|\cr
11870param&|[1..np]fix_word|\cr}}$$
11871The most important data type used here is a |@!fix_word|, which is
11872a 32-bit representation of a binary fraction. A |fix_word| is a signed
11873quantity, with the two's complement of the entire word used to represent
11874negation. Of the 32 bits in a |fix_word|, exactly 12 are to the left of the
11875binary point; thus, the largest |fix_word| value is $2048-2^{-20}$, and
11876the smallest is $-2048$. We will see below, however, that all but two of
11877the |fix_word| values must lie between $-16$ and $+16$.
11878
11879@ The first data array is a block of header information, which contains
11880general facts about the font. The header must contain at least two words,
11881|header[0]| and |header[1]|, whose meaning is explained below.
11882Additional header information of use to other software routines might
11883also be included, but \TeX82 does not need to know about such details.
11884For example, 16 more words of header information are in use at the Xerox
11885Palo Alto Research Center; the first ten specify the character coding
11886scheme used (e.g., `\.{XEROX text}' or `\.{TeX math symbols}'), the next five
11887give the font identifier (e.g., `\.{HELVETICA}' or `\.{CMSY}'), and the
11888last gives the ``face byte.'' The program that converts \.{DVI} files
11889to Xerox printing format gets this information by looking at the \.{TFM}
11890file, which it needs to read anyway because of other information that
11891is not explicitly repeated in \.{DVI}~format.
11892
11893\yskip\hang|header[0]| is a 32-bit check sum that \TeX\ will copy into
11894the \.{DVI} output file. Later on when the \.{DVI} file is printed,
11895possibly on another computer, the actual font that gets used is supposed
11896to have a check sum that agrees with the one in the \.{TFM} file used by
11897\TeX. In this way, users will be warned about potential incompatibilities.
11898(However, if the check sum is zero in either the font file or the \.{TFM}
11899file, no check is made.)  The actual relation between this check sum and
11900the rest of the \.{TFM} file is not important; the check sum is simply an
11901identification number with the property that incompatible fonts almost
11902always have distinct check sums.
11903@^check sum@>
11904
11905\yskip\hang|header[1]| is a |fix_word| containing the design size of
11906the font, in units of \TeX\ points. This number must be at least 1.0; it is
11907fairly arbitrary, but usually the design size is 10.0 for a ``10 point''
11908font, i.e., a font that was designed to look best at a 10-point size,
11909whatever that really means. When a \TeX\ user asks for a font
11910`\.{at} $\delta$ \.{pt}', the effect is to override the design size
11911and replace it by $\delta$, and to multiply the $x$ and~$y$ coordinates
11912of the points in the font image by a factor of $\delta$ divided by the
11913design size.  {\sl All other dimensions in the\/ \.{TFM} file are
11914|fix_word|\kern-1pt\ numbers in design-size units}, with the exception of
11915|param[1]| (which denotes the slant ratio). Thus, for example, the value
11916of |param[6]|, which defines the \.{em} unit, is often the |fix_word| value
11917$2^{20}=1.0$, since many fonts have a design size equal to one em.
11918The other dimensions must be less than 16 design-size units in absolute
11919value; thus, |header[1]| and |param[1]| are the only |fix_word|
11920entries in the whole \.{TFM} file whose first byte might be something
11921besides 0 or 255.
11922
11923@ Next comes the |char_info| array, which contains one |@!char_info_word|
11924per character. Each word in this part of the file contains six fields
11925packed into four bytes as follows.
11926
11927\yskip\hang first byte: |@!width_index| (8 bits)\par
11928\hang second byte: |@!height_index| (4 bits) times 16, plus |@!depth_index|
11929  (4~bits)\par
11930\hang third byte: |@!italic_index| (6 bits) times 4, plus |@!tag|
11931  (2~bits)\par
11932\hang fourth byte: |@!remainder| (8 bits)\par
11933\yskip\noindent
11934The actual width of a character is \\{width}|[width_index]|, in design-size
11935units; this is a device for compressing information, since many characters
11936have the same width. Since it is quite common for many characters
11937to have the same height, depth, or italic correction, the \.{TFM} format
11938imposes a limit of 16 different heights, 16 different depths, and
1193964 different italic corrections.
11940
11941@!@^italic correction@>
11942The italic correction of a character has two different uses.
11943(a)~In ordinary text, the italic correction is added to the width only if
11944the \TeX\ user specifies `\.{\\/}' after the character.
11945(b)~In math formulas, the italic correction is always added to the width,
11946except with respect to the positioning of subscripts.
11947
11948Incidentally, the relation $\\{width}[0]=\\{height}[0]=\\{depth}[0]=
11949\\{italic}[0]=0$ should always hold, so that an index of zero implies a
11950value of zero.  The |width_index| should never be zero unless the
11951character does not exist in the font, since a character is valid if and
11952only if it lies between |bc| and |ec| and has a nonzero |width_index|.
11953
11954@ The |tag| field in a |char_info_word| has four values that explain how to
11955interpret the |remainder| field.
11956
11957\yskip\hangg|tag=0| (|no_tag|) means that |remainder| is unused.\par
11958\hangg|tag=1| (|lig_tag|) means that this character has a ligature/kerning
11959program starting at position |remainder| in the |lig_kern| array.\par
11960\hangg|tag=2| (|list_tag|) means that this character is part of a chain of
11961characters of ascending sizes, and not the largest in the chain.  The
11962|remainder| field gives the character code of the next larger character.\par
11963\hangg|tag=3| (|ext_tag|) means that this character code represents an
11964extensible character, i.e., a character that is built up of smaller pieces
11965so that it can be made arbitrarily large. The pieces are specified in
11966|@!exten[remainder]|.\par
11967\yskip\noindent
11968Characters with |tag=2| and |tag=3| are treated as characters with |tag=0|
11969unless they are used in special circumstances in math formulas. For example,
11970the \.{\\sum} operation looks for a |list_tag|, and the \.{\\left}
11971operation looks for both |list_tag| and |ext_tag|.
11972
11973@d no_tag=0 {vanilla character}
11974@d lig_tag=1 {character has a ligature/kerning program}
11975@d list_tag=2 {character has a successor in a charlist}
11976@d ext_tag=3 {character is extensible}
11977
11978@ The |lig_kern| array contains instructions in a simple programming language
11979that explains what to do for special letter pairs. Each word in this array is a
11980|@!lig_kern_command| of four bytes.
11981
11982\yskip\hang first byte: |skip_byte|, indicates that this is the final program
11983  step if the byte is 128 or more, otherwise the next step is obtained by
11984  skipping this number of intervening steps.\par
11985\hang second byte: |next_char|, ``if |next_char| follows the current character,
11986  then perform the operation and stop, otherwise continue.''\par
11987\hang third byte: |op_byte|, indicates a ligature step if less than~128,
11988  a kern step otherwise.\par
11989\hang fourth byte: |remainder|.\par
11990\yskip\noindent
11991In a kern step, an
11992additional space equal to |kern[256*(op_byte-128)+remainder]| is inserted
11993between the current character and |next_char|. This amount is
11994often negative, so that the characters are brought closer together
11995by kerning; but it might be positive.
11996
11997There are eight kinds of ligature steps, having |op_byte| codes $4a+2b+c$ where
11998$0\le a\le b+c$ and $0\le b,c\le1$. The character whose code is
11999|remainder| is inserted between the current character and |next_char|;
12000then the current character is deleted if $b=0$, and |next_char| is
12001deleted if $c=0$; then we pass over $a$~characters to reach the next
12002current character (which may have a ligature/kerning program of its own).
12003
12004If the very first instruction of the |lig_kern| array has |skip_byte=255|,
12005the |next_char| byte is the so-called right boundary character of this font;
12006the value of |next_char| need not lie between |bc| and~|ec|.
12007If the very last instruction of the |lig_kern| array has |skip_byte=255|,
12008there is a special ligature/kerning program for a left boundary character,
12009beginning at location |256*op_byte+remainder|.
12010The interpretation is that \TeX\ puts implicit boundary characters
12011before and after each consecutive string of characters from the same font.
12012These implicit characters do not appear in the output, but they can affect
12013ligatures and kerning.
12014
12015If the very first instruction of a character's |lig_kern| program has
12016|skip_byte>128|, the program actually begins in location
12017|256*op_byte+remainder|. This feature allows access to large |lig_kern|
12018arrays, because the first instruction must otherwise
12019appear in a location |<=255|.
12020
12021Any instruction with |skip_byte>128| in the |lig_kern| array must satisfy
12022the condition
12023$$\hbox{|256*op_byte+remainder<nl|.}$$
12024If such an instruction is encountered during
12025normal program execution, it denotes an unconditional halt; no ligature
12026or kerning command is performed.
12027
12028@d stop_flag==qi(128) {value indicating `\.{STOP}' in a lig/kern program}
12029@d kern_flag==qi(128) {op code for a kern step}
12030@d skip_byte(#)==#.b0
12031@d next_char(#)==#.b1
12032@d op_byte(#)==#.b2
12033@d rem_byte(#)==#.b3
12034
12035@ Extensible characters are specified by an |@!extensible_recipe|, which
12036consists of four bytes called |@!top|, |@!mid|, |@!bot|, and |@!rep| (in this
12037order). These bytes are the character codes of individual pieces used to
12038build up a large symbol.  If |top|, |mid|, or |bot| are zero, they are not
12039present in the built-up result. For example, an extensible vertical line is
12040like an extensible bracket, except that the top and bottom pieces are missing.
12041
12042Let $T$, $M$, $B$, and $R$ denote the respective pieces, or an empty box
12043if the piece isn't present. Then the extensible characters have the form
12044$TR^kMR^kB$ from top to bottom, for some |k>=0|, unless $M$ is absent;
12045in the latter case we can have $TR^kB$ for both even and odd values of~|k|.
12046The width of the extensible character is the width of $R$; and the
12047height-plus-depth is the sum of the individual height-plus-depths of the
12048components used, since the pieces are butted together in a vertical list.
12049
12050@d ext_top(#)==#.b0 {|top| piece in a recipe}
12051@d ext_mid(#)==#.b1 {|mid| piece in a recipe}
12052@d ext_bot(#)==#.b2 {|bot| piece in a recipe}
12053@d ext_rep(#)==#.b3 {|rep| piece in a recipe}
12054
12055@ The final portion of a \.{TFM} file is the |param| array, which is another
12056sequence of |fix_word| values.
12057
12058\yskip\hang|param[1]=slant| is the amount of italic slant, which is used
12059to help position accents. For example, |slant=.25| means that when you go
12060up one unit, you also go .25 units to the right. The |slant| is a pure
12061number; it's the only |fix_word| other than the design size itself that is
12062not scaled by the design size.
12063
12064\hang|param[2]=space| is the normal spacing between words in text.
12065Note that character |" "| in the font need not have anything to do with
12066blank spaces.
12067
12068\hang|param[3]=space_stretch| is the amount of glue stretching between words.
12069
12070\hang|param[4]=space_shrink| is the amount of glue shrinking between words.
12071
12072\hang|param[5]=x_height| is the size of one ex in the font; it is also
12073the height of letters for which accents don't have to be raised or lowered.
12074
12075\hang|param[6]=quad| is the size of one em in the font.
12076
12077\hang|param[7]=extra_space| is the amount added to |param[2]| at the
12078ends of sentences.
12079
12080\yskip\noindent
12081If fewer than seven parameters are present, \TeX\ sets the missing parameters
12082to zero. Fonts used for math symbols are required to have
12083additional parameter information, which is explained later.
12084
12085@d slant_code=1
12086@d space_code=2
12087@d space_stretch_code=3
12088@d space_shrink_code=4
12089@d x_height_code=5
12090@d quad_code=6
12091@d extra_space_code=7
12092
12093@ So that is what \.{TFM} files hold. Since \TeX\ has to absorb such information
12094about lots of fonts, it stores most of the data in a large array called
12095|font_info|. Each item of |font_info| is a |memory_word|; the |fix_word|
12096data gets converted into |scaled| entries, while everything else goes into
12097words of type |four_quarters|.
12098
12099When the user defines \.{\\font\\f}, say, \TeX\ assigns an internal number
12100to the user's font~\.{\\f}. Adding this number to |font_id_base| gives the
12101|eqtb| location of a ``frozen'' control sequence that will always select
12102the font.
12103
12104@<Types...@>=
12105@!internal_font_number=font_base..font_max; {|font| in a |char_node|}
12106@!font_index=0..font_mem_size; {index into |font_info|}
12107
12108@ Here now is the (rather formidable) array of font arrays.
12109
12110@d otgr_font_flag=@"FFFE
12111@d aat_font_flag=@"FFFF
12112@d is_aat_font(#)==(font_area[#]=aat_font_flag)
12113@d is_ot_font(#)==((font_area[#]=otgr_font_flag) and (usingOpenType(font_layout_engine[#])))
12114@d is_gr_font(#)==((font_area[#]=otgr_font_flag) and (usingGraphite(font_layout_engine[#])))
12115@d is_otgr_font(#)==(font_area[#]=otgr_font_flag)
12116@d is_native_font(#)==(is_aat_font(#) or is_otgr_font(#))
12117    {native fonts have |font_area| = 65534 or 65535,
12118     which would be a string containing an invalid Unicode character}
12119@d is_new_mathfont(#)==((font_area[#]=otgr_font_flag) and (isOpenTypeMathFont(font_layout_engine[#])))
12120
12121@d non_char==qi(too_big_char) {a |halfword| code that can't match a real character}
12122@d non_address=0 {a spurious |bchar_label|}
12123
12124@<Glob...@>=
12125@!font_info:array[font_index] of memory_word;
12126  {the big collection of font data}
12127@!fmem_ptr:font_index; {first unused word of |font_info|}
12128@!font_ptr:internal_font_number; {largest internal font number in use}
12129@!font_check:array[internal_font_number] of four_quarters; {check sum}
12130@!font_size:array[internal_font_number] of scaled; {``at'' size}
12131@!font_dsize:array[internal_font_number] of scaled; {``design'' size}
12132@!font_params:array[internal_font_number] of font_index; {how many font
12133  parameters are present}
12134@!font_name:array[internal_font_number] of str_number; {name of the font}
12135@!font_area:array[internal_font_number] of str_number; {area of the font}
12136@!font_bc:array[internal_font_number] of eight_bits;
12137  {beginning (smallest) character code}
12138@!font_ec:array[internal_font_number] of eight_bits;
12139  {ending (largest) character code}
12140@!font_glue:array[internal_font_number] of pointer;
12141  {glue specification for interword space, |null| if not allocated}
12142@!font_used:array[internal_font_number] of boolean;
12143  {has a character from this font actually appeared in the output?}
12144@!hyphen_char:array[internal_font_number] of integer;
12145  {current \.{\\hyphenchar} values}
12146@!skew_char:array[internal_font_number] of integer;
12147  {current \.{\\skewchar} values}
12148@!bchar_label:array[internal_font_number] of font_index;
12149  {start of |lig_kern| program for left boundary character,
12150  |non_address| if there is none}
12151@!font_bchar:array[internal_font_number] of min_quarterword..non_char;
12152  {right boundary character, |non_char| if there is none}
12153@!font_false_bchar:array[internal_font_number] of min_quarterword..non_char;
12154  {|font_bchar| if it doesn't exist in the font, otherwise |non_char|}
12155
12156@ Besides the arrays just enumerated, we have directory arrays that make it
12157easy to get at the individual entries in |font_info|. For example, the
12158|char_info| data for character |c| in font |f| will be in
12159|font_info[char_base[f]+c].qqqq|; and if |w| is the |width_index|
12160part of this word (the |b0| field), the width of the character is
12161|font_info[width_base[f]+w].sc|. (These formulas assume that
12162|min_quarterword| has already been added to |c| and to |w|, since \TeX\
12163stores its quarterwords that way.)
12164
12165@<Glob...@>=
12166@!char_base:array[internal_font_number] of integer;
12167  {base addresses for |char_info|}
12168@!width_base:array[internal_font_number] of integer;
12169  {base addresses for widths}
12170@!height_base:array[internal_font_number] of integer;
12171  {base addresses for heights}
12172@!depth_base:array[internal_font_number] of integer;
12173  {base addresses for depths}
12174@!italic_base:array[internal_font_number] of integer;
12175  {base addresses for italic corrections}
12176@!lig_kern_base:array[internal_font_number] of integer;
12177  {base addresses for ligature/kerning programs}
12178@!kern_base:array[internal_font_number] of integer;
12179  {base addresses for kerns}
12180@!exten_base:array[internal_font_number] of integer;
12181  {base addresses for extensible recipes}
12182@!param_base:array[internal_font_number] of integer;
12183  {base addresses for font parameters}
12184
12185@ @<Set init...@>=
12186for k:=font_base to font_max do font_used[k]:=false;
12187
12188@ \TeX\ always knows at least one font, namely the null font. It has no
12189characters, and its seven parameters are all equal to zero.
12190
12191@<Initialize table...@>=
12192font_ptr:=null_font; fmem_ptr:=7;
12193font_name[null_font]:="nullfont"; font_area[null_font]:="";
12194hyphen_char[null_font]:="-"; skew_char[null_font]:=-1;
12195bchar_label[null_font]:=non_address;
12196font_bchar[null_font]:=non_char; font_false_bchar[null_font]:=non_char;
12197font_bc[null_font]:=1; font_ec[null_font]:=0;
12198font_size[null_font]:=0; font_dsize[null_font]:=0;
12199char_base[null_font]:=0; width_base[null_font]:=0;
12200height_base[null_font]:=0; depth_base[null_font]:=0;
12201italic_base[null_font]:=0; lig_kern_base[null_font]:=0;
12202kern_base[null_font]:=0; exten_base[null_font]:=0;
12203font_glue[null_font]:=null; font_params[null_font]:=7;
12204param_base[null_font]:=-1;
12205for k:=0 to 6 do font_info[k].sc:=0;
12206
12207@ @<Put each...@>=
12208primitive("nullfont",set_font,null_font);
12209@!@:null_font_}{\.{\\nullfont} primitive@>
12210text(frozen_null_font):="nullfont"; eqtb[frozen_null_font]:=eqtb[cur_val];
12211
12212@ Of course we want to define macros that suppress the detail of how font
12213information is actually packed, so that we don't have to write things like
12214$$\hbox{|font_info[width_base[f]+font_info[char_base[f]+c].qqqq.b0].sc|}$$
12215too often. The \.{WEB} definitions here make |char_info(f)(c)| the
12216|four_quarters| word of font information corresponding to character
12217|c| of font |f|. If |q| is such a word, |char_width(f)(q)| will be
12218the character's width; hence the long formula above is at least
12219abbreviated to
12220$$\hbox{|char_width(f)(char_info(f)(c))|.}$$
12221Usually, of course, we will fetch |q| first and look at several of its
12222fields at the same time.
12223
12224The italic correction of a character will be denoted by
12225|char_italic(f)(q)|, so it is analogous to |char_width|.  But we will get
12226at the height and depth in a slightly different way, since we usually want
12227to compute both height and depth if we want either one.  The value of
12228|height_depth(q)| will be the 8-bit quantity
12229$$b=|height_index|\times16+|depth_index|,$$ and if |b| is such a byte we
12230will write |char_height(f)(b)| and |char_depth(f)(b)| for the height and
12231depth of the character |c| for which |q=char_info(f)(c)|. Got that?
12232
12233The tag field will be called |char_tag(q)|; the remainder byte will be
12234called |rem_byte(q)|, using a macro that we have already defined above.
12235
12236Access to a character's |width|, |height|, |depth|, and |tag| fields is
12237part of \TeX's inner loop, so we want these macros to produce code that is
12238as fast as possible under the circumstances.
12239@^inner loop@>
12240
12241@d char_info_end(#)==#].qqqq
12242@d char_info(#)==font_info[char_base[#]+char_info_end
12243@d char_width_end(#)==#.b0].sc
12244@d char_width(#)==font_info[width_base[#]+char_width_end
12245@d char_exists(#)==(#.b0>min_quarterword)
12246@d char_italic_end(#)==(qo(#.b2)) div 4].sc
12247@d char_italic(#)==font_info[italic_base[#]+char_italic_end
12248@d height_depth(#)==qo(#.b1)
12249@d char_height_end(#)==(#) div 16].sc
12250@d char_height(#)==font_info[height_base[#]+char_height_end
12251@d char_depth_end(#)==(#) mod 16].sc
12252@d char_depth(#)==font_info[depth_base[#]+char_depth_end
12253@d char_tag(#)==((qo(#.b2)) mod 4)
12254
12255@ The global variable |null_character| is set up to be a word of
12256|char_info| for a character that doesn't exist. Such a word provides a
12257convenient way to deal with erroneous situations.
12258
12259@<Glob...@>=
12260@!null_character:four_quarters; {nonexistent character information}
12261
12262@ @<Set init...@>=
12263null_character.b0:=min_quarterword; null_character.b1:=min_quarterword;
12264null_character.b2:=min_quarterword; null_character.b3:=min_quarterword;
12265
12266@ Here are some macros that help process ligatures and kerns.
12267We write |char_kern(f)(j)| to find the amount of kerning specified by
12268kerning command~|j| in font~|f|. If |j| is the |char_info| for a character
12269with a ligature/kern program, the first instruction of that program is either
12270|i=font_info[lig_kern_start(f)(j)]| or |font_info[lig_kern_restart(f)(i)]|,
12271depending on whether or not |skip_byte(i)<=stop_flag|.
12272
12273The constant |kern_base_offset| should be simplified, for \PASCAL\ compilers
12274that do not do local optimization.
12275@^system dependencies@>
12276
12277@d char_kern_end(#)==256*op_byte(#)+rem_byte(#)].sc
12278@d char_kern(#)==font_info[kern_base[#]+char_kern_end
12279@d kern_base_offset==256*(128+min_quarterword)
12280@d lig_kern_start(#)==lig_kern_base[#]+rem_byte {beginning of lig/kern program}
12281@d lig_kern_restart_end(#)==256*op_byte(#)+rem_byte(#)+32768-kern_base_offset
12282@d lig_kern_restart(#)==lig_kern_base[#]+lig_kern_restart_end
12283
12284@ Font parameters are referred to as |slant(f)|, |space(f)|, etc.
12285
12286@d param_end(#)==param_base[#]].sc
12287@d param(#)==font_info[#+param_end
12288@d slant==param(slant_code) {slant to the right, per unit distance upward}
12289@d space==param(space_code) {normal space between words}
12290@d space_stretch==param(space_stretch_code) {stretch between words}
12291@d space_shrink==param(space_shrink_code) {shrink between words}
12292@d x_height==param(x_height_code) {one ex}
12293@d quad==param(quad_code) {one em}
12294@d extra_space==param(extra_space_code) {additional space at end of sentence}
12295
12296@<The em width for |cur_font|@>=quad(cur_font)
12297
12298@ @<The x-height for |cur_font|@>=x_height(cur_font)
12299
12300@ \TeX\ checks the information of a \.{TFM} file for validity as the
12301file is being read in, so that no further checks will be needed when
12302typesetting is going on. The somewhat tedious subroutine that does this
12303is called |read_font_info|. It has four parameters: the user font
12304identifier~|u|, the file name and area strings |nom| and |aire|, and the
12305``at'' size~|s|. If |s|~is negative, it's the negative of a scale factor
12306to be applied to the design size; |s=-1000| is the normal case.
12307Otherwise |s| will be substituted for the design size; in this
12308case, |s| must be positive and less than $2048\rm\,pt$
12309(i.e., it must be less than $2^{27}$ when considered as an integer).
12310
12311The subroutine opens and closes a global file variable called |tfm_file|.
12312It returns the value of the internal font number that was just loaded.
12313If an error is detected, an error message is issued and no font
12314information is stored; |null_font| is returned in this case.
12315
12316@d bad_tfm=11 {label for |read_font_info|}
12317@d abort==goto bad_tfm {do this when the \.{TFM} data is wrong}
12318
12319@p function read_font_info(@!u:pointer;@!nom,@!aire:str_number;
12320  @!s:scaled):internal_font_number; {input a \.{TFM} file}
12321label done,bad_tfm,not_found;
12322var k:font_index; {index into |font_info|}
12323@!file_opened:boolean; {was |tfm_file| successfully opened?}
12324@!lf,@!lh,@!bc,@!ec,@!nw,@!nh,@!nd,@!ni,@!nl,@!nk,@!ne,@!np:halfword;
12325  {sizes of subfiles}
12326@!f:internal_font_number; {the new font's number}
12327@!g:internal_font_number; {the number to return}
12328@!a,@!b,@!c,@!d:eight_bits; {byte variables}
12329@!qw:four_quarters;@!sw:scaled; {accumulators}
12330@!bch_label:integer; {left boundary start location, or infinity}
12331@!bchar:0..256; {right boundary character, or 256}
12332@!z:scaled; {the design size or the ``at'' size}
12333@!alpha:integer;@!beta:1..16;
12334  {auxiliary quantities used in fixed-point multiplication}
12335begin g:=null_font;@/
12336file_opened:=false;
12337pack_file_name(nom,aire,cur_ext);
12338if XeTeX_tracing_fonts_state>0 then begin
12339  begin_diagnostic;
12340  print_nl("Requested font """);
12341  print_c_string(stringcast(name_of_file+1));
12342  print('"');
12343  if s < 0 then begin
12344    print(" scaled ");
12345    print_int(-s);
12346  end else begin
12347    print(" at ");
12348    print_scaled(s);
12349    print("pt");
12350  end;
12351  end_diagnostic(false);
12352end;
12353if quoted_filename then begin
12354  { quoted name, so try for a native font }
12355  g:=load_native_font(u,nom,aire,s);
12356  if g<>null_font then goto done;
12357end;
12358{ it was an unquoted name, or not found as an installed font, so try for a TFM file }
12359@<Read and check the font data if file exists;
12360  |abort| if the \.{TFM} file is
12361  malformed; if there's no room for this font, say so and |goto
12362  done|; otherwise |incr(font_ptr)| and |goto done|@>;
12363if g<>null_font then goto done;
12364if not quoted_filename then begin
12365  { we failed to find a TFM file, so try for a native font }
12366  g:=load_native_font(u,nom,aire,s);
12367  if g<>null_font then goto done
12368end;
12369bad_tfm:
12370if suppress_fontnotfound_error=0 then begin
12371  @<Report that the font won't be loaded@>;
12372  end;
12373done: if file_opened then b_close(tfm_file);
12374if XeTeX_tracing_fonts_state>0 then begin
12375  if g=null_font then begin
12376    begin_diagnostic;
12377    print_nl(" -> font not found, using ""nullfont""");
12378    end_diagnostic(false);
12379  end else if file_opened then begin
12380    begin_diagnostic;
12381    print_nl(" -> ");
12382    print_c_string(stringcast(name_of_file+1));
12383    end_diagnostic(false);
12384  end;
12385end;
12386read_font_info:=g;
12387end;
12388
12389@ There are programs called \.{TFtoPL} and \.{PLtoTF} that convert
12390between the \.{TFM} format and a symbolic property-list format
12391that can be easily edited. These programs contain extensive
12392diagnostic information, so \TeX\ does not have to bother giving
12393precise details about why it rejects a particular \.{TFM} file.
12394@.TFtoPL@> @.PLtoTF@>
12395
12396@d start_font_error_message==print_err("Font "); sprint_cs(u);
12397  print_char("=");
12398  if file_name_quote_char<>0 then print_char(file_name_quote_char);
12399  print_file_name(nom,aire,cur_ext);
12400  if file_name_quote_char<>0 then print_char(file_name_quote_char);
12401  if s>=0 then
12402    begin print(" at "); print_scaled(s); print("pt");
12403    end
12404  else if s<>-1000 then
12405    begin print(" scaled "); print_int(-s);
12406    end
12407
12408@<Report that the font won't be loaded@>=
12409start_font_error_message;
12410@.Font x=xx not loadable...@>
12411if file_opened then print(" not loadable: Bad metric (TFM) file")
12412else print(" not loadable: Metric (TFM) file not found");
12413help5("I wasn't able to read the size data for this font,")@/
12414("so I will ignore the font specification.")@/
12415("[Wizards can fix TFM files using TFtoPL/PLtoTF.]")@/
12416("You might try inserting a different font spec;")@/
12417("e.g., type `I\font<same font id>=<substitute font name>'.");
12418error
12419
12420@ @<Read and check...@>=
12421@<Open |tfm_file| for input and |begin|@>;
12422@<Read the {\.{TFM}} size fields@>;
12423@<Use size fields to allocate font information@>;
12424@<Read the {\.{TFM}} header@>;
12425@<Read character data@>;
12426@<Read box dimensions@>;
12427@<Read ligature/kern program@>;
12428@<Read extensible character recipes@>;
12429@<Read font parameters@>;
12430@<Make final adjustments and |goto done|@>;
12431end
12432
12433@ @<Open |tfm_file| for input...@>=
12434if aire="" then pack_file_name(nom,TEX_font_area,".tfm")
12435else pack_file_name(nom,aire,".tfm");
12436check_for_tfm_font_mapping;
12437if b_open_in(tfm_file) then begin
12438  file_opened:=true
12439
12440@ Note: A malformed \.{TFM} file might be shorter than it claims to be;
12441thus |eof(tfm_file)| might be true when |read_font_info| refers to
12442|tfm_file^| or when it says |get(tfm_file)|. If such circumstances
12443cause system error messages, you will have to defeat them somehow,
12444for example by defining |fget| to be `\ignorespaces|begin get(tfm_file);|
12445|if eof(tfm_file) then abort; end|\unskip'.
12446@^system dependencies@>
12447
12448@d fget==get(tfm_file)
12449@d fbyte==tfm_file^
12450@d read_sixteen(#)==begin #:=fbyte;
12451  if #>127 then abort;
12452  fget; #:=#*@'400+fbyte;
12453  end
12454@d store_four_quarters(#)==begin fget; a:=fbyte; qw.b0:=qi(a);
12455  fget; b:=fbyte; qw.b1:=qi(b);
12456  fget; c:=fbyte; qw.b2:=qi(c);
12457  fget; d:=fbyte; qw.b3:=qi(d);
12458  #:=qw;
12459  end
12460
12461@ @<Read the {\.{TFM}} size fields@>=
12462begin read_sixteen(lf);
12463fget; read_sixteen(lh);
12464fget; read_sixteen(bc);
12465fget; read_sixteen(ec);
12466if (bc>ec+1)or(ec>255) then abort;
12467if bc>255 then {|bc=256| and |ec=255|}
12468  begin bc:=1; ec:=0;
12469  end;
12470fget; read_sixteen(nw);
12471fget; read_sixteen(nh);
12472fget; read_sixteen(nd);
12473fget; read_sixteen(ni);
12474fget; read_sixteen(nl);
12475fget; read_sixteen(nk);
12476fget; read_sixteen(ne);
12477fget; read_sixteen(np);
12478if lf<>6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np then abort;
12479if (nw=0)or(nh=0)or(nd=0)or(ni=0) then abort;
12480end
12481
12482@ The preliminary settings of the index-offset variables |char_base|,
12483|width_base|, |lig_kern_base|, |kern_base|, and |exten_base| will be
12484corrected later by subtracting |min_quarterword| from them; and we will
12485subtract 1 from |param_base| too. It's best to forget about such anomalies
12486until later.
12487
12488@<Use size fields to allocate font information@>=
12489lf:=lf-6-lh; {|lf| words should be loaded into |font_info|}
12490if np<7 then lf:=lf+7-np; {at least seven parameters will appear}
12491if (font_ptr=font_max)or(fmem_ptr+lf>font_mem_size) then
12492  @<Apologize for not loading the font, |goto done|@>;
12493f:=font_ptr+1;
12494char_base[f]:=fmem_ptr-bc;
12495width_base[f]:=char_base[f]+ec+1;
12496height_base[f]:=width_base[f]+nw;
12497depth_base[f]:=height_base[f]+nh;
12498italic_base[f]:=depth_base[f]+nd;
12499lig_kern_base[f]:=italic_base[f]+ni;
12500kern_base[f]:=lig_kern_base[f]+nl-kern_base_offset;
12501exten_base[f]:=kern_base[f]+kern_base_offset+nk;
12502param_base[f]:=exten_base[f]+ne
12503
12504@ @<Apologize for not loading...@>=
12505begin start_font_error_message;
12506print(" not loaded: Not enough room left");
12507@.Font x=xx not loaded...@>
12508help4("I'm afraid I won't be able to make use of this font,")@/
12509("because my memory for character-size data is too small.")@/
12510("If you're really stuck, ask a wizard to enlarge me.")@/
12511("Or maybe try `I\font<same font id>=<name of loaded font>'.");
12512error; goto done;
12513end
12514
12515@ Only the first two words of the header are needed by \TeX82.
12516
12517@<Read the {\.{TFM}} header@>=
12518begin if lh<2 then abort;
12519store_four_quarters(font_check[f]);
12520fget; read_sixteen(z); {this rejects a negative design size}
12521fget; z:=z*@'400+fbyte; fget; z:=(z*@'20)+(fbyte div@'20);
12522if z<unity then abort;
12523while lh>2 do
12524  begin fget;fget;fget;fget;decr(lh); {ignore the rest of the header}
12525  end;
12526font_dsize[f]:=z;
12527if s<>-1000 then
12528  if s>=0 then z:=s
12529  else z:=xn_over_d(z,-s,1000);
12530font_size[f]:=z;
12531end
12532
12533@ @<Read character data@>=
12534for k:=fmem_ptr to width_base[f]-1 do
12535  begin store_four_quarters(font_info[k].qqqq);
12536  if (a>=nw)or(b div @'20>=nh)or(b mod @'20>=nd)or
12537    (c div 4>=ni) then abort;
12538  case c mod 4 of
12539  lig_tag: if d>=nl then abort;
12540  ext_tag: if d>=ne then abort;
12541  list_tag: @<Check for charlist cycle@>;
12542  othercases do_nothing {|no_tag|}
12543  endcases;
12544  end
12545
12546@ We want to make sure that there is no cycle of characters linked together
12547by |list_tag| entries, since such a cycle would get \TeX\ into an endless
12548loop. If such a cycle exists, the routine here detects it when processing
12549the largest character code in the cycle.
12550
12551@d check_byte_range(#)==begin if (#<bc)or(#>ec) then abort@+end
12552@d current_character_being_worked_on==k+bc-fmem_ptr
12553
12554@<Check for charlist cycle@>=
12555begin check_byte_range(d);
12556while d<current_character_being_worked_on do
12557  begin qw:=char_info(f)(d);
12558  {N.B.: not |qi(d)|, since |char_base[f]| hasn't been adjusted yet}
12559  if char_tag(qw)<>list_tag then goto not_found;
12560  d:=qo(rem_byte(qw)); {next character on the list}
12561  end;
12562if d=current_character_being_worked_on then abort; {yes, there's a cycle}
12563not_found:end
12564
12565@ A |fix_word| whose four bytes are $(a,b,c,d)$ from left to right represents
12566the number
12567$$x=\left\{\vcenter{\halign{$#$,\hfil\qquad&if $#$\hfil\cr
12568b\cdot2^{-4}+c\cdot2^{-12}+d\cdot2^{-20}&a=0;\cr
12569-16+b\cdot2^{-4}+c\cdot2^{-12}+d\cdot2^{-20}&a=255.\cr}}\right.$$
12570(No other choices of |a| are allowed, since the magnitude of a number in
12571design-size units must be less than 16.)  We want to multiply this
12572quantity by the integer~|z|, which is known to be less than $2^{27}$.
12573If $|z|<2^{23}$, the individual multiplications $b\cdot z$,
12574$c\cdot z$, $d\cdot z$ cannot overflow; otherwise we will divide |z| by 2,
125754, 8, or 16, to obtain a multiplier less than $2^{23}$, and we can
12576compensate for this later. If |z| has thereby been replaced by
12577$|z|^\prime=|z|/2^e$, let $\beta=2^{4-e}$; we shall compute
12578$$\lfloor(b+c\cdot2^{-8}+d\cdot2^{-16})\,z^\prime/\beta\rfloor$$
12579if $a=0$, or the same quantity minus $\alpha=2^{4+e}z^\prime$ if $a=255$.
12580This calculation must be done exactly, in order to guarantee portability
12581of \TeX\ between computers.
12582
12583@d store_scaled(#)==begin fget; a:=fbyte; fget; b:=fbyte;
12584  fget; c:=fbyte; fget; d:=fbyte;@/
12585  sw:=(((((d*z)div@'400)+(c*z))div@'400)+(b*z))div beta;
12586  if a=0 then #:=sw@+else if a=255 then #:=sw-alpha@+else abort;
12587  end
12588
12589@<Read box dimensions@>=
12590begin @<Replace |z| by $|z|^\prime$ and compute $\alpha,\beta$@>;
12591for k:=width_base[f] to lig_kern_base[f]-1 do
12592  store_scaled(font_info[k].sc);
12593if font_info[width_base[f]].sc<>0 then abort; {\\{width}[0] must be zero}
12594if font_info[height_base[f]].sc<>0 then abort; {\\{height}[0] must be zero}
12595if font_info[depth_base[f]].sc<>0 then abort; {\\{depth}[0] must be zero}
12596if font_info[italic_base[f]].sc<>0 then abort; {\\{italic}[0] must be zero}
12597end
12598
12599@ @<Replace |z|...@>=
12600begin alpha:=16;
12601while z>=@'40000000 do
12602  begin z:=z div 2; alpha:=alpha+alpha;
12603  end;
12604beta:=256 div alpha; alpha:=alpha*z;
12605end
12606
12607@ @d check_existence(#)==@t@>@;@/
12608  begin check_byte_range(#);
12609  qw:=char_info(f)(#); {N.B.: not |qi(#)|}
12610  if not char_exists(qw) then abort;
12611  end
12612
12613@<Read ligature/kern program@>=
12614bch_label:=@'77777; bchar:=256;
12615if nl>0 then
12616  begin for k:=lig_kern_base[f] to kern_base[f]+kern_base_offset-1 do
12617    begin store_four_quarters(font_info[k].qqqq);
12618    if a>128 then
12619      begin if 256*c+d>=nl then abort;
12620      if a=255 then if k=lig_kern_base[f] then bchar:=b;
12621      end
12622    else begin if b<>bchar then check_existence(b);
12623      if c<128 then check_existence(d) {check ligature}
12624      else if 256*(c-128)+d>=nk then abort; {check kern}
12625      if a<128 then if k-lig_kern_base[f]+a+1>=nl then abort;
12626      end;
12627    end;
12628  if a=255 then bch_label:=256*c+d;
12629  end;
12630for k:=kern_base[f]+kern_base_offset to exten_base[f]-1 do
12631  store_scaled(font_info[k].sc);
12632
12633@ @<Read extensible character recipes@>=
12634for k:=exten_base[f] to param_base[f]-1 do
12635  begin store_four_quarters(font_info[k].qqqq);
12636  if a<>0 then check_existence(a);
12637  if b<>0 then check_existence(b);
12638  if c<>0 then check_existence(c);
12639  check_existence(d);
12640  end
12641
12642@ We check to see that the \.{TFM} file doesn't end prematurely; but
12643no error message is given for files having more than |lf| words.
12644
12645@<Read font parameters@>=
12646begin for k:=1 to np do
12647  if k=1 then {the |slant| parameter is a pure number}
12648    begin fget; sw:=fbyte; if sw>127 then sw:=sw-256;
12649    fget; sw:=sw*@'400+fbyte; fget; sw:=sw*@'400+fbyte;
12650    fget; font_info[param_base[f]].sc:=
12651      (sw*@'20)+(fbyte div@'20);
12652    end
12653  else store_scaled(font_info[param_base[f]+k-1].sc);
12654if eof(tfm_file) then abort;
12655for k:=np+1 to 7 do font_info[param_base[f]+k-1].sc:=0;
12656end
12657
12658@ Now to wrap it up, we have checked all the necessary things about the \.{TFM}
12659file, and all we need to do is put the finishing touches on the data for
12660the new font.
12661
12662@d adjust(#)==#[f]:=qo(#[f])
12663  {correct for the excess |min_quarterword| that was added}
12664
12665@<Make final adjustments...@>=
12666if np>=7 then font_params[f]:=np@+else font_params[f]:=7;
12667hyphen_char[f]:=default_hyphen_char; skew_char[f]:=default_skew_char;
12668if bch_label<nl then bchar_label[f]:=bch_label+lig_kern_base[f]
12669else bchar_label[f]:=non_address;
12670font_bchar[f]:=qi(bchar);
12671font_false_bchar[f]:=qi(bchar);
12672if bchar<=ec then if bchar>=bc then
12673  begin qw:=char_info(f)(bchar); {N.B.: not |qi(bchar)|}
12674  if char_exists(qw) then font_false_bchar[f]:=non_char;
12675  end;
12676font_name[f]:=nom;
12677font_area[f]:=aire;
12678font_bc[f]:=bc; font_ec[f]:=ec; font_glue[f]:=null;
12679adjust(char_base); adjust(width_base); adjust(lig_kern_base);
12680adjust(kern_base); adjust(exten_base);
12681decr(param_base[f]);
12682fmem_ptr:=fmem_ptr+lf; font_ptr:=f; g:=f;
12683font_mapping[f]:=load_tfm_font_mapping;
12684goto done
12685
12686@ Before we forget about the format of these tables, let's deal with two
12687of \TeX's basic scanning routines related to font information.
12688
12689@<Declare procedures that scan font-related stuff@>=
12690procedure scan_font_ident;
12691var f:internal_font_number;
12692@!m:halfword;
12693begin @<Get the next non-blank non-call...@>;
12694if cur_cmd=def_font then f:=cur_font
12695else if cur_cmd=set_font then f:=cur_chr
12696else if cur_cmd=def_family then
12697  begin m:=cur_chr; scan_math_fam_int; f:=equiv(m+cur_val);
12698  end
12699else  begin print_err("Missing font identifier");
12700@.Missing font identifier@>
12701  help2("I was looking for a control sequence whose")@/
12702  ("current meaning has been defined by \font.");
12703  back_error; f:=null_font;
12704  end;
12705cur_val:=f;
12706end;
12707
12708@ The following routine is used to implement `\.{\\fontdimen} |n| |f|'.
12709The boolean parameter |writing| is set |true| if the calling program
12710intends to change the parameter value.
12711
12712@<Declare procedures that scan font-related stuff@>=
12713procedure find_font_dimen(@!writing:boolean);
12714  {sets |cur_val| to |font_info| location}
12715var f:internal_font_number;
12716@!n:integer; {the parameter number}
12717begin scan_int; n:=cur_val; scan_font_ident; f:=cur_val;
12718if n<=0 then cur_val:=fmem_ptr
12719else  begin if writing and(n<=space_shrink_code)and@|
12720    (n>=space_code)and(font_glue[f]<>null) then
12721    begin delete_glue_ref(font_glue[f]);
12722    font_glue[f]:=null;
12723    end;
12724  if n>font_params[f] then
12725    if f<font_ptr then cur_val:=fmem_ptr
12726    else @<Increase the number of parameters in the last font@>
12727  else cur_val:=n+param_base[f];
12728  end;
12729@<Issue an error message if |cur_val=fmem_ptr|@>;
12730end;
12731
12732@ @<Issue an error message if |cur_val=fmem_ptr|@>=
12733if cur_val=fmem_ptr then
12734  begin print_err("Font "); print_esc(font_id_text(f));
12735  print(" has only "); print_int(font_params[f]);
12736  print(" fontdimen parameters");
12737@.Font x has only...@>
12738  help2("To increase the number of font parameters, you must")@/
12739    ("use \fontdimen immediately after the \font is loaded.");
12740  error;
12741  end
12742
12743@ @<Increase the number of parameters...@>=
12744begin repeat if fmem_ptr=font_mem_size then
12745  overflow("font memory",font_mem_size);
12746@:TeX capacity exceeded font memory}{\quad font memory@>
12747font_info[fmem_ptr].sc:=0; incr(fmem_ptr); incr(font_params[f]);
12748until n=font_params[f];
12749cur_val:=fmem_ptr-1; {this equals |param_base[f]+font_params[f]|}
12750end
12751
12752@ When \TeX\ wants to typeset a character that doesn't exist, the
12753character node is not created; thus the output routine can assume
12754that characters exist when it sees them. The following procedure
12755prints a warning message unless the user has suppressed it.
12756
12757@<Declare subroutines for |new_character|@>=
12758procedure char_warning(@!f:internal_font_number;@!c:integer);
12759var old_setting: integer; {saved value of |tracing_online|}
12760begin if tracing_lost_chars>0 then
12761 begin old_setting:=tracing_online;
12762 if eTeX_ex and(tracing_lost_chars>1) then tracing_online:=1;
12763  begin begin_diagnostic;
12764  print_nl("Missing character: There is no ");
12765@.Missing character@>
12766  if c < @"10000 then print_ASCII(c)
12767  else print_char(c); {non-Plane 0 Unicodes can't be sent through |print_ASCII|}
12768  print(" in font ");
12769  slow_print(font_name[f]); print_char("!"); end_diagnostic(false);
12770  end;
12771 tracing_online:=old_setting;
12772 end;
12773end;
12774
12775@ We need a few subroutines for |new_character|.
12776
12777@p @t\4@>@<Declare subroutines for |new_character|@>@;
12778
12779@ Here is a function that returns a pointer to a character node for a
12780given character in a given font. If that character doesn't exist,
12781|null| is returned instead.
12782
12783@p function new_character(@!f:internal_font_number;@!c:eight_bits):pointer;
12784label exit;
12785var p:pointer; {newly allocated node}
12786begin if font_bc[f]<=c then if font_ec[f]>=c then
12787  if char_exists(char_info(f)(qi(c))) then
12788    begin p:=get_avail; font(p):=f; character(p):=qi(c);
12789    new_character:=p; return;
12790    end;
12791char_warning(f,c);
12792new_character:=null;
12793exit:end;
12794
12795@* \[31] Device-independent file format.
12796The most important output produced by a run of \TeX\ is the ``device
12797independent'' (\.{DVI}) file that specifies where characters and rules
12798are to appear on printed pages. The form of these files was designed by
12799David R. Fuchs in 1979. Almost any reasonable typesetting device can be
12800@^Fuchs, David Raymond@>
12801@:DVI_files}{\.{DVI} files@>
12802driven by a program that takes \.{DVI} files as input, and dozens of such
12803\.{DVI}-to-whatever programs have been written. Thus, it is possible to
12804print the output of \TeX\ on many different kinds of equipment, using \TeX\
12805as a device-independent ``front end.''
12806
12807A \.{DVI} file is a stream of 8-bit bytes, which may be regarded as a
12808series of commands in a machine-like language. The first byte of each command
12809is the operation code, and this code is followed by zero or more bytes
12810that provide parameters to the command. The parameters themselves may consist
12811of several consecutive bytes; for example, the `|set_rule|' command has two
12812parameters, each of which is four bytes long. Parameters are usually
12813regarded as nonnegative integers; but four-byte-long parameters,
12814and shorter parameters that denote distances, can be
12815either positive or negative. Such parameters are given in two's complement
12816notation. For example, a two-byte-long distance parameter has a value between
12817$-2^{15}$ and $2^{15}-1$. As in \.{TFM} files, numbers that occupy
12818more than one byte position appear in BigEndian order.
12819
12820\XeTeX\ extends the format of \.{DVI} with its own commands, and thus produced
12821``extended device independent'' (\.{XDV}) files.
12822
12823A \.{DVI} file consists of a ``preamble,'' followed by a sequence of one
12824or more ``pages,'' followed by a ``postamble.'' The preamble is simply a
12825|pre| command, with its parameters that define the dimensions used in the
12826file; this must come first.  Each ``page'' consists of a |bop| command,
12827followed by any number of other commands that tell where characters are to
12828be placed on a physical page, followed by an |eop| command. The pages
12829appear in the order that \TeX\ generated them. If we ignore |nop| commands
12830and \\{fnt\_def} commands (which are allowed between any two commands in
12831the file), each |eop| command is immediately followed by a |bop| command,
12832or by a |post| command; in the latter case, there are no more pages in the
12833file, and the remaining bytes form the postamble.  Further details about
12834the postamble will be explained later.
12835
12836Some parameters in \.{DVI} commands are ``pointers.'' These are four-byte
12837quantities that give the location number of some other byte in the file;
12838the first byte is number~0, then comes number~1, and so on. For example,
12839one of the parameters of a |bop| command points to the previous |bop|;
12840this makes it feasible to read the pages in backwards order, in case the
12841results are being directed to a device that stacks its output face up.
12842Suppose the preamble of a \.{DVI} file occupies bytes 0 to 99. Now if the
12843first page occupies bytes 100 to 999, say, and if the second
12844page occupies bytes 1000 to 1999, then the |bop| that starts in byte 1000
12845points to 100 and the |bop| that starts in byte 2000 points to 1000. (The
12846very first |bop|, i.e., the one starting in byte 100, has a pointer of~$-1$.)
12847
12848@ The \.{DVI} format is intended to be both compact and easily interpreted
12849by a machine. Compactness is achieved by making most of the information
12850implicit instead of explicit. When a \.{DVI}-reading program reads the
12851commands for a page, it keeps track of several quantities: (a)~The current
12852font |f| is an integer; this value is changed only
12853by \\{fnt} and \\{fnt\_num} commands. (b)~The current position on the page
12854is given by two numbers called the horizontal and vertical coordinates,
12855|h| and |v|. Both coordinates are zero at the upper left corner of the page;
12856moving to the right corresponds to increasing the horizontal coordinate, and
12857moving down corresponds to increasing the vertical coordinate. Thus, the
12858coordinates are essentially Cartesian, except that vertical directions are
12859flipped; the Cartesian version of |(h,v)| would be |(h,-v)|.  (c)~The
12860current spacing amounts are given by four numbers |w|, |x|, |y|, and |z|,
12861where |w| and~|x| are used for horizontal spacing and where |y| and~|z|
12862are used for vertical spacing. (d)~There is a stack containing
12863|(h,v,w,x,y,z)| values; the \.{DVI} commands |push| and |pop| are used to
12864change the current level of operation. Note that the current font~|f| is
12865not pushed and popped; the stack contains only information about
12866positioning.
12867
12868The values of |h|, |v|, |w|, |x|, |y|, and |z| are signed integers having up
12869to 32 bits, including the sign. Since they represent physical distances,
12870there is a small unit of measurement such that increasing |h| by~1 means
12871moving a certain tiny distance to the right. The actual unit of
12872measurement is variable, as explained below; \TeX\ sets things up so that
12873its \.{DVI} output is in sp units, i.e., scaled points, in agreement with
12874all the |scaled| dimensions in \TeX's data structures.
12875
12876@ Here is a list of all the commands that may appear in a \.{XDV} file. Each
12877command is specified by its symbolic name (e.g., |bop|), its opcode byte
12878(e.g., 139), and its parameters (if any). The parameters are followed
12879by a bracketed number telling how many bytes they occupy; for example,
12880`|p[4]|' means that parameter |p| is four bytes long.
12881
12882\yskip\hang|set_char_0| 0. Typeset character number~0 from font~|f|
12883such that the reference point of the character is at |(h,v)|. Then
12884increase |h| by the width of that character. Note that a character may
12885have zero or negative width, so one cannot be sure that |h| will advance
12886after this command; but |h| usually does increase.
12887
12888\yskip\hang\\{set\_char\_1} through \\{set\_char\_127} (opcodes 1 to 127).
12889Do the operations of |set_char_0|; but use the character whose number
12890matches the opcode, instead of character~0.
12891
12892\yskip\hang|set1| 128 |c[1]|. Same as |set_char_0|, except that character
12893number~|c| is typeset. \TeX82 uses this command for characters in the
12894range |128<=c<256|.
12895
12896\yskip\hang|@!set2| 129 |c[2]|. Same as |set1|, except that |c|~is two
12897bytes long, so it is in the range |0<=c<65536|. \TeX82 never uses this
12898command, but it should come in handy for extensions of \TeX\ that deal
12899with oriental languages.
12900@^oriental characters@>@^Chinese characters@>@^Japanese characters@>
12901
12902\yskip\hang|@!set3| 130 |c[3]|. Same as |set1|, except that |c|~is three
12903bytes long, so it can be as large as $2^{24}-1$. Not even the Chinese
12904language has this many characters, but this command might prove useful
12905in some yet unforeseen extension.
12906
12907\yskip\hang|@!set4| 131 |c[4]|. Same as |set1|, except that |c|~is four
12908bytes long. Imagine that.
12909
12910\yskip\hang|set_rule| 132 |a[4]| |b[4]|. Typeset a solid black rectangle
12911of height~|a| and width~|b|, with its bottom left corner at |(h,v)|. Then
12912set |h:=h+b|. If either |a<=0| or |b<=0|, nothing should be typeset. Note
12913that if |b<0|, the value of |h| will decrease even though nothing else happens.
12914See below for details about how to typeset rules so that consistency with
12915\MF\ is guaranteed.
12916
12917\yskip\hang|@!put1| 133 |c[1]|. Typeset character number~|c| from font~|f|
12918such that the reference point of the character is at |(h,v)|. (The `put'
12919commands are exactly like the `set' commands, except that they simply put out a
12920character or a rule without moving the reference point afterwards.)
12921
12922\yskip\hang|@!put2| 134 |c[2]|. Same as |set2|, except that |h| is not changed.
12923
12924\yskip\hang|@!put3| 135 |c[3]|. Same as |set3|, except that |h| is not changed.
12925
12926\yskip\hang|@!put4| 136 |c[4]|. Same as |set4|, except that |h| is not changed.
12927
12928\yskip\hang|put_rule| 137 |a[4]| |b[4]|. Same as |set_rule|, except that
12929|h| is not changed.
12930
12931\yskip\hang|nop| 138. No operation, do nothing. Any number of |nop|'s
12932may occur between \.{DVI} commands, but a |nop| cannot be inserted between
12933a command and its parameters or between two parameters.
12934
12935\yskip\hang|bop| 139 $c_0[4]$ $c_1[4]$ $\ldots$ $c_9[4]$ $p[4]$. Beginning
12936of a page: Set |(h,v,w,x,y,z):=(0,0,0,0,0,0)| and set the stack empty. Set
12937the current font |f| to an undefined value.  The ten $c_i$ parameters hold
12938the values of \.{\\count0} $\ldots$ \.{\\count9} in \TeX\ at the time
12939\.{\\shipout} was invoked for this page; they can be used to identify
12940pages, if a user wants to print only part of a \.{DVI} file. The parameter
12941|p| points to the previous |bop| in the file; the first
12942|bop| has $p=-1$.
12943
12944\yskip\hang|eop| 140.  End of page: Print what you have read since the
12945previous |bop|. At this point the stack should be empty. (The \.{DVI}-reading
12946programs that drive most output devices will have kept a buffer of the
12947material that appears on the page that has just ended. This material is
12948largely, but not entirely, in order by |v| coordinate and (for fixed |v|) by
12949|h|~coordinate; so it usually needs to be sorted into some order that is
12950appropriate for the device in question.)
12951
12952\yskip\hang|push| 141. Push the current values of |(h,v,w,x,y,z)| onto the
12953top of the stack; do not change any of these values. Note that |f| is
12954not pushed.
12955
12956\yskip\hang|pop| 142. Pop the top six values off of the stack and assign
12957them respectively to |(h,v,w,x,y,z)|. The number of pops should never
12958exceed the number of pushes, since it would be highly embarrassing if the
12959stack were empty at the time of a |pop| command.
12960
12961\yskip\hang|right1| 143 |b[1]|. Set |h:=h+b|, i.e., move right |b| units.
12962The parameter is a signed number in two's complement notation, |-128<=b<128|;
12963if |b<0|, the reference point moves left.
12964
12965\yskip\hang|right2| 144 |b[2]|. Same as |right1|, except that |b| is a
12966two-byte quantity in the range |-32768<=b<32768|.
12967
12968\yskip\hang|right3| 145 |b[3]|. Same as |right1|, except that |b| is a
12969three-byte quantity in the range |@t$-2^{23}$@><=b<@t$2^{23}$@>|.
12970
12971\yskip\hang|right4| 146 |b[4]|. Same as |right1|, except that |b| is a
12972four-byte quantity in the range |@t$-2^{31}$@><=b<@t$2^{31}$@>|.
12973
12974\yskip\hang|w0| 147. Set |h:=h+w|; i.e., move right |w| units. With luck,
12975this parameterless command will usually suffice, because the same kind of motion
12976will occur several times in succession; the following commands explain how
12977|w| gets particular values.
12978
12979\yskip\hang|w1| 148 |b[1]|. Set |w:=b| and |h:=h+b|. The value of |b| is a
12980signed quantity in two's complement notation, |-128<=b<128|. This command
12981changes the current |w|~spacing and moves right by |b|.
12982
12983\yskip\hang|@!w2| 149 |b[2]|. Same as |w1|, but |b| is two bytes long,
12984|-32768<=b<32768|.
12985
12986\yskip\hang|@!w3| 150 |b[3]|. Same as |w1|, but |b| is three bytes long,
12987|@t$-2^{23}$@><=b<@t$2^{23}$@>|.
12988
12989\yskip\hang|@!w4| 151 |b[4]|. Same as |w1|, but |b| is four bytes long,
12990|@t$-2^{31}$@><=b<@t$2^{31}$@>|.
12991
12992\yskip\hang|x0| 152. Set |h:=h+x|; i.e., move right |x| units. The `|x|'
12993commands are like the `|w|' commands except that they involve |x| instead
12994of |w|.
12995
12996\yskip\hang|x1| 153 |b[1]|. Set |x:=b| and |h:=h+b|. The value of |b| is a
12997signed quantity in two's complement notation, |-128<=b<128|. This command
12998changes the current |x|~spacing and moves right by |b|.
12999
13000\yskip\hang|@!x2| 154 |b[2]|. Same as |x1|, but |b| is two bytes long,
13001|-32768<=b<32768|.
13002
13003\yskip\hang|@!x3| 155 |b[3]|. Same as |x1|, but |b| is three bytes long,
13004|@t$-2^{23}$@><=b<@t$2^{23}$@>|.
13005
13006\yskip\hang|@!x4| 156 |b[4]|. Same as |x1|, but |b| is four bytes long,
13007|@t$-2^{31}$@><=b<@t$2^{31}$@>|.
13008
13009\yskip\hang|down1| 157 |a[1]|. Set |v:=v+a|, i.e., move down |a| units.
13010The parameter is a signed number in two's complement notation, |-128<=a<128|;
13011if |a<0|, the reference point moves up.
13012
13013\yskip\hang|@!down2| 158 |a[2]|. Same as |down1|, except that |a| is a
13014two-byte quantity in the range |-32768<=a<32768|.
13015
13016\yskip\hang|@!down3| 159 |a[3]|. Same as |down1|, except that |a| is a
13017three-byte quantity in the range |@t$-2^{23}$@><=a<@t$2^{23}$@>|.
13018
13019\yskip\hang|@!down4| 160 |a[4]|. Same as |down1|, except that |a| is a
13020four-byte quantity in the range |@t$-2^{31}$@><=a<@t$2^{31}$@>|.
13021
13022\yskip\hang|y0| 161. Set |v:=v+y|; i.e., move down |y| units. With luck,
13023this parameterless command will usually suffice, because the same kind of motion
13024will occur several times in succession; the following commands explain how
13025|y| gets particular values.
13026
13027\yskip\hang|y1| 162 |a[1]|. Set |y:=a| and |v:=v+a|. The value of |a| is a
13028signed quantity in two's complement notation, |-128<=a<128|. This command
13029changes the current |y|~spacing and moves down by |a|.
13030
13031\yskip\hang|@!y2| 163 |a[2]|. Same as |y1|, but |a| is two bytes long,
13032|-32768<=a<32768|.
13033
13034\yskip\hang|@!y3| 164 |a[3]|. Same as |y1|, but |a| is three bytes long,
13035|@t$-2^{23}$@><=a<@t$2^{23}$@>|.
13036
13037\yskip\hang|@!y4| 165 |a[4]|. Same as |y1|, but |a| is four bytes long,
13038|@t$-2^{31}$@><=a<@t$2^{31}$@>|.
13039
13040\yskip\hang|z0| 166. Set |v:=v+z|; i.e., move down |z| units. The `|z|' commands
13041are like the `|y|' commands except that they involve |z| instead of |y|.
13042
13043\yskip\hang|z1| 167 |a[1]|. Set |z:=a| and |v:=v+a|. The value of |a| is a
13044signed quantity in two's complement notation, |-128<=a<128|. This command
13045changes the current |z|~spacing and moves down by |a|.
13046
13047\yskip\hang|@!z2| 168 |a[2]|. Same as |z1|, but |a| is two bytes long,
13048|-32768<=a<32768|.
13049
13050\yskip\hang|@!z3| 169 |a[3]|. Same as |z1|, but |a| is three bytes long,
13051|@t$-2^{23}$@><=a<@t$2^{23}$@>|.
13052
13053\yskip\hang|@!z4| 170 |a[4]|. Same as |z1|, but |a| is four bytes long,
13054|@t$-2^{31}$@><=a<@t$2^{31}$@>|.
13055
13056\yskip\hang|fnt_num_0| 171. Set |f:=0|. Font 0 must previously have been
13057defined by a \\{fnt\_def} instruction, as explained below.
13058
13059\yskip\hang\\{fnt\_num\_1} through \\{fnt\_num\_63} (opcodes 172 to 234). Set
13060|f:=1|, \dots, \hbox{|f:=63|}, respectively.
13061
13062\yskip\hang|fnt1| 235 |k[1]|. Set |f:=k|. \TeX82 uses this command for font
13063numbers in the range |64<=k<256|.
13064
13065\yskip\hang|@!fnt2| 236 |k[2]|. Same as |fnt1|, except that |k|~is two
13066bytes long, so it is in the range |0<=k<65536|. \TeX82 never generates this
13067command, but large font numbers may prove useful for specifications of
13068color or texture, or they may be used for special fonts that have fixed
13069numbers in some external coding scheme.
13070
13071\yskip\hang|@!fnt3| 237 |k[3]|. Same as |fnt1|, except that |k|~is three
13072bytes long, so it can be as large as $2^{24}-1$.
13073
13074\yskip\hang|@!fnt4| 238 |k[4]|. Same as |fnt1|, except that |k|~is four
13075bytes long; this is for the really big font numbers (and for the negative ones).
13076
13077\yskip\hang|xxx1| 239 |k[1]| |x[k]|. This command is undefined in
13078general; it functions as a $(k+2)$-byte |nop| unless special \.{DVI}-reading
13079programs are being used. \TeX82 generates |xxx1| when a short enough
13080\.{\\special} appears, setting |k| to the number of bytes being sent. It
13081is recommended that |x| be a string having the form of a keyword followed
13082by possible parameters relevant to that keyword.
13083
13084\yskip\hang|@!xxx2| 240 |k[2]| |x[k]|. Like |xxx1|, but |0<=k<65536|.
13085
13086\yskip\hang|@!xxx3| 241 |k[3]| |x[k]|. Like |xxx1|, but |0<=k<@t$2^{24}$@>|.
13087
13088\yskip\hang|xxx4| 242 |k[4]| |x[k]|. Like |xxx1|, but |k| can be ridiculously
13089large. \TeX82 uses |xxx4| when sending a string of length 256 or more.
13090
13091\yskip\hang|fnt_def1| 243 |k[1]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|.
13092Define font |k|, where |0<=k<256|; font definitions will be explained shortly.
13093
13094\yskip\hang|@!fnt_def2| 244 |k[2]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|.
13095Define font |k|, where |0<=k<65536|.
13096
13097\yskip\hang|@!fnt_def3| 245 |k[3]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|.
13098Define font |k|, where |0<=k<@t$2^{24}$@>|.
13099
13100\yskip\hang|@!fnt_def4| 246 |k[4]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|.
13101Define font |k|, where |@t$-2^{31}$@><=k<@t$2^{31}$@>|.
13102
13103\yskip\hang|pre| 247 |i[1]| |num[4]| |den[4]| |mag[4]| |k[1]| |x[k]|.
13104Beginning of the preamble; this must come at the very beginning of the
13105file. Parameters |i|, |num|, |den|, |mag|, |k|, and |x| are explained below.
13106
13107\yskip\hang|post| 248. Beginning of the postamble, see below.
13108
13109\yskip\hang|post_post| 249. Ending of the postamble, see below.
13110
13111\yskip\noindent Commands 250--255 are undefined in normal \.{DVI} files, but
13112the following commands are used in \.{XDV} files.
13113
13114\yskip\hang\vbox{\halign{#&#\hfil\cr
13115|define_native_font| 252 & |k[4]| |s[4]| |flags[2]| |l[1]| |n[l]| |i[4]|\cr
13116& |if (flags and COLORED) then| |rgba[4]|\cr
13117& |if (flags and EXTEND) then| |extend[4]|\cr
13118& |if (flags and SLANT) then| |slant[4]|\cr
13119& |if (flags and EMBOLDEN) then| |embolden[4]|\cr
13120}}
13121
13122\yskip\hang|set_glyphs| 253 |w[4]| |k[2]| |xy[8k]| |g[2k]|.
13123
13124\yskip\noindent Commands 250 and 255 are undefined in normal \.{XDV} files.
13125
13126@ @d set_char_0=0 {typeset character 0 and move right}
13127@d set1=128 {typeset a character and move right}
13128@d set_rule=132 {typeset a rule and move right}
13129@d put_rule=137 {typeset a rule}
13130@d nop=138 {no operation}
13131@d bop=139 {beginning of page}
13132@d eop=140 {ending of page}
13133@d push=141 {save the current positions}
13134@d pop=142 {restore previous positions}
13135@d right1=143 {move right}
13136@d w0=147 {move right by |w|}
13137@d w1=148 {move right and set |w|}
13138@d x0=152 {move right by |x|}
13139@d x1=153 {move right and set |x|}
13140@d down1=157 {move down}
13141@d y0=161 {move down by |y|}
13142@d y1=162 {move down and set |y|}
13143@d z0=166 {move down by |z|}
13144@d z1=167 {move down and set |z|}
13145@d fnt_num_0=171 {set current font to 0}
13146@d fnt1=235 {set current font}
13147@d xxx1=239 {extension to \.{DVI} primitives}
13148@d xxx4=242 {potentially long extension to \.{DVI} primitives}
13149@d fnt_def1=243 {define the meaning of a font number}
13150@d pre=247 {preamble}
13151@d post=248 {postamble beginning}
13152@d post_post=249 {postamble ending}
13153
13154@d define_native_font=252 {define native font}
13155@d set_glyphs=253 {sequence of glyphs with individual x-y coordinates}
13156
13157@ The preamble contains basic information about the file as a whole. As
13158stated above, there are six parameters:
13159$$\hbox{|@!i[1]| |@!num[4]| |@!den[4]| |@!mag[4]| |@!k[1]| |@!x[k]|.}$$
13160The |i| byte identifies \.{DVI} format; in \XeTeX\ this byte is set to~6, as we
13161have new \.{DVI} opcodes, while in \TeX82 it is always set to~2. (The value
13162|i=3| is used for an extended format that allows a mixture of right-to-left and
13163left-to-right typesetting. Older versions of \XeTeX\ used |i=4| and |i=5|.)
13164
13165The next two parameters, |num| and |den|, are positive integers that define
13166the units of measurement; they are the numerator and denominator of a
13167fraction by which all dimensions in the \.{DVI} file could be multiplied
13168in order to get lengths in units of $10^{-7}$ meters. Since $\rm 7227{pt} =
13169254{cm}$, and since \TeX\ works with scaled points where there are $2^{16}$
13170sp in a point, \TeX\ sets
13171$|num|/|den|=(254\cdot10^5)/(7227\cdot2^{16})=25400000/473628672$.
13172@^sp@>
13173
13174The |mag| parameter is what \TeX\ calls \.{\\mag}, i.e., 1000 times the
13175desired magnification. The actual fraction by which dimensions are
13176multiplied is therefore $|mag|\cdot|num|/1000|den|$. Note that if a \TeX\
13177source document does not call for any `\.{true}' dimensions, and if you
13178change it only by specifying a different \.{\\mag} setting, the \.{DVI}
13179file that \TeX\ creates will be completely unchanged except for the value
13180of |mag| in the preamble and postamble. (Fancy \.{DVI}-reading programs allow
13181users to override the |mag|~setting when a \.{DVI} file is being printed.)
13182
13183Finally, |k| and |x| allow the \.{DVI} writer to include a comment, which is not
13184interpreted further. The length of comment |x| is |k|, where |0<=k<256|.
13185
13186@d id_byte=6 {identifies the kind of \.{DVI} files described here}
13187
13188@ Font definitions for a given font number |k| contain further parameters
13189$$\hbox{|c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|.}$$
13190The four-byte value |c| is the check sum that \TeX\ found in the \.{TFM}
13191file for this font; |c| should match the check sum of the font found by
13192programs that read this \.{DVI} file.
13193@^check sum@>
13194
13195Parameter |s| contains a fixed-point scale factor that is applied to
13196the character widths in font |k|; font dimensions in \.{TFM} files and
13197other font files are relative to this quantity, which is called the
13198``at size'' elsewhere in this documentation. The value of |s| is
13199always positive and less than $2^{27}$. It is given in the same units
13200as the other \.{DVI} dimensions, i.e., in sp when \TeX82 has made the
13201file.  Parameter |d| is similar to |s|; it is the ``design size,'' and
13202(like~|s|) it is given in \.{DVI} units. Thus, font |k| is to be used
13203at $|mag|\cdot s/1000d$ times its normal size.
13204
13205The remaining part of a font definition gives the external name of the font,
13206which is an ASCII string of length |a+l|. The number |a| is the length
13207of the ``area'' or directory, and |l| is the length of the font name itself;
13208the standard local system font area is supposed to be used when |a=0|.
13209The |n| field contains the area in its first |a| bytes.
13210
13211Font definitions must appear before the first use of a particular font number.
13212Once font |k| is defined, it must not be defined again; however, we
13213shall see below that font definitions appear in the postamble as well as
13214in the pages, so in this sense each font number is defined exactly twice,
13215if at all. Like |nop| commands, font definitions can
13216appear before the first |bop|, or between an |eop| and a |bop|.
13217
13218@ Sometimes it is desirable to make horizontal or vertical rules line up
13219precisely with certain features in characters of a font. It is possible to
13220guarantee the correct matching between \.{DVI} output and the characters
13221generated by \MF\ by adhering to the following principles: (1)~The \MF\
13222characters should be positioned so that a bottom edge or left edge that is
13223supposed to line up with the bottom or left edge of a rule appears at the
13224reference point, i.e., in row~0 and column~0 of the \MF\ raster. This
13225ensures that the position of the rule will not be rounded differently when
13226the pixel size is not a perfect multiple of the units of measurement in
13227the \.{DVI} file. (2)~A typeset rule of height $a>0$ and width $b>0$
13228should be equivalent to a \MF-generated character having black pixels in
13229precisely those raster positions whose \MF\ coordinates satisfy
13230|0<=x<@t$\alpha$@>b| and |0<=y<@t$\alpha$@>a|, where $\alpha$ is the number
13231of pixels per \.{DVI} unit.
13232@:METAFONT}{\MF@>
13233@^alignment of rules with characters@>
13234@^rules aligning with characters@>
13235
13236@ The last page in a \.{DVI} file is followed by `|post|'; this command
13237introduces the postamble, which summarizes important facts that \TeX\ has
13238accumulated about the file, making it possible to print subsets of the data
13239with reasonable efficiency. The postamble has the form
13240$$\vbox{\halign{\hbox{#\hfil}\cr
13241  |post| |p[4]| |num[4]| |den[4]| |mag[4]| |l[4]| |u[4]| |s[2]| |t[2]|\cr
13242  $\langle\,$font definitions$\,\rangle$\cr
13243  |post_post| |q[4]| |i[1]| 223's$[{\G}4]$\cr}}$$
13244Here |p| is a pointer to the final |bop| in the file. The next three
13245parameters, |num|, |den|, and |mag|, are duplicates of the quantities that
13246appeared in the preamble.
13247
13248Parameters |l| and |u| give respectively the height-plus-depth of the tallest
13249page and the width of the widest page, in the same units as other dimensions
13250of the file. These numbers might be used by a \.{DVI}-reading program to
13251position individual ``pages'' on large sheets of film or paper; however,
13252the standard convention for output on normal size paper is to position each
13253page so that the upper left-hand corner is exactly one inch from the left
13254and the top. Experience has shown that it is unwise to design \.{DVI}-to-printer
13255software that attempts cleverly to center the output; a fixed position of
13256the upper left corner is easiest for users to understand and to work with.
13257Therefore |l| and~|u| are often ignored.
13258
13259Parameter |s| is the maximum stack depth (i.e., the largest excess of
13260|push| commands over |pop| commands) needed to process this file. Then
13261comes |t|, the total number of pages (|bop| commands) present.
13262
13263The postamble continues with font definitions, which are any number of
13264\\{fnt\_def} commands as described above, possibly interspersed with |nop|
13265commands. Each font number that is used in the \.{DVI} file must be defined
13266exactly twice: Once before it is first selected by a \\{fnt} command, and once
13267in the postamble.
13268
13269@ The last part of the postamble, following the |post_post| byte that
13270signifies the end of the font definitions, contains |q|, a pointer to the
13271|post| command that started the postamble.  An identification byte, |i|,
13272comes next; this currently equals~2, as in the preamble.
13273
13274The |i| byte is followed by four or more bytes that are all equal to
13275the decimal number 223 (i.e., @'337 in octal). \TeX\ puts out four to seven of
13276these trailing bytes, until the total length of the file is a multiple of
13277four bytes, since this works out best on machines that pack four bytes per
13278word; but any number of 223's is allowed, as long as there are at least four
13279of them. In effect, 223 is a sort of signature that is added at the very end.
13280@^Fuchs, David Raymond@>
13281
13282This curious way to finish off a \.{DVI} file makes it feasible for
13283\.{DVI}-reading programs to find the postamble first, on most computers,
13284even though \TeX\ wants to write the postamble last. Most operating
13285systems permit random access to individual words or bytes of a file, so
13286the \.{DVI} reader can start at the end and skip backwards over the 223's
13287until finding the identification byte. Then it can back up four bytes, read
13288|q|, and move to byte |q| of the file. This byte should, of course,
13289contain the value 248 (|post|); now the postamble can be read, so the
13290\.{DVI} reader can discover all the information needed for typesetting the
13291pages. Note that it is also possible to skip through the \.{DVI} file at
13292reasonably high speed to locate a particular page, if that proves
13293desirable. This saves a lot of time, since \.{DVI} files used in production
13294jobs tend to be large.
13295
13296Unfortunately, however, standard \PASCAL\ does not include the ability to
13297@^system dependencies@>
13298access a random position in a file, or even to determine the length of a file.
13299Almost all systems nowadays provide the necessary capabilities, so \.{DVI}
13300format has been designed to work most efficiently with modern operating systems.
13301But if \.{DVI} files have to be processed under the restrictions of standard
13302\PASCAL, one can simply read them from front to back, since the necessary
13303header information is present in the preamble and in the font definitions.
13304(The |l| and |u| and |s| and |t| parameters, which appear only in the
13305postamble, are ``frills'' that are handy but not absolutely necessary.)
13306
13307@* \[32] Shipping pages out.
13308After considering \TeX's eyes and stomach, we come now to the bowels.
13309@^bowels@>
13310
13311The |ship_out| procedure is given a pointer to a box; its mission is
13312to describe that box in \.{DVI} form, outputting a ``page'' to |dvi_file|.
13313The \.{DVI} coordinates $(h,v)=(0,0)$ should correspond to the upper left
13314corner of the box being shipped.
13315
13316Since boxes can be inside of boxes inside of boxes, the main work of
13317|ship_out| is done by two mutually recursive routines, |hlist_out|
13318and |vlist_out|, which traverse the hlists and vlists inside of horizontal
13319and vertical boxes.
13320
13321As individual pages are being processed, we need to accumulate
13322information about the entire set of pages, since such statistics must be
13323reported in the postamble. The global variables |total_pages|, |max_v|,
13324|max_h|, |max_push|, and |last_bop| are used to record this information.
13325
13326The variable |doing_leaders| is |true| while leaders are being output.
13327The variable |dead_cycles| contains the number of times an output routine
13328has been initiated since the last |ship_out|.
13329
13330A few additional global variables are also defined here for use in
13331|vlist_out| and |hlist_out|. They could have been local variables, but
13332that would waste stack space when boxes are deeply nested, since the
13333values of these variables are not needed during recursive calls.
13334@^recursion@>
13335
13336@<Glob...@>=
13337@!total_pages:integer; {the number of pages that have been shipped out}
13338@!max_v:scaled; {maximum height-plus-depth of pages shipped so far}
13339@!max_h:scaled; {maximum width of pages shipped so far}
13340@!max_push:integer; {deepest nesting of |push| commands encountered so far}
13341@!last_bop:integer; {location of previous |bop| in the \.{DVI} output}
13342@!dead_cycles:integer; {recent outputs that didn't ship anything out}
13343@!doing_leaders:boolean; {are we inside a leader box?}
13344@#
13345@!c,@!f:quarterword; {character and font in current |char_node|}
13346@!rule_ht,@!rule_dp,@!rule_wd:scaled; {size of current rule being output}
13347@!g:pointer; {current glue specification}
13348@!lq,@!lr:integer; {quantities used in calculations for leaders}
13349
13350@ @<Set init...@>=
13351total_pages:=0; max_v:=0; max_h:=0; max_push:=0; last_bop:=-1;
13352doing_leaders:=false; dead_cycles:=0; cur_s:=-1;
13353
13354@ The \.{DVI} bytes are output to a buffer instead of being written directly
13355to the output file. This makes it possible to reduce the overhead of
13356subroutine calls, thereby measurably speeding up the computation, since
13357output of \.{DVI} bytes is part of \TeX's inner loop. And it has another
13358advantage as well, since we can change instructions in the buffer in order to
13359make the output more compact. For example, a `|down2|' command can be
13360changed to a `|y2|', thereby making a subsequent `|y0|' command possible,
13361saving two bytes.
13362
13363The output buffer is divided into two parts of equal size; the bytes found
13364in |dvi_buf[0..half_buf-1]| constitute the first half, and those in
13365|dvi_buf[half_buf..dvi_buf_size-1]| constitute the second. The global
13366variable |dvi_ptr| points to the position that will receive the next
13367output byte. When |dvi_ptr| reaches |dvi_limit|, which is always equal
13368to one of the two values |half_buf| or |dvi_buf_size|, the half buffer that
13369is about to be invaded next is sent to the output and |dvi_limit| is
13370changed to its other value. Thus, there is always at least a half buffer's
13371worth of information present, except at the very beginning of the job.
13372
13373Bytes of the \.{DVI} file are numbered sequentially starting with 0;
13374the next byte to be generated will be number |dvi_offset+dvi_ptr|.
13375A byte is present in the buffer only if its number is |>=dvi_gone|.
13376
13377@<Types...@>=
13378@!dvi_index=0..dvi_buf_size; {an index into the output buffer}
13379
13380@ Some systems may find it more efficient to make |dvi_buf| a |packed|
13381array, since output of four bytes at once may be facilitated.
13382@^system dependencies@>
13383
13384@<Glob...@>=
13385@!dvi_buf:array[dvi_index] of eight_bits; {buffer for \.{DVI} output}
13386@!half_buf:dvi_index; {half of |dvi_buf_size|}
13387@!dvi_limit:dvi_index; {end of the current half buffer}
13388@!dvi_ptr:dvi_index; {the next available buffer address}
13389@!dvi_offset:integer; {|dvi_buf_size| times the number of times the
13390  output buffer has been fully emptied}
13391@!dvi_gone:integer; {the number of bytes already output to |dvi_file|}
13392
13393@ Initially the buffer is all in one piece; we will output half of it only
13394after it first fills up.
13395
13396@<Set init...@>=
13397half_buf:=dvi_buf_size div 2; dvi_limit:=dvi_buf_size; dvi_ptr:=0;
13398dvi_offset:=0; dvi_gone:=0;
13399
13400@ The actual output of |dvi_buf[a..b]| to |dvi_file| is performed by calling
13401|write_dvi(a,b)|. For best results, this procedure should be optimized to
13402run as fast as possible on each particular system, since it is part of
13403\TeX's inner loop. It is safe to assume that |a| and |b+1| will both be
13404multiples of 4 when |write_dvi(a,b)| is called; therefore it is possible on
13405many machines to use efficient methods to pack four bytes per word and to
13406output an array of words with one system call.
13407@^system dependencies@>
13408@^inner loop@>
13409@^defecation@>
13410
13411@p procedure write_dvi(@!a,@!b:dvi_index);
13412var k:dvi_index;
13413begin for k:=a to b do write(dvi_file,dvi_buf[k]);
13414end;
13415
13416@ To put a byte in the buffer without paying the cost of invoking a procedure
13417each time, we use the macro |dvi_out|.
13418
13419@d dvi_out(#)==@+begin dvi_buf[dvi_ptr]:=#; incr(dvi_ptr);
13420  if dvi_ptr=dvi_limit then dvi_swap;
13421  end
13422
13423@p procedure dvi_swap; {outputs half of the buffer}
13424begin if dvi_limit=dvi_buf_size then
13425  begin write_dvi(0,half_buf-1); dvi_limit:=half_buf;
13426  dvi_offset:=dvi_offset+dvi_buf_size; dvi_ptr:=0;
13427  end
13428else  begin write_dvi(half_buf,dvi_buf_size-1); dvi_limit:=dvi_buf_size;
13429  end;
13430dvi_gone:=dvi_gone+half_buf;
13431end;
13432
13433@ Here is how we clean out the buffer when \TeX\ is all through; |dvi_ptr|
13434will be a multiple of~4.
13435
13436@<Empty the last bytes out of |dvi_buf|@>=
13437if dvi_limit=half_buf then write_dvi(half_buf,dvi_buf_size-1);
13438if dvi_ptr>0 then write_dvi(0,dvi_ptr-1)
13439
13440@ The |dvi_four| procedure outputs four bytes in two's complement notation,
13441without risking arithmetic overflow.
13442
13443@p procedure dvi_four(@!x:integer);
13444begin if x>=0 then dvi_out(x div @'100000000)
13445else  begin x:=x+@'10000000000;
13446  x:=x+@'10000000000;
13447  dvi_out((x div @'100000000) + 128);
13448  end;
13449x:=x mod @'100000000; dvi_out(x div @'200000);
13450x:=x mod @'200000; dvi_out(x div @'400);
13451dvi_out(x mod @'400);
13452end;
13453
13454procedure dvi_two(s: UTF16_code);
13455begin
13456    dvi_out(s div @'400);
13457    dvi_out(s mod @'400);
13458end;
13459
13460@ A mild optimization of the output is performed by the |dvi_pop|
13461routine, which issues a |pop| unless it is possible to cancel a
13462`|push| |pop|' pair. The parameter to |dvi_pop| is the byte address
13463following the old |push| that matches the new |pop|.
13464
13465@p procedure dvi_pop(@!l:integer);
13466begin if (l=dvi_offset+dvi_ptr)and(dvi_ptr>0) then decr(dvi_ptr)
13467else dvi_out(pop);
13468end;
13469
13470@ Here's a procedure that outputs a font definition. Since \TeX82 uses at
13471most 256 different fonts per job, |fnt_def1| is always used as the command code.
13472
13473@p procedure dvi_native_font_def(@!f:internal_font_number);
13474var
13475  font_def_length, i: integer;
13476begin
13477  dvi_out(define_native_font);
13478  dvi_four(f-font_base-1);
13479  font_def_length:=make_font_def(f);
13480  for i:=0 to font_def_length - 1 do dvi_out(xdv_buffer[i]);
13481end;
13482
13483procedure dvi_font_def(@!f:internal_font_number);
13484var k:pool_pointer; {index into |str_pool|}
13485l:integer; {length of name without mapping option}
13486begin if is_native_font(f) then dvi_native_font_def(f) else
13487begin dvi_out(fnt_def1);
13488dvi_out(f-font_base-1);@/
13489dvi_out(qo(font_check[f].b0));
13490dvi_out(qo(font_check[f].b1));
13491dvi_out(qo(font_check[f].b2));
13492dvi_out(qo(font_check[f].b3));@/
13493dvi_four(font_size[f]);
13494dvi_four(font_dsize[f]);@/
13495dvi_out(length(font_area[f]));
13496@<Output the font name whose internal number is |f|@>;
13497end;
13498
13499@ @<Output the font name whose internal number is |f|@>=
13500l:=0; k:=str_start_macro(font_name[f]);
13501{search for colon; we will truncate the name there}
13502while (l=0) and (k<str_start_macro(font_name[f]+1)) do begin
13503  if so(str_pool[k]) = ":" then l:=k-str_start_macro(font_name[f]);
13504  incr(k);
13505end;
13506if l=0 then l:=length(font_name[f]); {no colon found}
13507dvi_out(l);
13508for k:=str_start_macro(font_area[f]) to str_start_macro(font_area[f]+1)-1 do
13509  dvi_out(so(str_pool[k]));
13510for k:=str_start_macro(font_name[f]) to str_start_macro(font_name[f])+l-1 do
13511  dvi_out(so(str_pool[k]));
13512end;
13513
13514@ Versions of \TeX\ intended for small computers might well choose to omit
13515the ideas in the next few parts of this program, since it is not really
13516necessary to optimize the \.{DVI} code by making use of the |w0|, |x0|,
13517|y0|, and |z0| commands. Furthermore, the algorithm that we are about to
13518describe does not pretend to give an optimum reduction in the length
13519of the \.{DVI} code; after all, speed is more important than compactness.
13520But the method is surprisingly effective, and it takes comparatively little
13521time.
13522
13523We can best understand the basic idea by first considering a simpler problem
13524that has the same essential characteristics. Given a sequence of digits,
13525say $3\,1\,4\,1\,5\,9\,2\,6\,5\,3\,5\,8\,9$, we want to assign subscripts
13526$d$, $y$, or $z$ to each digit so as to maximize the number of ``$y$-hits''
13527and ``$z$-hits''; a $y$-hit is an instance of two appearances of the same
13528digit with the subscript $y$, where no $y$'s intervene between the two
13529appearances, and a $z$-hit is defined similarly. For example, the sequence
13530above could be decorated with subscripts as follows:
13531$$3_z\,1_y\,4_d\,1_y\,5_y\,9_d\,2_d\,6_d\,5_y\,3_z\,5_y\,8_d\,9_d.$$
13532There are three $y$-hits ($1_y\ldots1_y$ and $5_y\ldots5_y\ldots5_y$) and
13533one $z$-hit ($3_z\ldots3_z$); there are no $d$-hits, since the two appearances
13534of $9_d$ have $d$'s between them, but we don't count $d$-hits so it doesn't
13535matter how many there are. These subscripts are analogous to the \.{DVI}
13536commands called \\{down}, $y$, and $z$, and the digits are analogous to
13537different amounts of vertical motion; a $y$-hit or $z$-hit corresponds to
13538the opportunity to use the one-byte commands |y0| or |z0| in a \.{DVI} file.
13539
13540\TeX's method of assigning subscripts works like this: Append a new digit,
13541say $\delta$, to the right of the sequence. Now look back through the
13542sequence until one of the following things happens: (a)~You see
13543$\delta_y$ or $\delta_z$, and this was the first time you encountered a
13544$y$ or $z$ subscript, respectively.  Then assign $y$ or $z$ to the new
13545$\delta$; you have scored a hit. (b)~You see $\delta_d$, and no $y$
13546subscripts have been encountered so far during this search.  Then change
13547the previous $\delta_d$ to $\delta_y$ (this corresponds to changing a
13548command in the output buffer), and assign $y$ to the new $\delta$; it's
13549another hit.  (c)~You see $\delta_d$, and a $y$ subscript has been seen
13550but not a $z$.  Change the previous $\delta_d$ to $\delta_z$ and assign
13551$z$ to the new $\delta$. (d)~You encounter both $y$ and $z$ subscripts
13552before encountering a suitable $\delta$, or you scan all the way to the
13553front of the sequence. Assign $d$ to the new $\delta$; this assignment may
13554be changed later.
13555
13556The subscripts $3_z\,1_y\,4_d\ldots\,$ in the example above were, in fact,
13557produced by this procedure, as the reader can verify. (Go ahead and try it.)
13558
13559@ In order to implement such an idea, \TeX\ maintains a stack of pointers
13560to the \\{down}, $y$, and $z$ commands that have been generated for the
13561current page. And there is a similar stack for \\{right}, |w|, and |x|
13562commands. These stacks are called the down stack and right stack, and their
13563top elements are maintained in the variables |down_ptr| and |right_ptr|.
13564
13565Each entry in these stacks contains four fields: The |width| field is
13566the amount of motion down or to the right; the |location| field is the
13567byte number of the \.{DVI} command in question (including the appropriate
13568|dvi_offset|); the |link| field points to the next item below this one
13569on the stack; and the |info| field encodes the options for possible change
13570in the \.{DVI} command.
13571
13572@d movement_node_size=3 {number of words per entry in the down and right stacks}
13573@d location(#)==mem[#+2].int {\.{DVI} byte number for a movement command}
13574
13575@<Glob...@>=
13576@!down_ptr,@!right_ptr:pointer; {heads of the down and right stacks}
13577
13578@ @<Set init...@>=
13579down_ptr:=null; right_ptr:=null;
13580
13581@ Here is a subroutine that produces a \.{DVI} command for some specified
13582downward or rightward motion. It has two parameters: |w| is the amount
13583of motion, and |o| is either |down1| or |right1|. We use the fact that
13584the command codes have convenient arithmetic properties: |y1-down1=w1-right1|
13585and |z1-down1=x1-right1|.
13586
13587@p procedure movement(@!w:scaled;@!o:eight_bits);
13588label exit,found,not_found,2,1;
13589var mstate:small_number; {have we seen a |y| or |z|?}
13590@!p,@!q:pointer; {current and top nodes on the stack}
13591@!k:integer; {index into |dvi_buf|, modulo |dvi_buf_size|}
13592begin q:=get_node(movement_node_size); {new node for the top of the stack}
13593width(q):=w; location(q):=dvi_offset+dvi_ptr;
13594if o=down1 then
13595  begin link(q):=down_ptr; down_ptr:=q;
13596  end
13597else  begin link(q):=right_ptr; right_ptr:=q;
13598  end;
13599@<Look at the other stack entries until deciding what sort of \.{DVI} command
13600  to generate; |goto found| if node |p| is a ``hit''@>;
13601@<Generate a |down| or |right| command for |w| and |return|@>;
13602found: @<Generate a |y0| or |z0| command in order to reuse a previous
13603  appearance of~|w|@>;
13604exit:end;
13605
13606@ The |info| fields in the entries of the down stack or the right stack
13607have six possible settings: |y_here| or |z_here| mean that the \.{DVI}
13608command refers to |y| or |z|, respectively (or to |w| or |x|, in the
13609case of horizontal motion); |yz_OK| means that the \.{DVI} command is
13610\\{down} (or \\{right}) but can be changed to either |y| or |z| (or
13611to either |w| or |x|); |y_OK| means that it is \\{down} and can be changed
13612to |y| but not |z|; |z_OK| is similar; and |d_fixed| means it must stay
13613\\{down}.
13614
13615The four settings |yz_OK|, |y_OK|, |z_OK|, |d_fixed| would not need to
13616be distinguished from each other if we were simply solving the
13617digit-subscripting problem mentioned above. But in \TeX's case there is
13618a complication because of the nested structure of |push| and |pop|
13619commands. Suppose we add parentheses to the digit-subscripting problem,
13620redefining hits so that $\delta_y\ldots \delta_y$ is a hit if all $y$'s between
13621the $\delta$'s are enclosed in properly nested parentheses, and if the
13622parenthesis level of the right-hand $\delta_y$ is deeper than or equal to
13623that of the left-hand one. Thus, `(' and `)' correspond to `|push|'
13624and `|pop|'. Now if we want to assign a subscript to the final 1 in the
13625sequence
13626$$2_y\,7_d\,1_d\,(\,8_z\,2_y\,8_z\,)\,1$$
13627we cannot change the previous $1_d$ to $1_y$, since that would invalidate
13628the $2_y\ldots2_y$ hit. But we can change it to $1_z$, scoring a hit
13629since the intervening $8_z$'s are enclosed in parentheses.
13630
13631The program below removes movement nodes that are introduced after a |push|,
13632before it outputs the corresponding |pop|.
13633
13634@d y_here=1 {|info| when the movement entry points to a |y| command}
13635@d z_here=2 {|info| when the movement entry points to a |z| command}
13636@d yz_OK=3 {|info| corresponding to an unconstrained \\{down} command}
13637@d y_OK=4 {|info| corresponding to a \\{down} that can't become a |z|}
13638@d z_OK=5 {|info| corresponding to a \\{down} that can't become a |y|}
13639@d d_fixed=6 {|info| corresponding to a \\{down} that can't change}
13640
13641@ When the |movement| procedure gets to the label |found|, the value of
13642|info(p)| will be either |y_here| or |z_here|. If it is, say, |y_here|,
13643the procedure generates a |y0| command (or a |w0| command), and marks
13644all |info| fields between |q| and |p| so that |y| is not OK in that range.
13645
13646@<Generate a |y0| or |z0| command...@>=
13647info(q):=info(p);
13648if info(q)=y_here then
13649  begin dvi_out(o+y0-down1); {|y0| or |w0|}
13650  while link(q)<>p do
13651    begin q:=link(q);
13652    case info(q) of
13653    yz_OK: info(q):=z_OK;
13654    y_OK: info(q):=d_fixed;
13655    othercases do_nothing
13656    endcases;
13657    end;
13658  end
13659else  begin dvi_out(o+z0-down1); {|z0| or |x0|}
13660  while link(q)<>p do
13661    begin q:=link(q);
13662    case info(q) of
13663    yz_OK: info(q):=y_OK;
13664    z_OK: info(q):=d_fixed;
13665    othercases do_nothing
13666    endcases;
13667    end;
13668  end
13669
13670@ @<Generate a |down| or |right|...@>=
13671info(q):=yz_OK;
13672if abs(w)>=@'40000000 then
13673  begin dvi_out(o+3); {|down4| or |right4|}
13674  dvi_four(w); return;
13675  end;
13676if abs(w)>=@'100000 then
13677  begin dvi_out(o+2); {|down3| or |right3|}
13678  if w<0 then w:=w+@'100000000;
13679  dvi_out(w div @'200000); w:=w mod @'200000; goto 2;
13680  end;
13681if abs(w)>=@'200 then
13682  begin dvi_out(o+1); {|down2| or |right2|}
13683  if w<0 then w:=w+@'200000;
13684  goto 2;
13685  end;
13686dvi_out(o); {|down1| or |right1|}
13687if w<0 then w:=w+@'400;
13688goto 1;
136892: dvi_out(w div @'400);
136901: dvi_out(w mod @'400); return
13691
13692@ As we search through the stack, we are in one of three states,
13693|y_seen|, |z_seen|, or |none_seen|, depending on whether we have
13694encountered |y_here| or |z_here| nodes. These states are encoded as
13695multiples of 6, so that they can be added to the |info| fields for quick
13696decision-making.
13697@^inner loop@>
13698
13699@d none_seen=0 {no |y_here| or |z_here| nodes have been encountered yet}
13700@d y_seen=6 {we have seen |y_here| but not |z_here|}
13701@d z_seen=12 {we have seen |z_here| but not |y_here|}
13702
13703@<Look at the other stack entries until deciding...@>=
13704p:=link(q); mstate:=none_seen;
13705while p<>null do
13706  begin if width(p)=w then @<Consider a node with matching width;
13707    |goto found| if it's a hit@>
13708  else  case mstate+info(p) of
13709    none_seen+y_here: mstate:=y_seen;
13710    none_seen+z_here: mstate:=z_seen;
13711    y_seen+z_here,z_seen+y_here: goto not_found;
13712    othercases do_nothing
13713    endcases;
13714  p:=link(p);
13715  end;
13716not_found:
13717
13718@ We might find a valid hit in a |y| or |z| byte that is already gone
13719from the buffer. But we can't change bytes that are gone forever; ``the
13720moving finger writes, $\ldots\,\,$.''
13721
13722@<Consider a node with matching width...@>=
13723case mstate+info(p) of
13724none_seen+yz_OK,none_seen+y_OK,z_seen+yz_OK,z_seen+y_OK:@t@>@;@/
13725  if location(p)<dvi_gone then goto not_found
13726  else @<Change buffered instruction to |y| or |w| and |goto found|@>;
13727none_seen+z_OK,y_seen+yz_OK,y_seen+z_OK:@t@>@;@/
13728  if location(p)<dvi_gone then goto not_found
13729  else @<Change buffered instruction to |z| or |x| and |goto found|@>;
13730none_seen+y_here,none_seen+z_here,y_seen+z_here,z_seen+y_here: goto found;
13731othercases do_nothing
13732endcases
13733
13734@ @<Change buffered instruction to |y| or |w| and |goto found|@>=
13735begin k:=location(p)-dvi_offset;
13736if k<0 then k:=k+dvi_buf_size;
13737dvi_buf[k]:=dvi_buf[k]+y1-down1;
13738info(p):=y_here; goto found;
13739end
13740
13741@ @<Change buffered instruction to |z| or |x| and |goto found|@>=
13742begin k:=location(p)-dvi_offset;
13743if k<0 then k:=k+dvi_buf_size;
13744dvi_buf[k]:=dvi_buf[k]+z1-down1;
13745info(p):=z_here; goto found;
13746end
13747
13748@ In case you are wondering when all the movement nodes are removed from
13749\TeX's memory, the answer is that they are recycled just before
13750|hlist_out| and |vlist_out| finish outputting a box. This restores the
13751down and right stacks to the state they were in before the box was output,
13752except that some |info|'s may have become more restrictive.
13753
13754@p procedure prune_movements(@!l:integer);
13755  {delete movement nodes with |location>=l|}
13756label done,exit;
13757var p:pointer; {node being deleted}
13758begin while down_ptr<>null do
13759  begin if location(down_ptr)<l then goto done;
13760  p:=down_ptr; down_ptr:=link(p); free_node(p,movement_node_size);
13761  end;
13762done: while right_ptr<>null do
13763  begin if location(right_ptr)<l then return;
13764  p:=right_ptr; right_ptr:=link(p); free_node(p,movement_node_size);
13765  end;
13766exit:end;
13767
13768@ The actual distances by which we want to move might be computed as the
13769sum of several separate movements. For example, there might be several
13770glue nodes in succession, or we might want to move right by the width of
13771some box plus some amount of glue. More importantly, the baselineskip
13772distances are computed in terms of glue together with the depth and
13773height of adjacent boxes, and we want the \.{DVI} file to lump these
13774three quantities together into a single motion.
13775
13776Therefore, \TeX\ maintains two pairs of global variables: |dvi_h| and |dvi_v|
13777are the |h| and |v| coordinates corresponding to the commands actually
13778output to the \.{DVI} file, while |cur_h| and |cur_v| are the coordinates
13779corresponding to the current state of the output routines. Coordinate
13780changes will accumulate in |cur_h| and |cur_v| without being reflected
13781in the output, until such a change becomes necessary or desirable; we
13782can call the |movement| procedure whenever we want to make |dvi_h=cur_h|
13783or |dvi_v=cur_v|.
13784
13785The current font reflected in the \.{DVI} output is called |dvi_f|;
13786there is no need for a `\\{cur\_f}' variable.
13787
13788The depth of nesting of |hlist_out| and |vlist_out| is called |cur_s|;
13789this is essentially the depth of |push| commands in the \.{DVI} output.
13790
13791For mixed direction text (\TeXXeT) the current text direction is called
13792|cur_dir|. As the box being shipped out will never be used again and
13793soon be recycled, we can simply reverse any R-text (i.e., right-to-left)
13794segments of hlist nodes as well as complete hlist nodes embedded in such
13795segments. Moreover this can be done iteratively rather than recursively.
13796There are, however, two complications related to leaders that require
13797some additional bookkeeping: (1)~One and the same hlist node might be
13798used more than once (but never inside both L- and R-text); and
13799(2)~leader boxes inside hlists must be aligned with respect to the left
13800edge of the original hlist.
13801
13802A math node is changed into a kern node whenever the text direction
13803remains the same, it is replaced by an |edge_node| if the text direction
13804changes; the subtype of an an |hlist_node| inside R-text is changed to
13805|reversed| once its hlist has been reversed.
13806@!@^data structure assumptions@>
13807
13808@d reversed=1 {subtype for an |hlist_node| whose hlist has been reversed}
13809@d dlist=2 {subtype for an |hlist_node| from display math mode}
13810@d box_lr(#) == (qo(subtype(#))) {direction mode of a box}
13811@d set_box_lr(#) ==  subtype(#):=set_box_lr_end
13812@d set_box_lr_end(#) == qi(#)
13813@#
13814@d left_to_right=0
13815@d right_to_left=1
13816@d reflected==1-cur_dir {the opposite of |cur_dir|}
13817@#
13818@d synch_h==if cur_h<>dvi_h then
13819    begin movement(cur_h-dvi_h,right1); dvi_h:=cur_h;
13820    end
13821@d synch_v==if cur_v<>dvi_v then
13822    begin movement(cur_v-dvi_v,down1); dvi_v:=cur_v;
13823    end
13824
13825@<Glob...@>=
13826@!dvi_h,@!dvi_v:scaled; {a \.{DVI} reader program thinks we are here}
13827@!cur_h,@!cur_v:scaled; {\TeX\ thinks we are here}
13828@!dvi_f:internal_font_number; {the current font}
13829@!cur_s:integer; {current depth of output box nesting, initially $-1$}
13830
13831@ @<Initialize variables as |ship_out| begins@>=
13832dvi_h:=0; dvi_v:=0; cur_h:=h_offset; dvi_f:=null_font;
13833@<Calculate page dimensions and margins@>;
13834ensure_dvi_open;
13835if total_pages=0 then
13836  begin dvi_out(pre); dvi_out(id_byte); {output the preamble}
13837@^preamble of \.{DVI} file@>
13838  dvi_four(25400000); dvi_four(473628672); {conversion ratio for sp}
13839  prepare_mag; dvi_four(mag); {magnification factor is frozen}
13840  old_setting:=selector; selector:=new_string;
13841  print(" XeTeX output "); print_int(year); print_char(".");
13842  print_two(month); print_char("."); print_two(day);
13843  print_char(":"); print_two(time div 60);
13844  print_two(time mod 60);
13845  selector:=old_setting; dvi_out(cur_length);
13846  for s:=str_start_macro(str_ptr) to pool_ptr-1 do dvi_out(so(str_pool[s]));
13847  pool_ptr:=str_start_macro(str_ptr); {flush the current string}
13848  end
13849
13850@ When |hlist_out| is called, its duty is to output the box represented
13851by the |hlist_node| pointed to by |temp_ptr|. The reference point of that
13852box has coordinates |(cur_h,cur_v)|.
13853
13854Similarly, when |vlist_out| is called, its duty is to output the box represented
13855by the |vlist_node| pointed to by |temp_ptr|. The reference point of that
13856box has coordinates |(cur_h,cur_v)|.
13857@^recursion@>
13858
13859@p procedure@?vlist_out; forward; {|hlist_out| and |vlist_out| are mutually
13860  recursive}
13861
13862@ The recursive procedures |hlist_out| and |vlist_out| each have local variables
13863|save_h| and |save_v| to hold the values of |dvi_h| and |dvi_v| just before
13864entering a new level of recursion.  In effect, the values of |save_h| and
13865|save_v| on \TeX's run-time stack correspond to the values of |h| and |v|
13866that a \.{DVI}-reading program will push onto its coordinate stack.
13867
13868@d move_past=13 {go to this label when advancing past glue or a rule}
13869@d fin_rule=14 {go to this label to finish processing a rule}
13870@d next_p=15 {go to this label when finished with node |p|}
13871
13872@d check_next=1236
13873@d end_node_run=1237
13874
13875@p @t\4@>@<Declare procedures needed in |hlist_out|, |vlist_out|@>@t@>@/
13876procedure hlist_out; {output an |hlist_node| box}
13877label reswitch, move_past, fin_rule, next_p;
13878var base_line: scaled; {the baseline coordinate for this box}
13879@!left_edge: scaled; {the left coordinate for this box}
13880@!save_h,@!save_v: scaled; {what |dvi_h| and |dvi_v| should pop to}
13881@!this_box: pointer; {pointer to containing box}
13882@!g_order: glue_ord; {applicable order of infinity for glue}
13883@!g_sign: normal..shrinking; {selects type of glue}
13884@!p:pointer; {current position in the hlist}
13885@!save_loc:integer; {\.{DVI} byte location upon entry}
13886@!leader_box:pointer; {the leader box being replicated}
13887@!leader_wd:scaled; {width of leader box being replicated}
13888@!lx:scaled; {extra space between leader boxes}
13889@!outer_doing_leaders:boolean; {were we doing leaders?}
13890@!edge:scaled; {right edge of sub-box or leader space}
13891@!prev_p:pointer; {one step behind |p|}
13892@!len: integer; {length of scratch string for native word output}
13893@!q,@!r: pointer;
13894@!k,@!j: integer;
13895@!glue_temp:real; {glue value before rounding}
13896@!cur_glue:real; {glue seen so far}
13897@!cur_g:scaled; {rounded equivalent of |cur_glue| times the glue ratio}
13898begin cur_g:=0; cur_glue:=float_constant(0);
13899this_box:=temp_ptr; g_order:=glue_order(this_box);
13900g_sign:=glue_sign(this_box);
13901@<Merge sequences of words using AAT fonts and inter-word spaces into single nodes@>;
13902p:=list_ptr(this_box);
13903incr(cur_s);
13904if cur_s>0 then dvi_out(push);
13905if cur_s>max_push then max_push:=cur_s;
13906save_loc:=dvi_offset+dvi_ptr; base_line:=cur_v;
13907prev_p:=this_box+list_offset;
13908@<Initialize |hlist_out| for mixed direction typesetting@>;
13909left_edge:=cur_h;
13910while p<>null do @<Output node |p| for |hlist_out| and move to the next node,
13911  maintaining the condition |cur_v=base_line|@>;
13912@<Finish |hlist_out| for mixed direction typesetting@>;
13913prune_movements(save_loc);
13914if cur_s>0 then dvi_pop(save_loc);
13915decr(cur_s);
13916end;
13917
13918@ Extra stuff for justifiable AAT text; need to merge runs of words and normal spaces.
13919
13920@d is_native_word_node(#) == (((#)<>null and (not is_char_node(#)) and (type(#) = whatsit_node) and (subtype(#) = native_word_node)))
13921@d is_glyph_node(#) == (((#)<>null and (not is_char_node(#)) and (type(#) = whatsit_node) and (subtype(#) = glyph_node)))
13922
13923@<Merge sequences of words using AAT fonts and inter-word spaces into single nodes@>=
13924p:=list_ptr(this_box);
13925prev_p:=this_box+list_offset;
13926while p<>null do begin
13927  if link(p) <> null then begin {not worth looking ahead at the end}
13928    if is_native_word_node(p) and (font_area[native_font(p)] = aat_font_flag)
13929        and (font_letter_space[native_font(p)] = 0) then begin
13930      {got a word in an AAT font, might be the start of a run}
13931      r:=p; {|r| is start of possible run}
13932      k:=native_length(r);
13933      q:=link(p);
13934check_next:
13935      @<Advance |q| past ignorable nodes@>;
13936      if (q <> null) and not is_char_node(q) then begin
13937        if (type(q) = glue_node) and (subtype(q) = normal) and (glue_ptr(q) = font_glue[native_font(r)]) then begin
13938          {found a normal space; if the next node is another word in the same font, we'll merge}
13939          q:=link(q);
13940          @<Advance |q| past ignorable nodes@>;
13941          if is_native_word_node(q) and (native_font(q) = native_font(r)) then begin
13942            p:=q; {record new tail of run in |p|}
13943            k:=k + 1 + native_length(q);
13944            q:=link(q);
13945            goto check_next;
13946          end;
13947          goto end_node_run;
13948        end;
13949        if is_native_word_node(q) and (native_font(q) = native_font(r)) then begin
13950          p:=q; {record new tail of run in |p|}
13951          q:=link(q);
13952          goto check_next;
13953        end
13954      end;
13955end_node_run: {now |r| points to first |native_word_node| of the run, and |p| to the last}
13956      if p <> r then begin {merge nodes from |r| to |p| inclusive; total text length is |k|}
13957        str_room(k);
13958        k:=0; {now we'll use this as accumulator for total width}
13959        q:=r;
13960        loop begin
13961          if type(q) = whatsit_node then begin
13962            if subtype(q) = native_word_node then begin
13963              for j:=0 to native_length(q)-1 do
13964                append_char(get_native_char(q, j));
13965              k:=k + width(q);
13966            end
13967          end else if type(q) = glue_node then begin
13968            append_char(" ");
13969            g:=glue_ptr(q);
13970            k:=k + width(g);
13971            if g_sign <> normal then begin
13972              if g_sign = stretching then begin
13973                if stretch_order(g) = g_order then begin
13974                  k:=k + round(float(glue_set(this_box)) * stretch(g))
13975                end
13976              end else begin
13977                if shrink_order(g) = g_order then begin
13978                  k:=k - round(float(glue_set(this_box)) * shrink(g))
13979                end
13980              end
13981            end
13982          end;
13983          {discretionary and deleted nodes can be discarded here}
13984          if q = p then break
13985          else q:=link(q);
13986        end;
13987done:
13988        q:=new_native_word_node(native_font(r), cur_length);
13989        link(prev_p):=q;
13990        for j:=0 to cur_length - 1 do
13991          set_native_char(q, j, str_pool[str_start_macro(str_ptr) + j]);
13992        link(q):=link(p);
13993        link(p):=null;
13994        flush_node_list(r);
13995        width(q):=k;
13996        set_justified_native_glyphs(q);
13997        p:=q;
13998        pool_ptr:=str_start_macro(str_ptr); {flush the temporary string data}
13999      end
14000    end;
14001    prev_p:=p;
14002  end;
14003  p:=link(p);
14004end
14005
14006@ @<Advance |q| past ignorable nodes@>=
14007while (q <> null) and (not is_char_node(q)) and (type(q) = disc_node) do
14008    q:=link(q)
14009
14010@ We ought to give special care to the efficiency of one part of |hlist_out|,
14011since it belongs to \TeX's inner loop. When a |char_node| is encountered,
14012we save a little time by processing several nodes in succession until
14013reaching a non-|char_node|. The program uses the fact that |set_char_0=0|.
14014@^inner loop@>
14015
14016@<Output node |p| for |hlist_out|...@>=
14017reswitch: if is_char_node(p) then
14018  begin synch_h; synch_v;
14019  repeat f:=font(p); c:=character(p);
14020  if (p<>lig_trick) and (font_mapping[f]<>nil) then c:=apply_tfm_font_mapping(font_mapping[f],c);
14021  if f<>dvi_f then @<Change font |dvi_f| to |f|@>;
14022  if c>=qi(128) then dvi_out(set1);
14023  dvi_out(qo(c));@/
14024  cur_h:=cur_h+char_width(f)(char_info(f)(c));
14025  prev_p:=link(prev_p); {N.B.: not |prev_p:=p|, |p| might be |lig_trick|}
14026  p:=link(p);
14027  until not is_char_node(p);
14028  dvi_h:=cur_h;
14029  end
14030else @<Output the non-|char_node| |p| for |hlist_out|
14031    and move to the next node@>
14032
14033@ @<Change font |dvi_f| to |f|@>=
14034begin if not font_used[f] then
14035  begin dvi_font_def(f); font_used[f]:=true;
14036  end;
14037if f<=64+font_base then dvi_out(f-font_base-1+fnt_num_0)
14038else  begin dvi_out(fnt1); dvi_out(f-font_base-1);
14039  end;
14040dvi_f:=f;
14041end
14042
14043@ @<Output the non-|char_node| |p| for |hlist_out|...@>=
14044begin case type(p) of
14045hlist_node,vlist_node:@<Output a box in an hlist@>;
14046rule_node: begin rule_ht:=height(p); rule_dp:=depth(p); rule_wd:=width(p);
14047  goto fin_rule;
14048  end;
14049whatsit_node: @<Output the whatsit node |p| in an hlist@>;
14050glue_node: @<Move right or output leaders@>;
14051margin_kern_node: begin
14052  cur_h:=cur_h+width(p);
14053end;
14054kern_node:cur_h:=cur_h+width(p);
14055math_node: @<Handle a math node in |hlist_out|@>;
14056ligature_node: @<Make node |p| look like a |char_node| and |goto reswitch|@>;
14057@/@<Cases of |hlist_out| that arise in mixed direction text only@>@;
14058othercases do_nothing
14059endcases;@/
14060goto next_p;
14061fin_rule: @<Output a rule in an hlist@>;
14062move_past: cur_h:=cur_h+rule_wd;
14063next_p:prev_p:=p; p:=link(p);
14064end
14065
14066@ @<Output a box in an hlist@>=
14067if list_ptr(p)=null then cur_h:=cur_h+width(p)
14068else  begin save_h:=dvi_h; save_v:=dvi_v;
14069  cur_v:=base_line+shift_amount(p); {shift the box down}
14070  temp_ptr:=p; edge:=cur_h+width(p);
14071  if cur_dir=right_to_left then cur_h:=edge;
14072  if type(p)=vlist_node then vlist_out@+else hlist_out;
14073  dvi_h:=save_h; dvi_v:=save_v;
14074  cur_h:=edge; cur_v:=base_line;
14075  end
14076
14077@ @<Output a rule in an hlist@>=
14078if is_running(rule_ht) then rule_ht:=height(this_box);
14079if is_running(rule_dp) then rule_dp:=depth(this_box);
14080rule_ht:=rule_ht+rule_dp; {this is the rule thickness}
14081if (rule_ht>0)and(rule_wd>0) then {we don't output empty rules}
14082  begin synch_h; cur_v:=base_line+rule_dp; synch_v;
14083  dvi_out(set_rule); dvi_four(rule_ht); dvi_four(rule_wd);
14084  cur_v:=base_line; dvi_h:=dvi_h+rule_wd;
14085  end
14086
14087@ @d billion==float_constant(1000000000)
14088@d vet_glue(#)== glue_temp:=#;
14089  if glue_temp>billion then
14090           glue_temp:=billion
14091  else if glue_temp<-billion then
14092           glue_temp:=-billion
14093@#
14094@d round_glue==g:=glue_ptr(p); rule_wd:=width(g)-cur_g;
14095if g_sign<>normal then
14096  begin if g_sign=stretching then
14097    begin if stretch_order(g)=g_order then
14098      begin cur_glue:=cur_glue+stretch(g);
14099      vet_glue(float(glue_set(this_box))*cur_glue);
14100@^real multiplication@>
14101      cur_g:=round(glue_temp);
14102      end;
14103    end
14104  else if shrink_order(g)=g_order then
14105      begin cur_glue:=cur_glue-shrink(g);
14106      vet_glue(float(glue_set(this_box))*cur_glue);
14107      cur_g:=round(glue_temp);
14108      end;
14109  end;
14110rule_wd:=rule_wd+cur_g
14111
14112@<Move right or output leaders@>=
14113begin round_glue;
14114if eTeX_ex then @<Handle a glue node for mixed direction typesetting@>;
14115if subtype(p)>=a_leaders then
14116  @<Output leaders in an hlist, |goto fin_rule| if a rule
14117    or to |next_p| if done@>;
14118goto move_past;
14119end
14120
14121@ @<Output leaders in an hlist...@>=
14122begin leader_box:=leader_ptr(p);
14123if type(leader_box)=rule_node then
14124  begin rule_ht:=height(leader_box); rule_dp:=depth(leader_box);
14125  goto fin_rule;
14126  end;
14127leader_wd:=width(leader_box);
14128if (leader_wd>0)and(rule_wd>0) then
14129  begin rule_wd:=rule_wd+10; {compensate for floating-point rounding}
14130  if cur_dir=right_to_left then cur_h:=cur_h-10;
14131  edge:=cur_h+rule_wd; lx:=0;
14132  @<Let |cur_h| be the position of the first box, and set |leader_wd+lx|
14133    to the spacing between corresponding parts of boxes@>;
14134  while cur_h+leader_wd<=edge do
14135    @<Output a leader box at |cur_h|,
14136      then advance |cur_h| by |leader_wd+lx|@>;
14137  if cur_dir=right_to_left then cur_h:=edge
14138  else cur_h:=edge-10;
14139  goto next_p;
14140  end;
14141end
14142
14143@ The calculations related to leaders require a bit of care. First, in the
14144case of |a_leaders| (aligned leaders), we want to move |cur_h| to
14145|left_edge| plus the smallest multiple of |leader_wd| for which the result
14146is not less than the current value of |cur_h|; i.e., |cur_h| should become
14147$|left_edge|+|leader_wd|\times\lceil
14148(|cur_h|-|left_edge|)/|leader_wd|\rceil$.  The program here should work in
14149all cases even though some implementations of \PASCAL\ give nonstandard
14150results for the |div| operation when |cur_h| is less than |left_edge|.
14151
14152In the case of |c_leaders| (centered leaders), we want to increase |cur_h|
14153by half of the excess space not occupied by the leaders; and in the
14154case of |x_leaders| (expanded leaders) we increase |cur_h|
14155by $1/(q+1)$ of this excess space, where $q$ is the number of times the
14156leader box will be replicated. Slight inaccuracies in the division might
14157accumulate; half of this rounding error is placed at each end of the leaders.
14158
14159@<Let |cur_h| be the position of the first box, ...@>=
14160if subtype(p)=a_leaders then
14161  begin save_h:=cur_h;
14162  cur_h:=left_edge+leader_wd*((cur_h-left_edge)@!div leader_wd);
14163  if cur_h<save_h then cur_h:=cur_h+leader_wd;
14164  end
14165else  begin lq:=rule_wd div leader_wd; {the number of box copies}
14166  lr:=rule_wd mod leader_wd; {the remaining space}
14167  if subtype(p)=c_leaders then cur_h:=cur_h+(lr div 2)
14168  else  begin lx:=lr div (lq+1);
14169    cur_h:=cur_h+((lr-(lq-1)*lx) div 2);
14170    end;
14171  end
14172
14173@ The `\\{synch}' operations here are intended to decrease the number of
14174bytes needed to specify horizontal and vertical motion in the \.{DVI} output.
14175
14176@<Output a leader box at |cur_h|, ...@>=
14177begin cur_v:=base_line+shift_amount(leader_box); synch_v; save_v:=dvi_v;@/
14178synch_h; save_h:=dvi_h; temp_ptr:=leader_box;
14179if cur_dir=right_to_left then cur_h:=cur_h+leader_wd;
14180outer_doing_leaders:=doing_leaders; doing_leaders:=true;
14181if type(leader_box)=vlist_node then vlist_out@+else hlist_out;
14182doing_leaders:=outer_doing_leaders;
14183dvi_v:=save_v; dvi_h:=save_h; cur_v:=base_line;
14184cur_h:=save_h+leader_wd+lx;
14185end
14186
14187@ The |vlist_out| routine is similar to |hlist_out|, but a bit simpler.
14188
14189@p procedure vlist_out; {output a |vlist_node| box}
14190label move_past, fin_rule, next_p;
14191var left_edge: scaled; {the left coordinate for this box}
14192@!top_edge: scaled; {the top coordinate for this box}
14193@!save_h,@!save_v: scaled; {what |dvi_h| and |dvi_v| should pop to}
14194@!this_box: pointer; {pointer to containing box}
14195@!g_order: glue_ord; {applicable order of infinity for glue}
14196@!g_sign: normal..shrinking; {selects type of glue}
14197@!p:pointer; {current position in the vlist}
14198@!save_loc:integer; {\.{DVI} byte location upon entry}
14199@!leader_box:pointer; {the leader box being replicated}
14200@!leader_ht:scaled; {height of leader box being replicated}
14201@!lx:scaled; {extra space between leader boxes}
14202@!outer_doing_leaders:boolean; {were we doing leaders?}
14203@!edge:scaled; {bottom boundary of leader space}
14204@!glue_temp:real; {glue value before rounding}
14205@!cur_glue:real; {glue seen so far}
14206@!cur_g:scaled; {rounded equivalent of |cur_glue| times the glue ratio}
14207@!upwards:boolean; {whether we're stacking upwards}
14208begin cur_g:=0; cur_glue:=float_constant(0);
14209this_box:=temp_ptr; g_order:=glue_order(this_box);
14210g_sign:=glue_sign(this_box); p:=list_ptr(this_box);
14211upwards:=(subtype(this_box)=min_quarterword+1);
14212incr(cur_s);
14213if cur_s>0 then dvi_out(push);
14214if cur_s>max_push then max_push:=cur_s;
14215save_loc:=dvi_offset+dvi_ptr; left_edge:=cur_h;
14216if upwards then cur_v:=cur_v+depth(this_box) else cur_v:=cur_v-height(this_box);
14217top_edge:=cur_v;
14218while p<>null do @<Output node |p| for |vlist_out| and move to the next node,
14219  maintaining the condition |cur_h=left_edge|@>;
14220prune_movements(save_loc);
14221if cur_s>0 then dvi_pop(save_loc);
14222decr(cur_s);
14223end;
14224
14225@ @<Output node |p| for |vlist_out|...@>=
14226begin if is_char_node(p) then confusion("vlistout")
14227@:this can't happen vlistout}{\quad vlistout@>
14228else @<Output the non-|char_node| |p| for |vlist_out|@>;
14229next_p:p:=link(p);
14230end
14231
14232@ @<Output the non-|char_node| |p| for |vlist_out|@>=
14233begin case type(p) of
14234hlist_node,vlist_node:@<Output a box in a vlist@>;
14235rule_node: begin rule_ht:=height(p); rule_dp:=depth(p); rule_wd:=width(p);
14236  goto fin_rule;
14237  end;
14238whatsit_node: @<Output the whatsit node |p| in a vlist@>;
14239glue_node: @<Move down or output leaders@>;
14240kern_node:if upwards then cur_v:=cur_v-width(p) else cur_v:=cur_v+width(p);
14241othercases do_nothing
14242endcases;@/
14243goto next_p;
14244fin_rule: @<Output a rule in a vlist, |goto next_p|@>;
14245move_past: if upwards then cur_v:=cur_v-rule_ht else cur_v:=cur_v+rule_ht;
14246end
14247
14248@ The |synch_v| here allows the \.{DVI} output to use one-byte commands
14249for adjusting |v| in most cases, since the baselineskip distance will
14250usually be constant.
14251
14252@<Output a box in a vlist@>=
14253if list_ptr(p)=null then cur_v:=cur_v+height(p)+depth(p)
14254else  begin if upwards then cur_v:=cur_v-depth(p) else cur_v:=cur_v+height(p); synch_v;
14255  save_h:=dvi_h; save_v:=dvi_v;
14256  if cur_dir=right_to_left then cur_h:=left_edge-shift_amount(p)
14257  else cur_h:=left_edge+shift_amount(p); {shift the box right}
14258  temp_ptr:=p;
14259  if type(p)=vlist_node then vlist_out@+else hlist_out;
14260  dvi_h:=save_h; dvi_v:=save_v;
14261  if upwards then cur_v:=save_v-height(p) else cur_v:=save_v+depth(p); cur_h:=left_edge;
14262  end
14263
14264@ @<Output a rule in a vlist...@>=
14265if is_running(rule_wd) then rule_wd:=width(this_box);
14266rule_ht:=rule_ht+rule_dp; {this is the rule thickness}
14267if upwards then cur_v:=cur_v-rule_ht else cur_v:=cur_v+rule_ht;
14268if (rule_ht>0)and(rule_wd>0) then {we don't output empty rules}
14269  begin if cur_dir=right_to_left then cur_h:=cur_h-rule_wd;
14270  synch_h; synch_v;
14271  dvi_out(put_rule); dvi_four(rule_ht); dvi_four(rule_wd);
14272  cur_h:=left_edge;
14273  end;
14274goto next_p
14275
14276@ @<Move down or output leaders@>=
14277begin g:=glue_ptr(p); rule_ht:=width(g)-cur_g;
14278if g_sign<>normal then
14279  begin if g_sign=stretching then
14280    begin if stretch_order(g)=g_order then
14281      begin cur_glue:=cur_glue+stretch(g);
14282      vet_glue(float(glue_set(this_box))*cur_glue);
14283@^real multiplication@>
14284      cur_g:=round(glue_temp);
14285      end;
14286    end
14287  else if shrink_order(g)=g_order then
14288      begin cur_glue:=cur_glue-shrink(g);
14289      vet_glue(float(glue_set(this_box))*cur_glue);
14290      cur_g:=round(glue_temp);
14291      end;
14292  end;
14293rule_ht:=rule_ht+cur_g;
14294if subtype(p)>=a_leaders then
14295  @<Output leaders in a vlist, |goto fin_rule| if a rule
14296    or to |next_p| if done@>;
14297goto move_past;
14298end
14299
14300@ @<Output leaders in a vlist...@>=
14301begin leader_box:=leader_ptr(p);
14302if type(leader_box)=rule_node then
14303  begin rule_wd:=width(leader_box); rule_dp:=0;
14304  goto fin_rule;
14305  end;
14306leader_ht:=height(leader_box)+depth(leader_box);
14307if (leader_ht>0)and(rule_ht>0) then
14308  begin rule_ht:=rule_ht+10; {compensate for floating-point rounding}
14309  edge:=cur_v+rule_ht; lx:=0;
14310  @<Let |cur_v| be the position of the first box, and set |leader_ht+lx|
14311    to the spacing between corresponding parts of boxes@>;
14312  while cur_v+leader_ht<=edge do
14313    @<Output a leader box at |cur_v|,
14314      then advance |cur_v| by |leader_ht+lx|@>;
14315  cur_v:=edge-10; goto next_p;
14316  end;
14317end
14318
14319@ @<Let |cur_v| be the position of the first box, ...@>=
14320if subtype(p)=a_leaders then
14321  begin save_v:=cur_v;
14322  cur_v:=top_edge+leader_ht*((cur_v-top_edge)@!div leader_ht);
14323  if cur_v<save_v then cur_v:=cur_v+leader_ht;
14324  end
14325else  begin lq:=rule_ht div leader_ht; {the number of box copies}
14326  lr:=rule_ht mod leader_ht; {the remaining space}
14327  if subtype(p)=c_leaders then cur_v:=cur_v+(lr div 2)
14328  else  begin lx:=lr div (lq+1);
14329    cur_v:=cur_v+((lr-(lq-1)*lx) div 2);
14330    end;
14331  end
14332
14333@ When we reach this part of the program, |cur_v| indicates the top of a
14334leader box, not its baseline.
14335
14336@<Output a leader box at |cur_v|, ...@>=
14337begin if cur_dir=right_to_left then
14338  cur_h:=left_edge-shift_amount(leader_box)
14339  else cur_h:=left_edge+shift_amount(leader_box);
14340synch_h; save_h:=dvi_h;@/
14341cur_v:=cur_v+height(leader_box); synch_v; save_v:=dvi_v;
14342temp_ptr:=leader_box;
14343outer_doing_leaders:=doing_leaders; doing_leaders:=true;
14344if type(leader_box)=vlist_node then vlist_out@+else hlist_out;
14345doing_leaders:=outer_doing_leaders;
14346dvi_v:=save_v; dvi_h:=save_h; cur_h:=left_edge;
14347cur_v:=save_v-height(leader_box)+leader_ht+lx;
14348end
14349
14350@ The |hlist_out| and |vlist_out| procedures are now complete, so we are
14351ready for the |ship_out| routine that gets them started in the first place.
14352
14353@p procedure ship_out(@!p:pointer); {output the box |p|}
14354label done;
14355var page_loc:integer; {location of the current |bop|}
14356@!j,@!k:0..9; {indices to first ten count registers}
14357@!s:pool_pointer; {index into |str_pool|}
14358@!old_setting:0..max_selector; {saved |selector| setting}
14359begin
14360if job_name=0 then open_log_file;
14361if tracing_output>0 then
14362  begin print_nl(""); print_ln;
14363  print("Completed box being shipped out");
14364@.Completed box...@>
14365  end;
14366if term_offset>max_print_line-9 then print_ln
14367else if (term_offset>0)or(file_offset>0) then print_char(" ");
14368print_char("["); j:=9;
14369while (count(j)=0)and(j>0) do decr(j);
14370for k:=0 to j do
14371  begin print_int(count(k));
14372  if k<j then print_char(".");
14373  end;
14374update_terminal;
14375if tracing_output>0 then
14376  begin print_char("]");
14377  begin_diagnostic; show_box(p); end_diagnostic(true);
14378  end;
14379@<Ship box |p| out@>;
14380if eTeX_ex then @<Check for LR anomalies at the end of |ship_out|@>;
14381if tracing_output<=0 then print_char("]");
14382dead_cycles:=0;
14383update_terminal; {progress report}
14384@<Flush the box from memory, showing statistics if requested@>;
14385end;
14386
14387@ @<Flush the box from memory, showing statistics if requested@>=
14388@!stat if tracing_stats>1 then
14389  begin print_nl("Memory usage before: ");
14390@.Memory usage...@>
14391  print_int(var_used); print_char("&");
14392  print_int(dyn_used); print_char(";");
14393  end;
14394tats@/
14395flush_node_list(p);
14396@!stat if tracing_stats>1 then
14397  begin print(" after: ");
14398  print_int(var_used); print_char("&");
14399  print_int(dyn_used); print("; still untouched: ");
14400  print_int(hi_mem_min-lo_mem_max-1); print_ln;
14401  end;
14402tats
14403
14404@ @<Ship box |p| out@>=
14405@<Update the values of |max_h| and |max_v|; but if the page is too large,
14406  |goto done|@>;
14407@<Initialize variables as |ship_out| begins@>;
14408page_loc:=dvi_offset+dvi_ptr;
14409dvi_out(bop);
14410for k:=0 to 9 do dvi_four(count(k));
14411dvi_four(last_bop); last_bop:=page_loc;
14412{ generate a pagesize special at start of page }
14413old_setting:=selector; selector:=new_string;
14414print("pdf:pagesize ");
14415if (pdf_page_width > 0) and (pdf_page_height > 0) then begin
14416  print("width"); print(" ");
14417  print_scaled(pdf_page_width);
14418  print("pt"); print(" ");
14419  print("height"); print(" ");
14420  print_scaled(pdf_page_height);
14421  print("pt");
14422end else
14423  print("default");
14424selector:=old_setting;
14425dvi_out(xxx1); dvi_out(cur_length);
14426for s:=str_start_macro(str_ptr) to pool_ptr-1 do dvi_out(so(str_pool[s]));
14427pool_ptr:=str_start_macro(str_ptr); {erase the string}
14428cur_v:=height(p)+v_offset; { does this need changing for upwards mode ???? }
14429temp_ptr:=p;
14430if type(p)=vlist_node then vlist_out@+else hlist_out;
14431dvi_out(eop); incr(total_pages); cur_s:=-1;
14432if not no_pdf_output then fflush(dvi_file);
14433done:
14434
14435@ Sometimes the user will generate a huge page because other error messages
14436are being ignored. Such pages are not output to the \.{dvi} file, since they
14437may confuse the printing software.
14438
14439@<Update the values of |max_h| and |max_v|; but if the page is too large...@>=
14440if (height(p)>max_dimen)or@|(depth(p)>max_dimen)or@|
14441   (height(p)+depth(p)+v_offset>max_dimen)or@|
14442   (width(p)+h_offset>max_dimen) then
14443  begin print_err("Huge page cannot be shipped out");
14444@.Huge page...@>
14445  help2("The page just created is more than 18 feet tall or")@/
14446   ("more than 18 feet wide, so I suspect something went wrong.");
14447  error;
14448  if tracing_output<=0 then
14449    begin begin_diagnostic;
14450    print_nl("The following box has been deleted:");
14451@.The following...deleted@>
14452    show_box(p);
14453    end_diagnostic(true);
14454    end;
14455  goto done;
14456  end;
14457if height(p)+depth(p)+v_offset>max_v then max_v:=height(p)+depth(p)+v_offset;
14458if width(p)+h_offset>max_h then max_h:=width(p)+h_offset
14459
14460@ At the end of the program, we must finish things off by writing the
14461post\-amble. If |total_pages=0|, the \.{DVI} file was never opened.
14462If |total_pages>=65536|, the \.{DVI} file will lie. And if
14463|max_push>=65536|, the user deserves whatever chaos might ensue.
14464
14465An integer variable |k| will be declared for use by this routine.
14466
14467@<Finish the \.{DVI} file@>=
14468while cur_s>-1 do
14469  begin if cur_s>0 then dvi_out(pop)
14470  else  begin dvi_out(eop); incr(total_pages);
14471    end;
14472  decr(cur_s);
14473  end;
14474if total_pages=0 then print_nl("No pages of output.")
14475@.No pages of output@>
14476else  begin dvi_out(post); {beginning of the postamble}
14477  dvi_four(last_bop); last_bop:=dvi_offset+dvi_ptr-5; {|post| location}
14478  dvi_four(25400000); dvi_four(473628672); {conversion ratio for sp}
14479  prepare_mag; dvi_four(mag); {magnification factor}
14480  dvi_four(max_v); dvi_four(max_h);@/
14481  dvi_out(max_push div 256); dvi_out(max_push mod 256);@/
14482  dvi_out((total_pages div 256) mod 256); dvi_out(total_pages mod 256);@/
14483  @<Output the font definitions for all fonts that were used@>;
14484  dvi_out(post_post); dvi_four(last_bop); dvi_out(id_byte);@/
14485  k:=4+((dvi_buf_size-dvi_ptr) mod 4); {the number of 223's}
14486  while k>0 do
14487    begin dvi_out(223); decr(k);
14488    end;
14489  @<Empty the last bytes out of |dvi_buf|@>;
14490  print_nl("Output written on "); slow_print(output_file_name);
14491@.Output written on x@>
14492  print(" ("); print_int(total_pages); print(" page");
14493  if total_pages<>1 then print_char("s");
14494  print(", "); print_int(dvi_offset+dvi_ptr); print(" bytes).");
14495  b_close(dvi_file);
14496  end
14497
14498@ @<Output the font definitions...@>=
14499while font_ptr>font_base do
14500  begin if font_used[font_ptr] then dvi_font_def(font_ptr);
14501  decr(font_ptr);
14502  end
14503
14504@* \[33] Packaging.
14505We're essentially done with the parts of \TeX\ that are concerned with
14506the input (|get_next|) and the output (|ship_out|). So it's time to
14507get heavily into the remaining part, which does the real work of typesetting.
14508
14509After lists are constructed, \TeX\ wraps them up and puts them into boxes.
14510Two major subroutines are given the responsibility for this task: |hpack|
14511applies to horizontal lists (hlists) and |vpack| applies to vertical lists
14512(vlists). The main duty of |hpack| and |vpack| is to compute the dimensions
14513of the resulting boxes, and to adjust the glue if one of those dimensions
14514is pre-specified. The computed sizes normally enclose all of the material
14515inside the new box; but some items may stick out if negative glue is used,
14516if the box is overfull, or if a \.{\\vbox} includes other boxes that have
14517been shifted left.
14518
14519The subroutine call |hpack(p,w,m)| returns a pointer to an |hlist_node|
14520for a box containing the hlist that starts at |p|. Parameter |w| specifies
14521a width; and parameter |m| is either `|exactly|' or `|additional|'.  Thus,
14522|hpack(p,w,exactly)| produces a box whose width is exactly |w|, while
14523|hpack(p,w,additional)| yields a box whose width is the natural width plus
14524|w|.  It is convenient to define a macro called `|natural|' to cover the
14525most common case, so that we can say |hpack(p,natural)| to get a box that
14526has the natural width of list |p|.
14527
14528Similarly, |vpack(p,w,m)| returns a pointer to a |vlist_node| for a
14529box containing the vlist that starts at |p|. In this case |w| represents
14530a height instead of a width; the parameter |m| is interpreted as in |hpack|.
14531
14532@d exactly=0 {a box dimension is pre-specified}
14533@d additional=1 {a box dimension is increased from the natural one}
14534@d natural==0,additional {shorthand for parameters to |hpack| and |vpack|}
14535
14536@ The parameters to |hpack| and |vpack| correspond to \TeX's primitives
14537like `\.{\\hbox} \.{to} \.{300pt}', `\.{\\hbox} \.{spread} \.{10pt}'; note
14538that `\.{\\hbox}' with no dimension following it is equivalent to
14539`\.{\\hbox} \.{spread} \.{0pt}'.  The |scan_spec| subroutine scans such
14540constructions in the user's input, including the mandatory left brace that
14541follows them, and it puts the specification onto |save_stack| so that the
14542desired box can later be obtained by executing the following code:
14543$$\vbox{\halign{#\hfil\cr
14544|save_ptr:=save_ptr-2;|\cr
14545|hpack(p,saved(1),saved(0)).|\cr}}$$
14546Special care is necessary to ensure that the special |save_stack| codes
14547are placed just below the new group code, because scanning can change
14548|save_stack| when \.{\\csname} appears.
14549
14550@p procedure scan_spec(@!c:group_code;@!three_codes:boolean);
14551  {scans a box specification and left brace}
14552label found;
14553var @!s:integer; {temporarily saved value}
14554@!spec_code:exactly..additional;
14555begin if three_codes then s:=saved(0);
14556if scan_keyword("to") then spec_code:=exactly
14557@.to@>
14558else if scan_keyword("spread") then spec_code:=additional
14559@.spread@>
14560else  begin spec_code:=additional; cur_val:=0;
14561  goto found;
14562  end;
14563scan_normal_dimen;
14564found: if three_codes then
14565  begin saved(0):=s; incr(save_ptr);
14566  end;
14567saved(0):=spec_code; saved(1):=cur_val; save_ptr:=save_ptr+2;
14568new_save_level(c); scan_left_brace;
14569end;
14570
14571@ To figure out the glue setting, |hpack| and |vpack| determine how much
14572stretchability and shrinkability are present, considering all four orders
14573of infinity. The highest order of infinity that has a nonzero coefficient
14574is then used as if no other orders were present.
14575
14576For example, suppose that the given list contains six glue nodes with
14577the respective stretchabilities 3pt, 8fill, 5fil, 6pt, $-3$fil, $-8$fill.
14578Then the total is essentially 2fil; and if a total additional space of 6pt
14579is to be achieved by stretching, the actual amounts of stretch will be
145800pt, 0pt, 15pt, 0pt, $-9$pt, and 0pt, since only `fil' glue will be
14581considered. (The `fill' glue is therefore not really stretching infinitely
14582with respect to `fil'; nobody would actually want that to happen.)
14583
14584The arrays |total_stretch| and |total_shrink| are used to determine how much
14585glue of each kind is present. A global variable |last_badness| is used
14586to implement \.{\\badness}.
14587
14588@<Glob...@>=
14589@!total_stretch, @!total_shrink: array[glue_ord] of scaled;
14590  {glue found by |hpack| or |vpack|}
14591@!last_badness:integer; {badness of the most recently packaged box}
14592
14593@ If the global variable |adjust_tail| is non-null, the |hpack| routine
14594also removes all occurrences of |ins_node|, |mark_node|, and |adjust_node|
14595items and appends the resulting material onto the list that ends at
14596location |adjust_tail|.
14597
14598@< Glob...@>=
14599@!adjust_tail:pointer; {tail of adjustment list}
14600
14601@ @<Set init...@>=adjust_tail:=null; last_badness:=0;
14602
14603@ Some stuff for character protrusion.
14604
14605@d left_pw(#) == char_pw(#, left_side)
14606@d right_pw(#) == char_pw(#, right_side)
14607
14608@p
14609function char_pw(p: pointer; side: small_number): scaled;
14610var f: internal_font_number;
14611    c: integer;
14612begin
14613  char_pw:=0;
14614  if side = left_side then
14615    last_leftmost_char:=null
14616  else
14617    last_rightmost_char:=null;
14618  if p = null then
14619    return;
14620
14621  { native word }
14622  if is_native_word_node(p) then begin
14623    if native_glyph_info_ptr(p) <> null_ptr then begin
14624      f:=native_font(p);
14625      char_pw:=round_xn_over_d(quad(f), get_native_word_cp(p, side), 1000);
14626    end;
14627    return;
14628  end;
14629
14630  { glyph node }
14631  if is_glyph_node(p) then begin
14632    f:=native_font(p);
14633    char_pw:=round_xn_over_d(quad(f), get_cp_code(f, native_glyph(p), side), 1000);
14634    return;
14635  end;
14636
14637  { char node or ligature; same like pdftex }
14638  if not is_char_node(p) then begin
14639    if type(p) = ligature_node then
14640      p:=lig_char(p)
14641    else
14642      return;
14643  end;
14644  f:=font(p);
14645  c:=get_cp_code(f, character(p), side);
14646  case side of
14647  left_side:
14648    last_leftmost_char:=p;
14649  right_side:
14650    last_rightmost_char:=p;
14651  endcases;
14652  if c = 0 then
14653    return;
14654  char_pw:=round_xn_over_d(quad(f), c, 1000);
14655end;
14656
14657function new_margin_kern(w: scaled; p: pointer; side: small_number): pointer;
14658var k: pointer;
14659begin
14660  k:=get_node(margin_kern_node_size);
14661  type(k):=margin_kern_node;
14662  subtype(k):=side;
14663  width(k):=w;
14664  new_margin_kern:=k;
14665end;
14666
14667@ Here now is |hpack|, which contains few if any surprises.
14668
14669@p function hpack(@!p:pointer;@!w:scaled;@!m:small_number):pointer;
14670label reswitch, common_ending, exit, restart;
14671var r:pointer; {the box node that will be returned}
14672@!q:pointer; {trails behind |p|}
14673@!h,@!d,@!x:scaled; {height, depth, and natural width}
14674@!s:scaled; {shift amount}
14675@!g:pointer; {points to a glue specification}
14676@!o:glue_ord; {order of infinity}
14677@!f:internal_font_number; {the font in a |char_node|}
14678@!i:four_quarters; {font information about a |char_node|}
14679@!hd:eight_bits; {height and depth indices for a character}
14680@!pp,@!ppp: pointer;
14681@!total_chars, @!k: integer;
14682begin last_badness:=0; r:=get_node(box_node_size); type(r):=hlist_node;
14683subtype(r):=min_quarterword; shift_amount(r):=0;
14684q:=r+list_offset; link(q):=p;@/
14685h:=0; @<Clear dimensions to zero@>;
14686if TeXXeT_en then @<Initialize the LR stack@>;
14687while p<>null do @<Examine node |p| in the hlist, taking account of its effect
14688  on the dimensions of the new box, or moving it to the adjustment list;
14689  then advance |p| to the next node@>;
14690if adjust_tail<>null then link(adjust_tail):=null;
14691if pre_adjust_tail<>null then link(pre_adjust_tail):=null;
14692height(r):=h; depth(r):=d;@/
14693@<Determine the value of |width(r)| and the appropriate glue setting;
14694  then |return| or |goto common_ending|@>;
14695common_ending: @<Finish issuing a diagnostic message
14696      for an overfull or underfull hbox@>;
14697exit: if TeXXeT_en then @<Check for LR anomalies at the end of |hpack|@>;
14698hpack:=r;
14699end;
14700
14701@ @<Clear dimensions to zero@>=
14702d:=0; x:=0;
14703total_stretch[normal]:=0; total_shrink[normal]:=0;
14704total_stretch[fil]:=0; total_shrink[fil]:=0;
14705total_stretch[fill]:=0; total_shrink[fill]:=0;
14706total_stretch[filll]:=0; total_shrink[filll]:=0
14707
14708@ @<Examine node |p| in the hlist, taking account of its effect...@>=
14709@^inner loop@>
14710begin reswitch: while is_char_node(p) do
14711  @<Incorporate character dimensions into the dimensions of
14712    the hbox that will contain~it, then move to the next node@>;
14713if p<>null then
14714  begin case type(p) of
14715  hlist_node,vlist_node,rule_node,unset_node:
14716    @<Incorporate box dimensions into the dimensions of
14717      the hbox that will contain~it@>;
14718  ins_node,mark_node,adjust_node: if (adjust_tail<>null) or (pre_adjust_tail<> null) then
14719    @<Transfer node |p| to the adjustment list@>;
14720  whatsit_node:@<Incorporate a whatsit node into an hbox@>;
14721  glue_node:@<Incorporate glue into the horizontal totals@>;
14722  kern_node: x:=x+width(p);
14723  margin_kern_node: x:=x+width(p);
14724  math_node: begin x:=x+width(p);
14725    if TeXXeT_en then @<Adjust \(t)the LR stack for the |hpack| routine@>;
14726    end;
14727  ligature_node: @<Make node |p| look like a |char_node|
14728    and |goto reswitch|@>;
14729  othercases do_nothing
14730  endcases;@/
14731  p:=link(p);
14732  end;
14733end
14734
14735
14736@ @<Make node |p| look like a |char_node| and |goto reswitch|@>=
14737begin mem[lig_trick]:=mem[lig_char(p)]; link(lig_trick):=link(p);
14738p:=lig_trick; xtx_ligature_present:=true; goto reswitch;
14739end
14740
14741@ The code here implicitly uses the fact that running dimensions are
14742indicated by |null_flag|, which will be ignored in the calculations
14743because it is a highly negative number.
14744
14745@<Incorporate box dimensions into the dimensions of the hbox...@>=
14746begin x:=x+width(p);
14747if type(p)>=rule_node then s:=0 @+else s:=shift_amount(p);
14748if height(p)-s>h then h:=height(p)-s;
14749if depth(p)+s>d then d:=depth(p)+s;
14750end
14751
14752@ The following code is part of \TeX's inner loop; i.e., adding another
14753character of text to the user's input will cause each of these instructions
14754to be exercised one more time.
14755@^inner loop@>
14756
14757@<Incorporate character dimensions into the dimensions of the hbox...@>=
14758begin f:=font(p); i:=char_info(f)(character(p)); hd:=height_depth(i);
14759x:=x+char_width(f)(i);@/
14760s:=char_height(f)(hd);@+if s>h then h:=s;
14761s:=char_depth(f)(hd);@+if s>d then d:=s;
14762p:=link(p);
14763end
14764
14765@ Although node |q| is not necessarily the immediate predecessor of node |p|,
14766it always points to some node in the list preceding |p|. Thus, we can delete
14767nodes by moving |q| when necessary. The algorithm takes linear time, and the
14768extra computation does not intrude on the inner loop unless it is necessary
14769to make a deletion.
14770@^inner loop@>
14771
14772@<Glob...@>=
14773@!pre_adjust_tail: pointer;
14774
14775@ @<Set init...@>=
14776pre_adjust_tail:=null;
14777
14778@ Materials in \.{\\vadjust} used with \.{pre} keyword will be appended to
14779|pre_adjust_tail| instead of |adjust_tail|.
14780
14781@d update_adjust_list(#) == begin
14782    if # = null then
14783        confusion("pre vadjust");
14784    link(#):=adjust_ptr(p);
14785    while link(#) <> null do
14786        #:=link(#);
14787end
14788
14789@<Transfer node |p| to the adjustment list@>=
14790begin while link(q)<>p do q:=link(q);
14791    if type(p) = adjust_node then begin
14792        if adjust_pre(p) <> 0 then
14793            update_adjust_list(pre_adjust_tail)
14794        else
14795            update_adjust_list(adjust_tail);
14796        p:=link(p); free_node(link(q), small_node_size);
14797    end
14798else  begin link(adjust_tail):=p; adjust_tail:=p; p:=link(p);
14799  end;
14800link(q):=p; p:=q;
14801end
14802
14803@ @<Incorporate glue into the horizontal totals@>=
14804begin g:=glue_ptr(p); x:=x+width(g);@/
14805o:=stretch_order(g); total_stretch[o]:=total_stretch[o]+stretch(g);
14806o:=shrink_order(g); total_shrink[o]:=total_shrink[o]+shrink(g);
14807if subtype(p)>=a_leaders then
14808  begin g:=leader_ptr(p);
14809  if height(g)>h then h:=height(g);
14810  if depth(g)>d then d:=depth(g);
14811  end;
14812end
14813
14814@ When we get to the present part of the program, |x| is the natural width
14815of the box being packaged.
14816
14817@<Determine the value of |width(r)| and the appropriate glue setting...@>=
14818if m=additional then w:=x+w;
14819width(r):=w; x:=w-x; {now |x| is the excess to be made up}
14820if x=0 then
14821  begin glue_sign(r):=normal; glue_order(r):=normal;
14822  set_glue_ratio_zero(glue_set(r));
14823  return;
14824  end
14825else if x>0 then @<Determine horizontal glue stretch setting, then |return|
14826    or \hbox{|goto common_ending|}@>
14827else @<Determine horizontal glue shrink setting, then |return|
14828    or \hbox{|goto common_ending|}@>
14829
14830@ @<Determine horizontal glue stretch setting...@>=
14831begin @<Determine the stretch order@>;
14832glue_order(r):=o; glue_sign(r):=stretching;
14833if total_stretch[o]<>0 then glue_set(r):=unfloat(x/total_stretch[o])
14834@^real division@>
14835else  begin glue_sign(r):=normal;
14836  set_glue_ratio_zero(glue_set(r)); {there's nothing to stretch}
14837  end;
14838if o=normal then if list_ptr(r)<>null then
14839  @<Report an underfull hbox and |goto common_ending|, if this box
14840    is sufficiently bad@>;
14841return;
14842end
14843
14844@ @<Determine the stretch order@>=
14845if total_stretch[filll]<>0 then o:=filll
14846else if total_stretch[fill]<>0 then o:=fill
14847else if total_stretch[fil]<>0 then o:=fil
14848else o:=normal
14849
14850@ @<Report an underfull hbox and |goto common_ending|, if...@>=
14851begin last_badness:=badness(x,total_stretch[normal]);
14852if last_badness>hbadness then
14853  begin print_ln;
14854  if last_badness>100 then print_nl("Underfull")@+else print_nl("Loose");
14855  print(" \hbox (badness "); print_int(last_badness);
14856@.Underfull \\hbox...@>
14857@.Loose \\hbox...@>
14858  goto common_ending;
14859  end;
14860end
14861
14862@ In order to provide a decent indication of where an overfull or underfull
14863box originated, we use a global variable |pack_begin_line| that is
14864set nonzero only when |hpack| is being called by the paragraph builder
14865or the alignment finishing routine.
14866
14867@<Glob...@>=
14868@!pack_begin_line:integer; {source file line where the current paragraph
14869  or alignment began; a negative value denotes alignment}
14870
14871@ @<Set init...@>=
14872pack_begin_line:=0;
14873
14874@ @<Finish issuing a diagnostic message for an overfull or underfull hbox@>=
14875if output_active then print(") has occurred while \output is active")
14876else  begin if pack_begin_line<>0 then
14877    begin if pack_begin_line>0 then print(") in paragraph at lines ")
14878    else print(") in alignment at lines ");
14879    print_int(abs(pack_begin_line));
14880    print("--");
14881    end
14882  else print(") detected at line ");
14883  print_int(line);
14884  end;
14885print_ln;@/
14886font_in_short_display:=null_font; short_display(list_ptr(r)); print_ln;@/
14887begin_diagnostic; show_box(r); end_diagnostic(true)
14888
14889@ @<Determine horizontal glue shrink setting...@>=
14890begin @<Determine the shrink order@>;
14891glue_order(r):=o; glue_sign(r):=shrinking;
14892if total_shrink[o]<>0 then glue_set(r):=unfloat((-x)/total_shrink[o])
14893@^real division@>
14894else  begin glue_sign(r):=normal;
14895  set_glue_ratio_zero(glue_set(r)); {there's nothing to shrink}
14896  end;
14897if (total_shrink[o]<-x)and(o=normal)and(list_ptr(r)<>null) then
14898  begin last_badness:=1000000;
14899  set_glue_ratio_one(glue_set(r)); {use the maximum shrinkage}
14900  @<Report an overfull hbox and |goto common_ending|, if this box
14901    is sufficiently bad@>;
14902  end
14903else if o=normal then if list_ptr(r)<>null then
14904  @<Report a tight hbox and |goto common_ending|, if this box
14905    is sufficiently bad@>;
14906return;
14907end
14908
14909@ @<Determine the shrink order@>=
14910if total_shrink[filll]<>0 then o:=filll
14911else if total_shrink[fill]<>0 then o:=fill
14912else if total_shrink[fil]<>0 then o:=fil
14913else o:=normal
14914
14915@ @<Report an overfull hbox and |goto common_ending|, if...@>=
14916if (-x-total_shrink[normal]>hfuzz)or(hbadness<100) then
14917  begin if (overfull_rule>0)and(-x-total_shrink[normal]>hfuzz) then
14918    begin while link(q)<>null do q:=link(q);
14919    link(q):=new_rule;
14920    width(link(q)):=overfull_rule;
14921    end;
14922  print_ln; print_nl("Overfull \hbox (");
14923@.Overfull \\hbox...@>
14924  print_scaled(-x-total_shrink[normal]); print("pt too wide");
14925  goto common_ending;
14926  end
14927
14928@ @<Report a tight hbox and |goto common_ending|, if...@>=
14929begin last_badness:=badness(-x,total_shrink[normal]);
14930if last_badness>hbadness then
14931  begin print_ln; print_nl("Tight \hbox (badness "); print_int(last_badness);
14932@.Tight \\hbox...@>
14933  goto common_ending;
14934  end;
14935end
14936
14937@ The |vpack| subroutine is actually a special case of a slightly more
14938general routine called |vpackage|, which has four parameters. The fourth
14939parameter, which is |max_dimen| in the case of |vpack|, specifies the
14940maximum depth of the page box that is constructed. The depth is first
14941computed by the normal rules; if it exceeds this limit, the reference
14942point is simply moved down until the limiting depth is attained.
14943
14944@d vpack(#)==vpackage(#,max_dimen) {special case of unconstrained depth}
14945
14946@p function vpackage(@!p:pointer;@!h:scaled;@!m:small_number;@!l:scaled):
14947  pointer;
14948label common_ending, exit;
14949var r:pointer; {the box node that will be returned}
14950@!w,@!d,@!x:scaled; {width, depth, and natural height}
14951@!s:scaled; {shift amount}
14952@!g:pointer; {points to a glue specification}
14953@!o:glue_ord; {order of infinity}
14954begin last_badness:=0; r:=get_node(box_node_size); type(r):=vlist_node;
14955if XeTeX_upwards then subtype(r):=min_quarterword+1 else subtype(r):=min_quarterword;
14956shift_amount(r):=0;
14957list_ptr(r):=p;@/
14958w:=0; @<Clear dimensions to zero@>;
14959while p<>null do @<Examine node |p| in the vlist, taking account of its effect
14960  on the dimensions of the new box; then advance |p| to the next node@>;
14961width(r):=w;
14962if d>l then
14963  begin x:=x+d-l; depth(r):=l;
14964  end
14965else depth(r):=d;
14966@<Determine the value of |height(r)| and the appropriate glue setting;
14967  then |return| or |goto common_ending|@>;
14968common_ending: @<Finish issuing a diagnostic message
14969      for an overfull or underfull vbox@>;
14970exit: vpackage:=r;
14971end;
14972
14973@ @<Examine node |p| in the vlist, taking account of its effect...@>=
14974begin if is_char_node(p) then confusion("vpack")
14975@:this can't happen vpack}{\quad vpack@>
14976else  case type(p) of
14977  hlist_node,vlist_node,rule_node,unset_node:
14978    @<Incorporate box dimensions into the dimensions of
14979      the vbox that will contain~it@>;
14980  whatsit_node:@<Incorporate a whatsit node into a vbox@>;
14981  glue_node: @<Incorporate glue into the vertical totals@>;
14982  kern_node: begin x:=x+d+width(p); d:=0;
14983    end;
14984  othercases do_nothing
14985  endcases;
14986p:=link(p);
14987end
14988
14989@ @<Incorporate box dimensions into the dimensions of the vbox...@>=
14990begin x:=x+d+height(p); d:=depth(p);
14991if type(p)>=rule_node then s:=0 @+else s:=shift_amount(p);
14992if width(p)+s>w then w:=width(p)+s;
14993end
14994
14995@ @<Incorporate glue into the vertical totals@>=
14996begin x:=x+d; d:=0;@/
14997g:=glue_ptr(p); x:=x+width(g);@/
14998o:=stretch_order(g); total_stretch[o]:=total_stretch[o]+stretch(g);
14999o:=shrink_order(g); total_shrink[o]:=total_shrink[o]+shrink(g);
15000if subtype(p)>=a_leaders then
15001  begin g:=leader_ptr(p);
15002  if width(g)>w then w:=width(g);
15003  end;
15004end
15005
15006@ When we get to the present part of the program, |x| is the natural height
15007of the box being packaged.
15008
15009@<Determine the value of |height(r)| and the appropriate glue setting...@>=
15010if m=additional then h:=x+h;
15011height(r):=h; x:=h-x; {now |x| is the excess to be made up}
15012if x=0 then
15013  begin glue_sign(r):=normal; glue_order(r):=normal;
15014  set_glue_ratio_zero(glue_set(r));
15015  return;
15016  end
15017else if x>0 then @<Determine vertical glue stretch setting, then |return|
15018    or \hbox{|goto common_ending|}@>
15019else @<Determine vertical glue shrink setting, then |return|
15020    or \hbox{|goto common_ending|}@>
15021
15022@ @<Determine vertical glue stretch setting...@>=
15023begin @<Determine the stretch order@>;
15024glue_order(r):=o; glue_sign(r):=stretching;
15025if total_stretch[o]<>0 then glue_set(r):=unfloat(x/total_stretch[o])
15026@^real division@>
15027else  begin glue_sign(r):=normal;
15028  set_glue_ratio_zero(glue_set(r)); {there's nothing to stretch}
15029  end;
15030if o=normal then if list_ptr(r)<>null then
15031  @<Report an underfull vbox and |goto common_ending|, if this box
15032    is sufficiently bad@>;
15033return;
15034end
15035
15036@ @<Report an underfull vbox and |goto common_ending|, if...@>=
15037begin last_badness:=badness(x,total_stretch[normal]);
15038if last_badness>vbadness then
15039  begin print_ln;
15040  if last_badness>100 then print_nl("Underfull")@+else print_nl("Loose");
15041  print(" \vbox (badness "); print_int(last_badness);
15042@.Underfull \\vbox...@>
15043@.Loose \\vbox...@>
15044  goto common_ending;
15045  end;
15046end
15047
15048@ @<Finish issuing a diagnostic message for an overfull or underfull vbox@>=
15049if output_active then print(") has occurred while \output is active")
15050else  begin if pack_begin_line<>0 then {it's actually negative}
15051    begin print(") in alignment at lines ");
15052    print_int(abs(pack_begin_line));
15053    print("--");
15054    end
15055  else print(") detected at line ");
15056  print_int(line);
15057  print_ln;@/
15058  end;
15059begin_diagnostic; show_box(r); end_diagnostic(true)
15060
15061@ @<Determine vertical glue shrink setting...@>=
15062begin @<Determine the shrink order@>;
15063glue_order(r):=o; glue_sign(r):=shrinking;
15064if total_shrink[o]<>0 then glue_set(r):=unfloat((-x)/total_shrink[o])
15065@^real division@>
15066else  begin glue_sign(r):=normal;
15067  set_glue_ratio_zero(glue_set(r)); {there's nothing to shrink}
15068  end;
15069if (total_shrink[o]<-x)and(o=normal)and(list_ptr(r)<>null) then
15070  begin last_badness:=1000000;
15071  set_glue_ratio_one(glue_set(r)); {use the maximum shrinkage}
15072  @<Report an overfull vbox and |goto common_ending|, if this box
15073    is sufficiently bad@>;
15074  end
15075else if o=normal then if list_ptr(r)<>null then
15076  @<Report a tight vbox and |goto common_ending|, if this box
15077    is sufficiently bad@>;
15078return;
15079end
15080
15081@ @<Report an overfull vbox and |goto common_ending|, if...@>=
15082if (-x-total_shrink[normal]>vfuzz)or(vbadness<100) then
15083  begin print_ln; print_nl("Overfull \vbox (");
15084@.Overfull \\vbox...@>
15085  print_scaled(-x-total_shrink[normal]); print("pt too high");
15086  goto common_ending;
15087  end
15088
15089@ @<Report a tight vbox and |goto common_ending|, if...@>=
15090begin last_badness:=badness(-x,total_shrink[normal]);
15091if last_badness>vbadness then
15092  begin print_ln; print_nl("Tight \vbox (badness "); print_int(last_badness);
15093@.Tight \\vbox...@>
15094  goto common_ending;
15095  end;
15096end
15097
15098@ When a box is being appended to the current vertical list, the
15099baselineskip calculation is handled by the |append_to_vlist| routine.
15100
15101@p procedure append_to_vlist(@!b:pointer);
15102var d:scaled; {deficiency of space between baselines}
15103@!p:pointer; {a new glue node}
15104@!upwards:boolean;
15105begin upwards:=XeTeX_upwards;
15106  if prev_depth>ignore_depth then
15107  begin if upwards then d:=width(baseline_skip)-prev_depth-depth(b)
15108  else d:=width(baseline_skip)-prev_depth-height(b);
15109  if d<line_skip_limit then p:=new_param_glue(line_skip_code)
15110  else  begin p:=new_skip_param(baseline_skip_code);
15111    width(temp_ptr):=d; {|temp_ptr=glue_ptr(p)|}
15112    end;
15113  link(tail):=p; tail:=p;
15114  end;
15115link(tail):=b; tail:=b; if upwards then prev_depth:=height(b) else prev_depth:=depth(b);
15116end;
15117
15118@* \[34] Data structures for math mode.
15119When \TeX\ reads a formula that is enclosed between \.\$'s, it constructs an
15120{\sl mlist}, which is essentially a tree structure representing that
15121formula.  An mlist is a linear sequence of items, but we can regard it as
15122a tree structure because mlists can appear within mlists. For example, many
15123of the entries can be subscripted or superscripted, and such ``scripts''
15124are mlists in their own right.
15125
15126An entire formula is parsed into such a tree before any of the actual
15127typesetting is done, because the current style of type is usually not
15128known until the formula has been fully scanned. For example, when the
15129formula `\.{\$a+b \\over c+d\$}' is being read, there is no way to tell
15130that `\.{a+b}' will be in script size until `\.{\\over}' has appeared.
15131
15132During the scanning process, each element of the mlist being built is
15133classified as a relation, a binary operator, an open parenthesis, etc.,
15134or as a construct like `\.{\\sqrt}' that must be built up. This classification
15135appears in the mlist data structure.
15136
15137After a formula has been fully scanned, the mlist is converted to an hlist
15138so that it can be incorporated into the surrounding text. This conversion is
15139controlled by a recursive procedure that decides all of the appropriate
15140styles by a ``top-down'' process starting at the outermost level and working
15141in towards the subformulas. The formula is ultimately pasted together using
15142combinations of horizontal and vertical boxes, with glue and penalty nodes
15143inserted as necessary.
15144
15145An mlist is represented internally as a linked list consisting chiefly
15146of ``noads'' (pronounced ``no-adds''), to distinguish them from the somewhat
15147similar ``nodes'' in hlists and vlists. Certain kinds of ordinary nodes are
15148allowed to appear in mlists together with the noads; \TeX\ tells the difference
15149by means of the |type| field, since a noad's |type| is always greater than
15150that of a node. An mlist does not contain character nodes, hlist nodes, vlist
15151nodes, math nodes, ligature nodes,
15152or unset nodes; in particular, each mlist item appears in the
15153variable-size part of |mem|, so the |type| field is always present.
15154
15155@ Each noad is four or more words long. The first word contains the |type|
15156and |subtype| and |link| fields that are already so familiar to us; the
15157second, third, and fourth words are called the noad's |nucleus|, |subscr|,
15158and |supscr| fields.
15159
15160Consider, for example, the simple formula `\.{\$x\^2\$}', which would be
15161parsed into an mlist containing a single element called an |ord_noad|.
15162The |nucleus| of this noad is a representation of `\.x', the |subscr| is
15163empty, and the |supscr| is a representation of `\.2'.
15164
15165The |nucleus|, |subscr|, and |supscr| fields are further broken into
15166subfields. If |p| points to a noad, and if |q| is one of its principal
15167fields (e.g., |q=subscr(p)|), there are several possibilities for the
15168subfields, depending on the |math_type| of |q|.
15169
15170\yskip\hang|math_type(q)=math_char| means that |fam(q)| refers to one of
15171the sixteen font families, and |character(q)| is the number of a character
15172within a font of that family, as in a character node.
15173
15174\yskip\hang|math_type(q)=math_text_char| is similar, but the character is
15175unsubscripted and unsuperscripted and it is followed immediately by another
15176character from the same font. (This |math_type| setting appears only
15177briefly during the processing; it is used to suppress unwanted italic
15178corrections.)
15179
15180\yskip\hang|math_type(q)=empty| indicates a field with no value (the
15181corresponding attribute of noad |p| is not present).
15182
15183\yskip\hang|math_type(q)=sub_box| means that |info(q)| points to a box
15184node (either an |hlist_node| or a |vlist_node|) that should be used as the
15185value of the field.  The |shift_amount| in the subsidiary box node is the
15186amount by which that box will be shifted downward.
15187
15188\yskip\hang|math_type(q)=sub_mlist| means that |info(q)| points to
15189an mlist; the mlist must be converted to an hlist in order to obtain
15190the value of this field.
15191
15192\yskip\noindent In the latter case, we might have |info(q)=null|. This
15193is not the same as |math_type(q)=empty|; for example, `\.{\$P\_\{\}\$}'
15194and `\.{\$P\$}' produce different results (the former will not have the
15195``italic correction'' added to the width of |P|, but the ``script skip''
15196will be added).
15197
15198The definitions of subfields given here are evidently wasteful of space,
15199since a halfword is being used for the |math_type| although only three
15200bits would be needed. However, there are hardly ever many noads present at
15201once, since they are soon converted to nodes that take up even more space,
15202so we can afford to represent them in whatever way simplifies the
15203programming.
15204
15205@d noad_size=4 {number of words in a normal noad}
15206@d nucleus(#)==#+1 {the |nucleus| field of a noad}
15207@d supscr(#)==#+2 {the |supscr| field of a noad}
15208@d subscr(#)==#+3 {the |subscr| field of a noad}
15209@d math_type==link {a |halfword| in |mem|}
15210@d plane_and_fam_field==font {a |quarterword| in |mem|}
15211@d fam(#) == (plane_and_fam_field(#) mod @"100)
15212@d math_char=1 {|math_type| when the attribute is simple}
15213@d sub_box=2 {|math_type| when the attribute is a box}
15214@d sub_mlist=3 {|math_type| when the attribute is a formula}
15215@d math_text_char=4 {|math_type| when italic correction is dubious}
15216
15217@ Each portion of a formula is classified as Ord, Op, Bin, Rel, Ope,
15218Clo, Pun, or Inn, for purposes of spacing and line breaking. An
15219|ord_noad|, |op_noad|, |bin_noad|, |rel_noad|, |open_noad|, |close_noad|,
15220|punct_noad|, or |inner_noad| is used to represent portions of the various
15221types. For example, an `\.=' sign in a formula leads to the creation of a
15222|rel_noad| whose |nucleus| field is a representation of an equals sign
15223(usually |fam=0|, |character=@'75|).  A formula preceded by \.{\\mathrel}
15224also results in a |rel_noad|.  When a |rel_noad| is followed by an
15225|op_noad|, say, and possibly separated by one or more ordinary nodes (not
15226noads), \TeX\ will insert a penalty node (with the current |rel_penalty|)
15227just after the formula that corresponds to the |rel_noad|, unless there
15228already was a penalty immediately following; and a ``thick space'' will be
15229inserted just before the formula that corresponds to the |op_noad|.
15230
15231A noad of type |ord_noad|, |op_noad|, \dots, |inner_noad| usually
15232has a |subtype=normal|. The only exception is that an |op_noad| might
15233have |subtype=limits| or |no_limits|, if the normal positioning of
15234limits has been overridden for this operator.
15235
15236@d ord_noad=unset_node+3 {|type| of a noad classified Ord}
15237@d op_noad=ord_noad+1 {|type| of a noad classified Op}
15238@d bin_noad=ord_noad+2 {|type| of a noad classified Bin}
15239@d rel_noad=ord_noad+3 {|type| of a noad classified Rel}
15240@d open_noad=ord_noad+4 {|type| of a noad classified Ope}
15241@d close_noad=ord_noad+5 {|type| of a noad classified Clo}
15242@d punct_noad=ord_noad+6 {|type| of a noad classified Pun}
15243@d inner_noad=ord_noad+7 {|type| of a noad classified Inn}
15244@d limits=1 {|subtype| of |op_noad| whose scripts are to be above, below}
15245@d no_limits=2 {|subtype| of |op_noad| whose scripts are to be normal}
15246
15247@ A |radical_noad| is five words long; the fifth word is the |left_delimiter|
15248field, which usually represents a square root sign.
15249
15250A |fraction_noad| is six words long; it has a |right_delimiter| field
15251as well as a |left_delimiter|.
15252
15253Delimiter fields are of type |four_quarters|, and they have four subfields
15254called |small_fam|, |small_char|, |large_fam|, |large_char|. These subfields
15255represent variable-size delimiters by giving the ``small'' and ``large''
15256starting characters, as explained in Chapter~17 of {\sl The \TeX book}.
15257@:TeXbook}{\sl The \TeX book@>
15258
15259A |fraction_noad| is actually quite different from all other noads. Not
15260only does it have six words, it has |thickness|, |denominator|, and
15261|numerator| fields instead of |nucleus|, |subscr|, and |supscr|. The
15262|thickness| is a scaled value that tells how thick to make a fraction
15263rule; however, the special value |default_code| is used to stand for the
15264|default_rule_thickness| of the current size. The |numerator| and
15265|denominator| point to mlists that define a fraction; we always have
15266$$\hbox{|math_type(numerator)=math_type(denominator)=sub_mlist|}.$$ The
15267|left_delimiter| and |right_delimiter| fields specify delimiters that will
15268be placed at the left and right of the fraction. In this way, a
15269|fraction_noad| is able to represent all of \TeX's operators \.{\\over},
15270\.{\\atop}, \.{\\above}, \.{\\overwithdelims}, \.{\\atopwithdelims}, and
15271 \.{\\abovewithdelims}.
15272
15273@d left_delimiter(#)==#+4 {first delimiter field of a noad}
15274@d right_delimiter(#)==#+5 {second delimiter field of a fraction noad}
15275@d radical_noad=inner_noad+1 {|type| of a noad for square roots}
15276@d radical_noad_size=5 {number of |mem| words in a radical noad}
15277@d fraction_noad=radical_noad+1 {|type| of a noad for generalized fractions}
15278@d fraction_noad_size=6 {number of |mem| words in a fraction noad}
15279@d small_fam(#)==(mem[#].qqqq.b0 mod @"100) {|fam| for ``small'' delimiter}
15280@d small_char(#)==(mem[#].qqqq.b1 + (mem[#].qqqq.b0 div @"100) * @"10000) {|character| for ``small'' delimiter}
15281@d large_fam(#)==(mem[#].qqqq.b2 mod @"100) {|fam| for ``large'' delimiter}
15282@d large_char(#)==(mem[#].qqqq.b3 + (mem[#].qqqq.b2 div @"100) * @"10000) {|character| for ``large'' delimiter}
15283@d small_plane_and_fam_field(#)==mem[#].qqqq.b0
15284@d small_char_field(#)==mem[#].qqqq.b1
15285@d large_plane_and_fam_field(#)==mem[#].qqqq.b2
15286@d large_char_field(#)==mem[#].qqqq.b3
15287@d thickness==width {|thickness| field in a fraction noad}
15288@d default_code==@'10000000000 {denotes |default_rule_thickness|}
15289@d numerator==supscr {|numerator| field in a fraction noad}
15290@d denominator==subscr {|denominator| field in a fraction noad}
15291
15292@ The global variable |empty_field| is set up for initialization of empty
15293fields in new noads. Similarly, |null_delimiter| is for the initialization
15294of delimiter fields.
15295
15296@<Glob...@>=
15297@!empty_field:two_halves;
15298@!null_delimiter:four_quarters;
15299
15300@ @<Set init...@>=
15301empty_field.rh:=empty; empty_field.lh:=null;@/
15302null_delimiter.b0:=0; null_delimiter.b1:=min_quarterword;@/
15303null_delimiter.b2:=0; null_delimiter.b3:=min_quarterword;
15304
15305@ The |new_noad| function creates an |ord_noad| that is completely null.
15306
15307@p function new_noad:pointer;
15308var p:pointer;
15309begin p:=get_node(noad_size);
15310type(p):=ord_noad; subtype(p):=normal;
15311mem[nucleus(p)].hh:=empty_field;
15312mem[subscr(p)].hh:=empty_field;
15313mem[supscr(p)].hh:=empty_field;
15314new_noad:=p;
15315end;
15316
15317@ A few more kinds of noads will complete the set: An |under_noad| has its
15318nucleus underlined; an |over_noad| has it overlined. An |accent_noad| places
15319an accent over its nucleus; the accent character appears as
15320|fam(accent_chr(p))| and |character(accent_chr(p))|. A |vcenter_noad|
15321centers its nucleus vertically with respect to the axis of the formula;
15322in such noads we always have |math_type(nucleus(p))=sub_box|.
15323
15324And finally, we have |left_noad| and |right_noad| types, to implement
15325\TeX's \.{\\left} and \.{\\right} as well as \eTeX's \.{\\middle}.
15326The |nucleus| of such noads is
15327replaced by a |delimiter| field; thus, for example, `\.{\\left(}' produces
15328a |left_noad| such that |delimiter(p)| holds the family and character
15329codes for all left parentheses. A |left_noad| never appears in an mlist
15330except as the first element, and a |right_noad| never appears in an mlist
15331except as the last element; furthermore, we either have both a |left_noad|
15332and a |right_noad|, or neither one is present. The |subscr| and |supscr|
15333fields are always |empty| in a |left_noad| and a |right_noad|.
15334
15335@d under_noad=fraction_noad+1 {|type| of a noad for underlining}
15336@d over_noad=under_noad+1 {|type| of a noad for overlining}
15337@d accent_noad=over_noad+1 {|type| of a noad for accented subformulas}
15338@d fixed_acc=1 {|subtype| for non growing math accents}
15339@d bottom_acc=2 {|subtype| for bottom math accents}
15340@d is_bottom_acc(#)==((subtype(#)=bottom_acc) or (subtype(#)=bottom_acc+fixed_acc))
15341@d accent_noad_size=5 {number of |mem| words in an accent noad}
15342@d accent_chr(#)==#+4 {the |accent_chr| field of an accent noad}
15343@d vcenter_noad=accent_noad+1 {|type| of a noad for \.{\\vcenter}}
15344@d left_noad=vcenter_noad+1 {|type| of a noad for \.{\\left}}
15345@d right_noad=left_noad+1 {|type| of a noad for \.{\\right}}
15346@d delimiter==nucleus {|delimiter| field in left and right noads}
15347@d middle_noad==1 {|subtype| of right noad representing \.{\\middle}}
15348@d scripts_allowed(#)==(type(#)>=ord_noad)and(type(#)<left_noad)
15349
15350@ Math formulas can also contain instructions like \.{\\textstyle} that
15351override \TeX's normal style rules. A |style_node| is inserted into the
15352data structure to record such instructions; it is three words long, so it
15353is considered a node instead of a noad. The |subtype| is either |display_style|
15354or |text_style| or |script_style| or |script_script_style|. The
15355second and third words of a |style_node| are not used, but they are
15356present because a |choice_node| is converted to a |style_node|.
15357
15358\TeX\ uses even numbers 0, 2, 4, 6 to encode the basic styles
15359|display_style|, \dots, |script_script_style|, and adds~1 to get the
15360``cramped'' versions of these styles. This gives a numerical order that
15361is backwards from the convention of Appendix~G in {\sl The \TeX book\/};
15362i.e., a smaller style has a larger numerical value.
15363@:TeXbook}{\sl The \TeX book@>
15364
15365@d style_node=unset_node+1 {|type| of a style node}
15366@d style_node_size=3 {number of words in a style node}
15367@d display_style=0 {|subtype| for \.{\\displaystyle}}
15368@d text_style=2 {|subtype| for \.{\\textstyle}}
15369@d script_style=4 {|subtype| for \.{\\scriptstyle}}
15370@d script_script_style=6 {|subtype| for \.{\\scriptscriptstyle}}
15371@d cramped=1 {add this to an uncramped style if you want to cramp it}
15372
15373@p function new_style(@!s:small_number):pointer; {create a style node}
15374var p:pointer; {the new node}
15375begin p:=get_node(style_node_size); type(p):=style_node;
15376subtype(p):=s; width(p):=0; depth(p):=0; {the |width| and |depth| are not used}
15377new_style:=p;
15378end;
15379
15380@ Finally, the \.{\\mathchoice} primitive creates a |choice_node|, which
15381has special subfields |display_mlist|, |text_mlist|, |script_mlist|,
15382and |script_script_mlist| pointing to the mlists for each style.
15383
15384@d choice_node=unset_node+2 {|type| of a choice node}
15385@d display_mlist(#)==info(#+1) {mlist to be used in display style}
15386@d text_mlist(#)==link(#+1) {mlist to be used in text style}
15387@d script_mlist(#)==info(#+2) {mlist to be used in script style}
15388@d script_script_mlist(#)==link(#+2) {mlist to be used in scriptscript style}
15389
15390@p function new_choice:pointer; {create a choice node}
15391var p:pointer; {the new node}
15392begin p:=get_node(style_node_size); type(p):=choice_node;
15393subtype(p):=0; {the |subtype| is not used}
15394display_mlist(p):=null; text_mlist(p):=null; script_mlist(p):=null;
15395script_script_mlist(p):=null;
15396new_choice:=p;
15397end;
15398
15399@ Let's consider now the previously unwritten part of |show_node_list|
15400that displays the things that can only be present in mlists; this
15401program illustrates how to access the data structures just defined.
15402
15403In the context of the following program, |p| points to a node or noad that
15404should be displayed, and the current string contains the ``recursion history''
15405that leads to this point. The recursion history consists of a dot for each
15406outer level in which |p| is subsidiary to some node, or in which |p| is
15407subsidiary to the |nucleus| field of some noad; the dot is replaced by
15408`\.\_' or `\.\^' or `\./' or `\.\\' if |p| is descended from the |subscr|
15409or |supscr| or |denominator| or |numerator| fields of noads. For example,
15410the current string would be `\.{.\^.\_/}' if |p| points to the |ord_noad| for
15411|x| in the (ridiculous) formula
15412`\.{\$\\sqrt\{a\^\{\\mathinner\{b\_\{c\\over x+y\}\}\}\}\$}'.
15413
15414@<Cases of |show_node_list| that arise...@>=
15415style_node:print_style(subtype(p));
15416choice_node:@<Display choice node |p|@>;
15417ord_noad,op_noad,bin_noad,rel_noad,open_noad,close_noad,punct_noad,inner_noad,
15418  radical_noad,over_noad,under_noad,vcenter_noad,accent_noad,
15419  left_noad,right_noad:@<Display normal noad |p|@>;
15420fraction_noad:@<Display fraction noad |p|@>;
15421
15422@ Here are some simple routines used in the display of noads.
15423
15424@<Declare procedures needed for displaying the elements of mlists@>=
15425procedure print_fam_and_char(@!p:pointer); {prints family and character}
15426var c:integer;
15427begin print_esc("fam"); print_int(fam(p) mod @"100); print_char(" ");
15428c:=(cast_to_ushort(character(p)) + ((plane_and_fam_field(p) div @"100) * @"10000));
15429if c < @"10000 then print_ASCII(c)
15430else print_char(c); {non-Plane 0 Unicodes can't be sent through |print_ASCII|}
15431end;
15432@#
15433procedure print_delimiter(@!p:pointer); {prints a delimiter as 24-bit hex value}
15434var a:integer; {accumulator}
15435begin a:=small_fam(p)*256+qo(small_char(p));
15436a:=a*@"1000+large_fam(p)*256+qo(large_char(p));
15437if a<0 then print_int(a) {this should never happen}
15438else print_hex(a);
15439end;
15440
15441@ The next subroutine will descend to another level of recursion when a
15442subsidiary mlist needs to be displayed. The parameter |c| indicates what
15443character is to become part of the recursion history. An empty mlist is
15444distinguished from a field with |math_type(p)=empty|, because these are
15445not equivalent (as explained above).
15446@^recursion@>
15447
15448@<Declare procedures needed for displaying...@>=
15449procedure@?show_info; forward;@t\2@>@?{|show_node_list(info(temp_ptr))|}
15450procedure print_subsidiary_data(@!p:pointer;@!c:ASCII_code);
15451  {display a noad field}
15452begin if cur_length>=depth_threshold then
15453  begin if math_type(p)<>empty then print(" []");
15454  end
15455else  begin append_char(c); {include |c| in the recursion history}
15456  temp_ptr:=p; {prepare for |show_info| if recursion is needed}
15457  case math_type(p) of
15458  math_char: begin print_ln; print_current_string; print_fam_and_char(p);
15459    end;
15460  sub_box: show_info; {recursive call}
15461  sub_mlist: if info(p)=null then
15462      begin print_ln; print_current_string; print("{}");
15463      end
15464    else show_info; {recursive call}
15465  othercases do_nothing {|empty|}
15466  endcases;@/
15467  flush_char; {remove |c| from the recursion history}
15468  end;
15469end;
15470
15471@ The inelegant introduction of |show_info| in the code above seems better
15472than the alternative of using \PASCAL's strange |forward| declaration for a
15473procedure with parameters. The \PASCAL\ convention about dropping parameters
15474from a post-|forward| procedure is, frankly, so intolerable to the author
15475of \TeX\ that he would rather stoop to communication via a global temporary
15476variable. (A similar stoopidity occurred with respect to |hlist_out| and
15477|vlist_out| above, and it will occur with respect to |mlist_to_hlist| below.)
15478@^Knuth, Donald Ervin@>
15479@:PASCAL}{\PASCAL@>
15480
15481@p procedure show_info; {the reader will kindly forgive this}
15482begin show_node_list(info(temp_ptr));
15483end;
15484
15485@ @<Declare procedures needed for displaying...@>=
15486procedure print_style(@!c:integer);
15487begin case c div 2 of
154880: print_esc("displaystyle"); {|display_style=0|}
154891: print_esc("textstyle"); {|text_style=2|}
154902: print_esc("scriptstyle"); {|script_style=4|}
154913: print_esc("scriptscriptstyle"); {|script_script_style=6|}
15492othercases print("Unknown style!")
15493endcases;
15494end;
15495
15496@ @<Display choice node |p|@>=
15497begin print_esc("mathchoice");
15498append_char("D"); show_node_list(display_mlist(p)); flush_char;
15499append_char("T"); show_node_list(text_mlist(p)); flush_char;
15500append_char("S"); show_node_list(script_mlist(p)); flush_char;
15501append_char("s"); show_node_list(script_script_mlist(p)); flush_char;
15502end
15503
15504@ @<Display normal noad |p|@>=
15505begin case type(p) of
15506ord_noad: print_esc("mathord");
15507op_noad: print_esc("mathop");
15508bin_noad: print_esc("mathbin");
15509rel_noad: print_esc("mathrel");
15510open_noad: print_esc("mathopen");
15511close_noad: print_esc("mathclose");
15512punct_noad: print_esc("mathpunct");
15513inner_noad: print_esc("mathinner");
15514over_noad: print_esc("overline");
15515under_noad: print_esc("underline");
15516vcenter_noad: print_esc("vcenter");
15517radical_noad: begin print_esc("radical"); print_delimiter(left_delimiter(p));
15518  end;
15519accent_noad: begin print_esc("accent"); print_fam_and_char(accent_chr(p));
15520  end;
15521left_noad: begin print_esc("left"); print_delimiter(delimiter(p));
15522  end;
15523right_noad: begin if subtype(p)=normal then print_esc("right")
15524  else print_esc("middle");
15525  print_delimiter(delimiter(p));
15526  end;
15527end;
15528if type(p)<left_noad then
15529  begin if subtype(p)<>normal then
15530    if subtype(p)=limits then print_esc("limits")
15531    else print_esc("nolimits");
15532  print_subsidiary_data(nucleus(p),".");
15533  end;
15534print_subsidiary_data(supscr(p),"^");
15535print_subsidiary_data(subscr(p),"_");
15536end
15537
15538@ @<Display fraction noad |p|@>=
15539begin print_esc("fraction, thickness ");
15540if thickness(p)=default_code then print("= default")
15541else print_scaled(thickness(p));
15542if (small_fam(left_delimiter(p))<>0)or@+
15543  (small_char(left_delimiter(p))<>min_quarterword)or@|
15544  (large_fam(left_delimiter(p))<>0)or@|
15545  (large_char(left_delimiter(p))<>min_quarterword) then
15546  begin print(", left-delimiter "); print_delimiter(left_delimiter(p));
15547  end;
15548if (small_fam(right_delimiter(p))<>0)or@|
15549  (small_char(right_delimiter(p))<>min_quarterword)or@|
15550  (large_fam(right_delimiter(p))<>0)or@|
15551  (large_char(right_delimiter(p))<>min_quarterword) then
15552  begin print(", right-delimiter "); print_delimiter(right_delimiter(p));
15553  end;
15554print_subsidiary_data(numerator(p),"\");
15555print_subsidiary_data(denominator(p),"/");
15556end
15557
15558@ That which can be displayed can also be destroyed.
15559
15560@<Cases of |flush_node_list| that arise...@>=
15561style_node: begin free_node(p,style_node_size); goto done;
15562  end;
15563choice_node:begin flush_node_list(display_mlist(p));
15564  flush_node_list(text_mlist(p));
15565  flush_node_list(script_mlist(p));
15566  flush_node_list(script_script_mlist(p));
15567  free_node(p,style_node_size); goto done;
15568  end;
15569ord_noad,op_noad,bin_noad,rel_noad,open_noad,close_noad,punct_noad,inner_noad,
15570  radical_noad,over_noad,under_noad,vcenter_noad,accent_noad:@t@>@;@/
15571  begin if math_type(nucleus(p))>=sub_box then
15572    flush_node_list(info(nucleus(p)));
15573  if math_type(supscr(p))>=sub_box then
15574    flush_node_list(info(supscr(p)));
15575  if math_type(subscr(p))>=sub_box then
15576    flush_node_list(info(subscr(p)));
15577  if type(p)=radical_noad then free_node(p,radical_noad_size)
15578  else if type(p)=accent_noad then free_node(p,accent_noad_size)
15579  else free_node(p,noad_size);
15580  goto done;
15581  end;
15582left_noad,right_noad: begin free_node(p,noad_size); goto done;
15583  end;
15584fraction_noad: begin flush_node_list(info(numerator(p)));
15585  flush_node_list(info(denominator(p)));
15586  free_node(p,fraction_noad_size); goto done;
15587  end;
15588
15589@* \[35] Subroutines for math mode.
15590In order to convert mlists to hlists, i.e., noads to nodes, we need several
15591subroutines that are conveniently dealt with now.
15592
15593Let us first introduce the macros that make it easy to get at the parameters and
15594other font information. A size code, which is a multiple of 16, is added to a
15595family number to get an index into the table of internal font numbers
15596for each combination of family and size.  (Be alert: Size codes get
15597larger as the type gets smaller.)
15598
15599@<Basic printing procedures@>=
15600procedure print_size(@!s:integer);
15601begin if s=text_size then print_esc("textfont")
15602else if s=script_size then print_esc("scriptfont")
15603else print_esc("scriptscriptfont");
15604end;
15605
15606@ Before an mlist is converted to an hlist, \TeX\ makes sure that
15607the fonts in family~2 have enough parameters to be math-symbol
15608fonts, and that the fonts in family~3 have enough parameters to be
15609math-extension fonts. The math-symbol parameters are referred to by using the
15610following macros, which take a size code as their parameter; for example,
15611|num1(cur_size)| gives the value of the |num1| parameter for the current size.
15612@^parameters for symbols@>
15613@^font parameters@>
15614
15615NB: the access functions here must all put the font \# into /f/ for mathsy().
15616
15617The accessors are defined with
15618|define_mathsy_accessor(NAME)(fontdimen-number)(NAME)|
15619because I can't see how to only give the name once, with WEB's limited
15620macro capabilities. This seems a bit ugly, but it works.
15621
15622@d total_mathsy_params=22
15623
15624{the following are OpenType MATH constant indices for use with OT math fonts}
15625@d  scriptPercentScaleDown = 0
15626@d  scriptScriptPercentScaleDown = 1
15627@d  delimitedSubFormulaMinHeight = 2
15628@d  displayOperatorMinHeight = 3
15629@d  mathLeading = 4
15630@d  firstMathValueRecord = mathLeading
15631@d  axisHeight = 5
15632@d  accentBaseHeight = 6
15633@d  flattenedAccentBaseHeight = 7
15634@d  subscriptShiftDown = 8
15635@d  subscriptTopMax = 9
15636@d  subscriptBaselineDropMin = 10
15637@d  superscriptShiftUp = 11
15638@d  superscriptShiftUpCramped = 12
15639@d  superscriptBottomMin = 13
15640@d  superscriptBaselineDropMax = 14
15641@d  subSuperscriptGapMin = 15
15642@d  superscriptBottomMaxWithSubscript = 16
15643@d  spaceAfterScript = 17
15644@d  upperLimitGapMin = 18
15645@d  upperLimitBaselineRiseMin = 19
15646@d  lowerLimitGapMin = 20
15647@d  lowerLimitBaselineDropMin = 21
15648@d  stackTopShiftUp = 22
15649@d  stackTopDisplayStyleShiftUp = 23
15650@d  stackBottomShiftDown = 24
15651@d  stackBottomDisplayStyleShiftDown = 25
15652@d  stackGapMin = 26
15653@d  stackDisplayStyleGapMin = 27
15654@d  stretchStackTopShiftUp = 28
15655@d  stretchStackBottomShiftDown = 29
15656@d  stretchStackGapAboveMin = 30
15657@d  stretchStackGapBelowMin = 31
15658@d  fractionNumeratorShiftUp = 32
15659@d  fractionNumeratorDisplayStyleShiftUp = 33
15660@d  fractionDenominatorShiftDown = 34
15661@d  fractionDenominatorDisplayStyleShiftDown = 35
15662@d  fractionNumeratorGapMin = 36
15663@d  fractionNumDisplayStyleGapMin = 37
15664@d  fractionRuleThickness = 38
15665@d  fractionDenominatorGapMin = 39
15666@d  fractionDenomDisplayStyleGapMin = 40
15667@d  skewedFractionHorizontalGap = 41
15668@d  skewedFractionVerticalGap = 42
15669@d  overbarVerticalGap = 43
15670@d  overbarRuleThickness = 44
15671@d  overbarExtraAscender = 45
15672@d  underbarVerticalGap = 46
15673@d  underbarRuleThickness = 47
15674@d  underbarExtraDescender = 48
15675@d  radicalVerticalGap = 49
15676@d  radicalDisplayStyleVerticalGap = 50
15677@d  radicalRuleThickness = 51
15678@d  radicalExtraAscender = 52
15679@d  radicalKernBeforeDegree = 53
15680@d  radicalKernAfterDegree = 54
15681@d  lastMathValueRecord = radicalKernAfterDegree
15682@d  radicalDegreeBottomRaisePercent = 55
15683@d  lastMathConstant = radicalDegreeBottomRaisePercent
15684
15685
15686@d mathsy(#)==font_info[#+param_base[f]].sc
15687
15688@d define_mathsy_end(#)==
15689    #:=rval;
15690  end
15691@d define_mathsy_body(#)==
15692  var
15693    f: integer;
15694    rval: scaled;
15695  begin
15696    f:=fam_fnt(2 + size_code);
15697    if is_new_mathfont(cur_f) then
15698      rval:=get_native_mathsy_param(cur_f, #)
15699    else
15700      rval:=mathsy(#);
15701    define_mathsy_end
15702@d define_mathsy_accessor(#)==function #(size_code: integer): scaled; define_mathsy_body
15703
15704@p define_mathsy_accessor(math_x_height)(5)(math_x_height);
15705define_mathsy_accessor(math_quad)(6)(math_quad);
15706define_mathsy_accessor(num1)(8)(num1);
15707define_mathsy_accessor(num2)(9)(num2);
15708define_mathsy_accessor(num3)(10)(num3);
15709define_mathsy_accessor(denom1)(11)(denom1);
15710define_mathsy_accessor(denom2)(12)(denom2);
15711define_mathsy_accessor(sup1)(13)(sup1);
15712define_mathsy_accessor(sup2)(14)(sup2);
15713define_mathsy_accessor(sup3)(15)(sup3);
15714define_mathsy_accessor(sub1)(16)(sub1);
15715define_mathsy_accessor(sub2)(17)(sub2);
15716define_mathsy_accessor(sup_drop)(18)(sup_drop);
15717define_mathsy_accessor(sub_drop)(19)(sub_drop);
15718define_mathsy_accessor(delim1)(20)(delim1);
15719define_mathsy_accessor(delim2)(21)(delim2);
15720define_mathsy_accessor(axis_height)(22)(axis_height);
15721
15722@ The math-extension parameters have similar macros, but the size code is
15723omitted (since it is always |cur_size| when we refer to such parameters).
15724@^parameters for symbols@>
15725@^font parameters@>
15726
15727@d total_mathex_params=13
15728
15729@d mathex(#)==font_info[#+param_base[f]].sc
15730
15731@d define_mathex_end(#)==
15732    #:=rval;
15733  end
15734@d define_mathex_body(#)==
15735  var
15736    f: integer;
15737    rval: scaled;
15738  begin
15739    f:=fam_fnt(3 + cur_size);
15740    if is_new_mathfont(cur_f) then
15741      rval:=get_native_mathex_param(cur_f, #)
15742    else
15743      rval:=mathex(#);
15744  define_mathex_end
15745@d define_mathex_accessor(#)==function #:scaled; define_mathex_body
15746
15747@p define_mathex_accessor(default_rule_thickness)(8)(default_rule_thickness);
15748define_mathex_accessor(big_op_spacing1)(9)(big_op_spacing1);
15749define_mathex_accessor(big_op_spacing2)(10)(big_op_spacing2);
15750define_mathex_accessor(big_op_spacing3)(11)(big_op_spacing3);
15751define_mathex_accessor(big_op_spacing4)(12)(big_op_spacing4);
15752define_mathex_accessor(big_op_spacing5)(13)(big_op_spacing5);
15753
15754@ Native font support requires these additional subroutines.
15755
15756|new_native_word_node| creates the node, but does not actually set its metrics;
15757call |set_native_metrics(node)| if that is required.
15758
15759@<Declare subroutines for |new_character|@>=
15760function new_native_word_node(@!f:internal_font_number;@!n:integer):pointer;
15761var
15762  l:  integer;
15763  q:  pointer;
15764begin
15765  l:=native_node_size + (n * sizeof(UTF16_code) + sizeof(memory_word) - 1) div sizeof(memory_word);
15766
15767  q:=get_node(l);
15768  type(q):=whatsit_node;
15769  subtype(q):=native_word_node;
15770
15771  native_size(q):=l;
15772  native_font(q):=f;
15773  native_length(q):=n;
15774
15775  native_glyph_count(q):=0;
15776  native_glyph_info_ptr(q):=null_ptr;
15777
15778  new_native_word_node:=q;
15779end;
15780
15781function new_native_character(@!f:internal_font_number;@!c:UnicodeScalar):pointer;
15782var
15783  p:  pointer;
15784  i, len: integer;
15785begin
15786  if font_mapping[f] <> 0 then begin
15787    if c > @"FFFF then begin
15788      str_room(2);
15789      append_char((c - @"10000) div 1024 + @"D800);
15790      append_char((c - @"10000) mod 1024 + @"DC00);
15791    end
15792    else begin
15793      str_room(1);
15794      append_char(c);
15795    end;
15796    len:=apply_mapping(font_mapping[f], addressof(str_pool[str_start_macro(str_ptr)]), cur_length);
15797    pool_ptr:=str_start_macro(str_ptr); { flush the string, as we'll be using the mapped text instead }
15798
15799    i:=0;
15800    while i < len do begin
15801      if (mapped_text[i] >= @"D800) and (mapped_text[i] < @"DC00) then begin
15802        c:=(mapped_text[i] - @"D800) * 1024 + mapped_text[i+1] - @"DC00 + @"10000;
15803        if map_char_to_glyph(f, c) = 0 then begin
15804          char_warning(f, c);
15805        end;
15806        i:=i + 2;
15807      end
15808      else begin
15809        if map_char_to_glyph(f, mapped_text[i]) = 0 then begin
15810          char_warning(f, mapped_text[i]);
15811        end;
15812        i:=i + 1;
15813      end;
15814    end;
15815
15816    p:=new_native_word_node(f, len);
15817    for i:=0 to len-1 do begin
15818      set_native_char(p, i, mapped_text[i]);
15819    end
15820  end
15821  else begin
15822    if tracing_lost_chars > 0 then
15823      if map_char_to_glyph(f, c) = 0 then begin
15824        char_warning(f, c);
15825      end;
15826
15827    p:=get_node(native_node_size + 1);
15828    type(p):=whatsit_node;
15829    subtype(p):=native_word_node;
15830
15831    native_size(p):=native_node_size + 1;
15832    native_glyph_count(p):=0;
15833    native_glyph_info_ptr(p):=null_ptr;
15834    native_font(p):=f;
15835
15836    if c > @"FFFF then begin
15837      native_length(p):=2;
15838      set_native_char(p, 0, (c - @"10000) div 1024 + @"D800);
15839      set_native_char(p, 1, (c - @"10000) mod 1024 + @"DC00);
15840    end
15841    else begin
15842      native_length(p):=1;
15843      set_native_char(p, 0, c);
15844    end;
15845  end;
15846
15847  set_native_metrics(p, XeTeX_use_glyph_metrics);
15848
15849  new_native_character:=p;
15850end;
15851
15852procedure font_feature_warning(featureNameP:void_pointer; featLen:integer;
15853  settingNameP:void_pointer; setLen:integer);
15854var
15855  i: integer;
15856begin
15857  begin_diagnostic;
15858  print_nl("Unknown ");
15859  if setLen > 0 then begin
15860    print("selector `");
15861    print_utf8_str(settingNameP, setLen);
15862    print("' for ");
15863  end;
15864  print("feature `");
15865  print_utf8_str(featureNameP, featLen);
15866  print("' in font `");
15867  i:=1;
15868  while ord(name_of_file[i]) <> 0 do begin
15869    print_visible_char(name_of_file[i]); { this is already UTF-8 }
15870    incr(i);
15871  end;
15872  print("'.");
15873  end_diagnostic(false);
15874end;
15875
15876procedure font_mapping_warning(mappingNameP:void_pointer;
15877  mappingNameLen:integer;
15878  warningType:integer); { 0: just logging; 1: file not found; 2: can't load }
15879var
15880  i: integer;
15881begin
15882  begin_diagnostic;
15883  if warningType=0 then print_nl("Loaded mapping `")
15884  else print_nl("Font mapping `");
15885  print_utf8_str(mappingNameP, mappingNameLen);
15886  print("' for font `");
15887  i:=1;
15888  while ord(name_of_file[i]) <> 0 do begin
15889    print_visible_char(name_of_file[i]); { this is already UTF-8 }
15890    incr(i);
15891  end;
15892  case warningType of
15893    1: print("' not found.");
15894    2: begin print("' not usable;");
15895      print_nl("bad mapping file or incorrect mapping type.");
15896    end;
15897    othercases print("'.")
15898  endcases;
15899  end_diagnostic(false);
15900end;
15901
15902procedure graphite_warning;
15903var
15904  i: integer;
15905begin
15906  begin_diagnostic;
15907  print_nl("Font `");
15908  i:=1;
15909  while ord(name_of_file[i]) <> 0 do begin
15910    print_visible_char(name_of_file[i]); { this is already UTF-8 }
15911    incr(i);
15912  end;
15913  print("' does not support Graphite. Trying OpenType layout instead.");
15914  end_diagnostic(false);
15915end;
15916
15917function load_native_font(u: pointer; nom, aire:str_number; s: scaled): internal_font_number;
15918label
15919  done;
15920const
15921  first_math_fontdimen = 10;
15922var
15923  k, num_font_dimens: integer;
15924  font_engine: void_pointer;  {really an CFDictionaryRef or XeTeXLayoutEngine}
15925  actual_size: scaled;    {|s| converted to real size, if it was negative}
15926  p: pointer; {for temporary |native_char| node we'll create}
15927  ascent, descent, font_slant, x_ht, cap_ht: scaled;
15928  f: internal_font_number;
15929  full_name: str_number;
15930begin
15931  { on entry here, the full name is packed into |name_of_file| in UTF8 form }
15932
15933  load_native_font:=null_font;
15934
15935  font_engine:=find_native_font(name_of_file + 1, s);
15936  if font_engine = 0 then goto done;
15937
15938  if s>=0 then
15939    actual_size:=s
15940  else begin
15941    if (s <> -1000) then
15942      actual_size:=xn_over_d(loaded_font_design_size,-s,1000)
15943    else
15944      actual_size:=loaded_font_design_size;
15945  end;
15946
15947  { look again to see if the font is already loaded, now that we know its canonical name }
15948  str_room(name_length);
15949  for k:=1 to name_length do
15950    append_char(name_of_file[k]);
15951  full_name:=make_string; { not |slow_make_string| because we'll flush it if the font was already loaded }
15952
15953  for f:=font_base+1 to font_ptr do
15954    if (font_area[f] = native_font_type_flag) and str_eq_str(font_name[f], full_name) and (font_size[f] = actual_size) then begin
15955      release_font_engine(font_engine, native_font_type_flag);
15956      flush_string;
15957      load_native_font:=f;
15958      goto done;
15959    end;
15960
15961  if (native_font_type_flag = otgr_font_flag) and isOpenTypeMathFont(font_engine) then
15962    num_font_dimens:=first_math_fontdimen + lastMathConstant
15963  else
15964    num_font_dimens:=8;
15965
15966  if (font_ptr = font_max) or (fmem_ptr + num_font_dimens > font_mem_size) then begin
15967    @<Apologize for not loading the font, |goto done|@>;
15968  end;
15969
15970  { we've found a valid installed font, and have room }
15971  incr(font_ptr);
15972  font_area[font_ptr]:=native_font_type_flag; { set by |find_native_font| to either |aat_font_flag| or |ot_font_flag| }
15973
15974  { store the canonical name }
15975  font_name[font_ptr]:=full_name;
15976
15977  font_check[font_ptr].b0:=0;
15978  font_check[font_ptr].b1:=0;
15979  font_check[font_ptr].b2:=0;
15980  font_check[font_ptr].b3:=0;
15981  font_glue[font_ptr]:=null;
15982  font_dsize[font_ptr]:=loaded_font_design_size;
15983  font_size[font_ptr]:=actual_size;
15984
15985  if (native_font_type_flag = aat_font_flag) then begin
15986    aat_get_font_metrics(font_engine, addressof(ascent), addressof(descent),
15987                         addressof(x_ht), addressof(cap_ht), addressof(font_slant))
15988  end else begin
15989    ot_get_font_metrics(font_engine, addressof(ascent), addressof(descent),
15990                        addressof(x_ht), addressof(cap_ht), addressof(font_slant));
15991  end;
15992
15993  height_base[font_ptr]:=ascent;
15994  depth_base[font_ptr]:=-descent;
15995
15996  font_params[font_ptr]:=num_font_dimens; {we add an extra \.{\\fontdimen8} |:= cap_height|; then OT math fonts have a bunch more}
15997  font_bc[font_ptr]:=0;
15998  font_ec[font_ptr]:=65535;
15999  font_used[font_ptr]:=false;
16000  hyphen_char[font_ptr]:=default_hyphen_char;
16001  skew_char[font_ptr]:=default_skew_char;
16002  param_base[font_ptr]:=fmem_ptr-1;
16003
16004  font_layout_engine[font_ptr]:=font_engine;
16005  font_mapping[font_ptr]:=0; {don't use the mapping, if any, when measuring space here}
16006  font_letter_space[font_ptr]:=loaded_font_letter_space;
16007
16008@# {measure the width of the space character and set up font parameters}
16009  p:=new_native_character(font_ptr, " ");
16010  s:=width(p) + loaded_font_letter_space;
16011  free_node(p, native_size(p));
16012
16013  font_info[fmem_ptr].sc:=font_slant;                           {|slant|}
16014  incr(fmem_ptr);
16015  font_info[fmem_ptr].sc:=s;                                    {|space| = width of space character}
16016  incr(fmem_ptr);
16017  font_info[fmem_ptr].sc:=s div 2;                              {|space_stretch| = 1/2 * space}
16018  incr(fmem_ptr);
16019  font_info[fmem_ptr].sc:=s div 3;                              {|space_shrink| = 1/3 * space}
16020  incr(fmem_ptr);
16021  font_info[fmem_ptr].sc:=x_ht;                                 {|x_height|}
16022  incr(fmem_ptr);
16023  font_info[fmem_ptr].sc:=font_size[font_ptr];                  {|quad| = font size}
16024  incr(fmem_ptr);
16025  font_info[fmem_ptr].sc:=s div 3;                              {|extra_space| = 1/3 * space}
16026  incr(fmem_ptr);
16027  font_info[fmem_ptr].sc:=cap_ht;                               {|cap_height|}
16028  incr(fmem_ptr);
16029
16030  if num_font_dimens = first_math_fontdimen + lastMathConstant then begin
16031    font_info[fmem_ptr].int:=num_font_dimens; { \.{\\fontdimen9} |:=| number of assigned fontdimens }
16032    incr(fmem_ptr);
16033    for k:=0 to lastMathConstant do begin
16034      font_info[fmem_ptr].sc:=get_ot_math_constant(font_ptr, k);
16035      incr(fmem_ptr);
16036    end;
16037  end;
16038
16039  font_mapping[font_ptr]:=loaded_font_mapping;
16040  font_flags[font_ptr]:=loaded_font_flags;
16041
16042  load_native_font:=font_ptr;
16043done:
16044end;
16045
16046procedure do_locale_linebreaks(s: integer; len: integer);
16047var
16048  offs, prevOffs, i: integer;
16049  use_penalty, use_skip: boolean;
16050begin
16051  if (XeTeX_linebreak_locale = 0) or (len = 1) then begin
16052    link(tail):=new_native_word_node(main_f, len);
16053    tail:=link(tail);
16054    for i:=0 to len - 1 do
16055      set_native_char(tail, i, native_text[s + i]);
16056    set_native_metrics(tail, XeTeX_use_glyph_metrics);
16057  end else begin
16058    use_skip:=XeTeX_linebreak_skip <> zero_glue;
16059    use_penalty:=XeTeX_linebreak_penalty <> 0 or not use_skip;
16060    linebreak_start(main_f, XeTeX_linebreak_locale, native_text + s, len);
16061    offs:=0;
16062    repeat
16063      prevOffs:=offs;
16064      offs:=linebreak_next;
16065      if offs > 0 then begin
16066        if prevOffs <> 0 then begin
16067          if use_penalty then
16068            tail_append(new_penalty(XeTeX_linebreak_penalty));
16069          if use_skip then
16070            tail_append(new_param_glue(XeTeX_linebreak_skip_code));
16071        end;
16072        link(tail):=new_native_word_node(main_f, offs - prevOffs);
16073        tail:=link(tail);
16074        for i:=prevOffs to offs - 1 do
16075          set_native_char(tail, i - prevOffs, native_text[s + i]);
16076        set_native_metrics(tail, XeTeX_use_glyph_metrics);
16077      end;
16078    until offs < 0;
16079  end
16080end;
16081
16082procedure bad_utf8_warning;
16083begin
16084  begin_diagnostic;
16085  print_nl("Invalid UTF-8 byte or sequence");
16086  if terminal_input then print(" in terminal input")
16087  else begin
16088    print(" at line ");
16089    print_int(line);
16090  end;
16091  print(" replaced by U+FFFD.");
16092  end_diagnostic(false);
16093end;
16094
16095function get_input_normalization_state: integer;
16096begin
16097  if eqtb=nil then get_input_normalization_state:=0 { may be called before eqtb is initialized }
16098  else get_input_normalization_state:=XeTeX_input_normalization_state;
16099end;
16100
16101function get_tracing_fonts_state: integer;
16102begin
16103  get_tracing_fonts_state:=XeTeX_tracing_fonts_state;
16104end;
16105
16106@ We also need to compute the change in style between mlists and their
16107subsidiaries. The following macros define the subsidiary style for
16108an overlined nucleus (|cramped_style|), for a subscript or a superscript
16109(|sub_style| or |sup_style|), or for a numerator or denominator (|num_style|
16110or |denom_style|).
16111
16112@d cramped_style(#)==2*(# div 2)+cramped {cramp the style}
16113@d sub_style(#)==2*(# div 4)+script_style+cramped {smaller and cramped}
16114@d sup_style(#)==2*(# div 4)+script_style+(# mod 2) {smaller}
16115@d num_style(#)==#+2-2*(# div 6) {smaller unless already script-script}
16116@d denom_style(#)==2*(# div 2)+cramped+2-2*(# div 6) {smaller, cramped}
16117
16118@ When the style changes, the following piece of program computes associated
16119information:
16120
16121@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>=
16122begin if cur_style<script_style then cur_size:=text_size
16123else cur_size:=script_size*((cur_style-text_style) div 2);
16124cur_mu:=x_over_n(math_quad(cur_size),18);
16125end
16126
16127@ Here is a function that returns a pointer to a rule node having a given
16128thickness |t|. The rule will extend horizontally to the boundary of the vlist
16129that eventually contains it.
16130
16131@p function fraction_rule(@!t:scaled):pointer;
16132  {construct the bar for a fraction}
16133var p:pointer; {the new node}
16134begin p:=new_rule; height(p):=t; depth(p):=0; fraction_rule:=p;
16135end;
16136
16137@ The |overbar| function returns a pointer to a vlist box that consists of
16138a given box |b|, above which has been placed a kern of height |k| under a
16139fraction rule of thickness |t| under additional space of height |t|.
16140
16141@p function overbar(@!b:pointer;@!k,@!t:scaled):pointer;
16142var p,@!q:pointer; {nodes being constructed}
16143begin p:=new_kern(k); link(p):=b; q:=fraction_rule(t); link(q):=p;
16144p:=new_kern(t); link(p):=q; overbar:=vpack(p,natural);
16145end;
16146
16147@ The |var_delimiter| function, which finds or constructs a sufficiently
16148large delimiter, is the most interesting of the auxiliary functions that
16149currently concern us. Given a pointer |d| to a delimiter field in some noad,
16150together with a size code |s| and a vertical distance |v|, this function
16151returns a pointer to a box that contains the smallest variant of |d| whose
16152height plus depth is |v| or more. (And if no variant is large enough, it
16153returns the largest available variant.) In particular, this routine will
16154construct arbitrarily large delimiters from extensible components, if
16155|d| leads to such characters.
16156
16157The value returned is a box whose |shift_amount| has been set so that
16158the box is vertically centered with respect to the axis in the given size.
16159If a built-up symbol is returned, the height of the box before shifting
16160will be the height of its topmost component.
16161
16162@p@t\4@>@<Declare subprocedures for |var_delimiter|@>
16163procedure stack_glyph_into_box(@!b:pointer;@!f:internal_font_number;@!g:integer);
16164var p,q:pointer;
16165begin
16166  p:=get_node(glyph_node_size);
16167  type(p):=whatsit_node; subtype(p):=glyph_node;
16168  native_font(p):=f; native_glyph(p):=g;
16169  set_native_glyph_metrics(p, 1);
16170  if type(b)=hlist_node then begin
16171    q:=list_ptr(b);
16172    if q=null then list_ptr(b):=p else begin
16173      while link(q)<>null do q:=link(q);
16174      link(q):=p;
16175      if (height(b) < height(p)) then height(b):=height(p);
16176      if (depth(b) < depth(p)) then depth(b):=depth(p);
16177    end;
16178  end else begin
16179    link(p):=list_ptr(b); list_ptr(b):=p;
16180    height(b):=height(p);
16181    if (width(b) < width(p)) then width(b):=width(p);
16182  end;
16183end;
16184
16185procedure stack_glue_into_box(@!b:pointer;@!min,max:scaled);
16186var p,q:pointer;
16187begin
16188  q:=new_spec(zero_glue);
16189  width(q):=min;
16190  stretch(q):=max-min;
16191  p:=new_glue(q);
16192  if type(b)=hlist_node then begin
16193    q:=list_ptr(b);
16194    if q=null then list_ptr(b):=p else begin
16195      while link(q)<>null do q:=link(q);
16196      link(q):=p;
16197    end;
16198  end else begin
16199    link(p):=list_ptr(b); list_ptr(b):=p;
16200    height(b):=height(p); width(b):=width(p);
16201  end;
16202end;
16203
16204function build_opentype_assembly(@!f:internal_font_number;@!a:void_pointer;@!s:scaled;@!horiz:boolean):pointer;
16205  {return a box with height/width at least |s|, using font |f|, with glyph assembly info from |a|}
16206var
16207  b:pointer; {the box we're constructing}
16208  n:integer; {the number of repetitions of each extender}
16209  i,j:integer; {indexes}
16210  g:integer; {glyph code}
16211  p:pointer; {temp pointer}
16212  s_max,o,oo,prev_o,min_o:scaled;
16213  no_extenders: boolean;
16214  nat,str:scaled; {natural size, stretch}
16215begin
16216  b:=new_null_box;
16217  if horiz then
16218    type(b):=hlist_node
16219  else
16220    type(b):=vlist_node;
16221
16222  {figure out how many repeats of each extender to use}
16223  n:=-1;
16224  no_extenders:=true;
16225  min_o:=ot_min_connector_overlap(f);
16226  repeat
16227    n:=n+1;
16228    {calc max possible size with this number of extenders}
16229    s_max:=0;
16230    prev_o:=0;
16231    for i:=0 to ot_part_count(a)-1 do begin
16232      if ot_part_is_extender(a, i) then begin
16233        no_extenders:=false;
16234        for j:=1 to n do begin
16235          o:=ot_part_start_connector(f, a, i);
16236          if min_o<o then o:=min_o;
16237          if prev_o<o then o:=prev_o;
16238          s_max:=s_max-o+ot_part_full_advance(f, a, i);
16239          prev_o:=ot_part_end_connector(f, a, i);
16240        end
16241      end else begin
16242        o:=ot_part_start_connector(f, a, i);
16243        if min_o<o then o:=min_o;
16244        if prev_o<o then o:=prev_o;
16245        s_max:=s_max-o+ot_part_full_advance(f, a, i);
16246        prev_o:=ot_part_end_connector(f, a, i);
16247      end;
16248    end;
16249  until (s_max>=s) or no_extenders;
16250
16251  {assemble box using |n| copies of each extender,
16252   with appropriate glue wherever an overlap occurs}
16253  prev_o:=0;
16254  for i:=0 to ot_part_count(a)-1 do begin
16255    if ot_part_is_extender(a, i) then begin
16256      for j:=1 to n do begin
16257        o:=ot_part_start_connector(f, a, i);
16258        if prev_o<o then o:=prev_o;
16259        oo:=o; {max overlap}
16260        if min_o<o then o:=min_o;
16261        if oo>0 then stack_glue_into_box(b, -oo, -o);
16262        g:=ot_part_glyph(a, i);
16263        stack_glyph_into_box(b, f, g);
16264        prev_o:=ot_part_end_connector(f, a, i);
16265      end
16266    end else begin
16267      o:=ot_part_start_connector(f, a, i);
16268      if prev_o<o then o:=prev_o;
16269      oo:=o; {max overlap}
16270      if min_o<o then o:=min_o;
16271      if oo>0 then stack_glue_into_box(b, -oo, -o);
16272      g:=ot_part_glyph(a, i);
16273      stack_glyph_into_box(b, f, g);
16274      prev_o:=ot_part_end_connector(f, a, i);
16275    end;
16276  end;
16277
16278  {find natural size and total stretch of the box}
16279  p:=list_ptr(b); nat:=0; str:=0;
16280  while p<>null do begin
16281    if type(p)=whatsit_node then begin
16282      if horiz then
16283        nat:=nat+width(p)
16284      else
16285        nat:=nat+height(p)+depth(p);
16286    end else if type(p)=glue_node then begin
16287      nat:=nat+width(glue_ptr(p));
16288      str:=str+stretch(glue_ptr(p));
16289    end;
16290    p:=link(p);
16291  end;
16292
16293  {set glue so as to stretch the connections if needed}
16294  o:=0;
16295  if (s>nat) and (str>0) then begin
16296    o:=(s-nat);
16297    {don't stretch more than |str|}
16298    if (o>str) then o:=str;
16299    glue_order(b):=normal; glue_sign(b):=stretching;
16300    glue_set(b):=unfloat(o/str);
16301    if horiz then
16302      width(b):= nat+round(str*float(glue_set(b)))
16303    else
16304      height(b):=nat+round(str*float(glue_set(b)));
16305  end else
16306    if horiz then
16307      width(b):=nat
16308    else
16309      height(b):=nat;
16310
16311  build_opentype_assembly:=b;
16312end;
16313
16314function var_delimiter(@!d:pointer;@!s:integer;@!v:scaled):pointer;
16315label found,continue;
16316var b:pointer; {the box that will be constructed}
16317ot_assembly_ptr:void_pointer;
16318@!f,@!g: internal_font_number; {best-so-far and tentative font codes}
16319@!c,@!x,@!y: quarterword; {best-so-far and tentative character codes}
16320@!m,@!n: integer; {the number of extensible pieces}
16321@!u: scaled; {height-plus-depth of a tentative character}
16322@!w: scaled; {largest height-plus-depth so far}
16323@!q: four_quarters; {character info}
16324@!hd: eight_bits; {height-depth byte}
16325@!r: four_quarters; {extensible pieces}
16326@!z: integer; {runs through font family members}
16327@!large_attempt: boolean; {are we trying the ``large'' variant?}
16328begin f:=null_font; w:=0; large_attempt:=false;
16329z:=small_fam(d); x:=small_char(d);
16330ot_assembly_ptr:=nil;
16331loop@+  begin @<Look at the variants of |(z,x)|; set |f| and |c| whenever
16332    a better character is found; |goto found| as soon as a
16333    large enough variant is encountered@>;
16334  if large_attempt then goto found; {there were none large enough}
16335  large_attempt:=true; z:=large_fam(d); x:=large_char(d);
16336  end;
16337found: if f<>null_font then begin
16338  if not is_ot_font(f) then
16339    @<Make variable |b| point to a box for |(f,c)|@>
16340  else begin
16341    {for OT fonts, c is the glyph ID to use}
16342    if ot_assembly_ptr<>nil then
16343      b:=build_opentype_assembly(f, ot_assembly_ptr, v, 0)
16344    else begin
16345      b:=new_null_box; type(b):=vlist_node; list_ptr(b):=get_node(glyph_node_size);
16346      type(list_ptr(b)):=whatsit_node; subtype(list_ptr(b)):=glyph_node;
16347      native_font(list_ptr(b)):=f; native_glyph(list_ptr(b)):=c;
16348      set_native_glyph_metrics(list_ptr(b), 1);
16349      width(b):=width(list_ptr(b));
16350      height(b):=height(list_ptr(b));
16351      depth(b):=depth(list_ptr(b));
16352    end
16353  end
16354end else  begin b:=new_null_box;
16355  width(b):=null_delimiter_space; {use this width if no delimiter was found}
16356  end;
16357shift_amount(b):=half(height(b)-depth(b)) - axis_height(s);
16358var_delimiter:=b;
16359end;
16360
16361@ The search process is complicated slightly by the facts that some of the
16362characters might not be present in some of the fonts, and they might not
16363be probed in increasing order of height.
16364
16365@<Look at the variants of |(z,x)|; set |f| and |c|...@>=
16366if (z<>0)or(x<>min_quarterword) then
16367  begin z:=z+s+script_size;
16368  repeat z:=z-script_size; g:=fam_fnt(z);
16369  if g<>null_font then
16370    @<Look at the list of characters starting with |x| in
16371      font |g|; set |f| and |c| whenever
16372      a better character is found; |goto found| as soon as a
16373      large enough variant is encountered@>;
16374  until z<script_size;
16375  end
16376
16377@ @<Look at the list of characters starting with |x|...@>=
16378if is_ot_font(g) then begin
16379  b:=new_native_character(g, x);
16380  x:=get_native_glyph(b, 0);
16381  free_node(b, native_size(b));
16382  f:=g; c:=x; w:=0; n:=0;
16383  repeat
16384    y:=get_ot_math_variant(g, x, n, addressof(u), 0);
16385    if u>w then begin
16386      c:=y; w:=u;
16387      if u>=v then goto found;
16388    end;
16389    n:=n+1;
16390  until u<0;
16391  {if we get here, then we didn't find a big enough glyph; check if the char is extensible}
16392  ot_assembly_ptr:=get_ot_assembly_ptr(g, x, 0);
16393  if ot_assembly_ptr<>nil then goto found;
16394end else
16395begin y:=x;
16396if (qo(y)>=font_bc[g])and(qo(y)<=font_ec[g]) then
16397  begin continue: q:=char_info(g)(y);
16398  if char_exists(q) then
16399    begin if char_tag(q)=ext_tag then
16400      begin f:=g; c:=y; goto found;
16401      end;
16402    hd:=height_depth(q);
16403    u:=char_height(g)(hd)+char_depth(g)(hd);
16404    if u>w then
16405      begin f:=g; c:=y; w:=u;
16406      if u>=v then goto found;
16407      end;
16408    if char_tag(q)=list_tag then
16409      begin y:=rem_byte(q); goto continue;
16410      end;
16411    end;
16412  end;
16413end
16414
16415@ Here is a subroutine that creates a new box, whose list contains a
16416single character, and whose width includes the italic correction for
16417that character. The height or depth of the box will be negative, if
16418the height or depth of the character is negative; thus, this routine
16419may deliver a slightly different result than |hpack| would produce.
16420
16421@<Declare subprocedures for |var_delimiter|@>=
16422function char_box(@!f:internal_font_number;@!c:integer):pointer;
16423var q:four_quarters;
16424@!hd:eight_bits; {|height_depth| byte}
16425@!b,@!p:pointer; {the new box and its character node}
16426begin
16427if is_native_font(f) then begin
16428  b:=new_null_box;
16429  p:=new_native_character(f, c);
16430  list_ptr(b):=p;
16431  height(b):=height(p); width(b):=width(p);
16432  if depth(p)<0 then depth(b):=0 else depth(b):=depth(p);
16433  end
16434else begin
16435  q:=char_info(f)(c); hd:=height_depth(q);
16436  b:=new_null_box; width(b):=char_width(f)(q)+char_italic(f)(q);
16437  height(b):=char_height(f)(hd); depth(b):=char_depth(f)(hd);
16438  p:=get_avail; character(p):=c; font(p):=f;
16439  end;
16440list_ptr(b):=p; char_box:=b;
16441end;
16442
16443@ When the following code is executed, |char_tag(q)| will be equal to
16444|ext_tag| if and only if a built-up symbol is supposed to be returned.
16445
16446@<Make variable |b| point to a box for |(f,c)|@>=
16447if char_tag(q)=ext_tag then
16448  @<Construct an extensible character in a new box |b|,
16449    using recipe |rem_byte(q)| and font |f|@>
16450else b:=char_box(f,c)
16451
16452@ When we build an extensible character, it's handy to have the
16453following subroutine, which puts a given character on top
16454of the characters already in box |b|:
16455
16456@<Declare subprocedures for |var_delimiter|@>=
16457procedure stack_into_box(@!b:pointer;@!f:internal_font_number;
16458  @!c:quarterword);
16459var p:pointer; {new node placed into |b|}
16460begin p:=char_box(f,c); link(p):=list_ptr(b); list_ptr(b):=p;
16461height(b):=height(p);
16462end;
16463
16464@ Another handy subroutine computes the height plus depth of
16465a given character:
16466
16467@<Declare subprocedures for |var_delimiter|@>=
16468function height_plus_depth(@!f:internal_font_number;@!c:quarterword):scaled;
16469var q:four_quarters;
16470@!hd:eight_bits; {|height_depth| byte}
16471begin q:=char_info(f)(c); hd:=height_depth(q);
16472height_plus_depth:=char_height(f)(hd)+char_depth(f)(hd);
16473end;
16474
16475@ @<Construct an extensible...@>=
16476begin b:=new_null_box;
16477type(b):=vlist_node;
16478r:=font_info[exten_base[f]+rem_byte(q)].qqqq;@/
16479@<Compute the minimum suitable height, |w|, and the corresponding
16480  number of extension steps, |n|; also set |width(b)|@>;
16481c:=ext_bot(r);
16482if c<>min_quarterword then stack_into_box(b,f,c);
16483c:=ext_rep(r);
16484for m:=1 to n do stack_into_box(b,f,c);
16485c:=ext_mid(r);
16486if c<>min_quarterword then
16487  begin stack_into_box(b,f,c); c:=ext_rep(r);
16488  for m:=1 to n do stack_into_box(b,f,c);
16489  end;
16490c:=ext_top(r);
16491if c<>min_quarterword then stack_into_box(b,f,c);
16492depth(b):=w-height(b);
16493end
16494
16495@ The width of an extensible character is the width of the repeatable
16496module. If this module does not have positive height plus depth,
16497we don't use any copies of it, otherwise we use as few as possible
16498(in groups of two if there is a middle part).
16499
16500@<Compute the minimum suitable height, |w|, and...@>=
16501c:=ext_rep(r); u:=height_plus_depth(f,c);
16502w:=0; q:=char_info(f)(c); width(b):=char_width(f)(q)+char_italic(f)(q);@/
16503c:=ext_bot(r);@+if c<>min_quarterword then w:=w+height_plus_depth(f,c);
16504c:=ext_mid(r);@+if c<>min_quarterword then w:=w+height_plus_depth(f,c);
16505c:=ext_top(r);@+if c<>min_quarterword then w:=w+height_plus_depth(f,c);
16506n:=0;
16507if u>0 then while w<v do
16508  begin w:=w+u; incr(n);
16509  if ext_mid(r)<>min_quarterword then w:=w+u;
16510  end
16511
16512@ The next subroutine is much simpler; it is used for numerators and
16513denominators of fractions as well as for displayed operators and
16514their limits above and below. It takes a given box~|b| and
16515changes it so that the new box is centered in a box of width~|w|.
16516The centering is done by putting \.{\\hss} glue at the left and right
16517of the list inside |b|, then packaging the new box; thus, the
16518actual box might not really be centered, if it already contains
16519infinite glue.
16520
16521The given box might contain a single character whose italic correction
16522has been added to the width of the box; in this case a compensating
16523kern is inserted.
16524
16525@p function rebox(@!b:pointer;@!w:scaled):pointer;
16526var p:pointer; {temporary register for list manipulation}
16527@!f:internal_font_number; {font in a one-character box}
16528@!v:scaled; {width of a character without italic correction}
16529begin if (width(b)<>w)and(list_ptr(b)<>null) then
16530  begin if type(b)=vlist_node then b:=hpack(b,natural);
16531  p:=list_ptr(b);
16532  if (is_char_node(p))and(link(p)=null) then
16533    begin f:=font(p); v:=char_width(f)(char_info(f)(character(p)));
16534    if v<>width(b) then link(p):=new_kern(width(b)-v);
16535    end;
16536  free_node(b,box_node_size);
16537  b:=new_glue(ss_glue); link(b):=p;
16538  while link(p)<>null do p:=link(p);
16539  link(p):=new_glue(ss_glue);
16540  rebox:=hpack(b,w,exactly);
16541  end
16542else  begin width(b):=w; rebox:=b;
16543  end;
16544end;
16545
16546@ Here is a subroutine that creates a new glue specification from another
16547one that is expressed in `\.{mu}', given the value of the math unit.
16548
16549@d mu_mult(#)==nx_plus_y(n,#,xn_over_d(#,f,@'200000))
16550
16551@p function math_glue(@!g:pointer;@!m:scaled):pointer;
16552var p:pointer; {the new glue specification}
16553@!n:integer; {integer part of |m|}
16554@!f:scaled; {fraction part of |m|}
16555begin n:=x_over_n(m,@'200000); f:=remainder;@/
16556if f<0 then
16557  begin decr(n); f:=f+@'200000;
16558  end;
16559p:=get_node(glue_spec_size);
16560width(p):=mu_mult(width(g)); {convert \.{mu} to \.{pt}}
16561stretch_order(p):=stretch_order(g);
16562if stretch_order(p)=normal then stretch(p):=mu_mult(stretch(g))
16563else stretch(p):=stretch(g);
16564shrink_order(p):=shrink_order(g);
16565if shrink_order(p)=normal then shrink(p):=mu_mult(shrink(g))
16566else shrink(p):=shrink(g);
16567math_glue:=p;
16568end;
16569
16570@ The |math_kern| subroutine removes |mu_glue| from a kern node, given
16571the value of the math unit.
16572
16573@p procedure math_kern(@!p:pointer;@!m:scaled);
16574var @!n:integer; {integer part of |m|}
16575@!f:scaled; {fraction part of |m|}
16576begin if subtype(p)=mu_glue then
16577  begin n:=x_over_n(m,@'200000); f:=remainder;@/
16578  if f<0 then
16579    begin decr(n); f:=f+@'200000;
16580    end;
16581  width(p):=mu_mult(width(p)); subtype(p):=explicit;
16582  end;
16583end;
16584
16585@ Sometimes it is necessary to destroy an mlist. The following
16586subroutine empties the current list, assuming that |abs(mode)=mmode|.
16587
16588@p procedure flush_math;
16589begin flush_node_list(link(head)); flush_node_list(incompleat_noad);
16590link(head):=null; tail:=head; incompleat_noad:=null;
16591end;
16592
16593@* \[36] Typesetting math formulas.
16594\TeX's most important routine for dealing with formulas is called
16595|mlist_to_hlist|.  After a formula has been scanned and represented as an
16596mlist, this routine converts it to an hlist that can be placed into a box
16597or incorporated into the text of a paragraph. There are three implicit
16598parameters, passed in global variables: |cur_mlist| points to the first
16599node or noad in the given mlist (and it might be |null|); |cur_style| is a
16600style code; and |mlist_penalties| is |true| if penalty nodes for potential
16601line breaks are to be inserted into the resulting hlist. After
16602|mlist_to_hlist| has acted, |link(temp_head)| points to the translated hlist.
16603
16604Since mlists can be inside mlists, the procedure is recursive. And since this
16605is not part of \TeX's inner loop, the program has been written in a manner
16606that stresses compactness over efficiency.
16607@^recursion@>
16608
16609@<Glob...@>=
16610@!cur_mlist:pointer; {beginning of mlist to be translated}
16611@!cur_style:small_number; {style code at current place in the list}
16612@!cur_size:integer; {size code corresponding to |cur_style|}
16613@!cur_mu:scaled; {the math unit width corresponding to |cur_size|}
16614@!mlist_penalties:boolean; {should |mlist_to_hlist| insert penalties?}
16615
16616@ The recursion in |mlist_to_hlist| is due primarily to a subroutine
16617called |clean_box| that puts a given noad field into a box using a given
16618math style; |mlist_to_hlist| can call |clean_box|, which can call
16619|mlist_to_hlist|.
16620@^recursion@>
16621
16622The box returned by |clean_box| is ``clean'' in the
16623sense that its |shift_amount| is zero.
16624
16625@p procedure@?mlist_to_hlist; forward;@t\2@>@/
16626function clean_box(@!p:pointer;@!s:small_number):pointer;
16627label found;
16628var q:pointer; {beginning of a list to be boxed}
16629@!save_style:small_number; {|cur_style| to be restored}
16630@!x:pointer; {box to be returned}
16631@!r:pointer; {temporary pointer}
16632begin case math_type(p) of
16633math_char: begin cur_mlist:=new_noad; mem[nucleus(cur_mlist)]:=mem[p];
16634  end;
16635sub_box: begin q:=info(p); goto found;
16636  end;
16637sub_mlist: cur_mlist:=info(p);
16638othercases begin q:=new_null_box; goto found;
16639  end
16640endcases;@/
16641save_style:=cur_style; cur_style:=s; mlist_penalties:=false;@/
16642mlist_to_hlist; q:=link(temp_head); {recursive call}
16643cur_style:=save_style; {restore the style}
16644@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>;
16645found: if is_char_node(q)or(q=null) then x:=hpack(q,natural)
16646  else if (link(q)=null)and(type(q)<=vlist_node)and(shift_amount(q)=0) then
16647    x:=q {it's already clean}
16648  else x:=hpack(q,natural);
16649@<Simplify a trivial box@>;
16650clean_box:=x;
16651end;
16652
16653@ Here we save memory space in a common case.
16654
16655@<Simplify a trivial box@>=
16656q:=list_ptr(x);
16657if is_char_node(q) then
16658  begin r:=link(q);
16659  if r<>null then if link(r)=null then if not is_char_node(r) then
16660   if type(r)=kern_node then {unneeded italic correction}
16661    begin free_node(r,small_node_size); link(q):=null;
16662    end;
16663  end
16664
16665@ It is convenient to have a procedure that converts a |math_char|
16666field to an ``unpacked'' form. The |fetch| routine sets |cur_f|, |cur_c|,
16667and |cur_i| to the font code, character code, and character information bytes of
16668a given noad field. It also takes care of issuing error messages for
16669nonexistent characters; in such cases, |char_exists(cur_i)| will be |false|
16670after |fetch| has acted, and the field will also have been reset to |empty|.
16671
16672@p procedure fetch(@!a:pointer); {unpack the |math_char| field |a|}
16673begin cur_c:=cast_to_ushort(character(a)); cur_f:=fam_fnt(fam(a)+cur_size);
16674cur_c:=cur_c + (plane_and_fam_field(a) div @"100) * @"10000;
16675if cur_f=null_font then
16676  @<Complain about an undefined family and set |cur_i| null@>
16677else if is_native_font(cur_f) then begin
16678  cur_i:=null_character;
16679end else begin if (qo(cur_c)>=font_bc[cur_f])and(qo(cur_c)<=font_ec[cur_f]) then
16680    cur_i:=char_info(cur_f)(cur_c)
16681  else cur_i:=null_character;
16682  if not(char_exists(cur_i)) then
16683    begin char_warning(cur_f,qo(cur_c));
16684    math_type(a):=empty;
16685    end;
16686  end;
16687end;
16688
16689@ @<Complain about an undefined family...@>=
16690begin print_err(""); print_size(cur_size); print_char(" ");
16691print_int(fam(a)); print(" is undefined (character ");
16692print_ASCII(qo(cur_c)); print_char(")");
16693help4("Somewhere in the math formula just ended, you used the")@/
16694("stated character from an undefined font family. For example,")@/
16695("plain TeX doesn't allow \it or \sl in subscripts. Proceed,")@/
16696("and I'll try to forget that I needed that character.");
16697error; cur_i:=null_character; math_type(a):=empty;
16698end
16699
16700@ The outputs of |fetch| are placed in global variables.
16701
16702@<Glob...@>=
16703@!cur_f:internal_font_number; {the |font| field of a |math_char|}
16704@!cur_c:integer; {the |character| field of a |math_char|}
16705@!cur_i:four_quarters; {the |char_info| of a |math_char|,
16706  or a lig/kern instruction}
16707
16708@ We need to do a lot of different things, so |mlist_to_hlist| makes two
16709passes over the given mlist.
16710
16711The first pass does most of the processing: It removes ``mu'' spacing from
16712glue, it recursively evaluates all subsidiary mlists so that only the
16713top-level mlist remains to be handled, it puts fractions and square roots
16714and such things into boxes, it attaches subscripts and superscripts, and
16715it computes the overall height and depth of the top-level mlist so that
16716the size of delimiters for a |left_noad| and a |right_noad| will be known.
16717The hlist resulting from each noad is recorded in that noad's |new_hlist|
16718field, an integer field that replaces the |nucleus| or |thickness|.
16719@^recursion@>
16720
16721The second pass eliminates all noads and inserts the correct glue and
16722penalties between nodes.
16723
16724@d new_hlist(#)==mem[nucleus(#)].int {the translation of an mlist}
16725
16726@ Here is the overall plan of |mlist_to_hlist|, and the list of its
16727local variables.
16728
16729@d done_with_noad=80 {go here when a noad has been fully translated}
16730@d done_with_node=81 {go here when a node has been fully converted}
16731@d check_dimensions=82 {go here to update |max_h| and |max_d|}
16732@d delete_q=83 {go here to delete |q| and move to the next node}
16733
16734@p@t\4@>@<Declare math construction procedures@>
16735procedure mlist_to_hlist;
16736label reswitch, check_dimensions, done_with_noad, done_with_node, delete_q,
16737  done;
16738var mlist:pointer; {beginning of the given list}
16739@!penalties:boolean; {should penalty nodes be inserted?}
16740@!style:small_number; {the given style}
16741@!save_style:small_number; {holds |cur_style| during recursion}
16742@!q:pointer; {runs through the mlist}
16743@!r:pointer; {the most recent noad preceding |q|}
16744@!r_type:small_number; {the |type| of noad |r|, or |op_noad| if |r=null|}
16745@!t:small_number; {the effective |type| of noad |q| during the second pass}
16746@!p,@!x,@!y,@!z: pointer; {temporary registers for list construction}
16747@!pen:integer; {a penalty to be inserted}
16748@!s:small_number; {the size of a noad to be deleted}
16749@!max_h,@!max_d:scaled; {maximum height and depth of the list translated so far}
16750@!delta:scaled; {offset between subscript and superscript}
16751begin mlist:=cur_mlist; penalties:=mlist_penalties;
16752style:=cur_style; {tuck global parameters away as local variables}
16753q:=mlist; r:=null; r_type:=op_noad; max_h:=0; max_d:=0;
16754@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>;
16755while q<>null do @<Process node-or-noad |q| as much as possible in preparation
16756    for the second pass of |mlist_to_hlist|, then move to the next
16757    item in the mlist@>;
16758@<Convert \(a)a final |bin_noad| to an |ord_noad|@>;
16759@<Make a second pass over the mlist, removing all noads and inserting the
16760  proper spacing and penalties@>;
16761end;
16762
16763@ We use the fact that no character nodes appear in an mlist, hence
16764the field |type(q)| is always present.
16765
16766@<Process node-or-noad...@>=
16767begin @<Do first-pass processing based on |type(q)|; |goto done_with_noad|
16768  if a noad has been fully processed, |goto check_dimensions| if it
16769  has been translated into |new_hlist(q)|, or |goto done_with_node|
16770  if a node has been fully processed@>;
16771check_dimensions: z:=hpack(new_hlist(q),natural);
16772if height(z)>max_h then max_h:=height(z);
16773if depth(z)>max_d then max_d:=depth(z);
16774free_node(z,box_node_size);
16775done_with_noad: r:=q; r_type:=type(r);
16776if r_type=right_noad then
16777  begin r_type:=left_noad; cur_style:=style; @<Set up the values...@>;
16778  end;
16779done_with_node: q:=link(q);
16780end
16781
16782@ One of the things we must do on the first pass is change a |bin_noad| to
16783an |ord_noad| if the |bin_noad| is not in the context of a binary operator.
16784The values of |r| and |r_type| make this fairly easy.
16785
16786@<Do first-pass processing...@>=
16787reswitch: delta:=0;
16788case type(q) of
16789bin_noad: case r_type of
16790  bin_noad,op_noad,rel_noad,open_noad,punct_noad,left_noad:
16791    begin type(q):=ord_noad; goto reswitch;
16792    end;
16793  othercases do_nothing
16794  endcases;
16795rel_noad,close_noad,punct_noad,right_noad: begin@t@>@;@/
16796  @<Convert \(a)a final |bin_noad| to an |ord_noad|@>;
16797  if type(q)=right_noad then goto done_with_noad;
16798  end;
16799@t\4@>@<Cases for noads that can follow a |bin_noad|@>@;
16800@t\4@>@<Cases for nodes that can appear in an mlist, after which we
16801  |goto done_with_node|@>@;
16802othercases confusion("mlist1")
16803@:this can't happen mlist1}{\quad mlist1@>
16804endcases;@/
16805@<Convert \(n)|nucleus(q)| to an hlist and attach the sub/superscripts@>
16806
16807@ @<Convert \(a)a final |bin_noad| to an |ord_noad|@>=
16808if r_type=bin_noad then type(r):=ord_noad
16809
16810@ @<Cases for nodes that can appear in an mlist...@>=
16811style_node: begin cur_style:=subtype(q);
16812  @<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>;
16813  goto done_with_node;
16814  end;
16815choice_node: @<Change this node to a style node followed by the correct choice,
16816   then |goto done_with_node|@>;
16817ins_node,mark_node,adjust_node,
16818  whatsit_node,penalty_node,disc_node: goto done_with_node;
16819rule_node: begin if height(q)>max_h then max_h:=height(q);
16820  if depth(q)>max_d then max_d:=depth(q); goto done_with_node;
16821  end;
16822glue_node: begin @<Convert \(m)math glue to ordinary glue@>;
16823  goto done_with_node;
16824  end;
16825kern_node: begin math_kern(q,cur_mu); goto done_with_node;
16826  end;
16827
16828@ @d choose_mlist(#)==begin p:=#(q); #(q):=null;@+end
16829
16830@<Change this node to a style node...@>=
16831begin case cur_style div 2 of
168320: choose_mlist(display_mlist); {|display_style=0|}
168331: choose_mlist(text_mlist); {|text_style=2|}
168342: choose_mlist(script_mlist); {|script_style=4|}
168353: choose_mlist(script_script_mlist); {|script_script_style=6|}
16836end; {there are no other cases}
16837flush_node_list(display_mlist(q));
16838flush_node_list(text_mlist(q));
16839flush_node_list(script_mlist(q));
16840flush_node_list(script_script_mlist(q));@/
16841type(q):=style_node; subtype(q):=cur_style; width(q):=0; depth(q):=0;
16842if p<>null then
16843  begin z:=link(q); link(q):=p;
16844  while link(p)<>null do p:=link(p);
16845  link(p):=z;
16846  end;
16847goto done_with_node;
16848end
16849
16850@ Conditional math glue (`\.{\\nonscript}') results in a |glue_node|
16851pointing to |zero_glue|, with |subtype(q)=cond_math_glue|; in such a case
16852the node following will be eliminated if it is a glue or kern node and if the
16853current size is different from |text_size|. Unconditional math glue
16854(`\.{\\muskip}') is converted to normal glue by multiplying the dimensions
16855by |cur_mu|.
16856@!@:non_script_}{\.{\\nonscript} primitive@>
16857
16858@<Convert \(m)math glue to ordinary glue@>=
16859if subtype(q)=mu_glue then
16860  begin x:=glue_ptr(q);
16861  y:=math_glue(x,cur_mu); delete_glue_ref(x); glue_ptr(q):=y;
16862  subtype(q):=normal;
16863  end
16864else if (cur_size<>text_size)and(subtype(q)=cond_math_glue) then
16865  begin p:=link(q);
16866  if p<>null then if (type(p)=glue_node)or(type(p)=kern_node) then
16867    begin link(q):=link(p); link(p):=null; flush_node_list(p);
16868    end;
16869  end
16870
16871@ @<Cases for noads that can follow a |bin_noad|@>=
16872left_noad: goto done_with_noad;
16873fraction_noad: begin make_fraction(q); goto check_dimensions;
16874  end;
16875op_noad: begin delta:=make_op(q);
16876  if subtype(q)=limits then goto check_dimensions;
16877  end;
16878ord_noad: make_ord(q);
16879open_noad,inner_noad: do_nothing;
16880radical_noad: make_radical(q);
16881over_noad: make_over(q);
16882under_noad: make_under(q);
16883accent_noad: make_math_accent(q);
16884vcenter_noad: make_vcenter(q);
16885
16886@ Most of the actual construction work of |mlist_to_hlist| is done
16887by procedures with names
16888like |make_fraction|, |make_radical|, etc. To illustrate
16889the general setup of such procedures, let's begin with a couple of
16890simple ones.
16891
16892@<Declare math...@>=
16893procedure make_over(@!q:pointer);
16894begin info(nucleus(q)):=@|
16895  overbar(clean_box(nucleus(q),cramped_style(cur_style)),@|
16896  3*default_rule_thickness,default_rule_thickness);
16897math_type(nucleus(q)):=sub_box;
16898end;
16899
16900@ @<Declare math...@>=
16901procedure make_under(@!q:pointer);
16902var p,@!x,@!y: pointer; {temporary registers for box construction}
16903@!delta:scaled; {overall height plus depth}
16904begin x:=clean_box(nucleus(q),cur_style);
16905p:=new_kern(3*default_rule_thickness); link(x):=p;
16906link(p):=fraction_rule(default_rule_thickness);
16907y:=vpack(x,natural);
16908delta:=height(y)+depth(y)+default_rule_thickness;
16909height(y):=height(x); depth(y):=delta-height(y);
16910info(nucleus(q)):=y; math_type(nucleus(q)):=sub_box;
16911end;
16912
16913@ @<Declare math...@>=
16914procedure make_vcenter(@!q:pointer);
16915var v:pointer; {the box that should be centered vertically}
16916@!delta:scaled; {its height plus depth}
16917begin v:=info(nucleus(q));
16918if type(v)<>vlist_node then confusion("vcenter");
16919@:this can't happen vcenter}{\quad vcenter@>
16920delta:=height(v)+depth(v);
16921height(v):=axis_height(cur_size)+half(delta);
16922depth(v):=delta-height(v);
16923end;
16924
16925@ According to the rules in the \.{DVI} file specifications, we ensure alignment
16926@^square roots@>
16927between a square root sign and the rule above its nucleus by assuming that the
16928baseline of the square-root symbol is the same as the bottom of the rule. The
16929height of the square-root symbol will be the thickness of the rule, and the
16930depth of the square-root symbol should exceed or equal the height-plus-depth
16931of the nucleus plus a certain minimum clearance~|clr|. The symbol will be
16932placed so that the actual clearance is |clr| plus half the excess.
16933
16934@<Declare math...@>=
16935procedure make_radical(@!q:pointer);
16936var x,@!y:pointer; {temporary registers for box construction}
16937f:internal_font_number;
16938rule_thickness:scaled; {rule thickness}
16939@!delta,@!clr:scaled; {dimensions involved in the calculation}
16940begin f:=fam_fnt(small_fam(left_delimiter(q)) + cur_size);
16941if is_new_mathfont(f) then rule_thickness:=get_ot_math_constant(f,radicalRuleThickness)
16942else rule_thickness:=default_rule_thickness;
16943x:=clean_box(nucleus(q),cramped_style(cur_style));
16944if is_new_mathfont(f) then begin
16945  if cur_style<text_style then {display style}
16946    clr:=get_ot_math_constant(f,radicalDisplayStyleVerticalGap)
16947  else clr:=get_ot_math_constant(f,radicalVerticalGap);
16948end else begin
16949  if cur_style<text_style then {display style}
16950    clr:=rule_thickness+(abs(math_x_height(cur_size)) div 4)
16951  else  begin clr:=rule_thickness; clr:=clr + (abs(clr) div 4);
16952    end;
16953end;
16954y:=var_delimiter(left_delimiter(q),cur_size,height(x)+depth(x)+clr+rule_thickness);
16955if is_new_mathfont(f) then begin
16956  depth(y):=height(y)+depth(y)-rule_thickness;
16957  height(y):=rule_thickness;
16958end;
16959delta:=depth(y)-(height(x)+depth(x)+clr);
16960if delta>0 then clr:=clr+half(delta); {increase the actual clearance}
16961shift_amount(y):=-(height(x)+clr);
16962link(y):=overbar(x,clr,height(y));
16963info(nucleus(q)):=hpack(y,natural); math_type(nucleus(q)):=sub_box;
16964end;
16965
16966@ Slants are not considered when placing accents in math mode. The accenter is
16967centered over the accentee, and the accent width is treated as zero with
16968respect to the size of the final box.
16969
16970@<Declare math...@>=
16971function compute_ot_math_accent_pos(@!p:pointer):scaled;
16972var
16973  @!q,@!r:pointer;
16974  @!s,@!g:scaled;
16975begin
16976  if (math_type(nucleus(p))=math_char) then begin
16977    fetch(nucleus(p));
16978    q:=new_native_character(cur_f, qo(cur_c));
16979    g:=get_native_glyph(q, 0);
16980    s:=get_ot_math_accent_pos(cur_f, g);
16981  end else begin
16982    if (math_type(nucleus(p))=sub_mlist) then begin
16983      r:=info(nucleus(p));
16984      if (r<>null) and (type(r)=accent_noad) then
16985        s:=compute_ot_math_accent_pos(r)
16986      else
16987        s:=@"7FFFFFFF;
16988    end else
16989      s:=@"7FFFFFFF;
16990  end;
16991  compute_ot_math_accent_pos:=s;
16992end;
16993
16994procedure make_math_accent(@!q:pointer);
16995label done,done1;
16996var p,@!x,@!y:pointer; {temporary registers for box construction}
16997@!a:integer; {address of lig/kern instruction}
16998@!c,@!g:integer; {accent character}
16999@!f:internal_font_number; {its font}
17000@!i:four_quarters; {its |char_info|}
17001@!s,@!sa:scaled; {amount to skew the accent to the right}
17002@!h:scaled; {height of character being accented}
17003@!delta:scaled; {space to remove between accent and accentee}
17004@!w,@!w2:scaled; {width of the accentee, not including sub/superscripts}
17005@!ot_assembly_ptr:void_pointer;
17006begin fetch(accent_chr(q));
17007x:=null;
17008if is_native_font(cur_f) then
17009  begin c:=cur_c; f:=cur_f;
17010  if not is_bottom_acc(q) then s:=compute_ot_math_accent_pos(q) else s:=0;
17011  x:=clean_box(nucleus(q),cramped_style(cur_style)); w:=width(x); h:=height(x);
17012  end
17013else if char_exists(cur_i) then
17014  begin i:=cur_i; c:=cur_c; f:=cur_f;@/
17015  @<Compute the amount of skew@>;
17016  x:=clean_box(nucleus(q),cramped_style(cur_style)); w:=width(x); h:=height(x);
17017  @<Switch to a larger accent if available and appropriate@>;
17018  end;
17019if x<>null then begin
17020  if is_new_mathfont(f) then
17021    if is_bottom_acc(q) then delta:=0
17022    else if h<get_ot_math_constant(f, accentBaseHeight) then delta:=h@+else delta:=get_ot_math_constant(f, accentBaseHeight)
17023  else
17024    if h<x_height(f) then delta:=h@+else delta:=x_height(f);
17025  if (math_type(supscr(q))<>empty)or(math_type(subscr(q))<>empty) then
17026    if math_type(nucleus(q))=math_char then
17027      @<Swap the subscript and superscript into box |x|@>;
17028  y:=char_box(f,c);
17029  if is_native_font(f) then begin
17030    {turn the |native_word| node into a |native_glyph| one}
17031    p:=get_node(glyph_node_size);
17032    type(p):=whatsit_node; subtype(p):=glyph_node;
17033    native_font(p):=f; native_glyph(p):=get_native_glyph(list_ptr(y), 0);
17034    set_native_glyph_metrics(p, 1);
17035    free_node(list_ptr(y), native_size(list_ptr(y)));
17036    list_ptr(y):=p;
17037
17038    @<Switch to a larger native-font accent if available and appropriate@>;
17039
17040    {determine horiz positioning}
17041    if is_glyph_node(p) then
17042      begin sa:=get_ot_math_accent_pos(f,native_glyph(p));
17043      if sa=@"7FFFFFFF then sa:=half(width(y));
17044      end
17045    else sa:=half(width(y));
17046    if is_bottom_acc(q) or (s=@"7FFFFFFF) then s:=half(w);
17047    shift_amount(y):=s-sa;
17048  end else
17049    shift_amount(y):=s+half(w-width(y));
17050  width(y):=0;
17051  if is_bottom_acc(q) then begin
17052    link(x):=y; y:=vpack(x,natural); shift_amount(y):=-(h - height(y));
17053  end else begin
17054    p:=new_kern(-delta); link(p):=x; link(y):=p; y:=vpack(y,natural);
17055    if height(y)<h then @<Make the height of box |y| equal to |h|@>;
17056  end;
17057  width(y):=width(x);
17058  info(nucleus(q)):=y;
17059  math_type(nucleus(q)):=sub_box;
17060  end;
17061end;
17062
17063@ @<Make the height of box |y|...@>=
17064begin p:=new_kern(h-height(y)); link(p):=list_ptr(y); list_ptr(y):=p;
17065height(y):=h;
17066end
17067
17068@ @<Switch to a larger native-font accent if available and appropriate@>=
17069  if odd(subtype(q)) then {non growing accent}
17070    set_native_glyph_metrics(p, 1)
17071  else begin
17072    c:=native_glyph(p);
17073    a:=0;
17074    repeat
17075      g:=get_ot_math_variant(f, c, a, addressof(w2), 1);
17076      if (w2>0) and (w2<=w) then begin
17077        native_glyph(p):=g;
17078        set_native_glyph_metrics(p, 1);
17079        incr(a);
17080      end;
17081    until (w2<0) or (w2>=w);
17082    if (w2<0) then begin
17083      ot_assembly_ptr:=get_ot_assembly_ptr(f, c, 1);
17084      if ot_assembly_ptr<>nil then begin
17085        free_node(p,glyph_node_size);
17086        p:=build_opentype_assembly(f, ot_assembly_ptr, w, 1);
17087        list_ptr(y):=p;
17088        goto found;
17089      end;
17090    end else
17091      set_native_glyph_metrics(p, 1);
17092  end;
17093found:
17094  width(y):=width(p); height(y):=height(p); depth(y):=depth(p);
17095  if is_bottom_acc(q) then begin
17096    if height(y)<0 then height(y):=0
17097  end else if depth(y)<0 then depth(y):=0;
17098
17099@ @<Switch to a larger accent if available and appropriate@>=
17100loop@+  begin if char_tag(i)<>list_tag then goto done;
17101  y:=rem_byte(i);
17102  i:=char_info(f)(y);
17103  if not char_exists(i) then goto done;
17104  if char_width(f)(i)>w then goto done;
17105  c:=y;
17106  end;
17107done:
17108
17109@ @<Compute the amount of skew@>=
17110s:=0;
17111if math_type(nucleus(q))=math_char then
17112  begin fetch(nucleus(q));
17113  if char_tag(cur_i)=lig_tag then
17114    begin a:=lig_kern_start(cur_f)(cur_i);
17115    cur_i:=font_info[a].qqqq;
17116    if skip_byte(cur_i)>stop_flag then
17117      begin a:=lig_kern_restart(cur_f)(cur_i);
17118      cur_i:=font_info[a].qqqq;
17119      end;
17120    loop@+ begin if qo(next_char(cur_i))=skew_char[cur_f] then
17121        begin if op_byte(cur_i)>=kern_flag then
17122          if skip_byte(cur_i)<=stop_flag then s:=char_kern(cur_f)(cur_i);
17123        goto done1;
17124        end;
17125      if skip_byte(cur_i)>=stop_flag then goto done1;
17126      a:=a+qo(skip_byte(cur_i))+1;
17127      cur_i:=font_info[a].qqqq;
17128      end;
17129    end;
17130  end;
17131done1:
17132
17133@ @<Swap the subscript and superscript into box |x|@>=
17134begin flush_node_list(x); x:=new_noad;
17135mem[nucleus(x)]:=mem[nucleus(q)];
17136mem[supscr(x)]:=mem[supscr(q)];
17137mem[subscr(x)]:=mem[subscr(q)];@/
17138mem[supscr(q)].hh:=empty_field;
17139mem[subscr(q)].hh:=empty_field;@/
17140math_type(nucleus(q)):=sub_mlist; info(nucleus(q)):=x;
17141x:=clean_box(nucleus(q),cur_style); delta:=delta+height(x)-h; h:=height(x);
17142end
17143
17144@ The |make_fraction| procedure is a bit different because it sets
17145|new_hlist(q)| directly rather than making a sub-box.
17146
17147@<Declare math...@>=
17148procedure make_fraction(@!q:pointer);
17149var p,@!v,@!x,@!y,@!z:pointer; {temporary registers for box construction}
17150@!delta,@!delta1,@!delta2,@!shift_up,@!shift_down,@!clr:scaled;
17151  {dimensions for box calculations}
17152begin if thickness(q)=default_code then thickness(q):=default_rule_thickness;
17153@<Create equal-width boxes |x| and |z| for the numerator and denominator,
17154  and compute the default amounts |shift_up| and |shift_down| by which they
17155  are displaced from the baseline@>;
17156if thickness(q)=0 then @<Adjust \(s)|shift_up| and |shift_down| for the case
17157  of no fraction line@>
17158else @<Adjust \(s)|shift_up| and |shift_down| for the case of a fraction line@>;
17159@<Construct a vlist box for the fraction, according to |shift_up| and
17160  |shift_down|@>;
17161@<Put the \(f)fraction into a box with its delimiters, and make |new_hlist(q)|
17162  point to it@>;
17163end;
17164
17165@ @<Create equal-width boxes |x| and |z| for the numerator and denom...@>=
17166x:=clean_box(numerator(q),num_style(cur_style));
17167z:=clean_box(denominator(q),denom_style(cur_style));
17168if width(x)<width(z) then x:=rebox(x,width(z))
17169else z:=rebox(z,width(x));
17170if cur_style<text_style then {display style}
17171  begin shift_up:=num1(cur_size); shift_down:=denom1(cur_size);
17172  end
17173else  begin shift_down:=denom2(cur_size);
17174  if thickness(q)<>0 then shift_up:=num2(cur_size)
17175  else shift_up:=num3(cur_size);
17176  end
17177
17178@ The numerator and denominator must be separated by a certain minimum
17179clearance, called |clr| in the following program. The difference between
17180|clr| and the actual clearance is |2delta|.
17181
17182@<Adjust \(s)|shift_up| and |shift_down| for the case of no fraction line@>=
17183begin if is_new_mathfont(cur_f) then begin
17184  if cur_style<text_style then clr:=get_ot_math_constant(cur_f, stackDisplayStyleGapMin)
17185  else clr:=get_ot_math_constant(cur_f, stackGapMin);
17186end else begin
17187  if cur_style<text_style then clr:=7*default_rule_thickness
17188  else clr:=3*default_rule_thickness;
17189end;
17190delta:=half(clr-((shift_up-depth(x))-(height(z)-shift_down)));
17191if delta>0 then
17192  begin shift_up:=shift_up+delta;
17193  shift_down:=shift_down+delta;
17194  end;
17195end
17196
17197@ In the case of a fraction line, the minimum clearance depends on the actual
17198thickness of the line.
17199
17200@<Adjust \(s)|shift_up| and |shift_down| for the case of a fraction line@>=
17201begin if is_new_mathfont(cur_f) then begin
17202  delta:=half(thickness(q));
17203  if cur_style<text_style then clr:=get_ot_math_constant(cur_f, fractionNumDisplayStyleGapMin)
17204  else clr:=get_ot_math_constant(cur_f, fractionNumeratorGapMin);
17205  delta1:=clr-((shift_up-depth(x))-(axis_height(cur_size)+delta));
17206  if cur_style<text_style then clr:=get_ot_math_constant(cur_f, fractionDenomDisplayStyleGapMin)
17207  else clr:=get_ot_math_constant(cur_f, fractionDenominatorGapMin);
17208  delta2:=clr-((axis_height(cur_size)-delta)-(height(z)-shift_down));
17209end else begin
17210  if cur_style<text_style then clr:=3*thickness(q)
17211  else clr:=thickness(q);
17212  delta:=half(thickness(q));
17213  delta1:=clr-((shift_up-depth(x))-(axis_height(cur_size)+delta));
17214  delta2:=clr-((axis_height(cur_size)-delta)-(height(z)-shift_down));
17215end;
17216if delta1>0 then shift_up:=shift_up+delta1;
17217if delta2>0 then shift_down:=shift_down+delta2;
17218end
17219
17220@ @<Construct a vlist box for the fraction...@>=
17221v:=new_null_box; type(v):=vlist_node;
17222height(v):=shift_up+height(x); depth(v):=depth(z)+shift_down;
17223width(v):=width(x); {this also equals |width(z)|}
17224if thickness(q)=0 then
17225  begin p:=new_kern((shift_up-depth(x))-(height(z)-shift_down));
17226  link(p):=z;
17227  end
17228else  begin y:=fraction_rule(thickness(q));@/
17229  p:=new_kern((axis_height(cur_size)-delta)-@|(height(z)-shift_down));@/
17230  link(y):=p; link(p):=z;@/
17231  p:=new_kern((shift_up-depth(x))-(axis_height(cur_size)+delta));
17232  link(p):=y;
17233  end;
17234link(x):=p; list_ptr(v):=x
17235
17236@ @<Put the \(f)fraction into a box with its delimiters...@>=
17237if cur_style<text_style then delta:=delim1(cur_size)
17238else delta:=delim2(cur_size);
17239x:=var_delimiter(left_delimiter(q), cur_size, delta); link(x):=v;@/
17240z:=var_delimiter(right_delimiter(q), cur_size, delta); link(v):=z;@/
17241new_hlist(q):=hpack(x,natural)
17242
17243@ If the nucleus of an |op_noad| is a single character, it is to be
17244centered vertically with respect to the axis, after first being enlarged
17245(via a character list in the font) if we are in display style.  The normal
17246convention for placing displayed limits is to put them above and below the
17247operator in display style.
17248
17249The italic correction is removed from the character if there is a subscript
17250and the limits are not being displayed. The |make_op|
17251routine returns the value that should be used as an offset between
17252subscript and superscript.
17253
17254After |make_op| has acted, |subtype(q)| will be |limits| if and only if
17255the limits have been set above and below the operator. In that case,
17256|new_hlist(q)| will already contain the desired final box.
17257
17258@<Declare math...@>=
17259function make_op(@!q:pointer):scaled;
17260label found;
17261var delta:scaled; {offset between subscript and superscript}
17262@!p,@!v,@!x,@!y,@!z:pointer; {temporary registers for box construction}
17263@!c:quarterword;@+@!i:four_quarters; {registers for character examination}
17264@!shift_up,@!shift_down:scaled; {dimensions for box calculation}
17265@!h1,@!h2:scaled; {height of original text-style symbol and possible replacement}
17266@!n,@!g:integer; {potential variant index and glyph code}
17267@!ot_assembly_ptr:void_pointer;
17268@!save_f:internal_font_number;
17269begin if (subtype(q)=normal)and(cur_style<text_style) then
17270  subtype(q):=limits;
17271delta:=0;
17272if math_type(nucleus(q))=math_char then
17273  begin fetch(nucleus(q));
17274  if not is_ot_font(cur_f) then begin
17275    if (cur_style<text_style)and(char_tag(cur_i)=list_tag) then {make it larger}
17276      begin c:=rem_byte(cur_i); i:=char_info(cur_f)(c);
17277      if char_exists(i) then
17278        begin cur_c:=c; cur_i:=i; character(nucleus(q)):=c;
17279        end;
17280      end;
17281    delta:=char_italic(cur_f)(cur_i);
17282  end;
17283  x:=clean_box(nucleus(q),cur_style);
17284  if is_new_mathfont(cur_f) then begin
17285    p:=list_ptr(x);
17286    if is_glyph_node(p) then begin
17287      if cur_style<text_style then begin
17288        {try to replace the operator glyph with a display-size variant,
17289         ensuring it is larger than the text size}
17290        h1:=get_ot_math_constant(cur_f,displayOperatorMinHeight);
17291        if h1<(height(p)+depth(p))*5/4 then h1:=(height(p)+depth(p))*5/4;
17292        c:=native_glyph(p);
17293        n:=0;
17294        repeat
17295          g:=get_ot_math_variant(cur_f, c, n, addressof(h2), 0);
17296          if h2>0 then begin
17297            native_glyph(p):=g;
17298            set_native_glyph_metrics(p, 1);
17299          end;
17300          incr(n);
17301        until (h2<0) or (h2>=h1);
17302        if (h2<0) then begin
17303          {if we get here, then we didn't find a big enough glyph; check if the char is extensible}
17304          ot_assembly_ptr:=get_ot_assembly_ptr(cur_f, c, 0);
17305          if ot_assembly_ptr<>nil then begin
17306            free_node(p,glyph_node_size);
17307            p:=build_opentype_assembly(cur_f, ot_assembly_ptr, h1, 0);
17308            list_ptr(x):=p;
17309            delta:=0;
17310            goto found;
17311          end;
17312        end else
17313          set_native_glyph_metrics(p, 1);
17314      end;
17315      delta:=get_ot_math_ital_corr(cur_f, native_glyph(p));
17316found:
17317      width(x):=width(p); height(x):=height(p); depth(x):=depth(p);
17318    end
17319  end;
17320  if (math_type(subscr(q))<>empty)and(subtype(q)<>limits) then
17321    width(x):=width(x)-delta; {remove italic correction}
17322  shift_amount(x):=half(height(x)-depth(x)) - axis_height(cur_size);
17323    {center vertically}
17324  math_type(nucleus(q)):=sub_box; info(nucleus(q)):=x;
17325  end;
17326save_f:=cur_f;
17327if subtype(q)=limits then
17328  @<Construct a box with limits above and below it, skewed by |delta|@>;
17329make_op:=delta;
17330end;
17331
17332@ The following program builds a vlist box |v| for displayed limits. The
17333width of the box is not affected by the fact that the limits may be skewed.
17334
17335@<Construct a box with limits above and below it...@>=
17336begin x:=clean_box(supscr(q),sup_style(cur_style));
17337y:=clean_box(nucleus(q),cur_style);
17338z:=clean_box(subscr(q),sub_style(cur_style));
17339v:=new_null_box; type(v):=vlist_node; width(v):=width(y);
17340if width(x)>width(v) then width(v):=width(x);
17341if width(z)>width(v) then width(v):=width(z);
17342x:=rebox(x,width(v)); y:=rebox(y,width(v)); z:=rebox(z,width(v));@/
17343shift_amount(x):=half(delta); shift_amount(z):=-shift_amount(x);
17344height(v):=height(y); depth(v):=depth(y);
17345@<Attach the limits to |y| and adjust |height(v)|, |depth(v)| to
17346  account for their presence@>;
17347new_hlist(q):=v;
17348end
17349
17350@ We use |shift_up| and |shift_down| in the following program for the
17351amount of glue between the displayed operator |y| and its limits |x| and
17352|z|. The vlist inside box |v| will consist of |x| followed by |y| followed
17353by |z|, with kern nodes for the spaces between and around them.
17354
17355@<Attach the limits to |y| and adjust |height(v)|, |depth(v)|...@>=
17356cur_f:=save_f;
17357if math_type(supscr(q))=empty then
17358  begin free_node(x,box_node_size); list_ptr(v):=y;
17359  end
17360else  begin shift_up:=big_op_spacing3-depth(x);
17361  if shift_up<big_op_spacing1 then shift_up:=big_op_spacing1;
17362  p:=new_kern(shift_up); link(p):=y; link(x):=p;@/
17363  p:=new_kern(big_op_spacing5); link(p):=x; list_ptr(v):=p;
17364  height(v):=height(v)+big_op_spacing5+height(x)+depth(x)+shift_up;
17365  end;
17366if math_type(subscr(q))=empty then free_node(z,box_node_size)
17367else  begin shift_down:=big_op_spacing4-height(z);
17368  if shift_down<big_op_spacing2 then shift_down:=big_op_spacing2;
17369  p:=new_kern(shift_down); link(y):=p; link(p):=z;@/
17370  p:=new_kern(big_op_spacing5); link(z):=p;
17371  depth(v):=depth(v)+big_op_spacing5+height(z)+depth(z)+shift_down;
17372  end
17373
17374@ A ligature found in a math formula does not create a |ligature_node|, because
17375there is no question of hyphenation afterwards; the ligature will simply be
17376stored in an ordinary |char_node|, after residing in an |ord_noad|.
17377
17378The |math_type| is converted to |math_text_char| here if we would not want to
17379apply an italic correction to the current character unless it belongs
17380to a math font (i.e., a font with |space=0|).
17381
17382No boundary characters enter into these ligatures.
17383
17384@<Declare math...@>=
17385procedure make_ord(@!q:pointer);
17386label restart,exit;
17387var a:integer; {address of lig/kern instruction}
17388@!p,@!r:pointer; {temporary registers for list manipulation}
17389begin restart:@t@>@;@/
17390if math_type(subscr(q))=empty then if math_type(supscr(q))=empty then
17391 if math_type(nucleus(q))=math_char then
17392  begin p:=link(q);
17393  if p<>null then if (type(p)>=ord_noad)and(type(p)<=punct_noad) then
17394    if math_type(nucleus(p))=math_char then
17395    if fam(nucleus(p))=fam(nucleus(q)) then
17396      begin math_type(nucleus(q)):=math_text_char;
17397      fetch(nucleus(q));
17398      if char_tag(cur_i)=lig_tag then
17399        begin a:=lig_kern_start(cur_f)(cur_i);
17400        cur_c:=character(nucleus(p));
17401        cur_i:=font_info[a].qqqq;
17402        if skip_byte(cur_i)>stop_flag then
17403          begin a:=lig_kern_restart(cur_f)(cur_i);
17404          cur_i:=font_info[a].qqqq;
17405          end;
17406        loop@+ begin @<If instruction |cur_i| is a kern with |cur_c|, attach
17407            the kern after~|q|; or if it is a ligature with |cur_c|, combine
17408            noads |q| and~|p| appropriately; then |return| if the cursor has
17409            moved past a noad, or |goto restart|@>;
17410          if skip_byte(cur_i)>=stop_flag then return;
17411          a:=a+qo(skip_byte(cur_i))+1;
17412          cur_i:=font_info[a].qqqq;
17413          end;
17414        end;
17415      end;
17416  end;
17417exit:end;
17418
17419@ Note that a ligature between an |ord_noad| and another kind of noad
17420is replaced by an |ord_noad|, when the two noads collapse into one.
17421But we could make a parenthesis (say) change shape when it follows
17422certain letters. Presumably a font designer will define such
17423ligatures only when this convention makes sense.
17424
17425\chardef\?='174 % vertical line to indicate character retention
17426
17427@<If instruction |cur_i| is a kern with |cur_c|, ...@>=
17428if next_char(cur_i)=cur_c then if skip_byte(cur_i)<=stop_flag then
17429  if op_byte(cur_i)>=kern_flag then
17430    begin p:=new_kern(char_kern(cur_f)(cur_i));
17431    link(p):=link(q); link(q):=p; return;
17432    end
17433  else  begin check_interrupt; {allow a way out of infinite ligature loop}
17434    case op_byte(cur_i) of
17435  qi(1),qi(5): character(nucleus(q)):=rem_byte(cur_i); {\.{=:\?}, \.{=:\?>}}
17436  qi(2),qi(6): character(nucleus(p)):=rem_byte(cur_i); {\.{\?=:}, \.{\?=:>}}
17437  qi(3),qi(7),qi(11):begin r:=new_noad; {\.{\?=:\?}, \.{\?=:\?>}, \.{\?=:\?>>}}
17438      character(nucleus(r)):=rem_byte(cur_i);
17439      plane_and_fam_field(nucleus(r)):=fam(nucleus(q));@/
17440      link(q):=r; link(r):=p;
17441      if op_byte(cur_i)<qi(11) then math_type(nucleus(r)):=math_char
17442      else math_type(nucleus(r)):=math_text_char; {prevent combination}
17443      end;
17444    othercases begin link(q):=link(p);
17445      character(nucleus(q)):=rem_byte(cur_i); {\.{=:}}
17446      mem[subscr(q)]:=mem[subscr(p)]; mem[supscr(q)]:=mem[supscr(p)];@/
17447      free_node(p,noad_size);
17448      end
17449    endcases;
17450    if op_byte(cur_i)>qi(3) then return;
17451    math_type(nucleus(q)):=math_char; goto restart;
17452    end
17453
17454@ When we get to the following part of the program, we have ``fallen through''
17455from cases that did not lead to |check_dimensions| or |done_with_noad| or
17456|done_with_node|. Thus, |q|~points to a noad whose nucleus may need to be
17457converted to an hlist, and whose subscripts and superscripts need to be
17458appended if they are present.
17459
17460If |nucleus(q)| is not a |math_char|, the variable |delta| is the amount
17461by which a superscript should be moved right with respect to a subscript
17462when both are present.
17463@^subscripts@>
17464@^superscripts@>
17465
17466@<Convert \(n)|nucleus(q)| to an hlist and attach the sub/superscripts@>=
17467case math_type(nucleus(q)) of
17468math_char, math_text_char:
17469  @<Create a character node |p| for |nucleus(q)|, possibly followed
17470  by a kern node for the italic correction, and set |delta| to the
17471  italic correction if a subscript is present@>;
17472empty: p:=null;
17473sub_box: p:=info(nucleus(q));
17474sub_mlist: begin cur_mlist:=info(nucleus(q)); save_style:=cur_style;
17475  mlist_penalties:=false; mlist_to_hlist; {recursive call}
17476@^recursion@>
17477  cur_style:=save_style; @<Set up the values...@>;
17478  p:=hpack(link(temp_head),natural);
17479  end;
17480othercases confusion("mlist2")
17481@:this can't happen mlist2}{\quad mlist2@>
17482endcases;@/
17483new_hlist(q):=p;
17484if (math_type(subscr(q))=empty)and(math_type(supscr(q))=empty) then
17485  goto check_dimensions;
17486make_scripts(q,delta)
17487
17488@ @<Create a character node |p| for |nucleus(q)|...@>=
17489begin fetch(nucleus(q));
17490if is_native_font(cur_f) then begin
17491  z:=new_native_character(cur_f, qo(cur_c));
17492  p:=get_node(glyph_node_size);
17493  type(p):=whatsit_node; subtype(p):=glyph_node;
17494  native_font(p):=cur_f; native_glyph(p):=get_native_glyph(z, 0);
17495  set_native_glyph_metrics(p, 1);
17496  free_node(z, native_size(z));
17497  delta:=get_ot_math_ital_corr(cur_f, native_glyph(p));
17498  if (math_type(nucleus(q))=math_text_char)and(not is_new_mathfont(cur_f)<>0) then
17499    delta:=0; {no italic correction in mid-word of text font}
17500  if (math_type(subscr(q))=empty)and(delta<>0) then
17501    begin link(p):=new_kern(delta); delta:=0;
17502    end;
17503end else if char_exists(cur_i) then
17504  begin delta:=char_italic(cur_f)(cur_i); p:=new_character(cur_f,qo(cur_c));
17505  if (math_type(nucleus(q))=math_text_char)and(space(cur_f)<>0) then
17506    delta:=0; {no italic correction in mid-word of text font}
17507  if (math_type(subscr(q))=empty)and(delta<>0) then
17508    begin link(p):=new_kern(delta); delta:=0;
17509    end;
17510  end
17511else p:=null;
17512end
17513
17514@ The purpose of |make_scripts(q,delta)| is to attach the subscript and/or
17515superscript of noad |q| to the list that starts at |new_hlist(q)|,
17516given that the subscript and superscript aren't both empty. The superscript
17517will appear to the right of the subscript by a given distance |delta|.
17518
17519We set |shift_down| and |shift_up| to the minimum amounts to shift the
17520baseline of subscripts and superscripts based on the given nucleus.
17521
17522@<Declare math...@>=
17523function attach_hkern_to_new_hlist(@!q:pointer;@!delta:scaled):pointer;
17524var y,@!z:pointer; {temporary registers for box construction}
17525begin z:=new_kern(delta);
17526if new_hlist(q)=null then new_hlist(q):=z
17527else begin y:=new_hlist(q);
17528  while link(y)<>null do y:=link(y);
17529  link(y):=z;
17530  end;
17531attach_hkern_to_new_hlist:=new_hlist(q);
17532end;
17533
17534procedure make_scripts(@!q:pointer;@!delta:scaled);
17535var p,@!x,@!y,@!z:pointer; {temporary registers for box construction}
17536@!shift_up,@!shift_down,@!clr,@!sub_kern,@!sup_kern:scaled; {dimensions in the calculation}
17537@!script_c:pointer; {temprary native character for sub/superscript}
17538@!script_g:quarterword; {temporary register for sub/superscript native glyph id}
17539@!script_f:internal_font_number; {temporary register for sub/superscript font}
17540@!t:integer; {subsidiary size code}
17541@!save_f:internal_font_number;
17542begin p:=new_hlist(q);
17543script_c:=null; script_g:=0; script_f:=0; sup_kern:=0; sub_kern:=0;
17544if is_char_node(p) or is_glyph_node(p) then
17545  begin shift_up:=0; shift_down:=0;
17546  end
17547else  begin z:=hpack(p,natural);
17548  if cur_style<script_style then t:=script_size@+else t:=script_script_size;
17549  shift_up:=height(z)-sup_drop(t);
17550  shift_down:=depth(z)+sub_drop(t);
17551  free_node(z,box_node_size);
17552  end;
17553if math_type(supscr(q))=empty then
17554  @<Construct a subscript box |x| when there is no superscript@>
17555else  begin @<Construct a superscript box |x|@>;
17556  if math_type(subscr(q))=empty then shift_amount(x):=-shift_up
17557  else @<Construct a sub/superscript combination box |x|, with the
17558    superscript offset by |delta|@>;
17559  end;
17560if new_hlist(q)=null then new_hlist(q):=x
17561else  begin p:=new_hlist(q);
17562  while link(p)<>null do p:=link(p);
17563  link(p):=x;
17564  end;
17565end;
17566
17567@ When there is a subscript without a superscript, the top of the subscript
17568should not exceed the baseline plus four-fifths of the x-height.
17569
17570@<Construct a subscript box |x| when there is no superscript@>=
17571begin
17572save_f:=cur_f;
17573x:=clean_box(subscr(q),sub_style(cur_style));
17574cur_f:=save_f;
17575width(x):=width(x)+script_space;
17576if shift_down<sub1(cur_size) then shift_down:=sub1(cur_size);
17577if is_new_mathfont(cur_f) then
17578  clr:=height(x)-get_ot_math_constant(cur_f, subscriptTopMax)
17579else
17580  clr:=height(x)-(abs(math_x_height(cur_size)*4) div 5);
17581if shift_down<clr then shift_down:=clr;
17582shift_amount(x):=shift_down;
17583if is_new_mathfont(cur_f) then @<Attach subscript OpenType math kerning@>
17584end
17585
17586@ The bottom of a superscript should never descend below the baseline plus
17587one-fourth of the x-height.
17588
17589@<Construct a superscript box |x|@>=
17590begin
17591save_f:=cur_f;
17592x:=clean_box(supscr(q),sup_style(cur_style));
17593cur_f:=save_f;
17594width(x):=width(x)+script_space;
17595if odd(cur_style) then clr:=sup3(cur_size)
17596else if cur_style<text_style then clr:=sup1(cur_size)
17597else clr:=sup2(cur_size);
17598if shift_up<clr then shift_up:=clr;
17599if is_new_mathfont(cur_f) then
17600  clr:=depth(x)+get_ot_math_constant(cur_f, superscriptBottomMin)
17601else
17602  clr:=depth(x)+(abs(math_x_height(cur_size)) div 4);
17603if shift_up<clr then shift_up:=clr;
17604if is_new_mathfont(cur_f) then @<Attach superscript OpenType math kerning@>
17605end
17606
17607@ When both subscript and superscript are present, the subscript must be
17608separated from the superscript by at least four times |default_rule_thickness|.
17609If this condition would be violated, the subscript moves down, after which
17610both subscript and superscript move up so that the bottom of the superscript
17611is at least as high as the baseline plus four-fifths of the x-height.
17612
17613@<Construct a sub/superscript combination box |x|...@>=
17614begin
17615save_f:=cur_f;
17616y:=clean_box(subscr(q),sub_style(cur_style));
17617cur_f:=save_f;
17618width(y):=width(y)+script_space;
17619if shift_down<sub2(cur_size) then shift_down:=sub2(cur_size);
17620if is_new_mathfont(cur_f) then
17621  clr:=get_ot_math_constant(cur_f, subSuperscriptGapMin)-((shift_up-depth(x))-(height(y)-shift_down))
17622else
17623  clr:=4*default_rule_thickness-((shift_up-depth(x))-(height(y)-shift_down));
17624if clr>0 then
17625  begin shift_down:=shift_down+clr;
17626  if is_new_mathfont(cur_f) then
17627    clr:=get_ot_math_constant(cur_f, superscriptBottomMaxWithSubscript)-(shift_up-depth(x))
17628  else
17629    clr:=(abs(math_x_height(cur_size)*4) div 5)-(shift_up-depth(x));
17630  if clr>0 then
17631    begin shift_up:=shift_up+clr;
17632    shift_down:=shift_down-clr;
17633    end;
17634  end;
17635if is_new_mathfont(cur_f) then begin
17636  @<Attach subscript OpenType math kerning@>@/
17637  @<Attach superscript OpenType math kerning@>
17638  end;
17639shift_amount(x):=sup_kern+delta-sub_kern; {superscript is |delta| to the right of the subscript}
17640p:=new_kern((shift_up-depth(x))-(height(y)-shift_down)); link(x):=p; link(p):=y;
17641x:=vpack(x,natural); shift_amount(x):=shift_down;
17642end
17643
17644@ OpenType math fonts provide an additional adjustment for the horizontal
17645position of sub/superscripts called math kerning.
17646
17647The following definitions should be kept in sync with \.{XeTeXOTMath.cpp}.
17648
17649@d sup_cmd=0 {superscript kern type for |get_ot_math_kern|}
17650@d sub_cmd=1 {subscript kern type for |get_ot_math_kern|}
17651
17652@<Attach subscript OpenType math kerning@>=
17653begin if math_type(subscr(q))=math_char then
17654  begin save_f:=cur_f;
17655  fetch(subscr(q));
17656  if is_new_mathfont(cur_f) then
17657    begin script_c:=new_native_character(cur_f, qo(cur_c));
17658    script_g:=get_native_glyph(script_c, 0);
17659    script_f:=cur_f;
17660  end else
17661    begin script_g:=0; script_f:=0
17662    end;
17663  cur_f:=save_f;
17664  end;
17665if is_glyph_node(p) then
17666  sub_kern:=get_ot_math_kern(native_font(p), native_glyph(p), script_f, script_g, sub_cmd, shift_down);
17667
17668if sub_kern<>0 then p:=attach_hkern_to_new_hlist(q, sub_kern);
17669end;
17670
17671@ @<Attach superscript OpenType math kerning@>=
17672begin if math_type(supscr(q))=math_char then
17673  begin save_f:=cur_f;
17674  fetch(supscr(q));
17675  if is_new_mathfont(cur_f) then
17676    begin script_c:=new_native_character(cur_f, qo(cur_c));
17677    script_g:=get_native_glyph(script_c, 0);
17678    script_f:=cur_f;
17679  end else
17680    begin script_g:=0; script_f:=0
17681    end;
17682  cur_f:=save_f;
17683  end;
17684if is_glyph_node(p) then
17685  sup_kern:=get_ot_math_kern(native_font(p), native_glyph(p), script_f, script_g, sup_cmd, shift_up);
17686
17687if (sup_kern<>0) and (math_type(subscr(q))=empty) then
17688  {if there is a superscript the kern will be added to |shift_amount(x)|}
17689  p:=attach_hkern_to_new_hlist(q, sup_kern);
17690end;
17691
17692@ We have now tied up all the loose ends of the first pass of |mlist_to_hlist|.
17693The second pass simply goes through and hooks everything together with the
17694proper glue and penalties. It also handles the |left_noad| and |right_noad| that
17695might be present, since |max_h| and |max_d| are now known. Variable |p| points
17696to a node at the current end of the final hlist.
17697
17698@<Make a second pass over the mlist, ...@>=
17699p:=temp_head; link(p):=null; q:=mlist; r_type:=0; cur_style:=style;
17700@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>;
17701while q<>null do
17702  begin @<If node |q| is a style node, change the style and |goto delete_q|;
17703    otherwise if it is not a noad, put it into the hlist,
17704    advance |q|, and |goto done|; otherwise set |s| to the size
17705    of noad |q|, set |t| to the associated type (|ord_noad..
17706    inner_noad|), and set |pen| to the associated penalty@>;
17707  @<Append inter-element spacing based on |r_type| and |t|@>;
17708  @<Append any |new_hlist| entries for |q|, and any appropriate penalties@>;
17709  if type(q)=right_noad then t:=open_noad;
17710  r_type:=t;
17711  delete_q: r:=q; q:=link(q); free_node(r,s);
17712  done: end
17713
17714@ Just before doing the big |case| switch in the second pass, the program
17715sets up default values so that most of the branches are short.
17716
17717@<If node |q| is a style node, change the style...@>=
17718t:=ord_noad; s:=noad_size; pen:=inf_penalty;
17719case type(q) of
17720op_noad,open_noad,close_noad,punct_noad,inner_noad: t:=type(q);
17721bin_noad: begin t:=bin_noad; pen:=bin_op_penalty;
17722  end;
17723rel_noad: begin t:=rel_noad; pen:=rel_penalty;
17724  end;
17725ord_noad,vcenter_noad,over_noad,under_noad: do_nothing;
17726radical_noad: s:=radical_noad_size;
17727accent_noad: s:=accent_noad_size;
17728fraction_noad: begin t:=inner_noad; s:=fraction_noad_size;
17729  end;
17730left_noad,right_noad: t:=make_left_right(q,style,max_d,max_h);
17731style_node: @<Change the current style and |goto delete_q|@>;
17732whatsit_node,penalty_node,rule_node,disc_node,adjust_node,ins_node,mark_node,
17733 glue_node,kern_node:@t@>@;@/
17734  begin link(p):=q; p:=q; q:=link(q); link(p):=null; goto done;
17735  end;
17736othercases confusion("mlist3")
17737@:this can't happen mlist3}{\quad mlist3@>
17738endcases
17739
17740@ The |make_left_right| function constructs a left or right delimiter of
17741the required size and returns the value |open_noad| or |close_noad|. The
17742|right_noad| and |left_noad| will both be based on the original |style|,
17743so they will have consistent sizes.
17744
17745We use the fact that |right_noad-left_noad=close_noad-open_noad|.
17746
17747@<Declare math...@>=
17748function make_left_right(@!q:pointer;@!style:small_number;
17749  @!max_d,@!max_h:scaled):small_number;
17750var delta,@!delta1,@!delta2:scaled; {dimensions used in the calculation}
17751begin cur_style:=style; @<Set up the values...@>;
17752delta2:=max_d+axis_height(cur_size);
17753delta1:=max_h+max_d-delta2;
17754if delta2>delta1 then delta1:=delta2; {|delta1| is max distance from axis}
17755delta:=(delta1 div 500)*delimiter_factor;
17756delta2:=delta1+delta1-delimiter_shortfall;
17757if delta<delta2 then delta:=delta2;
17758new_hlist(q):=var_delimiter(delimiter(q),cur_size,delta);
17759make_left_right:=type(q)-(left_noad-open_noad); {|open_noad| or |close_noad|}
17760end;
17761
17762@ @<Change the current style and |goto delete_q|@>=
17763begin cur_style:=subtype(q); s:=style_node_size;
17764@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>;
17765goto delete_q;
17766end
17767
17768@ The inter-element spacing in math formulas depends on a $8\times8$ table that
17769\TeX\ preloads as a 64-digit string. The elements of this string have the
17770following significance:
17771$$\vbox{\halign{#\hfil\cr
17772\.0 means no space;\cr
17773\.1 means a conditional thin space (\.{\\nonscript\\mskip\\thinmuskip});\cr
17774\.2 means a thin space (\.{\\mskip\\thinmuskip});\cr
17775\.3 means a conditional medium space
17776  (\.{\\nonscript\\mskip\\medmuskip});\cr
17777\.4 means a conditional thick space
17778  (\.{\\nonscript\\mskip\\thickmuskip});\cr
17779\.* means an impossible case.\cr}}$$
17780This is all pretty cryptic, but {\sl The \TeX book\/} explains what is
17781supposed to happen, and the string makes it happen.
17782@:TeXbook}{\sl The \TeX book@>
17783
17784A global variable |magic_offset| is computed so that if |a| and |b| are
17785in the range |ord_noad..inner_noad|, then |str_pool[a*8+b+magic_offset]|
17786is the digit for spacing between noad types |a| and |b|.
17787
17788If \PASCAL\ had provided a good way to preload constant arrays, this part of
17789the program would not have been so strange.
17790@:PASCAL}{\PASCAL@>
17791
17792@d math_spacing=@;@/
17793@t\hskip-35pt@>
17794"0234000122*4000133**3**344*0400400*000000234000111*1111112341011"
17795@t$ \hskip-35pt$@>
17796
17797@<Glob...@>=
17798@!magic_offset:integer; {used to find inter-element spacing}
17799
17800@ @<Compute the magic offset@>=
17801magic_offset:=str_start_macro(math_spacing)-9*ord_noad
17802
17803@ @<Append inter-element spacing based on |r_type| and |t|@>=
17804if r_type>0 then {not the first noad}
17805  begin case so(str_pool[r_type*8+t+magic_offset]) of
17806  "0": x:=0;
17807  "1": if cur_style<script_style then x:=thin_mu_skip_code@+else x:=0;
17808  "2": x:=thin_mu_skip_code;
17809  "3": if cur_style<script_style then x:=med_mu_skip_code@+else x:=0;
17810  "4": if cur_style<script_style then x:=thick_mu_skip_code@+else x:=0;
17811  othercases confusion("mlist4")
17812@:this can't happen mlist4}{\quad mlist4@>
17813  endcases;
17814  if x<>0 then
17815    begin y:=math_glue(glue_par(x),cur_mu);
17816    z:=new_glue(y); glue_ref_count(y):=null; link(p):=z; p:=z;@/
17817    subtype(z):=x+1; {store a symbolic subtype}
17818    end;
17819  end
17820
17821@ We insert a penalty node after the hlist entries of noad |q| if |pen|
17822is not an ``infinite'' penalty, and if the node immediately following |q|
17823is not a penalty node or a |rel_noad| or absent entirely.
17824
17825@<Append any |new_hlist| entries for |q|, and any appropriate penalties@>=
17826if new_hlist(q)<>null then
17827  begin link(p):=new_hlist(q);
17828  repeat p:=link(p);
17829  until link(p)=null;
17830  end;
17831if penalties then if link(q)<>null then if pen<inf_penalty then
17832  begin r_type:=type(link(q));
17833  if r_type<>penalty_node then if r_type<>rel_noad then
17834    begin z:=new_penalty(pen); link(p):=z; p:=z;
17835    end;
17836  end
17837
17838@* \[37] Alignment.
17839It's sort of a miracle whenever \.{\\halign} and \.{\\valign} work, because
17840they cut across so many of the control structures of \TeX.
17841
17842Therefore the
17843present page is probably not the best place for a beginner to start reading
17844this program; it is better to master everything else first.
17845
17846Let us focus our thoughts on an example of what the input might be, in order
17847to get some idea about how the alignment miracle happens. The example doesn't
17848do anything useful, but it is sufficiently general to indicate all of the
17849special cases that must be dealt with; please do not be disturbed by its
17850apparent complexity and meaninglessness.
17851$$\vbox{\halign{\.{#}\hfil\cr
17852{}\\tabskip 2pt plus 3pt\cr
17853{}\\halign to 300pt\{u1\#v1\&\cr
17854\hskip 50pt\\tabskip 1pt plus 1fil u2\#v2\&\cr
17855\hskip 50pt u3\#v3\\cr\cr
17856\hskip 25pt a1\&\\omit a2\&\\vrule\\cr\cr
17857\hskip 25pt \\noalign\{\\vskip 3pt\}\cr
17858\hskip 25pt b1\\span b2\\cr\cr
17859\hskip 25pt \\omit\&c2\\span\\omit\\cr\}\cr}}$$
17860Here's what happens:
17861
17862\yskip
17863(0) When `\.{\\halign to 300pt\{}' is scanned, the |scan_spec| routine
17864places the 300pt dimension onto the |save_stack|, and an |align_group|
17865code is placed above it. This will make it possible to complete the alignment
17866when the matching `\.\}' is found.
17867
17868(1) The preamble is scanned next. Macros in the preamble are not expanded,
17869@^preamble@>
17870except as part of a tabskip specification. For example, if \.{u2} had been
17871a macro in the preamble above, it would have been expanded, since \TeX\
17872must look for `\.{minus...}' as part of the tabskip glue. A ``preamble list''
17873is constructed based on the user's preamble; in our case it contains the
17874following seven items:
17875$$\vbox{\halign{\.{#}\hfil\qquad&(#)\hfil\cr
17876{}\\glue 2pt plus 3pt&the tabskip preceding column 1\cr
17877{}\\alignrecord, width $-\infty$&preamble info for column 1\cr
17878{}\\glue 2pt plus 3pt&the tabskip between columns 1 and 2\cr
17879{}\\alignrecord, width $-\infty$&preamble info for column 2\cr
17880{}\\glue 1pt plus 1fil&the tabskip between columns 2 and 3\cr
17881{}\\alignrecord, width $-\infty$&preamble info for column 3\cr
17882{}\\glue 1pt plus 1fil&the tabskip following column 3\cr}}$$
17883These ``alignrecord'' entries have the same size as an |unset_node|,
17884since they will later be converted into such nodes. However, at the
17885moment they have no |type| or |subtype| fields; they have |info| fields
17886instead, and these |info| fields are initially set to the value |end_span|,
17887for reasons explained below. Furthermore, the alignrecord nodes have no
17888|height| or |depth| fields; these are renamed |u_part| and |v_part|,
17889and they point to token lists for the templates of the alignment.
17890For example, the |u_part| field in the first alignrecord points to the
17891token list `\.{u1}', i.e., the template preceding the `\.\#' for column~1.
17892
17893(2) \TeX\ now looks at what follows the \.{\\cr} that ended the preamble.
17894It is not `\.{\\noalign}' or `\.{\\omit}', so this input is put back to
17895be read again, and the template `\.{u1}' is fed to the scanner. Just
17896before reading `\.{u1}', \TeX\ goes into restricted horizontal mode.
17897Just after reading `\.{u1}', \TeX\ will see `\.{a1}', and then (when the
17898{\.\&} is sensed) \TeX\ will see `\.{v1}'. Then \TeX\ scans an |endv|
17899token, indicating the end of a column. At this point an |unset_node| is
17900created, containing the contents of the current hlist (i.e., `\.{u1a1v1}').
17901The natural width of this unset node replaces the |width| field of the
17902alignrecord for column~1; in general, the alignrecords will record the
17903maximum natural width that has occurred so far in a given column.
17904
17905(3) Since `\.{\\omit}' follows the `\.\&', the templates for column~2
17906are now bypassed. Again \TeX\ goes into restricted horizontal mode and
17907makes an |unset_node| from the resulting hlist; but this time the
17908hlist contains simply `\.{a2}'. The natural width of the new unset box
17909is remembered in the |width| field of the alignrecord for column~2.
17910
17911(4) A third |unset_node| is created for column 3, using essentially the
17912mechanism that worked for column~1; this unset box contains `\.{u3\\vrule
17913v3}'. The vertical rule in this case has running dimensions that will later
17914extend to the height and depth of the whole first row, since each |unset_node|
17915in a row will eventually inherit the height and depth of its enclosing box.
17916
17917(5) The first row has now ended; it is made into a single unset box
17918comprising the following seven items:
17919$$\vbox{\halign{\hbox to 325pt{\qquad\.{#}\hfil}\cr
17920{}\\glue 2pt plus 3pt\cr
17921{}\\unsetbox for 1 column: u1a1v1\cr
17922{}\\glue 2pt plus 3pt\cr
17923{}\\unsetbox for 1 column: a2\cr
17924{}\\glue 1pt plus 1fil\cr
17925{}\\unsetbox for 1 column: u3\\vrule v3\cr
17926{}\\glue 1pt plus 1fil\cr}}$$
17927The width of this unset row is unimportant, but it has the correct height
17928and depth, so the correct baselineskip glue will be computed as the row
17929is inserted into a vertical list.
17930
17931(6) Since `\.{\\noalign}' follows the current \.{\\cr}, \TeX\ appends
17932additional material (in this case \.{\\vskip 3pt}) to the vertical list.
17933While processing this material, \TeX\ will be in internal vertical
17934mode, and |no_align_group| will be on |save_stack|.
17935
17936(7) The next row produces an unset box that looks like this:
17937$$\vbox{\halign{\hbox to 325pt{\qquad\.{#}\hfil}\cr
17938{}\\glue 2pt plus 3pt\cr
17939{}\\unsetbox for 2 columns: u1b1v1u2b2v2\cr
17940{}\\glue 1pt plus 1fil\cr
17941{}\\unsetbox for 1 column: {\rm(empty)}\cr
17942{}\\glue 1pt plus 1fil\cr}}$$
17943The natural width of the unset box that spans columns 1~and~2 is stored
17944in a ``span node,'' which we will explain later; the |info| field of the
17945alignrecord for column~1 now points to the new span node, and the |info|
17946of the span node points to |end_span|.
17947
17948(8) The final row produces the unset box
17949$$\vbox{\halign{\hbox to 325pt{\qquad\.{#}\hfil}\cr
17950{}\\glue 2pt plus 3pt\cr
17951{}\\unsetbox for 1 column: {\rm(empty)}\cr
17952{}\\glue 2pt plus 3pt\cr
17953{}\\unsetbox for 2 columns: u2c2v2\cr
17954{}\\glue 1pt plus 1fil\cr}}$$
17955A new span node is attached to the alignrecord for column 2.
17956
17957(9) The last step is to compute the true column widths and to change all the
17958unset boxes to hboxes, appending the whole works to the vertical list that
17959encloses the \.{\\halign}. The rules for deciding on the final widths of
17960each unset column box will be explained below.
17961
17962\yskip\noindent
17963Note that as \.{\\halign} is being processed, we fearlessly give up control
17964to the rest of \TeX. At critical junctures, an alignment routine is
17965called upon to step in and do some little action, but most of the time
17966these routines just lurk in the background. It's something like
17967post-hypnotic suggestion.
17968
17969@ We have mentioned that alignrecords contain no |height| or |depth| fields.
17970Their |glue_sign| and |glue_order| are pre-empted as well, since it
17971is necessary to store information about what to do when a template ends.
17972This information is called the |extra_info| field.
17973
17974@d u_part(#)==mem[#+height_offset].int {pointer to \<u_j> token list}
17975@d v_part(#)==mem[#+depth_offset].int {pointer to \<v_j> token list}
17976@d extra_info(#)==info(#+list_offset) {info to remember during template}
17977
17978@ Alignments can occur within alignments, so a small stack is used to access
17979the alignrecord information. At each level we have a |preamble| pointer,
17980indicating the beginning of the preamble list; a |cur_align| pointer,
17981indicating the current position in the preamble list; a |cur_span| pointer,
17982indicating the value of |cur_align| at the beginning of a sequence of
17983spanned columns; a |cur_loop| pointer, indicating the tabskip glue before
17984an alignrecord that should be copied next if the current list is extended;
17985and the |align_state| variable, which indicates the nesting of braces so
17986that \.{\\cr} and \.{\\span} and tab marks are properly intercepted.
17987There also are pointers |cur_head| and |cur_tail| to the head and tail
17988of a list of adjustments being moved out from horizontal mode to
17989vertical~mode.
17990
17991The current values of these seven quantities appear in global variables;
17992when they have to be pushed down, they are stored in 5-word nodes, and
17993|align_ptr| points to the topmost such node.
17994
17995@d preamble==link(align_head) {the current preamble list}
17996@d align_stack_node_size=6 {number of |mem| words to save alignment states}
17997
17998@<Glob...@>=
17999@!cur_align:pointer; {current position in preamble list}
18000@!cur_span:pointer; {start of currently spanned columns in preamble list}
18001@!cur_loop:pointer; {place to copy when extending a periodic preamble}
18002@!align_ptr:pointer; {most recently pushed-down alignment stack node}
18003@!cur_head,@!cur_tail:pointer; {adjustment list pointers}
18004@!cur_pre_head,@!cur_pre_tail:pointer; {pre-adjustment list pointers}
18005
18006@ The |align_state| and |preamble| variables are initialized elsewhere.
18007
18008@<Set init...@>=
18009align_ptr:=null; cur_align:=null; cur_span:=null; cur_loop:=null;
18010cur_head:=null; cur_tail:=null;
18011cur_pre_head:=null; cur_pre_tail:=null;
18012
18013@ Alignment stack maintenance is handled by a pair of trivial routines
18014called |push_alignment| and |pop_alignment|.
18015
18016@p procedure push_alignment;
18017var p:pointer; {the new alignment stack node}
18018begin p:=get_node(align_stack_node_size);
18019link(p):=align_ptr; info(p):=cur_align;
18020llink(p):=preamble; rlink(p):=cur_span;
18021mem[p+2].int:=cur_loop; mem[p+3].int:=align_state;
18022info(p+4):=cur_head; link(p+4):=cur_tail;
18023info(p+5):=cur_pre_head; link(p+5):=cur_pre_tail;
18024align_ptr:=p;
18025cur_head:=get_avail;
18026cur_pre_head:=get_avail;
18027end;
18028@#
18029procedure pop_alignment;
18030var p:pointer; {the top alignment stack node}
18031begin free_avail(cur_head);
18032free_avail(cur_pre_head);
18033p:=align_ptr;
18034cur_tail:=link(p+4); cur_head:=info(p+4);
18035cur_pre_tail:=link(p+5); cur_pre_head:=info(p+5);
18036align_state:=mem[p+3].int; cur_loop:=mem[p+2].int;
18037cur_span:=rlink(p); preamble:=llink(p);
18038cur_align:=info(p); align_ptr:=link(p);
18039free_node(p,align_stack_node_size);
18040end;
18041
18042@ \TeX\ has eight procedures that govern alignments: |init_align| and
18043|fin_align| are used at the very beginning and the very end; |init_row| and
18044|fin_row| are used at the beginning and end of individual rows; |init_span|
18045is used at the beginning of a sequence of spanned columns (possibly involving
18046only one column); |init_col| and |fin_col| are used at the beginning and
18047end of individual columns; and |align_peek| is used after \.{\\cr} to see
18048whether the next item is \.{\\noalign}.
18049
18050We shall consider these routines in the order they are first used during
18051the course of a complete \.{\\halign}, namely |init_align|, |align_peek|,
18052|init_row|, |init_span|, |init_col|, |fin_col|, |fin_row|, |fin_align|.
18053
18054@ When \.{\\halign} or \.{\\valign} has been scanned in an appropriate
18055mode, \TeX\ calls |init_align|, whose task is to get everything off to a
18056good start. This mostly involves scanning the preamble and putting its
18057information into the preamble list.
18058@^preamble@>
18059
18060@p @t\4@>@<Declare the procedure called |get_preamble_token|@>@t@>@/
18061procedure@?align_peek; forward;@t\2@>@/
18062procedure@?normal_paragraph; forward;@t\2@>@/
18063procedure init_align;
18064label done, done1, done2, continue;
18065var save_cs_ptr:pointer; {|warning_index| value for error messages}
18066@!p:pointer; {for short-term temporary use}
18067begin save_cs_ptr:=cur_cs; {\.{\\halign} or \.{\\valign}, usually}
18068push_alignment; align_state:=-1000000; {enter a new alignment level}
18069@<Check for improper alignment in displayed math@>;
18070push_nest; {enter a new semantic level}
18071@<Change current mode to |-vmode| for \.{\\halign}, |-hmode| for \.{\\valign}@>;
18072scan_spec(align_group,false);@/
18073@<Scan the preamble and record it in the |preamble| list@>;
18074new_save_level(align_group);
18075if every_cr<>null then begin_token_list(every_cr,every_cr_text);
18076align_peek; {look for \.{\\noalign} or \.{\\omit}}
18077end;
18078
18079@ In vertical modes, |prev_depth| already has the correct value. But
18080if we are in |mmode| (displayed formula mode), we reach out to the
18081enclosing vertical mode for the |prev_depth| value that produces the
18082correct baseline calculations.
18083
18084@<Change current mode...@>=
18085if mode=mmode then
18086  begin mode:=-vmode; prev_depth:=nest[nest_ptr-2].aux_field.sc;
18087  end
18088else if mode>0 then negate(mode)
18089
18090@ When \.{\\halign} is used as a displayed formula, there should be
18091no other pieces of mlists present.
18092
18093@<Check for improper alignment in displayed math@>=
18094if (mode=mmode)and((tail<>head)or(incompleat_noad<>null)) then
18095  begin print_err("Improper "); print_esc("halign"); print(" inside $$'s");
18096@.Improper \\halign...@>
18097  help3("Displays can use special alignments (like \eqalignno)")@/
18098  ("only if nothing but the alignment itself is between $$'s.")@/
18099  ("So I've deleted the formulas that preceded this alignment.");
18100  error; flush_math;
18101  end
18102
18103@ @<Scan the preamble and record it in the |preamble| list@>=
18104preamble:=null; cur_align:=align_head; cur_loop:=null; scanner_status:=aligning;
18105warning_index:=save_cs_ptr; align_state:=-1000000;
18106  {at this point, |cur_cmd=left_brace|}
18107loop@+  begin @<Append the current tabskip glue to the preamble list@>;
18108  if cur_cmd=car_ret then goto done; {\.{\\cr} ends the preamble}
18109  @<Scan preamble text until |cur_cmd| is |tab_mark| or |car_ret|,
18110    looking for changes in the tabskip glue; append an
18111    alignrecord to the preamble list@>;
18112  end;
18113done: scanner_status:=normal
18114
18115@ @<Append the current tabskip glue to the preamble list@>=
18116link(cur_align):=new_param_glue(tab_skip_code);
18117cur_align:=link(cur_align)
18118
18119@ @<Scan preamble text until |cur_cmd| is |tab_mark| or |car_ret|...@>=
18120@<Scan the template \<u_j>, putting the resulting token list in |hold_head|@>;
18121link(cur_align):=new_null_box; cur_align:=link(cur_align); {a new alignrecord}
18122info(cur_align):=end_span; width(cur_align):=null_flag;
18123u_part(cur_align):=link(hold_head);
18124@<Scan the template \<v_j>, putting the resulting token list in |hold_head|@>;
18125v_part(cur_align):=link(hold_head)
18126
18127@ We enter `\.{\\span}' into |eqtb| with |tab_mark| as its command code,
18128and with |span_code| as the command modifier. This makes \TeX\ interpret it
18129essentially the same as an alignment delimiter like `\.\&', yet it is
18130recognizably different when we need to distinguish it from a normal delimiter.
18131It also turns out to be useful to give a special |cr_code| to `\.{\\cr}',
18132and an even larger |cr_cr_code| to `\.{\\crcr}'.
18133
18134The end of a template is represented by two ``frozen'' control sequences
18135called \.{\\endtemplate}. The first has the command code |end_template|, which
18136is |>outer_call|, so it will not easily disappear in the presence of errors.
18137The |get_x_token| routine converts the first into the second, which has |endv|
18138as its command code.
18139
18140@d span_code=special_char {distinct from any character}
18141@d cr_code=span_code+1 {distinct from |span_code| and from any character}
18142@d cr_cr_code=cr_code+1 {this distinguishes \.{\\crcr} from \.{\\cr}}
18143@d end_template_token==cs_token_flag+frozen_end_template
18144
18145@<Put each of \TeX's primitives into the hash table@>=
18146primitive("span",tab_mark,span_code);@/
18147@!@:span_}{\.{\\span} primitive@>
18148primitive("cr",car_ret,cr_code);
18149@!@:cr_}{\.{\\cr} primitive@>
18150text(frozen_cr):="cr"; eqtb[frozen_cr]:=eqtb[cur_val];@/
18151primitive("crcr",car_ret,cr_cr_code);
18152@!@:cr_cr_}{\.{\\crcr} primitive@>
18153text(frozen_end_template):="endtemplate"; text(frozen_endv):="endtemplate";
18154eq_type(frozen_endv):=endv; equiv(frozen_endv):=null_list;
18155eq_level(frozen_endv):=level_one;@/
18156eqtb[frozen_end_template]:=eqtb[frozen_endv];
18157eq_type(frozen_end_template):=end_template;
18158
18159@ @<Cases of |print_cmd_chr|...@>=
18160tab_mark: if chr_code=span_code then print_esc("span")
18161  else chr_cmd("alignment tab character ");
18162car_ret: if chr_code=cr_code then print_esc("cr")
18163  else print_esc("crcr");
18164
18165@ The preamble is copied directly, except that \.{\\tabskip} causes a change
18166to the tabskip glue, thereby possibly expanding macros that immediately
18167follow it. An appearance of \.{\\span} also causes such an expansion.
18168
18169Note that if the preamble contains `\.{\\global\\tabskip}', the `\.{\\global}'
18170token survives in the preamble and the `\.{\\tabskip}' defines new
18171tabskip glue (locally).
18172
18173@<Declare the procedure called |get_preamble_token|@>=
18174procedure get_preamble_token;
18175label restart;
18176begin restart: get_token;
18177while (cur_chr=span_code)and(cur_cmd=tab_mark) do
18178  begin get_token; {this token will be expanded once}
18179  if cur_cmd>max_command then
18180    begin expand; get_token;
18181    end;
18182  end;
18183if cur_cmd=endv then
18184  fatal_error("(interwoven alignment preambles are not allowed)");
18185@.interwoven alignment preambles...@>
18186if (cur_cmd=assign_glue)and(cur_chr=glue_base+tab_skip_code) then
18187  begin scan_optional_equals; scan_glue(glue_val);
18188  if global_defs>0 then geq_define(glue_base+tab_skip_code,glue_ref,cur_val)
18189  else eq_define(glue_base+tab_skip_code,glue_ref,cur_val);
18190  goto restart;
18191  end;
18192end;
18193
18194@ Spaces are eliminated from the beginning of a template.
18195
18196@<Scan the template \<u_j>...@>=
18197p:=hold_head; link(p):=null;
18198loop@+  begin get_preamble_token;
18199  if cur_cmd=mac_param then goto done1;
18200  if (cur_cmd<=car_ret)and(cur_cmd>=tab_mark)and(align_state=-1000000) then
18201   if (p=hold_head)and(cur_loop=null)and(cur_cmd=tab_mark)
18202    then cur_loop:=cur_align
18203   else  begin print_err("Missing # inserted in alignment preamble");
18204@.Missing \# inserted...@>
18205    help3("There should be exactly one # between &'s, when an")@/
18206    ("\halign or \valign is being set up. In this case you had")@/
18207    ("none, so I've put one in; maybe that will work.");
18208    back_error; goto done1;
18209    end
18210  else if (cur_cmd<>spacer)or(p<>hold_head) then
18211    begin link(p):=get_avail; p:=link(p); info(p):=cur_tok;
18212    end;
18213  end;
18214done1:
18215
18216@ @<Scan the template \<v_j>...@>=
18217p:=hold_head; link(p):=null;
18218loop@+  begin continue: get_preamble_token;
18219  if (cur_cmd<=car_ret)and(cur_cmd>=tab_mark)and(align_state=-1000000) then
18220    goto done2;
18221  if cur_cmd=mac_param then
18222    begin print_err("Only one # is allowed per tab");
18223@.Only one \# is allowed...@>
18224    help3("There should be exactly one # between &'s, when an")@/
18225    ("\halign or \valign is being set up. In this case you had")@/
18226    ("more than one, so I'm ignoring all but the first.");
18227    error; goto continue;
18228    end;
18229  link(p):=get_avail; p:=link(p); info(p):=cur_tok;
18230  end;
18231done2: link(p):=get_avail; p:=link(p);
18232info(p):=end_template_token {put \.{\\endtemplate} at the end}
18233
18234@ The tricky part about alignments is getting the templates into the
18235scanner at the right time, and recovering control when a row or column
18236is finished.
18237
18238We usually begin a row after each \.{\\cr} has been sensed, unless that
18239\.{\\cr} is followed by \.{\\noalign} or by the right brace that terminates
18240the alignment. The |align_peek| routine is used to look ahead and do
18241the right thing; it either gets a new row started, or gets a \.{\\noalign}
18242started, or finishes off the alignment.
18243
18244@<Declare the procedure called |align_peek|@>=
18245procedure align_peek;
18246label restart;
18247begin restart: align_state:=1000000;
18248repeat get_x_or_protected;
18249until cur_cmd<>spacer;
18250if cur_cmd=no_align then
18251  begin scan_left_brace; new_save_level(no_align_group);
18252  if mode=-vmode then normal_paragraph;
18253  end
18254else if cur_cmd=right_brace then fin_align
18255else if (cur_cmd=car_ret)and(cur_chr=cr_cr_code) then
18256  goto restart {ignore \.{\\crcr}}
18257else  begin init_row; {start a new row}
18258  init_col; {start a new column and replace what we peeked at}
18259  end;
18260end;
18261
18262@ To start a row (i.e., a `row' that rhymes with `dough' but not with `bough'),
18263we enter a new semantic level, copy the first tabskip glue, and change
18264from internal vertical mode to restricted horizontal mode or vice versa.
18265The |space_factor| and |prev_depth| are not used on this semantic level,
18266but we clear them to zero just to be tidy.
18267
18268@p @t\4@>@<Declare the procedure called |init_span|@>@t@>@/
18269procedure init_row;
18270begin push_nest; mode:=(-hmode-vmode)-mode;
18271if mode=-hmode then space_factor:=0 @+else prev_depth:=0;
18272tail_append(new_glue(glue_ptr(preamble)));
18273subtype(tail):=tab_skip_code+1;@/
18274cur_align:=link(preamble); cur_tail:=cur_head; cur_pre_tail:=cur_pre_head;
18275init_span(cur_align);
18276end;
18277
18278@ The parameter to |init_span| is a pointer to the alignrecord where the
18279next column or group of columns will begin. A new semantic level is
18280entered, so that the columns will generate a list for subsequent packaging.
18281
18282@<Declare the procedure called |init_span|@>=
18283procedure init_span(@!p:pointer);
18284begin push_nest;
18285if mode=-hmode then space_factor:=1000
18286else  begin prev_depth:=ignore_depth; normal_paragraph;
18287  end;
18288cur_span:=p;
18289end;
18290
18291@ When a column begins, we assume that |cur_cmd| is either |omit| or else
18292the current token should be put back into the input until the \<u_j>
18293template has been scanned.  (Note that |cur_cmd| might be |tab_mark| or
18294|car_ret|.)  We also assume that |align_state| is approximately 1000000 at
18295this time.  We remain in the same mode, and start the template if it is
18296called for.
18297
18298@p procedure init_col;
18299begin extra_info(cur_align):=cur_cmd;
18300if cur_cmd=omit then align_state:=0
18301else  begin back_input; begin_token_list(u_part(cur_align),u_template);
18302  end; {now |align_state=1000000|}
18303end;
18304
18305@ The scanner sets |align_state| to zero when the \<u_j> template ends. When
18306a subsequent \.{\\cr} or \.{\\span} or tab mark occurs with |align_state=0|,
18307the scanner activates the following code, which fires up the \<v_j> template.
18308We need to remember the |cur_chr|, which is either |cr_cr_code|, |cr_code|,
18309|span_code|, or a character code, depending on how the column text has ended.
18310
18311This part of the program had better not be activated when the preamble
18312to another alignment is being scanned, or when no alignment preamble is active.
18313
18314@<Insert the \(v)\<v_j>...@>=
18315begin if (scanner_status=aligning) or (cur_align=null) then
18316  fatal_error("(interwoven alignment preambles are not allowed)");
18317@.interwoven alignment preambles...@>
18318cur_cmd:=extra_info(cur_align); extra_info(cur_align):=cur_chr;
18319if cur_cmd=omit then begin_token_list(omit_template,v_template)
18320else begin_token_list(v_part(cur_align),v_template);
18321align_state:=1000000; goto restart;
18322end
18323
18324@ The token list |omit_template| just referred to is a constant token
18325list that contains the special control sequence \.{\\endtemplate} only.
18326
18327@<Initialize the special...@>=
18328info(omit_template):=end_template_token; {|link(omit_template)=null|}
18329
18330@ When the |endv| command at the end of a \<v_j> template comes through the
18331scanner, things really start to happen; and it is the |fin_col| routine
18332that makes them happen. This routine returns |true| if a row as well as a
18333column has been finished.
18334
18335@p function fin_col:boolean;
18336label exit;
18337var p:pointer; {the alignrecord after the current one}
18338@!q,@!r:pointer; {temporary pointers for list manipulation}
18339@!s:pointer; {a new span node}
18340@!u:pointer; {a new unset box}
18341@!w:scaled; {natural width}
18342@!o:glue_ord; {order of infinity}
18343@!n:halfword; {span counter}
18344begin if cur_align=null then confusion("endv");
18345q:=link(cur_align);@+if q=null then confusion("endv");
18346@:this can't happen endv}{\quad endv@>
18347if align_state<500000 then
18348  fatal_error("(interwoven alignment preambles are not allowed)");
18349@.interwoven alignment preambles...@>
18350p:=link(q);
18351@<If the preamble list has been traversed, check that the row has ended@>;
18352if extra_info(cur_align)<>span_code then
18353  begin unsave; new_save_level(align_group);@/
18354  @<Package an unset box for the current column and record its width@>;
18355  @<Copy the tabskip glue between columns@>;
18356  if extra_info(cur_align)>=cr_code then
18357    begin fin_col:=true; return;
18358    end;
18359  init_span(p);
18360  end;
18361align_state:=1000000;
18362repeat get_x_or_protected;
18363until cur_cmd<>spacer;
18364cur_align:=p;
18365init_col; fin_col:=false;
18366exit: end;
18367
18368@ @<If the preamble list has been traversed, check that the row has ended@>=
18369if (p=null)and(extra_info(cur_align)<cr_code) then
18370 if cur_loop<>null then @<Lengthen the preamble periodically@>
18371 else  begin print_err("Extra alignment tab has been changed to ");
18372@.Extra alignment tab...@>
18373  print_esc("cr");
18374  help3("You have given more \span or & marks than there were")@/
18375  ("in the preamble to the \halign or \valign now in progress.")@/
18376  ("So I'll assume that you meant to type \cr instead.");
18377  extra_info(cur_align):=cr_code; error;
18378  end
18379
18380@ @<Lengthen the preamble...@>=
18381begin link(q):=new_null_box; p:=link(q); {a new alignrecord}
18382info(p):=end_span; width(p):=null_flag; cur_loop:=link(cur_loop);
18383@<Copy the templates from node |cur_loop| into node |p|@>;
18384cur_loop:=link(cur_loop);
18385link(p):=new_glue(glue_ptr(cur_loop));
18386end
18387
18388@ @<Copy the templates from node |cur_loop| into node |p|@>=
18389q:=hold_head; r:=u_part(cur_loop);
18390while r<>null do
18391  begin link(q):=get_avail; q:=link(q); info(q):=info(r); r:=link(r);
18392  end;
18393link(q):=null; u_part(p):=link(hold_head);
18394q:=hold_head; r:=v_part(cur_loop);
18395while r<>null do
18396  begin link(q):=get_avail; q:=link(q); info(q):=info(r); r:=link(r);
18397  end;
18398link(q):=null; v_part(p):=link(hold_head)
18399
18400@ @<Copy the tabskip glue...@>=
18401tail_append(new_glue(glue_ptr(link(cur_align))));
18402subtype(tail):=tab_skip_code+1
18403
18404@ @<Package an unset...@>=
18405begin if mode=-hmode then
18406  begin adjust_tail:=cur_tail; pre_adjust_tail:=cur_pre_tail;
18407  u:=hpack(link(head),natural); w:=width(u);
18408  cur_tail:=adjust_tail; adjust_tail:=null;
18409  cur_pre_tail:=pre_adjust_tail; pre_adjust_tail:=null;
18410  end
18411else  begin u:=vpackage(link(head),natural,0); w:=height(u);
18412  end;
18413n:=min_quarterword; {this represents a span count of 1}
18414if cur_span<>cur_align then @<Update width entry for spanned columns@>
18415else if w>width(cur_align) then width(cur_align):=w;
18416type(u):=unset_node; span_count(u):=n;@/
18417@<Determine the stretch order@>;
18418glue_order(u):=o; glue_stretch(u):=total_stretch[o];@/
18419@<Determine the shrink order@>;
18420glue_sign(u):=o; glue_shrink(u):=total_shrink[o];@/
18421pop_nest; link(tail):=u; tail:=u;
18422end
18423
18424@ A span node is a 2-word record containing |width|, |info|, and |link|
18425fields. The |link| field is not really a link, it indicates the number of
18426spanned columns; the |info| field points to a span node for the same
18427starting column, having a greater extent of spanning, or to |end_span|,
18428which has the largest possible |link| field; the |width| field holds the
18429largest natural width corresponding to a particular set of spanned columns.
18430
18431A list of the maximum widths so far, for spanned columns starting at a
18432given column, begins with the |info| field of the alignrecord for that
18433column.
18434
18435@d span_node_size=2 {number of |mem| words for a span node}
18436
18437@<Initialize the special list heads...@>=
18438link(end_span):=max_quarterword+1; info(end_span):=null;
18439
18440@ @<Update width entry for spanned columns@>=
18441begin q:=cur_span;
18442repeat incr(n); q:=link(link(q));
18443until q=cur_align;
18444if n>max_quarterword then confusion("too many spans");
18445   {this can happen, but won't}
18446@^system dependencies@>
18447@:this can't happen too many spans}{\quad too many spans@>
18448q:=cur_span; while link(info(q))<n do q:=info(q);
18449if link(info(q))>n then
18450  begin s:=get_node(span_node_size); info(s):=info(q); link(s):=n;
18451  info(q):=s; width(s):=w;
18452  end
18453else if width(info(q))<w then width(info(q)):=w;
18454end
18455
18456@ At the end of a row, we append an unset box to the current vlist (for
18457\.{\\halign}) or the current hlist (for \.{\\valign}). This unset box
18458contains the unset boxes for the columns, separated by the tabskip glue.
18459Everything will be set later.
18460
18461@p procedure fin_row;
18462var p:pointer; {the new unset box}
18463begin if mode=-hmode then
18464  begin p:=hpack(link(head),natural);
18465  pop_nest;
18466  if cur_pre_head <> cur_pre_tail then
18467      append_list(cur_pre_head)(cur_pre_tail);
18468  append_to_vlist(p);
18469  if cur_head <> cur_tail then
18470      append_list(cur_head)(cur_tail);
18471  end
18472else  begin p:=vpack(link(head),natural); pop_nest;
18473  link(tail):=p; tail:=p; space_factor:=1000;
18474  end;
18475type(p):=unset_node; glue_stretch(p):=0;
18476if every_cr<>null then begin_token_list(every_cr,every_cr_text);
18477align_peek;
18478end; {note that |glue_shrink(p)=0| since |glue_shrink==shift_amount|}
18479
18480@ Finally, we will reach the end of the alignment, and we can breathe a
18481sigh of relief that memory hasn't overflowed. All the unset boxes will now be
18482set so that the columns line up, taking due account of spanned columns.
18483
18484@p procedure@?do_assignments; forward;@t\2@>@/
18485procedure@?resume_after_display; forward;@t\2@>@/
18486procedure@?build_page; forward;@t\2@>@/
18487procedure fin_align;
18488var @!p,@!q,@!r,@!s,@!u,@!v: pointer; {registers for the list operations}
18489@!t,@!w:scaled; {width of column}
18490@!o:scaled; {shift offset for unset boxes}
18491@!n:halfword; {matching span amount}
18492@!rule_save:scaled; {temporary storage for |overfull_rule|}
18493@!aux_save:memory_word; {temporary storage for |aux|}
18494begin if cur_group<>align_group then confusion("align1");
18495@:this can't happen align}{\quad align@>
18496unsave; {that |align_group| was for individual entries}
18497if cur_group<>align_group then confusion("align0");
18498unsave; {that |align_group| was for the whole alignment}
18499if nest[nest_ptr-1].mode_field=mmode then o:=display_indent
18500  else o:=0;
18501@<Go through the preamble list, determining the column widths and
18502  changing the alignrecords to dummy unset boxes@>;
18503@<Package the preamble list, to determine the actual tabskip glue amounts,
18504  and let |p| point to this prototype box@>;
18505@<Set the glue in all the unset boxes of the current list@>;
18506flush_node_list(p); pop_alignment;
18507@<Insert the \(c)current list into its environment@>;
18508end;@/
18509@t\4@>@<Declare the procedure called |align_peek|@>
18510
18511@ It's time now to dismantle the preamble list and to compute the column
18512widths. Let $w_{ij}$ be the maximum of the natural widths of all entries
18513that span columns $i$ through $j$, inclusive. The alignrecord for column~$i$
18514contains $w_{ii}$ in its |width| field, and there is also a linked list of
18515the nonzero $w_{ij}$ for increasing $j$, accessible via the |info| field;
18516these span nodes contain the value $j-i+|min_quarterword|$ in their
18517|link| fields. The values of $w_{ii}$ were initialized to |null_flag|, which
18518we regard as $-\infty$.
18519
18520The final column widths are defined by the formula
18521$$w_j=\max_{1\L i\L j}\biggl( w_{ij}-\sum_{i\L k<j}(t_k+w_k)\biggr),$$
18522where $t_k$ is the natural width of the tabskip glue between columns
18523$k$ and~$k+1$. However, if $w_{ij}=-\infty$ for all |i| in the range
18524|1<=i<=j| (i.e., if every entry that involved column~|j| also involved
18525column~|j+1|), we let $w_j=0$, and we zero out the tabskip glue after
18526column~|j|.
18527
18528\TeX\ computes these values by using the following scheme: First $w_1=w_{11}$.
18529Then replace $w_{2j}$ by $\max(w_{2j},w_{1j}-t_1-w_1)$, for all $j>1$.
18530Then $w_2=w_{22}$. Then replace $w_{3j}$ by $\max(w_{3j},w_{2j}-t_2-w_2)$
18531for all $j>2$; and so on. If any $w_j$ turns out to be $-\infty$, its
18532value is changed to zero and so is the next tabskip.
18533
18534@<Go through the preamble list,...@>=
18535q:=link(preamble);
18536repeat flush_list(u_part(q)); flush_list(v_part(q));
18537p:=link(link(q));
18538if width(q)=null_flag then
18539  @<Nullify |width(q)| and the tabskip glue following this column@>;
18540if info(q)<>end_span then
18541  @<Merge the widths in the span nodes of |q| with those of |p|,
18542    destroying the span nodes of |q|@>;
18543type(q):=unset_node; span_count(q):=min_quarterword; height(q):=0;
18544depth(q):=0; glue_order(q):=normal; glue_sign(q):=normal;
18545glue_stretch(q):=0; glue_shrink(q):=0; q:=p;
18546until q=null
18547
18548@ @<Nullify |width(q)| and the tabskip glue following this column@>=
18549begin width(q):=0; r:=link(q); s:=glue_ptr(r);
18550if s<>zero_glue then
18551  begin add_glue_ref(zero_glue); delete_glue_ref(s);
18552  glue_ptr(r):=zero_glue;
18553  end;
18554end
18555
18556@ Merging of two span-node lists is a typical exercise in the manipulation of
18557linearly linked data structures. The essential invariant in the following
18558|repeat| loop is that we want to dispense with node |r|, in |q|'s list,
18559and |u| is its successor; all nodes of |p|'s list up to and including |s|
18560have been processed, and the successor of |s| matches |r| or precedes |r|
18561or follows |r|, according as |link(r)=n| or |link(r)>n| or |link(r)<n|.
18562
18563@<Merge the widths...@>=
18564begin t:=width(q)+width(glue_ptr(link(q)));
18565r:=info(q); s:=end_span; info(s):=p; n:=min_quarterword+1;
18566repeat width(r):=width(r)-t; u:=info(r);
18567while link(r)>n do
18568  begin s:=info(s); n:=link(info(s))+1;
18569  end;
18570if link(r)<n then
18571  begin info(r):=info(s); info(s):=r; decr(link(r)); s:=r;
18572  end
18573else  begin if width(r)>width(info(s)) then width(info(s)):=width(r);
18574  free_node(r,span_node_size);
18575  end;
18576r:=u;
18577until r=end_span;
18578end
18579
18580@ Now the preamble list has been converted to a list of alternating unset
18581boxes and tabskip glue, where the box widths are equal to the final
18582column sizes. In case of \.{\\valign}, we change the widths to heights,
18583so that a correct error message will be produced if the alignment is
18584overfull or underfull.
18585
18586@<Package the preamble list...@>=
18587save_ptr:=save_ptr-2; pack_begin_line:=-mode_line;
18588if mode=-vmode then
18589  begin rule_save:=overfull_rule;
18590  overfull_rule:=0; {prevent rule from being packaged}
18591  p:=hpack(preamble,saved(1),saved(0)); overfull_rule:=rule_save;
18592  end
18593else  begin q:=link(preamble);
18594  repeat height(q):=width(q); width(q):=0; q:=link(link(q));
18595  until q=null;
18596  p:=vpack(preamble,saved(1),saved(0));
18597  q:=link(preamble);
18598  repeat width(q):=height(q); height(q):=0; q:=link(link(q));
18599  until q=null;
18600  end;
18601pack_begin_line:=0
18602
18603@ @<Set the glue in all the unset...@>=
18604q:=link(head); s:=head;
18605while q<>null do
18606  begin if not is_char_node(q) then
18607    if type(q)=unset_node then
18608      @<Set the unset box |q| and the unset boxes in it@>
18609    else if type(q)=rule_node then
18610      @<Make the running dimensions in rule |q| extend to the
18611        boundaries of the alignment@>;
18612  s:=q; q:=link(q);
18613  end
18614
18615@ @<Make the running dimensions in rule |q| extend...@>=
18616begin if is_running(width(q)) then width(q):=width(p);
18617if is_running(height(q)) then height(q):=height(p);
18618if is_running(depth(q)) then depth(q):=depth(p);
18619if o<>0 then
18620  begin r:=link(q); link(q):=null; q:=hpack(q,natural);
18621  shift_amount(q):=o; link(q):=r; link(s):=q;
18622  end;
18623end
18624
18625@ The unset box |q| represents a row that contains one or more unset boxes,
18626depending on how soon \.{\\cr} occurred in that row.
18627
18628@<Set the unset box |q| and the unset boxes in it@>=
18629begin if mode=-vmode then
18630  begin type(q):=hlist_node; width(q):=width(p);
18631  if nest[nest_ptr-1].mode_field=mmode then set_box_lr(q)(dlist); {for |ship_out|}
18632  end
18633else  begin type(q):=vlist_node; height(q):=height(p);
18634  end;
18635glue_order(q):=glue_order(p); glue_sign(q):=glue_sign(p);
18636glue_set(q):=glue_set(p); shift_amount(q):=o;
18637r:=link(list_ptr(q)); s:=link(list_ptr(p));
18638repeat @<Set the glue in node |r| and change it from an unset node@>;
18639r:=link(link(r)); s:=link(link(s));
18640until r=null;
18641end
18642
18643@ A box made from spanned columns will be followed by tabskip glue nodes and
18644by empty boxes as if there were no spanning. This permits perfect alignment
18645of subsequent entries, and it prevents values that depend on floating point
18646arithmetic from entering into the dimensions of any boxes.
18647
18648@<Set the glue in node |r|...@>=
18649n:=span_count(r); t:=width(s); w:=t; u:=hold_head;
18650set_box_lr(r)(0); {for |ship_out|}
18651while n>min_quarterword do
18652  begin decr(n);
18653  @<Append tabskip glue and an empty box to list |u|,
18654    and update |s| and |t| as the prototype nodes are passed@>;
18655  end;
18656if mode=-vmode then
18657  @<Make the unset node |r| into an |hlist_node| of width |w|,
18658    setting the glue as if the width were |t|@>
18659else @<Make the unset node |r| into a |vlist_node| of height |w|,
18660    setting the glue as if the height were |t|@>;
18661shift_amount(r):=0;
18662if u<>hold_head then {append blank boxes to account for spanned nodes}
18663  begin link(u):=link(r); link(r):=link(hold_head); r:=u;
18664  end
18665
18666@ @<Append tabskip glue and an empty box to list |u|...@>=
18667s:=link(s); v:=glue_ptr(s); link(u):=new_glue(v); u:=link(u);
18668subtype(u):=tab_skip_code+1; t:=t+width(v);
18669if glue_sign(p)=stretching then
18670  begin if stretch_order(v)=glue_order(p) then
18671    t:=t+round(float(glue_set(p))*stretch(v));
18672@^real multiplication@>
18673  end
18674else if glue_sign(p)=shrinking then
18675  begin if shrink_order(v)=glue_order(p) then
18676    t:=t-round(float(glue_set(p))*shrink(v));
18677  end;
18678s:=link(s); link(u):=new_null_box; u:=link(u); t:=t+width(s);
18679if mode=-vmode then width(u):=width(s)@+else
18680  begin type(u):=vlist_node; height(u):=width(s);
18681  end
18682
18683@ @<Make the unset node |r| into an |hlist_node| of width |w|...@>=
18684begin height(r):=height(q); depth(r):=depth(q);
18685if t=width(r) then
18686  begin glue_sign(r):=normal; glue_order(r):=normal;
18687  set_glue_ratio_zero(glue_set(r));
18688  end
18689else if t>width(r) then
18690  begin glue_sign(r):=stretching;
18691  if glue_stretch(r)=0 then set_glue_ratio_zero(glue_set(r))
18692  else glue_set(r):=unfloat((t-width(r))/glue_stretch(r));
18693@^real division@>
18694  end
18695else  begin glue_order(r):=glue_sign(r); glue_sign(r):=shrinking;
18696  if glue_shrink(r)=0 then set_glue_ratio_zero(glue_set(r))
18697  else if (glue_order(r)=normal)and(width(r)-t>glue_shrink(r)) then
18698    set_glue_ratio_one(glue_set(r))
18699  else glue_set(r):=unfloat((width(r)-t)/glue_shrink(r));
18700  end;
18701width(r):=w; type(r):=hlist_node;
18702end
18703
18704@ @<Make the unset node |r| into a |vlist_node| of height |w|...@>=
18705begin width(r):=width(q);
18706if t=height(r) then
18707  begin glue_sign(r):=normal; glue_order(r):=normal;
18708  set_glue_ratio_zero(glue_set(r));
18709  end
18710else if t>height(r) then
18711  begin glue_sign(r):=stretching;
18712  if glue_stretch(r)=0 then set_glue_ratio_zero(glue_set(r))
18713  else glue_set(r):=unfloat((t-height(r))/glue_stretch(r));
18714@^real division@>
18715  end
18716else  begin glue_order(r):=glue_sign(r); glue_sign(r):=shrinking;
18717  if glue_shrink(r)=0 then set_glue_ratio_zero(glue_set(r))
18718  else if (glue_order(r)=normal)and(height(r)-t>glue_shrink(r)) then
18719    set_glue_ratio_one(glue_set(r))
18720  else glue_set(r):=unfloat((height(r)-t)/glue_shrink(r));
18721  end;
18722height(r):=w; type(r):=vlist_node;
18723end
18724
18725@ We now have a completed alignment, in the list that starts at |head|
18726and ends at |tail|. This list will be merged with the one that encloses
18727it. (In case the enclosing mode is |mmode|, for displayed formulas,
18728we will need to insert glue before and after the display; that part of the
18729program will be deferred until we're more familiar with such operations.)
18730
18731In restricted horizontal mode, the |clang| part of |aux| is undefined;
18732an over-cautious \PASCAL\ runtime system may complain about this.
18733@^dirty \PASCAL@>
18734
18735@<Insert the \(c)current list into its environment@>=
18736aux_save:=aux; p:=link(head); q:=tail; pop_nest;
18737if mode=mmode then @<Finish an alignment in a display@>
18738else  begin aux:=aux_save; link(tail):=p;
18739  if p<>null then tail:=q;
18740  if mode=vmode then build_page;
18741  end
18742
18743@* \[38] Breaking paragraphs into lines.
18744We come now to what is probably the most interesting algorithm of \TeX:
18745the mechanism for choosing the ``best possible'' breakpoints that yield
18746the individual lines of a paragraph. \TeX's line-breaking algorithm takes
18747a given horizontal list and converts it to a sequence of boxes that are
18748appended to the current vertical list. In the course of doing this, it
18749creates a special data structure containing three kinds of records that are
18750not used elsewhere in \TeX. Such nodes are created while a paragraph is
18751being processed, and they are destroyed afterwards; thus, the other parts
18752of \TeX\ do not need to know anything about how line-breaking is done.
18753
18754The method used here is based on an approach devised by Michael F. Plass and
18755@^Plass, Michael Frederick@>
18756@^Knuth, Donald Ervin@>
18757the author in 1977, subsequently generalized and improved by the same two
18758people in 1980. A detailed discussion appears in {\sl SOFTWARE---Practice
18759\AM\ Experience \bf11} (1981), 1119--1184, where it is shown that the
18760line-breaking problem can be regarded as a special case of the problem of
18761computing the shortest path in an acyclic network. The cited paper includes
18762numerous examples and describes the history of line breaking as it has been
18763practiced by printers through the ages. The present implementation adds two
18764new ideas to the algorithm of 1980: Memory space requirements are considerably
18765reduced by using smaller records for inactive nodes than for active ones,
18766and arithmetic overflow is avoided by using ``delta distances'' instead of
18767keeping track of the total distance from the beginning of the paragraph to the
18768current point.
18769
18770@ The |line_break| procedure should be invoked only in horizontal mode; it
18771leaves that mode and places its output into the current vlist of the
18772enclosing vertical mode (or internal vertical mode).
18773There is one explicit parameter:  |d| is true for partial paragraphs
18774preceding display math mode; in this case the amount of additional
18775penalty inserted before the final line is |display_widow_penalty|
18776instead of |widow_penalty|.
18777
18778There are also a number of implicit parameters: The hlist to be broken
18779starts at |link(head)|, and it is nonempty. The value of |prev_graf| in the
18780enclosing semantic level tells where the paragraph should begin in the
18781sequence of line numbers, in case hanging indentation or \.{\\parshape}
18782is in use; |prev_graf| is zero unless this paragraph is being continued
18783after a displayed formula.  Other implicit parameters, such as the
18784|par_shape_ptr| and various penalties to use for hyphenation, etc., appear
18785in |eqtb|.
18786
18787After |line_break| has acted, it will have updated the current vlist and the
18788value of |prev_graf|. Furthermore, the global variable |just_box| will
18789point to the final box created by |line_break|, so that the width of this
18790line can be ascertained when it is necessary to decide whether to use
18791|above_display_skip| or |above_display_short_skip| before a displayed formula.
18792
18793@<Glob...@>=
18794@!just_box:pointer; {the |hlist_node| for the last line of the new paragraph}
18795
18796@ Since |line_break| is a rather lengthy procedure---sort of a small world unto
18797itself---we must build it up little by little, somewhat more cautiously
18798than we have done with the simpler procedures of \TeX. Here is the
18799general outline.
18800
18801@p@t\4@>@<Declare subprocedures for |line_break|@>
18802procedure line_break(@!d:boolean);
18803label done,done1,done2,done3,done4,done5,done6,continue, restart;
18804var @<Local variables for line breaking@>@;
18805begin pack_begin_line:=mode_line; {this is for over/underfull box messages}
18806@<Get ready to start line breaking@>;
18807@<Find optimal breakpoints@>;
18808@<Break the paragraph at the chosen breakpoints, justify the resulting lines
18809to the correct widths, and append them to the current vertical list@>;
18810@<Clean up the memory by removing the break nodes@>;
18811pack_begin_line:=0;
18812end;
18813@#
18814@t\4@>@<Declare \eTeX\ procedures for use by |main_control|@>
18815
18816@ The first task is to move the list from |head| to |temp_head| and go
18817into the enclosing semantic level. We also append the \.{\\parfillskip}
18818glue to the end of the paragraph, removing a space (or other glue node) if
18819it was there, since spaces usually precede blank lines and instances of
18820`\.{\$\$}'. The |par_fill_skip| is preceded by an infinite penalty, so
18821it will never be considered as a potential breakpoint.
18822
18823This code assumes that a |glue_node| and a |penalty_node| occupy the
18824same number of |mem|~words.
18825@^data structure assumptions@>
18826
18827@<Get ready to start...@>=
18828link(temp_head):=link(head);
18829if is_char_node(tail) then tail_append(new_penalty(inf_penalty))
18830else if type(tail)<>glue_node then tail_append(new_penalty(inf_penalty))
18831else  begin type(tail):=penalty_node; delete_glue_ref(glue_ptr(tail));
18832  flush_node_list(leader_ptr(tail)); penalty(tail):=inf_penalty;
18833  end;
18834link(tail):=new_param_glue(par_fill_skip_code);
18835last_line_fill:=link(tail);
18836init_cur_lang:=prev_graf mod @'200000;
18837init_l_hyf:=prev_graf div @'20000000;
18838init_r_hyf:=(prev_graf div @'200000) mod @'100;
18839pop_nest;
18840
18841@ When looking for optimal line breaks, \TeX\ creates a ``break node'' for
18842each break that is {\sl feasible}, in the sense that there is a way to end
18843a line at the given place without requiring any line to stretch more than
18844a given tolerance. A break node is characterized by three things: the position
18845of the break (which is a pointer to a |glue_node|, |math_node|, |penalty_node|,
18846or |disc_node|); the ordinal number of the line that will follow this
18847breakpoint; and the fitness classification of the line that has just
18848ended, i.e., |tight_fit|, |decent_fit|, |loose_fit|, or |very_loose_fit|.
18849
18850@d tight_fit=3 {fitness classification for lines shrinking 0.5 to 1.0 of their
18851  shrinkability}
18852@d loose_fit=1 {fitness classification for lines stretching 0.5 to 1.0 of their
18853  stretchability}
18854@d very_loose_fit=0 {fitness classification for lines stretching more than
18855  their stretchability}
18856@d decent_fit=2 {fitness classification for all other lines}
18857
18858@ The algorithm essentially determines the best possible way to achieve
18859each feasible combination of position, line, and fitness. Thus, it answers
18860questions like, ``What is the best way to break the opening part of the
18861paragraph so that the fourth line is a tight line ending at such-and-such
18862a place?'' However, the fact that all lines are to be the same length
18863after a certain point makes it possible to regard all sufficiently large
18864line numbers as equivalent, when the looseness parameter is zero, and this
18865makes it possible for the algorithm to save space and time.
18866
18867An ``active node'' and a ``passive node'' are created in |mem| for each
18868feasible breakpoint that needs to be considered. Active nodes are three
18869words long and passive nodes are two words long. We need active nodes only
18870for breakpoints near the place in the paragraph that is currently being
18871examined, so they are recycled within a comparatively short time after
18872they are created.
18873
18874@ An active node for a given breakpoint contains six fields:
18875
18876\yskip\hang|link| points to the next node in the list of active nodes; the
18877last active node has |link=last_active|.
18878
18879\yskip\hang|break_node| points to the passive node associated with this
18880breakpoint.
18881
18882\yskip\hang|line_number| is the number of the line that follows this
18883breakpoint.
18884
18885\yskip\hang|fitness| is the fitness classification of the line ending at this
18886breakpoint.
18887
18888\yskip\hang|type| is either |hyphenated| or |unhyphenated|, depending on
18889whether this breakpoint is a |disc_node|.
18890
18891\yskip\hang|total_demerits| is the minimum possible sum of demerits over all
18892lines leading from the beginning of the paragraph to this breakpoint.
18893
18894\yskip\noindent
18895The value of |link(active)| points to the first active node on a linked list
18896of all currently active nodes. This list is in order by |line_number|,
18897except that nodes with |line_number>easy_line| may be in any order relative
18898to each other.
18899
18900@d active_node_size_normal=3 {number of words in normal active nodes}
18901@d fitness==subtype {|very_loose_fit..tight_fit| on final line for this break}
18902@d break_node==rlink {pointer to the corresponding passive node}
18903@d line_number==llink {line that begins at this breakpoint}
18904@d total_demerits(#)==mem[#+2].int {the quantity that \TeX\ minimizes}
18905@d unhyphenated=0 {the |type| of a normal active break node}
18906@d hyphenated=1 {the |type| of an active node that breaks at a |disc_node|}
18907@d last_active==active {the active list ends where it begins}
18908
18909@ @<Initialize the special list heads...@>=
18910type(last_active):=hyphenated; line_number(last_active):=max_halfword;
18911subtype(last_active):=0; {the |subtype| is never examined by the algorithm}
18912
18913@ The passive node for a given breakpoint contains only four fields:
18914
18915\yskip\hang|link| points to the passive node created just before this one,
18916if any, otherwise it is |null|.
18917
18918\yskip\hang|cur_break| points to the position of this breakpoint in the
18919horizontal list for the paragraph being broken.
18920
18921\yskip\hang|prev_break| points to the passive node that should precede this
18922one in an optimal path to this breakpoint.
18923
18924\yskip\hang|serial| is equal to |n| if this passive node is the |n|th
18925one created during the current pass. (This field is used only when
18926printing out detailed statistics about the line-breaking calculations.)
18927
18928\yskip\noindent
18929There is a global variable called |passive| that points to the most
18930recently created passive node. Another global variable, |printed_node|,
18931is used to help print out the paragraph when detailed information about
18932the line-breaking computation is being displayed.
18933
18934@d passive_node_size=2 {number of words in passive nodes}
18935@d cur_break==rlink {in passive node, points to position of this breakpoint}
18936@d prev_break==llink {points to passive node that should precede this one}
18937@d serial==info {serial number for symbolic identification}
18938
18939@<Glob...@>=
18940@!passive:pointer; {most recent node on passive list}
18941@!printed_node:pointer; {most recent node that has been printed}
18942@!pass_number:halfword; {the number of passive nodes allocated on this pass}
18943
18944@ The active list also contains ``delta'' nodes that help the algorithm
18945compute the badness of individual lines. Such nodes appear only between two
18946active nodes, and they have |type=delta_node|. If |p| and |r| are active nodes
18947and if |q| is a delta node between them, so that |link(p)=q| and |link(q)=r|,
18948then |q| tells the space difference between lines in the horizontal list that
18949start after breakpoint |p| and lines that start after breakpoint |r|. In
18950other words, if we know the length of the line that starts after |p| and
18951ends at our current position, then the corresponding length of the line that
18952starts after |r| is obtained by adding the amounts in node~|q|. A delta node
18953contains six scaled numbers, since it must record the net change in glue
18954stretchability with respect to all orders of infinity. The natural width
18955difference appears in |mem[q+1].sc|; the stretch differences in units of
18956pt, fil, fill, and filll appear in |mem[q+2..q+5].sc|; and the shrink difference
18957appears in |mem[q+6].sc|. The |subtype| field of a delta node is not used.
18958
18959@d delta_node_size=7 {number of words in a delta node}
18960@d delta_node=2 {|type| field in a delta node}
18961
18962@ As the algorithm runs, it maintains a set of six delta-like registers
18963for the length of the line following the first active breakpoint to the
18964current position in the given hlist. When it makes a pass through the
18965active list, it also maintains a similar set of six registers for the
18966length following the active breakpoint of current interest. A third set
18967holds the length of an empty line (namely, the sum of \.{\\leftskip} and
18968\.{\\rightskip}); and a fourth set is used to create new delta nodes.
18969
18970When we pass a delta node we want to do operations like
18971$$\hbox{\ignorespaces|for
18972k:=1 to 6 do cur_active_width[k]:=cur_active_width[k]+mem[q+k].sc|};$$ and we
18973want to do this without the overhead of |for| loops. The |do_all_six|
18974macro makes such six-tuples convenient.
18975
18976@d do_all_six(#)==#(1);#(2);#(3);#(4);#(5);#(6)
18977
18978@<Glob...@>=
18979@!active_width:array[1..6] of scaled;
18980  {distance from first active node to~|cur_p|}
18981@!cur_active_width:array[1..6] of scaled; {distance from current active node}
18982@!background:array[1..6] of scaled; {length of an ``empty'' line}
18983@!break_width:array[1..6] of scaled; {length being computed after current break}
18984
18985@ Let's state the principles of the delta nodes more precisely and concisely,
18986so that the following programs will be less obscure. For each legal
18987breakpoint~|p| in the paragraph, we define two quantities $\alpha(p)$ and
18988$\beta(p)$ such that the length of material in a line from breakpoint~|p|
18989to breakpoint~|q| is $\gamma+\beta(q)-\alpha(p)$, for some fixed $\gamma$.
18990Intuitively, $\alpha(p)$ and $\beta(q)$ are the total length of material from
18991the beginning of the paragraph to a point ``after'' a break at |p| and to a
18992point ``before'' a break at |q|; and $\gamma$ is the width of an empty line,
18993namely the length contributed by \.{\\leftskip} and \.{\\rightskip}.
18994
18995Suppose, for example, that the paragraph consists entirely of alternating
18996boxes and glue skips; let the boxes have widths $x_1\ldots x_n$ and
18997let the skips have widths $y_1\ldots y_n$, so that the paragraph can be
18998represented by $x_1y_1\ldots x_ny_n$. Let $p_i$ be the legal breakpoint
18999at $y_i$; then $\alpha(p_i)=x_1+y_1+\cdots+x_i+y_i$, and $\beta(p_i)=
19000x_1+y_1+\cdots+x_i$. To check this, note that the length of material from
19001$p_2$ to $p_5$, say, is $\gamma+x_3+y_3+x_4+y_4+x_5=\gamma+\beta(p_5)
19002-\alpha(p_2)$.
19003
19004The quantities $\alpha$, $\beta$, $\gamma$ involve glue stretchability and
19005shrinkability as well as a natural width. If we were to compute $\alpha(p)$
19006and $\beta(p)$ for each |p|, we would need multiple precision arithmetic, and
19007the multiprecise numbers would have to be kept in the active nodes.
19008\TeX\ avoids this problem by working entirely with relative differences
19009or ``deltas.'' Suppose, for example, that the active list contains
19010$a_1\,\delta_1\,a_2\,\delta_2\,a_3$, where the |a|'s are active breakpoints
19011and the $\delta$'s are delta nodes. Then $\delta_1=\alpha(a_1)-\alpha(a_2)$
19012and $\delta_2=\alpha(a_2)-\alpha(a_3)$. If the line breaking algorithm is
19013currently positioned at some other breakpoint |p|, the |active_width| array
19014contains the value $\gamma+\beta(p)-\alpha(a_1)$. If we are scanning through
19015the list of active nodes and considering a tentative line that runs from
19016$a_2$ to~|p|, say, the |cur_active_width| array will contain the value
19017$\gamma+\beta(p)-\alpha(a_2)$. Thus, when we move from $a_2$ to $a_3$,
19018we want to add $\alpha(a_2)-\alpha(a_3)$ to |cur_active_width|; and this
19019is just $\delta_2$, which appears in the active list between $a_2$ and
19020$a_3$. The |background| array contains $\gamma$. The |break_width| array
19021will be used to calculate values of new delta nodes when the active
19022list is being updated.
19023
19024@ Glue nodes in a horizontal list that is being paragraphed are not supposed to
19025include ``infinite'' shrinkability; that is why the algorithm maintains
19026four registers for stretching but only one for shrinking. If the user tries to
19027introduce infinite shrinkability, the shrinkability will be reset to finite
19028and an error message will be issued. A boolean variable |no_shrink_error_yet|
19029prevents this error message from appearing more than once per paragraph.
19030
19031@d check_shrinkage(#)==if (shrink_order(#)<>normal)and(shrink(#)<>0) then
19032  begin #:=finite_shrink(#);
19033  end
19034
19035@<Glob...@>=
19036@!no_shrink_error_yet:boolean; {have we complained about infinite shrinkage?}
19037
19038@ @<Declare subprocedures for |line_break|@>=
19039function finite_shrink(@!p:pointer):pointer; {recovers from infinite shrinkage}
19040var q:pointer; {new glue specification}
19041begin if no_shrink_error_yet then
19042  begin no_shrink_error_yet:=false;
19043  print_err("Infinite glue shrinkage found in a paragraph");
19044@.Infinite glue shrinkage...@>
19045  help5("The paragraph just ended includes some glue that has")@/
19046  ("infinite shrinkability, e.g., `\hskip 0pt minus 1fil'.")@/
19047  ("Such glue doesn't belong there---it allows a paragraph")@/
19048  ("of any length to fit on one line. But it's safe to proceed,")@/
19049  ("since the offensive shrinkability has been made finite.");
19050  error;
19051  end;
19052q:=new_spec(p); shrink_order(q):=normal;
19053delete_glue_ref(p); finite_shrink:=q;
19054end;
19055
19056@ @<Get ready to start...@>=
19057no_shrink_error_yet:=true;@/
19058check_shrinkage(left_skip); check_shrinkage(right_skip);@/
19059q:=left_skip; r:=right_skip; background[1]:=width(q)+width(r);@/
19060background[2]:=0; background[3]:=0; background[4]:=0; background[5]:=0;@/
19061background[2+stretch_order(q)]:=stretch(q);@/
19062background[2+stretch_order(r)]:=@|background[2+stretch_order(r)]+stretch(r);@/
19063background[6]:=shrink(q)+shrink(r);
19064@<Check for special treatment of last line of paragraph@>;
19065
19066@ A pointer variable |cur_p| runs through the given horizontal list as we look
19067for breakpoints. This variable is global, since it is used both by |line_break|
19068and by its subprocedure |try_break|.
19069
19070Another global variable called |threshold| is used to determine the feasibility
19071of individual lines: Breakpoints are feasible if there is a way to reach
19072them without creating lines whose badness exceeds |threshold|.  (The
19073badness is compared to |threshold| before penalties are added, so that
19074penalty values do not affect the feasibility of breakpoints, except that
19075no break is allowed when the penalty is 10000 or more.) If |threshold|
19076is 10000 or more, all legal breaks are considered feasible, since the
19077|badness| function specified above never returns a value greater than~10000.
19078
19079Up to three passes might be made through the paragraph in an attempt to find at
19080least one set of feasible breakpoints. On the first pass, we have
19081|threshold=pretolerance| and |second_pass=final_pass=false|.
19082If this pass fails to find a
19083feasible solution, |threshold| is set to |tolerance|, |second_pass| is set
19084|true|, and an attempt is made to hyphenate as many words as possible.
19085If that fails too, we add |emergency_stretch| to the background
19086stretchability and set |final_pass=true|.
19087
19088@<Glob...@>=
19089@!cur_p:pointer; {the current breakpoint under consideration}
19090@!second_pass:boolean; {is this our second attempt to break this paragraph?}
19091@!final_pass:boolean; {is this our final attempt to break this paragraph?}
19092@!threshold:integer; {maximum badness on feasible lines}
19093
19094@ The heart of the line-breaking procedure is `|try_break|', a subroutine
19095that tests if the current breakpoint |cur_p| is feasible, by running
19096through the active list to see what lines of text can be made from active
19097nodes to~|cur_p|.  If feasible breaks are possible, new break nodes are
19098created.  If |cur_p| is too far from an active node, that node is
19099deactivated.
19100
19101The parameter |pi| to |try_break| is the penalty associated
19102with a break at |cur_p|; we have |pi=eject_penalty| if the break is forced,
19103and |pi=inf_penalty| if the break is illegal.
19104
19105The other parameter, |break_type|, is set to |hyphenated| or |unhyphenated|,
19106depending on whether or not the current break is at a |disc_node|. The
19107end of a paragraph is also regarded as `|hyphenated|'; this case is
19108distinguishable by the condition |cur_p=null|.
19109
19110@d copy_to_cur_active(#)==cur_active_width[#]:=active_width[#]
19111@d deactivate=60 {go here when node |r| should be deactivated}
19112
19113@d cp_skipable(#) == {skipable nodes at the margins during character protrusion}
19114(
19115    not is_char_node(#) and
19116    (
19117        (type(#) = ins_node)
19118        or (type(#) = mark_node)
19119        or (type(#) = adjust_node)
19120        or (type(#) = penalty_node)
19121        or ((type(#) = disc_node) and
19122            (pre_break(#) = null) and
19123            (post_break(#) = null) and
19124            (replace_count(#) = 0)) {an empty |disc_node|}
19125        or ((type(#) = math_node) and (width(#) = 0))
19126        or ((type(#) = kern_node) and
19127            ((width(#) = 0) or (subtype(#) = normal)))
19128        or ((type(#) = glue_node) and (glue_ptr(#) = zero_glue))
19129        or ((type(#) = hlist_node) and (width(#) = 0) and (height(#) = 0) and
19130            (depth(#) = 0) and (list_ptr(#) = null))
19131    )
19132)
19133
19134@<Declare subprocedures for |line_break|@>=
19135procedure push_node(p: pointer);
19136begin
19137  if hlist_stack_level > max_hlist_stack then
19138    pdf_error("push_node", "stack overflow");
19139  hlist_stack[hlist_stack_level]:=p;
19140  hlist_stack_level:=hlist_stack_level + 1;
19141end;
19142
19143function pop_node: pointer;
19144begin
19145  hlist_stack_level:=hlist_stack_level - 1;
19146  if hlist_stack_level < 0 then {would point to some bug}
19147    pdf_error("pop_node", "stack underflow (internal error)");
19148  pop_node:=hlist_stack[hlist_stack_level];
19149end;
19150
19151function find_protchar_left(l: pointer; d: boolean): pointer;
19152{searches left to right from list head |l|, returns 1st non-skipable item}
19153var t: pointer;
19154  run: boolean;
19155begin
19156  if (link(l) <> null) and (type(l) = hlist_node) and (width(l) = 0)
19157    and (height(l) = 0) and (depth(l) = 0) and (list_ptr(l) = null) then
19158    l:=link(l) {for paragraph start with \.{\\parindent = 0pt}}
19159  else if d then
19160    while (link(l) <> null) and (not (is_char_node(l) or non_discardable(l))) do
19161      l:=link(l); {std.\ discardables at line break, \TeX book, p 95}
19162  hlist_stack_level:=0;
19163  run:=true;
19164  repeat
19165    t:=l;
19166    while run and (type(l) = hlist_node) and (list_ptr(l) <> null) do begin
19167      push_node(l);
19168      l:=list_ptr(l);
19169    end;
19170    while run and cp_skipable(l) do begin
19171      while (link(l) = null) and (hlist_stack_level > 0) do begin
19172        l:=pop_node; {don't visit this node again}
19173      end;
19174      if link(l) <> null then
19175        l:=link(l)
19176      else if hlist_stack_level = 0 then run:=false
19177    end;
19178  until t = l;
19179  find_protchar_left:=l;
19180end;
19181
19182function find_protchar_right(l, r: pointer): pointer;
19183{searches right to left from list tail |r| to head |l|, returns 1st non-skipable item}
19184var t: pointer;
19185  run: boolean;
19186begin
19187  find_protchar_right:=null;
19188  if r = null then return;
19189  hlist_stack_level:=0;
19190  run:=true;
19191  repeat
19192    t:=r;
19193    while run and (type(r) = hlist_node) and (list_ptr(r) <> null) do begin
19194      push_node(l);
19195      push_node(r);
19196      l:=list_ptr(r);
19197      r:=l;
19198      while link(r) <> null do
19199        r:=link(r);
19200    end;
19201    while run and cp_skipable(r) do begin
19202      while (r = l) and (hlist_stack_level > 0) do begin
19203        r:=pop_node; {don't visit this node again}
19204        l:=pop_node;
19205      end;
19206      if (r <> l) and (r <> null) then
19207        r:=prev_rightmost(l, r)
19208      else if (r = l) and (hlist_stack_level = 0) then run:=false
19209    end;
19210  until t = r;
19211  find_protchar_right:=r;
19212end;
19213
19214function total_pw(q, p: pointer): scaled;
19215{returns the total width of character protrusion of a line;
19216|cur_break(break_node(q))| and |p| is the leftmost resp. rightmost node in the
19217horizontal list representing the actual line}
19218var l, r: pointer;
19219  n: integer;
19220begin
19221  if break_node(q) = null then
19222    l:=first_p
19223  else
19224    l:=cur_break(break_node(q));
19225  r:=prev_rightmost(global_prev_p, p); {get |link(r)=p|}
19226  {let's look at the right margin first}
19227  if (p <> null) and (type(p) = disc_node) and (pre_break(p) <> null) then
19228  {a |disc_node| with non-empty |pre_break|, protrude the last char of |pre_break|}
19229  begin
19230    r:=pre_break(p);
19231    while link(r) <> null do
19232      r:=link(r);
19233  end else r:=find_protchar_right(l, r);
19234  {now the left margin}
19235  if (l <> null) and (type(l) = disc_node) then begin
19236    if post_break(l) <> null then begin
19237      l:=post_break(l); {protrude the first char}
19238      goto done;
19239    end else {discard |replace_count(l)| nodes}
19240    begin
19241      n:=replace_count(l);
19242      l:=link(l);
19243      while n > 0 do begin
19244        if link(l) <> null then
19245          l:=link(l);
19246        decr(n);
19247      end;
19248    end;
19249  end;
19250  l:=find_protchar_left(l, true);
19251done:
19252  total_pw:=left_pw(l) + right_pw(r);
19253end;
19254procedure try_break(@!pi:integer;@!break_type:small_number);
19255label exit,done,done1,continue,deactivate,found,not_found;
19256var r:pointer; {runs through the active list}
19257@!prev_r:pointer; {stays a step behind |r|}
19258@!old_l:halfword; {maximum line number in current equivalence class of lines}
19259@!no_break_yet:boolean; {have we found a feasible break at |cur_p|?}
19260@<Other local variables for |try_break|@>@;
19261begin @<Make sure that |pi| is in the proper range@>;
19262no_break_yet:=true; prev_r:=active; old_l:=0;
19263do_all_six(copy_to_cur_active);
19264loop@+  begin continue: r:=link(prev_r);
19265  @<If node |r| is of type |delta_node|, update |cur_active_width|,
19266    set |prev_r| and |prev_prev_r|, then |goto continue|@>;
19267  @<If a line number class has ended, create new active nodes for
19268    the best feasible breaks in that class; then |return|
19269    if |r=last_active|, otherwise compute the new |line_width|@>;
19270  @<Consider the demerits for a line from |r| to |cur_p|;
19271    deactivate node |r| if it should no longer be active;
19272    then |goto continue| if a line from |r| to |cur_p| is infeasible,
19273    otherwise record a new feasible break@>;
19274  end;
19275exit: @!stat @<Update the value of |printed_node| for
19276  symbolic displays@>@+tats@;
19277end;
19278
19279@ @<Other local variables for |try_break|@>=
19280@!prev_prev_r:pointer; {a step behind |prev_r|, if |type(prev_r)=delta_node|}
19281@!s:pointer; {runs through nodes ahead of |cur_p|}
19282@!q:pointer; {points to a new node being created}
19283@!v:pointer; {points to a glue specification or a node ahead of |cur_p|}
19284@!t:integer; {node count, if |cur_p| is a discretionary node}
19285@!f:internal_font_number; {used in character width calculation}
19286@!l:halfword; {line number of current active node}
19287@!node_r_stays_active:boolean; {should node |r| remain in the active list?}
19288@!line_width:scaled; {the current line will be justified to this width}
19289@!fit_class:very_loose_fit..tight_fit; {possible fitness class of test line}
19290@!b:halfword; {badness of test line}
19291@!d:integer; {demerits of test line}
19292@!artificial_demerits:boolean; {has |d| been forced to zero?}
19293@!save_link:pointer; {temporarily holds value of |link(cur_p)|}
19294@!shortfall:scaled; {used in badness calculations}
19295
19296@ @<Make sure that |pi| is in the proper range@>=
19297if abs(pi)>=inf_penalty then
19298  if pi>0 then return {this breakpoint is inhibited by infinite penalty}
19299  else pi:=eject_penalty {this breakpoint will be forced}
19300
19301@ The following code uses the fact that |type(last_active)<>delta_node|.
19302
19303@d update_width(#)==@|
19304  cur_active_width[#]:=cur_active_width[#]+mem[r+#].sc
19305
19306@<If node |r|...@>=
19307@^inner loop@>
19308if type(r)=delta_node then
19309  begin do_all_six(update_width);
19310  prev_prev_r:=prev_r; prev_r:=r; goto continue;
19311  end
19312
19313@ As we consider various ways to end a line at |cur_p|, in a given line number
19314class, we keep track of the best total demerits known, in an array with
19315one entry for each of the fitness classifications. For example,
19316|minimal_demerits[tight_fit]| contains the fewest total demerits of feasible
19317line breaks ending at |cur_p| with a |tight_fit| line; |best_place[tight_fit]|
19318points to the passive node for the break before~|cur_p| that achieves such
19319an optimum; and |best_pl_line[tight_fit]| is the |line_number| field in the
19320active node corresponding to |best_place[tight_fit]|. When no feasible break
19321sequence is known, the |minimal_demerits| entries will be equal to
19322|awful_bad|, which is $2^{30}-1$. Another variable, |minimum_demerits|,
19323keeps track of the smallest value in the |minimal_demerits| array.
19324
19325@d awful_bad==@'7777777777 {more than a billion demerits}
19326
19327@<Global...@>=
19328@!minimal_demerits:array[very_loose_fit..tight_fit] of integer; {best total
19329  demerits known for current line class and position, given the fitness}
19330@!minimum_demerits:integer; {best total demerits known for current line class
19331  and position}
19332@!best_place:array[very_loose_fit..tight_fit] of pointer; {how to achieve
19333  |minimal_demerits|}
19334@!best_pl_line:array[very_loose_fit..tight_fit] of halfword; {corresponding
19335  line number}
19336
19337@ @<Get ready to start...@>=
19338minimum_demerits:=awful_bad;
19339minimal_demerits[tight_fit]:=awful_bad;
19340minimal_demerits[decent_fit]:=awful_bad;
19341minimal_demerits[loose_fit]:=awful_bad;
19342minimal_demerits[very_loose_fit]:=awful_bad;
19343
19344@ The first part of the following code is part of \TeX's inner loop, so
19345we don't want to waste any time. The current active node, namely node |r|,
19346contains the line number that will be considered next. At the end of the
19347list we have arranged the data structure so that |r=last_active| and
19348|line_number(last_active)>old_l|.
19349@^inner loop@>
19350
19351@<If a line number class...@>=
19352begin l:=line_number(r);
19353if l>old_l then
19354  begin {now we are no longer in the inner loop}
19355  if (minimum_demerits<awful_bad)and@|
19356      ((old_l<>easy_line)or(r=last_active)) then
19357    @<Create new active nodes for the best feasible breaks
19358      just found@>;
19359  if r=last_active then return;
19360  @<Compute the new line width@>;
19361  end;
19362end
19363
19364@ It is not necessary to create new active nodes having |minimal_demerits|
19365greater than
19366|minimum_demerits+abs(adj_demerits)|, since such active nodes will never
19367be chosen in the final paragraph breaks. This observation allows us to
19368omit a substantial number of feasible breakpoints from further consideration.
19369
19370@<Create new active nodes...@>=
19371begin if no_break_yet then @<Compute the values of |break_width|@>;
19372@<Insert a delta node to prepare for breaks at |cur_p|@>;
19373if abs(adj_demerits)>=awful_bad-minimum_demerits then
19374  minimum_demerits:=awful_bad-1
19375else minimum_demerits:=minimum_demerits+abs(adj_demerits);
19376for fit_class:=very_loose_fit to tight_fit do
19377  begin if minimal_demerits[fit_class]<=minimum_demerits then
19378    @<Insert a new active node
19379      from |best_place[fit_class]| to |cur_p|@>;
19380  minimal_demerits[fit_class]:=awful_bad;
19381  end;
19382minimum_demerits:=awful_bad;
19383@<Insert a delta node to prepare for the next active node@>;
19384end
19385
19386@ When we insert a new active node for a break at |cur_p|, suppose this
19387new node is to be placed just before active node |a|; then we essentially
19388want to insert `$\delta\,|cur_p|\,\delta^\prime$' before |a|, where
19389$\delta=\alpha(a)-\alpha(|cur_p|)$ and $\delta^\prime=\alpha(|cur_p|)-\alpha(a)$
19390in the notation explained above.  The |cur_active_width| array now holds
19391$\gamma+\beta(|cur_p|)-\alpha(a)$; so $\delta$ can be obtained by
19392subtracting |cur_active_width| from the quantity $\gamma+\beta(|cur_p|)-
19393\alpha(|cur_p|)$. The latter quantity can be regarded as the length of a
19394line ``from |cur_p| to |cur_p|''; we call it the |break_width| at |cur_p|.
19395
19396The |break_width| is usually negative, since it consists of the background
19397(which is normally zero) minus the width of nodes following~|cur_p| that are
19398eliminated after a break. If, for example, node |cur_p| is a glue node, the
19399width of this glue is subtracted from the background; and we also look
19400ahead to eliminate all subsequent glue and penalty and kern and math
19401nodes, subtracting their widths as well.
19402
19403Kern nodes do not disappear at a line break unless they are |explicit|.
19404
19405@d set_break_width_to_background(#)==break_width[#]:=background[#]
19406
19407@<Compute the values of |break...@>=
19408begin no_break_yet:=false; do_all_six(set_break_width_to_background);
19409s:=cur_p;
19410if break_type>unhyphenated then if cur_p<>null then
19411  @<Compute the discretionary |break_width| values@>;
19412while s<>null do
19413  begin if is_char_node(s) then goto done;
19414  case type(s) of
19415  glue_node:@<Subtract glue from |break_width|@>;
19416  penalty_node: do_nothing;
19417  math_node: break_width[1]:=break_width[1]-width(s);
19418  kern_node: if subtype(s)<>explicit then goto done
19419    else break_width[1]:=break_width[1]-width(s);
19420  othercases goto done
19421  endcases;@/
19422  s:=link(s);
19423  end;
19424done: end
19425
19426@ @<Subtract glue from |break...@>=
19427begin v:=glue_ptr(s); break_width[1]:=break_width[1]-width(v);
19428break_width[2+stretch_order(v)]:=break_width[2+stretch_order(v)]-stretch(v);
19429break_width[6]:=break_width[6]-shrink(v);
19430end
19431
19432@ When |cur_p| is a discretionary break, the length of a line ``from |cur_p| to
19433|cur_p|'' has to be defined properly so that the other calculations work out.
19434Suppose that the pre-break text at |cur_p| has length $l_0$, the post-break
19435text has length $l_1$, and the replacement text has length |l|. Suppose
19436also that |q| is the node following the replacement text. Then length of a
19437line from |cur_p| to |q| will be computed as $\gamma+\beta(q)-\alpha(|cur_p|)$,
19438where $\beta(q)=\beta(|cur_p|)-l_0+l$. The actual length will be the background
19439plus $l_1$, so the length from |cur_p| to |cur_p| should be $\gamma+l_0+l_1-l$.
19440If the post-break text of the discretionary is empty, a break may also
19441discard~|q|; in that unusual case we subtract the length of~|q| and any
19442other nodes that will be discarded after the discretionary break.
19443
19444The value of $l_0$ need not be computed, since |line_break| will put
19445it into the global variable |disc_width| before calling |try_break|.
19446
19447@<Glob...@>=
19448@!disc_width:scaled; {the length of discretionary material preceding a break}
19449
19450@ @<Compute the discretionary |break...@>=
19451begin t:=replace_count(cur_p); v:=cur_p; s:=post_break(cur_p);
19452while t>0 do
19453  begin decr(t); v:=link(v);
19454  @<Subtract the width of node |v| from |break_width|@>;
19455  end;
19456while s<>null do
19457  begin @<Add the width of node |s| to |break_width|@>;
19458  s:=link(s);
19459  end;
19460break_width[1]:=break_width[1]+disc_width;
19461if post_break(cur_p)=null then s:=link(v);
19462          {nodes may be discardable after the break}
19463end
19464
19465@ Replacement texts and discretionary texts are supposed to contain
19466only character nodes, kern nodes, ligature nodes, and box or rule nodes.
19467
19468@<Subtract the width of node |v|...@>=
19469if is_char_node(v) then
19470  begin f:=font(v);
19471  break_width[1]:=break_width[1]-char_width(f)(char_info(f)(character(v)));
19472  end
19473else  case type(v) of
19474  ligature_node: begin f:=font(lig_char(v));@/
19475    xtx_ligature_present:=true;
19476    break_width[1]:=@|break_width[1]-
19477      char_width(f)(char_info(f)(character(lig_char(v))));
19478    end;
19479  hlist_node,vlist_node,rule_node,kern_node:
19480    break_width[1]:=break_width[1]-width(v);
19481  whatsit_node:
19482    if (subtype(v)=native_word_node)
19483    or (subtype(v)=glyph_node)
19484    or (subtype(v)=pic_node)
19485    or (subtype(v)=pdf_node)
19486    then break_width[1]:=break_width[1]-width(v)
19487    else confusion("disc1a");
19488  othercases confusion("disc1")
19489@:this can't happen disc1}{\quad disc1@>
19490  endcases
19491
19492@ @<Add the width of node |s| to |b...@>=
19493if is_char_node(s) then
19494  begin f:=font(s);
19495  break_width[1]:=@|break_width[1]+char_width(f)(char_info(f)(character(s)));
19496  end
19497else  case type(s) of
19498  ligature_node: begin f:=font(lig_char(s));
19499    xtx_ligature_present:=true;
19500    break_width[1]:=break_width[1]+
19501      char_width(f)(char_info(f)(character(lig_char(s))));
19502    end;
19503  hlist_node,vlist_node,rule_node,kern_node:
19504    break_width[1]:=break_width[1]+width(s);
19505  whatsit_node:
19506    if (subtype(s)=native_word_node)
19507    or (subtype(s)=glyph_node)
19508    or (subtype(s)=pic_node)
19509    or (subtype(s)=pdf_node)
19510    then break_width[1]:=break_width[1]+width(s)
19511    else confusion("disc2a");
19512  othercases confusion("disc2")
19513@:this can't happen disc2}{\quad disc2@>
19514  endcases
19515
19516@ We use the fact that |type(active)<>delta_node|.
19517
19518@d convert_to_break_width(#)==@|
19519  mem[prev_r+#].sc:=@|@t\hskip10pt@>mem[prev_r+#].sc
19520  -cur_active_width[#]+break_width[#]
19521@d store_break_width(#)==active_width[#]:=break_width[#]
19522@d new_delta_to_break_width(#)==@|
19523  mem[q+#].sc:=break_width[#]-cur_active_width[#]
19524
19525@<Insert a delta node to prepare for breaks at |cur_p|@>=
19526if type(prev_r)=delta_node then {modify an existing delta node}
19527  begin do_all_six(convert_to_break_width);
19528  end
19529else if prev_r=active then {no delta node needed at the beginning}
19530  begin do_all_six(store_break_width);
19531  end
19532else  begin q:=get_node(delta_node_size); link(q):=r; type(q):=delta_node;@/
19533  subtype(q):=0; {the |subtype| is not used}
19534  do_all_six(new_delta_to_break_width);
19535  link(prev_r):=q; prev_prev_r:=prev_r; prev_r:=q;
19536  end
19537
19538@ When the following code is performed, we will have just inserted at
19539least one active node before |r|, so |type(prev_r)<>delta_node|.
19540
19541@d new_delta_from_break_width(#)==@|mem[q+#].sc:=
19542    cur_active_width[#]-break_width[#]
19543
19544@<Insert a delta node to prepare for the next active node@>=
19545if r<>last_active then
19546  begin q:=get_node(delta_node_size); link(q):=r; type(q):=delta_node;@/
19547  subtype(q):=0; {the |subtype| is not used}
19548  do_all_six(new_delta_from_break_width);
19549  link(prev_r):=q; prev_prev_r:=prev_r; prev_r:=q;
19550  end
19551
19552@ When we create an active node, we also create the corresponding
19553passive node.
19554
19555@<Insert a new active node from |best_place[fit_class]| to |cur_p|@>=
19556begin q:=get_node(passive_node_size);
19557link(q):=passive; passive:=q; cur_break(q):=cur_p;
19558@!stat incr(pass_number); serial(q):=pass_number;@+tats@;@/
19559prev_break(q):=best_place[fit_class];@/
19560q:=get_node(active_node_size); break_node(q):=passive;
19561line_number(q):=best_pl_line[fit_class]+1;
19562fitness(q):=fit_class; type(q):=break_type;
19563total_demerits(q):=minimal_demerits[fit_class];
19564if do_last_line_fit then
19565  @<Store \(a)additional data in the new active node@>;
19566link(q):=r; link(prev_r):=q; prev_r:=q;
19567@!stat if tracing_paragraphs>0 then
19568  @<Print a symbolic description of the new break node@>;
19569tats@;@/
19570end
19571
19572@ @<Print a symbolic description of the new break node@>=
19573begin print_nl("@@@@"); print_int(serial(passive));
19574@.\AT!\AT!@>
19575print(": line "); print_int(line_number(q)-1);
19576print_char("."); print_int(fit_class);
19577if break_type=hyphenated then print_char("-");
19578print(" t="); print_int(total_demerits(q));
19579if do_last_line_fit then @<Print additional data in the new active node@>;
19580print(" -> @@@@");
19581if prev_break(passive)=null then print_char("0")
19582else print_int(serial(prev_break(passive)));
19583end
19584
19585@ The length of lines depends on whether the user has specified
19586\.{\\parshape} or \.{\\hangindent}. If |par_shape_ptr| is not null, it
19587points to a $(2n+1)$-word record in |mem|, where the |info| in the first
19588word contains the value of |n|, and the other $2n$ words contain the left
19589margins and line lengths for the first |n| lines of the paragraph; the
19590specifications for line |n| apply to all subsequent lines. If
19591|par_shape_ptr=null|, the shape of the paragraph depends on the value of
19592|n=hang_after|; if |n>=0|, hanging indentation takes place on lines |n+1|,
19593|n+2|, \dots, otherwise it takes place on lines 1, \dots, $\vert
19594n\vert$. When hanging indentation is active, the left margin is
19595|hang_indent|, if |hang_indent>=0|, else it is 0; the line length is
19596$|hsize|-\vert|hang_indent|\vert$. The normal setting is
19597|par_shape_ptr=null|, |hang_after=1|, and |hang_indent=0|.
19598Note that if |hang_indent=0|, the value of |hang_after| is irrelevant.
19599@^length of lines@> @^hanging indentation@>
19600
19601@<Glob...@>=
19602@!easy_line:halfword; {line numbers |>easy_line| are equivalent in break nodes}
19603@!last_special_line:halfword; {line numbers |>last_special_line| all have
19604  the same width}
19605@!first_width:scaled; {the width of all lines |<=last_special_line|, if
19606  no \.{\\parshape} has been specified}
19607@!second_width:scaled; {the width of all lines |>last_special_line|}
19608@!first_indent:scaled; {left margin to go with |first_width|}
19609@!second_indent:scaled; {left margin to go with |second_width|}
19610
19611@ We compute the values of |easy_line| and the other local variables relating
19612to line length when the |line_break| procedure is initializing itself.
19613
19614@<Get ready to start...@>=
19615if par_shape_ptr=null then
19616  if hang_indent=0 then
19617    begin last_special_line:=0; second_width:=hsize;
19618    second_indent:=0;
19619    end
19620  else @<Set line length parameters in preparation for hanging indentation@>
19621else  begin last_special_line:=info(par_shape_ptr)-1;
19622  second_width:=mem[par_shape_ptr+2*(last_special_line+1)].sc;
19623  second_indent:=mem[par_shape_ptr+2*last_special_line+1].sc;
19624  end;
19625if looseness=0 then easy_line:=last_special_line
19626else easy_line:=max_halfword
19627
19628@ @<Set line length parameters in preparation for hanging indentation@>=
19629begin last_special_line:=abs(hang_after);
19630if hang_after<0 then
19631  begin first_width:=hsize-abs(hang_indent);
19632  if hang_indent>=0 then first_indent:=hang_indent
19633  else first_indent:=0;
19634  second_width:=hsize; second_indent:=0;
19635  end
19636else  begin first_width:=hsize; first_indent:=0;
19637  second_width:=hsize-abs(hang_indent);
19638  if hang_indent>=0 then second_indent:=hang_indent
19639  else second_indent:=0;
19640  end;
19641end
19642
19643@ When we come to the following code, we have just encountered the first
19644active node~|r| whose |line_number| field contains |l|. Thus we want to
19645compute the length of the $l\mskip1mu$th line of the current paragraph. Furthermore,
19646we want to set |old_l| to the last number in the class of line numbers
19647equivalent to~|l|.
19648
19649@<Compute the new line width@>=
19650if l>easy_line then
19651  begin line_width:=second_width; old_l:=max_halfword-1;
19652  end
19653else  begin old_l:=l;
19654  if l>last_special_line then line_width:=second_width
19655  else if par_shape_ptr=null then line_width:=first_width
19656  else line_width:=mem[par_shape_ptr+2*l@,].sc;
19657  end
19658
19659@ The remaining part of |try_break| deals with the calculation of
19660demerits for a break from |r| to |cur_p|.
19661
19662The first thing to do is calculate the badness, |b|. This value will always
19663be between zero and |inf_bad+1|; the latter value occurs only in the
19664case of lines from |r| to |cur_p| that cannot shrink enough to fit the necessary
19665width. In such cases, node |r| will be deactivated.
19666We also deactivate node~|r| when a break at~|cur_p| is forced, since future
19667breaks must go through a forced break.
19668
19669@<Consider the demerits for a line from |r| to |cur_p|...@>=
19670begin artificial_demerits:=false;@/
19671@^inner loop@>
19672shortfall:=line_width-cur_active_width[1]; {we're this much too short}
19673if XeTeX_protrude_chars > 1 then
19674  shortfall:=shortfall + total_pw(r, cur_p);
19675if shortfall>0 then
19676  @<Set the value of |b| to the badness for stretching the line,
19677    and compute the corresponding |fit_class|@>
19678else @<Set the value of |b| to the badness for shrinking the line,
19679    and compute the corresponding |fit_class|@>;
19680if do_last_line_fit then @<Adjust \(t)the additional data for last line@>;
19681found:
19682if (b>inf_bad)or(pi=eject_penalty) then
19683  @<Prepare to deactivate node~|r|, and |goto deactivate| unless
19684    there is a reason to consider lines of text from |r| to |cur_p|@>
19685else  begin prev_r:=r;
19686  if b>threshold then goto continue;
19687  node_r_stays_active:=true;
19688  end;
19689@<Record a new feasible break@>;
19690if node_r_stays_active then goto continue; {|prev_r| has been set to |r|}
19691deactivate: @<Deactivate node |r|@>;
19692end
19693
19694@ When a line must stretch, the available stretchability can be found in the
19695subarray |cur_active_width[2..5]|, in units of points, fil, fill, and filll.
19696
19697The present section is part of \TeX's inner loop, and it is most often performed
19698when the badness is infinite; therefore it is worth while to make a quick
19699test for large width excess and small stretchability, before calling the
19700|badness| subroutine.
19701@^inner loop@>
19702
19703@<Set the value of |b| to the badness for stretching...@>=
19704if (cur_active_width[3]<>0)or(cur_active_width[4]<>0)or@|
19705  (cur_active_width[5]<>0) then
19706  begin if do_last_line_fit then
19707    begin if cur_p=null then {the last line of a paragraph}
19708      @<Perform computations for last line and |goto found|@>;
19709    shortfall:=0;
19710    end;
19711  b:=0; fit_class:=decent_fit; {infinite stretch}
19712  end
19713else  begin if shortfall>7230584 then if cur_active_width[2]<1663497 then
19714    begin b:=inf_bad; fit_class:=very_loose_fit; goto done1;
19715    end;
19716  b:=badness(shortfall,cur_active_width[2]);
19717  if b>12 then
19718    if b>99 then fit_class:=very_loose_fit
19719    else fit_class:=loose_fit
19720  else fit_class:=decent_fit;
19721  done1:
19722  end
19723
19724@ Shrinkability is never infinite in a paragraph;
19725we can shrink the line from |r| to |cur_p| by at most |cur_active_width[6]|.
19726
19727@<Set the value of |b| to the badness for shrinking...@>=
19728begin if -shortfall>cur_active_width[6] then b:=inf_bad+1
19729else b:=badness(-shortfall,cur_active_width[6]);
19730if b>12 then fit_class:=tight_fit@+else fit_class:=decent_fit;
19731end
19732
19733@ During the final pass, we dare not lose all active nodes, lest we lose
19734touch with the line breaks already found. The code shown here makes sure
19735that such a catastrophe does not happen, by permitting overfull boxes as
19736a last resort. This particular part of \TeX\ was a source of several subtle
19737bugs before the correct program logic was finally discovered; readers
19738who seek to ``improve'' \TeX\ should therefore think thrice before daring
19739to make any changes here.
19740@^overfull boxes@>
19741
19742@<Prepare to deactivate node~|r|, and |goto deactivate| unless...@>=
19743begin if final_pass and (minimum_demerits=awful_bad) and@|
19744   (link(r)=last_active) and
19745   (prev_r=active) then
19746  artificial_demerits:=true {set demerits zero, this break is forced}
19747else if b>threshold then goto deactivate;
19748node_r_stays_active:=false;
19749end
19750
19751@ When we get to this part of the code, the line from |r| to |cur_p| is
19752feasible, its badness is~|b|, and its fitness classification is |fit_class|.
19753We don't want to make an active node for this break yet, but we will
19754compute the total demerits and record them in the |minimal_demerits| array,
19755if such a break is the current champion among all ways to get to |cur_p|
19756in a given line-number class and fitness class.
19757
19758@<Record a new feasible break@>=
19759if artificial_demerits then d:=0
19760else @<Compute the demerits, |d|, from |r| to |cur_p|@>;
19761@!stat if tracing_paragraphs>0 then
19762  @<Print a symbolic description of this feasible break@>;
19763tats@;@/
19764d:=d+total_demerits(r); {this is the minimum total demerits
19765  from the beginning to |cur_p| via |r|}
19766if d<=minimal_demerits[fit_class] then
19767  begin minimal_demerits[fit_class]:=d;
19768  best_place[fit_class]:=break_node(r); best_pl_line[fit_class]:=l;
19769  if do_last_line_fit then
19770    @<Store \(a)additional data for this feasible break@>;
19771  if d<minimum_demerits then minimum_demerits:=d;
19772  end
19773
19774@ @<Print a symbolic description of this feasible break@>=
19775begin if printed_node<>cur_p then
19776  @<Print the list between |printed_node| and |cur_p|,
19777    then set |printed_node:=cur_p|@>;
19778print_nl("@@");
19779@.\AT!@>
19780if cur_p=null then print_esc("par")
19781else if type(cur_p)<>glue_node then
19782  begin if type(cur_p)=penalty_node then print_esc("penalty")
19783  else if type(cur_p)=disc_node then print_esc("discretionary")
19784  else if type(cur_p)=kern_node then print_esc("kern")
19785  else print_esc("math");
19786  end;
19787print(" via @@@@");
19788if break_node(r)=null then print_char("0")
19789else print_int(serial(break_node(r)));
19790print(" b=");
19791if b>inf_bad then print_char("*")@+else print_int(b);
19792@.*\relax@>
19793print(" p="); print_int(pi); print(" d=");
19794if artificial_demerits then print_char("*")@+else print_int(d);
19795end
19796
19797@ @<Print the list between |printed_node| and |cur_p|...@>=
19798begin print_nl("");
19799if cur_p=null then short_display(link(printed_node))
19800else  begin save_link:=link(cur_p);
19801  link(cur_p):=null; print_nl(""); short_display(link(printed_node));
19802  link(cur_p):=save_link;
19803  end;
19804printed_node:=cur_p;
19805end
19806
19807@ When the data for a discretionary break is being displayed, we will have
19808printed the |pre_break| and |post_break| lists; we want to skip over the
19809third list, so that the discretionary data will not appear twice.  The
19810following code is performed at the very end of |try_break|.
19811
19812@<Update the value of |printed_node|...@>=
19813if cur_p=printed_node then if cur_p<>null then if type(cur_p)=disc_node then
19814  begin t:=replace_count(cur_p);
19815  while t>0 do
19816    begin decr(t); printed_node:=link(printed_node);
19817    end;
19818  end
19819
19820@ @<Compute the demerits, |d|, from |r| to |cur_p|@>=
19821begin d:=line_penalty+b;
19822if abs(d)>=10000 then d:=100000000@+else d:=d*d;
19823if pi<>0 then
19824  if pi>0 then d:=d+pi*pi
19825  else if pi>eject_penalty then d:=d-pi*pi;
19826if (break_type=hyphenated)and(type(r)=hyphenated) then
19827  if cur_p<>null then d:=d+double_hyphen_demerits
19828  else d:=d+final_hyphen_demerits;
19829if abs(fit_class-fitness(r))>1 then d:=d+adj_demerits;
19830end
19831
19832@ When an active node disappears, we must delete an adjacent delta node if the
19833active node was at the beginning or the end of the active list, or if it
19834was surrounded by delta nodes. We also must preserve the property that
19835|cur_active_width| represents the length of material from |link(prev_r)|
19836to~|cur_p|.
19837
19838@d combine_two_deltas(#)==@|mem[prev_r+#].sc:=mem[prev_r+#].sc+mem[r+#].sc
19839@d downdate_width(#)==@|cur_active_width[#]:=cur_active_width[#]-
19840  mem[prev_r+#].sc
19841
19842@<Deactivate node |r|@>=
19843link(prev_r):=link(r); free_node(r,active_node_size);
19844if prev_r=active then @<Update the active widths, since the first active
19845  node has been deleted@>
19846else if type(prev_r)=delta_node then
19847  begin r:=link(prev_r);
19848  if r=last_active then
19849    begin do_all_six(downdate_width);
19850    link(prev_prev_r):=last_active;
19851    free_node(prev_r,delta_node_size); prev_r:=prev_prev_r;
19852    end
19853  else if type(r)=delta_node then
19854    begin do_all_six(update_width);
19855    do_all_six(combine_two_deltas);
19856    link(prev_r):=link(r); free_node(r,delta_node_size);
19857    end;
19858  end
19859
19860@ The following code uses the fact that |type(last_active)<>delta_node|. If the
19861active list has just become empty, we do not need to update the
19862|active_width| array, since it will be initialized when an active
19863node is next inserted.
19864
19865@d update_active(#)==active_width[#]:=active_width[#]+mem[r+#].sc
19866
19867@<Update the active widths,...@>=
19868begin r:=link(active);
19869if type(r)=delta_node then
19870  begin do_all_six(update_active);
19871  do_all_six(copy_to_cur_active);
19872  link(active):=link(r); free_node(r,delta_node_size);
19873  end;
19874end
19875
19876@* \[39] Breaking paragraphs into lines, continued.
19877So far we have gotten a little way into the |line_break| routine, having
19878covered its important |try_break| subroutine. Now let's consider the
19879rest of the process.
19880
19881The main loop of |line_break| traverses the given hlist,
19882starting at |link(temp_head)|, and calls |try_break| at each legal
19883breakpoint. A variable called |auto_breaking| is set to true except
19884within math formulas, since glue nodes are not legal breakpoints when
19885they appear in formulas.
19886
19887The current node of interest in the hlist is pointed to by |cur_p|. Another
19888variable, |prev_p|, is usually one step behind |cur_p|, but the real
19889meaning of |prev_p| is this: If |type(cur_p)=glue_node| then |cur_p| is a legal
19890breakpoint if and only if |auto_breaking| is true and |prev_p| does not
19891point to a glue node, penalty node, explicit kern node, or math node.
19892
19893The following declarations provide for a few other local variables that are
19894used in special calculations.
19895
19896@<Local variables for line breaking@>=
19897@!auto_breaking:boolean; {is node |cur_p| outside a formula?}
19898@!prev_p:pointer; {helps to determine when glue nodes are breakpoints}
19899@!q,@!r,@!s,@!prev_s:pointer; {miscellaneous nodes of temporary interest}
19900@!f:internal_font_number; {used when calculating character widths}
19901
19902@ The `\ignorespaces|loop|\unskip' in the following code is performed at most
19903thrice per call of |line_break|, since it is actually a pass over the
19904entire paragraph.
19905
19906@d update_prev_p == begin
19907  prev_p:=cur_p;
19908  global_prev_p:=cur_p;
19909end
19910
19911@<Find optimal breakpoints@>=
19912threshold:=pretolerance;
19913if threshold>=0 then
19914  begin @!stat if tracing_paragraphs>0 then
19915    begin begin_diagnostic; print_nl("@@firstpass");@+end;@;@+tats@;@/
19916  second_pass:=false; final_pass:=false;
19917  end
19918else  begin threshold:=tolerance; second_pass:=true;
19919  final_pass:=(emergency_stretch<=0);
19920  @!stat if tracing_paragraphs>0 then begin_diagnostic;@+tats@;
19921  end;
19922loop@+  begin if threshold>inf_bad then threshold:=inf_bad;
19923  if second_pass then @<Initialize for hyphenating a paragraph@>;
19924  @<Create an active breakpoint representing the beginning of the paragraph@>;
19925  cur_p:=link(temp_head); auto_breaking:=true;@/
19926  update_prev_p; {glue at beginning is not a legal breakpoint}
19927  first_p:=cur_p; {to access the first node of paragraph as the first active
19928                     node has |break_node=null|}
19929  while (cur_p<>null)and(link(active)<>last_active) do
19930    @<Call |try_break| if |cur_p| is a legal breakpoint;
19931    on the second pass, also try to hyphenate the next
19932    word, if |cur_p| is a glue node;
19933    then advance |cur_p| to the next node of the paragraph
19934    that could possibly be a legal breakpoint@>;
19935  if cur_p=null then
19936    @<Try the final line break at the end of the paragraph,
19937    and |goto done| if the desired breakpoints have been found@>;
19938  @<Clean up the memory by removing the break nodes@>;
19939  if not second_pass then
19940    begin@!stat if tracing_paragraphs>0 then print_nl("@@secondpass");@;@+tats@/
19941    threshold:=tolerance; second_pass:=true; final_pass:=(emergency_stretch<=0);
19942    end {if at first you don't succeed, \dots}
19943  else begin @!stat if tracing_paragraphs>0 then
19944      print_nl("@@emergencypass");@;@+tats@/
19945    background[2]:=background[2]+emergency_stretch; final_pass:=true;
19946    end;
19947  end;
19948done: @!stat if tracing_paragraphs>0 then
19949  begin end_diagnostic(true); normalize_selector;
19950  end;@+tats@/
19951if do_last_line_fit then @<Adjust \(t)the final line of the paragraph@>;
19952
19953@ The active node that represents the starting point does not need a
19954corresponding passive node.
19955
19956@d store_background(#)==active_width[#]:=background[#]
19957
19958@<Create an active breakpoint representing the beginning of the paragraph@>=
19959q:=get_node(active_node_size);
19960type(q):=unhyphenated; fitness(q):=decent_fit;
19961link(q):=last_active; break_node(q):=null;
19962line_number(q):=prev_graf+1; total_demerits(q):=0; link(active):=q;
19963if do_last_line_fit then
19964  @<Initialize additional fields of the first active node@>;
19965do_all_six(store_background);@/
19966passive:=null; printed_node:=temp_head; pass_number:=0;
19967font_in_short_display:=null_font
19968
19969@ @<Clean...@>=
19970q:=link(active);
19971while q<>last_active do
19972  begin cur_p:=link(q);
19973  if type(q)=delta_node then free_node(q,delta_node_size)
19974  else free_node(q,active_node_size);
19975  q:=cur_p;
19976  end;
19977q:=passive;
19978while q<>null do
19979  begin cur_p:=link(q);
19980  free_node(q,passive_node_size);
19981  q:=cur_p;
19982  end
19983
19984@ Here is the main switch in the |line_break| routine, where legal breaks
19985are determined. As we move through the hlist, we need to keep the |active_width|
19986array up to date, so that the badness of individual lines is readily calculated
19987by |try_break|. It is convenient to use the short name |act_width| for
19988the component of active width that represents real width as opposed to glue.
19989
19990@d act_width==active_width[1] {length from first active node to current node}
19991@d kern_break==begin if not is_char_node(link(cur_p)) and auto_breaking then
19992    if type(link(cur_p))=glue_node then try_break(0,unhyphenated);
19993  act_width:=act_width+width(cur_p);
19994  end
19995
19996@<Call |try_break| if |cur_p| is a legal breakpoint...@>=
19997begin if is_char_node(cur_p) then
19998  @<Advance \(c)|cur_p| to the node following the present
19999    string of characters@>;
20000case type(cur_p) of
20001hlist_node,vlist_node,rule_node: act_width:=act_width+width(cur_p);
20002whatsit_node: @<Advance \(p)past a whatsit node in the \(l)|line_break| loop@>;
20003glue_node: begin @<If node |cur_p| is a legal breakpoint, call |try_break|;
20004  then update the active widths by including the glue in |glue_ptr(cur_p)|@>;
20005  if second_pass and auto_breaking then
20006    @<Try to hyphenate the following word@>;
20007  end;
20008kern_node: if subtype(cur_p)=explicit then kern_break
20009  else act_width:=act_width+width(cur_p);
20010ligature_node: begin f:=font(lig_char(cur_p));
20011  xtx_ligature_present:=true;
20012  act_width:=act_width+char_width(f)(char_info(f)(character(lig_char(cur_p))));
20013  end;
20014disc_node: @<Try to break after a discretionary fragment, then |goto done5|@>;
20015math_node: begin if subtype(cur_p)<L_code then auto_breaking:=odd(subtype(cur_p));
20016  kern_break;
20017  end;
20018penalty_node: try_break(penalty(cur_p),unhyphenated);
20019mark_node,ins_node,adjust_node: do_nothing;
20020othercases confusion("paragraph")
20021@:this can't happen paragraph}{\quad paragraph@>
20022endcases;@/
20023update_prev_p; cur_p:=link(cur_p);
20024done5:end
20025
20026@ The code that passes over the characters of words in a paragraph is
20027part of \TeX's inner loop, so it has been streamlined for speed. We use
20028the fact that `\.{\\parfillskip}' glue appears at the end of each paragraph;
20029it is therefore unnecessary to check if |link(cur_p)=null| when |cur_p| is a
20030character node.
20031@^inner loop@>
20032
20033@<Advance \(c)|cur_p| to the node following the present string...@>=
20034begin update_prev_p;
20035repeat f:=font(cur_p);
20036act_width:=act_width+char_width(f)(char_info(f)(character(cur_p)));
20037cur_p:=link(cur_p);
20038until not is_char_node(cur_p);
20039end
20040
20041@ When node |cur_p| is a glue node, we look at |prev_p| to see whether or not
20042a breakpoint is legal at |cur_p|, as explained above.
20043
20044@<If node |cur_p| is a legal breakpoint, call...@>=
20045if auto_breaking then
20046  begin if is_char_node(prev_p) then try_break(0,unhyphenated)
20047  else if precedes_break(prev_p) then try_break(0,unhyphenated)
20048  else if (type(prev_p)=kern_node)and(subtype(prev_p)<>explicit) then
20049    try_break(0,unhyphenated);
20050  end;
20051check_shrinkage(glue_ptr(cur_p)); q:=glue_ptr(cur_p);
20052act_width:=act_width+width(q);@|
20053active_width[2+stretch_order(q)]:=@|
20054  active_width[2+stretch_order(q)]+stretch(q);@/
20055active_width[6]:=active_width[6]+shrink(q)
20056
20057@ The following code knows that discretionary texts contain
20058only character nodes, kern nodes, box nodes, rule nodes, and ligature nodes.
20059
20060@<Try to break after a discretionary fragment...@>=
20061begin s:=pre_break(cur_p); disc_width:=0;
20062if s=null then try_break(ex_hyphen_penalty,hyphenated)
20063else  begin repeat @<Add the width of node |s| to |disc_width|@>;
20064    s:=link(s);
20065  until s=null;
20066  act_width:=act_width+disc_width;
20067  try_break(hyphen_penalty,hyphenated);
20068  act_width:=act_width-disc_width;
20069  end;
20070r:=replace_count(cur_p); s:=link(cur_p);
20071while r>0 do
20072  begin @<Add the width of node |s| to |act_width|@>;
20073  decr(r); s:=link(s);
20074  end;
20075update_prev_p; cur_p:=s; goto done5;
20076end
20077
20078@ @<Add the width of node |s| to |disc_width|@>=
20079if is_char_node(s) then
20080  begin f:=font(s);
20081  disc_width:=disc_width+char_width(f)(char_info(f)(character(s)));
20082  end
20083else  case type(s) of
20084  ligature_node: begin f:=font(lig_char(s));
20085    xtx_ligature_present:=true;
20086    disc_width:=disc_width+
20087      char_width(f)(char_info(f)(character(lig_char(s))));
20088    end;
20089  hlist_node,vlist_node,rule_node,kern_node:
20090    disc_width:=disc_width+width(s);
20091  whatsit_node:
20092    if (subtype(s)=native_word_node)
20093    or (subtype(s)=glyph_node)
20094    or (subtype(s)=pic_node)
20095    or (subtype(s)=pdf_node)
20096    then disc_width:=disc_width+width(s)
20097    else confusion("disc3a");
20098  othercases confusion("disc3")
20099@:this can't happen disc3}{\quad disc3@>
20100  endcases
20101
20102@ @<Add the width of node |s| to |act_width|@>=
20103if is_char_node(s) then
20104  begin f:=font(s);
20105  act_width:=act_width+char_width(f)(char_info(f)(character(s)));
20106  end
20107else  case type(s) of
20108  ligature_node: begin f:=font(lig_char(s));
20109    xtx_ligature_present:=true;
20110    act_width:=act_width+
20111      char_width(f)(char_info(f)(character(lig_char(s))));
20112    end;
20113  hlist_node,vlist_node,rule_node,kern_node:
20114    act_width:=act_width+width(s);
20115  whatsit_node:
20116    if (subtype(s)=native_word_node)
20117    or (subtype(s)=glyph_node)
20118    or (subtype(s)=pic_node)
20119    or (subtype(s)=pdf_node)
20120    then act_width:=act_width+width(s)
20121    else confusion("disc4a");
20122  othercases confusion("disc4")
20123@:this can't happen disc4}{\quad disc4@>
20124  endcases
20125
20126@ The forced line break at the paragraph's end will reduce the list of
20127breakpoints so that all active nodes represent breaks at |cur_p=null|.
20128On the first pass, we insist on finding an active node that has the
20129correct ``looseness.'' On the final pass, there will be at least one active
20130node, and we will match the desired looseness as well as we can.
20131
20132The global variable |best_bet| will be set to the active node for the best
20133way to break the paragraph, and a few other variables are used to
20134help determine what is best.
20135
20136@<Glob...@>=
20137@!best_bet:pointer; {use this passive node and its predecessors}
20138@!fewest_demerits:integer; {the demerits associated with |best_bet|}
20139@!best_line:halfword; {line number following the last line of the new paragraph}
20140@!actual_looseness:integer; {the difference between |line_number(best_bet)|
20141  and the optimum |best_line|}
20142@!line_diff:integer; {the difference between the current line number and
20143  the optimum |best_line|}
20144
20145@ @<Try the final line break at the end of the paragraph...@>=
20146begin try_break(eject_penalty,hyphenated);
20147if link(active)<>last_active then
20148  begin @<Find an active node with fewest demerits@>;
20149  if looseness=0 then goto done;
20150  @<Find the best active node for the desired looseness@>;
20151  if (actual_looseness=looseness)or final_pass then goto done;
20152  end;
20153end
20154
20155@ @<Find an active node...@>=
20156r:=link(active); fewest_demerits:=awful_bad;
20157repeat if type(r)<>delta_node then if total_demerits(r)<fewest_demerits then
20158  begin fewest_demerits:=total_demerits(r); best_bet:=r;
20159  end;
20160r:=link(r);
20161until r=last_active;
20162best_line:=line_number(best_bet)
20163
20164@ The adjustment for a desired looseness is a slightly more complicated
20165version of the loop just considered. Note that if a paragraph is broken
20166into segments by displayed equations, each segment will be subject to the
20167looseness calculation, independently of the other segments.
20168
20169@<Find the best active node...@>=
20170begin r:=link(active); actual_looseness:=0;
20171repeat if type(r)<>delta_node then
20172  begin line_diff:=line_number(r)-best_line;
20173  if ((line_diff<actual_looseness)and(looseness<=line_diff))or@|
20174  ((line_diff>actual_looseness)and(looseness>=line_diff)) then
20175    begin best_bet:=r; actual_looseness:=line_diff;
20176    fewest_demerits:=total_demerits(r);
20177    end
20178  else if (line_diff=actual_looseness)and@|
20179    (total_demerits(r)<fewest_demerits) then
20180    begin best_bet:=r; fewest_demerits:=total_demerits(r);
20181    end;
20182  end;
20183r:=link(r);
20184until r=last_active;
20185best_line:=line_number(best_bet);
20186end
20187
20188@ Once the best sequence of breakpoints has been found (hurray), we call on the
20189procedure |post_line_break| to finish the remainder of the work.
20190(By introducing this subprocedure, we are able to keep |line_break|
20191from getting extremely long.)
20192
20193@<Break the paragraph at the chosen...@>=
20194post_line_break(d)
20195
20196@ The total number of lines that will be set by |post_line_break|
20197is |best_line-prev_graf-1|. The last breakpoint is specified by
20198|break_node(best_bet)|, and this passive node points to the other breakpoints
20199via the |prev_break| links. The finishing-up phase starts by linking the
20200relevant passive nodes in forward order, changing |prev_break| to
20201|next_break|. (The |next_break| fields actually reside in the same memory
20202space as the |prev_break| fields did, but we give them a new name because
20203of their new significance.) Then the lines are justified, one by one.
20204
20205@d next_break==prev_break {new name for |prev_break| after links are reversed}
20206
20207@<Declare subprocedures for |line_break|@>=
20208procedure post_line_break(@!d:boolean);
20209label done,done1;
20210var q,@!r,@!s:pointer; {temporary registers for list manipulation}
20211    p, k: pointer;
20212    w: scaled;
20213    glue_break: boolean; {was a break at glue?}
20214    ptmp: pointer;
20215@!disc_break:boolean; {was the current break at a discretionary node?}
20216@!post_disc_break:boolean; {and did it have a nonempty post-break part?}
20217@!cur_width:scaled; {width of line number |cur_line|}
20218@!cur_indent:scaled; {left margin of line number |cur_line|}
20219@!t:quarterword; {used for replacement counts in discretionary nodes}
20220@!pen:integer; {use when calculating penalties between lines}
20221@!cur_line: halfword; {the current line number being justified}
20222@!LR_ptr:pointer; {stack of LR codes}
20223begin LR_ptr:=LR_save;
20224@<Reverse the links of the relevant passive nodes, setting |cur_p| to the
20225  first breakpoint@>;
20226cur_line:=prev_graf+1;
20227repeat @<Justify the line ending at breakpoint |cur_p|, and append it to the
20228  current vertical list, together with associated penalties and other
20229  insertions@>;
20230incr(cur_line); cur_p:=next_break(cur_p);
20231if cur_p<>null then if not post_disc_break then
20232  @<Prune unwanted nodes at the beginning of the next line@>;
20233until cur_p=null;
20234if (cur_line<>best_line)or(link(temp_head)<>null) then
20235  confusion("line breaking");
20236@:this can't happen line breaking}{\quad line breaking@>
20237prev_graf:=best_line-1;
20238LR_save:=LR_ptr;
20239end;
20240
20241@ The job of reversing links in a list is conveniently regarded as the job
20242of taking items off one stack and putting them on another. In this case we
20243take them off a stack pointed to by |q| and having |prev_break| fields;
20244we put them on a stack pointed to by |cur_p| and having |next_break| fields.
20245Node |r| is the passive node being moved from stack to stack.
20246
20247@<Reverse the links of the relevant passive nodes...@>=
20248q:=break_node(best_bet); cur_p:=null;
20249repeat r:=q; q:=prev_break(q); next_break(r):=cur_p; cur_p:=r;
20250until q=null
20251
20252@ Glue and penalty and kern and math nodes are deleted at the beginning of
20253a line, except in the anomalous case that the node to be deleted is actually
20254one of the chosen breakpoints. Otherwise
20255the pruning done here is designed to match
20256the lookahead computation in |try_break|, where the |break_width| values
20257are computed for non-discretionary breakpoints.
20258
20259@<Prune unwanted nodes at the beginning of the next line@>=
20260begin r:=temp_head;
20261loop@+  begin q:=link(r);
20262  if q=cur_break(cur_p) then goto done1;
20263    {|cur_break(cur_p)| is the next breakpoint}
20264  {now |q| cannot be |null|}
20265  if is_char_node(q) then goto done1;
20266  if non_discardable(q) then goto done1;
20267  if type(q)=kern_node then if subtype(q)<>explicit then goto done1;
20268  r:=q; {now |type(q)=glue_node|, |kern_node|, |math_node| or |penalty_node|}
20269  if type(q)=math_node then if TeXXeT_en then
20270    @<Adjust \(t)the LR stack for the |post_line_break| routine@>;
20271  end;
20272done1: if r<>temp_head then
20273  begin link(r):=null; flush_node_list(link(temp_head));
20274  link(temp_head):=q;
20275  end;
20276end
20277
20278@ The current line to be justified appears in a horizontal list starting
20279at |link(temp_head)| and ending at |cur_break(cur_p)|. If |cur_break(cur_p)| is
20280a glue node, we reset the glue to equal the |right_skip| glue; otherwise
20281we append the |right_skip| glue at the right. If |cur_break(cur_p)| is a
20282discretionary node, we modify the list so that the discretionary break
20283is compulsory, and we set |disc_break| to |true|. We also append
20284the |left_skip| glue at the left of the line, unless it is zero.
20285
20286@<Justify the line ending at breakpoint |cur_p|, and append it...@>=
20287if TeXXeT_en then
20288  @<Insert LR nodes at the beginning of the current line and adjust
20289    the LR stack based on LR nodes in this line@>;
20290@<Modify the end of the line to reflect the nature of the break and to include
20291  \.{\\rightskip}; also set the proper value of |disc_break|@>;
20292if TeXXeT_en then @<Insert LR nodes at the end of the current line@>;
20293@<Put the \(l)\.{\\leftskip} glue at the left and detach this line@>;
20294@<Call the packaging subroutine, setting |just_box| to the justified box@>;
20295@<Append the new box to the current vertical list, followed by the list of
20296  special nodes taken out of the box by the packager@>;
20297@<Append a penalty node, if a nonzero penalty is appropriate@>
20298
20299@ At the end of the following code, |q| will point to the final node on the
20300list about to be justified.
20301
20302@<Modify the end of the line...@>=
20303q:=cur_break(cur_p); disc_break:=false; post_disc_break:=false;
20304glue_break:=false;
20305if q<>null then {|q| cannot be a |char_node|}
20306  if type(q)=glue_node then
20307    begin delete_glue_ref(glue_ptr(q));
20308    glue_ptr(q):=right_skip;
20309    subtype(q):=right_skip_code+1; add_glue_ref(right_skip);
20310    glue_break:=true;
20311    goto done;
20312    end
20313  else  begin if type(q)=disc_node then
20314      @<Change discretionary to compulsory and set
20315        |disc_break:=true|@>
20316    else if type(q)=kern_node then width(q):=0
20317    else if type(q)=math_node then
20318      begin width(q):=0;
20319      if TeXXeT_en then @<Adjust \(t)the LR stack for the |p...@>;
20320      end;
20321    end
20322else  begin q:=temp_head;
20323  while link(q)<>null do q:=link(q);
20324  end;
20325done:
20326{at this point |q| is the rightmost breakpoint; the only exception is the case
20327of a discretionary break with non-empty |pre_break|, then |q| has been changed
20328to the last node of the |pre_break| list}
20329if XeTeX_protrude_chars > 0 then begin
20330  if disc_break and (is_char_node(q) or (type(q) <> disc_node))
20331  {|q| has been reset to the last node of |pre_break|}
20332  then begin
20333    p:=q;
20334    ptmp:=p;
20335  end else begin
20336    p:=prev_rightmost(link(temp_head), q); {get |link(p) = q|}
20337    ptmp:=p;
20338    p:=find_protchar_right(link(temp_head), p);
20339  end;
20340  w:=right_pw(p);
20341  if w <> 0 then {we have found a marginal kern, append it after |ptmp|}
20342  begin
20343    k:=new_margin_kern(-w, last_rightmost_char, right_side);
20344    link(k):=link(ptmp);
20345    link(ptmp):=k;
20346    if (ptmp = q) then
20347      q:=link(q);
20348  end;
20349end;
20350{if |q| was not a breakpoint at glue and has been reset to |rightskip| then
20351 we append |rightskip| after |q| now}
20352if not glue_break then begin
20353  @<Put the \(r)\.{\\rightskip} glue after node |q|@>;
20354end;
20355
20356@ @<Change discretionary to compulsory...@>=
20357begin t:=replace_count(q);
20358@<Destroy the |t| nodes following |q|, and
20359   make |r| point to the following node@>;
20360if post_break(q)<>null then @<Transplant the post-break list@>;
20361if pre_break(q)<>null then @<Transplant the pre-break list@>;
20362link(q):=r; disc_break:=true;
20363end
20364
20365@ @<Destroy the |t| nodes following |q|...@>=
20366if t=0 then r:=link(q)
20367else  begin r:=q;
20368  while t>1 do
20369    begin r:=link(r); decr(t);
20370    end;
20371  s:=link(r);
20372  r:=link(s); link(s):=null;
20373  flush_node_list(link(q)); replace_count(q):=0;
20374  end
20375
20376@ We move the post-break list from inside node |q| to the main list by
20377re\-attaching it just before the present node |r|, then resetting |r|.
20378
20379@<Transplant the post-break list@>=
20380begin s:=post_break(q);
20381while link(s)<>null do s:=link(s);
20382link(s):=r; r:=post_break(q); post_break(q):=null; post_disc_break:=true;
20383end
20384
20385@ We move the pre-break list from inside node |q| to the main list by
20386re\-attaching it just after the present node |q|, then resetting |q|.
20387
20388@<Transplant the pre-break list@>=
20389begin s:=pre_break(q); link(q):=s;
20390while link(s)<>null do s:=link(s);
20391pre_break(q):=null; q:=s;
20392end
20393
20394@ @<Put the \(r)\.{\\rightskip} glue after node |q|@>=
20395r:=new_param_glue(right_skip_code); link(r):=link(q); link(q):=r; q:=r
20396
20397@ The following code begins with |q| at the end of the list to be
20398justified. It ends with |q| at the beginning of that list, and with
20399|link(temp_head)| pointing to the remainder of the paragraph, if any.
20400
20401@<Put the \(l)\.{\\leftskip} glue at the left...@>=
20402r:=link(q); link(q):=null; q:=link(temp_head); link(temp_head):=r;
20403{at this point |q| is the leftmost node; all discardable nodes have been discarded}
20404if XeTeX_protrude_chars > 0 then begin
20405  p:=q;
20406  p:=find_protchar_left(p, false); {no more discardables}
20407  w:=left_pw(p);
20408  if w <> 0 then begin
20409    k:=new_margin_kern(-w, last_leftmost_char, left_side);
20410    link(k):=q;
20411    q:=k;
20412  end;
20413end;
20414if left_skip<>zero_glue then
20415  begin r:=new_param_glue(left_skip_code);
20416  link(r):=q; q:=r;
20417  end
20418
20419@ @<Append the new box to the current vertical list...@>=
20420if pre_adjust_head <> pre_adjust_tail then
20421    append_list(pre_adjust_head)(pre_adjust_tail);
20422pre_adjust_tail:=null;
20423append_to_vlist(just_box);
20424if adjust_head <> adjust_tail then
20425    append_list(adjust_head)(adjust_tail);
20426adjust_tail:=null
20427
20428@ Now |q| points to the hlist that represents the current line of the
20429paragraph. We need to compute the appropriate line width, pack the
20430line into a box of this size, and shift the box by the appropriate
20431amount of indentation.
20432
20433@<Call the packaging subroutine...@>=
20434if cur_line>last_special_line then
20435  begin cur_width:=second_width; cur_indent:=second_indent;
20436  end
20437else if par_shape_ptr=null then
20438  begin cur_width:=first_width; cur_indent:=first_indent;
20439  end
20440else  begin cur_width:=mem[par_shape_ptr+2*cur_line].sc;
20441  cur_indent:=mem[par_shape_ptr+2*cur_line-1].sc;
20442  end;
20443adjust_tail:=adjust_head;
20444pre_adjust_tail:=pre_adjust_head;
20445just_box:=hpack(q,cur_width,exactly);
20446shift_amount(just_box):=cur_indent
20447
20448@ Penalties between the lines of a paragraph come from club and widow lines,
20449from the |inter_line_penalty| parameter, and from lines that end at
20450discretionary breaks.  Breaking between lines of a two-line paragraph gets
20451both club-line and widow-line penalties. The local variable |pen| will
20452be set to the sum of all relevant penalties for the current line, except
20453that the final line is never penalized.
20454
20455@<Append a penalty node, if a nonzero penalty is appropriate@>=
20456if cur_line+1<>best_line then
20457  begin q:=inter_line_penalties_ptr;
20458  if q<>null then
20459    begin r:=cur_line;
20460    if r>penalty(q) then r:=penalty(q);
20461    pen:=penalty(q+r);
20462    end
20463  else pen:=inter_line_penalty;
20464  q:=club_penalties_ptr;
20465  if q<>null then
20466    begin r:=cur_line-prev_graf;
20467    if r>penalty(q) then r:=penalty(q);
20468    pen:=pen+penalty(q+r);
20469    end
20470  else if cur_line=prev_graf+1 then pen:=pen+club_penalty;
20471  if d then q:=display_widow_penalties_ptr
20472  else q:=widow_penalties_ptr;
20473  if q<>null then
20474    begin r:=best_line-cur_line-1;
20475    if r>penalty(q) then r:=penalty(q);
20476    pen:=pen+penalty(q+r);
20477    end
20478  else if cur_line+2=best_line then
20479    if d then pen:=pen+display_widow_penalty
20480    else pen:=pen+widow_penalty;
20481  if disc_break then pen:=pen+broken_penalty;
20482  if pen<>0 then
20483    begin r:=new_penalty(pen);
20484    link(tail):=r; tail:=r;
20485    end;
20486  end
20487
20488@* \[40] Pre-hyphenation.
20489When the line-breaking routine is unable to find a feasible sequence of
20490breakpoints, it makes a second pass over the paragraph, attempting to
20491hyphenate the hyphenatable words. The goal of hyphenation is to insert
20492discretionary material into the paragraph so that there are more
20493potential places to break.
20494
20495The general rules for hyphenation are somewhat complex and technical,
20496because we want to be able to hyphenate words that are preceded or
20497followed by punctuation marks, and because we want the rules to work
20498for languages other than English. We also must contend with the fact
20499that hyphens might radically alter the ligature and kerning structure
20500of a word.
20501
20502A sequence of characters will be considered for hyphenation only if it
20503belongs to a ``potentially hyphenatable part'' of the current paragraph.
20504This is a sequence of nodes $p_0p_1\ldots p_m$ where $p_0$ is a glue node,
20505$p_1\ldots p_{m-1}$ are either character or ligature or whatsit or
20506implicit kern or text direction nodes, and $p_m$ is a glue or penalty or
20507insertion or adjust
20508or mark or whatsit or explicit kern node.  (Therefore hyphenation is
20509disabled by boxes, math formulas, and discretionary nodes already inserted
20510by the user.) The ligature nodes among $p_1\ldots p_{m-1}$ are effectively
20511expanded into the original non-ligature characters; the kern nodes and
20512whatsits are ignored. Each character |c| is now classified as either a
20513nonletter (if |lc_code(c)=0|), a lowercase letter (if
20514|lc_code(c)=c|), or an uppercase letter (otherwise); an uppercase letter
20515is treated as if it were |lc_code(c)| for purposes of hyphenation. The
20516characters generated by $p_1\ldots p_{m-1}$ may begin with nonletters; let
20517$c_1$ be the first letter that is not in the middle of a ligature. Whatsit
20518nodes preceding $c_1$ are ignored; a whatsit found after $c_1$ will be the
20519terminating node $p_m$. All characters that do not have the same font as
20520$c_1$ will be treated as nonletters. The |hyphen_char| for that font
20521must be between 0 and 255, otherwise hyphenation will not be attempted.
20522\TeX\ looks ahead for as many consecutive letters $c_1\ldots c_n$ as
20523possible; however, |n| must be less than 64, so a character that would
20524otherwise be $c_{64}$ is effectively not a letter. Furthermore $c_n$ must
20525not be in the middle of a ligature.  In this way we obtain a string of
20526letters $c_1\ldots c_n$ that are generated by nodes $p_a\ldots p_b$, where
20527|1<=a<=b+1<=m|. If |n>=l_hyf+r_hyf|, this string qualifies for hyphenation;
20528however, |uc_hyph| must be positive, if $c_1$ is uppercase.
20529
20530The hyphenation process takes place in three stages. First, the candidate
20531sequence $c_1\ldots c_n$ is found; then potential positions for hyphens
20532are determined by referring to hyphenation tables; and finally, the nodes
20533$p_a\ldots p_b$ are replaced by a new sequence of nodes that includes the
20534discretionary breaks found.
20535
20536Fortunately, we do not have to do all this calculation very often, because
20537of the way it has been taken out of \TeX's inner loop. For example, when
20538the second edition of the author's 700-page book {\sl Seminumerical
20539Algorithms} was typeset by \TeX, only about 1.2 hyphenations needed to be
20540@^Knuth, Donald Ervin@>
20541tried per paragraph, since the line breaking algorithm needed to use two
20542passes on only about 5 per cent of the paragraphs.
20543
20544@<Initialize for hyphenating...@>=
20545begin @!init if trie_not_ready then init_trie;@+tini@;@/
20546cur_lang:=init_cur_lang; l_hyf:=init_l_hyf; r_hyf:=init_r_hyf;
20547set_hyph_index;
20548end
20549
20550@ The letters $c_1\ldots c_n$ that are candidates for hyphenation are placed
20551into an array called |hc|; the number |n| is placed into |hn|; pointers to
20552nodes $p_{a-1}$ and~$p_b$ in the description above are placed into variables
20553|ha| and |hb|; and the font number is placed into |hf|.
20554
20555@<Glob...@>=
20556@!hc:array[0..66] of 0..number_usvs; {word to be hyphenated}
20557{ note that element 0 needs to be a full UnicodeScalar, even though we
20558  basically work in UTF16 }
20559@!hn:small_number; {the number of positions occupied in |hc|}
20560@!ha,@!hb:pointer; {nodes |ha..hb| should be replaced by the hyphenated result}
20561@!hf:internal_font_number; {font number of the letters in |hc|}
20562@!hu:array[0..64] of 0..too_big_char;
20563     {like |hc|, before conversion to lowercase}
20564@!hyf_char:integer; {hyphen character of the relevant font}
20565@!cur_lang,@!init_cur_lang:0..biggest_lang;
20566     {current hyphenation table of interest}
20567@!l_hyf,@!r_hyf,@!init_l_hyf,@!init_r_hyf:integer; {limits on fragment sizes}
20568@!hyf_bchar:halfword; {boundary character after $c_n$}
20569@!max_hyph_char:integer;
20570
20571@ @<Set initial values of key variables@>=
20572max_hyph_char:=too_big_lang;
20573
20574@ Hyphenation routines need a few more local variables.
20575
20576@<Local variables for line...@>=
20577@!j:small_number; {an index into |hc| or |hu|}
20578@!c:UnicodeScalar; {character being considered for hyphenation}
20579
20580@ When the following code is activated, the |line_break| procedure is in its
20581second pass, and |cur_p| points to a glue node.
20582
20583@<Try to hyphenate...@>=
20584begin prev_s:=cur_p; s:=link(prev_s);
20585if s<>null then
20586  begin @<Skip to node |ha|, or |goto done1| if no hyphenation
20587    should be attempted@>;
20588  if l_hyf+r_hyf>63 then goto done1;
20589if is_native_word_node(ha) then begin
20590  @<Check that nodes after |native_word| permit hyphenation; if not, |goto done1|@>;
20591  @<Prepare a |native_word_node| for hyphenation@>;
20592end else begin
20593  @<Skip to node |hb|, putting letters into |hu| and |hc|@>;
20594end;
20595  @<Check that the nodes following |hb| permit hyphenation and that at least
20596    |l_hyf+r_hyf| letters have been found, otherwise |goto done1|@>;
20597  hyphenate;
20598  end;
20599done1: end
20600
20601@ @<Declare subprocedures for |line_break|@>=
20602@t\4@>@<Declare the function called |reconstitute|@>
20603procedure hyphenate;
20604label common_ending,done,found,found1,found2,not_found,exit;
20605var @<Local variables for hyphenation@>@;
20606begin @<Find hyphen locations for the word in |hc|, or |return|@>;
20607@<If no hyphens were found, |return|@>;
20608@<Replace nodes |ha..hb| by a sequence of nodes that includes
20609  the discretionary hyphens@>;
20610exit:end;
20611
20612@ @<Check that nodes after |native_word| permit hyphenation; if not, |goto done1|@>=
20613s:=link(ha);
20614loop@+  begin if not(is_char_node(s)) then
20615  case type(s) of
20616    ligature_node: do_nothing;
20617    kern_node: if subtype(s)<>normal then goto done6;
20618    whatsit_node,glue_node,penalty_node,ins_node,adjust_node,mark_node:
20619      goto done6;
20620    othercases goto done1
20621    endcases;
20622  s:=link(s);
20623  end;
20624done6:
20625
20626@ @<Prepare a |native_word_node| for hyphenation@>=
20627{ note that if there are chars with |lccode = 0|, we split them out into separate |native_word| nodes }
20628hn:=0;
20629restart:
20630for l:=0 to native_length(ha)-1 do begin
20631  c:=get_native_usv(ha, l);
20632  set_lc_code(c);
20633  if (hc[0] = 0) then begin
20634    if (hn > 0) then begin
20635      { we've got some letters, and now found a non-letter, so break off the tail of the |native_word|
20636        and link it after this node, and goto done3 }
20637      @<Split the |native_word_node| at |l| and link the second part after |ha|@>;
20638      goto done3;
20639    end
20640  end else if (hn = 0) and (l > 0) then begin
20641    { we've found the first letter after some non-letters, so break off the head of the |native_word| and restart }
20642    @<Split the |native_word_node| at |l| and link the second part after |ha|@>;
20643    ha:=link(ha);
20644    goto restart;
20645  end else if (hn = 63) then
20646    { reached max hyphenatable length }
20647    goto done3
20648  else begin
20649    { found a letter that is part of a potentially hyphenatable sequence }
20650    incr(hn);
20651    if c<@"10000 then begin
20652      hu[hn]:=c; hc[hn]:=hc[0];
20653      end
20654    else begin
20655      hu[hn]:=(c - @"10000) div @"400 + @"D800;
20656      hc[hn]:=(hc[0] - @"10000) div @"400 + @"D800;
20657      incr(hn);
20658      hu[hn]:=c mod @"400 + @"DC00;
20659      hc[hn]:=hc[0] mod @"400 + @"DC00;
20660      incr(l);
20661      end;
20662    hyf_bchar:=non_char;
20663  end
20664end;
20665
20666@ @<Split the |native_word_node| at |l| and link the second part after |ha|@>=
20667  q:=new_native_word_node(hf, native_length(ha) - l);
20668  for i:=l to native_length(ha) - 1 do
20669    set_native_char(q, i - l, get_native_char(ha, i));
20670  set_native_metrics(q, XeTeX_use_glyph_metrics);
20671  link(q):=link(ha);
20672  link(ha):=q;
20673  { truncate text in node |ha| }
20674  native_length(ha):=l;
20675  set_native_metrics(ha, XeTeX_use_glyph_metrics);
20676
20677@ @<Local variables for line breaking@>=
20678l: integer;
20679i: integer;
20680
20681@ The first thing we need to do is find the node |ha| just before the
20682first letter.
20683
20684@<Skip to node |ha|, or |goto done1|...@>=
20685loop@+  begin if is_char_node(s) then
20686    begin c:=qo(character(s)); hf:=font(s);
20687    end
20688  else if type(s)=ligature_node then
20689    if lig_ptr(s)=null then goto continue
20690    else begin q:=lig_ptr(s); c:=qo(character(q)); hf:=font(q);
20691      end
20692  else if (type(s)=kern_node)and(subtype(s)=normal) then goto continue
20693  else if (type(s)=math_node)and(subtype(s)>=L_code) then goto continue
20694  else if type(s)=whatsit_node then
20695    begin
20696      if subtype(s) = native_word_node then begin
20697        { we only consider the node if it contains at least one letter, otherwise we'll skip it }
20698        for l:=0 to native_length(s) - 1 do begin
20699          c:=get_native_usv(s, l);
20700          if lc_code(c) <> 0 then begin
20701            hf:=native_font(s);
20702            prev_s:=s;
20703            goto done2;
20704          end;
20705          if c>=@"10000 then incr(l);
20706        end
20707      end;
20708      @<Advance \(p)past a whatsit node in the \(p)pre-hyphenation loop@>;
20709      goto continue
20710    end
20711  else goto done1;
20712  set_lc_code(c);
20713  if hc[0]<>0 then
20714    if (hc[0]=c)or(uc_hyph>0) then goto done2
20715    else goto done1;
20716continue: prev_s:=s; s:=link(prev_s);
20717  end;
20718done2: hyf_char:=hyphen_char[hf];
20719if hyf_char<0 then goto done1;
20720if hyf_char>biggest_char then goto done1;
20721ha:=prev_s
20722
20723@ The word to be hyphenated is now moved to the |hu| and |hc| arrays.
20724
20725@<Skip to node |hb|, putting letters...@>=
20726hn:=0;
20727loop@+  begin if is_char_node(s) then
20728    begin if font(s)<>hf then goto done3;
20729    hyf_bchar:=character(s); c:=qo(hyf_bchar);
20730    set_lc_code(c);
20731    if hc[0]=0 then goto done3;
20732    if hc[0]>max_hyph_char then goto done3;
20733    if hn=63 then goto done3;
20734    hb:=s; incr(hn); hu[hn]:=c; hc[hn]:=hc[0]; hyf_bchar:=non_char;
20735    end
20736  else if type(s)=ligature_node then
20737    @<Move the characters of a ligature node to |hu| and |hc|;
20738      but |goto done3| if they are not all letters@>
20739  else if (type(s)=kern_node)and(subtype(s)=normal) then
20740    begin hb:=s;
20741    hyf_bchar:=font_bchar[hf];
20742    end
20743  else goto done3;
20744  s:=link(s);
20745  end;
20746done3:
20747
20748@ We let |j| be the index of the character being stored when a ligature node
20749is being expanded, since we do not want to advance |hn| until we are sure
20750that the entire ligature consists of letters. Note that it is possible
20751to get to |done3| with |hn=0| and |hb| not set to any value.
20752
20753@<Move the characters of a ligature node to |hu| and |hc|...@>=
20754begin if font(lig_char(s))<>hf then goto done3;
20755j:=hn; q:=lig_ptr(s);@+if q>null then hyf_bchar:=character(q);
20756while q>null do
20757  begin c:=qo(character(q));
20758  set_lc_code(c);
20759  if hc[0]=0 then goto done3;
20760  if hc[0]>max_hyph_char then goto done3;
20761  if j=63 then goto done3;
20762  incr(j); hu[j]:=c; hc[j]:=hc[0];@/
20763  q:=link(q);
20764  end;
20765hb:=s; hn:=j;
20766if odd(subtype(s)) then hyf_bchar:=font_bchar[hf]@+else hyf_bchar:=non_char;
20767end
20768
20769@ @<Check that the nodes following |hb| permit hyphenation...@>=
20770if hn<l_hyf+r_hyf then goto done1; {|l_hyf| and |r_hyf| are |>=1|}
20771loop@+  begin if not(is_char_node(s)) then
20772  case type(s) of
20773    ligature_node: do_nothing;
20774    kern_node: if subtype(s)<>normal then goto done4;
20775    whatsit_node,glue_node,penalty_node,ins_node,adjust_node,mark_node:
20776      goto done4;
20777    math_node: if subtype(s)>=L_code then goto done4@+else goto done1;
20778    othercases goto done1
20779    endcases;
20780  s:=link(s);
20781  end;
20782done4:
20783
20784@* \[41] Post-hyphenation.
20785If a hyphen may be inserted between |hc[j]| and |hc[j+1]|, the hyphenation
20786procedure will set |hyf[j]| to some small odd number. But before we look
20787at \TeX's hyphenation procedure, which is independent of the rest of the
20788line-breaking algorithm, let us consider what we will do with the hyphens
20789it finds, since it is better to work on this part of the program before
20790forgetting what |ha| and |hb|, etc., are all about.
20791
20792@<Glob...@>=
20793@!hyf:array [0..64] of 0..9; {odd values indicate discretionary hyphens}
20794@!init_list:pointer; {list of punctuation characters preceding the word}
20795@!init_lig:boolean; {does |init_list| represent a ligature?}
20796@!init_lft:boolean; {if so, did the ligature involve a left boundary?}
20797
20798@ @<Local variables for hyphenation@>=
20799@!i,@!j,@!l:0..65; {indices into |hc| or |hu|}
20800@!q,@!r,@!s:pointer; {temporary registers for list manipulation}
20801@!bchar:halfword; {right boundary character of hyphenated word, or |non_char|}
20802
20803@ \TeX\ will never insert a hyphen that has fewer than
20804\.{\\lefthyphenmin} letters before it or fewer than
20805\.{\\righthyphenmin} after it; hence, a short word has
20806comparatively little chance of being hyphenated. If no hyphens have
20807been found, we can save time by not having to make any changes to the
20808paragraph.
20809
20810@<If no hyphens were found, |return|@>=
20811for j:=l_hyf to hn-r_hyf do if odd(hyf[j]) then goto found1;
20812return;
20813found1:
20814
20815@ If hyphens are in fact going to be inserted, \TeX\ first deletes the
20816subsequence of nodes between |ha| and~|hb|. An attempt is made to
20817preserve the effect that implicit boundary characters and punctuation marks
20818had on ligatures inside the hyphenated word, by storing a left boundary or
20819preceding character in |hu[0]| and by storing a possible right boundary
20820in |bchar|. We set |j:=0| if |hu[0]| is to be part of the reconstruction;
20821otherwise |j:=1|.
20822The variable |s| will point to the tail of the current hlist, and
20823|q| will point to the node following |hb|, so that
20824things can be hooked up after we reconstitute the hyphenated word.
20825
20826@<Replace nodes |ha..hb| by a sequence of nodes...@>=
20827if is_native_word_node(ha) then begin
20828 @<Hyphenate the |native_word_node| at |ha|@>;
20829end else begin
20830q:=link(hb); link(hb):=null; r:=link(ha); link(ha):=null; bchar:=hyf_bchar;
20831if is_char_node(ha) then
20832  if font(ha)<>hf then goto found2
20833  else begin init_list:=ha; init_lig:=false; hu[0]:=qo(character(ha));
20834    end
20835else if type(ha)=ligature_node then
20836  if font(lig_char(ha))<>hf then goto found2
20837  else begin init_list:=lig_ptr(ha); init_lig:=true; init_lft:=(subtype(ha)>1);
20838    hu[0]:=qo(character(lig_char(ha)));
20839    if init_list=null then if init_lft then
20840      begin hu[0]:=max_hyph_char; init_lig:=false;
20841      end; {in this case a ligature will be reconstructed from scratch}
20842    free_node(ha,small_node_size);
20843    end
20844else begin {no punctuation found; look for left boundary}
20845  if not is_char_node(r) then if type(r)=ligature_node then
20846   if subtype(r)>1 then goto found2;
20847  j:=1; s:=ha; init_list:=null; goto common_ending;
20848  end;
20849s:=cur_p; {we have |cur_p<>ha| because |type(cur_p)=glue_node|}
20850while link(s)<>ha do s:=link(s);
20851j:=0; goto common_ending;
20852found2: s:=ha; j:=0; hu[0]:=max_hyph_char; init_lig:=false; init_list:=null;
20853common_ending: flush_node_list(r);
20854@<Reconstitute nodes for the hyphenated word, inserting discretionary hyphens@>;
20855flush_list(init_list);
20856end
20857
20858@ @<Hyphenate the |native_word_node| at |ha|@>=
20859{ find the node immediately before the word to be hyphenated }
20860s:=cur_p; {we have |cur_p<>ha| because |type(cur_p)=glue_node|}
20861while link(s) <> ha do s:=link(s);
20862
20863{ for each hyphen position,
20864  create a |native_word_node| fragment for the text before this point,
20865  and a |disc_node| for the break, with the |hyf_char| in the |pre_break| text
20866}
20867
20868hyphen_passed:=0; { location of last hyphen we saw }
20869
20870for j:=l_hyf to hn - r_hyf do begin
20871  { if this is a valid break.... }
20872  if odd(hyf[j]) then begin
20873
20874    { make a |native_word_node| for the fragment before the hyphen }
20875    q:=new_native_word_node(hf, j - hyphen_passed);
20876    for i:=0 to j - hyphen_passed - 1 do
20877      set_native_char(q, i, get_native_char(ha, i + hyphen_passed));
20878    set_native_metrics(q, XeTeX_use_glyph_metrics);
20879    link(s):=q; { append the new node }
20880    s:=q;
20881
20882    { make the |disc_node| for the hyphenation point }
20883    q:=new_disc;
20884    pre_break(q):=new_native_character(hf, hyf_char);
20885    link(s):=q;
20886    s:=q;
20887
20888    hyphen_passed:=j;
20889  end
20890end;
20891
20892{ make a |native_word_node| for the last fragment of the word }
20893hn:=native_length(ha); { ensure trailing punctuation is not lost! }
20894q:=new_native_word_node(hf, hn - hyphen_passed);
20895for i:=0 to hn - hyphen_passed - 1 do
20896  set_native_char(q, i, get_native_char(ha, i + hyphen_passed));
20897set_native_metrics(q, XeTeX_use_glyph_metrics);
20898link(s):=q; { append the new node }
20899s:=q;
20900
20901q:=link(ha);
20902link(s):=q;
20903link(ha):=null;
20904flush_node_list(ha);
20905
20906@ We must now face the fact that the battle is not over, even though the
20907{\def\!{\kern-1pt}%
20908hyphens have been found: The process of reconstituting a word can be nontrivial
20909because ligatures might change when a hyphen is present. {\sl The \TeX book\/}
20910discusses the difficulties of the word ``difficult'', and
20911the discretionary material surrounding a
20912hyphen can be considerably more complex than that. Suppose
20913\.{abcdef} is a word in a font for which the only ligatures are \.{b\!c},
20914\.{c\!d}, \.{d\!e}, and \.{e\!f}. If this word permits hyphenation
20915between \.b and \.c, the two patterns with and without hyphenation are
20916$\.a\,\.b\,\.-\,\.{c\!d}\,\.{e\!f}$ and $\.a\,\.{b\!c}\,\.{d\!e}\,\.f$.
20917Thus the insertion of a hyphen might cause effects to ripple arbitrarily
20918far into the rest of the word. A further complication arises if additional
20919hyphens appear together with such rippling, e.g., if the word in the
20920example just given could also be hyphenated between \.c and \.d; \TeX\
20921avoids this by simply ignoring the additional hyphens in such weird cases.}
20922
20923Still further complications arise in the presence of ligatures that do not
20924delete the original characters. When punctuation precedes the word being
20925hyphenated, \TeX's method is not perfect under all possible scenarios,
20926because punctuation marks and letters can propagate information back and forth.
20927For example, suppose the original pre-hyphenation pair
20928\.{*a} changes to \.{*y} via a \.{\?=:} ligature, which changes to \.{xy}
20929via a \.{=:\?} ligature; if $p_{a-1}=\.x$ and $p_a=\.y$, the reconstitution
20930procedure isn't smart enough to obtain \.{xy} again. In such cases the
20931font designer should include a ligature that goes from \.{xa} to \.{xy}.
20932
20933@ The processing is facilitated by a subroutine called |reconstitute|. Given
20934a string of characters $x_j\ldots x_n$, there is a smallest index $m\ge j$
20935such that the ``translation'' of $x_j\ldots x_n$ by ligatures and kerning
20936has the form $y_1\ldots y_t$ followed by the translation of $x_{m+1}\ldots x_n$,
20937where $y_1\ldots y_t$ is some nonempty sequence of character, ligature, and
20938kern nodes. We call $x_j\ldots x_m$ a ``cut prefix'' of $x_j\ldots x_n$.
20939For example, if $x_1x_2x_3=\.{fly}$, and if the font contains `fl' as a
20940ligature and a kern between `fl' and `y', then $m=2$, $t=2$, and $y_1$ will
20941be a ligature node for `fl' followed by an appropriate kern node~$y_2$.
20942In the most common case, $x_j$~forms no ligature with $x_{j+1}$ and we
20943simply have $m=j$, $y_1=x_j$. If $m<n$ we can repeat the procedure on
20944$x_{m+1}\ldots x_n$ until the entire translation has been found.
20945
20946The |reconstitute| function returns the integer $m$ and puts the nodes
20947$y_1\ldots y_t$ into a linked list starting at |link(hold_head)|,
20948getting the input $x_j\ldots x_n$ from the |hu| array. If $x_j=256$,
20949we consider $x_j$ to be an implicit left boundary character; in this
20950case |j| must be strictly less than~|n|. There is a
20951parameter |bchar|, which is either 256 or an implicit right boundary character
20952assumed to be present just following~$x_n$. (The value |hu[n+1]| is never
20953explicitly examined, but the algorithm imagines that |bchar| is there.)
20954
20955If there exists an index |k| in the range $j\le k\le m$ such that |hyf[k]|
20956is odd and such that the result of |reconstitute| would have been different
20957if $x_{k+1}$ had been |hchar|, then |reconstitute| sets |hyphen_passed|
20958to the smallest such~|k|. Otherwise it sets |hyphen_passed| to zero.
20959
20960A special convention is used in the case |j=0|: Then we assume that the
20961translation of |hu[0]| appears in a special list of charnodes starting at
20962|init_list|; moreover, if |init_lig| is |true|, then |hu[0]| will be
20963a ligature character, involving a left boundary if |init_lft| is |true|.
20964This facility is provided for cases when a hyphenated
20965word is preceded by punctuation (like single or double quotes) that might
20966affect the translation of the beginning of the word.
20967
20968@<Glob...@>=
20969@!hyphen_passed:small_number; {first hyphen in a ligature, if any}
20970
20971@ @<Declare the function called |reconstitute|@>=
20972function reconstitute(@!j,@!n:small_number;@!bchar,@!hchar:halfword):
20973  small_number;
20974label continue,done;
20975var @!p:pointer; {temporary register for list manipulation}
20976@!t:pointer; {a node being appended to}
20977@!q:four_quarters; {character information or a lig/kern instruction}
20978@!cur_rh:halfword; {hyphen character for ligature testing}
20979@!test_char:halfword; {hyphen or other character for ligature testing}
20980@!w:scaled; {amount of kerning}
20981@!k:font_index; {position of current lig/kern instruction}
20982begin hyphen_passed:=0; t:=hold_head; w:=0; link(hold_head):=null;
20983 {at this point |ligature_present=lft_hit=rt_hit=false|}
20984@<Set up data structures with the cursor following position |j|@>;
20985continue:@<If there's a ligature or kern at the cursor position, update the data
20986  structures, possibly advancing~|j|; continue until the cursor moves@>;
20987@<Append a ligature and/or kern to the translation;
20988  |goto continue| if the stack of inserted ligatures is nonempty@>;
20989reconstitute:=j;
20990end;
20991
20992@ The reconstitution procedure shares many of the global data structures
20993by which \TeX\ has processed the words before they were hyphenated.
20994There is an implied ``cursor'' between characters |cur_l| and |cur_r|;
20995these characters will be tested for possible ligature activity. If
20996|ligature_present| then |cur_l| is a ligature character formed from the
20997original characters following |cur_q| in the current translation list.
20998There is a ``ligature stack'' between the cursor and character |j+1|,
20999consisting of pseudo-ligature nodes linked together by their |link| fields.
21000This stack is normally empty unless a ligature command has created a new
21001character that will need to be processed later. A pseudo-ligature is
21002a special node having a |character| field that represents a potential
21003ligature and a |lig_ptr| field that points to a |char_node| or is |null|.
21004We have
21005$$|cur_r|=\cases{|character(lig_stack)|,&if |lig_stack>null|;\cr
21006  |qi(hu[j+1])|,&if |lig_stack=null| and |j<n|;\cr
21007  bchar,&if |lig_stack=null| and |j=n|.\cr}$$
21008
21009@<Glob...@>=
21010@!cur_l,@!cur_r:halfword; {characters before and after the cursor}
21011@!cur_q:pointer; {where a ligature should be detached}
21012@!lig_stack:pointer; {unfinished business to the right of the cursor}
21013@!ligature_present:boolean; {should a ligature node be made for |cur_l|?}
21014@!lft_hit,@!rt_hit:boolean; {did we hit a ligature with a boundary character?}
21015
21016@ @d append_charnode_to_t(#)== begin link(t):=get_avail; t:=link(t);
21017    font(t):=hf; character(t):=#;
21018    end
21019@d set_cur_r==begin if j<n then cur_r:=qi(hu[j+1])@+else cur_r:=bchar;
21020    if odd(hyf[j]) then cur_rh:=hchar@+else cur_rh:=non_char;
21021    end
21022
21023@<Set up data structures with the cursor following position |j|@>=
21024cur_l:=qi(hu[j]); cur_q:=t;
21025if j=0 then
21026  begin ligature_present:=init_lig; p:=init_list;
21027  if ligature_present then lft_hit:=init_lft;
21028  while p>null do
21029    begin append_charnode_to_t(character(p)); p:=link(p);
21030    end;
21031  end
21032else if cur_l<non_char then append_charnode_to_t(cur_l);
21033lig_stack:=null; set_cur_r
21034
21035@ We may want to look at the lig/kern program twice, once for a hyphen
21036and once for a normal letter. (The hyphen might appear after the letter
21037in the program, so we'd better not try to look for both at once.)
21038
21039@<If there's a ligature or kern at the cursor position, update...@>=
21040if cur_l=non_char then
21041  begin k:=bchar_label[hf];
21042  if k=non_address then goto done@+else q:=font_info[k].qqqq;
21043  end
21044else begin q:=char_info(hf)(cur_l);
21045  if char_tag(q)<>lig_tag then goto done;
21046  k:=lig_kern_start(hf)(q); q:=font_info[k].qqqq;
21047  if skip_byte(q)>stop_flag then
21048    begin k:=lig_kern_restart(hf)(q); q:=font_info[k].qqqq;
21049    end;
21050  end; {now |k| is the starting address of the lig/kern program}
21051if cur_rh<non_char then test_char:=cur_rh@+else test_char:=cur_r;
21052loop@+begin if next_char(q)=test_char then if skip_byte(q)<=stop_flag then
21053    if cur_rh<non_char then
21054      begin hyphen_passed:=j; hchar:=non_char; cur_rh:=non_char;
21055      goto continue;
21056      end
21057    else begin if hchar<non_char then if odd(hyf[j]) then
21058        begin hyphen_passed:=j; hchar:=non_char;
21059        end;
21060      if op_byte(q)<kern_flag then
21061      @<Carry out a ligature replacement, updating the cursor structure
21062        and possibly advancing~|j|; |goto continue| if the cursor doesn't
21063        advance, otherwise |goto done|@>;
21064      w:=char_kern(hf)(q); goto done; {this kern will be inserted below}
21065     end;
21066  if skip_byte(q)>=stop_flag then
21067    if cur_rh=non_char then goto done
21068    else begin cur_rh:=non_char; goto continue;
21069      end;
21070  k:=k+qo(skip_byte(q))+1; q:=font_info[k].qqqq;
21071  end;
21072done:
21073
21074@ @d wrap_lig(#)==if ligature_present then
21075    begin p:=new_ligature(hf,cur_l,link(cur_q));
21076    if lft_hit then
21077      begin subtype(p):=2; lft_hit:=false;
21078      end;
21079    if # then if lig_stack=null then
21080      begin incr(subtype(p)); rt_hit:=false;
21081      end;
21082    link(cur_q):=p; t:=p; ligature_present:=false;
21083    end
21084@d pop_lig_stack==begin if lig_ptr(lig_stack)>null then
21085    begin link(t):=lig_ptr(lig_stack); {this is a charnode for |hu[j+1]|}
21086    t:=link(t); incr(j);
21087    end;
21088  p:=lig_stack; lig_stack:=link(p); free_node(p,small_node_size);
21089  if lig_stack=null then set_cur_r@+else cur_r:=character(lig_stack);
21090  end {if |lig_stack| isn't |null| we have |cur_rh=non_char|}
21091
21092@<Append a ligature and/or kern to the translation...@>=
21093wrap_lig(rt_hit);
21094if w<>0 then
21095  begin link(t):=new_kern(w); t:=link(t); w:=0;
21096  end;
21097if lig_stack>null then
21098  begin cur_q:=t; cur_l:=character(lig_stack); ligature_present:=true;
21099  pop_lig_stack; goto continue;
21100  end
21101
21102@ @<Carry out a ligature replacement, updating the cursor structure...@>=
21103begin if cur_l=non_char then lft_hit:=true;
21104if j=n then if lig_stack=null then rt_hit:=true;
21105check_interrupt; {allow a way out in case there's an infinite ligature loop}
21106case op_byte(q) of
21107qi(1),qi(5):begin cur_l:=rem_byte(q); {\.{=:\?}, \.{=:\?>}}
21108  ligature_present:=true;
21109  end;
21110qi(2),qi(6):begin cur_r:=rem_byte(q); {\.{\?=:}, \.{\?=:>}}
21111  if lig_stack>null then character(lig_stack):=cur_r
21112  else begin lig_stack:=new_lig_item(cur_r);
21113    if j=n then bchar:=non_char
21114    else begin p:=get_avail; lig_ptr(lig_stack):=p;
21115      character(p):=qi(hu[j+1]); font(p):=hf;
21116      end;
21117    end;
21118  end;
21119qi(3):begin cur_r:=rem_byte(q); {\.{\?=:\?}}
21120  p:=lig_stack; lig_stack:=new_lig_item(cur_r); link(lig_stack):=p;
21121  end;
21122qi(7),qi(11):begin wrap_lig(false); {\.{\?=:\?>}, \.{\?=:\?>>}}
21123  cur_q:=t; cur_l:=rem_byte(q); ligature_present:=true;
21124  end;
21125othercases begin cur_l:=rem_byte(q); ligature_present:=true; {\.{=:}}
21126  if lig_stack>null then pop_lig_stack
21127  else if j=n then goto done
21128  else begin append_charnode_to_t(cur_r); incr(j); set_cur_r;
21129    end;
21130  end
21131endcases;
21132if op_byte(q)>qi(4) then if op_byte(q)<>qi(7) then goto done;
21133goto continue;
21134end
21135
21136@ Okay, we're ready to insert the potential hyphenations that were found.
21137When the following program is executed, we want to append the word
21138|hu[1..hn]| after node |ha|, and node |q| should be appended to the result.
21139During this process, the variable |i| will be a temporary
21140index into |hu|; the variable |j| will be an index to our current position
21141in |hu|; the variable |l| will be the counterpart of |j|, in a discretionary
21142branch; the variable |r| will point to new nodes being created; and
21143we need a few new local variables:
21144
21145@<Local variables for hyph...@>=
21146@!major_tail,@!minor_tail:pointer; {the end of lists in the main and
21147  discretionary branches being reconstructed}
21148@!c:UnicodeScalar; {character temporarily replaced by a hyphen}
21149@!c_loc:0..63; {where that character came from}
21150@!r_count:integer; {replacement count for discretionary}
21151@!hyf_node:pointer; {the hyphen, if it exists}
21152
21153@ When the following code is performed, |hyf[0]| and |hyf[hn]| will be zero.
21154
21155@<Reconstitute nodes for the hyphenated word...@>=
21156repeat l:=j; j:=reconstitute(j,hn,bchar,qi(hyf_char))+1;
21157if hyphen_passed=0 then
21158  begin link(s):=link(hold_head);
21159  while link(s)>null do s:=link(s);
21160  if odd(hyf[j-1]) then
21161    begin l:=j; hyphen_passed:=j-1; link(hold_head):=null;
21162    end;
21163  end;
21164if hyphen_passed>0 then
21165  @<Create and append a discretionary node as an alternative to the
21166    unhyphenated word, and continue to develop both branches until they
21167    become equivalent@>;
21168until j>hn;
21169link(s):=q
21170
21171@ In this repeat loop we will insert another discretionary if |hyf[j-1]| is
21172odd, when both branches of the previous discretionary end at position |j-1|.
21173Strictly speaking, we aren't justified in doing this, because we don't know
21174that a hyphen after |j-1| is truly independent of those branches. But in almost
21175all applications we would rather not lose a potentially valuable hyphenation
21176point. (Consider the word `difficult', where the letter `c' is in position |j|.)
21177
21178@d advance_major_tail==begin major_tail:=link(major_tail); incr(r_count);
21179    end
21180
21181@<Create and append a discretionary node as an alternative...@>=
21182repeat r:=get_node(small_node_size);
21183link(r):=link(hold_head); type(r):=disc_node;
21184major_tail:=r; r_count:=0;
21185while link(major_tail)>null do advance_major_tail;
21186i:=hyphen_passed; hyf[i]:=0;
21187@<Put the \(c)characters |hu[l..i]| and a hyphen into |pre_break(r)|@>;
21188@<Put the \(c)characters |hu[i+1..@,]| into |post_break(r)|, appending to this
21189  list and to |major_tail| until synchronization has been achieved@>;
21190@<Move pointer |s| to the end of the current list, and set |replace_count(r)|
21191  appropriately@>;
21192hyphen_passed:=j-1; link(hold_head):=null;
21193until not odd(hyf[j-1])
21194
21195@ The new hyphen might combine with the previous character via ligature
21196or kern. At this point we have |l-1<=i<j| and |i<hn|.
21197
21198@<Put the \(c)characters |hu[l..i]| and a hyphen into |pre_break(r)|@>=
21199minor_tail:=null; pre_break(r):=null; hyf_node:=new_character(hf,hyf_char);
21200if hyf_node<>null then
21201  begin incr(i); c:=hu[i]; hu[i]:=hyf_char; free_avail(hyf_node);
21202  end;
21203while l<=i do
21204  begin l:=reconstitute(l,i,font_bchar[hf],non_char)+1;
21205  if link(hold_head)>null then
21206    begin if minor_tail=null then pre_break(r):=link(hold_head)
21207    else link(minor_tail):=link(hold_head);
21208    minor_tail:=link(hold_head);
21209    while link(minor_tail)>null do minor_tail:=link(minor_tail);
21210    end;
21211  end;
21212if hyf_node<>null then
21213  begin hu[i]:=c; {restore the character in the hyphen position}
21214  l:=i; decr(i);
21215  end
21216
21217@ The synchronization algorithm begins with |l=i+1<=j|.
21218
21219@<Put the \(c)characters |hu[i+1..@,]| into |post_break(r)|...@>=
21220minor_tail:=null; post_break(r):=null; c_loc:=0;
21221if bchar_label[hf]<>non_address then {put left boundary at beginning of new line}
21222  begin decr(l); c:=hu[l]; c_loc:=l; hu[l]:=max_hyph_char;
21223  end;
21224while l<j do
21225  begin repeat l:=reconstitute(l,hn,bchar,non_char)+1;
21226  if c_loc>0 then
21227    begin hu[c_loc]:=c; c_loc:=0;
21228    end;
21229  if link(hold_head)>null then
21230    begin if minor_tail=null then post_break(r):=link(hold_head)
21231    else link(minor_tail):=link(hold_head);
21232    minor_tail:=link(hold_head);
21233    while link(minor_tail)>null do minor_tail:=link(minor_tail);
21234    end;
21235  until l>=j;
21236  while l>j do
21237    @<Append characters of |hu[j..@,]| to |major_tail|, advancing~|j|@>;
21238  end
21239
21240@ @<Append characters of |hu[j..@,]|...@>=
21241begin j:=reconstitute(j,hn,bchar,non_char)+1;
21242link(major_tail):=link(hold_head);
21243while link(major_tail)>null do advance_major_tail;
21244end
21245
21246@ Ligature insertion can cause a word to grow exponentially in size. Therefore
21247we must test the size of |r_count| here, even though the hyphenated text
21248was at most 63 characters long.
21249
21250@<Move pointer |s| to the end of the current list...@>=
21251if r_count>127 then {we have to forget the discretionary hyphen}
21252  begin link(s):=link(r); link(r):=null; flush_node_list(r);
21253  end
21254else begin link(s):=r; replace_count(r):=r_count;
21255  end;
21256s:=major_tail
21257
21258@* \[42] Hyphenation.
21259When a word |hc[1..hn]| has been set up to contain a candidate for hyphenation,
21260\TeX\ first looks to see if it is in the user's exception dictionary. If not,
21261hyphens are inserted based on patterns that appear within the given word,
21262using an algorithm due to Frank~M. Liang.
21263@^Liang, Franklin Mark@>
21264
21265Let's consider Liang's method first, since it is much more interesting than the
21266exception-lookup routine.  The algorithm begins by setting |hyf[j]| to zero
21267for all |j|, and invalid characters are inserted into |hc[0]|
21268and |hc[hn+1]| to serve as delimiters. Then a reasonably fast method is
21269used to see which of a given set of patterns occurs in the word
21270|hc[0..(hn+1)]|. Each pattern $p_1\ldots p_k$ of length |k| has an associated
21271sequence of |k+1| numbers $n_0\ldots n_k$; and if the pattern occurs in
21272|hc[(j+1)..(j+k)]|, \TeX\ will set |hyf[j+i]:=@tmax@>(hyf[j+i],@t$n_i$@>)| for
21273|0<=i<=k|. After this has been done for each pattern that occurs, a
21274discretionary hyphen will be inserted between |hc[j]| and |hc[j+1]| when
21275|hyf[j]| is odd, as we have already seen.
21276
21277The set of patterns $p_1\ldots p_k$ and associated numbers $n_0\ldots n_k$
21278depends, of course, on the language whose words are being hyphenated, and
21279on the degree of hyphenation that is desired. A method for finding
21280appropriate |p|'s and |n|'s, from a given dictionary of words and acceptable
21281hyphenations, is discussed in Liang's Ph.D. thesis (Stanford University,
212821983); \TeX\ simply starts with the patterns and works from there.
21283
21284@ The patterns are stored in a compact table that is also efficient for
21285retrieval, using a variant of ``trie memory'' [cf.\ {\sl The Art of
21286Computer Programming \bf3} (1973), 481--505]. We can find each pattern
21287$p_1\ldots p_k$ by letting $z_0$ be one greater than the relevant language
21288index and then, for |1<=i<=k|,
21289setting |@t$z_i$@>:=trie_link@t$(z_{i-1})+p_i$@>|; the pattern will be
21290identified by the number $z_k$. Since all the pattern information is
21291packed together into a single |trie_link| array, it is necessary to
21292prevent confusion between the data from inequivalent patterns, so another
21293table is provided such that |trie_char@t$(z_i)=p_i$@>| for all |i|. There
21294is also a table |trie_op|$(z_k)$ to identify the numbers $n_0\ldots n_k$
21295associated with $p_1\ldots p_k$.
21296
21297Comparatively few different number sequences $n_0\ldots n_k$ actually occur,
21298since most of the |n|'s are generally zero. Therefore the number sequences
21299are encoded in such a way that |trie_op|$(z_k)$ is only one byte long.
21300If |trie_op(@t$z_k$@>)<>min_quarterword|, when $p_1\ldots p_k$ has matched
21301the letters in |hc[(l-k+1)..l@,]| of language |t|,
21302we perform all of the required operations
21303for this pattern by carrying out the following little program: Set
21304|v:=trie_op(@t$z_k$@>)|. Then set |v:=v+op_start[t]|,
21305|hyf[l-hyf_distance[v]]:=@tmax@>(hyf[l-hyf_distance[v]], hyf_num[v])|,
21306and |v:=hyf_next[v]|; repeat, if necessary, until |v=min_quarterword|.
21307
21308@<Types...@>=
21309@!trie_pointer=0..trie_size; {an index into |trie|}
21310
21311@ @d trie_link(#)==trie[#].rh {``downward'' link in a trie}
21312@d trie_char(#)==trie[#].b1 {character matched at this trie location}
21313@d trie_op(#)==trie[#].b0 {program for hyphenation at this trie location}
21314
21315@<Glob...@>=
21316@!trie:array[trie_pointer] of two_halves; {|trie_link|, |trie_char|, |trie_op|}
21317@!hyf_distance:array[1..trie_op_size] of small_number; {position |k-j| of $n_j$}
21318@!hyf_num:array[1..trie_op_size] of small_number; {value of $n_j$}
21319@!hyf_next:array[1..trie_op_size] of quarterword; {continuation code}
21320@!op_start:array[0..biggest_lang] of 0..trie_op_size; {offset for current language}
21321
21322@ @<Local variables for hyph...@>=
21323@!z:trie_pointer; {an index into |trie|}
21324@!v:integer; {an index into |hyf_distance|, etc.}
21325
21326@ Assuming that these auxiliary tables have been set up properly, the
21327hyphenation algorithm is quite short. In the following code we set |hc[hn+2]|
21328to the impossible value 256, in order to guarantee that |hc[hn+3]| will
21329never be fetched.
21330
21331@<Find hyphen locations for the word in |hc|...@>=
21332for j:=0 to hn do hyf[j]:=0;
21333@<Look for the word |hc[1..hn]| in the exception table, and |goto found| (with
21334  |hyf| containing the hyphens) if an entry is found@>;
21335if trie_char(cur_lang+1)<>qi(cur_lang) then return; {no patterns for |cur_lang|}
21336hc[0]:=0; hc[hn+1]:=0; hc[hn+2]:=max_hyph_char; {insert delimiters}
21337for j:=0 to hn-r_hyf+1 do
21338  begin z:=trie_link(cur_lang+1)+hc[j]; l:=j;
21339  while hc[l]=qo(trie_char(z)) do
21340    begin if trie_op(z)<>min_quarterword then
21341      @<Store \(m)maximum values in the |hyf| table@>;
21342    incr(l); z:=trie_link(z)+hc[l];
21343    end;
21344  end;
21345found: for j:=0 to l_hyf-1 do hyf[j]:=0;
21346for j:=0 to r_hyf-1 do hyf[hn-j]:=0
21347
21348@ @<Store \(m)maximum values in the |hyf| table@>=
21349begin v:=trie_op(z);
21350repeat v:=v+op_start[cur_lang]; i:=l-hyf_distance[v];
21351if hyf_num[v]>hyf[i] then hyf[i]:=hyf_num[v];
21352v:=hyf_next[v];
21353until v=min_quarterword;
21354end
21355
21356@ The exception table that is built by \TeX's \.{\\hyphenation} primitive is
21357organized as an ordered hash table [cf.\ Amble and Knuth, {\sl The Computer
21358@^Amble, Ole@> @^Knuth, Donald Ervin@>
21359Journal\/ \bf17} (1974), 135--142] using linear probing. If $\alpha$ and
21360$\beta$ are words, we will say that $\alpha<\beta$ if $\vert\alpha\vert<
21361\vert\beta\vert$ or if $\vert\alpha\vert=\vert\beta\vert$ and
21362$\alpha$ is lexicographically smaller than $\beta$. (The notation $\vert
21363\alpha\vert$ stands for the length of $\alpha$.) The idea of ordered hashing
21364is to arrange the table so that a given word $\alpha$ can be sought by computing
21365a hash address $h=h(\alpha)$ and then looking in table positions |h|, |h-1|,
21366\dots, until encountering the first word $\L\alpha$. If this word is
21367different from $\alpha$, we can conclude that $\alpha$ is not in the table.
21368
21369The words in the table point to lists in |mem| that specify hyphen positions
21370in their |info| fields. The list for $c_1\ldots c_n$ contains the number |k| if
21371the word $c_1\ldots c_n$ has a discretionary hyphen between $c_k$ and
21372$c_{k+1}$.
21373
21374@<Types...@>=
21375@!hyph_pointer=0..hyph_size; {an index into the ordered hash table}
21376
21377@ @<Glob...@>=
21378@!hyph_word:array[hyph_pointer] of str_number; {exception words}
21379@!hyph_list:array[hyph_pointer] of pointer; {lists of hyphen positions}
21380@!hyph_count:hyph_pointer; {the number of words in the exception dictionary}
21381
21382@ @<Local variables for init...@>=
21383@!z:hyph_pointer; {runs through the exception dictionary}
21384
21385@ @<Set init...@>=
21386for z:=0 to hyph_size do
21387  begin hyph_word[z]:=0; hyph_list[z]:=null;
21388  end;
21389hyph_count:=0;
21390
21391@ The algorithm for exception lookup is quite simple, as soon as we have
21392a few more local variables to work with.
21393
21394@<Local variables for hyph...@>=
21395@!h:hyph_pointer; {an index into |hyph_word| and |hyph_list|}
21396@!k:str_number; {an index into |str_start|}
21397@!u:pool_pointer; {an index into |str_pool|}
21398
21399@ First we compute the hash code |h|, then we search until we either
21400find the word or we don't. Words from different languages are kept
21401separate by appending the language code to the string.
21402
21403@<Look for the word |hc[1...@>=
21404h:=hc[1]; incr(hn); hc[hn]:=cur_lang;
21405for j:=2 to hn do h:=(h+h+hc[j]) mod hyph_size;
21406loop@+  begin @<If the string |hyph_word[h]| is less than \(hc)|hc[1..hn]|,
21407    |goto not_found|; but if the two strings are equal,
21408    set |hyf| to the hyphen positions and |goto found|@>;
21409  if h>0 then decr(h)@+else h:=hyph_size;
21410  end;
21411not_found: decr(hn)
21412
21413@ @<If the string |hyph_word[h]| is less than \(hc)...@>=
21414k:=hyph_word[h]; if k=0 then goto not_found;
21415if length(k)<hn then goto not_found;
21416if length(k)=hn then
21417  begin j:=1; u:=str_start_macro(k);
21418  repeat if so(str_pool[u])<hc[j] then goto not_found;
21419  if so(str_pool[u])>hc[j] then goto done;
21420  incr(j); incr(u);
21421  until j>hn;
21422  @<Insert hyphens as specified in |hyph_list[h]|@>;
21423  decr(hn); goto found;
21424  end;
21425done:
21426
21427@ @<Insert hyphens as specified...@>=
21428s:=hyph_list[h];
21429while s<>null do
21430  begin hyf[info(s)]:=1; s:=link(s);
21431  end
21432
21433@ @<Search |hyph_list| for pointers to |p|@>=
21434for q:=0 to hyph_size do
21435  begin if hyph_list[q]=p then
21436    begin print_nl("HYPH("); print_int(q); print_char(")");
21437    end;
21438  end
21439
21440@ We have now completed the hyphenation routine, so the |line_break| procedure
21441is finished at last. Since the hyphenation exception table is fresh in our
21442minds, it's a good time to deal with the routine that adds new entries to it.
21443
21444When \TeX\ has scanned `\.{\\hyphenation}', it calls on a procedure named
21445|new_hyph_exceptions| to do the right thing.
21446
21447@d set_cur_lang==if language<=0 then cur_lang:=0
21448  else if language>biggest_lang then cur_lang:=0
21449  else cur_lang:=language
21450
21451@p procedure new_hyph_exceptions; {enters new exceptions}
21452label reswitch, exit, found, not_found, not_found1;
21453var n:0..64; {length of current word; not always a |small_number|}
21454@!j:0..64; {an index into |hc|}
21455@!h:hyph_pointer; {an index into |hyph_word| and |hyph_list|}
21456@!k:str_number; {an index into |str_start|}
21457@!p:pointer; {head of a list of hyphen positions}
21458@!q:pointer; {used when creating a new node for list |p|}
21459@!s,@!t:str_number; {strings being compared or stored}
21460@!u,@!v:pool_pointer; {indices into |str_pool|}
21461begin scan_left_brace; {a left brace must follow \.{\\hyphenation}}
21462set_cur_lang;
21463@!init if trie_not_ready then
21464  begin hyph_index:=0; goto not_found1;
21465  end;
21466tini@/
21467set_hyph_index;
21468not_found1:
21469@<Enter as many hyphenation exceptions as are listed,
21470until coming to a right brace; then |return|@>;
21471exit:end;
21472
21473@ @<Enter as many...@>=
21474n:=0; p:=null;
21475loop@+  begin get_x_token;
21476  reswitch: case cur_cmd of
21477  letter,other_char,char_given:@<Append a new letter or hyphen@>;
21478  char_num: begin scan_char_num; cur_chr:=cur_val; cur_cmd:=char_given;
21479    goto reswitch;
21480    end;
21481  spacer,right_brace: begin if n>1 then @<Enter a hyphenation exception@>;
21482    if cur_cmd=right_brace then return;
21483    n:=0; p:=null;
21484    end;
21485  othercases @<Give improper \.{\\hyphenation} error@>
21486  endcases;
21487  end
21488
21489@ @<Give improper \.{\\hyph...@>=
21490begin print_err("Improper "); print_esc("hyphenation");
21491@.Improper \\hyphenation...@>
21492  print(" will be flushed");
21493help2("Hyphenation exceptions must contain only letters")@/
21494  ("and hyphens. But continue; I'll forgive and forget.");
21495error;
21496end
21497
21498@ @<Append a new letter or hyphen@>=
21499if cur_chr="-" then @<Append the value |n| to list |p|@>
21500else  begin set_lc_code(cur_chr);
21501  if hc[0]=0 then
21502    begin print_err("Not a letter");
21503@.Not a letter@>
21504    help2("Letters in \hyphenation words must have \lccode>0.")@/
21505      ("Proceed; I'll ignore the character I just read.");
21506    error;
21507    end
21508  else if n<63 then
21509    begin incr(n);
21510      if hc[0]<@"10000 then hc[n]:=hc[0]
21511      else begin
21512        hc[n]:=(hc[0] - @"10000) div @"400 + @"D800;
21513        incr(n);
21514        hc[n]:=hc[0] mod @"400 + @"DC00;
21515        end;
21516    end;
21517  end
21518
21519@ @<Append the value |n| to list |p|@>=
21520begin if n<63 then
21521  begin q:=get_avail; link(q):=p; info(q):=n; p:=q;
21522  end;
21523end
21524
21525@ @<Enter a hyphenation exception@>=
21526begin incr(n); hc[n]:=cur_lang; str_room(n); h:=0;
21527for j:=1 to n do
21528  begin h:=(h+h+hc[j]) mod hyph_size;
21529  append_char(hc[j]);
21530  end;
21531s:=make_string;
21532@<Insert the \(p)pair |(s,p)| into the exception table@>;
21533end
21534
21535@ @<Insert the \(p)pair |(s,p)|...@>=
21536if hyph_count=hyph_size then overflow("exception dictionary",hyph_size);
21537@:TeX capacity exceeded exception dictionary}{\quad exception dictionary@>
21538incr(hyph_count);
21539while hyph_word[h]<>0 do
21540  begin @<If the string |hyph_word[h]| is less than \(or)or equal to
21541  |s|, interchange |(hyph_word[h],hyph_list[h])| with |(s,p)|@>;
21542  if h>0 then decr(h)@+else h:=hyph_size;
21543  end;
21544hyph_word[h]:=s; hyph_list[h]:=p
21545
21546@ @<If the string |hyph_word[h]| is less than \(or)...@>=
21547k:=hyph_word[h];
21548if length(k)<length(s) then goto found;
21549if length(k)>length(s) then goto not_found;
21550u:=str_start_macro(k); v:=str_start_macro(s);
21551repeat if str_pool[u]<str_pool[v] then goto found;
21552if str_pool[u]>str_pool[v] then goto not_found;
21553incr(u); incr(v);
21554until u=str_start_macro(k+1);
21555found:q:=hyph_list[h]; hyph_list[h]:=p; p:=q;@/
21556t:=hyph_word[h]; hyph_word[h]:=s; s:=t;
21557not_found:
21558
21559@* \[43] Initializing the hyphenation tables.
21560The trie for \TeX's hyphenation algorithm is built from a sequence of
21561patterns following a \.{\\patterns} specification. Such a specification
21562is allowed only in \.{INITEX}, since the extra memory for auxiliary tables
21563and for the initialization program itself would only clutter up the
21564production version of \TeX\ with a lot of deadwood.
21565
21566The first step is to build a trie that is linked, instead of packed
21567into sequential storage, so that insertions are readily made.
21568After all patterns have been processed, \.{INITEX}
21569compresses the linked trie by identifying common subtries. Finally the
21570trie is packed into the efficient sequential form that the hyphenation
21571algorithm actually uses.
21572
21573@<Declare subprocedures for |line_break|@>=
21574@!init @<Declare procedures for preprocessing hyphenation patterns@>@;
21575tini
21576
21577@ Before we discuss trie building in detail, let's consider the simpler
21578problem of creating the |hyf_distance|, |hyf_num|, and |hyf_next| arrays.
21579
21580Suppose, for example, that \TeX\ reads the pattern `\.{ab2cde1}'. This is
21581a pattern of length 5, with $n_0\ldots n_5=0\,0\,2\,0\,0\,1$ in the
21582notation above. We want the corresponding |trie_op| code |v| to have
21583|hyf_distance[v]=3|, |hyf_num[v]=2|, and |hyf_next[v]=@t$v^\prime$@>|,
21584where the auxiliary |trie_op| code $v^\prime$ has
21585|hyf_distance[@t$v^\prime$@>]=0|, |hyf_num[@t$v^\prime$@>]=1|, and
21586|hyf_next[@t$v^\prime$@>]=min_quarterword|.
21587
21588\TeX\ computes an appropriate value |v| with the |new_trie_op| subroutine
21589below, by setting
21590$$\hbox{|@t$v^\prime$@>:=new_trie_op(0,1,min_quarterword)|,\qquad
21591|v:=new_trie_op(3,2,@t$v^\prime$@>)|.}$$
21592This subroutine looks up its three
21593parameters in a special hash table, assigning a new value only if these
21594three have not appeared before for the current language.
21595
21596The hash table is called |trie_op_hash|, and the number of entries it contains
21597is |trie_op_ptr|.
21598
21599@<Glob...@>=
21600@!init@! trie_op_hash:array[-trie_op_size..trie_op_size] of 0..trie_op_size;
21601  {trie op codes for quadruples}
21602@!trie_used:array[ASCII_code] of quarterword;
21603  {largest opcode used so far for this language}
21604@!trie_op_lang:array[1..trie_op_size] of 0..biggest_lang;
21605  {language part of a hashed quadruple}
21606@!trie_op_val:array[1..trie_op_size] of quarterword;
21607  {opcode corresponding to a hashed quadruple}
21608@!trie_op_ptr:0..trie_op_size; {number of stored ops so far}
21609tini
21610
21611@ It's tempting to remove the |overflow| stops in the following procedure;
21612|new_trie_op| could return |min_quarterword| (thereby simply ignoring
21613part of a hyphenation pattern) instead of aborting the job. However, that would
21614lead to different hyphenation results on different installations of \TeX\
21615using the same patterns. The |overflow| stops are necessary for portability
21616of patterns.
21617
21618@<Declare procedures for preprocessing hyph...@>=
21619function new_trie_op(@!d,@!n:small_number;@!v:quarterword):quarterword;
21620label exit;
21621var h:-trie_op_size..trie_op_size; {trial hash location}
21622@!u:quarterword; {trial op code}
21623@!l:0..trie_op_size; {pointer to stored data}
21624begin h:=abs(n+313*d+361*v+1009*cur_lang) mod (trie_op_size+trie_op_size)
21625  - trie_op_size;
21626loop@+  begin l:=trie_op_hash[h];
21627  if l=0 then {empty position found for a new op}
21628    begin if trie_op_ptr=trie_op_size then
21629      overflow("pattern memory ops",trie_op_size);
21630    u:=trie_used[cur_lang];
21631    if u=max_quarterword then
21632      overflow("pattern memory ops per language",
21633        max_quarterword-min_quarterword);
21634    incr(trie_op_ptr); incr(u); trie_used[cur_lang]:=u;
21635    hyf_distance[trie_op_ptr]:=d;
21636    hyf_num[trie_op_ptr]:=n; hyf_next[trie_op_ptr]:=v;
21637    trie_op_lang[trie_op_ptr]:=cur_lang; trie_op_hash[h]:=trie_op_ptr;
21638    trie_op_val[trie_op_ptr]:=u; new_trie_op:=u; return;
21639    end;
21640  if (hyf_distance[l]=d)and(hyf_num[l]=n)and(hyf_next[l]=v)
21641   and(trie_op_lang[l]=cur_lang) then
21642    begin new_trie_op:=trie_op_val[l]; return;
21643    end;
21644  if h>-trie_op_size then decr(h)@+else h:=trie_op_size;
21645  end;
21646exit:end;
21647
21648@ After |new_trie_op| has compressed the necessary opcode information,
21649plenty of information is available to unscramble the data into the
21650final form needed by our hyphenation algorithm.
21651
21652@<Sort \(t)the hyphenation op tables into proper order@>=
21653op_start[0]:=-min_quarterword;
21654for j:=1 to biggest_lang do op_start[j]:=op_start[j-1]+qo(trie_used[j-1]);
21655for j:=1 to trie_op_ptr do
21656  trie_op_hash[j]:=op_start[trie_op_lang[j]]+trie_op_val[j]; {destination}
21657for j:=1 to trie_op_ptr do while trie_op_hash[j]>j do
21658  begin k:=trie_op_hash[j];@/
21659  t:=hyf_distance[k]; hyf_distance[k]:=hyf_distance[j]; hyf_distance[j]:=t;@/
21660  t:=hyf_num[k]; hyf_num[k]:=hyf_num[j]; hyf_num[j]:=t;@/
21661  t:=hyf_next[k]; hyf_next[k]:=hyf_next[j]; hyf_next[j]:=t;@/
21662  trie_op_hash[j]:=trie_op_hash[k]; trie_op_hash[k]:=k;
21663  end
21664
21665@ Before we forget how to initialize the data structures that have been
21666mentioned so far, let's write down the code that gets them started.
21667
21668@<Initialize table entries...@>=
21669for k:=-trie_op_size to trie_op_size do trie_op_hash[k]:=0;
21670for k:=0 to 255 do trie_used[k]:=min_quarterword;
21671trie_op_ptr:=0;
21672
21673@ The linked trie that is used to preprocess hyphenation patterns appears
21674in several global arrays. Each node represents an instruction of the form
21675``if you see character |c|, then perform operation |o|, move to the
21676next character, and go to node |l|; otherwise go to node |r|.''
21677The four quantities |c|, |o|, |l|, and |r| are stored in four arrays
21678|trie_c|, |trie_o|, |trie_l|, and |trie_r|. The root of the trie
21679is |trie_l[0]|, and the number of nodes is |trie_ptr|. Null trie
21680pointers are represented by zero. To initialize the trie, we simply
21681set |trie_l[0]| and |trie_ptr| to zero. We also set |trie_c[0]| to some
21682arbitrary value, since the algorithm may access it.
21683
21684The algorithms maintain the condition
21685$$\hbox{|trie_c[trie_r[z]]>trie_c[z]|\qquad
21686whenever |z<>0| and |trie_r[z]<>0|};$$ in other words, sibling nodes are
21687ordered by their |c| fields.
21688
21689@d trie_root==trie_l[0] {root of the linked trie}
21690
21691@<Glob...@>=
21692@!init @!trie_c:packed array[trie_pointer] of packed_ASCII_code;
21693  {characters to match}
21694@t\hskip10pt@>@!trie_o:packed array[trie_pointer] of quarterword;
21695  {operations to perform}
21696@t\hskip10pt@>@!trie_l:packed array[trie_pointer] of trie_pointer;
21697  {left subtrie links}
21698@t\hskip10pt@>@!trie_r:packed array[trie_pointer] of trie_pointer;
21699  {right subtrie links}
21700@t\hskip10pt@>@!trie_ptr:trie_pointer; {the number of nodes in the trie}
21701@t\hskip10pt@>@!trie_hash:packed array[trie_pointer] of trie_pointer;
21702  {used to identify equivalent subtries}
21703tini
21704
21705@ Let us suppose that a linked trie has already been constructed.
21706Experience shows that we can often reduce its size by recognizing common
21707subtries; therefore another hash table is introduced for this purpose,
21708somewhat similar to |trie_op_hash|. The new hash table will be
21709initialized to zero.
21710
21711The function |trie_node(p)| returns |p| if |p| is distinct from other nodes
21712that it has seen, otherwise it returns the number of the first equivalent
21713node that it has seen.
21714
21715Notice that we might make subtries equivalent even if they correspond to
21716patterns for different languages, in which the trie ops might mean quite
21717different things. That's perfectly all right.
21718
21719@<Declare procedures for preprocessing hyph...@>=
21720function trie_node(@!p:trie_pointer):trie_pointer; {converts
21721  to a canonical form}
21722label exit;
21723var h:trie_pointer; {trial hash location}
21724@!q:trie_pointer; {trial trie node}
21725begin h:=abs(trie_c[p]+1009*trie_o[p]+@|
21726    2718*trie_l[p]+3142*trie_r[p]) mod trie_size;
21727loop@+  begin q:=trie_hash[h];
21728  if q=0 then
21729    begin trie_hash[h]:=p; trie_node:=p; return;
21730    end;
21731  if (trie_c[q]=trie_c[p])and(trie_o[q]=trie_o[p])and@|
21732    (trie_l[q]=trie_l[p])and(trie_r[q]=trie_r[p]) then
21733    begin trie_node:=q; return;
21734    end;
21735  if h>0 then decr(h)@+else h:=trie_size;
21736  end;
21737exit:end;
21738
21739@ A neat recursive procedure is now able to compress a trie by
21740traversing it and applying |trie_node| to its nodes in ``bottom up''
21741fashion. We will compress the entire trie by clearing |trie_hash| to
21742zero and then saying `|trie_root:=compress_trie(trie_root)|'.
21743@^recursion@>
21744
21745@<Declare procedures for preprocessing hyph...@>=
21746function compress_trie(@!p:trie_pointer):trie_pointer;
21747begin if p=0 then compress_trie:=0
21748else  begin trie_l[p]:=compress_trie(trie_l[p]);
21749  trie_r[p]:=compress_trie(trie_r[p]);
21750  compress_trie:=trie_node(p);
21751  end;
21752end;
21753
21754@ The compressed trie will be packed into the |trie| array using a
21755``top-down first-fit'' procedure. This is a little tricky, so the reader
21756should pay close attention: The |trie_hash| array is cleared to zero
21757again and renamed |trie_ref| for this phase of the operation; later on,
21758|trie_ref[p]| will be nonzero only if the linked trie node |p| is the
21759smallest character
21760in a family and if the characters |c| of that family have been allocated to
21761locations |trie_ref[p]+c| in the |trie| array. Locations of |trie| that
21762are in use will have |trie_link=0|, while the unused holes in |trie|
21763will be doubly linked with |trie_link| pointing to the next larger vacant
21764location and |trie_back| pointing to the next smaller one. This double
21765linking will have been carried out only as far as |trie_max|, where
21766|trie_max| is the largest index of |trie| that will be needed.
21767To save time at the low end of the trie, we maintain array entries
21768|trie_min[c]| pointing to the smallest hole that is greater than~|c|.
21769Another array |trie_taken| tells whether or not a given location is
21770equal to |trie_ref[p]| for some |p|; this array is used to ensure that
21771distinct nodes in the compressed trie will have distinct |trie_ref|
21772entries.
21773
21774@d trie_ref==trie_hash {where linked trie families go into |trie|}
21775@d trie_back(#)==trie[#].lh {backward links in |trie| holes}
21776
21777@<Glob...@>=
21778@!init@!trie_taken:packed array[1..trie_size] of boolean;
21779  {does a family start here?}
21780@t\hskip10pt@>@!trie_min:array[ASCII_code] of trie_pointer;
21781  {the first possible slot for each character}
21782@t\hskip10pt@>@!trie_max:trie_pointer; {largest location used in |trie|}
21783@t\hskip10pt@>@!trie_not_ready:boolean; {is the trie still in linked form?}
21784tini
21785
21786@ Each time \.{\\patterns} appears, it contributes further patterns to
21787the future trie, which will be built only when hyphenation is attempted or
21788when a format file is dumped. The boolean variable |trie_not_ready|
21789will change to |false| when the trie is compressed; this will disable
21790further patterns.
21791
21792@<Initialize table entries...@>=
21793trie_not_ready:=true; trie_root:=0; trie_c[0]:=si(0); trie_ptr:=0;
21794
21795@ Here is how the trie-compression data structures are initialized.
21796If storage is tight, it would be possible to overlap |trie_op_hash|,
21797|trie_op_lang|, and |trie_op_val| with |trie|, |trie_hash|, and |trie_taken|,
21798because we finish with the former just before we need the latter.
21799
21800@<Get ready to compress the trie@>=
21801@<Sort \(t)the hyphenation...@>;
21802for p:=0 to trie_size do trie_hash[p]:=0;
21803hyph_root:=compress_trie(hyph_root);
21804trie_root:=compress_trie(trie_root); {identify equivalent subtries}
21805for p:=0 to trie_ptr do trie_ref[p]:=0;
21806for p:=0 to biggest_char do trie_min[p]:=p+1;
21807trie_link(0):=1; trie_max:=0
21808
21809@ The |first_fit| procedure finds the smallest hole |z| in |trie| such that
21810a trie family starting at a given node |p| will fit into vacant positions
21811starting at |z|. If |c=trie_c[p]|, this means that location |z-c| must
21812not already be taken by some other family, and that |z-c+@t$c^\prime$@>|
21813must be vacant for all characters $c^\prime$ in the family. The procedure
21814sets |trie_ref[p]| to |z-c| when the first fit has been found.
21815
21816@<Declare procedures for preprocessing hyph...@>=
21817procedure first_fit(@!p:trie_pointer); {packs a family into |trie|}
21818label not_found,found;
21819var h:trie_pointer; {candidate for |trie_ref[p]|}
21820@!z:trie_pointer; {runs through holes}
21821@!q:trie_pointer; {runs through the family starting at |p|}
21822@!c:ASCII_code; {smallest character in the family}
21823@!l,@!r:trie_pointer; {left and right neighbors}
21824@!ll:1..too_big_char; {upper limit of |trie_min| updating}
21825begin c:=so(trie_c[p]);
21826z:=trie_min[c]; {get the first conceivably good hole}
21827loop@+  begin h:=z-c;@/
21828  @<Ensure that |trie_max>=h+max_hyph_char|@>;
21829  if trie_taken[h] then goto not_found;
21830  @<If all characters of the family fit relative to |h|, then
21831    |goto found|,\30\ otherwise |goto not_found|@>;
21832  not_found: z:=trie_link(z); {move to the next hole}
21833  end;
21834found: @<Pack the family into |trie| relative to |h|@>;
21835end;
21836
21837@ By making sure that |trie_max| is at least |h+max_hyph_char|,
21838we can be sure that
21839|trie_max>z|, since |h=z-c|. It follows that location |trie_max| will
21840never be occupied in |trie|, and we will have |trie_max>=trie_link(z)|.
21841
21842@<Ensure that |trie_max>=h+max_hyph_char|@>=
21843if trie_max<h+max_hyph_char then
21844  begin if trie_size<=h+max_hyph_char then overflow("pattern memory",trie_size);
21845@:TeX capacity exceeded pattern memory}{\quad pattern memory@>
21846  repeat incr(trie_max); trie_taken[trie_max]:=false;
21847  trie_link(trie_max):=trie_max+1; trie_back(trie_max):=trie_max-1;
21848  until trie_max=h+max_hyph_char;
21849  end
21850
21851@ @<If all characters of the family fit relative to |h|...@>=
21852q:=trie_r[p];
21853while q>0 do
21854  begin if trie_link(h+so(trie_c[q]))=0 then goto not_found;
21855  q:=trie_r[q];
21856  end;
21857goto found
21858
21859@ @<Pack the family into |trie| relative to |h|@>=
21860trie_taken[h]:=true; trie_ref[p]:=h; q:=p;
21861repeat z:=h+so(trie_c[q]); l:=trie_back(z); r:=trie_link(z);
21862trie_back(r):=l; trie_link(l):=r; trie_link(z):=0;
21863if l<max_hyph_char then
21864  begin if z<max_hyph_char then ll:=z @+else ll:=max_hyph_char;
21865  repeat trie_min[l]:=r; incr(l);
21866  until l=ll;
21867  end;
21868q:=trie_r[q];
21869until q=0
21870
21871@ To pack the entire linked trie, we use the following recursive procedure.
21872@^recursion@>
21873
21874@<Declare procedures for preprocessing hyph...@>=
21875procedure trie_pack(@!p:trie_pointer); {pack subtries of a family}
21876var q:trie_pointer; {a local variable that need not be saved on recursive calls}
21877begin repeat q:=trie_l[p];
21878if (q>0)and(trie_ref[q]=0) then
21879  begin first_fit(q); trie_pack(q);
21880  end;
21881p:=trie_r[p];
21882until p=0;
21883end;
21884
21885@ When the whole trie has been allocated into the sequential table, we
21886must go through it once again so that |trie| contains the correct
21887information. Null pointers in the linked trie will be represented by the
21888value~0, which properly implements an ``empty'' family.
21889
21890@<Move the data into |trie|@>=
21891h.rh:=0; h.b0:=min_quarterword; h.b1:=min_quarterword; {|trie_link:=0|,
21892  |trie_op:=min_quarterword|, |trie_char:=qi(0)|}
21893if trie_max=0 then {no patterns were given}
21894  begin for r:=0 to 256 do trie[r]:=h;
21895  trie_max:=256;
21896  end
21897else begin if hyph_root>0 then trie_fix(hyph_root);
21898  if trie_root>0 then trie_fix(trie_root); {this fixes the non-holes in |trie|}
21899  r:=0; {now we will zero out all the holes}
21900  repeat s:=trie_link(r); trie[r]:=h; r:=s;
21901  until r>trie_max;
21902  end;
21903trie_char(0):=qi("?"); {make |trie_char(c)<>c| for all |c|}
21904
21905@ The fixing-up procedure is, of course, recursive. Since the linked trie
21906usually has overlapping subtries, the same data may be moved several
21907times; but that causes no harm, and at most as much work is done as it
21908took to build the uncompressed trie.
21909@^recursion@>
21910
21911@<Declare procedures for preprocessing hyph...@>=
21912procedure trie_fix(@!p:trie_pointer); {moves |p| and its siblings into |trie|}
21913var q:trie_pointer; {a local variable that need not be saved on recursive calls}
21914@!c:ASCII_code; {another one that need not be saved}
21915@!z:trie_pointer; {|trie| reference; this local variable must be saved}
21916begin z:=trie_ref[p];
21917repeat q:=trie_l[p]; c:=so(trie_c[p]);
21918trie_link(z+c):=trie_ref[q]; trie_char(z+c):=qi(c); trie_op(z+c):=trie_o[p];
21919if q>0 then trie_fix(q);
21920p:=trie_r[p];
21921until p=0;
21922end;
21923
21924@ Now let's go back to the easier problem, of building the linked
21925trie.  When \.{INITEX} has scanned the `\.{\\patterns}' control
21926sequence, it calls on |new_patterns| to do the right thing.
21927
21928@<Declare procedures for preprocessing hyph...@>=
21929procedure new_patterns; {initializes the hyphenation pattern data}
21930label done, done1;
21931var k,@!l:0..64; {indices into |hc| and |hyf|;
21932                  not always in |small_number| range}
21933@!digit_sensed:boolean; {should the next digit be treated as a letter?}
21934@!v:quarterword; {trie op code}
21935@!p,@!q:trie_pointer; {nodes of trie traversed during insertion}
21936@!first_child:boolean; {is |p=trie_l[q]|?}
21937@!c:ASCII_code; {character being inserted}
21938begin if trie_not_ready then
21939  begin set_cur_lang; scan_left_brace; {a left brace must follow \.{\\patterns}}
21940  @<Enter all of the patterns into a linked trie, until coming to a right
21941  brace@>;
21942  if saving_hyph_codes>0 then
21943    @<Store hyphenation codes for current language@>;
21944  end
21945else begin print_err("Too late for "); print_esc("patterns");
21946  help1("All patterns must be given before typesetting begins.");
21947  error; link(garbage):=scan_toks(false,false); flush_list(def_ref);
21948  end;
21949end;
21950
21951@ Novices are not supposed to be using \.{\\patterns}, so the error
21952messages are terse. (Note that all error messages appear in \TeX's string
21953pool, even if they are used only by \.{INITEX}.)
21954
21955@<Enter all of the patterns into a linked trie...@>=
21956k:=0; hyf[0]:=0; digit_sensed:=false;
21957loop@+  begin get_x_token;
21958  case cur_cmd of
21959  letter,other_char:@<Append a new letter or a hyphen level@>;
21960  spacer,right_brace: begin if k>0 then
21961      @<Insert a new pattern into the linked trie@>;
21962    if cur_cmd=right_brace then goto done;
21963    k:=0; hyf[0]:=0; digit_sensed:=false;
21964    end;
21965  othercases begin print_err("Bad "); print_esc("patterns");
21966@.Bad \\patterns@>
21967    help1("(See Appendix H.)"); error;
21968    end
21969  endcases;
21970  end;
21971done:
21972
21973@ @<Append a new letter or a hyphen level@>=
21974if digit_sensed or(cur_chr<"0")or(cur_chr>"9") then
21975  begin if cur_chr="." then cur_chr:=0 {edge-of-word delimiter}
21976  else  begin cur_chr:=lc_code(cur_chr);
21977    if cur_chr=0 then
21978      begin print_err("Nonletter");
21979@.Nonletter@>
21980      help1("(See Appendix H.)"); error;
21981      end;
21982    end;
21983    if cur_chr>max_hyph_char then max_hyph_char:=cur_chr;
21984  if k<63 then
21985    begin incr(k); hc[k]:=cur_chr; hyf[k]:=0; digit_sensed:=false;
21986    end;
21987  end
21988else if k<63 then
21989  begin hyf[k]:=cur_chr-"0"; digit_sensed:=true;
21990  end
21991
21992@ When the following code comes into play, the pattern $p_1\ldots p_k$
21993appears in |hc[1..k]|, and the corresponding sequence of numbers $n_0\ldots
21994n_k$ appears in |hyf[0..k]|.
21995
21996@<Insert a new pattern into the linked trie@>=
21997begin @<Compute the trie op code, |v|, and set |l:=0|@>;
21998q:=0; hc[0]:=cur_lang;
21999while l<=k do
22000  begin c:=hc[l]; incr(l); p:=trie_l[q]; first_child:=true;
22001  while (p>0)and(c>so(trie_c[p])) do
22002    begin q:=p; p:=trie_r[q]; first_child:=false;
22003    end;
22004  if (p=0)or(c<so(trie_c[p])) then
22005    @<Insert a new trie node between |q| and |p|, and
22006      make |p| point to it@>;
22007  q:=p; {now node |q| represents $p_1\ldots p_{l-1}$}
22008  end;
22009if trie_o[q]<>min_quarterword then
22010  begin print_err("Duplicate pattern");
22011@.Duplicate pattern@>
22012  help1("(See Appendix H.)"); error;
22013  end;
22014trie_o[q]:=v;
22015end
22016
22017@ @<Insert a new trie node between |q| and |p|...@>=
22018begin if trie_ptr=trie_size then overflow("pattern memory",trie_size);
22019@:TeX capacity exceeded pattern memory}{\quad pattern memory@>
22020incr(trie_ptr); trie_r[trie_ptr]:=p; p:=trie_ptr; trie_l[p]:=0;
22021if first_child then trie_l[q]:=p@+else trie_r[q]:=p;
22022trie_c[p]:=si(c); trie_o[p]:=min_quarterword;
22023end
22024
22025@ @<Compute the trie op code, |v|...@>=
22026if hc[1]=0 then hyf[0]:=0;
22027if hc[k]=0 then hyf[k]:=0;
22028l:=k; v:=min_quarterword;
22029loop@+  begin if hyf[l]<>0 then v:=new_trie_op(k-l,hyf[l],v);
22030  if l>0 then decr(l)@+else goto done1;
22031  end;
22032done1:
22033
22034@ Finally we put everything together: Here is how the trie gets to its
22035final, efficient form.
22036The following packing routine is rigged so that the root of the linked
22037tree gets mapped into location 1 of |trie|, as required by the hyphenation
22038algorithm. This happens because the first call of |first_fit| will
22039``take'' location~1.
22040
22041@<Declare procedures for preprocessing hyphenation patterns@>=
22042procedure init_trie;
22043var @!p:trie_pointer; {pointer for initialization}
22044@!j,@!k,@!t:integer; {all-purpose registers for initialization}
22045@!r,@!s:trie_pointer; {used to clean up the packed |trie|}
22046@!h:two_halves; {template used to zero out |trie|'s holes}
22047begin
22048incr(max_hyph_char);
22049@<Get ready to compress the trie@>;
22050if trie_root<>0 then
22051  begin first_fit(trie_root); trie_pack(trie_root);
22052  end;
22053if hyph_root<>0 then @<Pack all stored |hyph_codes|@>;
22054@<Move the data into |trie|@>;
22055trie_not_ready:=false;
22056end;
22057
22058@* \[44] Breaking vertical lists into pages.
22059The |vsplit| procedure, which implements \TeX's \.{\\vsplit} operation,
22060is considerably simpler than |line_break| because it doesn't have to
22061worry about hyphenation, and because its mission is to discover a single
22062break instead of an optimum sequence of breakpoints.  But before we get
22063into the details of |vsplit|, we need to consider a few more basic things.
22064
22065@ A subroutine called |prune_page_top| takes a pointer to a vlist and
22066returns a pointer to a modified vlist in which all glue, kern, and penalty nodes
22067have been deleted before the first box or rule node. However, the first
22068box or rule is actually preceded by a newly created glue node designed so that
22069the topmost baseline will be at distance |split_top_skip| from the top,
22070whenever this is possible without backspacing.
22071
22072When the second argument |s| is |false| the deleted nodes are destroyed,
22073otherwise they are collected in a list starting at |split_disc|.
22074
22075In this routine and those that follow, we make use of the fact that a
22076vertical list contains no character nodes, hence the |type| field exists
22077for each node in the list.
22078@^data structure assumptions@>
22079
22080@p function prune_page_top(@!p:pointer;@!s:boolean):pointer;
22081  {adjust top after page break}
22082var prev_p:pointer; {lags one step behind |p|}
22083@!q,@!r:pointer; {temporary variables for list manipulation}
22084begin prev_p:=temp_head; link(temp_head):=p;
22085while p<>null do
22086  case type(p) of
22087  hlist_node,vlist_node,rule_node:@<Insert glue for |split_top_skip|
22088    and set~|p:=null|@>;
22089  whatsit_node,mark_node,ins_node: begin prev_p:=p; p:=link(prev_p);
22090    end;
22091  glue_node,kern_node,penalty_node: begin q:=p; p:=link(q); link(q):=null;
22092    link(prev_p):=p;
22093    if s then
22094      begin if split_disc=null then split_disc:=q@+else link(r):=q;
22095      r:=q;
22096      end
22097    else flush_node_list(q);
22098    end;
22099  othercases confusion("pruning")
22100@:this can't happen pruning}{\quad pruning@>
22101  endcases;
22102prune_page_top:=link(temp_head);
22103end;
22104
22105@ @<Insert glue for |split_top_skip|...@>=
22106begin q:=new_skip_param(split_top_skip_code); link(prev_p):=q; link(q):=p;
22107  {now |temp_ptr=glue_ptr(q)|}
22108if width(temp_ptr)>height(p) then width(temp_ptr):=width(temp_ptr)-height(p)
22109else width(temp_ptr):=0;
22110p:=null;
22111end
22112
22113@ The next subroutine finds the best place to break a given vertical list
22114so as to obtain a box of height~|h|, with maximum depth~|d|.
22115A pointer to the beginning of the vertical list is given,
22116and a pointer to the optimum breakpoint is returned. The list is effectively
22117followed by a forced break, i.e., a penalty node with the |eject_penalty|;
22118if the best break occurs at this artificial node, the value |null| is returned.
22119
22120An array of six |scaled| distances is used to keep track of the height
22121from the beginning of the list to the current place, just as in |line_break|.
22122In fact, we use one of the same arrays, only changing its name to reflect
22123its new significance.
22124
22125@d active_height==active_width {new name for the six distance variables}
22126@d cur_height==active_height[1] {the natural height}
22127@d set_height_zero(#)==active_height[#]:=0 {initialize the height to zero}
22128@#
22129@d update_heights=90 {go here to record glue in the |active_height| table}
22130
22131@p function vert_break(@!p:pointer; @!h,@!d:scaled):pointer;
22132  {finds optimum page break}
22133label done,not_found,update_heights;
22134var prev_p:pointer; {if |p| is a glue node, |type(prev_p)| determines
22135  whether |p| is a legal breakpoint}
22136@!q,@!r:pointer; {glue specifications}
22137@!pi:integer; {penalty value}
22138@!b:integer; {badness at a trial breakpoint}
22139@!least_cost:integer; {the smallest badness plus penalties found so far}
22140@!best_place:pointer; {the most recent break that leads to |least_cost|}
22141@!prev_dp:scaled; {depth of previous box in the list}
22142@!t:small_number; {|type| of the node following a kern}
22143begin prev_p:=p; {an initial glue node is not a legal breakpoint}
22144least_cost:=awful_bad; do_all_six(set_height_zero); prev_dp:=0;
22145loop@+  begin @<If node |p| is a legal breakpoint, check if this break is
22146    the best known, and |goto done| if |p| is null or
22147    if the page-so-far is already too full to accept more stuff@>;
22148  prev_p:=p; p:=link(prev_p);
22149  end;
22150done: vert_break:=best_place;
22151end;
22152
22153@ A global variable |best_height_plus_depth| will be set to the natural size
22154of the box that corresponds to the optimum breakpoint found by |vert_break|.
22155(This value is used by the insertion-splitting algorithm of the page builder.)
22156
22157@<Glob...@>=
22158@!best_height_plus_depth:scaled; {height of the best box, without stretching or
22159  shrinking}
22160
22161@ A subtle point to be noted here is that the maximum depth~|d| might be
22162negative, so |cur_height| and |prev_dp| might need to be corrected even
22163after a glue or kern node.
22164
22165@<If node |p| is a legal breakpoint, check...@>=
22166if p=null then pi:=eject_penalty
22167else  @<Use node |p| to update the current height and depth measurements;
22168    if this node is not a legal breakpoint, |goto not_found|
22169    or |update_heights|,
22170    otherwise set |pi| to the associated penalty at the break@>;
22171@<Check if node |p| is a new champion breakpoint; then \(go)|goto done|
22172  if |p| is a forced break or if the page-so-far is already too full@>;
22173if (type(p)<glue_node)or(type(p)>kern_node) then goto not_found;
22174update_heights: @<Update the current height and depth measurements with
22175  respect to a glue or kern node~|p|@>;
22176not_found: if prev_dp>d then
22177    begin cur_height:=cur_height+prev_dp-d;
22178    prev_dp:=d;
22179    end;
22180
22181@ @<Use node |p| to update the current height and depth measurements...@>=
22182case type(p) of
22183hlist_node,vlist_node,rule_node: begin@t@>@;@/
22184  cur_height:=cur_height+prev_dp+height(p); prev_dp:=depth(p);
22185  goto not_found;
22186  end;
22187whatsit_node:@<Process whatsit |p| in |vert_break| loop, |goto not_found|@>;
22188glue_node: if precedes_break(prev_p) then pi:=0
22189  else goto update_heights;
22190kern_node: begin if link(p)=null then t:=penalty_node
22191  else t:=type(link(p));
22192  if t=glue_node then pi:=0@+else goto update_heights;
22193  end;
22194penalty_node: pi:=penalty(p);
22195mark_node,ins_node: goto not_found;
22196othercases confusion("vertbreak")
22197@:this can't happen vertbreak}{\quad vertbreak@>
22198endcases
22199
22200@ @d deplorable==100000 {more than |inf_bad|, but less than |awful_bad|}
22201
22202@<Check if node |p| is a new champion breakpoint; then \(go)...@>=
22203if pi<inf_penalty then
22204  begin @<Compute the badness, |b|, using |awful_bad|
22205    if the box is too full@>;
22206  if b<awful_bad then
22207    if pi<=eject_penalty then b:=pi
22208    else if b<inf_bad then b:=b+pi
22209      else b:=deplorable;
22210  if b<=least_cost then
22211    begin best_place:=p; least_cost:=b;
22212    best_height_plus_depth:=cur_height+prev_dp;
22213    end;
22214  if (b=awful_bad)or(pi<=eject_penalty) then goto done;
22215  end
22216
22217@ @<Compute the badness, |b|, using |awful_bad| if the box is too full@>=
22218if cur_height<h then
22219  if (active_height[3]<>0) or (active_height[4]<>0) or
22220    (active_height[5]<>0) then b:=0
22221  else b:=badness(h-cur_height,active_height[2])
22222else if cur_height-h>active_height[6] then b:=awful_bad
22223else b:=badness(cur_height-h,active_height[6])
22224
22225@ Vertical lists that are subject to the |vert_break| procedure should not
22226contain infinite shrinkability, since that would permit any amount of
22227information to ``fit'' on one page.
22228
22229@<Update the current height and depth measurements with...@>=
22230if type(p)=kern_node then q:=p
22231else  begin q:=glue_ptr(p);
22232  active_height[2+stretch_order(q)]:=@|
22233    active_height[2+stretch_order(q)]+stretch(q);@/
22234  active_height[6]:=active_height[6]+shrink(q);
22235  if (shrink_order(q)<>normal)and(shrink(q)<>0) then
22236    begin@t@>@;@/
22237    print_err("Infinite glue shrinkage found in box being split");@/
22238@.Infinite glue shrinkage...@>
22239    help4("The box you are \vsplitting contains some infinitely")@/
22240      ("shrinkable glue, e.g., `\vss' or `\vskip 0pt minus 1fil'.")@/
22241      ("Such glue doesn't belong there; but you can safely proceed,")@/
22242      ("since the offensive shrinkability has been made finite.");
22243    error; r:=new_spec(q); shrink_order(r):=normal; delete_glue_ref(q);
22244    glue_ptr(p):=r; q:=r;
22245    end;
22246  end;
22247cur_height:=cur_height+prev_dp+width(q); prev_dp:=0
22248
22249@ Now we are ready to consider |vsplit| itself. Most of
22250its work is accomplished by the two subroutines that we have just considered.
22251
22252Given the number of a vlist box |n|, and given a desired page height |h|,
22253the |vsplit| function finds the best initial segment of the vlist and
22254returns a box for a page of height~|h|. The remainder of the vlist, if
22255any, replaces the original box, after removing glue and penalties and
22256adjusting for |split_top_skip|. Mark nodes in the split-off box are used to
22257set the values of |split_first_mark| and |split_bot_mark|; we use the
22258fact that |split_first_mark=null| if and only if |split_bot_mark=null|.
22259
22260The original box becomes ``void'' if and only if it has been entirely
22261extracted.  The extracted box is ``void'' if and only if the original
22262box was void (or if it was, erroneously, an hlist box).
22263
22264@p @t\4@>@<Declare the function called |do_marks|@>@;
22265function vsplit(@!n:halfword; @!h:scaled):pointer;
22266  {extracts a page of height |h| from box |n|}
22267label exit,done;
22268var v:pointer; {the box to be split}
22269p:pointer; {runs through the vlist}
22270q:pointer; {points to where the break occurs}
22271begin cur_val:=n; fetch_box(v);
22272flush_node_list(split_disc); split_disc:=null;
22273if sa_mark<>null then
22274  if do_marks(vsplit_init,0,sa_mark) then sa_mark:=null;
22275if split_first_mark<>null then
22276  begin delete_token_ref(split_first_mark); split_first_mark:=null;
22277  delete_token_ref(split_bot_mark); split_bot_mark:=null;
22278  end;
22279@<Dispense with trivial cases of void or bad boxes@>;
22280q:=vert_break(list_ptr(v),h,split_max_depth);
22281@<Look at all the marks in nodes before the break, and set the final
22282  link to |null| at the break@>;
22283q:=prune_page_top(q,saving_vdiscards>0);
22284p:=list_ptr(v); free_node(v,box_node_size);
22285if q<>null then q:=vpack(q,natural);
22286change_box(q); {the |eq_level| of the box stays the same}
22287vsplit:=vpackage(p,h,exactly,split_max_depth);
22288exit: end;
22289
22290@ @<Dispense with trivial cases of void or bad boxes@>=
22291if v=null then
22292  begin vsplit:=null; return;
22293  end;
22294if type(v)<>vlist_node then
22295  begin print_err(""); print_esc("vsplit"); print(" needs a ");
22296  print_esc("vbox");
22297@:vsplit_}{\.{\\vsplit needs a \\vbox}@>
22298  help2("The box you are trying to split is an \hbox.")@/
22299  ("I can't split such a box, so I'll leave it alone.");
22300  error; vsplit:=null; return;
22301  end
22302
22303@ It's possible that the box begins with a penalty node that is the
22304``best'' break, so we must be careful to handle this special case correctly.
22305
22306@<Look at all the marks...@>=
22307p:=list_ptr(v);
22308if p=q then list_ptr(v):=null
22309else loop@+begin if type(p)=mark_node then
22310    if mark_class(p)<>0 then @<Update the current marks for |vsplit|@>
22311    else if split_first_mark=null then
22312      begin split_first_mark:=mark_ptr(p);
22313      split_bot_mark:=split_first_mark;
22314      token_ref_count(split_first_mark):=@|
22315        token_ref_count(split_first_mark)+2;
22316      end
22317    else  begin delete_token_ref(split_bot_mark);
22318      split_bot_mark:=mark_ptr(p);
22319      add_token_ref(split_bot_mark);
22320      end;
22321  if link(p)=q then
22322    begin link(p):=null; goto done;
22323    end;
22324  p:=link(p);
22325  end;
22326done:
22327
22328@* \[45] The page builder.
22329When \TeX\ appends new material to its main vlist in vertical mode, it uses
22330a method something like |vsplit| to decide where a page ends, except that
22331the calculations are done ``on line'' as new items come in.
22332The main complication in this process is that insertions must be put
22333into their boxes and removed from the vlist, in a more-or-less optimum manner.
22334
22335We shall use the term ``current page'' for that part of the main vlist that
22336is being considered as a candidate for being broken off and sent to the
22337user's output routine. The current page starts at |link(page_head)|, and
22338it ends at |page_tail|.  We have |page_head=page_tail| if this list is empty.
22339@^current page@>
22340
22341Utter chaos would reign if the user kept changing page specifications
22342while a page is being constructed, so the page builder keeps the pertinent
22343specifications frozen as soon as the page receives its first box or
22344insertion.  The global variable |page_contents| is |empty| when the
22345current page contains only mark nodes and content-less whatsit nodes; it
22346is |inserts_only| if the page contains only insertion nodes in addition to
22347marks and whatsits.  Glue nodes, kern nodes, and penalty nodes are
22348discarded until a box or rule node appears, at which time |page_contents|
22349changes to |box_there|.  As soon as |page_contents| becomes non-|empty|,
22350the current |vsize| and |max_depth| are squirreled away into |page_goal|
22351and |page_max_depth|; the latter values will be used until the page has
22352been forwarded to the user's output routine. The \.{\\topskip} adjustment
22353is made when |page_contents| changes to |box_there|.
22354
22355Although |page_goal| starts out equal to |vsize|, it is decreased by the
22356scaled natural height-plus-depth of the insertions considered so far, and by
22357the \.{\\skip} corrections for those insertions. Therefore it represents
22358the size into which the non-inserted material should fit, assuming that
22359all insertions in the current page have been made.
22360
22361The global variables |best_page_break| and |least_page_cost| correspond
22362respectively to the local variables |best_place| and |least_cost| in the
22363|vert_break| routine that we have already studied; i.e., they record the
22364location and value of the best place currently known for breaking the
22365current page. The value of |page_goal| at the time of the best break is
22366stored in |best_size|.
22367
22368@d inserts_only=1
22369  {|page_contents| when an insert node has been contributed, but no boxes}
22370@d box_there=2 {|page_contents| when a box or rule has been contributed}
22371
22372@<Glob...@>=
22373@!page_tail:pointer; {the final node on the current page}
22374@!page_contents:empty..box_there; {what is on the current page so far?}
22375@!page_max_depth:scaled; {maximum box depth on page being built}
22376@!best_page_break:pointer; {break here to get the best page known so far}
22377@!least_page_cost:integer; {the score for this currently best page}
22378@!best_size:scaled; {its |page_goal|}
22379
22380@ The page builder has another data structure to keep track of insertions.
22381This is a list of four-word nodes, starting and ending at |page_ins_head|.
22382That is, the first element of the list is node |r@t$_1$@>=link(page_ins_head)|;
22383node $r_j$ is followed by |r@t$_{j+1}$@>=link(r@t$_j$@>)|; and if there are
22384|n| items we have |r@t$_{n+1}$@>=page_ins_head|. The |subtype| field of
22385each node in this list refers to an insertion number; for example, `\.{\\insert
22386250}' would correspond to a node whose |subtype| is |qi(250)|
22387(the same as the |subtype| field of the relevant |ins_node|). These |subtype|
22388fields are in increasing order, and |subtype(page_ins_head)=
22389qi(255)|, so |page_ins_head| serves as a convenient sentinel
22390at the end of the list. A record is present for each insertion number that
22391appears in the current page.
22392
22393The |type| field in these nodes distinguishes two possibilities that
22394might occur as we look ahead before deciding on the optimum page break.
22395If |type(r)=inserting|, then |height(r)| contains the total of the
22396height-plus-depth dimensions of the box and all its inserts seen so far.
22397If |type(r)=split_up|, then no more insertions will be made into this box,
22398because at least one previous insertion was too big to fit on the current
22399page; |broken_ptr(r)| points to the node where that insertion will be
22400split, if \TeX\ decides to split it, |broken_ins(r)| points to the
22401insertion node that was tentatively split, and |height(r)| includes also the
22402natural height plus depth of the part that would be split off.
22403
22404In both cases, |last_ins_ptr(r)| points to the last |ins_node|
22405encountered for box |qo(subtype(r))| that would be at least partially
22406inserted on the next page; and |best_ins_ptr(r)| points to the last
22407such |ins_node| that should actually be inserted, to get the page with
22408minimum badness among all page breaks considered so far. We have
22409|best_ins_ptr(r)=null| if and only if no insertion for this box should
22410be made to produce this optimum page.
22411
22412The data structure definitions here use the fact that the |@!height| field
22413appears in the fourth word of a box node.
22414@^data structure assumptions@>
22415
22416@d page_ins_node_size=4 {number of words for a page insertion node}
22417@d inserting=0 {an insertion class that has not yet overflowed}
22418@d split_up=1 {an overflowed insertion class}
22419@d broken_ptr(#)==link(#+1)
22420  {an insertion for this class will break here if anywhere}
22421@d broken_ins(#)==info(#+1) {this insertion might break at |broken_ptr|}
22422@d last_ins_ptr(#)==link(#+2) {the most recent insertion for this |subtype|}
22423@d best_ins_ptr(#)==info(#+2) {the optimum most recent insertion}
22424
22425@<Initialize the special list heads...@>=
22426subtype(page_ins_head):=qi(255);
22427type(page_ins_head):=split_up; link(page_ins_head):=page_ins_head;
22428
22429@ An array |page_so_far| records the heights and depths of everything
22430on the current page. This array contains six |scaled| numbers, like the
22431similar arrays already considered in |line_break| and |vert_break|; and it
22432also contains |page_goal| and |page_depth|, since these values are
22433all accessible to the user via |set_page_dimen| commands. The
22434value of |page_so_far[1]| is also called |page_total|.  The stretch
22435and shrink components of the \.{\\skip} corrections for each insertion are
22436included in |page_so_far|, but the natural space components of these
22437corrections are not, since they have been subtracted from |page_goal|.
22438
22439The variable |page_depth| records the depth of the current page; it has been
22440adjusted so that it is at most |page_max_depth|. The variable
22441|last_glue| points to the glue specification of the most recent node
22442contributed from the contribution list, if this was a glue node; otherwise
22443|last_glue=max_halfword|. (If the contribution list is nonempty,
22444however, the value of |last_glue| is not necessarily accurate.)
22445The variables |last_penalty|, |last_kern|, and |last_node_type|
22446are similar.  And
22447finally, |insert_penalties| holds the sum of the penalties associated with
22448all split and floating insertions.
22449
22450@d page_goal==page_so_far[0] {desired height of information on page being built}
22451@d page_total==page_so_far[1] {height of the current page}
22452@d page_shrink==page_so_far[6] {shrinkability of the current page}
22453@d page_depth==page_so_far[7] {depth of the current page}
22454
22455@<Glob...@>=
22456@!page_so_far:array [0..7] of scaled; {height and glue of the current page}
22457@!last_glue:pointer; {used to implement \.{\\lastskip}}
22458@!last_penalty:integer; {used to implement \.{\\lastpenalty}}
22459@!last_kern:scaled; {used to implement \.{\\lastkern}}
22460@!last_node_type:integer; {used to implement \.{\\lastnodetype}}
22461@!insert_penalties:integer; {sum of the penalties for held-over insertions}
22462
22463@ @<Put each...@>=
22464primitive("pagegoal",set_page_dimen,0);
22465@!@:page_goal_}{\.{\\pagegoal} primitive@>
22466primitive("pagetotal",set_page_dimen,1);
22467@!@:page_total_}{\.{\\pagetotal} primitive@>
22468primitive("pagestretch",set_page_dimen,2);
22469@!@:page_stretch_}{\.{\\pagestretch} primitive@>
22470primitive("pagefilstretch",set_page_dimen,3);
22471@!@:page_fil_stretch_}{\.{\\pagefilstretch} primitive@>
22472primitive("pagefillstretch",set_page_dimen,4);
22473@!@:page_fill_stretch_}{\.{\\pagefillstretch} primitive@>
22474primitive("pagefilllstretch",set_page_dimen,5);
22475@!@:page_filll_stretch_}{\.{\\pagefilllstretch} primitive@>
22476primitive("pageshrink",set_page_dimen,6);
22477@!@:page_shrink_}{\.{\\pageshrink} primitive@>
22478primitive("pagedepth",set_page_dimen,7);
22479@!@:page_depth_}{\.{\\pagedepth} primitive@>
22480
22481@ @<Cases of |print_cmd_chr|...@>=
22482set_page_dimen: case chr_code of
224830: print_esc("pagegoal");
224841: print_esc("pagetotal");
224852: print_esc("pagestretch");
224863: print_esc("pagefilstretch");
224874: print_esc("pagefillstretch");
224885: print_esc("pagefilllstretch");
224896: print_esc("pageshrink");
22490othercases print_esc("pagedepth")
22491endcases;
22492
22493@ @d print_plus_end(#)==print(#);@+end
22494@d print_plus(#)==if page_so_far[#]<>0 then
22495  begin print(" plus "); print_scaled(page_so_far[#]); print_plus_end
22496
22497@p procedure print_totals;
22498begin print_scaled(page_total);
22499print_plus(2)("");
22500print_plus(3)("fil");
22501print_plus(4)("fill");
22502print_plus(5)("filll");
22503if page_shrink<>0 then
22504  begin print(" minus "); print_scaled(page_shrink);
22505  end;
22506end;
22507
22508@ @<Show the status of the current page@>=
22509if page_head<>page_tail then
22510  begin print_nl("### current page:");
22511  if output_active then print(" (held over for next output)");
22512@.held over for next output@>
22513  show_box(link(page_head));
22514  if page_contents>empty then
22515    begin print_nl("total height "); print_totals;
22516@:total_height}{\.{total height}@>
22517    print_nl(" goal height "); print_scaled(page_goal);
22518@.goal height@>
22519    r:=link(page_ins_head);
22520    while r<>page_ins_head do
22521      begin print_ln; print_esc("insert"); t:=qo(subtype(r));
22522      print_int(t); print(" adds ");
22523      if count(t)=1000 then t:=height(r)
22524      else t:=x_over_n(height(r),1000)*count(t);
22525      print_scaled(t);
22526      if type(r)=split_up then
22527        begin q:=page_head; t:=0;
22528        repeat q:=link(q);
22529        if (type(q)=ins_node)and(subtype(q)=subtype(r)) then incr(t);
22530        until q=broken_ins(r);
22531        print(", #"); print_int(t); print(" might split");
22532        end;
22533      r:=link(r);
22534      end;
22535    end;
22536  end
22537
22538@ Here is a procedure that is called when the |page_contents| is changing
22539from |empty| to |inserts_only| or |box_there|.
22540
22541@d set_page_so_far_zero(#)==page_so_far[#]:=0
22542
22543@p procedure freeze_page_specs(@!s:small_number);
22544begin page_contents:=s;
22545page_goal:=vsize; page_max_depth:=max_depth;
22546page_depth:=0; do_all_six(set_page_so_far_zero);
22547least_page_cost:=awful_bad;
22548@!stat if tracing_pages>0 then
22549  begin begin_diagnostic;
22550  print_nl("%% goal height="); print_scaled(page_goal);
22551@.goal height@>
22552  print(", max depth="); print_scaled(page_max_depth);
22553  end_diagnostic(false);
22554  end;@;@+tats@;@/
22555end;
22556
22557@ Pages are built by appending nodes to the current list in \TeX's
22558vertical mode, which is at the outermost level of the semantic nest. This
22559vlist is split into two parts; the ``current page'' that we have been
22560talking so much about already, and the ``contribution list'' that receives
22561new nodes as they are created.  The current page contains everything that
22562the page builder has accounted for in its data structures, as described
22563above, while the contribution list contains other things that have been
22564generated by other parts of \TeX\ but have not yet been
22565seen by the page builder.
22566The contribution list starts at |link(contrib_head)|, and it ends at the
22567current node in \TeX's vertical mode.
22568
22569When \TeX\ has appended new material in vertical mode, it calls the procedure
22570|build_page|, which tries to catch up by moving nodes from the contribution
22571list to the current page. This procedure will succeed in its goal of
22572emptying the contribution list, unless a page break is discovered, i.e.,
22573unless the current page has grown to the point where the optimum next
22574page break has been determined. In the latter case, the nodes after the
22575optimum break will go back onto the contribution list, and control will
22576effectively pass to the user's output routine.
22577
22578We make |type(page_head)=glue_node|, so that an initial glue node on
22579the current page will not be considered a valid breakpoint.
22580
22581@<Initialize the special list...@>=
22582type(page_head):=glue_node; subtype(page_head):=normal;
22583
22584@ The global variable |output_active| is true during the time the
22585user's output routine is driving \TeX.
22586
22587@<Glob...@>=
22588@!output_active:boolean; {are we in the midst of an output routine?}
22589
22590@ @<Set init...@>=
22591output_active:=false; insert_penalties:=0;
22592
22593@ The page builder is ready to start a fresh page if we initialize
22594the following state variables. (However, the page insertion list is initialized
22595elsewhere.)
22596
22597@<Start a new current page@>=
22598page_contents:=empty; page_tail:=page_head; link(page_head):=null;@/
22599last_glue:=max_halfword; last_penalty:=0; last_kern:=0;
22600last_node_type:=-1;
22601page_depth:=0; page_max_depth:=0
22602
22603@ At certain times box 255 is supposed to be void (i.e., |null|),
22604or an insertion box is supposed to be ready to accept a vertical list.
22605If not, an error message is printed, and the following subroutine
22606flushes the unwanted contents, reporting them to the user.
22607
22608@p procedure box_error(@!n:eight_bits);
22609begin error; begin_diagnostic;
22610print_nl("The following box has been deleted:");
22611@.The following...deleted@>
22612show_box(box(n)); end_diagnostic(true);
22613flush_node_list(box(n)); box(n):=null;
22614end;
22615
22616@ The following procedure guarantees that a given box register
22617does not contain an \.{\\hbox}.
22618
22619@p procedure ensure_vbox(@!n:eight_bits);
22620var p:pointer; {the box register contents}
22621begin p:=box(n);
22622if p<>null then if type(p)=hlist_node then
22623  begin print_err("Insertions can only be added to a vbox");
22624@.Insertions can only...@>
22625  help3("Tut tut: You're trying to \insert into a")@/
22626    ("\box register that now contains an \hbox.")@/
22627    ("Proceed, and I'll discard its present contents.");
22628  box_error(n);
22629  end;
22630end;
22631
22632@ \TeX\ is not always in vertical mode at the time |build_page|
22633is called; the current mode reflects what \TeX\ should return to, after
22634the contribution list has been emptied. A call on |build_page| should
22635be immediately followed by `|goto big_switch|', which is \TeX's central
22636control point.
22637
22638@d contribute=80 {go here to link a node into the current page}
22639
22640@p @t\4@>@<Declare the procedure called |fire_up|@>@;@/
22641procedure build_page; {append contributions to the current page}
22642label exit,done,done1,continue,contribute,update_heights;
22643var p:pointer; {the node being appended}
22644@!q,@!r:pointer; {nodes being examined}
22645@!b,@!c:integer; {badness and cost of current page}
22646@!pi:integer; {penalty to be added to the badness}
22647@!n:min_quarterword..biggest_reg; {insertion box number}
22648@!delta,@!h,@!w:scaled; {sizes used for insertion calculations}
22649begin if (link(contrib_head)=null)or output_active then return;
22650repeat continue: p:=link(contrib_head);@/
22651@<Update the values of |last_glue|, |last_penalty|, and |last_kern|@>;
22652@<Move node |p| to the current page; if it is time for a page break,
22653  put the nodes following the break back onto the contribution list,
22654  and |return| to the user's output routine if there is one@>;
22655until link(contrib_head)=null;
22656@<Make the contribution list empty by setting its tail to |contrib_head|@>;
22657exit:end;
22658
22659@ @d contrib_tail==nest[0].tail_field {tail of the contribution list}
22660
22661@<Make the contribution list empty...@>=
22662if nest_ptr=0 then tail:=contrib_head {vertical mode}
22663else contrib_tail:=contrib_head {other modes}
22664
22665@ @<Update the values of |last_glue|...@>=
22666if last_glue<>max_halfword then delete_glue_ref(last_glue);
22667last_penalty:=0; last_kern:=0;
22668last_node_type:=type(p)+1;
22669if type(p)=glue_node then
22670  begin last_glue:=glue_ptr(p); add_glue_ref(last_glue);
22671  end
22672else  begin last_glue:=max_halfword;
22673  if type(p)=penalty_node then last_penalty:=penalty(p)
22674  else if type(p)=kern_node then last_kern:=width(p);
22675  end
22676
22677@ The code here is an example of a many-way switch into routines that
22678merge together in different places. Some people call this unstructured
22679programming, but the author doesn't see much wrong with it, as long as
22680@^Knuth, Donald Ervin@>
22681the various labels have a well-understood meaning.
22682
22683@<Move node |p| to the current page; ...@>=
22684@<If the current page is empty and node |p| is to be deleted, |goto done1|;
22685  otherwise use node |p| to update the state of the current page;
22686  if this node is an insertion, |goto contribute|; otherwise if this node
22687  is not a legal breakpoint, |goto contribute| or |update_heights|;
22688  otherwise set |pi| to the penalty associated with this breakpoint@>;
22689@<Check if node |p| is a new champion breakpoint; then \(if)if it is time for
22690  a page break, prepare for output, and either fire up the user's
22691  output routine and |return| or ship out the page and |goto done|@>;
22692if (type(p)<glue_node)or(type(p)>kern_node) then goto contribute;
22693update_heights:@<Update the current page measurements with respect to the
22694  glue or kern specified by node~|p|@>;
22695contribute: @<Make sure that |page_max_depth| is not exceeded@>;
22696@<Link node |p| into the current page and |goto done|@>;
22697done1:@<Recycle node |p|@>;
22698done:
22699
22700@ @<Link node |p| into the current page and |goto done|@>=
22701link(page_tail):=p; page_tail:=p;
22702link(contrib_head):=link(p); link(p):=null; goto done
22703
22704@ @<Recycle node |p|@>=
22705link(contrib_head):=link(p); link(p):=null;
22706if saving_vdiscards>0 then
22707  begin if page_disc=null then page_disc:=p@+else link(tail_page_disc):=p;
22708  tail_page_disc:=p;
22709  end
22710else flush_node_list(p)
22711
22712@ The title of this section is already so long, it seems best to avoid
22713making it more accurate but still longer, by mentioning the fact that a
22714kern node at the end of the contribution list will not be contributed until
22715we know its successor.
22716
22717@<If the current page is empty...@>=
22718case type(p) of
22719hlist_node,vlist_node,rule_node: if page_contents<box_there then
22720    @<Initialize the current page, insert the \.{\\topskip} glue
22721      ahead of |p|, and |goto continue|@>
22722  else @<Prepare to move a box or rule node to the current page,
22723    then |goto contribute|@>;
22724whatsit_node: @<Prepare to move whatsit |p| to the current page,
22725  then |goto contribute|@>;
22726glue_node: if page_contents<box_there then goto done1
22727  else if precedes_break(page_tail) then pi:=0
22728  else goto update_heights;
22729kern_node: if page_contents<box_there then goto done1
22730  else if link(p)=null then return
22731  else if type(link(p))=glue_node then pi:=0
22732  else goto update_heights;
22733penalty_node: if page_contents<box_there then goto done1@+else pi:=penalty(p);
22734mark_node: goto contribute;
22735ins_node: @<Append an insertion to the current page and |goto contribute|@>;
22736othercases confusion("page")
22737@:this can't happen page}{\quad page@>
22738endcases
22739
22740@ @<Initialize the current page, insert the \.{\\topskip} glue...@>=
22741begin if page_contents=empty then freeze_page_specs(box_there)
22742else page_contents:=box_there;
22743q:=new_skip_param(top_skip_code); {now |temp_ptr=glue_ptr(q)|}
22744if width(temp_ptr)>height(p) then width(temp_ptr):=width(temp_ptr)-height(p)
22745else width(temp_ptr):=0;
22746link(q):=p; link(contrib_head):=q; goto continue;
22747end
22748
22749@ @<Prepare to move a box or rule node to the current page...@>=
22750begin page_total:=page_total+page_depth+height(p);
22751page_depth:=depth(p);
22752goto contribute;
22753end
22754
22755@ @<Make sure that |page_max_depth| is not exceeded@>=
22756if page_depth>page_max_depth then
22757  begin page_total:=@|
22758    page_total+page_depth-page_max_depth;@/
22759  page_depth:=page_max_depth;
22760  end;
22761
22762@ @<Update the current page measurements with respect to the glue...@>=
22763if type(p)=kern_node then q:=p
22764else begin q:=glue_ptr(p);
22765  page_so_far[2+stretch_order(q)]:=@|
22766    page_so_far[2+stretch_order(q)]+stretch(q);@/
22767  page_shrink:=page_shrink+shrink(q);
22768  if (shrink_order(q)<>normal)and(shrink(q)<>0) then
22769    begin@t@>@;@/
22770    print_err("Infinite glue shrinkage found on current page");@/
22771@.Infinite glue shrinkage...@>
22772    help4("The page about to be output contains some infinitely")@/
22773      ("shrinkable glue, e.g., `\vss' or `\vskip 0pt minus 1fil'.")@/
22774      ("Such glue doesn't belong there; but you can safely proceed,")@/
22775      ("since the offensive shrinkability has been made finite.");
22776    error;
22777    r:=new_spec(q); shrink_order(r):=normal; delete_glue_ref(q);
22778    glue_ptr(p):=r; q:=r;
22779    end;
22780  end;
22781page_total:=page_total+page_depth+width(q); page_depth:=0
22782
22783@ @<Check if node |p| is a new champion breakpoint; then \(if)...@>=
22784if pi<inf_penalty then
22785  begin @<Compute the badness, |b|, of the current page,
22786    using |awful_bad| if the box is too full@>;
22787  if b<awful_bad then
22788    if pi<=eject_penalty then c:=pi
22789    else  if b<inf_bad then c:=b+pi+insert_penalties
22790      else c:=deplorable
22791  else c:=b;
22792  if insert_penalties>=10000 then c:=awful_bad;
22793  @!stat if tracing_pages>0 then @<Display the page break cost@>;@+tats@;@/
22794  if c<=least_page_cost then
22795    begin best_page_break:=p; best_size:=page_goal;
22796    least_page_cost:=c;
22797    r:=link(page_ins_head);
22798    while r<>page_ins_head do
22799      begin best_ins_ptr(r):=last_ins_ptr(r);
22800      r:=link(r);
22801      end;
22802    end;
22803  if (c=awful_bad)or(pi<=eject_penalty) then
22804    begin fire_up(p); {output the current page at the best place}
22805    if output_active then return; {user's output routine will act}
22806    goto done; {the page has been shipped out by default output routine}
22807    end;
22808  end
22809
22810@ @<Display the page break cost@>=
22811begin begin_diagnostic; print_nl("%");
22812print(" t="); print_totals;@/
22813print(" g="); print_scaled(page_goal);@/
22814print(" b=");
22815if b=awful_bad then print_char("*")@+else print_int(b);
22816@.*\relax@>
22817print(" p="); print_int(pi);
22818print(" c=");
22819if c=awful_bad then print_char("*")@+else print_int(c);
22820if c<=least_page_cost then print_char("#");
22821end_diagnostic(false);
22822end
22823
22824@ @<Compute the badness, |b|, of the current page...@>=
22825if page_total<page_goal then
22826  if (page_so_far[3]<>0) or (page_so_far[4]<>0) or@|
22827    (page_so_far[5]<>0) then b:=0
22828  else b:=badness(page_goal-page_total,page_so_far[2])
22829else if page_total-page_goal>page_shrink then b:=awful_bad
22830else b:=badness(page_total-page_goal,page_shrink)
22831
22832@ @<Append an insertion to the current page and |goto contribute|@>=
22833begin if page_contents=empty then freeze_page_specs(inserts_only);
22834n:=subtype(p); r:=page_ins_head;
22835while n>=subtype(link(r)) do r:=link(r);
22836n:=qo(n);
22837if subtype(r)<>qi(n) then
22838  @<Create a page insertion node with |subtype(r)=qi(n)|, and
22839    include the glue correction for box |n| in the
22840    current page state@>;
22841if type(r)=split_up then insert_penalties:=insert_penalties+float_cost(p)
22842else  begin last_ins_ptr(r):=p;
22843  delta:=page_goal-page_total-page_depth+page_shrink;
22844    {this much room is left if we shrink the maximum}
22845  if count(n)=1000 then h:=height(p)
22846  else h:=x_over_n(height(p),1000)*count(n); {this much room is needed}
22847  if ((h<=0)or(h<=delta))and(height(p)+height(r)<=dimen(n)) then
22848    begin page_goal:=page_goal-h; height(r):=height(r)+height(p);
22849    end
22850  else @<Find the best way to split the insertion, and change
22851    |type(r)| to |split_up|@>;
22852  end;
22853goto contribute;
22854end
22855
22856@ We take note of the value of \.{\\skip} |n| and the height plus depth
22857of \.{\\box}~|n| only when the first \.{\\insert}~|n| node is
22858encountered for a new page. A user who changes the contents of \.{\\box}~|n|
22859after that first \.{\\insert}~|n| had better be either extremely careful
22860or extremely lucky, or both.
22861
22862@<Create a page insertion node...@>=
22863begin q:=get_node(page_ins_node_size); link(q):=link(r); link(r):=q; r:=q;
22864subtype(r):=qi(n); type(r):=inserting; ensure_vbox(n);
22865if box(n)=null then height(r):=0
22866else height(r):=height(box(n))+depth(box(n));
22867best_ins_ptr(r):=null;@/
22868q:=skip(n);
22869if count(n)=1000 then h:=height(r)
22870else h:=x_over_n(height(r),1000)*count(n);
22871page_goal:=page_goal-h-width(q);@/
22872page_so_far[2+stretch_order(q)]:=@|page_so_far[2+stretch_order(q)]+stretch(q);@/
22873page_shrink:=page_shrink+shrink(q);
22874if (shrink_order(q)<>normal)and(shrink(q)<>0) then
22875  begin print_err("Infinite glue shrinkage inserted from "); print_esc("skip");
22876@.Infinite glue shrinkage...@>
22877  print_int(n);
22878  help3("The correction glue for page breaking with insertions")@/
22879    ("must have finite shrinkability. But you may proceed,")@/
22880    ("since the offensive shrinkability has been made finite.");
22881  error;
22882  end;
22883end
22884
22885@ Here is the code that will split a long footnote between pages, in an
22886emergency. The current situation deserves to be recapitulated: Node |p|
22887is an insertion into box |n|; the insertion will not fit, in its entirety,
22888either because it would make the total contents of box |n| greater than
22889\.{\\dimen} |n|, or because it would make the incremental amount of growth
22890|h| greater than the available space |delta|, or both. (This amount |h| has
22891been weighted by the insertion scaling factor, i.e., by \.{\\count} |n|
22892over 1000.) Now we will choose the best way to break the vlist of the
22893insertion, using the same criteria as in the \.{\\vsplit} operation.
22894
22895@<Find the best way to split the insertion...@>=
22896begin if count(n)<=0 then w:=max_dimen
22897else  begin w:=page_goal-page_total-page_depth;
22898  if count(n)<>1000 then w:=x_over_n(w,count(n))*1000;
22899  end;
22900if w>dimen(n)-height(r) then w:=dimen(n)-height(r);
22901q:=vert_break(ins_ptr(p),w,depth(p));
22902height(r):=height(r)+best_height_plus_depth;
22903@!stat if tracing_pages>0 then @<Display the insertion split cost@>;@+tats@;@/
22904if count(n)<>1000 then
22905  best_height_plus_depth:=x_over_n(best_height_plus_depth,1000)*count(n);
22906page_goal:=page_goal-best_height_plus_depth;
22907type(r):=split_up; broken_ptr(r):=q; broken_ins(r):=p;
22908if q=null then insert_penalties:=insert_penalties+eject_penalty
22909else if type(q)=penalty_node then insert_penalties:=insert_penalties+penalty(q);
22910end
22911
22912@ @<Display the insertion split cost@>=
22913begin begin_diagnostic; print_nl("% split"); print_int(n);
22914@.split@>
22915print(" to "); print_scaled(w);
22916print_char(","); print_scaled(best_height_plus_depth);@/
22917print(" p=");
22918if q=null then print_int(eject_penalty)
22919else if type(q)=penalty_node then print_int(penalty(q))
22920else print_char("0");
22921end_diagnostic(false);
22922end
22923
22924@ When the page builder has looked at as much material as could appear before
22925the next page break, it makes its decision. The break that gave minimum
22926badness will be used to put a completed ``page'' into box 255, with insertions
22927appended to their other boxes.
22928
22929We also set the values of |top_mark|, |first_mark|, and |bot_mark|. The
22930program uses the fact that |bot_mark<>null| implies |first_mark<>null|;
22931it also knows that |bot_mark=null| implies |top_mark=first_mark=null|.
22932
22933The |fire_up| subroutine prepares to output the current page at the best
22934place; then it fires up the user's output routine, if there is one,
22935or it simply ships out the page. There is one parameter, |c|, which represents
22936the node that was being contributed to the page when the decision to
22937force an output was made.
22938
22939@<Declare the procedure called |fire_up|@>=
22940procedure fire_up(@!c:pointer);
22941label exit;
22942var p,@!q,@!r,@!s:pointer; {nodes being examined and/or changed}
22943@!prev_p:pointer; {predecessor of |p|}
22944@!n:min_quarterword..biggest_reg; {insertion box number}
22945@!wait:boolean; {should the present insertion be held over?}
22946@!save_vbadness:integer; {saved value of |vbadness|}
22947@!save_vfuzz: scaled; {saved value of |vfuzz|}
22948@!save_split_top_skip: pointer; {saved value of |split_top_skip|}
22949begin @<Set the value of |output_penalty|@>;
22950if sa_mark<>null then
22951  if do_marks(fire_up_init,0,sa_mark) then sa_mark:=null;
22952if bot_mark<>null then
22953  begin if top_mark<>null then delete_token_ref(top_mark);
22954  top_mark:=bot_mark; add_token_ref(top_mark);
22955  delete_token_ref(first_mark); first_mark:=null;
22956  end;
22957@<Put the \(o)optimal current page into box 255, update |first_mark| and
22958  |bot_mark|, append insertions to their boxes, and put the
22959  remaining nodes back on the contribution list@>;
22960if sa_mark<>null then
22961  if do_marks(fire_up_done,0,sa_mark) then sa_mark:=null;
22962if (top_mark<>null)and(first_mark=null) then
22963  begin first_mark:=top_mark; add_token_ref(top_mark);
22964  end;
22965if output_routine<>null then
22966  if dead_cycles>=max_dead_cycles then
22967    @<Explain that too many dead cycles have occurred in a row@>
22968  else @<Fire up the user's output routine and |return|@>;
22969@<Perform the default output routine@>;
22970exit:end;
22971
22972@ @<Set the value of |output_penalty|@>=
22973if type(best_page_break)=penalty_node then
22974  begin geq_word_define(int_base+output_penalty_code,penalty(best_page_break));
22975  penalty(best_page_break):=inf_penalty;
22976  end
22977else geq_word_define(int_base+output_penalty_code,inf_penalty)
22978
22979@ As the page is finally being prepared for output,
22980pointer |p| runs through the vlist, with |prev_p| trailing behind;
22981pointer |q| is the tail of a list of insertions that
22982are being held over for a subsequent page.
22983
22984@<Put the \(o)optimal current page into box 255...@>=
22985if c=best_page_break then best_page_break:=null; {|c| not yet linked in}
22986@<Ensure that box 255 is empty before output@>;
22987insert_penalties:=0; {this will count the number of insertions held over}
22988save_split_top_skip:=split_top_skip;
22989if holding_inserts<=0 then
22990  @<Prepare all the boxes involved in insertions to act as queues@>;
22991q:=hold_head; link(q):=null; prev_p:=page_head; p:=link(prev_p);
22992while p<>best_page_break do
22993  begin if type(p)=ins_node then
22994    begin if holding_inserts<=0 then
22995       @<Either insert the material specified by node |p| into the
22996         appropriate box, or hold it for the next page;
22997         also delete node |p| from the current page@>;
22998    end
22999  else if type(p)=mark_node then
23000    if mark_class(p)<>0 then @<Update the current marks for |fire_up|@>
23001    else @<Update the values of
23002    |first_mark| and |bot_mark|@>;
23003  prev_p:=p; p:=link(prev_p);
23004  end;
23005split_top_skip:=save_split_top_skip;
23006@<Break the current page at node |p|, put it in box~255,
23007  and put the remaining nodes on the contribution list@>;
23008@<Delete \(t)the page-insertion nodes@>
23009
23010@ @<Ensure that box 255 is empty before output@>=
23011if box(255)<>null then
23012  begin print_err(""); print_esc("box"); print("255 is not void");
23013@:box255}{\.{\\box255 is not void}@>
23014  help2("You shouldn't use \box255 except in \output routines.")@/
23015    ("Proceed, and I'll discard its present contents.");
23016  box_error(255);
23017  end
23018
23019@ @<Update the values of |first_mark| and |bot_mark|@>=
23020begin if first_mark=null then
23021  begin first_mark:=mark_ptr(p);
23022  add_token_ref(first_mark);
23023  end;
23024if bot_mark<>null then delete_token_ref(bot_mark);
23025bot_mark:=mark_ptr(p); add_token_ref(bot_mark);
23026end
23027
23028@ When the following code is executed, the current page runs from node
23029|link(page_head)| to node |prev_p|, and the nodes from |p| to |page_tail|
23030are to be placed back at the front of the contribution list. Furthermore
23031the heldover insertions appear in a list from |link(hold_head)| to |q|; we
23032will put them into the current page list for safekeeping while the user's
23033output routine is active.  We might have |q=hold_head|; and |p=null| if
23034and only if |prev_p=page_tail|. Error messages are suppressed within
23035|vpackage|, since the box might appear to be overfull or underfull simply
23036because the stretch and shrink from the \.{\\skip} registers for inserts
23037are not actually present in the box.
23038
23039@<Break the current page at node |p|, put it...@>=
23040if p<>null then
23041  begin if link(contrib_head)=null then
23042    if nest_ptr=0 then tail:=page_tail
23043    else contrib_tail:=page_tail;
23044  link(page_tail):=link(contrib_head);
23045  link(contrib_head):=p;
23046  link(prev_p):=null;
23047  end;
23048save_vbadness:=vbadness; vbadness:=inf_bad;
23049save_vfuzz:=vfuzz; vfuzz:=max_dimen; {inhibit error messages}
23050box(255):=vpackage(link(page_head),best_size,exactly,page_max_depth);
23051vbadness:=save_vbadness; vfuzz:=save_vfuzz;
23052if last_glue<>max_halfword then delete_glue_ref(last_glue);
23053@<Start a new current page@>; {this sets |last_glue:=max_halfword|}
23054if q<>hold_head then
23055  begin link(page_head):=link(hold_head); page_tail:=q;
23056  end
23057
23058@ If many insertions are supposed to go into the same box, we want to know
23059the position of the last node in that box, so that we don't need to waste time
23060when linking further information into it. The |last_ins_ptr| fields of the
23061page insertion nodes are therefore used for this purpose during the
23062packaging phase.
23063
23064@<Prepare all the boxes involved in insertions to act as queues@>=
23065begin r:=link(page_ins_head);
23066while r<>page_ins_head do
23067  begin if best_ins_ptr(r)<>null then
23068    begin n:=qo(subtype(r)); ensure_vbox(n);
23069    if box(n)=null then box(n):=new_null_box;
23070    p:=box(n)+list_offset;
23071    while link(p)<>null do p:=link(p);
23072    last_ins_ptr(r):=p;
23073    end;
23074  r:=link(r);
23075  end;
23076end
23077
23078@ @<Delete \(t)the page-insertion nodes@>=
23079r:=link(page_ins_head);
23080while r<>page_ins_head do
23081  begin q:=link(r); free_node(r,page_ins_node_size); r:=q;
23082  end;
23083link(page_ins_head):=page_ins_head
23084
23085@ We will set |best_ins_ptr:=null| and package the box corresponding to
23086insertion node~|r|, just after making the final insertion into that box.
23087If this final insertion is `|split_up|', the remainder after splitting
23088and pruning (if any) will be carried over to the next page.
23089
23090@<Either insert the material specified by node |p| into...@>=
23091begin r:=link(page_ins_head);
23092while subtype(r)<>subtype(p) do r:=link(r);
23093if best_ins_ptr(r)=null then wait:=true
23094else  begin wait:=false; s:=last_ins_ptr(r); link(s):=ins_ptr(p);
23095  if best_ins_ptr(r)=p then
23096    @<Wrap up the box specified by node |r|, splitting node |p| if
23097    called for; set |wait:=true| if node |p| holds a remainder after
23098    splitting@>
23099  else  begin while link(s)<>null do s:=link(s);
23100    last_ins_ptr(r):=s;
23101    end;
23102  end;
23103@<Either append the insertion node |p| after node |q|, and remove it
23104  from the current page, or delete |node(p)|@>;
23105end
23106
23107@ @<Wrap up the box specified by node |r|, splitting node |p| if...@>=
23108begin if type(r)=split_up then
23109  if (broken_ins(r)=p)and(broken_ptr(r)<>null) then
23110    begin while link(s)<>broken_ptr(r) do s:=link(s);
23111    link(s):=null;
23112    split_top_skip:=split_top_ptr(p);
23113    ins_ptr(p):=prune_page_top(broken_ptr(r),false);
23114    if ins_ptr(p)<>null then
23115      begin temp_ptr:=vpack(ins_ptr(p),natural);
23116      height(p):=height(temp_ptr)+depth(temp_ptr);
23117      free_node(temp_ptr,box_node_size); wait:=true;
23118      end;
23119    end;
23120best_ins_ptr(r):=null;
23121n:=qo(subtype(r));
23122temp_ptr:=list_ptr(box(n));
23123free_node(box(n),box_node_size);
23124box(n):=vpack(temp_ptr,natural);
23125end
23126
23127@ @<Either append the insertion node |p|...@>=
23128link(prev_p):=link(p); link(p):=null;
23129if wait then
23130  begin link(q):=p; q:=p; incr(insert_penalties);
23131  end
23132else  begin delete_glue_ref(split_top_ptr(p));
23133  free_node(p,ins_node_size);
23134  end;
23135p:=prev_p
23136
23137@ The list of heldover insertions, running from |link(page_head)| to
23138|page_tail|, must be moved to the contribution list when the user has
23139specified no output routine.
23140
23141@<Perform the default output routine@>=
23142begin if link(page_head)<>null then
23143  begin if link(contrib_head)=null then
23144    if nest_ptr=0 then tail:=page_tail@+else contrib_tail:=page_tail
23145  else link(page_tail):=link(contrib_head);
23146  link(contrib_head):=link(page_head);
23147  link(page_head):=null; page_tail:=page_head;
23148  end;
23149flush_node_list(page_disc); page_disc:=null;
23150ship_out(box(255)); box(255):=null;
23151end
23152
23153@ @<Explain that too many dead cycles have occurred in a row@>=
23154begin print_err("Output loop---"); print_int(dead_cycles);
23155@.Output loop...@>
23156print(" consecutive dead cycles");
23157help3("I've concluded that your \output is awry; it never does a")@/
23158("\shipout, so I'm shipping \box255 out myself. Next time")@/
23159("increase \maxdeadcycles if you want me to be more patient!"); error;
23160end
23161
23162@ @<Fire up the user's output routine and |return|@>=
23163begin output_active:=true;
23164incr(dead_cycles);
23165push_nest; mode:=-vmode; prev_depth:=ignore_depth; mode_line:=-line;
23166begin_token_list(output_routine,output_text);
23167new_save_level(output_group); normal_paragraph;
23168scan_left_brace;
23169return;
23170end
23171
23172@ When the user's output routine finishes, it has constructed a vlist
23173in internal vertical mode, and \TeX\ will do the following:
23174
23175@<Resume the page builder after an output routine has come to an end@>=
23176begin if (loc<>null) or
23177 ((token_type<>output_text)and(token_type<>backed_up)) then
23178  @<Recover from an unbalanced output routine@>;
23179end_token_list; {conserve stack space in case more outputs are triggered}
23180end_graf; unsave; output_active:=false; insert_penalties:=0;@/
23181@<Ensure that box 255 is empty after output@>;
23182if tail<>head then {current list goes after heldover insertions}
23183  begin link(page_tail):=link(head);
23184  page_tail:=tail;
23185  end;
23186if link(page_head)<>null then {and both go before heldover contributions}
23187  begin if link(contrib_head)=null then contrib_tail:=page_tail;
23188  link(page_tail):=link(contrib_head);
23189  link(contrib_head):=link(page_head);
23190  link(page_head):=null; page_tail:=page_head;
23191  end;
23192flush_node_list(page_disc); page_disc:=null;
23193pop_nest; build_page;
23194end
23195
23196@ @<Recover from an unbalanced output routine@>=
23197begin print_err("Unbalanced output routine");
23198@.Unbalanced output routine@>
23199help2("Your sneaky output routine has problematic {'s and/or }'s.")@/
23200("I can't handle that very well; good luck."); error;
23201repeat get_token;
23202until loc=null;
23203end {loops forever if reading from a file, since |null=min_halfword<=0|}
23204
23205@ @<Ensure that box 255 is empty after output@>=
23206if box(255)<>null then
23207  begin print_err("Output routine didn't use all of ");
23208  print_esc("box"); print_int(255);
23209@.Output routine didn't use...@>
23210  help3("Your \output commands should empty \box255,")@/
23211    ("e.g., by saying `\shipout\box255'.")@/
23212    ("Proceed; I'll discard its present contents.");
23213  box_error(255);
23214  end
23215
23216@* \[46] The chief executive.
23217We come now to the |main_control| routine, which contains the master
23218switch that causes all the various pieces of \TeX\ to do their things,
23219in the right order.
23220
23221In a sense, this is the grand climax of the program: It applies all the
23222tools that we have worked so hard to construct. In another sense, this is
23223the messiest part of the program: It necessarily refers to other pieces
23224of code all over the place, so that a person can't fully understand what is
23225going on without paging back and forth to be reminded of conventions that
23226are defined elsewhere. We are now at the hub of the web, the central nervous
23227system that touches most of the other parts and ties them together.
23228@^brain@>
23229
23230The structure of |main_control| itself is quite simple. There's a label
23231called |big_switch|, at which point the next token of input is fetched
23232using |get_x_token|. Then the program branches at high speed into one of
23233about 100 possible directions, based on the value of the current
23234mode and the newly fetched command code; the sum |abs(mode)+cur_cmd|
23235indicates what to do next. For example, the case `|vmode+letter|' arises
23236when a letter occurs in vertical mode (or internal vertical mode); this
23237case leads to instructions that initialize a new paragraph and enter
23238horizontal mode.
23239
23240The big |case| statement that contains this multiway switch has been labeled
23241|reswitch|, so that the program can |goto reswitch| when the next token
23242has already been fetched. Most of the cases are quite short; they call
23243an ``action procedure'' that does the work for that case, and then they
23244either |goto reswitch| or they ``fall through'' to the end of the |case|
23245statement, which returns control back to |big_switch|. Thus, |main_control|
23246is not an extremely large procedure, in spite of the multiplicity of things
23247it must do; it is small enough to be handled by \PASCAL\ compilers that put
23248severe restrictions on procedure size.
23249@!@^action procedure@>
23250
23251One case is singled out for special treatment, because it accounts for most
23252of \TeX's activities in typical applications. The process of reading simple
23253text and converting it into |char_node| records, while looking for ligatures
23254and kerns, is part of \TeX's ``inner loop''; the whole program runs
23255efficiently when its inner loop is fast, so this part has been written
23256with particular care.
23257
23258@ We shall concentrate first on the inner loop of |main_control|, deferring
23259consideration of the other cases until later.
23260@^inner loop@>
23261
23262@d big_switch=60 {go here to branch on the next token of input}
23263@d main_loop=70 {go here to typeset a string of consecutive characters}
23264@d collect_native=71 {go here to collect characters in a "native" font string}
23265@d collected=72
23266@d main_loop_wrapup=80 {go here to finish a character or ligature}
23267@d main_loop_move=90 {go here to advance the ligature cursor}
23268@d main_loop_move_lig=95 {same, when advancing past a generated ligature}
23269@d main_loop_lookahead=100 {go here to bring in another character, if any}
23270@d main_lig_loop=110 {go here to check for ligatures or kerning}
23271@d append_normal_space=120 {go here to append a normal space between words}
23272@#
23273@d pdfbox_crop=1 {|pdf_box_type| passed to |find_pic_file|}
23274@d pdfbox_media=2
23275@d pdfbox_bleed=3
23276@d pdfbox_trim=4
23277@d pdfbox_art=5
23278
23279@p @t\4@>@<Declare action procedures for use by |main_control|@>@;
23280@t\4@>@<Declare the procedure called |handle_right_brace|@>@;
23281procedure main_control; {governs \TeX's activities}
23282label big_switch,reswitch,main_loop,main_loop_wrapup,
23283  main_loop_move,main_loop_move+1,main_loop_move+2,main_loop_move_lig,
23284  main_loop_lookahead,main_loop_lookahead+1,
23285  main_lig_loop,main_lig_loop+1,main_lig_loop+2,
23286  collect_native,collected,
23287  append_normal_space,exit;
23288var@!t:integer; {general-purpose temporary variable}
23289begin if every_job<>null then begin_token_list(every_job,every_job_text);
23290big_switch: get_x_token;@/
23291reswitch: @<Give diagnostic information, if requested@>;
23292case abs(mode)+cur_cmd of
23293hmode+letter,hmode+other_char,hmode+char_given: goto main_loop;
23294hmode+char_num: begin scan_usv_num; cur_chr:=cur_val; goto main_loop;@+end;
23295hmode+no_boundary: begin get_x_token;
23296  if (cur_cmd=letter)or(cur_cmd=other_char)or(cur_cmd=char_given)or
23297   (cur_cmd=char_num) then cancel_boundary:=true;
23298  goto reswitch;
23299  end;
23300othercases begin
23301  if abs(mode)=hmode then check_for_post_char_toks(big_switch);
23302  case abs(mode)+cur_cmd of
23303    hmode+spacer: if space_factor=1000 then goto append_normal_space
23304      else app_space;
23305    hmode+ex_space,mmode+ex_space: goto append_normal_space;
23306    @t\4@>@<Cases of |main_control| that are not part of the inner loop@>
23307    end
23308  end
23309endcases; {of the big |case| statement}
23310goto big_switch;
23311main_loop:@<Append character |cur_chr| and the following characters (if~any)
23312  to the current hlist in the current font; |goto reswitch| when
23313  a non-character has been fetched@>;
23314append_normal_space:check_for_post_char_toks(big_switch);
23315@<Append a normal inter-word space to the current list,
23316  then |goto big_switch|@>;
23317exit:end;
23318
23319@ When a new token has just been fetched at |big_switch|, we have an
23320ideal place to monitor \TeX's activity.
23321@^debugging@>
23322
23323@<Give diagnostic information, if requested@>=
23324if interrupt<>0 then if OK_to_interrupt then
23325  begin back_input; check_interrupt; goto big_switch;
23326  end;
23327@!debug if panicking then check_mem(false);@+@;@+gubed
23328if tracing_commands>0 then show_cur_cmd_chr
23329
23330@ The following part of the program was first written in a structured
23331manner, according to the philosophy that ``premature optimization is
23332the root of all evil.'' Then it was rearranged into pieces of
23333spaghetti so that the most common actions could proceed with little or
23334no redundancy.
23335
23336The original unoptimized form of this algorithm resembles the
23337|reconstitute| procedure, which was described earlier in connection with
23338hyphenation. Again we have an implied ``cursor'' between characters
23339|cur_l| and |cur_r|. The main difference is that the |lig_stack| can now
23340contain a charnode as well as pseudo-ligatures; that stack is now
23341usually nonempty, because the next character of input (if any) has been
23342appended to it. In |main_control| we have
23343$$|cur_r|=\cases{|character(lig_stack)|,&if |lig_stack>null|;\cr
23344  |font_bchar[cur_font]|,&otherwise;\cr}$$
23345except when |character(lig_stack)=font_false_bchar[cur_font]|.
23346Several additional global variables are needed.
23347
23348@<Glob...@>=
23349@!main_f:internal_font_number; {the current font}
23350@!main_i:four_quarters; {character information bytes for |cur_l|}
23351@!main_j:four_quarters; {ligature/kern command}
23352@!main_k:font_index; {index into |font_info|}
23353@!main_p:pointer; {temporary register for list manipulation}
23354@!main_pp,@!main_ppp:pointer; {more temporary registers for list manipulation}
23355@!main_h:pointer; {temp for hyphen offset in native-font text}
23356@!is_hyph:boolean; {whether the last char seen is the font's hyphenchar}
23357@!space_class:integer;
23358@!prev_class:integer;
23359@!main_s:integer; {space factor value}
23360@!bchar:halfword; {right boundary character of current font, or |non_char|}
23361@!false_bchar:halfword; {nonexistent character matching |bchar|, or |non_char|}
23362@!cancel_boundary:boolean; {should the left boundary be ignored?}
23363@!ins_disc:boolean; {should we insert a discretionary node?}
23364
23365@ The boolean variables of the main loop are normally false, and always reset
23366to false before the loop is left. That saves us the extra work of initializing
23367each time.
23368
23369@<Set init...@>=
23370ligature_present:=false; cancel_boundary:=false; lft_hit:=false; rt_hit:=false;
23371ins_disc:=false;
23372
23373@ We leave the |space_factor| unchanged if |sf_code(cur_chr)=0|; otherwise we
23374set it equal to |sf_code(cur_chr)|, except that it should never change
23375from a value less than 1000 to a value exceeding 1000. The most common
23376case is |sf_code(cur_chr)=1000|, so we want that case to be fast.
23377
23378The overall structure of the main loop is presented here. Some program labels
23379are inside the individual sections.
23380@^inner loop@>
23381
23382@d adjust_space_factor==@t@>@;@/
23383  main_s:=sf_code(cur_chr) mod @"10000;
23384  if main_s=1000 then space_factor:=1000
23385  else if main_s<1000 then
23386    begin if main_s>0 then space_factor:=main_s;
23387    end
23388  else if space_factor<1000 then space_factor:=1000
23389  else space_factor:=main_s
23390
23391@d check_for_inter_char_toks(#)=={check for a spacing token list, goto |#| if found,
23392                               or |big_switch| in case of the initial letter of a run}
23393  cur_ptr:=null;
23394  space_class:=sf_code(cur_chr) div @"10000;
23395
23396  if XeTeX_inter_char_tokens_en and space_class <> 256 then begin {class 256 = ignored (for combining marks etc)}
23397    if prev_class = 255 then begin {boundary}
23398      if (state<>token_list) or (token_type<>backed_up_char) then begin
23399        find_sa_element(inter_char_val, 255*@"100 + space_class, false);
23400        if cur_ptr<>null then begin
23401          if cur_cmd<>letter then cur_cmd:=other_char;
23402          cur_tok:=(cur_cmd*max_char_val)+cur_chr;
23403          back_input; token_type:=backed_up_char;
23404          begin_token_list(sa_ptr(cur_ptr), inter_char_text);
23405          goto big_switch;
23406        end
23407      end
23408    end else begin
23409      find_sa_element(inter_char_val, prev_class*@"100 + space_class, false);
23410      if cur_ptr<>null then begin
23411        if cur_cmd<>letter then cur_cmd:=other_char;
23412        cur_tok:=(cur_cmd*max_char_val)+cur_chr;
23413        back_input; token_type:=backed_up_char;
23414        begin_token_list(sa_ptr(cur_ptr), inter_char_text);
23415        prev_class:=255;
23416        goto #;
23417      end;
23418    end;
23419    prev_class:=space_class;
23420  end
23421
23422@d check_for_post_char_toks(#)==
23423  if XeTeX_inter_char_tokens_en and (space_class<>256) and (prev_class<>255) then begin
23424    prev_class:=255;
23425    find_sa_element(inter_char_val, space_class*@"100 + 255, false); {boundary}
23426    if cur_ptr<>null then begin
23427      if cur_cs=0 then begin
23428        if cur_cmd=char_num then cur_cmd:=other_char;
23429        cur_tok:=(cur_cmd*max_char_val)+cur_chr;
23430      end else cur_tok:=cs_token_flag+cur_cs;
23431      back_input;
23432      begin_token_list(sa_ptr(cur_ptr), inter_char_text);
23433      goto #;
23434    end;
23435  end
23436
23437@<Append character |cur_chr|...@>=
23438
23439prev_class:=255; {boundary}
23440
23441{ added code for native font support }
23442if is_native_font(cur_font) then begin
23443  if mode>0 then if language<>clang then fix_language;
23444
23445  main_h:=0;
23446  main_f:=cur_font;
23447  native_len:=0;
23448
23449collect_native:
23450  adjust_space_factor;
23451
23452  check_for_inter_char_toks(collected);
23453
23454  if (cur_chr > @"FFFF) then begin
23455    native_room(2);
23456    append_native((cur_chr - @"10000) div 1024 + @"D800);
23457    append_native((cur_chr - @"10000) mod 1024 + @"DC00);
23458  end else begin
23459    native_room(1);
23460    append_native(cur_chr);
23461  end;
23462  is_hyph:=(cur_chr = hyphen_char[main_f])
23463    or (XeTeX_dash_break_en and ((cur_chr = @"2014) or (cur_chr = @"2013)));
23464  if (main_h = 0) and is_hyph then main_h:=native_len;
23465
23466  {try to collect as many chars as possible in the same font}
23467  get_next;
23468  if (cur_cmd=letter) or (cur_cmd=other_char) or (cur_cmd=char_given) then goto collect_native;
23469  x_token;
23470  if (cur_cmd=letter) or (cur_cmd=other_char) or (cur_cmd=char_given) then goto collect_native;
23471  if cur_cmd=char_num then begin
23472    scan_usv_num;
23473    cur_chr:=cur_val;
23474    goto collect_native;
23475  end;
23476
23477  check_for_post_char_toks(collected);
23478
23479collected:
23480  if (font_mapping[main_f] <> 0) then begin
23481    main_k:=apply_mapping(font_mapping[main_f], native_text, native_len);
23482    native_len:=0;
23483    native_room(main_k);
23484    main_h:=0;
23485    for main_p:=0 to main_k - 1 do begin
23486      append_native(mapped_text[main_p]);
23487      if (main_h = 0) and ((mapped_text[main_p] = hyphen_char[main_f])
23488        or (XeTeX_dash_break_en and ((mapped_text[main_p] = @"2014) or (mapped_text[main_p] = @"2013)) ) )
23489      then main_h:=native_len;
23490    end
23491  end;
23492
23493  if tracing_lost_chars > 0 then begin
23494    temp_ptr:=0;
23495    while (temp_ptr < native_len) do begin
23496      main_k:=native_text[temp_ptr];
23497      incr(temp_ptr);
23498      if (main_k >= @"D800) and (main_k < @"DC00) then begin
23499        main_k:=@"10000 + (main_k - @"D800) * 1024;
23500        main_k:=main_k + native_text[temp_ptr] - @"DC00;
23501        incr(temp_ptr);
23502      end;
23503      if map_char_to_glyph(main_f, main_k) = 0 then
23504        char_warning(main_f, main_k);
23505    end
23506  end;
23507
23508  main_k:=native_len;
23509  main_pp:=tail;
23510
23511  if mode=hmode then begin
23512    main_ppp:=head;
23513    if main_ppp<>main_pp then   { find node preceding tail, skipping discretionaries }
23514      while (link(main_ppp)<>main_pp) do begin
23515        if (not is_char_node(main_ppp)) and (type(main_ppp=disc_node)) then begin
23516          temp_ptr:=main_ppp;
23517          for main_p:=1 to replace_count(temp_ptr) do main_ppp:=link(main_ppp);
23518        end;
23519        if main_ppp<>main_pp then main_ppp:=link(main_ppp);
23520      end;
23521
23522    temp_ptr:=0;
23523    repeat
23524      if main_h = 0 then main_h:=main_k;
23525
23526      if is_native_word_node(main_pp)
23527        and (native_font(main_pp)=main_f)
23528        and (main_ppp<>main_pp)
23529        and (not is_char_node(main_ppp))
23530        and (type(main_ppp)<>disc_node)
23531      then begin
23532        { make a new temp string that contains the concatenated text of |tail| + the current word/fragment }
23533        main_k:=main_h + native_length(main_pp);
23534        native_room(main_k);
23535
23536        save_native_len:=native_len;
23537        for main_p:=0 to native_length(main_pp) - 1 do
23538          append_native(get_native_char(main_pp, main_p));
23539        for main_p:=0 to main_h - 1 do
23540          append_native(native_text[temp_ptr + main_p]);
23541
23542        do_locale_linebreaks(save_native_len, main_k);
23543
23544        native_len:=save_native_len;  { discard the temp string }
23545        main_k:=native_len - main_h - temp_ptr;   { and set |main_k| to remaining length of new word }
23546        temp_ptr:=main_h; { pointer to remaining fragment }
23547
23548        main_h:=0;
23549        while (main_h < main_k) and (native_text[temp_ptr + main_h] <> hyphen_char[main_f])
23550          and ( (not XeTeX_dash_break_en)
23551            or ((native_text[temp_ptr + main_h] <> @"2014) and (native_text[temp_ptr + main_h] <> @"2013)) )
23552        do incr(main_h);    { look for next hyphen or end of text }
23553        if (main_h < main_k) then incr(main_h);
23554
23555        { remove the preceding node from the list }
23556        link(main_ppp):=link(main_pp);
23557        link(main_pp):=null;
23558        flush_node_list(main_pp);
23559        main_pp:=tail;
23560        while (link(main_ppp)<>main_pp) do
23561          main_ppp:=link(main_ppp);
23562      end else begin
23563        do_locale_linebreaks(temp_ptr, main_h); { append fragment of current word }
23564
23565        temp_ptr:=temp_ptr + main_h;  { advance ptr to remaining fragment }
23566        main_k:=main_k - main_h;  { decrement remaining length }
23567
23568        main_h:=0;
23569        while (main_h < main_k) and (native_text[temp_ptr + main_h] <> hyphen_char[main_f])
23570          and ( (not XeTeX_dash_break_en)
23571            or ((native_text[temp_ptr + main_h] <> @"2014) and (native_text[temp_ptr + main_h] <> @"2013)) )
23572        do incr(main_h);    { look for next hyphen or end of text }
23573        if (main_h < main_k) then incr(main_h);
23574      end;
23575
23576      if (main_k > 0) or is_hyph then begin
23577        tail_append(new_disc);  { add a break if we aren't at end of text (must be a hyphen),
23578                                      or if last char in original text was a hyphen }
23579        main_pp:=tail;
23580      end;
23581    until main_k = 0;
23582  end else begin
23583    { must be restricted hmode, so no need for line-breaking or discretionaries }
23584    { but there might already be explicit |disc_node|s in the list }
23585    main_ppp:=head;
23586    if main_ppp<>main_pp then   { find node preceding tail, skipping discretionaries }
23587      while (link(main_ppp)<>main_pp) do begin
23588        if (not is_char_node(main_ppp)) and (type(main_ppp=disc_node)) then begin
23589          temp_ptr:=main_ppp;
23590          for main_p:=1 to replace_count(temp_ptr) do main_ppp:=link(main_ppp);
23591        end;
23592        if main_ppp<>main_pp then main_ppp:=link(main_ppp);
23593      end;
23594    if is_native_word_node(main_pp)
23595      and (native_font(main_pp)=main_f)
23596      and (main_ppp<>main_pp)
23597      and (not is_char_node(main_ppp))
23598      and (type(main_ppp)<>disc_node)
23599    then begin
23600      { total string length for the new merged whatsit }
23601      link(main_pp):=new_native_word_node(main_f, main_k + native_length(main_pp));
23602      tail:=link(main_pp);
23603
23604      { copy text from the old one into the new }
23605      for main_p:=0 to native_length(main_pp) - 1 do
23606        set_native_char(tail, main_p, get_native_char(main_pp, main_p));
23607      { append the new text }
23608      for main_p:=0 to main_k - 1 do
23609        set_native_char(tail, main_p + native_length(main_pp), native_text[main_p]);
23610      set_native_metrics(tail, XeTeX_use_glyph_metrics);
23611
23612      { remove the preceding node from the list }
23613      main_p:=head;
23614      if main_p<>main_pp then
23615        while link(main_p)<>main_pp do
23616          main_p:=link(main_p);
23617      link(main_p):=link(main_pp);
23618      link(main_pp):=null;
23619      flush_node_list(main_pp);
23620    end else begin
23621      { package the current string into a |native_word| whatsit }
23622      link(main_pp):=new_native_word_node(main_f, main_k);
23623      tail:=link(main_pp);
23624      for main_p:=0 to main_k - 1 do
23625        set_native_char(tail, main_p, native_text[main_p]);
23626      set_native_metrics(tail, XeTeX_use_glyph_metrics);
23627    end
23628  end;
23629
23630  if cur_ptr<>null then goto big_switch
23631  else goto reswitch;
23632end;
23633{ End of added code for native fonts }
23634
23635adjust_space_factor;@/
23636check_for_inter_char_toks(big_switch);
23637main_f:=cur_font;
23638bchar:=font_bchar[main_f]; false_bchar:=font_false_bchar[main_f];
23639if mode>0 then if language<>clang then fix_language;
23640fast_get_avail(lig_stack); font(lig_stack):=main_f; cur_l:=qi(cur_chr);
23641character(lig_stack):=cur_l;@/
23642cur_q:=tail;
23643if cancel_boundary then
23644  begin cancel_boundary:=false; main_k:=non_address;
23645  end
23646else main_k:=bchar_label[main_f];
23647if main_k=non_address then goto main_loop_move+2; {no left boundary processing}
23648cur_r:=cur_l; cur_l:=non_char;
23649goto main_lig_loop+1; {begin with cursor after left boundary}
23650@#
23651main_loop_wrapup:@<Make a ligature node, if |ligature_present|;
23652  insert a null discretionary, if appropriate@>;
23653main_loop_move:@<If the cursor is immediately followed by the right boundary,
23654  |goto reswitch|; if it's followed by an invalid character, |goto big_switch|;
23655  otherwise move the cursor one step to the right and |goto main_lig_loop|@>;
23656main_loop_lookahead:@<Look ahead for another character, or leave |lig_stack|
23657  empty if there's none there@>;
23658main_lig_loop:@<If there's a ligature/kern command relevant to |cur_l| and
23659  |cur_r|, adjust the text appropriately; exit to |main_loop_wrapup|@>;
23660main_loop_move_lig:@<Move the cursor past a pseudo-ligature, then
23661  |goto main_loop_lookahead| or |main_lig_loop|@>
23662
23663@ If |link(cur_q)| is nonnull when |wrapup| is invoked, |cur_q| points to
23664the list of characters that were consumed while building the ligature
23665character~|cur_l|.
23666
23667A discretionary break is not inserted for an explicit hyphen when we are in
23668restricted horizontal mode. In particular, this avoids putting discretionary
23669nodes inside of other discretionaries.
23670@^inner loop@>
23671
23672@d pack_lig(#)== {the parameter is either |rt_hit| or |false|}
23673  begin main_p:=new_ligature(main_f,cur_l,link(cur_q));
23674  if lft_hit then
23675    begin subtype(main_p):=2; lft_hit:=false;
23676    end;
23677  if # then if lig_stack=null then
23678    begin incr(subtype(main_p)); rt_hit:=false;
23679    end;
23680  link(cur_q):=main_p; tail:=main_p; ligature_present:=false;
23681  end
23682
23683@d wrapup(#)==if cur_l<non_char then
23684  begin if link(cur_q)>null then
23685    if character(tail)=qi(hyphen_char[main_f]) then ins_disc:=true;
23686  if ligature_present then pack_lig(#);
23687  if ins_disc then
23688    begin ins_disc:=false;
23689    if mode>0 then tail_append(new_disc);
23690    end;
23691  end
23692
23693@<Make a ligature node, if |ligature_present|;...@>=
23694wrapup(rt_hit)
23695
23696@ @<If the cursor is immediately followed by the right boundary...@>=
23697@^inner loop@>
23698if lig_stack=null then goto reswitch;
23699cur_q:=tail; cur_l:=character(lig_stack);
23700main_loop_move+1:if not is_char_node(lig_stack) then goto main_loop_move_lig;
23701main_loop_move+2:if(cur_chr<font_bc[main_f])or(cur_chr>font_ec[main_f]) then
23702  begin char_warning(main_f,cur_chr); free_avail(lig_stack); goto big_switch;
23703  end;
23704main_i:=char_info(main_f)(cur_l);
23705if not char_exists(main_i) then
23706  begin char_warning(main_f,cur_chr); free_avail(lig_stack); goto big_switch;
23707  end;
23708link(tail):=lig_stack; tail:=lig_stack {|main_loop_lookahead| is next}
23709
23710@ Here we are at |main_loop_move_lig|.
23711When we begin this code we have |cur_q=tail| and |cur_l=character(lig_stack)|.
23712
23713@<Move the cursor past a pseudo-ligature...@>=
23714main_p:=lig_ptr(lig_stack);
23715if main_p>null then tail_append(main_p); {append a single character}
23716temp_ptr:=lig_stack; lig_stack:=link(temp_ptr);
23717free_node(temp_ptr,small_node_size);
23718main_i:=char_info(main_f)(cur_l); ligature_present:=true;
23719if lig_stack=null then
23720  if main_p>null then goto main_loop_lookahead
23721  else cur_r:=bchar
23722else cur_r:=character(lig_stack);
23723goto main_lig_loop
23724
23725@ The result of \.{\\char} can participate in a ligature or kern, so we must
23726look ahead for it.
23727
23728@<Look ahead for another character...@>=
23729get_next; {set only |cur_cmd| and |cur_chr|, for speed}
23730if cur_cmd=letter then goto main_loop_lookahead+1;
23731if cur_cmd=other_char then goto main_loop_lookahead+1;
23732if cur_cmd=char_given then goto main_loop_lookahead+1;
23733x_token; {now expand and set |cur_cmd|, |cur_chr|, |cur_tok|}
23734if cur_cmd=letter then goto main_loop_lookahead+1;
23735if cur_cmd=other_char then goto main_loop_lookahead+1;
23736if cur_cmd=char_given then goto main_loop_lookahead+1;
23737if cur_cmd=char_num then
23738  begin scan_char_num; cur_chr:=cur_val; goto main_loop_lookahead+1;
23739  end;
23740if cur_cmd=no_boundary then bchar:=non_char;
23741cur_r:=bchar; lig_stack:=null; goto main_lig_loop;
23742main_loop_lookahead+1: adjust_space_factor;
23743check_for_inter_char_toks(big_switch);
23744fast_get_avail(lig_stack); font(lig_stack):=main_f;
23745cur_r:=qi(cur_chr); character(lig_stack):=cur_r;
23746if cur_r=false_bchar then cur_r:=non_char {this prevents spurious ligatures}
23747
23748@ Even though comparatively few characters have a lig/kern program, several
23749of the instructions here count as part of \TeX's inner loop, since a
23750@^inner loop@>
23751potentially long sequential search must be performed. For example, tests with
23752Computer Modern Roman showed that about 40 per cent of all characters
23753actually encountered in practice had a lig/kern program, and that about four
23754lig/kern commands were investigated for every such character.
23755
23756At the beginning of this code we have |main_i=char_info(main_f)(cur_l)|.
23757
23758@<If there's a ligature/kern command...@>=
23759if char_tag(main_i)<>lig_tag then goto main_loop_wrapup;
23760if cur_r=non_char then goto main_loop_wrapup;
23761main_k:=lig_kern_start(main_f)(main_i); main_j:=font_info[main_k].qqqq;
23762if skip_byte(main_j)<=stop_flag then goto main_lig_loop+2;
23763main_k:=lig_kern_restart(main_f)(main_j);
23764main_lig_loop+1:main_j:=font_info[main_k].qqqq;
23765main_lig_loop+2:if next_char(main_j)=cur_r then
23766 if skip_byte(main_j)<=stop_flag then
23767  @<Do ligature or kern command, returning to |main_lig_loop|
23768  or |main_loop_wrapup| or |main_loop_move|@>;
23769if skip_byte(main_j)=qi(0) then incr(main_k)
23770else begin if skip_byte(main_j)>=stop_flag then goto main_loop_wrapup;
23771  main_k:=main_k+qo(skip_byte(main_j))+1;
23772  end;
23773goto main_lig_loop+1
23774
23775@ When a ligature or kern instruction matches a character, we know from
23776|read_font_info| that the character exists in the font, even though we
23777haven't verified its existence in the normal way.
23778
23779This section could be made into a subroutine, if the code inside
23780|main_control| needs to be shortened.
23781
23782\chardef\?='174 % vertical line to indicate character retention
23783
23784@<Do ligature or kern command...@>=
23785begin if op_byte(main_j)>=kern_flag then
23786  begin wrapup(rt_hit);
23787  tail_append(new_kern(char_kern(main_f)(main_j))); goto main_loop_move;
23788  end;
23789if cur_l=non_char then lft_hit:=true
23790else if lig_stack=null then rt_hit:=true;
23791check_interrupt; {allow a way out in case there's an infinite ligature loop}
23792case op_byte(main_j) of
23793qi(1),qi(5):begin cur_l:=rem_byte(main_j); {\.{=:\?}, \.{=:\?>}}
23794  main_i:=char_info(main_f)(cur_l); ligature_present:=true;
23795  end;
23796qi(2),qi(6):begin cur_r:=rem_byte(main_j); {\.{\?=:}, \.{\?=:>}}
23797  if lig_stack=null then {right boundary character is being consumed}
23798    begin lig_stack:=new_lig_item(cur_r); bchar:=non_char;
23799    end
23800  else if is_char_node(lig_stack) then {|link(lig_stack)=null|}
23801    begin main_p:=lig_stack; lig_stack:=new_lig_item(cur_r);
23802    lig_ptr(lig_stack):=main_p;
23803    end
23804  else character(lig_stack):=cur_r;
23805  end;
23806qi(3):begin cur_r:=rem_byte(main_j); {\.{\?=:\?}}
23807  main_p:=lig_stack; lig_stack:=new_lig_item(cur_r);
23808  link(lig_stack):=main_p;
23809  end;
23810qi(7),qi(11):begin wrapup(false); {\.{\?=:\?>}, \.{\?=:\?>>}}
23811  cur_q:=tail; cur_l:=rem_byte(main_j);
23812  main_i:=char_info(main_f)(cur_l); ligature_present:=true;
23813  end;
23814othercases begin cur_l:=rem_byte(main_j); ligature_present:=true; {\.{=:}}
23815  if lig_stack=null then goto main_loop_wrapup
23816  else goto main_loop_move+1;
23817  end
23818endcases;
23819if op_byte(main_j)>qi(4) then
23820  if op_byte(main_j)<>qi(7) then goto main_loop_wrapup;
23821if cur_l<non_char then goto main_lig_loop;
23822main_k:=bchar_label[main_f]; goto main_lig_loop+1;
23823end
23824
23825@ The occurrence of blank spaces is almost part of \TeX's inner loop,
23826@^inner loop@>
23827since we usually encounter about one space for every five non-blank characters.
23828Therefore |main_control| gives second-highest priority to ordinary spaces.
23829
23830When a glue parameter like \.{\\spaceskip} is set to `\.{0pt}', we will
23831see to it later that the corresponding glue specification is precisely
23832|zero_glue|, not merely a pointer to some specification that happens
23833to be full of zeroes. Therefore it is simple to test whether a glue parameter
23834is zero or~not.
23835
23836@<Append a normal inter-word space...@>=
23837if space_skip=zero_glue then
23838  begin @<Find the glue specification, |main_p|, for
23839    text spaces in the current font@>;
23840  temp_ptr:=new_glue(main_p);
23841  end
23842else temp_ptr:=new_param_glue(space_skip_code);
23843link(tail):=temp_ptr; tail:=temp_ptr;
23844goto big_switch
23845
23846@ Having |font_glue| allocated for each text font saves both time and memory.
23847If any of the three spacing parameters are subsequently changed by the
23848use of \.{\\fontdimen}, the |find_font_dimen| procedure deallocates the
23849|font_glue| specification allocated here.
23850
23851@<Find the glue specification...@>=
23852begin main_p:=font_glue[cur_font];
23853if main_p=null then
23854  begin main_p:=new_spec(zero_glue); main_k:=param_base[cur_font]+space_code;
23855  width(main_p):=font_info[main_k].sc; {that's |space(cur_font)|}
23856  stretch(main_p):=font_info[main_k+1].sc; {and |space_stretch(cur_font)|}
23857  shrink(main_p):=font_info[main_k+2].sc; {and |space_shrink(cur_font)|}
23858  font_glue[cur_font]:=main_p;
23859  end;
23860end
23861
23862@ @<Declare act...@>=
23863procedure app_space; {handle spaces when |space_factor<>1000|}
23864var@!q:pointer; {glue node}
23865begin if (space_factor>=2000)and(xspace_skip<>zero_glue) then
23866  q:=new_param_glue(xspace_skip_code)
23867else  begin if space_skip<>zero_glue then main_p:=space_skip
23868  else @<Find the glue specification...@>;
23869  main_p:=new_spec(main_p);
23870  @<Modify the glue specification in |main_p| according to the space factor@>;
23871  q:=new_glue(main_p); glue_ref_count(main_p):=null;
23872  end;
23873link(tail):=q; tail:=q;
23874end;
23875
23876@ @<Modify the glue specification in |main_p| according to the space factor@>=
23877if space_factor>=2000 then width(main_p):=width(main_p)+extra_space(cur_font);
23878stretch(main_p):=xn_over_d(stretch(main_p),space_factor,1000);
23879shrink(main_p):=xn_over_d(shrink(main_p),1000,space_factor)
23880
23881@ Whew---that covers the main loop. We can now proceed at a leisurely
23882pace through the other combinations of possibilities.
23883
23884@d any_mode(#)==vmode+#,hmode+#,mmode+# {for mode-independent commands}
23885
23886@<Cases of |main_control| that are not part of the inner loop@>=
23887any_mode(relax),vmode+spacer,mmode+spacer,mmode+no_boundary:do_nothing;
23888any_mode(ignore_spaces): begin
23889  if cur_chr = 0 then begin
23890    @<Get the next non-blank non-call...@>;
23891    goto reswitch;
23892  end
23893  else begin
23894    t:=scanner_status;
23895    scanner_status:=normal;
23896    get_next;
23897    scanner_status:=t;
23898    if cur_cs < hash_base then
23899      cur_cs:=prim_lookup(cur_cs-257)
23900    else
23901      cur_cs :=prim_lookup(text(cur_cs));
23902    if cur_cs<>undefined_primitive then begin
23903      cur_cmd:=prim_eq_type(cur_cs);
23904      cur_chr:=prim_equiv(cur_cs);
23905      goto reswitch;
23906      end;
23907    end;
23908  end;
23909vmode+stop: if its_all_over then return; {this is the only way out}
23910@t\4@>@<Forbidden cases detected in |main_control|@>@+@,any_mode(mac_param):
23911  report_illegal_case;
23912@<Math-only cases in non-math modes, or vice versa@>: insert_dollar_sign;
23913@t\4@>@<Cases of |main_control| that build boxes and lists@>@;
23914@t\4@>@<Cases of |main_control| that don't depend on |mode|@>@;
23915@t\4@>@<Cases of |main_control| that are for extensions to \TeX@>@;
23916
23917@ Here is a list of cases where the user has probably gotten into or out of math
23918mode by mistake. \TeX\ will insert a dollar sign and rescan the current token.
23919
23920@d non_math(#)==vmode+#,hmode+#
23921
23922@<Math-only cases in non-math modes...@>=
23923non_math(sup_mark), non_math(sub_mark), non_math(math_char_num),
23924non_math(math_given), non_math(XeTeX_math_given), non_math(math_comp), non_math(delim_num),
23925non_math(left_right), non_math(above), non_math(radical),
23926non_math(math_style), non_math(math_choice), non_math(vcenter),
23927non_math(non_script), non_math(mkern), non_math(limit_switch),
23928non_math(mskip), non_math(math_accent),
23929mmode+endv, mmode+par_end, mmode+stop, mmode+vskip, mmode+un_vbox,
23930mmode+valign, mmode+hrule
23931
23932@ @<Declare action...@>=
23933procedure insert_dollar_sign;
23934begin back_input; cur_tok:=math_shift_token+"$";
23935print_err("Missing $ inserted");
23936@.Missing \$ inserted@>
23937help2("I've inserted a begin-math/end-math symbol since I think")@/
23938("you left one out. Proceed, with fingers crossed."); ins_error;
23939end;
23940
23941@ When erroneous situations arise, \TeX\ usually issues an error message
23942specific to the particular error. For example, `\.{\\noalign}' should
23943not appear in any mode, since it is recognized by the |align_peek| routine
23944in all of its legitimate appearances; a special error message is given
23945when `\.{\\noalign}' occurs elsewhere. But sometimes the most appropriate
23946error message is simply that the user is not allowed to do what he or she
23947has attempted. For example, `\.{\\moveleft}' is allowed only in vertical mode,
23948and `\.{\\lower}' only in non-vertical modes.  Such cases are enumerated
23949here and in the other sections referred to under `See also \dots.'
23950
23951@<Forbidden cases...@>=
23952vmode+vmove,hmode+hmove,mmode+hmove,any_mode(last_item),
23953
23954@ The `|you_cant|' procedure prints a line saying that the current command
23955is illegal in the current mode; it identifies these things symbolically.
23956
23957@<Declare action...@>=
23958procedure you_cant;
23959begin print_err("You can't use `");
23960@.You can't use x in y mode@>
23961print_cmd_chr(cur_cmd,cur_chr);
23962print("' in "); print_mode(mode);
23963end;
23964
23965@ @<Declare act...@>=
23966procedure report_illegal_case;
23967begin you_cant;
23968help4("Sorry, but I'm not programmed to handle this case;")@/
23969("I'll just pretend that you didn't ask for it.")@/
23970("If you're in the wrong mode, you might be able to")@/
23971("return to the right one by typing `I}' or `I$' or `I\par'.");@/
23972error;
23973end;
23974
23975@ Some operations are allowed only in privileged modes, i.e., in cases
23976that |mode>0|. The |privileged| function is used to detect violations
23977of this rule; it issues an error message and returns |false| if the
23978current |mode| is negative.
23979
23980@<Declare act...@>=
23981function privileged:boolean;
23982begin if mode>0 then privileged:=true
23983else  begin report_illegal_case; privileged:=false;
23984  end;
23985end;
23986
23987@ Either \.{\\dump} or \.{\\end} will cause |main_control| to enter the
23988endgame, since both of them have `|stop|' as their command code.
23989
23990@<Put each...@>=
23991primitive("end",stop,0);@/
23992@!@:end_}{\.{\\end} primitive@>
23993primitive("dump",stop,1);@/
23994@!@:dump_}{\.{\\dump} primitive@>
23995
23996@ @<Cases of |print_cmd_chr|...@>=
23997stop:if chr_code=1 then print_esc("dump")@+else print_esc("end");
23998
23999@ We don't want to leave |main_control| immediately when a |stop| command
24000is sensed, because it may be necessary to invoke an \.{\\output} routine
24001several times before things really grind to a halt. (The output routine
24002might even say `\.{\\gdef\\end\{...\}}', to prolong the life of the job.)
24003Therefore |its_all_over| is |true| only when the current page
24004and contribution list are empty, and when the last output was not a
24005``dead cycle.''
24006
24007@<Declare act...@>=
24008function its_all_over:boolean; {do this when \.{\\end} or \.{\\dump} occurs}
24009label exit;
24010begin if privileged then
24011  begin if (page_head=page_tail)and(head=tail)and(dead_cycles=0) then
24012    begin its_all_over:=true; return;
24013    end;
24014  back_input; {we will try to end again after ejecting residual material}
24015  tail_append(new_null_box);
24016  width(tail):=hsize;
24017  tail_append(new_glue(fill_glue));
24018  tail_append(new_penalty(-@'10000000000));@/
24019  build_page; {append \.{\\hbox to \\hsize\{\}\\vfill\\penalty-'10000000000}}
24020  end;
24021its_all_over:=false;
24022exit:end;
24023
24024@* \[47] Building boxes and lists.
24025The most important parts of |main_control| are concerned with \TeX's
24026chief mission of box-making. We need to control the activities that put
24027entries on vlists and hlists, as well as the activities that convert
24028those lists into boxes. All of the necessary machinery has already been
24029developed; it remains for us to ``push the buttons'' at the right times.
24030
24031@ As an introduction to these routines, let's consider one of the simplest
24032cases: What happens when `\.{\\hrule}' occurs in vertical mode, or
24033`\.{\\vrule}' in horizontal mode or math mode? The code in |main_control|
24034is short, since the |scan_rule_spec| routine already does most of what is
24035required; thus, there is no need for a special action procedure.
24036
24037Note that baselineskip calculations are disabled after a rule in vertical
24038mode, by setting |prev_depth:=ignore_depth|.
24039
24040@<Cases of |main_control| that build...@>=
24041vmode+hrule,hmode+vrule,mmode+vrule: begin tail_append(scan_rule_spec);
24042  if abs(mode)=vmode then prev_depth:=ignore_depth
24043  else if abs(mode)=hmode then space_factor:=1000;
24044  end;
24045
24046@ The processing of things like \.{\\hskip} and \.{\\vskip} is slightly
24047more complicated. But the code in |main_control| is very short, since
24048it simply calls on the action routine |append_glue|. Similarly, \.{\\kern}
24049activates |append_kern|.
24050
24051@<Cases of |main_control| that build...@>=
24052vmode+vskip,hmode+hskip,mmode+hskip,mmode+mskip: append_glue;
24053any_mode(kern),mmode+mkern: append_kern;
24054
24055@ The |hskip| and |vskip| command codes are used for control sequences
24056like \.{\\hss} and \.{\\vfil} as well as for \.{\\hskip} and \.{\\vskip}.
24057The difference is in the value of |cur_chr|.
24058
24059@d fil_code=0 {identifies \.{\\hfil} and \.{\\vfil}}
24060@d fill_code=1 {identifies \.{\\hfill} and \.{\\vfill}}
24061@d ss_code=2 {identifies \.{\\hss} and \.{\\vss}}
24062@d fil_neg_code=3 {identifies \.{\\hfilneg} and \.{\\vfilneg}}
24063@d skip_code=4 {identifies \.{\\hskip} and \.{\\vskip}}
24064@d mskip_code=5 {identifies \.{\\mskip}}
24065
24066@<Put each...@>=
24067primitive("hskip",hskip,skip_code);@/
24068@!@:hskip_}{\.{\\hskip} primitive@>
24069primitive("hfil",hskip,fil_code);
24070@!@:hfil_}{\.{\\hfil} primitive@>
24071primitive("hfill",hskip,fill_code);@/
24072@!@:hfill_}{\.{\\hfill} primitive@>
24073primitive("hss",hskip,ss_code);
24074@!@:hss_}{\.{\\hss} primitive@>
24075primitive("hfilneg",hskip,fil_neg_code);@/
24076@!@:hfil_neg_}{\.{\\hfilneg} primitive@>
24077primitive("vskip",vskip,skip_code);@/
24078@!@:vskip_}{\.{\\vskip} primitive@>
24079primitive("vfil",vskip,fil_code);
24080@!@:vfil_}{\.{\\vfil} primitive@>
24081primitive("vfill",vskip,fill_code);@/
24082@!@:vfill_}{\.{\\vfill} primitive@>
24083primitive("vss",vskip,ss_code);
24084@!@:vss_}{\.{\\vss} primitive@>
24085primitive("vfilneg",vskip,fil_neg_code);@/
24086@!@:vfil_neg_}{\.{\\vfilneg} primitive@>
24087primitive("mskip",mskip,mskip_code);@/
24088@!@:mskip_}{\.{\\mskip} primitive@>
24089primitive("kern",kern,explicit);
24090@!@:kern_}{\.{\\kern} primitive@>
24091primitive("mkern",mkern,mu_glue);@/
24092@!@:mkern_}{\.{\\mkern} primitive@>
24093
24094@ @<Cases of |print_cmd_chr|...@>=
24095hskip: case chr_code of
24096  skip_code:print_esc("hskip");
24097  fil_code:print_esc("hfil");
24098  fill_code:print_esc("hfill");
24099  ss_code:print_esc("hss");
24100  othercases print_esc("hfilneg")
24101  endcases;
24102vskip: case chr_code of
24103  skip_code:print_esc("vskip");
24104  fil_code:print_esc("vfil");
24105  fill_code:print_esc("vfill");
24106  ss_code:print_esc("vss");
24107  othercases print_esc("vfilneg")
24108  endcases;
24109mskip: print_esc("mskip");
24110kern: print_esc("kern");
24111mkern: print_esc("mkern");
24112
24113@ All the work relating to glue creation has been relegated to the
24114following subroutine. It does not call |build_page|, because it is
24115used in at least one place where that would be a mistake.
24116
24117@<Declare action...@>=
24118procedure append_glue;
24119var s:small_number; {modifier of skip command}
24120begin s:=cur_chr;
24121case s of
24122fil_code: cur_val:=fil_glue;
24123fill_code: cur_val:=fill_glue;
24124ss_code: cur_val:=ss_glue;
24125fil_neg_code: cur_val:=fil_neg_glue;
24126skip_code: scan_glue(glue_val);
24127mskip_code: scan_glue(mu_val);
24128end; {now |cur_val| points to the glue specification}
24129tail_append(new_glue(cur_val));
24130if s>=skip_code then
24131  begin decr(glue_ref_count(cur_val));
24132  if s>skip_code then subtype(tail):=mu_glue;
24133  end;
24134end;
24135
24136@ @<Declare act...@>=
24137procedure append_kern;
24138var s:quarterword; {|subtype| of the kern node}
24139begin s:=cur_chr; scan_dimen(s=mu_glue,false,false);
24140tail_append(new_kern(cur_val)); subtype(tail):=s;
24141end;
24142
24143@ Many of the actions related to box-making are triggered by the appearance
24144of braces in the input. For example, when the user says `\.{\\hbox}
24145\.{to} \.{100pt\{$\langle\,\hbox{hlist}\,\rangle$\}}' in vertical mode,
24146the information about the box size (100pt, |exactly|) is put onto |save_stack|
24147with a level boundary word just above it, and |cur_group:=adjusted_hbox_group|;
24148\TeX\ enters restricted horizontal mode to process the hlist. The right
24149brace eventually causes |save_stack| to be restored to its former state,
24150at which time the information about the box size (100pt, |exactly|) is
24151available once again; a box is packaged and we leave restricted horizontal
24152mode, appending the new box to the current list of the enclosing mode
24153(in this case to the current list of vertical mode), followed by any
24154vertical adjustments that were removed from the box by |hpack|.
24155
24156The next few sections of the program are therefore concerned with the
24157treatment of left and right curly braces.
24158
24159@ If a left brace occurs in the middle of a page or paragraph, it simply
24160introduces a new level of grouping, and the matching right brace will not have
24161such a drastic effect. Such grouping affects neither the mode nor the
24162current list.
24163
24164@<Cases of |main_control| that build...@>=
24165non_math(left_brace): new_save_level(simple_group);
24166any_mode(begin_group): new_save_level(semi_simple_group);
24167any_mode(end_group): if cur_group=semi_simple_group then unsave
24168  else off_save;
24169
24170@ We have to deal with errors in which braces and such things are not
24171properly nested. Sometimes the user makes an error of commission by
24172inserting an extra symbol, but sometimes the user makes an error of omission.
24173\TeX\ can't always tell one from the other, so it makes a guess and tries
24174to avoid getting into a loop.
24175
24176The |off_save| routine is called when the current group code is wrong. It tries
24177to insert something into the user's input that will help clean off
24178the top level.
24179
24180@<Declare act...@>=
24181procedure off_save;
24182var p:pointer; {inserted token}
24183begin if cur_group=bottom_level then
24184  @<Drop current token and complain that it was unmatched@>
24185else  begin back_input; p:=get_avail; link(temp_head):=p;
24186  print_err("Missing ");
24187  @<Prepare to insert a token that matches |cur_group|,
24188    and print what it is@>;
24189  print(" inserted"); ins_list(link(temp_head));
24190  help5("I've inserted something that you may have forgotten.")@/
24191  ("(See the <inserted text> above.)")@/
24192  ("With luck, this will get me unwedged. But if you")@/
24193  ("really didn't forget anything, try typing `2' now; then")@/
24194  ("my insertion and my current dilemma will both disappear.");
24195  error;
24196  end;
24197end;
24198
24199@ At this point, |link(temp_head)=p|, a pointer to an empty one-word node.
24200
24201@<Prepare to insert a token that matches |cur_group|...@>=
24202case cur_group of
24203semi_simple_group: begin info(p):=cs_token_flag+frozen_end_group;
24204  print_esc("endgroup");
24205@.Missing \\endgroup inserted@>
24206  end;
24207math_shift_group: begin info(p):=math_shift_token+"$"; print_char("$");
24208@.Missing \$ inserted@>
24209  end;
24210math_left_group: begin info(p):=cs_token_flag+frozen_right; link(p):=get_avail;
24211  p:=link(p); info(p):=other_token+"."; print_esc("right.");
24212@.Missing \\right\hbox{.} inserted@>
24213@^null delimiter@>
24214  end;
24215othercases begin info(p):=right_brace_token+"}"; print_char("}");
24216@.Missing \} inserted@>
24217  end
24218endcases
24219
24220@ @<Drop current token and complain that it was unmatched@>=
24221begin print_err("Extra "); print_cmd_chr(cur_cmd,cur_chr);
24222@.Extra x@>
24223help1("Things are pretty mixed up, but I think the worst is over.");@/
24224error;
24225end
24226
24227@ The routine for a |right_brace| character branches into many subcases,
24228since a variety of things may happen, depending on |cur_group|. Some
24229types of groups are not supposed to be ended by a right brace; error
24230messages are given in hopes of pinpointing the problem. Most branches
24231of this routine will be filled in later, when we are ready to understand
24232them; meanwhile, we must prepare ourselves to deal with such errors.
24233
24234@<Cases of |main_control| that build...@>=
24235any_mode(right_brace): handle_right_brace;
24236
24237@ @<Declare the procedure called |handle_right_brace|@>=
24238procedure handle_right_brace;
24239var p,@!q:pointer; {for short-term use}
24240@!d:scaled; {holds |split_max_depth| in |insert_group|}
24241@!f:integer; {holds |floating_penalty| in |insert_group|}
24242begin case cur_group of
24243simple_group: unsave;
24244bottom_level: begin print_err("Too many }'s");
24245@.Too many \}'s@>
24246  help2("You've closed more groups than you opened.")@/
24247  ("Such booboos are generally harmless, so keep going."); error;
24248  end;
24249semi_simple_group,math_shift_group,math_left_group: extra_right_brace;
24250@t\4@>@<Cases of |handle_right_brace| where a |right_brace| triggers
24251  a delayed action@>@;
24252othercases confusion("rightbrace")
24253@:this can't happen rightbrace}{\quad rightbrace@>
24254endcases;
24255end;
24256
24257@ @<Declare act...@>=
24258procedure extra_right_brace;
24259begin print_err("Extra }, or forgotten ");
24260@.Extra \}, or forgotten x@>
24261case cur_group of
24262semi_simple_group: print_esc("endgroup");
24263math_shift_group: print_char("$");
24264math_left_group: print_esc("right");
24265end;@/
24266help5("I've deleted a group-closing symbol because it seems to be")@/
24267("spurious, as in `$x}$'. But perhaps the } is legitimate and")@/
24268("you forgot something else, as in `\hbox{$x}'. In such cases")@/
24269("the way to recover is to insert both the forgotten and the")@/
24270("deleted material, e.g., by typing `I$}'."); error;
24271incr(align_state);
24272end;
24273
24274@ Here is where we clear the parameters that are supposed to revert to their
24275default values after every paragraph and when internal vertical mode is entered.
24276
24277@<Declare act...@>=
24278procedure normal_paragraph;
24279begin if looseness<>0 then eq_word_define(int_base+looseness_code,0);
24280if hang_indent<>0 then eq_word_define(dimen_base+hang_indent_code,0);
24281if hang_after<>1 then eq_word_define(int_base+hang_after_code,1);
24282if par_shape_ptr<>null then eq_define(par_shape_loc,shape_ref,null);
24283if inter_line_penalties_ptr<>null then
24284  eq_define(inter_line_penalties_loc,shape_ref,null);
24285end;
24286
24287@ Now let's turn to the question of how \.{\\hbox} is treated. We actually
24288need to consider also a slightly larger context, since constructions like
24289`\.{\\setbox3=}\penalty0\.{\\hbox...}' and
24290`\.{\\leaders}\penalty0\.{\\hbox...}' and
24291`\.{\\lower3.8pt\\hbox...}'
24292are supposed to invoke quite
24293different actions after the box has been packaged. Conversely,
24294constructions like `\.{\\setbox3=}' can be followed by a variety of
24295different kinds of boxes, and we would like to encode such things in an
24296efficient way.
24297
24298In other words, there are two problems: to represent the context of a box,
24299and to represent its type.
24300
24301The first problem is solved by putting a ``context code'' on the |save_stack|,
24302just below the two entries that give the dimensions produced by |scan_spec|.
24303The context code is either a (signed) shift amount, or it is a large
24304integer |>=box_flag|, where |box_flag=@t$2^{30}$@>|. Codes |box_flag| through
24305|global_box_flag-1| represent `\.{\\setbox0}' through `\.{\\setbox32767}';
24306codes |global_box_flag| through |ship_out_flag-1| represent
24307`\.{\\global\\setbox0}' through `\.{\\global\\setbox32767}';
24308code |ship_out_flag| represents `\.{\\shipout}'; and codes |leader_flag|
24309through |leader_flag+2| represent `\.{\\leaders}', `\.{\\cleaders}',
24310and `\.{\\xleaders}'.
24311
24312The second problem is solved by giving the command code |make_box| to all
24313control sequences that produce a box, and by using the following |chr_code|
24314values to distinguish between them: |box_code|, |copy_code|, |last_box_code|,
24315|vsplit_code|, |vtop_code|, |vtop_code+vmode|, and |vtop_code+hmode|, where
24316the latter two are used to denote \.{\\vbox} and \.{\\hbox}, respectively.
24317
24318@d box_flag==@'10000000000 {context code for `\.{\\setbox0}'}
24319@d global_box_flag==@'10000100000 {context code for `\.{\\global\\setbox0}'}
24320@d ship_out_flag==@'10000200000  {context code for `\.{\\shipout}'}
24321@d leader_flag==@'10000200001  {context code for `\.{\\leaders}'}
24322@d box_code=0 {|chr_code| for `\.{\\box}'}
24323@d copy_code=1 {|chr_code| for `\.{\\copy}'}
24324@d last_box_code=2 {|chr_code| for `\.{\\lastbox}'}
24325@d vsplit_code=3 {|chr_code| for `\.{\\vsplit}'}
24326@d vtop_code=4 {|chr_code| for `\.{\\vtop}'}
24327
24328@<Put each...@>=
24329primitive("moveleft",hmove,1);
24330@!@:move_left_}{\.{\\moveleft} primitive@>
24331primitive("moveright",hmove,0);@/
24332@!@:move_right_}{\.{\\moveright} primitive@>
24333primitive("raise",vmove,1);
24334@!@:raise_}{\.{\\raise} primitive@>
24335primitive("lower",vmove,0);
24336@!@:lower_}{\.{\\lower} primitive@>
24337@#
24338primitive("box",make_box,box_code);
24339@!@:box_}{\.{\\box} primitive@>
24340primitive("copy",make_box,copy_code);
24341@!@:copy_}{\.{\\copy} primitive@>
24342primitive("lastbox",make_box,last_box_code);
24343@!@:last_box_}{\.{\\lastbox} primitive@>
24344primitive("vsplit",make_box,vsplit_code);
24345@!@:vsplit_}{\.{\\vsplit} primitive@>
24346primitive("vtop",make_box,vtop_code);@/
24347@!@:vtop_}{\.{\\vtop} primitive@>
24348primitive("vbox",make_box,vtop_code+vmode);
24349@!@:vbox_}{\.{\\vbox} primitive@>
24350primitive("hbox",make_box,vtop_code+hmode);@/
24351@!@:hbox_}{\.{\\hbox} primitive@>
24352primitive("shipout",leader_ship,a_leaders-1); {|ship_out_flag=leader_flag-1|}
24353@!@:ship_out_}{\.{\\shipout} primitive@>
24354primitive("leaders",leader_ship,a_leaders);
24355@!@:leaders_}{\.{\\leaders} primitive@>
24356primitive("cleaders",leader_ship,c_leaders);
24357@!@:c_leaders_}{\.{\\cleaders} primitive@>
24358primitive("xleaders",leader_ship,x_leaders);
24359@!@:x_leaders_}{\.{\\xleaders} primitive@>
24360
24361@ @<Cases of |print_cmd_chr|...@>=
24362hmove: if chr_code=1 then print_esc("moveleft")@+else print_esc("moveright");
24363vmove: if chr_code=1 then print_esc("raise")@+else print_esc("lower");
24364make_box: case chr_code of
24365  box_code: print_esc("box");
24366  copy_code: print_esc("copy");
24367  last_box_code: print_esc("lastbox");
24368  vsplit_code: print_esc("vsplit");
24369  vtop_code: print_esc("vtop");
24370  vtop_code+vmode: print_esc("vbox");
24371  othercases print_esc("hbox")
24372  endcases;
24373leader_ship: if chr_code=a_leaders then print_esc("leaders")
24374  else if chr_code=c_leaders then print_esc("cleaders")
24375  else if chr_code=x_leaders then print_esc("xleaders")
24376  else print_esc("shipout");
24377
24378@ Constructions that require a box are started by calling |scan_box| with
24379a specified context code. The |scan_box| routine verifies
24380that a |make_box| command comes next and then it calls |begin_box|.
24381
24382@<Cases of |main_control| that build...@>=
24383vmode+hmove,hmode+vmove,mmode+vmove: begin t:=cur_chr;
24384  scan_normal_dimen;
24385  if t=0 then scan_box(cur_val)@+else scan_box(-cur_val);
24386  end;
24387any_mode(leader_ship): scan_box(leader_flag-a_leaders+cur_chr);
24388any_mode(make_box): begin_box(0);
24389
24390@ The global variable |cur_box| will point to a newly made box. If the box
24391is void, we will have |cur_box=null|. Otherwise we will have
24392|type(cur_box)=hlist_node| or |vlist_node| or |rule_node|; the |rule_node|
24393case can occur only with leaders.
24394
24395@<Glob...@>=
24396@!cur_box:pointer; {box to be placed into its context}
24397
24398@ The |box_end| procedure does the right thing with |cur_box|, if
24399|box_context| represents the context as explained above.
24400
24401@<Declare act...@>=
24402procedure box_end(@!box_context:integer);
24403var p:pointer; {|ord_noad| for new box in math mode}
24404@!a:small_number; {global prefix}
24405begin if box_context<box_flag then @<Append box |cur_box| to the current list,
24406    shifted by |box_context|@>
24407else if box_context<ship_out_flag then @<Store \(c)|cur_box| in a box register@>
24408else if cur_box<>null then
24409  if box_context>ship_out_flag then @<Append a new leader node that
24410      uses |cur_box|@>
24411  else ship_out(cur_box);
24412end;
24413
24414@ The global variable |adjust_tail| will be non-null if and only if the
24415current box might include adjustments that should be appended to the
24416current vertical list.
24417
24418@<Append box |cur_box| to the current...@>=
24419begin if cur_box<>null then
24420  begin shift_amount(cur_box):=box_context;
24421  if abs(mode)=vmode then
24422    begin
24423        if pre_adjust_tail <> null then begin
24424            if pre_adjust_head <> pre_adjust_tail then
24425                append_list(pre_adjust_head)(pre_adjust_tail);
24426            pre_adjust_tail:=null;
24427        end;
24428        append_to_vlist(cur_box);
24429        if adjust_tail <> null then begin
24430            if adjust_head <> adjust_tail then
24431                append_list(adjust_head)(adjust_tail);
24432      adjust_tail:=null;
24433      end;
24434    if mode>0 then build_page;
24435    end
24436  else  begin if abs(mode)=hmode then space_factor:=1000
24437    else  begin p:=new_noad;
24438      math_type(nucleus(p)):=sub_box;
24439      info(nucleus(p)):=cur_box; cur_box:=p;
24440      end;
24441    link(tail):=cur_box; tail:=cur_box;
24442    end;
24443  end;
24444end
24445
24446@ @<Store \(c)|cur_box| in a box register@>=
24447begin if box_context<global_box_flag then
24448  begin cur_val:=box_context-box_flag; a:=0;
24449  end
24450else  begin cur_val:=box_context-global_box_flag; a:=4;
24451  end;
24452if cur_val<256 then define(box_base+cur_val,box_ref,cur_box)
24453else sa_def_box;
24454end
24455
24456@ @<Append a new leader node ...@>=
24457begin @<Get the next non-blank non-relax...@>;
24458if ((cur_cmd=hskip)and(abs(mode)<>vmode))or@|
24459   ((cur_cmd=vskip)and(abs(mode)=vmode)) then
24460  begin append_glue; subtype(tail):=box_context-(leader_flag-a_leaders);
24461  leader_ptr(tail):=cur_box;
24462  end
24463else  begin print_err("Leaders not followed by proper glue");
24464@.Leaders not followed by...@>
24465  help3("You should say `\leaders <box or rule><hskip or vskip>'.")@/
24466  ("I found the <box or rule>, but there's no suitable")@/
24467  ("<hskip or vskip>, so I'm ignoring these leaders."); back_error;
24468  flush_node_list(cur_box);
24469  end;
24470end
24471
24472@ Now that we can see what eventually happens to boxes, we can consider
24473the first steps in their creation. The |begin_box| routine is called when
24474|box_context| is a context specification, |cur_chr| specifies the type of
24475box desired, and |cur_cmd=make_box|.
24476
24477@<Declare act...@>=
24478procedure begin_box(@!box_context:integer);
24479label exit, done;
24480var @!p,@!q:pointer; {run through the current list}
24481@!r:pointer; {running behind |p|}
24482@!fm:boolean; {a final \.{\\beginM} \.{\\endM} node pair?}
24483@!tx:pointer; {effective tail node}
24484@!m:quarterword; {the length of a replacement list}
24485@!k:halfword; {0 or |vmode| or |hmode|}
24486@!n:halfword; {a box number}
24487begin case cur_chr of
24488box_code: begin scan_register_num; fetch_box(cur_box);
24489  change_box(null); {the box becomes void, at the same level}
24490  end;
24491copy_code: begin scan_register_num; fetch_box(q); cur_box:=copy_node_list(q);
24492  end;
24493last_box_code: @<If the current list ends with a box node, delete it from
24494  the list and make |cur_box| point to it; otherwise set |cur_box:=null|@>;
24495vsplit_code: @<Split off part of a vertical box, make |cur_box| point to it@>;
24496othercases @<Initiate the construction of an hbox or vbox, then |return|@>
24497endcases;@/
24498box_end(box_context); {in simple cases, we use the box immediately}
24499exit:end;
24500
24501@ Note that the condition |not is_char_node(tail)| implies that |head<>tail|,
24502since |head| is a one-word node.
24503
24504@d fetch_effective_tail_eTeX(#)== {extract |tx|,
24505  drop \.{\\beginM} \.{\\endM} pair}
24506q:=head; p:=null;
24507repeat r:=p; p:=q; fm:=false;
24508if not is_char_node(q) then
24509  if type(q)=disc_node then
24510    begin for m:=1 to replace_count(q) do p:=link(p);
24511    if p=tx then #;
24512    end
24513  else if (type(q)=math_node)and(subtype(q)=begin_M_code) then fm:=true;
24514q:=link(p);
24515until q=tx; {found |r|$\to$|p|$\to$|q=tx|}
24516q:=link(tx); link(p):=q; link(tx):=null;
24517if q=null then if fm then confusion("tail1")
24518@:this can't happen tail1}{\quad tail1@>
24519  else tail:=p
24520else if fm then {|r|$\to$|p=begin_M|$\to$|q=end_M|}
24521  begin tail:=r; link(r):=null; flush_node_list(p);@+end
24522@#
24523@d check_effective_tail(#)==find_effective_tail_eTeX
24524@d fetch_effective_tail==fetch_effective_tail_eTeX
24525
24526@<If the current list ends with a box node, delete it...@>=
24527begin cur_box:=null;
24528if abs(mode)=mmode then
24529  begin you_cant; help1("Sorry; this \lastbox will be void."); error;
24530  end
24531else if (mode=vmode)and(head=tail) then
24532  begin you_cant;
24533  help2("Sorry...I usually can't take things from the current page.")@/
24534    ("This \lastbox will therefore be void."); error;
24535  end
24536else  begin check_effective_tail(goto done);
24537  if not is_char_node(tx) then
24538    if (type(tx)=hlist_node)or(type(tx)=vlist_node) then
24539      @<Remove the last box, unless it's part of a discretionary@>;
24540  done:end;
24541end
24542
24543@ @<Remove the last box...@>=
24544begin fetch_effective_tail(goto done);
24545cur_box:=tx; shift_amount(cur_box):=0;
24546end
24547
24548@ Here we deal with things like `\.{\\vsplit 13 to 100pt}'.
24549
24550@<Split off part of a vertical box, make |cur_box| point to it@>=
24551begin scan_register_num; n:=cur_val;
24552if not scan_keyword("to") then
24553@.to@>
24554  begin print_err("Missing `to' inserted");
24555@.Missing `to' inserted@>
24556  help2("I'm working on `\vsplit<box number> to <dimen>';")@/
24557  ("will look for the <dimen> next."); error;
24558  end;
24559scan_normal_dimen;
24560cur_box:=vsplit(n,cur_val);
24561end
24562
24563@ Here is where we enter restricted horizontal mode or internal vertical
24564mode, in order to make a box.
24565
24566@<Initiate the construction of an hbox or vbox, then |return|@>=
24567begin k:=cur_chr-vtop_code; saved(0):=box_context;
24568if k=hmode then
24569  if (box_context<box_flag)and(abs(mode)=vmode) then
24570    scan_spec(adjusted_hbox_group,true)
24571  else scan_spec(hbox_group,true)
24572else  begin if k=vmode then scan_spec(vbox_group,true)
24573  else  begin scan_spec(vtop_group,true); k:=vmode;
24574    end;
24575  normal_paragraph;
24576  end;
24577push_nest; mode:=-k;
24578if k=vmode then
24579  begin prev_depth:=ignore_depth;
24580  if every_vbox<>null then begin_token_list(every_vbox,every_vbox_text);
24581  end
24582else  begin space_factor:=1000;
24583  if every_hbox<>null then begin_token_list(every_hbox,every_hbox_text);
24584  end;
24585return;
24586end
24587
24588@ @<Declare act...@>=
24589procedure scan_box(@!box_context:integer);
24590  {the next input should specify a box or perhaps a rule}
24591begin @<Get the next non-blank non-relax...@>;
24592if cur_cmd=make_box then begin_box(box_context)
24593else if (box_context>=leader_flag)and((cur_cmd=hrule)or(cur_cmd=vrule)) then
24594  begin cur_box:=scan_rule_spec; box_end(box_context);
24595  end
24596else  begin@t@>@;@/
24597  print_err("A <box> was supposed to be here");@/
24598@.A <box> was supposed to...@>
24599  help3("I was expecting to see \hbox or \vbox or \copy or \box or")@/
24600  ("something like that. So you might find something missing in")@/
24601  ("your output. But keep trying; you can fix this later."); back_error;
24602  end;
24603end;
24604
24605@ When the right brace occurs at the end of an \.{\\hbox} or \.{\\vbox} or
24606\.{\\vtop} construction, the |package| routine comes into action. We might
24607also have to finish a paragraph that hasn't ended.
24608
24609@<Cases of |handle...@>=
24610hbox_group: package(0);
24611adjusted_hbox_group: begin adjust_tail:=adjust_head;
24612    pre_adjust_tail:=pre_adjust_head; package(0);
24613  end;
24614vbox_group: begin end_graf; package(0);
24615  end;
24616vtop_group: begin end_graf; package(vtop_code);
24617  end;
24618
24619@ @<Declare action...@>=
24620procedure package(@!c:small_number);
24621var h:scaled; {height of box}
24622@!p:pointer; {first node in a box}
24623@!d:scaled; {max depth}
24624@!u,v:integer; {saved values for upwards mode flag}
24625begin d:=box_max_depth; u:=XeTeX_upwards_state; unsave; save_ptr:=save_ptr-3;
24626v:=XeTeX_upwards_state; XeTeX_upwards_state:=u;
24627if mode=-hmode then cur_box:=hpack(link(head),saved(2),saved(1))
24628else  begin cur_box:=vpackage(link(head),saved(2),saved(1),d);
24629  if c=vtop_code then @<Readjust the height and depth of |cur_box|,
24630    for \.{\\vtop}@>;
24631  end;
24632XeTeX_upwards_state:=v;
24633pop_nest; box_end(saved(0));
24634end;
24635
24636@ The height of a `\.{\\vtop}' box is inherited from the first item on its list,
24637if that item is an |hlist_node|, |vlist_node|, or |rule_node|; otherwise
24638the \.{\\vtop} height is zero.
24639
24640
24641@<Readjust the height...@>=
24642begin h:=0; p:=list_ptr(cur_box);
24643if p<>null then if type(p)<=rule_node then h:=height(p);
24644depth(cur_box):=depth(cur_box)-h+height(cur_box); height(cur_box):=h;
24645end
24646
24647@ A paragraph begins when horizontal-mode material occurs in vertical mode,
24648or when the paragraph is explicitly started by `\.{\\indent}' or
24649`\.{\\noindent}'.
24650
24651@<Put each...@>=
24652primitive("indent",start_par,1);
24653@!@:indent_}{\.{\\indent} primitive@>
24654primitive("noindent",start_par,0);
24655@!@:no_indent_}{\.{\\noindent} primitive@>
24656
24657@ @<Cases of |print_cmd_chr|...@>=
24658start_par: if chr_code=0 then print_esc("noindent")@+ else print_esc("indent");
24659
24660@ @<Cases of |main_control| that build...@>=
24661vmode+start_par: new_graf(cur_chr>0);
24662vmode+letter,vmode+other_char,vmode+char_num,vmode+char_given,
24663   vmode+math_shift,vmode+un_hbox,vmode+vrule,
24664   vmode+accent,vmode+discretionary,vmode+hskip,vmode+valign,
24665   vmode+ex_space,vmode+no_boundary:@t@>@;@/
24666  begin back_input; new_graf(true);
24667  end;
24668
24669@ @<Declare act...@>=
24670function norm_min(@!h:integer):small_number;
24671begin if h<=0 then norm_min:=1@+else if h>=63 then norm_min:=63@+
24672else norm_min:=h;
24673end;
24674@#
24675procedure new_graf(@!indented:boolean);
24676begin prev_graf:=0;
24677if (mode=vmode)or(head<>tail) then
24678  tail_append(new_param_glue(par_skip_code));
24679push_nest; mode:=hmode; space_factor:=1000; set_cur_lang; clang:=cur_lang;
24680prev_graf:=(norm_min(left_hyphen_min)*@'100+norm_min(right_hyphen_min))
24681             *@'200000+cur_lang;
24682if indented then
24683  begin tail:=new_null_box; link(head):=tail; width(tail):=par_indent;@+
24684  end;
24685if every_par<>null then begin_token_list(every_par,every_par_text);
24686if nest_ptr=1 then build_page; {put |par_skip| glue on current page}
24687end;
24688
24689@ @<Cases of |main_control| that build...@>=
24690hmode+start_par,mmode+start_par: indent_in_hmode;
24691
24692@ @<Declare act...@>=
24693procedure indent_in_hmode;
24694var p,@!q:pointer;
24695begin if cur_chr>0 then {\.{\\indent}}
24696  begin p:=new_null_box; width(p):=par_indent;
24697  if abs(mode)=hmode then space_factor:=1000
24698  else  begin q:=new_noad; math_type(nucleus(q)):=sub_box;
24699    info(nucleus(q)):=p; p:=q;
24700    end;
24701  tail_append(p);
24702  end;
24703end;
24704
24705@ A paragraph ends when a |par_end| command is sensed, or when we are in
24706horizontal mode when reaching the right brace of vertical-mode routines
24707like \.{\\vbox}, \.{\\insert}, or \.{\\output}.
24708
24709@<Cases of |main_control| that build...@>=
24710vmode+par_end: begin normal_paragraph;
24711  if mode>0 then build_page;
24712  end;
24713hmode+par_end: begin if align_state<0 then off_save; {this tries to
24714    recover from an alignment that didn't end properly}
24715  end_graf; {this takes us to the enclosing mode, if |mode>0|}
24716  if mode=vmode then build_page;
24717  end;
24718hmode+stop,hmode+vskip,hmode+hrule,hmode+un_vbox,hmode+halign: head_for_vmode;
24719
24720@ @<Declare act...@>=
24721procedure head_for_vmode;
24722begin if mode<0 then
24723  if cur_cmd<>hrule then off_save
24724  else  begin print_err("You can't use `");
24725    print_esc("hrule"); print("' here except with leaders");
24726@.You can't use \\hrule...@>
24727    help2("To put a horizontal rule in an hbox or an alignment,")@/
24728      ("you should use \leaders or \hrulefill (see The TeXbook).");
24729    error;
24730    end
24731else  begin back_input; cur_tok:=par_token; back_input; token_type:=inserted;
24732  end;
24733end;
24734
24735@ @<Declare act...@>=
24736procedure end_graf;
24737begin if mode=hmode then
24738  begin if head=tail then pop_nest {null paragraphs are ignored}
24739  else line_break(false);
24740  if LR_save<>null then
24741    begin flush_list(LR_save); LR_save:=null;
24742    end;
24743  normal_paragraph;
24744  error_count:=0;
24745  end;
24746end;
24747
24748@ Insertion and adjustment and mark nodes are constructed by the following
24749pieces of the program.
24750
24751@<Cases of |main_control| that build...@>=
24752any_mode(insert),hmode+vadjust,mmode+vadjust: begin_insert_or_adjust;
24753any_mode(mark): make_mark;
24754
24755@ @<Forbidden...@>=
24756vmode+vadjust,
24757
24758@ @<Declare act...@>=
24759procedure begin_insert_or_adjust;
24760begin if cur_cmd=vadjust then cur_val:=255
24761else  begin scan_eight_bit_int;
24762  if cur_val=255 then
24763    begin print_err("You can't "); print_esc("insert"); print_int(255);
24764@.You can't \\insert255@>
24765    help1("I'm changing to \insert0; box 255 is special.");
24766    error; cur_val:=0;
24767    end;
24768  end;
24769saved(0):=cur_val;
24770if (cur_cmd = vadjust) and scan_keyword("pre") then
24771    saved(1):=1
24772else
24773    saved(1):=0;
24774save_ptr:=save_ptr + 2;
24775new_save_level(insert_group); scan_left_brace; normal_paragraph;
24776push_nest; mode:=-vmode; prev_depth:=ignore_depth;
24777end;
24778
24779@ @<Cases of |handle...@>=
24780insert_group: begin end_graf; q:=split_top_skip; add_glue_ref(q);
24781  d:=split_max_depth; f:=floating_penalty; unsave; save_ptr:=save_ptr-2;
24782  {now |saved(0)| is the insertion number, or 255 for |vadjust|}
24783  p:=vpack(link(head),natural); pop_nest;
24784  if saved(0)<255 then
24785    begin tail_append(get_node(ins_node_size));
24786    type(tail):=ins_node; subtype(tail):=qi(saved(0));
24787    height(tail):=height(p)+depth(p); ins_ptr(tail):=list_ptr(p);
24788    split_top_ptr(tail):=q; depth(tail):=d; float_cost(tail):=f;
24789    end
24790  else  begin tail_append(get_node(small_node_size));
24791    type(tail):=adjust_node;@/
24792    adjust_pre(tail):=saved(1); {the |subtype| is used for |adjust_pre|}
24793    adjust_ptr(tail):=list_ptr(p); delete_glue_ref(q);
24794    end;
24795  free_node(p,box_node_size);
24796  if nest_ptr=0 then build_page;
24797  end;
24798output_group: @<Resume the page builder...@>;
24799
24800@ @<Declare act...@>=
24801procedure make_mark;
24802var p:pointer; {new node}
24803@!c:halfword; {the mark class}
24804begin if cur_chr=0 then c:=0
24805else  begin scan_register_num; c:=cur_val;
24806  end;
24807p:=scan_toks(false,true); p:=get_node(small_node_size);
24808mark_class(p):=c;
24809type(p):=mark_node; subtype(p):=0; {the |subtype| is not used}
24810mark_ptr(p):=def_ref; link(tail):=p; tail:=p;
24811end;
24812
24813@ Penalty nodes get into a list via the |break_penalty| command.
24814@^penalties@>
24815
24816@<Cases of |main_control| that build...@>=
24817any_mode(break_penalty): append_penalty;
24818
24819@ @<Declare action...@>=
24820procedure append_penalty;
24821begin scan_int; tail_append(new_penalty(cur_val));
24822if mode=vmode then build_page;
24823end;
24824
24825@ The |remove_item| command removes a penalty, kern, or glue node if it
24826appears at the tail of the current list, using a brute-force linear scan.
24827Like \.{\\lastbox}, this command is not allowed in vertical mode (except
24828internal vertical mode), since the current list in vertical mode is sent
24829to the page builder.  But if we happen to be able to implement it in
24830vertical mode, we do.
24831
24832@<Cases of |main_control| that build...@>=
24833any_mode(remove_item): delete_last;
24834
24835@ When |delete_last| is called, |cur_chr| is the |type| of node that
24836will be deleted, if present.
24837
24838@<Declare action...@>=
24839procedure delete_last;
24840label exit;
24841var @!p,@!q:pointer; {run through the current list}
24842@!r:pointer; {running behind |p|}
24843@!fm:boolean; {a final \.{\\beginM} \.{\\endM} node pair?}
24844@!tx:pointer; {effective tail node}
24845@!m:quarterword; {the length of a replacement list}
24846begin if (mode=vmode)and(tail=head) then
24847  @<Apologize for inability to do the operation now,
24848    unless \.{\\unskip} follows non-glue@>
24849else  begin check_effective_tail(return);
24850  if not is_char_node(tx) then if type(tx)=cur_chr then
24851    begin fetch_effective_tail(return);
24852    flush_node_list(tx);
24853    end;
24854  end;
24855exit:end;
24856
24857@ @<Apologize for inability to do the operation...@>=
24858begin if (cur_chr<>glue_node)or(last_glue<>max_halfword) then
24859  begin you_cant;
24860  help2("Sorry...I usually can't take things from the current page.")@/
24861    ("Try `I\vskip-\lastskip' instead.");
24862  if cur_chr=kern_node then help_line[0]:=
24863    ("Try `I\kern-\lastkern' instead.")
24864  else if cur_chr<>glue_node then help_line[0]:=@|
24865    ("Perhaps you can make the output routine do it.");
24866  error;
24867  end;
24868end
24869
24870@ @<Put each...@>=
24871primitive("unpenalty",remove_item,penalty_node);@/
24872@!@:un_penalty_}{\.{\\unpenalty} primitive@>
24873primitive("unkern",remove_item,kern_node);@/
24874@!@:un_kern_}{\.{\\unkern} primitive@>
24875primitive("unskip",remove_item,glue_node);@/
24876@!@:un_skip_}{\.{\\unskip} primitive@>
24877primitive("unhbox",un_hbox,box_code);@/
24878@!@:un_hbox_}{\.{\\unhbox} primitive@>
24879primitive("unhcopy",un_hbox,copy_code);@/
24880@!@:un_hcopy_}{\.{\\unhcopy} primitive@>
24881primitive("unvbox",un_vbox,box_code);@/
24882@!@:un_vbox_}{\.{\\unvbox} primitive@>
24883primitive("unvcopy",un_vbox,copy_code);@/
24884@!@:un_vcopy_}{\.{\\unvcopy} primitive@>
24885
24886@ @<Cases of |print_cmd_chr|...@>=
24887remove_item: if chr_code=glue_node then print_esc("unskip")
24888  else if chr_code=kern_node then print_esc("unkern")
24889  else print_esc("unpenalty");
24890un_hbox: if chr_code=copy_code then print_esc("unhcopy")
24891  else print_esc("unhbox");
24892un_vbox: if chr_code=copy_code then print_esc("unvcopy")
24893  @<Cases of |un_vbox| for |print_cmd_chr|@>@/
24894  else print_esc("unvbox");
24895
24896@ The |un_hbox| and |un_vbox| commands unwrap one of the 256 current boxes.
24897
24898@<Cases of |main_control| that build...@>=
24899vmode+un_vbox,hmode+un_hbox,mmode+un_hbox: unpackage;
24900
24901@ @<Declare act...@>=
24902procedure unpackage;
24903label done, exit;
24904var p:pointer; {the box}
24905    r: pointer; {to remove marginal kern nodes}
24906@!c:box_code..copy_code; {should we copy?}
24907begin if cur_chr>copy_code then @<Handle saved items and |goto done|@>;
24908c:=cur_chr; scan_register_num; fetch_box(p);
24909if p=null then return;
24910if (abs(mode)=mmode)or((abs(mode)=vmode)and(type(p)<>vlist_node))or@|
24911   ((abs(mode)=hmode)and(type(p)<>hlist_node)) then
24912  begin print_err("Incompatible list can't be unboxed");
24913@.Incompatible list...@>
24914  help3("Sorry, Pandora. (You sneaky devil.)")@/
24915  ("I refuse to unbox an \hbox in vertical mode or vice versa.")@/
24916  ("And I can't open any boxes in math mode.");@/
24917  error; return;
24918  end;
24919if c=copy_code then link(tail):=copy_node_list(list_ptr(p))
24920else  begin link(tail):=list_ptr(p); change_box(null);
24921  free_node(p,box_node_size);
24922  end;
24923done:
24924while link(tail) <> null do begin
24925    r:=link(tail);
24926    if not is_char_node(r) and (type(r) = margin_kern_node) then begin
24927        link(tail):=link(r);
24928        free_node(r, margin_kern_node_size);
24929    end;
24930    tail:=link(tail);
24931end;
24932exit:end;
24933
24934@ @<Forbidden...@>=vmode+ital_corr,
24935
24936@ Italic corrections are converted to kern nodes when the |ital_corr| command
24937follows a character. In math mode the same effect is achieved by appending
24938a kern of zero here, since italic corrections are supplied later.
24939
24940@<Cases of |main_control| that build...@>=
24941hmode+ital_corr: append_italic_correction;
24942mmode+ital_corr: tail_append(new_kern(0));
24943
24944@ @<Declare act...@>=
24945procedure append_italic_correction;
24946label exit;
24947var p:pointer; {|char_node| at the tail of the current list}
24948@!f:internal_font_number; {the font in the |char_node|}
24949begin if tail<>head then
24950  begin if is_char_node(tail) then p:=tail
24951  else if type(tail)=ligature_node then p:=lig_char(tail)
24952  else if (type(tail)=whatsit_node) then begin
24953    if (subtype(tail)=native_word_node) then begin
24954      tail_append(new_kern(get_native_italic_correction(tail))); subtype(tail):=explicit;
24955    end
24956    else if (subtype(tail)=glyph_node) then begin
24957      tail_append(new_kern(get_native_glyph_italic_correction(tail))); subtype(tail):=explicit;
24958    end;
24959    return;
24960  end
24961  else return;
24962  f:=font(p);
24963  tail_append(new_kern(char_italic(f)(char_info(f)(character(p)))));
24964  subtype(tail):=explicit;
24965  end;
24966exit:end;
24967
24968@ Discretionary nodes are easy in the common case `\.{\\-}', but in the
24969general case we must process three braces full of items.
24970
24971@<Put each...@>=
24972primitive("-",discretionary,1);
24973@!@:Single-character primitives -}{\quad\.{\\-}@>
24974primitive("discretionary",discretionary,0);
24975@!@:discretionary_}{\.{\\discretionary} primitive@>
24976
24977@ @<Cases of |print_cmd_chr|...@>=
24978discretionary: if chr_code=1 then
24979  print_esc("-")@+else print_esc("discretionary");
24980
24981@ @<Cases of |main_control| that build...@>=
24982hmode+discretionary,mmode+discretionary: append_discretionary;
24983
24984@ The space factor does not change when we append a discretionary node,
24985but it starts out as 1000 in the subsidiary lists.
24986
24987@<Declare act...@>=
24988procedure append_discretionary;
24989var c:integer; {hyphen character}
24990begin tail_append(new_disc);
24991if cur_chr=1 then
24992  begin c:=hyphen_char[cur_font];
24993  if c>=0 then if c<=biggest_char then pre_break(tail):=new_character(cur_font,c);
24994  end
24995else  begin incr(save_ptr); saved(-1):=0; new_save_level(disc_group);
24996  scan_left_brace; push_nest; mode:=-hmode; space_factor:=1000;
24997  end;
24998end;
24999
25000@ The three discretionary lists are constructed somewhat as if they were
25001hboxes. A~subroutine called |build_discretionary| handles the transitions.
25002(This is sort of fun.)
25003
25004@<Cases of |handle...@>=
25005disc_group: build_discretionary;
25006
25007@ @<Declare act...@>=
25008procedure build_discretionary;
25009label done,exit;
25010var p,@!q:pointer; {for link manipulation}
25011@!n:integer; {length of discretionary list}
25012begin unsave;
25013@<Prune the current list, if necessary, until it contains only
25014  |char_node|, |kern_node|, |hlist_node|, |vlist_node|, |rule_node|,
25015  and |ligature_node| items; set |n| to the length of the list,
25016  and set |q| to the list's tail@>;
25017p:=link(head); pop_nest;
25018case saved(-1) of
250190:pre_break(tail):=p;
250201:post_break(tail):=p;
250212:@<Attach list |p| to the current list, and record its length;
25022  then finish up and |return|@>;
25023end; {there are no other cases}
25024incr(saved(-1)); new_save_level(disc_group); scan_left_brace;
25025push_nest; mode:=-hmode; space_factor:=1000;
25026exit:end;
25027
25028@ @<Attach list |p| to the current...@>=
25029begin if (n>0)and(abs(mode)=mmode) then
25030  begin print_err("Illegal math "); print_esc("discretionary");
25031@.Illegal math \\disc...@>
25032  help2("Sorry: The third part of a discretionary break must be")@/
25033  ("empty, in math formulas. I had to delete your third part.");
25034  flush_node_list(p); n:=0; error;
25035  end
25036else link(tail):=p;
25037if n<=max_quarterword then replace_count(tail):=n
25038else  begin print_err("Discretionary list is too long");
25039@.Discretionary list is too long@>
25040  help2("Wow---I never thought anybody would tweak me here.")@/
25041  ("You can't seriously need such a huge discretionary list?");
25042  error;
25043  end;
25044if n>0 then tail:=q;
25045decr(save_ptr); return;
25046end
25047
25048@ During this loop, |p=link(q)| and there are |n| items preceding |p|.
25049
25050@<Prune the current list, if necessary...@>=
25051q:=head; p:=link(q); n:=0;
25052while p<>null do
25053  begin if not is_char_node(p) then if type(p)>rule_node then
25054    if type(p)<>kern_node then if type(p)<>ligature_node then
25055    if (type(p)<>whatsit_node) or ((subtype(p)<>native_word_node)
25056                                     and (subtype(p)<>glyph_node)) then
25057      begin print_err("Improper discretionary list");
25058@.Improper discretionary list@>
25059      help1("Discretionary lists must contain only boxes and kerns.");@/
25060      error;
25061      begin_diagnostic;
25062      print_nl("The following discretionary sublist has been deleted:");
25063@.The following...deleted@>
25064      show_box(p);
25065      end_diagnostic(true);
25066      flush_node_list(p); link(q):=null; goto done;
25067      end;
25068  q:=p; p:=link(q); incr(n);
25069  end;
25070done:
25071
25072@ We need only one more thing to complete the horizontal mode routines, namely
25073the \.{\\accent} primitive.
25074
25075@<Cases of |main_control| that build...@>=
25076hmode+accent: make_accent;
25077
25078@ The positioning of accents is straightforward but tedious. Given an accent
25079of width |a|, designed for characters of height |x| and slant |s|;
25080and given a character of width |w|, height |h|, and slant |t|: We will shift
25081the accent down by |x-h|, and we will insert kern nodes that have the effect of
25082centering the accent over the character and shifting the accent to the
25083right by $\delta={1\over2}(w-a)+h\cdot t-x\cdot s$.  If either character is
25084absent from the font, we will simply use the other, without shifting.
25085
25086@<Declare act...@>=
25087procedure make_accent;
25088var s,@!t: real; {amount of slant}
25089@!p,@!q,@!r:pointer; {character, box, and kern nodes}
25090@!f:internal_font_number; {relevant font}
25091@!a,@!h,@!x,@!w,@!delta,@!lsb,@!rsb:scaled; {heights and widths, as explained above}
25092@!i:four_quarters; {character information}
25093begin scan_char_num; f:=cur_font; p:=new_character(f,cur_val);
25094if p<>null then
25095  begin x:=x_height(f); s:=slant(f)/float_constant(65536);
25096@^real division@>
25097  if is_native_font(f) then
25098    begin a:=width(p);
25099    if a=0 then get_native_char_sidebearings(f, cur_val, addressof(lsb), addressof(rsb))
25100    end
25101  else a:=char_width(f)(char_info(f)(character(p)));@/
25102  do_assignments;@/
25103  @<Create a character node |q| for the next character,
25104    but set |q:=null| if problems arise@>;
25105  if q<>null then @<Append the accent with appropriate kerns,
25106      then set |p:=q|@>;
25107  link(tail):=p; tail:=p; space_factor:=1000;
25108  end;
25109end;
25110
25111@ @<Create a character node |q| for the next...@>=
25112q:=null; f:=cur_font;
25113if (cur_cmd=letter)or(cur_cmd=other_char)or(cur_cmd=char_given) then
25114  begin q:=new_character(f,cur_chr); cur_val:=cur_chr
25115  end
25116else if cur_cmd=char_num then
25117  begin scan_char_num; q:=new_character(f,cur_val);
25118  end
25119else back_input
25120
25121@ The kern nodes appended here must be distinguished from other kerns, lest
25122they be wiped away by the hyphenation algorithm or by a previous line break.
25123
25124The two kerns are computed with (machine-dependent) |real| arithmetic, but
25125their sum is machine-independent; the net effect is machine-independent,
25126because the user cannot remove these nodes nor access them via \.{\\lastkern}.
25127
25128@<Append the accent with appropriate kerns...@>=
25129begin t:=slant(f)/float_constant(65536);
25130@^real division@>
25131if is_native_font(f) then begin
25132  w:=width(q);
25133  get_native_char_height_depth(f, cur_val, addressof(h), addressof(delta))
25134    {using delta as scratch space for the unneeded depth value}
25135end else begin
25136  i:=char_info(f)(character(q));
25137  w:=char_width(f)(i); h:=char_height(f)(height_depth(i))
25138end;
25139if h<>x then {the accent must be shifted up or down}
25140  begin p:=hpack(p,natural); shift_amount(p):=x-h;
25141  end;
25142if is_native_font(f) and (a=0) then { special case for non-spacing marks }
25143  delta:=round((w-lsb+rsb)/float_constant(2)+h*t-x*s)
25144else delta:=round((w-a)/float_constant(2)+h*t-x*s);
25145@^real multiplication@>
25146@^real addition@>
25147r:=new_kern(delta); subtype(r):=acc_kern; link(tail):=r; link(r):=p;
25148tail:=new_kern(-a-delta); subtype(tail):=acc_kern; link(p):=tail; p:=q;
25149end
25150
25151@ When `\.{\\cr}' or `\.{\\span}' or a tab mark comes through the scanner
25152into |main_control|, it might be that the user has foolishly inserted
25153one of them into something that has nothing to do with alignment. But it is
25154far more likely that a left brace or right brace has been omitted, since
25155|get_next| takes actions appropriate to alignment only when `\.{\\cr}'
25156or `\.{\\span}' or tab marks occur with |align_state=0|. The following
25157program attempts to make an appropriate recovery.
25158
25159@<Cases of |main_control| that build...@>=
25160any_mode(car_ret), any_mode(tab_mark): align_error;
25161any_mode(no_align): no_align_error;
25162any_mode(omit): omit_error;
25163
25164@ @<Declare act...@>=
25165procedure align_error;
25166begin if abs(align_state)>2 then
25167  @<Express consternation over the fact that no alignment is in progress@>
25168else  begin back_input;
25169  if align_state<0 then
25170    begin print_err("Missing { inserted");
25171@.Missing \{ inserted@>
25172    incr(align_state); cur_tok:=left_brace_token+"{";
25173    end
25174  else  begin print_err("Missing } inserted");
25175@.Missing \} inserted@>
25176    decr(align_state); cur_tok:=right_brace_token+"}";
25177    end;
25178  help3("I've put in what seems to be necessary to fix")@/
25179    ("the current column of the current alignment.")@/
25180    ("Try to go on, since this might almost work."); ins_error;
25181  end;
25182end;
25183
25184@ @<Express consternation...@>=
25185begin print_err("Misplaced "); print_cmd_chr(cur_cmd,cur_chr);
25186@.Misplaced \&@>
25187@.Misplaced \\span@>
25188@.Misplaced \\cr@>
25189if cur_tok=tab_token+"&" then
25190  begin help6("I can't figure out why you would want to use a tab mark")@/
25191  ("here. If you just want an ampersand, the remedy is")@/
25192  ("simple: Just type `I\&' now. But if some right brace")@/
25193  ("up above has ended a previous alignment prematurely,")@/
25194  ("you're probably due for more error messages, and you")@/
25195  ("might try typing `S' now just to see what is salvageable.");
25196  end
25197else  begin help5("I can't figure out why you would want to use a tab mark")@/
25198  ("or \cr or \span just now. If something like a right brace")@/
25199  ("up above has ended a previous alignment prematurely,")@/
25200  ("you're probably due for more error messages, and you")@/
25201  ("might try typing `S' now just to see what is salvageable.");
25202  end;
25203error;
25204end
25205
25206@ The help messages here contain a little white lie, since \.{\\noalign}
25207and \.{\\omit} are allowed also after `\.{\\noalign\{...\}}'.
25208
25209@<Declare act...@>=
25210procedure no_align_error;
25211begin print_err("Misplaced "); print_esc("noalign");
25212@.Misplaced \\noalign@>
25213help2("I expect to see \noalign only after the \cr of")@/
25214  ("an alignment. Proceed, and I'll ignore this case."); error;
25215end;
25216procedure omit_error;
25217begin print_err("Misplaced "); print_esc("omit");
25218@.Misplaced \\omit@>
25219help2("I expect to see \omit only after tab marks or the \cr of")@/
25220  ("an alignment. Proceed, and I'll ignore this case."); error;
25221end;
25222
25223@ We've now covered most of the abuses of \.{\\halign} and \.{\\valign}.
25224Let's take a look at what happens when they are used correctly.
25225
25226@<Cases of |main_control| that build...@>=
25227vmode+halign:init_align;
25228hmode+valign:@<Cases of |main_control| for |hmode+valign|@>@; init_align;
25229mmode+halign: if privileged then
25230  if cur_group=math_shift_group then init_align
25231  else off_save;
25232vmode+endv,hmode+endv: do_endv;
25233
25234@ An |align_group| code is supposed to remain on the |save_stack|
25235during an entire alignment, until |fin_align| removes it.
25236
25237A devious user might force an |endv| command to occur just about anywhere;
25238we must defeat such hacks.
25239
25240@<Declare act...@>=
25241procedure do_endv;
25242begin base_ptr:=input_ptr; input_stack[base_ptr]:=cur_input;
25243while (input_stack[base_ptr].index_field<>v_template) and
25244      (input_stack[base_ptr].loc_field=null) and
25245      (input_stack[base_ptr].state_field=token_list) do decr(base_ptr);
25246if (input_stack[base_ptr].index_field<>v_template) or
25247      (input_stack[base_ptr].loc_field<>null) or
25248      (input_stack[base_ptr].state_field<>token_list) then
25249  fatal_error("(interwoven alignment preambles are not allowed)");
25250@.interwoven alignment preambles...@>
25251 if cur_group=align_group then
25252  begin end_graf;
25253  if fin_col then fin_row;
25254  end
25255else off_save;
25256end;
25257
25258@ @<Cases of |handle_right_brace|...@>=
25259align_group: begin back_input; cur_tok:=cs_token_flag+frozen_cr;
25260  print_err("Missing "); print_esc("cr"); print(" inserted");
25261@.Missing \\cr inserted@>
25262  help1("I'm guessing that you meant to end an alignment here.");
25263  ins_error;
25264  end;
25265
25266@ @<Cases of |handle_right_brace|...@>=
25267no_align_group: begin end_graf; unsave; align_peek;
25268  end;
25269
25270@ Finally, \.{\\endcsname} is not supposed to get through to |main_control|.
25271
25272@<Cases of |main_control| that build...@>=
25273any_mode(end_cs_name): cs_error;
25274
25275@ @<Declare act...@>=
25276procedure cs_error;
25277begin print_err("Extra "); print_esc("endcsname");
25278@.Extra \\endcsname@>
25279help1("I'm ignoring this, since I wasn't doing a \csname.");
25280error;
25281end;
25282
25283@* \[48] Building math lists.
25284The routines that \TeX\ uses to create mlists are similar to those we have
25285just seen for the generation of hlists and vlists. But it is necessary to
25286make ``noads'' as well as nodes, so the reader should review the
25287discussion of math mode data structures before trying to make sense out of
25288the following program.
25289
25290Here is a little routine that needs to be done whenever a subformula
25291is about to be processed. The parameter is a code like |math_group|.
25292
25293@<Declare act...@>=
25294procedure push_math(@!c:group_code);
25295begin push_nest; mode:=-mmode; incompleat_noad:=null; new_save_level(c);
25296end;
25297
25298@ We get into math mode from horizontal mode when a `\.\$' (i.e., a
25299|math_shift| character) is scanned. We must check to see whether this
25300`\.\$' is immediately followed by another, in case display math mode is
25301called for.
25302
25303@<Cases of |main_control| that build...@>=
25304hmode+math_shift:init_math;
25305
25306@ @<Declare act...@>=
25307@t\4@>@<Declare subprocedures for |init_math|@>@;
25308procedure init_math;
25309label reswitch,found,not_found,done;
25310var w:scaled; {new or partial |pre_display_size|}
25311@!j:pointer; {prototype box for display}
25312@!x:integer; {new |pre_display_direction|}
25313@!l:scaled; {new |display_width|}
25314@!s:scaled; {new |display_indent|}
25315@!p:pointer; {current node when calculating |pre_display_size|}
25316@!q:pointer; {glue specification when calculating |pre_display_size|}
25317@!f:internal_font_number; {font in current |char_node|}
25318@!n:integer; {scope of paragraph shape specification}
25319@!v:scaled; {|w| plus possible glue amount}
25320@!d:scaled; {increment to |v|}
25321begin get_token; {|get_x_token| would fail on \.{\\ifmmode}\thinspace!}
25322if (cur_cmd=math_shift)and(mode>0) then @<Go into display math mode@>
25323else  begin back_input; @<Go into ordinary math mode@>;
25324  end;
25325end;
25326
25327@ @<Go into ordinary math mode@>=
25328begin push_math(math_shift_group); eq_word_define(int_base+cur_fam_code,-1);
25329if every_math<>null then begin_token_list(every_math,every_math_text);
25330end
25331
25332@ We get into ordinary math mode from display math mode when `\.{\\eqno}' or
25333`\.{\\leqno}' appears. In such cases |cur_chr| will be 0 or~1, respectively;
25334the value of |cur_chr| is placed onto |save_stack| for safe keeping.
25335
25336@<Cases of |main_control| that build...@>=
25337mmode+eq_no: if privileged then
25338  if cur_group=math_shift_group then start_eq_no
25339  else off_save;
25340
25341@ @<Put each...@>=
25342primitive("eqno",eq_no,0);
25343@!@:eq_no_}{\.{\\eqno} primitive@>
25344primitive("leqno",eq_no,1);
25345@!@:leq_no_}{\.{\\leqno} primitive@>
25346
25347@ When \TeX\ is in display math mode, |cur_group=math_shift_group|,
25348so it is not necessary for the |start_eq_no| procedure to test for
25349this condition.
25350
25351@<Declare act...@>=
25352procedure start_eq_no;
25353begin saved(0):=cur_chr; incr(save_ptr);
25354@<Go into ordinary math mode@>;
25355end;
25356
25357@ @<Cases of |print_cmd_chr|...@>=
25358eq_no:if chr_code=1 then print_esc("leqno")@+else print_esc("eqno");
25359
25360@ @<Forbidden...@>=non_math(eq_no),
25361
25362@ When we enter display math mode, we need to call |line_break| to
25363process the partial paragraph that has just been interrupted by the
25364display. Then we can set the proper values of |display_width| and
25365|display_indent| and |pre_display_size|.
25366
25367@<Go into display math mode@>=
25368begin j:=null; w:=-max_dimen;
25369if head=tail then {`\.{\\noindent\$\$}' or `\.{\$\${ }\$\$}'}
25370  @<Prepare for display after an empty paragraph@>
25371else  begin line_break(true);@/
25372  @<Calculate the natural width, |w|, by which the characters of the
25373    final line extend to the right of the reference point,
25374    plus two ems; or set |w:=max_dimen| if the non-blank information
25375    on that line is affected by stretching or shrinking@>;
25376  end;
25377{now we are in vertical mode, working on the list that will contain the display}
25378@<Calculate the length, |l|, and the shift amount, |s|, of the display lines@>;
25379push_math(math_shift_group); mode:=mmode;
25380eq_word_define(int_base+cur_fam_code,-1);@/
25381eq_word_define(dimen_base+pre_display_size_code,w);
25382LR_box:=j;
25383if eTeX_ex then eq_word_define(int_base+pre_display_direction_code,x);
25384eq_word_define(dimen_base+display_width_code,l);
25385eq_word_define(dimen_base+display_indent_code,s);
25386if every_display<>null then begin_token_list(every_display,every_display_text);
25387if nest_ptr=1 then build_page;
25388end
25389
25390@ @<Calculate the natural width, |w|, by which...@>=
25391@<Prepare for display after a non-empty paragraph@>;
25392while p<>null do
25393  begin @<Let |d| be the natural width of node |p|;
25394    if the node is ``visible,'' |goto found|;
25395    if the node is glue that stretches or shrinks, set |v:=max_dimen|@>;
25396  if v<max_dimen then v:=v+d;
25397  goto not_found;
25398  found: if v<max_dimen then
25399    begin v:=v+d; w:=v;
25400    end
25401  else  begin w:=max_dimen; goto done;
25402    end;
25403  not_found: p:=link(p);
25404  end;
25405done:
25406@<Finish the natural width computation@>
25407
25408@ @<Let |d| be the natural width of node |p|...@>=
25409reswitch: if is_char_node(p) then
25410  begin f:=font(p); d:=char_width(f)(char_info(f)(character(p)));
25411  goto found;
25412  end;
25413case type(p) of
25414hlist_node,vlist_node,rule_node: begin d:=width(p); goto found;
25415  end;
25416ligature_node:@<Make node |p| look like a |char_node|...@>;
25417kern_node: d:=width(p);
25418margin_kern_node: d:=width(p);
25419@t\4@>@<Cases of `Let |d| be the natural width' that need special treatment@>@;
25420glue_node:@<Let |d| be the natural width of this glue; if stretching
25421  or shrinking, set |v:=max_dimen|; |goto found| in the case of leaders@>;
25422whatsit_node: @<Let |d| be the width of the whatsit |p|, and |goto found| if ``visible''@>;
25423othercases d:=0
25424endcases
25425
25426@ We need to be careful that |w|, |v|, and |d| do not depend on any |glue_set|
25427values, since such values are subject to system-dependent rounding.
25428System-dependent numbers are not allowed to infiltrate parameters like
25429|pre_display_size|, since \TeX82 is supposed to make the same decisions on all
25430machines.
25431
25432@<Let |d| be the natural width of this glue...@>=
25433begin q:=glue_ptr(p); d:=width(q);
25434if glue_sign(just_box)=stretching then
25435  begin if (glue_order(just_box)=stretch_order(q))and@|
25436     (stretch(q)<>0) then
25437    v:=max_dimen;
25438  end
25439else if glue_sign(just_box)=shrinking then
25440  begin if (glue_order(just_box)=shrink_order(q))and@|
25441     (shrink(q)<>0) then
25442    v:=max_dimen;
25443  end;
25444if subtype(p)>=a_leaders then goto found;
25445end
25446
25447@ A displayed equation is considered to be three lines long, so we
25448calculate the length and offset of line number |prev_graf+2|.
25449
25450@<Calculate the length, |l|, ...@>=
25451if par_shape_ptr=null then
25452  if (hang_indent<>0)and@|
25453   (((hang_after>=0)and(prev_graf+2>hang_after))or@|
25454    (prev_graf+1<-hang_after)) then
25455    begin l:=hsize-abs(hang_indent);
25456    if hang_indent>0 then s:=hang_indent@+else s:=0;
25457    end
25458  else  begin l:=hsize; s:=0;
25459    end
25460else  begin n:=info(par_shape_ptr);
25461  if prev_graf+2>=n then p:=par_shape_ptr+2*n
25462  else p:=par_shape_ptr+2*(prev_graf+2);
25463  s:=mem[p-1].sc; l:=mem[p].sc;
25464  end
25465
25466@ Subformulas of math formulas cause a new level of math mode to be entered,
25467on the semantic nest as well as the save stack. These subformulas arise in
25468several ways: (1)~A left brace by itself indicates the beginning of a
25469subformula that will be put into a box, thereby freezing its glue and
25470preventing line breaks. (2)~A subscript or superscript is treated as a
25471subformula if it is not a single character; the same applies to
25472the nucleus of things like \.{\\underline}. (3)~The \.{\\left} primitive
25473initiates a subformula that will be terminated by a matching \.{\\right}.
25474The group codes placed on |save_stack| in these three cases are
25475|math_group|, |math_group|, and |math_left_group|, respectively.
25476
25477Here is the code that handles case (1); the other cases are not quite as
25478trivial, so we shall consider them later.
25479
25480@<Cases of |main_control| that build...@>=
25481mmode+left_brace: begin tail_append(new_noad);
25482  back_input; scan_math(nucleus(tail));
25483  end;
25484
25485@ Recall that the |nucleus|, |subscr|, and |supscr| fields in a noad are
25486broken down into subfields called |math_type| and either |info| or
25487|(fam,character)|. The job of |scan_math| is to figure out what to place
25488in one of these principal fields; it looks at the subformula that
25489comes next in the input, and places an encoding of that subformula
25490into a given word of |mem|.
25491
25492@d fam_in_range==((cur_fam>=0)and(cur_fam<number_math_families))
25493
25494@<Declare act...@>=
25495procedure scan_math(@!p:pointer);
25496label restart,reswitch,exit;
25497var c:integer; {math character code}
25498begin restart:@<Get the next non-blank non-relax...@>;
25499reswitch:case cur_cmd of
25500letter,other_char,char_given: begin c:=ho(math_code(cur_chr));
25501    if is_active_math_char(c) then
25502      begin @<Treat |cur_chr| as an active character@>;
25503      goto restart;
25504      end;
25505    end;
25506char_num: begin scan_char_num; cur_chr:=cur_val; cur_cmd:=char_given;
25507  goto reswitch;
25508  end;
25509math_char_num:
25510  if cur_chr = 2 then begin {\.{\\Umathchar}}
25511    scan_math_class_int; c:=set_class_field(cur_val);
25512    scan_math_fam_int;   c:=c + set_family_field(cur_val);
25513    scan_usv_num;        c:=c + cur_val;
25514  end else if cur_chr = 1 then begin {\.{\\Umathcharnum}}
25515    scan_xetex_math_char_int; c:=cur_val;
25516  end else begin scan_fifteen_bit_int;
25517    c:=set_class_field(cur_val div @"1000) +
25518           set_family_field((cur_val mod @"1000) div @"100) +
25519           (cur_val mod @"100);
25520  end;
25521math_given: begin
25522  c:=set_class_field(cur_chr div @"1000) +
25523       set_family_field((cur_chr mod @"1000) div @"100) +
25524       (cur_chr mod @"100);
25525  end;
25526XeTeX_math_given: c:=cur_chr;
25527delim_num: begin
25528  if cur_chr=1 then begin {\.{\\Udelimiter <class> <fam> <usv>}}
25529    scan_math_class_int; c:=set_class_field(cur_val);
25530    scan_math_fam_int;   c:=c + set_family_field(cur_val);
25531    scan_usv_num;        c:=c + cur_val;
25532  end else begin {\.{\\delimiter <27-bit delcode>}}
25533    scan_delimiter_int;
25534    c:=cur_val div @'10000; {get the `small' delimiter field}
25535    c:=set_class_field(c div @"1000) +
25536       set_family_field((c mod @"1000) div @"100) +
25537       (c mod @"100); {and convert it to a \XeTeX\ mathchar code}
25538  end;
25539  end;
25540othercases @<Scan a subformula enclosed in braces and |return|@>
25541endcases;@/
25542math_type(p):=math_char; character(p):=qi(c mod @"10000);
25543if (is_var_family(c)) and fam_in_range then plane_and_fam_field(p):=cur_fam
25544else plane_and_fam_field(p):=(math_fam_field(c));
25545plane_and_fam_field(p):=plane_and_fam_field(p) + (math_char_field(c) div @"10000) * @"100;
25546exit:end;
25547
25548@ An active character that is an |outer_call| is allowed here.
25549
25550@<Treat |cur_chr|...@>=
25551begin cur_cs:=cur_chr+active_base;
25552cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs);
25553x_token; back_input;
25554end
25555
25556@ The pointer |p| is placed on |save_stack| while a complex subformula
25557is being scanned.
25558
25559@<Scan a subformula...@>=
25560begin back_input; scan_left_brace;@/
25561saved(0):=p; incr(save_ptr); push_math(math_group); return;
25562end
25563
25564@ The simplest math formula is, of course, `\.{\${ }\$}', when no noads are
25565generated. The next simplest cases involve a single character, e.g.,
25566`\.{\$x\$}'. Even though such cases may not seem to be very interesting,
25567the reader can perhaps understand how happy the author was when `\.{\$x\$}'
25568was first properly typeset by \TeX. The code in this section was used.
25569@^Knuth, Donald Ervin@>
25570
25571@<Cases of |main_control| that build...@>=
25572mmode+letter,mmode+other_char,mmode+char_given:
25573  set_math_char(ho(math_code(cur_chr)));
25574mmode+char_num: begin scan_char_num; cur_chr:=cur_val;
25575  set_math_char(ho(math_code(cur_chr)));
25576  end;
25577mmode+math_char_num: if cur_chr = 2 then begin {\.{\\Umathchar}}
25578    scan_math_class_int; t:=set_class_field(cur_val);
25579    scan_math_fam_int; t:=t + set_family_field(cur_val);
25580    scan_usv_num; t:=t + cur_val;
25581    set_math_char(t);
25582  end else if cur_chr = 1 then begin {\.{\\Umathcharnum}}
25583    scan_xetex_math_char_int; set_math_char(cur_val);
25584  end else begin scan_fifteen_bit_int;
25585    set_math_char(set_class_field(cur_val div @"1000) +
25586           set_family_field((cur_val mod @"1000) div @"100) +
25587           (cur_val mod @"100));
25588  end;
25589mmode+math_given: begin
25590  set_math_char(set_class_field(cur_chr div @"1000) +
25591                set_family_field((cur_chr mod @"1000) div @"100) +
25592                (cur_chr mod @"100));
25593  end;
25594mmode+XeTeX_math_given: set_math_char(cur_chr);
25595mmode+delim_num: begin
25596  if cur_chr=1 then begin {\.{\\Udelimiter}}
25597    scan_math_class_int; t:=set_class_field(cur_val);
25598    scan_math_fam_int; t:=t + set_family_field(cur_val);
25599    scan_usv_num; t:=t + cur_val;
25600    set_math_char(t);
25601  end else begin
25602    scan_delimiter_int;
25603    cur_val:=cur_val div @'10000; {discard the large delimiter code}
25604    set_math_char(set_class_field(cur_val div @"1000) +
25605         set_family_field((cur_val mod @"1000) div @"100) +
25606         (cur_val mod @"100));
25607  end;
25608  end;
25609
25610@ The |set_math_char| procedure creates a new noad appropriate to a given
25611math code, and appends it to the current mlist. However, if the math code
25612is sufficiently large, the |cur_chr| is treated as an active character and
25613nothing is appended.
25614
25615@<Declare act...@>=
25616procedure set_math_char(@!c:integer);
25617var p:pointer; {the new noad}
25618  ch: UnicodeScalar;
25619begin if is_active_math_char(c) then
25620  @<Treat |cur_chr|...@>
25621else  begin p:=new_noad; math_type(nucleus(p)):=math_char;
25622  ch:=math_char_field(c);
25623  character(nucleus(p)):=qi(ch mod @"10000);
25624  plane_and_fam_field(nucleus(p)):=math_fam_field(c);
25625  if is_var_family(c) then
25626    begin if fam_in_range then plane_and_fam_field(nucleus(p)):=cur_fam;
25627    type(p):=ord_noad;
25628    end
25629  else  type(p):=ord_noad+math_class_field(c);
25630  plane_and_fam_field(nucleus(p)):=plane_and_fam_field(nucleus(p)) + (ch div @"10000) * @"100;
25631  link(tail):=p; tail:=p;
25632  end;
25633end;
25634
25635@ Primitive math operators like \.{\\mathop} and \.{\\underline} are given
25636the command code |math_comp|, supplemented by the noad type that they
25637generate.
25638
25639@<Put each...@>=
25640primitive("mathord",math_comp,ord_noad);
25641@!@:math_ord_}{\.{\\mathord} primitive@>
25642primitive("mathop",math_comp,op_noad);
25643@!@:math_op_}{\.{\\mathop} primitive@>
25644primitive("mathbin",math_comp,bin_noad);
25645@!@:math_bin_}{\.{\\mathbin} primitive@>
25646primitive("mathrel",math_comp,rel_noad);
25647@!@:math_rel_}{\.{\\mathrel} primitive@>
25648primitive("mathopen",math_comp,open_noad);
25649@!@:math_open_}{\.{\\mathopen} primitive@>
25650primitive("mathclose",math_comp,close_noad);
25651@!@:math_close_}{\.{\\mathclose} primitive@>
25652primitive("mathpunct",math_comp,punct_noad);
25653@!@:math_punct_}{\.{\\mathpunct} primitive@>
25654primitive("mathinner",math_comp,inner_noad);
25655@!@:math_inner_}{\.{\\mathinner} primitive@>
25656primitive("underline",math_comp,under_noad);
25657@!@:underline_}{\.{\\underline} primitive@>
25658primitive("overline",math_comp,over_noad);@/
25659@!@:overline_}{\.{\\overline} primitive@>
25660primitive("displaylimits",limit_switch,normal);
25661@!@:display_limits_}{\.{\\displaylimits} primitive@>
25662primitive("limits",limit_switch,limits);
25663@!@:limits_}{\.{\\limits} primitive@>
25664primitive("nolimits",limit_switch,no_limits);
25665@!@:no_limits_}{\.{\\nolimits} primitive@>
25666
25667@ @<Cases of |print_cmd_chr|...@>=
25668math_comp: case chr_code of
25669  ord_noad: print_esc("mathord");
25670  op_noad: print_esc("mathop");
25671  bin_noad: print_esc("mathbin");
25672  rel_noad: print_esc("mathrel");
25673  open_noad: print_esc("mathopen");
25674  close_noad: print_esc("mathclose");
25675  punct_noad: print_esc("mathpunct");
25676  inner_noad: print_esc("mathinner");
25677  under_noad: print_esc("underline");
25678  othercases print_esc("overline")
25679  endcases;
25680limit_switch: if chr_code=limits then print_esc("limits")
25681  else if chr_code=no_limits then print_esc("nolimits")
25682  else print_esc("displaylimits");
25683
25684@ @<Cases of |main_control| that build...@>=
25685mmode+math_comp: begin tail_append(new_noad);
25686  type(tail):=cur_chr; scan_math(nucleus(tail));
25687  end;
25688mmode+limit_switch: math_limit_switch;
25689
25690@ @<Declare act...@>=
25691procedure math_limit_switch;
25692label exit;
25693begin if head<>tail then if type(tail)=op_noad then
25694  begin subtype(tail):=cur_chr; return;
25695  end;
25696print_err("Limit controls must follow a math operator");
25697@.Limit controls must follow...@>
25698help1("I'm ignoring this misplaced \limits or \nolimits command."); error;
25699exit:end;
25700
25701@ Delimiter fields of noads are filled in by the |scan_delimiter| routine.
25702The first parameter of this procedure is the |mem| address where the
25703delimiter is to be placed; the second tells if this delimiter follows
25704\.{\\radical} or not.
25705
25706@<Declare act...@>=
25707procedure scan_delimiter(@!p:pointer;@!r:boolean);
25708begin
25709  if r then begin
25710    if cur_chr=1 then begin {\.{\\Uradical}}
25711      cur_val1:=@"40000000; {extended delimiter code flag}
25712      scan_math_fam_int;   cur_val1:=cur_val1 + cur_val * @"200000;
25713      scan_usv_num;        cur_val:=cur_val1 + cur_val;
25714    end else {radical}
25715      scan_delimiter_int;
25716  end
25717else  begin @<Get the next non-blank non-relax...@>;
25718  case cur_cmd of
25719  letter,other_char: begin
25720    cur_val:=del_code(cur_chr);
25721    end;
25722  delim_num: if cur_chr=1 then begin {\.{\\Udelimiter}}
25723    cur_val1:=@"40000000; {extended delimiter code flag}
25724    scan_math_class_int; {discarded}
25725    scan_math_fam_int;   cur_val1:=cur_val1 + cur_val * @"200000;
25726    scan_usv_num;        cur_val:=cur_val1 + cur_val;
25727  end else scan_delimiter_int; {normal delimiter}
25728  othercases begin cur_val:=-1; end;
25729  endcases;
25730  end;
25731if cur_val<0 then begin @<Report that an invalid delimiter code is being changed
25732   to null; set~|cur_val:=0|@>;
25733  end;
25734if cur_val>=@"40000000 then begin {extended delimiter code, only one size}
25735  small_plane_and_fam_field(p):=((cur_val mod @"200000) div @"10000) * @"100 {plane}
25736                                  + (cur_val div @"200000) mod @"100; {family}
25737  small_char_field(p):=qi(cur_val mod @"10000);
25738  large_plane_and_fam_field(p):=0;
25739  large_char_field(p):=0;
25740end else begin {standard delimiter code, 4-bit families and 8-bit char codes}
25741  small_plane_and_fam_field(p):=(cur_val div @'4000000) mod 16;
25742  small_char_field(p):=qi((cur_val div @'10000) mod 256);
25743  large_plane_and_fam_field(p):=(cur_val div 256) mod 16;
25744  large_char_field(p):=qi(cur_val mod 256);
25745end;
25746end;
25747
25748@ @<Report that an invalid delimiter...@>=
25749begin print_err("Missing delimiter (. inserted)");
25750@.Missing delimiter...@>
25751help6("I was expecting to see something like `(' or `\{' or")@/
25752  ("`\}' here. If you typed, e.g., `{' instead of `\{', you")@/
25753  ("should probably delete the `{' by typing `1' now, so that")@/
25754  ("braces don't get unbalanced. Otherwise just proceed.")@/
25755  ("Acceptable delimiters are characters whose \delcode is")@/
25756  ("nonnegative, or you can use `\delimiter <delimiter code>'.");
25757back_error; cur_val:=0;
25758end
25759
25760@ @<Cases of |main_control| that build...@>=
25761mmode+radical:math_radical;
25762
25763@ @<Declare act...@>=
25764procedure math_radical;
25765begin tail_append(get_node(radical_noad_size));
25766type(tail):=radical_noad; subtype(tail):=normal;
25767mem[nucleus(tail)].hh:=empty_field;
25768mem[subscr(tail)].hh:=empty_field;
25769mem[supscr(tail)].hh:=empty_field;
25770scan_delimiter(left_delimiter(tail),true); scan_math(nucleus(tail));
25771end;
25772
25773@ @<Cases of |main_control| that build...@>=
25774mmode+accent,mmode+math_accent:math_ac;
25775
25776@ @<Declare act...@>=
25777procedure math_ac;
25778var c: integer;
25779begin if cur_cmd=accent then
25780  @<Complain that the user should have said \.{\\mathaccent}@>;
25781tail_append(get_node(accent_noad_size));
25782type(tail):=accent_noad; subtype(tail):=normal;
25783mem[nucleus(tail)].hh:=empty_field;
25784mem[subscr(tail)].hh:=empty_field;
25785mem[supscr(tail)].hh:=empty_field;
25786math_type(accent_chr(tail)):=math_char;
25787if cur_chr=1 then begin
25788  if scan_keyword("fixed") then
25789    subtype(tail):=fixed_acc
25790  else if scan_keyword("bottom") then begin
25791    if scan_keyword("fixed") then
25792      subtype(tail):=bottom_acc+fixed_acc
25793    else
25794      subtype(tail):=bottom_acc;
25795  end;
25796  scan_math_class_int; c:=set_class_field(cur_val);
25797  scan_math_fam_int;   c:=c + set_family_field(cur_val);
25798  scan_usv_num;        cur_val:=cur_val + c;
25799end
25800else begin
25801  scan_fifteen_bit_int;
25802  cur_val:=set_class_field(cur_val div @"1000) +
25803             set_family_field((cur_val mod @"1000) div @"100) +
25804             (cur_val mod @"100);
25805end;
25806character(accent_chr(tail)):=qi(cur_val mod @"10000);
25807if (is_var_family(cur_val))and fam_in_range then plane_and_fam_field(accent_chr(tail)):=cur_fam
25808else plane_and_fam_field(accent_chr(tail)):=math_fam_field(cur_val);
25809plane_and_fam_field(accent_chr(tail))
25810 :=plane_and_fam_field(accent_chr(tail)) + (math_char_field(cur_val) div @"10000) * @"100;
25811scan_math(nucleus(tail));
25812end;
25813
25814@ @<Complain that the user should have said \.{\\mathaccent}@>=
25815begin print_err("Please use "); print_esc("mathaccent");
25816print(" for accents in math mode");
25817@.Please use \\mathaccent...@>
25818help2("I'm changing \accent to \mathaccent here; wish me luck.")@/
25819  ("(Accents are not the same in formulas as they are in text.)");
25820error;
25821end
25822
25823@ @<Cases of |main_control| that build...@>=
25824mmode+vcenter: begin scan_spec(vcenter_group,false); normal_paragraph;
25825  push_nest; mode:=-vmode; prev_depth:=ignore_depth;
25826  if every_vbox<>null then begin_token_list(every_vbox,every_vbox_text);
25827  end;
25828
25829@ @<Cases of |handle...@>=
25830vcenter_group: begin end_graf; unsave; save_ptr:=save_ptr-2;
25831  p:=vpack(link(head),saved(1),saved(0)); pop_nest;
25832  tail_append(new_noad); type(tail):=vcenter_noad;
25833  math_type(nucleus(tail)):=sub_box; info(nucleus(tail)):=p;
25834  end;
25835
25836@ The routine that inserts a |style_node| holds no surprises.
25837
25838@<Put each...@>=
25839primitive("displaystyle",math_style,display_style);
25840@!@:display_style_}{\.{\\displaystyle} primitive@>
25841primitive("textstyle",math_style,text_style);
25842@!@:text_style_}{\.{\\textstyle} primitive@>
25843primitive("scriptstyle",math_style,script_style);
25844@!@:script_style_}{\.{\\scriptstyle} primitive@>
25845primitive("scriptscriptstyle",math_style,script_script_style);
25846@!@:script_script_style_}{\.{\\scriptscriptstyle} primitive@>
25847
25848@ @<Cases of |print_cmd_chr|...@>=
25849math_style: print_style(chr_code);
25850
25851@ @<Cases of |main_control| that build...@>=
25852mmode+math_style: tail_append(new_style(cur_chr));
25853mmode+non_script: begin tail_append(new_glue(zero_glue));
25854  subtype(tail):=cond_math_glue;
25855  end;
25856mmode+math_choice: append_choices;
25857
25858@ The routine that scans the four mlists of a \.{\\mathchoice} is very
25859much like the routine that builds discretionary nodes.
25860
25861@<Declare act...@>=
25862procedure append_choices;
25863begin tail_append(new_choice); incr(save_ptr); saved(-1):=0;
25864push_math(math_choice_group); scan_left_brace;
25865end;
25866
25867@ @<Cases of |handle_right_brace|...@>=
25868math_choice_group: build_choices;
25869
25870@ @<Declare act...@>=
25871@t\4@>@<Declare the function called |fin_mlist|@>@t@>@;@/
25872procedure build_choices;
25873label exit;
25874var p:pointer; {the current mlist}
25875begin unsave; p:=fin_mlist(null);
25876case saved(-1) of
258770:display_mlist(tail):=p;
258781:text_mlist(tail):=p;
258792:script_mlist(tail):=p;
258803:begin script_script_mlist(tail):=p; decr(save_ptr); return;
25881  end;
25882end; {there are no other cases}
25883incr(saved(-1)); push_math(math_choice_group); scan_left_brace;
25884exit:end;
25885
25886@ Subscripts and superscripts are attached to the previous nucleus by the
25887@^superscripts@>@^subscripts@>
25888action procedure called |sub_sup|. We use the facts that |sub_mark=sup_mark+1|
25889and |subscr(p)=supscr(p)+1|.
25890
25891@<Cases of |main_control| that build...@>=
25892mmode+sub_mark,mmode+sup_mark: sub_sup;
25893
25894@ @<Declare act...@>=
25895procedure sub_sup;
25896var t:small_number; {type of previous sub/superscript}
25897@!p:pointer; {field to be filled by |scan_math|}
25898begin t:=empty; p:=null;
25899if tail<>head then if scripts_allowed(tail) then
25900  begin p:=supscr(tail)+cur_cmd-sup_mark; {|supscr| or |subscr|}
25901  t:=math_type(p);
25902  end;
25903if (p=null)or(t<>empty) then @<Insert a dummy noad to be sub/superscripted@>;
25904scan_math(p);
25905end;
25906
25907@ @<Insert a dummy...@>=
25908begin tail_append(new_noad);
25909p:=supscr(tail)+cur_cmd-sup_mark; {|supscr| or |subscr|}
25910if t<>empty then
25911  begin if cur_cmd=sup_mark then
25912    begin print_err("Double superscript");
25913@.Double superscript@>
25914    help1("I treat `x^1^2' essentially like `x^1{}^2'.");
25915    end
25916  else  begin print_err("Double subscript");
25917@.Double subscript@>
25918    help1("I treat `x_1_2' essentially like `x_1{}_2'.");
25919    end;
25920  error;
25921  end;
25922end
25923
25924@ An operation like `\.{\\over}' causes the current mlist to go into a
25925state of suspended animation: |incompleat_noad| points to a |fraction_noad|
25926that contains the mlist-so-far as its numerator, while the denominator
25927is yet to come. Finally when the mlist is finished, the denominator will
25928go into the incompleat fraction noad, and that noad will become the
25929whole formula, unless it is surrounded by `\.{\\left}' and `\.{\\right}'
25930delimiters.
25931
25932@d above_code=0 { `\.{\\above}' }
25933@d over_code=1 { `\.{\\over}' }
25934@d atop_code=2 { `\.{\\atop}' }
25935@d delimited_code=3 { `\.{\\abovewithdelims}', etc.}
25936
25937@<Put each...@>=
25938primitive("above",above,above_code);@/
25939@!@:above_}{\.{\\above} primitive@>
25940primitive("over",above,over_code);@/
25941@!@:over_}{\.{\\over} primitive@>
25942primitive("atop",above,atop_code);@/
25943@!@:atop_}{\.{\\atop} primitive@>
25944primitive("abovewithdelims",above,delimited_code+above_code);@/
25945@!@:above_with_delims_}{\.{\\abovewithdelims} primitive@>
25946primitive("overwithdelims",above,delimited_code+over_code);@/
25947@!@:over_with_delims_}{\.{\\overwithdelims} primitive@>
25948primitive("atopwithdelims",above,delimited_code+atop_code);
25949@!@:atop_with_delims_}{\.{\\atopwithdelims} primitive@>
25950
25951@ @<Cases of |print_cmd_chr|...@>=
25952above: case chr_code of
25953  over_code:print_esc("over");
25954  atop_code:print_esc("atop");
25955  delimited_code+above_code:print_esc("abovewithdelims");
25956  delimited_code+over_code:print_esc("overwithdelims");
25957  delimited_code+atop_code:print_esc("atopwithdelims");
25958  othercases print_esc("above")
25959  endcases;
25960
25961@ @<Cases of |main_control| that build...@>=
25962mmode+above: math_fraction;
25963
25964@ @<Declare act...@>=
25965procedure math_fraction;
25966var c:small_number; {the type of generalized fraction we are scanning}
25967begin c:=cur_chr;
25968if incompleat_noad<>null then
25969  @<Ignore the fraction operation and complain about this ambiguous case@>
25970else  begin incompleat_noad:=get_node(fraction_noad_size);
25971  type(incompleat_noad):=fraction_noad;
25972  subtype(incompleat_noad):=normal;
25973  math_type(numerator(incompleat_noad)):=sub_mlist;
25974  info(numerator(incompleat_noad)):=link(head);
25975  mem[denominator(incompleat_noad)].hh:=empty_field;
25976  mem[left_delimiter(incompleat_noad)].qqqq:=null_delimiter;
25977  mem[right_delimiter(incompleat_noad)].qqqq:=null_delimiter;@/
25978  link(head):=null; tail:=head;
25979  @<Use code |c| to distinguish between generalized fractions@>;
25980  end;
25981end;
25982
25983@ @<Use code |c|...@>=
25984if c>=delimited_code then
25985  begin scan_delimiter(left_delimiter(incompleat_noad),false);
25986  scan_delimiter(right_delimiter(incompleat_noad),false);
25987  end;
25988case c mod delimited_code of
25989above_code: begin scan_normal_dimen;
25990  thickness(incompleat_noad):=cur_val;
25991  end;
25992over_code: thickness(incompleat_noad):=default_code;
25993atop_code: thickness(incompleat_noad):=0;
25994end {there are no other cases}
25995
25996@ @<Ignore the fraction...@>=
25997begin if c>=delimited_code then
25998  begin scan_delimiter(garbage,false); scan_delimiter(garbage,false);
25999  end;
26000if c mod delimited_code=above_code then scan_normal_dimen;
26001print_err("Ambiguous; you need another { and }");
26002@.Ambiguous...@>
26003help3("I'm ignoring this fraction specification, since I don't")@/
26004  ("know whether a construction like `x \over y \over z'")@/
26005  ("means `{x \over y} \over z' or `x \over {y \over z}'.");
26006error;
26007end
26008
26009@ At the end of a math formula or subformula, the |fin_mlist| routine is
26010called upon to return a pointer to the newly completed mlist, and to
26011pop the nest back to the enclosing semantic level. The parameter to
26012|fin_mlist|, if not null, points to a |right_noad| that ends the
26013current mlist; this |right_noad| has not yet been appended.
26014
26015@<Declare the function called |fin_mlist|@>=
26016function fin_mlist(@!p:pointer):pointer;
26017var q:pointer; {the mlist to return}
26018begin if incompleat_noad<>null then @<Compleat the incompleat noad@>
26019else  begin link(tail):=p; q:=link(head);
26020  end;
26021pop_nest; fin_mlist:=q;
26022end;
26023
26024@ @<Compleat...@>=
26025begin math_type(denominator(incompleat_noad)):=sub_mlist;
26026info(denominator(incompleat_noad)):=link(head);
26027if p=null then q:=incompleat_noad
26028else  begin q:=info(numerator(incompleat_noad));
26029  if (type(q)<>left_noad)or(delim_ptr=null) then confusion("right");
26030@:this can't happen right}{\quad right@>
26031  info(numerator(incompleat_noad)):=link(delim_ptr);
26032  link(delim_ptr):=incompleat_noad; link(incompleat_noad):=p;
26033  end;
26034end
26035
26036@ Now at last we're ready to see what happens when a right brace occurs
26037in a math formula. Two special cases are simplified here: Braces are effectively
26038removed when they surround a single Ord without sub/superscripts, or when they
26039surround an accent that is the nucleus of an Ord atom.
26040
26041@<Cases of |handle...@>=
26042math_group: begin unsave; decr(save_ptr);@/
26043  math_type(saved(0)):=sub_mlist; p:=fin_mlist(null); info(saved(0)):=p;
26044  if p<>null then if link(p)=null then
26045   if type(p)=ord_noad then
26046    begin if math_type(subscr(p))=empty then
26047     if math_type(supscr(p))=empty then
26048      begin mem[saved(0)].hh:=mem[nucleus(p)].hh;
26049      free_node(p,noad_size);
26050      end;
26051    end
26052  else if type(p)=accent_noad then if saved(0)=nucleus(tail) then
26053   if type(tail)=ord_noad then @<Replace the tail of the list by |p|@>;
26054  end;
26055
26056@ @<Replace the tail...@>=
26057begin q:=head; while link(q)<>tail do q:=link(q);
26058link(q):=p; free_node(tail,noad_size); tail:=p;
26059end
26060
26061@ We have dealt with all constructions of math mode except `\.{\\left}' and
26062`\.{\\right}', so the picture is completed by the following sections of
26063the program.
26064
26065@<Put each...@>=
26066primitive("left",left_right,left_noad);
26067@!@:left_}{\.{\\left} primitive@>
26068primitive("right",left_right,right_noad);
26069@!@:right_}{\.{\\right} primitive@>
26070text(frozen_right):="right"; eqtb[frozen_right]:=eqtb[cur_val];
26071
26072@ @<Cases of |print_cmd_chr|...@>=
26073left_right: if chr_code=left_noad then print_esc("left")
26074@/@<Cases of |left_right| for |print_cmd_chr|@>@/
26075else print_esc("right");
26076
26077@ @<Cases of |main_control| that build...@>=
26078mmode+left_right: math_left_right;
26079
26080@ @<Declare act...@>=
26081procedure math_left_right;
26082var t:small_number; {|left_noad| or |right_noad|}
26083@!p:pointer; {new noad}
26084@!q:pointer; {resulting mlist}
26085begin t:=cur_chr;
26086if (t<>left_noad)and(cur_group<>math_left_group) then
26087  @<Try to recover from mismatched \.{\\right}@>
26088else  begin p:=new_noad; type(p):=t;
26089  scan_delimiter(delimiter(p),false);
26090  if t=middle_noad then
26091    begin type(p):=right_noad; subtype(p):=middle_noad;
26092    end;
26093  if t=left_noad then q:=p
26094  else  begin q:=fin_mlist(p); unsave; {end of |math_left_group|}
26095    end;
26096  if t<>right_noad then
26097    begin push_math(math_left_group); link(head):=q; tail:=p;
26098    delim_ptr:=p;
26099    end
26100  else  begin
26101    tail_append(new_noad); type(tail):=inner_noad;
26102    math_type(nucleus(tail)):=sub_mlist;
26103    info(nucleus(tail)):=q;
26104    end;
26105  end;
26106end;
26107
26108@ @<Try to recover from mismatch...@>=
26109begin if cur_group=math_shift_group then
26110  begin scan_delimiter(garbage,false);
26111  print_err("Extra ");
26112  if t=middle_noad then
26113    begin print_esc("middle");
26114@.Extra \\middle.@>
26115    help1("I'm ignoring a \middle that had no matching \left.");
26116    end
26117  else  begin print_esc("right");
26118@.Extra \\right.@>
26119    help1("I'm ignoring a \right that had no matching \left.");
26120    end;
26121  error;
26122  end
26123else off_save;
26124end
26125
26126@ Here is the only way out of math mode.
26127
26128@<Cases of |main_control| that build...@>=
26129mmode+math_shift: if cur_group=math_shift_group then after_math
26130  else off_save;
26131
26132@ @<Declare act...@>=
26133@t\4@>@<Declare subprocedures for |after_math|@>@;
26134procedure after_math;
26135var l:boolean; {`\.{\\leqno}' instead of `\.{\\eqno}'}
26136@!danger:boolean; {not enough symbol fonts are present}
26137@!m:integer; {|mmode| or |-mmode|}
26138@!p:pointer; {the formula}
26139@!a:pointer; {box containing equation number}
26140@<Local variables for finishing a displayed formula@>@;
26141begin danger:=false;
26142@<Retrieve the prototype box@>;
26143@<Check that the necessary fonts for math symbols are present;
26144  if not, flush the current math lists and set |danger:=true|@>;
26145m:=mode; l:=false; p:=fin_mlist(null); {this pops the nest}
26146if mode=-m then {end of equation number}
26147  begin @<Check that another \.\$ follows@>;
26148  cur_mlist:=p; cur_style:=text_style; mlist_penalties:=false;
26149  mlist_to_hlist; a:=hpack(link(temp_head),natural);
26150  set_box_lr(a)(dlist);
26151  unsave; decr(save_ptr); {now |cur_group=math_shift_group|}
26152  if saved(0)=1 then l:=true;
26153  danger:=false;
26154  @<Retrieve the prototype box@>;
26155  @<Check that the necessary fonts for math symbols are present;
26156    if not, flush the current math lists and set |danger:=true|@>;
26157  m:=mode; p:=fin_mlist(null);
26158  end
26159else a:=null;
26160if m<0 then @<Finish math in text@>
26161else  begin if a=null then @<Check that another \.\$ follows@>;
26162  @<Finish displayed math@>;
26163  end;
26164end;
26165
26166@ @<Check that the necessary fonts...@>=
26167if ((font_params[fam_fnt(2+text_size)]<total_mathsy_params)
26168    and (not is_new_mathfont(fam_fnt(2+text_size)))) or@|
26169   ((font_params[fam_fnt(2+script_size)]<total_mathsy_params)
26170    and (not is_new_mathfont(fam_fnt(2+script_size)))) or@|
26171   ((font_params[fam_fnt(2+script_script_size)]<total_mathsy_params)
26172    and (not is_new_mathfont(fam_fnt(2+script_script_size)))) then
26173  begin print_err("Math formula deleted: Insufficient symbol fonts");@/
26174@.Math formula deleted...@>
26175  help3("Sorry, but I can't typeset math unless \textfont 2")@/
26176    ("and \scriptfont 2 and \scriptscriptfont 2 have all")@/
26177    ("the \fontdimen values needed in math symbol fonts.");
26178  error; flush_math; danger:=true;
26179  end
26180else if ((font_params[fam_fnt(3+text_size)]<total_mathex_params)
26181    and (not is_new_mathfont(fam_fnt(3+text_size)))) or@|
26182   ((font_params[fam_fnt(3+script_size)]<total_mathex_params)
26183    and (not is_new_mathfont(fam_fnt(3+script_size)))) or@|
26184   ((font_params[fam_fnt(3+script_script_size)]<total_mathex_params)
26185    and (not is_new_mathfont(fam_fnt(3+script_script_size)))) then
26186  begin print_err("Math formula deleted: Insufficient extension fonts");@/
26187  help3("Sorry, but I can't typeset math unless \textfont 3")@/
26188    ("and \scriptfont 3 and \scriptscriptfont 3 have all")@/
26189    ("the \fontdimen values needed in math extension fonts.");
26190  error; flush_math; danger:=true;
26191  end
26192
26193@ The |unsave| is done after everything else here; hence an appearance of
26194`\.{\\mathsurround}' inside of `\.{\$...\$}' affects the spacing at these
26195particular \.\$'s. This is consistent with the conventions of
26196`\.{\$\$...\$\$}', since `\.{\\abovedisplayskip}' inside a display affects the
26197space above that display.
26198
26199@<Finish math in text@>=
26200begin tail_append(new_math(math_surround,before));
26201cur_mlist:=p; cur_style:=text_style; mlist_penalties:=(mode>0); mlist_to_hlist;
26202link(tail):=link(temp_head);
26203while link(tail)<>null do tail:=link(tail);
26204tail_append(new_math(math_surround,after));
26205space_factor:=1000; unsave;
26206end
26207
26208@ \TeX\ gets to the following part of the program when the first `\.\$' ending
26209a display has been scanned.
26210
26211@<Check that another \.\$ follows@>=
26212begin get_x_token;
26213if cur_cmd<>math_shift then
26214  begin print_err("Display math should end with $$");
26215@.Display math...with \$\$@>
26216  help2("The `$' that I just saw supposedly matches a previous `$$'.")@/
26217    ("So I shall assume that you typed `$$' both times.");
26218  back_error;
26219  end;
26220end
26221
26222@ We have saved the worst for last: The fussiest part of math mode processing
26223occurs when a displayed formula is being centered and placed with an optional
26224equation number.
26225
26226@<Local variables for finishing...@>=
26227@!b:pointer; {box containing the equation}
26228@!w:scaled; {width of the equation}
26229@!z:scaled; {width of the line}
26230@!e:scaled; {width of equation number}
26231@!q:scaled; {width of equation number plus space to separate from equation}
26232@!d:scaled; {displacement of equation in the line}
26233@!s:scaled; {move the line right this much}
26234@!g1,@!g2:small_number; {glue parameter codes for before and after}
26235@!r:pointer; {kern node used to position the display}
26236@!t:pointer; {tail of adjustment list}
26237@!pre_t:pointer; {tail of pre-adjustment list}
26238
26239@ At this time |p| points to the mlist for the formula; |a| is either
26240|null| or it points to a box containing the equation number; and we are in
26241vertical mode (or internal vertical mode).
26242
26243@<Finish displayed math@>=
26244cur_mlist:=p; cur_style:=display_style; mlist_penalties:=false;
26245mlist_to_hlist; p:=link(temp_head);@/
26246adjust_tail:=adjust_head; pre_adjust_tail:=pre_adjust_head;
26247b:=hpack(p,natural); p:=list_ptr(b);
26248t:=adjust_tail; adjust_tail:=null;@/
26249pre_t:=pre_adjust_tail; pre_adjust_tail:=null;@/
26250w:=width(b); z:=display_width; s:=display_indent;
26251if pre_display_direction<0 then s:=-s-z;
26252if (a=null)or danger then
26253  begin e:=0; q:=0;
26254  end
26255else  begin e:=width(a); q:=e+math_quad(text_size);
26256  end;
26257if w+q>z then
26258  @<Squeeze the equation as much as possible; if there is an equation
26259    number that should go on a separate line by itself,
26260    set~|e:=0|@>;
26261@<Determine the displacement, |d|, of the left edge of the equation, with
26262  respect to the line size |z|, assuming that |l=false|@>;
26263@<Append the glue or equation number preceding the display@>;
26264@<Append the display and perhaps also the equation number@>;
26265@<Append the glue or equation number following the display@>;
26266@<Flush the prototype box@>;
26267resume_after_display
26268
26269@ @<Declare act...@>=
26270procedure resume_after_display;
26271begin if cur_group<>math_shift_group then confusion("display");
26272@:this can't happen display}{\quad display@>
26273unsave; prev_graf:=prev_graf+3;
26274push_nest; mode:=hmode; space_factor:=1000; set_cur_lang; clang:=cur_lang;
26275prev_graf:=(norm_min(left_hyphen_min)*@'100+norm_min(right_hyphen_min))
26276             *@'200000+cur_lang;
26277@<Scan an optional space@>;
26278if nest_ptr=1 then build_page;
26279end;
26280
26281@ The user can force the equation number to go on a separate line
26282by causing its width to be zero.
26283
26284@<Squeeze the equation as much as possible...@>=
26285begin if (e<>0)and((w-total_shrink[normal]+q<=z)or@|
26286   (total_shrink[fil]<>0)or(total_shrink[fill]<>0)or
26287   (total_shrink[filll]<>0)) then
26288  begin free_node(b,box_node_size);
26289  b:=hpack(p,z-q,exactly);
26290  end
26291else  begin e:=0;
26292  if w>z then
26293    begin free_node(b,box_node_size);
26294    b:=hpack(p,z,exactly);
26295    end;
26296  end;
26297w:=width(b);
26298end
26299
26300@ We try first to center the display without regard to the existence of
26301the equation number. If that would make it too close (where ``too close''
26302means that the space between display and equation number is less than the
26303width of the equation number), we either center it in the remaining space
26304or move it as far from the equation number as possible. The latter alternative
26305is taken only if the display begins with glue, since we assume that the
26306user put glue there to control the spacing precisely.
26307
26308@<Determine the displacement, |d|, of the left edge of the equation...@>=
26309set_box_lr(b)(dlist);
26310d:=half(z-w);
26311if (e>0)and(d<2*e) then {too close}
26312  begin d:=half(z-w-e);
26313  if p<>null then if not is_char_node(p) then if type(p)=glue_node then d:=0;
26314  end
26315
26316@ If the equation number is set on a line by itself, either before or
26317after the formula, we append an infinite penalty so that no page break will
26318separate the display from its number; and we use the same size and
26319displacement for all three potential lines of the display, even though
26320`\.{\\parshape}' may specify them differently.
26321
26322@<Append the glue or equation number preceding the display@>=
26323tail_append(new_penalty(pre_display_penalty));@/
26324if (d+s<=pre_display_size)or l then {not enough clearance}
26325  begin g1:=above_display_skip_code; g2:=below_display_skip_code;
26326  end
26327else  begin g1:=above_display_short_skip_code;
26328  g2:=below_display_short_skip_code;
26329  end;
26330if l and(e=0) then {it follows that |type(a)=hlist_node|}
26331  begin app_display(j,a,0);
26332  tail_append(new_penalty(inf_penalty));
26333  end
26334else tail_append(new_param_glue(g1))
26335
26336@ @<Append the display and perhaps also the equation number@>=
26337if e<>0 then
26338  begin r:=new_kern(z-w-e-d);
26339  if l then
26340    begin link(a):=r; link(r):=b; b:=a; d:=0;
26341    end
26342  else  begin link(b):=r; link(r):=a;
26343    end;
26344  b:=hpack(b,natural);
26345  end;
26346app_display(j,b,d)
26347
26348@ @<Append the glue or equation number following the display@>=
26349if (a<>null)and(e=0)and not l then
26350  begin tail_append(new_penalty(inf_penalty));
26351  app_display(j,a,z-width(a));
26352  g2:=0;
26353  end;
26354if t<>adjust_head then {migrating material comes after equation number}
26355  begin link(tail):=link(adjust_head); tail:=t;
26356  end;
26357if pre_t<>pre_adjust_head then
26358  begin link(tail):=link(pre_adjust_head); tail:=pre_t;
26359  end;
26360tail_append(new_penalty(post_display_penalty));
26361if g2>0 then tail_append(new_param_glue(g2))
26362
26363@ When \.{\\halign} appears in a display, the alignment routines operate
26364essentially as they do in vertical mode. Then the following program is
26365activated, with |p| and |q| pointing to the beginning and end of the
26366resulting list, and with |aux_save| holding the |prev_depth| value.
26367
26368@<Finish an alignment in a display@>=
26369begin do_assignments;
26370if cur_cmd<>math_shift then @<Pontificate about improper alignment in display@>
26371else @<Check that another \.\$ follows@>;
26372flush_node_list(LR_box);
26373pop_nest;
26374tail_append(new_penalty(pre_display_penalty));
26375tail_append(new_param_glue(above_display_skip_code));
26376link(tail):=p;
26377if p<>null then tail:=q;
26378tail_append(new_penalty(post_display_penalty));
26379tail_append(new_param_glue(below_display_skip_code));
26380prev_depth:=aux_save.sc; resume_after_display;
26381end
26382
26383@ @<Pontificate...@>=
26384begin print_err("Missing $$ inserted");
26385@.Missing {\$\$} inserted@>
26386help2("Displays can use special alignments (like \eqalignno)")@/
26387  ("only if nothing but the alignment itself is between $$'s.");
26388back_error;
26389end
26390
26391@* \[49] Mode-independent processing.
26392The long |main_control| procedure has now been fully specified, except for
26393certain activities that are independent of the current mode. These activities
26394do not change the current vlist or hlist or mlist; if they change anything,
26395it is the value of a parameter or the meaning of a control sequence.
26396
26397Assignments to values in |eqtb| can be global or local. Furthermore, a
26398control sequence can be defined to be `\.{\\long}', `\.{\\protected}',
26399or `\.{\\outer}', and it might or might not be expanded. The prefixes
26400`\.{\\global}', `\.{\\long}', `\.{\\protected}',
26401and `\.{\\outer}' can occur in any order. Therefore we assign binary numeric
26402codes, making it possible to accumulate the union of all specified prefixes
26403by adding the corresponding codes.  (\PASCAL's |set| operations could also
26404have been used.)
26405
26406@<Put each...@>=
26407primitive("long",prefix,1);
26408@!@:long_}{\.{\\long} primitive@>
26409primitive("outer",prefix,2);
26410@!@:outer_}{\.{\\outer} primitive@>
26411primitive("global",prefix,4);
26412@!@:global_}{\.{\\global} primitive@>
26413primitive("def",def,0);
26414@!@:def_}{\.{\\def} primitive@>
26415primitive("gdef",def,1);
26416@!@:gdef_}{\.{\\gdef} primitive@>
26417primitive("edef",def,2);
26418@!@:edef_}{\.{\\edef} primitive@>
26419primitive("xdef",def,3);
26420@!@:xdef_}{\.{\\xdef} primitive@>
26421
26422@ @<Cases of |print_cmd_chr|...@>=
26423prefix: if chr_code=1 then print_esc("long")
26424  else if chr_code=2 then print_esc("outer")
26425  @/@<Cases of |prefix| for |print_cmd_chr|@>@/
26426  else print_esc("global");
26427def: if chr_code=0 then print_esc("def")
26428  else if chr_code=1 then print_esc("gdef")
26429  else if chr_code=2 then print_esc("edef")
26430  else print_esc("xdef");
26431
26432@ Every prefix, and every command code that might or might not be prefixed,
26433calls the action procedure |prefixed_command|. This routine accumulates
26434a sequence of prefixes until coming to a non-prefix, then it carries out
26435the command.
26436
26437@<Cases of |main_control| that don't...@>=
26438any_mode(toks_register),
26439any_mode(assign_toks),
26440any_mode(assign_int),
26441any_mode(assign_dimen),
26442any_mode(assign_glue),
26443any_mode(assign_mu_glue),
26444any_mode(assign_font_dimen),
26445any_mode(assign_font_int),
26446any_mode(set_aux),
26447any_mode(set_prev_graf),
26448any_mode(set_page_dimen),
26449any_mode(set_page_int),
26450any_mode(set_box_dimen),
26451any_mode(set_shape),
26452any_mode(def_code),
26453any_mode(XeTeX_def_code),
26454any_mode(def_family),
26455any_mode(set_font),
26456any_mode(def_font),
26457any_mode(register),
26458any_mode(advance),
26459any_mode(multiply),
26460any_mode(divide),
26461any_mode(prefix),
26462any_mode(let),
26463any_mode(shorthand_def),
26464any_mode(read_to_cs),
26465any_mode(def),
26466any_mode(set_box),
26467any_mode(hyph_data),
26468any_mode(set_interaction):prefixed_command;
26469
26470@ If the user says, e.g., `\.{\\global\\global}', the redundancy is
26471silently accepted.
26472
26473@<Declare act...@>=
26474@t\4@>@<Declare subprocedures for |prefixed_command|@>@t@>@;@/
26475procedure prefixed_command;
26476label done,exit;
26477var a:small_number; {accumulated prefix codes so far}
26478@!f:internal_font_number; {identifies a font}
26479@!j:halfword; {index into a \.{\\parshape} specification}
26480@!k:font_index; {index into |font_info|}
26481@!p,@!q:pointer; {for temporary short-term use}
26482@!n:integer; {ditto}
26483@!e:boolean; {should a definition be expanded? or was \.{\\let} not done?}
26484begin a:=0;
26485while cur_cmd=prefix do
26486  begin if not odd(a div cur_chr) then a:=a+cur_chr;
26487  @<Get the next non-blank non-relax...@>;
26488  if cur_cmd<=max_non_prefixed_command then
26489    @<Discard erroneous prefixes and |return|@>;
26490  if tracing_commands>2 then if eTeX_ex then show_cur_cmd_chr;
26491  end;
26492@<Discard the prefixes \.{\\long} and \.{\\outer} if they are irrelevant@>;
26493@<Adjust \(f)for the setting of \.{\\globaldefs}@>;
26494case cur_cmd of
26495@t\4@>@<Assignments@>@;
26496othercases confusion("prefix")
26497@:this can't happen prefix}{\quad prefix@>
26498endcases;
26499done: @<Insert a token saved by \.{\\afterassignment}, if any@>;
26500exit:end;
26501
26502@ @<Discard erroneous...@>=
26503begin print_err("You can't use a prefix with `");
26504@.You can't use a prefix with x@>
26505print_cmd_chr(cur_cmd,cur_chr); print_char("'");
26506help1("I'll pretend you didn't say \long or \outer or \global.");
26507if eTeX_ex then help_line[0]:=@|
26508  "I'll pretend you didn't say \long or \outer or \global or \protected.";
26509back_error; return;
26510end
26511
26512@ @<Discard the prefixes...@>=
26513if a>=8 then
26514  begin j:=protected_token; a:=a-8;
26515  end
26516else j:=0;
26517if (cur_cmd<>def)and((a mod 4<>0)or(j<>0)) then
26518  begin print_err("You can't use `"); print_esc("long"); print("' or `");
26519  print_esc("outer");
26520  help1("I'll pretend you didn't say \long or \outer here.");
26521  if eTeX_ex then
26522    begin  help_line[0]:=@|
26523      "I'll pretend you didn't say \long or \outer or \protected here.";
26524    print("' or `"); print_esc("protected");
26525    end;
26526  print("' with `");
26527@.You can't use \\long...@>
26528  print_cmd_chr(cur_cmd,cur_chr); print_char("'");
26529  error;
26530  end
26531
26532@ The previous routine does not have to adjust |a| so that |a mod 4=0|,
26533since the following routines test for the \.{\\global} prefix as follows.
26534
26535@d global==(a>=4)
26536@d define(#)==if global then geq_define(#)@+else eq_define(#)
26537@d word_define(#)==if global then geq_word_define(#)@+else eq_word_define(#)
26538@d word_define1(#)==if global then geq_word_define1(#)@+else eq_word_define1(#)
26539
26540@<Adjust \(f)for the setting of \.{\\globaldefs}@>=
26541if global_defs<>0 then
26542  if global_defs<0 then
26543    begin if global then a:=a-4;
26544    end
26545  else  begin if not global then a:=a+4;
26546    end
26547
26548@ When a control sequence is to be defined, by \.{\\def} or \.{\\let} or
26549something similar, the |get_r_token| routine will substitute a special
26550control sequence for a token that is not redefinable.
26551
26552@<Declare subprocedures for |prefixed_command|@>=
26553procedure get_r_token;
26554label restart;
26555begin restart: repeat get_token;
26556until cur_tok<>space_token;
26557if (cur_cs=0)or(cur_cs>frozen_control_sequence) then
26558  begin print_err("Missing control sequence inserted");
26559@.Missing control...@>
26560  help5("Please don't say `\def cs{...}', say `\def\cs{...}'.")@/
26561  ("I've inserted an inaccessible control sequence so that your")@/
26562  ("definition will be completed without mixing me up too badly.")@/
26563  ("You can recover graciously from this error, if you're")@/
26564  ("careful; see exercise 27.2 in The TeXbook.");
26565@:TeXbook}{\sl The \TeX book@>
26566  if cur_cs=0 then back_input;
26567  cur_tok:=cs_token_flag+frozen_protection; ins_error; goto restart;
26568  end;
26569end;
26570
26571@ @<Initialize table entries...@>=
26572text(frozen_protection):="inaccessible";
26573
26574@ Here's an example of the way many of the following routines operate.
26575(Unfortunately, they aren't all as simple as this.)
26576
26577@<Assignments@>=
26578set_font: define(cur_font_loc,data,cur_chr);
26579
26580@ When a |def| command has been scanned,
26581|cur_chr| is odd if the definition is supposed to be global, and
26582|cur_chr>=2| if the definition is supposed to be expanded.
26583
26584@<Assignments@>=
26585def: begin if odd(cur_chr)and not global and(global_defs>=0) then a:=a+4;
26586  e:=(cur_chr>=2); get_r_token; p:=cur_cs;
26587  q:=scan_toks(true,e);
26588  if j<>0 then
26589    begin q:=get_avail; info(q):=j; link(q):=link(def_ref);
26590    link(def_ref):=q;
26591    end;
26592  define(p,call+(a mod 4),def_ref);
26593  end;
26594
26595@ Both \.{\\let} and \.{\\futurelet} share the command code |let|.
26596
26597@<Put each...@>=
26598primitive("let",let,normal);@/
26599@!@:let_}{\.{\\let} primitive@>
26600primitive("futurelet",let,normal+1);@/
26601@!@:future_let_}{\.{\\futurelet} primitive@>
26602
26603@ @<Cases of |print_cmd_chr|...@>=
26604let: if chr_code<>normal then print_esc("futurelet")@+else print_esc("let");
26605
26606@ @<Assignments@>=
26607let:  begin n:=cur_chr;
26608  get_r_token; p:=cur_cs;
26609  if n=normal then
26610    begin repeat get_token;
26611    until cur_cmd<>spacer;
26612    if cur_tok=other_token+"=" then
26613      begin get_token;
26614      if cur_cmd=spacer then get_token;
26615      end;
26616    end
26617  else  begin get_token; q:=cur_tok; get_token; back_input;
26618    cur_tok:=q; back_input; {look ahead, then back up}
26619    end; {note that |back_input| doesn't affect |cur_cmd|, |cur_chr|}
26620  if cur_cmd>=call then add_token_ref(cur_chr)
26621  else if (cur_cmd=register)or(cur_cmd=toks_register) then
26622    if (cur_chr<mem_bot)or(cur_chr>lo_mem_stat_max) then
26623      add_sa_ref(cur_chr);
26624  define(p,cur_cmd,cur_chr);
26625  end;
26626
26627@ A \.{\\chardef} creates a control sequence whose |cmd| is |char_given|;
26628a \.{\\mathchardef} creates a control sequence whose |cmd| is |math_given|;
26629and the corresponding |chr| is the character code or math code. A \.{\\countdef}
26630or \.{\\dimendef} or \.{\\skipdef} or \.{\\muskipdef} creates a control
26631sequence whose |cmd| is |assign_int| or \dots\ or |assign_mu_glue|, and the
26632corresponding |chr| is the |eqtb| location of the internal register in question.
26633
26634@d char_def_code=0 {|shorthand_def| for \.{\\chardef}}
26635@d math_char_def_code=1 {|shorthand_def| for \.{\\mathchardef}}
26636@d count_def_code=2 {|shorthand_def| for \.{\\countdef}}
26637@d dimen_def_code=3 {|shorthand_def| for \.{\\dimendef}}
26638@d skip_def_code=4 {|shorthand_def| for \.{\\skipdef}}
26639@d mu_skip_def_code=5 {|shorthand_def| for \.{\\muskipdef}}
26640@d toks_def_code=6 {|shorthand_def| for \.{\\toksdef}}
26641@d XeTeX_math_char_num_def_code=8
26642@d XeTeX_math_char_def_code=9
26643
26644@<Put each...@>=
26645primitive("chardef",shorthand_def,char_def_code);@/
26646@!@:char_def_}{\.{\\chardef} primitive@>
26647primitive("mathchardef",shorthand_def,math_char_def_code);@/
26648@!@:math_char_def_}{\.{\\mathchardef} primitive@>
26649primitive("XeTeXmathcharnumdef",shorthand_def,XeTeX_math_char_num_def_code);@/
26650primitive("Umathcharnumdef",shorthand_def,XeTeX_math_char_num_def_code);@/
26651@!@:U_math_char_num_def_}{\.{\\Umathcharnumdef} primitive@>
26652primitive("XeTeXmathchardef",shorthand_def,XeTeX_math_char_def_code);@/
26653primitive("Umathchardef",shorthand_def,XeTeX_math_char_def_code);@/
26654@!@:U_math_char_def_}{\.{\\Umathchardef} primitive@>
26655primitive("countdef",shorthand_def,count_def_code);@/
26656@!@:count_def_}{\.{\\countdef} primitive@>
26657primitive("dimendef",shorthand_def,dimen_def_code);@/
26658@!@:dimen_def_}{\.{\\dimendef} primitive@>
26659primitive("skipdef",shorthand_def,skip_def_code);@/
26660@!@:skip_def_}{\.{\\skipdef} primitive@>
26661primitive("muskipdef",shorthand_def,mu_skip_def_code);@/
26662@!@:mu_skip_def_}{\.{\\muskipdef} primitive@>
26663primitive("toksdef",shorthand_def,toks_def_code);@/
26664@!@:toks_def_}{\.{\\toksdef} primitive@>
26665
26666@ @<Cases of |print_cmd_chr|...@>=
26667shorthand_def: case chr_code of
26668  char_def_code: print_esc("chardef");
26669  math_char_def_code: print_esc("mathchardef");
26670  XeTeX_math_char_def_code: print_esc("Umathchardef");
26671  XeTeX_math_char_num_def_code: print_esc("Umathcharnumdef");
26672  count_def_code: print_esc("countdef");
26673  dimen_def_code: print_esc("dimendef");
26674  skip_def_code: print_esc("skipdef");
26675  mu_skip_def_code: print_esc("muskipdef");
26676  othercases print_esc("toksdef")
26677  endcases;
26678char_given: begin print_esc("char"); print_hex(chr_code);
26679  end;
26680math_given: begin print_esc("mathchar"); print_hex(chr_code);
26681  end;
26682XeTeX_math_given: begin print_esc("Umathchar");
26683  print_hex(math_class_field(chr_code));
26684  print_hex(math_fam_field(chr_code));
26685  print_hex(math_char_field(chr_code));
26686  end;
26687
26688@ We temporarily define |p| to be |relax|, so that an occurrence of |p|
26689while scanning the definition will simply stop the scanning instead of
26690producing an ``undefined control sequence'' error or expanding the
26691previous meaning.  This allows, for instance, `\.{\\chardef\\foo=123\\foo}'.
26692
26693@<Assignments@>=
26694shorthand_def: begin n:=cur_chr; get_r_token; p:=cur_cs; define(p,relax,256);
26695  scan_optional_equals;
26696  case n of
26697  char_def_code: begin scan_usv_num; define(p,char_given,cur_val);
26698    end;
26699  math_char_def_code: begin scan_fifteen_bit_int; define(p,math_given,cur_val);
26700    end;
26701  XeTeX_math_char_num_def_code: begin scan_xetex_math_char_int;
26702    define(p, XeTeX_math_given, cur_val);
26703    end;
26704  XeTeX_math_char_def_code: begin
26705      scan_math_class_int; n:=set_class_field(cur_val);
26706      scan_math_fam_int;   n:=n + set_family_field(cur_val);
26707      scan_usv_num;        n:=n + cur_val;
26708      define(p, XeTeX_math_given, n);
26709    end;
26710  othercases begin scan_register_num;
26711    if cur_val>255 then
26712      begin j:=n-count_def_code; {|int_val..box_val|}
26713      if j>mu_val then j:=tok_val; {|int_val..mu_val| or |tok_val|}
26714      find_sa_element(j,cur_val,true); add_sa_ref(cur_ptr);
26715      if j=tok_val then j:=toks_register@+else j:=register;
26716      define(p,j,cur_ptr);
26717      end
26718    else
26719    case n of
26720    count_def_code: define(p,assign_int,count_base+cur_val);
26721    dimen_def_code: define(p,assign_dimen,scaled_base+cur_val);
26722    skip_def_code: define(p,assign_glue,skip_base+cur_val);
26723    mu_skip_def_code: define(p,assign_mu_glue,mu_skip_base+cur_val);
26724    toks_def_code: define(p,assign_toks,toks_base+cur_val);
26725    end; {there are no other cases}
26726    end
26727  endcases;
26728  end;
26729
26730@ @<Assignments@>=
26731read_to_cs: begin j:=cur_chr; scan_int; n:=cur_val;
26732  if not scan_keyword("to") then
26733@.to@>
26734    begin print_err("Missing `to' inserted");
26735@.Missing `to'...@>
26736    help2("You should have said `\read<number> to \cs'.")@/
26737    ("I'm going to look for the \cs now."); error;
26738    end;
26739  get_r_token;
26740  p:=cur_cs; read_toks(n,p,j); define(p,call,cur_val);
26741  end;
26742
26743@ The token-list parameters, \.{\\output} and \.{\\everypar}, etc., receive
26744their values in the following way. (For safety's sake, we place an
26745enclosing pair of braces around an \.{\\output} list.)
26746
26747@<Assignments@>=
26748toks_register,assign_toks: begin q:=cur_cs;
26749  e:=false; {just in case, will be set |true| for sparse array elements}
26750  if cur_cmd=toks_register then
26751    if cur_chr=mem_bot then
26752      begin scan_register_num;
26753      if cur_val>255 then
26754        begin find_sa_element(tok_val,cur_val,true);
26755        cur_chr:=cur_ptr; e:=true;
26756        end
26757      else cur_chr:=toks_base+cur_val;
26758      end
26759    else e:=true
26760  else if cur_chr=XeTeX_inter_char_loc then begin
26761    scan_eight_bit_int; cur_ptr:=cur_val;
26762    scan_eight_bit_int;
26763    find_sa_element(inter_char_val, cur_ptr*@"100 + cur_val, true);
26764    cur_chr:=cur_ptr; e:=true;
26765  end;
26766  p:=cur_chr; {|p=every_par_loc| or |output_routine_loc| or \dots}
26767  scan_optional_equals;
26768  @<Get the next non-blank non-relax non-call token@>;
26769  if cur_cmd<>left_brace then @<If the right-hand side is a token parameter
26770      or token register, finish the assignment and |goto done|@>;
26771  back_input; cur_cs:=q; q:=scan_toks(false,false);
26772  if link(def_ref)=null then {empty list: revert to the default}
26773    begin sa_define(p,null)(p,undefined_cs,null); free_avail(def_ref);
26774    end
26775  else  begin if (p=output_routine_loc)and not e then {enclose in curlies}
26776      begin link(q):=get_avail; q:=link(q);
26777      info(q):=right_brace_token+"}";
26778      q:=get_avail; info(q):=left_brace_token+"{";
26779      link(q):=link(def_ref); link(def_ref):=q;
26780      end;
26781    sa_define(p,def_ref)(p,call,def_ref);
26782    end;
26783  end;
26784
26785@ @<If the right-hand side is a token parameter...@>=
26786if (cur_cmd=toks_register)or(cur_cmd=assign_toks) then
26787  begin if cur_cmd=toks_register then
26788    if cur_chr=mem_bot then
26789      begin scan_register_num;
26790      if cur_val<256 then q:=equiv(toks_base+cur_val)
26791      else  begin find_sa_element(tok_val,cur_val,false);
26792        if cur_ptr=null then q:=null
26793        else q:=sa_ptr(cur_ptr);
26794        end;
26795      end
26796    else q:=sa_ptr(cur_chr)
26797  else if cur_chr=XeTeX_inter_char_loc then begin
26798    scan_eight_bit_int; cur_ptr:=cur_val;
26799    scan_eight_bit_int;
26800    find_sa_element(inter_char_val, cur_ptr*@"100 + cur_val, false);
26801    if cur_ptr=null then q:=null
26802    else q:=sa_ptr(cur_ptr);
26803  end else q:=equiv(cur_chr);
26804  if q=null then sa_define(p,null)(p,undefined_cs,null)
26805  else  begin add_token_ref(q); sa_define(p,q)(p,call,q);
26806    end;
26807  goto done;
26808  end
26809
26810@ Similar routines are used to assign values to the numeric parameters.
26811
26812@<Assignments@>=
26813assign_int: begin p:=cur_chr; scan_optional_equals; scan_int;
26814  word_define(p,cur_val);
26815  end;
26816assign_dimen: begin p:=cur_chr; scan_optional_equals;
26817  scan_normal_dimen; word_define(p,cur_val);
26818  end;
26819assign_glue,assign_mu_glue: begin p:=cur_chr; n:=cur_cmd; scan_optional_equals;
26820  if n=assign_mu_glue then scan_glue(mu_val)@+else scan_glue(glue_val);
26821  trap_zero_glue;
26822  define(p,glue_ref,cur_val);
26823  end;
26824
26825@ When a glue register or parameter becomes zero, it will always point to
26826|zero_glue| because of the following procedure. (Exception: The tabskip
26827glue isn't trapped while preambles are being scanned.)
26828
26829@<Declare subprocedures for |prefixed_command|@>=
26830procedure trap_zero_glue;
26831begin if (width(cur_val)=0)and(stretch(cur_val)=0)and(shrink(cur_val)=0) then
26832  begin add_glue_ref(zero_glue);
26833  delete_glue_ref(cur_val); cur_val:=zero_glue;
26834  end;
26835end;
26836
26837@ The various character code tables are changed by the |def_code| commands,
26838and the font families are declared by |def_family|.
26839
26840@<Put each...@>=
26841primitive("catcode",def_code,cat_code_base);
26842@!@:cat_code_}{\.{\\catcode} primitive@>
26843primitive("mathcode",def_code,math_code_base);
26844@!@:math_code_}{\.{\\mathcode} primitive@>
26845primitive("XeTeXmathcodenum",XeTeX_def_code,math_code_base);
26846primitive("Umathcodenum",XeTeX_def_code,math_code_base);
26847@!@:U_math_code_num_}{\.{\\Umathcodenum} primitive@>
26848primitive("XeTeXmathcode",XeTeX_def_code,math_code_base+1);
26849primitive("Umathcode",XeTeX_def_code,math_code_base+1);
26850@!@:U_math_code_}{\.{\\Umathcode} primitive@>
26851primitive("lccode",def_code,lc_code_base);
26852@!@:lc_code_}{\.{\\lccode} primitive@>
26853primitive("uccode",def_code,uc_code_base);
26854@!@:uc_code_}{\.{\\uccode} primitive@>
26855primitive("sfcode",def_code,sf_code_base);
26856@!@:sf_code_}{\.{\\sfcode} primitive@>
26857primitive("XeTeXcharclass",XeTeX_def_code,sf_code_base);
26858@!@:XeTeX_char_class_}{\.{\\XeTeXcharclass} primitive@>
26859primitive("delcode",def_code,del_code_base);
26860@!@:del_code_}{\.{\\delcode} primitive@>
26861primitive("XeTeXdelcodenum",XeTeX_def_code,del_code_base);
26862primitive("Udelcodenum",XeTeX_def_code,del_code_base);
26863@!@:U_del_code_num_}{\.{\\Udelcodenum} primitive@>
26864primitive("XeTeXdelcode",XeTeX_def_code,del_code_base+1);
26865primitive("Udelcode",XeTeX_def_code,del_code_base+1);
26866@!@:U_del_code_}{\.{\\Udelcode} primitive@>
26867primitive("textfont",def_family,math_font_base);
26868@!@:text_font_}{\.{\\textfont} primitive@>
26869primitive("scriptfont",def_family,math_font_base+script_size);
26870@!@:script_font_}{\.{\\scriptfont} primitive@>
26871primitive("scriptscriptfont",def_family,math_font_base+script_script_size);
26872@!@:script_script_font_}{\.{\\scriptscriptfont} primitive@>
26873
26874@ @<Cases of |print_cmd_chr|...@>=
26875def_code: if chr_code=cat_code_base then print_esc("catcode")
26876  else if chr_code=math_code_base then print_esc("mathcode")
26877  else if chr_code=lc_code_base then print_esc("lccode")
26878  else if chr_code=uc_code_base then print_esc("uccode")
26879  else if chr_code=sf_code_base then print_esc("sfcode")
26880  else print_esc("delcode");
26881XeTeX_def_code:
26882  if chr_code=sf_code_base then print_esc("XeTeXcharclass")
26883  else if chr_code=math_code_base then print_esc("Umathcodenum")
26884  else if chr_code=math_code_base+1 then print_esc("Umathcode")
26885  else if chr_code=del_code_base then print_esc("Udelcodenum")
26886  else print_esc("Udelcode");
26887def_family: print_size(chr_code-math_font_base);
26888
26889@ The different types of code values have different legal ranges; the
26890following program is careful to check each case properly.
26891
26892@<Assignments@>=
26893XeTeX_def_code: begin
26894    if cur_chr = sf_code_base then begin
26895      p:=cur_chr; scan_usv_num;
26896      p:=p+cur_val;
26897      n:=sf_code(cur_val) mod @"10000;
26898      scan_optional_equals;
26899      scan_char_class;
26900      define(p,data,cur_val*@"10000 + n);
26901    end
26902    else if cur_chr = math_code_base then begin
26903      p:=cur_chr; scan_usv_num;
26904      p:=p+cur_val;
26905      scan_optional_equals;
26906      scan_xetex_math_char_int;
26907      define(p,data,hi(cur_val));
26908    end
26909    else if cur_chr = math_code_base+1 then begin
26910      p:=cur_chr-1; scan_usv_num;
26911      p:=p+cur_val;
26912      scan_optional_equals;
26913      scan_math_class_int; n:=set_class_field(cur_val);
26914      scan_math_fam_int;   n:=n + set_family_field(cur_val);
26915      scan_usv_num;        n:=n + cur_val;
26916      define(p,data,hi(n));
26917    end
26918    else if cur_chr = del_code_base then begin
26919      p:=cur_chr; scan_usv_num;
26920      p:=p+cur_val;
26921      scan_optional_equals;
26922      scan_int; {|scan_xetex_del_code_int|; !!FIXME!!}
26923      word_define(p,hi(cur_val));
26924    end else begin
26925      p:=cur_chr-1; scan_usv_num;
26926      p:=p+cur_val;
26927      scan_optional_equals;
26928      n:=@"40000000; {extended delimiter code flag}
26929      scan_math_fam_int;   n:=n + cur_val * @"200000; {extended delimiter code family}
26930      scan_usv_num;        n:=n + cur_val; {extended delimiter code USV}
26931      word_define(p,hi(n));
26932    end;
26933  end;
26934def_code: begin @<Let |n| be the largest legal code value, based on |cur_chr|@>;
26935  p:=cur_chr; scan_usv_num; p:=p+cur_val; scan_optional_equals;
26936  scan_int;
26937  if ((cur_val<0)and(p<del_code_base))or(cur_val>n) then
26938    begin print_err("Invalid code ("); print_int(cur_val);
26939@.Invalid code@>
26940    if p<del_code_base then print("), should be in the range 0..")
26941    else print("), should be at most ");
26942    print_int(n);
26943    help1("I'm going to use 0 instead of that illegal code value.");@/
26944    error; cur_val:=0;
26945    end;
26946  if p<math_code_base then begin
26947    if p>=sf_code_base then begin
26948      n:=equiv(p) div @"10000;
26949      define(p,data,n*@"10000 + cur_val);
26950    end else
26951      define(p,data,cur_val)
26952  end else if p<del_code_base then begin
26953    if cur_val=@"8000 then cur_val:=active_math_char
26954    else cur_val:=set_class_field(cur_val div @"1000) +
26955                  set_family_field((cur_val mod @"1000) div @"100) +
26956                  (cur_val mod @"100); {!!FIXME!! check how this is used}
26957    define(p,data,hi(cur_val));
26958    end
26959  else word_define(p,cur_val);
26960  end;
26961
26962@ @<Let |n| be the largest...@>=
26963if cur_chr=cat_code_base then n:=max_char_code
26964else if cur_chr=math_code_base then n:=@'100000
26965else if cur_chr=sf_code_base then n:=@'77777
26966else if cur_chr=del_code_base then n:=@'77777777
26967else n:=biggest_usv
26968
26969@ @<Assignments@>=
26970def_family: begin p:=cur_chr; scan_math_fam_int; p:=p+cur_val;
26971  scan_optional_equals; scan_font_ident; define(p,data,cur_val);
26972  end;
26973
26974@ Next we consider changes to \TeX's numeric registers.
26975
26976@<Assignments@>=
26977register,advance,multiply,divide: do_register_command(a);
26978
26979@ We use the fact that |register<advance<multiply<divide|.
26980
26981@<Declare subprocedures for |prefixed_command|@>=
26982procedure do_register_command(@!a:small_number);
26983label found,exit;
26984var l,@!q,@!r,@!s:pointer; {for list manipulation}
26985@!p:int_val..mu_val; {type of register involved}
26986@!e:boolean; {does |l| refer to a sparse array element?}
26987@!w:integer; {integer or dimen value of |l|}
26988begin q:=cur_cmd;
26989e:=false; {just in case, will be set |true| for sparse array elements}
26990@<Compute the register location |l| and its type |p|; but |return| if invalid@>;
26991if q=register then scan_optional_equals
26992else if scan_keyword("by") then do_nothing; {optional `\.{by}'}
26993@.by@>
26994arith_error:=false;
26995if q<multiply then @<Compute result of |register| or
26996    |advance|, put it in |cur_val|@>
26997else @<Compute result of |multiply| or |divide|, put it in |cur_val|@>;
26998if arith_error then
26999  begin print_err("Arithmetic overflow");
27000@.Arithmetic overflow@>
27001  help2("I can't carry out that multiplication or division,")@/
27002    ("since the result is out of range.");
27003  if p>=glue_val then delete_glue_ref(cur_val);
27004  error; return;
27005  end;
27006if p<glue_val then sa_word_define(l,cur_val)
27007else  begin trap_zero_glue; sa_define(l,cur_val)(l,glue_ref,cur_val);
27008  end;
27009exit: end;
27010
27011@ Here we use the fact that the consecutive codes |int_val..mu_val| and
27012|assign_int..assign_mu_glue| correspond to each other nicely.
27013
27014@<Compute the register location |l| and its type |p|...@>=
27015begin if q<>register then
27016  begin get_x_token;
27017  if (cur_cmd>=assign_int)and(cur_cmd<=assign_mu_glue) then
27018    begin l:=cur_chr; p:=cur_cmd-assign_int; goto found;
27019    end;
27020  if cur_cmd<>register then
27021    begin print_err("You can't use `"); print_cmd_chr(cur_cmd,cur_chr);
27022@.You can't use x after ...@>
27023    print("' after "); print_cmd_chr(q,0);
27024    help1("I'm forgetting what you said and not changing anything.");
27025    error; return;
27026    end;
27027  end;
27028if (cur_chr<mem_bot)or(cur_chr>lo_mem_stat_max) then
27029  begin l:=cur_chr; p:=sa_type(l); e:=true;
27030  end
27031else  begin p:=cur_chr-mem_bot; scan_register_num;
27032  if cur_val>255 then
27033    begin find_sa_element(p,cur_val,true); l:=cur_ptr; e:=true;
27034    end
27035  else
27036case p of
27037int_val: l:=cur_val+count_base;
27038dimen_val: l:=cur_val+scaled_base;
27039glue_val: l:=cur_val+skip_base;
27040mu_val: l:=cur_val+mu_skip_base;
27041end; {there are no other cases}
27042  end;
27043end;
27044found: if p<glue_val then@+if e then w:=sa_int(l)@+else w:=eqtb[l].int
27045else if e then s:=sa_ptr(l)@+else s:=equiv(l)
27046
27047@ @<Compute result of |register| or |advance|...@>=
27048if p<glue_val then
27049  begin if p=int_val then scan_int@+else scan_normal_dimen;
27050  if q=advance then cur_val:=cur_val+w;
27051  end
27052else  begin scan_glue(p);
27053  if q=advance then @<Compute the sum of two glue specs@>;
27054  end
27055
27056@ @<Compute the sum of two glue specs@>=
27057begin q:=new_spec(cur_val); r:=s;
27058delete_glue_ref(cur_val);
27059width(q):=width(q)+width(r);
27060if stretch(q)=0 then stretch_order(q):=normal;
27061if stretch_order(q)=stretch_order(r) then stretch(q):=stretch(q)+stretch(r)
27062else if (stretch_order(q)<stretch_order(r))and(stretch(r)<>0) then
27063  begin stretch(q):=stretch(r); stretch_order(q):=stretch_order(r);
27064  end;
27065if shrink(q)=0 then shrink_order(q):=normal;
27066if shrink_order(q)=shrink_order(r) then shrink(q):=shrink(q)+shrink(r)
27067else if (shrink_order(q)<shrink_order(r))and(shrink(r)<>0) then
27068  begin shrink(q):=shrink(r); shrink_order(q):=shrink_order(r);
27069  end;
27070cur_val:=q;
27071end
27072
27073@ @<Compute result of |multiply| or |divide|...@>=
27074begin scan_int;
27075if p<glue_val then
27076  if q=multiply then
27077    if p=int_val then cur_val:=mult_integers(w,cur_val)
27078    else cur_val:=nx_plus_y(w,cur_val,0)
27079  else cur_val:=x_over_n(w,cur_val)
27080else  begin r:=new_spec(s);
27081  if q=multiply then
27082    begin width(r):=nx_plus_y(width(s),cur_val,0);
27083    stretch(r):=nx_plus_y(stretch(s),cur_val,0);
27084    shrink(r):=nx_plus_y(shrink(s),cur_val,0);
27085    end
27086  else  begin width(r):=x_over_n(width(s),cur_val);
27087    stretch(r):=x_over_n(stretch(s),cur_val);
27088    shrink(r):=x_over_n(shrink(s),cur_val);
27089    end;
27090  cur_val:=r;
27091  end;
27092end
27093
27094@ The processing of boxes is somewhat different, because we may need
27095to scan and create an entire box before we actually change the value of the old
27096one.
27097
27098@<Assignments@>=
27099set_box: begin scan_register_num;
27100  if global then n:=global_box_flag+cur_val@+else n:=box_flag+cur_val;
27101  scan_optional_equals;
27102  if set_box_allowed then scan_box(n)
27103  else begin print_err("Improper "); print_esc("setbox");
27104@.Improper \\setbox@>
27105    help2("Sorry, \setbox is not allowed after \halign in a display,")@/
27106    ("or between \accent and an accented character."); error;
27107    end;
27108  end;
27109
27110@ The |space_factor| or |prev_depth| settings are changed when a |set_aux|
27111command is sensed. Similarly, |prev_graf| is changed in the presence of
27112|set_prev_graf|, and |dead_cycles| or |insert_penalties| in the presence of
27113|set_page_int|. These definitions are always global.
27114
27115When some dimension of a box register is changed, the change isn't exactly
27116global; but \TeX\ does not look at the \.{\\global} switch.
27117
27118@<Assignments@>=
27119set_aux:alter_aux;
27120set_prev_graf:alter_prev_graf;
27121set_page_dimen:alter_page_so_far;
27122set_page_int:alter_integer;
27123set_box_dimen:alter_box_dimen;
27124
27125@ @<Declare subprocedures for |prefixed_command|@>=
27126procedure alter_aux;
27127var c:halfword; {|hmode| or |vmode|}
27128begin if cur_chr<>abs(mode) then report_illegal_case
27129else  begin c:=cur_chr; scan_optional_equals;
27130  if c=vmode then
27131    begin scan_normal_dimen; prev_depth:=cur_val;
27132    end
27133  else  begin scan_int;
27134    if (cur_val<=0)or(cur_val>32767) then
27135      begin print_err("Bad space factor");
27136@.Bad space factor@>
27137      help1("I allow only values in the range 1..32767 here.");
27138      int_error(cur_val);
27139      end
27140    else space_factor:=cur_val;
27141    end;
27142  end;
27143end;
27144
27145@ @<Declare subprocedures for |prefixed_command|@>=
27146procedure alter_prev_graf;
27147var p:0..nest_size; {index into |nest|}
27148begin nest[nest_ptr]:=cur_list; p:=nest_ptr;
27149while abs(nest[p].mode_field)<>vmode do decr(p);
27150scan_optional_equals; scan_int;
27151if cur_val<0 then
27152  begin print_err("Bad "); print_esc("prevgraf");
27153@.Bad \\prevgraf@>
27154  help1("I allow only nonnegative values here.");
27155  int_error(cur_val);
27156  end
27157else  begin nest[p].pg_field:=cur_val; cur_list:=nest[nest_ptr];
27158  end;
27159end;
27160
27161@ @<Declare subprocedures for |prefixed_command|@>=
27162procedure alter_page_so_far;
27163var c:0..7; {index into |page_so_far|}
27164begin c:=cur_chr; scan_optional_equals; scan_normal_dimen;
27165page_so_far[c]:=cur_val;
27166end;
27167
27168@ @<Declare subprocedures for |prefixed_command|@>=
27169procedure alter_integer;
27170var c:small_number;
27171  {0 for \.{\\deadcycles}, 1 for \.{\\insertpenalties}, etc.}
27172begin c:=cur_chr; scan_optional_equals; scan_int;
27173if c=0 then dead_cycles:=cur_val
27174@/@<Cases for |alter_integer|@>@/
27175else insert_penalties:=cur_val;
27176end;
27177
27178@ @<Declare subprocedures for |prefixed_command|@>=
27179procedure alter_box_dimen;
27180var c:small_number; {|width_offset| or |height_offset| or |depth_offset|}
27181@!b:pointer; {box register}
27182begin c:=cur_chr; scan_register_num; fetch_box(b); scan_optional_equals;
27183scan_normal_dimen;
27184if b<>null then mem[b+c].sc:=cur_val;
27185end;
27186
27187@ Paragraph shapes are set up in the obvious way.
27188
27189@<Assignments@>=
27190set_shape: begin q:=cur_chr; scan_optional_equals; scan_int; n:=cur_val;
27191  if n<=0 then p:=null
27192  else if q>par_shape_loc then
27193    begin n:=(cur_val div 2)+1; p:=get_node(2*n+1); info(p):=n;
27194    n:=cur_val; mem[p+1].int:=n; {number of penalties}
27195    for j:=p+2 to p+n+1 do
27196      begin scan_int; mem[j].int:=cur_val; {penalty values}
27197      end;
27198    if not odd(n) then mem[p+n+2].int:=0; {unused}
27199    end
27200  else  begin p:=get_node(2*n+1); info(p):=n;
27201    for j:=1 to n do
27202      begin scan_normal_dimen;
27203      mem[p+2*j-1].sc:=cur_val; {indentation}
27204      scan_normal_dimen;
27205      mem[p+2*j].sc:=cur_val; {width}
27206      end;
27207    end;
27208  define(q,shape_ref,p);
27209  end;
27210
27211@ Here's something that isn't quite so obvious. It guarantees that
27212|info(par_shape_ptr)| can hold any positive~|n| for which |get_node(2*n+1)|
27213doesn't overflow the memory capacity.
27214
27215@<Check the ``constant''...@>=
27216if 2*max_halfword<mem_top-mem_min then bad:=41;
27217
27218@ New hyphenation data is loaded by the |hyph_data| command.
27219
27220@<Put each...@>=
27221primitive("hyphenation",hyph_data,0);
27222@!@:hyphenation_}{\.{\\hyphenation} primitive@>
27223primitive("patterns",hyph_data,1);
27224@!@:patterns_}{\.{\\patterns} primitive@>
27225
27226@ @<Cases of |print_cmd_chr|...@>=
27227hyph_data: if chr_code=1 then print_esc("patterns")
27228  else print_esc("hyphenation");
27229
27230@ @<Assignments@>=
27231hyph_data: if cur_chr=1 then
27232    begin @!init new_patterns; goto done;@;@+tini@/
27233    print_err("Patterns can be loaded only by INITEX");
27234@.Patterns can be...@>
27235    help0; error;
27236    repeat get_token; until cur_cmd=right_brace; {flush the patterns}
27237    return;
27238    end
27239  else  begin new_hyph_exceptions; goto done;
27240    end;
27241
27242@ All of \TeX's parameters are kept in |eqtb| except the font information,
27243the interaction mode, and the hyphenation tables; these are strictly global.
27244
27245@<Assignments@>=
27246assign_font_dimen: begin find_font_dimen(true); k:=cur_val;
27247  scan_optional_equals; scan_normal_dimen; font_info[k].sc:=cur_val;
27248  end;
27249assign_font_int: begin n:=cur_chr; scan_font_ident; f:=cur_val;
27250  if n < lp_code_base then begin
27251    scan_optional_equals; scan_int;
27252    if n=0 then hyphen_char[f]:=cur_val@+else skew_char[f]:=cur_val;
27253  end else begin
27254    if is_native_font(f) then
27255      scan_glyph_number(f) {for native fonts, the value is a glyph id}
27256    else scan_char_num; {for |tfm| fonts it's the same like pdftex}
27257    p:=cur_val;
27258    scan_optional_equals; scan_int;
27259    case n of
27260      lp_code_base: set_cp_code(f, p, left_side, cur_val);
27261      rp_code_base: set_cp_code(f, p, right_side, cur_val);
27262    endcases;
27263  end;
27264end;
27265
27266@ @<Put each...@>=
27267primitive("hyphenchar",assign_font_int,0);
27268@!@:hyphen_char_}{\.{\\hyphenchar} primitive@>
27269primitive("skewchar",assign_font_int,1);
27270@!@:skew_char_}{\.{\\skewchar} primitive@>
27271primitive("lpcode",assign_font_int,lp_code_base);
27272@!@:lp_code_}{\.{\\lpcode} primitive@>
27273primitive("rpcode",assign_font_int,rp_code_base);
27274@!@:rp_code_}{\.{\\rpcode} primitive@>
27275
27276@ @<Cases of |print_cmd_chr|...@>=
27277assign_font_int: case chr_code of
272780: print_esc("hyphenchar");
272791: print_esc("skewchar");
27280lp_code_base: print_esc("lpcode");
27281rp_code_base: print_esc("rpcode");
27282endcases;
27283
27284@ Here is where the information for a new font gets loaded.
27285
27286@<Assignments@>=
27287def_font: new_font(a);
27288
27289@ @<Declare subprocedures for |prefixed_command|@>=
27290procedure new_font(@!a:small_number);
27291label common_ending;
27292var u:pointer; {user's font identifier}
27293@!s:scaled; {stated ``at'' size, or negative of scaled magnification}
27294@!f:internal_font_number; {runs through existing fonts}
27295@!t:str_number; {name for the frozen font identifier}
27296@!old_setting:0..max_selector; {holds |selector| setting}
27297@!flushable_string:str_number; {string not yet referenced}
27298begin if job_name=0 then open_log_file;
27299  {avoid confusing \.{texput} with the font name}
27300@.texput@>
27301get_r_token; u:=cur_cs;
27302if u>=hash_base then t:=text(u)
27303else if u>=single_base then
27304  if u=null_cs then t:="FONT"@+else t:=u-single_base
27305else  begin old_setting:=selector; selector:=new_string;
27306  print("FONT"); print(u-active_base); selector:=old_setting;
27307@.FONTx@>
27308  str_room(1); t:=make_string;
27309  end;
27310define(u,set_font,null_font); scan_optional_equals; scan_file_name;
27311@<Scan the font size specification@>;
27312@<If this font has already been loaded, set |f| to the internal
27313  font number and |goto common_ending|@>;
27314f:=read_font_info(u,cur_name,cur_area,s);
27315common_ending: define(u,set_font,f); eqtb[font_id_base+f]:=eqtb[u]; font_id_text(f):=t;
27316end;
27317
27318@ @<Scan the font size specification@>=
27319name_in_progress:=true; {this keeps |cur_name| from being changed}
27320if scan_keyword("at") then @<Put the \(p)(positive) `at' size into |s|@>
27321@.at@>
27322else if scan_keyword("scaled") then
27323@.scaled@>
27324  begin scan_int; s:=-cur_val;
27325  if (cur_val<=0)or(cur_val>32768) then
27326    begin print_err("Illegal magnification has been changed to 1000");@/
27327@.Illegal magnification...@>
27328    help1("The magnification ratio must be between 1 and 32768.");
27329    int_error(cur_val); s:=-1000;
27330    end;
27331  end
27332else s:=-1000;
27333name_in_progress:=false
27334
27335@ @<Put the \(p)(positive) `at' size into |s|@>=
27336begin scan_normal_dimen; s:=cur_val;
27337if (s<=0)or(s>=@'1000000000) then
27338  begin print_err("Improper `at' size (");
27339  print_scaled(s); print("pt), replaced by 10pt");
27340@.Improper `at' size...@>
27341  help2("I can only handle fonts at positive sizes that are")@/
27342  ("less than 2048pt, so I've changed what you said to 10pt.");
27343  error; s:=10*unity;
27344  end;
27345end
27346
27347@ When the user gives a new identifier to a font that was previously loaded,
27348the new name becomes the font identifier of record. Font names `\.{xyz}' and
27349`\.{XYZ}' are considered to be different.
27350
27351@<If this font has already been loaded...@>=
27352flushable_string:=str_ptr-1;
27353for f:=font_base+1 to font_ptr do begin
27354  if str_eq_str(font_name[f],cur_name) and
27355    (((cur_area = "") and is_native_font(f)) or str_eq_str(font_area[f],cur_area)) then
27356    begin if cur_name=flushable_string then
27357      begin flush_string; cur_name:=font_name[f];
27358      end;
27359    if s>0 then
27360      begin if s=font_size[f] then goto common_ending;
27361      end
27362    else if font_size[f]=xn_over_d(font_dsize[f],-s,1000) then
27363      goto common_ending;
27364    end;
27365  { could be a native font whose "name" ended up partly in area or extension }
27366  append_str(cur_area); append_str(cur_name); append_str(cur_ext);
27367  if str_eq_str(font_name[f], make_string) then begin
27368    flush_string;
27369    if is_native_font(f) then
27370      begin if s>0 then
27371        begin if s=font_size[f] then goto common_ending;
27372        end
27373      else if font_size[f]=xn_over_d(font_dsize[f],-s,1000) then
27374        goto common_ending;
27375      end
27376    end
27377  else flush_string;
27378  end
27379
27380@ @<Cases of |print_cmd_chr|...@>=
27381set_font:begin print("select font ");
27382  font_name_str:=font_name[chr_code];
27383  if is_native_font(chr_code) then begin
27384    quote_char:="""";
27385    for n:=0 to length(font_name_str) - 1 do
27386     if str_pool[str_start_macro(font_name_str) + n] = """" then quote_char:="'";
27387    print_char(quote_char);
27388    slow_print(font_name_str);
27389    print_char(quote_char);
27390  end else
27391    slow_print(font_name_str);
27392  if font_size[chr_code]<>font_dsize[chr_code] then
27393    begin print(" at "); print_scaled(font_size[chr_code]);
27394    print("pt");
27395    end;
27396  end;
27397
27398@ @<Put each...@>=
27399primitive("batchmode",set_interaction,batch_mode);
27400@!@:batch_mode_}{\.{\\batchmode} primitive@>
27401primitive("nonstopmode",set_interaction,nonstop_mode);
27402@!@:nonstop_mode_}{\.{\\nonstopmode} primitive@>
27403primitive("scrollmode",set_interaction,scroll_mode);
27404@!@:scroll_mode_}{\.{\\scrollmode} primitive@>
27405primitive("errorstopmode",set_interaction,error_stop_mode);
27406@!@:error_stop_mode_}{\.{\\errorstopmode} primitive@>
27407
27408@ @<Cases of |print_cmd_chr|...@>=
27409set_interaction: case chr_code of
27410  batch_mode: print_esc("batchmode");
27411  nonstop_mode: print_esc("nonstopmode");
27412  scroll_mode: print_esc("scrollmode");
27413  othercases print_esc("errorstopmode")
27414  endcases;
27415
27416@ @<Assignments@>=
27417set_interaction: new_interaction;
27418
27419@ @<Declare subprocedures for |prefixed_command|@>=
27420procedure new_interaction;
27421begin print_ln;
27422interaction:=cur_chr;
27423@<Initialize the print |selector| based on |interaction|@>;
27424if log_opened then selector:=selector+2;
27425end;
27426
27427@ The \.{\\afterassignment} command puts a token into the global
27428variable |after_token|. This global variable is examined just after
27429every assignment has been performed.
27430
27431@<Glob...@>=
27432@!after_token:halfword; {zero, or a saved token}
27433
27434@ @<Set init...@>=
27435after_token:=0;
27436
27437@ @<Cases of |main_control| that don't...@>=
27438any_mode(after_assignment):begin get_token; after_token:=cur_tok;
27439  end;
27440
27441@ @<Insert a token saved by \.{\\afterassignment}, if any@>=
27442if after_token<>0 then
27443  begin cur_tok:=after_token; back_input; after_token:=0;
27444  end
27445
27446@ Here is a procedure that might be called `Get the next non-blank non-relax
27447non-call non-assignment token'.
27448
27449@<Declare act...@>=
27450procedure do_assignments;
27451label exit;
27452begin loop begin @<Get the next non-blank non-relax...@>;
27453  if cur_cmd<=max_non_prefixed_command then return;
27454  set_box_allowed:=false; prefixed_command; set_box_allowed:=true;
27455  end;
27456exit:end;
27457
27458@ @<Cases of |main_control| that don't...@>=
27459any_mode(after_group):begin get_token; save_for_after(cur_tok);
27460  end;
27461
27462@ Files for \.{\\read} are opened and closed by the |in_stream| command.
27463
27464@<Put each...@>=
27465primitive("openin",in_stream,1);
27466@!@:open_in_}{\.{\\openin} primitive@>
27467primitive("closein",in_stream,0);
27468@!@:close_in_}{\.{\\closein} primitive@>
27469
27470@ @<Cases of |print_cmd_chr|...@>=
27471in_stream: if chr_code=0 then print_esc("closein")
27472  else print_esc("openin");
27473
27474@ @<Cases of |main_control| that don't...@>=
27475any_mode(in_stream): open_or_close_in;
27476
27477@ @<Declare act...@>=
27478procedure open_or_close_in;
27479var c:0..1; {1 for \.{\\openin}, 0 for \.{\\closein}}
27480@!n:0..15; {stream number}
27481begin c:=cur_chr; scan_four_bit_int; n:=cur_val;
27482if read_open[n]<>closed then
27483  begin u_close(read_file[n]); read_open[n]:=closed;
27484  end;
27485if c<>0 then
27486  begin scan_optional_equals; scan_file_name;
27487  if cur_ext="" then cur_ext:=".tex";
27488  pack_cur_name;
27489  if a_open_in(read_file[n]) then read_open[n]:=just_open;
27490  end;
27491end;
27492
27493@ The user can issue messages to the terminal, regardless of the
27494current mode.
27495
27496@<Cases of |main_control| that don't...@>=
27497any_mode(message):issue_message;
27498
27499@ @<Put each...@>=
27500primitive("message",message,0);
27501@!@:message_}{\.{\\message} primitive@>
27502primitive("errmessage",message,1);
27503@!@:err_message_}{\.{\\errmessage} primitive@>
27504
27505@ @<Cases of |print_cmd_chr|...@>=
27506message: if chr_code=0 then print_esc("message")
27507  else print_esc("errmessage");
27508
27509@ @<Declare act...@>=
27510procedure issue_message;
27511var old_setting:0..max_selector; {holds |selector| setting}
27512@!c:0..1; {identifies \.{\\message} and \.{\\errmessage}}
27513@!s:str_number; {the message}
27514begin c:=cur_chr; link(garbage):=scan_toks(false,true);
27515old_setting:=selector; selector:=new_string;
27516token_show(def_ref); selector:=old_setting;
27517flush_list(def_ref);
27518str_room(1); s:=make_string;
27519if c=0 then @<Print string |s| on the terminal@>
27520else @<Print string |s| as an error message@>;
27521flush_string;
27522end;
27523
27524@ @<Print string |s| on the terminal@>=
27525begin if term_offset+length(s)>max_print_line-2 then print_ln
27526else if (term_offset>0)or(file_offset>0) then print_char(" ");
27527slow_print(s); update_terminal;
27528end
27529
27530@ If \.{\\errmessage} occurs often in |scroll_mode|, without user-defined
27531\.{\\errhelp}, we don't want to give a long help message each time. So we
27532give a verbose explanation only once.
27533
27534@<Glob...@>=
27535@!long_help_seen:boolean; {has the long \.{\\errmessage} help been used?}
27536
27537@ @<Set init...@>=long_help_seen:=false;
27538
27539@ @<Print string |s| as an error message@>=
27540begin print_err(""); slow_print(s);
27541if err_help<>null then use_err_help:=true
27542else if long_help_seen then help1("(That was another \errmessage.)")
27543else  begin if interaction<error_stop_mode then long_help_seen:=true;
27544  help4("This error message was generated by an \errmessage")@/
27545  ("command, so I can't give any explicit help.")@/
27546  ("Pretend that you're Hercule Poirot: Examine all clues,")@/
27547@^Poirot, Hercule@>
27548  ("and deduce the truth by order and method.");
27549  end;
27550error; use_err_help:=false;
27551end
27552
27553@ The |error| routine calls on |give_err_help| if help is requested from
27554the |err_help| parameter.
27555
27556@p procedure give_err_help;
27557begin token_show(err_help);
27558end;
27559
27560@ The \.{\\uppercase} and \.{\\lowercase} commands are implemented by
27561building a token list and then changing the cases of the letters in it.
27562
27563@<Cases of |main_control| that don't...@>=
27564any_mode(case_shift):shift_case;
27565
27566@ @<Put each...@>=
27567primitive("lowercase",case_shift,lc_code_base);
27568@!@:lowercase_}{\.{\\lowercase} primitive@>
27569primitive("uppercase",case_shift,uc_code_base);
27570@!@:uppercase_}{\.{\\uppercase} primitive@>
27571
27572@ @<Cases of |print_cmd_chr|...@>=
27573case_shift:if chr_code=lc_code_base then print_esc("lowercase")
27574  else print_esc("uppercase");
27575
27576@ @<Declare act...@>=
27577procedure shift_case;
27578var b:pointer; {|lc_code_base| or |uc_code_base|}
27579@!p:pointer; {runs through the token list}
27580@!t:halfword; {token}
27581@!c:integer; {character code}
27582begin b:=cur_chr; p:=scan_toks(false,false); p:=link(def_ref);
27583while p<>null do
27584  begin @<Change the case of the token in |p|, if a change is appropriate@>;
27585  p:=link(p);
27586  end;
27587back_list(link(def_ref)); free_avail(def_ref); {omit reference count}
27588end;
27589
27590@ When the case of a |chr_code| changes, we don't change the |cmd|.
27591We also change active characters, using the fact that
27592|cs_token_flag+active_base| is a multiple of~256.
27593@^data structure assumptions@>
27594
27595@<Change the case of the token in |p|, if a change is appropriate@>=
27596t:=info(p);
27597if t<cs_token_flag+single_base then
27598  begin c:=t mod max_char_val;
27599  if equiv(b+c)<>0 then info(p):=t-c+equiv(b+c);
27600  end
27601
27602@ We come finally to the last pieces missing from |main_control|, namely the
27603`\.{\\show}' commands that are useful when debugging.
27604
27605@<Cases of |main_control| that don't...@>=
27606any_mode(xray): show_whatever;
27607
27608@ @d show_code=0 { \.{\\show} }
27609@d show_box_code=1 { \.{\\showbox} }
27610@d show_the_code=2 { \.{\\showthe} }
27611@d show_lists=3 { \.{\\showlists} }
27612
27613@<Put each...@>=
27614primitive("show",xray,show_code);
27615@!@:show_}{\.{\\show} primitive@>
27616primitive("showbox",xray,show_box_code);
27617@!@:show_box_}{\.{\\showbox} primitive@>
27618primitive("showthe",xray,show_the_code);
27619@!@:show_the_}{\.{\\showthe} primitive@>
27620primitive("showlists",xray,show_lists);
27621@!@:show_lists_}{\.{\\showlists} primitive@>
27622
27623@ @<Cases of |print_cmd_chr|...@>=
27624xray: case chr_code of
27625  show_box_code:print_esc("showbox");
27626  show_the_code:print_esc("showthe");
27627  show_lists:print_esc("showlists");
27628  @<Cases of |xray| for |print_cmd_chr|@>@;@/
27629  othercases print_esc("show")
27630  endcases;
27631
27632@ @<Declare act...@>=
27633procedure show_whatever;
27634label common_ending;
27635var p:pointer; {tail of a token list to show}
27636@!t:small_number; {type of conditional being shown}
27637@!m:normal..or_code; {upper bound on |fi_or_else| codes}
27638@!l:integer; {line where that conditional began}
27639@!n:integer; {level of \.{\\if...\\fi} nesting}
27640begin case cur_chr of
27641show_lists: begin begin_diagnostic; show_activities;
27642  end;
27643show_box_code: @<Show the current contents of a box@>;
27644show_code: @<Show the current meaning of a token, then |goto common_ending|@>;
27645@<Cases for |show_whatever|@>@;@/
27646othercases @<Show the current value of some parameter or register,
27647  then |goto common_ending|@>
27648endcases;@/
27649@<Complete a potentially long \.{\\show} command@>;
27650common_ending: if interaction<error_stop_mode then
27651  begin help0; decr(error_count);
27652  end
27653else if tracing_online>0 then
27654  begin@t@>@;@/
27655  help3("This isn't an error message; I'm just \showing something.")@/
27656  ("Type `I\show...' to show more (e.g., \show\cs,")@/
27657  ("\showthe\count10, \showbox255, \showlists).");
27658  end
27659else  begin@t@>@;@/
27660  help5("This isn't an error message; I'm just \showing something.")@/
27661  ("Type `I\show...' to show more (e.g., \show\cs,")@/
27662  ("\showthe\count10, \showbox255, \showlists).")@/
27663  ("And type `I\tracingonline=1\show...' to show boxes and")@/
27664  ("lists on your terminal as well as in the transcript file.");
27665  end;
27666error;
27667end;
27668
27669@ @<Show the current meaning of a token...@>=
27670begin get_token;
27671if interaction=error_stop_mode then wake_up_terminal;
27672print_nl("> ");
27673if cur_cs<>0 then
27674  begin sprint_cs(cur_cs); print_char("=");
27675  end;
27676print_meaning; goto common_ending;
27677end
27678
27679@ @<Cases of |print_cmd_chr|...@>=
27680undefined_cs: print("undefined");
27681call,long_call,outer_call,long_outer_call: begin n:=cmd-call;
27682  if info(link(chr_code))=protected_token then n:=n+4;
27683  if odd(n div 4) then print_esc("protected");
27684  if odd(n) then print_esc("long");
27685  if odd(n div 2) then print_esc("outer");
27686  if n>0 then print_char(" ");
27687  print("macro");
27688  end;
27689end_template: print_esc("outer endtemplate");
27690
27691@ @<Show the current contents of a box@>=
27692begin scan_register_num; fetch_box(p); begin_diagnostic;
27693print_nl("> \box"); print_int(cur_val); print_char("=");
27694if p=null then print("void")@+else show_box(p);
27695end
27696
27697@ @<Show the current value of some parameter...@>=
27698begin p:=the_toks;
27699if interaction=error_stop_mode then wake_up_terminal;
27700print_nl("> "); token_show(temp_head);
27701flush_list(link(temp_head)); goto common_ending;
27702end
27703
27704@ @<Complete a potentially long \.{\\show} command@>=
27705end_diagnostic(true); print_err("OK");
27706@.OK@>
27707if selector=term_and_log then if tracing_online<=0 then
27708  begin selector:=term_only; print(" (see the transcript file)");
27709  selector:=term_and_log;
27710  end
27711
27712@* \[50] Dumping and undumping the tables.
27713After \.{INITEX} has seen a collection of fonts and macros, it
27714can write all the necessary information on an auxiliary file so
27715that production versions of \TeX\ are able to initialize their
27716memory at high speed. The present section of the program takes
27717care of such output and input. We shall consider simultaneously
27718the processes of storing and restoring,
27719so that the inverse relation between them is clear.
27720@.INITEX@>
27721
27722The global variable |format_ident| is a string that is printed right
27723after the |banner| line when \TeX\ is ready to start. For \.{INITEX} this
27724string says simply `\.{(INITEX)}'; for other versions of \TeX\ it says,
27725for example, `\.{(preloaded format=plain 1982.11.19)}', showing the year,
27726month, and day that the format file was created. We have |format_ident=0|
27727before \TeX's tables are loaded.
27728
27729@<Glob...@>=
27730@!format_ident:str_number;
27731
27732@ @<Set init...@>=
27733format_ident:=0;
27734
27735@ @<Initialize table entries...@>=
27736format_ident:=" (INITEX)";
27737
27738@ @<Declare act...@>=
27739@!init procedure store_fmt_file;
27740label found1,found2,done1,done2;
27741var j,@!k,@!l:integer; {all-purpose indices}
27742@!p,@!q: pointer; {all-purpose pointers}
27743@!x: integer; {something to dump}
27744@!w: four_quarters; {four ASCII codes}
27745begin @<If dumping is not allowed, abort@>;
27746@<Create the |format_ident|, open the format file,
27747  and inform the user that dumping has begun@>;
27748@<Dump constants for consistency check@>;
27749@<Dump the string pool@>;
27750@<Dump the dynamic memory@>;
27751@<Dump the table of equivalents@>;
27752@<Dump the font information@>;
27753@<Dump the hyphenation tables@>;
27754@<Dump a couple more things and the closing check word@>;
27755@<Close the format file@>;
27756end;
27757tini
27758
27759@ Corresponding to the procedure that dumps a format file, we have a function
27760that reads one in. The function returns |false| if the dumped format is
27761incompatible with the present \TeX\ table sizes, etc.
27762
27763@d bad_fmt=6666 {go here if the format file is unacceptable}
27764@d too_small(#)==begin wake_up_terminal;
27765  wterm_ln('---! Must increase the ',#);
27766@.Must increase the x@>
27767  goto bad_fmt;
27768  end
27769
27770@p @t\4@>@<Declare the function called |open_fmt_file|@>@;
27771function load_fmt_file:boolean;
27772label bad_fmt,exit;
27773var j,@!k:integer; {all-purpose indices}
27774@!p,@!q: pointer; {all-purpose pointers}
27775@!x: integer; {something undumped}
27776@!w: four_quarters; {four ASCII codes}
27777begin @<Undump constants for consistency check@>;
27778@<Undump the string pool@>;
27779@<Undump the dynamic memory@>;
27780@<Undump the table of equivalents@>;
27781@<Undump the font information@>;
27782@<Undump the hyphenation tables@>;
27783@<Undump a couple more things and the closing check word@>;
27784load_fmt_file:=true; return; {it worked!}
27785bad_fmt: wake_up_terminal;
27786  wterm_ln('(Fatal format file error; I''m stymied)');
27787@.Fatal format file error@>
27788load_fmt_file:=false;
27789exit:end;
27790
27791@ The user is not allowed to dump a format file unless |save_ptr=0|.
27792This condition implies that |cur_level=level_one|, hence
27793the |xeq_level| array is constant and it need not be dumped.
27794
27795@<If dumping is not allowed, abort@>=
27796if save_ptr<>0 then
27797  begin print_err("You can't dump inside a group");
27798@.You can't dump...@>
27799  help1("`{...\dump}' is a no-no."); succumb;
27800  end
27801
27802@ Format files consist of |memory_word| items, and we use the following
27803macros to dump words of different types:
27804
27805@d dump_wd(#)==begin fmt_file^:=#; put(fmt_file);@+end
27806@d dump_int(#)==begin fmt_file^.int:=#; put(fmt_file);@+end
27807@d dump_hh(#)==begin fmt_file^.hh:=#; put(fmt_file);@+end
27808@d dump_qqqq(#)==begin fmt_file^.qqqq:=#; put(fmt_file);@+end
27809
27810@<Glob...@>=
27811@!fmt_file:word_file; {for input or output of format information}
27812
27813@ The inverse macros are slightly more complicated, since we need to check
27814the range of the values we are reading in. We say `|undump(a)(b)(x)|' to
27815read an integer value |x| that is supposed to be in the range |a<=x<=b|.
27816
27817@d undump_wd(#)==begin get(fmt_file); #:=fmt_file^;@+end
27818@d undump_int(#)==begin get(fmt_file); #:=fmt_file^.int;@+end
27819@d undump_hh(#)==begin get(fmt_file); #:=fmt_file^.hh;@+end
27820@d undump_qqqq(#)==begin get(fmt_file); #:=fmt_file^.qqqq;@+end
27821@d undump_end_end(#)==#:=x;@+end
27822@d undump_end(#)==(x>#) then goto bad_fmt@+else undump_end_end
27823@d undump(#)==begin undump_int(x); if (x<#) or undump_end
27824@d undump_size_end_end(#)==too_small(#)@+else undump_end_end
27825@d undump_size_end(#)==if x># then undump_size_end_end
27826@d undump_size(#)==begin undump_int(x);
27827  if x<# then goto bad_fmt; undump_size_end
27828
27829@ The next few sections of the program should make it clear how we use the
27830dump/undump macros.
27831
27832@<Dump constants for consistency check@>=
27833dump_int(@$);@/
27834@<Dump the \eTeX\ state@>@/
27835dump_int(mem_bot);@/
27836dump_int(mem_top);@/
27837dump_int(eqtb_size);@/
27838dump_int(hash_prime);@/
27839dump_int(hyph_size)
27840
27841@ Sections of a \.{WEB} program that are ``commented out'' still contribute
27842strings to the string pool; therefore \.{INITEX} and \TeX\ will have
27843the same strings. (And it is, of course, a good thing that they do.)
27844@.WEB@>
27845@^string pool@>
27846
27847@<Undump constants for consistency check@>=
27848x:=fmt_file^.int;
27849if x<>@$ then goto bad_fmt; {check that strings are the same}
27850@/@<Undump the \eTeX\ state@>@/
27851undump_int(x);
27852if x<>mem_bot then goto bad_fmt;
27853undump_int(x);
27854if x<>mem_top then goto bad_fmt;
27855undump_int(x);
27856if x<>eqtb_size then goto bad_fmt;
27857undump_int(x);
27858if x<>hash_prime then goto bad_fmt;
27859undump_int(x);
27860if x<>hyph_size then goto bad_fmt
27861
27862@ @d dump_four_ASCII==
27863  w.b0:=qi(so(str_pool[k])); w.b1:=qi(so(str_pool[k+1]));
27864  w.b2:=qi(so(str_pool[k+2])); w.b3:=qi(so(str_pool[k+3]));
27865  dump_qqqq(w)
27866
27867@<Dump the string pool@>=
27868dump_int(pool_ptr);
27869dump_int(str_ptr);
27870for k:=0 to str_ptr do dump_int(str_start[k]);
27871k:=0;
27872while k+4<pool_ptr do
27873  begin dump_four_ASCII; k:=k+4;
27874  end;
27875k:=pool_ptr-4; dump_four_ASCII;
27876print_ln; print_int(str_ptr); print(" strings of total length ");
27877print_int(pool_ptr)
27878
27879@ @d undump_four_ASCII==
27880  undump_qqqq(w);
27881  str_pool[k]:=si(qo(w.b0)); str_pool[k+1]:=si(qo(w.b1));
27882  str_pool[k+2]:=si(qo(w.b2)); str_pool[k+3]:=si(qo(w.b3))
27883
27884@<Undump the string pool@>=
27885undump_size(0)(pool_size)('string pool size')(pool_ptr);
27886undump_size(0)(max_strings)('max strings')(str_ptr);
27887for k:=0 to str_ptr do undump(0)(pool_ptr)(str_start[k]);
27888k:=0;
27889while k+4<pool_ptr do
27890  begin undump_four_ASCII; k:=k+4;
27891  end;
27892k:=pool_ptr-4; undump_four_ASCII;
27893init_str_ptr:=str_ptr; init_pool_ptr:=pool_ptr
27894
27895@ By sorting the list of available spaces in the variable-size portion of
27896|mem|, we are usually able to get by without having to dump very much
27897of the dynamic memory.
27898
27899We recompute |var_used| and |dyn_used|, so that \.{INITEX} dumps valid
27900information even when it has not been gathering statistics.
27901
27902@<Dump the dynamic memory@>=
27903sort_avail; var_used:=0;
27904dump_int(lo_mem_max); dump_int(rover);
27905if eTeX_ex then for k:=int_val to inter_char_val do dump_int(sa_root[k]);
27906p:=mem_bot; q:=rover; x:=0;
27907repeat for k:=p to q+1 do dump_wd(mem[k]);
27908x:=x+q+2-p; var_used:=var_used+q-p;
27909p:=q+node_size(q); q:=rlink(q);
27910until q=rover;
27911var_used:=var_used+lo_mem_max-p; dyn_used:=mem_end+1-hi_mem_min;@/
27912for k:=p to lo_mem_max do dump_wd(mem[k]);
27913x:=x+lo_mem_max+1-p;
27914dump_int(hi_mem_min); dump_int(avail);
27915for k:=hi_mem_min to mem_end do dump_wd(mem[k]);
27916x:=x+mem_end+1-hi_mem_min;
27917p:=avail;
27918while p<>null do
27919  begin decr(dyn_used); p:=link(p);
27920  end;
27921dump_int(var_used); dump_int(dyn_used);
27922print_ln; print_int(x);
27923print(" memory locations dumped; current usage is ");
27924print_int(var_used); print_char("&"); print_int(dyn_used)
27925
27926@ @<Undump the dynamic memory@>=
27927undump(lo_mem_stat_max+1000)(hi_mem_stat_min-1)(lo_mem_max);
27928undump(lo_mem_stat_max+1)(lo_mem_max)(rover);
27929if eTeX_ex then for k:=int_val to inter_char_val do
27930  undump(null)(lo_mem_max)(sa_root[k]);
27931p:=mem_bot; q:=rover;
27932repeat for k:=p to q+1 do undump_wd(mem[k]);
27933p:=q+node_size(q);
27934if (p>lo_mem_max)or((q>=rlink(q))and(rlink(q)<>rover)) then goto bad_fmt;
27935q:=rlink(q);
27936until q=rover;
27937for k:=p to lo_mem_max do undump_wd(mem[k]);
27938if mem_min<mem_bot-2 then {make more low memory available}
27939  begin p:=llink(rover); q:=mem_min+1;
27940  link(mem_min):=null; info(mem_min):=null; {we don't use the bottom word}
27941  rlink(p):=q; llink(rover):=q;@/
27942  rlink(q):=rover; llink(q):=p; link(q):=empty_flag;
27943  node_size(q):=mem_bot-q;
27944  end;
27945undump(lo_mem_max+1)(hi_mem_stat_min)(hi_mem_min);
27946undump(null)(mem_top)(avail); mem_end:=mem_top;
27947for k:=hi_mem_min to mem_end do undump_wd(mem[k]);
27948undump_int(var_used); undump_int(dyn_used)
27949
27950@ @<Dump the table of equivalents@>=
27951@<Dump regions 1 to 4 of |eqtb|@>;
27952@<Dump regions 5 and 6 of |eqtb|@>;
27953dump_int(par_loc); dump_int(write_loc);@/
27954@<Dump the hash table@>
27955
27956@ @<Undump the table of equivalents@>=
27957@<Undump regions 1 to 6 of |eqtb|@>;
27958undump(hash_base)(frozen_control_sequence)(par_loc);
27959par_token:=cs_token_flag+par_loc;@/
27960undump(hash_base)(frozen_control_sequence)(write_loc);@/
27961@<Undump the hash table@>
27962
27963@ The table of equivalents usually contains repeated information, so we dump it
27964in compressed form: The sequence of $n+2$ values $(n,x_1,\ldots,x_n,m)$ in the
27965format file represents $n+m$ consecutive entries of |eqtb|, with |m| extra
27966copies of $x_n$, namely $(x_1,\ldots,x_n,x_n,\ldots,x_n)$.
27967
27968@<Dump regions 1 to 4 of |eqtb|@>=
27969k:=active_base;
27970repeat j:=k;
27971while j<int_base-1 do
27972  begin if (equiv(j)=equiv(j+1))and(eq_type(j)=eq_type(j+1))and@|
27973    (eq_level(j)=eq_level(j+1)) then goto found1;
27974  incr(j);
27975  end;
27976l:=int_base; goto done1; {|j=int_base-1|}
27977found1: incr(j); l:=j;
27978while j<int_base-1 do
27979  begin if (equiv(j)<>equiv(j+1))or(eq_type(j)<>eq_type(j+1))or@|
27980    (eq_level(j)<>eq_level(j+1)) then goto done1;
27981  incr(j);
27982  end;
27983done1:dump_int(l-k);
27984while k<l do
27985  begin dump_wd(eqtb[k]); incr(k);
27986  end;
27987k:=j+1; dump_int(k-l);
27988until k=int_base
27989
27990@ @<Dump regions 5 and 6 of |eqtb|@>=
27991repeat j:=k;
27992while j<eqtb_size do
27993  begin if eqtb[j].int=eqtb[j+1].int then goto found2;
27994  incr(j);
27995  end;
27996l:=eqtb_size+1; goto done2; {|j=eqtb_size|}
27997found2: incr(j); l:=j;
27998while j<eqtb_size do
27999  begin if eqtb[j].int<>eqtb[j+1].int then goto done2;
28000  incr(j);
28001  end;
28002done2:dump_int(l-k);
28003while k<l do
28004  begin dump_wd(eqtb[k]); incr(k);
28005  end;
28006k:=j+1; dump_int(k-l);
28007until k>eqtb_size
28008
28009@ @<Undump regions 1 to 6 of |eqtb|@>=
28010k:=active_base;
28011repeat undump_int(x);
28012if (x<1)or(k+x>eqtb_size+1) then goto bad_fmt;
28013for j:=k to k+x-1 do undump_wd(eqtb[j]);
28014k:=k+x;
28015undump_int(x);
28016if (x<0)or(k+x>eqtb_size+1) then goto bad_fmt;
28017for j:=k to k+x-1 do eqtb[j]:=eqtb[k-1];
28018k:=k+x;
28019until k>eqtb_size
28020
28021@ A different scheme is used to compress the hash table, since its lower
28022region is usually sparse. When |text(p)<>0| for |p<=hash_used|, we output
28023two words, |p| and |hash[p]|. The hash table is, of course, densely packed
28024for |p>=hash_used|, so the remaining entries are output in a~block.
28025
28026@<Dump the hash table@>=
28027for p:=0 to prim_size do dump_hh(prim[p]);
28028for p:=0 to prim_size do dump_wd(prim_eqtb[p]);
28029dump_int(hash_used); cs_count:=frozen_control_sequence-1-hash_used;
28030for p:=hash_base to hash_used do if text(p)<>0 then
28031  begin dump_int(p); dump_hh(hash[p]); incr(cs_count);
28032  end;
28033for p:=hash_used+1 to undefined_control_sequence-1 do dump_hh(hash[p]);
28034dump_int(cs_count);@/
28035print_ln; print_int(cs_count); print(" multiletter control sequences")
28036
28037@ @<Undump the hash table@>=
28038for p:=0 to prim_size do undump_hh(prim[p]);
28039for p:=0 to prim_size do undump_wd(prim_eqtb[p]);
28040undump(hash_base)(frozen_control_sequence)(hash_used); p:=hash_base-1;
28041repeat undump(p+1)(hash_used)(p); undump_hh(hash[p]);
28042until p=hash_used;
28043for p:=hash_used+1 to undefined_control_sequence-1 do undump_hh(hash[p]);
28044undump_int(cs_count)
28045
28046@ @<Dump the font information@>=
28047dump_int(fmem_ptr);
28048for k:=0 to fmem_ptr-1 do dump_wd(font_info[k]);
28049dump_int(font_ptr);
28050for k:=null_font to font_ptr do
28051  @<Dump the array info for internal font number |k|@>;
28052print_ln; print_int(fmem_ptr-7); print(" words of font info for ");
28053print_int(font_ptr-font_base); print(" preloaded font");
28054if font_ptr<>font_base+1 then print_char("s")
28055
28056@ @<Undump the font information@>=
28057undump_size(7)(font_mem_size)('font mem size')(fmem_ptr);
28058for k:=0 to fmem_ptr-1 do undump_wd(font_info[k]);
28059undump_size(font_base)(font_max)('font max')(font_ptr);
28060for k:=null_font to font_ptr do
28061  @<Undump the array info for internal font number |k|@>
28062
28063@ @<Dump the array info for internal font number |k|@>=
28064begin dump_qqqq(font_check[k]);
28065dump_int(font_size[k]);
28066dump_int(font_dsize[k]);
28067dump_int(font_params[k]);@/
28068dump_int(hyphen_char[k]);
28069dump_int(skew_char[k]);@/
28070dump_int(font_name[k]);
28071dump_int(font_area[k]);@/
28072dump_int(font_bc[k]);
28073dump_int(font_ec[k]);@/
28074dump_int(char_base[k]);
28075dump_int(width_base[k]);
28076dump_int(height_base[k]);@/
28077dump_int(depth_base[k]);
28078dump_int(italic_base[k]);
28079dump_int(lig_kern_base[k]);@/
28080dump_int(kern_base[k]);
28081dump_int(exten_base[k]);
28082dump_int(param_base[k]);@/
28083dump_int(font_glue[k]);@/
28084dump_int(bchar_label[k]);
28085dump_int(font_bchar[k]);
28086dump_int(font_false_bchar[k]);@/
28087print_nl("\font"); print_esc(font_id_text(k)); print_char("=");
28088print_file_name(font_name[k],font_area[k],"");
28089if font_size[k]<>font_dsize[k] then
28090  begin print(" at "); print_scaled(font_size[k]); print("pt");
28091  end;
28092end
28093
28094@ @<Undump the array info for internal font number |k|@>=
28095begin undump_qqqq(font_check[k]);@/
28096undump_int(font_size[k]);
28097undump_int(font_dsize[k]);
28098undump(min_halfword)(max_halfword)(font_params[k]);@/
28099undump_int(hyphen_char[k]);
28100undump_int(skew_char[k]);@/
28101undump(0)(str_ptr)(font_name[k]);
28102undump(0)(str_ptr)(font_area[k]);@/
28103undump(0)(255)(font_bc[k]);
28104undump(0)(255)(font_ec[k]);@/
28105undump_int(char_base[k]);
28106undump_int(width_base[k]);
28107undump_int(height_base[k]);@/
28108undump_int(depth_base[k]);
28109undump_int(italic_base[k]);
28110undump_int(lig_kern_base[k]);@/
28111undump_int(kern_base[k]);
28112undump_int(exten_base[k]);
28113undump_int(param_base[k]);@/
28114undump(min_halfword)(lo_mem_max)(font_glue[k]);@/
28115undump(0)(fmem_ptr-1)(bchar_label[k]);
28116undump(min_quarterword)(non_char)(font_bchar[k]);
28117undump(min_quarterword)(non_char)(font_false_bchar[k]);
28118end
28119
28120@ @<Dump the hyphenation tables@>=
28121dump_int(hyph_count);
28122for k:=0 to hyph_size do if hyph_word[k]<>0 then
28123  begin dump_int(k); dump_int(hyph_word[k]); dump_int(hyph_list[k]);
28124  end;
28125print_ln; print_int(hyph_count); print(" hyphenation exception");
28126if hyph_count<>1 then print_char("s");
28127if trie_not_ready then init_trie;
28128dump_int(trie_max);
28129dump_int(hyph_start);
28130for k:=0 to trie_max do dump_hh(trie[k]);
28131dump_int(max_hyph_char);
28132dump_int(trie_op_ptr);
28133for k:=1 to trie_op_ptr do
28134  begin dump_int(hyf_distance[k]);
28135  dump_int(hyf_num[k]);
28136  dump_int(hyf_next[k]);
28137  end;
28138print_nl("Hyphenation trie of length "); print_int(trie_max);
28139@.Hyphenation trie...@>
28140print(" has "); print_int(trie_op_ptr); print(" op");
28141if trie_op_ptr<>1 then print_char("s");
28142print(" out of "); print_int(trie_op_size);
28143for k:=biggest_lang downto 0 do if trie_used[k]>min_quarterword then
28144  begin print_nl("  "); print_int(qo(trie_used[k]));
28145  print(" for language "); print_int(k);
28146  dump_int(k); dump_int(qo(trie_used[k]));
28147  end
28148
28149@ Only ``nonempty'' parts of |op_start| need to be restored.
28150
28151@<Undump the hyphenation tables@>=
28152undump(0)(hyph_size)(hyph_count);
28153for k:=1 to hyph_count do
28154  begin undump(0)(hyph_size)(j);
28155  undump(0)(str_ptr)(hyph_word[j]);
28156  undump(min_halfword)(max_halfword)(hyph_list[j]);
28157  end;
28158undump_size(0)(trie_size)('trie size')(j); @+init trie_max:=j;@+tini
28159undump(0)(j)(hyph_start);
28160for k:=0 to j do undump_hh(trie[k]);
28161undump_int(max_hyph_char);
28162undump_size(0)(trie_op_size)('trie op size')(j); @+init trie_op_ptr:=j;@+tini
28163for k:=1 to j do
28164  begin undump(0)(63)(hyf_distance[k]); {a |small_number|}
28165  undump(0)(63)(hyf_num[k]);
28166  undump(min_quarterword)(max_quarterword)(hyf_next[k]);
28167  end;
28168init for k:=0 to biggest_lang do trie_used[k]:=min_quarterword;@+tini@;@/
28169k:=biggest_lang+1;
28170while j>0 do
28171  begin undump(0)(k-1)(k); undump(1)(j)(x);@+init trie_used[k]:=qi(x);@+tini@;@/
28172  j:=j-x; op_start[k]:=qo(j);
28173  end;
28174@!init trie_not_ready:=false @+tini
28175
28176@ We have already printed a lot of statistics, so we set |tracing_stats:=0|
28177to prevent them from appearing again.
28178
28179@<Dump a couple more things and the closing check word@>=
28180dump_int(interaction); dump_int(format_ident); dump_int(69069);
28181tracing_stats:=0
28182
28183@ @<Undump a couple more things and the closing check word@>=
28184undump(batch_mode)(error_stop_mode)(interaction);
28185undump(0)(str_ptr)(format_ident);
28186undump_int(x);
28187if (x<>69069)or eof(fmt_file) then goto bad_fmt
28188
28189@ @<Create the |format_ident|...@>=
28190selector:=new_string;
28191print(" (preloaded format="); print(job_name); print_char(" ");
28192print_int(year); print_char(".");
28193print_int(month); print_char("."); print_int(day); print_char(")");
28194if interaction=batch_mode then selector:=log_only
28195else selector:=term_and_log;
28196str_room(1);
28197format_ident:=make_string;
28198pack_job_name(format_extension);
28199while not w_open_out(fmt_file) do
28200  prompt_file_name("format file name",format_extension);
28201print_nl("Beginning to dump on file ");
28202@.Beginning to dump...@>
28203slow_print(w_make_name_string(fmt_file)); flush_string;
28204print_nl(""); slow_print(format_ident)
28205
28206@ @<Close the format file@>=
28207w_close(fmt_file)
28208
28209@* \[51] The main program.
28210This is it: the part of \TeX\ that executes all those procedures we have
28211written.
28212
28213Well---almost. Let's leave space for a few more routines that we may
28214have forgotten.
28215
28216@p @<Last-minute procedures@>
28217
28218@ We have noted that there are two versions of \TeX82. One, called \.{INITEX},
28219@.INITEX@>
28220has to be run first; it initializes everything from scratch, without
28221reading a format file, and it has the capability of dumping a format file.
28222The other one is called `\.{VIRTEX}'; it is a ``virgin'' program that needs
28223@.VIRTEX@>
28224to input a format file in order to get started. \.{VIRTEX} typically has
28225more memory capacity than \.{INITEX}, because it does not need the space
28226consumed by the auxiliary hyphenation tables and the numerous calls on
28227|primitive|, etc.
28228
28229The \.{VIRTEX} program cannot read a format file instantaneously, of course;
28230the best implementations therefore allow for production versions of \TeX\ that
28231not only avoid the loading routine for \PASCAL\ object code, they also have
28232a format file pre-loaded. This is impossible to do if we stick to standard
28233\PASCAL; but there is a simple way to fool many systems into avoiding the
28234initialization, as follows:\quad(1)~We declare a global integer variable
28235called |ready_already|. The probability is negligible that this
28236variable holds any particular value like 314159 when \.{VIRTEX} is first
28237loaded.\quad(2)~After we have read in a format file and initialized
28238everything, we set |ready_already:=314159|.\quad(3)~Soon \.{VIRTEX}
28239will print `\.*', waiting for more input; and at this point we
28240interrupt the program and save its core image in some form that the
28241operating system can reload speedily.\quad(4)~When that core image is
28242activated, the program starts again at the beginning; but now
28243|ready_already=314159| and all the other global variables have
28244their initial values too. The former chastity has vanished!
28245
28246In other words, if we allow ourselves to test the condition
28247|ready_already=314159|, before |ready_already| has been
28248assigned a value, we can avoid the lengthy initialization. Dirty tricks
28249rarely pay off so handsomely.
28250@^dirty \PASCAL@>
28251@^system dependencies@>
28252
28253On systems that allow such preloading, the standard program called \.{TeX}
28254should be the one that has \.{plain} format preloaded, since that agrees
28255with {\sl The \TeX book}. Other versions, e.g., \.{AmSTeX}, should also
28256@:TeXbook}{\sl The \TeX book@>
28257@.AmSTeX@>
28258@.plain@>
28259be provided for commonly used formats.
28260
28261@<Glob...@>=
28262@!ready_already:integer; {a sacrifice of purity for economy}
28263
28264@ Now this is really it: \TeX\ starts and ends here.
28265
28266The initial test involving |ready_already| should be deleted if the
28267\PASCAL\ runtime system is smart enough to detect such a ``mistake.''
28268@^system dependencies@>
28269
28270@p begin @!{|start_here|}
28271history:=fatal_error_stop; {in case we quit during initialization}
28272t_open_out; {open the terminal for output}
28273if ready_already=314159 then goto start_of_TEX;
28274@<Check the ``constant'' values...@>@;
28275if bad>0 then
28276  begin wterm_ln('Ouch---my internal constants have been clobbered!',
28277    '---case ',bad:1);
28278@.Ouch...clobbered@>
28279  goto final_end;
28280  end;
28281initialize; {set global variables to their starting values}
28282@!init if not get_strings_started then goto final_end;
28283init_prim; {call |primitive| for each primitive}
28284init_str_ptr:=str_ptr; init_pool_ptr:=pool_ptr; fix_date_and_time;
28285tini@/
28286ready_already:=314159;
28287start_of_TEX: @<Initialize the output routines@>;
28288@<Get the first line of input and prepare to start@>;
28289history:=spotless; {ready to go!}
28290main_control; {come to life}
28291final_cleanup; {prepare for death}
28292end_of_TEX: close_files_and_terminate;
28293final_end: ready_already:=0;
28294end.
28295
28296@ Here we do whatever is needed to complete \TeX's job gracefully on the
28297local operating system. The code here might come into play after a fatal
28298error; it must therefore consist entirely of ``safe'' operations that
28299cannot produce error messages. For example, it would be a mistake to call
28300|str_room| or |make_string| at this time, because a call on |overflow|
28301might lead to an infinite loop.
28302@^system dependencies@>
28303
28304Actually there's one way to get error messages, via |prepare_mag|;
28305but that can't cause infinite recursion.
28306@^recursion@>
28307
28308This program doesn't bother to close the input files that may still be open.
28309
28310@<Last-minute...@>=
28311procedure close_files_and_terminate;
28312var k:integer; {all-purpose index}
28313begin @<Finish the extensions@>;
28314@!stat if tracing_stats>0 then @<Output statistics about this job@>;@;@+tats@/
28315wake_up_terminal; @<Finish the \.{DVI} file@>;
28316if log_opened then
28317  begin wlog_cr; a_close(log_file); selector:=selector-2;
28318  if selector=term_only then
28319    begin print_nl("Transcript written on ");
28320@.Transcript written...@>
28321    slow_print(log_name); print_char(".");
28322    end;
28323  end;
28324end;
28325
28326@ The present section goes directly to the log file instead of using
28327|print| commands, because there's no need for these strings to take
28328up |str_pool| memory when a non-{\bf stat} version of \TeX\ is being used.
28329
28330@<Output statistics...@>=
28331if log_opened then
28332  begin wlog_ln(' ');
28333  wlog_ln('Here is how much of TeX''s memory',' you used:');
28334@.Here is how much...@>
28335  wlog(' ',str_ptr-init_str_ptr:1,' string');
28336  if str_ptr<>init_str_ptr+1 then wlog('s');
28337  wlog_ln(' out of ', max_strings-init_str_ptr:1);@/
28338  wlog_ln(' ',pool_ptr-init_pool_ptr:1,' string characters out of ',
28339    pool_size-init_pool_ptr:1);@/
28340  wlog_ln(' ',lo_mem_max-mem_min+mem_end-hi_mem_min+2:1,@|
28341    ' words of memory out of ',mem_end+1-mem_min:1);@/
28342  wlog_ln(' ',cs_count:1,' multiletter control sequences out of ',
28343    hash_size:1);@/
28344  wlog(' ',fmem_ptr:1,' words of font info for ',
28345    font_ptr-font_base:1,' font');
28346  if font_ptr<>font_base+1 then wlog('s');
28347  wlog_ln(', out of ',font_mem_size:1,' for ',font_max-font_base:1);@/
28348  wlog(' ',hyph_count:1,' hyphenation exception');
28349  if hyph_count<>1 then wlog('s');
28350  wlog_ln(' out of ',hyph_size:1);@/
28351  wlog_ln(' ',max_in_stack:1,'i,',max_nest_stack:1,'n,',@|
28352    max_param_stack:1,'p,',@|
28353    max_buf_stack+1:1,'b,',@|
28354    max_save_stack+6:1,'s stack positions out of ',@|
28355    stack_size:1,'i,',
28356    nest_size:1,'n,',
28357    param_size:1,'p,',
28358    buf_size:1,'b,',
28359    save_size:1,'s');
28360  end
28361
28362@ We get to the |final_cleanup| routine when \.{\\end} or \.{\\dump} has
28363been scanned and |its_all_over|\kern-2pt.
28364
28365@<Last-minute...@>=
28366procedure final_cleanup;
28367label exit;
28368var c:small_number; {0 for \.{\\end}, 1 for \.{\\dump}}
28369begin c:=cur_chr;
28370if job_name=0 then open_log_file;
28371while input_ptr>0 do
28372  if state=token_list then end_token_list@+else end_file_reading;
28373while open_parens>0 do
28374  begin print(" )"); decr(open_parens);
28375  end;
28376if cur_level>level_one then
28377  begin print_nl("("); print_esc("end occurred ");
28378  print("inside a group at level ");
28379@:end_}{\.{(\\end occurred...)}@>
28380  print_int(cur_level-level_one); print_char(")");
28381  if eTeX_ex then show_save_groups;
28382  end;
28383while cond_ptr<>null do
28384  begin print_nl("("); print_esc("end occurred ");
28385  print("when "); print_cmd_chr(if_test,cur_if);
28386  if if_line<>0 then
28387    begin print(" on line "); print_int(if_line);
28388    end;
28389  print(" was incomplete)");
28390  if_line:=if_line_field(cond_ptr);
28391  cur_if:=subtype(cond_ptr); temp_ptr:=cond_ptr;
28392  cond_ptr:=link(cond_ptr); free_node(temp_ptr,if_node_size);
28393  end;
28394if history<>spotless then
28395 if ((history=warning_issued)or(interaction<error_stop_mode)) then
28396  if selector=term_and_log then
28397  begin selector:=term_only;
28398  print_nl("(see the transcript file for additional information)");
28399@.see the transcript file...@>
28400  selector:=term_and_log;
28401  end;
28402if c=1 then
28403  begin @!init for c:=top_mark_code to split_bot_mark_code do
28404    if cur_mark[c]<>null then delete_token_ref(cur_mark[c]);
28405  if sa_mark<>null then
28406    if do_marks(destroy_marks,0,sa_mark) then sa_mark:=null;
28407  for c:=last_box_code to vsplit_code do flush_node_list(disc_ptr[c]);
28408  if last_glue<>max_halfword then delete_glue_ref(last_glue);
28409  store_fmt_file; return;@+tini@/
28410  print_nl("(\dump is performed only by INITEX)"); return;
28411@:dump_}{\.{\\dump...only by INITEX}@>
28412  end;
28413exit:end;
28414
28415@ @<Last-minute...@>=
28416@!init procedure init_prim; {initialize all the primitives}
28417begin no_new_control_sequence:=false;
28418first:=0;
28419@<Put each...@>;
28420no_new_control_sequence:=true;
28421end;
28422tini
28423
28424@ When we begin the following code, \TeX's tables may still contain garbage;
28425the strings might not even be present. Thus we must proceed cautiously to get
28426bootstrapped in.
28427
28428But when we finish this part of the program, \TeX\ is ready to call on the
28429|main_control| routine to do its work.
28430
28431@<Get the first line...@>=
28432begin @<Initialize the input routines@>;
28433@<Enable \eTeX, if requested@>@;@/
28434if (format_ident=0)or(buffer[loc]="&") then
28435  begin if format_ident<>0 then initialize; {erase preloaded format}
28436  if not open_fmt_file then goto final_end;
28437  if not load_fmt_file then
28438    begin w_close(fmt_file); goto final_end;
28439    end;
28440  w_close(fmt_file);
28441  while (loc<limit)and(buffer[loc]=" ") do incr(loc);
28442  end;
28443if eTeX_ex then wterm_ln('entering extended mode');
28444if end_line_char_inactive then decr(limit)
28445else  buffer[limit]:=end_line_char;
28446fix_date_and_time;@/
28447@<Compute the magic offset@>;
28448@<Initialize the print |selector|...@>;
28449if (loc<limit)and(cat_code(buffer[loc])<>escape) then start_input;
28450  {\.{\\input} assumed}
28451end
28452
28453@* \[52] Debugging.
28454Once \TeX\ is working, you should be able to diagnose most errors with
28455the \.{\\show} commands and other diagnostic features. But for the initial
28456stages of debugging, and for the revelation of really deep mysteries, you
28457can compile \TeX\ with a few more aids, including the \PASCAL\ runtime
28458checks and its debugger. An additional routine called |debug_help|
28459will also come into play when you type `\.D' after an error message;
28460|debug_help| also occurs just before a fatal error causes \TeX\ to succumb.
28461@^debugging@>
28462@^system dependencies@>
28463
28464The interface to |debug_help| is primitive, but it is good enough when used
28465with a \PASCAL\ debugger that allows you to set breakpoints and to read
28466variables and change their values. After getting the prompt `\.{debug \#}', you
28467type either a negative number (this exits |debug_help|), or zero (this
28468goes to a location where you can set a breakpoint, thereby entering into
28469dialog with the \PASCAL\ debugger), or a positive number |m| followed by
28470an argument |n|. The meaning of |m| and |n| will be clear from the
28471program below. (If |m=13|, there is an additional argument, |l|.)
28472@.debug \#@>
28473
28474@d breakpoint=888 {place where a breakpoint is desirable}
28475
28476@<Last-minute...@>=
28477@!debug procedure debug_help; {routine to display various things}
28478label breakpoint,exit;
28479var k,@!l,@!m,@!n:integer;
28480begin loop begin wake_up_terminal;
28481  print_nl("debug # (-1 to exit):"); update_terminal;
28482@.debug \#@>
28483  read(term_in,m);
28484  if m<0 then return
28485  else if m=0 then
28486    begin goto breakpoint;@\ {go to every label at least once}
28487    breakpoint: m:=0; @{'BREAKPOINT'@}@\
28488    end
28489  else  begin read(term_in,n);
28490    case m of
28491    @t\4@>@<Numbered cases for |debug_help|@>@;
28492    othercases print("?")
28493    endcases;
28494    end;
28495  end;
28496exit:end;
28497gubed
28498
28499@ @<Numbered cases...@>=
285001: print_word(mem[n]); {display |mem[n]| in all forms}
285012: print_int(info(n));
285023: print_int(link(n));
285034: print_word(eqtb[n]);
285045: print_word(font_info[n]);
285056: print_word(save_stack[n]);
285067: show_box(n);
28507  {show a box, abbreviated by |show_box_depth| and |show_box_breadth|}
285088: begin breadth_max:=10000; depth_threshold:=pool_size-pool_ptr-10;
28509  show_node_list(n); {show a box in its entirety}
28510  end;
285119: show_token_list(n,null,1000);
2851210: slow_print(n);
2851311: check_mem(n>0); {check wellformedness; print new busy locations if |n>0|}
2851412: search_mem(n); {look for pointers to |n|}
2851513: begin read(term_in,l); print_cmd_chr(n,l);
28516  end;
2851714: for k:=0 to n do print(buffer[k]);
2851815: begin font_in_short_display:=null_font; short_display(n);
28519  end;
2852016: panicking:=not panicking;
28521
28522@* \[53] Extensions.
28523The program above includes a bunch of ``hooks'' that allow further
28524capabilities to be added without upsetting \TeX's basic structure.
28525Most of these hooks are concerned with ``whatsit'' nodes, which are
28526intended to be used for special purposes; whenever a new extension to
28527\TeX\ involves a new kind of whatsit node, a corresponding change needs
28528to be made to the routines below that deal with such nodes,
28529but it will usually be unnecessary to make many changes to the
28530other parts of this program.
28531
28532In order to demonstrate how extensions can be made, we shall treat
28533`\.{\\write}', `\.{\\openout}', `\.{\\closeout}', `\.{\\immediate}',
28534`\.{\\special}', and `\.{\\setlanguage}' as if they were extensions.
28535These commands are actually primitives of \TeX, and they should
28536appear in all implementations of the system; but let's try to imagine
28537that they aren't. Then the program below illustrates how a person
28538could add them.
28539
28540Sometimes, of course, an extension will require changes to \TeX\ itself;
28541no system of hooks could be complete enough for all conceivable extensions.
28542The features associated with `\.{\\write}' are almost all confined to the
28543following paragraphs, but there are small parts of the |print_ln| and
28544|print_char| procedures that were introduced specifically to \.{\\write}
28545characters. Furthermore one of the token lists recognized by the scanner
28546is a |write_text|; and there are a few other miscellaneous places where we
28547have already provided for some aspect of \.{\\write}.  The goal of a \TeX\
28548extender should be to minimize alterations to the standard parts of the
28549program, and to avoid them completely if possible. He or she should also
28550be quite sure that there's no easy way to accomplish the desired goals
28551with the standard features that \TeX\ already has. ``Think thrice before
28552extending,'' because that may save a lot of work, and it will also keep
28553incompatible extensions of \TeX\ from proliferating.
28554@^system dependencies@>
28555@^extensions to \TeX@>
28556
28557@ First let's consider the format of whatsit nodes that are used to represent
28558the data associated with \.{\\write} and its relatives. Recall that a whatsit
28559has |type=whatsit_node|, and the |subtype| is supposed to distinguish
28560different kinds of whatsits. Each node occupies two or more words; the
28561exact number is immaterial, as long as it is readily determined from the
28562|subtype| or other data.
28563
28564We shall introduce five |subtype| values here, corresponding to the
28565control sequences \.{\\openout}, \.{\\write}, \.{\\closeout}, \.{\\special}, and
28566\.{\\setlanguage}. The second word of I/O whatsits has a |write_stream| field
28567that identifies the write-stream number (0 to 15, or 16 for out-of-range and
28568positive, or 17 for out-of-range and negative).
28569In the case of \.{\\write} and \.{\\special}, there is also a field that
28570points to the reference count of a token list that should be sent. In the
28571case of \.{\\openout}, we need three words and three auxiliary subfields
28572to hold the string numbers for name, area, and extension.
28573
28574@d write_node_size=2 {number of words in a write/whatsit node}
28575@d open_node_size=3 {number of words in an open/whatsit node}
28576@d open_node=0 {|subtype| in whatsits that represent files to \.{\\openout}}
28577@d write_node=1 {|subtype| in whatsits that represent things to \.{\\write}}
28578@d close_node=2 {|subtype| in whatsits that represent streams to \.{\\closeout}}
28579@d special_node=3 {|subtype| in whatsits that represent \.{\\special} things}
28580@d language_node=4 {|subtype| in whatsits that change the current language}
28581@d what_lang(#)==link(#+1) {language number, in the range |0..255|}
28582@d what_lhm(#)==type(#+1) {minimum left fragment, in the range |1..63|}
28583@d what_rhm(#)==subtype(#+1) {minimum right fragment, in the range |1..63|}
28584@d write_tokens(#) == link(#+1) {reference count of token list to write}
28585@d write_stream(#) == info(#+1) {stream number (0 to 17)}
28586@d open_name(#) == link(#+1) {string number of file name to open}
28587@d open_area(#) == info(#+2) {string number of file area for |open_name|}
28588@d open_ext(#) == link(#+2) {string number of file extension for |open_name|}
28589
28590@ The sixteen possible \.{\\write} streams are represented by the |write_file|
28591array. The |j|th file is open if and only if |write_open[j]=true|. The last
28592two streams are special; |write_open[16]| represents a stream number
28593greater than 15, while |write_open[17]| represents a negative stream number,
28594and both of these variables are always |false|.
28595
28596@<Glob...@>=
28597@!write_file:array[0..15] of alpha_file;
28598@!write_open:array[0..17] of boolean;
28599
28600@ @<Set init...@>=
28601for k:=0 to 17 do write_open[k]:=false;
28602
28603@ Extensions might introduce new command codes; but it's best to use
28604|extension| with a modifier, whenever possible, so that |main_control|
28605stays the same.
28606
28607@d immediate_code=4 {command modifier for \.{\\immediate}}
28608@d set_language_code=5 {command modifier for \.{\\setlanguage}}
28609
28610@d pdftex_first_extension_code = 6
28611@d pdf_save_pos_node           == pdftex_first_extension_code + 0
28612
28613@d pic_file_code=41 { command modifier for \.{\\XeTeXpicfile}, skipping codes pdfTeX might use }
28614@d pdf_file_code=42 { command modifier for \.{\\XeTeXpdffile} }
28615@d glyph_code=43 { command modifier for \.{\\XeTeXglyph} }
28616
28617@d XeTeX_input_encoding_extension_code=44
28618@d XeTeX_default_encoding_extension_code=45
28619@d XeTeX_linebreak_locale_extension_code=46
28620
28621@<Put each...@>=
28622primitive("openout",extension,open_node);@/
28623@!@:open_out_}{\.{\\openout} primitive@>
28624primitive("write",extension,write_node); write_loc:=cur_val;@/
28625@!@:write_}{\.{\\write} primitive@>
28626primitive("closeout",extension,close_node);@/
28627@!@:close_out_}{\.{\\closeout} primitive@>
28628primitive("special",extension,special_node);@/
28629@!@:special_}{\.{\\special} primitive@>
28630primitive("immediate",extension,immediate_code);@/
28631@!@:immediate_}{\.{\\immediate} primitive@>
28632primitive("setlanguage",extension,set_language_code);@/
28633@!@:set_language_}{\.{\\setlanguage} primitive@>
28634
28635@ The \.{\\XeTeXpicfile} and \.{\\XeTeXpdffile} primitives are only defined in extended mode.
28636
28637@<Generate all \eTeX\ primitives@>=
28638primitive("XeTeXpicfile",extension,pic_file_code);@/
28639@!@:XeTeX_pic_file_}{\.{\\XeTeXpicfile} primitive@>
28640primitive("XeTeXpdffile",extension,pdf_file_code);@/
28641@!@:XeTeX_pdf_file_}{\.{\\XeTeXpdffile} primitive@>
28642primitive("XeTeXglyph",extension,glyph_code);@/
28643@!@:XeTeX_glyph_}{\.{\\XeTeXglyph} primitive@>
28644primitive("XeTeXlinebreaklocale", extension, XeTeX_linebreak_locale_extension_code);@/
28645@!@:XeTeX_linebreak_locale_}{\.{\\XeTeXlinebreaklocale} primitive@>
28646primitive("XeTeXinterchartoks",assign_toks,XeTeX_inter_char_loc);
28647@!@:XeTeX_inter_char_toks_}{\.{\\XeTeXinterchartoks} primitive@>
28648@#
28649primitive("pdfsavepos",extension,pdf_save_pos_node);@/
28650@!@:pdf_save_pos_}{\.{\\pdfsavepos} primitive@>
28651
28652@ The variable |write_loc| just introduced is used to provide an
28653appropriate error message in case of ``runaway'' write texts.
28654
28655@<Glob...@>=
28656@!write_loc:pointer; {|eqtb| address of \.{\\write}}
28657
28658@ @<Cases of |print_cmd_chr|...@>=
28659extension: case chr_code of
28660  open_node:print_esc("openout");
28661  write_node:print_esc("write");
28662  close_node:print_esc("closeout");
28663  special_node:print_esc("special");
28664  immediate_code:print_esc("immediate");
28665  set_language_code:print_esc("setlanguage");
28666  pic_file_code:print_esc("XeTeXpicfile");
28667  pdf_file_code:print_esc("XeTeXpdffile");
28668  glyph_code:print_esc("XeTeXglyph");
28669  XeTeX_linebreak_locale_extension_code:print_esc("XeTeXlinebreaklocale");
28670  XeTeX_input_encoding_extension_code:print_esc("XeTeXinputencoding");
28671  XeTeX_default_encoding_extension_code:print_esc("XeTeXdefaultencoding");
28672  pdf_save_pos_node: print_esc("pdfsavepos");
28673  othercases print("[unknown extension!]")
28674  endcases;
28675
28676@ When an |extension| command occurs in |main_control|, in any mode,
28677the |do_extension| routine is called.
28678
28679@<Cases of |main_control| that are for extensions...@>=
28680any_mode(extension):do_extension;
28681
28682@ @<Declare act...@>=
28683@t\4@>@<Declare procedures needed in |do_extension|@>@;
28684procedure do_extension;
28685var i,@!j,@!k:integer; {all-purpose integers}
28686@!p,@!q,@!r:pointer; {all-purpose pointers}
28687begin case cur_chr of
28688open_node:@<Implement \.{\\openout}@>;
28689write_node:@<Implement \.{\\write}@>;
28690close_node:@<Implement \.{\\closeout}@>;
28691special_node:@<Implement \.{\\special}@>;
28692immediate_code:@<Implement \.{\\immediate}@>;
28693set_language_code:@<Implement \.{\\setlanguage}@>;
28694pic_file_code:@<Implement \.{\\XeTeXpicfile}@>;
28695pdf_file_code:@<Implement \.{\\XeTeXpdffile}@>;
28696glyph_code:@<Implement \.{\\XeTeXglyph}@>;
28697XeTeX_input_encoding_extension_code:@<Implement \.{\\XeTeXinputencoding}@>;
28698XeTeX_default_encoding_extension_code:@<Implement \.{\\XeTeXdefaultencoding}@>;
28699XeTeX_linebreak_locale_extension_code:@<Implement \.{\\XeTeXlinebreaklocale}@>;
28700
28701pdf_save_pos_node: @<Implement \.{\\pdfsavepos}@>;
28702othercases confusion("ext1")
28703@:this can't happen ext1}{\quad ext1@>
28704endcases;
28705end;
28706
28707@ Here is a subroutine that creates a whatsit node having a given |subtype|
28708and a given number of words. It initializes only the first word of the whatsit,
28709and appends it to the current list.
28710
28711@<Declare procedures needed in |do_extension|@>=
28712procedure new_whatsit(@!s:small_number;@!w:small_number);
28713var p:pointer; {the new node}
28714begin p:=get_node(w); type(p):=whatsit_node; subtype(p):=s;
28715link(tail):=p; tail:=p;
28716end;
28717
28718@ The next subroutine uses |cur_chr| to decide what sort of whatsit is
28719involved, and also inserts a |write_stream| number.
28720
28721@<Declare procedures needed in |do_ext...@>=
28722procedure new_write_whatsit(@!w:small_number);
28723begin new_whatsit(cur_chr,w);
28724if w<>write_node_size then scan_four_bit_int
28725else  begin scan_int;
28726  if cur_val<0 then cur_val:=17
28727  else if cur_val>15 then cur_val:=16;
28728  end;
28729write_stream(tail):=cur_val;
28730end;
28731
28732@ @<Implement \.{\\openout}@>=
28733begin new_write_whatsit(open_node_size);
28734scan_optional_equals; scan_file_name;@/
28735open_name(tail):=cur_name; open_area(tail):=cur_area; open_ext(tail):=cur_ext;
28736end
28737
28738@ When `\.{\\write 12\{...\}}' appears, we scan the token list `\.{\{...\}}'
28739without expanding its macros; the macros will be expanded later when this
28740token list is rescanned.
28741
28742@<Implement \.{\\write}@>=
28743begin k:=cur_cs; new_write_whatsit(write_node_size);@/
28744cur_cs:=k; p:=scan_toks(false,false); write_tokens(tail):=def_ref;
28745end
28746
28747@ @<Implement \.{\\closeout}@>=
28748begin new_write_whatsit(write_node_size); write_tokens(tail):=null;
28749end
28750
28751@ When `\.{\\special\{...\}}' appears, we expand the macros in the token
28752list as in \.{\\xdef} and \.{\\mark}.
28753
28754@<Implement \.{\\special}@>=
28755begin new_whatsit(special_node,write_node_size); write_stream(tail):=null;
28756p:=scan_toks(false,true); write_tokens(tail):=def_ref;
28757end
28758
28759@ @d call_func(#) == begin if # <> 0 then do_nothing end
28760@d flushable(#) == (# = str_ptr - 1)
28761
28762@p procedure flush_str(s: str_number); {flush a string if possible}
28763begin
28764    if flushable(s) then
28765        flush_string;
28766end;
28767
28768function tokens_to_string(p: pointer): str_number; {return a string from tokens
28769list}
28770begin
28771  if selector = new_string then
28772    pdf_error("tokens", "tokens_to_string() called while selector = new_string");
28773  old_setting:=selector; selector:=new_string;
28774  show_token_list(link(p),null,pool_size-pool_ptr);
28775  selector:=old_setting;
28776  tokens_to_string:=make_string;
28777end;
28778
28779procedure compare_strings; {to implement \.{\\strcmp}}
28780label done;
28781var s1, s2: str_number;
28782  i1, i2, j1, j2: pool_pointer;
28783begin
28784  call_func(scan_toks(false, true));
28785  s1:=tokens_to_string(def_ref);
28786  delete_token_ref(def_ref);
28787  call_func(scan_toks(false, true));
28788  s2:=tokens_to_string(def_ref);
28789  delete_token_ref(def_ref);
28790  i1:=str_start_macro(s1);
28791  j1:=str_start_macro(s1 + 1);
28792  i2:=str_start_macro(s2);
28793  j2:=str_start_macro(s2 + 1);
28794  while (i1 < j1) and (i2 < j2) do begin
28795    if str_pool[i1] < str_pool[i2] then begin
28796      cur_val:=-1;
28797      goto done;
28798    end;
28799    if str_pool[i1] > str_pool[i2] then begin
28800      cur_val:=1;
28801      goto done;
28802    end;
28803    incr(i1);
28804    incr(i2);
28805  end;
28806  if (i1 = j1) and (i2 = j2) then
28807    cur_val:=0
28808  else if i1 < j1 then
28809    cur_val:=1
28810  else
28811    cur_val:=-1;
28812done:
28813  flush_str(s2);
28814  flush_str(s1);
28815  cur_val_level:=int_val;
28816end;
28817
28818@ Each new type of node that appears in our data structure must be capable
28819of being displayed, copied, destroyed, and so on. The routines that we
28820need for write-oriented whatsits are somewhat like those for mark nodes;
28821other extensions might, of course, involve more subtlety here.
28822
28823@<Basic printing...@>=
28824procedure print_write_whatsit(@!s:str_number;@!p:pointer);
28825begin print_esc(s);
28826if write_stream(p)<16 then print_int(write_stream(p))
28827else if write_stream(p)=16 then print_char("*")
28828@.*\relax@>
28829else print_char("-");
28830end;
28831
28832procedure print_native_word(@!p:pointer);
28833var i,c,cc:integer;
28834begin
28835  for i:=0 to native_length(p) - 1 do begin
28836    c:=get_native_char(p,i);
28837    if (c >= @"D800) and (c <= @"DBFF) then begin
28838      if i < native_length(p) - 1 then begin
28839        cc:=get_native_char(p, i+1);
28840        if (cc >= @"DC00) and (cc <= @"DFFF) then begin
28841          c:=@"10000 + (c - @"D800) * @"400 + (cc - @"DC00);
28842          print_char(c);
28843          incr(i);
28844        end else
28845          print(".");
28846      end else
28847        print(".");
28848    end else
28849      print_char(c);
28850  end
28851end;
28852
28853@ @<Display the whatsit...@>=
28854case subtype(p) of
28855open_node:begin print_write_whatsit("openout",p);
28856  print_char("="); print_file_name(open_name(p),open_area(p),open_ext(p));
28857  end;
28858write_node:begin print_write_whatsit("write",p);
28859  print_mark(write_tokens(p));
28860  end;
28861close_node:print_write_whatsit("closeout",p);
28862special_node:begin print_esc("special");
28863  print_mark(write_tokens(p));
28864  end;
28865language_node:begin print_esc("setlanguage");
28866  print_int(what_lang(p)); print(" (hyphenmin ");
28867  print_int(what_lhm(p)); print_char(",");
28868  print_int(what_rhm(p)); print_char(")");
28869  end;
28870native_word_node:begin print_esc(font_id_text(native_font(p)));
28871  print_char(" ");
28872  print_native_word(p);
28873  end;
28874glyph_node:begin print_esc(font_id_text(native_font(p)));
28875  print(" glyph#");
28876  print_int(native_glyph(p));
28877  end;
28878pic_node,pdf_node: begin
28879  if subtype(p) = pic_node then print_esc("XeTeXpicfile")
28880  else print_esc("XeTeXpdffile");
28881  print(" """);
28882  for i:=0 to pic_path_length(p)-1 do
28883    print_visible_char(pic_path_byte(p, i));
28884  print("""");
28885  end;
28886pdf_save_pos_node: print_esc("pdfsavepos");
28887othercases print("whatsit?")
28888endcases
28889
28890@ Picture nodes are tricky in that they are variable size.
28891@d total_pic_node_size(#) == (pic_node_size + (pic_path_length(#) + sizeof(memory_word) - 1) div sizeof(memory_word))
28892
28893@<Make a partial copy of the whatsit...@>=
28894case subtype(p) of
28895open_node: begin r:=get_node(open_node_size); words:=open_node_size;
28896  end;
28897write_node,special_node: begin r:=get_node(write_node_size);
28898  add_token_ref(write_tokens(p)); words:=write_node_size;
28899  end;
28900close_node,language_node: begin r:=get_node(small_node_size);
28901  words:=small_node_size;
28902  end;
28903native_word_node: begin words:=native_size(p);
28904  r:=get_node(words);
28905  while words > 0 do
28906    begin decr(words); mem[r+words]:=mem[p+words]; end;
28907  native_glyph_info_ptr(r):=null_ptr; native_glyph_count(r):=0;
28908  copy_native_glyph_info(p, r);
28909  end;
28910glyph_node: begin r:=get_node(glyph_node_size);
28911  words:=glyph_node_size;
28912  end;
28913pic_node,pdf_node: begin words:=total_pic_node_size(p);
28914  r:=get_node(words);
28915  end;
28916pdf_save_pos_node:
28917  r:=get_node(small_node_size);
28918othercases confusion("ext2")
28919@:this can't happen ext2}{\quad ext2@>
28920endcases
28921
28922@ @<Wipe out the whatsit...@>=
28923begin case subtype(p) of
28924open_node: free_node(p,open_node_size);
28925write_node,special_node: begin delete_token_ref(write_tokens(p));
28926  free_node(p,write_node_size); goto done;
28927  end;
28928close_node,language_node: free_node(p,small_node_size);
28929native_word_node: begin free_native_glyph_info(p); free_node(p,native_size(p)); end;
28930glyph_node: free_node(p,glyph_node_size);
28931pic_node,pdf_node: free_node(p,total_pic_node_size(p));
28932pdf_save_pos_node:
28933  free_node(p, small_node_size);
28934othercases confusion("ext3")
28935@:this can't happen ext3}{\quad ext3@>
28936endcases;@/
28937goto done;
28938end
28939
28940@ @<Incorporate a whatsit node into a vbox@>=
28941begin
28942  if (subtype(p)=pic_node)
28943  or (subtype(p)=pdf_node)
28944  then begin
28945    x:=x + d + height(p);
28946    d:=depth(p);
28947    if width(p) > w then w:=width(p);
28948  end;
28949end
28950
28951@ @<Incorporate a whatsit node into an hbox@>=
28952begin
28953  case subtype(p) of
28954  native_word_node: begin
28955    { merge with any following word fragments in same font, discarding discretionary breaks }
28956    if type(q) = disc_node then k:=replace_count(q) else k:=0;
28957    while (link(q) <> p) do begin
28958      decr(k);
28959      q:=link(q); { bring q up in preparation for deletion of nodes starting at p }
28960      if type(q) = disc_node then k:=replace_count(q);
28961      end;
28962    pp:=link(p);
28963    restart:
28964    if (k <= 0) and (pp <> null) and (not is_char_node(pp)) then begin
28965      if (type(pp) = whatsit_node)
28966        and (subtype(pp) = native_word_node)
28967        and (native_font(pp) = native_font(p)) then begin
28968        pp:=link(pp);
28969        goto restart;
28970      end
28971      else if (type(pp) = disc_node) then begin
28972        ppp:=link(pp);
28973        if is_native_word_node(ppp) and (native_font(ppp) = native_font(p)) then begin
28974          pp:=link(ppp);
28975          goto restart;
28976        end
28977      end
28978    end;
28979
28980    { now pp points to the non-|native_word| node that ended the chain, or null }
28981
28982    { we can just check type(p)=|whatsit_node| below, as we know that the chain
28983      contains only discretionaries and |native_word| nodes, no other whatsits or |char_node|s }
28984
28985    if (pp <> link(p)) then begin
28986      { found a chain of at least two pieces starting at p }
28987      total_chars:=0;
28988      p:=link(q); { the first fragment }
28989      while (p <> pp) do begin
28990        if (type(p) = whatsit_node) then
28991          total_chars:=total_chars + native_length(p); { accumulate char count }
28992        ppp:=p; { remember last node seen }
28993        p:=link(p); { point to next fragment or discretionary or terminator }
28994      end;
28995
28996      p:=link(q); { the first fragment again }
28997      pp:=new_native_word_node(native_font(p), total_chars); { make new node for merged word }
28998      link(q):=pp; { link to preceding material }
28999      link(pp):=link(ppp); { attach remainder of hlist to it }
29000      link(ppp):=null; { and detach from the old fragments }
29001
29002      { copy the chars into new node }
29003      total_chars:=0;
29004      ppp:=p;
29005      repeat
29006        if (type(ppp) = whatsit_node) then
29007          for k:=0 to native_length(ppp)-1 do begin
29008            set_native_char(pp, total_chars, get_native_char(ppp, k));
29009            incr(total_chars);
29010          end;
29011        ppp:=link(ppp);
29012      until (ppp = null);
29013
29014      flush_node_list(p); { delete the fragments }
29015      p:=link(q); { update p to point to the new node }
29016      set_native_metrics(p, XeTeX_use_glyph_metrics); { and measure it (i.e., re-do the OT layout) }
29017    end;
29018
29019    { now incorporate the |native_word| node measurements into the box we're packing }
29020    if height(p) > h then
29021        h:=height(p);
29022    if depth(p) > d then
29023        d:=depth(p);
29024    x:=x + width(p);
29025    end;
29026  glyph_node, pic_node, pdf_node: begin
29027    if height(p) > h then
29028      h:=height(p);
29029    if depth(p) > d then
29030      d:=depth(p);
29031    x:=x + width(p);
29032    end;
29033  othercases do_nothing
29034  endcases;
29035end
29036
29037@ @<Let |d| be the width of the whatsit |p|, and |goto found| if ``visible''@>=
29038if (subtype(p)=native_word_node)
29039or (subtype(p)=glyph_node)
29040or (subtype(p)=pic_node)
29041or (subtype(p)=pdf_node)
29042then begin
29043  d:=width(p);
29044  goto found;
29045end else
29046  d:=0
29047
29048@ @d adv_past_linebreak(#)==@+if subtype(#)=language_node then
29049    begin cur_lang:=what_lang(#); l_hyf:=what_lhm(#); r_hyf:=what_rhm(#);
29050    set_hyph_index;
29051    end
29052  else if (subtype(#)=native_word_node)
29053  or (subtype(#)=glyph_node)
29054  or (subtype(#)=pic_node)
29055  or (subtype(#)=pdf_node)
29056  then
29057    begin act_width:=act_width+width(#); end
29058
29059@<Advance \(p)past a whatsit node in the \(l)|line_break| loop@>=@+
29060adv_past_linebreak(cur_p)
29061
29062@ @d adv_past_prehyph(#)==@+if subtype(#)=language_node then
29063    begin cur_lang:=what_lang(#); l_hyf:=what_lhm(#); r_hyf:=what_rhm(#);
29064    set_hyph_index;
29065    end
29066
29067@<Advance \(p)past a whatsit node in the \(p)pre-hyphenation loop@>=@+
29068adv_past_prehyph(s)
29069
29070@ @<Prepare to move whatsit |p| to the current page, then |goto contribute|@>=
29071begin
29072  if (subtype(p)=pic_node)
29073  or (subtype(p)=pdf_node)
29074  then begin
29075    page_total:=page_total + page_depth + height(p);
29076    page_depth:=depth(p);
29077  end;
29078  goto contribute;
29079end
29080
29081@ @<Process whatsit |p| in |vert_break| loop, |goto not_found|@>=
29082begin
29083  if (subtype(p)=pic_node)
29084  or (subtype(p)=pdf_node)
29085  then begin
29086    cur_height:=cur_height + prev_dp + height(p); prev_dp:=depth(p);
29087  end;
29088  goto not_found;
29089end
29090
29091@ @<Output the whatsit node |p| in a vlist@>=
29092begin
29093  case subtype(p) of
29094  glyph_node: begin
29095    cur_v:=cur_v+height(p);
29096    cur_h:=left_edge;
29097    synch_h; synch_v; {Sync DVI state to TeX state}
29098    f:=native_font(p);
29099    if f<>dvi_f then @<Change font |dvi_f| to |f|@>;
29100    dvi_out(set_glyphs);
29101    dvi_four(0); { width }
29102    dvi_two(1); { glyph count }
29103    dvi_four(0); { x-offset as fixed point }
29104    dvi_four(0); { y-offset as fixed point }
29105    dvi_two(native_glyph(p));
29106    cur_v:=cur_v+depth(p);
29107    cur_h:=left_edge;
29108    end;
29109  pic_node, pdf_node: begin
29110    save_h:=dvi_h; save_v:=dvi_v;
29111    cur_v:=cur_v+height(p);
29112    pic_out(p);
29113    dvi_h:=save_h; dvi_v:=save_v;
29114    cur_v:=save_v+depth(p); cur_h:=left_edge;
29115    end;
29116  pdf_save_pos_node: @<Save current position to |pdf_last_x_pos|, |pdf_last_y_pos|@>;
29117  othercases out_what(p)
29118  endcases
29119end
29120
29121@ @<Save current position to |pdf_last_x_pos|, |pdf_last_y_pos|@>=
29122begin
29123  pdf_last_x_pos:=cur_h + cur_h_offset;
29124  pdf_last_y_pos:=cur_page_height - cur_v - cur_v_offset
29125end
29126
29127@ @<Calculate page dimensions and margins@>=
29128cur_h_offset:=h_offset + (unity * 7227) / 100;
29129cur_v_offset:=v_offset + (unity * 7227) / 100;
29130if pdf_page_width <> 0 then
29131  cur_page_width:=pdf_page_width
29132else
29133  cur_page_width:=width(p) + 2*cur_h_offset;
29134if pdf_page_height <> 0 then
29135  cur_page_height:=pdf_page_height
29136else
29137  cur_page_height:=height(p) + depth(p) + 2*cur_v_offset
29138
29139@ @<Glob...@>=
29140@!cur_page_width: scaled; {width of page being shipped}
29141@!cur_page_height: scaled; {height of page being shipped}
29142@!cur_h_offset: scaled; {horizontal offset of page being shipped}
29143@!cur_v_offset: scaled; {vertical offset of page being shipped}
29144
29145@ @<Output the whatsit node |p| in an hlist@>=
29146begin
29147  case subtype(p) of
29148  native_word_node, glyph_node: begin
29149    synch_h; synch_v; {Sync DVI state to TeX state}
29150    f:=native_font(p);
29151    if f<>dvi_f then @<Change font |dvi_f| to |f|@>;
29152    if subtype(p) = glyph_node then begin
29153      dvi_out(set_glyphs);
29154      dvi_four(width(p));
29155      dvi_two(1); { glyph count }
29156      dvi_four(0); { x-offset as fixed point }
29157      dvi_four(0); { y-offset as fixed point }
29158      dvi_two(native_glyph(p));
29159      cur_h:=cur_h + width(p);
29160    end else begin
29161      if native_glyph_info_ptr(p) <> null_ptr then begin
29162        dvi_out(set_glyphs);
29163        len:=make_xdv_glyph_array_data(p);
29164        for k:=0 to len-1 do
29165          dvi_out(xdv_buffer_byte(k));
29166      end;
29167      cur_h:=cur_h + width(p);
29168    end;
29169    dvi_h:=cur_h;
29170  end;
29171  pic_node, pdf_node: begin
29172    save_h:=dvi_h; save_v:=dvi_v;
29173    cur_v:=base_line;
29174    edge:=cur_h+width(p);
29175    pic_out(p);
29176    dvi_h:=save_h; dvi_v:=save_v;
29177    cur_h:=edge; cur_v:=base_line;
29178  end;
29179  pdf_save_pos_node: @<Save current position to |pdf_last_x_pos|, |pdf_last_y_pos|@>;
29180  othercases out_what(p)
29181  endcases
29182end
29183
29184@ After all this preliminary shuffling, we come finally to the routines
29185that actually send out the requested data. Let's do \.{\\special} first
29186(it's easier).
29187
29188@<Declare procedures needed in |hlist_out|, |vlist_out|@>=
29189procedure special_out(@!p:pointer);
29190var old_setting:0..max_selector; {holds print |selector|}
29191@!k:pool_pointer; {index into |str_pool|}
29192begin synch_h; synch_v;@/
29193doing_special:=true;
29194old_setting:=selector; selector:=new_string;
29195show_token_list(link(write_tokens(p)),null,pool_size-pool_ptr);
29196selector:=old_setting;
29197str_room(1);
29198if cur_length<256 then
29199  begin dvi_out(xxx1); dvi_out(cur_length);
29200  end
29201else  begin dvi_out(xxx4); dvi_four(cur_length);
29202  end;
29203for k:=str_start_macro(str_ptr) to pool_ptr-1 do dvi_out(so(str_pool[k]));
29204pool_ptr:=str_start_macro(str_ptr); {erase the string}
29205doing_special:=false;
29206end;
29207
29208@ To write a token list, we must run it through \TeX's scanner, expanding
29209macros and \.{\\the} and \.{\\number}, etc. This might cause runaways,
29210if a delimited macro parameter isn't matched, and runaways would be
29211extremely confusing since we are calling on \TeX's scanner in the middle
29212of a \.{\\shipout} command. Therefore we will put a dummy control sequence as
29213a ``stopper,'' right after the token list. This control sequence is
29214artificially defined to be \.{\\outer}.
29215@:end_write_}{\.{\\endwrite}@>
29216
29217@<Initialize table...@>=
29218text(end_write):="endwrite"; eq_level(end_write):=level_one;
29219eq_type(end_write):=outer_call; equiv(end_write):=null;
29220
29221@ @<Declare procedures needed in |hlist_out|, |vlist_out|@>=
29222procedure write_out(@!p:pointer);
29223var old_setting:0..max_selector; {holds print |selector|}
29224@!old_mode:integer; {saved |mode|}
29225@!j:small_number; {write stream number}
29226@!k:integer;
29227@!q,@!r:pointer; {temporary variables for list manipulation}
29228begin @<Expand macros in the token list
29229  and make |link(def_ref)| point to the result@>;
29230old_setting:=selector; j:=write_stream(p);
29231if write_open[j] then selector:=j
29232else  begin {write to the terminal if file isn't open}
29233  if (j=17)and(selector=term_and_log) then selector:=log_only;
29234  print_nl("");
29235  end;
29236token_show(def_ref); print_ln;
29237flush_list(def_ref); selector:=old_setting;
29238end;
29239
29240@ The final line of this routine is slightly subtle; at least, the author
29241didn't think about it until getting burnt! There is a used-up token list
29242@^Knuth, Donald Ervin@>
29243on the stack, namely the one that contained |end_write_token|. (We
29244insert this artificial `\.{\\endwrite}' to prevent runaways, as explained
29245above.) If it were not removed, and if there were numerous writes on a
29246single page, the stack would overflow.
29247
29248@d end_write_token==cs_token_flag+end_write
29249
29250@<Expand macros in the token list and...@>=
29251q:=get_avail; info(q):=right_brace_token+"}";@/
29252r:=get_avail; link(q):=r; info(r):=end_write_token; ins_list(q);@/
29253begin_token_list(write_tokens(p),write_text);@/
29254q:=get_avail; info(q):=left_brace_token+"{"; ins_list(q);
29255{now we're ready to scan
29256  `\.\{$\langle\,$token list$\,\rangle$\.{\} \\endwrite}'}
29257old_mode:=mode; mode:=0;
29258  {disable \.{\\prevdepth}, \.{\\spacefactor}, \.{\\lastskip}, \.{\\prevgraf}}
29259cur_cs:=write_loc; q:=scan_toks(false,true); {expand macros, etc.}
29260get_token;@+if cur_tok<>end_write_token then
29261  @<Recover from an unbalanced write command@>;
29262mode:=old_mode;
29263end_token_list {conserve stack space}
29264
29265@ @<Recover from an unbalanced write command@>=
29266begin print_err("Unbalanced write command");
29267@.Unbalanced write...@>
29268help2("On this page there's a \write with fewer real {'s than }'s.")@/
29269("I can't handle that very well; good luck."); error;
29270repeat get_token;
29271until cur_tok=end_write_token;
29272end
29273
29274@ The |out_what| procedure takes care of outputting whatsit nodes for
29275|vlist_out| and |hlist_out|\kern-.3pt.
29276
29277@<Declare procedures needed in |hlist_out|, |vlist_out|@>=
29278procedure pic_out(@!p:pointer);
29279var old_setting:0..max_selector; {holds print |selector|}
29280  i:integer;
29281  k:pool_pointer; {index into |str_pool|}
29282begin
29283synch_h; synch_v;
29284old_setting:=selector; selector:=new_string;
29285print("pdf:image ");
29286print("matrix ");
29287print_scaled(pic_transform1(p)); print(" ");
29288print_scaled(pic_transform2(p)); print(" ");
29289print_scaled(pic_transform3(p)); print(" ");
29290print_scaled(pic_transform4(p)); print(" ");
29291print_scaled(pic_transform5(p)); print(" ");
29292print_scaled(pic_transform6(p)); print(" ");
29293print("page "); print_int(pic_page(p)); print(" ");
29294print("(");
29295for i:=0 to pic_path_length(p)-1 do
29296  print_visible_char(pic_path_byte(p, i));
29297print(")");
29298selector:=old_setting;
29299if cur_length<256 then
29300  begin dvi_out(xxx1); dvi_out(cur_length);
29301  end
29302else  begin dvi_out(xxx4); dvi_four(cur_length);
29303  end;
29304for k:=str_start_macro(str_ptr) to pool_ptr-1 do dvi_out(so(str_pool[k]));
29305pool_ptr:=str_start_macro(str_ptr); {erase the string}
29306end;
29307
29308procedure out_what(@!p:pointer);
29309var j:small_number; {write stream number}
29310begin case subtype(p) of
29311open_node,write_node,close_node:@<Do some work that has been queued up
29312  for \.{\\write}@>;
29313special_node:special_out(p);
29314language_node:do_nothing;
29315othercases confusion("ext4")
29316@:this can't happen ext4}{\quad ext4@>
29317endcases;
29318end;
29319
29320@ We don't implement \.{\\write} inside of leaders. (The reason is that
29321the number of times a leader box appears might be different in different
29322implementations, due to machine-dependent rounding in the glue calculations.)
29323@^leaders@>
29324
29325@<Do some work that has been queued up...@>=
29326if not doing_leaders then
29327  begin j:=write_stream(p);
29328  if subtype(p)=write_node then write_out(p)
29329  else  begin if write_open[j] then a_close(write_file[j]);
29330    if subtype(p)=close_node then write_open[j]:=false
29331    else if j<16 then
29332      begin cur_name:=open_name(p); cur_area:=open_area(p);
29333      cur_ext:=open_ext(p);
29334      if cur_ext="" then cur_ext:=".tex";
29335      pack_cur_name;
29336      while not a_open_out(write_file[j]) do
29337        prompt_file_name("output file name",".tex");
29338      write_open[j]:=true;
29339      end;
29340    end;
29341  end
29342
29343@ The presence of `\.{\\immediate}' causes the |do_extension| procedure
29344to descend to one level of recursion. Nothing happens unless \.{\\immediate}
29345is followed by `\.{\\openout}', `\.{\\write}', or `\.{\\closeout}'.
29346@^recursion@>
29347
29348@<Implement \.{\\immediate}@>=
29349begin get_x_token;
29350if (cur_cmd=extension)and(cur_chr<=close_node) then
29351  begin p:=tail; do_extension; {append a whatsit node}
29352  out_what(tail); {do the action immediately}
29353  flush_node_list(tail); tail:=p; link(p):=null;
29354  end
29355else back_input;
29356end
29357
29358@ The \.{\\language} extension is somewhat different.
29359We need a subroutine that comes into play when a character of
29360a non-|clang| language is being appended to the current paragraph.
29361
29362@<Declare action...@>=
29363procedure fix_language;
29364var @!l:ASCII_code; {the new current language}
29365begin if language<=0 then l:=0
29366else if language>255 then l:=0
29367else l:=language;
29368if l<>clang then
29369  begin new_whatsit(language_node,small_node_size);
29370  what_lang(tail):=l; clang:=l;@/
29371  what_lhm(tail):=norm_min(left_hyphen_min);
29372  what_rhm(tail):=norm_min(right_hyphen_min);
29373  end;
29374end;
29375
29376@ @<Implement \.{\\setlanguage}@>=
29377if abs(mode)<>hmode then report_illegal_case
29378else begin new_whatsit(language_node,small_node_size);
29379  scan_int;
29380  if cur_val<=0 then clang:=0
29381  else if cur_val>255 then clang:=0
29382  else clang:=cur_val;
29383  what_lang(tail):=clang;
29384  what_lhm(tail):=norm_min(left_hyphen_min);
29385  what_rhm(tail):=norm_min(right_hyphen_min);
29386  end
29387
29388@ @<Finish the extensions@>=
29389terminate_font_manager;
29390for k:=0 to 15 do if write_open[k] then a_close(write_file[k])
29391
29392@ @<Implement \.{\\XeTeXpicfile}@>=
29393if abs(mode)=mmode then report_illegal_case
29394else load_picture(false)
29395
29396@ @<Implement \.{\\XeTeXpdffile}@>=
29397if abs(mode)=mmode then report_illegal_case
29398else load_picture(true)
29399
29400@ @<Implement \.{\\XeTeXglyph}@>=
29401begin
29402 if abs(mode)=vmode then begin
29403  back_input;
29404  new_graf(true);
29405 end else if abs(mode)=mmode then report_illegal_case
29406 else begin
29407  if is_native_font(cur_font) then begin
29408   new_whatsit(glyph_node,glyph_node_size);
29409   scan_int;
29410   if (cur_val<0)or(cur_val>65535) then
29411     begin print_err("Bad glyph number");
29412     help2("A glyph number must be between 0 and 65535.")@/
29413     ("I changed this one to zero."); int_error(cur_val); cur_val:=0;
29414   end;
29415   native_font(tail):=cur_font;
29416   native_glyph(tail):=cur_val;
29417   set_native_glyph_metrics(tail, XeTeX_use_glyph_metrics);
29418  end else not_native_font_error(extension, glyph_code, cur_font);
29419 end
29420end
29421
29422@ Load a picture file and handle following keywords.
29423
29424@d calc_min_and_max==
29425  begin
29426    xmin:=1000000.0;
29427    xmax:=-xmin;
29428    ymin:=xmin;
29429    ymax:=xmax;
29430    for i:=0 to 3 do begin
29431      if xCoord(corners[i]) < xmin then xmin:=xCoord(corners[i]);
29432      if xCoord(corners[i]) > xmax then xmax:=xCoord(corners[i]);
29433      if yCoord(corners[i]) < ymin then ymin:=yCoord(corners[i]);
29434      if yCoord(corners[i]) > ymax then ymax:=yCoord(corners[i]);
29435    end;
29436  end
29437
29438@d update_corners==
29439  for i:=0 to 3 do
29440    transform_point(addressof(corners[i]), addressof(t2))
29441
29442@d do_size_requests==begin
29443  { calculate current width and height }
29444  calc_min_and_max;
29445  if x_size_req = 0.0 then begin
29446    make_scale(addressof(t2), y_size_req / (ymax - ymin), y_size_req / (ymax - ymin));
29447  end else if y_size_req = 0.0 then begin
29448    make_scale(addressof(t2), x_size_req / (xmax - xmin), x_size_req / (xmax - xmin));
29449  end else begin
29450    make_scale(addressof(t2), x_size_req / (xmax - xmin), y_size_req / (ymax - ymin));
29451  end;
29452  update_corners;
29453  x_size_req:=0.0;
29454  y_size_req:=0.0;
29455  transform_concat(addressof(t), addressof(t2));
29456end
29457
29458@<Declare procedures needed in |do_extension|@>=
29459procedure load_picture(@!is_pdf:boolean);
29460var
29461  pic_path: ^char;
29462  bounds: real_rect;
29463  t, t2: transform;
29464  corners: array[0..3] of real_point;
29465  x_size_req,y_size_req: real;
29466  check_keywords: boolean;
29467  xmin,xmax,ymin,ymax: real;
29468  i: small_number;
29469  page: integer;
29470  pdf_box_type: integer;
29471  result: integer;
29472begin
29473  { scan the filename and pack into |name_of_file| }
29474  scan_file_name;
29475  pack_cur_name;
29476
29477  pdf_box_type:=0;
29478  page:=0;
29479  if is_pdf then begin
29480    if scan_keyword("page") then begin
29481      scan_int;
29482      page:=cur_val;
29483    end;
29484    pdf_box_type:=pdfbox_crop;
29485    if scan_keyword("crop") then do_nothing
29486    else if scan_keyword("media") then pdf_box_type:=pdfbox_media
29487    else if scan_keyword("bleed") then pdf_box_type:=pdfbox_bleed
29488    else if scan_keyword("trim") then pdf_box_type:=pdfbox_trim
29489    else if scan_keyword("art") then pdf_box_type:=pdfbox_art;
29490  end;
29491
29492  { access the picture file and check its size }
29493  result:=find_pic_file(addressof(pic_path), addressof(bounds), pdf_box_type, page);
29494
29495  setPoint(corners[0], xField(bounds), yField(bounds));
29496  setPoint(corners[1], xField(corners[0]), yField(bounds) + htField(bounds));
29497  setPoint(corners[2], xField(bounds) + wdField(bounds), yField(corners[1]));
29498  setPoint(corners[3], xField(corners[2]), yField(corners[0]));
29499
29500  x_size_req:=0.0;
29501  y_size_req:=0.0;
29502
29503  { look for any scaling requests for this picture }
29504  make_identity(addressof(t));
29505
29506  check_keywords:=true;
29507  while check_keywords do begin
29508    if scan_keyword("scaled") then begin
29509      scan_int;
29510      if (x_size_req = 0.0) and (y_size_req = 0.0) then begin
29511        make_scale(addressof(t2), float(cur_val) / 1000.0, float(cur_val) / 1000.0);
29512        update_corners;
29513        transform_concat(addressof(t), addressof(t2));
29514      end
29515    end else if scan_keyword("xscaled") then begin
29516      scan_int;
29517      if (x_size_req = 0.0) and (y_size_req = 0.0) then begin
29518        make_scale(addressof(t2), float(cur_val) / 1000.0, 1.0);
29519        update_corners;
29520        transform_concat(addressof(t), addressof(t2));
29521      end
29522    end else if scan_keyword("yscaled") then begin
29523      scan_int;
29524      if (x_size_req = 0.0) and (y_size_req = 0.0) then begin
29525        make_scale(addressof(t2), 1.0, float(cur_val) / 1000.0);
29526        update_corners;
29527        transform_concat(addressof(t), addressof(t2));
29528      end
29529    end else if scan_keyword("width") then begin
29530      scan_normal_dimen;
29531      if cur_val <= 0 then begin
29532        print_err("Improper image ");
29533        print("size (");
29534        print_scaled(cur_val);
29535        print("pt) will be ignored");
29536        help2("I can't scale images to zero or negative sizes,")@/
29537             ("so I'm ignoring this.");
29538        error;
29539      end else
29540        x_size_req:=Fix2D(cur_val);
29541    end else if scan_keyword("height") then begin
29542      scan_normal_dimen;
29543      if cur_val <= 0 then begin
29544        print_err("Improper image ");
29545        print("size (");
29546        print_scaled(cur_val);
29547        print("pt) will be ignored");
29548        help2("I can't scale images to zero or negative sizes,")@/
29549             ("so I'm ignoring this.");
29550        error;
29551      end else
29552        y_size_req:=Fix2D(cur_val);
29553    end else if scan_keyword("rotated") then begin
29554      scan_decimal;
29555      if (x_size_req <> 0.0) or (y_size_req <> 0.0) then do_size_requests;
29556      make_rotation(addressof(t2), Fix2D(cur_val) * 3.141592653589793 / 180.0);
29557      update_corners;
29558      calc_min_and_max;
29559      setPoint(corners[0], xmin, ymin);
29560      setPoint(corners[1], xmin, ymax);
29561      setPoint(corners[2], xmax, ymax);
29562      setPoint(corners[3], xmax, ymin);
29563      transform_concat(addressof(t), addressof(t2));
29564    end else
29565      check_keywords:=false;
29566  end;
29567
29568  if (x_size_req <> 0.0) or (y_size_req <> 0.0) then do_size_requests;
29569
29570  calc_min_and_max;
29571  make_translation(addressof(t2), -xmin * 72 / 72.27, -ymin * 72 / 72.27);
29572  transform_concat(addressof(t), addressof(t2));
29573
29574  if result = 0 then begin
29575    new_whatsit(pic_node, pic_node_size + (strlen(pic_path) + sizeof(memory_word) - 1) div sizeof(memory_word));
29576    if is_pdf then begin
29577      subtype(tail):=pdf_node;
29578    end;
29579    pic_path_length(tail):=strlen(pic_path);
29580    pic_page(tail):=page;
29581
29582    width(tail):=D2Fix(xmax - xmin);
29583    height(tail):=D2Fix(ymax - ymin);
29584    depth(tail):=0;
29585
29586    pic_transform1(tail):=D2Fix(aField(t));
29587    pic_transform2(tail):=D2Fix(bField(t));
29588    pic_transform3(tail):=D2Fix(cField(t));
29589    pic_transform4(tail):=D2Fix(dField(t));
29590    pic_transform5(tail):=D2Fix(xField(t));
29591    pic_transform6(tail):=D2Fix(yField(t));
29592
29593    memcpy(addressof(mem[tail + pic_node_size]), pic_path, strlen(pic_path));
29594    libc_free(pic_path);
29595  end else begin
29596    print_err("Unable to load picture or PDF file '");
29597    print_file_name(cur_name,cur_area,cur_ext); print("'");
29598    if result = -43 then begin { Mac OS file not found error }
29599        help2("The requested image couldn't be read because")@/
29600             ("the file was not found.");
29601    end
29602    else begin { otherwise assume GraphicImport failed }
29603        help2("The requested image couldn't be read because")@/
29604             ("it was not a recognized image format.");
29605    end;
29606    error;
29607  end;
29608end;
29609
29610@ @<Implement \.{\\XeTeXinputencoding}@>=
29611begin
29612    scan_and_pack_name; {scan a filename-like arg for the input encoding}
29613
29614    i:=get_encoding_mode_and_info(addressof(j)); {convert it to |mode| and |encoding| values}
29615    if i = XeTeX_input_mode_auto then begin
29616      print_err("Encoding mode `auto' is not valid for \XeTeXinputencoding");
29617      help2("You can't use `auto' encoding here, only for \XeTeXdefaultencoding.")@/
29618           ("I'll ignore this and leave the current encoding unchanged.");@/
29619      error;
29620    end else set_input_file_encoding(input_file[in_open], i, j); {apply them to the current input file}
29621end
29622
29623@ @<Implement \.{\\XeTeXdefaultencoding}@>=
29624begin
29625  scan_and_pack_name; {scan a filename-like arg for the input encoding}
29626
29627  i:=get_encoding_mode_and_info(addressof(j)); {convert it to |mode| and |encoding| values}
29628  XeTeX_default_input_mode:=i; {store them as defaults for new input files}
29629  XeTeX_default_input_encoding:=j;
29630end
29631
29632@ @<Implement \.{\\XeTeXlinebreaklocale}@>=
29633begin
29634  scan_file_name; {scan a filename-like arg for the locale name}
29635  if length(cur_name) = 0 then XeTeX_linebreak_locale:=0
29636  else XeTeX_linebreak_locale:=cur_name; {we ignore the area and extension!}
29637end
29638
29639@ @<Glob...@>=
29640@!pdf_last_x_pos: integer;
29641@!pdf_last_y_pos: integer;
29642
29643@ @<Implement \.{\\pdfsavepos}@>=
29644begin
29645  new_whatsit(pdf_save_pos_node, small_node_size);
29646end
29647
29648
29649@* \[53a] The extended features of \eTeX.
29650The program has two modes of operation:  (1)~In \TeX\ compatibility mode
29651it fully deserves the name \TeX\ and there are neither extended features
29652nor additional primitive commands.  There are, however, a few
29653modifications that would be legitimate in any implementation of \TeX\
29654such as, e.g., preventing inadequate results of the glue to \.{DVI}
29655unit conversion during |ship_out|.  (2)~In extended mode there are
29656additional primitive commands and the extended features of \eTeX\ are
29657available.
29658
29659The distinction between these two modes of operation initially takes
29660place when a `virgin' \.{eINITEX} starts without reading a format file.
29661Later on the values of all \eTeX\ state variables are inherited when
29662\.{eVIRTEX} (or \.{eINITEX}) reads a format file.
29663
29664The code below is designed to work for cases where `$|init|\ldots|tini|$'
29665is a run-time switch.
29666
29667@<Enable \eTeX, if requested@>=
29668@!init if (buffer[loc]="*")and(format_ident=" (INITEX)") then
29669  begin no_new_control_sequence:=false;
29670  @<Generate all \eTeX\ primitives@>@;
29671  incr(loc); eTeX_mode:=1; {enter extended mode}
29672  @<Initialize variables for \eTeX\ extended mode@>@;
29673  end;
29674tini@;@/
29675if not no_new_control_sequence then {just entered extended mode ?}
29676  no_new_control_sequence:=true@+else
29677
29678@ The \eTeX\ features available in extended mode are grouped into two
29679categories:  (1)~Some of them are permanently enabled and have no
29680semantic effect as long as none of the additional primitives are
29681executed.  (2)~The remaining \eTeX\ features are optional and can be
29682individually enabled and disabled.  For each optional feature there is
29683an \eTeX\ state variable named \.{\\...state}; the feature is enabled,
29684resp.\ disabled by assigning a positive, resp.\ non-positive value to
29685that integer.
29686
29687@d eTeX_state_base=int_base+eTeX_state_code
29688@d eTeX_state(#)==eqtb[eTeX_state_base+#].int {an \eTeX\ state variable}
29689@#
29690@d eTeX_version_code=eTeX_int {code for \.{\\eTeXversion}}
29691
29692@<Generate all \eTeX...@>=
29693primitive("lastnodetype",last_item,last_node_type_code);
29694@!@:last_node_type_}{\.{\\lastnodetype} primitive@>
29695primitive("eTeXversion",last_item,eTeX_version_code);
29696@!@:eTeX_version_}{\.{\\eTeXversion} primitive@>
29697primitive("eTeXrevision",convert,eTeX_revision_code);@/
29698@!@:eTeX_revision_}{\.{\\eTeXrevision} primitive@>
29699
29700primitive("XeTeXversion",last_item,XeTeX_version_code);
29701@!@:XeTeX_version_}{\.{\\XeTeXversion} primitive@>
29702primitive("XeTeXrevision",convert,XeTeX_revision_code);@/
29703@!@:XeTeXrevision_}{\.{\\XeTeXrevision} primitive@>
29704
29705primitive("XeTeXcountglyphs",last_item,XeTeX_count_glyphs_code);
29706
29707primitive("XeTeXcountvariations",last_item,XeTeX_count_variations_code);
29708primitive("XeTeXvariation",last_item,XeTeX_variation_code);
29709primitive("XeTeXfindvariationbyname",last_item,XeTeX_find_variation_by_name_code);
29710primitive("XeTeXvariationmin",last_item,XeTeX_variation_min_code);
29711primitive("XeTeXvariationmax",last_item,XeTeX_variation_max_code);
29712primitive("XeTeXvariationdefault",last_item,XeTeX_variation_default_code);
29713
29714primitive("XeTeXcountfeatures",last_item,XeTeX_count_features_code);
29715primitive("XeTeXfeaturecode",last_item,XeTeX_feature_code_code);
29716primitive("XeTeXfindfeaturebyname",last_item,XeTeX_find_feature_by_name_code);
29717primitive("XeTeXisexclusivefeature",last_item,XeTeX_is_exclusive_feature_code);
29718primitive("XeTeXcountselectors",last_item,XeTeX_count_selectors_code);
29719primitive("XeTeXselectorcode",last_item,XeTeX_selector_code_code);
29720primitive("XeTeXfindselectorbyname",last_item,XeTeX_find_selector_by_name_code);
29721primitive("XeTeXisdefaultselector",last_item,XeTeX_is_default_selector_code);
29722
29723primitive("XeTeXvariationname",convert,XeTeX_variation_name_code);
29724primitive("XeTeXfeaturename",convert,XeTeX_feature_name_code);
29725primitive("XeTeXselectorname",convert,XeTeX_selector_name_code);
29726
29727primitive("XeTeXOTcountscripts",last_item,XeTeX_OT_count_scripts_code);
29728primitive("XeTeXOTcountlanguages",last_item,XeTeX_OT_count_languages_code);
29729primitive("XeTeXOTcountfeatures",last_item,XeTeX_OT_count_features_code);
29730primitive("XeTeXOTscripttag",last_item,XeTeX_OT_script_code);
29731primitive("XeTeXOTlanguagetag",last_item,XeTeX_OT_language_code);
29732primitive("XeTeXOTfeaturetag",last_item,XeTeX_OT_feature_code);
29733
29734primitive("XeTeXcharglyph", last_item, XeTeX_map_char_to_glyph_code);
29735primitive("XeTeXglyphindex", last_item, XeTeX_glyph_index_code);
29736primitive("XeTeXglyphbounds", last_item, XeTeX_glyph_bounds_code);
29737
29738primitive("XeTeXglyphname",convert,XeTeX_glyph_name_code);
29739
29740primitive("XeTeXfonttype", last_item, XeTeX_font_type_code);
29741
29742primitive("XeTeXfirstfontchar", last_item, XeTeX_first_char_code);
29743primitive("XeTeXlastfontchar", last_item, XeTeX_last_char_code);
29744
29745primitive("pdflastxpos",last_item,pdf_last_x_pos_code);
29746primitive("pdflastypos",last_item,pdf_last_y_pos_code);
29747primitive("strcmp",convert,pdf_strcmp_code);
29748primitive("shellescape",last_item,pdf_shell_escape_code);
29749
29750primitive("XeTeXpdfpagecount",last_item,XeTeX_pdf_page_count_code);
29751
29752@ @<Cases of |last_item| for |print_cmd_chr|@>=
29753last_node_type_code: print_esc("lastnodetype");
29754eTeX_version_code: print_esc("eTeXversion");
29755XeTeX_version_code: print_esc("XeTeXversion");
29756
29757XeTeX_count_glyphs_code: print_esc("XeTeXcountglyphs");
29758
29759XeTeX_count_variations_code: print_esc("XeTeXcountvariations");
29760XeTeX_variation_code: print_esc("XeTeXvariation");
29761XeTeX_find_variation_by_name_code: print_esc("XeTeXfindvariationbyname");
29762XeTeX_variation_min_code: print_esc("XeTeXvariationmin");
29763XeTeX_variation_max_code: print_esc("XeTeXvariationmax");
29764XeTeX_variation_default_code: print_esc("XeTeXvariationdefault");
29765
29766XeTeX_count_features_code: print_esc("XeTeXcountfeatures");
29767XeTeX_feature_code_code: print_esc("XeTeXfeaturecode");
29768XeTeX_find_feature_by_name_code: print_esc("XeTeXfindfeaturebyname");
29769XeTeX_is_exclusive_feature_code: print_esc("XeTeXisexclusivefeature");
29770XeTeX_count_selectors_code: print_esc("XeTeXcountselectors");
29771XeTeX_selector_code_code: print_esc("XeTeXselectorcode");
29772XeTeX_find_selector_by_name_code: print_esc("XeTeXfindselectorbyname");
29773XeTeX_is_default_selector_code: print_esc("XeTeXisdefaultselector");
29774
29775XeTeX_OT_count_scripts_code: print_esc("XeTeXOTcountscripts");
29776XeTeX_OT_count_languages_code: print_esc("XeTeXOTcountlanguages");
29777XeTeX_OT_count_features_code: print_esc("XeTeXOTcountfeatures");
29778XeTeX_OT_script_code: print_esc("XeTeXOTscripttag");
29779XeTeX_OT_language_code: print_esc("XeTeXOTlanguagetag");
29780XeTeX_OT_feature_code: print_esc("XeTeXOTfeaturetag");
29781
29782XeTeX_map_char_to_glyph_code: print_esc("XeTeXcharglyph");
29783XeTeX_glyph_index_code: print_esc("XeTeXglyphindex");
29784XeTeX_glyph_bounds_code: print_esc("XeTeXglyphbounds");
29785
29786XeTeX_font_type_code: print_esc("XeTeXfonttype");
29787
29788XeTeX_first_char_code: print_esc("XeTeXfirstfontchar");
29789XeTeX_last_char_code: print_esc("XeTeXlastfontchar");
29790
29791  pdf_last_x_pos_code:  print_esc("pdflastxpos");
29792  pdf_last_y_pos_code:  print_esc("pdflastypos");
29793
29794XeTeX_pdf_page_count_code: print_esc("XeTeXpdfpagecount");
29795
29796@ @<Cases for fetching an integer value@>=
29797eTeX_version_code: cur_val:=eTeX_version;
29798XeTeX_version_code: cur_val:=XeTeX_version;
29799
29800XeTeX_count_glyphs_code:
29801  begin
29802    scan_font_ident; n:=cur_val;
29803    if is_aat_font(n) then
29804      cur_val:=aat_font_get(m - XeTeX_int, font_layout_engine[n])
29805    else if is_otgr_font(n) then
29806      cur_val:=ot_font_get(m - XeTeX_int, font_layout_engine[n])
29807    else
29808      cur_val:=0;
29809  end;
29810
29811XeTeX_count_features_code:
29812  begin
29813    scan_font_ident; n:=cur_val;
29814    if is_aat_font(n) then
29815      cur_val:=aat_font_get(m - XeTeX_int, font_layout_engine[n])
29816    else if is_gr_font(n) then
29817      cur_val:=ot_font_get(m - XeTeX_int, font_layout_engine[n])
29818    else
29819      cur_val:=0;
29820  end;
29821
29822XeTeX_variation_code,
29823XeTeX_variation_min_code,
29824XeTeX_variation_max_code,
29825XeTeX_variation_default_code,
29826XeTeX_count_variations_code:
29827  begin
29828    scan_font_ident; n:=cur_val;
29829    cur_val:=0; {Deprecated}
29830  end;
29831
29832XeTeX_feature_code_code,
29833XeTeX_is_exclusive_feature_code,
29834XeTeX_count_selectors_code:
29835  begin
29836    scan_font_ident; n:=cur_val;
29837    if is_aat_font(n) then begin
29838      scan_int; k:=cur_val;
29839      cur_val:=aat_font_get_1(m - XeTeX_int, font_layout_engine[n], k);
29840    end else if is_gr_font(n) then begin
29841      scan_int; k:=cur_val;
29842      cur_val:=ot_font_get_1(m - XeTeX_int, font_layout_engine[n], k);
29843    end else begin
29844      not_aat_gr_font_error(last_item, m, n); cur_val:=-1;
29845    end;
29846  end;
29847
29848XeTeX_selector_code_code,
29849XeTeX_is_default_selector_code:
29850  begin
29851    scan_font_ident; n:=cur_val;
29852    if is_aat_font(n) then begin
29853      scan_int; k:=cur_val; scan_int;
29854      cur_val:=aat_font_get_2(m - XeTeX_int, font_layout_engine[n], k, cur_val);
29855    end else if is_gr_font(n) then begin
29856      scan_int; k:=cur_val; scan_int;
29857      cur_val:=ot_font_get_2(m - XeTeX_int, font_layout_engine[n], k, cur_val);
29858    end else begin
29859      not_aat_gr_font_error(last_item, m, n); cur_val:=-1;
29860    end;
29861  end;
29862
29863XeTeX_find_variation_by_name_code:
29864  begin
29865    scan_font_ident; n:=cur_val;
29866    if is_aat_font(n) then begin
29867      scan_and_pack_name;
29868      cur_val:=aat_font_get_named(m - XeTeX_int, font_layout_engine[n]);
29869    end else begin
29870      not_aat_font_error(last_item, m, n); cur_val:=-1;
29871    end;
29872  end;
29873
29874XeTeX_find_feature_by_name_code:
29875  begin
29876    scan_font_ident; n:=cur_val;
29877    if is_aat_font(n) then begin
29878      scan_and_pack_name;
29879      cur_val:=aat_font_get_named(m - XeTeX_int, font_layout_engine[n]);
29880    end else if is_gr_font(n) then begin
29881      scan_and_pack_name;
29882      cur_val:=gr_font_get_named(m - XeTeX_int, font_layout_engine[n]);
29883    end else begin
29884      not_aat_gr_font_error(last_item, m, n); cur_val:=-1;
29885    end;
29886  end;
29887
29888XeTeX_find_selector_by_name_code:
29889  begin
29890    scan_font_ident; n:=cur_val;
29891    if is_aat_font(n) then begin
29892      scan_int; k:=cur_val; scan_and_pack_name;
29893      cur_val:=aat_font_get_named_1(m - XeTeX_int, font_layout_engine[n], k);
29894    end else if is_gr_font(n) then begin
29895      scan_int; k:=cur_val; scan_and_pack_name;
29896      cur_val:=gr_font_get_named_1(m - XeTeX_int, font_layout_engine[n], k);
29897    end else begin
29898      not_aat_gr_font_error(last_item, m, n); cur_val:=-1;
29899    end;
29900  end;
29901
29902XeTeX_OT_count_scripts_code:
29903  begin
29904    scan_font_ident; n:=cur_val;
29905    if is_ot_font(n) then
29906      cur_val:=ot_font_get(m - XeTeX_int, font_layout_engine[n])
29907    else begin
29908      cur_val:=0;
29909    end;
29910  end;
29911
29912XeTeX_OT_count_languages_code,
29913XeTeX_OT_script_code:
29914  begin
29915    scan_font_ident; n:=cur_val;
29916    if is_ot_font(n) then begin
29917      scan_int;
29918      cur_val:=ot_font_get_1(m - XeTeX_int, font_layout_engine[n], cur_val);
29919    end else begin
29920      not_ot_font_error(last_item, m, n); cur_val:=-1;
29921    end;
29922  end;
29923
29924XeTeX_OT_count_features_code,
29925XeTeX_OT_language_code:
29926  begin
29927    scan_font_ident; n:=cur_val;
29928    if is_ot_font(n) then begin
29929      scan_int; k:=cur_val; scan_int;
29930      cur_val:=ot_font_get_2(m - XeTeX_int, font_layout_engine[n], k, cur_val);
29931    end else begin
29932      not_ot_font_error(last_item, m, n); cur_val:=-1;
29933    end;
29934  end;
29935
29936XeTeX_OT_feature_code:
29937  begin
29938    scan_font_ident; n:=cur_val;
29939    if is_ot_font(n) then begin
29940      scan_int; k:=cur_val; scan_int; kk:=cur_val; scan_int;
29941      cur_val:=ot_font_get_3(m - XeTeX_int, font_layout_engine[n], k, kk, cur_val);
29942    end else begin
29943      not_ot_font_error(last_item, m, n); cur_val:=-1;
29944    end;
29945  end;
29946
29947XeTeX_map_char_to_glyph_code:
29948  begin
29949    if is_native_font(cur_font) then begin
29950      scan_int; n:=cur_val; cur_val:=map_char_to_glyph(cur_font, n)
29951    end else begin
29952      not_native_font_error(last_item, m, cur_font); cur_val:=0
29953    end
29954  end;
29955
29956XeTeX_glyph_index_code:
29957  begin
29958    if is_native_font(cur_font) then begin
29959      scan_and_pack_name;
29960      cur_val:=map_glyph_to_index(cur_font)
29961    end else begin
29962      not_native_font_error(last_item, m, cur_font); cur_val:=0
29963    end
29964  end;
29965
29966XeTeX_font_type_code:
29967  begin
29968    scan_font_ident; n:=cur_val;
29969    if is_aat_font(n) then cur_val:=1
29970    else if is_ot_font(n) then cur_val:=2
29971    else if is_gr_font(n) then cur_val:=3
29972    else cur_val:=0;
29973  end;
29974
29975XeTeX_first_char_code,XeTeX_last_char_code:
29976  begin
29977    scan_font_ident; n:=cur_val;
29978    if is_native_font(n) then
29979      cur_val:=get_font_char_range(n, m = XeTeX_first_char_code)
29980    else begin
29981      if m = XeTeX_first_char_code then cur_val:=font_bc[n]
29982      else cur_val:=font_ec[n];
29983    end
29984  end;
29985
29986  pdf_last_x_pos_code:  cur_val:=pdf_last_x_pos;
29987  pdf_last_y_pos_code:  cur_val:=pdf_last_y_pos;
29988
29989XeTeX_pdf_page_count_code:
29990  begin
29991    scan_and_pack_name;
29992    cur_val:=count_pdf_file_pages;
29993  end;
29994
29995@ Slip in an extra procedure here and there....
29996
29997@<Error hand...@>=
29998procedure scan_and_pack_name; forward;
29999
30000@ @<Declare procedures needed in |do_extension|@>=
30001procedure scan_and_pack_name;
30002begin
30003  scan_file_name; pack_cur_name;
30004end;
30005
30006@ @<Declare the procedure called |print_cmd_chr|@>=
30007procedure not_aat_font_error(cmd, c: integer; f: integer);
30008begin
30009  print_err("Cannot use "); print_cmd_chr(cmd, c);
30010  print(" with "); print(font_name[f]);
30011  print("; not an AAT font");
30012  error;
30013end;
30014
30015procedure not_aat_gr_font_error(cmd, c: integer; f: integer);
30016begin
30017  print_err("Cannot use "); print_cmd_chr(cmd, c);
30018  print(" with "); print(font_name[f]);
30019  print("; not an AAT or Graphite font");
30020  error;
30021end;
30022
30023procedure not_ot_font_error(cmd, c: integer; f: integer);
30024begin
30025  print_err("Cannot use "); print_cmd_chr(cmd, c);
30026  print(" with "); print(font_name[f]);
30027  print("; not an OpenType Layout font");
30028  error;
30029end;
30030
30031procedure not_native_font_error(cmd, c: integer; f: integer);
30032begin
30033  print_err("Cannot use "); print_cmd_chr(cmd, c);
30034  print(" with "); print(font_name[f]);
30035  print("; not a native platform font");
30036  error;
30037end;
30038
30039@ @<Cases for fetching a dimension value@>=
30040XeTeX_glyph_bounds_code:
30041  begin
30042    if is_native_font(cur_font) then begin
30043      scan_int; n:=cur_val; { which edge: 1=left, 2=top, 3=right, 4=bottom }
30044      if (n < 1) or (n > 4) then begin
30045        print_err("\\XeTeXglyphbounds requires an edge index from 1 to 4;");
30046        print_nl("I don't know anything about edge "); print_int(n);
30047        error;
30048        cur_val:=0;
30049      end else begin
30050        scan_int; { glyph number }
30051        cur_val:=get_glyph_bounds(cur_font, n, cur_val);
30052      end
30053    end else begin
30054      not_native_font_error(last_item, m, cur_font); cur_val:=0
30055    end
30056  end;
30057
30058@ @<Cases of |convert| for |print_cmd_chr|@>=
30059eTeX_revision_code: print_esc("eTeXrevision");
30060XeTeX_revision_code: print_esc("XeTeXrevision");
30061
30062XeTeX_variation_name_code: print_esc("XeTeXvariationname");
30063XeTeX_feature_name_code: print_esc("XeTeXfeaturename");
30064XeTeX_selector_name_code: print_esc("XeTeXselectorname");
30065XeTeX_glyph_name_code: print_esc("XeTeXglyphname");
30066
30067XeTeX_Uchar_code: print_esc("Uchar");
30068XeTeX_Ucharcat_code: print_esc("Ucharcat");
30069
30070@ @<Cases of `Scan the argument for command |c|'@>=
30071eTeX_revision_code: do_nothing;
30072pdf_strcmp_code:
30073  begin
30074    save_scanner_status:=scanner_status;
30075    save_warning_index:=warning_index;
30076    save_def_ref:=def_ref;
30077    save_cur_string;
30078    compare_strings;
30079    def_ref:=save_def_ref;
30080    warning_index:=save_warning_index;
30081    scanner_status:=save_scanner_status;
30082    restore_cur_string;
30083  end;
30084XeTeX_revision_code: do_nothing;
30085
30086XeTeX_variation_name_code:
30087  begin
30088    scan_font_ident; fnt:=cur_val;
30089    if is_aat_font(fnt) then begin
30090      scan_int; arg1:=cur_val; arg2:=0;
30091    end else
30092      not_aat_font_error(convert, c, fnt);
30093  end;
30094
30095XeTeX_feature_name_code:
30096  begin
30097    scan_font_ident; fnt:=cur_val;
30098    if is_aat_font(fnt) or is_gr_font(fnt) then begin
30099      scan_int; arg1:=cur_val; arg2:=0;
30100    end else
30101      not_aat_gr_font_error(convert, c, fnt);
30102  end;
30103
30104XeTeX_selector_name_code:
30105  begin
30106    scan_font_ident; fnt:=cur_val;
30107    if is_aat_font(fnt) or is_gr_font(fnt) then begin
30108      scan_int; arg1:=cur_val; scan_int; arg2:=cur_val;
30109    end else
30110      not_aat_gr_font_error(convert, c, fnt);
30111  end;
30112
30113XeTeX_glyph_name_code:
30114  begin
30115    scan_font_ident; fnt:=cur_val;
30116    if is_native_font(fnt) then begin
30117      scan_int; arg1:=cur_val;
30118    end else
30119      not_native_font_error(convert, c, fnt);
30120  end;
30121
30122left_margin_kern_code, right_margin_kern_code: begin
30123    scan_register_num;
30124    fetch_box(p);
30125    if (p = null) or (type(p) <> hlist_node) then
30126        pdf_error("marginkern", "a non-empty hbox expected")
30127end;
30128
30129@ @<Cases of `Print the result of command |c|'@>=
30130eTeX_revision_code: print(eTeX_revision);
30131pdf_strcmp_code: print_int(cur_val);
30132XeTeX_revision_code: print(XeTeX_revision);
30133
30134XeTeX_variation_name_code:
30135  if is_aat_font(fnt) then
30136    aat_print_font_name(c, font_layout_engine[fnt], arg1, arg2);
30137
30138XeTeX_feature_name_code,
30139XeTeX_selector_name_code:
30140  if is_aat_font(fnt) then
30141    aat_print_font_name(c, font_layout_engine[fnt], arg1, arg2)
30142  else if is_gr_font(fnt) then
30143    gr_print_font_name(c, font_layout_engine[fnt], arg1, arg2);
30144
30145XeTeX_glyph_name_code:
30146  if is_native_font(fnt) then print_glyph_name(fnt, arg1);
30147
30148left_margin_kern_code: begin
30149  p:=list_ptr(p);
30150  while (p <> null) and
30151        (cp_skipable(p) or
30152         ((not is_char_node(p)) and (type(p) = glue_node) and (subtype(p) = left_skip_code + 1)))
30153  do
30154    p:=link(p);
30155  if (p <> null) and (not is_char_node(p)) and (type(p) = margin_kern_node) and (subtype(p) = left_side) then
30156    print_scaled(width(p))
30157  else
30158    print("0");
30159  print("pt");
30160end;
30161
30162right_margin_kern_code: begin
30163  q:=list_ptr(p);
30164  p:=prev_rightmost(q, null);
30165  while (p <> null) and
30166        (cp_skipable(p) or
30167         ((not is_char_node(p)) and (type(p) = glue_node) and (subtype(p) = right_skip_code + 1)))
30168  do
30169    p:=prev_rightmost(q, p);
30170  if (p <> null) and (not is_char_node(p)) and (type(p) = margin_kern_node) and (subtype(p) = right_side) then
30171    print_scaled(width(p))
30172  else
30173    print("0");
30174  print("pt");
30175end;
30176
30177@ @d eTeX_ex==(eTeX_mode=1) {is this extended mode?}
30178
30179@<Glob...@>=
30180@!eTeX_mode: 0..1; {identifies compatibility and extended mode}
30181
30182@ @<Initialize table entries...@>=
30183eTeX_mode:=0; {initially we are in compatibility mode}
30184@<Initialize variables for \eTeX\ compatibility mode@>@;
30185
30186@ @<Dump the \eTeX\ state@>=
30187dump_int(eTeX_mode);
30188{ in a deliberate change from e-TeX, we allow non-zero state variables to be dumped }
30189
30190@ @<Undump the \eTeX\ state@>=
30191undump(0)(1)(eTeX_mode);
30192if eTeX_ex then
30193  begin @<Initialize variables for \eTeX\ extended mode@>@;
30194  end
30195else  begin @<Initialize variables for \eTeX\ compatibility mode@>@;
30196  end;
30197
30198@ The |eTeX_enabled| function simply returns its first argument as
30199result.  This argument is |true| if an optional \eTeX\ feature is
30200currently enabled; otherwise, if the argument is |false|, the function
30201gives an error message.
30202
30203@<Declare \eTeX\ procedures for use...@>=
30204function eTeX_enabled(@!b:boolean;@!j:quarterword;@!k:halfword):boolean;
30205begin if not b then
30206  begin print_err("Improper "); print_cmd_chr(j,k);
30207  help1("Sorry, this optional e-TeX feature has been disabled."); error;
30208  end;
30209eTeX_enabled:=b;
30210end;
30211
30212@ First we implement the additional \eTeX\ parameters in the table of
30213equivalents.
30214
30215@<Generate all \eTeX...@>=
30216primitive("everyeof",assign_toks,every_eof_loc);
30217@!@:every_eof_}{\.{\\everyeof} primitive@>
30218primitive("tracingassigns",assign_int,int_base+tracing_assigns_code);@/
30219@!@:tracing_assigns_}{\.{\\tracingassigns} primitive@>
30220primitive("tracinggroups",assign_int,int_base+tracing_groups_code);@/
30221@!@:tracing_groups_}{\.{\\tracinggroups} primitive@>
30222primitive("tracingifs",assign_int,int_base+tracing_ifs_code);@/
30223@!@:tracing_ifs_}{\.{\\tracingifs} primitive@>
30224primitive("tracingscantokens",assign_int,int_base+tracing_scan_tokens_code);@/
30225@!@:tracing_scan_tokens_}{\.{\\tracingscantokens} primitive@>
30226primitive("tracingnesting",assign_int,int_base+tracing_nesting_code);@/
30227@!@:tracing_nesting_}{\.{\\tracingnesting} primitive@>
30228primitive("predisplaydirection",
30229  assign_int,int_base+pre_display_direction_code);@/
30230@!@:pre_display_direction_}{\.{\\predisplaydirection} primitive@>
30231primitive("lastlinefit",assign_int,int_base+last_line_fit_code);@/
30232@!@:last_line_fit_}{\.{\\lastlinefit} primitive@>
30233primitive("savingvdiscards",assign_int,int_base+saving_vdiscards_code);@/
30234@!@:saving_vdiscards_}{\.{\\savingvdiscards} primitive@>
30235primitive("savinghyphcodes",assign_int,int_base+saving_hyph_codes_code);@/
30236@!@:saving_hyph_codes_}{\.{\\savinghyphcodes} primitive@>
30237
30238@ @d every_eof==equiv(every_eof_loc)
30239
30240@<Cases of |assign_toks| for |print_cmd_chr|@>=
30241every_eof_loc: print_esc("everyeof");
30242XeTeX_inter_char_loc: print_esc("XeTeXinterchartoks");
30243
30244@ @<Cases for |print_param|@>=
30245tracing_assigns_code:print_esc("tracingassigns");
30246tracing_groups_code:print_esc("tracinggroups");
30247tracing_ifs_code:print_esc("tracingifs");
30248tracing_scan_tokens_code:print_esc("tracingscantokens");
30249tracing_nesting_code:print_esc("tracingnesting");
30250pre_display_direction_code:print_esc("predisplaydirection");
30251last_line_fit_code:print_esc("lastlinefit");
30252saving_vdiscards_code:print_esc("savingvdiscards");
30253saving_hyph_codes_code:print_esc("savinghyphcodes");
30254
30255@ In order to handle \.{\\everyeof} we need an array |eof_seen| of
30256boolean variables.
30257
30258@<Glob...@>=
30259@!eof_seen : array[1..max_in_open] of boolean; {has eof been seen?}
30260
30261@ The |print_group| procedure prints the current level of grouping and
30262the name corresponding to |cur_group|.
30263
30264@<Declare \eTeX\ procedures for tr...@>=
30265procedure print_group(@!e:boolean);
30266label exit;
30267begin case cur_group of
30268  bottom_level: begin print("bottom level"); return;
30269    end;
30270  simple_group,semi_simple_group:
30271    begin if cur_group=semi_simple_group then print("semi ");
30272    print("simple");
30273    end;
30274  hbox_group,adjusted_hbox_group:
30275    begin if cur_group=adjusted_hbox_group then print("adjusted ");
30276    print("hbox");
30277    end;
30278  vbox_group: print("vbox");
30279  vtop_group: print("vtop");
30280  align_group,no_align_group:
30281    begin if cur_group=no_align_group then print("no ");
30282    print("align");
30283    end;
30284  output_group: print("output");
30285  disc_group: print("disc");
30286  insert_group: print("insert");
30287  vcenter_group: print("vcenter");
30288  math_group,math_choice_group,math_shift_group,math_left_group:
30289    begin print("math");
30290    if cur_group=math_choice_group then print(" choice")
30291    else if cur_group=math_shift_group then print(" shift")
30292    else if cur_group=math_left_group then print(" left");
30293    end;
30294  end; {there are no other cases}
30295print(" group (level "); print_int(qo(cur_level)); print_char(")");
30296if saved(-1)<>0 then
30297  begin if e then print(" entered at line ") else print(" at line ");
30298  print_int(saved(-1));
30299  end;
30300exit:end;
30301
30302@ The |group_trace| procedure is called when a new level of grouping
30303begins (|e=false|) or ends (|e=true|) with |saved(-1)| containing the
30304line number.
30305
30306@<Declare \eTeX\ procedures for tr...@>=
30307@!stat procedure group_trace(@!e:boolean);
30308begin begin_diagnostic; print_char("{");
30309if e then print("leaving ") else print("entering ");
30310print_group(e); print_char("}"); end_diagnostic(false);
30311end;
30312tats
30313
30314@ The \.{\\currentgrouplevel} and \.{\\currentgrouptype} commands return
30315the current level of grouping and the type of the current group
30316respectively.
30317
30318@d current_group_level_code=eTeX_int+1 {code for \.{\\currentgrouplevel}}
30319@d current_group_type_code=eTeX_int+2 {code for \.{\\currentgrouptype}}
30320
30321@<Generate all \eTeX...@>=
30322primitive("currentgrouplevel",last_item,current_group_level_code);
30323@!@:current_group_level_}{\.{\\currentgrouplevel} primitive@>
30324primitive("currentgrouptype",last_item,current_group_type_code);
30325@!@:current_group_type_}{\.{\\currentgrouptype} primitive@>
30326
30327@ @<Cases of |last_item| for |print_cmd_chr|@>=
30328current_group_level_code: print_esc("currentgrouplevel");
30329current_group_type_code: print_esc("currentgrouptype");
30330
30331@ @<Cases for fetching an integer value@>=
30332current_group_level_code: cur_val:=cur_level-level_one;
30333current_group_type_code: cur_val:=cur_group;
30334
30335@ The \.{\\currentiflevel}, \.{\\currentiftype}, and
30336\.{\\currentifbranch} commands return the current level of conditionals
30337and the type and branch of the current conditional.
30338
30339@d current_if_level_code=eTeX_int+3 {code for \.{\\currentiflevel}}
30340@d current_if_type_code=eTeX_int+4 {code for \.{\\currentiftype}}
30341@d current_if_branch_code=eTeX_int+5 {code for \.{\\currentifbranch}}
30342
30343@<Generate all \eTeX...@>=
30344primitive("currentiflevel",last_item,current_if_level_code);
30345@!@:current_if_level_}{\.{\\currentiflevel} primitive@>
30346primitive("currentiftype",last_item,current_if_type_code);
30347@!@:current_if_type_}{\.{\\currentiftype} primitive@>
30348primitive("currentifbranch",last_item,current_if_branch_code);
30349@!@:current_if_branch_}{\.{\\currentifbranch} primitive@>
30350
30351@ @<Cases of |last_item| for |print_cmd_chr|@>=
30352current_if_level_code: print_esc("currentiflevel");
30353current_if_type_code: print_esc("currentiftype");
30354current_if_branch_code: print_esc("currentifbranch");
30355
30356@ @<Cases for fetching an integer value@>=
30357current_if_level_code: begin q:=cond_ptr; cur_val:=0;
30358  while q<>null do
30359    begin incr(cur_val); q:=link(q);
30360    end;
30361  end;
30362current_if_type_code: if cond_ptr=null then cur_val:=0
30363  else if cur_if<unless_code then cur_val:=cur_if+1
30364  else cur_val:=-(cur_if-unless_code+1);
30365current_if_branch_code:
30366  if (if_limit=or_code)or(if_limit=else_code) then cur_val:=1
30367  else if if_limit=fi_code then cur_val:=-1
30368  else cur_val:=0;
30369
30370@ The \.{\\fontcharwd}, \.{\\fontcharht}, \.{\\fontchardp}, and
30371\.{\\fontcharic} commands return information about a character in a
30372font.
30373
30374@d font_char_wd_code=eTeX_dim {code for \.{\\fontcharwd}}
30375@d font_char_ht_code=eTeX_dim+1 {code for \.{\\fontcharht}}
30376@d font_char_dp_code=eTeX_dim+2 {code for \.{\\fontchardp}}
30377@d font_char_ic_code=eTeX_dim+3 {code for \.{\\fontcharic}}
30378
30379@<Generate all \eTeX...@>=
30380primitive("fontcharwd",last_item,font_char_wd_code);
30381@!@:font_char_wd_}{\.{\\fontcharwd} primitive@>
30382primitive("fontcharht",last_item,font_char_ht_code);
30383@!@:font_char_ht_}{\.{\\fontcharht} primitive@>
30384primitive("fontchardp",last_item,font_char_dp_code);
30385@!@:font_char_dp_}{\.{\\fontchardp} primitive@>
30386primitive("fontcharic",last_item,font_char_ic_code);
30387@!@:font_char_ic_}{\.{\\fontcharic} primitive@>
30388
30389@ @<Cases of |last_item| for |print_cmd_chr|@>=
30390font_char_wd_code: print_esc("fontcharwd");
30391font_char_ht_code: print_esc("fontcharht");
30392font_char_dp_code: print_esc("fontchardp");
30393font_char_ic_code: print_esc("fontcharic");
30394
30395@ @<Cases for fetching a dimension value@>=
30396font_char_wd_code,
30397font_char_ht_code,
30398font_char_dp_code,
30399font_char_ic_code: begin scan_font_ident; q:=cur_val; scan_usv_num;
30400  if is_native_font(q) then begin
30401    case m of
30402    font_char_wd_code: cur_val:=getnativecharwd(q, cur_val);
30403    font_char_ht_code: cur_val:=getnativecharht(q, cur_val);
30404    font_char_dp_code: cur_val:=getnativechardp(q, cur_val);
30405    font_char_ic_code: cur_val:=getnativecharic(q, cur_val);
30406    end; {there are no other cases}
30407  end else begin
30408    if (font_bc[q]<=cur_val)and(font_ec[q]>=cur_val) then
30409      begin i:=char_info(q)(qi(cur_val));
30410      case m of
30411      font_char_wd_code: cur_val:=char_width(q)(i);
30412      font_char_ht_code: cur_val:=char_height(q)(height_depth(i));
30413      font_char_dp_code: cur_val:=char_depth(q)(height_depth(i));
30414      font_char_ic_code: cur_val:=char_italic(q)(i);
30415      end; {there are no other cases}
30416      end
30417    else cur_val:=0;
30418    end
30419  end;
30420
30421@ The \.{\\parshapedimen}, \.{\\parshapeindent}, and \.{\\parshapelength}
30422commands return the indent and length parameters of the current
30423\.{\\parshape} specification.
30424
30425@d par_shape_length_code=eTeX_dim+4 {code for \.{\\parshapelength}}
30426@d par_shape_indent_code=eTeX_dim+5 {code for \.{\\parshapeindent}}
30427@d par_shape_dimen_code=eTeX_dim+6 {code for \.{\\parshapedimen}}
30428
30429@<Generate all \eTeX...@>=
30430primitive("parshapelength",last_item,par_shape_length_code);
30431@!@:par_shape_length_}{\.{\\parshapelength} primitive@>
30432primitive("parshapeindent",last_item,par_shape_indent_code);
30433@!@:par_shape_indent_}{\.{\\parshapeindent} primitive@>
30434primitive("parshapedimen",last_item,par_shape_dimen_code);
30435@!@:par_shape_dimen_}{\.{\\parshapedimen} primitive@>
30436
30437@ @<Cases of |last_item| for |print_cmd_chr|@>=
30438par_shape_length_code: print_esc("parshapelength");
30439par_shape_indent_code: print_esc("parshapeindent");
30440par_shape_dimen_code: print_esc("parshapedimen");
30441
30442@ @<Cases for fetching a dimension value@>=
30443par_shape_length_code,
30444par_shape_indent_code,
30445par_shape_dimen_code: begin q:=cur_chr-par_shape_length_code; scan_int;
30446  if (par_shape_ptr=null)or(cur_val<=0) then cur_val:=0
30447  else  begin if q=2 then
30448      begin q:=cur_val mod 2; cur_val:=(cur_val+q)div 2;
30449      end;
30450    if cur_val>info(par_shape_ptr) then cur_val:=info(par_shape_ptr);
30451    cur_val:=mem[par_shape_ptr+2*cur_val-q].sc;
30452    end;
30453  cur_val_level:=dimen_val;
30454  end;
30455
30456@ The \.{\\showgroups} command displays all currently active grouping
30457levels.
30458
30459@d show_groups=4 { \.{\\showgroups} }
30460
30461@<Generate all \eTeX...@>=
30462primitive("showgroups",xray,show_groups);
30463@!@:show_groups_}{\.{\\showgroups} primitive@>
30464
30465@ @<Cases of |xray| for |print_cmd_chr|@>=
30466show_groups:print_esc("showgroups");
30467
30468@ @<Cases for |show_whatever|@>=
30469show_groups: begin begin_diagnostic; show_save_groups;
30470  end;
30471
30472@ @<Types...@>=
30473@!save_pointer=0..save_size; {index into |save_stack|}
30474
30475@ The modifications of \TeX\ required for the display produced by the
30476|show_save_groups| procedure were first discussed by Donald~E. Knuth in
30477{\sl TUGboat\/} {\bf 11}, 165--170 and 499--511, 1990.
30478@^Knuth, Donald Ervin@>
30479
30480In order to understand a group type we also have to know its mode.
30481Since unrestricted horizontal modes are not associated with grouping,
30482they are skipped when traversing the semantic nest.
30483
30484@<Declare \eTeX\ procedures for use...@>=
30485procedure show_save_groups;
30486label found1,found2,found,done;
30487var p:0..nest_size; {index into |nest|}
30488@!m:-mmode..mmode; {mode}
30489@!v:save_pointer; {saved value of |save_ptr|}
30490@!l:quarterword; {saved value of |cur_level|}
30491@!c:group_code; {saved value of |cur_group|}
30492@!a:-1..1; {to keep track of alignments}
30493@!i:integer;
30494@!j:quarterword;
30495@!s:str_number;
30496begin p:=nest_ptr; nest[p]:=cur_list; {put the top level into the array}
30497v:=save_ptr; l:=cur_level; c:=cur_group;
30498save_ptr:=cur_boundary; decr(cur_level);@/
30499a:=1;
30500print_nl(""); print_ln;
30501loop@+begin print_nl("### "); print_group(true);
30502  if cur_group=bottom_level then goto done;
30503  repeat m:=nest[p].mode_field;
30504  if p>0 then decr(p) else m:=vmode;
30505  until m<>hmode;
30506  print(" (");
30507  case cur_group of
30508    simple_group: begin incr(p); goto found2;
30509      end;
30510    hbox_group,adjusted_hbox_group: s:="hbox";
30511    vbox_group: s:="vbox";
30512    vtop_group: s:="vtop";
30513    align_group: if a=0 then
30514        begin if m=-vmode then s:="halign" else s:="valign";
30515        a:=1; goto found1;
30516        end
30517      else  begin if a=1 then print("align entry") else print_esc("cr");
30518        if p>=a then p:=p-a;
30519        a:=0; goto found;
30520        end;
30521    no_align_group:
30522      begin incr(p); a:=-1; print_esc("noalign"); goto found2;
30523      end;
30524    output_group:
30525      begin print_esc("output"); goto found;
30526      end;
30527    math_group: goto found2;
30528    disc_group,math_choice_group:
30529      begin if cur_group=disc_group then print_esc("discretionary")
30530      else print_esc("mathchoice");
30531      for i:=1 to 3 do if i<=saved(-2) then print("{}");
30532      goto found2;
30533      end;
30534    insert_group:
30535      begin if saved(-2)=255 then print_esc("vadjust")
30536      else  begin print_esc("insert"); print_int(saved(-2));
30537        end;
30538      goto found2;
30539      end;
30540    vcenter_group: begin s:="vcenter"; goto found1;
30541      end;
30542    semi_simple_group: begin incr(p); print_esc("begingroup"); goto found;
30543      end;
30544    math_shift_group:
30545      begin if m=mmode then print_char("$")
30546      else if nest[p].mode_field=mmode then
30547        begin print_cmd_chr(eq_no,saved(-2)); goto found;
30548        end;
30549      print_char("$"); goto found;
30550      end;
30551    math_left_group:
30552      begin if type(nest[p+1].eTeX_aux_field)=left_noad then print_esc("left")
30553      else print_esc("middle");
30554      goto found;
30555      end;
30556    end; {there are no other cases}
30557  @<Show the box context@>;
30558  found1: print_esc(s); @<Show the box packaging info@>;
30559  found2: print_char("{");
30560  found: print_char(")"); decr(cur_level);
30561  cur_group:=save_level(save_ptr); save_ptr:=save_index(save_ptr)
30562  end;
30563done: save_ptr:=v; cur_level:=l; cur_group:=c;
30564end;
30565
30566@ @<Show the box packaging info@>=
30567if saved(-2)<>0 then
30568  begin print_char(" ");
30569  if saved(-3)=exactly then print("to") else print("spread");
30570  print_scaled(saved(-2)); print("pt");
30571  end
30572
30573@ @<Show the box context@>=
30574i:=saved(-4);
30575if i<>0 then
30576  if i<box_flag then
30577    begin if abs(nest[p].mode_field)=vmode then j:=hmove else j:=vmove;
30578    if i>0 then print_cmd_chr(j,0) else print_cmd_chr(j,1);
30579    print_scaled(abs(i)); print("pt");
30580    end
30581  else if i<ship_out_flag then
30582    begin if i>=global_box_flag then
30583      begin print_esc("global"); i:=i-(global_box_flag-box_flag);
30584      end;
30585    print_esc("setbox"); print_int(i-box_flag); print_char("=");
30586    end
30587  else print_cmd_chr(leader_ship,i-(leader_flag-a_leaders))
30588
30589@ The |scan_general_text| procedure is much like |scan_toks(false,false)|,
30590but will be invoked via |expand|, i.e., recursively.
30591@^recursion@>
30592
30593@<Declare \eTeX\ procedures for sc...@>=
30594procedure@?scan_general_text; forward;@t\2@>
30595
30596@ The token list (balanced text) created by |scan_general_text| begins
30597at |link(temp_head)| and ends at |cur_val|.  (If |cur_val=temp_head|,
30598the list is empty.)
30599
30600@<Declare \eTeX\ procedures for tok...@>=
30601procedure scan_general_text;
30602label found;
30603var s:normal..absorbing; {to save |scanner_status|}
30604@!w:pointer; {to save |warning_index|}
30605@!d:pointer; {to save |def_ref|}
30606@!p:pointer; {tail of the token list being built}
30607@!q:pointer; {new node being added to the token list via |store_new_token|}
30608@!unbalance:halfword; {number of unmatched left braces}
30609begin s:=scanner_status; w:=warning_index; d:=def_ref;
30610scanner_status:=absorbing; warning_index:=cur_cs;
30611def_ref:=get_avail; token_ref_count(def_ref):=null; p:=def_ref;
30612scan_left_brace; {remove the compulsory left brace}
30613unbalance:=1;
30614loop@+  begin get_token;
30615  if cur_tok<right_brace_limit then
30616    if cur_cmd<right_brace then incr(unbalance)
30617    else  begin decr(unbalance);
30618      if unbalance=0 then goto found;
30619      end;
30620  store_new_token(cur_tok);
30621  end;
30622found: q:=link(def_ref); free_avail(def_ref); {discard reference count}
30623if q=null then cur_val:=temp_head @+ else cur_val:=p;
30624link(temp_head):=q;
30625scanner_status:=s; warning_index:=w; def_ref:=d;
30626end;
30627
30628@ The \.{\\showtokens} command displays a token list.
30629
30630@d show_tokens=5 { \.{\\showtokens} , must be odd! }
30631
30632@<Generate all \eTeX...@>=
30633primitive("showtokens",xray,show_tokens);
30634@!@:show_tokens_}{\.{\\showtokens} primitive@>
30635
30636@ @<Cases of |xray| for |print_cmd_chr|@>=
30637show_tokens:print_esc("showtokens");
30638
30639@ The \.{\\unexpanded} primitive prevents expansion of tokens much as
30640the result from \.{\\the} applied to a token variable.  The
30641\.{\\detokenize} primitive converts a token list into a list of
30642character tokens much as if the token list were written to a file.  We
30643use the fact that the command modifiers for \.{\\unexpanded} and
30644\.{\\detokenize} are odd whereas those for \.{\\the} and \.{\\showthe}
30645are even.
30646
30647@<Generate all \eTeX...@>=
30648primitive("unexpanded",the,1);@/
30649@!@:unexpanded_}{\.{\\unexpanded} primitive@>
30650primitive("detokenize",the,show_tokens);@/
30651@!@:detokenize_}{\.{\\detokenize} primitive@>
30652
30653@ @<Cases of |the| for |print_cmd_chr|@>=
30654else if chr_code=1 then print_esc("unexpanded")
30655else print_esc("detokenize")
30656
30657@ @<Handle \.{\\unexpanded} or \.{\\detokenize} and |return|@>=
30658if odd(cur_chr) then
30659  begin c:=cur_chr; scan_general_text;
30660  if c=1 then the_toks:=cur_val
30661  else begin old_setting:=selector; selector:=new_string; b:=pool_ptr;
30662    p:=get_avail; link(p):=link(temp_head);
30663    token_show(p); flush_list(p);
30664    selector:=old_setting; the_toks:=str_toks(b);
30665    end;
30666  return;
30667  end
30668
30669@ The \.{\\showifs} command displays all currently active conditionals.
30670
30671@d show_ifs=6 { \.{\\showifs} }
30672
30673@<Generate all \eTeX...@>=
30674primitive("showifs",xray,show_ifs);
30675@!@:show_ifs_}{\.{\\showifs} primitive@>
30676
30677@ @<Cases of |xray| for |print_cmd_chr|@>=
30678show_ifs:print_esc("showifs");
30679
30680@
30681@d print_if_line(#)==if #<>0 then
30682  begin print(" entered on line "); print_int(#);
30683  end
30684
30685@<Cases for |show_whatever|@>=
30686show_ifs: begin begin_diagnostic; print_nl(""); print_ln;
30687  if cond_ptr=null then
30688    begin print_nl("### "); print("no active conditionals");
30689    end
30690  else  begin p:=cond_ptr; n:=0;
30691    repeat incr(n); p:=link(p);@+until p=null;
30692    p:=cond_ptr; t:=cur_if; l:=if_line; m:=if_limit;
30693    repeat print_nl("### level "); print_int(n); print(": ");
30694    print_cmd_chr(if_test,t);
30695    if m=fi_code then print_esc("else");
30696    print_if_line(l);
30697    decr(n); t:=subtype(p); l:=if_line_field(p); m:=type(p); p:=link(p);
30698    until p=null;
30699    end;
30700  end;
30701
30702@ The \.{\\interactionmode} primitive allows to query and set the
30703interaction mode.
30704
30705@<Generate all \eTeX...@>=
30706primitive("interactionmode",set_page_int,2);
30707@!@:interaction_mode_}{\.{\\interactionmode} primitive@>
30708
30709@ @<Cases of |set_page_int| for |print_cmd_chr|@>=
30710else if chr_code=2 then print_esc("interactionmode")
30711
30712@ @<Cases for `Fetch the |dead_cycles| or the |insert_penalties|'@>=
30713else if m=2 then cur_val:=interaction
30714
30715@ @<Declare \eTeX\ procedures for use...@>=
30716procedure@?new_interaction; forward;@t\2@>
30717
30718@ @<Cases for |alter_integer|@>=
30719else if c=2 then
30720  begin if (cur_val<batch_mode)or(cur_val>error_stop_mode) then
30721    begin print_err("Bad interaction mode");
30722@.Bad interaction mode@>
30723    help2("Modes are 0=batch, 1=nonstop, 2=scroll, and")@/
30724    ("3=errorstop. Proceed, and I'll ignore this case.");
30725    int_error(cur_val);
30726    end
30727  else  begin cur_chr:=cur_val; new_interaction;
30728    end;
30729  end
30730
30731@ The |middle| feature of \eTeX\ allows one ore several \.{\\middle}
30732delimiters to appear between \.{\\left} and \.{\\right}.
30733
30734@<Generate all \eTeX...@>=
30735primitive("middle",left_right,middle_noad);
30736@!@:middle_}{\.{\\middle} primitive@>
30737
30738@ @<Cases of |left_right| for |print_cmd_chr|@>=
30739else if chr_code=middle_noad then print_esc("middle")
30740
30741@ In constructions such as
30742$$\vbox{\halign{\.{#}\hfil\cr
30743{}\\hbox to \\hsize\{\cr
30744\hskip 25pt \\hskip 0pt plus 0.0001fil\cr
30745\hskip 25pt ...\cr
30746\hskip 25pt \\hfil\\penalty-200\\hfilneg\cr
30747\hskip 25pt ...\}\cr}}$$
30748the stretch components of \.{\\hfil} and \.{\\hfilneg} compensate; they may,
30749however, get modified in order to prevent arithmetic overflow during
30750|hlist_out| when each of them is multiplied by a large |glue_set| value.
30751
30752Since this ``glue rounding'' depends on state variables |cur_g| and
30753|cur_glue| and \TeXXeT\ is supposed to emulate the behaviour of \TeXeT\
30754(plus a suitable postprocessor) as close as possible the glue rounding
30755cannot be postponed until (segments of) an hlist has been reversed.
30756
30757The code below is invoked after the effective width, |rule_wd|, of a glue
30758node has been computed. The glue node is either converted into a kern node
30759or, for leaders, the glue specification is replaced by an equivalent rigid
30760one; the subtype of the glue node remains unchanged.
30761
30762@<Handle a glue node for mixed...@>=
30763if (((g_sign=stretching) and (stretch_order(g)=g_order)) or
30764    ((g_sign=shrinking) and (shrink_order(g)=g_order))) then
30765  begin fast_delete_glue_ref(g);
30766  if subtype(p)<a_leaders then
30767    begin type(p):=kern_node; width(p):=rule_wd;
30768    end
30769  else  begin g:=get_node(glue_spec_size);@/
30770    stretch_order(g):=filll+1; shrink_order(g):=filll+1; {will never match}
30771    width(g):=rule_wd; stretch(g):=0; shrink(g):=0; glue_ptr(p):=g;
30772    end;
30773  end
30774
30775@ The optional |TeXXeT| feature of \eTeX\ contains the code for mixed
30776left-to-right and right-to-left typesetting.  This code is inspired by
30777but different from \TeXeT\ as presented by Donald~E. Knuth and Pierre
30778MacKay in {\sl TUGboat\/} {\bf 8}, 14--25, 1987.
30779@^Knuth, Donald Ervin@>
30780@^MacKay, Pierre@>
30781
30782In order to avoid confusion with \TeXeT\ the present implementation of
30783mixed direction typesetting is called \TeXXeT.  It differs from \TeXeT\
30784in several important aspects:  (1)~Right-to-left text is reversed
30785explicitly by the |ship_out| routine and is written to a normal \.{DVI}
30786file without any |begin_reflect| or |end_reflect| commands; (2)~a
30787|math_node| is (ab)used instead of a |whatsit_node| to record the
30788\.{\\beginL}, \.{\\endL}, \.{\\beginR}, and \.{\\endR} text direction
30789primitives in order to keep the influence on the line breaking algorithm
30790for pure left-to-right text as small as possible; (3)~right-to-left text
30791interrupted by a displayed equation is automatically resumed after that
30792equation; and (4)~the |valign| command code with a non-zero command
30793modifier is (ab)used for the text direction primitives.
30794
30795Nevertheless there is a subtle difference between \TeX\ and \TeXXeT\
30796that may influence the line breaking algorithm for pure left-to-right
30797text.  When a paragraph containing math mode material is broken into
30798lines \TeX\ may generate lines where math mode material is not enclosed
30799by properly nested \.{\\mathon} and \.{\\mathoff} nodes.  Unboxing such
30800lines as part of a new paragraph may have the effect that hyphenation is
30801attempted for `words' originating from math mode or that hyphenation is
30802inhibited for words originating from horizontal mode.
30803
30804In \TeXXeT\ additional \.{\\beginM}, resp.\ \.{\\endM} math nodes are
30805supplied at the start, resp.\ end of lines such that math mode material
30806inside a horizontal list always starts with either \.{\\mathon} or
30807\.{\\beginM} and ends with \.{\\mathoff} or \.{\\endM}.  These
30808additional nodes are transparent to operations such as \.{\\unskip},
30809\.{\\lastpenalty}, or \.{\\lastbox} but they do have the effect that
30810hyphenation is never attempted for `words' originating from math mode
30811and is never inhibited for words originating from horizontal mode.
30812
30813@d TeXXeT_state==eTeX_state(TeXXeT_code)
30814@d TeXXeT_en==(TeXXeT_state>0) {is \TeXXeT\ enabled?}
30815
30816@d XeTeX_upwards_state==eTeX_state(XeTeX_upwards_code)
30817@d XeTeX_upwards==(XeTeX_upwards_state>0)
30818
30819@d XeTeX_use_glyph_metrics_state==eTeX_state(XeTeX_use_glyph_metrics_code)
30820@d XeTeX_use_glyph_metrics==(XeTeX_use_glyph_metrics_state>0)
30821
30822@d XeTeX_inter_char_tokens_state==eTeX_state(XeTeX_inter_char_tokens_code)
30823@d XeTeX_inter_char_tokens_en==(XeTeX_inter_char_tokens_state>0)
30824
30825@d XeTeX_dash_break_state == eTeX_state(XeTeX_dash_break_code)
30826@d XeTeX_dash_break_en == (XeTeX_dash_break_state>0)
30827
30828@d XeTeX_input_normalization_state == eTeX_state(XeTeX_input_normalization_code)
30829@d XeTeX_tracing_fonts_state == eTeX_state(XeTeX_tracing_fonts_code)
30830
30831@d XeTeX_default_input_mode == eTeX_state(XeTeX_default_input_mode_code)
30832@d XeTeX_default_input_encoding == eTeX_state(XeTeX_default_input_encoding_code)
30833
30834@<Cases for |print_param|@>=
30835suppress_fontnotfound_error_code:print_esc("suppressfontnotfounderror");
30836eTeX_state_code+TeXXeT_code:print_esc("TeXXeTstate");
30837eTeX_state_code+XeTeX_upwards_code:print_esc("XeTeXupwardsmode");
30838eTeX_state_code+XeTeX_use_glyph_metrics_code:print_esc("XeTeXuseglyphmetrics");
30839eTeX_state_code+XeTeX_inter_char_tokens_code:print_esc("XeTeXinterchartokenstate");
30840eTeX_state_code+XeTeX_dash_break_code:print_esc("XeTeXdashbreakstate");
30841eTeX_state_code+XeTeX_input_normalization_code:print_esc("XeTeXinputnormalization");
30842eTeX_state_code+XeTeX_tracing_fonts_code:print_esc("XeTeXtracingfonts");
30843
30844@ @<Generate all \eTeX...@>=
30845primitive("suppressfontnotfounderror",assign_int,int_base+suppress_fontnotfound_error_code);@/
30846primitive("TeXXeTstate",assign_int,eTeX_state_base+TeXXeT_code);
30847@!@:TeXXeT_state_}{\.{\\TeXXeT_state} primitive@>
30848primitive("XeTeXupwardsmode",assign_int,eTeX_state_base+XeTeX_upwards_code);
30849@!@:XeTeX_upwards_mode_}{\.{\\XeTeXupwardsmode} primitive@>
30850primitive("XeTeXuseglyphmetrics",assign_int,eTeX_state_base+XeTeX_use_glyph_metrics_code);
30851@!@:XeTeX_use_glyph_metrics_}{\.{\\XeTeXuseglyphmetrics} primitive@>
30852primitive("XeTeXinterchartokenstate",assign_int,eTeX_state_base+XeTeX_inter_char_tokens_code);
30853@!@:XeTeX_use_inter_char_tokens_}{\.{\\XeTeXinterchartokenstate} primitive@>
30854
30855primitive("XeTeXdashbreakstate",assign_int,eTeX_state_base+XeTeX_dash_break_code);
30856@!@:XeTeX_dash_break_state_}{\.{\\XeTeXdashbreakstate} primitive@>
30857
30858primitive("XeTeXinputnormalization",assign_int,eTeX_state_base+XeTeX_input_normalization_code);
30859@!@:XeTeX_input_normalization_}{\.{\\XeTeXinputnormalization} primitive@>
30860
30861primitive("XeTeXtracingfonts",assign_int,eTeX_state_base+XeTeX_tracing_fonts_code);
30862
30863primitive("XeTeXinputencoding",extension,XeTeX_input_encoding_extension_code);
30864primitive("XeTeXdefaultencoding",extension,XeTeX_default_encoding_extension_code);
30865primitive("beginL",valign,begin_L_code);
30866@!@:beginL_}{\.{\\beginL} primitive@>
30867primitive("endL",valign,end_L_code);
30868@!@:endL_}{\.{\\endL} primitive@>
30869primitive("beginR",valign,begin_R_code);
30870@!@:beginR_}{\.{\\beginR} primitive@>
30871primitive("endR",valign,end_R_code);
30872@!@:endR_}{\.{\\endR} primitive@>
30873
30874@ @<Cases of |valign| for |print_cmd_chr|@>=
30875else case chr_code of
30876  begin_L_code: print_esc("beginL");
30877  end_L_code: print_esc("endL");
30878  begin_R_code: print_esc("beginR");
30879  othercases print_esc("endR")
30880  endcases
30881
30882@ @<Cases of |main_control| for |hmode+valign|@>=
30883if cur_chr>0 then
30884  begin if eTeX_enabled(TeXXeT_en,cur_cmd,cur_chr) then
30885@.Improper \\beginL@>
30886@.Improper \\endL@>
30887@.Improper \\beginR@>
30888@.Improper \\endR@>
30889    tail_append(new_math(0,cur_chr));
30890  end
30891else
30892
30893@ An hbox with subtype dlist will never be reversed, even when embedded
30894in right-to-left text.
30895
30896@<Display if this box is never to be reversed@>=
30897if (type(p)=hlist_node)and(box_lr(p)=dlist) then print(", display")
30898
30899@ A number of routines are based on a stack of one-word nodes whose
30900|info| fields contain |end_M_code|, |end_L_code|, or |end_R_code|.  The
30901top of the stack is pointed to by |LR_ptr|.
30902
30903When the stack manipulation macros of this section are used below,
30904variable |LR_ptr| might be the global variable declared here for |hpack|
30905and |ship_out|, or might be local to |post_line_break|.
30906
30907@d put_LR(#)==begin temp_ptr:=get_avail; info(temp_ptr):=#;
30908  link(temp_ptr):=LR_ptr; LR_ptr:=temp_ptr;
30909  end
30910@#
30911@d push_LR(#)==put_LR(end_LR_type(#))
30912@#
30913@d pop_LR==begin temp_ptr:=LR_ptr; LR_ptr:=link(temp_ptr);
30914  free_avail(temp_ptr);
30915  end
30916
30917@<Glob...@>=
30918@!LR_ptr:pointer; {stack of LR codes for |hpack|, |ship_out|, and |init_math|}
30919@!LR_problems:integer; {counts missing begins and ends}
30920@!cur_dir:small_number; {current text direction}
30921
30922@ @<Set init...@>=
30923LR_ptr:=null; LR_problems:=0; cur_dir:=left_to_right;
30924
30925@ @<Insert LR nodes at the beg...@>=
30926begin q:=link(temp_head);
30927if LR_ptr<>null then
30928  begin temp_ptr:=LR_ptr; r:=q;
30929  repeat s:=new_math(0,begin_LR_type(info(temp_ptr))); link(s):=r; r:=s;
30930  temp_ptr:=link(temp_ptr);
30931  until temp_ptr=null;
30932  link(temp_head):=r;
30933  end;
30934while q<>cur_break(cur_p) do
30935  begin if not is_char_node(q) then
30936    if type(q)=math_node then @<Adjust \(t)the LR stack for the |p...@>;
30937  q:=link(q);
30938  end;
30939end
30940
30941@ @<Adjust \(t)the LR stack for the |p...@>=
30942if end_LR(q) then
30943  begin if LR_ptr<>null then if info(LR_ptr)=end_LR_type(q) then pop_LR;
30944  end
30945else push_LR(q)
30946
30947@ We use the fact that |q| now points to the node with \.{\\rightskip} glue.
30948
30949@<Insert LR nodes at the end...@>=
30950if LR_ptr<>null then
30951  begin s:=temp_head; r:=link(s);
30952  while r<>q do
30953    begin s:=r; r:=link(s);
30954    end;
30955  r:=LR_ptr;
30956  while r<>null do
30957    begin temp_ptr:=new_math(0,info(r));
30958    link(s):=temp_ptr; s:=temp_ptr; r:=link(r);
30959    end;
30960  link(s):=q;
30961  end
30962
30963@ @<Initialize the LR stack@>=
30964put_LR(before) {this will never match}
30965
30966@ @<Adjust \(t)the LR stack for the |hp...@>=
30967if end_LR(p) then
30968  if info(LR_ptr)=end_LR_type(p) then pop_LR
30969  else  begin incr(LR_problems); type(p):=kern_node; subtype(p):=explicit;
30970    end
30971else push_LR(p)
30972
30973@ @<Check for LR anomalies at the end of |hp...@>=
30974begin if info(LR_ptr)<>before then
30975  begin while link(q)<>null do q:=link(q);
30976  repeat temp_ptr:=q; q:=new_math(0,info(LR_ptr)); link(temp_ptr):=q;
30977  LR_problems:=LR_problems+10000; pop_LR;
30978  until info(LR_ptr)=before;
30979  end;
30980if LR_problems>0 then
30981  begin @<Report LR problems@>; goto common_ending;
30982  end;
30983pop_LR;
30984if LR_ptr<>null then confusion("LR1");
30985@:this can't happen LR1}{\quad LR1@>
30986end
30987
30988@ @<Report LR problems@>=
30989begin print_ln; print_nl("\endL or \endR problem (");@/
30990print_int(LR_problems div 10000); print(" missing, ");@/
30991print_int(LR_problems mod 10000); print(" extra");@/
30992LR_problems:=0;
30993end
30994
30995@ @<Initialize |hlist_out| for mixed...@>=
30996if eTeX_ex then
30997  begin @<Initialize the LR stack@>;
30998  if box_lr(this_box)=dlist then
30999    if cur_dir=right_to_left then
31000      begin cur_dir:=left_to_right; cur_h:=cur_h-width(this_box);
31001      end
31002    else set_box_lr(this_box)(0);
31003  if (cur_dir=right_to_left)and(box_lr(this_box)<>reversed) then
31004    @<Reverse the complete hlist and set the subtype to |reversed|@>;
31005  end
31006
31007@ @<Finish |hlist_out| for mixed...@>=
31008if eTeX_ex then
31009  begin @<Check for LR anomalies at the end of |hlist_out|@>;
31010  if box_lr(this_box)=dlist then cur_dir:=right_to_left;
31011  end
31012
31013@ @<Handle a math node in |hlist_out|@>=
31014begin if eTeX_ex then
31015    @<Adjust \(t)the LR stack for the |hlist_out| routine; if necessary
31016      reverse an hlist segment and |goto reswitch|@>;
31017  cur_h:=cur_h+width(p);
31018  end
31019
31020@ Breaking a paragraph into lines while \TeXXeT\ is disabled may result
31021in lines whith unpaired math nodes.  Such hlists are silently accepted
31022in the absence of text direction directives.
31023
31024@d LR_dir(#)==(subtype(#) div R_code) {text direction of a `math node'}
31025
31026@<Adjust \(t)the LR stack for the |hl...@>=
31027begin if end_LR(p) then
31028  if info(LR_ptr)=end_LR_type(p) then pop_LR
31029  else  begin if subtype(p)>L_code then incr(LR_problems);
31030    end
31031else  begin push_LR(p);
31032  if LR_dir(p)<>cur_dir then
31033    @<Reverse an hlist segment and |goto reswitch|@>;
31034  end;
31035type(p):=kern_node;
31036end
31037
31038@ @<Check for LR anomalies at the end of |hl...@>=
31039begin while info(LR_ptr)<>before do
31040  begin if info(LR_ptr)>L_code then LR_problems:=LR_problems+10000;
31041  pop_LR;
31042  end;
31043pop_LR;
31044end
31045
31046@ @d edge_node=style_node {a |style_node| does not occur in hlists}
31047@d edge_node_size=style_node_size {number of words in an edge node}
31048@d edge_dist(#)==depth(#) {new |left_edge| position relative to |cur_h|
31049   (after |width| has been taken into account)}
31050
31051@<Declare procedures needed in |hlist_out|, |vlist_out|@>=
31052function new_edge(@!s:small_number;@!w:scaled):pointer;
31053  {create an edge node}
31054var p:pointer; {the new node}
31055begin p:=get_node(edge_node_size); type(p):=edge_node; subtype(p):=s;
31056width(p):=w; edge_dist(p):=0; {the |edge_dist| field will be set later}
31057new_edge:=p;
31058end;
31059
31060@ @<Cases of |hlist_out| that arise...@>=
31061edge_node: begin cur_h:=cur_h+width(p);
31062  left_edge:=cur_h+edge_dist(p); cur_dir:=subtype(p);
31063  end;
31064
31065@ We detach the hlist, start a new one consisting of just one kern node,
31066append the reversed list, and set the width of the kern node.
31067
31068@<Reverse the complete hlist...@>=
31069begin save_h:=cur_h; temp_ptr:=p; p:=new_kern(0); link(prev_p):=p;
31070cur_h:=0; link(p):=reverse(this_box,null,cur_g,cur_glue); width(p):=-cur_h;
31071cur_h:=save_h; set_box_lr(this_box)(reversed);
31072end
31073
31074@ We detach the remainder of the hlist, replace the math node by
31075an edge node, and append the reversed hlist segment to it; the tail of
31076the reversed segment is another edge node and the remainder of the
31077original list is attached to it.
31078
31079@<Reverse an hlist segment...@>=
31080begin save_h:=cur_h; temp_ptr:=link(p); rule_wd:=width(p);
31081free_node(p,small_node_size);
31082cur_dir:=reflected; p:=new_edge(cur_dir,rule_wd); link(prev_p):=p;
31083cur_h:=cur_h-left_edge+rule_wd;
31084link(p):=reverse(this_box,new_edge(reflected,0),cur_g,cur_glue);
31085edge_dist(p):=cur_h; cur_dir:=reflected; cur_h:=save_h;
31086goto reswitch;
31087end
31088
31089@ The |reverse| function defined here is responsible to reverse the
31090nodes of an hlist (segment). The first parameter |this_box| is the enclosing
31091hlist node, the second parameter |t| is to become the tail of the reversed
31092list, and the global variable |temp_ptr| is the head of the list to be
31093reversed. Finally |cur_g| and |cur_glue| are the current glue rounding state
31094variables, to be updated by this function. We remove nodes from the original
31095list and add them to the head of the new one.
31096
31097@<Declare procedures needed in |hlist_out|, |vlist_out|@>=
31098function reverse(@!this_box,@!t:pointer; var cur_g:scaled;
31099  var cur_glue:real):pointer;
31100label reswitch,next_p,done;
31101var l:pointer; {the new list}
31102@!p:pointer; {the current node}
31103@!q:pointer; {the next node}
31104@!g_order: glue_ord; {applicable order of infinity for glue}
31105@!g_sign: normal..shrinking; {selects type of glue}
31106@!glue_temp:real; {glue value before rounding}
31107@!m,@!n:halfword; {count of unmatched math nodes}
31108begin g_order:=glue_order(this_box); g_sign:=glue_sign(this_box);
31109l:=t; p:=temp_ptr; m:=min_halfword; n:=min_halfword;
31110loop@+  begin while p<>null do
31111    @<Move node |p| to the new list and go to the next node;
31112    or |goto done| if the end of the reflected segment has been reached@>;
31113  if (t=null)and(m=min_halfword)and(n=min_halfword) then goto done;
31114  p:=new_math(0,info(LR_ptr)); LR_problems:=LR_problems+10000;
31115    {manufacture one missing math node}
31116  end;
31117done:reverse:=l;
31118end;
31119
31120@ @<Move node |p| to the new list...@>=
31121reswitch: if is_char_node(p) then
31122  repeat f:=font(p); c:=character(p);
31123  cur_h:=cur_h+char_width(f)(char_info(f)(c));
31124  q:=link(p); link(p):=l; l:=p; p:=q;
31125  until not is_char_node(p)
31126else @<Move the non-|char_node| |p| to the new list@>
31127
31128@ @<Move the non-|char_node| |p| to the new list@>=
31129begin q:=link(p);
31130case type(p) of
31131hlist_node,vlist_node,rule_node,kern_node: rule_wd:=width(p);
31132@t\4@>@<Cases of |reverse| that need special treatment@>@;
31133edge_node: confusion("LR2");
31134@:this can't happen LR2}{\quad LR2@>
31135othercases goto next_p
31136endcases;@/
31137cur_h:=cur_h+rule_wd;
31138next_p: link(p):=l;
31139if type(p)=kern_node then if (rule_wd=0)or(l=null) then
31140  begin free_node(p,small_node_size); p:=l;
31141  end;
31142l:=p; p:=q;
31143end
31144
31145@ Need to measure |native_word| and picture nodes when reversing!
31146@<Cases of |reverse|...@>=
31147whatsit_node:
31148  if (subtype(p)=native_word_node)
31149  or (subtype(p)=glyph_node)
31150  or (subtype(p)=pic_node)
31151  or (subtype(p)=pdf_node)
31152  then
31153    rule_wd:=width(p)
31154  else
31155    goto next_p;
31156
31157@ Here we compute the effective width of a glue node as in |hlist_out|.
31158
31159@<Cases of |reverse|...@>=
31160glue_node: begin round_glue;
31161  @<Handle a glue node for mixed...@>;
31162  end;
31163
31164@ A ligature node is replaced by a char node.
31165
31166@<Cases of |reverse|...@>=
31167ligature_node: begin flush_node_list(lig_ptr(p));
31168  temp_ptr:=p; p:=get_avail; mem[p]:=mem[lig_char(temp_ptr)]; link(p):=q;
31169  free_node(temp_ptr,small_node_size); goto reswitch;
31170  end;
31171
31172@ Math nodes in an inner reflected segment are modified, those at the
31173outer level are changed into kern nodes.
31174
31175@<Cases of |reverse|...@>=
31176math_node: begin rule_wd:=width(p);
31177if end_LR(p) then
31178  if info(LR_ptr)<>end_LR_type(p) then
31179    begin type(p):=kern_node; incr(LR_problems);
31180    end
31181  else  begin pop_LR;
31182    if n>min_halfword then
31183      begin decr(n); decr(subtype(p)); {change |after| into |before|}
31184      end
31185    else  begin type(p):=kern_node;
31186      if m>min_halfword then decr(m)
31187      else @<Finish the reversed hlist segment and |goto done|@>;
31188      end;
31189    end
31190else  begin push_LR(p);
31191  if (n>min_halfword)or(LR_dir(p)<>cur_dir) then
31192    begin incr(n); incr(subtype(p)); {change |before| into |after|}
31193    end
31194  else  begin type(p):=kern_node; incr(m);
31195    end;
31196  end;
31197end;
31198
31199@ Finally we have found the end of the hlist segment to be reversed; the
31200final math node is released and the remaining list attached to the
31201edge node terminating the reversed segment.
31202
31203@<Finish the reversed...@>=
31204begin free_node(p,small_node_size);
31205link(t):=q; width(t):=rule_wd; edge_dist(t):=-cur_h-rule_wd; goto done;
31206end
31207
31208@ @<Check for LR anomalies at the end of |s...@>=
31209begin if LR_problems>0 then
31210  begin @<Report LR problems@>; print_char(")"); print_ln;
31211  end;
31212if (LR_ptr<>null)or(cur_dir<>left_to_right) then confusion("LR3");
31213@:this can't happen LR3}{\quad LR3@>
31214end
31215
31216@ Some special actions are required for displayed equation in paragraphs
31217with mixed direction texts.  First of all we have to set the text
31218direction preceding the display.
31219
31220@<Set the value of |x| to the text direction before the display@>=
31221if LR_save=null then x:=0
31222else if info(LR_save)>=R_code then x:=-1@+else x:=1
31223
31224@ @<Prepare for display after an empty...@>=
31225begin pop_nest; @<Set the value of |x|...@>;
31226end
31227
31228@ When calculating the natural width, |w|, of the final line preceding
31229the display, we may have to copy all or part of its hlist.  We copy,
31230however, only those parts of the original list that are relevant for the
31231computation of |pre_display_size|.
31232@^data structure assumptions@>
31233
31234@<Declare subprocedures for |init_math|@>=
31235procedure just_copy(@!p,@!h,@!t:pointer);
31236label found,not_found;
31237var @!r:pointer; {current node being fabricated for new list}
31238@!words:0..5; {number of words remaining to be copied}
31239begin while p<>null do
31240  begin words:=1; {this setting occurs in more branches than any other}
31241  if is_char_node(p) then r:=get_avail
31242  else case type(p) of
31243  hlist_node,vlist_node: begin r:=get_node(box_node_size);
31244    mem[r+6]:=mem[p+6]; mem[r+5]:=mem[p+5]; {copy the last two words}
31245    words:=5; list_ptr(r):=null; {this affects |mem[r+5]|}
31246    end;
31247  rule_node: begin r:=get_node(rule_node_size); words:=rule_node_size;
31248    end;
31249  ligature_node: begin r:=get_avail; {only |font| and |character| are needed}
31250    mem[r]:=mem[lig_char(p)]; goto found;
31251    end;
31252  kern_node,math_node: begin r:=get_node(small_node_size);
31253    words:=small_node_size;
31254    end;
31255  glue_node: begin r:=get_node(small_node_size); add_glue_ref(glue_ptr(p));
31256    glue_ptr(r):=glue_ptr(p); leader_ptr(r):=null;
31257    end;
31258  whatsit_node:@<Make a partial copy of the whatsit...@>;
31259  othercases goto not_found
31260  endcases;
31261  while words>0 do
31262    begin decr(words); mem[r+words]:=mem[p+words];
31263    end;
31264  found: link(h):=r; h:=r;
31265  not_found: p:=link(p);
31266  end;
31267link(h):=t;
31268end;
31269
31270@ When the final line ends with R-text, the value |w| refers to the line
31271reflected with respect to the left edge of the enclosing vertical list.
31272
31273@<Prepare for display after a non-empty...@>=
31274if eTeX_ex then @<Let |j| be the prototype box for the display@>;
31275v:=shift_amount(just_box);
31276@<Set the value of |x|...@>;
31277if x>=0 then
31278  begin p:=list_ptr(just_box); link(temp_head):=null;
31279  end
31280else  begin v:=-v-width(just_box);
31281  p:=new_math(0,begin_L_code); link(temp_head):=p;
31282  just_copy(list_ptr(just_box),p,new_math(0,end_L_code));
31283  cur_dir:=right_to_left;
31284  end;
31285v:=v+2*quad(cur_font);
31286if TeXXeT_en then @<Initialize the LR stack@>
31287
31288@ @<Finish the natural width computation@>=
31289if TeXXeT_en then
31290  begin while LR_ptr<>null do pop_LR;
31291  if LR_problems<>0 then
31292    begin w:=max_dimen; LR_problems:=0;
31293    end;
31294  end;
31295cur_dir:=left_to_right; flush_node_list(link(temp_head))
31296
31297@ In the presence of text direction directives we assume that any LR
31298problems have been fixed by the |hpack| routine.  If the final line
31299contains, however, text direction directives while \TeXXeT\ is disabled,
31300then we set |w:=max_dimen|.
31301
31302@<Cases of `Let |d| be the natural...@>=
31303math_node: begin d:=width(p);
31304  if TeXXeT_en then @<Adjust \(t)the LR stack for the |init_math| routine@>
31305  else if subtype(p)>=L_code then
31306    begin w:=max_dimen; goto done;
31307    end;
31308  end;
31309edge_node: begin d:=width(p); cur_dir:=subtype(p);
31310  end;
31311
31312@ @<Adjust \(t)the LR stack for the |i...@>=
31313if end_LR(p) then
31314  begin if info(LR_ptr)=end_LR_type(p) then pop_LR
31315  else if subtype(p)>L_code then
31316    begin w:=max_dimen; goto done;
31317    end
31318  end
31319else  begin push_LR(p);
31320  if LR_dir(p)<>cur_dir then
31321    begin just_reverse(p); p:=temp_head;
31322    end;
31323  end
31324
31325@ @<Declare subprocedures for |init_math|@>=
31326procedure just_reverse(@!p:pointer);
31327label found,done;
31328var l:pointer; {the new list}
31329@!t:pointer; {tail of reversed segment}
31330@!q:pointer; {the next node}
31331@!m,@!n:halfword; {count of unmatched math nodes}
31332begin m:=min_halfword; n:=min_halfword;
31333if link(temp_head)=null then
31334  begin just_copy(link(p),temp_head,null); q:=link(temp_head);
31335  end
31336else  begin q:=link(p); link(p):=null; flush_node_list(link(temp_head));
31337  end;
31338t:=new_edge(cur_dir,0); l:=t; cur_dir:=reflected;
31339while q<>null do
31340  if is_char_node(q) then
31341    repeat p:=q; q:=link(p); link(p):=l; l:=p;
31342    until not is_char_node(q)
31343  else  begin p:=q; q:=link(p);
31344    if type(p)=math_node then
31345      @<Adjust \(t)the LR stack for the |just_reverse| routine@>;
31346    link(p):=l; l:=p;
31347    end;
31348goto done;
31349found:width(t):=width(p); link(t):=q; free_node(p,small_node_size);
31350done:link(temp_head):=l;
31351end;
31352
31353@ @<Adjust \(t)the LR stack for the |j...@>=
31354if end_LR(p) then
31355  if info(LR_ptr)<>end_LR_type(p) then
31356    begin type(p):=kern_node; incr(LR_problems);
31357    end
31358  else  begin pop_LR;
31359    if n>min_halfword then
31360      begin decr(n); decr(subtype(p)); {change |after| into |before|}
31361      end
31362    else  begin if m>min_halfword then decr(m)@+else goto found;
31363      type(p):=kern_node;
31364      end;
31365    end
31366else  begin push_LR(p);
31367  if (n>min_halfword)or(LR_dir(p)<>cur_dir) then
31368    begin incr(n); incr(subtype(p)); {change |before| into |after|}
31369    end
31370  else  begin type(p):=kern_node; incr(m);
31371    end;
31372  end
31373
31374@ The prototype box is an hlist node with the width, glue set, and shift
31375amount of |just_box|, i.e., the last line preceding the display.  Its
31376hlist reflects the current \.{\\leftskip} and \.{\\rightskip}.
31377
31378@<Let |j| be the prototype box for the display@>=
31379begin if right_skip=zero_glue then j:=new_kern(0)
31380else j:=new_param_glue(right_skip_code);
31381if left_skip=zero_glue then p:=new_kern(0)
31382else p:=new_param_glue(left_skip_code);
31383link(p):=j; j:=new_null_box; width(j):=width(just_box);
31384shift_amount(j):=shift_amount(just_box); list_ptr(j):=p;
31385glue_order(j):=glue_order(just_box); glue_sign(j):=glue_sign(just_box);
31386glue_set(j):=glue_set(just_box);
31387end
31388
31389@ At the end of a displayed equation we retrieve the prototype box.
31390
31391@<Local variables for finishing...@>=
31392@!j:pointer; {prototype box}
31393
31394@ @<Retrieve the prototype box@>=
31395if mode=mmode then j:=LR_box
31396
31397@ @<Flush the prototype box@>=
31398flush_node_list(j)
31399
31400@ The |app_display| procedure used to append the displayed equation
31401and\slash or equation number to the current vertical list has three
31402parameters:  the prototype box, the hbox to be appended, and the
31403displacement of the hbox in the display line.
31404
31405@<Declare subprocedures for |after_math|@>=
31406procedure app_display(@!j,@!b:pointer;@!d:scaled);
31407var z:scaled; {width of the line}
31408@!s:scaled; {move the line right this much}
31409@!e:scaled; {distance from right edge of box to end of line}
31410@!x:integer; {|pre_display_direction|}
31411@!p,@!q,@!r,@!t,@!u:pointer; {for list manipulation}
31412begin s:=display_indent; x:=pre_display_direction;
31413if x=0 then shift_amount(b):=s+d
31414else  begin z:=display_width; p:=b;
31415  @<Set up the hlist for the display line@>;
31416  @<Package the display line@>;
31417  end;
31418append_to_vlist(b);
31419end;
31420
31421@ Here we construct the hlist for the display, starting with node |p|
31422and ending with node |q|. We also set |d| and |e| to the amount of
31423kerning to be added before and after the hlist (adjusted for the
31424prototype box).
31425
31426@<Set up the hlist for the display line@>=
31427if x>0 then e:=z-d-width(p)
31428else  begin e:=d; d:=z-e-width(p);
31429  end;
31430if j<>null then
31431  begin b:=copy_node_list(j); height(b):=height(p); depth(b):=depth(p);
31432  s:=s-shift_amount(b); d:=d+s; e:=e+width(b)-z-s;
31433  end;
31434if box_lr(p)=dlist then q:=p {display or equation number}
31435else  begin {display and equation number}
31436  r:=list_ptr(p); free_node(p,box_node_size);
31437  if r=null then confusion("LR4");
31438  if x>0 then
31439    begin p:=r;
31440    repeat q:=r; r:=link(r); {find tail of list}
31441    until r=null;
31442    end
31443  else  begin p:=null; q:=r;
31444    repeat t:=link(r); link(r):=p; p:=r; r:=t; {reverse list}
31445    until r=null;
31446    end;
31447  end
31448
31449@ In the presence of a prototype box we use its shift amount and width
31450to adjust the values of kerning and add these values to the glue nodes
31451inserted to cancel the \.{\\leftskip} and \.{\\rightskip}.  If there is
31452no prototype box (because the display is preceded by an empty
31453paragraph), or if the skip parameters are zero, we just add kerns.
31454
31455The |cancel_glue| macro creates and links a glue node that is, together
31456with another glue node, equivalent to a given amount of kerning.  We can
31457use |j| as temporary pointer, since all we need is |j<>null|.
31458
31459@d cancel_glue(#)==j:=new_skip_param(#); cancel_glue_cont
31460@d cancel_glue_cont(#)==link(#):=j; cancel_glue_cont_cont
31461@d cancel_glue_cont_cont(#)==link(j):=#; cancel_glue_end
31462@d cancel_glue_end(#)==j:=glue_ptr(#); cancel_glue_end_end
31463@d cancel_glue_end_end(#)==
31464stretch_order(temp_ptr):=stretch_order(j);
31465shrink_order(temp_ptr):=shrink_order(j); width(temp_ptr):=#-width(j);
31466stretch(temp_ptr):=-stretch(j); shrink(temp_ptr):=-shrink(j)
31467
31468@<Package the display line@>=
31469if j=null then
31470  begin r:=new_kern(0); t:=new_kern(0); {the widths will be set later}
31471  end
31472else  begin r:=list_ptr(b); t:=link(r);
31473  end;
31474u:=new_math(0,end_M_code);
31475if type(t)=glue_node then {|t| is \.{\\rightskip} glue}
31476  begin cancel_glue(right_skip_code)(q)(u)(t)(e); link(u):=t;
31477  end
31478else  begin width(t):=e; link(t):=u; link(q):=t;
31479  end;
31480u:=new_math(0,begin_M_code);
31481if type(r)=glue_node then {|r| is \.{\\leftskip} glue}
31482  begin cancel_glue(left_skip_code)(u)(p)(r)(d); link(r):=u;
31483  end
31484else  begin width(r):=d; link(r):=p; link(u):=r;
31485  if j=null then
31486    begin b:=hpack(u,natural); shift_amount(b):=s;
31487    end
31488  else list_ptr(b):=u;
31489  end
31490
31491@ The |scan_tokens| feature of \eTeX\ defines the \.{\\scantokens}
31492primitive.
31493
31494@<Generate all \eTeX...@>=
31495primitive("scantokens",input,2);
31496@!@:scan_tokens_}{\.{\\scantokens} primitive@>
31497
31498@ @<Cases of |input| for |print_cmd_chr|@>=
31499else if chr_code=2 then print_esc("scantokens")
31500
31501@ @<Cases for |input|@>=
31502else if cur_chr=2 then pseudo_start
31503
31504@ The global variable |pseudo_files| is used to maintain a stack of
31505pseudo files.  The |info| field of each pseudo file points to a linked
31506list of variable size nodes representing lines not yet processed: the
31507|info| field of the first word contains the size of this node, all the
31508following words contain ASCII codes.
31509
31510@<Glob...@>=
31511@!pseudo_files:pointer; {stack of pseudo files}
31512
31513@ @<Set init...@>=
31514pseudo_files:=null;
31515
31516@ The |pseudo_start| procedure initiates reading from a pseudo file.
31517
31518@<Declare \eTeX\ procedures for ex...@>=
31519procedure@?pseudo_start; forward;@t\2@>
31520
31521@ @<Declare \eTeX\ procedures for tok...@>=
31522procedure pseudo_start;
31523var old_setting:0..max_selector; {holds |selector| setting}
31524@!s:str_number; {string to be converted into a pseudo file}
31525@!l,@!m:pool_pointer; {indices into |str_pool|}
31526@!p,@!q,@!r:pointer; {for list construction}
31527@!w: four_quarters; {four ASCII codes}
31528@!nl,@!sz:integer;
31529begin scan_general_text;
31530old_setting:=selector; selector:=new_string;
31531token_show(temp_head); selector:=old_setting;
31532flush_list(link(temp_head));
31533str_room(1); s:=make_string;
31534@<Convert string |s| into a new pseudo file@>;
31535flush_string;
31536@<Initiate input from new pseudo file@>;
31537end;
31538
31539@ @<Convert string |s| into a new pseudo file@>=
31540str_pool[pool_ptr]:=si(" "); l:=str_start_macro(s);
31541nl:=si(new_line_char);
31542p:=get_avail; q:=p;
31543while l<pool_ptr do
31544  begin m:=l;
31545  while (l<pool_ptr)and(str_pool[l]<>nl) do incr(l);
31546  sz:=(l-m+7)div 4;
31547  if sz=1 then sz:=2;
31548  r:=get_node(sz); link(q):=r; q:=r; info(q):=hi(sz);
31549  while sz>2 do
31550    begin decr(sz); incr(r);
31551    w.b0:=qi(so(str_pool[m])); w.b1:=qi(so(str_pool[m+1]));
31552    w.b2:=qi(so(str_pool[m+2])); w.b3:=qi(so(str_pool[m+3]));
31553    mem[r].qqqq:=w; m:=m+4;
31554    end;
31555  w.b0:=qi(" "); w.b1:=qi(" "); w.b2:=qi(" "); w.b3:=qi(" ");
31556  if l>m then
31557    begin w.b0:=qi(so(str_pool[m]));
31558    if l>m+1 then
31559      begin  w.b1:=qi(so(str_pool[m+1]));
31560      if l>m+2 then
31561        begin  w.b2:=qi(so(str_pool[m+2]));
31562        if l>m+3 then w.b3:=qi(so(str_pool[m+3]));
31563        end;
31564      end;
31565    end;
31566  mem[r+1].qqqq:=w;
31567  if str_pool[l]=nl then incr(l);
31568  end;
31569info(p):=link(p); link(p):=pseudo_files; pseudo_files:=p
31570
31571@ @<Initiate input from new pseudo file@>=
31572begin_file_reading; {set up |cur_file| and new level of input}
31573line:=0; limit:=start; loc:=limit+1; {force line read}
31574if tracing_scan_tokens>0 then
31575  begin if term_offset>max_print_line-3 then print_ln
31576  else if (term_offset>0)or(file_offset>0) then print_char(" ");
31577  name:=19; print("( "); incr(open_parens); update_terminal;
31578  end
31579else name:=18
31580
31581@ Here we read a line from the current pseudo file into |buffer|.
31582
31583@<Declare \eTeX\ procedures for tr...@>=
31584function pseudo_input: boolean; {inputs the next line or returns |false|}
31585var p:pointer; {current line from pseudo file}
31586@!sz:integer; {size of node |p|}
31587@!w:four_quarters; {four ASCII codes}
31588@!r:pointer; {loop index}
31589begin last:=first; {cf.\ Matthew 19\thinspace:\thinspace30}
31590p:=info(pseudo_files);
31591if p=null then pseudo_input:=false
31592else  begin info(pseudo_files):=link(p); sz:=ho(info(p));
31593  if 4*sz-3>=buf_size-last then
31594    @<Report overflow of the input buffer, and abort@>;
31595  last:=first;
31596  for r:=p+1 to p+sz-1 do
31597    begin w:=mem[r].qqqq;
31598    buffer[last]:=w.b0; buffer[last+1]:=w.b1;
31599    buffer[last+2]:=w.b2; buffer[last+3]:=w.b3;
31600    last:=last+4;
31601    end;
31602  if last>=max_buf_stack then max_buf_stack:=last+1;
31603  while (last>first)and(buffer[last-1]=" ") do decr(last);
31604  free_node(p,sz);
31605  pseudo_input:=true;
31606  end;
31607end;
31608
31609@ When we are done with a pseudo file we `close' it.
31610
31611@<Declare \eTeX\ procedures for tr...@>=
31612procedure pseudo_close; {close the top level pseudo file}
31613var p,@!q: pointer;
31614begin p:=link(pseudo_files); q:=info(pseudo_files);
31615free_avail(pseudo_files); pseudo_files:=p;
31616while q<>null do
31617  begin p:=q; q:=link(p); free_node(p,ho(info(p)));
31618  end;
31619end;
31620
31621@ @<Dump the \eTeX\ state@>=
31622while pseudo_files<>null do pseudo_close; {flush pseudo files}
31623
31624@ @<Generate all \eTeX...@>=
31625primitive("readline",read_to_cs,1);@/
31626@!@:read_line_}{\.{\\readline} primitive@>
31627
31628@ @<Cases of |read| for |print_cmd_chr|@>=
31629else print_esc("readline")
31630
31631@ @<Handle \.{\\readline} and |goto done|@>=
31632if j=1 then
31633  begin while loc<=limit do {current line not yet finished}
31634    begin cur_chr:=buffer[loc]; incr(loc);
31635    if cur_chr=" " then cur_tok:=space_token
31636    @+else cur_tok:=cur_chr+other_token;
31637    store_new_token(cur_tok);
31638    end;
31639  goto done;
31640  end
31641
31642@ Here we define the additional conditionals of \eTeX\ as well as the
31643\.{\\unless} prefix.
31644
31645@d if_def_code=17 { `\.{\\ifdefined}' }
31646@d if_cs_code=18 { `\.{\\ifcsname}' }
31647@d if_font_char_code=19 { `\.{\\iffontchar}' }
31648@d if_in_csname_code=20 { `\.{\\ifincsname}' }
31649
31650@<Generate all \eTeX...@>=
31651primitive("unless",expand_after,1);@/
31652@!@:unless_}{\.{\\unless} primitive@>
31653primitive("ifdefined",if_test,if_def_code);
31654@!@:if_defined_}{\.{\\ifdefined} primitive@>
31655primitive("ifcsname",if_test,if_cs_code);
31656@!@:if_cs_name_}{\.{\\ifcsname} primitive@>
31657primitive("iffontchar",if_test,if_font_char_code);
31658@!@:if_font_char_}{\.{\\iffontchar} primitive@>
31659primitive("ifincsname",if_test,if_in_csname_code);
31660@!@:if_in_csname_}{\.{\\ifincsname} primitive@>
31661
31662@ @<Cases of |expandafter| for |print_cmd_chr|@>=
31663else print_esc("unless")
31664
31665@ @<Cases of |if_test| for |print_cmd_chr|@>=
31666if_def_code:print_esc("ifdefined");
31667if_cs_code:print_esc("ifcsname");
31668if_font_char_code:print_esc("iffontchar");
31669if_in_csname_code:print_esc("ifincsname");
31670
31671@ The result of a boolean condition is reversed when the conditional is
31672preceded by \.{\\unless}.
31673
31674@<Negate a boolean conditional and |goto reswitch|@>=
31675begin get_token;
31676if (cur_cmd=if_test)and(cur_chr<>if_case_code) then
31677  begin cur_chr:=cur_chr+unless_code; goto reswitch;
31678  end;
31679print_err("You can't use `"); print_esc("unless"); print("' before `");
31680@.You can't use \\unless...@>
31681print_cmd_chr(cur_cmd,cur_chr); print_char("'");
31682help1("Continue, and I'll forget that it ever happened.");
31683back_error;
31684end
31685
31686@ The conditional \.{\\ifdefined} tests if a control sequence is
31687defined.
31688
31689We need to reset |scanner_status|, since \.{\\outer} control sequences
31690are allowed, but we might be scanning a macro definition or preamble.
31691
31692@<Cases for |conditional|@>=
31693if_def_code:begin save_scanner_status:=scanner_status;
31694  scanner_status:=normal;
31695  get_next; b:=(cur_cmd<>undefined_cs);
31696  scanner_status:=save_scanner_status;
31697  end;
31698
31699@ The conditional \.{\\ifcsname} is equivalent to \.{\{\\expandafter}
31700\.{\}\\expandafter} \.{\\ifdefined} \.{\\csname}, except that no new
31701control sequence will be entered into the hash table (once all tokens
31702preceding the mandatory \.{\\endcsname} have been expanded).
31703
31704@<Cases for |conditional|@>=
31705if_cs_code:begin n:=get_avail; p:=n; {head of the list of characters}
31706  e:=is_in_csname; is_in_csname:=true;
31707  repeat get_x_token;
31708  if cur_cs=0 then store_new_token(cur_tok);
31709  until cur_cs<>0;
31710  if cur_cmd<>end_cs_name then @<Complain about missing \.{\\endcsname}@>;
31711  @<Look up the characters of list |n| in the hash table, and set |cur_cs|@>;
31712  flush_list(n);
31713  b:=(eq_type(cur_cs)<>undefined_cs);
31714  is_in_csname:=e;
31715  end;
31716
31717@ @<Look up the characters of list |n| in the hash table...@>=
31718m:=first; p:=link(n);
31719while p<>null do
31720  begin if m>=max_buf_stack then
31721    begin max_buf_stack:=m+1;
31722    if max_buf_stack=buf_size then
31723      overflow("buffer size",buf_size);
31724@:TeX capacity exceeded buffer size}{\quad buffer size@>
31725    end;
31726  buffer[m]:=info(p) mod max_char_val; incr(m); p:=link(p);
31727  end;
31728if m>first+1 then
31729  cur_cs:=id_lookup(first,m-first) {|no_new_control_sequence| is |true|}
31730else if m=first then cur_cs:=null_cs {the list is empty}
31731else cur_cs:=single_base+buffer[first] {the list has length one}
31732
31733@ The conditional \.{\\iffontchar} tests the existence of a character in
31734a font.
31735
31736@<Cases for |conditional|@>=
31737if_in_csname_code: b:=is_in_csname;
31738if_font_char_code:begin scan_font_ident; n:=cur_val; scan_usv_num;
31739  if is_native_font(n) then
31740    b:=(map_char_to_glyph(n, cur_val) > 0)
31741  else begin
31742    if (font_bc[n]<=cur_val)and(font_ec[n]>=cur_val) then
31743      b:=char_exists(char_info(n)(qi(cur_val)))
31744    else b:=false;
31745    end;
31746  end;
31747
31748@ The |protected| feature of \eTeX\ defines the \.{\\protected} prefix
31749command for macro definitions.  Such macros are protected against
31750expansions when lists of expanded tokens are built, e.g., for \.{\\edef}
31751or during \.{\\write}.
31752
31753@<Generate all \eTeX...@>=
31754primitive("protected",prefix,8);
31755@!@:protected_}{\.{\\protected} primitive@>
31756
31757@ @<Cases of |prefix| for |print_cmd_chr|@>=
31758else if chr_code=8 then print_esc("protected")
31759
31760@ The |get_x_or_protected| procedure is like |get_x_token| except that
31761protected macros are not expanded.
31762
31763@<Declare \eTeX\ procedures for sc...@>=
31764procedure get_x_or_protected; {sets |cur_cmd|, |cur_chr|, |cur_tok|,
31765  and expands non-protected macros}
31766label exit;
31767begin loop@+begin get_token;
31768  if cur_cmd<=max_command then return;
31769  if (cur_cmd>=call)and(cur_cmd<end_template) then
31770    if info(link(cur_chr))=protected_token then return;
31771  expand;
31772  end;
31773exit:end;
31774
31775@ A group entered (or a conditional started) in one file may end in a
31776different file.  Such slight anomalies, although perfectly legitimate,
31777may cause errors that are difficult to locate.  In order to be able to
31778give a warning message when such anomalies occur, \eTeX\ uses the
31779|grp_stack| and |if_stack| arrays to record the initial |cur_boundary|
31780and |cond_ptr| values for each input file.
31781
31782@<Glob...@>=
31783@!grp_stack : array[0..max_in_open] of save_pointer; {initial |cur_boundary|}
31784@!if_stack : array[0..max_in_open] of pointer; {initial |cond_ptr|}
31785
31786@ When a group ends that was apparently entered in a different input
31787file, the |group_warning| procedure is invoked in order to update the
31788|grp_stack|.  If moreover \.{\\tracingnesting} is positive we want to
31789give a warning message.  The situation is, however, somewhat complicated
31790by two facts:  (1)~There may be |grp_stack| elements without a
31791corresponding \.{\\input} file or \.{\\scantokens} pseudo file (e.g.,
31792error insertions from the terminal); and (2)~the relevant information is
31793recorded in the |name_field| of the |input_stack| only loosely
31794synchronized with the |in_open| variable indexing |grp_stack|.
31795
31796@<Declare \eTeX\ procedures for tr...@>=
31797procedure group_warning;
31798var i:0..max_in_open; {index into |grp_stack|}
31799@!w:boolean; {do we need a warning?}
31800begin base_ptr:=input_ptr; input_stack[base_ptr]:=cur_input;
31801  {store current state}
31802i:=in_open; w:=false;
31803while (grp_stack[i]=cur_boundary)and(i>0) do
31804  begin @<Set variable |w| to indicate if this case should be reported@>;
31805  grp_stack[i]:=save_index(save_ptr); decr(i);
31806  end;
31807if w then
31808  begin print_nl("Warning: end of "); print_group(true);
31809@.Warning: end of...@>
31810  print(" of a different file"); print_ln;
31811  if tracing_nesting>1 then show_context;
31812  if history=spotless then history:=warning_issued;
31813  end;
31814end;
31815
31816@ This code scans the input stack in order to determine the type of the
31817current input file.
31818
31819@<Set variable |w| to...@>=
31820if tracing_nesting>0 then
31821  begin while (input_stack[base_ptr].state_field=token_list)or@|
31822    (input_stack[base_ptr].index_field>i) do decr(base_ptr);
31823  if input_stack[base_ptr].name_field>17 then w:=true;
31824  end
31825
31826@ When a conditional ends that was apparently started in a different
31827input file, the |if_warning| procedure is invoked in order to update the
31828|if_stack|.  If moreover \.{\\tracingnesting} is positive we want to
31829give a warning message (with the same complications as above).
31830
31831@<Declare \eTeX\ procedures for tr...@>=
31832procedure if_warning;
31833var i:0..max_in_open; {index into |if_stack|}
31834@!w:boolean; {do we need a warning?}
31835begin base_ptr:=input_ptr; input_stack[base_ptr]:=cur_input;
31836  {store current state}
31837i:=in_open; w:=false;
31838while if_stack[i]=cond_ptr do
31839  begin @<Set variable |w| to...@>;
31840  if_stack[i]:=link(cond_ptr); decr(i);
31841  end;
31842if w then
31843  begin print_nl("Warning: end of "); print_cmd_chr(if_test,cur_if);
31844@.Warning: end of...@>
31845  print_if_line(if_line); print(" of a different file"); print_ln;
31846  if tracing_nesting>1 then show_context;
31847  if history=spotless then history:=warning_issued;
31848  end;
31849end;
31850
31851@ Conversely, the |file_warning| procedure is invoked when a file ends
31852and some groups entered or conditionals started while reading from that
31853file are still incomplete.
31854
31855@<Declare \eTeX\ procedures for tr...@>=
31856procedure file_warning;
31857var p:pointer; {saved value of |save_ptr| or |cond_ptr|}
31858@!l:quarterword; {saved value of |cur_level| or |if_limit|}
31859@!c:quarterword; {saved value of |cur_group| or |cur_if|}
31860@!i:integer; {saved value of |if_line|}
31861begin p:=save_ptr; l:=cur_level; c:=cur_group; save_ptr:=cur_boundary;
31862while grp_stack[in_open]<>save_ptr do
31863  begin decr(cur_level);
31864  print_nl("Warning: end of file when ");
31865@.Warning: end of file when...@>
31866  print_group(true); print(" is incomplete");@/
31867  cur_group:=save_level(save_ptr); save_ptr:=save_index(save_ptr)
31868  end;
31869save_ptr:=p; cur_level:=l; cur_group:=c; {restore old values}
31870p:=cond_ptr; l:=if_limit; c:=cur_if; i:=if_line;
31871while if_stack[in_open]<>cond_ptr do
31872  begin print_nl("Warning: end of file when ");
31873@.Warning: end of file when...@>
31874  print_cmd_chr(if_test,cur_if);
31875  if if_limit=fi_code then print_esc("else");
31876  print_if_line(if_line); print(" is incomplete");@/
31877  if_line:=if_line_field(cond_ptr); cur_if:=subtype(cond_ptr);
31878  if_limit:=type(cond_ptr); cond_ptr:=link(cond_ptr);
31879  end;
31880cond_ptr:=p; if_limit:=l; cur_if:=c; if_line:=i; {restore old values}
31881print_ln;
31882if tracing_nesting>1 then show_context;
31883if history=spotless then history:=warning_issued;
31884end;
31885
31886@ Here are the additional \eTeX\ primitives for expressions.
31887
31888@<Generate all \eTeX...@>=
31889primitive("numexpr",last_item,eTeX_expr-int_val+int_val);
31890@!@:num_expr_}{\.{\\numexpr} primitive@>
31891primitive("dimexpr",last_item,eTeX_expr-int_val+dimen_val);
31892@!@:dim_expr_}{\.{\\dimexpr} primitive@>
31893primitive("glueexpr",last_item,eTeX_expr-int_val+glue_val);
31894@!@:glue_expr_}{\.{\\glueexpr} primitive@>
31895primitive("muexpr",last_item,eTeX_expr-int_val+mu_val);
31896@!@:mu_expr_}{\.{\\muexpr} primitive@>
31897
31898@ @<Cases of |last_item| for |print_cmd_chr|@>=
31899eTeX_expr-int_val+int_val: print_esc("numexpr");
31900eTeX_expr-int_val+dimen_val: print_esc("dimexpr");
31901eTeX_expr-int_val+glue_val: print_esc("glueexpr");
31902eTeX_expr-int_val+mu_val: print_esc("muexpr");
31903
31904@ This code for reducing |cur_val_level| and\slash or negating the
31905result is similar to the one for all the other cases of
31906|scan_something_internal|, with the difference that |scan_expr| has
31907already increased the reference count of a glue specification.
31908
31909@<Process an expression and |return|@>=
31910begin if m<eTeX_mu then
31911  begin case m of
31912  @/@<Cases for fetching a glue value@>@/
31913  end; {there are no other cases}
31914  cur_val_level:=glue_val;
31915  end
31916else if m<eTeX_expr then
31917  begin case m of
31918  @/@<Cases for fetching a mu value@>@/
31919  end; {there are no other cases}
31920  cur_val_level:=mu_val;
31921  end
31922else  begin cur_val_level:=m-eTeX_expr+int_val; scan_expr;
31923  end;
31924while cur_val_level>level do
31925  begin if cur_val_level=glue_val then
31926    begin m:=cur_val; cur_val:=width(m); delete_glue_ref(m);
31927    end
31928  else if cur_val_level=mu_val then mu_error;
31929  decr(cur_val_level);
31930  end;
31931if negative then
31932  if cur_val_level>=glue_val then
31933    begin m:=cur_val; cur_val:=new_spec(m); delete_glue_ref(m);
31934    @<Negate all three glue components of |cur_val|@>;
31935    end
31936  else negate(cur_val);
31937return;
31938end
31939
31940@ @<Declare \eTeX\ procedures for sc...@>=
31941procedure@?scan_expr; forward;@t\2@>
31942
31943@ The |scan_expr| procedure scans and evaluates an expression.
31944
31945@<Declare procedures needed for expressions@>=
31946@t\4@>@<Declare subprocedures for |scan_expr|@>
31947procedure scan_expr; {scans and evaluates an expression}
31948label restart, continue, found;
31949var a,@!b:boolean; {saved values of |arith_error|}
31950@!l:small_number; {type of expression}
31951@!r:small_number; {state of expression so far}
31952@!s:small_number; {state of term so far}
31953@!o:small_number; {next operation or type of next factor}
31954@!e:integer; {expression so far}
31955@!t:integer; {term so far}
31956@!f:integer; {current factor}
31957@!n:integer; {numerator of combined multiplication and division}
31958@!p:pointer; {top of expression stack}
31959@!q:pointer; {for stack manipulations}
31960begin l:=cur_val_level; a:=arith_error; b:=false; p:=null;
31961@<Scan and evaluate an expression |e| of type |l|@>;
31962if b then
31963  begin print_err("Arithmetic overflow");
31964@.Arithmetic overflow@>
31965  help2("I can't evaluate this expression,")@/
31966    ("since the result is out of range.");
31967  error;
31968  if l>=glue_val then
31969    begin delete_glue_ref(e); e:=zero_glue; add_glue_ref(e);
31970    end
31971  else e:=0;
31972  end;
31973arith_error:=a; cur_val:=e; cur_val_level:=l;
31974end;
31975
31976@ Evaluating an expression is a recursive process:  When the left
31977parenthesis of a subexpression is scanned we descend to the next level
31978of recursion; the previous level is resumed with the matching right
31979parenthesis.
31980
31981@d expr_none=0 {\.( seen, or \.( $\langle\it expr\rangle$ \.) seen}
31982@d expr_add=1 {\.( $\langle\it expr\rangle$ \.+ seen}
31983@d expr_sub=2 {\.( $\langle\it expr\rangle$ \.- seen}
31984@d expr_mult=3 {$\langle\it term\rangle$ \.* seen}
31985@d expr_div=4 {$\langle\it term\rangle$ \./ seen}
31986@d expr_scale=5 {$\langle\it term\rangle$ \.*
31987  $\langle\it factor\rangle$ \./ seen}
31988
31989@<Scan and eval...@>=
31990restart: r:=expr_none; e:=0; s:=expr_none; t:=0; n:=0;
31991continue: if s=expr_none then o:=l@+else o:=int_val;
31992@<Scan a factor |f| of type |o| or start a subexpression@>;
31993found: @<Scan the next operator and set |o|@>;
31994arith_error:=b;
31995@<Make sure that |f| is in the proper range@>;
31996case s of @<Cases for evaluation of the current term@>@;
31997end; {there are no other cases}
31998if o>expr_sub then s:=o@+else @<Evaluate the current expression@>;
31999b:=arith_error;
32000if o<>expr_none then goto continue;
32001if p<>null then @<Pop the expression stack and |goto found|@>
32002
32003@ @<Scan the next op...@>=
32004@<Get the next non-blank non-call token@>;
32005if cur_tok=other_token+"+" then o:=expr_add
32006else if cur_tok=other_token+"-" then o:=expr_sub
32007else if cur_tok=other_token+"*" then o:=expr_mult
32008else if cur_tok=other_token+"/" then o:=expr_div
32009else  begin o:=expr_none;
32010  if p=null then
32011    begin if cur_cmd<>relax then back_input;
32012    end
32013  else if cur_tok<>other_token+")" then
32014    begin print_err("Missing ) inserted for expression");
32015@.Missing ) inserted@>
32016    help1("I was expecting to see `+', `-', `*', `/', or `)'. Didn't.");
32017    back_error;
32018    end;
32019  end
32020
32021@ @<Scan a factor...@>=
32022@<Get the next non-blank non-call token@>;
32023if cur_tok=other_token+"(" then
32024  @<Push the expression stack and |goto restart|@>;
32025back_input;
32026if o=int_val then scan_int
32027else if o=dimen_val then scan_normal_dimen
32028else if o=glue_val then scan_normal_glue
32029else scan_mu_glue;
32030f:=cur_val
32031
32032@ @<Declare \eTeX\ procedures for sc...@>=
32033procedure@?scan_normal_glue; forward;@t\2@>@/
32034procedure@?scan_mu_glue; forward;@t\2@>
32035
32036@ Here we declare two trivial procedures in order to avoid mutually
32037recursive procedures with parameters.
32038
32039@<Declare procedures needed for expressions@>=
32040procedure scan_normal_glue;
32041begin scan_glue(glue_val);
32042end;
32043@#
32044procedure scan_mu_glue;
32045begin scan_glue(mu_val);
32046end;
32047
32048@ Parenthesized subexpressions can be inside expressions, and this
32049nesting has a stack.  Seven local variables represent the top of the
32050expression stack:  |p| points to pushed-down entries, if any; |l|
32051specifies the type of expression currently beeing evaluated; |e| is the
32052expression so far and |r| is the state of its evaluation; |t| is the
32053term so far and |s| is the state of its evaluation; finally |n| is the
32054numerator for a combined multiplication and division, if any.
32055
32056@d expr_node_size=4 {number of words in stack entry for subexpressions}
32057@d expr_e_field(#)==mem[#+1].int {saved expression so far}
32058@d expr_t_field(#)==mem[#+2].int {saved term so far}
32059@d expr_n_field(#)==mem[#+3].int {saved numerator}
32060
32061@<Push the expression...@>=
32062begin q:=get_node(expr_node_size); link(q):=p; type(q):=l;
32063subtype(q):=4*s+r;
32064expr_e_field(q):=e; expr_t_field(q):=t; expr_n_field(q):=n;
32065p:=q; l:=o; goto restart;
32066end
32067
32068@ @<Pop the expression...@>=
32069begin f:=e; q:=p;
32070e:=expr_e_field(q); t:=expr_t_field(q); n:=expr_n_field(q);
32071s:=subtype(q) div 4; r:=subtype(q) mod 4;
32072l:=type(q); p:=link(q); free_node(q,expr_node_size);
32073goto found;
32074end
32075
32076@ We want to make sure that each term and (intermediate) result is in
32077the proper range.  Integer values must not exceed |infinity|
32078($2^{31}-1$) in absolute value, dimensions must not exceed |max_dimen|
32079($2^{30}-1$).  We avoid the absolute value of an integer, because this
32080might fail for the value $-2^{31}$ using 32-bit arithmetic.
32081
32082@d num_error(#)== {clear a number or dimension and set |arith_error|}
32083  begin arith_error:=true; #:=0;
32084  end
32085@d glue_error(#)== {clear a glue spec and set |arith_error|}
32086  begin arith_error:=true; delete_glue_ref(#); #:=new_spec(zero_glue);
32087  end
32088
32089@<Make sure that |f|...@>=
32090if (l=int_val)or(s>expr_sub) then
32091  begin if (f>infinity)or(f<-infinity) then num_error(f);
32092  end
32093else if l=dimen_val then
32094  begin if abs(f)>max_dimen then num_error(f);
32095  end
32096else  begin if (abs(width(f))>max_dimen)or@|
32097   (abs(stretch(f))>max_dimen)or@|
32098   (abs(shrink(f))>max_dimen) then glue_error(f);
32099  end
32100
32101@ Applying the factor |f| to the partial term |t| (with the operator
32102|s|) is delayed until the next operator |o| has been scanned.  Here we
32103handle the first factor of a partial term.  A glue spec has to be copied
32104unless the next operator is a right parenthesis; this allows us later on
32105to simply modify the glue components.
32106
32107@d normalize_glue(#)==
32108  if stretch(#)=0 then stretch_order(#):=normal;
32109  if shrink(#)=0 then shrink_order(#):=normal
32110
32111@<Cases for evaluation of the current term@>=
32112expr_none: if (l>=glue_val)and(o<>expr_none) then
32113    begin t:=new_spec(f); delete_glue_ref(f); normalize_glue(t);
32114    end
32115  else t:=f;
32116
32117@ When a term |t| has been completed it is copied to, added to, or
32118subtracted from the expression |e|.
32119
32120@d expr_add_sub(#)==add_or_sub(#,r=expr_sub)
32121@d expr_a(#)==expr_add_sub(#,max_dimen)
32122
32123@<Evaluate the current expression@>=
32124begin s:=expr_none;
32125if r=expr_none then e:=t
32126else if l=int_val then e:=expr_add_sub(e,t,infinity)
32127else if l=dimen_val then e:=expr_a(e,t)
32128else @<Compute the sum or difference of two glue specs@>;
32129r:=o;
32130end
32131
32132@ The function |add_or_sub(x,y,max_answer,negative)| computes the sum
32133(for |negative=false|) or difference (for |negative=true|) of |x| and
32134|y|, provided the absolute value of the result does not exceed
32135|max_answer|.
32136
32137@<Declare subprocedures for |scan_expr|@>=
32138function add_or_sub(@!x,@!y,@!max_answer:integer;@!negative:boolean):integer;
32139var a:integer; {the answer}
32140begin if negative then negate(y);
32141if x>=0 then
32142  if y<=max_answer-x then a:=x+y@+else num_error(a)
32143else if y>=-max_answer-x then a:=x+y@+else num_error(a);
32144add_or_sub:=a;
32145end;
32146
32147@ We know that |stretch_order(e)>normal| implies |stretch(e)<>0| and
32148|shrink_order(e)>normal| implies |shrink(e)<>0|.
32149
32150@<Compute the sum or diff...@>=
32151begin width(e):=expr_a(width(e),width(t));
32152if stretch_order(e)=stretch_order(t) then
32153  stretch(e):=expr_a(stretch(e),stretch(t))
32154else if (stretch_order(e)<stretch_order(t))and(stretch(t)<>0) then
32155  begin stretch(e):=stretch(t); stretch_order(e):=stretch_order(t);
32156  end;
32157if shrink_order(e)=shrink_order(t) then
32158  shrink(e):=expr_a(shrink(e),shrink(t))
32159else if (shrink_order(e)<shrink_order(t))and(shrink(t)<>0) then
32160  begin shrink(e):=shrink(t); shrink_order(e):=shrink_order(t);
32161  end;
32162delete_glue_ref(t); normalize_glue(e);
32163end
32164
32165@ If a multiplication is followed by a division, the two operations are
32166combined into a `scaling' operation.  Otherwise the term |t| is
32167multiplied by the factor |f|.
32168
32169@d expr_m(#)==#:=nx_plus_y(#,f,0)
32170
32171@<Cases for evaluation of the current term@>=
32172expr_mult: if o=expr_div then
32173    begin n:=f; o:=expr_scale;
32174    end
32175  else if l=int_val then t:=mult_integers(t,f)
32176  else if l=dimen_val then expr_m(t)
32177  else  begin expr_m(width(t)); expr_m(stretch(t)); expr_m(shrink(t));
32178    end;
32179
32180@ Here we divide the term |t| by the factor |f|.
32181
32182@d expr_d(#)==#:=quotient(#,f)
32183
32184@<Cases for evaluation of the current term@>=
32185expr_div: if l<glue_val then expr_d(t)
32186  else  begin expr_d(width(t)); expr_d(stretch(t)); expr_d(shrink(t));
32187    end;
32188
32189@ The function |quotient(n,d)| computes the rounded quotient
32190$q=\lfloor n/d+{1\over2}\rfloor$, when $n$ and $d$ are positive.
32191
32192@<Declare subprocedures for |scan_expr|@>=
32193function quotient(@!n,@!d:integer):integer;
32194var negative:boolean; {should the answer be negated?}
32195@!a:integer; {the answer}
32196begin if d=0 then num_error(a)
32197else  begin if d>0 then negative:=false
32198  else  begin negate(d); negative:=true;
32199    end;
32200  if n<0 then
32201    begin negate(n); negative:=not negative;
32202    end;
32203  a:=n div d; n:=n-a*d; d:=n-d; {avoid certain compiler optimizations!}
32204  if d+n>=0 then incr(a);
32205  if negative then negate(a);
32206  end;
32207quotient:=a;
32208end;
32209
32210@ Here the term |t| is multiplied by the quotient $n/f$.
32211
32212@d expr_s(#)==#:=fract(#,n,f,max_dimen)
32213
32214@<Cases for evaluation of the current term@>=
32215expr_scale: if l=int_val then t:=fract(t,n,f,infinity)
32216  else if l=dimen_val then expr_s(t)
32217  else  begin expr_s(width(t)); expr_s(stretch(t)); expr_s(shrink(t));
32218    end;
32219
32220@ Finally, the function |fract(x,n,d,max_answer)| computes the integer
32221$q=\lfloor xn/d+{1\over2}\rfloor$, when $x$, $n$, and $d$ are positive
32222and the result does not exceed |max_answer|.  We can't use floating
32223point arithmetic since the routine must produce identical results in all
32224cases; and it would be too dangerous to multiply by~|n| and then divide
32225by~|d|, in separate operations, since overflow might well occur.  Hence
32226this subroutine simulates double precision arithmetic, somewhat
32227analogous to \MF's |make_fraction| and |take_fraction| routines.
32228
32229@d too_big=88 {go here when the result is too big}
32230
32231@<Declare subprocedures for |scan_expr|@>=
32232function fract(@!x,@!n,@!d,@!max_answer:integer):integer;
32233label found, found1, too_big, done;
32234var negative:boolean; {should the answer be negated?}
32235@!a:integer; {the answer}
32236@!f:integer; {a proper fraction}
32237@!h:integer; {smallest integer such that |2*h>=d|}
32238@!r:integer; {intermediate remainder}
32239@!t:integer; {temp variable}
32240begin if d=0 then goto too_big;
32241a:=0;
32242if d>0 then negative:=false
32243else  begin negate(d); negative:=true;
32244  end;
32245if x<0 then
32246  begin negate(x); negative:=not negative;
32247  end
32248else if x=0 then goto done;
32249if n<0 then
32250  begin negate(n); negative:=not negative;
32251  end;
32252t:=n div d;
32253if t>max_answer div x then goto too_big;
32254a:=t*x; n:=n-t*d;
32255if n=0 then goto found;
32256t:=x div d;
32257if t>(max_answer-a) div n then goto too_big;
32258a:=a+t*n; x:=x-t*d;
32259if x=0 then goto found;
32260if x<n then
32261  begin t:=x; x:=n; n:=t;
32262  end; {now |0<n<=x<d|}
32263@<Compute \(f)$f=\lfloor xn/d+{1\over2}\rfloor$@>@;
32264if f>(max_answer-a) then goto too_big;
32265a:=a+f;
32266found: if negative then negate(a);
32267goto done;
32268too_big: num_error(a);
32269done: fract:=a;
32270end;
32271
32272@ The loop here preserves the following invariant relations
32273between |f|, |x|, |n|, and~|r|:
32274(i)~$f+\lfloor(xn+(r+d))/d\rfloor=\lfloor x_0n_0/d+{1\over2}\rfloor$;
32275(ii)~|-d<=r<0<n<=x<d|, where $x_0$, $n_0$ are the original values of~$x$
32276and $n$.
32277
32278Notice that the computation specifies |(x-d)+x| instead of |(x+x)-d|,
32279because the latter could overflow.
32280
32281@<Compute \(f)$f=\lfloor xn/d+{1\over2}\rfloor$@>=
32282f:=0; r:=(d div 2)-d; h:=-r;
32283loop@+begin if odd(n) then
32284    begin r:=r+x;
32285    if r>=0 then
32286      begin r:=r-d; incr(f);
32287      end;
32288    end;
32289  n:=n div 2;
32290  if n=0 then goto found1;
32291  if x<h then x:=x+x
32292  else  begin t:=x-d; x:=t+x; f:=f+n;
32293      if x<n then
32294        begin if x=0 then goto found1;
32295        t:=x; x:=n; n:=t;
32296        end;
32297    end;
32298  end;
32299found1:
32300
32301@ The \.{\\gluestretch}, \.{\\glueshrink}, \.{\\gluestretchorder}, and
32302\.{\\glueshrinkorder} commands return the stretch and shrink components
32303and their orders of ``infinity'' of a glue specification.
32304
32305@d glue_stretch_order_code=eTeX_int+6 {code for \.{\\gluestretchorder}}
32306@d glue_shrink_order_code=eTeX_int+7 {code for \.{\\glueshrinkorder}}
32307@d glue_stretch_code=eTeX_dim+7 {code for \.{\\gluestretch}}
32308@d glue_shrink_code=eTeX_dim+8 {code for \.{\\glueshrink}}
32309
32310@<Generate all \eTeX...@>=
32311primitive("gluestretchorder",last_item,glue_stretch_order_code);
32312@!@:glue_stretch_order_}{\.{\\gluestretchorder} primitive@>
32313primitive("glueshrinkorder",last_item,glue_shrink_order_code);
32314@!@:glue_shrink_order_}{\.{\\glueshrinkorder} primitive@>
32315primitive("gluestretch",last_item,glue_stretch_code);
32316@!@:glue_stretch_}{\.{\\gluestretch} primitive@>
32317primitive("glueshrink",last_item,glue_shrink_code);
32318@!@:glue_shrink_}{\.{\\glueshrink} primitive@>
32319
32320@ @<Cases of |last_item| for |print_cmd_chr|@>=
32321glue_stretch_order_code: print_esc("gluestretchorder");
32322glue_shrink_order_code: print_esc("glueshrinkorder");
32323glue_stretch_code: print_esc("gluestretch");
32324glue_shrink_code: print_esc("glueshrink");
32325
32326@ @<Cases for fetching an integer value@>=
32327glue_stretch_order_code, glue_shrink_order_code:
32328  begin scan_normal_glue; q:=cur_val;
32329  if m=glue_stretch_order_code then cur_val:=stretch_order(q)
32330  else cur_val:=shrink_order(q);
32331  delete_glue_ref(q);
32332  end;
32333
32334@ @<Cases for fetching a dimension value@>=
32335glue_stretch_code, glue_shrink_code:
32336  begin scan_normal_glue; q:=cur_val;
32337  if m=glue_stretch_code then cur_val:=stretch(q)
32338  else cur_val:=shrink(q);
32339  delete_glue_ref(q);
32340  end;
32341
32342@ The \.{\\mutoglue} and \.{\\gluetomu} commands convert ``math'' glue
32343into normal glue and vice versa; they allow to manipulate math glue with
32344\.{\\gluestretch} etc.
32345
32346@d mu_to_glue_code=eTeX_glue {code for \.{\\mutoglue}}
32347@d glue_to_mu_code=eTeX_mu {code for \.{\\gluetomu}}
32348
32349@<Generate all \eTeX...@>=
32350primitive("mutoglue",last_item,mu_to_glue_code);
32351@!@:mu_to_glue_}{\.{\\mutoglue} primitive@>
32352primitive("gluetomu",last_item,glue_to_mu_code);
32353@!@:glue_to_mu_}{\.{\\gluetomu} primitive@>
32354
32355@ @<Cases of |last_item| for |print_cmd_chr|@>=
32356mu_to_glue_code: print_esc("mutoglue");
32357glue_to_mu_code: print_esc("gluetomu");
32358
32359@ @<Cases for fetching a glue value@>=
32360mu_to_glue_code: scan_mu_glue;
32361
32362@ @<Cases for fetching a mu value@>=
32363glue_to_mu_code: scan_normal_glue;
32364
32365@ \eTeX\ (in extended mode) supports 32768 (i.e., $2^{15}$) count,
32366dimen, skip, muskip, box, and token registers.  As in \TeX\ the first
32367256 registers of each kind are realized as arrays in the table of
32368equivalents; the additional registers are realized as tree structures
32369built from variable-size nodes with individual registers existing only
32370when needed.  Default values are used for nonexistent registers:  zero
32371for count and dimen values, |zero_glue| for glue (skip and muskip)
32372values, void for boxes, and |null| for token lists (and current marks
32373discussed below).
32374
32375Similarly there are 32768 mark classes; the command \.{\\marks}|n|
32376creates a mark node for a given mark class |0<=n<=32767| (where
32377\.{\\marks0} is synonymous to \.{\\mark}).  The page builder (actually
32378the |fire_up| routine) and the |vsplit| routine maintain the current
32379values of |top_mark|, |first_mark|, |bot_mark|, |split_first_mark|, and
32380|split_bot_mark| for each mark class.  They are accessed as
32381\.{\\topmarks}|n| etc., and \.{\\topmarks0} is again synonymous to
32382\.{\\topmark}.  As in \TeX\ the five current marks for mark class zero
32383are realized as |cur_mark| array.  The additional current marks are
32384again realized as tree structure with individual mark classes existing
32385only when needed.
32386
32387@<Generate all \eTeX...@>=
32388primitive("marks",mark,marks_code);
32389@!@:marks_}{\.{\\marks} primitive@>
32390primitive("topmarks",top_bot_mark,top_mark_code+marks_code);
32391@!@:top_marks_}{\.{\\topmarks} primitive@>
32392primitive("firstmarks",top_bot_mark,first_mark_code+marks_code);
32393@!@:first_marks_}{\.{\\firstmarks} primitive@>
32394primitive("botmarks",top_bot_mark,bot_mark_code+marks_code);
32395@!@:bot_marks_}{\.{\\botmarks} primitive@>
32396primitive("splitfirstmarks",top_bot_mark,split_first_mark_code+marks_code);
32397@!@:split_first_marks_}{\.{\\splitfirstmarks} primitive@>
32398primitive("splitbotmarks",top_bot_mark,split_bot_mark_code+marks_code);
32399@!@:split_bot_marks_}{\.{\\splitbotmarks} primitive@>
32400
32401@ The |scan_register_num| procedure scans a register number that must
32402not exceed 255 in compatibility mode resp.\ 32767 in extended mode.
32403
32404@<Declare \eTeX\ procedures for ex...@>=
32405procedure@?scan_register_num; forward;@t\2@>
32406
32407@ @<Declare procedures that scan restricted classes of integers@>=
32408procedure scan_register_num;
32409begin scan_int;
32410if (cur_val<0)or(cur_val>max_reg_num) then
32411  begin print_err("Bad register code");
32412@.Bad register code@>
32413  help2(max_reg_help_line)("I changed this one to zero.");
32414  int_error(cur_val); cur_val:=0;
32415  end;
32416end;
32417
32418@ @<Initialize variables for \eTeX\ comp...@>=
32419max_reg_num:=255;
32420max_reg_help_line:="A register number must be between 0 and 255.";
32421
32422@ @<Initialize variables for \eTeX\ ext...@>=
32423max_reg_num:=32767;
32424max_reg_help_line:="A register number must be between 0 and 32767.";
32425
32426@ @<Glob...@>=
32427@!max_reg_num: halfword; {largest allowed register number}
32428@!max_reg_help_line: str_number; {first line of help message}
32429
32430@ There are seven almost identical doubly linked trees, one for the
32431sparse array of the up to 32512 additional registers of each kind and
32432one for the sparse array of the up to 32767 additional mark classes.
32433The root of each such tree, if it exists, is an index node containing 16
32434pointers to subtrees for 4096 consecutive array elements.  Similar index
32435nodes are the starting points for all nonempty subtrees for 4096, 256,
32436and 16 consecutive array elements.  These four levels of index nodes are
32437followed by a fifth level with nodes for the individual array elements.
32438
32439Each index node is nine words long.  The pointers to the 16 possible
32440subtrees or are kept in the |info| and |link| fields of the last eight
32441words.  (It would be both elegant and efficient to declare them as
32442array, unfortunately \PASCAL\ doesn't allow this.)
32443
32444The fields in the first word of each index node and in the nodes for the
32445array elements are closely related.  The |link| field points to the next
32446lower index node and the |sa_index| field contains four bits (one
32447hexadecimal digit) of the register number or mark class.  For the lowest
32448index node the |link| field is |null| and the |sa_index| field indicates
32449the type of quantity (|int_val|, |dimen_val|, |glue_val|, |mu_val|,
32450|box_val|, |tok_val|, or |mark_val|).  The |sa_used| field in the index
32451nodes counts how many of the 16 pointers are non-null.
32452
32453The |sa_index| field in the nodes for array elements contains the four
32454bits plus 16 times the type.  Therefore such a node represents a count
32455or dimen register if and only if |sa_index<dimen_val_limit|; it
32456represents a skip or muskip register if and only if
32457|dimen_val_limit<=sa_index<mu_val_limit|; it represents a box register
32458if and only if |mu_val_limit<=sa_index<box_val_limit|; it represents a
32459token list register if and only if
32460|box_val_limit<=sa_index<tok_val_limit|; finally it represents a mark
32461class if and only if |tok_val_limit<=sa_index|.
32462
32463The |new_index| procedure creates an index node (returned in |cur_ptr|)
32464having given contents of the |sa_index| and |link| fields.
32465
32466@d box_val==4 {the additional box registers}
32467@d mark_val=7 {the additional mark classes}
32468@#
32469@d dimen_val_limit=@"20 {$2^4\cdot(|dimen_val|+1)$}
32470@d mu_val_limit=@"40 {$2^4\cdot(|mu_val|+1)$}
32471@d box_val_limit=@"50 {$2^4\cdot(|box_val|+1)$}
32472@d tok_val_limit=@"60 {$2^4\cdot(|tok_val|+1)$}
32473@#
32474@d index_node_size=9 {size of an index node}
32475@d sa_index==type {a four-bit address or a type or both}
32476@d sa_used==subtype {count of non-null pointers}
32477
32478@<Declare \eTeX\ procedures for ex...@>=
32479procedure new_index(@!i:quarterword; @!q:pointer);
32480var k:small_number; {loop index}
32481begin cur_ptr:=get_node(index_node_size); sa_index(cur_ptr):=i;
32482sa_used(cur_ptr):=0; link(cur_ptr):=q;
32483for k:=1 to index_node_size-1 do {clear all 16 pointers}
32484  mem[cur_ptr+k]:=sa_null;
32485end;
32486
32487@ The roots of the seven trees for the additional registers and mark
32488classes are kept in the |sa_root| array.  The first six locations must
32489be dumped and undumped; the last one is also known as |sa_mark|.
32490
32491@d sa_mark==sa_root[mark_val] {root for mark classes}
32492
32493@<Glob...@>=
32494@!sa_root:array[int_val..mark_val] of pointer; {roots of sparse arrays}
32495@!cur_ptr:pointer; {value returned by |new_index| and |find_sa_element|}
32496@!sa_null:memory_word; {two |null| pointers}
32497
32498@ @<Set init...@>=
32499sa_mark:=null; sa_null.hh.lh:=null; sa_null.hh.rh:=null;
32500
32501@ @<Initialize table...@>=
32502for i:=int_val to inter_char_val do sa_root[i]:=null;
32503
32504@ Given a type |t| and a sixteen-bit number |n|, the |find_sa_element|
32505procedure returns (in |cur_ptr|) a pointer to the node for the
32506corresponding array element, or |null| when no such element exists.  The
32507third parameter |w| is set |true| if the element must exist, e.g.,
32508because it is about to be modified.  The procedure has two main
32509branches:  one follows the existing tree structure, the other (only used
32510when |w| is |true|) creates the missing nodes.
32511
32512We use macros to extract the four-bit pieces from a sixteen-bit register
32513number or mark class and to fetch or store one of the 16 pointers from
32514an index node.
32515
32516@d if_cur_ptr_is_null_then_return_or_goto(#)== {some tree element is missing}
32517  begin if cur_ptr=null then
32518    if w then goto #@+else return;
32519  end
32520@#
32521@d hex_dig1(#)==# div 4096 {the fourth lowest hexadecimal digit}
32522@d hex_dig2(#)==(# div 256) mod 16 {the third lowest hexadecimal digit}
32523@d hex_dig3(#)==(# div 16) mod 16 {the second lowest hexadecimal digit}
32524@d hex_dig4(#)==# mod 16 {the lowest hexadecimal digit}
32525@#
32526@d get_sa_ptr==if odd(i) then cur_ptr:=link(q+(i div 2)+1)
32527  else cur_ptr:=info(q+(i div 2)+1)
32528    {set |cur_ptr| to the pointer indexed by |i| from index node |q|}
32529@d put_sa_ptr(#)==if odd(i) then link(q+(i div 2)+1):=#
32530  else info(q+(i div 2)+1):=#
32531    {store the pointer indexed by |i| in index node |q|}
32532@d add_sa_ptr==begin put_sa_ptr(cur_ptr); incr(sa_used(q));
32533  end {add |cur_ptr| as the pointer indexed by |i| in index node |q|}
32534@d delete_sa_ptr==begin put_sa_ptr(null); decr(sa_used(q));
32535  end {delete the pointer indexed by |i| in index node |q|}
32536
32537@<Declare \eTeX\ procedures for ex...@>=
32538procedure find_sa_element(@!t:small_number;@!n:halfword;@!w:boolean);
32539  {sets |cur_val| to sparse array element location or |null|}
32540label not_found,not_found1,not_found2,not_found3,not_found4,exit;
32541var q:pointer; {for list manipulations}
32542@!i:small_number; {a four bit index}
32543begin cur_ptr:=sa_root[t];
32544if_cur_ptr_is_null_then_return_or_goto(not_found);@/
32545q:=cur_ptr; i:=hex_dig1(n); get_sa_ptr;
32546if_cur_ptr_is_null_then_return_or_goto(not_found1);@/
32547q:=cur_ptr; i:=hex_dig2(n); get_sa_ptr;
32548if_cur_ptr_is_null_then_return_or_goto(not_found2);@/
32549q:=cur_ptr; i:=hex_dig3(n); get_sa_ptr;
32550if_cur_ptr_is_null_then_return_or_goto(not_found3);@/
32551q:=cur_ptr; i:=hex_dig4(n); get_sa_ptr;
32552if (cur_ptr=null)and w then goto not_found4;
32553return;
32554not_found: new_index(t,null); {create first level index node}
32555sa_root[t]:=cur_ptr; q:=cur_ptr; i:=hex_dig1(n);
32556not_found1: new_index(i,q); {create second level index node}
32557add_sa_ptr; q:=cur_ptr; i:=hex_dig2(n);
32558not_found2: new_index(i,q); {create third level index node}
32559add_sa_ptr; q:=cur_ptr; i:=hex_dig3(n);
32560not_found3: new_index(i,q); {create fourth level index node}
32561add_sa_ptr; q:=cur_ptr; i:=hex_dig4(n);
32562not_found4: @<Create a new array element of type |t| with index |i|@>;
32563link(cur_ptr):=q; add_sa_ptr;
32564exit:end;
32565
32566@ The array elements for registers are subject to grouping and have an
32567|sa_lev| field (quite analogous to |eq_level|) instead of |sa_used|.
32568Since saved values as well as shorthand definitions (created by e.g.,
32569\.{\\countdef}) refer to the location of the respective array element,
32570we need a reference count that is kept in the |sa_ref| field.  An array
32571element can be deleted (together with all references to it) when its
32572|sa_ref| value is |null| and its value is the default value.
32573@^reference counts@>
32574
32575Skip, muskip, box, and token registers use two word nodes, their values
32576are stored in the |sa_ptr| field.
32577Count and dimen registers use three word nodes, their
32578values are stored in the |sa_int| resp.\ |sa_dim| field in the third
32579word; the |sa_ptr| field is used under the name |sa_num| to store
32580the register number.  Mark classes use four word nodes.  The last three
32581words contain the five types of current marks
32582
32583@d sa_lev==sa_used {grouping level for the current value}
32584@d pointer_node_size=2 {size of an element with a pointer value}
32585@d sa_type(#)==(sa_index(#) div 16) {type part of combined type/index}
32586@d sa_ref(#)==info(#+1) {reference count of a sparse array element}
32587@d sa_ptr(#)==link(#+1) {a pointer value}
32588@#
32589@d word_node_size=3 {size of an element with a word value}
32590@d sa_num==sa_ptr {the register number}
32591@d sa_int(#)==mem[#+2].int {an integer}
32592@d sa_dim(#)==mem[#+2].sc {a dimension (a somewhat esotheric distinction)}
32593@#
32594@d mark_class_node_size=4 {size of an element for a mark class}
32595@#
32596@d fetch_box(#)== {fetch |box(cur_val)|}
32597  if cur_val<256 then #:=box(cur_val)
32598  else  begin find_sa_element(box_val,cur_val,false);
32599    if cur_ptr=null then #:=null@+else #:=sa_ptr(cur_ptr);
32600    end
32601
32602@<Create a new array element...@>=
32603if t=mark_val then {a mark class}
32604  begin cur_ptr:=get_node(mark_class_node_size);
32605  mem[cur_ptr+1]:=sa_null; mem[cur_ptr+2]:=sa_null; mem[cur_ptr+3]:=sa_null;
32606  end
32607else  begin if t<=dimen_val then {a count or dimen register}
32608    begin cur_ptr:=get_node(word_node_size); sa_int(cur_ptr):=0;
32609    sa_num(cur_ptr):=n;
32610    end
32611  else  begin cur_ptr:=get_node(pointer_node_size);
32612    if t<=mu_val then {a skip or muskip register}
32613      begin sa_ptr(cur_ptr):=zero_glue; add_glue_ref(zero_glue);
32614      end
32615    else sa_ptr(cur_ptr):=null; {a box or token list register}
32616    end;
32617  sa_ref(cur_ptr):=null; {all registers have a reference count}
32618  end;
32619sa_index(cur_ptr):=16*t+i; sa_lev(cur_ptr):=level_one
32620
32621@ The |delete_sa_ref| procedure is called when a pointer to an array
32622element representing a register is being removed; this means that the
32623reference count should be decreased by one.  If the reduced reference
32624count is |null| and the register has been (globally) assigned its
32625default value the array element should disappear, possibly together with
32626some index nodes.  This procedure will never be used for mark class
32627nodes.
32628@^reference counts@>
32629
32630@d add_sa_ref(#)==incr(sa_ref(#)) {increase reference count}
32631@#
32632@d change_box(#)== {change |box(cur_val)|, the |eq_level| stays the same}
32633  if cur_val<256 then box(cur_val):=#@+else set_sa_box(#)
32634@#
32635@d set_sa_box(#)==begin find_sa_element(box_val,cur_val,false);
32636  if cur_ptr<>null then
32637    begin sa_ptr(cur_ptr):=#; add_sa_ref(cur_ptr); delete_sa_ref(cur_ptr);
32638    end;
32639  end
32640
32641@<Declare \eTeX\ procedures for tr...@>=
32642procedure delete_sa_ref(@!q:pointer); {reduce reference count}
32643label exit;
32644var p:pointer; {for list manipulations}
32645@!i:small_number; {a four bit index}
32646@!s:small_number; {size of a node}
32647begin decr(sa_ref(q));
32648if sa_ref(q)<>null then return;
32649if sa_index(q)<dimen_val_limit then
32650 if sa_int(q)=0 then s:=word_node_size
32651 else return
32652else  begin if sa_index(q)<mu_val_limit then
32653    if sa_ptr(q)=zero_glue then delete_glue_ref(zero_glue)
32654    else return
32655  else if sa_ptr(q)<>null then return;
32656  s:=pointer_node_size;
32657  end;
32658repeat i:=hex_dig4(sa_index(q)); p:=q; q:=link(p); free_node(p,s);
32659if q=null then {the whole tree has been freed}
32660  begin sa_root[i]:=null; return;
32661  end;
32662delete_sa_ptr; s:=index_node_size; {node |q| is an index node}
32663until sa_used(q)>0;
32664exit:end;
32665
32666@ The |print_sa_num| procedure prints the register number corresponding
32667to an array element.
32668
32669@<Basic print...@>=
32670procedure print_sa_num(@!q:pointer); {print register number}
32671var @!n:halfword; {the register number}
32672begin if sa_index(q)<dimen_val_limit then n:=sa_num(q) {the easy case}
32673else  begin n:=hex_dig4(sa_index(q)); q:=link(q); n:=n+16*sa_index(q);
32674  q:=link(q); n:=n+256*(sa_index(q)+16*sa_index(link(q)));
32675  end;
32676print_int(n);
32677end;
32678
32679@ Here is a procedure that displays the contents of an array element
32680symbolically.  It is used under similar circumstances as is
32681|restore_trace| (together with |show_eqtb|) for the quantities kept in
32682the |eqtb| array.
32683
32684@<Declare \eTeX\ procedures for tr...@>=
32685@!stat procedure show_sa(@!p:pointer;@!s:str_number);
32686var t:small_number; {the type of element}
32687begin begin_diagnostic; print_char("{"); print(s); print_char(" ");
32688if p=null then print_char("?") {this can't happen}
32689else  begin t:=sa_type(p);
32690  if t<box_val then print_cmd_chr(register,p)
32691  else if t=box_val then
32692    begin print_esc("box"); print_sa_num(p);
32693    end
32694  else if t=tok_val then print_cmd_chr(toks_register,p)
32695  else print_char("?"); {this can't happen either}
32696  print_char("=");
32697  if t=int_val then print_int(sa_int(p))
32698  else if t=dimen_val then
32699    begin print_scaled(sa_dim(p)); print("pt");
32700    end
32701  else  begin p:=sa_ptr(p);
32702    if t=glue_val then print_spec(p,"pt")
32703    else if t=mu_val then print_spec(p,"mu")
32704    else if t=box_val then
32705      if p=null then print("void")
32706      else  begin depth_threshold:=0; breadth_max:=1; show_node_list(p);
32707        end
32708    else if t=tok_val then
32709      begin if p<>null then show_token_list(link(p),null,32);
32710      end
32711    else print_char("?"); {this can't happen either}
32712    end;
32713  end;
32714print_char("}"); end_diagnostic(false);
32715end;
32716tats
32717
32718@ Here we compute the pointer to the current mark of type |t| and mark
32719class |cur_val|.
32720
32721@<Compute the mark pointer...@>=
32722begin find_sa_element(mark_val,cur_val,false);
32723if cur_ptr<>null then
32724  if odd(t) then cur_ptr:=link(cur_ptr+(t div 2)+1)
32725  else cur_ptr:=info(cur_ptr+(t div 2)+1);
32726end
32727
32728@ The current marks for all mark classes are maintained by the |vsplit|
32729and |fire_up| routines and are finally destroyed (for \.{INITEX} only)
32730@.INITEX@>
32731by the |final_cleanup| routine.  Apart from updating the current marks
32732when mark nodes are encountered, these routines perform certain actions
32733on all existing mark classes.  The recursive |do_marks| procedure walks
32734through the whole tree or a subtree of existing mark class nodes and
32735preforms certain actions indicted by its first parameter |a|, the action
32736code.  The second parameter |l| indicates the level of recursion (at
32737most four); the third parameter points to a nonempty tree or subtree.
32738The result is |true| if the complete tree or subtree has been deleted.
32739
32740@d vsplit_init==0 {action code for |vsplit| initialization}
32741@d fire_up_init==1 {action code for |fire_up| initialization}
32742@d fire_up_done==2 {action code for |fire_up| completion}
32743@d destroy_marks==3 {action code for |final_cleanup|}
32744@#
32745@d sa_top_mark(#)==info(#+1) {\.{\\topmarks}|n|}
32746@d sa_first_mark(#)==link(#+1) {\.{\\firstmarks}|n|}
32747@d sa_bot_mark(#)==info(#+2) {\.{\\botmarks}|n|}
32748@d sa_split_first_mark(#)==link(#+2) {\.{\\splitfirstmarks}|n|}
32749@d sa_split_bot_mark(#)==info(#+3) {\.{\\splitbotmarks}|n|}
32750
32751@<Declare the function called |do_marks|@>=
32752function do_marks(@!a,@!l:small_number;@!q:pointer):boolean;
32753var i:small_number; {a four bit index}
32754begin if l<4 then {|q| is an index node}
32755  begin for i:=0 to 15 do
32756    begin get_sa_ptr;
32757    if cur_ptr<>null then if do_marks(a,l+1,cur_ptr) then delete_sa_ptr;
32758    end;
32759  if sa_used(q)=0 then
32760    begin free_node(q,index_node_size); q:=null;
32761    end;
32762  end
32763else {|q| is the node for a mark class}
32764  begin case a of
32765  @<Cases for |do_marks|@>@;
32766  end; {there are no other cases}
32767  if sa_bot_mark(q)=null then if sa_split_bot_mark(q)=null then
32768    begin free_node(q,mark_class_node_size); q:=null;
32769    end;
32770  end;
32771do_marks:=(q=null);
32772end;
32773
32774@ At the start of the |vsplit| routine the existing |split_fist_mark|
32775and |split_bot_mark| are discarded.
32776
32777@<Cases for |do_marks|@>=
32778vsplit_init: if sa_split_first_mark(q)<>null then
32779  begin delete_token_ref(sa_split_first_mark(q)); sa_split_first_mark(q):=null;
32780  delete_token_ref(sa_split_bot_mark(q)); sa_split_bot_mark(q):=null;
32781  end;
32782
32783@ We use again the fact that |split_first_mark=null| if and only if
32784|split_bot_mark=null|.
32785
32786@<Update the current marks for |vsplit|@>=
32787begin find_sa_element(mark_val,mark_class(p),true);
32788if sa_split_first_mark(cur_ptr)=null then
32789  begin sa_split_first_mark(cur_ptr):=mark_ptr(p);
32790  add_token_ref(mark_ptr(p));
32791  end
32792else delete_token_ref(sa_split_bot_mark(cur_ptr));
32793sa_split_bot_mark(cur_ptr):=mark_ptr(p);
32794add_token_ref(mark_ptr(p));
32795end
32796
32797@ At the start of the |fire_up| routine the old |top_mark| and
32798|first_mark| are discarded, whereas the old |bot_mark| becomes the new
32799|top_mark|.  An empty new |top_mark| token list is, however, discarded
32800as well in order that mark class nodes can eventually be released.  We
32801use again the fact that |bot_mark<>null| implies |first_mark<>null|; it
32802also knows that |bot_mark=null| implies |top_mark=first_mark=null|.
32803
32804@<Cases for |do_marks|@>=
32805fire_up_init: if sa_bot_mark(q)<>null then
32806  begin if sa_top_mark(q)<>null then delete_token_ref(sa_top_mark(q));
32807  delete_token_ref(sa_first_mark(q)); sa_first_mark(q):=null;
32808  if link(sa_bot_mark(q))=null then {an empty token list}
32809    begin delete_token_ref(sa_bot_mark(q)); sa_bot_mark(q):=null;
32810    end
32811  else add_token_ref(sa_bot_mark(q));
32812  sa_top_mark(q):=sa_bot_mark(q);
32813  end;
32814
32815@ @<Cases for |do_marks|@>=
32816fire_up_done: if (sa_top_mark(q)<>null)and(sa_first_mark(q)=null) then
32817  begin sa_first_mark(q):=sa_top_mark(q); add_token_ref(sa_top_mark(q));
32818  end;
32819
32820@ @<Update the current marks for |fire_up|@>=
32821begin find_sa_element(mark_val,mark_class(p),true);
32822if sa_first_mark(cur_ptr)=null then
32823  begin sa_first_mark(cur_ptr):=mark_ptr(p);
32824  add_token_ref(mark_ptr(p));
32825  end;
32826if sa_bot_mark(cur_ptr)<>null then delete_token_ref(sa_bot_mark(cur_ptr));
32827sa_bot_mark(cur_ptr):=mark_ptr(p); add_token_ref(mark_ptr(p));
32828end
32829
32830@ Here we use the fact that the five current mark pointers in a mark
32831class node occupy the same locations as the the first five pointers of
32832an index node.  For systems using a run-time switch to distinguish
32833between \.{VIRTEX} and \.{INITEX}, the codewords `$|init|\ldots|tini|$'
32834surrounding the following piece of code should be removed.
32835@.INITEX@>
32836@^system dependencies@>
32837
32838@<Cases for |do_marks|@>=
32839@!init destroy_marks: for i:=top_mark_code to split_bot_mark_code do
32840  begin get_sa_ptr;
32841  if cur_ptr<>null then
32842    begin delete_token_ref(cur_ptr); put_sa_ptr(null);
32843    end;
32844  end;
32845tini
32846
32847@ The command code |register| is used for `\.{\\count}', `\.{\\dimen}',
32848etc., as well as for references to sparse array elements defined by
32849`\.{\\countdef}', etc.
32850
32851@<Cases of |register| for |print_cmd_chr|@>=
32852begin if (chr_code<mem_bot)or(chr_code>lo_mem_stat_max) then
32853  cmd:=sa_type(chr_code)
32854else  begin cmd:=chr_code-mem_bot; chr_code:=null;
32855  end;
32856if cmd=int_val then print_esc("count")
32857else if cmd=dimen_val then print_esc("dimen")
32858else if cmd=glue_val then print_esc("skip")
32859else print_esc("muskip");
32860if chr_code<>null then print_sa_num(chr_code);
32861end
32862
32863@ Similarly the command code |toks_register| is used for `\.{\\toks}' as
32864well as for references to sparse array elements defined by
32865`\.{\\toksdef}'.
32866
32867@<Cases of |toks_register| for |print_cmd_chr|@>=
32868begin print_esc("toks");
32869if chr_code<>mem_bot then print_sa_num(chr_code);
32870end
32871
32872@ When a shorthand definition for an element of one of the sparse arrays
32873is destroyed, we must reduce the reference count.
32874
32875@<Cases for |eq_destroy|@>=
32876toks_register,register:
32877  if (equiv_field(w)<mem_bot)or(equiv_field(w)>lo_mem_stat_max) then
32878    delete_sa_ref(equiv_field(w));
32879
32880@ The task to maintain (change, save, and restore) register values is
32881essentially the same when the register is realized as sparse array
32882element or entry in |eqtb|.  The global variable |sa_chain| is the head
32883of a linked list of entries saved at the topmost level |sa_level|; the
32884lists for lowel levels are kept in special save stack entries.
32885
32886@<Glob...@>=
32887@!sa_chain: pointer; {chain of saved sparse array entries}
32888@!sa_level: quarterword; {group level for |sa_chain|}
32889
32890@ @<Set init...@>=
32891sa_chain:=null; sa_level:=level_zero;
32892
32893@ The individual saved items are kept in pointer or word nodes similar
32894to those used for the array elements: a word node with value zero is,
32895however, saved as pointer node with the otherwise impossible |sa_index|
32896value |tok_val_limit|.
32897
32898@d sa_loc==sa_ref {location of saved item}
32899
32900@<Declare \eTeX\ procedures for tr...@>=
32901procedure sa_save(@!p:pointer); {saves value of |p|}
32902var q:pointer; {the new save node}
32903@!i:quarterword; {index field of node}
32904begin if cur_level<>sa_level then
32905  begin check_full_save_stack; save_type(save_ptr):=restore_sa;
32906  save_level(save_ptr):=sa_level; save_index(save_ptr):=sa_chain;
32907  incr(save_ptr); sa_chain:=null; sa_level:=cur_level;
32908  end;
32909i:=sa_index(p);
32910if i<dimen_val_limit then
32911  begin if sa_int(p)=0 then
32912    begin q:=get_node(pointer_node_size); i:=tok_val_limit;
32913    end
32914  else  begin q:=get_node(word_node_size); sa_int(q):=sa_int(p);
32915    end;
32916  sa_ptr(q):=null;
32917  end
32918else  begin q:=get_node(pointer_node_size); sa_ptr(q):=sa_ptr(p);
32919  end;
32920sa_loc(q):=p; sa_index(q):=i; sa_lev(q):=sa_lev(p);
32921link(q):=sa_chain; sa_chain:=q; add_sa_ref(p);
32922end;
32923
32924@ @<Declare \eTeX\ procedures for tr...@>=
32925procedure sa_destroy(@!p:pointer); {destroy value of |p|}
32926begin if sa_index(p)<mu_val_limit then delete_glue_ref(sa_ptr(p))
32927else if sa_ptr(p)<>null then
32928  if sa_index(p)<box_val_limit then flush_node_list(sa_ptr(p))
32929  else delete_token_ref(sa_ptr(p));
32930end;
32931
32932@ The procedure |sa_def| assigns a new value to sparse array elements,
32933and saves the former value if appropriate.  This procedure is used only
32934for skip, muskip, box, and token list registers.  The counterpart of
32935|sa_def| for count and dimen registers is called |sa_w_def|.
32936
32937@d sa_define(#)==if e then
32938    if global then gsa_def(#)@+else sa_def(#)
32939  else define
32940@#
32941@d sa_def_box== {assign |cur_box| to |box(cur_val)|}
32942  begin find_sa_element(box_val,cur_val,true);
32943  if global then gsa_def(cur_ptr,cur_box)@+else sa_def(cur_ptr,cur_box);
32944  end
32945@#
32946@d sa_word_define(#)==if e then
32947    if global then gsa_w_def(#)@+else sa_w_def(#)
32948  else word_define(#)
32949
32950@<Declare \eTeX\ procedures for tr...@>=
32951procedure sa_def(@!p:pointer;@!e:halfword);
32952  {new data for sparse array elements}
32953begin add_sa_ref(p);
32954if sa_ptr(p)=e then
32955  begin @!stat if tracing_assigns>0 then show_sa(p,"reassigning");@+tats@;@/
32956  sa_destroy(p);
32957  end
32958else  begin @!stat if tracing_assigns>0 then show_sa(p,"changing");@+tats@;@/
32959  if sa_lev(p)=cur_level then sa_destroy(p)@+else sa_save(p);
32960  sa_lev(p):=cur_level; sa_ptr(p):=e;
32961  @!stat if tracing_assigns>0 then show_sa(p,"into");@+tats@;@/
32962  end;
32963delete_sa_ref(p);
32964end;
32965@#
32966procedure sa_w_def(@!p:pointer;@!w:integer);
32967begin add_sa_ref(p);
32968if sa_int(p)=w then
32969  begin @!stat if tracing_assigns>0 then show_sa(p,"reassigning");@+tats@;@/
32970  end
32971else  begin @!stat if tracing_assigns>0 then show_sa(p,"changing");@+tats@;@/
32972  if sa_lev(p)<>cur_level then sa_save(p);
32973  sa_lev(p):=cur_level; sa_int(p):=w;
32974  @!stat if tracing_assigns>0 then show_sa(p,"into");@+tats@;@/
32975  end;
32976delete_sa_ref(p);
32977end;
32978
32979@ The |sa_def| and |sa_w_def| routines take care of local definitions.
32980@^global definitions@>
32981Global definitions are done in almost the same way, but there is no need
32982to save old values, and the new value is associated with |level_one|.
32983
32984@<Declare \eTeX\ procedures for tr...@>=
32985procedure gsa_def(@!p:pointer;@!e:halfword); {global |sa_def|}
32986begin add_sa_ref(p);
32987@!stat if tracing_assigns>0 then show_sa(p,"globally changing");@+tats@;@/
32988sa_destroy(p); sa_lev(p):=level_one; sa_ptr(p):=e;
32989@!stat if tracing_assigns>0 then show_sa(p,"into");@+tats@;@/
32990delete_sa_ref(p);
32991end;
32992@#
32993procedure gsa_w_def(@!p:pointer;@!w:integer); {global |sa_w_def|}
32994begin add_sa_ref(p);
32995@!stat if tracing_assigns>0 then show_sa(p,"globally changing");@+tats@;@/
32996sa_lev(p):=level_one; sa_int(p):=w;
32997@!stat if tracing_assigns>0 then show_sa(p,"into");@+tats@;@/
32998delete_sa_ref(p);
32999end;
33000
33001@ The |sa_restore| procedure restores the sparse array entries pointed
33002at by |sa_chain|
33003
33004@<Declare \eTeX\ procedures for tr...@>=
33005procedure sa_restore;
33006var p:pointer; {sparse array element}
33007begin repeat p:=sa_loc(sa_chain);
33008if sa_lev(p)=level_one then
33009  begin if sa_index(p)>=dimen_val_limit then sa_destroy(sa_chain);
33010  @!stat if tracing_restores>0 then show_sa(p,"retaining");@+tats@;@/
33011  end
33012else  begin if sa_index(p)<dimen_val_limit then
33013    if sa_index(sa_chain)<dimen_val_limit then sa_int(p):=sa_int(sa_chain)
33014    else sa_int(p):=0
33015  else  begin sa_destroy(p); sa_ptr(p):=sa_ptr(sa_chain);
33016    end;
33017  sa_lev(p):=sa_lev(sa_chain);
33018  @!stat if tracing_restores>0 then show_sa(p,"restoring");@+tats@;@/
33019  end;
33020delete_sa_ref(p);
33021p:=sa_chain; sa_chain:=link(p);
33022if sa_index(p)<dimen_val_limit then free_node(p,word_node_size)
33023else free_node(p,pointer_node_size);
33024until sa_chain=null;
33025end;
33026
33027@ When the value of |last_line_fit| is positive, the last line of a
33028(partial) paragraph is treated in a special way and we need additional
33029fields in the active nodes.
33030
33031@d active_node_size_extended=5 {number of words in extended active nodes}
33032@d active_short(#)==mem[#+3].sc {|shortfall| of this line}
33033@d active_glue(#)==mem[#+4].sc {corresponding glue stretch or shrink}
33034
33035@<Glob...@>=
33036@!last_line_fill:pointer; {the |par_fill_skip| glue node of the new paragraph}
33037@!do_last_line_fit:boolean; {special algorithm for last line of paragraph?}
33038@!active_node_size:small_number; {number of words in active nodes}
33039@!fill_width:array[0..2] of scaled; {infinite stretch components of
33040  |par_fill_skip|}
33041@!best_pl_short:array[very_loose_fit..tight_fit] of scaled; {|shortfall|
33042  corresponding to |minimal_demerits|}
33043@!best_pl_glue:array[very_loose_fit..tight_fit] of scaled; {corresponding
33044  glue stretch or shrink}
33045
33046@ The new algorithm for the last line requires that the stretchability of
33047|par_fill_skip| is infinite and the stretchability of |left_skip| plus
33048|right_skip| is finite.
33049
33050@<Check for special...@>=
33051do_last_line_fit:=false; active_node_size:=active_node_size_normal;
33052  {just in case}
33053if last_line_fit>0 then
33054  begin q:=glue_ptr(last_line_fill);
33055  if (stretch(q)>0)and(stretch_order(q)>normal) then
33056    if (background[3]=0)and(background[4]=0)and(background[5]=0) then
33057    begin do_last_line_fit:=true;
33058    active_node_size:=active_node_size_extended;
33059    fill_width[0]:=0; fill_width[1]:=0; fill_width[2]:=0;
33060    fill_width[stretch_order(q)-1]:=stretch(q);
33061    end;
33062  end
33063
33064@ @<Other local variables for |try_break|@>=
33065@!g:scaled; {glue stretch or shrink of test line, adjustment for last line}
33066
33067@ Here we initialize the additional fields of the first active node
33068representing the beginning of the paragraph.
33069
33070@<Initialize additional fields of the first active node@>=
33071begin active_short(q):=0; active_glue(q):=0;
33072end
33073
33074@ Here we compute the adjustment |g| and badness |b| for a line from |r|
33075to the end of the paragraph.  When any of the criteria for adjustment is
33076violated we fall through to the normal algorithm.
33077
33078The last line must be too short, and have infinite stretch entirely due
33079to |par_fill_skip|.
33080
33081@<Perform computations for last line and |goto found|@>=
33082begin if (active_short(r)=0)or(active_glue(r)<=0) then goto not_found;
33083  {previous line was neither stretched nor shrunk, or was infinitely bad}
33084if (cur_active_width[3]<>fill_width[0])or@|
33085  (cur_active_width[4]<>fill_width[1])or@|
33086  (cur_active_width[5]<>fill_width[2]) then goto not_found;
33087  {infinite stretch of this line not entirely due to |par_fill_skip|}
33088if active_short(r)>0 then g:=cur_active_width[2]
33089else g:=cur_active_width[6];
33090if g<=0 then goto not_found; {no finite stretch resp.\ no shrink}
33091arith_error:=false; g:=fract(g,active_short(r),active_glue(r),max_dimen);
33092if last_line_fit<1000 then g:=fract(g,last_line_fit,1000,max_dimen);
33093if arith_error then
33094  if active_short(r)>0 then g:=max_dimen@+else g:=-max_dimen;
33095if g>0 then
33096  @<Set the value of |b| to the badness of the last line for stretching,
33097    compute the corresponding |fit_class|, and |goto found|@>
33098else if g<0 then
33099  @<Set the value of |b| to the badness of the last line for shrinking,
33100    compute the corresponding |fit_class|, and |goto found|@>;
33101not_found:end
33102
33103@ These badness computations are rather similar to those of the standard
33104algorithm, with the adjustment amount |g| replacing the |shortfall|.
33105
33106@<Set the value of |b| to the badness of the last line for str...@>=
33107begin if g>shortfall then g:=shortfall;
33108if g>7230584 then if cur_active_width[2]<1663497 then
33109  begin b:=inf_bad; fit_class:=very_loose_fit; goto found;
33110  end;
33111b:=badness(g,cur_active_width[2]);
33112if b>12 then
33113  if b>99 then fit_class:=very_loose_fit
33114  else fit_class:=loose_fit
33115else fit_class:=decent_fit;
33116goto found;
33117end
33118
33119@ @<Set the value of |b| to the badness of the last line for shr...@>=
33120begin if -g>cur_active_width[6] then g:=-cur_active_width[6];
33121b:=badness(-g,cur_active_width[6]);
33122if b>12 then fit_class:=tight_fit@+else fit_class:=decent_fit;
33123goto found;
33124end
33125
33126@ Vanishing values of |shortfall| and |g| indicate that the last line is
33127not adjusted.
33128
33129@<Adjust \(t)the additional data for last line@>=
33130begin if cur_p=null then shortfall:=0;
33131if shortfall>0 then g:=cur_active_width[2]
33132else if shortfall<0 then g:=cur_active_width[6]
33133else g:=0;
33134end
33135
33136@ For each feasible break we record the shortfall and glue stretch or
33137shrink (or adjustment).
33138
33139@<Store \(a)additional data for this feasible break@>=
33140begin best_pl_short[fit_class]:=shortfall; best_pl_glue[fit_class]:=g;
33141end
33142
33143@ Here we save these data in the active node representing a potential
33144line break.
33145
33146@<Store \(a)additional data in the new active node@>=
33147begin active_short(q):=best_pl_short[fit_class];
33148active_glue(q):=best_pl_glue[fit_class];
33149end
33150
33151@ @<Print additional data in the new active node@>=
33152begin print(" s="); print_scaled(active_short(q));
33153if cur_p=null then print(" a=")@+else print(" g=");
33154print_scaled(active_glue(q));
33155end
33156
33157@ Here we either reset |do_last_line_fit| or adjust the |par_fill_skip|
33158glue.
33159
33160@<Adjust \(t)the final line of the paragraph@>=
33161if active_short(best_bet)=0 then do_last_line_fit:=false
33162else  begin q:=new_spec(glue_ptr(last_line_fill));
33163  delete_glue_ref(glue_ptr(last_line_fill));
33164  width(q):=width(q)+active_short(best_bet)-active_glue(best_bet);
33165  stretch(q):=0; glue_ptr(last_line_fill):=q;
33166  end
33167
33168@ When reading \.{\\patterns} while \.{\\savinghyphcodes} is positive
33169the current |lc_code| values are stored together with the hyphenation
33170patterns for the current language.  They will later be used instead of
33171the |lc_code| values for hyphenation purposes.
33172
33173The |lc_code| values are stored in the linked trie analogous to patterns
33174$p_1$ of length~1, with |hyph_root=trie_r[0]| replacing |trie_root| and
33175|lc_code(p_1)| replacing the |trie_op| code.  This allows to compress
33176and pack them together with the patterns with minimal changes to the
33177existing code.
33178
33179@d hyph_root==trie_r[0] {root of the linked trie for |hyph_codes|}
33180
33181@<Initialize table entries...@>=
33182
33183@ @<Store hyphenation codes for current language@>=
33184begin c:=cur_lang; first_child:=false; p:=0;
33185repeat q:=p; p:=trie_r[q];
33186until (p=0)or(c<=so(trie_c[p]));
33187if (p=0)or(c<so(trie_c[p])) then
33188  @<Insert a new trie node between |q| and |p|, and
33189    make |p| point to it@>;
33190q:=p; {now node |q| represents |cur_lang|}
33191@<Store all current |lc_code| values@>;
33192end
33193
33194@ We store all nonzero |lc_code| values, overwriting any previously
33195stored values (and possibly wasting a few trie nodes that were used
33196previously and are not needed now).  We always store at least one
33197|lc_code| value such that |hyph_index| (defined below) will not be zero.
33198
33199@<Store all current |lc_code| values@>=
33200p:=trie_l[q]; first_child:=true;
33201for c:=0 to 255 do
33202  if (lc_code(c)>0)or((c=255)and first_child) then
33203    begin if p=0 then
33204      @<Insert a new trie node between |q| and |p|, and
33205        make |p| point to it@>
33206    else trie_c[p]:=si(c);
33207    trie_o[p]:=qi(lc_code(c));
33208    q:=p; p:=trie_r[q]; first_child:=false;
33209    end;
33210if first_child then trie_l[q]:=0@+else trie_r[q]:=0
33211
33212@ We must avoid to ``take'' location~1, in order to distinguish between
33213|lc_code| values and patterns.
33214
33215@<Pack all stored |hyph_codes|@>=
33216begin if trie_root=0 then for p:=0 to 255 do trie_min[p]:=p+2;
33217first_fit(hyph_root); trie_pack(hyph_root);
33218hyph_start:=trie_ref[hyph_root];
33219end
33220
33221@ The global variable |hyph_index| will point to the hyphenation codes
33222for the current language.
33223
33224@d set_hyph_index== {set |hyph_index| for current language}
33225  if trie_char(hyph_start+cur_lang)<>qi(cur_lang)
33226    then hyph_index:=0 {no hyphenation codes for |cur_lang|}
33227  else hyph_index:=trie_link(hyph_start+cur_lang)
33228@#
33229@d set_lc_code(#)== {set |hc[0]| to hyphenation or lc code for |#|}
33230  if (hyph_index=0) or ((#)>255) then hc[0]:=lc_code(#)
33231  else if trie_char(hyph_index+#)<>qi(#) then hc[0]:=0
33232  else hc[0]:=qo(trie_op(hyph_index+#))
33233
33234@<Glob...@>=
33235@!hyph_start:trie_pointer; {root of the packed trie for |hyph_codes|}
33236@!hyph_index:trie_pointer; {pointer to hyphenation codes for |cur_lang|}
33237
33238@ When |saving_vdiscards| is positive then the glue, kern, and penalty
33239nodes removed by the page builder or by \.{\\vsplit} from the top of a
33240vertical list are saved in special lists instead of being discarded.
33241
33242@d tail_page_disc==disc_ptr[copy_code] {last item removed by page builder}
33243@d page_disc==disc_ptr[last_box_code] {first item removed by page builder}
33244@d split_disc==disc_ptr[vsplit_code] {first item removed by \.{\\vsplit}}
33245
33246@<Glob...@>=
33247@!disc_ptr:array[copy_code..vsplit_code] of pointer; {list pointers}
33248
33249@ @<Set init...@>=
33250page_disc:=null; split_disc:=null;
33251
33252@ The \.{\\pagediscards} and \.{\\splitdiscards} commands share the
33253command code |un_vbox| with \.{\\unvbox} and \.{\\unvcopy}, they are
33254distinguished by their |chr_code| values |last_box_code| and
33255|vsplit_code|.  These |chr_code| values are larger than |box_code| and
33256|copy_code|.
33257
33258@<Generate all \eTeX...@>=
33259primitive("pagediscards",un_vbox,last_box_code);@/
33260@!@:page_discards_}{\.{\\pagediscards} primitive@>
33261primitive("splitdiscards",un_vbox,vsplit_code);@/
33262@!@:split_discards_}{\.{\\splitdiscards} primitive@>
33263
33264@ @<Cases of |un_vbox| for |print_cmd_chr|@>=
33265else if chr_code=last_box_code then print_esc("pagediscards")
33266else if chr_code=vsplit_code then print_esc("splitdiscards")
33267
33268@ @<Handle saved items and |goto done|@>=
33269begin link(tail):=disc_ptr[cur_chr]; disc_ptr[cur_chr]:=null;
33270goto done;
33271end
33272
33273@ The \.{\\interlinepenalties}, \.{\\clubpenalties}, \.{\\widowpenalties},
33274and \.{\\displaywidowpenalties} commands allow to define arrays of
33275penalty values to be used instead of the corresponding single values.
33276
33277@d inter_line_penalties_ptr==equiv(inter_line_penalties_loc)
33278@d club_penalties_ptr==equiv(club_penalties_loc)
33279@d widow_penalties_ptr==equiv(widow_penalties_loc)
33280@d display_widow_penalties_ptr==equiv(display_widow_penalties_loc)
33281
33282@<Generate all \eTeX...@>=
33283primitive("interlinepenalties",set_shape,inter_line_penalties_loc);@/
33284@!@:inter_line_penalties_}{\.{\\interlinepenalties} primitive@>
33285primitive("clubpenalties",set_shape,club_penalties_loc);@/
33286@!@:club_penalties_}{\.{\\clubpenalties} primitive@>
33287primitive("widowpenalties",set_shape,widow_penalties_loc);@/
33288@!@:widow_penalties_}{\.{\\widowpenalties} primitive@>
33289primitive("displaywidowpenalties",set_shape,display_widow_penalties_loc);@/
33290@!@:display_widow_penalties_}{\.{\\displaywidowpenalties} primitive@>
33291
33292@ @<Cases of |set_shape| for |print_cmd_chr|@>=
33293inter_line_penalties_loc: print_esc("interlinepenalties");
33294club_penalties_loc: print_esc("clubpenalties");
33295widow_penalties_loc: print_esc("widowpenalties");
33296display_widow_penalties_loc: print_esc("displaywidowpenalties");
33297
33298@ @<Fetch a penalties array element@>=
33299begin scan_int;
33300if (equiv(m)=null)or(cur_val<0) then cur_val:=0
33301else  begin if cur_val>penalty(equiv(m)) then cur_val:=penalty(equiv(m));
33302  cur_val:=penalty(equiv(m)+cur_val);
33303  end;
33304end
33305
33306@* \[54] System-dependent changes.
33307This section should be replaced, if necessary, by any special
33308modifications of the program
33309that are necessary to make \TeX\ work at a particular installation.
33310It is usually best to design your change file so that all changes to
33311previous sections preserve the section numbering; then everybody's version
33312will be consistent with the published program. More extensive changes,
33313which introduce new sections, can be inserted here; then only the index
33314itself will get a new section number.
33315@^system dependencies@>
33316
33317@* \[55] Index.
33318Here is where you can find all uses of each identifier in the program,
33319with underlined entries pointing to where the identifier was defined.
33320If the identifier is only one letter long, however, you get to see only
33321the underlined entries. {\sl All references are to section numbers instead of
33322page numbers.}
33323
33324This index also lists error messages and other aspects of the program
33325that you might want to look up some day. For example, the entry
33326for ``system dependencies'' lists all sections that should receive
33327special attention from people who are installing \TeX\ in a new
33328operating environment. A list of various things that can't happen appears
33329under ``this can't happen''. Approximately 40 sections are listed under
33330``inner loop''; these account for about 60\pct! of \TeX's running time,
33331exclusive of input and output.
33332